Determining a suitable sowing time and selecting a suitable cultivar for a
specific area are among the most important decisions for high yield and quality
of cotton. Environmental factors also determine the best season for a high
yield of a specific crop, and they reveal the adaptive capabilities of that
crop. Temperature is one of the most influential environmental components
affecting the yield and growth of crops. With global warming, increasing heat
is a serious threat to crop productivity all over the world. The rise in global
temperature forecast by different climate models has a direct impact on plant
growth, productivity, and quality. Such climatic conditions can be reproduced
by cultivating crops at different sowing times, so that the crops develop under
the temperature and relative humidity of each sowing period.

Data mining is the process of extracting patterns and models from large data
sets. There are two types of data mining: predictive and descriptive.
Predictive data mining involves predicting the value of a particular target
variable based on previous data. When this target is discrete, the task at hand
is referred to as classification. Predictive data mining has many applications,
such as predicting the repayment behavior of loan applicants.

On the other hand, the task of predicting the value of a continuous target
variable is called regression. Credit loss, sales amount, and stock price
prediction are typical examples. Descriptive data mining focuses on finding
patterns that describe relationships in the data, for example association
rules. Association rule mining looks for frequently occurring patterns in the
data and is often used for market basket analysis. A well-known example is the
rule: if someone buys nappies, then this person is also likely to buy beer.
Although its actual truthfulness has been questioned, its use as a marketing
vehicle for data mining has surely been effective. Since data mining is often
used to support decision making, it can to a large extent be considered part of
operational research (OR), a substantial part of which also focuses on
developing various kinds of decision models from data. Hence, there is clearly
substantial potential for cross-fertilization between both disciplines
(Baesens et al., 2009).
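The strength of an association rule such as the nappies and beer example is
usually summarized by its support and confidence. The following is a minimal
sketch of those two measures; the transactions are invented purely for
illustration, and the function names are our own rather than those of any
particular tool.

```python
# Hypothetical market-basket transactions, invented for illustration only.
transactions = [
    {"nappies", "beer", "milk"},
    {"nappies", "beer"},
    {"nappies", "bread"},
    {"beer", "crisps"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)


# Rule under test: {nappies} -> {beer}
rule_support = support({"nappies", "beer"}, transactions)
rule_confidence = confidence({"nappies"}, {"beer"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is normally reported only when both values exceed thresholds chosen by
the analyst; the thresholds themselves are a modeling decision and are not
fixed by the technique.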

The factors that affect cotton growth, composition, and quality are genotype,
environment, and agronomic practices. Environmental factors are classified into
predictable and unpredictable variables (Allard and Bradshaw 1964). Sowing time
is one of the predictable factors: it is under human control and can be
adjusted as required. Planting time is a major agronomic factor affecting
growth and yield (Gecgel et al. 2007).

The purpose of data mining is to learn from previous data in order to make
better future predictions. There are two major categories of data mining
strategies: supervised and unsupervised learning. Supervised learning methods
are used when the values of input variables are used to forecast another
variable (the target) whose values are known. Unsupervised learning can be used
in similar situations but is deployed on data for which no target with a known
value exists. An example of a supervised method would be a healthcare
organization finding out through predictive modeling which attributes
distinguish fraudulent claims. In supervised methods, the models and attributes
are known and are applied to the data to predict and discover information. With
unsupervised modeling, the attributes and models of fraud are not known, but
the patterns and clusters of data uncovered by data mining can lead to new
discoveries.

Prediction algorithms determine models or rules to predict continuous or
discrete target values given input data. For example, a prediction problem
could include attempting to predict the amount of an insurance claim or death
rates given a set of inputs. Classification algorithms determine models to
predict discrete values given input data. A classification problem might
involve trying to determine whether a particular purchase represents peculiar
behavior based on some indicators (e.g., where the purchase was made, the
amount of the purchase, or the type of purchase). Exploration uncovers
dimensionality in input data. Trying to identify groups of similar customers
based on spending habits for a large, targeted mailing is an exploration
problem. Affinity analysis determines which events are likely to occur in
conjunction with another event. Retailers use affinity analysis to analyze
product purchase combinations in grocery stores. A potential medical example
would be an analysis of patients’ signs and symptoms that occur together in a
clinical trial (Obenshain et al., 2004).

Many researchers use different process models for the prediction of crop yield.
Some researchers use Weka, an open source tool that provides many algorithms
for data analysis and prediction and delivers results within a reasonable time.
Do Weka, SPSS, and MATLAB give the same results within a given time and with a
low error rate? To address this question, the process models need to be
compared to determine which one gives the best solution.

The main objective of this research work is to predict cotton yield from
parameters covering area, yield, production, soil, temperature, wind, air,
rainfall, water availability, humidity, sunrise, pesticides, and fertilizer,
using data mining techniques (SVM, cluster analysis, Bayesian network) and a
statistical method (regression) in Weka and SPSS, and to determine which model
gives the better prediction result within the available time and with the
minimum error rate.
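The proposal does not fix a particular definition of error rate. As a hedged
illustration, the sketch below compares two hypothetical models using two
common measures for continuous yield predictions, mean absolute error (MAE) and
root mean squared error (RMSE). All numbers and model names are invented.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical observed yields (kg/ha) and predictions from two models.
actual = [700.0, 720.0, 680.0, 750.0]
predictions = {
    "model_a": [710.0, 715.0, 690.0, 740.0],  # small errors everywhere
    "model_b": [700.0, 720.0, 680.0, 720.0],  # exact except one large miss
}

scores = {
    name: {"mae": mae(actual, pred), "rmse": rmse(actual, pred)}
    for name, pred in predictions.items()
}
best_by_mae = min(scores, key=lambda name: scores[name]["mae"])
best_by_rmse = min(scores, key=lambda name: scores[name]["rmse"])
print(scores, best_by_mae, best_by_rmse)
```

In this invented case the two measures rank the models differently, which is
why the error measure should be chosen before the tools are compared.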

Long et al. (2017) reported on a timesharing lease management organization
whose dataset was evaluated. In that paper, data moving from one platform to
another were used to evaluate the whole transformation process. A Kalman
filtering algorithm, also called linear quadratic estimation, was used for
prediction based on decision tree analysis. The intervention probability of the
monitoring data and the covariance ratio were taken as factors and examined.
The filtering and prediction functions were carried out on the data set
collected from the main platform. The study concluded that the maintainability
of the data was high and the intervention of the data was low; once a limit
value was selected, regression established the relationship between the limit,
measured, and theoretical values.

Baruah et al. (2016) studied the prediction of tea yield using data mining
approaches under changes in meteorological factors and their influence. The
dataset covered the last 30 years for the Assam region of India. Multiple
linear regression was applied to climate variables (rainfall, temperature,
relative humidity, evaporation, and sunshine) to make the prediction. The
technique can be used for future prediction of crop production and to adopt
various methods to maximize yield. The model therefore supports the tea
industry in making better management decisions to achieve maximum yields from
the plantations.
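To make the idea of multiple linear regression concrete, the following is a
minimal sketch that fits an intercept and one coefficient per climate variable
by solving the least-squares normal equations. It is not the implementation
used by Baruah et al.; the two features, the coefficient values, and the
synthetic data are assumptions made only for illustration, and the solver
assumes the features are not collinear.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Return [intercept, b1, b2, ...] minimizing squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    """Apply fitted coefficients to one row of feature values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Synthetic data: (rainfall in mm, temperature in deg C), invented values.
features = [(100.0, 20.0), (150.0, 22.0), (120.0, 25.0), (180.0, 21.0), (90.0, 24.0)]
true_coefs = [2.0, 0.5, 3.0]  # assumed only to generate noise-free targets
targets = [predict(true_coefs, f) for f in features]

coefs = fit_linear_regression(features, targets)
print([round(c, 4) for c in coefs])
```

Because the synthetic targets contain no noise, the fit recovers the assumed
coefficients; with real climate records the residuals would have to be examined
as well.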

Sujatha et al. (2016) noted that predicting crop yield is a very challenging
task. The uncertainty of weather changes and water shortages caused by
disasters can affect crop production, which is why precautions and agricultural
data sets are needed. These data sets should be analyzed with data mining
techniques to obtain information about the respective crop yield. Using data
mining techniques improves the value and gain of the land. The paper describes
methodologies for yield prediction and how to improve the efficiency of
agriculture and crop yield.

Paul et al. (2015) stated that the prediction of cotton yield has a significant
effect on farmers. Farmers' experience with a crop makes yield prediction
possible. Data mining techniques were applied to predict crop yield from a
given soil dataset, using the Naive Bayes and K-Nearest Neighbor techniques.
Soil was used as the variable and was classified into low, medium, and high
categories. The land produces better results in crop production with the help
of soil analysts' and farmers' opinions on sowing the crop.
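The K-Nearest Neighbor idea can be sketched in a few lines: a new soil sample
receives the majority label of its k closest training samples. The two soil
features (pH and a nitrogen reading) and all values below are hypothetical and
are not taken from Paul et al.; in practice the features would also be scaled
so that no single one dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical soil samples: (pH, nitrogen reading) -> yield class.
train = [
    ((5.5, 20.0), "low"),
    ((5.8, 25.0), "low"),
    ((6.5, 50.0), "medium"),
    ((6.7, 55.0), "medium"),
    ((7.0, 80.0), "high"),
    ((7.2, 85.0), "high"),
]

predicted = knn_predict(train, (6.6, 52.0), k=3)
print(predicted)
```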

Savla et al. (2015) collected a large amount of agricultural data and applied a
data mining approach to yield prediction. The study used the classification
techniques of data mining. Data variables collected over the years were fed
into various classification algorithms, which were then compared to identify
the best one for predicting the soybean crop. Support vector machine, random
forest, neural network, REPTree, bagging, and Bayes algorithms were used for
the prediction of crop yield. The bagging algorithm gave the best result for
soybean yield prediction.

Malhotra et al. (2015) developed models for predicting software quality with
various data mining approaches, namely machine learning (ML) and search-based
techniques (SBT). Data were collected from three open source software projects
and used to assess the performance of ten machine learning and search-based
techniques. One data set was used to develop a prediction model and the two
other data sets were used for inter-project validation to indicate accurate
results. The performance of the SBT and ML techniques was shown to be
comparable. The model created using the training data of one project was
successfully applied to other similar projects and yielded good results, which
supports inter-project validation and shows that the model was created
effectively. ML and SBT yielded good results compared with statistical LR
techniques when applied to the change proneness prediction problem.
Researchers can use these results to preserve efficiency and save costs by
identifying change-prone classes.

Gandhi et al. (2015) discussed data mining techniques for decision making. In
this study, various techniques (artificial neural networks, Bayesian networks,
and support vector machines) efficiently capture the relationship between
weather and other factors involved in crop production. These techniques are
used in decision support tools for crop production, and the article also covers
predictions and pest and disease management. Combined with GIS technologies,
these techniques can be applied to composite agricultural datasets to predict
crop yield under seasonal and spatial factors.

Cooper et al. (2011) noted that the failure rate in chemistry is high in the
higher education system. The paper uses an accurate predictive system to
identify students who are at risk of failing or of falling to a C grade in
chemistry. A genetically optimized neural network is used to analyze the
results of a diagnostic algebra test designed for a specific population. Once
at-risk students have been identified, they can be helped to improve their
chances of success using techniques such as concurrent support courses, online
tutorials, “just-in-time” instructional aides, study skills, motivational
interviewing, and/or peer

Bo (2011) noted that the hairiness of polyester/cotton blended yarn in the
warping process is affected by fiber performance and processing parameters,
which makes its prediction difficult. Among these, the warping process
parameters play an important role in yarn hairiness. To examine the effect of
various warping process parameters on yarn hairiness, the author used an
artificial neural network (ANN) to predict the hairiness of polyester/cotton
yarn in the warping process from those parameters. The results show that the
hairiness can be predicted well and that the ANN model yields accurate and
stable predictions, which indicates that ANN theory is an effective and viable
modeling method.

Thomas et al. (2004) studied the student characteristics and experiences that
affect satisfaction, using regression and decision tree analysis with the CHAID
algorithm to examine student-opinion data. Three measures of general
satisfaction were used to analyze specific aspects of students’ university
experience with data mining techniques. The study indicates a strong
relationship between student achievement and the principal determinants of
satisfaction. Social integration and pre-enrollment opinions are also
important. Decision tree analysis reveals that social integration has more
effect on the satisfaction of students who are less academically engaged.

Yang et al. studied the use of the cluster analysis approach of data mining to
predict crop yield. The data indexes included daily air temperature, rain,
snow, and sunshine, together with the homogenized cotton rational key. These
factors play an important role in crop production, since the yield of the crop
depends on meteorological conditions. The meteorological data were matched to
the rational key of cotton over the corresponding growth period. Cotton yield
can thus be predicted from meteorological effects and factors, which improves
the accuracy of the cluster analysis.

This study is based on secondary crop data for predicting the production and
yield of cotton. The production and yield data for cotton have been taken from
various issues of the Economic Survey of Pakistan (GOP, various issues) and
Agricultural Statistics of Pakistan (GOP). The study covers data from 2000 to
2017. Average annual growth rates of production, yield, and other variables for
cotton are reported. Various models have been used to forecast time series
data; this study uses regression analysis, SVMs, clustering, and Bayesian
network techniques to forecast the yield of cotton.
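The text does not state how the average annual growth rate is computed. Two
common definitions are sketched below as assumptions: the mean of the
year-over-year percentage changes, and the compound annual growth rate between
the first and last year. The yield series is invented for illustration.

```python
def average_annual_growth_rate(values):
    """Mean of year-over-year percentage changes."""
    changes = [
        (curr - prev) / prev * 100.0 for prev, curr in zip(values, values[1:])
    ]
    return sum(changes) / len(changes)


def compound_annual_growth_rate(values):
    """Constant yearly rate (in percent) linking the first and last values."""
    periods = len(values) - 1
    return ((values[-1] / values[0]) ** (1.0 / periods) - 1.0) * 100.0


# Hypothetical yield index for three consecutive years.
yields = [100.0, 120.0, 108.0]

aagr = average_annual_growth_rate(yields)
cagr = compound_annual_growth_rate(yields)
print(f"AAGR={aagr:.2f}%  CAGR={cagr:.2f}%")
```

The two figures differ whenever growth is uneven, so a report should say which
definition was applied.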

Support vector machines (SVMs) were applied to build the predictive models
based on explanatory variables representing the growth and nutrition conditions
after the heading stage and the meteorological environment after the late
spikelet initiation stage. The models achieved quantitative accuracy that was
within approximately 1 ton per hectare in yield for 85.1% of the total data
sets. Further, patterns of explanatory variables classified in three classes of
yield, which were visualized by the predictive models, were reasonable in terms
of knowledge of crop science. The predictive models using SVMs had the
potential to describe a relation between yield and multiple explanatory
variables that reflected diverse rice production in actual fields, and could
provide useful knowledge for decision-making of topdressing and basal
fertilization (Saruta et al., 2013).


Clustering is a data mining technique, which is used
to place data elements into related groups without advance knowledge of the
group definitions. Here, we propose an incremental clustering technique for
managing knowledge in edaphology, a study concerned with the influence of soils
on living things, particularly plants. The soil information, along with the
appropriate plants to be cultivated on it for better yield, collected by
edaphologists, is utilized in the proposed system. Initially, an incremental
DBSCAN algorithm is applied to a dynamic database where the data may be
frequently updated. Then, the data available in the soil database is grouped
into clusters and every new element is added into it without the need to
rerun the process. Finally, we have performed the plant prediction using
a regression model. The experimentation is carried out on a soil database to
analyze the performance of the proposed system in plant prediction. (Meenakshi
et al., 2012)
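For orientation, the following is a minimal sketch of the standard (batch)
DBSCAN algorithm, not the incremental variant proposed by Meenakshi et al. A
point with at least min_pts neighbours within distance eps is a core point,
clusters grow outward from core points, and points reachable from no core point
are marked as noise. The two-dimensional sample points and the parameter values
are hypothetical.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = [n for n in neighbours if n != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep growing
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels


# Hypothetical soil measurements reduced to two features.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

An incremental version would update these labels when a record is inserted
instead of rerunning the whole procedure.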

Cluster analysis is a data mining approach in which similar data elements are
placed in the same group without prior knowledge of the groups. In that paper,
an incremental cluster analysis technique is used to handle the effect of soil
on living things such as plants, with different types of soil information
collected where plants are cultivated. Incremental density-based spatial
clustering of applications with noise (DBSCAN) is applied to a dynamic data set
in which data are updated repeatedly. In the cluster analysis, soil data are
classified into groups, with similar types of soil placed in the same group.
Finally, a regression analysis model is used for prediction. The authors
concluded that the performance of plant prediction can be examined using the
soil information.

Bayesian networks (BN) are a computer science technique for assessing model and
parameter uncertainty by reasoning over conditional dependencies. A BN is a
method for representing beliefs and knowledge using probabilities, especially
relevant for systems that are highly complex in their structure and functional
interactions. A BN uses the probabilistic components of a framework, as opposed
to deterministic relationships, to describe the connections among variables
(Gandhi et al., 2012).
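As a small illustration of reasoning over conditional dependencies, the sketch
below encodes a two-node network in which rainfall influences yield, and then
applies Bayes' rule to update the belief about rainfall after a yield level is
observed. The structure and every probability are assumptions chosen for the
example, not estimates from data.

```python
# Prior over the parent node (hypothetical values).
p_rain = {"adequate": 0.6, "deficient": 0.4}

# Conditional probability table P(yield | rain) (hypothetical values).
p_yield_given_rain = {
    "adequate": {"high": 0.7, "low": 0.3},
    "deficient": {"high": 0.2, "low": 0.8},
}


def marginal_yield(y):
    """P(yield = y), obtained by summing over the states of the parent."""
    return sum(p_rain[r] * p_yield_given_rain[r][y] for r in p_rain)


def posterior_rain(r, y):
    """P(rain = r | yield = y) by Bayes' rule."""
    return p_rain[r] * p_yield_given_rain[r][y] / marginal_yield(y)


p_high = marginal_yield("high")
p_adequate_given_high = posterior_rain("adequate", "high")
print(f"P(high)={p_high:.2f}  P(adequate | high)={p_adequate_given_high:.2f}")
```

Larger networks apply the same idea to many variables, with one conditional
probability table per node.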


All the data will be collected from the Pakistan Bureau of Statistics. Research
articles will also be used as a source for data collection. The results of the
techniques mentioned above will be analyzed with a statistical tool. The final
model will be selected on the basis of these results, which will also help to
identify which process model predicts crop yield within the given time while
lowering the error rate and maintaining quality.

1    Y. Long et al., “Research on Kalman Filter Prediction Method Based on
Decision Tree Analysis,” 2017 4th Int. Conf. Inf. Sci. Control Eng., pp.
1656–1658, 2017.

2    R. D. Baruah, S. Roy, R. M. Bhagat, and L. N. Sethi, “Use of data mining
technique for prediction of tea yield in the face of climate change of Assam,
India,” Proc. – 2016 15th Int. Conf. Inf. Technol. ICIT 2016, pp. 265–269, 2017.

3    A. B. Baesens et al., “50 years of data mining and OR: upcoming trends and
challenges,” vol. 60, 2009.

4    K. Ullah et al., “Impact of temperature on yield and related traits in
cotton genotypes,” J. Integr. Agric., vol. 15, no. 3, pp. 678–683, 2016.

5    S. Merugula, “A Study on Software Defect Prediction Using Classification
Techniques,” vol. 7, no. 11, pp. 437–440, 2016.

6    N. Gandhi and L. J. Armstrong, “A review of the application of data mining
techniques for decision making in agriculture,” pp. 1–6, 2016.

7    E. H. Thomas and N. Galambos, “What Satisfies Students? Mining
Student-Opinion Data with Regression and Decision Tree Analysis,” Research in
Higher Education, vol. 45, no. 3, pp. 251–269, 2004.

8    M. Paul, S. K. Vishwakarma, and A. Verma, “Analysis of Soil Behaviour and
Prediction of Crop Yield Using Data Mining Approach,” 2015 Int. Conf. Comput.
Intell. Commun. Networks, pp. 766–771, 2015.

9    A. Savla and A. Mandholia, “Survey of classification algorithms for
formulating yield prediction accuracy in precision agriculture,” 2015.

10   R. Malhotra and M. Khanna, “Mining the impact of object oriented metrics
for change prediction using Machine Learning and Search-based techniques,” 2015
Int. Conf. Adv. Comput. Commun. Informatics, ICACCI 2015, pp. 228–234, 2015.

11   S. Ali, N. Badar, and H. Fatima, “Forecasting Production and Yield of
Sugarcane and Cotton Crops of Pakistan for 2013-2030,” Sarhad J. Agric., vol.
31, no. 1, pp. 1–10, 2015.

12   K. Saruta, Y. Hirai, K. Tanaka, E. Inoue, T. Okayasu, and M. Mitsuoka,
“Predictive models for yield and protein content of brown rice using support
vector machine,” Comput. Electron. Agric., vol. 99, pp. 93–100, 2013.

13   C. I. Cooper and P. T. Pearson, “A Genetically Optimized Predictive System
for Success in General Chemistry Using a Diagnostic Algebra Test,” J. Sci.
Educ. Technol., vol. 21, no. 1, pp. 197–205, 2012.

14   F. Yang, “A Study of The Method for Dynamic Prediction of Chinese Cotton’s
Unit Yield.”

15   A. Meenakshi, “Localized Matching Model for Plant Prediction Using
Incremental Clustering,” 2012.

16   Zhao Bo, “Applying artificial neural network technique and theory to study
the hairiness of polyester/cotton blended yarn in warping process,” 2011 Int.
Conf. Inf. Technol. Comput. Eng. Manag. Sci., pp. 282–285, 2011.

17   M. K. Obenshain, “Application of data mining techniques to healthcare
data,” Infect. Control Hosp. Epidemiol., vol. 25, no. 8, pp. 690–5, 2004.