Research article

EXPLORATORY DATA ANALYSIS AND OPTIMAL FEATURE SELECTION ON POMEGRANATE DATASET FOR ENHANCING THE PERFORMANCE OF POMEGRANATE DISEASE PREDICTION

Vaishali Nirgude, Sheetal Rathi

Online First: October 18, 2022


Exploratory Data Analysis (EDA) and Feature Selection (FS) are crucial in the fields of machine learning and data mining. FS is a dimensionality reduction technique that selects an optimal subset of the original features by eliminating noisy, unimportant, and redundant ones. By reducing the number of features, FS simplifies models, lowers computational cost by decreasing training time, reduces overfitting, mitigates the curse of dimensionality, and improves the accuracy of classification and prediction models. In this paper, a data collection framework using an agriculture drone and sensors has been designed to collect real field weather, soil, and water parameters. FS techniques are applied to select the important features from the original features. Statistical methods are used to analyze the correlation between all micro-level parameters and pomegranate diseases. A Machine Learning (ML) approach is used to develop a prediction model. The optimal features are provided to the pomegranate disease prediction model to predict the disease accurately and to recommend preventive measures. The impact of sudden changes in climatic conditions on pomegranate diseases is also studied and analyzed. Further, the accuracy and loss of various binary and multi-class classification ML models are evaluated and compared. Experimental results show that Random Forest (RF) performs strongly, achieving an accuracy of 96.53%. The proposed method will help the agro-industry to detect and classify the most prominent pomegranate diseases and to improve the growth and quality of the fruit.
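
As an illustration of the kind of correlation-based feature selection described above, the following minimal Python sketch keeps features that correlate with a disease label and drops near-duplicates of features already kept. It is an assumption-laden example, not the authors' method: the feature names (`humidity`, `humidity_dup`, `noise`), the synthetic data, the humidity threshold used to generate labels, and the cutoffs `min_relevance` and `max_redundancy` are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Filter-style feature selection (hypothetical cutoffs).

    columns: dict mapping feature name -> list of numeric values
    target:  list of 0/1 disease labels
    """
    # Relevance: absolute correlation of each feature with the label.
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # unimportant or noisy feature
        if any(abs(pearson(columns[name], columns[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant with a feature that is already kept
        selected.append(name)
    return selected


# Synthetic data only; these values are NOT taken from the paper's dataset.
rng = random.Random(0)
n = 200
humidity = [rng.uniform(40, 95) for _ in range(n)]
humidity_dup = [h * 1.01 + rng.gauss(0, 0.1) for h in humidity]  # redundant copy
noise = [rng.gauss(0, 1) for _ in range(n)]                      # irrelevant
diseased = [1 if h > 75 else 0 for h in humidity]                # assumed labelling rule

columns = {"humidity": humidity, "humidity_dup": humidity_dup, "noise": noise}
selected = select_features(columns, diseased)
print(selected)
```

In this sketch only one of the two humidity columns should survive, since they are almost perfectly correlated with each other, and the noise column should fall below the relevance cutoff. The reduced feature set would then be passed to a classifier such as Random Forest.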

Keywords

Feature Selection, Exploratory Data Analysis, Machine Learning, Pomegranate, Disease Detection, Agriculture