Research article

AN EFFECTIVE DATA IMPUTATION TECHNIQUE FOR MEDICINAL AGRICULTURE COMMODITY TIME-SERIES DATA

Aditi M Joshi¹ and Sanjay G Patel²

Online First: October 29, 2022


Imputation of missing data is an important part of feature engineering. Missing values are among the most common problems that arise while data are being observed or recorded, and real-world observations are rarely complete. Resolving this incompleteness is therefore crucial before any advanced analysis. Imputation replaces each missing value with a substitute value while retaining most of the information in the dataset. It is used because routinely deleting incomplete records is impractical: deletion can shrink the dataset markedly, which may bias the data and lead to erroneous analysis. The rapid growth of computing power has accelerated the generation of digital data in recent years, making it possible to draw new insights from very large datasets, also known as big data. Data analysts in fields as diverse as healthcare, banking, commerce, and finance work to uncover hidden insights from these data, and data quality is a major concern for them. Missing data hinders the performance of data analytics, and incorrect imputation of missing values can lead to poor predictions. In the current era of big data, when vast amounts of data are generated every second and data utilization is a major concern for stakeholders, handling missing values efficiently becomes even more crucial. The data we used, for a medicinal agricultural commodity, have many missing values in the middle of the time series. After analyzing various imputation techniques for filling both small and large gaps of missing data, we found the soft Impute approach to be the best for imputation of time-series data.
This paper also reviews earlier literature on methods and approaches for handling missing values in time-series data, and presents several candidate missing-data estimation methods that other researchers in this field may wish to examine. The discussion is intended to help them determine which methods are currently in use, along with the benefits and limitations of each.
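To make two of the simpler techniques named in the keywords concrete, the following is a minimal sketch of linear interpolation and moving-average filling on a short series with gaps. It is an illustration only, not the authors' implementation: the function names, the `window` parameter, and the price values are hypothetical, and soft Impute itself is not shown.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing observation


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between the nearest observed
    neighbours. Leading and trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per time step across this gap.
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def moving_average_fill(values: Series, window: int = 2) -> Series:
    """Fill each gap with the mean of the observed values lying within
    `window` steps on either side. Only original observations are used,
    never previously filled values."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo, hi = max(0, i - window), min(len(values), i + window + 1)
            neighbours = [x for x in values[lo:hi] if x is not None]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Made-up commodity prices with a two-step gap and a one-step gap.
prices: Series = [100.0, None, None, 130.0, 128.0, None, 140.0]
linear = linear_interpolate(prices)
smoothed = moving_average_fill(prices, window=2)
print(linear)
print(smoothed)
```

In this sketch, linear interpolation follows the trend across a gap, whereas the moving-average fill pulls each missing point toward the local mean; which behaves better on long gaps depends on the data.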

Keywords

Agriculture, Commodities, Imputation, soft Impute, Moving Average, Linear Interpolation, Polynomial Interpolation, Missing data