When high-dimensional data are collected from various sources, values may be accidentally lost, and the resulting missing data degrade the quality of learning outcomes. A large number of machine learning (ML) methods can be applied to explore the search space for imputation and for the selection of features and parameters. Before classification, missing values are imputed with self-organizing map imputation (SOMI) as a preprocessing step to improve the accuracy of the model. This study introduces a new approach that combines naive Bayes classification (NBC) with genetic algorithm (GA) optimization to explore the search space effectively from a sample of experimental points. The GA performs feature selection for the classification model, addressing computational problems such as dimensionality reduction, uncertainty, and imbalanced data sets with various classes. In the experiments, preprocessing the data with SOMI yielded error rates of up to 10% on various data sets with missing data, compared with other methods. With the hybrid SOMI-GANB model, the experimental results show that the proposed method can significantly improve accuracy, by up to 90%, compared with other imputation methods and with classification without feature selection. SOMI can be used for homogeneous, heterogeneous, and mixed data sets. GA and naive Bayes classification were combined because both are simple, easy-to-understand methods that are very effective in finding optimal solutions from a set of possible solutions. Naive Bayes imputation achieved higher accuracy than neural network imputation.
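The combination of GA-driven feature selection with a naive Bayes classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators or the likelihood model, so the Gaussian likelihood, truncation selection, single-point crossover, bit-flip mutation, and all function names and parameter values below are assumptions chosen for illustration. The SOMI step is omitted, so the data are assumed to be already imputed, and the data set is synthetic (one informative feature plus three noise features).

```python
import math
import random

def fit_gnb(X, y):
    """Fit Gaussian naive Bayes: per class, log prior, feature means, variances."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances strictly positive.
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-6
                 for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(y)), means, varis)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior for one sample."""
    def log_post(c):
        prior, means, varis = model[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                           for v, m, s in zip(x, means, varis))
    return max(model, key=log_post)

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of naive Bayes using only features where mask is 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature set cannot classify anything
    sel = lambda X: [[row[i] for i in idx] for row in X]
    model = fit_gnb(sel(X_tr), y_tr)
    hits = sum(predict_gnb(model, x) == t for x, t in zip(sel(X_va), y_va))
    return hits / len(y_va)

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15,
              p_mut=0.1, seed=0):
    """Search binary feature masks with a simple GA (assumed operators)."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    score = lambda m: fitness(m, X_tr, y_tr, X_va, y_va)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)

def make_data(n_rows, rng):
    """Synthetic data: feature 0 depends on the label, features 1-3 are noise."""
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        informative = rng.gauss(3.0 * label, 1.0)
        X.append([informative] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

rng = random.Random(42)
X_tr, y_tr = make_data(200, rng)
X_va, y_va = make_data(100, rng)
best_mask, best_acc = ga_select(X_tr, y_tr, X_va, y_va)
print("selected feature mask:", best_mask, "validation accuracy:", best_acc)
```

Using validation accuracy as the fitness function makes this a wrapper-style selection: each candidate feature subset is judged by the classifier it produces. In practice, cross-validation would likely give a less noisy fitness estimate than a single held-out split.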
|Field|Value|
|---|---|
|Number of pages|10|
|Journal|International Journal of Intelligent Engineering and Systems|
|Publication status|Published - Feb 2020|
- Feature selection
- Genetic algorithm
- Incomplete data
- Naïve Bayes