TY - GEN
T1 - Chronic Disease Prediction Model Using Integration of DBSCAN, SMOTE-ENN, and Random Forest
AU - Fitriyani, Norma Latif
AU - Syafrudin, Muhammad
AU - Alfian, Ganjar
AU - Yang, Chuan Kai
AU - Rhee, Jongtae
AU - Ulyah, Siti Maghfirotul
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Heart disease (HD) is the number one chronic disease and a major cause of disability and death worldwide. Alongside HD, type 2 diabetes (T2D) is one of the deadliest diseases and causes serious complications if left undetected and untreated. Predicting HD and T2D is the most effective measure for controlling them, so early prediction is important in helping individuals prevent the worst outcomes. This study proposes a chronic disease prediction model for HD and T2D. The proposed model combines random forest with DBSCAN as the outlier detection method and SMOTE-ENN as the data balancing method. Two HD datasets (Statlog and Cleveland) and one T2D dataset (NHIS Korea) were used to build the model and to compare its results with those of other machine learning (ML) algorithms, including GNB, LR, MLP, DT, and SVM. Model performance was measured using 10-fold cross-validation and several performance metrics, including accuracy, precision, F-measure, and recall. The results show that the proposed model outperforms the other classification models, as well as previous studies, with accuracies of 97.63%, 97.69%, and 94.85% for the Statlog HD, Cleveland HD, and NHIS T2D datasets, respectively. The proposed model could therefore help prevent the worst outcomes by enabling individuals to take fast and precise action when HD or T2D is detected.
AB - Heart disease (HD) is the number one chronic disease and a major cause of disability and death worldwide. Alongside HD, type 2 diabetes (T2D) is one of the deadliest diseases and causes serious complications if left undetected and untreated. Predicting HD and T2D is the most effective measure for controlling them, so early prediction is important in helping individuals prevent the worst outcomes. This study proposes a chronic disease prediction model for HD and T2D. The proposed model combines random forest with DBSCAN as the outlier detection method and SMOTE-ENN as the data balancing method. Two HD datasets (Statlog and Cleveland) and one T2D dataset (NHIS Korea) were used to build the model and to compare its results with those of other machine learning (ML) algorithms, including GNB, LR, MLP, DT, and SVM. Model performance was measured using 10-fold cross-validation and several performance metrics, including accuracy, precision, F-measure, and recall. The results show that the proposed model outperforms the other classification models, as well as previous studies, with accuracies of 97.63%, 97.69%, and 94.85% for the Statlog HD, Cleveland HD, and NHIS T2D datasets, respectively. The proposed model could therefore help prevent the worst outcomes by enabling individuals to take fast and precise action when HD or T2D is detected.
KW - heart disease
KW - machine learning
KW - outlier
KW - type 2 diabetes
KW - unbalanced data
UR - http://www.scopus.com/inward/record.url?scp=85140914955&partnerID=8YFLogxK
U2 - 10.1109/ICETSIS55481.2022.9888806
DO - 10.1109/ICETSIS55481.2022.9888806
M3 - Conference contribution
AN - SCOPUS:85140914955
T3 - 2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems, ICETSIS 2022
SP - 289
EP - 294
BT - 2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems, ICETSIS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems, ICETSIS 2022
Y2 - 22 June 2022 through 23 June 2022
ER -