TY - JOUR
T1 - TUBERCULOSIS CLASSIFICATION USING RANDOM FOREST WITH K-PROTOTYPE AS A METHOD TO OVERCOME MISSING VALUE
AU - Rochman, Eka Mala Sari
AU - Miswanto,
AU - Suprajitno, Herry
AU - Kamilah, Istifadatul
AU - Rachmad, Aeri
AU - Santosa, Iwan
N1 - Publisher Copyright: © 2023 the author(s).
PY - 2023
Y1 - 2023
N2 - Tuberculosis is a disease that primarily attacks the respiratory organs and affects many people. It is one of the major contributors to mortality, especially in Indonesia. By anatomical location, tuberculosis is divided into two classes: pulmonary, when it is detected in the lung parenchymal tissue, and extrapulmonary, when it is detected in organs other than the lungs. Determining the site of infection requires analysis of laboratory results for the relevant parameters. Because both this analysis and the entry of patient data are still done manually, the process takes longer and the risk of human error is very high. The aim of this study is therefore to simplify patient diagnosis by classifying TB cases. K-prototype imputation is used to repair missing values in data with mixed (numerical and categorical) types, and the tuberculosis medical-record data are then classified using the Random Forest, Support Vector Machine, and Backpropagation methods. All three classification methods achieve excellent accuracy, but Random Forest outperforms the other two, reaching 98.8%.
AB - Tuberculosis is a disease that primarily attacks the respiratory organs and affects many people. It is one of the major contributors to mortality, especially in Indonesia. By anatomical location, tuberculosis is divided into two classes: pulmonary, when it is detected in the lung parenchymal tissue, and extrapulmonary, when it is detected in organs other than the lungs. Determining the site of infection requires analysis of laboratory results for the relevant parameters. Because both this analysis and the entry of patient data are still done manually, the process takes longer and the risk of human error is very high. The aim of this study is therefore to simplify patient diagnosis by classifying TB cases. K-prototype imputation is used to repair missing values in data with mixed (numerical and categorical) types, and the tuberculosis medical-record data are then classified using the Random Forest, Support Vector Machine, and Backpropagation methods. All three classification methods achieve excellent accuracy, but Random Forest outperforms the other two, reaching 98.8%.
KW - K-prototype
KW - classification
KW - imputation
KW - missing values
KW - random forest
KW - tuberculosis
UR - http://www.scopus.com/inward/record.url?scp=85168130544&partnerID=8YFLogxK
U2 - 10.28919/cmbn/7873
DO - 10.28919/cmbn/7873
M3 - Article
AN - SCOPUS:85168130544
SN - 2052-2541
VL - 2023
JO - Communications in Mathematical Biology and Neuroscience
JF - Communications in Mathematical Biology and Neuroscience
M1 - 81
ER -