TY - JOUR
T1 - Hydrophobicity signal analysis for robust SARS-CoV-2 classification
AU - Jamhuri, Mohammad
AU - Irawan, Mohammad Isa
AU - Mukhlash, Imam
AU - Puspaningsih, Ni Nyoman Tri
N1 - Publisher Copyright: © 2025 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2025/2
Y1 - 2025/2
N2 - Rapid and accurate classification of viral pathogens is critical for effective public health interventions. This study introduces a novel approach using convolutional neural networks (CNN) to classify SARS-CoV-2 and non-SARS-CoV-2 viruses via hydrophobicity signal derived from DNA sequences. Conventional machine learning methods grapple with the variability of viral genetic material, requiring fixed-length sequences and extensive preprocessing. The proposed method transforms genetic sequences into image-based representations, enabling CNNs to handle complexity and variability without these constraints. The dataset includes 8,143 DNA sequences from seven coronaviruses, translated into amino acid sequences and evaluated for hydrophobicity. Experimental results demonstrate that the CNN model achieves superior performance, with an accuracy of over 99.84% in the classification task. The model also performs well with extended sequence lengths, showcasing robustness and adaptability. Compared to previous studies, this method offers higher accuracy and computational efficiency, providing a reliable solution for rapid virus detection with potential applications in bioinformatics and clinical settings.
AB - Rapid and accurate classification of viral pathogens is critical for effective public health interventions. This study introduces a novel approach using convolutional neural networks (CNN) to classify SARS-CoV-2 and non-SARS-CoV-2 viruses via hydrophobicity signal derived from DNA sequences. Conventional machine learning methods grapple with the variability of viral genetic material, requiring fixed-length sequences and extensive preprocessing. The proposed method transforms genetic sequences into image-based representations, enabling CNNs to handle complexity and variability without these constraints. The dataset includes 8,143 DNA sequences from seven coronaviruses, translated into amino acid sequences and evaluated for hydrophobicity. Experimental results demonstrate that the CNN model achieves superior performance, with an accuracy of over 99.84% in the classification task. The model also performs well with extended sequence lengths, showcasing robustness and adaptability. Compared to previous studies, this method offers higher accuracy and computational efficiency, providing a reliable solution for rapid virus detection with potential applications in bioinformatics and clinical settings.
KW - Convolutional neural networks
KW - Genetic sequencing
KW - Kyte-Doolittle scale
KW - Machine learning
KW - Virus identification
UR - http://www.scopus.com/inward/record.url?scp=85210762624&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v37.i2.pp1294-1305
DO - 10.11591/ijeecs.v37.i2.pp1294-1305
M3 - Article
AN - SCOPUS:85210762624
SN - 2502-4752
VL - 37
SP - 1294
EP - 1305
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 2
ER -