TY - GEN
T1 - An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification
AU - Dina, Nasa Zata
AU - Ravana, Sri Devi
AU - Idris, Norisma
N1 - Funding Information:
This work was supported by RU Faculty Program grant GPF098C-2020, Universiti Malaya.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Text sentiment classification aims to extract useful information from unstructured text data and classify its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common issues in sentiment classification because they degrade classification performance. To address these issues, this study applies hybrid feature selection using Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. TF-IDF is employed to select sentiment features, which are further refined by SVM-RFE. Finally, an SVM classifier determines whether the sentiment is positive or negative. The proposed technique outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, however, its accuracy of 81.5% is lower than that reported in existing studies. These results indicate that the technique performs inconsistently across datasets and open the opportunity for further research on other hybrid feature selection techniques that improve accuracy on all datasets. The results also show that the technique improved classification performance and reduced the feature space by 63%.
AB - Text sentiment classification aims to extract useful information from unstructured text data and classify its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common issues in sentiment classification because they degrade classification performance. To address these issues, this study applies hybrid feature selection using Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. TF-IDF is employed to select sentiment features, which are further refined by SVM-RFE. Finally, an SVM classifier determines whether the sentiment is positive or negative. The proposed technique outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, however, its accuracy of 81.5% is lower than that reported in existing studies. These results indicate that the technique performs inconsistently across datasets and open the opportunity for further research on other hybrid feature selection techniques that improve accuracy on all datasets. The results also show that the technique improved classification performance and reduced the feature space by 63%.
KW - hybrid feature selection
KW - SVM-RFE
KW - text sentiment classification
UR - http://www.scopus.com/inward/record.url?scp=85148595168&partnerID=8YFLogxK
U2 - 10.1109/SKIMA57145.2022.10029452
DO - 10.1109/SKIMA57145.2022.10029452
M3 - Conference contribution
AN - SCOPUS:85148595168
T3 - International Conference on Software, Knowledge Information, Industrial Management and Applications, SKIMA
SP - 270
EP - 275
BT - 14th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2022
Y2 - 2 December 2022 through 4 December 2022
ER -