An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification

Nasa Zata Dina, Sri Devi Ravana, Norisma Idris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text sentiment classification aims to extract useful information from unstructured text and classify its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common problems in sentiment classification because they degrade classification performance. To address them, this study applies hybrid feature selection, combining Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE), to three text datasets: IMDB, Yelp, and Amazon. TF-IDF is used to select sentiment features, which SVM-RFE then refines. Finally, an SVM determines whether the sentiment is positive or negative. The technique outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, however, its accuracy of 81.5% is lower than that reported in existing studies. This inconsistency across datasets opens the opportunity for further research on other hybrid feature selection techniques that could improve accuracy on all datasets. Overall, the results show that the technique improved classification performance and reduced the feature space by 63%.
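The three stages named in the abstract (TF-IDF selection, SVM-RFE refinement, SVM classification) can be sketched with scikit-learn as below. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how TF-IDF scores are turned into a selection, so the `tfidf_mass` filter (keep the terms with the largest summed TF-IDF weight), the feature counts `k_filter` and `k_rfe`, the linear kernel, and the toy reviews are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy reviews (NOT taken from IMDB, Yelp, or Amazon).
texts = [
    "a wonderful film with brilliant acting and a moving story",
    "great food friendly staff and excellent service",
    "this phone works perfectly and the battery lasts long",
    "loved the charming characters and the beautiful soundtrack",
    "terrible plot boring scenes and awful dialogue",
    "the food was cold the staff rude and the service slow",
    "the charger broke quickly and the battery died fast",
    "hated the dull story and the poor acting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative


def tfidf_mass(X, y=None):
    """Assumed filter score: total TF-IDF weight of each term over all documents."""
    return np.asarray(X.sum(axis=0)).ravel()


def build_pipeline(k_filter=12, k_rfe=6):
    """TF-IDF filter -> SVM-RFE wrapper -> linear SVM classifier."""
    return Pipeline([
        # Stage 1a: weight terms with TF-IDF.
        ("tfidf", TfidfVectorizer(stop_words="english")),
        # Stage 1b: keep the k_filter terms with the largest TF-IDF mass (assumption).
        ("filter", SelectKBest(score_func=tfidf_mass, k=k_filter)),
        # Stage 2: recursively drop the term with the smallest SVM weight.
        ("rfe", RFE(LinearSVC(random_state=0), n_features_to_select=k_rfe, step=1)),
        # Stage 3: final sentiment classifier on the surviving terms.
        ("svm", LinearSVC(random_state=0)),
    ])


model = build_pipeline()
model.fit(texts, labels)

n_vocab = len(model.named_steps["tfidf"].vocabulary_)
n_kept = int(model.named_steps["rfe"].n_features_)
print(f"vocabulary size: {n_vocab}, features kept: {n_kept}")
print(f"feature-space reduction: {1 - n_kept / n_vocab:.0%}")
print("prediction:", model.predict(["excellent service and great food"]))
```

Wrapping all stages in one `Pipeline` means the vocabulary, filter, and RFE ranking are fitted on training data only, which avoids leaking test-set information when the sketch is evaluated with cross-validation.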

Original language: English
Title of host publication: 14th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 270-275
Number of pages: 6
ISBN (Electronic): 9781665493345
Publication status: Published - 2022
Externally published: Yes
Event: 14th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2022 - Phnom Penh, Cambodia
Duration: 2 Dec 2022 to 4 Dec 2022

Publication series

Name: International Conference on Software, Knowledge Information, Industrial Management and Applications, SKIMA
Volume: 2022-December
ISSN (Print): 2373-082X
ISSN (Electronic): 2573-3214

Conference

Conference: 14th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2022
Country/Territory: Cambodia
City: Phnom Penh
Period: 2/12/22 to 4/12/22

Keywords

  • hybrid feature selection
  • SVM-RFE
  • text sentiment classification
