Pseudo-Relevance Feedback Combining Statistical and Semantic Term Extraction for Searching Arabic Documents

Maryamah Maryamah, Agus Zainal Arifin, Riyanarto Sarno, Rarasmaya Indraswari, Rizka Wakhidatus Sholikah

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Pseudo-relevance feedback (PRF) is an effective query expansion method that selects candidate expansion terms from the top-ranked documents. PRF conventionally uses a statistical approach that relies heavily on term-based document retrieval. This creates a semantic gap between query and document, which leads to vocabulary mismatch problems such as synonymy and polysemy. In this paper, we propose a pseudo-relevance feedback method that combines statistical and semantic term extraction using Term Frequency-Inverse Document Frequency (TFIDF), Bidirectional Encoder Representations from Transformers (BERT), and Yet Another Keyword Extractor (YAKE) for searching Arabic documents. Pseudo-relevance feedback first produces top-ranked documents that are relevant to the query. Term extraction is then performed on these top-ranked documents using TFIDF, BERT, and YAKE to obtain candidate terms that are statistically and semantically related to the query, and thus capture the context of the query sentence. The extracted terms are re-ranked using Borda ranking so that the Top-K candidate terms are relevant; these terms are then used together with the original query to retrieve the final documents. In the experiments, the proposed method obtains Precision at 5 (P5), P10, Recall, Mean Reciprocal Rank (MRR), and Success Rate (SR@K) of 21%, 14%, 42%, 42%, and 58%, respectively. These results indicate that PRF with a combination of statistical and semantic term extraction can improve the retrieval of relevant Arabic documents.
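
The re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which Borda variant the authors use, so the scoring rule below (each list awards pool size minus position) is an assumption, as are the function names `borda_fuse` and `expand_query` and the English placeholder terms standing in for Arabic candidate terms. The three input lists represent ranked candidate terms that TFIDF, BERT, and YAKE might each produce from the top-ranked documents.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse several ranked term lists with a Borda count.

    Assumed scoring rule (not taken from the paper): with n distinct
    terms across all lists, a term at 0-based position p in a list
    earns n - p points from that list, and 0 points from any list
    that does not contain it.
    """
    pool_size = len({term for ranking in rankings for term in ranking})
    scores = defaultdict(int)
    for ranking in rankings:
        for position, term in enumerate(ranking):
            scores[term] += pool_size - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))


def expand_query(query, rankings, top_k=3):
    """Append the top_k fused terms not already in the query."""
    query_tokens = set(query.split())
    new_terms = [t for t in borda_fuse(rankings) if t not in query_tokens]
    return " ".join([query] + new_terms[:top_k])


# Hypothetical ranked candidate terms from each extractor.
tfidf_terms = ["mosque", "prayer", "friday", "imam"]
bert_terms = ["prayer", "worship", "mosque", "sermon"]
yake_terms = ["prayer", "mosque", "sermon", "friday"]

fused = borda_fuse([tfidf_terms, bert_terms, yake_terms])
expanded = expand_query("islamic worship", [tfidf_terms, bert_terms, yake_terms])
print(fused)
print(expanded)
```

The expanded query string would then be submitted to the retrieval engine for the final search. Terms that several extractors rank highly accumulate the most points, which is the intuition behind combining statistical and semantic evidence.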

Original language: English
Pages (from-to): 238-246
Number of pages: 9
Journal: International Journal of Intelligent Engineering and Systems
Volume: 14
Issue number: 5
Publication status: Published - 2021
Externally published: Yes

Keywords

  • Pseudo-relevant feedback
  • Searching Arabic documents
  • Semantic feature
  • Statistical feature
