Pseudo-relevance feedback (PRF) is an effective query expansion method that selects candidate expansion terms from top-ranked documents. PRF conventionally uses a statistical approach that relies heavily on term-based document retrieval. This leaves a semantic gap between query and document, which leads to vocabulary mismatch problems such as synonymy and polysemy. In this paper, we propose a PRF method that combines statistical and semantic term extraction using Term Frequency-Inverse Document Frequency (TFIDF), Bidirectional Encoder Representations from Transformers (BERT), and Yet Another Keyword Extractor (YAKE) for searching Arabic documents. Pseudo-relevance feedback first retrieves top-ranked documents, which are treated as relevant to the query. Term extraction on these documents using TFIDF, BERT, and YAKE yields candidate terms that are related to the query both statistically and semantically, and thus capture the context of the query sentence. The extracted terms are then re-ranked using Borda ranking so that the Top-K candidate terms are relevant; these terms are combined with the original query for the final document search. In the experiments, the proposed method obtains Precision at 5 (P5), P10, Recall, Mean Reciprocal Rank (MRR), and Success Rate (SR@K) of 21%, 14%, 42%, 42%, and 58%, respectively. These results indicate that PRF with a combination of statistical and semantic term extraction can improve the relevance of retrieved documents when searching Arabic documents.
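The Borda re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which Borda variant the paper uses, so this assumes the classic positional count, where a term at position p in a list of n terms earns n - p points and a term absent from a list earns nothing from it. The function name `borda_fuse`, the tie-breaking rule, and the example terms (English stand-ins for Arabic candidates) are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def borda_fuse(rankings, top_k=5):
    """Fuse several ranked term lists with a classic Borda count.

    Assumption: each list awards (n - position) points to a term, where n is
    the length of that list; a term missing from a list gets 0 points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, term in enumerate(ranking):
            scores[term] += n - position
    # Sort by descending score; break ties alphabetically so the
    # result is deterministic (a choice made for this sketch only).
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:top_k]]


# Hypothetical candidate terms from the three extractors, best first.
tfidf_terms = ["economy", "market", "bank", "oil"]
bert_terms = ["market", "finance", "economy", "trade"]
yake_terms = ["market", "economy", "oil", "trade"]

# Keep the Top-K fused terms and append them to the original query.
expansion_terms = borda_fuse([tfidf_terms, bert_terms, yake_terms], top_k=3)
expanded_query = "economic growth " + " ".join(expansion_terms)
```

In this toy example, terms that several extractors rank highly rise to the top of the fused list, which is the intuition behind combining statistical and semantic evidence before expanding the query.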
|Number of pages|9|
|Journal|International Journal of Intelligent Engineering and Systems|
|Publication status|Published - 2021|
- Pseudo-relevant feedback
- Searching Arabic documents
- Semantic feature
- Statistical feature