Information Retrieval in Arabic Language

Malek Boualem (France Telecom Orange Labs, France)

Ramzi Abbes (Lyon 2 University / ICAR-CNRS, France)

LREC 2008

Abstract

Web search engines provide fairly good results for languages written in Latin script. However, they still show many weaknesses when searching in other languages such as Arabic. This paper presents a qualitative analysis of information retrieval in Arabic, highlighting some of the numerous limitations of available search engines, mainly when they are not properly adapted to the features of the Arabic language. To support our analysis, we present results based on thorough observations of various Arabic linguistic phenomena. To validate these observations, we mainly tested the Google search engine. Arabic information retrieval still faces many difficulties due to the linguistic features of Arabic, especially its complex morphology and the absence of vowel marks in available documents and texts. These specificities often cause a significant asymmetry between the indexing process and query analysis. We present some of the morphological constraints of the Arabic language and show through experimental tests how search engines deal with them. Finally, we argue that information retrieval in Arabic will not succeed without language processing tools at all linguistic levels (lexical, syntactic and semantic).
Keywords: Information retrieval, Arabic language, Google