Document (#27067)

Author
Abdelali, A.
Title
Localization in modern standard Arabic
Source
Journal of the American Society for Information Science and Technology. 55(2004) no.1, pp.23-28
Year
2004
Abstract
Modern Standard Arabic (MSA) is the official language used in all Arabic countries. In this paper we describe an investigation of the uniformity of MSA across different countries. Many studies have been carried out locally or regionally on Arabic and its dialects. Here we take a more global view by studying language variation between countries. The source material used in this investigation was derived from national newspapers available on the Web, which provided samples of common media usage in each country. This corpus has been used to investigate the lexical characteristics of Modern Standard Arabic as found in 10 different Arabic-speaking countries. We describe our collection methods, the types of lexical analysis performed, and the results of our investigations. With respect to newspaper articles, MSA seems to be very uniform across all the countries included in the study, but we have detected various types of differences, with implications for computational processing of MSA.
Theme
Computational linguistics

Similar documents (content)

  1. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.22
    0.22135533 = sum of:
      0.22135533 = product of:
        0.9223139 = sum of:
          0.007935338 = weight(abstract_txt:this in 413) [ClassicSimilarity], result of:
            0.007935338 = score(doc=413,freq=2.0), product of:
              0.037310615 = queryWeight, product of:
                1.0690088 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014504847 = queryNorm
              0.21268311 = fieldWeight in 413, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
          0.012678536 = weight(abstract_txt:been in 413) [ClassicSimilarity], result of:
            0.012678536 = score(doc=413,freq=1.0), product of:
              0.056123894 = queryWeight, product of:
                1.0705165 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014504847 = queryNorm
              0.22590263 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
          0.019483274 = weight(abstract_txt:language in 413) [ClassicSimilarity], result of:
            0.019483274 = score(doc=413,freq=1.0), product of:
              0.074738264 = queryWeight, product of:
                1.2353526 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014504847 = queryNorm
              0.26068673 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
          0.021551782 = weight(abstract_txt:used in 413) [ClassicSimilarity], result of:
            0.021551782 = score(doc=413,freq=2.0), product of:
              0.07262915 = queryWeight, product of:
                1.4914906 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014504847 = queryNorm
              0.29673737 = fieldWeight in 413, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
          0.07798416 = weight(abstract_txt:modern in 413) [ClassicSimilarity], result of:
            0.07798416 = score(doc=413,freq=1.0), product of:
              0.21567664 = queryWeight, product of:
                2.5701983 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.014504847 = queryNorm
              0.3615791 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
          0.78268087 = weight(abstract_txt:arabic in 413) [ClassicSimilarity], result of:
            0.78268087 = score(doc=413,freq=5.0), product of:
              0.73939294 = queryWeight, product of:
                6.730041 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.014504847 = queryNorm
              1.0585452 = fieldWeight in 413, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=413)
        0.24 = coord(6/25)
    
  2. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.22
    0.2160468 = sum of:
      0.2160468 = product of:
        0.900195 = sum of:
          0.0069434205 = weight(abstract_txt:this in 3953) [ClassicSimilarity], result of:
            0.0069434205 = score(doc=3953,freq=2.0), product of:
              0.037310615 = queryWeight, product of:
                1.0690088 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014504847 = queryNorm
              0.18609773 = fieldWeight in 3953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.019214883 = weight(abstract_txt:been in 3953) [ClassicSimilarity], result of:
            0.019214883 = score(doc=3953,freq=3.0), product of:
              0.056123894 = queryWeight, product of:
                1.0705165 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014504847 = queryNorm
              0.34236547 = fieldWeight in 3953, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.016286075 = weight(abstract_txt:different in 3953) [ClassicSimilarity], result of:
            0.016286075 = score(doc=3953,freq=2.0), product of:
              0.05753922 = queryWeight, product of:
                1.0839305 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.014504847 = queryNorm
              0.28304303 = fieldWeight in 3953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.034095727 = weight(abstract_txt:language in 3953) [ClassicSimilarity], result of:
            0.034095727 = score(doc=3953,freq=4.0), product of:
              0.074738264 = queryWeight, product of:
                1.2353526 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014504847 = queryNorm
              0.45620176 = fieldWeight in 3953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.013334485 = weight(abstract_txt:used in 3953) [ClassicSimilarity], result of:
            0.013334485 = score(doc=3953,freq=1.0), product of:
              0.07262915 = queryWeight, product of:
                1.4914906 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014504847 = queryNorm
              0.18359688 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.81032044 = weight(abstract_txt:arabic in 3953) [ClassicSimilarity], result of:
            0.81032044 = score(doc=3953,freq=7.0), product of:
              0.73939294 = queryWeight, product of:
                6.730041 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.014504847 = queryNorm
              1.0959266 = fieldWeight in 3953, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
        0.24 = coord(6/25)
    
  3. Mutawa, F.; Alnajem, S.; Alzhouri, F.: An HPSG approach to Arabic nominal sentences (2008) 0.21
    0.20824915 = sum of:
      0.20824915 = product of:
        1.3015572 = sum of:
          0.015870675 = weight(abstract_txt:this in 2368) [ClassicSimilarity], result of:
            0.015870675 = score(doc=2368,freq=2.0), product of:
              0.037310615 = queryWeight, product of:
                1.0690088 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014504847 = queryNorm
              0.42536622 = fieldWeight in 2368, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.125 = fieldNorm(doc=2368)
          0.025357071 = weight(abstract_txt:been in 2368) [ClassicSimilarity], result of:
            0.025357071 = score(doc=2368,freq=1.0), product of:
              0.056123894 = queryWeight, product of:
                1.0705165 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014504847 = queryNorm
              0.45180526 = fieldWeight in 2368, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.125 = fieldNorm(doc=2368)
          0.04780541 = weight(abstract_txt:types in 2368) [ClassicSimilarity], result of:
            0.04780541 = score(doc=2368,freq=1.0), product of:
              0.08565113 = queryWeight, product of:
                1.3224705 = boost
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.014504847 = queryNorm
              0.55814105 = fieldWeight in 2368, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.125 = fieldNorm(doc=2368)
          1.212524 = weight(abstract_txt:arabic in 2368) [ClassicSimilarity], result of:
            1.212524 = score(doc=2368,freq=3.0), product of:
              0.73939294 = queryWeight, product of:
                6.730041 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.014504847 = queryNorm
              1.6398913 = fieldWeight in 2368, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.125 = fieldNorm(doc=2368)
        0.16 = coord(4/25)
    
  4. Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: A comparison of text-classification techniques applied to Arabic text (2009) 0.20
    0.19817694 = sum of:
      0.19817694 = product of:
        1.2386059 = sum of:
          0.0119030075 = weight(abstract_txt:this in 83) [ClassicSimilarity], result of:
            0.0119030075 = score(doc=83,freq=2.0), product of:
              0.037310615 = queryWeight, product of:
                1.0690088 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014504847 = queryNorm
              0.31902468 = fieldWeight in 83, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=83)
          0.032939803 = weight(abstract_txt:been in 83) [ClassicSimilarity], result of:
            0.032939803 = score(doc=83,freq=3.0), product of:
              0.056123894 = queryWeight, product of:
                1.0705165 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014504847 = queryNorm
              0.5869123 = fieldWeight in 83, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.09375 = fieldNorm(doc=83)
          0.019741705 = weight(abstract_txt:different in 83) [ClassicSimilarity], result of:
            0.019741705 = score(doc=83,freq=1.0), product of:
              0.05753922 = queryWeight, product of:
                1.0839305 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.014504847 = queryNorm
              0.34309995 = fieldWeight in 83, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.09375 = fieldNorm(doc=83)
          1.1740214 = weight(abstract_txt:arabic in 83) [ClassicSimilarity], result of:
            1.1740214 = score(doc=83,freq=5.0), product of:
              0.73939294 = queryWeight, product of:
                6.730041 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.014504847 = queryNorm
              1.5878179 = fieldWeight in 83, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.09375 = fieldNorm(doc=83)
        0.16 = coord(4/25)
    
  5. Aqeel, S.U.; Beitzel, S.M.; Jensen, E.C.; Grossman, D.; Frieder, O.: On the development of name search techniques for Arabic (2006) 0.17
    0.17065673 = sum of:
      0.17065673 = product of:
        0.8532836 = sum of:
          0.009919172 = weight(abstract_txt:this in 289) [ClassicSimilarity], result of:
            0.009919172 = score(doc=289,freq=2.0), product of:
              0.037310615 = queryWeight, product of:
                1.0690088 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014504847 = queryNorm
              0.26585388 = fieldWeight in 289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=289)
          0.01584817 = weight(abstract_txt:been in 289) [ClassicSimilarity], result of:
            0.01584817 = score(doc=289,freq=1.0), product of:
              0.056123894 = queryWeight, product of:
                1.0705165 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014504847 = queryNorm
              0.2823783 = fieldWeight in 289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.078125 = fieldNorm(doc=289)
          0.01645142 = weight(abstract_txt:different in 289) [ClassicSimilarity], result of:
            0.01645142 = score(doc=289,freq=1.0), product of:
              0.05753922 = queryWeight, product of:
                1.0839305 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.014504847 = queryNorm
              0.28591663 = fieldWeight in 289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.078125 = fieldNorm(doc=289)
          0.053237326 = weight(abstract_txt:standard in 289) [ClassicSimilarity], result of:
            0.053237326 = score(doc=289,freq=1.0), product of:
              0.14410187 = queryWeight, product of:
                2.1008735 = boost
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.014504847 = queryNorm
              0.36944228 = fieldWeight in 289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.078125 = fieldNorm(doc=289)
          0.7578275 = weight(abstract_txt:arabic in 289) [ClassicSimilarity], result of:
            0.7578275 = score(doc=289,freq=3.0), product of:
              0.73939294 = queryWeight, product of:
                6.730041 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.014504847 = queryNorm
              1.024932 = fieldWeight in 289, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.078125 = fieldNorm(doc=289)
        0.2 = coord(5/25)
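The indented traces under each entry are Lucene "explain" output for its ClassicSimilarity (TF-IDF) model. Each weight(abstract_txt:term) line is queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and an entry's score is coord(matched/25) times the sum of its clause weights. The Python sketch below recomputes entry 1 (doc 413) from the numbers in its trace. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values printed above; boost, queryNorm and fieldNorm are copied from the trace rather than derived, because the trace lists neither all 25 query clauses nor the field lengths. The function names are hypothetical, not Lucene's API, and since Lucene computes in 32-bit floats the last digits may differ slightly.

```python
import math

# Sketch of ClassicSimilarity (TF-IDF) scoring, reconstructed from the explain
# traces above. The idf formula is inferred because it matches the printed idf
# values; boost, queryNorm and fieldNorm are taken from the trace as given.


def classic_tf(freq: float) -> float:
    """tf(freq) = sqrt(freq); tf(5.0) is about 2.236068, as in the trace."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); idf(61, 44421) is about 7.574333."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_weight(freq: float, doc_freq: int, max_docs: int,
                  boost: float, query_norm: float, field_norm: float) -> float:
    """One weight(abstract_txt:term) line: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm             # boost * idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(weights: list[float], total_clauses: int) -> float:
    """coord(matched/total) multiplied by the sum of the matching clause weights."""
    coord = len(weights) / total_clauses
    return coord * sum(weights)


# Inputs copied from the trace for doc 413 (entry 1 in the list above).
# term -> (freq, docFreq, boost)
MAX_DOCS = 44421
QUERY_NORM = 0.014504847
FIELD_NORM_413 = 0.0625
DOC_413 = {
    "this": (2.0, 10885, 1.0690088),
    "been": (1.0, 3251, 1.0705165),
    "language": (1.0, 1863, 1.2353526),
    "used": (2.0, 4205, 1.4914906),
    "modern": (1.0, 370, 2.5701983),
    "arabic": (5.0, 61, 6.730041),
}

weights_413 = [
    clause_weight(freq, df, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM_413)
    for freq, df, boost in DOC_413.values()
]
score_413 = document_score(weights_413, total_clauses=25)
print(f"doc 413: {score_413:.6f}")
```

The same functions cover the other entries by swapping in their inputs, for example entry 3 (doc 2368) with fieldNorm 0.125 and four matching clauses, giving coord(4/25).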