Document (#27067)

Author
Abdelali, A.
Title
Localization in modern standard Arabic
Source
Journal of the American Society for Information Science and Technology. 55(2004) no.1, pp.23-28
Year
2004
Abstract
Modern Standard Arabic (MSA) is the official language used in all Arabic countries. In this paper we describe an investigation of the uniformity of MSA across different countries. Many studies have been carried out locally or regionally on Arabic and its dialects. Here we look on a more global scale by studying language variations between countries. The source material used in this investigation was derived from national newspapers available on the Web, which provided samples of common media usage in each country. This corpus has been used to investigate the lexical characteristics of Modern Standard Arabic as found in 10 different Arabic-speaking countries. We describe our collection methods, the types of lexical analysis performed, and the results of our investigations. With respect to newspaper articles, MSA seems to be very uniform across all the countries included in the study, but we have detected various types of differences, with implications for computational processing of MSA.
Theme
Computational linguistics

Similar documents (content)

  1. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.22
    0.22118406 = sum of:
      0.22118406 = product of:
        0.9216003 = sum of:
          0.0127137145 = weight(abstract_txt:been in 3426) [ClassicSimilarity], result of:
            0.0127137145 = score(doc=3426,freq=1.0), product of:
              0.05623082 = queryWeight, product of:
                1.0721726 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144974515 = queryNorm
              0.22609869 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.008003982 = weight(abstract_txt:this in 3426) [ClassicSimilarity], result of:
            0.008003982 = score(doc=3426,freq=2.0), product of:
              0.037527584 = queryWeight, product of:
                1.0727499 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0144974515 = queryNorm
              0.21328263 = fieldWeight in 3426, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.019642543 = weight(abstract_txt:language in 3426) [ClassicSimilarity], result of:
            0.019642543 = score(doc=3426,freq=1.0), product of:
              0.07514924 = queryWeight, product of:
                1.2394809 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0144974515 = queryNorm
              0.26138046 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.021595871 = weight(abstract_txt:used in 3426) [ClassicSimilarity], result of:
            0.021595871 = score(doc=3426,freq=2.0), product of:
              0.072732255 = queryWeight, product of:
                1.4934362 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0144974515 = queryNorm
              0.2969229 = fieldWeight in 3426, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.07825076 = weight(abstract_txt:modern in 3426) [ClassicSimilarity], result of:
            0.07825076 = score(doc=3426,freq=1.0), product of:
              0.21618003 = queryWeight, product of:
                2.5747256 = boost
                5.7915254 = idf(docFreq=366, maxDocs=44218)
                0.0144974515 = queryNorm
              0.36197034 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7915254 = idf(docFreq=366, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.7813934 = weight(abstract_txt:arabic in 3426) [ClassicSimilarity], result of:
            0.7813934 = score(doc=3426,freq=5.0), product of:
              0.7386234 = queryWeight, product of:
                6.7305365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0144974515 = queryNorm
              1.0579051 = fieldWeight in 3426, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
        0.24 = coord(6/25)
    
  2. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.22
    0.21584673 = sum of:
      0.21584673 = product of:
        0.89936143 = sum of:
          0.0192682 = weight(abstract_txt:been in 2953) [ClassicSimilarity], result of:
            0.0192682 = score(doc=2953,freq=3.0), product of:
              0.05623082 = queryWeight, product of:
                1.0721726 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144974515 = queryNorm
              0.3426626 = fieldWeight in 2953, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.0070034843 = weight(abstract_txt:this in 2953) [ClassicSimilarity], result of:
            0.0070034843 = score(doc=2953,freq=2.0), product of:
              0.037527584 = queryWeight, product of:
                1.0727499 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0144974515 = queryNorm
              0.1866223 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.01636597 = weight(abstract_txt:different in 2953) [ClassicSimilarity], result of:
            0.01636597 = score(doc=2953,freq=2.0), product of:
              0.057730492 = queryWeight, product of:
                1.086376 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0144974515 = queryNorm
              0.28348917 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.03437445 = weight(abstract_txt:language in 2953) [ClassicSimilarity], result of:
            0.03437445 = score(doc=2953,freq=4.0), product of:
              0.07514924 = queryWeight, product of:
                1.2394809 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0144974515 = queryNorm
              0.45741582 = fieldWeight in 2953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.013361764 = weight(abstract_txt:used in 2953) [ClassicSimilarity], result of:
            0.013361764 = score(doc=2953,freq=1.0), product of:
              0.072732255 = queryWeight, product of:
                1.4934362 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0144974515 = queryNorm
              0.18371168 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.80898756 = weight(abstract_txt:arabic in 2953) [ClassicSimilarity], result of:
            0.80898756 = score(doc=2953,freq=7.0), product of:
              0.7386234 = queryWeight, product of:
                6.7305365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0144974515 = queryNorm
              1.095264 = fieldWeight in 2953, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
        0.24 = coord(6/25)
    
  3. Mutawa, F.; Alnajem, S.; Alzhouri, F.: An HPSG approach to Arabic nominal sentences (2008) 0.21
    0.20798187 = sum of:
      0.20798187 = product of:
        1.2998867 = sum of:
          0.025427429 = weight(abstract_txt:been in 1368) [ClassicSimilarity], result of:
            0.025427429 = score(doc=1368,freq=1.0), product of:
              0.05623082 = queryWeight, product of:
                1.0721726 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144974515 = queryNorm
              0.45219737 = fieldWeight in 1368, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.125 = fieldNorm(doc=1368)
          0.016007964 = weight(abstract_txt:this in 1368) [ClassicSimilarity], result of:
            0.016007964 = score(doc=1368,freq=2.0), product of:
              0.037527584 = queryWeight, product of:
                1.0727499 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0144974515 = queryNorm
              0.42656526 = fieldWeight in 1368, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.125 = fieldNorm(doc=1368)
          0.04792182 = weight(abstract_txt:types in 1368) [ClassicSimilarity], result of:
            0.04792182 = score(doc=1368,freq=1.0), product of:
              0.08579494 = queryWeight, product of:
                1.324367 = boost
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.0144974515 = queryNorm
              0.55856234 = fieldWeight in 1368, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.125 = fieldNorm(doc=1368)
          1.2105294 = weight(abstract_txt:arabic in 1368) [ClassicSimilarity], result of:
            1.2105294 = score(doc=1368,freq=3.0), product of:
              0.7386234 = queryWeight, product of:
                6.7305365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0144974515 = queryNorm
              1.6388994 = fieldWeight in 1368, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.125 = fieldNorm(doc=1368)
        0.16 = coord(4/25)
    
  4. Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: A comparison of text-classification techniques applied to Arabic text (2009) 0.20
    0.19791453 = sum of:
      0.19791453 = product of:
        1.2369658 = sum of:
          0.0330312 = weight(abstract_txt:been in 3096) [ClassicSimilarity], result of:
            0.0330312 = score(doc=3096,freq=3.0), product of:
              0.05623082 = queryWeight, product of:
                1.0721726 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144974515 = queryNorm
              0.5874216 = fieldWeight in 3096, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.012005973 = weight(abstract_txt:this in 3096) [ClassicSimilarity], result of:
            0.012005973 = score(doc=3096,freq=2.0), product of:
              0.037527584 = queryWeight, product of:
                1.0727499 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0144974515 = queryNorm
              0.31992394 = fieldWeight in 3096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.019838553 = weight(abstract_txt:different in 3096) [ClassicSimilarity], result of:
            0.019838553 = score(doc=3096,freq=1.0), product of:
              0.057730492 = queryWeight, product of:
                1.086376 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0144974515 = queryNorm
              0.3436408 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          1.17209 = weight(abstract_txt:arabic in 3096) [ClassicSimilarity], result of:
            1.17209 = score(doc=3096,freq=5.0), product of:
              0.7386234 = queryWeight, product of:
                6.7305365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0144974515 = queryNorm
              1.5868576 = fieldWeight in 3096, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
        0.16 = coord(4/25)
    
  5. Aqeel, S.U.; Beitzel, S.M.; Jensen, E.C.; Grossman, D.; Frieder, O.: On the development of name search techniques for Arabic (2006) 0.17
    0.17042671 = sum of:
      0.17042671 = product of:
        0.8521335 = sum of:
          0.015892142 = weight(abstract_txt:been in 5289) [ClassicSimilarity], result of:
            0.015892142 = score(doc=5289,freq=1.0), product of:
              0.05623082 = queryWeight, product of:
                1.0721726 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144974515 = queryNorm
              0.28262335 = fieldWeight in 5289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.078125 = fieldNorm(doc=5289)
          0.010004978 = weight(abstract_txt:this in 5289) [ClassicSimilarity], result of:
            0.010004978 = score(doc=5289,freq=2.0), product of:
              0.037527584 = queryWeight, product of:
                1.0727499 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0144974515 = queryNorm
              0.2666033 = fieldWeight in 5289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=5289)
          0.016532127 = weight(abstract_txt:different in 5289) [ClassicSimilarity], result of:
            0.016532127 = score(doc=5289,freq=1.0), product of:
              0.057730492 = queryWeight, product of:
                1.086376 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0144974515 = queryNorm
              0.28636733 = fieldWeight in 5289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=5289)
          0.053123347 = weight(abstract_txt:standard in 5289) [ClassicSimilarity], result of:
            0.053123347 = score(doc=5289,freq=1.0), product of:
              0.1439042 = queryWeight, product of:
                2.1006799 = boost
                4.725219 = idf(docFreq=1065, maxDocs=44218)
                0.0144974515 = queryNorm
              0.36915773 = fieldWeight in 5289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.725219 = idf(docFreq=1065, maxDocs=44218)
                0.078125 = fieldNorm(doc=5289)
          0.7565809 = weight(abstract_txt:arabic in 5289) [ClassicSimilarity], result of:
            0.7565809 = score(doc=5289,freq=3.0), product of:
              0.7386234 = queryWeight, product of:
                6.7305365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0144974515 = queryNorm
              1.0243121 = fieldWeight in 5289, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.078125 = fieldNorm(doc=5289)
        0.2 = coord(5/25)
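
The score breakdowns above all follow the same arithmetic, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the catalog's actual implementation: the function and variable names are hypothetical, and the idf formula (1 + ln(maxDocs / (docFreq + 1))) is inferred from the listed figures, which it reproduces. The boost, docFreq, freq, fieldNorm, queryNorm and coord values are copied from the listing for document 3426.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44218
QUERY_NORM = 0.0144974515


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; formula assumed from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return total * (matched / query_terms)


# (boost, docFreq, termFreq) for each term matched in document 3426.
DOC_3426_TERMS = [
    (1.0721726, 3226, 1.0),   # been
    (1.0727499, 10762, 2.0),  # this
    (1.2394809, 1834, 1.0),   # language
    (1.4934362, 4177, 2.0),   # used
    (2.5747256, 366, 1.0),    # modern
    (6.7305365, 61, 5.0),     # arabic
]

score_3426 = document_score(DOC_3426_TERMS, field_norm=0.0625,
                            matched=6, query_terms=25)
print(score_3426)  # the listing reports 0.22118406 for this document
```

The term "arabic" dominates every result because its low document frequency (61 of 44218) gives it a high idf, which enters the product twice, and it also carries the largest boost. The coord factor then penalizes documents that match only a few of the 25 query terms.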