Document (#32588)

Author
Lazarinis, F.
Title
Engineering and utilizing a stopword list in Greek Web retrieval
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.11, pp.1645-1652
Year
2007
Abstract
The main aim of the article is to present the construction of a stopword list for a non-Latin language and to evaluate the effect of eliminating stopwords from user queries. The article presents the phases of engineering a stopword list for the Greek language, as well as the problems faced and the inferences drawn from this procedure. A set of 32 authentic queries proposed by users is run in Google with and without the stopwords. The importance of eliminating the stopwords from the user queries is then evaluated, in terms of relevance, in the top-10 results from Google.

Similar documents (content)

  1. Johnson, B.; Peterson, E.: Reviewing initial stopword selection (1992) 0.27
    0.27179748 = sum of:
      0.27179748 = product of:
        2.2649791 = sum of:
          0.13995403 = weight(abstract_txt:list in 3628) [ClassicSimilarity], result of:
            0.13995403 = score(doc=3628,freq=2.0), product of:
              0.16787466 = queryWeight, product of:
                2.983687 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.010439138 = queryNorm
              0.8336817 = fieldWeight in 3628, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.109375 = fieldNorm(doc=3628)
          0.37538517 = weight(abstract_txt:stopwords in 3628) [ClassicSimilarity], result of:
            0.37538517 = score(doc=3628,freq=1.0), product of:
              0.35669127 = queryWeight, product of:
                3.551087 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010439138 = queryNorm
              1.0524092 = fieldWeight in 3628, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.109375 = fieldNorm(doc=3628)
          1.7496399 = weight(abstract_txt:stopword in 3628) [ClassicSimilarity], result of:
            1.7496399 = score(doc=3628,freq=5.0), product of:
              0.73332 = queryWeight, product of:
                7.2007346 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010439138 = queryNorm
              2.385916 = fieldWeight in 3628, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.109375 = fieldNorm(doc=3628)
        0.12 = coord(3/25)
    
  2. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.17
    0.17255387 = sum of:
      0.17255387 = product of:
        1.0784616 = sum of:
          0.021840679 = weight(abstract_txt:language in 306) [ClassicSimilarity], result of:
            0.021840679 = score(doc=306,freq=1.0), product of:
              0.06702506 = queryWeight, product of:
                1.5393368 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.010439138 = queryNorm
              0.3258584 = fieldWeight in 306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.017887013 = weight(abstract_txt:from in 306) [ClassicSimilarity], result of:
            0.017887013 = score(doc=306,freq=2.0), product of:
              0.058670305 = queryWeight, product of:
                2.036757 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.010439138 = queryNorm
              0.30487338 = fieldWeight in 306, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.070687465 = weight(abstract_txt:list in 306) [ClassicSimilarity], result of:
            0.070687465 = score(doc=306,freq=1.0), product of:
              0.16787466 = queryWeight, product of:
                2.983687 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.010439138 = queryNorm
              0.42107287 = fieldWeight in 306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.9680465 = weight(abstract_txt:stopword in 306) [ClassicSimilarity], result of:
            0.9680465 = score(doc=306,freq=3.0), product of:
              0.73332 = queryWeight, product of:
                7.2007346 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010439138 = queryNorm
              1.3200874 = fieldWeight in 306, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
        0.16 = coord(4/25)
    
  3. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.14
    0.13658418 = sum of:
      0.13658418 = product of:
        0.85365117 = sum of:
          0.026208814 = weight(abstract_txt:language in 2373) [ClassicSimilarity], result of:
            0.026208814 = score(doc=2373,freq=1.0), product of:
              0.06702506 = queryWeight, product of:
                1.5393368 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.010439138 = queryNorm
              0.39103007 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.07193504 = weight(abstract_txt:queries in 2373) [ClassicSimilarity], result of:
            0.07193504 = score(doc=2373,freq=1.0), product of:
              0.15040526 = queryWeight, product of:
                2.824179 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.010439138 = queryNorm
              0.47827476 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.084824964 = weight(abstract_txt:list in 2373) [ClassicSimilarity], result of:
            0.084824964 = score(doc=2373,freq=1.0), product of:
              0.16787466 = queryWeight, product of:
                2.983687 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.010439138 = queryNorm
              0.50528747 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.6706823 = weight(abstract_txt:stopword in 2373) [ClassicSimilarity], result of:
            0.6706823 = score(doc=2373,freq=1.0), product of:
              0.73332 = queryWeight, product of:
                7.2007346 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010439138 = queryNorm
              0.91458344 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
        0.16 = coord(4/25)
    
  4. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.11
    0.1077266 = sum of:
      0.1077266 = product of:
        0.89772165 = sum of:
          0.070687465 = weight(abstract_txt:list in 955) [ClassicSimilarity], result of:
            0.070687465 = score(doc=955,freq=1.0), product of:
              0.16787466 = queryWeight, product of:
                2.983687 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.010439138 = queryNorm
              0.42107287 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.26813224 = weight(abstract_txt:stopwords in 955) [ClassicSimilarity], result of:
            0.26813224 = score(doc=955,freq=1.0), product of:
              0.35669127 = queryWeight, product of:
                3.551087 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010439138 = queryNorm
              0.7517208 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.55890197 = weight(abstract_txt:stopword in 955) [ClassicSimilarity], result of:
            0.55890197 = score(doc=955,freq=1.0), product of:
              0.73332 = queryWeight, product of:
                7.2007346 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010439138 = queryNorm
              0.7621529 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
        0.12 = coord(3/25)
    
  5. Ekmekcioglu, F.C.; Lynch, M.F.; Willet, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.10
    0.09510866 = sum of:
      0.09510866 = product of:
        0.79257214 = sum of:
          0.03706486 = weight(abstract_txt:language in 5865) [ClassicSimilarity], result of:
            0.03706486 = score(doc=5865,freq=2.0), product of:
              0.06702506 = queryWeight, product of:
                1.5393368 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.010439138 = queryNorm
              0.5530001 = fieldWeight in 5865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.084824964 = weight(abstract_txt:list in 5865) [ClassicSimilarity], result of:
            0.084824964 = score(doc=5865,freq=1.0), product of:
              0.16787466 = queryWeight, product of:
                2.983687 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.010439138 = queryNorm
              0.50528747 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.6706823 = weight(abstract_txt:stopword in 5865) [ClassicSimilarity], result of:
            0.6706823 = score(doc=5865,freq=1.0), product of:
              0.73332 = queryWeight, product of:
                7.2007346 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010439138 = queryNorm
              0.91458344 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
        0.12 = coord(3/25)
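
The score trees above all follow the same arithmetic, which can be reproduced by hand. The sketch below is an illustration only, not the catalog's actual code: it assumes the standard ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes boost, queryNorm, and fieldNorm as given from the listing. The function names (`idf`, `term_score`, `document_score`) are hypothetical. The input values are copied from entry 1 (doc 3628).

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, boost, doc_freq, max_docs, query_norm, field_norm):
    # One "weight(abstract_txt:term in doc)" node: queryWeight * fieldWeight.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, total_clauses, max_docs, query_norm, field_norm):
    # Sum of the matching term scores, scaled by coord(matched/total).
    raw_sum = sum(
        term_score(freq, boost, doc_freq, max_docs, query_norm, field_norm)
        for (freq, boost, doc_freq) in terms
    )
    coord = len(terms) / total_clauses
    return raw_sum * coord

# Values copied from entry 1 (doc 3628): (termFreq, boost, docFreq)
terms_3628 = [
    (2.0, 2.983687, 550),   # abstract_txt:list
    (1.0, 3.551087, 7),     # abstract_txt:stopwords
    (5.0, 7.2007346, 6),    # abstract_txt:stopword
]

score_3628 = document_score(
    terms_3628,
    total_clauses=25,        # denominator of coord(3/25)
    max_docs=44421,
    query_norm=0.010439138,
    field_norm=0.109375,
)

print(round(score_3628, 4))
```

The result should agree with the 0.27179748 listed for entry 1 up to small rounding differences, since the listing was computed in single precision. The rare term "stopword" (docFreq=6) dominates the sum because its idf enters the product twice, once in queryWeight and once in fieldWeight.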