Document (#35320)

Author
Dolamic, L.
Savoy, J.
Title
When stopword lists make the difference
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.1, pp. 200-203
Year
2009
Series
Brief communication
Abstract
In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that with the original Okapi form, or with certain models derived from the Divergence from Randomness (DFR) paradigm, using a short stopword list or no list at all may result in significantly lower performance levels. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists or no list at all are usually not statistically significant. Similar conclusions can be drawn for other natural languages such as French, Hindi, or Persian.
Theme
Automatic indexing

Similar documents (author)

  1. Savoy, J.: Stemming of French words based on grammatical categories (1993) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:savoy in 4649) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 4649, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=4649)
    
  2. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:savoy in 6510) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 6510, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=6510)
    
  3. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:savoy in 7291) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 7291, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=7291)
    
  4. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:savoy in 260) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 260, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=260)
    
  5. Savoy, J.: Searching information in legal hypertext systems (1993/94) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:savoy in 825) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 825, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=825)
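
The five author matches above all reduce to the same three factors: tf, idf and fieldNorm. The following is a minimal Python sketch of that arithmetic, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the figures listed here. The function names are hypothetical, and fieldNorm is taken from the listing as given rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed tf form: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explanation trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:savoy entries above.
author_score = field_weight(freq=1.0, doc_freq=28, max_docs=44421, field_norm=0.625)
print(f"{author_score:.4f}")
```

Because every author entry has the same freq, docFreq and fieldNorm, all five documents receive the same score (about 5.21); only the internal document number differs.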
    

Similar documents (content)

  1. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.18
    0.17654902 = sum of:
      0.17654902 = product of:
        0.5517157 = sum of:
          0.029115846 = weight(abstract_txt:usually in 288) [ClassicSimilarity], result of:
            0.029115846 = score(doc=288,freq=1.0), product of:
              0.07619183 = queryWeight, product of:
                1.0048519 = boost
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.012401246 = queryNorm
              0.3821387 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.0356714 = weight(abstract_txt:lower in 288) [ClassicSimilarity], result of:
            0.0356714 = score(doc=288,freq=1.0), product of:
              0.087237306 = queryWeight, product of:
                1.0752242 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.012401246 = queryNorm
              0.40890077 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.057110976 = weight(abstract_txt:statistically in 288) [ClassicSimilarity], result of:
            0.057110976 = score(doc=288,freq=2.0), product of:
              0.09476003 = queryWeight, product of:
                1.1206255 = boost
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.012401246 = queryNorm
              0.6026906 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.07115625 = weight(abstract_txt:divergence in 288) [ClassicSimilarity], result of:
            0.07115625 = score(doc=288,freq=1.0), product of:
              0.13823907 = queryWeight, product of:
                1.3535157 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.012401246 = queryNorm
              0.51473325 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.04350659 = weight(abstract_txt:performance in 288) [ClassicSimilarity], result of:
            0.04350659 = score(doc=288,freq=3.0), product of:
              0.086995155 = queryWeight, product of:
                1.5184847 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.012401246 = queryNorm
              0.5001036 = fieldWeight in 288, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.10257835 = weight(abstract_txt:randomness in 288) [ClassicSimilarity], result of:
            0.10257835 = score(doc=288,freq=1.0), product of:
              0.1764111 = queryWeight, product of:
                1.5290118 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.012401246 = queryNorm
              0.5814733 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.02723595 = weight(abstract_txt:when in 288) [ClassicSimilarity], result of:
            0.02723595 = score(doc=288,freq=1.0), product of:
              0.10510521 = queryWeight, product of:
                2.044187 = boost
                4.1460857 = idf(docFreq=1910, maxDocs=44421)
                0.012401246 = queryNorm
              0.25913036 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1460857 = idf(docFreq=1910, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.18534032 = weight(abstract_txt:okapi in 288) [ClassicSimilarity], result of:
            0.18534032 = score(doc=288,freq=2.0), product of:
              0.261699 = queryWeight, product of:
                2.6336856 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.012401246 = queryNorm
              0.70821947 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
        0.32 = coord(8/25)
    
  2. Johnson, B.; Peterson, E.: Reviewing initial stopword selection (1992) 0.15
    0.15239522 = sum of:
      0.15239522 = product of:
        1.9049404 = sum of:
          0.053772893 = weight(abstract_txt:drawn in 3628) [ClassicSimilarity], result of:
            0.053772893 = score(doc=3628,freq=1.0), product of:
              0.07897792 = queryWeight, product of:
                1.0230591 = boost
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.012401246 = queryNorm
              0.6808598 = fieldWeight in 3628, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.109375 = fieldNorm(doc=3628)
          1.8511674 = weight(abstract_txt:stopword in 3628) [ClassicSimilarity], result of:
            1.8511674 = score(doc=3628,freq=5.0), product of:
              0.7758728 = queryWeight, product of:
                6.413176 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.012401246 = queryNorm
              2.385916 = fieldWeight in 3628, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.109375 = fieldNorm(doc=3628)
        0.08 = coord(2/25)
    
  3. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.09
    0.094387695 = sum of:
      0.094387695 = product of:
        0.7865642 = sum of:
          0.05328447 = weight(abstract_txt:performance in 2373) [ClassicSimilarity], result of:
            0.05328447 = score(doc=2373,freq=2.0), product of:
              0.086995155 = queryWeight, product of:
                1.5184847 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.012401246 = queryNorm
              0.6124993 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.023679275 = weight(abstract_txt:using in 2373) [ClassicSimilarity], result of:
            0.023679275 = score(doc=2373,freq=1.0), product of:
              0.07306577 = queryWeight, product of:
                1.7043763 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.012401246 = queryNorm
              0.32408163 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.70960045 = weight(abstract_txt:stopword in 2373) [ClassicSimilarity], result of:
            0.70960045 = score(doc=2373,freq=1.0), product of:
              0.7758728 = queryWeight, product of:
                6.413176 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.012401246 = queryNorm
              0.91458344 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
        0.12 = coord(3/25)
    
  4. Dadashkarimia, J.; Shakery, A.; Failia, H.; Zamani, H.: An expectation-maximization algorithm for query translation based on pseudo-relevant documents (2017) 0.09
    0.089940116 = sum of:
      0.089940116 = product of:
        0.28106287 = sum of:
          0.025476364 = weight(abstract_txt:usually in 4296) [ClassicSimilarity], result of:
            0.025476364 = score(doc=4296,freq=1.0), product of:
              0.07619183 = queryWeight, product of:
                1.0048519 = boost
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.012401246 = queryNorm
              0.33437136 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.028095806 = weight(abstract_txt:ones in 4296) [ClassicSimilarity], result of:
            0.028095806 = score(doc=4296,freq=1.0), product of:
              0.08132881 = queryWeight, product of:
                1.0381739 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.012401246 = queryNorm
              0.34545946 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.02997028 = weight(abstract_txt:french in 4296) [ClassicSimilarity], result of:
            0.02997028 = score(doc=4296,freq=1.0), product of:
              0.08490709 = queryWeight, product of:
                1.0607667 = boost
                6.4544435 = idf(docFreq=189, maxDocs=44421)
                0.012401246 = queryNorm
              0.35297737 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4544435 = idf(docFreq=189, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.009711021 = weight(abstract_txt:other in 4296) [ClassicSimilarity], result of:
            0.009711021 = score(doc=4296,freq=1.0), product of:
              0.05046652 = queryWeight, product of:
                1.1565504 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.012401246 = queryNorm
              0.19242501 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.062261716 = weight(abstract_txt:divergence in 4296) [ClassicSimilarity], result of:
            0.062261716 = score(doc=4296,freq=1.0), product of:
              0.13823907 = queryWeight, product of:
                1.3535157 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.012401246 = queryNorm
              0.4503916 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.021978723 = weight(abstract_txt:performance in 4296) [ClassicSimilarity], result of:
            0.021978723 = score(doc=4296,freq=1.0), product of:
              0.086995155 = queryWeight, product of:
                1.5184847 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.012401246 = queryNorm
              0.25264308 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.08975605 = weight(abstract_txt:persian in 4296) [ClassicSimilarity], result of:
            0.08975605 = score(doc=4296,freq=1.0), product of:
              0.1764111 = queryWeight, product of:
                1.5290118 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.012401246 = queryNorm
              0.5087891 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
          0.013812911 = weight(abstract_txt:using in 4296) [ClassicSimilarity], result of:
            0.013812911 = score(doc=4296,freq=1.0), product of:
              0.07306577 = queryWeight, product of:
                1.7043763 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.012401246 = queryNorm
              0.18904762 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4296)
        0.32 = coord(8/25)
    
  5. Kang, I.-H.; Kim, G.C.: Integration of multiple evidences based on a query type for web search (2004) 0.09
    0.089308105 = sum of:
      0.089308105 = product of:
        0.3189575 = sum of:
          0.028696116 = weight(abstract_txt:difference in 3568) [ClassicSimilarity], result of:
            0.028696116 = score(doc=3568,freq=1.0), product of:
              0.07545781 = queryWeight, product of:
                6.0846963 = idf(docFreq=274, maxDocs=44421)
                0.012401246 = queryNorm
              0.38029352 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0846963 = idf(docFreq=274, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.0356714 = weight(abstract_txt:lower in 3568) [ClassicSimilarity], result of:
            0.0356714 = score(doc=3568,freq=1.0), product of:
              0.087237306 = queryWeight, product of:
                1.0752242 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.012401246 = queryNorm
              0.40890077 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.011098309 = weight(abstract_txt:other in 3568) [ClassicSimilarity], result of:
            0.011098309 = score(doc=3568,freq=1.0), product of:
              0.05046652 = queryWeight, product of:
                1.1565504 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.012401246 = queryNorm
              0.2199143 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.03552298 = weight(abstract_txt:performance in 3568) [ClassicSimilarity], result of:
            0.03552298 = score(doc=3568,freq=2.0), product of:
              0.086995155 = queryWeight, product of:
                1.5184847 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.012401246 = queryNorm
              0.40833285 = fieldWeight in 3568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.04967736 = weight(abstract_txt:short in 3568) [ClassicSimilarity], result of:
            0.04967736 = score(doc=3568,freq=1.0), product of:
              0.13706854 = queryWeight, product of:
                1.906039 = boost
                5.7988343 = idf(docFreq=365, maxDocs=44421)
                0.012401246 = queryNorm
              0.36242715 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7988343 = idf(docFreq=365, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.02723595 = weight(abstract_txt:when in 3568) [ClassicSimilarity], result of:
            0.02723595 = score(doc=3568,freq=1.0), product of:
              0.10510521 = queryWeight, product of:
                2.044187 = boost
                4.1460857 = idf(docFreq=1910, maxDocs=44421)
                0.012401246 = queryNorm
              0.25913036 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1460857 = idf(docFreq=1910, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
          0.1310554 = weight(abstract_txt:okapi in 3568) [ClassicSimilarity], result of:
            0.1310554 = score(doc=3568,freq=1.0), product of:
              0.261699 = queryWeight, product of:
                2.6336856 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.012401246 = queryNorm
              0.5007868 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0625 = fieldNorm(doc=3568)
        0.28 = coord(7/25)
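
The content-based scores above combine, for each matching term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm); the per-term products are summed and then scaled by coord (matching terms / query terms). The following is a minimal Python sketch of that calculation, checked against entry 2 (Johnson and Peterson), which has only two matching terms. The class and function names are hypothetical, boost, queryNorm and fieldNorm are copied from the listing rather than derived, and the tf and idf forms are assumptions chosen because they reproduce the listed figures.

```python
import math
from dataclasses import dataclass


@dataclass
class TermMatch:
    boost: float       # per-term query boost, copied from the listing
    freq: float        # term frequency in the matched field
    doc_freq: int      # number of documents containing the term
    field_norm: float  # length normalization, copied from the listing


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(m: TermMatch, max_docs: int, query_norm: float) -> float:
    i = idf(m.doc_freq, max_docs)
    query_weight = m.boost * i * query_norm
    field_weight = math.sqrt(m.freq) * i * m.field_norm
    return query_weight * field_weight


def document_score(matches, query_terms: int, max_docs: int, query_norm: float) -> float:
    # coord rewards documents that match a larger share of the query terms.
    coord = len(matches) / query_terms
    return coord * sum(term_score(m, max_docs, query_norm) for m in matches)


MAX_DOCS = 44421
QUERY_NORM = 0.012401246

# Entry 2 above: the terms "drawn" and "stopword" in document 3628.
johnson_peterson = [
    TermMatch(boost=1.0230591, freq=1.0, doc_freq=238, field_norm=0.109375),
    TermMatch(boost=6.413176, freq=5.0, doc_freq=6, field_norm=0.109375),
]
score = document_score(johnson_peterson, query_terms=25,
                       max_docs=MAX_DOCS, query_norm=QUERY_NORM)
print(f"{score:.4f}")
```

Under these assumptions the sketch arrives at roughly 0.1524 for entry 2, in line with the listed 0.15239522. It also shows why a document matching only 2 of 25 query terms can still rank second: the rare term "stopword" (docFreq=6) carries a high idf and a large boost.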