Document (#29479)

Author
Tomov, D.T.
Title
Some critical remarks on the stop word lists of ISI publications
Source
Journal of documentation. 57(2001) no.6, pp.798-808
Year
2001
Abstract
A semantic analysis was carried out of the "Weekly Subject Index Stop Word List" of Current Contents of the Institute for Scientific Information (ISI), as well as of the full-stop word and semi-stop word lists of the Permuterm Subject Index of Science Citation Index. Selected terms were screened from the first issues for 1997, 1999 and 2000 of the CCODAb/Life Sciences, from the first issues for 1997 and 2000 of CCOD Proceedings, and from the SCI CDE for 1997 and January-June of 2000. True full-stop and semi-stop words commonly occur in the dictionaries of these databases, which proves that there is an abundance of meaningless terms in titles and abstracts. On the other hand, many synonyms and antonyms are absent from these lists. Properly enlarging the lists could contribute to more effective preparation of both printed reference publications and large databases, thus ensuring more economical information retrieval by practical users and scientometricians. The necessity of an improved, semantically oriented policy in preparing the lists of full-stop words and semi-stop words used in modern databases worldwide is emphasised. Journal editors should encourage authors to reduce stop-word usage in article titles and keyword sets.
Footnote
See also: http://www.emeraldinsight.com/10.1108/EUM0000000007101.
Object
Current Contents
Science citation index

Similar documents (content)

  1. Witschel, H.F.: Global term weights in distributed environments (2008) 0.23
    0.23256344 = sum of:
      0.23256344 = product of:
        0.96901435 = sum of:
          0.024182346 = weight(abstract_txt:terms in 3096) [ClassicSimilarity], result of:
            0.024182346 = score(doc=3096,freq=2.0), product of:
              0.054126903 = queryWeight, product of:
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.01338545 = queryNorm
              0.44677126 = fieldWeight in 3096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
          0.0701302 = weight(abstract_txt:list in 3096) [ClassicSimilarity], result of:
            0.0701302 = score(doc=3096,freq=3.0), product of:
              0.09615839 = queryWeight, product of:
                1.3328676 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.01338545 = queryNorm
              0.7293196 = fieldWeight in 3096, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
          0.059595667 = weight(abstract_txt:words in 3096) [ClassicSimilarity], result of:
            0.059595667 = score(doc=3096,freq=1.0), product of:
              0.14242879 = queryWeight, product of:
                1.9867257 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01338545 = queryNorm
              0.4184243 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
          0.087259196 = weight(abstract_txt:lists in 3096) [ClassicSimilarity], result of:
            0.087259196 = score(doc=3096,freq=1.0), product of:
              0.20213509 = queryWeight, product of:
                2.7329347 = boost
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.01338545 = queryNorm
              0.43168753 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
          0.10397214 = weight(abstract_txt:word in 3096) [ClassicSimilarity], result of:
            0.10397214 = score(doc=3096,freq=1.0), product of:
              0.24472718 = queryWeight, product of:
                3.3620527 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01338545 = queryNorm
              0.42484915 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
          0.6238748 = weight(abstract_txt:stop in 3096) [ClassicSimilarity], result of:
            0.6238748 = score(doc=3096,freq=2.0), product of:
              0.75018066 = queryWeight, product of:
                7.4457135 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01338545 = queryNorm
              0.83163273 = fieldWeight in 3096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=3096)
        0.24 = coord(6/25)
    
  2. Pritchard, J.: Information retrieval : smarter indexing (1991) 0.23
    0.23131695 = sum of:
      0.23131695 = product of:
        1.1565847 = sum of:
          0.04942777 = weight(abstract_txt:full in 4889) [ClassicSimilarity], result of:
            0.04942777 = score(doc=4889,freq=1.0), product of:
              0.08028941 = queryWeight, product of:
                1.2179306 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.01338545 = queryNorm
              0.6156201 = fieldWeight in 4889, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.125 = fieldNorm(doc=4889)
          0.09535307 = weight(abstract_txt:words in 4889) [ClassicSimilarity], result of:
            0.09535307 = score(doc=4889,freq=1.0), product of:
              0.14242879 = queryWeight, product of:
                1.9867257 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01338545 = queryNorm
              0.6694789 = fieldWeight in 4889, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.125 = fieldNorm(doc=4889)
          0.13961472 = weight(abstract_txt:lists in 4889) [ClassicSimilarity], result of:
            0.13961472 = score(doc=4889,freq=1.0), product of:
              0.20213509 = queryWeight, product of:
                2.7329347 = boost
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.01338545 = queryNorm
              0.69070005 = fieldWeight in 4889, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.125 = fieldNorm(doc=4889)
          0.16635542 = weight(abstract_txt:word in 4889) [ClassicSimilarity], result of:
            0.16635542 = score(doc=4889,freq=1.0), product of:
              0.24472718 = queryWeight, product of:
                3.3620527 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01338545 = queryNorm
              0.67975867 = fieldWeight in 4889, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.125 = fieldNorm(doc=4889)
          0.7058338 = weight(abstract_txt:stop in 4889) [ClassicSimilarity], result of:
            0.7058338 = score(doc=4889,freq=1.0), product of:
              0.75018066 = queryWeight, product of:
                7.4457135 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01338545 = queryNorm
              0.94088507 = fieldWeight in 4889, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.125 = fieldNorm(doc=4889)
        0.2 = coord(5/25)
    
  3. O'Neill, E.T.; Kammerer, K.A.; Bennett, R.: The aboutness of words (2017) 0.18
    0.17634249 = sum of:
      0.17634249 = product of:
        0.88171244 = sum of:
          0.04859303 = weight(abstract_txt:titles in 4835) [ClassicSimilarity], result of:
            0.04859303 = score(doc=4835,freq=1.0), product of:
              0.10859426 = queryWeight, product of:
                1.4164356 = boost
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.01338545 = queryNorm
              0.44747326 = fieldWeight in 4835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.078125 = fieldNorm(doc=4835)
          0.1576753 = weight(abstract_txt:words in 4835) [ClassicSimilarity], result of:
            0.1576753 = score(doc=4835,freq=7.0), product of:
              0.14242879 = queryWeight, product of:
                1.9867257 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01338545 = queryNorm
              1.1070466 = fieldWeight in 4835, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=4835)
          0.087259196 = weight(abstract_txt:lists in 4835) [ClassicSimilarity], result of:
            0.087259196 = score(doc=4835,freq=1.0), product of:
              0.20213509 = queryWeight, product of:
                2.7329347 = boost
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.01338545 = queryNorm
              0.43168753 = fieldWeight in 4835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.078125 = fieldNorm(doc=4835)
          0.1470388 = weight(abstract_txt:word in 4835) [ClassicSimilarity], result of:
            0.1470388 = score(doc=4835,freq=2.0), product of:
              0.24472718 = queryWeight, product of:
                3.3620527 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01338545 = queryNorm
              0.60082746 = fieldWeight in 4835, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=4835)
          0.4411461 = weight(abstract_txt:stop in 4835) [ClassicSimilarity], result of:
            0.4411461 = score(doc=4835,freq=1.0), product of:
              0.75018066 = queryWeight, product of:
                7.4457135 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01338545 = queryNorm
              0.58805317 = fieldWeight in 4835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=4835)
        0.2 = coord(5/25)
    
  4. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17080553 = sum of:
      0.17080553 = product of:
        0.7116897 = sum of:
          0.0205194 = weight(abstract_txt:terms in 188) [ClassicSimilarity], result of:
            0.0205194 = score(doc=188,freq=4.0), product of:
              0.054126903 = queryWeight, product of:
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.01338545 = queryNorm
              0.379098 = fieldWeight in 188, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.015819276 = weight(abstract_txt:first in 188) [ClassicSimilarity], result of:
            0.015819276 = score(doc=188,freq=2.0), product of:
              0.057337373 = queryWeight, product of:
                1.0292296 = boost
                4.1619086 = idf(docFreq=1880, maxDocs=44421)
                0.01338545 = queryNorm
              0.27589816 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1619086 = idf(docFreq=1880, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.06193363 = weight(abstract_txt:words in 188) [ClassicSimilarity], result of:
            0.06193363 = score(doc=188,freq=3.0), product of:
              0.14242879 = queryWeight, product of:
                1.9867257 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01338545 = queryNorm
              0.43483928 = fieldWeight in 188, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.07404188 = weight(abstract_txt:lists in 188) [ClassicSimilarity], result of:
            0.07404188 = score(doc=188,freq=2.0), product of:
              0.20213509 = queryWeight, product of:
                2.7329347 = boost
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.01338545 = queryNorm
              0.366299 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.16505064 = weight(abstract_txt:word in 188) [ClassicSimilarity], result of:
            0.16505064 = score(doc=188,freq=7.0), product of:
              0.24472718 = queryWeight, product of:
                3.3620527 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01338545 = queryNorm
              0.6744271 = fieldWeight in 188, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.3743249 = weight(abstract_txt:stop in 188) [ClassicSimilarity], result of:
            0.3743249 = score(doc=188,freq=2.0), product of:
              0.75018066 = queryWeight, product of:
                7.4457135 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01338545 = queryNorm
              0.49897966 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
        0.24 = coord(6/25)
    
  5. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.15
    0.15457915 = sum of:
      0.15457915 = product of:
        0.6440798 = sum of:
          0.019345876 = weight(abstract_txt:terms in 1604) [ClassicSimilarity], result of:
            0.019345876 = score(doc=1604,freq=2.0), product of:
              0.054126903 = queryWeight, product of:
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.01338545 = queryNorm
              0.35741702 = fieldWeight in 1604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.021092368 = weight(abstract_txt:first in 1604) [ClassicSimilarity], result of:
            0.021092368 = score(doc=1604,freq=2.0), product of:
              0.057337373 = queryWeight, product of:
                1.0292296 = boost
                4.1619086 = idf(docFreq=1880, maxDocs=44421)
                0.01338545 = queryNorm
              0.36786422 = fieldWeight in 1604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1619086 = idf(docFreq=1880, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.037740577 = weight(abstract_txt:databases in 1604) [ClassicSimilarity], result of:
            0.037740577 = score(doc=1604,freq=2.0), product of:
              0.09673649 = queryWeight, product of:
                1.6373224 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.01338545 = queryNorm
              0.39013794 = fieldWeight in 1604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.09535307 = weight(abstract_txt:words in 1604) [ClassicSimilarity], result of:
            0.09535307 = score(doc=1604,freq=4.0), product of:
              0.14242879 = queryWeight, product of:
                1.9867257 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01338545 = queryNorm
              0.6694789 = fieldWeight in 1604, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.11763105 = weight(abstract_txt:word in 1604) [ClassicSimilarity], result of:
            0.11763105 = score(doc=1604,freq=2.0), product of:
              0.24472718 = queryWeight, product of:
                3.3620527 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01338545 = queryNorm
              0.48066196 = fieldWeight in 1604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.3529169 = weight(abstract_txt:stop in 1604) [ClassicSimilarity], result of:
            0.3529169 = score(doc=1604,freq=1.0), product of:
              0.75018066 = queryWeight, product of:
                7.4457135 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01338545 = queryNorm
              0.47044253 = fieldWeight in 1604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
        0.24 = coord(6/25)
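
The score trees above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm). The function names are hypothetical, and the input numbers are copied from the first entry (doc 3096). It is an illustration of the arithmetic, not the search engine's own code.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed classic inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed classic term frequency: square root of the raw count
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as laid out in each weight(...) subtree
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the listing for doc 3096
MAX_DOCS = 44421
QUERY_NORM = 0.01338545
FIELD_NORM = 0.078125

# (term, termFreq, docFreq, boost); "terms" carries no boost line, so 1.0 is assumed
matched_terms = [
    ("terms", 2.0, 2116, 1.0),
    ("list", 3.0, 550, 1.3328676),
    ("words", 1.0, 569, 1.9867257),
    ("lists", 1.0, 480, 2.7329347),
    ("word", 1.0, 524, 3.3620527),
    ("stop", 2.0, 64, 7.4457135),
]

raw_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost)
    for _, freq, doc_freq, boost in matched_terms
)

# coord(6/25): 6 of the 25 query terms matched this document
coord = 6 / 25
total = raw_sum * coord

print(f"sum of term scores: {raw_sum:.6f}")
print(f"final score:        {total:.6f}")
```

Up to floating-point rounding, the result should land close to the 0.23256344 reported for entry 1. The same function applies to the other entries once their fieldNorm, term frequencies, and coord factor are substituted.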