Document (#18154)

Author
Schütze, H.
Pedersen, J.O.
Title
A cooccurrence-based thesaurus and two applications to information retrieval
Source
Information processing and management. 33(1997) no.3, pp.307-318
Year
1997
Abstract
Presents a new method for computing a thesaurus from a text corpus. Each word is represented as a vector in a multi-dimensional space that captures cooccurrence information. Words are defined to be similar if they have similar cooccurrence patterns. Two different methods for using these thesaurus vectors in information retrieval are shown to significantly improve retrieval performance on the Tipster reference corpus compared with a vector-space baseline.
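The abstract describes each word as a vector of cooccurrence information, with two words counted as similar when their cooccurrence patterns are similar. The following is a minimal sketch of that idea only, assuming raw counts inside a fixed window of two tokens and cosine similarity. The abstract does not state the paper's actual window, weighting, or any dimensionality reduction, so this is not the authors' exact procedure, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it.

    Assumption: raw counts in a symmetric fixed-size window (not necessarily the paper's setup).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (missing keys count as 0)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(vectors, word, k=3):
    """Return the k words whose cooccurrence patterns are most similar to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties by name
    return scored[:k]


corpus = [
    "the doctor treated the patient".split(),
    "the physician treated the patient".split(),
    "the nurse helped the patient".split(),
    "the market closed lower today".split(),
]
vectors = cooccurrence_vectors(corpus)
print(nearest_neighbours(vectors, "doctor", k=1))
```

On this toy corpus "physician" comes out as the nearest neighbour of "doctor", because the two words occur in identical contexts; this is the sense in which such vectors can act as a thesaurus.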

Similar documents (content)

  1. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.25
    0.25305852 = sum of:
      0.25305852 = product of:
        0.6326463 = sum of:
          0.027815022 = weight(abstract_txt:words in 2428) [ClassicSimilarity], result of:
            0.027815022 = score(doc=2428,freq=1.0), product of:
              0.083094545 = queryWeight, product of:
                1.1593313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.013382525 = queryNorm
              0.33473945 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.029939573 = weight(abstract_txt:significantly in 2428) [ClassicSimilarity], result of:
            0.029939573 = score(doc=2428,freq=1.0), product of:
              0.08727369 = queryWeight, product of:
                1.1881273 = boost
                5.4888616 = idf(docFreq=498, maxDocs=44421)
                0.013382525 = queryNorm
              0.34305385 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4888616 = idf(docFreq=498, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.03406508 = weight(abstract_txt:represented in 2428) [ClassicSimilarity], result of:
            0.03406508 = score(doc=2428,freq=1.0), product of:
              0.095117226 = queryWeight, product of:
                1.240369 = boost
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.013382525 = queryNorm
              0.35813785 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.08090383 = weight(abstract_txt:dimensional in 2428) [ClassicSimilarity], result of:
            0.08090383 = score(doc=2428,freq=2.0), product of:
              0.1343865 = queryWeight, product of:
                1.474345 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.013382525 = queryNorm
              0.6020235 = fieldWeight in 2428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.015214423 = weight(abstract_txt:retrieval in 2428) [ClassicSimilarity], result of:
            0.015214423 = score(doc=2428,freq=1.0), product of:
              0.07002179 = queryWeight, product of:
                1.5050569 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013382525 = queryNorm
              0.21728125 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.021742951 = weight(abstract_txt:information in 2428) [ClassicSimilarity], result of:
            0.021742951 = score(doc=2428,freq=8.0), product of:
              0.05084821 = queryWeight, product of:
                1.5707959 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.013382525 = queryNorm
              0.42760506 = fieldWeight in 2428, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.085569344 = weight(abstract_txt:vectors in 2428) [ClassicSimilarity], result of:
            0.085569344 = score(doc=2428,freq=1.0), product of:
              0.17576472 = queryWeight, product of:
                1.6861149 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.013382525 = queryNorm
              0.48684028 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.11361592 = weight(abstract_txt:space in 2428) [ClassicSimilarity], result of:
            0.11361592 = score(doc=2428,freq=4.0), product of:
              0.16852683 = queryWeight, product of:
                2.3349137 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013382525 = queryNorm
              0.67417115 = fieldWeight in 2428, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.08186674 = weight(abstract_txt:corpus in 2428) [ClassicSimilarity], result of:
            0.08186674 = score(doc=2428,freq=1.0), product of:
              0.21501458 = queryWeight, product of:
                2.637365 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.013382525 = queryNorm
              0.38074973 = fieldWeight in 2428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
          0.14191338 = weight(abstract_txt:vector in 2428) [ClassicSimilarity], result of:
            0.14191338 = score(doc=2428,freq=2.0), product of:
              0.24626449 = queryWeight, product of:
                2.8225212 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.013382525 = queryNorm
              0.5762641 = fieldWeight in 2428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=2428)
        0.4 = coord(10/25)
    
  2. Bernier-Colborne, G.: Identifying semantic relations in a specialized corpus through distributional analysis of a cooccurrence tensor (2014) 0.25
    0.25057968 = sum of:
      0.25057968 = product of:
        1.2528983 = sum of:
          0.058232177 = weight(abstract_txt:word in 3153) [ClassicSimilarity], result of:
            0.058232177 = score(doc=3153,freq=1.0), product of:
              0.085665956 = queryWeight, product of:
                1.1771327 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.013382525 = queryNorm
              0.67975867 = fieldWeight in 3153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.125 = fieldNorm(doc=3153)
          0.015374589 = weight(abstract_txt:information in 3153) [ClassicSimilarity], result of:
            0.015374589 = score(doc=3153,freq=1.0), product of:
              0.05084821 = queryWeight, product of:
                1.5707959 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.013382525 = queryNorm
              0.30236244 = fieldWeight in 3153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.125 = fieldNorm(doc=3153)
          0.11361592 = weight(abstract_txt:space in 3153) [ClassicSimilarity], result of:
            0.11361592 = score(doc=3153,freq=1.0), product of:
              0.16852683 = queryWeight, product of:
                2.3349137 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013382525 = queryNorm
              0.67417115 = fieldWeight in 3153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.125 = fieldNorm(doc=3153)
          0.16373348 = weight(abstract_txt:corpus in 3153) [ClassicSimilarity], result of:
            0.16373348 = score(doc=3153,freq=1.0), product of:
              0.21501458 = queryWeight, product of:
                2.637365 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.013382525 = queryNorm
              0.76149946 = fieldWeight in 3153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.125 = fieldNorm(doc=3153)
          0.9019422 = weight(abstract_txt:cooccurrence in 3153) [ClassicSimilarity], result of:
            0.9019422 = score(doc=3153,freq=1.0), product of:
              0.7677016 = queryWeight, product of:
                6.103489 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013382525 = queryNorm
              1.1748604 = fieldWeight in 3153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.125 = fieldNorm(doc=3153)
        0.2 = coord(5/25)
    
  3. Lund, K.; Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence (1996) 0.22
    0.2193766 = sum of:
      0.2193766 = product of:
        0.78348786 = sum of:
          0.06303818 = weight(abstract_txt:word in 2704) [ClassicSimilarity], result of:
            0.06303818 = score(doc=2704,freq=3.0), product of:
              0.085665956 = queryWeight, product of:
                1.1771327 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.013382525 = queryNorm
              0.73586035 = fieldWeight in 2704, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.071509555 = weight(abstract_txt:dimensional in 2704) [ClassicSimilarity], result of:
            0.071509555 = score(doc=2704,freq=1.0), product of:
              0.1343865 = queryWeight, product of:
                1.474345 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.013382525 = queryNorm
              0.5321186 = fieldWeight in 2704, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.01664348 = weight(abstract_txt:information in 2704) [ClassicSimilarity], result of:
            0.01664348 = score(doc=2704,freq=3.0), product of:
              0.05084821 = queryWeight, product of:
                1.5707959 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.013382525 = queryNorm
              0.32731694 = fieldWeight in 2704, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.2391736 = weight(abstract_txt:vectors in 2704) [ClassicSimilarity], result of:
            0.2391736 = score(doc=2704,freq=5.0), product of:
              0.17576472 = queryWeight, product of:
                1.6861149 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.013382525 = queryNorm
              1.36076 = fieldWeight in 2704, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.07100996 = weight(abstract_txt:space in 2704) [ClassicSimilarity], result of:
            0.07100996 = score(doc=2704,freq=1.0), product of:
              0.16852683 = queryWeight, product of:
                2.3349137 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013382525 = queryNorm
              0.42135698 = fieldWeight in 2704, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.14472133 = weight(abstract_txt:corpus in 2704) [ClassicSimilarity], result of:
            0.14472133 = score(doc=2704,freq=2.0), product of:
              0.21501458 = queryWeight, product of:
                2.637365 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.013382525 = queryNorm
              0.6730768 = fieldWeight in 2704, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
          0.17739172 = weight(abstract_txt:vector in 2704) [ClassicSimilarity], result of:
            0.17739172 = score(doc=2704,freq=2.0), product of:
              0.24626449 = queryWeight, product of:
                2.8225212 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.013382525 = queryNorm
              0.7203301 = fieldWeight in 2704, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=2704)
        0.28 = coord(7/25)
    
  4. Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.20
    0.20140891 = sum of:
      0.20140891 = product of:
        0.55946916 = sum of:
          0.022313451 = weight(abstract_txt:performance in 4458) [ClassicSimilarity], result of:
            0.022313451 = score(doc=4458,freq=1.0), product of:
              0.061824042 = queryWeight, product of:
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.013382525 = queryNorm
              0.36091867 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.028448652 = weight(abstract_txt:compared in 4458) [ClassicSimilarity], result of:
            0.028448652 = score(doc=4458,freq=1.0), product of:
              0.072692126 = queryWeight, product of:
                1.0843388 = boost
                5.0093837 = idf(docFreq=805, maxDocs=44421)
                0.013382525 = queryNorm
              0.3913581 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0093837 = idf(docFreq=805, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.03639511 = weight(abstract_txt:word in 4458) [ClassicSimilarity], result of:
            0.03639511 = score(doc=4458,freq=1.0), product of:
              0.085665956 = queryWeight, product of:
                1.1771327 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.013382525 = queryNorm
              0.42484915 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.04258135 = weight(abstract_txt:represented in 4458) [ClassicSimilarity], result of:
            0.04258135 = score(doc=4458,freq=1.0), product of:
              0.095117226 = queryWeight, product of:
                1.240369 = boost
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.013382525 = queryNorm
              0.4476723 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.071509555 = weight(abstract_txt:dimensional in 4458) [ClassicSimilarity], result of:
            0.071509555 = score(doc=4458,freq=1.0), product of:
              0.1343865 = queryWeight, product of:
                1.474345 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.013382525 = queryNorm
              0.5321186 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.026895553 = weight(abstract_txt:retrieval in 4458) [ClassicSimilarity], result of:
            0.026895553 = score(doc=4458,freq=2.0), product of:
              0.07002179 = queryWeight, product of:
                1.5050569 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013382525 = queryNorm
              0.3841026 = fieldWeight in 4458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.06387072 = weight(abstract_txt:similar in 4458) [ClassicSimilarity], result of:
            0.06387072 = score(doc=4458,freq=1.0), product of:
              0.15703294 = queryWeight, product of:
                2.2538846 = boost
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.013382525 = queryNorm
              0.40673453 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.14201991 = weight(abstract_txt:space in 4458) [ClassicSimilarity], result of:
            0.14201991 = score(doc=4458,freq=4.0), product of:
              0.16852683 = queryWeight, product of:
                2.3349137 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013382525 = queryNorm
              0.84271395 = fieldWeight in 4458, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
          0.12543489 = weight(abstract_txt:vector in 4458) [ClassicSimilarity], result of:
            0.12543489 = score(doc=4458,freq=1.0), product of:
              0.24626449 = queryWeight, product of:
                2.8225212 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.013382525 = queryNorm
              0.5093503 = fieldWeight in 4458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=4458)
        0.36 = coord(9/25)
    
  5. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.17
    0.16769329 = sum of:
      0.16769329 = product of:
        0.5989046 = sum of:
          0.03476878 = weight(abstract_txt:words in 115) [ClassicSimilarity], result of:
            0.03476878 = score(doc=115,freq=1.0), product of:
              0.083094545 = queryWeight, product of:
                1.1593313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.013382525 = queryNorm
              0.4184243 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.04258135 = weight(abstract_txt:represented in 115) [ClassicSimilarity], result of:
            0.04258135 = score(doc=115,freq=1.0), product of:
              0.095117226 = queryWeight, product of:
                1.240369 = boost
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.013382525 = queryNorm
              0.4476723 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.071509555 = weight(abstract_txt:dimensional in 115) [ClassicSimilarity], result of:
            0.071509555 = score(doc=115,freq=1.0), product of:
              0.1343865 = queryWeight, product of:
                1.474345 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.013382525 = queryNorm
              0.5321186 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.15126666 = weight(abstract_txt:vectors in 115) [ClassicSimilarity], result of:
            0.15126666 = score(doc=115,freq=2.0), product of:
              0.17576472 = queryWeight, product of:
                1.6861149 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.013382525 = queryNorm
              0.86062014 = fieldWeight in 115, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.07100996 = weight(abstract_txt:space in 115) [ClassicSimilarity], result of:
            0.07100996 = score(doc=115,freq=1.0), product of:
              0.16852683 = queryWeight, product of:
                2.3349137 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013382525 = queryNorm
              0.42135698 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.10233343 = weight(abstract_txt:corpus in 115) [ClassicSimilarity], result of:
            0.10233343 = score(doc=115,freq=1.0), product of:
              0.21501458 = queryWeight, product of:
                2.637365 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.013382525 = queryNorm
              0.47593716 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.12543489 = weight(abstract_txt:vector in 115) [ClassicSimilarity], result of:
            0.12543489 = score(doc=115,freq=1.0), product of:
              0.24626449 = queryWeight, product of:
                2.8225212 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.013382525 = queryNorm
              0.5093503 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
        0.28 = coord(7/25)
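The score trees under each similar document are Lucene ClassicSimilarity (TF-IDF) explain output. The sketch below recomputes two of the printed numbers from the inputs shown in the traces, assuming the ClassicSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), plus the coord factor of Lucene versions before 7.0. queryNorm and fieldNorm are copied from the traces rather than derived, because neither the full 25-term query nor the original field lengths appear here. The function names are illustrative and are not Lucene API.

```python
import math


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Recompute one weight(...) entry of a trace as queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                     # tf(freq) = sqrt(freq)
    query_weight = boost * idf * query_norm  # the "queryWeight, product of" line
    field_weight = tf * idf * field_norm     # the "fieldWeight, product of" line
    return query_weight * field_weight


def classic_document_score(term_weights, max_overlap):
    """Sum the matched term weights and scale by coord(matched / max_overlap)."""
    coord = len(term_weights) / max_overlap
    return sum(term_weights) * coord


MAX_DOCS = 44421
QUERY_NORM = 0.013382525  # copied from the traces, not recomputed
MAX_OVERLAP = 25          # from the coord(m/25) lines

# One leaf: "vector in 2428" (freq=2, docFreq=177, boost 2.8225212, fieldNorm 0.0625).
vector_2428 = classic_term_weight(2.0, 177, MAX_DOCS, 2.8225212, QUERY_NORM, 0.0625)

# Whole document 3153: its five listed term weights, combined with coord(5/25).
weights_3153 = [0.058232177, 0.015374589, 0.11361592, 0.16373348, 0.9019422]
score_3153 = classic_document_score(weights_3153, MAX_OVERLAP)

print(f"vector in 2428: {vector_2428:.6f}")
print(f"document 3153:  {score_3153:.6f}")
```

The value shown after each title in the list (for example 0.25 for Bernier-Colborne) is this document score rounded to two decimals. A term printed without a boost line, such as "performance" in the fourth trace, corresponds to boost = 1.0.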