Document (#23752)

Author
Bookstein, A.
Raita, T.
Title
Discovering term occurrence structure in text
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.6, p.476-486
Year
2001
Abstract
This article examines some consequences for information control of the tendency of occurrences of content-bearing terms to appear together, or clump. Properties of previously defined clumping measures are reviewed and extended, and the significance of these measures for devising retrieval strategies discussed. A new type of clumping measure, which extends the earlier measures by permitting gaps within a clump, is defined, and several variants examined. Experiments are carried out that indicate the relation between the new measure and one of the earlier measures, as well as the ability of the two types of measure to predict compression efficiency.
Theme
Informetrics

Similar documents (author)

  1. Bookstein, A.: Probability and Fuzzy-set applications to information retrieval (1985) 5.35
    5.353733 = sum of:
      5.353733 = weight(author_txt:bookstein in 780) [ClassicSimilarity], result of:
        5.353733 = fieldWeight in 780, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.565973 = idf(docFreq=22, maxDocs=44421)
          0.625 = fieldNorm(doc=780)
    
  2. Bookstein, A.: Relevance (1979) 5.35
    5.353733 = sum of:
      5.353733 = weight(author_txt:bookstein in 838) [ClassicSimilarity], result of:
        5.353733 = fieldWeight in 838, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.565973 = idf(docFreq=22, maxDocs=44421)
          0.625 = fieldNorm(doc=838)
    
  3. Bookstein, A.: Fuzzy requests : an approach to weighted Boolean searches (1979) 5.35
    5.353733 = sum of:
      5.353733 = weight(author_txt:bookstein in 5503) [ClassicSimilarity], result of:
        5.353733 = fieldWeight in 5503, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.565973 = idf(docFreq=22, maxDocs=44421)
          0.625 = fieldNorm(doc=5503)
    
  4. Bookstein, A.: Informetric distributions : I. Unified overview (1990) 5.35
    5.353733 = sum of:
      5.353733 = weight(author_txt:bookstein in 6901) [ClassicSimilarity], result of:
        5.353733 = fieldWeight in 6901, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.565973 = idf(docFreq=22, maxDocs=44421)
          0.625 = fieldNorm(doc=6901)
    
  5. Bookstein, A.: The bibliometric distributions (1976) 5.35
    5.353733 = sum of:
      5.353733 = weight(author_txt:bookstein in 5129) [ClassicSimilarity], result of:
        5.353733 = fieldWeight in 5129, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.565973 = idf(docFreq=22, maxDocs=44421)
          0.625 = fieldNorm(doc=5129)
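
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the tf and idf formulas that Lucene documents for ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any Lucene API. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as documented for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:bookstein explanations.
author_idf = classic_idf(doc_freq=22, max_docs=44421)
author_score = field_weight(freq=1.0, doc_freq=22, max_docs=44421, field_norm=0.625)

print(f"idf   = {author_idf:.6f}")
print(f"score = {author_score:.6f}")
```

All five author matches share the same freq, docFreq, and fieldNorm, which is why they all receive an identical score.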
    

Similar documents (content)

  1. Bookstein, A.; Kulyukin, V.; Raita, T.; Nicholson, J.: Adapting measures of clumping strength to assess term-term similarity (2003) 0.14
    0.14187126 = sum of:
      0.14187126 = product of:
        0.8866954 = sum of:
          0.051439248 = weight(abstract_txt:previously in 2609) [ClassicSimilarity], result of:
            0.051439248 = score(doc=2609,freq=1.0), product of:
              0.10728826 = queryWeight, product of:
                1.070982 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.016323663 = queryNorm
              0.479449 = fieldWeight in 2609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.078125 = fieldNorm(doc=2609)
          0.084305085 = weight(abstract_txt:tendency in 2609) [ClassicSimilarity], result of:
            0.084305085 = score(doc=2609,freq=1.0), product of:
              0.14913915 = queryWeight, product of:
                1.2627051 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.016323663 = queryNorm
              0.56527805 = fieldWeight in 2609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.078125 = fieldNorm(doc=2609)
          0.43316162 = weight(abstract_txt:clumping in 2609) [ClassicSimilarity], result of:
            0.43316162 = score(doc=2609,freq=1.0), product of:
              0.5594987 = queryWeight, product of:
                3.4587617 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.016323663 = queryNorm
              0.7741959 = fieldWeight in 2609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=2609)
          0.31778947 = weight(abstract_txt:measures in 2609) [ClassicSimilarity], result of:
            0.31778947 = score(doc=2609,freq=5.0), product of:
              0.33533493 = queryWeight, product of:
                3.786827 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.016323663 = queryNorm
              0.9476778 = fieldWeight in 2609, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=2609)
        0.16 = coord(4/25)
    
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.08
    0.08211398 = sum of:
      0.08211398 = product of:
        0.5132124 = sum of:
          0.0401813 = weight(abstract_txt:extended in 2808) [ClassicSimilarity], result of:
            0.0401813 = score(doc=2808,freq=1.0), product of:
              0.10559543 = queryWeight, product of:
                1.0624993 = boost
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.016323663 = queryNorm
              0.3805212 = fieldWeight in 2808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.050869644 = weight(abstract_txt:defined in 2808) [ClassicSimilarity], result of:
            0.050869644 = score(doc=2808,freq=1.0), product of:
              0.15569629 = queryWeight, product of:
                1.8245686 = boost
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.016323663 = queryNorm
              0.32672355 = fieldWeight in 2808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.121350594 = weight(abstract_txt:measure in 2808) [ClassicSimilarity], result of:
            0.121350594 = score(doc=2808,freq=2.0), product of:
              0.25255394 = queryWeight, product of:
                2.8460581 = boost
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.016323663 = queryNorm
              0.48049375 = fieldWeight in 2808, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.30081084 = weight(abstract_txt:measures in 2808) [ClassicSimilarity], result of:
            0.30081084 = score(doc=2808,freq=7.0), product of:
              0.33533493 = queryWeight, product of:
                3.786827 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.016323663 = queryNorm
              0.89704597 = fieldWeight in 2808, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
        0.16 = coord(4/25)
    
  3. Bar-Hillel, Y.; Carnap, R.: An outline of a theory of semantic information (1952) 0.08
    0.081309475 = sum of:
      0.081309475 = product of:
        0.40654737 = sum of:
          0.03349948 = weight(abstract_txt:carried in 4369) [ClassicSimilarity], result of:
            0.03349948 = score(doc=4369,freq=1.0), product of:
              0.09353795 = queryWeight, product of:
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.016323663 = queryNorm
              0.35813785 = fieldWeight in 4369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.0625 = fieldNorm(doc=4369)
          0.04003748 = weight(abstract_txt:efficiency in 4369) [ClassicSimilarity], result of:
            0.04003748 = score(doc=4369,freq=1.0), product of:
              0.105343305 = queryWeight, product of:
                1.0612301 = boost
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.016323663 = queryNorm
              0.38006666 = fieldWeight in 4369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.0625 = fieldNorm(doc=4369)
          0.050869644 = weight(abstract_txt:defined in 4369) [ClassicSimilarity], result of:
            0.050869644 = score(doc=4369,freq=1.0), product of:
              0.15569629 = queryWeight, product of:
                1.8245686 = boost
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.016323663 = queryNorm
              0.32672355 = fieldWeight in 4369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.0625 = fieldNorm(doc=4369)
          0.121350594 = weight(abstract_txt:measure in 4369) [ClassicSimilarity], result of:
            0.121350594 = score(doc=4369,freq=2.0), product of:
              0.25255394 = queryWeight, product of:
                2.8460581 = boost
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.016323663 = queryNorm
              0.48049375 = fieldWeight in 4369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.0625 = fieldNorm(doc=4369)
          0.16079016 = weight(abstract_txt:measures in 4369) [ClassicSimilarity], result of:
            0.16079016 = score(doc=4369,freq=2.0), product of:
              0.33533493 = queryWeight, product of:
                3.786827 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.016323663 = queryNorm
              0.47949123 = fieldWeight in 4369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=4369)
        0.2 = coord(5/25)
    
  4. Eck, N.J. van; Waltman, L.: How to normalize cooccurrence data? : an analysis of some well-known similarity measures (2009) 0.08
    0.07628899 = sum of:
      0.07628899 = product of:
        0.6357416 = sum of:
          0.045208458 = weight(abstract_txt:properties in 3942) [ClassicSimilarity], result of:
            0.045208458 = score(doc=3942,freq=1.0), product of:
              0.09843939 = queryWeight, product of:
                1.0258658 = boost
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.016323663 = queryNorm
              0.4592517 = fieldWeight in 3942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.078125 = fieldNorm(doc=3942)
          0.21451958 = weight(abstract_txt:measure in 3942) [ClassicSimilarity], result of:
            0.21451958 = score(doc=3942,freq=4.0), product of:
              0.25255394 = queryWeight, product of:
                2.8460581 = boost
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.016323663 = queryNorm
              0.849401 = fieldWeight in 3942, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.078125 = fieldNorm(doc=3942)
          0.37601358 = weight(abstract_txt:measures in 3942) [ClassicSimilarity], result of:
            0.37601358 = score(doc=3942,freq=7.0), product of:
              0.33533493 = queryWeight, product of:
                3.786827 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.016323663 = queryNorm
              1.1213075 = fieldWeight in 3942, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=3942)
        0.12 = coord(3/25)
    
  5. Heine, M.H.: Distance between sets as an objective measure of retrieval effectiveness (1973) 0.07
    0.0728673 = sum of:
      0.0728673 = product of:
        0.4554206 = sum of:
          0.063934416 = weight(abstract_txt:properties in 5514) [ClassicSimilarity], result of:
            0.063934416 = score(doc=5514,freq=2.0), product of:
              0.09843939 = queryWeight, product of:
                1.0258658 = boost
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.016323663 = queryNorm
              0.64948 = fieldWeight in 5514, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.078125 = fieldNorm(doc=5514)
          0.063587055 = weight(abstract_txt:defined in 5514) [ClassicSimilarity], result of:
            0.063587055 = score(doc=5514,freq=1.0), product of:
              0.15569629 = queryWeight, product of:
                1.8245686 = boost
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.016323663 = queryNorm
              0.40840444 = fieldWeight in 5514, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2275767 = idf(docFreq=647, maxDocs=44421)
                0.078125 = fieldNorm(doc=5514)
          0.1857794 = weight(abstract_txt:measure in 5514) [ClassicSimilarity], result of:
            0.1857794 = score(doc=5514,freq=3.0), product of:
              0.25255394 = queryWeight, product of:
                2.8460581 = boost
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.016323663 = queryNorm
              0.73560286 = fieldWeight in 5514, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.078125 = fieldNorm(doc=5514)
          0.14211977 = weight(abstract_txt:measures in 5514) [ClassicSimilarity], result of:
            0.14211977 = score(doc=5514,freq=1.0), product of:
              0.33533493 = queryWeight, product of:
                3.786827 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.016323663 = queryNorm
              0.4238144 = fieldWeight in 5514, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=5514)
        0.16 = coord(4/25)
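
The content-similarity totals combine per-term scores with a coordination factor. As a hedged sketch of how the explain trees above fit together, the code below recomputes the total for the first content match (doc 2609): each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is scaled by coord (matched terms / query terms). The helper name `term_score` and the tuple layout are illustrative assumptions. All numbers are copied from the listing.

```python
QUERY_NORM = 0.016323663  # queryNorm, shared by every term in the query
FIELD_NORM = 0.078125     # fieldNorm(doc=2609)


def term_score(boost: float, idf: float, tf: float,
               query_norm: float = QUERY_NORM,
               field_norm: float = FIELD_NORM) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# (boost, idf, tf) for each matching term in doc 2609.
terms = {
    "previously": (1.070982, 6.136947, 1.0),
    "tendency": (1.2627051, 7.2355595, 1.0),
    "clumping": (3.4587617, 9.909708, 1.0),
    "measures": (3.786827, 5.424824, 2.236068),  # tf = sqrt(5)
}

raw_sum = sum(term_score(*values) for values in terms.values())
coord = 4 / 25  # 4 of the 25 query terms matched
total = coord * raw_sum

print(f"sum of term scores = {raw_sum:.7f}")
print(f"final score        = {total:.8f}")
```

Where an explanation lists no boost line (for example the term "carried" in doc 4369), the boost is effectively 1.0 and queryWeight reduces to idf * queryNorm.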