Document (#24977)

Author
Losee, R.M.
Title
Term dependence : a basis for Luhn and Zipf models
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.12, pp.1019-1025
Year
2001
Abstract
There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We suggest reasons for this principle based on the empirical relationships shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse Representation-EMIM principle. We also suggest an Inverse EMIM term weight for indexing or retrieval applications that is consistent with Luhn's distribution. An information theoretic interpretation of Zipf's Law is provided. Using the regularity noted here, we suggest that Zipf's Law is a consequence of the statistical dependencies that exist between terms, described here using information theoretic concepts.
Theme
Informetrics
Object
Luhn model
Zipf's law
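
The abstract refers to the expected mutual information measure (EMIM) between terms. As a rough illustration only, the sketch below computes the textbook mutual information between two binary term-occurrence variables from a 2x2 table of co-occurrence counts. The function name `emim`, the count parameters, and the choice of base-2 logarithms are assumptions made for this example; the record does not say how the article estimates probabilities or defines its text windows.

```python
import math

def emim(n11, n10, n01, n00):
    """Mutual information (in bits) between two binary term-occurrence variables.

    Hypothetical inputs: counts of text windows containing both terms (n11),
    only the first term (n10), only the second term (n01), or neither (n00).
    """
    total = n11 + n10 + n01 + n00
    joint = {
        (1, 1): n11 / total,
        (1, 0): n10 / total,
        (0, 1): n01 / total,
        (0, 0): n00 / total,
    }
    # Marginal probabilities of presence (1) and absence (0) for each term.
    p_first = {1: (n11 + n10) / total, 0: (n01 + n00) / total}
    p_second = {1: (n11 + n01) / total, 0: (n10 + n00) / total}

    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # empty cells contribute nothing to the sum
            mi += p * math.log2(p / (p_first[x] * p_second[y]))
    return mi
```

With these definitions, statistically independent terms give a value of zero, and two terms that always occur together in half of the windows give one bit.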

Similar documents (author)

  1. Losee, R.M.: A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 5.19
    5.187669 = sum of:
      5.187669 = weight(author_txt:losee in 2334) [ClassicSimilarity], result of:
        5.187669 = fieldWeight in 2334, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.30027 = idf(docFreq=29, maxDocs=44421)
          0.625 = fieldNorm(doc=2334)
    
  2. Losee, R.M.: The relative shelf location of circulated books : a study of classification, users, and browsing (1993) 5.19
    5.187669 = sum of:
      5.187669 = weight(author_txt:losee in 4484) [ClassicSimilarity], result of:
        5.187669 = fieldWeight in 4484, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.30027 = idf(docFreq=29, maxDocs=44421)
          0.625 = fieldNorm(doc=4484)
    
  3. Losee, R.M.: Seven fundamental questions for the science of library classification (1993) 5.19
    5.187669 = sum of:
      5.187669 = weight(author_txt:losee in 4507) [ClassicSimilarity], result of:
        5.187669 = fieldWeight in 4507, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.30027 = idf(docFreq=29, maxDocs=44421)
          0.625 = fieldNorm(doc=4507)
    
  4. Losee, R.M.: Term dependence : truncating the Bahadur Lazarsfeld expansion (1994) 5.19
    5.187669 = sum of:
      5.187669 = weight(author_txt:losee in 7389) [ClassicSimilarity], result of:
        5.187669 = fieldWeight in 7389, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.30027 = idf(docFreq=29, maxDocs=44421)
          0.625 = fieldNorm(doc=7389)
    
  5. Losee, R.M.: Upper bounds for retrieval performance and their use measuring performance and generating optimal queries : can it get any better than this? (1994) 5.19
    5.187669 = sum of:
      5.187669 = weight(author_txt:losee in 7417) [ClassicSimilarity], result of:
        5.187669 = fieldWeight in 7417, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.30027 = idf(docFreq=29, maxDocs=44421)
          0.625 = fieldNorm(doc=7417)
    

Similar documents (content)

  1. Ferrer-i-Cancho, R.; Vitevitch, M.S.: The origins of Zipf's meaning-frequency law (2018) 0.22
    0.22427887 = sum of:
      0.22427887 = product of:
        0.93449533 = sum of:
          0.11411056 = weight(abstract_txt:zipf in 546) [ClassicSimilarity], result of:
            0.11411056 = score(doc=546,freq=1.0), product of:
              0.16963334 = queryWeight, product of:
                1.0996337 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.017915899 = queryNorm
              0.67268944 = fieldWeight in 546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.023643391 = weight(abstract_txt:that in 546) [ClassicSimilarity], result of:
            0.023643391 = score(doc=546,freq=4.0), product of:
              0.06398387 = queryWeight, product of:
                1.5101243 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017915899 = queryNorm
              0.3695211 = fieldWeight in 546, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.13367489 = weight(abstract_txt:rank in 546) [ClassicSimilarity], result of:
            0.13367489 = score(doc=546,freq=2.0), product of:
              0.18850689 = queryWeight, product of:
                1.6393476 = boost
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.017915899 = queryNorm
              0.7091247 = fieldWeight in 546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.04179021 = weight(abstract_txt:between in 546) [ClassicSimilarity], result of:
            0.04179021 = score(doc=546,freq=2.0), product of:
              0.109400764 = queryWeight, product of:
                1.7661704 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.017915899 = queryNorm
              0.38199192 = fieldWeight in 546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.08966368 = weight(abstract_txt:here in 546) [ClassicSimilarity], result of:
            0.08966368 = score(doc=546,freq=1.0), product of:
              0.20832694 = queryWeight, product of:
                2.1106963 = boost
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.017915899 = queryNorm
              0.43039885 = fieldWeight in 546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.53161263 = weight(abstract_txt:zipf's in 546) [ClassicSimilarity], result of:
            0.53161263 = score(doc=546,freq=3.0), product of:
              0.41335872 = queryWeight, product of:
                2.4275656 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.017915899 = queryNorm
              1.2860806 = fieldWeight in 546, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
        0.24 = coord(6/25)
    
  2. Losee, R.M.: Decisions in thesaurus construction and use (2007) 0.18
    0.17654814 = sum of:
      0.17654814 = product of:
        0.63052905 = sum of:
          0.055729467 = weight(abstract_txt:term in 1924) [ClassicSimilarity], result of:
            0.055729467 = score(doc=1924,freq=2.0), product of:
              0.105200365 = queryWeight, product of:
                1.2246615 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.017915899 = queryNorm
              0.52974594 = fieldWeight in 1924, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.016718404 = weight(abstract_txt:that in 1924) [ClassicSimilarity], result of:
            0.016718404 = score(doc=1924,freq=2.0), product of:
              0.06398387 = queryWeight, product of:
                1.5101243 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017915899 = queryNorm
              0.2612909 = fieldWeight in 1924, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.012649565 = weight(abstract_txt:information in 1924) [ClassicSimilarity], result of:
            0.012649565 = score(doc=1924,freq=1.0), product of:
              0.06693723 = queryWeight, product of:
                1.5445832 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.017915899 = queryNorm
              0.18897653 = fieldWeight in 1924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.116078086 = weight(abstract_txt:phrases in 1924) [ClassicSimilarity], result of:
            0.116078086 = score(doc=1924,freq=1.0), product of:
              0.21617435 = queryWeight, product of:
                1.755535 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.017915899 = queryNorm
              0.53696513 = fieldWeight in 1924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.17591639 = weight(abstract_txt:theoretic in 1924) [ClassicSimilarity], result of:
            0.17591639 = score(doc=1924,freq=1.0), product of:
              0.28521663 = queryWeight, product of:
                2.0164843 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.017915899 = queryNorm
              0.61678165 = fieldWeight in 1924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.08966368 = weight(abstract_txt:here in 1924) [ClassicSimilarity], result of:
            0.08966368 = score(doc=1924,freq=1.0), product of:
              0.20832694 = queryWeight, product of:
                2.1106963 = boost
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.017915899 = queryNorm
              0.43039885 = fieldWeight in 1924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
          0.16377342 = weight(abstract_txt:terms in 1924) [ClassicSimilarity], result of:
            0.16377342 = score(doc=1924,freq=3.0), product of:
              0.299304 = queryWeight, product of:
                4.1313663 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.017915899 = queryNorm
              0.54718083 = fieldWeight in 1924, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=1924)
        0.28 = coord(7/25)
    
  3. Wong, S.K.M.; Yao, Y.Y.: An information-theoretic measure of term specificity (1992) 0.16
    0.16482455 = sum of:
      0.16482455 = product of:
        0.68676895 = sum of:
          0.11583153 = weight(abstract_txt:term in 4806) [ClassicSimilarity], result of:
            0.11583153 = score(doc=4806,freq=6.0), product of:
              0.105200365 = queryWeight, product of:
                1.2246615 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.017915899 = queryNorm
              1.1010563 = fieldWeight in 4806, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
          0.020062083 = weight(abstract_txt:that in 4806) [ClassicSimilarity], result of:
            0.020062083 = score(doc=4806,freq=2.0), product of:
              0.06398387 = queryWeight, product of:
                1.5101243 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017915899 = queryNorm
              0.31354907 = fieldWeight in 4806, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
          0.021467024 = weight(abstract_txt:information in 4806) [ClassicSimilarity], result of:
            0.021467024 = score(doc=4806,freq=2.0), product of:
              0.06693723 = queryWeight, product of:
                1.5445832 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.017915899 = queryNorm
              0.3207038 = fieldWeight in 4806, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
          0.035460167 = weight(abstract_txt:between in 4806) [ClassicSimilarity], result of:
            0.035460167 = score(doc=4806,freq=1.0), product of:
              0.109400764 = queryWeight, product of:
                1.7661704 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.017915899 = queryNorm
              0.3241309 = fieldWeight in 4806, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
          0.19540814 = weight(abstract_txt:inverse in 4806) [ClassicSimilarity], result of:
            0.19540814 = score(doc=4806,freq=1.0), product of:
              0.27090162 = queryWeight, product of:
                1.9652293 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.017915899 = queryNorm
              0.7213251 = fieldWeight in 4806, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
          0.29854 = weight(abstract_txt:theoretic in 4806) [ClassicSimilarity], result of:
            0.29854 = score(doc=4806,freq=2.0), product of:
              0.28521663 = queryWeight, product of:
                2.0164843 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.017915899 = queryNorm
              1.0467131 = fieldWeight in 4806, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.09375 = fieldNorm(doc=4806)
        0.24 = coord(6/25)
    
  4. Aizawa, A.: An information-theoretic perspective of tf-idf measures (2003) 0.16
    0.15952353 = sum of:
      0.15952353 = product of:
        0.6646814 = sum of:
          0.047288023 = weight(abstract_txt:term in 5155) [ClassicSimilarity], result of:
            0.047288023 = score(doc=5155,freq=1.0), product of:
              0.105200365 = queryWeight, product of:
                1.2246615 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.017915899 = queryNorm
              0.44950435 = fieldWeight in 5155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
          0.020062083 = weight(abstract_txt:that in 5155) [ClassicSimilarity], result of:
            0.020062083 = score(doc=5155,freq=2.0), product of:
              0.06398387 = queryWeight, product of:
                1.5101243 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017915899 = queryNorm
              0.31354907 = fieldWeight in 5155, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
          0.030358957 = weight(abstract_txt:information in 5155) [ClassicSimilarity], result of:
            0.030358957 = score(doc=5155,freq=4.0), product of:
              0.06693723 = queryWeight, product of:
                1.5445832 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.017915899 = queryNorm
              0.45354366 = fieldWeight in 5155, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
          0.19540814 = weight(abstract_txt:inverse in 5155) [ClassicSimilarity], result of:
            0.19540814 = score(doc=5155,freq=1.0), product of:
              0.27090162 = queryWeight, product of:
                1.9652293 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.017915899 = queryNorm
              0.7213251 = fieldWeight in 5155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
          0.21109965 = weight(abstract_txt:theoretic in 5155) [ClassicSimilarity], result of:
            0.21109965 = score(doc=5155,freq=1.0), product of:
              0.28521663 = queryWeight, product of:
                2.0164843 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.017915899 = queryNorm
              0.74013793 = fieldWeight in 5155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
          0.16046453 = weight(abstract_txt:terms in 5155) [ClassicSimilarity], result of:
            0.16046453 = score(doc=5155,freq=2.0), product of:
              0.299304 = queryWeight, product of:
                4.1313663 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.017915899 = queryNorm
              0.53612554 = fieldWeight in 5155, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.09375 = fieldNorm(doc=5155)
        0.24 = coord(6/25)
    
  5. Egghe, L.: Zipfian and Lotkaian continuous concentration theory (2005) 0.15
    0.15450566 = sum of:
      0.15450566 = product of:
        0.7725283 = sum of:
          0.19764529 = weight(abstract_txt:zipf in 4678) [ClassicSimilarity], result of:
            0.19764529 = score(doc=4678,freq=3.0), product of:
              0.16963334 = queryWeight, product of:
                1.0996337 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.017915899 = queryNorm
              1.1651323 = fieldWeight in 4678, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.016718404 = weight(abstract_txt:that in 4678) [ClassicSimilarity], result of:
            0.016718404 = score(doc=4678,freq=2.0), product of:
              0.06398387 = queryWeight, product of:
                1.5101243 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017915899 = queryNorm
              0.2612909 = fieldWeight in 4678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.02955014 = weight(abstract_txt:between in 4678) [ClassicSimilarity], result of:
            0.02955014 = score(doc=4678,freq=1.0), product of:
              0.109400764 = queryWeight, product of:
                1.7661704 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.017915899 = queryNorm
              0.2701091 = fieldWeight in 4678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.43405986 = weight(abstract_txt:zipf's in 4678) [ClassicSimilarity], result of:
            0.43405986 = score(doc=4678,freq=2.0), product of:
              0.41335872 = queryWeight, product of:
                2.4275656 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.017915899 = queryNorm
              1.0500803 = fieldWeight in 4678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.09455463 = weight(abstract_txt:terms in 4678) [ClassicSimilarity], result of:
            0.09455463 = score(doc=4678,freq=1.0), product of:
              0.299304 = queryWeight, product of:
                4.1313663 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.017915899 = queryNorm
              0.31591502 = fieldWeight in 4678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
        0.2 = coord(5/25)
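
The content-similarity scores above combine, for each matching abstract term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm), sum the products, and scale by the coord factor (matched terms / query terms). The sketch below recomputes the score of the first hit (doc 546) from the boosts, document frequencies, and term frequencies printed in its tree. It assumes the same tf and idf definitions as classic Lucene TF-IDF; the helper names `term_score` and `hit_score` are made up for this example.

```python
import math

QUERY_NORM = 0.017915899   # queryNorm as printed in the listing
MAX_DOCS = 44421           # maxDocs as printed in the listing

def idf(doc_freq):
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1.0))

def term_score(boost, doc_freq, freq, field_norm):
    # queryWeight * fieldWeight for a single matching term.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

def hit_score(terms, field_norm, query_terms):
    # Sum of per-term scores scaled by coord = matched terms / query terms.
    total = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return total * (len(terms) / query_terms)

# (boost, docFreq, freq) for the six terms matched in doc 546.
doc_546_terms = [
    (1.0996337, 21, 1.0),      # zipf
    (1.5101243, 11344, 4.0),   # that
    (1.6393476, 196, 2.0),     # rank
    (1.7661704, 3804, 2.0),    # between
    (2.1106963, 488, 1.0),     # here
    (2.4275656, 8, 3.0),       # zipf's
]

score_546 = hit_score(doc_546_terms, field_norm=0.078125, query_terms=25)
print(round(score_546, 4))
```

The result agrees with the 0.22427887 shown for that hit up to floating-point rounding. The same helper can be applied to the other hits by substituting their term lists and fieldNorm values.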