Document (#21064)

Author
Sun, Q.
Shaw, D.
Davis, C.H.
Title
¬A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts
Source
Journal of the American Society for Information Science. 50(1999) no.3, p.280-286
Year
1999
Abstract
A simpler model is proposed for estimating the frequency of any same-frequency words and identifying the boundary point between high-frequency words and low-frequency words in a text. The model, based on a 'maximum-ranking method', assigns ranks to the words and estimates word frequency by a formula. The boundary value between high-frequency and low-frequency words is obtained by taking the square root of the number of different words in the text. This straightforward model was used successfully with both English and Chinese texts.
Theme
Informetrics
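
The boundary rule stated in the abstract (square root of the number of different words) can be illustrated with a minimal sketch. The tokenizer, the function names, and the choice to compare the boundary against raw word frequency are assumptions made for illustration; the abstract does not give the estimation formula of the 'maximum-ranking method', so that part is not reproduced here.

```python
import math
import re
from collections import Counter


def boundary_value(text):
    """Return (boundary, counts), where boundary = sqrt(number of different words).

    Assumption: words are lowercase runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return math.sqrt(len(counts)), counts


def split_by_boundary(text):
    """Split the vocabulary into high- and low-frequency words.

    Assumption: a word counts as high-frequency when its frequency
    exceeds the boundary value; the abstract does not spell this out.
    """
    boundary, counts = boundary_value(text)
    high = {word for word, freq in counts.items() if freq > boundary}
    low = set(counts) - high
    return boundary, high, low
```

For a text with nine different words the boundary is 3, so under the assumption above only words occurring more than three times would fall on the high-frequency side.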

Similar documents (author)

  1. Davis, C.H.; Shaw, D.: Comparison of retrieval system interfaces using an objective measure of screen design effectiveness (1989) 5.55
    5.5467997 = sum of:
      5.5467997 = sum of:
        2.5951765 = weight(author_txt:davis in 3393) [ClassicSimilarity], result of:
          2.5951765 = score(doc=3393,freq=1.0), product of:
            0.67616916 = queryWeight, product of:
              7.676116 = idf(docFreq=55, maxDocs=44421)
              0.08808741 = queryNorm
            3.838058 = fieldWeight in 3393, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.676116 = idf(docFreq=55, maxDocs=44421)
              0.5 = fieldNorm(doc=3393)
        2.951623 = weight(author_txt:shaw in 3393) [ClassicSimilarity], result of:
          2.951623 = score(doc=3393,freq=1.0), product of:
            0.73674643 = queryWeight, product of:
              1.0438337 = boost
              8.0125885 = idf(docFreq=39, maxDocs=44421)
              0.08808741 = queryNorm
            4.0062943 = fieldWeight in 3393, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.0125885 = idf(docFreq=39, maxDocs=44421)
              0.5 = fieldNorm(doc=3393)
    
  2. Shaw, R.R.: Classification systems (1962/63) 1.84
    1.8447644 = sum of:
      1.8447644 = product of:
        3.6895287 = sum of:
          3.6895287 = weight(author_txt:shaw in 602) [ClassicSimilarity], result of:
            3.6895287 = score(doc=602,freq=1.0), product of:
              0.73674643 = queryWeight, product of:
                1.0438337 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.08808741 = queryNorm
              5.007868 = fieldWeight in 602, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=602)
        0.5 = coord(1/2)
    
  3. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.84
    1.8447644 = sum of:
      1.8447644 = product of:
        3.6895287 = sum of:
          3.6895287 = weight(author_txt:shaw in 4840) [ClassicSimilarity], result of:
            3.6895287 = score(doc=4840,freq=1.0), product of:
              0.73674643 = queryWeight, product of:
                1.0438337 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.08808741 = queryNorm
              5.007868 = fieldWeight in 4840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=4840)
        0.5 = coord(1/2)
    
  4. Shaw, W.M.: Subject and citation indexing : pt.2: the optimal, cluster-based retrieval performance of composite representations (1991) 1.84
    1.8447644 = sum of:
      1.8447644 = product of:
        3.6895287 = sum of:
          3.6895287 = weight(author_txt:shaw in 4841) [ClassicSimilarity], result of:
            3.6895287 = score(doc=4841,freq=1.0), product of:
              0.73674643 = queryWeight, product of:
                1.0438337 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.08808741 = queryNorm
              5.007868 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=4841)
        0.5 = coord(1/2)
    
  5. Shaw, S.: ¬The Internet as an entertainment system (1994) 1.84
    1.8447644 = sum of:
      1.8447644 = product of:
        3.6895287 = sum of:
          3.6895287 = weight(author_txt:shaw in 266) [ClassicSimilarity], result of:
            3.6895287 = score(doc=266,freq=1.0), product of:
              0.73674643 = queryWeight, product of:
                1.0438337 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.08808741 = queryNorm
              5.007868 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=266)
        0.5 = coord(1/2)
    

Similar documents (content)

  1. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.27
    0.2656188 = sum of:
      0.2656188 = product of:
        1.106745 = sum of:
          0.05215783 = weight(abstract_txt:chinese in 206) [ClassicSimilarity], result of:
            0.05215783 = score(doc=206,freq=3.0), product of:
              0.076493 = queryWeight, product of:
                1.1186455 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.010856056 = queryNorm
              0.6818641 = fieldWeight in 206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.05339789 = weight(abstract_txt:formula in 206) [ClassicSimilarity], result of:
            0.05339789 = score(doc=206,freq=1.0), product of:
              0.11206376 = queryWeight, product of:
                1.3539861 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.010856056 = queryNorm
              0.47649562 = fieldWeight in 206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.015901787 = weight(abstract_txt:text in 206) [ClassicSimilarity], result of:
            0.015901787 = score(doc=206,freq=1.0), product of:
              0.06296363 = queryWeight, product of:
                1.435296 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.010856056 = queryNorm
              0.25255513 = fieldWeight in 206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.030451385 = weight(abstract_txt:model in 206) [ClassicSimilarity], result of:
            0.030451385 = score(doc=206,freq=1.0), product of:
              0.12233212 = queryWeight, product of:
                2.8293188 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.010856056 = queryNorm
              0.24892388 = fieldWeight in 206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.44430304 = weight(abstract_txt:words in 206) [ClassicSimilarity], result of:
            0.44430304 = score(doc=206,freq=9.0), product of:
              0.4424367 = queryWeight, product of:
                7.6094313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.010856056 = queryNorm
              1.0042183 = fieldWeight in 206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.51053303 = weight(abstract_txt:frequency in 206) [ClassicSimilarity], result of:
            0.51053303 = score(doc=206,freq=5.0), product of:
              0.61407655 = queryWeight, product of:
                9.508547 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.010856056 = queryNorm
              0.83138335 = fieldWeight in 206, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
        0.24 = coord(6/25)
    
  2. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.22
    0.22273955 = sum of:
      0.22273955 = product of:
        0.92808145 = sum of:
          0.0496014 = weight(abstract_txt:straightforward in 45) [ClassicSimilarity], result of:
            0.0496014 = score(doc=45,freq=1.0), product of:
              0.106687054 = queryWeight, product of:
                1.3211055 = boost
                7.438788 = idf(docFreq=70, maxDocs=44421)
                0.010856056 = queryNorm
              0.46492425 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.438788 = idf(docFreq=70, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.0795067 = weight(abstract_txt:assigns in 45) [ClassicSimilarity], result of:
            0.0795067 = score(doc=45,freq=1.0), product of:
              0.1461229 = queryWeight, product of:
                1.5461113 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.010856056 = queryNorm
              0.54410845 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.061070886 = weight(abstract_txt:texts in 45) [ClassicSimilarity], result of:
            0.061070886 = score(doc=45,freq=2.0), product of:
              0.12255713 = queryWeight, product of:
                2.0024695 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.010856056 = queryNorm
              0.49830544 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.041248653 = weight(abstract_txt:high in 45) [ClassicSimilarity], result of:
            0.041248653 = score(doc=45,freq=1.0), product of:
              0.1360701 = queryWeight, product of:
                2.5841851 = boost
                4.8502827 = idf(docFreq=944, maxDocs=44421)
                0.010856056 = queryNorm
              0.30314267 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8502827 = idf(docFreq=944, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.46833652 = weight(abstract_txt:words in 45) [ClassicSimilarity], result of:
            0.46833652 = score(doc=45,freq=10.0), product of:
              0.4424367 = queryWeight, product of:
                7.6094313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.010856056 = queryNorm
              1.058539 = fieldWeight in 45, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.2283173 = weight(abstract_txt:frequency in 45) [ClassicSimilarity], result of:
            0.2283173 = score(doc=45,freq=1.0), product of:
              0.61407655 = queryWeight, product of:
                9.508547 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.010856056 = queryNorm
              0.37180594 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
        0.24 = coord(6/25)
    
  3. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.21
    0.20734014 = sum of:
      0.20734014 = product of:
        0.86391723 = sum of:
          0.060226675 = weight(abstract_txt:chinese in 734) [ClassicSimilarity], result of:
            0.060226675 = score(doc=734,freq=4.0), product of:
              0.076493 = queryWeight, product of:
                1.1186455 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.010856056 = queryNorm
              0.7873488 = fieldWeight in 734, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
          0.05339789 = weight(abstract_txt:formula in 734) [ClassicSimilarity], result of:
            0.05339789 = score(doc=734,freq=1.0), product of:
              0.11206376 = queryWeight, product of:
                1.3539861 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.010856056 = queryNorm
              0.47649562 = fieldWeight in 734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
          0.025877167 = weight(abstract_txt:between in 734) [ClassicSimilarity], result of:
            0.025877167 = score(doc=734,freq=3.0), product of:
              0.069139615 = queryWeight, product of:
                1.8420683 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.010856056 = queryNorm
              0.3742741 = fieldWeight in 734, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
          0.058334406 = weight(abstract_txt:high in 734) [ClassicSimilarity], result of:
            0.058334406 = score(doc=734,freq=2.0), product of:
              0.1360701 = queryWeight, product of:
                2.5841851 = boost
                4.8502827 = idf(docFreq=944, maxDocs=44421)
                0.010856056 = queryNorm
              0.42870846 = fieldWeight in 734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8502827 = idf(docFreq=944, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
          0.20944646 = weight(abstract_txt:words in 734) [ClassicSimilarity], result of:
            0.20944646 = score(doc=734,freq=2.0), product of:
              0.4424367 = queryWeight, product of:
                7.6094313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.010856056 = queryNorm
              0.47339305 = fieldWeight in 734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
          0.4566346 = weight(abstract_txt:frequency in 734) [ClassicSimilarity], result of:
            0.4566346 = score(doc=734,freq=4.0), product of:
              0.61407655 = queryWeight, product of:
                9.508547 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.010856056 = queryNorm
              0.7436119 = fieldWeight in 734, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=734)
        0.24 = coord(6/25)
    
  4. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.20
    0.20240603 = sum of:
      0.20240603 = product of:
        1.0120301 = sum of:
          0.07804709 = weight(abstract_txt:ranks in 3417) [ClassicSimilarity], result of:
            0.07804709 = score(doc=3417,freq=1.0), product of:
              0.11014363 = queryWeight, product of:
                1.3423363 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.010856056 = queryNorm
              0.7085937 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.08009683 = weight(abstract_txt:formula in 3417) [ClassicSimilarity], result of:
            0.08009683 = score(doc=3417,freq=1.0), product of:
              0.11206376 = queryWeight, product of:
                1.3539861 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.010856056 = queryNorm
              0.71474344 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.03854893 = weight(abstract_txt:same in 3417) [ClassicSimilarity], result of:
            0.03854893 = score(doc=3417,freq=1.0), product of:
              0.086710796 = queryWeight, product of:
                1.6843534 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.010856056 = queryNorm
              0.444569 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.22215152 = weight(abstract_txt:words in 3417) [ClassicSimilarity], result of:
            0.22215152 = score(doc=3417,freq=1.0), product of:
              0.4424367 = queryWeight, product of:
                7.6094313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.010856056 = queryNorm
              0.50210917 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.5931858 = weight(abstract_txt:frequency in 3417) [ClassicSimilarity], result of:
            0.5931858 = score(doc=3417,freq=3.0), product of:
              0.61407655 = queryWeight, product of:
                9.508547 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.010856056 = queryNorm
              0.9659802 = fieldWeight in 3417, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
        0.2 = coord(5/25)
    
  5. Ferrer-i-Cancho, R.; Vitevitch, M.S.: ¬The origins of Zipf's meaning-frequency law (2018) 0.19
    0.18985686 = sum of:
      0.18985686 = product of:
        0.94928426 = sum of:
          0.076077834 = weight(abstract_txt:root in 546) [ClassicSimilarity], result of:
            0.076077834 = score(doc=546,freq=1.0), product of:
              0.12227787 = queryWeight, product of:
                1.4143456 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.010856056 = queryNorm
              0.6221717 = fieldWeight in 546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.09066611 = weight(abstract_txt:square in 546) [ClassicSimilarity], result of:
            0.09066611 = score(doc=546,freq=1.0), product of:
              0.1374482 = queryWeight, product of:
                1.4995162 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.010856056 = queryNorm
              0.65963835 = fieldWeight in 546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.026410775 = weight(abstract_txt:between in 546) [ClassicSimilarity], result of:
            0.026410775 = score(doc=546,freq=2.0), product of:
              0.069139615 = queryWeight, product of:
                1.8420683 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.010856056 = queryNorm
              0.38199192 = fieldWeight in 546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.26180807 = weight(abstract_txt:words in 546) [ClassicSimilarity], result of:
            0.26180807 = score(doc=546,freq=2.0), product of:
              0.4424367 = queryWeight, product of:
                7.6094313 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.010856056 = queryNorm
              0.5917413 = fieldWeight in 546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
          0.49432147 = weight(abstract_txt:frequency in 546) [ClassicSimilarity], result of:
            0.49432147 = score(doc=546,freq=3.0), product of:
              0.61407655 = queryWeight, product of:
                9.508547 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.010856056 = queryNorm
              0.80498344 = fieldWeight in 546, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.078125 = fieldNorm(doc=546)
        0.2 = coord(5/25)
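
The score trees above have the shape of Lucene ClassicSimilarity explanations: each term score is queryWeight times fieldWeight, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The following is a minimal sketch that recomputes a term score from the values shown in a tree. The function names are hypothetical, and queryNorm, fieldNorm, and boost are copied from the listing rather than derived, since the full query is not shown on this page.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def coord(matched, total):
    """Coordination factor: fraction of query clauses that matched."""
    return matched / total


# Values taken from the first author-similarity tree (author_txt:davis in 3393).
davis = term_score(freq=1.0, doc_freq=55, max_docs=44421,
                   query_norm=0.08808741, field_norm=0.5)
print(davis)
```

The printed value should land close to the 2.5951765 shown in the tree; small differences in the last digits are expected because Lucene computes in single precision. Multiplying a summed term score by coord(1, 2) corresponds to the final 0.5 factor in entries 2 to 5 of the author list.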