Document (#32272)

Author
Egghe, L.
Title
Untangling Herdan's law and Heaps' law : mathematical and informetric arguments
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.5, pp. 702-709
Year
2007
Abstract
Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary sizes are concave, increasing power laws of text sizes. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A**phi, where phi is a constant, phi < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
Theme
Informetrics
Object
Herdan's law
Heaps' law
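
The relation T ~ A**phi from the abstract can be explored numerically. The following is a minimal sketch, not the author's method: it counts distinct sources T (word types) after A items (tokens) and estimates a local exponent as the slope of log T against log A between two points. The function names and the toy text are hypothetical, and whitespace tokenization is an assumption.

```python
import math

def vocabulary_growth(tokens):
    """Return [(A, T), ...]: after A items (tokens), T distinct sources (types) were seen."""
    seen = set()
    curve = []
    for count, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((count, len(seen)))
    return curve

def local_exponent(point_a, point_b):
    """Slope of log T against log A between two points of the growth curve."""
    (a1, t1), (a2, t2) = point_a, point_b
    return math.log(t2 / t1) / math.log(a2 / a1)

# Hypothetical toy text; a real study would use a long tokenized corpus.
tokens = "the law of the text is the law of the words in the text".split()
curve = vocabulary_growth(tokens)
phi_estimate = local_exponent(curve[3], curve[-1])
print(curve[-1], phi_estimate)
```

On a sample this small the estimate says little; the abstract's claim that phi is close to 1 concerns large samples, and it notes that phi is not constant for smaller ones.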

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 6882) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 6882, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=6882)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 7118) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 7118, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=7118)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 1978) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 1978, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=1978)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 4462) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 4462, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=4462)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 5136) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 5136, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=5136)
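
The author scores above all follow the same arithmetic: fieldWeight = tf * idf * fieldNorm. As a rough sketch, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the variant that reproduces the figures in this listing, the calculation can be redone as follows. The function names are hypothetical and are not Lucene API calls.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf variant: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    # Term frequency is dampened by a square root.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing: author_txt:egghe, docFreq=60, maxDocs=44421.
author_score = field_weight(freq=1.0, doc_freq=60, max_docs=44421, field_norm=0.625)
print(author_score)
```

The result should agree with the listed 4.744121 up to rounding, since the listing reports single-precision values. All five entries share the same score because freq, docFreq, and fieldNorm are identical for each of them.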
    

Similar documents (content)

  1. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.22
    0.22057974 = sum of:
      0.22057974 = product of:
        0.9190823 = sum of:
          0.087491386 = weight(abstract_txt:lotkaian in 4466) [ClassicSimilarity], result of:
            0.087491386 = score(doc=4466,freq=1.0), product of:
              0.15046501 = queryWeight, product of:
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.01617282 = queryNorm
              0.5814733 = fieldWeight in 4466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
          0.049254857 = weight(abstract_txt:increasing in 4466) [ClassicSimilarity], result of:
            0.049254857 = score(doc=4466,freq=1.0), product of:
              0.14795573 = queryWeight, product of:
                1.7175475 = boost
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.01617282 = queryNorm
              0.33290267 = fieldWeight in 4466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
          0.014226132 = weight(abstract_txt:that in 4466) [ClassicSimilarity], result of:
            0.014226132 = score(doc=4466,freq=2.0), product of:
              0.06805696 = queryWeight, product of:
                1.7793752 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01617282 = queryNorm
              0.20903271 = fieldWeight in 4466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
          0.16518892 = weight(abstract_txt:argument in 4466) [ClassicSimilarity], result of:
            0.16518892 = score(doc=4466,freq=3.0), product of:
              0.22985077 = queryWeight, product of:
                2.1407495 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01617282 = queryNorm
              0.718679 = fieldWeight in 4466, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
          0.23699336 = weight(abstract_txt:laws in 4466) [ClassicSimilarity], result of:
            0.23699336 = score(doc=4466,freq=4.0), product of:
              0.26564595 = queryWeight, product of:
                2.3014123 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.01617282 = queryNorm
              0.8921399 = fieldWeight in 4466, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
          0.36592767 = weight(abstract_txt:informetric in 4466) [ClassicSimilarity], result of:
            0.36592767 = score(doc=4466,freq=2.0), product of:
              0.53011346 = queryWeight, product of:
                4.1971226 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.01617282 = queryNorm
              0.6902818 = fieldWeight in 4466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.0625 = fieldNorm(doc=4466)
        0.24 = coord(6/25)
    
  2. Ye, F.Y.: A theoretical approach to the unification of informetric models by wave-heat equations (2011) 0.19
    0.1874261 = sum of:
      0.1874261 = product of:
        1.1714132 = sum of:
          0.01760394 = weight(abstract_txt:that in 464) [ClassicSimilarity], result of:
            0.01760394 = score(doc=464,freq=1.0), product of:
              0.06805696 = queryWeight, product of:
                1.7793752 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01617282 = queryNorm
              0.2586648 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=464)
          0.09869751 = weight(abstract_txt:function in 464) [ClassicSimilarity], result of:
            0.09869751 = score(doc=464,freq=1.0), product of:
              0.16193642 = queryWeight, product of:
                1.7968636 = boost
                5.5724173 = idf(docFreq=458, maxDocs=44421)
                0.01617282 = queryNorm
              0.6094831 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5724173 = idf(docFreq=458, maxDocs=44421)
                0.109375 = fieldNorm(doc=464)
          0.41473836 = weight(abstract_txt:laws in 464) [ClassicSimilarity], result of:
            0.41473836 = score(doc=464,freq=4.0), product of:
              0.26564595 = queryWeight, product of:
                2.3014123 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.01617282 = queryNorm
              1.5612448 = fieldWeight in 464, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.109375 = fieldNorm(doc=464)
          0.6403734 = weight(abstract_txt:informetric in 464) [ClassicSimilarity], result of:
            0.6403734 = score(doc=464,freq=2.0), product of:
              0.53011346 = queryWeight, product of:
                4.1971226 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.01617282 = queryNorm
              1.2079931 = fieldWeight in 464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.109375 = fieldNorm(doc=464)
        0.16 = coord(4/25)
    
  3. Burrell, Q.L.: "Ambiguity" and scientometric measurement : a dissenting view (2001) 0.13
    0.12741336 = sum of:
      0.12741336 = product of:
        0.79633355 = sum of:
          0.025148485 = weight(abstract_txt:that in 981) [ClassicSimilarity], result of:
            0.025148485 = score(doc=981,freq=4.0), product of:
              0.06805696 = queryWeight, product of:
                1.7793752 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01617282 = queryNorm
              0.3695211 = fieldWeight in 981, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=981)
          0.10430097 = weight(abstract_txt:mathematical in 981) [ClassicSimilarity], result of:
            0.10430097 = score(doc=981,freq=1.0), product of:
              0.21025741 = queryWeight, product of:
                2.0474746 = boost
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.01617282 = queryNorm
              0.49606323 = fieldWeight in 981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.078125 = fieldNorm(doc=981)
          0.2094745 = weight(abstract_txt:laws in 981) [ClassicSimilarity], result of:
            0.2094745 = score(doc=981,freq=2.0), product of:
              0.26564595 = queryWeight, product of:
                2.3014123 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.01617282 = queryNorm
              0.7885477 = fieldWeight in 981, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.078125 = fieldNorm(doc=981)
          0.4574096 = weight(abstract_txt:informetric in 981) [ClassicSimilarity], result of:
            0.4574096 = score(doc=981,freq=2.0), product of:
              0.53011346 = queryWeight, product of:
                4.1971226 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.01617282 = queryNorm
              0.8628523 = fieldWeight in 981, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.078125 = fieldNorm(doc=981)
        0.16 = coord(4/25)
    
  4. Egghe, L.; Rousseau, R.: The Hirsch index of a shifted Lotka function and its relation with the impact factor (2012) 0.11
    0.11401399 = sum of:
      0.11401399 = product of:
        0.47505832 = sum of:
          0.034777787 = weight(abstract_txt:sources in 1243) [ClassicSimilarity], result of:
            0.034777787 = score(doc=1243,freq=1.0), product of:
              0.07821243 = queryWeight, product of:
                1.0196124 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.01617282 = queryNorm
              0.44465804 = fieldWeight in 1243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
          0.06036343 = weight(abstract_txt:total in 1243) [ClassicSimilarity], result of:
            0.06036343 = score(doc=1243,freq=1.0), product of:
              0.11295976 = queryWeight, product of:
                1.225347 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.01617282 = queryNorm
              0.53437996 = fieldWeight in 1243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
          0.10448533 = weight(abstract_txt:increasing in 1243) [ClassicSimilarity], result of:
            0.10448533 = score(doc=1243,freq=2.0), product of:
              0.14795573 = queryWeight, product of:
                1.7175475 = boost
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.01617282 = queryNorm
              0.7061932 = fieldWeight in 1243, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
          0.021339199 = weight(abstract_txt:that in 1243) [ClassicSimilarity], result of:
            0.021339199 = score(doc=1243,freq=2.0), product of:
              0.06805696 = queryWeight, product of:
                1.7793752 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01617282 = queryNorm
              0.31354907 = fieldWeight in 1243, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
          0.16919573 = weight(abstract_txt:function in 1243) [ClassicSimilarity], result of:
            0.16919573 = score(doc=1243,freq=4.0), product of:
              0.16193642 = queryWeight, product of:
                1.7968636 = boost
                5.5724173 = idf(docFreq=458, maxDocs=44421)
                0.01617282 = queryNorm
              1.0448282 = fieldWeight in 1243, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5724173 = idf(docFreq=458, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
          0.08489687 = weight(abstract_txt:items in 1243) [ClassicSimilarity], result of:
            0.08489687 = score(doc=1243,freq=1.0), product of:
              0.16231775 = queryWeight, product of:
                1.7989781 = boost
                5.5789747 = idf(docFreq=455, maxDocs=44421)
                0.01617282 = queryNorm
              0.52302885 = fieldWeight in 1243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5789747 = idf(docFreq=455, maxDocs=44421)
                0.09375 = fieldNorm(doc=1243)
        0.24 = coord(6/25)
    
  5. Garfield, E.; Paris, S.W.; Stock, W.G.: HistCite(TM) : a software tool for informetric analysis of citation linkage (2006) 0.11
    0.11313984 = sum of:
      0.11313984 = product of:
        0.707124 = sum of:
          0.034777787 = weight(abstract_txt:sources in 204) [ClassicSimilarity], result of:
            0.034777787 = score(doc=204,freq=1.0), product of:
              0.07821243 = queryWeight, product of:
                1.0196124 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.01617282 = queryNorm
              0.44465804 = fieldWeight in 204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.09375 = fieldNorm(doc=204)
          0.015089092 = weight(abstract_txt:that in 204) [ClassicSimilarity], result of:
            0.015089092 = score(doc=204,freq=1.0), product of:
              0.06805696 = queryWeight, product of:
                1.7793752 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01617282 = queryNorm
              0.22171268 = fieldWeight in 204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=204)
          0.10836561 = weight(abstract_txt:shows in 204) [ClassicSimilarity], result of:
            0.10836561 = score(doc=204,freq=1.0), product of:
              0.22645512 = queryWeight, product of:
                2.7432053 = boost
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.01617282 = queryNorm
              0.47853017 = fieldWeight in 204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.09375 = fieldNorm(doc=204)
          0.5488915 = weight(abstract_txt:informetric in 204) [ClassicSimilarity], result of:
            0.5488915 = score(doc=204,freq=2.0), product of:
              0.53011346 = queryWeight, product of:
                4.1971226 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.01617282 = queryNorm
              1.0354227 = fieldWeight in 204, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.09375 = fieldNorm(doc=204)
        0.16 = coord(4/25)
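
The content scores above combine per-term scores with a coordination factor: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord (matched terms / query terms). The sketch below recomputes entry 2 (doc=464) from the values in the listing. It assumes the same tf and idf formulas as before (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the helper names are hypothetical, not Lucene API calls.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.01617282   # queryNorm from the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf variant that reproduces the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, query_term_count):
    # coord rewards documents that match more of the query terms.
    coord = len(terms) / query_term_count
    return coord * sum(
        term_score(freq, doc_freq, field_norm, boost)
        for freq, doc_freq, boost in terms
    )

# (freq, docFreq, boost) for the four matching terms of doc=464.
terms_464 = [
    (1.0, 11344, 1.7793752),  # that
    (1.0, 458, 1.7968636),    # function
    (4.0, 95, 2.3014123),     # laws
    (2.0, 48, 4.1971226),     # informetric
]
score_464 = document_score(terms_464, field_norm=0.109375, query_term_count=25)
print(score_464)
```

The result should agree with the listed 0.1874261 up to single-precision rounding. The low coord values (4/25 or 6/25) explain why the content scores are much smaller than the author scores, even where individual terms such as "informetric" weigh heavily.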