Document (#34763)

Author
Ferrer-i-Cancho, R.
Gavaldà, R.
Title
¬The frequency spectrum of finite samples from the intermittent silence process
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp.837-843
Year
2009
Abstract
It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words in a text generated by an intermittent silence process. We derive and explain how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size.
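The record does not reproduce the paper's formulas. The sketch below is only a rough illustration and rests on assumptions that are not taken from the paper. It uses one common variant of the process: every word starts with one of `n_letters` equiprobable letters, and after each letter a space ends the word with probability `p_space`, so empty words cannot occur. Under that assumption the T words are independent draws, and a particular word w of length l has probability pi_w = p_space (1 - p_space)^(l-1) / n_letters^l. The standard expectations for independent sampling then apply: E[V(T)] = sum_w [1 - (1 - pi_w)^T] and E[V(f,T)] = sum_w C(T,f) pi_w^f (1 - pi_w)^(T-f). All function and parameter names are hypothetical. This is a direct summation over word lengths, not the authors' derivation or their efficient method. A small simulator is included for sanity checks.

```python
import math
import random
from collections import Counter


def _length_classes(T, p_space, tol):
    """Yield (length, q), where q = P(word has `length` letters) = p_space * (1 - p_space)**(length - 1).

    Stops once T * (1 - p_space)**length < tol. That quantity bounds the total
    contribution of all longer words to both expectations below.
    """
    length, q = 1, p_space
    while True:
        yield length, q
        if T * (1.0 - p_space) ** length < tol:
            return
        length += 1
        q *= 1.0 - p_space


def expected_vocabulary(T, n_letters, p_space, tol=1e-12):
    """E[V(T)] = sum over words w of 1 - (1 - pi_w)**T, grouped by word length."""
    assert T >= 1 and n_letters >= 2 and 0.0 < p_space < 1.0
    total = 0.0
    for length, q in _length_classes(T, p_space, tol):
        # each of the n_letters**length words of this length has probability pi = q / n_letters**length
        log_pi = math.log(q) - length * math.log(n_letters)
        if log_pi + math.log(T) < -30.0:
            # T*pi is tiny, so 1 - (1 - pi)**T ~= T*pi and the whole length class adds ~T*q
            total += T * q
        else:
            pi = math.exp(log_pi)
            p_seen = -math.expm1(T * math.log1p(-pi))  # 1 - (1 - pi)**T without cancellation
            total += (q / pi) * p_seen                 # q / pi == n_letters**length
    return total


def expected_spectrum(f, T, n_letters, p_space, tol=1e-12):
    """E[V(f, T)] = sum over words w of C(T, f) * pi_w**f * (1 - pi_w)**(T - f)."""
    assert 1 <= f <= T and n_letters >= 2 and 0.0 < p_space < 1.0
    log_binom = math.lgamma(T + 1) - math.lgamma(f + 1) - math.lgamma(T - f + 1)
    total = 0.0
    for length, q in _length_classes(T, p_space, tol):
        log_pi = math.log(q) - length * math.log(n_letters)
        # n_letters**length * C(T,f) * pi**f * (1-pi)**(T-f) == q * C(T,f) * pi**(f-1) * (1-pi)**(T-f)
        total += math.exp(math.log(q) + log_binom + (f - 1) * log_pi
                          + (T - f) * math.log1p(-math.exp(log_pi)))
    return total


def simulate(T, n_letters, p_space, rng):
    """Generate T words under the same assumptions; return (vocabulary size, spectrum)."""
    assert 2 <= n_letters <= 26 and 0.0 < p_space < 1.0
    alphabet = "abcdefghijklmnopqrstuvwxyz"[:n_letters]
    counts = Counter()
    for _ in range(T):
        word = rng.choice(alphabet)        # first letter is forced: empty words are excluded
        while rng.random() >= p_space:     # add another letter with probability 1 - p_space
            word += rng.choice(alphabet)
        counts[word] += 1
    # spectrum maps a frequency f to the number of distinct words seen exactly f times
    return len(counts), Counter(counts.values())


if __name__ == "__main__":
    T, N, P = 500, 4, 0.25
    print("E[V(T)] =", expected_vocabulary(T, N, P))
    print("E[V(f,T)], f=1..3:", [expected_spectrum(f, T, N, P) for f in (1, 2, 3)])
    rng = random.Random(1)
    runs = [simulate(T, N, P, rng)[0] for _ in range(100)]
    print("Monte Carlo mean of V(T):", sum(runs) / len(runs))
```

The sums run over word lengths rather than over every possible string. Each length class holds n_letters^l words that all share the same probability. Log-space arithmetic and the `expm1`/`log1p` forms keep the terms finite when n_letters^l is huge and pi_w is tiny. Two identities check the expectations: summing E[V(f,T)] over f = 1..T should give E[V(T)], and summing f E[V(f,T)] should give T.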

Similar documents (author)

  1. Sapena, A. Ferrer- => Ferrer-Sapena, A.: 4.93
  2. Ferrer, N. Ferran- => Ferran-Ferrer, N.: 4.93
  3. Centelles, M.; Ferran-Ferrer, N.: Taxonomies and ontologies in Wikipedia and Wikidata : an in-depth examination of knowledge organization systems (This article examines Wikipedia's knowledge organization system (KOS) and the broader KOS of Wikidata. We study the structure, functions, and relationship of Wikipedia's KOS to concepts like taxonomies and folksonomies, highlighting its unique characteristics compared to social media. A significant aspect of our examination is the gender-related content classification in the Catalan edition of Wikipedia (Viquipèdia), which notably excludes female categories and non-binary gender classifications. We explore the potential implications of these restrictions on gender bias within the platform. Furthermore, we broaden our investigative methodology to assess the KOS of Wikidata. Wikidata is a dataset built on ontological principles, designed to enhance and enrich Wikipedia's digital, collaborative encyclopedia. The findings shed light on the presence or absence of gender bias and contribute to the ongoing discourse on promoting inclusivity and diversity in online knowledge sharing.) 4.07
  4. Miro, A.B.; Sahun, X.B.; Ferrer, M.E.: ¬La Library of Congress Classification à la Biblioteca de la Universitat Pompeu Fabra [The Library of Congress Classification at the Pompeu Fabra University Library] (1993) 3.49
  5. Ferrer Morillo, L.M.; Portillo de Hernández, R.: Tesauros transdisciplinarios : del reduccionismo científico a la unidad del conocimiento [Transdisciplinary thesauri: from scientific reductionism to the unity of knowledge] (2007) 3.49

Similar documents (content)

  1. Sun, Q.; Shaw, D.; Davis, C.H.: ¬A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts (1999) 0.11
  2. Ferrer-i-Cancho, R.; Vitevitch, M.S.: ¬The origins of Zipf's meaning-frequency law (2018) 0.09
  3. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.08
  4. Colina, J.: ¬Un algoritmo informetrico para la evaluacion de un vocabulario de busqueda [An informetric algorithm for evaluating a search vocabulary] (1995) 0.08
  5. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.08