Document (#31320)

Author
Egghe, L.
Title
Properties of the n-overlap vector and n-overlap similarity theory
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.9, pp. 1165-1177
Year
2006
Abstract
In the first part of this article the author defines the n-overlap vector, whose coordinates are the fractions of the objects (e.g., books, N-grams) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases). With the aid of Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results of the previous section can be applied, and how the Lorenz order of the n-overlap vector is respected when the level of refinement in the hierarchical system (e.g., the value N in N-grams) is increased or decreased.
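As a rough illustration of the central definition, the sketch below computes an n-overlap vector for a handful of sets. It assumes that coordinate i is the fraction of distinct objects belonging to exactly i of the n sets, which is one reading of the abstract; the function names `n_overlap_vector` and `jaccard` are hypothetical and not taken from the article. The generalized Jaccard index itself is not defined in this record, so only the classic two-set Jaccard index is shown.

```python
from collections import Counter


def n_overlap_vector(sets):
    """Return [x_1, ..., x_n], where x_i is the fraction of distinct objects
    that occur in exactly i of the n given sets (assumed interpretation)."""
    n = len(sets)
    membership = Counter()  # object -> number of sets containing it
    for s in sets:
        for obj in set(s):
            membership[obj] += 1
    total = len(membership)
    if total == 0:
        return [0.0] * n
    counts = Counter(membership.values())  # i -> number of objects in exactly i sets
    return [counts.get(i, 0) / total for i in range(1, n + 1)]


def jaccard(a, b):
    """Classic Jaccard index |A ∩ B| / |A ∪ B| for two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical holdings of three libraries (objects are book identifiers).
library_a = {1, 2, 3, 4}
library_b = {3, 4, 5}
library_c = {4, 6}
vector = n_overlap_vector([library_a, library_b, library_c])
```

Under this reading, for n = 2 the last coordinate of the vector equals the classic Jaccard index of the two sets, since both count the shared objects relative to the union.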

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 6882) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 6882, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=6882)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 7118) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 7118, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=7118)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 1978) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 1978, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=1978)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 4462) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 4462, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=4462)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.74
    4.744121 = sum of:
      4.744121 = weight(author_txt:egghe in 5136) [ClassicSimilarity], result of:
        4.744121 = fieldWeight in 5136, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.590594 = idf(docFreq=60, maxDocs=44421)
          0.625 = fieldNorm(doc=5136)
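
The author-match scores above all reduce to one product, tf × idf × fieldNorm. The sketch below reproduces that arithmetic, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred from the numbers shown rather than from any stated configuration, and the function names are illustrative, not Lucene API.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency (assumed)."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as laid out in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: freq=1.0, docFreq=60, maxDocs=44421, fieldNorm=0.625.
author_score = field_weight(1.0, 60, 44421, 0.625)
```

With the listed inputs this agrees with the displayed 4.744121 up to floating-point rounding, which explains why all five author matches share the same score: they differ only in document number.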
    

Similar documents (content)

  1. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.43
    0.42613485 = sum of:
      0.42613485 = product of:
        1.5219102 = sum of:
          0.057557818 = weight(abstract_txt:concentration in 980) [ClassicSimilarity], result of:
            0.057557818 = score(doc=980,freq=1.0), product of:
              0.09223206 = queryWeight, product of:
                1.0059983 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.011477633 = queryNorm
              0.6240543 = fieldWeight in 980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.10895987 = weight(abstract_txt:similarity in 980) [ClassicSimilarity], result of:
            0.10895987 = score(doc=980,freq=6.0), product of:
              0.097862504 = queryWeight, product of:
                1.4654784 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.011477633 = queryNorm
              1.1133975 = fieldWeight in 980, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.031489715 = weight(abstract_txt:theory in 980) [ClassicSimilarity], result of:
            0.031489715 = score(doc=980,freq=1.0), product of:
              0.08898127 = queryWeight, product of:
                1.7114577 = boost
                4.529811 = idf(docFreq=1301, maxDocs=44421)
                0.011477633 = queryNorm
              0.3538915 = fieldWeight in 980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.529811 = idf(docFreq=1301, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.27847654 = weight(abstract_txt:jaccard in 980) [ClassicSimilarity], result of:
            0.27847654 = score(doc=980,freq=3.0), product of:
              0.23048414 = queryWeight, product of:
                2.249012 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.011477633 = queryNorm
              1.2082243 = fieldWeight in 980, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.16810025 = weight(abstract_txt:lorenz in 980) [ClassicSimilarity], result of:
            0.16810025 = score(doc=980,freq=1.0), product of:
              0.23742948 = queryWeight, product of:
                2.2826462 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011477633 = queryNorm
              0.7080008 = fieldWeight in 980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.22129513 = weight(abstract_txt:vector in 980) [ClassicSimilarity], result of:
            0.22129513 = score(doc=980,freq=2.0), product of:
              0.3072135 = queryWeight, product of:
                4.105456 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011477633 = queryNorm
              0.7203301 = fieldWeight in 980, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.6560309 = weight(abstract_txt:overlap in 980) [ClassicSimilarity], result of:
            0.6560309 = score(doc=980,freq=3.0), product of:
              0.6977848 = queryWeight, product of:
                8.75018 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.011477633 = queryNorm
              0.94016224 = fieldWeight in 980, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
        0.28 = coord(7/25)
    
  2. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.13
    0.12913547 = sum of:
      0.12913547 = product of:
        0.8070967 = sum of:
          0.04448268 = weight(abstract_txt:similarity in 3708) [ClassicSimilarity], result of:
            0.04448268 = score(doc=3708,freq=1.0), product of:
              0.097862504 = queryWeight, product of:
                1.4654784 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.011477633 = queryNorm
              0.4545426 = fieldWeight in 3708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=3708)
          0.22737515 = weight(abstract_txt:jaccard in 3708) [ClassicSimilarity], result of:
            0.22737515 = score(doc=3708,freq=2.0), product of:
              0.23048414 = queryWeight, product of:
                2.249012 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.011477633 = queryNorm
              0.98651105 = fieldWeight in 3708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.078125 = fieldNorm(doc=3708)
          0.15647928 = weight(abstract_txt:vector in 3708) [ClassicSimilarity], result of:
            0.15647928 = score(doc=3708,freq=1.0), product of:
              0.3072135 = queryWeight, product of:
                4.105456 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011477633 = queryNorm
              0.5093503 = fieldWeight in 3708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=3708)
          0.37875962 = weight(abstract_txt:overlap in 3708) [ClassicSimilarity], result of:
            0.37875962 = score(doc=3708,freq=1.0), product of:
              0.6977848 = queryWeight, product of:
                8.75018 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.011477633 = queryNorm
              0.54280293 = fieldWeight in 3708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.078125 = fieldNorm(doc=3708)
        0.16 = coord(4/25)
    
  3. Hood, W.W.; Wilson, C.S.: The relationship of records in multiple databases to their usage or citedness (2005) 0.13
    0.12770183 = sum of:
      0.12770183 = product of:
        1.0641819 = sum of:
          0.101652965 = weight(abstract_txt:indexed in 4680) [ClassicSimilarity], result of:
            0.101652965 = score(doc=4680,freq=2.0), product of:
              0.10768133 = queryWeight, product of:
                1.5372392 = boost
                6.1030455 = idf(docFreq=269, maxDocs=44421)
                0.011477633 = queryNorm
              0.94401664 = fieldWeight in 4680, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1030455 = idf(docFreq=269, maxDocs=44421)
                0.109375 = fieldNorm(doc=4680)
          0.044085596 = weight(abstract_txt:theory in 4680) [ClassicSimilarity], result of:
            0.044085596 = score(doc=4680,freq=1.0), product of:
              0.08898127 = queryWeight, product of:
                1.7114577 = boost
                4.529811 = idf(docFreq=1301, maxDocs=44421)
                0.011477633 = queryNorm
              0.49544805 = fieldWeight in 4680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.529811 = idf(docFreq=1301, maxDocs=44421)
                0.109375 = fieldNorm(doc=4680)
          0.9184433 = weight(abstract_txt:overlap in 4680) [ClassicSimilarity], result of:
            0.9184433 = score(doc=4680,freq=3.0), product of:
              0.6977848 = queryWeight, product of:
                8.75018 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.011477633 = queryNorm
              1.3162272 = fieldWeight in 4680, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.109375 = fieldNorm(doc=4680)
        0.12 = coord(3/25)
    
  4. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.12
    0.119116694 = sum of:
      0.119116694 = product of:
        0.74447936 = sum of:
          0.027895635 = weight(abstract_txt:author in 214) [ClassicSimilarity], result of:
            0.027895635 = score(doc=214,freq=1.0), product of:
              0.07169902 = queryWeight, product of:
                1.2543764 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.011477633 = queryNorm
              0.38906577 = fieldWeight in 214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.078125 = fieldNorm(doc=214)
          0.07704625 = weight(abstract_txt:similarity in 214) [ClassicSimilarity], result of:
            0.07704625 = score(doc=214,freq=3.0), product of:
              0.097862504 = queryWeight, product of:
                1.4654784 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.011477633 = queryNorm
              0.7872909 = fieldWeight in 214, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=214)
          0.10389049 = weight(abstract_txt:section in 214) [ClassicSimilarity], result of:
            0.10389049 = score(doc=214,freq=2.0), product of:
              0.15651646 = queryWeight, product of:
                2.2698486 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.011477633 = queryNorm
              0.6637672 = fieldWeight in 214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.078125 = fieldNorm(doc=214)
          0.535647 = weight(abstract_txt:overlap in 214) [ClassicSimilarity], result of:
            0.535647 = score(doc=214,freq=2.0), product of:
              0.6977848 = queryWeight, product of:
                8.75018 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.011477633 = queryNorm
              0.7676392 = fieldWeight in 214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.078125 = fieldNorm(doc=214)
        0.16 = coord(4/25)
    
  5. Rorvig, M.: Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets (1999) 0.10
    0.10048868 = sum of:
      0.10048868 = product of:
        0.8374057 = sum of:
          0.088071205 = weight(abstract_txt:similarity in 4767) [ClassicSimilarity], result of:
            0.088071205 = score(doc=4767,freq=2.0), product of:
              0.097862504 = queryWeight, product of:
                1.4654784 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.011477633 = queryNorm
              0.8999484 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.109375 = fieldNorm(doc=4767)
          0.21907099 = weight(abstract_txt:vector in 4767) [ClassicSimilarity], result of:
            0.21907099 = score(doc=4767,freq=1.0), product of:
              0.3072135 = queryWeight, product of:
                4.105456 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011477633 = queryNorm
              0.7130904 = fieldWeight in 4767, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.109375 = fieldNorm(doc=4767)
          0.5302635 = weight(abstract_txt:overlap in 4767) [ClassicSimilarity], result of:
            0.5302635 = score(doc=4767,freq=1.0), product of:
              0.6977848 = queryWeight, product of:
                8.75018 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.011477633 = queryNorm
              0.7599241 = fieldWeight in 4767, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.109375 = fieldNorm(doc=4767)
        0.12 = coord(3/25)
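
The content-match scores combine several such terms: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matching terms over query terms). The sketch below recomputes the score of the last entry (doc 4767) from the values in its explain tree. It assumes the classic Lucene TF-IDF formulas, which are inferred from the listing; `term_score` and `document_score` are hypothetical helper names.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.011477633  # queryNorm from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed classic formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm):
    """One term's contribution: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum the term contributions and apply coord = matched / query_terms."""
    total = sum(term_score(b, f, df, field_norm) for b, f, df in terms)
    return total * (matched / query_terms)


# (boost, freq, docFreq) for the three terms matched in doc 4767.
doc_4767_terms = [
    (1.4654784, 2.0, 358),  # similarity
    (4.105456, 1.0, 177),   # vector
    (8.75018, 1.0, 115),    # overlap
]
score = document_score(doc_4767_terms, field_norm=0.109375, matched=3, query_terms=25)
```

The result agrees with the listed 0.10048868 up to floating-point rounding. The coord factor of 3/25 is what keeps this entry well below the first hit, which matches 7 of the 25 query terms.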