Document (#31320)

Author
Egghe, L.
Title
Properties of the n-overlap vector and n-overlap similarity theory
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.9, pp.1165-1177
Year
2006
Abstract
In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fractions of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
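
As an illustration of the definition in the abstract, the following minimal Python sketch computes an n-overlap vector for a small collection of sets. It assumes that "belonging to i sets" means belonging to exactly i of the n sets, and that the fractions are taken over the union of all sets. The function names `overlap_vector` and `jaccard` and the sample data are hypothetical and are not taken from the article. The sketch does not implement the generalized Jaccard index of the article; it only shows that, under the stated assumptions, the second coordinate for n = 2 coincides with the classical Jaccard index.

```python
from collections import Counter
from fractions import Fraction


def overlap_vector(sets):
    """Return [x_1, ..., x_n], where x_i is the fraction of objects in the
    union that occur in exactly i of the n given sets (assumed reading)."""
    n = len(sets)
    membership = Counter()
    for s in sets:
        # set(s) guards against duplicates inside one collection
        membership.update(set(s))
    total = len(membership)
    if total == 0:
        raise ValueError("at least one set must be non-empty")
    # counts[i] = number of objects that occur in exactly i sets
    counts = Counter(membership.values())
    return [Fraction(counts.get(i, 0), total) for i in range(1, n + 1)]


def jaccard(a, b):
    """Classical Jaccard index |A intersect B| / |A union B| of two sets."""
    a, b = set(a), set(b)
    return Fraction(len(a & b), len(a | b))


# Hypothetical holdings of two libraries (book identifiers).
library_a = {1, 2, 3, 4}
library_b = {3, 4, 5}
print(overlap_vector([library_a, library_b]))
print(jaccard(library_a, library_b))
```

With three or more sets the same function returns one coordinate per possible membership count, and the coordinates always sum to 1.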

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 6883, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=6883)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 7119) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 7119, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=7119)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 1910) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 1910, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=1910)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 4394) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 4394, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=4394)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 5068) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 5068, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=5068)
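
The five author-match explanations above all follow the same arithmetic. The sketch below reproduces it, assuming the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), with fieldWeight = tf * idf * fieldNorm. The helper names are illustrative only. Lucene computes in 32-bit floats, so the last digits may differ slightly from the values printed in the record.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm for a single term in one field."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the explanations above: author_txt:egghe occurs once,
# 60 of 44218 documents contain the term, and fieldNorm is 0.625.
author_idf = idf(doc_freq=60, max_docs=44218)
author_score = field_weight(freq=1.0, doc_freq=60, max_docs=44218, field_norm=0.625)
print(round(author_idf, 4), round(author_score, 4))
```

Because every listed document has the same term frequency, document frequency, and field norm for the author term, all five receive the same score.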
    

Similar documents (content)

  1. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.43
    0.4261522 = sum of:
      0.4261522 = product of:
        1.5219722 = sum of:
          0.058043722 = weight(abstract_txt:concentration in 3993) [ClassicSimilarity], result of:
            0.058043722 = score(doc=3993,freq=1.0), product of:
              0.09277709 = queryWeight, product of:
                1.0091134 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.011480908 = queryNorm
              0.6256256 = fieldWeight in 3993, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.10911045 = weight(abstract_txt:similarity in 3993) [ClassicSimilarity], result of:
            0.10911045 = score(doc=3993,freq=6.0), product of:
              0.09798081 = queryWeight, product of:
                1.4665779 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.011480908 = queryNorm
              1.11359 = fieldWeight in 3993, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.031647105 = weight(abstract_txt:theory in 3993) [ClassicSimilarity], result of:
            0.031647105 = score(doc=3993,freq=1.0), product of:
              0.08930319 = queryWeight, product of:
                1.714801 = boost
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.011480908 = queryNorm
              0.35437822 = fieldWeight in 3993, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.27828807 = weight(abstract_txt:jaccard in 3993) [ClassicSimilarity], result of:
            0.27828807 = score(doc=3993,freq=3.0), product of:
              0.23044637 = queryWeight, product of:
                2.2491558 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.011480908 = queryNorm
              1.2076045 = fieldWeight in 3993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.16799031 = weight(abstract_txt:lorenz in 3993) [ClassicSimilarity], result of:
            0.16799031 = score(doc=3993,freq=1.0), product of:
              0.23739415 = queryWeight, product of:
                2.2828093 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011480908 = queryNorm
              0.707643 = fieldWeight in 3993, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.22159345 = weight(abstract_txt:vector in 3993) [ClassicSimilarity], result of:
            0.22159345 = score(doc=3993,freq=2.0), product of:
              0.3075779 = queryWeight, product of:
                4.1084895 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.011480908 = queryNorm
              0.7204466 = fieldWeight in 3993, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
          0.6552991 = weight(abstract_txt:overlap in 3993) [ClassicSimilarity], result of:
            0.6552991 = score(doc=3993,freq=3.0), product of:
              0.69746625 = queryWeight, product of:
                8.749459 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.011480908 = queryNorm
              0.9395424 = fieldWeight in 3993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=3993)
        0.28 = coord(7/25)
    
  2. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.13
    0.12908684 = sum of:
      0.12908684 = product of:
        0.80679274 = sum of:
          0.044544153 = weight(abstract_txt:similarity in 2708) [ClassicSimilarity], result of:
            0.044544153 = score(doc=2708,freq=1.0), product of:
              0.09798081 = queryWeight, product of:
                1.4665779 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.011480908 = queryNorm
              0.4546212 = fieldWeight in 2708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=2708)
          0.22722128 = weight(abstract_txt:jaccard in 2708) [ClassicSimilarity], result of:
            0.22722128 = score(doc=2708,freq=2.0), product of:
              0.23044637 = queryWeight, product of:
                2.2491558 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.011480908 = queryNorm
              0.986005 = fieldWeight in 2708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=2708)
          0.15669023 = weight(abstract_txt:vector in 2708) [ClassicSimilarity], result of:
            0.15669023 = score(doc=2708,freq=1.0), product of:
              0.3075779 = queryWeight, product of:
                4.1084895 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.011480908 = queryNorm
              0.5094326 = fieldWeight in 2708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.078125 = fieldNorm(doc=2708)
          0.37833712 = weight(abstract_txt:overlap in 2708) [ClassicSimilarity], result of:
            0.37833712 = score(doc=2708,freq=1.0), product of:
              0.69746625 = queryWeight, product of:
                8.749459 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.011480908 = queryNorm
              0.54244506 = fieldWeight in 2708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=2708)
        0.16 = coord(4/25)
    
  3. Hood, W.W.; Wilson, C.S.: The relationship of records in multiple databases to their usage or citedness (2005) 0.13
    0.12761064 = sum of:
      0.12761064 = product of:
        1.063422 = sum of:
          0.10169718 = weight(abstract_txt:indexed in 3680) [ClassicSimilarity], result of:
            0.10169718 = score(doc=3680,freq=2.0), product of:
              0.10774351 = queryWeight, product of:
                1.5379074 = boost
                6.1021757 = idf(docFreq=268, maxDocs=44218)
                0.011480908 = queryNorm
              0.94388217 = fieldWeight in 3680, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1021757 = idf(docFreq=268, maxDocs=44218)
                0.109375 = fieldNorm(doc=3680)
          0.044305947 = weight(abstract_txt:theory in 3680) [ClassicSimilarity], result of:
            0.044305947 = score(doc=3680,freq=1.0), product of:
              0.08930319 = queryWeight, product of:
                1.714801 = boost
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.011480908 = queryNorm
              0.4961295 = fieldWeight in 3680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.109375 = fieldNorm(doc=3680)
          0.9174188 = weight(abstract_txt:overlap in 3680) [ClassicSimilarity], result of:
            0.9174188 = score(doc=3680,freq=3.0), product of:
              0.69746625 = queryWeight, product of:
                8.749459 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.011480908 = queryNorm
              1.3153594 = fieldWeight in 3680, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.109375 = fieldNorm(doc=3680)
        0.12 = coord(3/25)
    
  4. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.12
    0.11904047 = sum of:
      0.11904047 = product of:
        0.74400294 = sum of:
          0.027883258 = weight(abstract_txt:author in 4214) [ClassicSimilarity], result of:
            0.027883258 = score(doc=4214,freq=1.0), product of:
              0.07169841 = queryWeight, product of:
                1.254554 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011480908 = queryNorm
              0.38889644 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.07715273 = weight(abstract_txt:similarity in 4214) [ClassicSimilarity], result of:
            0.07715273 = score(doc=4214,freq=3.0), product of:
              0.09798081 = queryWeight, product of:
                1.4665779 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.011480908 = queryNorm
              0.78742695 = fieldWeight in 4214, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.10391746 = weight(abstract_txt:section in 4214) [ClassicSimilarity], result of:
            0.10391746 = score(doc=4214,freq=2.0), product of:
              0.15658854 = queryWeight, product of:
                2.2707026 = boost
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.011480908 = queryNorm
              0.6636339 = fieldWeight in 4214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.5350495 = weight(abstract_txt:overlap in 4214) [ClassicSimilarity], result of:
            0.5350495 = score(doc=4214,freq=2.0), product of:
              0.69746625 = queryWeight, product of:
                8.749459 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.011480908 = queryNorm
              0.7671332 = fieldWeight in 4214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
        0.16 = coord(4/25)
    
  5. Rorvig, M.: Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets (1999) 0.10
    0.10046774 = sum of:
      0.10046774 = product of:
        0.8372312 = sum of:
          0.088192925 = weight(abstract_txt:similarity in 3767) [ClassicSimilarity], result of:
            0.088192925 = score(doc=3767,freq=2.0), product of:
              0.09798081 = queryWeight, product of:
                1.4665779 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.011480908 = queryNorm
              0.90010405 = fieldWeight in 3767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.109375 = fieldNorm(doc=3767)
          0.21936631 = weight(abstract_txt:vector in 3767) [ClassicSimilarity], result of:
            0.21936631 = score(doc=3767,freq=1.0), product of:
              0.3075779 = queryWeight, product of:
                4.1084895 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.011480908 = queryNorm
              0.7132057 = fieldWeight in 3767, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.109375 = fieldNorm(doc=3767)
          0.52967197 = weight(abstract_txt:overlap in 3767) [ClassicSimilarity], result of:
            0.52967197 = score(doc=3767,freq=1.0), product of:
              0.69746625 = queryWeight, product of:
                8.749459 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.011480908 = queryNorm
              0.7594231 = fieldWeight in 3767, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.109375 = fieldNorm(doc=3767)
        0.12 = coord(3/25)
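
The content-match explanations combine per-term scores with a coordination factor. As a rough check, the sketch below recomputes the score of the last entry (doc=3767) under the assumption that each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is multiplied by coord = matched terms / query terms. Function names are illustrative, and the numeric inputs are copied from the explanation above. Small deviations in the last digits are expected because Lucene uses 32-bit floats.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, query_term_count, max_docs, query_norm, field_norm):
    """Sum of the term scores, scaled by coord = matched / query terms."""
    total = sum(
        term_score(boost, freq, doc_freq, max_docs, query_norm, field_norm)
        for boost, freq, doc_freq in terms
    )
    coord = len(terms) / query_term_count
    return total * coord


# (boost, freq, docFreq) for similarity, vector and overlap in doc 3767.
terms_3767 = [
    (1.4665779, 2.0, 356),
    (4.1084895, 1.0, 176),
    (8.749459, 1.0, 115),
]
score = document_score(
    terms_3767,
    query_term_count=25,
    max_docs=44218,
    query_norm=0.011480908,
    field_norm=0.109375,
)
print(round(score, 4))
```

The coordination factor explains why the first entry, which matches 7 of the 25 query terms, scores well above entries that match only 3 or 4.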