Document (#5817)

Author
Shaw, R.J.
Willett, P.
Title
On the non-random nature of nearest-neighbour document clusters
Source
Information processing and management. 29(1993) no.4, pp.449-452
Year
1993
Abstract
It has been suggested that the observed values of retrieval effectiveness obtained in searches of files of nearest-neighbour clusters can be explained by assuming that the pairwise inter-document similarities used to construct the clusters have been generated randomly. The actual similarities, however, are significantly different from those obtained by a random generation procedure.

Similar documents (author)

  1. Willett, P.: Recent trends in hierarchic document clustering : a critical review (1988) 1.78
    1.7817373 = sum of:
      1.7817373 = product of:
        3.5634747 = sum of:
          3.5634747 = weight(author_txt:willett in 2603) [ClassicSimilarity], result of:
            3.5634747 = score(doc=2603,freq=1.0), product of:
              0.70933396 = queryWeight, product of:
                1.0031596 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.087970644 = queryNorm
              5.023691 = fieldWeight in 2603, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.625 = fieldNorm(doc=2603)
        0.5 = coord(1/2)
    
  2. Willett, P.: Best-match text retrieval (1993) 1.78
    1.7817373 = sum of:
      1.7817373 = product of:
        3.5634747 = sum of:
          3.5634747 = weight(author_txt:willett in 7817) [ClassicSimilarity], result of:
            3.5634747 = score(doc=7817,freq=1.0), product of:
              0.70933396 = queryWeight, product of:
                1.0031596 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.087970644 = queryNorm
              5.023691 = fieldWeight in 7817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.625 = fieldNorm(doc=7817)
        0.5 = coord(1/2)
    
  3. Willett, P.: From chemical documentation to chemoinformatics : 50 years of chemical information science (2009) 1.78
    1.7817373 = sum of:
      1.7817373 = product of:
        3.5634747 = sum of:
          3.5634747 = weight(author_txt:willett in 643) [ClassicSimilarity], result of:
            3.5634747 = score(doc=643,freq=1.0), product of:
              0.70933396 = queryWeight, product of:
                1.0031596 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.087970644 = queryNorm
              5.023691 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.625 = fieldNorm(doc=643)
        0.5 = coord(1/2)
    
  4. Shaw, R.R.: Classification systems (1962/63) 1.76
    1.7649543 = sum of:
      1.7649543 = product of:
        3.5299087 = sum of:
          3.5299087 = weight(author_txt:shaw in 602) [ClassicSimilarity], result of:
            3.5299087 = score(doc=602,freq=1.0), product of:
              0.70487255 = queryWeight, product of:
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.087970644 = queryNorm
              5.007868 = fieldWeight in 602, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=602)
        0.5 = coord(1/2)
    
  5. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.76
    1.7649543 = sum of:
      1.7649543 = product of:
        3.5299087 = sum of:
          3.5299087 = weight(author_txt:shaw in 4840) [ClassicSimilarity], result of:
            3.5299087 = score(doc=4840,freq=1.0), product of:
              0.70487255 = queryWeight, product of:
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.087970644 = queryNorm
              5.007868 = fieldWeight in 4840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.625 = fieldNorm(doc=4840)
        0.5 = coord(1/2)
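
The author-match scores above can be reproduced by hand. The following is a minimal sketch, not the catalogue's own code: the function names are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed because it reproduces the idf values listed. boost, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces the idf(docFreq=..., maxDocs=...) lines above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 44421
QUERY_NORM = 0.087970644  # copied from the listing
FIELD_NORM = 0.625        # copied from the listing

# author_txt:willett (docFreq=38, boost=1.0031596), one of two clauses matched
willett_total = term_score(1.0, 38, MAX_DOCS, QUERY_NORM, FIELD_NORM,
                           boost=1.0031596) * (1 / 2)

# author_txt:shaw (docFreq=39, no boost line in the listing), coord(1/2)
shaw_total = term_score(1.0, 39, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

print(f"willett: {willett_total:.7f}")
print(f"shaw:    {shaw_total:.7f}")
```

The results agree with the listed totals (1.7817373 and 1.7649543) to within single-precision rounding. The coord(1/2) factor halves each score because only one of the two author terms matches in each hit.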
    

Similar documents (content)

  1. Sembok, T.M.T.; Rijsbergen, C.J. van: IMAGING: a relevant feedback retrieval with nearest neighbour clusters (1994) 0.29
    0.28846365 = sum of:
      0.28846365 = product of:
        1.8028979 = sum of:
          0.17794351 = weight(abstract_txt:obtained in 1139) [ClassicSimilarity], result of:
            0.17794351 = score(doc=1139,freq=1.0), product of:
              0.19794069 = queryWeight, product of:
                2.257944 = boost
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.015236839 = queryNorm
              0.8989739 = fieldWeight in 1139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.15625 = fieldNorm(doc=1139)
          0.505072 = weight(abstract_txt:nearest in 1139) [ClassicSimilarity], result of:
            0.505072 = score(doc=1139,freq=1.0), product of:
              0.3968099 = queryWeight, product of:
                3.1969576 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.015236839 = queryNorm
              1.2728311 = fieldWeight in 1139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.15625 = fieldNorm(doc=1139)
          0.73148894 = weight(abstract_txt:neighbour in 1139) [ClassicSimilarity], result of:
            0.73148894 = score(doc=1139,freq=1.0), product of:
              0.5079475 = queryWeight, product of:
                3.6170545 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015236839 = queryNorm
              1.4400877 = fieldWeight in 1139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.15625 = fieldNorm(doc=1139)
          0.38839343 = weight(abstract_txt:clusters in 1139) [ClassicSimilarity], result of:
            0.38839343 = score(doc=1139,freq=1.0), product of:
              0.38126358 = queryWeight, product of:
                3.8379908 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.015236839 = queryNorm
              1.0187006 = fieldWeight in 1139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.15625 = fieldNorm(doc=1139)
        0.16 = coord(4/25)
    
  2. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.21
    0.20631514 = sum of:
      0.20631514 = product of:
        1.0315757 = sum of:
          0.043281335 = weight(abstract_txt:effectiveness in 324) [ClassicSimilarity], result of:
            0.043281335 = score(doc=324,freq=1.0), product of:
              0.07764951 = queryWeight, product of:
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.015236839 = queryNorm
              0.55739355 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.091818474 = weight(abstract_txt:explained in 324) [ClassicSimilarity], result of:
            0.091818474 = score(doc=324,freq=1.0), product of:
              0.12820108 = queryWeight, product of:
                1.2849212 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.015236839 = queryNorm
              0.7162067 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.030883243 = weight(abstract_txt:been in 324) [ClassicSimilarity], result of:
            0.030883243 = score(doc=324,freq=1.0), product of:
              0.07812024 = queryWeight, product of:
                1.4184936 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.015236839 = queryNorm
              0.3953296 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.3535504 = weight(abstract_txt:nearest in 324) [ClassicSimilarity], result of:
            0.3535504 = score(doc=324,freq=1.0), product of:
              0.3968099 = queryWeight, product of:
                3.1969576 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.015236839 = queryNorm
              0.8909818 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.5120423 = weight(abstract_txt:neighbour in 324) [ClassicSimilarity], result of:
            0.5120423 = score(doc=324,freq=1.0), product of:
              0.5079475 = queryWeight, product of:
                3.6170545 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015236839 = queryNorm
              1.0080614 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
        0.2 = coord(5/25)
    
  3. Small, H.G.: Structural dynamics of scientific literature (2015) 0.20
    0.1988934 = sum of:
      0.1988934 = product of:
        0.7103336 = sum of:
          0.051971402 = weight(abstract_txt:observed in 3356) [ClassicSimilarity], result of:
            0.051971402 = score(doc=3356,freq=1.0), product of:
              0.10978254 = queryWeight, product of:
                1.1890422 = boost
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.015236839 = queryNorm
              0.4734032 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.06683354 = weight(abstract_txt:procedure in 3356) [ClassicSimilarity], result of:
            0.06683354 = score(doc=3356,freq=1.0), product of:
              0.12982349 = queryWeight, product of:
                1.2930261 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.015236839 = queryNorm
              0.5148031 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.009268725 = weight(abstract_txt:that in 3356) [ClassicSimilarity], result of:
            0.009268725 = score(doc=3356,freq=1.0), product of:
              0.05016615 = queryWeight, product of:
                1.392184 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015236839 = queryNorm
              0.18476056 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.02205946 = weight(abstract_txt:been in 3356) [ClassicSimilarity], result of:
            0.02205946 = score(doc=3356,freq=1.0), product of:
              0.07812024 = queryWeight, product of:
                1.4184936 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.015236839 = queryNorm
              0.2823783 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.03699156 = weight(abstract_txt:document in 3356) [ClassicSimilarity], result of:
            0.03699156 = score(doc=3356,freq=1.0), product of:
              0.11026443 = queryWeight, product of:
                1.6852461 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015236839 = queryNorm
              0.33548045 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.08897176 = weight(abstract_txt:obtained in 3356) [ClassicSimilarity], result of:
            0.08897176 = score(doc=3356,freq=1.0), product of:
              0.19794069 = queryWeight, product of:
                2.257944 = boost
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.015236839 = queryNorm
              0.44948694 = fieldWeight in 3356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
          0.43423712 = weight(abstract_txt:clusters in 3356) [ClassicSimilarity], result of:
            0.43423712 = score(doc=3356,freq=5.0), product of:
              0.38126358 = queryWeight, product of:
                3.8379908 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.015236839 = queryNorm
              1.138942 = fieldWeight in 3356, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=3356)
        0.28 = coord(7/25)
    
  4. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.15
    0.14885713 = sum of:
      0.14885713 = product of:
        0.9303571 = sum of:
          0.012976216 = weight(abstract_txt:that in 2299) [ClassicSimilarity], result of:
            0.012976216 = score(doc=2299,freq=1.0), product of:
              0.05016615 = queryWeight, product of:
                1.392184 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015236839 = queryNorm
              0.2586648 = fieldWeight in 2299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=2299)
          0.051788185 = weight(abstract_txt:document in 2299) [ClassicSimilarity], result of:
            0.051788185 = score(doc=2299,freq=1.0), product of:
              0.11026443 = queryWeight, product of:
                1.6852461 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015236839 = queryNorm
              0.46967265 = fieldWeight in 2299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.109375 = fieldNorm(doc=2299)
          0.3535504 = weight(abstract_txt:nearest in 2299) [ClassicSimilarity], result of:
            0.3535504 = score(doc=2299,freq=1.0), product of:
              0.3968099 = queryWeight, product of:
                3.1969576 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.015236839 = queryNorm
              0.8909818 = fieldWeight in 2299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.109375 = fieldNorm(doc=2299)
          0.5120423 = weight(abstract_txt:neighbour in 2299) [ClassicSimilarity], result of:
            0.5120423 = score(doc=2299,freq=1.0), product of:
              0.5079475 = queryWeight, product of:
                3.6170545 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015236839 = queryNorm
              1.0080614 = fieldWeight in 2299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=2299)
        0.16 = coord(4/25)
    
  5. Rasmussen, E.: Clustering algorithms (1992) 0.13
    0.12570347 = sum of:
      0.12570347 = product of:
        0.5237645 = sum of:
          0.024732191 = weight(abstract_txt:effectiveness in 4513) [ClassicSimilarity], result of:
            0.024732191 = score(doc=4513,freq=1.0), product of:
              0.07764951 = queryWeight, product of:
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.015236839 = queryNorm
              0.3185106 = fieldWeight in 4513, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
          0.010486366 = weight(abstract_txt:that in 4513) [ClassicSimilarity], result of:
            0.010486366 = score(doc=4513,freq=2.0), product of:
              0.05016615 = queryWeight, product of:
                1.392184 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015236839 = queryNorm
              0.20903271 = fieldWeight in 4513, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
          0.024957428 = weight(abstract_txt:been in 4513) [ClassicSimilarity], result of:
            0.024957428 = score(doc=4513,freq=2.0), product of:
              0.07812024 = queryWeight, product of:
                1.4184936 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.015236839 = queryNorm
              0.31947455 = fieldWeight in 4513, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
          0.041851174 = weight(abstract_txt:document in 4513) [ClassicSimilarity], result of:
            0.041851174 = score(doc=4513,freq=2.0), product of:
              0.11026443 = queryWeight, product of:
                1.6852461 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015236839 = queryNorm
              0.3795528 = fieldWeight in 4513, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
          0.2020288 = weight(abstract_txt:nearest in 4513) [ClassicSimilarity], result of:
            0.2020288 = score(doc=4513,freq=1.0), product of:
              0.3968099 = queryWeight, product of:
                3.1969576 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.015236839 = queryNorm
              0.50913244 = fieldWeight in 4513, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
          0.2197085 = weight(abstract_txt:clusters in 4513) [ClassicSimilarity], result of:
            0.2197085 = score(doc=4513,freq=2.0), product of:
              0.38126358 = queryWeight, product of:
                3.8379908 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.015236839 = queryNorm
              0.5762641 = fieldWeight in 4513, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=4513)
        0.24 = coord(6/25)
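
The content-match scores combine several term weights. The sketch below recomputes the total for hit 3 (doc 3356) under the same assumptions as the listing suggests: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), and a final coord factor of matched clauses over total clauses. The variable and function names are hypothetical; boosts, queryNorm and fieldNorm are copied from the listing.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015236839  # copied from the listing
FIELD_NORM = 0.078125     # fieldNorm(doc=3356), copied from the listing
QUERY_CLAUSES = 25        # denominator of coord(7/25)

# (term, boost, docFreq, termFreq in doc 3356), copied from the listing
TERMS = [
    ("observed",  1.1890422, 281,   1.0),
    ("procedure", 1.2930261, 165,   1.0),
    ("that",      1.392184,  11344, 1.0),
    ("been",      1.4184936, 3251,  1.0),
    ("document",  1.6852461, 1647,  1.0),
    ("obtained",  2.257944,  382,   1.0),
    ("clusters",  3.8379908, 177,   5.0),
]


def term_weight(boost, doc_freq, freq):
    # queryWeight * fieldWeight for one matching term.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))  # assumed idf form
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM  # tf = sqrt(termFreq)
    return query_weight * field_weight


weights = {term: term_weight(boost, df, freq) for term, boost, df, freq in TERMS}
coord = len(TERMS) / QUERY_CLAUSES           # 7 matching clauses out of 25
doc_3356_total = sum(weights.values()) * coord

for term, weight in weights.items():
    print(f"{term:10s} {weight:.6f}")
print(f"total      {doc_3356_total:.7f}")
```

The term "clusters" dominates because it occurs five times in the abstract (tf = sqrt(5), about 2.236) and carries the largest boost. The total matches the listed 0.1988934 to within single-precision rounding.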