Document (#7294)

Author
Can, F.
Title
On the efficiency of best-match cluster searches
Source
Information processing and management. 30(1994) no.3, S.343-361
Year
1994
Abstract
The efficiency of various cluster-based retrieval (CBR) strategies is analyzed. The possibility of combining CBR and inverted index search (IIS) is investigated. A method for combining the two approaches is proposed and shown to be cost-effective in terms of paging and CPU time. In the new method, the selection of documents from the best-matching clusters is done using the inverted index for all documents. Although this is counterintuitive to the concept of best-match CBR, the observations prove that it is much more efficient than conventional approaches. In the experiments, the effects of the number of selected clusters, page size, centroid length, and matching functions are considered. The experiments show that the storage overhead of the new method would be moderately higher than that of IIS.
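
The combined method described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the author's implementation: the dot-product matching function, the toy term weights, and all names (`combined_search`, `membership`, `n_clusters`, `n_docs`) are hypothetical. Clusters are ranked by centroid match, documents are scored through an inverted index built over all documents, and only documents belonging to the selected clusters are kept.

```python
from collections import defaultdict


def dot(query, vector):
    """Dot-product match between two sparse term-weight dicts (assumed matching function)."""
    return sum(w * vector.get(term, 0.0) for term, w in query.items())


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, weight) over ALL documents."""
    index = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index


def combined_search(query, centroids, membership, index, n_clusters, n_docs):
    """Hypothetical CBR + IIS combination: pick clusters first, score documents via the index."""
    # Step 1: rank clusters by query-centroid match and keep the best n_clusters.
    ranked = sorted(centroids, key=lambda c: dot(query, centroids[c]), reverse=True)
    selected = set(ranked[:n_clusters])

    # Step 2: score documents with the inverted index for all documents.
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight

    # Step 3: keep only documents that belong to a selected cluster.
    hits = [(d, s) for d, s in scores.items() if membership[d] in selected]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:n_docs]


# Toy collection (invented data, for illustration only).
docs = {
    "d1": {"cluster": 1.0, "search": 1.0},
    "d2": {"cluster": 1.0, "index": 1.0},
    "d3": {"patent": 1.0, "search": 1.0},
    "d4": {"patent": 1.0, "law": 1.0},
}
membership = {"d1": "c1", "d2": "c1", "d3": "c2", "d4": "c2"}
centroids = {
    "c1": {"cluster": 1.0, "search": 0.5, "index": 0.5},
    "c2": {"patent": 1.0, "search": 0.5, "law": 0.5},
}
query = {"cluster": 1.0, "search": 1.0}

index = build_inverted_index(docs)
result = combined_search(query, centroids, membership, index, n_clusters=1, n_docs=10)
print(result)
```

In this toy run d3 also matches the query term "search", but it is dropped because its cluster c2 is not among the selected clusters.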

Similar documents (content)

  1. Kang, I.-S.; Na, S.-H.; Kim, J.; Lee, J.-H.: Cluster-based patent retrieval (2007) 0.23
    0.23260778 = sum of:
      0.23260778 = product of:
        0.7268993 = sum of:
          0.012574722 = weight(abstract_txt:that in 1930) [ClassicSimilarity], result of:
            0.012574722 = score(doc=1930,freq=3.0), product of:
              0.04911776 = queryWeight, product of:
                1.131955 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018348059 = queryNorm
              0.25601172 = fieldWeight in 1930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.04443152 = weight(abstract_txt:documents in 1930) [ClassicSimilarity], result of:
            0.04443152 = score(doc=1930,freq=3.0), product of:
              0.09954129 = queryWeight, product of:
                1.3157274 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018348059 = queryNorm
              0.4463627 = fieldWeight in 1930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.035574347 = weight(abstract_txt:approaches in 1930) [ClassicSimilarity], result of:
            0.035574347 = score(doc=1930,freq=1.0), product of:
              0.12378676 = queryWeight, product of:
                1.4672407 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.018348059 = queryNorm
              0.2873841 = fieldWeight in 1930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.055243883 = weight(abstract_txt:experiments in 1930) [ClassicSimilarity], result of:
            0.055243883 = score(doc=1930,freq=1.0), product of:
              0.16599908 = queryWeight, product of:
                1.6990929 = boost
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.018348059 = queryNorm
              0.3327963 = fieldWeight in 1930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.094525956 = weight(abstract_txt:match in 1930) [ClassicSimilarity], result of:
            0.094525956 = score(doc=1930,freq=1.0), product of:
              0.23747449 = queryWeight, product of:
                2.0322294 = boost
                6.3687487 = idf(docFreq=206, maxDocs=44421)
                0.018348059 = queryNorm
              0.3980468 = fieldWeight in 1930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3687487 = idf(docFreq=206, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.14341131 = weight(abstract_txt:clusters in 1930) [ClassicSimilarity], result of:
            0.14341131 = score(doc=1930,freq=2.0), product of:
              0.24886386 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.018348059 = queryNorm
              0.5762641 = fieldWeight in 1930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.2718309 = weight(abstract_txt:cluster in 1930) [ClassicSimilarity], result of:
            0.2718309 = score(doc=1930,freq=7.0), product of:
              0.25104377 = queryWeight, product of:
                2.0894837 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.018348059 = queryNorm
              1.0828028 = fieldWeight in 1930, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
          0.06930665 = weight(abstract_txt:best in 1930) [ClassicSimilarity], result of:
            0.06930665 = score(doc=1930,freq=1.0), product of:
              0.22103614 = queryWeight, product of:
                2.401273 = boost
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.018348059 = queryNorm
              0.31355348 = fieldWeight in 1930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.0625 = fieldNorm(doc=1930)
        0.32 = coord(8/25)
    
  2. Willett, P.: Best-match text retrieval (1993) 0.17
    0.17232811 = sum of:
      0.17232811 = product of:
        0.8616406 = sum of:
          0.01452004 = weight(abstract_txt:that in 7817) [ClassicSimilarity], result of:
            0.01452004 = score(doc=7817,freq=1.0), product of:
              0.04911776 = queryWeight, product of:
                1.131955 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018348059 = queryNorm
              0.2956169 = fieldWeight in 7817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.125 = fieldNorm(doc=7817)
          0.0513051 = weight(abstract_txt:documents in 7817) [ClassicSimilarity], result of:
            0.0513051 = score(doc=7817,freq=1.0), product of:
              0.09954129 = queryWeight, product of:
                1.3157274 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018348059 = queryNorm
              0.51541525 = fieldWeight in 7817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.125 = fieldNorm(doc=7817)
          0.22828259 = weight(abstract_txt:matching in 7817) [ClassicSimilarity], result of:
            0.22828259 = score(doc=7817,freq=2.0), product of:
              0.21373129 = queryWeight, product of:
                1.9279613 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.018348059 = queryNorm
              1.0680822 = fieldWeight in 7817, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.125 = fieldNorm(doc=7817)
          0.32744753 = weight(abstract_txt:match in 7817) [ClassicSimilarity], result of:
            0.32744753 = score(doc=7817,freq=3.0), product of:
              0.23747449 = queryWeight, product of:
                2.0322294 = boost
                6.3687487 = idf(docFreq=206, maxDocs=44421)
                0.018348059 = queryNorm
              1.3788745 = fieldWeight in 7817, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3687487 = idf(docFreq=206, maxDocs=44421)
                0.125 = fieldNorm(doc=7817)
          0.24008529 = weight(abstract_txt:best in 7817) [ClassicSimilarity], result of:
            0.24008529 = score(doc=7817,freq=3.0), product of:
              0.22103614 = queryWeight, product of:
                2.401273 = boost
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.018348059 = queryNorm
              1.0861812 = fieldWeight in 7817, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.125 = fieldNorm(doc=7817)
        0.2 = coord(5/25)
    
  3. Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.15
    0.15336187 = sum of:
      0.15336187 = product of:
        0.47925586 = sum of:
          0.0063525173 = weight(abstract_txt:that in 1947) [ClassicSimilarity], result of:
            0.0063525173 = score(doc=1947,freq=1.0), product of:
              0.04911776 = queryWeight, product of:
                1.131955 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018348059 = queryNorm
              0.1293324 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.018886978 = weight(abstract_txt:than in 1947) [ClassicSimilarity], result of:
            0.018886978 = score(doc=1947,freq=1.0), product of:
              0.088719524 = queryWeight, product of:
                1.2421495 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.018348059 = queryNorm
              0.21288413 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.03174341 = weight(abstract_txt:documents in 1947) [ClassicSimilarity], result of:
            0.03174341 = score(doc=1947,freq=2.0), product of:
              0.09954129 = queryWeight, product of:
                1.3157274 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018348059 = queryNorm
              0.31889692 = fieldWeight in 1947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.048338395 = weight(abstract_txt:experiments in 1947) [ClassicSimilarity], result of:
            0.048338395 = score(doc=1947,freq=1.0), product of:
              0.16599908 = queryWeight, product of:
                1.6990929 = boost
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.018348059 = queryNorm
              0.29119676 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.088731214 = weight(abstract_txt:clusters in 1947) [ClassicSimilarity], result of:
            0.088731214 = score(doc=1947,freq=1.0), product of:
              0.24886386 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.018348059 = queryNorm
              0.3565452 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.1557107 = weight(abstract_txt:cluster in 1947) [ClassicSimilarity], result of:
            0.1557107 = score(doc=1947,freq=3.0), product of:
              0.25104377 = queryWeight, product of:
                2.0894837 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.018348059 = queryNorm
              0.6202532 = fieldWeight in 1947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.04373006 = weight(abstract_txt:method in 1947) [ClassicSimilarity], result of:
            0.04373006 = score(doc=1947,freq=1.0), product of:
              0.1777439 = queryWeight, product of:
                2.1533134 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.018348059 = queryNorm
              0.24602848 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
          0.085762605 = weight(abstract_txt:best in 1947) [ClassicSimilarity], result of:
            0.085762605 = score(doc=1947,freq=2.0), product of:
              0.22103614 = queryWeight, product of:
                2.401273 = boost
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.018348059 = queryNorm
              0.38800263 = fieldWeight in 1947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1947)
        0.32 = coord(8/25)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.15
    0.14819826 = sum of:
      0.14819826 = product of:
        0.46311957 = sum of:
          0.056312248 = weight(abstract_txt:conventional in 6699) [ClassicSimilarity], result of:
            0.056312248 = score(doc=6699,freq=1.0), product of:
              0.11500096 = queryWeight, product of:
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.018348059 = queryNorm
              0.48966762 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.06404461 = weight(abstract_txt:length in 6699) [ClassicSimilarity], result of:
            0.06404461 = score(doc=6699,freq=1.0), product of:
              0.12530102 = queryWeight, product of:
                1.0438223 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.018348059 = queryNorm
              0.511126 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.012834024 = weight(abstract_txt:that in 6699) [ClassicSimilarity], result of:
            0.012834024 = score(doc=6699,freq=2.0), product of:
              0.04911776 = queryWeight, product of:
                1.131955 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018348059 = queryNorm
              0.2612909 = fieldWeight in 6699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.026981398 = weight(abstract_txt:than in 6699) [ClassicSimilarity], result of:
            0.026981398 = score(doc=6699,freq=1.0), product of:
              0.088719524 = queryWeight, product of:
                1.2421495 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.018348059 = queryNorm
              0.30412018 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.032065686 = weight(abstract_txt:documents in 6699) [ClassicSimilarity], result of:
            0.032065686 = score(doc=6699,freq=1.0), product of:
              0.09954129 = queryWeight, product of:
                1.3157274 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018348059 = queryNorm
              0.32213452 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.062887155 = weight(abstract_txt:approaches in 6699) [ClassicSimilarity], result of:
            0.062887155 = score(doc=6699,freq=2.0), product of:
              0.12378676 = queryWeight, product of:
                1.4672407 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.018348059 = queryNorm
              0.5080281 = fieldWeight in 6699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.09765831 = weight(abstract_txt:experiments in 6699) [ClassicSimilarity], result of:
            0.09765831 = score(doc=6699,freq=2.0), product of:
              0.16599908 = queryWeight, product of:
                1.6990929 = boost
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.018348059 = queryNorm
              0.5883063 = fieldWeight in 6699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
          0.110336125 = weight(abstract_txt:combining in 6699) [ClassicSimilarity], result of:
            0.110336125 = score(doc=6699,freq=1.0), product of:
              0.22687574 = queryWeight, product of:
                1.9863615 = boost
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.018348059 = queryNorm
              0.48632845 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.078125 = fieldNorm(doc=6699)
        0.32 = coord(8/25)
    
  5. He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.15
    0.14530072 = sum of:
      0.14530072 = product of:
        0.60541964 = sum of:
          0.016233899 = weight(abstract_txt:that in 355) [ClassicSimilarity], result of:
            0.016233899 = score(doc=355,freq=5.0), product of:
              0.04911776 = queryWeight, product of:
                1.131955 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018348059 = queryNorm
              0.33050975 = fieldWeight in 355, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
          0.021585118 = weight(abstract_txt:than in 355) [ClassicSimilarity], result of:
            0.021585118 = score(doc=355,freq=1.0), product of:
              0.088719524 = queryWeight, product of:
                1.2421495 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.018348059 = queryNorm
              0.24329615 = fieldWeight in 355, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
          0.06283566 = weight(abstract_txt:documents in 355) [ClassicSimilarity], result of:
            0.06283566 = score(doc=355,freq=6.0), product of:
              0.09954129 = queryWeight, product of:
                1.3157274 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018348059 = queryNorm
              0.6312522 = fieldWeight in 355, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
          0.055243883 = weight(abstract_txt:experiments in 355) [ClassicSimilarity], result of:
            0.055243883 = score(doc=355,freq=1.0), product of:
              0.16599908 = queryWeight, product of:
                1.6990929 = boost
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.018348059 = queryNorm
              0.3327963 = fieldWeight in 355, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
          0.30422133 = weight(abstract_txt:clusters in 355) [ClassicSimilarity], result of:
            0.30422133 = score(doc=355,freq=9.0), product of:
              0.24886386 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.018348059 = queryNorm
              1.2224407 = fieldWeight in 355, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
          0.14529972 = weight(abstract_txt:cluster in 355) [ClassicSimilarity], result of:
            0.14529972 = score(doc=355,freq=2.0), product of:
              0.25104377 = queryWeight, product of:
                2.0894837 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.018348059 = queryNorm
              0.57878244 = fieldWeight in 355, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=355)
        0.24 = coord(6/25)
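
The score trees above follow the layout of Lucene's ClassicSimilarity explanations. As a hedged illustration (not the catalogue's own code), the sketch below recomputes the score of the first hit (doc 1930) from the frequencies, boosts, and norms listed in its tree. The formula idf = 1 + ln(maxDocs / (docFreq + 1)) is inferred from the listed idf values, tf is taken as the square root of the term frequency, and the function names are hypothetical.

```python
import math

MAX_DOCS = 44421          # maxDocs as listed in the trees
QUERY_NORM = 0.018348059  # queryNorm as listed in the trees


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, inferred from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """score = queryWeight * fieldWeight, as in each 'weight(...)' subtree."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) copied from the tree of hit 1 (doc 1930).
terms_doc_1930 = [
    ("that", 3, 11344, 1.131955),
    ("documents", 3, 1954, 1.3157274),
    ("approaches", 1, 1215, 1.4672407),
    ("experiments", 1, 587, 1.6990929),
    ("match", 1, 206, 2.0322294),
    ("clusters", 2, 177, 2.0803921),
    ("cluster", 7, 172, 2.0894837),
    ("best", 1, 799, 2.401273),
]
FIELD_NORM = 0.0625  # fieldNorm(doc=1930)

score_that = term_score(3, 11344, 1.131955, FIELD_NORM)
raw_sum = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in terms_doc_1930)
coord = 8 / 25  # 8 of the 25 query terms matched
total = raw_sum * coord

print(f"that: {score_that:.9f}")
print(f"sum: {raw_sum:.7f}  total: {total:.8f}")
```

Small differences in the last digits are to be expected, since the listed values appear to be single-precision floats while Python computes in double precision.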