Document (#15827)

Author
Persin, M.
Zobel, J.
Sacks-Davis, R.
Title
Filtered document retrieval with frequency-sorted indexes
Source
Journal of the American Society for Information Science. 47(1996) no.10, pp. 749-764
Year
1996
Abstract
Proposes a ranking evaluation technique that reduces costs by recognizing early which documents are likely to be highly ranked. Queries are evaluated in 2% of the memory required by a standard implementation, without degradation in retrieval effectiveness. CPU time and disk traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique: inverted lists are sorted by decreasing within-document frequency rather than by document number. In experiments, this reduces CPU time and disk traffic to around one third of the original requirement. Frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
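The abstract combines two ideas: inverted lists ordered by decreasing within-document frequency (f_dt), and query evaluation that only creates or updates score accumulators for postings likely to matter. The Python sketch below illustrates how these fit together on a toy collection. It is not the authors' implementation. The weighting ((1 + ln f_dt) * idf squared, no length normalisation), the names `build_frequency_sorted_index` and `filtered_rank`, and the threshold constants `c_ins` and `c_add` (scaled by the best partial score so far) are all assumptions made for illustration; the paper's exact threshold formulas may differ. The key point it shows is that, because each list is frequency-sorted, the first posting that fails the addition test lets the rest of that list be skipped.

```python
import math
from collections import defaultdict


def build_frequency_sorted_index(docs):
    """Map each term to postings (doc_id, f_dt) sorted by decreasing f_dt.

    Ties are broken by increasing document number, so postings that share a
    within-document frequency stay in document order.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            counts[term][doc_id] += 1
    return {
        term: sorted(postings.items(), key=lambda p: (-p[1], p[0]))
        for term, postings in counts.items()
    }


def filtered_rank(index, num_docs, query, c_ins=0.07, c_add=0.002, k=10):
    """Rank documents with accumulator filtering over frequency-sorted lists.

    Assumed, simplified scoring: contribution = (1 + ln f_dt) * idf_t ** 2,
    with no document-length normalisation. c_ins and c_add are hypothetical
    constants (c_ins >= c_add) scaled by the current best score s_max.
    Returns (top-k list, number of accumulators, number of postings decoded).
    """
    terms = [t for t in set(query.lower().split()) if t in index]
    idf = {t: math.log(1 + num_docs / len(index[t])) for t in terms}
    terms.sort(key=lambda t: (-idf[t], t))  # rarest (highest-idf) terms first

    acc = {}  # doc_id -> partial score; one entry per accumulator
    s_max = 0.0
    decoded = 0
    for t in terms:
        w2 = idf[t] ** 2
        for doc_id, f_dt in index[t]:
            decoded += 1
            contrib = (1 + math.log(f_dt)) * w2
            if contrib < c_add * s_max:
                # Later postings have f_dt no larger than this one and s_max
                # never decreases, so they would fail both tests: skip the rest.
                break
            if doc_id in acc:
                acc[doc_id] += contrib
            elif contrib >= c_ins * s_max:
                acc[doc_id] = contrib
            else:
                continue  # too weak to justify a new accumulator
            s_max = max(s_max, acc[doc_id])

    ranked = sorted(acc.items(), key=lambda p: (-p[1], p[0]))[:k]
    return ranked, len(acc), decoded


if __name__ == "__main__":
    docs = [
        "apple apple apple banana",
        "apple banana banana",
        "cherry apple",
        "banana cherry cherry cherry",
    ]
    index = build_frequency_sorted_index(docs)
    print(index["apple"])
    print(filtered_rank(index, len(docs), "apple cherry", c_ins=0.5, c_add=0.3))
```

With both constants set to 0 the function degenerates to exhaustive evaluation (every posting decoded, one accumulator per matching document); raising them trades a few low-weight contributions for fewer accumulators and fewer decoded postings, which is the kind of memory and CPU saving the abstract reports. A plausible source of the index-size saving is that postings with equal f_dt are adjacent, so f_dt could be stored once per group; this sketch does not model compression.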

Similar documents (author)

  1. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 1.95
  2. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 1.95
  3. Uitdenbogerd, A.L.; Zobel, J.: ¬An architecture for effective music information retrieval (2004) 1.95
  4. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 1.95
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 1.95

Similar documents (content)

  1. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.18
  2. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16
  3. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.15
  4. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.12
  5. Shieh, W.-Y.; Chung, C.-P.: ¬A statistics-based approach to incrementally update inverted files (2005) 0.12