Document (#20418)

Author
Lee, D.L.
Ren, L.
Title
Document ranking on weight-partitioned signature files
Source
ACM transactions on information systems. 14(1996) no.2, pp.109-137
Year
1996
Abstract
Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
Theme
Retrieval algorithms
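
The organization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's hashing scheme, partition sizes, or search strategies, so everything below is an assumption made for illustration: superimposed coding with an MD5-based hash, a fixed signature width (`SIG_BITS`), a fixed number of bits per term (`BITS_PER_TERM`), and a cap (`MAX_TF`) above which all frequencies share the last partition. The function names are hypothetical.

```python
import hashlib
from collections import Counter

SIG_BITS = 64      # assumed signature width per partition
BITS_PER_TERM = 3  # assumed number of bits set per term
MAX_TF = 3         # assumed: frequencies above this share the last partition

def term_mask(term):
    """Superimposed-coding bit mask for one term (hypothetical hash scheme)."""
    mask = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.md5(f"{i}:{term}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask

def build_signatures(words):
    """Return {tf: signature}; terms with the same frequency share a signature."""
    sigs = {tf: 0 for tf in range(1, MAX_TF + 1)}
    for term, freq in Counter(words).items():
        sigs[min(freq, MAX_TF)] |= term_mask(term)
    return sigs

def estimated_tf(sigs, term):
    """Highest partition whose signature covers the term's mask, else 0.

    A match can be a false drop, so the result is never below the true
    (capped) frequency but may exceed it.
    """
    mask = term_mask(term)
    for tf in range(MAX_TF, 0, -1):
        if sigs[tf] & mask == mask:
            return tf
    return 0

doc = "signature file signature ranking file signature".split()
sigs = build_signatures(doc)
print(estimated_tf(sigs, "signature"), estimated_tf(sigs, "file"))
```

Because a false drop can place a term in a higher-frequency partition than it belongs to, it can inflate a document's rank, which is the effect on retrieval effectiveness that the abstract says the paper investigates.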

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.43
    0.42743176 = sum of:
      0.42743176 = product of:
        1.526542 = sum of:
          0.027008208 = weight(abstract_txt:analytical in 1303) [ClassicSimilarity], result of:
            0.027008208 = score(doc=1303,freq=1.0), product of:
              0.0662238 = queryWeight, product of:
                1.0781206 = boost
                6.5253177 = idf(docFreq=176, maxDocs=44421)
                0.009413369 = queryNorm
              0.40783235 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5253177 = idf(docFreq=176, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.05159276 = weight(abstract_txt:files in 1303) [ClassicSimilarity], result of:
            0.05159276 = score(doc=1303,freq=2.0), product of:
              0.101955205 = queryWeight, product of:
                1.8918208 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.009413369 = queryNorm
              0.5060336 = fieldWeight in 1303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.08558192 = weight(abstract_txt:false in 1303) [ClassicSimilarity], result of:
            0.08558192 = score(doc=1303,freq=1.0), product of:
              0.18000376 = queryWeight, product of:
                2.5137153 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.009413369 = queryNorm
              0.47544518 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.038485073 = weight(abstract_txt:document in 1303) [ClassicSimilarity], result of:
            0.038485073 = score(doc=1303,freq=1.0), product of:
              0.14339536 = queryWeight, product of:
                3.547422 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.009413369 = queryNorm
              0.26838437 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.15133001 = weight(abstract_txt:file in 1303) [ClassicSimilarity], result of:
            0.15133001 = score(doc=1303,freq=5.0), product of:
              0.1939383 = queryWeight, product of:
                3.6899636 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.009413369 = queryNorm
              0.7802997 = fieldWeight in 1303, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.2170417 = weight(abstract_txt:partitioned in 1303) [ClassicSimilarity], result of:
            0.2170417 = score(doc=1303,freq=1.0), product of:
              0.38319466 = queryWeight, product of:
                4.4919057 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.009413369 = queryNorm
              0.56640065 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.9555023 = weight(abstract_txt:signature in 1303) [ClassicSimilarity], result of:
            0.9555023 = score(doc=1303,freq=7.0), product of:
              0.6779367 = queryWeight, product of:
                8.44949 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009413369 = queryNorm
              1.409427 = fieldWeight in 1303, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
        0.28 = coord(7/25)
    
  2. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.23
    0.22520469 = sum of:
      0.22520469 = product of:
        1.4075294 = sum of:
          0.07898496 = weight(abstract_txt:files in 2029) [ClassicSimilarity], result of:
            0.07898496 = score(doc=2029,freq=3.0), product of:
              0.101955205 = queryWeight, product of:
                1.8918208 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.009413369 = queryNorm
              0.77470255 = fieldWeight in 2029, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.078125 = fieldNorm(doc=2029)
          0.053573325 = weight(abstract_txt:term in 2029) [ClassicSimilarity], result of:
            0.053573325 = score(doc=2029,freq=1.0), product of:
              0.14301974 = queryWeight, product of:
                3.1687522 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.009413369 = queryNorm
              0.37458694 = fieldWeight in 2029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=2029)
          0.16919208 = weight(abstract_txt:file in 2029) [ClassicSimilarity], result of:
            0.16919208 = score(doc=2029,freq=4.0), product of:
              0.1939383 = queryWeight, product of:
                3.6899636 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.009413369 = queryNorm
              0.8724016 = fieldWeight in 2029, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.078125 = fieldNorm(doc=2029)
          1.105779 = weight(abstract_txt:signature in 2029) [ClassicSimilarity], result of:
            1.105779 = score(doc=2029,freq=6.0), product of:
              0.6779367 = queryWeight, product of:
                8.44949 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009413369 = queryNorm
              1.6310949 = fieldWeight in 2029, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=2029)
        0.16 = coord(4/25)
    
  3. Lee, D.L.: Massive parallelism on the hybrid text-retrieval machine (1995) 0.16
    0.16384348 = sum of:
      0.16384348 = product of:
        1.3653624 = sum of:
          0.10151525 = weight(abstract_txt:file in 4143) [ClassicSimilarity], result of:
            0.10151525 = score(doc=4143,freq=1.0), product of:
              0.1939383 = queryWeight, product of:
                3.6899636 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.009413369 = queryNorm
              0.52344096 = fieldWeight in 4143, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.09375 = fieldNorm(doc=4143)
          0.32556254 = weight(abstract_txt:partitioned in 4143) [ClassicSimilarity], result of:
            0.32556254 = score(doc=4143,freq=1.0), product of:
              0.38319466 = queryWeight, product of:
                4.4919057 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.009413369 = queryNorm
              0.849601 = fieldWeight in 4143, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.09375 = fieldNorm(doc=4143)
          0.9382846 = weight(abstract_txt:signature in 4143) [ClassicSimilarity], result of:
            0.9382846 = score(doc=4143,freq=3.0), product of:
              0.6779367 = queryWeight, product of:
                8.44949 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009413369 = queryNorm
              1.3840299 = fieldWeight in 4143, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.09375 = fieldNorm(doc=4143)
        0.12 = coord(3/25)
    
  4. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.15
    0.14643292 = sum of:
      0.14643292 = product of:
        1.2202743 = sum of:
          0.09120398 = weight(abstract_txt:files in 42) [ClassicSimilarity], result of:
            0.09120398 = score(doc=42,freq=4.0), product of:
              0.101955205 = queryWeight, product of:
                1.8918208 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.009413369 = queryNorm
              0.8945495 = fieldWeight in 42, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.078125 = fieldNorm(doc=42)
          0.11963686 = weight(abstract_txt:file in 42) [ClassicSimilarity], result of:
            0.11963686 = score(doc=42,freq=2.0), product of:
              0.1939383 = queryWeight, product of:
                3.6899636 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.009413369 = queryNorm
              0.6168811 = fieldWeight in 42, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.078125 = fieldNorm(doc=42)
          1.0094335 = weight(abstract_txt:signature in 42) [ClassicSimilarity], result of:
            1.0094335 = score(doc=42,freq=5.0), product of:
              0.6779367 = queryWeight, product of:
                8.44949 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009413369 = queryNorm
              1.4889791 = fieldWeight in 42, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=42)
        0.12 = coord(3/25)
    
  5. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.11
    0.11004701 = sum of:
      0.11004701 = product of:
        0.68779385 = sum of:
          0.060247067 = weight(abstract_txt:ranking in 1690) [ClassicSimilarity], result of:
            0.060247067 = score(doc=1690,freq=2.0), product of:
              0.09743183 = queryWeight, product of:
                1.8493781 = boost
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.009413369 = queryNorm
              0.618351 = fieldWeight in 1690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.09279173 = weight(abstract_txt:term in 1690) [ClassicSimilarity], result of:
            0.09279173 = score(doc=1690,freq=3.0), product of:
              0.14301974 = queryWeight, product of:
                3.1687522 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.009413369 = queryNorm
              0.64880365 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.08332262 = weight(abstract_txt:document in 1690) [ClassicSimilarity], result of:
            0.08332262 = score(doc=1690,freq=3.0), product of:
              0.14339536 = queryWeight, product of:
                3.547422 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.009413369 = queryNorm
              0.5810692 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.4514324 = weight(abstract_txt:signature in 1690) [ClassicSimilarity], result of:
            0.4514324 = score(doc=1690,freq=1.0), product of:
              0.6779367 = queryWeight, product of:
                8.44949 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009413369 = queryNorm
              0.6658917 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
        0.16 = coord(4/25)
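
The score trees above can be checked by hand. The following sketch recomputes one term score and the final score of the first similar document from the numbers listed in its tree. It assumes the classic TF-IDF definitions, which are consistent with the listed values: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function name `classic_term_score` is hypothetical and is not part of any Lucene API.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Recompute one weight(...) node: queryWeight * fieldWeight."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # assumed classic idf
    tf = math.sqrt(freq)                             # assumed classic tf
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# "signature" in doc 1303, using the values from the first tree.
signature_score = classic_term_score(
    freq=7.0, doc_freq=23, max_docs=44421,
    boost=8.44949, query_norm=0.009413369, field_norm=0.0625,
)

# Final score: sum of the matching term scores times coord(7/25).
total = 1.526542 * (7 / 25)

print(round(signature_score, 4), round(total, 4))
```

The recomputed values agree with the listed 0.9555023 and 0.42743176 up to floating-point rounding. The coord factor is the fraction of query terms that matched (7 of 25 for the first document), which explains why documents with similar term sums can end up with very different final scores.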