Document (#33011)

Author
Shieh, W.-Y.
Chung, C.-P.
Title
¬A statistics-based approach to incrementally update inverted files
Source
Information processing and management. 41(2005) no.2, p.275-288
Year
2005
Abstract
Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, requires inefficient reorganization when new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between reorganization reduction and space utilization. Experimental results show that the proposed space-sparing approach significantly reduces reorganization when updating an inverted file while keeping unused free space well controlled, so that file access speed is not affected.
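
The abstract does not state the estimator the authors use, so the following is only an illustrative sketch of the general idea: size the spare space of a posting list from a short window of recent growth, and cap it so that unused space stays bounded. The class name, the sliding-window mean, and the parameters `window`, `horizon` and `max_spare_ratio` are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import deque


class SpareSpaceEstimator:
    """Hypothetical helper that sizes the free space reserved for one posting list."""

    def __init__(self, window=4, horizon=8, max_spare_ratio=0.5):
        # Postings added by each of the most recent updates (older ones drop out).
        self.recent_growth = deque(maxlen=window)
        # Number of future updates the spare space is meant to absorb (assumed knob).
        self.horizon = horizon
        # Upper bound on spare space as a fraction of used space (assumed knob).
        self.max_spare_ratio = max_spare_ratio

    def record_update(self, postings_added):
        """Remember how many postings the latest update appended."""
        self.recent_growth.append(postings_added)

    def spare_slots(self, used_slots):
        """Return how many free slots to reserve after the used ones."""
        if not self.recent_growth:
            return 0
        mean_growth = sum(self.recent_growth) / len(self.recent_growth)
        wanted = math.ceil(mean_growth * self.horizon)   # fewer reorganizations
        cap = int(used_slots * self.max_spare_ratio)     # bounded wasted space
        return min(wanted, cap)


estimator = SpareSpaceEstimator(window=2, horizon=4, max_spare_ratio=0.5)
for added in (10, 2, 4):          # only the last two updates stay in the window
    estimator.record_update(added)
reserved = estimator.spare_slots(used_slots=100)
```

A larger `horizon` trades space for fewer reorganizations, while a smaller `max_spare_ratio` does the opposite; this is the trade-off the abstract describes, here reduced to two fixed knobs rather than balanced adaptively.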

Similar documents (author)

  1. Chung, T.M.: ¬A corpus comparison approach for terminology extraction (2003) 5.11
    5.1094418 = sum of:
      5.1094418 = weight(author_txt:chung in 5072) [ClassicSimilarity], result of:
        5.1094418 = fieldWeight in 5072, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.175107 = idf(docFreq=33, maxDocs=44421)
          0.625 = fieldNorm(doc=5072)
    
  2. Chung, H.H.: User friendly audiovisual material cataloging at Westchester County Public Library System (2001) 5.11
    5.1094418 = sum of:
      5.1094418 = weight(author_txt:chung in 415) [ClassicSimilarity], result of:
        5.1094418 = fieldWeight in 415, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.175107 = idf(docFreq=33, maxDocs=44421)
          0.625 = fieldNorm(doc=415)
    
  3. Chung, Y.-K.: Characteristics of references in international classification systems literature (1995) 4.09
    4.0875535 = sum of:
      4.0875535 = weight(author_txt:chung in 3007) [ClassicSimilarity], result of:
        4.0875535 = fieldWeight in 3007, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.175107 = idf(docFreq=33, maxDocs=44421)
          0.5 = fieldNorm(doc=3007)
    
  4. Chung, Y.-K.: Bradford distribution and core authors in classification systems literature (1994) 4.09
    4.0875535 = sum of:
      4.0875535 = weight(author_txt:chung in 5134) [ClassicSimilarity], result of:
        4.0875535 = fieldWeight in 5134, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.175107 = idf(docFreq=33, maxDocs=44421)
          0.5 = fieldNorm(doc=5134)
    
  5. Chung, Y.-K.: Core international journals of classification systems : an application of Bradford's law (1994) 4.09
    4.0875535 = sum of:
      4.0875535 = weight(author_txt:chung in 5138) [ClassicSimilarity], result of:
        4.0875535 = fieldWeight in 5138, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.175107 = idf(docFreq=33, maxDocs=44421)
          0.5 = fieldNorm(doc=5138)
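
The author scores above can be reproduced by hand. The sketch below assumes the standard ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm values directly from the listing; the function names are chosen for this example only.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# author_txt:chung occurs in 33 of 44421 documents.
score_norm_0625 = field_weight(1.0, 33, 44421, 0.625)  # entries 1 and 2
score_norm_05 = field_weight(1.0, 33, 44421, 0.5)      # entries 3 to 5
```

Only fieldNorm differs between the two groups of entries, which is why the first two documents rank above the remaining three even though all five match the same single term once.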
    

Similar documents (content)

  1. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.29
    0.28837794 = sum of:
      0.28837794 = product of:
        1.0299212 = sum of:
          0.005970008 = weight(abstract_txt:this in 1819) [ClassicSimilarity], result of:
            0.005970008 = score(doc=1819,freq=2.0), product of:
              0.028069967 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.011665515 = queryNorm
              0.21268311 = fieldWeight in 1819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.042811353 = weight(abstract_txt:updates in 1819) [ClassicSimilarity], result of:
            0.042811353 = score(doc=1819,freq=1.0), product of:
              0.09118726 = queryWeight, product of:
                1.040604 = boost
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.011665515 = queryNorm
              0.4694883 = fieldWeight in 1819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.05101353 = weight(abstract_txt:incremental in 1819) [ClassicSimilarity], result of:
            0.05101353 = score(doc=1819,freq=1.0), product of:
              0.10249086 = queryWeight, product of:
                1.1032171 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.011665515 = queryNorm
              0.49773738 = fieldWeight in 1819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.1321012 = weight(abstract_txt:update in 1819) [ClassicSimilarity], result of:
            0.1321012 = score(doc=1819,freq=4.0), product of:
              0.15339907 = queryWeight, product of:
                1.9087312 = boost
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.011665515 = queryNorm
              0.8611604 = fieldWeight in 1819, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.021154264 = weight(abstract_txt:approach in 1819) [ClassicSimilarity], result of:
            0.021154264 = score(doc=1819,freq=1.0), product of:
              0.09047186 = queryWeight, product of:
                2.073028 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011665515 = queryNorm
              0.2338215 = fieldWeight in 1819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.105478816 = weight(abstract_txt:file in 1819) [ClassicSimilarity], result of:
            0.105478816 = score(doc=1819,freq=1.0), product of:
              0.30226564 = queryWeight, product of:
                4.640753 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.011665515 = queryNorm
              0.34896064 = fieldWeight in 1819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
          0.67139196 = weight(abstract_txt:inverted in 1819) [ClassicSimilarity], result of:
            0.67139196 = score(doc=1819,freq=6.0), product of:
              0.57131934 = queryWeight, product of:
                6.3801885 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.011665515 = queryNorm
              1.1751605 = fieldWeight in 1819, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.0625 = fieldNorm(doc=1819)
        0.28 = coord(7/25)
    
  2. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.25
    0.24541737 = sum of:
      0.24541737 = product of:
        0.8764906 = sum of:
          0.017502982 = weight(abstract_txt:most in 776) [ClassicSimilarity], result of:
            0.017502982 = score(doc=776,freq=2.0), product of:
              0.05023074 = queryWeight, product of:
                1.0922403 = boost
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.011665515 = queryNorm
              0.3484516 = fieldWeight in 776, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.03326175 = weight(abstract_txt:indexing in 776) [ClassicSimilarity], result of:
            0.03326175 = score(doc=776,freq=4.0), product of:
              0.06116668 = queryWeight, product of:
                1.2052882 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.011665515 = queryNorm
              0.5437887 = fieldWeight in 776, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.08173782 = weight(abstract_txt:speed in 776) [ClassicSimilarity], result of:
            0.08173782 = score(doc=776,freq=2.0), product of:
              0.14033853 = queryWeight, product of:
                1.8256683 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011665515 = queryNorm
              0.5824332 = fieldWeight in 776, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.021154264 = weight(abstract_txt:approach in 776) [ClassicSimilarity], result of:
            0.021154264 = score(doc=776,freq=1.0), product of:
              0.09047186 = queryWeight, product of:
                2.073028 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011665515 = queryNorm
              0.2338215 = fieldWeight in 776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.105478816 = weight(abstract_txt:file in 776) [ClassicSimilarity], result of:
            0.105478816 = score(doc=776,freq=1.0), product of:
              0.30226564 = queryWeight, product of:
                4.640753 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.011665515 = queryNorm
              0.34896064 = fieldWeight in 776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.47474575 = weight(abstract_txt:inverted in 776) [ClassicSimilarity], result of:
            0.47474575 = score(doc=776,freq=3.0), product of:
              0.57131934 = queryWeight, product of:
                6.3801885 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.011665515 = queryNorm
              0.8309639 = fieldWeight in 776, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
          0.14260924 = weight(abstract_txt:space in 776) [ClassicSimilarity], result of:
            0.14260924 = score(doc=776,freq=1.0), product of:
              0.42306536 = queryWeight, product of:
                6.724243 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.011665515 = queryNorm
              0.33708557 = fieldWeight in 776, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=776)
        0.28 = coord(7/25)
    
  3. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.16
    0.15958975 = sum of:
      0.15958975 = product of:
        0.66495734 = sum of:
          0.0042214333 = weight(abstract_txt:this in 5295) [ClassicSimilarity], result of:
            0.0042214333 = score(doc=5295,freq=1.0), product of:
              0.028069967 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.011665515 = queryNorm
              0.15038967 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.012376478 = weight(abstract_txt:most in 5295) [ClassicSimilarity], result of:
            0.012376478 = score(doc=5295,freq=1.0), product of:
              0.05023074 = queryWeight, product of:
                1.0922403 = boost
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.011665515 = queryNorm
              0.2463925 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.06552676 = weight(abstract_txt:offs in 5295) [ClassicSimilarity], result of:
            0.06552676 = score(doc=5295,freq=1.0), product of:
              0.12110832 = queryWeight, product of:
                1.1992381 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.011665515 = queryNorm
              0.5410591 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.023519607 = weight(abstract_txt:indexing in 5295) [ClassicSimilarity], result of:
            0.023519607 = score(doc=5295,freq=2.0), product of:
              0.06116668 = queryWeight, product of:
                1.2052882 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.011665515 = queryNorm
              0.38451666 = fieldWeight in 5295, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.27409458 = weight(abstract_txt:inverted in 5295) [ClassicSimilarity], result of:
            0.27409458 = score(doc=5295,freq=1.0), product of:
              0.57131934 = queryWeight, product of:
                6.3801885 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.011665515 = queryNorm
              0.47975725 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.28521848 = weight(abstract_txt:space in 5295) [ClassicSimilarity], result of:
            0.28521848 = score(doc=5295,freq=4.0), product of:
              0.42306536 = queryWeight, product of:
                6.724243 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.011665515 = queryNorm
              0.67417115 = fieldWeight in 5295, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
        0.24 = coord(6/25)
    
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.14
    0.14228357 = sum of:
      0.14228357 = product of:
        0.88927233 = sum of:
          0.020788593 = weight(abstract_txt:indexing in 2716) [ClassicSimilarity], result of:
            0.020788593 = score(doc=2716,freq=1.0), product of:
              0.06116668 = queryWeight, product of:
                1.2052882 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.011665515 = queryNorm
              0.33986792 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.13184851 = weight(abstract_txt:file in 2716) [ClassicSimilarity], result of:
            0.13184851 = score(doc=2716,freq=1.0), product of:
              0.30226564 = queryWeight, product of:
                4.640753 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.011665515 = queryNorm
              0.4362008 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.48453537 = weight(abstract_txt:inverted in 2716) [ClassicSimilarity], result of:
            0.48453537 = score(doc=2716,freq=2.0), product of:
              0.57131934 = queryWeight, product of:
                6.3801885 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.011665515 = queryNorm
              0.848099 = fieldWeight in 2716, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.25209987 = weight(abstract_txt:space in 2716) [ClassicSimilarity], result of:
            0.25209987 = score(doc=2716,freq=2.0), product of:
              0.42306536 = queryWeight, product of:
                6.724243 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.011665515 = queryNorm
              0.59588873 = fieldWeight in 2716, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
        0.16 = coord(4/25)
    
  5. Mazur, Z.: Inverted file organization in the information retrieval system based on thesaurus with weights (1979) 0.13
    0.13190871 = sum of:
      0.13190871 = product of:
        1.0992393 = sum of:
          0.007387508 = weight(abstract_txt:this in 5493) [ClassicSimilarity], result of:
            0.007387508 = score(doc=5493,freq=1.0), product of:
              0.028069967 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.011665515 = queryNorm
              0.26318192 = fieldWeight in 5493, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.109375 = fieldNorm(doc=5493)
          0.26104674 = weight(abstract_txt:file in 5493) [ClassicSimilarity], result of:
            0.26104674 = score(doc=5493,freq=2.0), product of:
              0.30226564 = queryWeight, product of:
                4.640753 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.011665515 = queryNorm
              0.8636335 = fieldWeight in 5493, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.109375 = fieldNorm(doc=5493)
          0.83080506 = weight(abstract_txt:inverted in 5493) [ClassicSimilarity], result of:
            0.83080506 = score(doc=5493,freq=3.0), product of:
              0.57131934 = queryWeight, product of:
                6.3801885 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.011665515 = queryNorm
              1.4541868 = fieldWeight in 5493, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.109375 = fieldNorm(doc=5493)
        0.12 = coord(3/25)
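
The content scores above combine three factors per matching term (queryWeight, fieldWeight, and a final coord factor). As a check, the sketch below recomputes the score of entry 5 (doc 5493) under the same ClassicSimilarity formulas. The queryNorm value is copied from the listing rather than recomputed, since it depends on the full 25-term query, which is not shown; the function names are assumptions for this example.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.011665515  # copied from the explain output, not derived here


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, query_terms=25):
    """Sum the term scores, then scale by coord = matched terms / query terms."""
    matched = sum(term_score(f, df, field_norm, b) for f, df, b in terms)
    coord = len(terms) / query_terms
    return coord * matched


# (termFreq, docFreq, boost) for the three terms matched in doc 5493.
mazur_terms = [
    (1.0, 10885, 1.0),     # "this": no boost line in the listing, so boost is taken as 1
    (2.0, 453, 4.640753),  # "file"
    (3.0, 55, 6.3801885),  # "inverted"
]
mazur_score = document_score(mazur_terms, field_norm=0.109375)
```

The coord factor explains the ordering of the list: doc 5493 has the largest raw sum (1.0992393) but matches only 3 of the 25 query terms, so it ends up below documents that match 6 or 7 terms with smaller raw sums.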