Document (#23706)

Author
Cannane, A.
Williams, H.E.
Title
General-purpose compression for efficient retrieval
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.5, pp.430-437
Year
2001
Abstract
Compression of databases not only reduces space requirements but can also reduce overall retrieval times. In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications (such as databases of integers or scientific databases), specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach, which we call RAY, in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme.
Theme
Retrieval algorithms

Similar documents (author)

  1. Williams, R.M.: ISI search network research front specialties (1983) 4.51
    4.5080194 = sum of:
      4.5080194 = weight(author_txt:williams in 1473) [ClassicSimilarity], result of:
        4.5080194 = fieldWeight in 1473, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.212831 = idf(docFreq=88, maxDocs=44421)
          0.625 = fieldNorm(doc=1473)
    
  2. Williams, J.W.: Serials cataloging, 1985-1990 : an overview of a half-decade (1992) 4.51
    4.5080194 = sum of:
      4.5080194 = weight(author_txt:williams in 4206) [ClassicSimilarity], result of:
        4.5080194 = fieldWeight in 4206, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.212831 = idf(docFreq=88, maxDocs=44421)
          0.625 = fieldNorm(doc=4206)
    
  3. Williams, D.A.: Information skills in the school curriculum (1991) 4.51
    4.5080194 = sum of:
      4.5080194 = weight(author_txt:williams in 4834) [ClassicSimilarity], result of:
        4.5080194 = fieldWeight in 4834, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.212831 = idf(docFreq=88, maxDocs=44421)
          0.625 = fieldNorm(doc=4834)
    
  4. Williams, M.: Transparent information systems through gateways, front ends, intermediaries, and interfaces (1986) 4.51
    4.5080194 = sum of:
      4.5080194 = weight(author_txt:williams in 5134) [ClassicSimilarity], result of:
        4.5080194 = fieldWeight in 5134, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.212831 = idf(docFreq=88, maxDocs=44421)
          0.625 = fieldNorm(doc=5134)
    
  5. Williams, F.: Appraisal and evaluation of software products (1992) 4.51
    4.5080194 = sum of:
      4.5080194 = weight(author_txt:williams in 5306) [ClassicSimilarity], result of:
        4.5080194 = fieldWeight in 5306, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.212831 = idf(docFreq=88, maxDocs=44421)
          0.625 = fieldNorm(doc=5306)
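
The author-match scores above all follow the same arithmetic, which can be sketched as below. This is a minimal illustration, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical and are not taken from this record. The input values (docFreq=88, maxDocs=44421, fieldNorm=0.625) are copied from the explain output.

```python
import math


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the "product of" lines above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:williams entries in the listing.
author_idf = classic_idf(doc_freq=88, max_docs=44421)
author_score = field_weight(freq=1.0, doc_freq=88, max_docs=44421, field_norm=0.625)
print(round(author_idf, 4), round(author_score, 4))
```

Because every listed author entry has the same termFreq, docFreq, and fieldNorm, all five receive the same score; small differences from the printed figures in the last digits would come from single-precision arithmetic in the search engine.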
    

Similar documents (content)

  1. Bell, T.C.; Moffat, A.; Nevill-Manning, C.G.; Witten, I.H.; Zobel, J.: Data compression in full-text retrieval system (1993) 0.30
    0.30110154 = sum of:
      0.30110154 = product of:
        1.2545898 = sum of:
          0.037808154 = weight(abstract_txt:stored in 5642) [ClassicSimilarity], result of:
            0.037808154 = score(doc=5642,freq=1.0), product of:
              0.063418545 = queryWeight, product of:
                1.0768318 = boost
                6.3591332 = idf(docFreq=208, maxDocs=44421)
                0.009261269 = queryNorm
              0.59616876 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3591332 = idf(docFreq=208, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.012355162 = weight(abstract_txt:retrieval in 5642) [ClassicSimilarity], result of:
            0.012355162 = score(doc=5642,freq=1.0), product of:
              0.037908353 = queryWeight, product of:
                1.177395 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009261269 = queryNorm
              0.3259219 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.008250604 = weight(abstract_txt:that in 5642) [ClassicSimilarity], result of:
            0.008250604 = score(doc=5642,freq=2.0), product of:
              0.026313597 = queryWeight, product of:
                1.2014079 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.009261269 = queryNorm
              0.31354907 = fieldWeight in 5642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.025065377 = weight(abstract_txt:show in 5642) [ClassicSimilarity], result of:
            0.025065377 = score(doc=5642,freq=1.0), product of:
              0.06075082 = queryWeight, product of:
                1.4904959 = boost
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.009261269 = queryNorm
              0.41259325 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.071521446 = weight(abstract_txt:databases in 5642) [ClassicSimilarity], result of:
            0.071521446 = score(doc=5642,freq=2.0), product of:
              0.12221565 = queryWeight, product of:
                2.989738 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.009261269 = queryNorm
              0.5852069 = fieldWeight in 5642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          1.0995891 = weight(abstract_txt:compression in 5642) [ClassicSimilarity], result of:
            1.0995891 = score(doc=5642,freq=3.0), product of:
              0.8959268 = queryWeight, product of:
                12.798998 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.009261269 = queryNorm
              1.2273203 = fieldWeight in 5642, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
        0.24 = coord(6/25)
    
  2. Cheng, K.-S.; Young, G.H.; Wong, K.-F.: A study on word-based and integral-bit Chinese text compression algorithms (1999) 0.23
    0.23070945 = sum of:
      0.23070945 = product of:
        1.4419341 = sum of:
          0.009625704 = weight(abstract_txt:that in 4056) [ClassicSimilarity], result of:
            0.009625704 = score(doc=4056,freq=2.0), product of:
              0.026313597 = queryWeight, product of:
                1.2014079 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.009261269 = queryNorm
              0.36580724 = fieldWeight in 4056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.029242942 = weight(abstract_txt:show in 4056) [ClassicSimilarity], result of:
            0.029242942 = score(doc=4056,freq=1.0), product of:
              0.06075082 = queryWeight, product of:
                1.4904959 = boost
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.009261269 = queryNorm
              0.4813588 = fieldWeight in 4056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.120211564 = weight(abstract_txt:scheme in 4056) [ClassicSimilarity], result of:
            0.120211564 = score(doc=4056,freq=2.0), product of:
              0.14164113 = queryWeight, product of:
                2.7873728 = boost
                5.4868593 = idf(docFreq=499, maxDocs=44421)
                0.009261269 = queryNorm
              0.84870523 = fieldWeight in 4056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4868593 = idf(docFreq=499, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          1.282854 = weight(abstract_txt:compression in 4056) [ClassicSimilarity], result of:
            1.282854 = score(doc=4056,freq=3.0), product of:
              0.8959268 = queryWeight, product of:
                12.798998 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.009261269 = queryNorm
              1.4318737 = fieldWeight in 4056, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
        0.16 = coord(4/25)
    
  3. Moffat, A.; Isal, R.Y.K.: Word-based text compression using the Burrows-Wheeler transform (2005) 0.20
    0.19953068 = sum of:
      0.19953068 = product of:
        0.99765337 = sum of:
          0.02643391 = weight(abstract_txt:modeling in 2044) [ClassicSimilarity], result of:
            0.02643391 = score(doc=2044,freq=1.0), product of:
              0.05641411 = queryWeight, product of:
                1.0156255 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.009261269 = queryNorm
              0.46856913 = fieldWeight in 2044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.078125 = fieldNorm(doc=2044)
          0.02914562 = weight(abstract_txt:costs in 2044) [ClassicSimilarity], result of:
            0.02914562 = score(doc=2044,freq=1.0), product of:
              0.060209125 = queryWeight, product of:
                1.0492305 = boost
                6.196136 = idf(docFreq=245, maxDocs=44421)
                0.009261269 = queryNorm
              0.48407313 = fieldWeight in 2044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.196136 = idf(docFreq=245, maxDocs=44421)
                0.078125 = fieldNorm(doc=2044)
          0.004861715 = weight(abstract_txt:that in 2044) [ClassicSimilarity], result of:
            0.004861715 = score(doc=2044,freq=1.0), product of:
              0.026313597 = queryWeight, product of:
                1.2014079 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.009261269 = queryNorm
              0.18476056 = fieldWeight in 2044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=2044)
          0.020887816 = weight(abstract_txt:show in 2044) [ClassicSimilarity], result of:
            0.020887816 = score(doc=2044,freq=1.0), product of:
              0.06075082 = queryWeight, product of:
                1.4904959 = boost
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.009261269 = queryNorm
              0.34382772 = fieldWeight in 2044, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.078125 = fieldNorm(doc=2044)
          0.9163243 = weight(abstract_txt:compression in 2044) [ClassicSimilarity], result of:
            0.9163243 = score(doc=2044,freq=3.0), product of:
              0.8959268 = queryWeight, product of:
                12.798998 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.009261269 = queryNorm
              1.022767 = fieldWeight in 2044, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=2044)
        0.2 = coord(5/25)
    
  4. Adiego, J.; Navarro, G.; Fuente, P. de la: Lempel-Ziv compression of highly structured documents (2007) 0.18
    0.17629924 = sum of:
      0.17629924 = product of:
        0.8814962 = sum of:
          0.038614575 = weight(abstract_txt:random in 5993) [ClassicSimilarity], result of:
            0.038614575 = score(doc=5993,freq=2.0), product of:
              0.066892534 = queryWeight, product of:
                1.1059324 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.009261269 = queryNorm
              0.5772628 = fieldWeight in 5993, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0625 = fieldNorm(doc=5993)
          0.046149004 = weight(abstract_txt:adaptive in 5993) [ClassicSimilarity], result of:
            0.046149004 = score(doc=5993,freq=2.0), product of:
              0.07533296 = queryWeight, product of:
                1.1736329 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.009261269 = queryNorm
              0.61260045 = fieldWeight in 5993, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=5993)
          0.0067365896 = weight(abstract_txt:that in 5993) [ClassicSimilarity], result of:
            0.0067365896 = score(doc=5993,freq=3.0), product of:
              0.026313597 = queryWeight, product of:
                1.2014079 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.009261269 = queryNorm
              0.25601172 = fieldWeight in 5993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=5993)
          0.05693661 = weight(abstract_txt:efficient in 5993) [ClassicSimilarity], result of:
            0.05693661 = score(doc=5993,freq=1.0), product of:
              0.15746655 = queryWeight, product of:
                2.9389658 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.009261269 = queryNorm
              0.3615791 = fieldWeight in 5993, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.0625 = fieldNorm(doc=5993)
          0.7330594 = weight(abstract_txt:compression in 5993) [ClassicSimilarity], result of:
            0.7330594 = score(doc=5993,freq=3.0), product of:
              0.8959268 = queryWeight, product of:
                12.798998 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.009261269 = queryNorm
              0.8182135 = fieldWeight in 5993, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=5993)
        0.2 = coord(5/25)
    
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16
    0.1644406 = sum of:
      0.1644406 = product of:
        0.58728784 = sum of:
          0.023316495 = weight(abstract_txt:costs in 1009) [ClassicSimilarity], result of:
            0.023316495 = score(doc=1009,freq=1.0), product of:
              0.060209125 = queryWeight, product of:
                1.0492305 = boost
                6.196136 = idf(docFreq=245, maxDocs=44421)
                0.009261269 = queryNorm
              0.3872585 = fieldWeight in 1009, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.196136 = idf(docFreq=245, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.024338422 = weight(abstract_txt:reduce in 1009) [ClassicSimilarity], result of:
            0.024338422 = score(doc=1009,freq=1.0), product of:
              0.06195577 = queryWeight, product of:
                1.0643405 = boost
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.009261269 = queryNorm
              0.39283544 = fieldWeight in 1009, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.0142665105 = weight(abstract_txt:retrieval in 1009) [ClassicSimilarity], result of:
            0.0142665105 = score(doc=1009,freq=3.0), product of:
              0.037908353 = queryWeight, product of:
                1.177395 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009261269 = queryNorm
              0.37634215 = fieldWeight in 1009, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.0038893719 = weight(abstract_txt:that in 1009) [ClassicSimilarity], result of:
            0.0038893719 = score(doc=1009,freq=1.0), product of:
              0.026313597 = queryWeight, product of:
                1.2014079 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.009261269 = queryNorm
              0.14780845 = fieldWeight in 1009, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.064529486 = weight(abstract_txt:similarly in 1009) [ClassicSimilarity], result of:
            0.064529486 = score(doc=1009,freq=2.0), product of:
              0.094199575 = queryWeight, product of:
                1.3123939 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.009261269 = queryNorm
              0.6850295 = fieldWeight in 1009, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.03371553 = weight(abstract_txt:databases in 1009) [ClassicSimilarity], result of:
            0.03371553 = score(doc=1009,freq=1.0), product of:
              0.12221565 = queryWeight, product of:
                2.989738 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.009261269 = queryNorm
              0.2758692 = fieldWeight in 1009, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
          0.42323205 = weight(abstract_txt:compression in 1009) [ClassicSimilarity], result of:
            0.42323205 = score(doc=1009,freq=1.0), product of:
              0.8959268 = queryWeight, product of:
                12.798998 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.009261269 = queryNorm
              0.4723958 = fieldWeight in 1009, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=1009)
        0.28 = coord(7/25)
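
The content-similarity scores combine several matching terms. The sketch below recomputes the last entry (doc=1009) from the values in its explain tree, assuming the ClassicSimilarity structure shown there: each term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor (matched terms / query terms). The helper names and the tuple layout are hypothetical, chosen only for illustration.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.009261269   # queryNorm, copied from the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=1009), copied from the listing

# (term, termFreq in doc 1009, docFreq, boost), all copied from the listing.
MATCHED_TERMS = [
    ("costs", 1.0, 245, 1.0492305),
    ("reduce", 1.0, 224, 1.0643405),
    ("retrieval", 3.0, 3732, 1.177395),
    ("that", 1.0, 11344, 1.2014079),
    ("similarly", 2.0, 51, 1.3123939),
    ("databases", 1.0, 1461, 2.989738),
    ("compression", 1.0, 62, 12.798998),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, query_term_count: int) -> float:
    # coord rewards documents that match more of the query terms.
    coord = len(terms) / query_term_count
    return coord * sum(term_score(f, df, b) for _, f, df, b in terms)


score_1009 = document_score(MATCHED_TERMS, query_term_count=25)
print(round(score_1009, 4))
```

The term "compression" dominates the total because it pairs a high boost with a high idf, which is why all five similar documents are compression papers even where few other terms match.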