Document (#5306)

Author
Gillman, P.
Title
Data handling and text compression
Source
Journal of information science. 18(1992), pp. 105-110
Year
1992
Abstract
Data compression has a function in text storage and data handling, but not at the level of compressing data files. The reason is that the decompression of such files adds a time delay to the retrieval process, and users can see this delay as a drawback of the system concerned. Compression techniques can be applied with benefit to index files. A more relevant data handling problem is that posed by the need, in most systems, to store two versions of imported text. The first is the 'native' version, as it might have come from a word processor or text editor. The second is the ASCII version, which is what is actually imported. Inverted file indexes form yet another version. The problem arises out of the need for dynamic indexing and re-indexing of revisable documents in very large database applications such as are found in Office Automation systems. Four mainstream text-management packages are used to show how this problem is handled, and how generic document architectures such as OCA/CDA and SGML might help.

Similar documents (author)

  1. Gillman, P.: Text retrieval : key points (1992) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:gillman in 4449) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 4449, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=4449)
    
  2. Gillman, P.: Transferring text (1993) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:gillman in 6245) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 6245, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=6245)
    
  3. Gillman, P.: Intelligent OCR (1993) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:gillman in 7048) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 7048, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=7048)
    
  4. Gillman, P.: Assessing customer requirements (1994) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:gillman in 1465) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 1465, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=1465)
    
  5. Gillman, P.: ConQuest: retrieval on a large scale (1995) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:gillman in 3429) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 3429, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=3429)
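
The author-match scores above all reduce to the same product, fieldWeight = tf * idf * fieldNorm. The following is a minimal sketch of that arithmetic, assuming the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures shown. The function names are illustrative and are not part of any library API.

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency (assumed form; matches idf=9.216561 above).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the author_txt:gillman entries above.
w = field_weight(freq=1.0, doc_freq=11, max_docs=44421, field_norm=0.625)
print(w)  # close to the 5.7603507 reported for each entry
```

Because every entry matches the single term "gillman" once in a field with the same norm, all five documents receive an identical score.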
    

Similar documents (content)

  1. Perez, E.: Zyindex: quick and not-to-dirty text databases (1992) 0.17
    0.168415 = sum of:
      0.168415 = product of:
        0.84207493 = sum of:
          0.15484595 = weight(abstract_txt:ascii in 4257) [ClassicSimilarity], result of:
            0.15484595 = score(doc=4257,freq=1.0), product of:
              0.15361927 = queryWeight, product of:
                1.1144794 = boost
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.017093442 = queryNorm
              1.0079852 = fieldWeight in 4257, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.125 = fieldNorm(doc=4257)
          0.18851295 = weight(abstract_txt:processor in 4257) [ClassicSimilarity], result of:
            0.18851295 = score(doc=4257,freq=1.0), product of:
              0.17514856 = queryWeight, product of:
                1.1900151 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.017093442 = queryNorm
              1.0763031 = fieldWeight in 4257, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.125 = fieldNorm(doc=4257)
          0.06876586 = weight(abstract_txt:indexing in 4257) [ClassicSimilarity], result of:
            0.06876586 = score(doc=4257,freq=2.0), product of:
              0.08941857 = queryWeight, product of:
                1.2024804 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.017093442 = queryNorm
              0.7690333 = fieldWeight in 4257, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.125 = fieldNorm(doc=4257)
          0.2351022 = weight(abstract_txt:files in 4257) [ClassicSimilarity], result of:
            0.2351022 = score(doc=4257,freq=2.0), product of:
              0.23229901 = queryWeight, product of:
                2.3737419 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.017093442 = queryNorm
              1.0120672 = fieldWeight in 4257, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.125 = fieldNorm(doc=4257)
          0.19484802 = weight(abstract_txt:text in 4257) [ClassicSimilarity], result of:
            0.19484802 = score(doc=4257,freq=4.0), product of:
              0.19287671 = queryWeight, product of:
                2.792377 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017093442 = queryNorm
              1.0102205 = fieldWeight in 4257, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.125 = fieldNorm(doc=4257)
        0.2 = coord(5/25)
    
  2. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.13
    0.13350195 = sum of:
      0.13350195 = product of:
        0.8343872 = sum of:
          0.022168264 = weight(abstract_txt:such in 6705) [ClassicSimilarity], result of:
            0.022168264 = score(doc=6705,freq=1.0), product of:
              0.08294445 = queryWeight, product of:
                1.4184155 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.017093442 = queryNorm
              0.2672664 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.03408304 = weight(abstract_txt:data in 6705) [ClassicSimilarity], result of:
            0.03408304 = score(doc=6705,freq=1.0), product of:
              0.13100101 = queryWeight, product of:
                2.3012908 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017093442 = queryNorm
              0.26017386 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.060890004 = weight(abstract_txt:text in 6705) [ClassicSimilarity], result of:
            0.060890004 = score(doc=6705,freq=1.0), product of:
              0.19287671 = queryWeight, product of:
                2.792377 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017093442 = queryNorm
              0.3156939 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.7172459 = weight(abstract_txt:compression in 6705) [ClassicSimilarity], result of:
            0.7172459 = score(doc=6705,freq=9.0), product of:
              0.4048841 = queryWeight, product of:
                3.1338282 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.017093442 = queryNorm
              1.7714844 = fieldWeight in 6705, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
        0.16 = coord(4/25)
    
  3. Broadhurst, R.N.: Caere PageKeeper (1993) 0.13
    0.12619205 = sum of:
      0.12619205 = product of:
        0.7887003 = sum of:
          0.06876586 = weight(abstract_txt:indexing in 6303) [ClassicSimilarity], result of:
            0.06876586 = score(doc=6303,freq=2.0), product of:
              0.08941857 = queryWeight, product of:
                1.2024804 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.017093442 = queryNorm
              0.7690333 = fieldWeight in 6303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.125 = fieldNorm(doc=6303)
          0.23997931 = weight(abstract_txt:handling in 6303) [ClassicSimilarity], result of:
            0.23997931 = score(doc=6303,freq=1.0), product of:
              0.29671222 = queryWeight, product of:
                2.6827333 = boost
                6.470359 = idf(docFreq=186, maxDocs=44421)
                0.017093442 = queryNorm
              0.80879486 = fieldWeight in 6303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.470359 = idf(docFreq=186, maxDocs=44421)
                0.125 = fieldNorm(doc=6303)
          0.09742401 = weight(abstract_txt:text in 6303) [ClassicSimilarity], result of:
            0.09742401 = score(doc=6303,freq=1.0), product of:
              0.19287671 = queryWeight, product of:
                2.792377 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017093442 = queryNorm
              0.50511026 = fieldWeight in 6303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.125 = fieldNorm(doc=6303)
          0.3825311 = weight(abstract_txt:compression in 6303) [ClassicSimilarity], result of:
            0.3825311 = score(doc=6303,freq=1.0), product of:
              0.4048841 = queryWeight, product of:
                3.1338282 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.017093442 = queryNorm
              0.9447916 = fieldWeight in 6303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.125 = fieldNorm(doc=6303)
        0.16 = coord(4/25)
    
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.12
    0.12112068 = sum of:
      0.12112068 = product of:
        0.5046695 = sum of:
          0.11805572 = weight(abstract_txt:inverted in 2716) [ClassicSimilarity], result of:
            0.11805572 = score(doc=2716,freq=2.0), product of:
              0.1392004 = queryWeight, product of:
                1.0608877 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.017093442 = queryNorm
              0.848099 = fieldWeight in 2716, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.030390503 = weight(abstract_txt:indexing in 2716) [ClassicSimilarity], result of:
            0.030390503 = score(doc=2716,freq=1.0), product of:
              0.08941857 = queryWeight, product of:
                1.2024804 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.017093442 = queryNorm
              0.33986792 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.022168264 = weight(abstract_txt:such in 2716) [ClassicSimilarity], result of:
            0.022168264 = score(doc=2716,freq=1.0), product of:
              0.08294445 = queryWeight, product of:
                1.4184155 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.017093442 = queryNorm
              0.2672664 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.03408304 = weight(abstract_txt:data in 2716) [ClassicSimilarity], result of:
            0.03408304 = score(doc=2716,freq=1.0), product of:
              0.13100101 = queryWeight, product of:
                2.3012908 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017093442 = queryNorm
              0.26017386 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.060890004 = weight(abstract_txt:text in 2716) [ClassicSimilarity], result of:
            0.060890004 = score(doc=2716,freq=1.0), product of:
              0.19287671 = queryWeight, product of:
                2.792377 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017093442 = queryNorm
              0.3156939 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.23908193 = weight(abstract_txt:compression in 2716) [ClassicSimilarity], result of:
            0.23908193 = score(doc=2716,freq=1.0), product of:
              0.4048841 = queryWeight, product of:
                3.1338282 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.017093442 = queryNorm
              0.59049475 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
        0.24 = coord(6/25)
    
  5. Bell, T.C.; Moffat, A.; Nevill-Manning, C.G.; Witten, I.H.; Zobel, J.: Data compression in full-text retrieval system (1993) 0.12
    0.11894455 = sum of:
      0.11894455 = product of:
        0.74340343 = sum of:
          0.02660192 = weight(abstract_txt:such in 5642) [ClassicSimilarity], result of:
            0.02660192 = score(doc=5642,freq=1.0), product of:
              0.08294445 = queryWeight, product of:
                1.4184155 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.017093442 = queryNorm
              0.3207197 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.04089965 = weight(abstract_txt:data in 5642) [ClassicSimilarity], result of:
            0.04089965 = score(doc=5642,freq=1.0), product of:
              0.13100101 = queryWeight, product of:
                2.3012908 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017093442 = queryNorm
              0.31220865 = fieldWeight in 5642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.17897934 = weight(abstract_txt:text in 5642) [ClassicSimilarity], result of:
            0.17897934 = score(doc=5642,freq=6.0), product of:
              0.19287671 = queryWeight, product of:
                2.792377 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017093442 = queryNorm
              0.92794687 = fieldWeight in 5642, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
          0.4969225 = weight(abstract_txt:compression in 5642) [ClassicSimilarity], result of:
            0.4969225 = score(doc=5642,freq=3.0), product of:
              0.4048841 = queryWeight, product of:
                3.1338282 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.017093442 = queryNorm
              1.2273203 = fieldWeight in 5642, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.09375 = fieldNorm(doc=5642)
        0.16 = coord(4/25)
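
The content-match scores combine a per-term score (queryWeight * fieldWeight), summed over the matched terms and scaled by the coord factor (matched terms / query terms). The following is a minimal sketch that recomputes the first entry (doc 4257) from the boosts, document frequencies, and term frequencies listed in its explain tree. It assumes the same tf and idf definitions as the classic TF-IDF model; the helper names are illustrative only.

```python
import math

MAX_DOCS = 44421          # maxDocs from the explain output
QUERY_NORM = 0.017093442  # queryNorm from the explain output
FIELD_NORM = 0.125        # fieldNorm(doc=4257)

def idf(doc_freq):
    # Assumed classic form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(boost, doc_freq, freq):
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM          # queryWeight
    field_weight = math.sqrt(freq) * i * FIELD_NORM  # fieldWeight
    return query_weight * field_weight

# (term, boost, docFreq, termFreq) as listed for doc 4257.
terms = [
    ("ascii",     1.1144794, 37,   1.0),
    ("processor", 1.1900151, 21,   1.0),
    ("indexing",  1.2024804, 1557, 2.0),
    ("files",     2.3737419, 393,  2.0),
    ("text",      2.792377,  2122, 4.0),
]

raw_sum = sum(term_score(b, df, f) for _, b, df, f in terms)
total = raw_sum * (5 / 25)  # coord(5/25): 5 of 25 query terms matched
print(raw_sum, total)  # close to 0.84207493 and 0.168415
```

The coord factor explains why entry 4 (6 of 25 terms matched) can rank near entries with a larger raw sum but fewer matched terms.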