Document (#5307)

Author
Gillman, P.
Title
Data handling and text compression
Source
Journal of information science. 18(1992), pp.105-110
Year
1992
Abstract
Data compression has a function in text storage and data handling, but not at the level of compressing data files. The reason is that decompressing such files adds a time delay to the retrieval process, and users can see this delay as a drawback of the system concerned. Compression techniques can, however, be applied with benefit to index files. A more relevant data handling problem is that posed by the need, in most systems, to store two versions of imported text. The first is the 'native' version, as it might have come from a word processor or text editor. The second is the ASCII version, which is what is actually imported. Inverted file indexes form yet another version. The problem arises from the need for dynamic indexing and re-indexing of revisable documents in very large database applications, such as those found in office automation systems. Four mainstream text-management packages are used to show how this problem is handled, and how generic document architectures such as OCA/CDA and SGML might help.
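
The abstract says index files benefit from compression but does not name a technique. One widely used approach for inverted-file postings is sketched below as an illustration only, not as Gillman's method: store each posting list as d-gaps (differences between successive document numbers) and code the gaps with a variable-byte code, so that small gaps take one byte instead of a fixed four. The function names are hypothetical, and the sample posting list simply reuses document numbers that appear on this page.

```python
def to_gaps(doc_ids):
    """Turn a sorted posting list into d-gaps: the first id, then successive differences."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Rebuild the original document numbers by running-summing the gaps."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit set on each number's last byte."""
    out = bytearray()
    for n in numbers:
        groups = [n & 0x7F]
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        groups.reverse()      # most significant 7-bit group first
        groups[-1] |= 0x80    # mark the terminating byte of this number
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:          # high bit set: this number is complete
            numbers.append(n)
            n = 0
    return numbers


# Illustrative posting list (document numbers borrowed from this page).
postings = [1397, 3361, 4450, 6246, 7049]
encoded = vbyte_encode(to_gaps(postings))
assert from_gaps(vbyte_decode(encoded)) == postings
print(len(encoded), "bytes instead of", 4 * len(postings), "as 32-bit integers")
# prints: 10 bytes instead of 20 as 32-bit integers
```

Decoding requires a sequential pass over the list, which is the kind of retrieval-time cost the abstract weighs against the space saved.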

Similar documents (author)

  1. Gillman, P.: Text retrieval : key points (1992) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:gillman in 4450) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 4450, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=4450)
    
  2. Gillman, P.: Transferring text (1993) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:gillman in 6246) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 6246, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=6246)
    
  3. Gillman, P.: Intelligent OCR (1993) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:gillman in 7049) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 7049, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=7049)
    
  4. Gillman, P.: Assessing customer requirements (1994) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:gillman in 1397) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 1397, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=1397)
    
  5. Gillman, P.: ConQuest: retrieval on a large scale (1995) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:gillman in 3361) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 3361, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=3361)
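
All five author matches score the same 5.76 because each explanation has the same inputs: freq=1, docFreq=11, maxDocs=44218 and fieldNorm=0.625. A minimal sketch reproducing that number, assuming Lucene's documented ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the helper names are hypothetical, and fieldNorm is taken from the explanation rather than recomputed, since Lucene stores it as a lossy one-byte length norm.

```python
import math


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs from the author_txt:gillman explanations.
print(round(classic_idf(11, 44218), 4))               # 9.212
print(round(field_weight(1.0, 11, 44218, 0.625), 4))  # 5.7575
```

Small differences in the last digits against the listing come from Lucene computing in 32-bit floats.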
    

Similar documents (content)

  1. Perez, E.: Zyindex: quick and not-to-dirty text databases (1992) 0.17
    0.16782029 = sum of:
      0.16782029 = product of:
        0.83910143 = sum of:
          0.15417011 = weight(abstract_txt:ascii in 4258) [ClassicSimilarity], result of:
            0.15417011 = score(doc=4258,freq=1.0), product of:
              0.1530357 = queryWeight, product of:
                1.1145519 = boost
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.017037077 = queryNorm
              1.0074127 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.125 = fieldNorm(doc=4258)
          0.18771046 = weight(abstract_txt:processor in 4258) [ClassicSimilarity], result of:
            0.18771046 = score(doc=4258,freq=1.0), product of:
              0.1744958 = queryWeight, product of:
                1.1901355 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.017037077 = queryNorm
              1.0757306 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.125 = fieldNorm(doc=4258)
          0.06854838 = weight(abstract_txt:indexing in 4258) [ClassicSimilarity], result of:
            0.06854838 = score(doc=4258,freq=2.0), product of:
              0.08915057 = queryWeight, product of:
                1.2030425 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.017037077 = queryNorm
              0.7689057 = fieldWeight in 4258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.125 = fieldNorm(doc=4258)
          0.23391308 = weight(abstract_txt:files in 4258) [ClassicSimilarity], result of:
            0.23391308 = score(doc=4258,freq=2.0), product of:
              0.23130912 = queryWeight, product of:
                2.3733454 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.017037077 = queryNorm
              1.0112575 = fieldWeight in 4258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.125 = fieldNorm(doc=4258)
          0.19475943 = weight(abstract_txt:text in 4258) [ClassicSimilarity], result of:
            0.19475943 = score(doc=4258,freq=4.0), product of:
              0.19264674 = queryWeight, product of:
                2.7962098 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017037077 = queryNorm
              1.0109667 = fieldWeight in 4258, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.125 = fieldNorm(doc=4258)
        0.2 = coord(5/25)
    
  2. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.13
    0.13300388 = sum of:
      0.13300388 = product of:
        0.8312743 = sum of:
          0.02219817 = weight(abstract_txt:such in 5705) [ClassicSimilarity], result of:
            0.02219817 = score(doc=5705,freq=1.0), product of:
              0.08294518 = queryWeight, product of:
                1.4212161 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.017037077 = queryNorm
              0.2676246 = fieldWeight in 5705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.078125 = fieldNorm(doc=5705)
          0.034179997 = weight(abstract_txt:data in 5705) [ClassicSimilarity], result of:
            0.034179997 = score(doc=5705,freq=1.0), product of:
              0.13113259 = queryWeight, product of:
                2.3069823 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017037077 = queryNorm
              0.26065218 = fieldWeight in 5705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=5705)
          0.060862325 = weight(abstract_txt:text in 5705) [ClassicSimilarity], result of:
            0.060862325 = score(doc=5705,freq=1.0), product of:
              0.19264674 = queryWeight, product of:
                2.7962098 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017037077 = queryNorm
              0.3159271 = fieldWeight in 5705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=5705)
          0.7140338 = weight(abstract_txt:compression in 5705) [ClassicSimilarity], result of:
            0.7140338 = score(doc=5705,freq=9.0), product of:
              0.40331534 = queryWeight, product of:
                3.1339128 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.017037077 = queryNorm
              1.7704107 = fieldWeight in 5705, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=5705)
        0.16 = coord(4/25)
    
  3. Broadhurst, R.N.: Caere PageKeeper (1993) 0.13
    0.12569243 = sum of:
      0.12569243 = product of:
        0.78557765 = sum of:
          0.06854838 = weight(abstract_txt:indexing in 6304) [ClassicSimilarity], result of:
            0.06854838 = score(doc=6304,freq=2.0), product of:
              0.08915057 = queryWeight, product of:
                1.2030425 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.017037077 = queryNorm
              0.7689057 = fieldWeight in 6304, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.125 = fieldNorm(doc=6304)
          0.2388316 = weight(abstract_txt:handling in 6304) [ClassicSimilarity], result of:
            0.2388316 = score(doc=6304,freq=1.0), product of:
              0.29550233 = queryWeight, product of:
                2.6825325 = boost
                6.465779 = idf(docFreq=186, maxDocs=44218)
                0.017037077 = queryNorm
              0.80822235 = fieldWeight in 6304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.465779 = idf(docFreq=186, maxDocs=44218)
                0.125 = fieldNorm(doc=6304)
          0.097379714 = weight(abstract_txt:text in 6304) [ClassicSimilarity], result of:
            0.097379714 = score(doc=6304,freq=1.0), product of:
              0.19264674 = queryWeight, product of:
                2.7962098 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017037077 = queryNorm
              0.5054833 = fieldWeight in 6304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.125 = fieldNorm(doc=6304)
          0.380818 = weight(abstract_txt:compression in 6304) [ClassicSimilarity], result of:
            0.380818 = score(doc=6304,freq=1.0), product of:
              0.40331534 = queryWeight, product of:
                3.1339128 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.017037077 = queryNorm
              0.94421905 = fieldWeight in 6304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.125 = fieldNorm(doc=6304)
        0.16 = coord(4/25)
    
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.12
    0.12073836 = sum of:
      0.12073836 = product of:
        0.5030765 = sum of:
          0.11753033 = weight(abstract_txt:inverted in 2648) [ClassicSimilarity], result of:
            0.11753033 = score(doc=2648,freq=2.0), product of:
              0.13866365 = queryWeight, product of:
                1.0609263 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.017037077 = queryNorm
              0.84759295 = fieldWeight in 2648, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.030294389 = weight(abstract_txt:indexing in 2648) [ClassicSimilarity], result of:
            0.030294389 = score(doc=2648,freq=1.0), product of:
              0.08915057 = queryWeight, product of:
                1.2030425 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.017037077 = queryNorm
              0.3398115 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.02219817 = weight(abstract_txt:such in 2648) [ClassicSimilarity], result of:
            0.02219817 = score(doc=2648,freq=1.0), product of:
              0.08294518 = queryWeight, product of:
                1.4212161 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.017037077 = queryNorm
              0.2676246 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.034179997 = weight(abstract_txt:data in 2648) [ClassicSimilarity], result of:
            0.034179997 = score(doc=2648,freq=1.0), product of:
              0.13113259 = queryWeight, product of:
                2.3069823 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017037077 = queryNorm
              0.26065218 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.060862325 = weight(abstract_txt:text in 2648) [ClassicSimilarity], result of:
            0.060862325 = score(doc=2648,freq=1.0), product of:
              0.19264674 = queryWeight, product of:
                2.7962098 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017037077 = queryNorm
              0.3159271 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.23801126 = weight(abstract_txt:compression in 2648) [ClassicSimilarity], result of:
            0.23801126 = score(doc=2648,freq=1.0), product of:
              0.40331534 = queryWeight, product of:
                3.1339128 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.017037077 = queryNorm
              0.5901369 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
        0.24 = coord(6/25)
    
  5. Bell, T.C.; Moffat, A.; Nevill-Manning, C.G.; Witten, I.H.; Zobel, J.: Data compression in full-text retrieval system (1993) 0.12
    0.118599825 = sum of:
      0.118599825 = product of:
        0.7412489 = sum of:
          0.026637804 = weight(abstract_txt:such in 5643) [ClassicSimilarity], result of:
            0.026637804 = score(doc=5643,freq=1.0), product of:
              0.08294518 = queryWeight, product of:
                1.4212161 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.017037077 = queryNorm
              0.3211495 = fieldWeight in 5643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.09375 = fieldNorm(doc=5643)
          0.041015994 = weight(abstract_txt:data in 5643) [ClassicSimilarity], result of:
            0.041015994 = score(doc=5643,freq=1.0), product of:
              0.13113259 = queryWeight, product of:
                2.3069823 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017037077 = queryNorm
              0.31278262 = fieldWeight in 5643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=5643)
          0.17889796 = weight(abstract_txt:text in 5643) [ClassicSimilarity], result of:
            0.17889796 = score(doc=5643,freq=6.0), product of:
              0.19264674 = queryWeight, product of:
                2.7962098 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017037077 = queryNorm
              0.92863214 = fieldWeight in 5643, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=5643)
          0.49469715 = weight(abstract_txt:compression in 5643) [ClassicSimilarity], result of:
            0.49469715 = score(doc=5643,freq=3.0), product of:
              0.40331534 = queryWeight, product of:
                3.1339128 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.017037077 = queryNorm
              1.2265766 = fieldWeight in 5643, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.09375 = fieldNorm(doc=5643)
        0.16 = coord(4/25)
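
Each content score above is built the same way: per matched term, queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), summed, then multiplied by coord (matched query terms / all query terms). The sketch below recomputes the Perez entry (doc 4258) from the values in its explanation, assuming Lucene's documented ClassicSimilarity formulas for tf and idf; queryNorm, fieldNorm and the boosts are copied, not derived, and the function name is hypothetical.

```python
import math


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """queryWeight * fieldWeight for one matched term (ClassicSimilarity)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.017037077
FIELD_NORM = 0.125  # fieldNorm(doc=4258)

# (term, freq, docFreq, boost) as listed for doc 4258
terms = [
    ("ascii", 1.0, 37, 1.1145519),
    ("processor", 1.0, 21, 1.1901355),
    ("indexing", 2.0, 1551, 1.2030425),
    ("files", 2.0, 393, 2.3733454),
    ("text", 4.0, 2106, 2.7962098),
]

matched = sum(term_score(f, df, MAX_DOCS, b, QUERY_NORM, FIELD_NORM)
              for _, f, df, b in terms)
coord = len(terms) / 25  # coord(5/25): 5 of the 25 query terms matched
score = matched * coord

print(round(matched, 4))  # 0.8391
print(round(score, 4))    # 0.1678
```

The coord factor explains why Moffat and Bell, matching six terms, rank close to entries with higher per-term weights but fewer matches.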