Document (#20602)

Author
Ucoluk, G.
Toroslu, I.H.
Title
¬A genetic algorithm approach for verification of the syllable-based text compression technique
Source
Journal of information science. 23(1997) no.5, p.365-372
Year
1997
Abstract
It is possible to decompose any text into strings that have length greater than 1 and occur frequently, provided that an easy mechanism exists for it. With the set of such frequently occurring strings on the one hand and the set of letters and symbols on the other, it is possible to compress the text using Huffman coding over an alphabet which is a subset of the union of these 2 sets. Observations reveal that, in most cases, the maximal inclusion of the strings leads to an optimal length of the compressed text. However, the verification of this prediction requires the consideration of all subsets in order to find the one that leads to the best compression. Describes a genetic algorithm devised and used for this process and concludes that Turkish texts, because of the agglutinative nature of the language and the highly regular syllable formation, provide a useful test bed for this technique.
Object
Huffman codes
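
The search problem described in the abstract can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the greedy longest-match tokenizer, the cost function (encoded bits only, ignoring the size of the code table), the GA parameters, the sample text and the candidate strings are all assumptions made for the example. Each chromosome is a bit mask over the candidate strings; single letters and symbols are always available as fallback tokens.

```python
import heapq
import random
from collections import Counter

def tokenize(text, strings):
    """Greedy longest-match split into chosen strings, falling back to single characters."""
    ordered = sorted(strings, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        match = next((s for s in ordered if text.startswith(s, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

def huffman_bits(tokens):
    """Total encoded length in bits under a Huffman code built from the token counts."""
    counts = Counter(tokens)
    if len(counts) == 1:
        return len(tokens)  # a lone symbol still needs one bit per occurrence
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every occurrence beneath it
        heapq.heappush(heap, merged)
    return total

def cost(text, candidates, mask):
    """Compressed length when only the candidates selected by mask join the alphabet."""
    chosen = [s for s, bit in zip(candidates, mask) if bit]
    return huffman_bits(tokenize(text, chosen))

def genetic_search(text, candidates, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Minimal GA over subset bit masks (hypothetical parameters, fixed seed)."""
    rng = random.Random(seed)
    n = len(candidates)
    # Seed the population with "letters only" and "maximal inclusion" as baselines.
    pop = [tuple([0] * n), tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 2)]
    for _ in range(generations):
        pop.sort(key=lambda m: cost(text, candidates, m))
        parents = pop[: pop_size // 2]  # truncation selection; survivors are kept unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            children.append(tuple(bit ^ (rng.random() < p_mut) for bit in child))
        pop = parents + children
    return min(pop, key=lambda m: cost(text, candidates, m))

# Made-up sample data, for illustration only (candidate strings must be non-empty).
text = "kalemlerimizdeki kalemler ve kitaplarimizdaki kitaplar"
candidates = ["ka", "lem", "ler", "ki", "tap", "lar", "miz"]
best = genetic_search(text, candidates)
print(best, cost(text, candidates, best))
```

Because both baseline masks are placed in the initial population and the better half always survives, the returned subset is never worse than plain character-level Huffman coding or maximal inclusion under this cost function. For a handful of candidates the result can be checked against an exhaustive enumeration of all subsets, which is the verification step the abstract says becomes impractical as the candidate set grows.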

Similar documents (content)

  1. Robertson, A.M.; Willett, P.: Generation of equifrequent groups of words using a genetic algorithm (1994) 0.14
    0.13719705 = sum of:
      0.13719705 = product of:
        0.6859852 = sum of:
          0.15068635 = weight(abstract_txt:turkish in 8158) [ClassicSimilarity], result of:
            0.15068635 = score(doc=8158,freq=1.0), product of:
              0.18010615 = queryWeight, product of:
                1.2341757 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.01635225 = queryNorm
              0.836653 = fieldWeight in 8158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.09375 = fieldNorm(doc=8158)
          0.11136793 = weight(abstract_txt:algorithm in 8158) [ClassicSimilarity], result of:
            0.11136793 = score(doc=8158,freq=2.0), product of:
              0.14722653 = queryWeight, product of:
                1.5780499 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01635225 = queryNorm
              0.7564393 = fieldWeight in 8158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=8158)
          0.031532846 = weight(abstract_txt:that in 8158) [ClassicSimilarity], result of:
            0.031532846 = score(doc=8158,freq=5.0), product of:
              0.063482605 = queryWeight, product of:
                1.6384194 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01635225 = queryNorm
              0.49671632 = fieldWeight in 8158, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=8158)
          0.33631876 = weight(abstract_txt:genetic in 8158) [ClassicSimilarity], result of:
            0.33631876 = score(doc=8158,freq=3.0), product of:
              0.26870945 = queryWeight, product of:
                2.1319115 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.01635225 = queryNorm
              1.2516075 = fieldWeight in 8158, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.09375 = fieldNorm(doc=8158)
          0.056079242 = weight(abstract_txt:text in 8158) [ClassicSimilarity], result of:
            0.056079242 = score(doc=8158,freq=1.0), product of:
              0.14792244 = queryWeight, product of:
                2.2369678 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01635225 = queryNorm
              0.37911248 = fieldWeight in 8158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=8158)
        0.2 = coord(5/25)
    
  2. Cheng, K.-S.; Young, G.H.; Wong, K.-F.: ¬A study on word-based and integral-bit Chinese text compression algorithms (1999) 0.13
    0.12550938 = sum of:
      0.12550938 = product of:
        0.6275469 = sum of:
          0.010425602 = weight(abstract_txt:this in 3056) [ClassicSimilarity], result of:
            0.010425602 = score(doc=3056,freq=1.0), product of:
              0.03950232 = queryWeight, product of:
                1.0011165 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01635225 = queryNorm
              0.2639238 = fieldWeight in 3056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.109375 = fieldNorm(doc=3056)
          0.1591302 = weight(abstract_txt:algorithm in 3056) [ClassicSimilarity], result of:
            0.1591302 = score(doc=3056,freq=3.0), product of:
              0.14722653 = queryWeight, product of:
                1.5780499 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01635225 = queryNorm
              1.0808527 = fieldWeight in 3056, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.109375 = fieldNorm(doc=3056)
          0.023266975 = weight(abstract_txt:that in 3056) [ClassicSimilarity], result of:
            0.023266975 = score(doc=3056,freq=2.0), product of:
              0.063482605 = queryWeight, product of:
                1.6384194 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01635225 = queryNorm
              0.36650947 = fieldWeight in 3056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.109375 = fieldNorm(doc=3056)
          0.36929834 = weight(abstract_txt:compression in 3056) [ClassicSimilarity], result of:
            0.36929834 = score(doc=3056,freq=3.0), product of:
              0.25806904 = queryWeight, product of:
                2.0892751 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.01635225 = queryNorm
              1.431006 = fieldWeight in 3056, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.109375 = fieldNorm(doc=3056)
          0.06542578 = weight(abstract_txt:text in 3056) [ClassicSimilarity], result of:
            0.06542578 = score(doc=3056,freq=1.0), product of:
              0.14792244 = queryWeight, product of:
                2.2369678 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01635225 = queryNorm
              0.4422979 = fieldWeight in 3056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=3056)
        0.2 = coord(5/25)
    
  3. Karakos, A.: Greeklish : an experimental interface for automatic transliteration (2003) 0.11
    0.109245524 = sum of:
      0.109245524 = product of:
        0.4551897 = sum of:
          0.016651683 = weight(abstract_txt:this in 1820) [ClassicSimilarity], result of:
            0.016651683 = score(doc=1820,freq=5.0), product of:
              0.03950232 = queryWeight, product of:
                1.0011165 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01635225 = queryNorm
              0.42153683 = fieldWeight in 1820, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
          0.1197437 = weight(abstract_txt:letters in 1820) [ClassicSimilarity], result of:
            0.1197437 = score(doc=1820,freq=2.0), product of:
              0.13849217 = queryWeight, product of:
                1.0822443 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.01635225 = queryNorm
              0.8646243 = fieldWeight in 1820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
          0.20158665 = weight(abstract_txt:alphabet in 1820) [ClassicSimilarity], result of:
            0.20158665 = score(doc=1820,freq=3.0), product of:
              0.171212 = queryWeight, product of:
                1.2033163 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01635225 = queryNorm
              1.1774095 = fieldWeight in 1820, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
          0.03496423 = weight(abstract_txt:possible in 1820) [ClassicSimilarity], result of:
            0.03496423 = score(doc=1820,freq=1.0), product of:
              0.09675931 = queryWeight, product of:
                1.279305 = boost
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.01635225 = queryNorm
              0.36135262 = fieldWeight in 1820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
          0.065624185 = weight(abstract_txt:algorithm in 1820) [ClassicSimilarity], result of:
            0.065624185 = score(doc=1820,freq=1.0), product of:
              0.14722653 = queryWeight, product of:
                1.5780499 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01635225 = queryNorm
              0.44573617 = fieldWeight in 1820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
          0.016619269 = weight(abstract_txt:that in 1820) [ClassicSimilarity], result of:
            0.016619269 = score(doc=1820,freq=2.0), product of:
              0.063482605 = queryWeight, product of:
                1.6384194 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01635225 = queryNorm
              0.26179248 = fieldWeight in 1820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=1820)
        0.24 = coord(6/25)
    
  4. Akman, K.I.: ¬A new text compression technique based on natural language structure (1995) 0.11
    0.10844147 = sum of:
      0.10844147 = product of:
        0.54220736 = sum of:
          0.12557195 = weight(abstract_txt:turkish in 1860) [ClassicSimilarity], result of:
            0.12557195 = score(doc=1860,freq=1.0), product of:
              0.18010615 = queryWeight, product of:
                1.2341757 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.01635225 = queryNorm
              0.6972108 = fieldWeight in 1860, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=1860)
          0.061716918 = weight(abstract_txt:technique in 1860) [ClassicSimilarity], result of:
            0.061716918 = score(doc=1860,freq=1.0), product of:
              0.14132303 = queryWeight, product of:
                1.5460879 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.01635225 = queryNorm
              0.43670815 = fieldWeight in 1860, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.078125 = fieldNorm(doc=1860)
          0.09280662 = weight(abstract_txt:algorithm in 1860) [ClassicSimilarity], result of:
            0.09280662 = score(doc=1860,freq=2.0), product of:
              0.14722653 = queryWeight, product of:
                1.5780499 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01635225 = queryNorm
              0.63036615 = fieldWeight in 1860, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=1860)
          0.21537915 = weight(abstract_txt:compression in 1860) [ClassicSimilarity], result of:
            0.21537915 = score(doc=1860,freq=2.0), product of:
              0.25806904 = queryWeight, product of:
                2.0892751 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.01635225 = queryNorm
              0.8345796 = fieldWeight in 1860, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=1860)
          0.046732705 = weight(abstract_txt:text in 1860) [ClassicSimilarity], result of:
            0.046732705 = score(doc=1860,freq=1.0), product of:
              0.14792244 = queryWeight, product of:
                2.2369678 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01635225 = queryNorm
              0.3159271 = fieldWeight in 1860, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1860)
        0.2 = coord(5/25)
    
  5. Kokol, P.; Podgorelec, V.; Zorman, M.; Kokol, T.; Njivar, T.: Computer and natural language texts : a comparison based on long-range correlations (1999) 0.10
    0.10026666 = sum of:
      0.10026666 = product of:
        0.5013333 = sum of:
          0.012637737 = weight(abstract_txt:this in 4299) [ClassicSimilarity], result of:
            0.012637737 = score(doc=4299,freq=2.0), product of:
              0.03950232 = queryWeight, product of:
                1.0011165 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01635225 = queryNorm
              0.31992394 = fieldWeight in 4299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.08297932 = weight(abstract_txt:symbols in 4299) [ClassicSimilarity], result of:
            0.08297932 = score(doc=4299,freq=1.0), product of:
              0.121001996 = queryWeight, product of:
                1.0116004 = boost
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.01635225 = queryNorm
              0.6857682 = fieldWeight in 4299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.1310666 = weight(abstract_txt:maximal in 4299) [ClassicSimilarity], result of:
            0.1310666 = score(doc=4299,freq=1.0), product of:
              0.16411212 = queryWeight, product of:
                1.1781024 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.01635225 = queryNorm
              0.7986406 = fieldWeight in 4299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.024425235 = weight(abstract_txt:that in 4299) [ClassicSimilarity], result of:
            0.024425235 = score(doc=4299,freq=3.0), product of:
              0.063482605 = queryWeight, product of:
                1.6384194 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01635225 = queryNorm
              0.38475478 = fieldWeight in 4299, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.2502244 = weight(abstract_txt:strings in 4299) [ClassicSimilarity], result of:
            0.2502244 = score(doc=4299,freq=1.0), product of:
              0.36425552 = queryWeight, product of:
                3.0400198 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.01635225 = queryNorm
              0.68694746 = fieldWeight in 4299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
        0.2 = coord(5/25)
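
The score breakdowns above are labelled ClassicSimilarity, Lucene's TF-IDF similarity, and the figures are consistent with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The following Python sketch recomputes the score of the first similar document (doc 8158) from the values printed in the record. The function names are hypothetical, and the boost, queryNorm and fieldNorm values are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.01635225   # queryNorm, as printed in the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for _term, freq, doc_freq, boost in terms)
    return total * (matched / query_terms)

# (term, freq, docFreq, boost) for doc 8158, copied from the listing.
TERMS_8158 = [
    ("turkish",   1.0,    15, 1.2341757),
    ("algorithm", 2.0,   399, 1.5780499),
    ("that",      5.0, 11241, 1.6384194),
    ("genetic",   3.0,    53, 2.1319115),
    ("text",      1.0,  2106, 2.2369678),
]

score_8158 = document_score(TERMS_8158, field_norm=0.09375, matched=5, query_terms=25)
print(round(score_8158, 4))
```

The result should agree with the 0.13719705 shown for that entry up to rounding, since the listing was presumably produced with single-precision arithmetic while this sketch uses double precision.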