Document (#20602)

Ucoluk, G.
Toroslu, I.H.
¬A genetic algorithm approach for verification of the syllable-based text compression technique
Journal of information science. 23(1997) no.5, S.365-372.
Any text can be decomposed into strings that have length greater than 1 and occur frequently, provided that an easy mechanism exists for doing so. Given the set of such frequently occurring strings on the one hand and the set of letters and symbols on the other, the text can be compressed using Huffman coding over an alphabet that is a subset of the union of these 2 sets. Observations reveal that, in most cases, the maximal inclusion of the strings leads to an optimal length of the compressed text. However, verifying this prediction requires considering all subsets in order to find the one that leads to the best compression. Describes a genetic algorithm devised and used for this process and concludes that Turkish texts, because of the agglutinative nature of the language and the highly regular syllable formation, provide a useful test bed for this technique.
Huffman codes
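
The quantity the abstract wants to minimise, the Huffman-coded length of a text for one chosen subset of strings, can be sketched as follows. This is an illustration only, not the authors' implementation: the greedy longest-match tokenizer, the function names, and the decision to ignore the cost of storing the code table are all assumptions made here. A genetic algorithm such as the one the paper describes would presumably use a measure of this kind as its fitness function while searching over subsets.

```python
import heapq
from collections import Counter


def tokenize(text, strings):
    """Greedy longest-match split into chosen strings and single characters.

    Assumption: longest match first; the paper may segment differently.
    """
    # Only strings longer than one character are candidates, as in the abstract.
    ordered = sorted((s for s in strings if len(s) > 1), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        match = next((s for s in ordered if text.startswith(s, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A single symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(f, n, [sym]) for n, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


def compressed_bits(text, strings):
    """Bits needed to Huffman-code text over letters plus the chosen strings."""
    freqs = Counter(tokenize(text, strings))
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[sym] * lengths[sym] for sym in freqs)
```

Calling `compressed_bits(text, subset)` for different subsets of the candidate strings gives the lengths that would have to be compared; trying every subset grows exponentially with the number of candidates, which is the reason the abstract gives for turning to a genetic algorithm.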

Similar documents (content)

  1. Robertson, A.M.; Willett, P.: Generation of equifrequent groups of words using a genetic algorithm (1994) 0.14
    0.13678475 = sum of:
      0.13678475 = product of:
        0.6839237 = sum of:
          0.15088196 = weight(abstract_txt:turkish in 8157) [ClassicSimilarity], result of:
            0.15088196 = score(doc=8157,freq=1.0), product of:
              0.18024744 = queryWeight, product of:
                1.2369088 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.01632054 = queryNorm
              0.8370824 = fieldWeight in 8157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.09375 = fieldNorm(doc=8157)
          0.11131714 = weight(abstract_txt:algorithm in 8157) [ClassicSimilarity], result of:
            0.11131714 = score(doc=8157,freq=2.0), product of:
              0.14716989 = queryWeight, product of:
                1.5806204 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.01632054 = queryNorm
              0.7563853 = fieldWeight in 8157, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.09375 = fieldNorm(doc=8157)
          0.03134435 = weight(abstract_txt:that in 8157) [ClassicSimilarity], result of:
            0.03134435 = score(doc=8157,freq=5.0), product of:
              0.063224256 = queryWeight, product of:
                1.6380607 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01632054 = queryNorm
              0.4957646 = fieldWeight in 8157, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=8157)
          0.33443862 = weight(abstract_txt:genetic in 8157) [ClassicSimilarity], result of:
            0.33443862 = score(doc=8157,freq=3.0), product of:
              0.26768544 = queryWeight, product of:
                2.1317217 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.01632054 = queryNorm
              1.2493718 = fieldWeight in 8157, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.09375 = fieldNorm(doc=8157)
          0.05594162 = weight(abstract_txt:text in 8157) [ClassicSimilarity], result of:
            0.05594162 = score(doc=8157,freq=1.0), product of:
              0.1476684 = queryWeight, product of:
                2.2391176 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01632054 = queryNorm
              0.3788327 = fieldWeight in 8157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=8157)
        0.2 = coord(5/25)
  2. Cheng, K.-S.; Young, G.H.; Wong, K.-F.: ¬A study on word-based and integral-bit Chinese text compression algorithms (1999) 0.13
    0.12553345 = sum of:
      0.12553345 = product of:
        0.6276672 = sum of:
          0.0103354305 = weight(abstract_txt:this in 4056) [ClassicSimilarity], result of:
            0.0103354305 = score(doc=4056,freq=1.0), product of:
              0.03927105 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.01632054 = queryNorm
              0.26318192 = fieldWeight in 4056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.15905763 = weight(abstract_txt:algorithm in 4056) [ClassicSimilarity], result of:
            0.15905763 = score(doc=4056,freq=3.0), product of:
              0.14716989 = queryWeight, product of:
                1.5806204 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.01632054 = queryNorm
              1.0807756 = fieldWeight in 4056, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.023127891 = weight(abstract_txt:that in 4056) [ClassicSimilarity], result of:
            0.023127891 = score(doc=4056,freq=2.0), product of:
              0.063224256 = queryWeight, product of:
                1.6380607 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01632054 = queryNorm
              0.36580724 = fieldWeight in 4056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.369881 = weight(abstract_txt:compression in 4056) [ClassicSimilarity], result of:
            0.369881 = score(doc=4056,freq=3.0), product of:
              0.2583196 = queryWeight, product of:
                2.094097 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.01632054 = queryNorm
              1.4318737 = fieldWeight in 4056, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
          0.06526522 = weight(abstract_txt:text in 4056) [ClassicSimilarity], result of:
            0.06526522 = score(doc=4056,freq=1.0), product of:
              0.1476684 = queryWeight, product of:
                2.2391176 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01632054 = queryNorm
              0.44197148 = fieldWeight in 4056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.109375 = fieldNorm(doc=4056)
        0.2 = coord(5/25)
  3. Karakos, A.: Greeklish : an experimental interface for automatic transliteration (2003) 0.11
    0.10927425 = sum of:
      0.10927425 = product of:
        0.4553094 = sum of:
          0.016507661 = weight(abstract_txt:this in 2820) [ClassicSimilarity], result of:
            0.016507661 = score(doc=2820,freq=5.0), product of:
              0.03927105 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.01632054 = queryNorm
              0.42035192 = fieldWeight in 2820, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
          0.119925044 = weight(abstract_txt:letters in 2820) [ClassicSimilarity], result of:
            0.119925044 = score(doc=2820,freq=2.0), product of:
              0.13862078 = queryWeight, product of:
                1.0847191 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.01632054 = queryNorm
              0.86513036 = fieldWeight in 2820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
          0.20185629 = weight(abstract_txt:alphabet in 2820) [ClassicSimilarity], result of:
            0.20185629 = score(doc=2820,freq=3.0), product of:
              0.17135082 = queryWeight, product of:
                1.205997 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.01632054 = queryNorm
              1.1780293 = fieldWeight in 2820, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
          0.034906197 = weight(abstract_txt:possible in 2820) [ClassicSimilarity], result of:
            0.034906197 = score(doc=2820,freq=1.0), product of:
              0.09664442 = queryWeight, product of:
                1.2808743 = boost
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.01632054 = queryNorm
              0.36118174 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
          0.065594256 = weight(abstract_txt:algorithm in 2820) [ClassicSimilarity], result of:
            0.065594256 = score(doc=2820,freq=1.0), product of:
              0.14716989 = queryWeight, product of:
                1.5806204 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.01632054 = queryNorm
              0.44570434 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
          0.016519923 = weight(abstract_txt:that in 2820) [ClassicSimilarity], result of:
            0.016519923 = score(doc=2820,freq=2.0), product of:
              0.063224256 = queryWeight, product of:
                1.6380607 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01632054 = queryNorm
              0.2612909 = fieldWeight in 2820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=2820)
        0.24 = coord(6/25)
  4. Akman, K.I.: ¬A new text compression technique based on natural language structure (1995) 0.11
    0.10853801 = sum of:
      0.10853801 = product of:
        0.54269004 = sum of:
          0.12573497 = weight(abstract_txt:turkish in 1928) [ClassicSimilarity], result of:
            0.12573497 = score(doc=1928,freq=1.0), product of:
              0.18024744 = queryWeight, product of:
                1.2369088 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.01632054 = queryNorm
              0.69756866 = fieldWeight in 1928, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.078125 = fieldNorm(doc=1928)
          0.061853785 = weight(abstract_txt:technique in 1928) [ClassicSimilarity], result of:
            0.061853785 = score(doc=1928,freq=1.0), product of:
              0.14152047 = queryWeight, product of:
                1.5499859 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01632054 = queryNorm
              0.437066 = fieldWeight in 1928, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=1928)
          0.092764296 = weight(abstract_txt:algorithm in 1928) [ClassicSimilarity], result of:
            0.092764296 = score(doc=1928,freq=2.0), product of:
              0.14716989 = queryWeight, product of:
                1.5806204 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.01632054 = queryNorm
              0.63032115 = fieldWeight in 1928, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=1928)
          0.21571897 = weight(abstract_txt:compression in 1928) [ClassicSimilarity], result of:
            0.21571897 = score(doc=1928,freq=2.0), product of:
              0.2583196 = queryWeight, product of:
                2.094097 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.01632054 = queryNorm
              0.83508563 = fieldWeight in 1928, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=1928)
          0.04661802 = weight(abstract_txt:text in 1928) [ClassicSimilarity], result of:
            0.04661802 = score(doc=1928,freq=1.0), product of:
              0.1476684 = queryWeight, product of:
                2.2391176 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01632054 = queryNorm
              0.3156939 = fieldWeight in 1928, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=1928)
        0.2 = coord(5/25)
  5. Kokol, P.; Podgorelec, V.; Zorman, M.; Kokol, T.; Njivar, T.: Computer and natural language texts : a comparison based on long-range correlations (1999) 0.10
    0.10036049 = sum of:
      0.10036049 = product of:
        0.50180244 = sum of:
          0.012528434 = weight(abstract_txt:this in 5299) [ClassicSimilarity], result of:
            0.012528434 = score(doc=5299,freq=2.0), product of:
              0.03927105 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.01632054 = queryNorm
              0.31902468 = fieldWeight in 5299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=5299)
          0.08311517 = weight(abstract_txt:symbols in 5299) [ClassicSimilarity], result of:
            0.08311517 = score(doc=5299,freq=1.0), product of:
              0.12112425 = queryWeight, product of:
                1.013955 = boost
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.01632054 = queryNorm
              0.6861976 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.09375 = fieldNorm(doc=5299)
          0.13124634 = weight(abstract_txt:maximal in 5299) [ClassicSimilarity], result of:
            0.13124634 = score(doc=5299,freq=1.0), product of:
              0.16424887 = queryWeight, product of:
                1.1807401 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.01632054 = queryNorm
              0.79907 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.09375 = fieldNorm(doc=5299)
          0.024279226 = weight(abstract_txt:that in 5299) [ClassicSimilarity], result of:
            0.024279226 = score(doc=5299,freq=3.0), product of:
              0.063224256 = queryWeight, product of:
                1.6380607 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01632054 = queryNorm
              0.3840176 = fieldWeight in 5299, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=5299)
          0.25063327 = weight(abstract_txt:strings in 5299) [ClassicSimilarity], result of:
            0.25063327 = score(doc=5299,freq=1.0), product of:
              0.36462277 = queryWeight, product of:
                3.0470924 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.01632054 = queryNorm
              0.68737686 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.09375 = fieldNorm(doc=5299)
        0.2 = coord(5/25)
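
The score breakdowns above all follow the same arithmetic, which can be reproduced with a short sketch. The numbers are taken from the first entry (doc 8157). The formulas are the classic TF-IDF ones that the listing itself implies (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); they are stated here as assumptions that happen to reproduce the printed values to within rounding, and the function names are hypothetical.

```python
import math

QUERY_NORM = 0.01632054   # queryNorm as printed in the listing
MAX_DOCS = 44421          # maxDocs as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of term scores scaled by coord(matched/total)."""
    raw = sum(term_score(freq, df, boost, field_norm)
              for _, freq, df, boost in terms)
    return raw * (matched / total)


# (term, freq, docFreq, boost) for doc 8157, copied from the listing.
TERMS_8157 = [
    ("turkish", 1.0, 15, 1.2369088),
    ("algorithm", 2.0, 401, 1.5806204),
    ("that", 5.0, 11344, 1.6380607),
    ("genetic", 3.0, 54, 2.1317217),
    ("text", 1.0, 2122, 2.2391176),
]

score_8157 = document_score(TERMS_8157, field_norm=0.09375, matched=5, total=25)
print(round(score_8157, 4))
```

The other entries differ only in their term lists, fieldNorm values, and coord factors. Terms shown without a boost line, such as "this", correspond to a boost of 1.0.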