Document (#16777)

Saeed, K.
Dardzinska, A.
Natural language processing : word recognition without segmentation
Journal of the American Society for Information Science and Technology. 52(2001) no.14, p.1275-1279
In an earlier article on methods for recognizing machine-printed and handwritten cursive letters, we presented a model showing that such scripts can be processed, classified, and hence recognized as images. The practical results we obtained encouraged us to extend the theory to an algorithm for word recognition. In this article, we introduce our ideas, describe our achievements, and present the results of testing words for recognition without segmentation. This opens the possibility of applying the methods used in this work, together with other previously developed algorithms, to process whole sentences and, hence, written and spoken texts with the goal of automatic recognition.

Similar documents (content)

  1. Xinglin, L.: Automatic summarization method based on compound word recognition (2015) 0.18
    0.17916076 = sum of:
      0.17916076 = product of:
        0.63985986 = sum of:
          0.015559738 = weight(abstract_txt:results in 1841) [ClassicSimilarity], result of:
            0.015559738 = score(doc=1841,freq=1.0), product of:
              0.07148927 = queryWeight, product of:
                1.1350409 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018086225 = queryNorm
              0.21765138 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.08922123 = weight(abstract_txt:sentences in 1841) [ClassicSimilarity], result of:
            0.08922123 = score(doc=1841,freq=2.0), product of:
              0.14427733 = queryWeight, product of:
                1.1401845 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.018086225 = queryNorm
              0.6184009 = fieldWeight in 1841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.010981097 = weight(abstract_txt:this in 1841) [ClassicSimilarity], result of:
            0.010981097 = score(doc=1841,freq=2.0), product of:
              0.051486127 = queryWeight, product of:
                1.1797279 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.018086225 = queryNorm
              0.21328263 = fieldWeight in 1841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.026271336 = weight(abstract_txt:methods in 1841) [ClassicSimilarity], result of:
            0.026271336 = score(doc=1841,freq=1.0), product of:
              0.101366416 = queryWeight, product of:
                1.3515687 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.018086225 = queryNorm
              0.259172 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.10247364 = weight(abstract_txt:word in 1841) [ClassicSimilarity], result of:
            0.10247364 = score(doc=1841,freq=3.0), product of:
              0.17415677 = queryWeight, product of:
                1.771582 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018086225 = queryNorm
              0.5883988 = fieldWeight in 1841, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.18412454 = weight(abstract_txt:segmentation in 1841) [ClassicSimilarity], result of:
            0.18412454 = score(doc=1841,freq=1.0), product of:
              0.37123346 = queryWeight, product of:
                2.5865128 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.018086225 = queryNorm
              0.49598044 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.21122825 = weight(abstract_txt:recognition in 1841) [ClassicSimilarity], result of:
            0.21122825 = score(doc=1841,freq=1.0), product of:
              0.5521461 = queryWeight, product of:
                4.9875593 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.018086225 = queryNorm
              0.38255864 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
        0.28 = coord(7/25)
  2. Giannella, C.: An improved algorithm for unsupervised decomposition of a multi-author document (2016) 0.17
    0.17358539 = sum of:
      0.17358539 = product of:
        0.6199478 = sum of:
          0.019449674 = weight(abstract_txt:results in 2642) [ClassicSimilarity], result of:
            0.019449674 = score(doc=2642,freq=1.0), product of:
              0.07148927 = queryWeight, product of:
                1.1350409 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018086225 = queryNorm
              0.27206424 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.111526534 = weight(abstract_txt:sentences in 2642) [ClassicSimilarity], result of:
            0.111526534 = score(doc=2642,freq=2.0), product of:
              0.14427733 = queryWeight, product of:
                1.1401845 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.018086225 = queryNorm
              0.7730011 = fieldWeight in 2642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.0168113 = weight(abstract_txt:this in 2642) [ClassicSimilarity], result of:
            0.0168113 = score(doc=2642,freq=3.0), product of:
              0.051486127 = queryWeight, product of:
                1.1797279 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.018086225 = queryNorm
              0.32652098 = fieldWeight in 2642, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.02535191 = weight(abstract_txt:article in 2642) [ClassicSimilarity], result of:
            0.02535191 = score(doc=2642,freq=1.0), product of:
              0.08530473 = queryWeight, product of:
                1.2398742 = boost
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.018086225 = queryNorm
              0.2971923 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.08896229 = weight(abstract_txt:written in 2642) [ClassicSimilarity], result of:
            0.08896229 = score(doc=2642,freq=1.0), product of:
              0.19698656 = queryWeight, product of:
                1.8841236 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.018086225 = queryNorm
              0.45161602 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.12769039 = weight(abstract_txt:hence in 2642) [ClassicSimilarity], result of:
            0.12769039 = score(doc=2642,freq=1.0), product of:
              0.25065216 = queryWeight, product of:
                2.125332 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.018086225 = queryNorm
              0.5094326 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.23015568 = weight(abstract_txt:segmentation in 2642) [ClassicSimilarity], result of:
            0.23015568 = score(doc=2642,freq=1.0), product of:
              0.37123346 = queryWeight, product of:
                2.5865128 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.018086225 = queryNorm
              0.61997557 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
        0.28 = coord(7/25)
  3. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.16
    0.15982482 = sum of:
      0.15982482 = product of:
        0.5708029 = sum of:
          0.023581475 = weight(abstract_txt:results in 2953) [ClassicSimilarity], result of:
            0.023581475 = score(doc=2953,freq=3.0), product of:
              0.07148927 = queryWeight, product of:
                1.1350409 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018086225 = queryNorm
              0.32986036 = fieldWeight in 2953, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.00960846 = weight(abstract_txt:this in 2953) [ClassicSimilarity], result of:
            0.00960846 = score(doc=2953,freq=2.0), product of:
              0.051486127 = queryWeight, product of:
                1.1797279 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.018086225 = queryNorm
              0.1866223 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.017746337 = weight(abstract_txt:article in 2953) [ClassicSimilarity], result of:
            0.017746337 = score(doc=2953,freq=1.0), product of:
              0.08530473 = queryWeight, product of:
                1.2398742 = boost
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.018086225 = queryNorm
              0.20803462 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.08277698 = weight(abstract_txt:recognizing in 2953) [ClassicSimilarity], result of:
            0.08277698 = score(doc=2953,freq=1.0), product of:
              0.18901531 = queryWeight, product of:
                1.3050423 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.018086225 = queryNorm
              0.43793795 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.05469028 = weight(abstract_txt:processing in 2953) [ClassicSimilarity], result of:
            0.05469028 = score(doc=2953,freq=2.0), product of:
              0.1433829 = queryWeight, product of:
                1.6074585 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.018086225 = queryNorm
              0.38142815 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.062273595 = weight(abstract_txt:written in 2953) [ClassicSimilarity], result of:
            0.062273595 = score(doc=2953,freq=1.0), product of:
              0.19698656 = queryWeight, product of:
                1.8841236 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.018086225 = queryNorm
              0.3161312 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.32012582 = weight(abstract_txt:recognition in 2953) [ClassicSimilarity], result of:
            0.32012582 = score(doc=2953,freq=3.0), product of:
              0.5521461 = queryWeight, product of:
                4.9875593 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.018086225 = queryNorm
              0.57978463 = fieldWeight in 2953, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
        0.28 = coord(7/25)
  4. Lin, M.; Zhang, Z.: Question-driven segmentation of lecture speech text : towards intelligent e-learning systems (2008) 0.16
    0.15979397 = sum of:
      0.15979397 = product of:
        0.66580826 = sum of:
          0.015559738 = weight(abstract_txt:results in 1351) [ClassicSimilarity], result of:
            0.015559738 = score(doc=1351,freq=1.0), product of:
              0.07148927 = queryWeight, product of:
                1.1350409 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018086225 = queryNorm
              0.21765138 = fieldWeight in 1351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
          0.06308893 = weight(abstract_txt:sentences in 1351) [ClassicSimilarity], result of:
            0.06308893 = score(doc=1351,freq=1.0), product of:
              0.14427733 = queryWeight, product of:
                1.1401845 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.018086225 = queryNorm
              0.43727544 = fieldWeight in 1351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
          0.007764808 = weight(abstract_txt:this in 1351) [ClassicSimilarity], result of:
            0.007764808 = score(doc=1351,freq=1.0), product of:
              0.051486127 = queryWeight, product of:
                1.1797279 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.018086225 = queryNorm
              0.1508136 = fieldWeight in 1351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
          0.020281529 = weight(abstract_txt:article in 1351) [ClassicSimilarity], result of:
            0.020281529 = score(doc=1351,freq=1.0), product of:
              0.08530473 = queryWeight, product of:
                1.2398742 = boost
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.018086225 = queryNorm
              0.23775385 = fieldWeight in 1351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8040617 = idf(docFreq=2677, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
          0.2603914 = weight(abstract_txt:segmentation in 1351) [ClassicSimilarity], result of:
            0.2603914 = score(doc=1351,freq=2.0), product of:
              0.37123346 = queryWeight, product of:
                2.5865128 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.018086225 = queryNorm
              0.7014223 = fieldWeight in 1351, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
          0.29872185 = weight(abstract_txt:recognition in 1351) [ClassicSimilarity], result of:
            0.29872185 = score(doc=1351,freq=2.0), product of:
              0.5521461 = queryWeight, product of:
                4.9875593 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.018086225 = queryNorm
              0.5410196 = fieldWeight in 1351, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.0625 = fieldNorm(doc=1351)
        0.24 = coord(6/25)
  5. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.16
    0.1597766 = sum of:
      0.1597766 = product of:
        0.66573584 = sum of:
          0.022004792 = weight(abstract_txt:results in 5206) [ClassicSimilarity], result of:
            0.022004792 = score(doc=5206,freq=2.0), product of:
              0.07148927 = queryWeight, product of:
                1.1350409 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018086225 = queryNorm
              0.30780554 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.06308893 = weight(abstract_txt:sentences in 5206) [ClassicSimilarity], result of:
            0.06308893 = score(doc=5206,freq=1.0), product of:
              0.14427733 = queryWeight, product of:
                1.1401845 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.018086225 = queryNorm
              0.43727544 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.026271336 = weight(abstract_txt:methods in 5206) [ClassicSimilarity], result of:
            0.026271336 = score(doc=5206,freq=1.0), product of:
              0.101366416 = queryWeight, product of:
                1.3515687 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.018086225 = queryNorm
              0.259172 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.04419642 = weight(abstract_txt:processing in 5206) [ClassicSimilarity], result of:
            0.04419642 = score(doc=5206,freq=1.0), product of:
              0.1433829 = queryWeight, product of:
                1.6074585 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.018086225 = queryNorm
              0.3082405 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.059163187 = weight(abstract_txt:word in 5206) [ClassicSimilarity], result of:
            0.059163187 = score(doc=5206,freq=1.0), product of:
              0.17415677 = queryWeight, product of:
                1.771582 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018086225 = queryNorm
              0.33971223 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.45101118 = weight(abstract_txt:segmentation in 5206) [ClassicSimilarity], result of:
            0.45101118 = score(doc=5206,freq=6.0), product of:
              0.37123346 = queryWeight, product of:
                2.5865128 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.018086225 = queryNorm
              1.2148991 = fieldWeight in 5206, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.24 = coord(6/25)
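
The score explanations above all follow the same arithmetic: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor. The following is a minimal sketch that recomputes the score of entry 1 (doc 1841) from the boost, docFreq, termFreq, fieldNorm, queryNorm, and coord values printed in the listing. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption inferred from the printed idf values rather than taken from the catalog's own code.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in every idf line
QUERY_NORM = 0.018086225  # queryNorm, as printed in the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm           # "queryWeight" line
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight" line, tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    # Sum of per-term scores, scaled by coord(matched/total).
    summed = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return summed * (matched / total)


# (boost, docFreq, termFreq) for doc 1841, copied from the listing.
TERMS_1841 = [
    (1.1350409, 3693, 1.0),   # results
    (1.1401845, 109, 2.0),    # sentences
    (1.1797279, 10762, 2.0),  # this
    (1.3515687, 1900, 1.0),   # methods
    (1.771582, 523, 3.0),     # word
    (2.5865128, 42, 1.0),     # segmentation
    (4.9875593, 263, 1.0),    # recognition
]

score_1841 = document_score(TERMS_1841, field_norm=0.0625, matched=7, total=25)
print(score_1841)
```

The printed value should agree with the 0.17916076 shown for entry 1 up to floating-point rounding, since the listing uses single-precision arithmetic. The other entries can be checked the same way by substituting their term lists, fieldNorm, and coord values.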