Document (#30207)

Author
Khoo, C.S.G.
Dai, D.
Loh, T.E.
Title
Using statistical and contextual information to identify two- and three-character words in Chinese text
Source
Journal of the American Society for Information Science and Technology. 53(2002) no.5, pp.365-377
Year
2002
Abstract
Khoo, Dai, and Loh examine new statistical methods for identifying two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. For manual segmentation they follow the Modern Chinese Word Segmentation for Application of Information Processing, with some modifications to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Logistic regression was used to calculate the log of the odds that each bi-gram or tri-gram was a meaningful word. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if their addition improved the concordance measure by at least 2%. For two- and three-character words, relative frequency of adjacent characters and document frequency of overlapping bi-grams were found to be significant. Recall and precision were measured as the number of correct automatic segmentations normalized by the manual segmentation and by the automatic segmentation, respectively. On these measures, the contextual information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
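The two mechanical steps mentioned in the abstract, generating overlapping bi-grams and tri-grams and scoring a segmentation by recall and precision, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the placeholder sentence "ABCDE", and the choice to compare words by their character offsets are all assumptions made for illustration.

```python
def overlapping_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_spans(words):
    """Map a segmentation (list of words) to a set of (start, end) offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def recall_precision(manual_words, automatic_words):
    """Count words found at identical offsets in both segmentations.

    Recall normalizes that count by the manual segmentation,
    precision by the automatic segmentation.
    """
    manual = word_spans(manual_words)
    automatic = word_spans(automatic_words)
    correct = len(manual & automatic)
    return correct / len(manual), correct / len(automatic)


# Placeholder characters stand in for a Chinese sentence (assumption).
sentence = "ABCDE"
bigrams = overlapping_ngrams(sentence, 2)
trigrams = overlapping_ngrams(sentence, 3)

# Hypothetical manual and automatic segmentations of the same sentence.
recall, precision = recall_precision(["AB", "CDE"], ["AB", "C", "DE"])
```

In this toy case only "AB" is segmented identically, so the one correct word is divided by two manual words for recall and by three automatic words for precision.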
Theme
Computational linguistics

Similar documents (author)

  1. Khoo, C.S.G.; Poo, D.C.C.: An expert system approach to online catalog subject searching (1994) 6.10
    6.103445 = sum of:
      6.103445 = sum of:
        2.7373245 = weight(author_txt:khoo in 7302) [ClassicSimilarity], result of:
          2.7373245 = score(doc=7302,freq=1.0), product of:
            0.65689176 = queryWeight, product of:
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.078819074 = queryNorm
            4.167086 = fieldWeight in 7302, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.5 = fieldNorm(doc=7302)
        3.3661203 = weight(author_txt:c.s.g in 7302) [ClassicSimilarity], result of:
          3.3661203 = score(doc=7302,freq=1.0), product of:
            0.753985 = queryWeight, product of:
              1.0713576 = boost
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.078819074 = queryNorm
            4.4644394 = fieldWeight in 7302, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.5 = fieldNorm(doc=7302)
    
  2. Chaudhry, A.S.; Khoo, C.S.G.: A survey of the top-level categories in the structure of corporate Websites (2008) 6.10
    6.103445 = sum of:
      6.103445 = sum of:
        2.7373245 = weight(author_txt:khoo in 3259) [ClassicSimilarity], result of:
          2.7373245 = score(doc=3259,freq=1.0), product of:
            0.65689176 = queryWeight, product of:
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.078819074 = queryNorm
            4.167086 = fieldWeight in 3259, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.5 = fieldNorm(doc=3259)
        3.3661203 = weight(author_txt:c.s.g in 3259) [ClassicSimilarity], result of:
          3.3661203 = score(doc=3259,freq=1.0), product of:
            0.753985 = queryWeight, product of:
              1.0713576 = boost
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.078819074 = queryNorm
            4.4644394 = fieldWeight in 3259, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.5 = fieldNorm(doc=3259)
    
  3. Khoo, C.S.G.; Ou, S.: Machine versus human clustering of concepts across documents (2008) 6.10
    6.103445 = sum of:
      6.103445 = sum of:
        2.7373245 = weight(author_txt:khoo in 3286) [ClassicSimilarity], result of:
          2.7373245 = score(doc=3286,freq=1.0), product of:
            0.65689176 = queryWeight, product of:
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.078819074 = queryNorm
            4.167086 = fieldWeight in 3286, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.5 = fieldNorm(doc=3286)
        3.3661203 = weight(author_txt:c.s.g in 3286) [ClassicSimilarity], result of:
          3.3661203 = score(doc=3286,freq=1.0), product of:
            0.753985 = queryWeight, product of:
              1.0713576 = boost
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.078819074 = queryNorm
            4.4644394 = fieldWeight in 3286, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.5 = fieldNorm(doc=3286)
    
  4. Poo, D.C.C.; Khoo, C.S.G.: Online Catalog Subject Searching (2009) 6.10
    6.103445 = sum of:
      6.103445 = sum of:
        2.7373245 = weight(author_txt:khoo in 838) [ClassicSimilarity], result of:
          2.7373245 = score(doc=838,freq=1.0), product of:
            0.65689176 = queryWeight, product of:
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.078819074 = queryNorm
            4.167086 = fieldWeight in 838, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.5 = fieldNorm(doc=838)
        3.3661203 = weight(author_txt:c.s.g in 838) [ClassicSimilarity], result of:
          3.3661203 = score(doc=838,freq=1.0), product of:
            0.753985 = queryWeight, product of:
              1.0713576 = boost
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.078819074 = queryNorm
            4.4644394 = fieldWeight in 838, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.5 = fieldNorm(doc=838)
    
  5. Sun, G.; Khoo, C.S.G.: A framework to represent variables and values in social science research data sets to support data curation and reuse (2018) 6.10
    6.103445 = sum of:
      6.103445 = sum of:
        2.7373245 = weight(author_txt:khoo in 744) [ClassicSimilarity], result of:
          2.7373245 = score(doc=744,freq=1.0), product of:
            0.65689176 = queryWeight, product of:
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.078819074 = queryNorm
            4.167086 = fieldWeight in 744, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.334172 = idf(docFreq=28, maxDocs=44421)
              0.5 = fieldNorm(doc=744)
        3.3661203 = weight(author_txt:c.s.g in 744) [ClassicSimilarity], result of:
          3.3661203 = score(doc=744,freq=1.0), product of:
            0.753985 = queryWeight, product of:
              1.0713576 = boost
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.078819074 = queryNorm
            4.4644394 = fieldWeight in 744, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.928879 = idf(docFreq=15, maxDocs=44421)
              0.5 = fieldNorm(doc=744)
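
The arithmetic in the author-match score trees above can be checked with a short Python sketch. It assumes the classic TF-IDF formulas that the printed values are consistent with: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost × idf × queryNorm, and fieldWeight = tf × idf × fieldNorm. The function names are illustrative and are not part of any library API; the search engine works in 32-bit floats, so the last printed digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency, matching the idf(...) lines above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the tree for "author_txt:khoo" and "author_txt:c.s.g".
khoo = term_score(1.0, 28, 44421, 0.078819074, 0.5)
csg = term_score(1.0, 15, 44421, 0.078819074, 0.5, boost=1.0713576)
total = khoo + csg
```

Because every author match has freq=1.0 and the same fieldNorm, all five documents receive the same total, which is why each entry shows 6.10.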
    

Similar documents (content)

  1. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.62
    0.6170671 = sum of:
      0.6170671 = product of:
        1.7140752 = sum of:
          0.05603038 = weight(abstract_txt:adjacent in 5580) [ClassicSimilarity], result of:
            0.05603038 = score(doc=5580,freq=1.0), product of:
              0.10108935 = queryWeight, product of:
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.011399013 = queryNorm
              0.5542659 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.007877406 = weight(abstract_txt:information in 5580) [ClassicSimilarity], result of:
            0.007877406 = score(doc=5580,freq=3.0), product of:
              0.03008325 = queryWeight, product of:
                1.0910375 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.011399013 = queryNorm
              0.26185355 = fieldWeight in 5580, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.022535497 = weight(abstract_txt:automatic in 5580) [ClassicSimilarity], result of:
            0.022535497 = score(doc=5580,freq=1.0), product of:
              0.0693976 = queryWeight, product of:
                1.1717489 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.011399013 = queryNorm
              0.32473022 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.038773663 = weight(abstract_txt:statistical in 5580) [ClassicSimilarity], result of:
            0.038773663 = score(doc=5580,freq=2.0), product of:
              0.079088666 = queryWeight, product of:
                1.2508909 = boost
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.011399013 = queryNorm
              0.49025562 = fieldWeight in 5580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.097029045 = weight(abstract_txt:characters in 5580) [ClassicSimilarity], result of:
            0.097029045 = score(doc=5580,freq=1.0), product of:
              0.21024771 = queryWeight, product of:
                2.4978914 = boost
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.011399013 = queryNorm
              0.4614987 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.2409139 = weight(abstract_txt:chinese in 5580) [ClassicSimilarity], result of:
            0.2409139 = score(doc=5580,freq=9.0), product of:
              0.20398743 = queryWeight, product of:
                2.8410509 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.011399013 = queryNorm
              1.1810232 = fieldWeight in 5580, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.25102296 = weight(abstract_txt:grams in 5580) [ClassicSimilarity], result of:
            0.25102296 = score(doc=5580,freq=2.0), product of:
              0.3461322 = queryWeight, product of:
                3.7008228 = boost
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.011399013 = queryNorm
              0.7252228 = fieldWeight in 5580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.7239141 = weight(abstract_txt:segmentation in 5580) [ClassicSimilarity], result of:
            0.7239141 = score(doc=5580,freq=9.0), product of:
              0.48623994 = queryWeight, product of:
                5.3721514 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.011399013 = queryNorm
              1.4888002 = fieldWeight in 5580, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.27597818 = weight(abstract_txt:words in 5580) [ClassicSimilarity], result of:
            0.27597818 = score(doc=5580,freq=5.0), product of:
              0.36870825 = queryWeight, product of:
                6.0393295 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.011399013 = queryNorm
              0.74850017 = fieldWeight in 5580, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
        0.36 = coord(9/25)
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.35
    0.3549198 = sum of:
      0.3549198 = product of:
        1.4788325 = sum of:
          0.0045480225 = weight(abstract_txt:information in 1604) [ClassicSimilarity], result of:
            0.0045480225 = score(doc=1604,freq=1.0), product of:
              0.03008325 = queryWeight, product of:
                1.0910375 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.011399013 = queryNorm
              0.15118122 = fieldWeight in 1604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.031508513 = weight(abstract_txt:independent in 1604) [ClassicSimilarity], result of:
            0.031508513 = score(doc=1604,freq=1.0), product of:
              0.08677307 = queryWeight, product of:
                1.3102518 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.011399013 = queryNorm
              0.36311397 = fieldWeight in 1604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.21246609 = weight(abstract_txt:chinese in 1604) [ClassicSimilarity], result of:
            0.21246609 = score(doc=1604,freq=7.0), product of:
              0.20398743 = queryWeight, product of:
                2.8410509 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.011399013 = queryNorm
              1.0415646 = fieldWeight in 1604, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.22039501 = weight(abstract_txt:character in 1604) [ClassicSimilarity], result of:
            0.22039501 = score(doc=1604,freq=2.0), product of:
              0.38245487 = queryWeight, product of:
                5.1461973 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011399013 = queryNorm
              0.5762641 = fieldWeight in 1604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.7630725 = weight(abstract_txt:segmentation in 1604) [ClassicSimilarity], result of:
            0.7630725 = score(doc=1604,freq=10.0), product of:
              0.48623994 = queryWeight, product of:
                5.3721514 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.011399013 = queryNorm
              1.5693332 = fieldWeight in 1604, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
          0.2468424 = weight(abstract_txt:words in 1604) [ClassicSimilarity], result of:
            0.2468424 = score(doc=1604,freq=4.0), product of:
              0.36870825 = queryWeight, product of:
                6.0393295 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.011399013 = queryNorm
              0.6694789 = fieldWeight in 1604, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=1604)
        0.24 = coord(6/25)
    
  3. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 0.31
    0.31201053 = sum of:
      0.31201053 = product of:
        0.9750329 = sum of:
          0.0045480225 = weight(abstract_txt:information in 4773) [ClassicSimilarity], result of:
            0.0045480225 = score(doc=4773,freq=1.0), product of:
              0.03008325 = queryWeight, product of:
                1.0910375 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.011399013 = queryNorm
              0.15118122 = fieldWeight in 4773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.018772993 = weight(abstract_txt:using in 4773) [ClassicSimilarity], result of:
            0.018772993 = score(doc=4773,freq=2.0), product of:
              0.06144059 = queryWeight, product of:
                1.5592114 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.011399013 = queryNorm
              0.3055471 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.015851635 = weight(abstract_txt:were in 4773) [ClassicSimilarity], result of:
            0.015851635 = score(doc=4773,freq=1.0), product of:
              0.06915534 = queryWeight, product of:
                1.6542082 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.011399013 = queryNorm
              0.2292178 = fieldWeight in 4773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.1372198 = weight(abstract_txt:characters in 4773) [ClassicSimilarity], result of:
            0.1372198 = score(doc=4773,freq=2.0), product of:
              0.21024771 = queryWeight, product of:
                2.4978914 = boost
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.011399013 = queryNorm
              0.65265775 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.1135679 = weight(abstract_txt:chinese in 4773) [ClassicSimilarity], result of:
            0.1135679 = score(doc=4773,freq=2.0), product of:
              0.20398743 = queryWeight, product of:
                2.8410509 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.011399013 = queryNorm
              0.5567397 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.22039501 = weight(abstract_txt:character in 4773) [ClassicSimilarity], result of:
            0.22039501 = score(doc=4773,freq=2.0), product of:
              0.38245487 = queryWeight, product of:
                5.1461973 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011399013 = queryNorm
              0.5762641 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.34125638 = weight(abstract_txt:segmentation in 4773) [ClassicSimilarity], result of:
            0.34125638 = score(doc=4773,freq=2.0), product of:
              0.48623994 = queryWeight, product of:
                5.3721514 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.011399013 = queryNorm
              0.7018271 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
          0.1234212 = weight(abstract_txt:words in 4773) [ClassicSimilarity], result of:
            0.1234212 = score(doc=4773,freq=1.0), product of:
              0.36870825 = queryWeight, product of:
                6.0393295 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.011399013 = queryNorm
              0.33473945 = fieldWeight in 4773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=4773)
        0.32 = coord(8/25)
    
  4. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.30
    0.3032624 = sum of:
      0.3032624 = product of:
        1.2635934 = sum of:
          0.097029045 = weight(abstract_txt:characters in 4913) [ClassicSimilarity], result of:
            0.097029045 = score(doc=4913,freq=1.0), product of:
              0.21024771 = queryWeight, product of:
                2.4978914 = boost
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.011399013 = queryNorm
              0.4614987 = fieldWeight in 4913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
          0.19670539 = weight(abstract_txt:chinese in 4913) [ClassicSimilarity], result of:
            0.19670539 = score(doc=4913,freq=6.0), product of:
              0.20398743 = queryWeight, product of:
                2.8410509 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.011399013 = queryNorm
              0.96430147 = fieldWeight in 4913, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
          0.084564485 = weight(abstract_txt:frequency in 4913) [ClassicSimilarity], result of:
            0.084564485 = score(doc=4913,freq=1.0), product of:
              0.22744253 = queryWeight, product of:
                3.3540392 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.011399013 = queryNorm
              0.37180594 = fieldWeight in 4913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
          0.1558428 = weight(abstract_txt:character in 4913) [ClassicSimilarity], result of:
            0.1558428 = score(doc=4913,freq=1.0), product of:
              0.38245487 = queryWeight, product of:
                5.1461973 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011399013 = queryNorm
              0.40748024 = fieldWeight in 4913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
          0.4826094 = weight(abstract_txt:segmentation in 4913) [ClassicSimilarity], result of:
            0.4826094 = score(doc=4913,freq=4.0), product of:
              0.48623994 = queryWeight, product of:
                5.3721514 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.011399013 = queryNorm
              0.99253345 = fieldWeight in 4913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
          0.2468424 = weight(abstract_txt:words in 4913) [ClassicSimilarity], result of:
            0.2468424 = score(doc=4913,freq=4.0), product of:
              0.36870825 = queryWeight, product of:
                6.0393295 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.011399013 = queryNorm
              0.6694789 = fieldWeight in 4913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=4913)
        0.24 = coord(6/25)
    
  5. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.26
    0.2562789 = sum of:
      0.2562789 = product of:
        0.9152818 = sum of:
          0.0064318753 = weight(abstract_txt:information in 1831) [ClassicSimilarity], result of:
            0.0064318753 = score(doc=1831,freq=2.0), product of:
              0.03008325 = queryWeight, product of:
                1.0910375 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.011399013 = queryNorm
              0.21380253 = fieldWeight in 1831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.024118368 = weight(abstract_txt:simple in 1831) [ClassicSimilarity], result of:
            0.024118368 = score(doc=1831,freq=1.0), product of:
              0.07261031 = queryWeight, product of:
                1.1985646 = boost
                5.314588 = idf(docFreq=593, maxDocs=44421)
                0.011399013 = queryNorm
              0.33216175 = fieldWeight in 1831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.314588 = idf(docFreq=593, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.026569655 = weight(abstract_txt:significantly in 1831) [ClassicSimilarity], result of:
            0.026569655 = score(doc=1831,freq=1.0), product of:
              0.077450395 = queryWeight, product of:
                1.2378674 = boost
                5.4888616 = idf(docFreq=498, maxDocs=44421)
                0.011399013 = queryNorm
              0.34305385 = fieldWeight in 1831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4888616 = idf(docFreq=498, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.027417121 = weight(abstract_txt:statistical in 1831) [ClassicSimilarity], result of:
            0.027417121 = score(doc=1831,freq=1.0), product of:
              0.079088666 = queryWeight, product of:
                1.2508909 = boost
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.011399013 = queryNorm
              0.3466631 = fieldWeight in 1831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.03170327 = weight(abstract_txt:were in 1831) [ClassicSimilarity], result of:
            0.03170327 = score(doc=1831,freq=4.0), product of:
              0.06915534 = queryWeight, product of:
                1.6542082 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.011399013 = queryNorm
              0.4584356 = fieldWeight in 1831, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.16060926 = weight(abstract_txt:chinese in 1831) [ClassicSimilarity], result of:
            0.16060926 = score(doc=1831,freq=4.0), product of:
              0.20398743 = queryWeight, product of:
                2.8410509 = boost
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.011399013 = queryNorm
              0.7873488 = fieldWeight in 1831, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2987905 = idf(docFreq=221, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.6384322 = weight(abstract_txt:segmentation in 1831) [ClassicSimilarity], result of:
            0.6384322 = score(doc=1831,freq=7.0), product of:
              0.48623994 = queryWeight, product of:
                5.3721514 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.011399013 = queryNorm
              1.3129983 = fieldWeight in 1831, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
        0.28 = coord(7/25)
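
The content-match scores above add one more step: the per-term scores are summed and then scaled by a coordination factor, coord(matched/total), the fraction of query terms found in the document. The following Python sketch replays that arithmetic for the first entry (doc=5580) using numbers copied from its tree. The variable names are illustrative assumptions, and small differences in the last digits are expected because the engine computes in 32-bit floats.

```python
import math

# One term recomputed from its parts: "segmentation" in doc 5580.
boost, term_idf, query_norm = 5.3721514, 7.9402676, 0.011399013
freq, field_norm = 9.0, 0.0625
query_weight = boost * term_idf * query_norm
field_weight = math.sqrt(freq) * term_idf * field_norm
segmentation_score = query_weight * field_weight

# Per-term scores as printed in the tree for doc 5580.
term_scores = [
    0.05603038,   # adjacent
    0.007877406,  # information
    0.022535497,  # automatic
    0.038773663,  # statistical
    0.097029045,  # characters
    0.2409139,    # chinese
    0.25102296,   # grams
    0.7239141,    # segmentation
    0.27597818,   # words
]

# 9 of the 25 query terms matched, so the sum is scaled by 9/25.
coord = len(term_scores) / 25
final_score = sum(term_scores) * coord
```

The coordination factor explains why the fifth entry, which matches seven terms, can outrank a hypothetical document with a larger raw sum but fewer matching terms.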