Document (#28390)

Author
Sebastiani, F.
Title
Machine learning in automated text categorization
Source
ACM computing surveys. 34(2002) no.1, pp.1-47
Year
2002
Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Theme
Automatic classification
Computational linguistics

Similar documents (author)

  1. Sebastiani, F.: On the role of logic in information retrieval (1998) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:sebastiani in 2140) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 2140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=2140)
    
  2. Sebastiani, F.: A tutorial on automated text categorisation (1999) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:sebastiani in 4390) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 4390, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=4390)
    
  3. Sebastiani, F.: Classification of text, automatic (2006) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:sebastiani in 3) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 3, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=3)
    
  4. Debole, F.; Sebastiani, F.: An analysis of the relative hardness of Reuters-21578 subsets (2005) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:sebastiani in 4456) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 4456, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=4456)
    
  5. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:sebastiani in 172) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 172, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=172)
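
The author scores above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: the function names are hypothetical, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is an assumption inferred from the listed values (it matches `idf(docFreq=8, maxDocs=44421) = 9.504243`). The `fieldNorm` values are taken from the listing as given rather than recomputed.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # tf is the square root of the raw term frequency (tf(freq=2.0) = 1.4142135).
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the "product of" lines.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 2140): freq=1, docFreq=8, maxDocs=44421, fieldNorm=0.625
single_author = field_weight(1.0, 8, 44421, 0.625)
# Entry 4 (doc 4456): same term statistics, fieldNorm=0.5
two_authors = field_weight(1.0, 8, 44421, 0.5)

print(f"{single_author:.2f} {two_authors:.2f}")
```

Under these assumptions the only factor that differs between the 5.94 and 4.75 entries is `fieldNorm`: the records with two authors carry the smaller norm (0.5 instead of 0.625) in the author field.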
    

Similar documents (content)

  1. Sebastiani, F.: A tutorial on automated text categorisation (1999) 0.87
    0.8697428 = sum of:
      0.8697428 = product of:
        1.4495713 = sum of:
          0.057567574 = weight(abstract_txt:dominant in 4390) [ClassicSimilarity], result of:
            0.057567574 = score(doc=4390,freq=1.0), product of:
              0.13034035 = queryWeight, product of:
                1.089448 = boost
                7.0667386 = idf(docFreq=102, maxDocs=44421)
                0.016929861 = queryNorm
              0.44167116 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0667386 = idf(docFreq=102, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.057806335 = weight(abstract_txt:builds in 4390) [ClassicSimilarity], result of:
            0.057806335 = score(doc=4390,freq=1.0), product of:
              0.1307005 = queryWeight, product of:
                1.090952 = boost
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.016929861 = queryNorm
              0.44228092 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.009642089 = weight(abstract_txt:this in 4390) [ClassicSimilarity], result of:
            0.009642089 = score(doc=4390,freq=2.0), product of:
              0.045335468 = queryWeight, product of:
                1.1128758 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.016929861 = queryNorm
              0.21268311 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.073781155 = weight(abstract_txt:inductive in 4390) [ClassicSimilarity], result of:
            0.073781155 = score(doc=4390,freq=1.0), product of:
              0.15378852 = queryWeight, product of:
                1.183393 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.016929861 = queryNorm
              0.47975725 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.030443486 = weight(abstract_txt:text in 4390) [ClassicSimilarity], result of:
            0.030443486 = score(doc=4390,freq=2.0), product of:
              0.08523603 = queryWeight, product of:
                1.24593 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016929861 = queryNorm
              0.3571669 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.039614417 = weight(abstract_txt:documents in 4390) [ClassicSimilarity], result of:
            0.039614417 = score(doc=4390,freq=3.0), product of:
              0.08874939 = queryWeight, product of:
                1.2713487 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016929861 = queryNorm
              0.4463627 = fieldWeight in 4390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.10413449 = weight(abstract_txt:savings in 4390) [ClassicSimilarity], result of:
            0.10413449 = score(doc=4390,freq=1.0), product of:
              0.19350402 = queryWeight, product of:
                1.3274312 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.016929861 = queryNorm
              0.53815156 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.11158624 = weight(abstract_txt:witnessed in 4390) [ClassicSimilarity], result of:
            0.11158624 = score(doc=4390,freq=1.0), product of:
              0.20262858 = queryWeight, product of:
                1.3583678 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.016929861 = queryNorm
              0.5506935 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.13136211 = weight(abstract_txt:booming in 4390) [ClassicSimilarity], result of:
            0.13136211 = score(doc=4390,freq=1.0), product of:
              0.22591256 = queryWeight, product of:
                1.4342908 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.016929861 = queryNorm
              0.5814733 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.064051315 = weight(abstract_txt:categories in 4390) [ClassicSimilarity], result of:
            0.064051315 = score(doc=4390,freq=2.0), product of:
              0.13995196 = queryWeight, product of:
                1.5965096 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.016929861 = queryNorm
              0.45766646 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.081076905 = weight(abstract_txt:automated in 4390) [ClassicSimilarity], result of:
            0.081076905 = score(doc=4390,freq=2.0), product of:
              0.16376649 = queryWeight, product of:
                1.7270088 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.016929861 = queryNorm
              0.49507627 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.03623851 = weight(abstract_txt:approach in 4390) [ClassicSimilarity], result of:
            0.03623851 = score(doc=4390,freq=2.0), product of:
              0.109589994 = queryWeight, product of:
                1.7302669 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.016929861 = queryNorm
              0.33067352 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.1015826 = weight(abstract_txt:machine in 4390) [ClassicSimilarity], result of:
            0.1015826 = score(doc=4390,freq=2.0), product of:
              0.21787308 = queryWeight, product of:
                2.4396608 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.016929861 = queryNorm
              0.4662467 = fieldWeight in 4390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.12051612 = weight(abstract_txt:learning in 4390) [ClassicSimilarity], result of:
            0.12051612 = score(doc=4390,freq=3.0), product of:
              0.23476677 = queryWeight, product of:
                2.9242556 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016929861 = queryNorm
              0.51334405 = fieldWeight in 4390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
          0.43016782 = weight(abstract_txt:classifier in 4390) [ClassicSimilarity], result of:
            0.43016782 = score(doc=4390,freq=3.0), product of:
              0.54831713 = queryWeight, product of:
                4.4690266 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.016929861 = queryNorm
              0.7845238 = fieldWeight in 4390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=4390)
        0.6 = coord(15/25)
    
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.45
    0.44682795 = sum of:
      0.44682795 = product of:
        1.1170698 = sum of:
          0.08670951 = weight(abstract_txt:builds in 3) [ClassicSimilarity], result of:
            0.08670951 = score(doc=3,freq=1.0), product of:
              0.1307005 = queryWeight, product of:
                1.090952 = boost
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.016929861 = queryNorm
              0.6634214 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.010226979 = weight(abstract_txt:this in 3) [ClassicSimilarity], result of:
            0.010226979 = score(doc=3,freq=1.0), product of:
              0.045335468 = queryWeight, product of:
                1.1128758 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.016929861 = queryNorm
              0.2255845 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.04566523 = weight(abstract_txt:text in 3) [ClassicSimilarity], result of:
            0.04566523 = score(doc=3,freq=2.0), product of:
              0.08523603 = queryWeight, product of:
                1.24593 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016929861 = queryNorm
              0.5357503 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.13368802 = weight(abstract_txt:predefined in 3) [ClassicSimilarity], result of:
            0.13368802 = score(doc=3,freq=1.0), product of:
              0.17443264 = queryWeight, product of:
                1.2603202 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.016929861 = queryNorm
              0.7664163 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.09607698 = weight(abstract_txt:categories in 3) [ClassicSimilarity], result of:
            0.09607698 = score(doc=3,freq=2.0), product of:
              0.13995196 = queryWeight, product of:
                1.5965096 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.016929861 = queryNorm
              0.6864997 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.12161535 = weight(abstract_txt:automated in 3) [ClassicSimilarity], result of:
            0.12161535 = score(doc=3,freq=2.0), product of:
              0.16376649 = queryWeight, product of:
                1.7270088 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.016929861 = queryNorm
              0.7426144 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.038436744 = weight(abstract_txt:approach in 3) [ClassicSimilarity], result of:
            0.038436744 = score(doc=3,freq=1.0), product of:
              0.109589994 = queryWeight, product of:
                1.7302669 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.016929861 = queryNorm
              0.35073224 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.10774463 = weight(abstract_txt:machine in 3) [ClassicSimilarity], result of:
            0.10774463 = score(doc=3,freq=1.0), product of:
              0.21787308 = queryWeight, product of:
                2.4396608 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.016929861 = queryNorm
              0.4945293 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.10437003 = weight(abstract_txt:learning in 3) [ClassicSimilarity], result of:
            0.10437003 = score(doc=3,freq=1.0), product of:
              0.23476677 = queryWeight, product of:
                2.9242556 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016929861 = queryNorm
              0.444569 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.37253627 = weight(abstract_txt:classifier in 3) [ClassicSimilarity], result of:
            0.37253627 = score(doc=3,freq=1.0), product of:
              0.54831713 = queryWeight, product of:
                4.4690266 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.016929861 = queryNorm
              0.67941755 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
        0.4 = coord(10/25)
    
  3. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.35
    0.3543019 = sum of:
      0.3543019 = product of:
        0.8857547 = sum of:
          0.008522483 = weight(abstract_txt:this in 797) [ClassicSimilarity], result of:
            0.008522483 = score(doc=797,freq=1.0), product of:
              0.045335468 = queryWeight, product of:
                1.1128758 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.016929861 = queryNorm
              0.18798709 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.019989826 = weight(abstract_txt:different in 797) [ClassicSimilarity], result of:
            0.019989826 = score(doc=797,freq=1.0), product of:
              0.06991488 = queryWeight, product of:
                1.1284097 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.016929861 = queryNorm
              0.28591663 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.03805436 = weight(abstract_txt:text in 797) [ClassicSimilarity], result of:
            0.03805436 = score(doc=797,freq=2.0), product of:
              0.08523603 = queryWeight, product of:
                1.24593 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016929861 = queryNorm
              0.4464586 = fieldWeight in 797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.028589241 = weight(abstract_txt:documents in 797) [ClassicSimilarity], result of:
            0.028589241 = score(doc=797,freq=1.0), product of:
              0.08874939 = queryWeight, product of:
                1.2713487 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016929861 = queryNorm
              0.32213452 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.13948281 = weight(abstract_txt:witnessed in 797) [ClassicSimilarity], result of:
            0.13948281 = score(doc=797,freq=1.0), product of:
              0.20262858 = queryWeight, product of:
                1.3583678 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.016929861 = queryNorm
              0.6883669 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.16420265 = weight(abstract_txt:booming in 797) [ClassicSimilarity], result of:
            0.16420265 = score(doc=797,freq=1.0), product of:
              0.22591256 = queryWeight, product of:
                1.4342908 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.016929861 = queryNorm
              0.7268416 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.08006415 = weight(abstract_txt:categories in 797) [ClassicSimilarity], result of:
            0.08006415 = score(doc=797,freq=2.0), product of:
              0.13995196 = queryWeight, product of:
                1.5965096 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.016929861 = queryNorm
              0.57208306 = fieldWeight in 797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.07166254 = weight(abstract_txt:automated in 797) [ClassicSimilarity], result of:
            0.07166254 = score(doc=797,freq=1.0), product of:
              0.16376649 = queryWeight, product of:
                1.7270088 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.016929861 = queryNorm
              0.43758973 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.03203062 = weight(abstract_txt:approach in 797) [ClassicSimilarity], result of:
            0.03203062 = score(doc=797,freq=1.0), product of:
              0.109589994 = queryWeight, product of:
                1.7302669 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.016929861 = queryNorm
              0.29227686 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
          0.30315605 = weight(abstract_txt:categorization in 797) [ClassicSimilarity], result of:
            0.30315605 = score(doc=797,freq=3.0), product of:
              0.33998865 = queryWeight, product of:
                3.0476134 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.016929861 = queryNorm
              0.89166516 = fieldWeight in 797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.078125 = fieldNorm(doc=797)
        0.4 = coord(10/25)
    
  4. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.32
    0.31607974 = sum of:
      0.31607974 = product of:
        1.1288562 = sum of:
          0.008522483 = weight(abstract_txt:this in 115) [ClassicSimilarity], result of:
            0.008522483 = score(doc=115,freq=1.0), product of:
              0.045335468 = queryWeight, product of:
                1.1128758 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.016929861 = queryNorm
              0.18798709 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.026908495 = weight(abstract_txt:text in 115) [ClassicSimilarity], result of:
            0.026908495 = score(doc=115,freq=1.0), product of:
              0.08523603 = queryWeight, product of:
                1.24593 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016929861 = queryNorm
              0.3156939 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.057178482 = weight(abstract_txt:documents in 115) [ClassicSimilarity], result of:
            0.057178482 = score(doc=115,freq=4.0), product of:
              0.08874939 = queryWeight, product of:
                1.2713487 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016929861 = queryNorm
              0.64426905 = fieldWeight in 115, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.08006415 = weight(abstract_txt:categories in 115) [ClassicSimilarity], result of:
            0.08006415 = score(doc=115,freq=2.0), product of:
              0.13995196 = queryWeight, product of:
                1.5965096 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.016929861 = queryNorm
              0.57208306 = fieldWeight in 115, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.08697502 = weight(abstract_txt:learning in 115) [ClassicSimilarity], result of:
            0.08697502 = score(doc=115,freq=1.0), product of:
              0.23476677 = queryWeight, product of:
                2.9242556 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016929861 = queryNorm
              0.37047416 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.17502722 = weight(abstract_txt:categorization in 115) [ClassicSimilarity], result of:
            0.17502722 = score(doc=115,freq=1.0), product of:
              0.33998865 = queryWeight, product of:
                3.0476134 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.016929861 = queryNorm
              0.5148031 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
          0.69418037 = weight(abstract_txt:classifier in 115) [ClassicSimilarity], result of:
            0.69418037 = score(doc=115,freq=5.0), product of:
              0.54831713 = queryWeight, product of:
                4.4690266 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.016929861 = queryNorm
              1.2660198 = fieldWeight in 115, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.078125 = fieldNorm(doc=115)
        0.28 = coord(7/25)
    
  5. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.31
    0.30614406 = sum of:
      0.30614406 = product of:
        1.0933716 = sum of:
          0.010226979 = weight(abstract_txt:this in 2595) [ClassicSimilarity], result of:
            0.010226979 = score(doc=2595,freq=1.0), product of:
              0.045335468 = queryWeight, product of:
                1.1128758 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.016929861 = queryNorm
              0.2255845 = fieldWeight in 2595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.04566523 = weight(abstract_txt:text in 2595) [ClassicSimilarity], result of:
            0.04566523 = score(doc=2595,freq=2.0), product of:
              0.08523603 = queryWeight, product of:
                1.24593 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016929861 = queryNorm
              0.5357503 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.06793668 = weight(abstract_txt:categories in 2595) [ClassicSimilarity], result of:
            0.06793668 = score(doc=2595,freq=1.0), product of:
              0.13995196 = queryWeight, product of:
                1.5965096 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.016929861 = queryNorm
              0.4854286 = fieldWeight in 2595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.15237391 = weight(abstract_txt:machine in 2595) [ClassicSimilarity], result of:
            0.15237391 = score(doc=2595,freq=2.0), product of:
              0.21787308 = queryWeight, product of:
                2.4396608 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.016929861 = queryNorm
              0.69937 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.1476015 = weight(abstract_txt:learning in 2595) [ClassicSimilarity], result of:
            0.1476015 = score(doc=2595,freq=2.0), product of:
              0.23476677 = queryWeight, product of:
                2.9242556 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016929861 = queryNorm
              0.62871546 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.29703102 = weight(abstract_txt:categorization in 2595) [ClassicSimilarity], result of:
            0.29703102 = score(doc=2595,freq=2.0), product of:
              0.33998865 = queryWeight, product of:
                3.0476134 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.016929861 = queryNorm
              0.87364984 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.37253627 = weight(abstract_txt:classifier in 2595) [ClassicSimilarity], result of:
            0.37253627 = score(doc=2595,freq=1.0), product of:
              0.54831713 = queryWeight, product of:
                4.4690266 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.016929861 = queryNorm
              0.67941755 = fieldWeight in 2595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
        0.28 = coord(7/25)
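
The content scores follow the same pattern with two extra factors: a query-side weight per term and a final coordination factor. The sketch below is an illustration under stated assumptions, with hypothetical function names; it takes `boost`, `idf`, `queryNorm`, and `fieldNorm` directly from the listing instead of deriving them.

```python
import math


def term_score(boost: float, idf: float, query_norm: float,
               freq: float, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf * field_norm
    # Per-term score is the product of the two weights.
    return query_weight * field_weight


def coord(matched_terms: int, query_terms: int) -> float:
    # Fraction of query terms found in the document, e.g. coord(15/25) = 0.6.
    return matched_terms / query_terms


# "classifier" in doc 4390, values copied from the first content entry.
classifier_4390 = term_score(
    boost=4.4690266, idf=7.2471204, query_norm=0.016929861,
    freq=3.0, field_norm=0.0625,
)

# Final score of doc 4390: listed sum of term scores times coord(15/25).
total_4390 = 1.4495713 * coord(15, 25)

print(f"{classifier_4390:.4f} {total_4390:.4f}")
```

Read this way, the ranking reflects both how heavily a shared term is weighted in the query (the high boost on "classifier") and how many of the 25 query terms a record matches at all.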