Document (#23481)

Author
Goller, C.
Löning, J.
Will, T.
Wolff, W.
Title
Automatic document classification : a thorough evaluation of various methods
Source
Informationskompetenz - Basiskompetenz in der Informationsgesellschaft: Proceedings des 7. Internationalen Symposiums für Informationswissenschaft (ISI 2000), ed. by G. Knorz and R. Kuhlen
Imprint
Konstanz : UVK, Universitätsverlag
Year
2000
Pages
pp. 245-264
Series
Schriften zur Informationswissenschaft; vol. 38
Abstract
(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
Theme
Automatic indexing
Computational linguistics

Similar documents (author)

  1. Will, L.; Will, S.: Dewey for Windows (1997) 2.35
    2.3481336 = sum of:
      2.3481336 = product of:
        4.696267 = sum of:
          4.696267 = weight(author_txt:will in 771) [ClassicSimilarity], result of:
            4.696267 = score(doc=771,freq=2.0), product of:
              0.7584222 = queryWeight, product of:
                1.0787244 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.08028673 = queryNorm
              6.1921544 = fieldWeight in 771, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.5 = fieldNorm(doc=771)
        0.5 = coord(1/2)
    
  2. Will, L.: The indexing of museum objects (1993) 2.08
    2.0754764 = sum of:
      2.0754764 = product of:
        4.150953 = sum of:
          4.150953 = weight(author_txt:will in 6100) [ClassicSimilarity], result of:
            4.150953 = score(doc=6100,freq=1.0), product of:
              0.7584222 = queryWeight, product of:
                1.0787244 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.08028673 = queryNorm
              5.4731426 = fieldWeight in 6100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.625 = fieldNorm(doc=6100)
        0.5 = coord(1/2)
    
  3. Will, L.D.: UML model : as given in British Standard Draft for Development DD8723-5:2008 (2008) 2.08
    2.0754764 = sum of:
      2.0754764 = product of:
        4.150953 = sum of:
          4.150953 = weight(author_txt:will in 7635) [ClassicSimilarity], result of:
            4.150953 = score(doc=7635,freq=1.0), product of:
              0.7584222 = queryWeight, product of:
                1.0787244 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.08028673 = queryNorm
              5.4731426 = fieldWeight in 7635, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.625 = fieldNorm(doc=7635)
        0.5 = coord(1/2)
    
  4. Will, L.: Museum objects as sources of information (1994) 2.08
    2.0754764 = sum of:
      2.0754764 = product of:
        4.150953 = sum of:
          4.150953 = weight(author_txt:will in 8293) [ClassicSimilarity], result of:
            4.150953 = score(doc=8293,freq=1.0), product of:
              0.7584222 = queryWeight, product of:
                1.0787244 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.08028673 = queryNorm
              5.4731426 = fieldWeight in 8293, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.625 = fieldNorm(doc=8293)
        0.5 = coord(1/2)
    
  5. Will, A.: Von Georgi bis UnCover oder Bibliographieunterricht im Lichte sich wandelnder Praxis und Erscheinungsformen (1995) 2.08
    2.0754764 = sum of:
      2.0754764 = product of:
        4.150953 = sum of:
          4.150953 = weight(author_txt:will in 2303) [ClassicSimilarity], result of:
            4.150953 = score(doc=2303,freq=1.0), product of:
              0.7584222 = queryWeight, product of:
                1.0787244 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.08028673 = queryNorm
              5.4731426 = fieldWeight in 2303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.625 = fieldNorm(doc=2303)
        0.5 = coord(1/2)
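
How the numbers in these explain trees combine can be checked by hand. The following is a minimal sketch, assuming the commonly documented TF-IDF formulas behind ClassicSimilarity in older Lucene releases (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical and chosen only for illustration; the input values are copied from the first entry above (doc 771).

```python
import math

# Hypothetical helper names; the formulas are an assumption based on the
# classic Lucene TF-IDF scoring model, not taken from this record itself.

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as laid out in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm           # boost * idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain tree for doc 771 ("Dewey for Windows").
raw = term_score(freq=2.0, doc_freq=18, max_docs=44421,
                 boost=1.0787244, query_norm=0.08028673, field_norm=0.5)

# coord(1/2): one of the two query clauses matched.
final = raw * (1 / 2)
print(final)
```

Under these assumptions the result agrees with the listed 2.3481336 up to single-precision rounding, which is the 2.35 shown next to the title.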
    

Similar documents (content)

  1. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.35
    0.35018685 = sum of:
      0.35018685 = product of:
        0.8754671 = sum of:
          0.047022276 = weight(abstract_txt:reduce in 1446) [ClassicSimilarity], result of:
            0.047022276 = score(doc=1446,freq=1.0), product of:
              0.11969968 = queryWeight, product of:
                1.0028114 = boost
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.01899079 = queryNorm
              0.39283544 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.089160725 = weight(abstract_txt:neural in 1446) [ClassicSimilarity], result of:
            0.089160725 = score(doc=1446,freq=2.0), product of:
              0.14554466 = queryWeight, product of:
                1.1057856 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.01899079 = queryNorm
              0.61260045 = fieldWeight in 1446, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.09480191 = weight(abstract_txt:gram in 1446) [ClassicSimilarity], result of:
            0.09480191 = score(doc=1446,freq=1.0), product of:
              0.19103016 = queryWeight, product of:
                1.2668458 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.01899079 = queryNorm
              0.49626672 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.031672996 = weight(abstract_txt:support in 1446) [ClassicSimilarity], result of:
            0.031672996 = score(doc=1446,freq=1.0), product of:
              0.11588484 = queryWeight, product of:
                1.3954077 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.01899079 = queryNorm
              0.2733144 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.03199021 = weight(abstract_txt:various in 1446) [ClassicSimilarity], result of:
            0.03199021 = score(doc=1446,freq=1.0), product of:
              0.11665732 = queryWeight, product of:
                1.4000508 = boost
                4.387581 = idf(docFreq=1500, maxDocs=44421)
                0.01899079 = queryNorm
              0.2742238 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.387581 = idf(docFreq=1500, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.04038735 = weight(abstract_txt:learning in 1446) [ClassicSimilarity], result of:
            0.04038735 = score(doc=1446,freq=1.0), product of:
              0.13626912 = queryWeight, product of:
                1.5131658 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.01899079 = queryNorm
              0.29637933 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.08318964 = weight(abstract_txt:selection in 1446) [ClassicSimilarity], result of:
            0.08318964 = score(doc=1446,freq=2.0), product of:
              0.1750935 = queryWeight, product of:
                1.7152318 = boost
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.01899079 = queryNorm
              0.47511548 = fieldWeight in 1446, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.19076455 = weight(abstract_txt:feature in 1446) [ClassicSimilarity], result of:
            0.19076455 = score(doc=1446,freq=6.0), product of:
              0.21111313 = queryWeight, product of:
                1.8834124 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.01899079 = queryNorm
              0.9036129 = fieldWeight in 1446, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.1049594 = weight(abstract_txt:vector in 1446) [ClassicSimilarity], result of:
            0.1049594 = score(doc=1446,freq=1.0), product of:
              0.25758156 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.01899079 = queryNorm
              0.40748024 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.16151817 = weight(abstract_txt:classifiers in 1446) [ClassicSimilarity], result of:
            0.16151817 = score(doc=1446,freq=1.0), product of:
              0.3433324 = queryWeight, product of:
                2.4018462 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01899079 = queryNorm
              0.47044253 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
        0.4 = coord(10/25)
    
  2. Malenica, M.; Smuc, T.; Snajder, J.; Basic, B.D.: Language morphology offset : text classification on a Croatian-English parallel corpus (2008) 0.32
    0.3235977 = sum of:
      0.3235977 = product of:
        1.0112429 = sum of:
          0.2748751 = weight(abstract_txt:morphological in 3035) [ClassicSimilarity], result of:
            0.2748751 = score(doc=3035,freq=5.0), product of:
              0.19575708 = queryWeight, product of:
                1.2824237 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.01899079 = queryNorm
              1.4041643 = fieldWeight in 3035, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.03959124 = weight(abstract_txt:support in 3035) [ClassicSimilarity], result of:
            0.03959124 = score(doc=3035,freq=1.0), product of:
              0.11588484 = queryWeight, product of:
                1.3954077 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.01899079 = queryNorm
              0.34164298 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.073529944 = weight(abstract_txt:selection in 3035) [ClassicSimilarity], result of:
            0.073529944 = score(doc=3035,freq=1.0), product of:
              0.1750935 = queryWeight, product of:
                1.7152318 = boost
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.01899079 = queryNorm
              0.41994673 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.16861363 = weight(abstract_txt:feature in 3035) [ClassicSimilarity], result of:
            0.16861363 = score(doc=3035,freq=3.0), product of:
              0.21111313 = queryWeight, product of:
                1.8834124 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.01899079 = queryNorm
              0.79868853 = fieldWeight in 3035, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.13119924 = weight(abstract_txt:vector in 3035) [ClassicSimilarity], result of:
            0.13119924 = score(doc=3035,freq=1.0), product of:
              0.25758156 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.01899079 = queryNorm
              0.5093503 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.16571367 = weight(abstract_txt:machines in 3035) [ClassicSimilarity], result of:
            0.16571367 = score(doc=3035,freq=1.0), product of:
              0.3009766 = queryWeight, product of:
                2.2488172 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.01899079 = queryNorm
              0.5505865 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.06735582 = weight(abstract_txt:methods in 3035) [ClassicSimilarity], result of:
            0.06735582 = score(doc=3035,freq=1.0), product of:
              0.20807534 = queryWeight, product of:
                2.6443145 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.01899079 = queryNorm
              0.3237088 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
          0.09036424 = weight(abstract_txt:classification in 3035) [ClassicSimilarity], result of:
            0.09036424 = score(doc=3035,freq=1.0), product of:
              0.2897335 = queryWeight, product of:
                3.8216226 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01899079 = queryNorm
              0.31188744 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=3035)
        0.32 = coord(8/25)
    
  3. Li, Y.; Shawe-Taylor, J.: Advanced learning algorithms for cross-language patent retrieval and classification (2007) 0.26
    0.2620169 = sum of:
      0.2620169 = product of:
        0.81880283 = sum of:
          0.03959124 = weight(abstract_txt:support in 1931) [ClassicSimilarity], result of:
            0.03959124 = score(doc=1931,freq=1.0), product of:
              0.11588484 = queryWeight, product of:
                1.3954077 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.01899079 = queryNorm
              0.34164298 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.1335686 = weight(abstract_txt:learning in 1931) [ClassicSimilarity], result of:
            0.1335686 = score(doc=1931,freq=7.0), product of:
              0.13626912 = queryWeight, product of:
                1.5131658 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.01899079 = queryNorm
              0.9801824 = fieldWeight in 1931, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.09734912 = weight(abstract_txt:feature in 1931) [ClassicSimilarity], result of:
            0.09734912 = score(doc=1931,freq=1.0), product of:
              0.21111313 = queryWeight, product of:
                1.8834124 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.01899079 = queryNorm
              0.46112302 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.05623084 = weight(abstract_txt:document in 1931) [ClassicSimilarity], result of:
            0.05623084 = score(doc=1931,freq=1.0), product of:
              0.16761287 = queryWeight, product of:
                2.0553563 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01899079 = queryNorm
              0.33548045 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.13119924 = weight(abstract_txt:vector in 1931) [ClassicSimilarity], result of:
            0.13119924 = score(doc=1931,freq=1.0), product of:
              0.25758156 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.01899079 = queryNorm
              0.5093503 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.16571367 = weight(abstract_txt:machines in 1931) [ClassicSimilarity], result of:
            0.16571367 = score(doc=1931,freq=1.0), product of:
              0.3009766 = queryWeight, product of:
                2.2488172 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.01899079 = queryNorm
              0.5505865 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.06735582 = weight(abstract_txt:methods in 1931) [ClassicSimilarity], result of:
            0.06735582 = score(doc=1931,freq=1.0), product of:
              0.20807534 = queryWeight, product of:
                2.6443145 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.01899079 = queryNorm
              0.3237088 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
          0.12779433 = weight(abstract_txt:classification in 1931) [ClassicSimilarity], result of:
            0.12779433 = score(doc=1931,freq=2.0), product of:
              0.2897335 = queryWeight, product of:
                3.8216226 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01899079 = queryNorm
              0.44107544 = fieldWeight in 1931, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=1931)
        0.32 = coord(8/25)
    
  4. Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.22
    0.22485442 = sum of:
      0.22485442 = product of:
        0.7026701 = sum of:
          0.031672996 = weight(abstract_txt:support in 4151) [ClassicSimilarity], result of:
            0.031672996 = score(doc=4151,freq=1.0), product of:
              0.11588484 = queryWeight, product of:
                1.3954077 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.01899079 = queryNorm
              0.2733144 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.04038735 = weight(abstract_txt:learning in 4151) [ClassicSimilarity], result of:
            0.04038735 = score(doc=4151,freq=1.0), product of:
              0.13626912 = queryWeight, product of:
                1.5131658 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.01899079 = queryNorm
              0.29637933 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.05312143 = weight(abstract_txt:automatic in 4151) [ClassicSimilarity], result of:
            0.05312143 = score(doc=4151,freq=1.0), product of:
              0.16358635 = queryWeight, product of:
                1.6579114 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.01899079 = queryNorm
              0.32473022 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.1049594 = weight(abstract_txt:vector in 4151) [ClassicSimilarity], result of:
            0.1049594 = score(doc=4151,freq=1.0), product of:
              0.25758156 = queryWeight, product of:
                2.0803921 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.01899079 = queryNorm
              0.40748024 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.13257092 = weight(abstract_txt:machines in 4151) [ClassicSimilarity], result of:
            0.13257092 = score(doc=4151,freq=1.0), product of:
              0.3009766 = queryWeight, product of:
                2.2488172 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.01899079 = queryNorm
              0.4404692 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.16151817 = weight(abstract_txt:classifiers in 4151) [ClassicSimilarity], result of:
            0.16151817 = score(doc=4151,freq=1.0), product of:
              0.3433324 = queryWeight, product of:
                2.4018462 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01899079 = queryNorm
              0.47044253 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.07620441 = weight(abstract_txt:methods in 4151) [ClassicSimilarity], result of:
            0.07620441 = score(doc=4151,freq=2.0), product of:
              0.20807534 = queryWeight, product of:
                2.6443145 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.01899079 = queryNorm
              0.3662347 = fieldWeight in 4151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
          0.102235466 = weight(abstract_txt:classification in 4151) [ClassicSimilarity], result of:
            0.102235466 = score(doc=4151,freq=2.0), product of:
              0.2897335 = queryWeight, product of:
                3.8216226 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01899079 = queryNorm
              0.35286036 = fieldWeight in 4151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=4151)
        0.32 = coord(8/25)
    
  5. Finn, A.; Kushmerick, N.: Learning to classify documents according to genre (2006) 0.22
    0.21561934 = sum of:
      0.21561934 = product of:
        0.77006906 = sum of:
          0.058777843 = weight(abstract_txt:reduce in 10) [ClassicSimilarity], result of:
            0.058777843 = score(doc=10,freq=1.0), product of:
              0.11969968 = queryWeight, product of:
                1.0028114 = boost
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.01899079 = queryNorm
              0.49104428 = fieldWeight in 10, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.050484188 = weight(abstract_txt:learning in 10) [ClassicSimilarity], result of:
            0.050484188 = score(doc=10,freq=1.0), product of:
              0.13626912 = queryWeight, product of:
                1.5131658 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.01899079 = queryNorm
              0.37047416 = fieldWeight in 10, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.093906306 = weight(abstract_txt:automatic in 10) [ClassicSimilarity], result of:
            0.093906306 = score(doc=10,freq=2.0), product of:
              0.16358635 = queryWeight, product of:
                1.6579114 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.01899079 = queryNorm
              0.5740473 = fieldWeight in 10, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.09734912 = weight(abstract_txt:feature in 10) [ClassicSimilarity], result of:
            0.09734912 = score(doc=10,freq=1.0), product of:
              0.21111313 = queryWeight, product of:
                1.8834124 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.01899079 = queryNorm
              0.46112302 = fieldWeight in 10, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.05623084 = weight(abstract_txt:document in 10) [ClassicSimilarity], result of:
            0.05623084 = score(doc=10,freq=1.0), product of:
              0.16761287 = queryWeight, product of:
                2.0553563 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01899079 = queryNorm
              0.33548045 = fieldWeight in 10, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.28552648 = weight(abstract_txt:classifiers in 10) [ClassicSimilarity], result of:
            0.28552648 = score(doc=10,freq=2.0), product of:
              0.3433324 = queryWeight, product of:
                2.4018462 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.01899079 = queryNorm
              0.83163273 = fieldWeight in 10, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
          0.12779433 = weight(abstract_txt:classification in 10) [ClassicSimilarity], result of:
            0.12779433 = score(doc=10,freq=2.0), product of:
              0.2897335 = queryWeight, product of:
                3.8216226 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01899079 = queryNorm
              0.44107544 = fieldWeight in 10, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=10)
        0.28 = coord(7/25)
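
For the content-based matches, several query terms contribute to one document. As a small sketch (variable names are illustrative only), the per-term scores listed for entry 5 (doc 10) can be summed and scaled by the coordination factor, the fraction of query terms that matched, to recover the document score:

```python
# Per-term scores copied from the explain tree of entry 5 (doc 10).
term_scores = {
    "reduce": 0.058777843,
    "learning": 0.050484188,
    "automatic": 0.093906306,
    "feature": 0.09734912,
    "document": 0.05623084,
    "classifiers": 0.28552648,
    "classification": 0.12779433,
}

# coord(7/25): 7 of the 25 query terms occur in the document's abstract.
matched_terms = len(term_scores)
query_terms = 25
coord = matched_terms / query_terms

# Document score = (sum of matching term scores) * coord.
summed = sum(term_scores.values())
doc_score = summed * coord
print(summed, coord, doc_score)
```

The coordination factor explains why a document with a larger raw sum can still rank lower: entry 2 sums to 1.0112429 but is scaled by 8/25, while entry 1 sums to 0.8754671 and is scaled by 10/25.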