Document (#37473)

Author
Wartena, C.
Sommer, M.
Title
Automatic classification of scientific records using the German Subject Heading Authority File (SWD)
Source
Proceedings of the 2nd International Workshop on Semantic Digital Archives held in conjunction with the 16th Int. Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al.
Year
2012
Pages
pp. 37-48
Abstract
This paper presents an automatic text classification method that requires no training documents. The method uses the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library. The SWD was recently enriched with notations of the Dewey Decimal Classification (DDC), which makes it possible to use the subject headings as textual representations of the DDC notations. In essence, we derive the classification of a text from the classification of its words as given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories, with mean reciprocal rank and recall as evaluation measures. A direct comparison with a machine learning method showed that this method is competitive. We conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles.
Content
This work is partially based on the Bachelor thesis of Maike Sommer. See also: http://sda2012.dke-research.de.
Theme
Automatisches Klassifizieren
Object
DDC
SWD
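
The abstract describes deriving a text's class from the classes of its words and evaluating with mean reciprocal rank and recall. The following is only a minimal sketch of that general idea, not the authors' implementation: the toy thesaurus, the simple vote counting, the function names, and the assumption of exactly one correct DDC class per record are all hypothetical.

```python
from collections import Counter

# Hypothetical toy thesaurus: subject heading -> DDC notations (not real SWD data).
TOY_THESAURUS = {
    "bibliothek": ["020"],
    "klassifikation": ["025"],
    "katalog": ["025"],
    "algorithmus": ["004"],
}


def classify(text, thesaurus):
    """Rank DDC notations by how many words of the text map to them."""
    votes = Counter()
    for word in text.lower().split():
        for notation in thesaurus.get(word, []):
            votes[notation] += 1
    # Sort by descending vote count, then by notation for a stable order.
    return [n for n, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the correct class; a missing class contributes 0."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        if correct in ranked:
            total += 1.0 / (ranked.index(correct) + 1)
    return total / len(gold)


def recall_at_k(rankings, gold, k):
    """Fraction of records whose correct class appears in the top k."""
    hits = sum(1 for ranked, correct in zip(rankings, gold) if correct in ranked[:k])
    return hits / len(gold)
```

A real system would need lemmatisation, compound splitting, and weighting of the votes; the paper itself should be consulted for the actual scoring scheme.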

Similar documents (author)

  1. Sommer, F.T.: Theorie neuronaler Assoziativspeicher : lokales Lernen und iteratives Retrieval von Information (1994) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:sommer in 3169) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 3169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=3169)
    
  2. Sommer, D.: Zwölf Jahre Projektarbeit am VD 17 in der Universitäts- und Landesbibliothek Sachsen-Anhalt - eine Bilanz (2008) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:sommer in 3336) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 3336, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=3336)
    
  3. Sommer, M.: Automatische Generierung von DDC-Notationen für Hochschulveröffentlichungen (2012) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:sommer in 1587) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 1587, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=1587)
    
  4. Sommer, D.: VD16, VD17, VD18 : Diversität und Integration (2010) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:sommer in 3867) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 3867, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=3867)
    
  5. Sommer, D.; Schöning-Walter, C.; Heiligenhaus, K.: URN Granular : persistente Identifizierung und Adressierung von Einzelseiten digitalisierter Drucke (2008) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:sommer in 916) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 916, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=916)
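
The score trees above can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative, and small differences in the last digits are expected because Lucene computes in single precision.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the trees above."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=7, maxDocs=44421.
print(round(idf(7, 44421), 4))
print(round(field_weight(1.0, 7, 44421, 0.625), 4))
print(round(field_weight(1.0, 7, 44421, 0.375), 4))
```

The lower score of the fifth hit comes only from its smaller fieldNorm (0.375 instead of 0.625), which reflects a longer author field with three names.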
    

Similar documents (content)

  1. Jacobs, J.-H.; Mengel, T.; Müller, K.: Benefits of the CrissCross project for conceptual interoperability and retrieval (2010) 0.26
    0.26292592 = sum of:
      0.26292592 = product of:
        1.0955247 = sum of:
          0.10410389 = weight(abstract_txt:authority in 25) [ClassicSimilarity], result of:
            0.10410389 = score(doc=25,freq=1.0), product of:
              0.15650764 = queryWeight, product of:
                1.6434757 = boost
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.01789579 = queryNorm
              0.6651681 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
          0.120251924 = weight(abstract_txt:file in 25) [ClassicSimilarity], result of:
            0.120251924 = score(doc=25,freq=1.0), product of:
              0.17230013 = queryWeight, product of:
                1.7244011 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.01789579 = queryNorm
              0.6979213 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
          0.06194495 = weight(abstract_txt:subject in 25) [ClassicSimilarity], result of:
            0.06194495 = score(doc=25,freq=1.0), product of:
              0.12674338 = queryWeight, product of:
                1.8113558 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.01789579 = queryNorm
              0.4887431 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
          0.33712792 = weight(abstract_txt:notations in 25) [ClassicSimilarity], result of:
            0.33712792 = score(doc=25,freq=1.0), product of:
              0.34257373 = queryWeight, product of:
                2.4314902 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.01789579 = queryNorm
              0.98410326 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
          0.1318705 = weight(abstract_txt:classification in 25) [ClassicSimilarity], result of:
            0.1318705 = score(doc=25,freq=1.0), product of:
              0.26425898 = queryWeight, product of:
                3.6988866 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01789579 = queryNorm
              0.49901992 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
          0.34022546 = weight(abstract_txt:german in 25) [ClassicSimilarity], result of:
            0.34022546 = score(doc=25,freq=1.0), product of:
              0.43425563 = queryWeight, product of:
                3.8715353 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.01789579 = queryNorm
              0.7834682 = fieldWeight in 25, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.125 = fieldNorm(doc=25)
        0.24 = coord(6/25)
    
  2. Jahns, Y.: 20 years SWD : German subject authority data prepared for the future (2011) 0.24
    0.24456643 = sum of:
      0.24456643 = product of:
        0.87345153 = sum of:
          0.014246091 = weight(abstract_txt:with in 2802) [ClassicSimilarity], result of:
            0.014246091 = score(doc=2802,freq=2.0), product of:
              0.051656123 = queryWeight, product of:
                1.1563839 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.01789579 = queryNorm
              0.2757871 = fieldWeight in 2802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.092015706 = weight(abstract_txt:authority in 2802) [ClassicSimilarity], result of:
            0.092015706 = score(doc=2802,freq=2.0), product of:
              0.15650764 = queryWeight, product of:
                1.6434757 = boost
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.01789579 = queryNorm
              0.5879311 = fieldWeight in 2802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.10628869 = weight(abstract_txt:file in 2802) [ClassicSimilarity], result of:
            0.10628869 = score(doc=2802,freq=2.0), product of:
              0.17230013 = queryWeight, product of:
                1.7244011 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.01789579 = queryNorm
              0.6168811 = fieldWeight in 2802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.06705737 = weight(abstract_txt:subject in 2802) [ClassicSimilarity], result of:
            0.06705737 = score(doc=2802,freq=3.0), product of:
              0.12674338 = queryWeight, product of:
                1.8113558 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.01789579 = queryNorm
              0.5290799 = fieldWeight in 2802, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.21070497 = weight(abstract_txt:notations in 2802) [ClassicSimilarity], result of:
            0.21070497 = score(doc=2802,freq=1.0), product of:
              0.34257373 = queryWeight, product of:
                2.4314902 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.01789579 = queryNorm
              0.61506456 = fieldWeight in 2802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.08241906 = weight(abstract_txt:classification in 2802) [ClassicSimilarity], result of:
            0.08241906 = score(doc=2802,freq=1.0), product of:
              0.26425898 = queryWeight, product of:
                3.6988866 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01789579 = queryNorm
              0.31188744 = fieldWeight in 2802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
          0.30071968 = weight(abstract_txt:german in 2802) [ClassicSimilarity], result of:
            0.30071968 = score(doc=2802,freq=2.0), product of:
              0.43425563 = queryWeight, product of:
                3.8715353 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.01789579 = queryNorm
              0.6924946 = fieldWeight in 2802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.078125 = fieldNorm(doc=2802)
        0.28 = coord(7/25)
    
  3. Jacobs, J.-H.; Mengel, T.; Müller, K.: Insights and Outlooks : a retrospective view on the CrissCross project (2011) 0.23
    0.23006018 = sum of:
      0.23006018 = product of:
        0.9585841 = sum of:
          0.0910909 = weight(abstract_txt:authority in 785) [ClassicSimilarity], result of:
            0.0910909 = score(doc=785,freq=1.0), product of:
              0.15650764 = queryWeight, product of:
                1.6434757 = boost
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.01789579 = queryNorm
              0.5820221 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
          0.10522044 = weight(abstract_txt:file in 785) [ClassicSimilarity], result of:
            0.10522044 = score(doc=785,freq=1.0), product of:
              0.17230013 = queryWeight, product of:
                1.7244011 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.01789579 = queryNorm
              0.6106811 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
          0.05420183 = weight(abstract_txt:subject in 785) [ClassicSimilarity], result of:
            0.05420183 = score(doc=785,freq=1.0), product of:
              0.12674338 = queryWeight, product of:
                1.8113558 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.01789579 = queryNorm
              0.4276502 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
          0.29498693 = weight(abstract_txt:notations in 785) [ClassicSimilarity], result of:
            0.29498693 = score(doc=785,freq=1.0), product of:
              0.34257373 = queryWeight, product of:
                2.4314902 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.01789579 = queryNorm
              0.86109036 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
          0.11538669 = weight(abstract_txt:classification in 785) [ClassicSimilarity], result of:
            0.11538669 = score(doc=785,freq=1.0), product of:
              0.26425898 = queryWeight, product of:
                3.6988866 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01789579 = queryNorm
              0.43664244 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
          0.29769728 = weight(abstract_txt:german in 785) [ClassicSimilarity], result of:
            0.29769728 = score(doc=785,freq=1.0), product of:
              0.43425563 = queryWeight, product of:
                3.8715353 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.01789579 = queryNorm
              0.68553466 = fieldWeight in 785, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.109375 = fieldNorm(doc=785)
        0.24 = coord(6/25)
    
  4. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.19
    0.18654266 = sum of:
      0.18654266 = product of:
        0.7772611 = sum of:
          0.010073508 = weight(abstract_txt:with in 3166) [ClassicSimilarity], result of:
            0.010073508 = score(doc=3166,freq=1.0), product of:
              0.051656123 = queryWeight, product of:
                1.1563839 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.01789579 = queryNorm
              0.19501092 = fieldWeight in 3166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.074786454 = weight(abstract_txt:records in 3166) [ClassicSimilarity], result of:
            0.074786454 = score(doc=3166,freq=4.0), product of:
              0.108184785 = queryWeight, product of:
                1.3664024 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01789579 = queryNorm
              0.6912844 = fieldWeight in 3166, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.085649684 = weight(abstract_txt:automatic in 3166) [ClassicSimilarity], result of:
            0.085649684 = score(doc=3166,freq=2.0), product of:
              0.14920318 = queryWeight, product of:
                1.6046656 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.01789579 = queryNorm
              0.5740473 = fieldWeight in 3166, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.038715594 = weight(abstract_txt:subject in 3166) [ClassicSimilarity], result of:
            0.038715594 = score(doc=3166,freq=1.0), product of:
              0.12674338 = queryWeight, product of:
                1.8113558 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.01789579 = queryNorm
              0.30546445 = fieldWeight in 3166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.14275399 = weight(abstract_txt:classification in 3166) [ClassicSimilarity], result of:
            0.14275399 = score(doc=3166,freq=3.0), product of:
              0.26425898 = queryWeight, product of:
                3.6988866 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01789579 = queryNorm
              0.5402049 = fieldWeight in 3166, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.42528185 = weight(abstract_txt:german in 3166) [ClassicSimilarity], result of:
            0.42528185 = score(doc=3166,freq=4.0), product of:
              0.43425563 = queryWeight, product of:
                3.8715353 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.01789579 = queryNorm
              0.97933525 = fieldWeight in 3166, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
        0.24 = coord(6/25)
    
  5. Rolland-Thomas, P.; Mercure, G.: Subject access in a bilingual online catalogue (1989) 0.19
    0.18653686 = sum of:
      0.18653686 = product of:
        0.5829277 = sum of:
          0.014246091 = weight(abstract_txt:with in 575) [ClassicSimilarity], result of:
            0.014246091 = score(doc=575,freq=2.0), product of:
              0.051656123 = queryWeight, product of:
                1.1563839 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.01789579 = queryNorm
              0.2757871 = fieldWeight in 575, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.1299572 = weight(abstract_txt:reciprocal in 575) [ClassicSimilarity], result of:
            0.1299572 = score(doc=575,freq=1.0), product of:
              0.1970128 = queryWeight, product of:
                1.3038503 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.01789579 = queryNorm
              0.65963835 = fieldWeight in 575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.064766966 = weight(abstract_txt:records in 575) [ClassicSimilarity], result of:
            0.064766966 = score(doc=575,freq=3.0), product of:
              0.108184785 = queryWeight, product of:
                1.3664024 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01789579 = queryNorm
              0.5986698 = fieldWeight in 575, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.085649684 = weight(abstract_txt:automatic in 575) [ClassicSimilarity], result of:
            0.085649684 = score(doc=575,freq=2.0), product of:
              0.14920318 = queryWeight, product of:
                1.6046656 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.01789579 = queryNorm
              0.5740473 = fieldWeight in 575, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.092015706 = weight(abstract_txt:authority in 575) [ClassicSimilarity], result of:
            0.092015706 = score(doc=575,freq=2.0), product of:
              0.15650764 = queryWeight, product of:
                1.6434757 = boost
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.01789579 = queryNorm
              0.5879311 = fieldWeight in 575, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.321345 = idf(docFreq=589, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.075157456 = weight(abstract_txt:file in 575) [ClassicSimilarity], result of:
            0.075157456 = score(doc=575,freq=1.0), product of:
              0.17230013 = queryWeight, product of:
                1.7244011 = boost
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.01789579 = queryNorm
              0.4362008 = fieldWeight in 575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58337 = idf(docFreq=453, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.038715594 = weight(abstract_txt:subject in 575) [ClassicSimilarity], result of:
            0.038715594 = score(doc=575,freq=1.0), product of:
              0.12674338 = queryWeight, product of:
                1.8113558 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.01789579 = queryNorm
              0.30546445 = fieldWeight in 575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
          0.08241906 = weight(abstract_txt:classification in 575) [ClassicSimilarity], result of:
            0.08241906 = score(doc=575,freq=1.0), product of:
              0.26425898 = queryWeight, product of:
                3.6988866 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.01789579 = queryNorm
              0.31188744 = fieldWeight in 575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=575)
        0.32 = coord(8/25)
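
For the content-based hits, each term score is the product of a query weight and a field weight, and the document score is the sum of term scores scaled by the coordination factor. The sketch below recombines numbers copied from the first hit (doc=25); the function names are illustrative, and the formulas are the standard ClassicSimilarity ones as they appear in the trees.

```python
def query_weight(boost, idf, query_norm):
    """queryWeight = boost * idf * queryNorm."""
    return boost * idf * query_norm


def term_score(boost, idf, query_norm, tf, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    field_weight = tf * idf * field_norm
    return query_weight(boost, idf, query_norm) * field_weight


def document_score(term_scores, matched, total):
    """Sum of term scores multiplied by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# 'authority' in doc 25: boost, idf, queryNorm, tf, fieldNorm from the listing.
authority = term_score(1.6434757, 5.321345, 0.01789579, 1.0, 0.125)

# All six term scores listed for doc 25, combined with coord(6/25).
doc25 = document_score(
    [0.10410389, 0.120251924, 0.06194495, 0.33712792, 0.1318705, 0.34022546],
    matched=6,
    total=25,
)
print(round(authority, 5), round(doc25, 5))
```

The coord factor explains why the fifth hit, which matches 8 of the 25 query terms, scores about the same as the fourth despite a much lower sum of term scores.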