Document (#41312)

Author
Munkelt, J.
Schaer, P.
Lepsky, K.
Title
Towards an IR test collection for the German National Library
Issue
[Preprint].
Year
2018
Abstract
Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?
Footnote
Munkelt-etal_DNB_TestColletion.pdf.
Theme
Retrievalstudien
Automatisches Indexieren

Similar documents (author)

  1. Fühles-Ubach, S.; Schaer, P.; Lepsky, K.; Seidler-de Alwis, R.: Data Librarian : ein neuer Studienschwerpunkt für wissenschaftliche Bibliotheken und Forschungseinrichtungen (2019) 2.99
    2.993267 = sum of:
      2.993267 = sum of:
        1.2846128 = weight(author_txt:lepsky in 836) [ClassicSimilarity], result of:
          1.2846128 = score(doc=836,freq=1.0), product of:
            0.63721806 = queryWeight, product of:
              8.063882 = idf(docFreq=37, maxDocs=44421)
              0.07902125 = queryNorm
            2.0159705 = fieldWeight in 836, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.063882 = idf(docFreq=37, maxDocs=44421)
              0.25 = fieldNorm(doc=836)
        1.7086544 = weight(author_txt:schaer in 836) [ClassicSimilarity], result of:
          1.7086544 = score(doc=836,freq=1.0), product of:
            0.7706835 = queryWeight, product of:
              1.09975 = boost
              8.868255 = idf(docFreq=16, maxDocs=44421)
              0.07902125 = queryNorm
            2.2170637 = fieldWeight in 836, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.868255 = idf(docFreq=16, maxDocs=44421)
              0.25 = fieldNorm(doc=836)
    
  2. Schaer, P.: Integration von Open-Access-Repositorien in Fachportale (2010) 2.14
    2.135818 = sum of:
      2.135818 = product of:
        4.271636 = sum of:
          4.271636 = weight(author_txt:schaer in 3320) [ClassicSimilarity], result of:
            4.271636 = score(doc=3320,freq=1.0), product of:
              0.7706835 = queryWeight, product of:
                1.09975 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.07902125 = queryNorm
              5.5426593 = fieldWeight in 3320, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.625 = fieldNorm(doc=3320)
        0.5 = coord(1/2)
    
  3. Schaer, P.: Sprachmodelle und neuronale Netze im Information Retrieval (2023) 2.14
    2.135818 = sum of:
      2.135818 = product of:
        4.271636 = sum of:
          4.271636 = weight(author_txt:schaer in 1800) [ClassicSimilarity], result of:
            4.271636 = score(doc=1800,freq=1.0), product of:
              0.7706835 = queryWeight, product of:
                1.09975 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.07902125 = queryNorm
              5.5426593 = fieldWeight in 1800, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.625 = fieldNorm(doc=1800)
        0.5 = coord(1/2)
    
  4. Munkelt, J.; Schaer, P.: Towards an IR test collection for the German National Library (2018) 1.71
    1.7086544 = sum of:
      1.7086544 = product of:
        3.4173088 = sum of:
          3.4173088 = weight(author_txt:schaer in 780) [ClassicSimilarity], result of:
            3.4173088 = score(doc=780,freq=1.0), product of:
              0.7706835 = queryWeight, product of:
                1.09975 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.07902125 = queryNorm
              4.4341273 = fieldWeight in 780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.5 = fieldNorm(doc=780)
        0.5 = coord(1/2)
    
  5. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 1.61
    1.6057659 = sum of:
      1.6057659 = product of:
        3.2115319 = sum of:
          3.2115319 = weight(author_txt:lepsky in 5228) [ClassicSimilarity], result of:
            3.2115319 = score(doc=5228,freq=1.0), product of:
              0.63721806 = queryWeight, product of:
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.07902125 = queryNorm
              5.039926 = fieldWeight in 5228, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.625 = fieldNorm(doc=5228)
        0.5 = coord(1/2)
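
The per-term weights in the listing above can be recomputed from the values shown. The following is a minimal sketch, not the search engine's own code: the function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption inferred from the listed numbers (it reproduces idf(docFreq=37, maxDocs=44421) = 8.063882). tf is taken as the square root of the term frequency, as the tf lines in the listing suggest.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form, inferred from the numbers in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Square root of the raw term frequency (tf(freq=4.0) = 2.0 in the listing).
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each "product of:" line.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the first entry of the author list (doc=836).
lepsky = term_score(1.0, 37, 44421, 0.07902125, 0.25)
schaer = term_score(1.0, 16, 44421, 0.07902125, 0.25, boost=1.09975)
total = lepsky + schaer
print(lepsky, schaer, total)
```

Under these assumptions, the two term scores and their sum should agree with the listed 1.2846128, 1.7086544 and 2.993267 up to rounding, since the listing prints single-precision values.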
    

Similar documents (content)

  1. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.15
    0.14573818 = sum of:
      0.14573818 = product of:
        0.60724247 = sum of:
          0.07406328 = weight(abstract_txt:national in 3166) [ClassicSimilarity], result of:
            0.07406328 = score(doc=3166,freq=4.0), product of:
              0.10301231 = queryWeight, product of:
                1.1689425 = boost
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.019151473 = queryNorm
              0.71897507 = fieldWeight in 3166, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.018488403 = weight(abstract_txt:library in 3166) [ClassicSimilarity], result of:
            0.018488403 = score(doc=3166,freq=1.0), product of:
              0.07421138 = queryWeight, product of:
                1.2151488 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.019151473 = queryNorm
              0.24913163 = fieldWeight in 3166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.04433084 = weight(abstract_txt:since in 3166) [ClassicSimilarity], result of:
            0.04433084 = score(doc=3166,freq=1.0), product of:
              0.11613892 = queryWeight, product of:
                1.2411877 = boost
                4.8858275 = idf(docFreq=911, maxDocs=44421)
                0.019151473 = queryNorm
              0.38170528 = fieldWeight in 3166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8858275 = idf(docFreq=911, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.18717836 = weight(abstract_txt:german in 3166) [ClassicSimilarity], result of:
            0.18717836 = score(doc=3166,freq=4.0), product of:
              0.19112797 = queryWeight, product of:
                1.592248 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.019151473 = queryNorm
              0.97933525 = fieldWeight in 3166, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.09469753 = weight(abstract_txt:series in 3166) [ClassicSimilarity], result of:
            0.09469753 = score(doc=3166,freq=1.0), product of:
              0.22051087 = queryWeight, product of:
                2.094639 = boost
                5.4969096 = idf(docFreq=494, maxDocs=44421)
                0.019151473 = queryNorm
              0.42944607 = fieldWeight in 3166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4969096 = idf(docFreq=494, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
          0.18848404 = weight(abstract_txt:automatic in 3166) [ClassicSimilarity], result of:
            0.18848404 = score(doc=3166,freq=2.0), product of:
              0.32834235 = queryWeight, product of:
                3.2997575 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.019151473 = queryNorm
              0.5740473 = fieldWeight in 3166, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=3166)
        0.24 = coord(6/25)
    
  2. Abalkina, A.: Challenges posed by hijacked journals in Scopus (2024) 0.12
    0.116835445 = sum of:
      0.116835445 = product of:
        0.41726944 = sum of:
          0.03546467 = weight(abstract_txt:since in 2259) [ClassicSimilarity], result of:
            0.03546467 = score(doc=2259,freq=1.0), product of:
              0.11613892 = queryWeight, product of:
                1.2411877 = boost
                4.8858275 = idf(docFreq=911, maxDocs=44421)
                0.019151473 = queryNorm
              0.30536422 = fieldWeight in 2259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8858275 = idf(docFreq=911, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.044265755 = weight(abstract_txt:publications in 2259) [ClassicSimilarity], result of:
            0.044265755 = score(doc=2259,freq=1.0), product of:
              0.13463534 = queryWeight, product of:
                1.3363743 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.019151473 = queryNorm
              0.32878256 = fieldWeight in 2259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.06277262 = weight(abstract_txt:papers in 2259) [ClassicSimilarity], result of:
            0.06277262 = score(doc=2259,freq=2.0), product of:
              0.13488096 = queryWeight, product of:
                1.3375927 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.019151473 = queryNorm
              0.4653927 = fieldWeight in 2259, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.04709714 = weight(abstract_txt:content in 2259) [ClassicSimilarity], result of:
            0.04709714 = score(doc=2259,freq=2.0), product of:
              0.12748642 = queryWeight, product of:
                1.5926714 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.019151473 = queryNorm
              0.36942866 = fieldWeight in 2259, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.045695167 = weight(abstract_txt:quality in 2259) [ClassicSimilarity], result of:
            0.045695167 = score(doc=2259,freq=1.0), product of:
              0.15741922 = queryWeight, product of:
                1.7697955 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.019151473 = queryNorm
              0.2902769 = fieldWeight in 2259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.10687006 = weight(abstract_txt:september in 2259) [ClassicSimilarity], result of:
            0.10687006 = score(doc=2259,freq=1.0), product of:
              0.24229877 = queryWeight, product of:
                1.7927684 = boost
                7.057077 = idf(docFreq=103, maxDocs=44421)
                0.019151473 = queryNorm
              0.4410673 = fieldWeight in 2259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.057077 = idf(docFreq=103, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
          0.07510403 = weight(abstract_txt:indexing in 2259) [ClassicSimilarity], result of:
            0.07510403 = score(doc=2259,freq=1.0), product of:
              0.27622506 = queryWeight, product of:
                3.3154366 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.019151473 = queryNorm
              0.27189434 = fieldWeight in 2259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=2259)
        0.28 = coord(7/25)
    
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.11
    0.107867956 = sum of:
      0.107867956 = product of:
        0.5393398 = sum of:
          0.059250627 = weight(abstract_txt:national in 2717) [ClassicSimilarity], result of:
            0.059250627 = score(doc=2717,freq=1.0), product of:
              0.10301231 = queryWeight, product of:
                1.1689425 = boost
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.019151473 = queryNorm
              0.57518005 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.125 = fieldNorm(doc=2717)
          0.029581444 = weight(abstract_txt:library in 2717) [ClassicSimilarity], result of:
            0.029581444 = score(doc=2717,freq=1.0), product of:
              0.07421138 = queryWeight, product of:
                1.2151488 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.019151473 = queryNorm
              0.39861062 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.125 = fieldNorm(doc=2717)
          0.08853151 = weight(abstract_txt:publications in 2717) [ClassicSimilarity], result of:
            0.08853151 = score(doc=2717,freq=1.0), product of:
              0.13463534 = queryWeight, product of:
                1.3363743 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.019151473 = queryNorm
              0.6575651 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.125 = fieldNorm(doc=2717)
          0.21176814 = weight(abstract_txt:german in 2717) [ClassicSimilarity], result of:
            0.21176814 = score(doc=2717,freq=2.0), product of:
              0.19112797 = queryWeight, product of:
                1.592248 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.019151473 = queryNorm
              1.1079913 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.125 = fieldNorm(doc=2717)
          0.15020806 = weight(abstract_txt:indexing in 2717) [ClassicSimilarity], result of:
            0.15020806 = score(doc=2717,freq=1.0), product of:
              0.27622506 = queryWeight, product of:
                3.3154366 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.019151473 = queryNorm
              0.5437887 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.125 = fieldNorm(doc=2717)
        0.2 = coord(5/25)
    
  4. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.11
    0.107867956 = sum of:
      0.107867956 = product of:
        0.5393398 = sum of:
          0.059250627 = weight(abstract_txt:national in 2969) [ClassicSimilarity], result of:
            0.059250627 = score(doc=2969,freq=1.0), product of:
              0.10301231 = queryWeight, product of:
                1.1689425 = boost
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.019151473 = queryNorm
              0.57518005 = fieldWeight in 2969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.125 = fieldNorm(doc=2969)
          0.029581444 = weight(abstract_txt:library in 2969) [ClassicSimilarity], result of:
            0.029581444 = score(doc=2969,freq=1.0), product of:
              0.07421138 = queryWeight, product of:
                1.2151488 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.019151473 = queryNorm
              0.39861062 = fieldWeight in 2969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.125 = fieldNorm(doc=2969)
          0.08853151 = weight(abstract_txt:publications in 2969) [ClassicSimilarity], result of:
            0.08853151 = score(doc=2969,freq=1.0), product of:
              0.13463534 = queryWeight, product of:
                1.3363743 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.019151473 = queryNorm
              0.6575651 = fieldWeight in 2969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.125 = fieldNorm(doc=2969)
          0.21176814 = weight(abstract_txt:german in 2969) [ClassicSimilarity], result of:
            0.21176814 = score(doc=2969,freq=2.0), product of:
              0.19112797 = queryWeight, product of:
                1.592248 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.019151473 = queryNorm
              1.1079913 = fieldWeight in 2969, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.125 = fieldNorm(doc=2969)
          0.15020806 = weight(abstract_txt:indexing in 2969) [ClassicSimilarity], result of:
            0.15020806 = score(doc=2969,freq=1.0), product of:
              0.27622506 = queryWeight, product of:
                3.3154366 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.019151473 = queryNorm
              0.5437887 = fieldWeight in 2969, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.125 = fieldNorm(doc=2969)
        0.2 = coord(5/25)
    
  5. Svensson, L.G.; Jahns, Y.: PDF, CSV, RSS and other Acronyms : redefining the bibliographic services in the German National Library (2010) 0.11
    0.1074555 = sum of:
      0.1074555 = product of:
        0.44773126 = sum of:
          0.059250627 = weight(abstract_txt:national in 957) [ClassicSimilarity], result of:
            0.059250627 = score(doc=957,freq=4.0), product of:
              0.10301231 = queryWeight, product of:
                1.1689425 = boost
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.019151473 = queryNorm
              0.57518005 = fieldWeight in 957, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
          0.014790722 = weight(abstract_txt:library in 957) [ClassicSimilarity], result of:
            0.014790722 = score(doc=957,freq=1.0), product of:
              0.07421138 = queryWeight, product of:
                1.2151488 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.019151473 = queryNorm
              0.19930531 = fieldWeight in 957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
          0.14795652 = weight(abstract_txt:discontinued in 957) [ClassicSimilarity], result of:
            0.14795652 = score(doc=957,freq=1.0), product of:
              0.2388874 = queryWeight, product of:
                1.2587231 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.019151473 = queryNorm
              0.61935675 = fieldWeight in 957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
          0.07487134 = weight(abstract_txt:german in 957) [ClassicSimilarity], result of:
            0.07487134 = score(doc=957,freq=1.0), product of:
              0.19112797 = queryWeight, product of:
                1.592248 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.019151473 = queryNorm
              0.3917341 = fieldWeight in 957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
          0.07575802 = weight(abstract_txt:series in 957) [ClassicSimilarity], result of:
            0.07575802 = score(doc=957,freq=1.0), product of:
              0.22051087 = queryWeight, product of:
                2.094639 = boost
                5.4969096 = idf(docFreq=494, maxDocs=44421)
                0.019151473 = queryNorm
              0.34355685 = fieldWeight in 957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4969096 = idf(docFreq=494, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
          0.07510403 = weight(abstract_txt:indexing in 957) [ClassicSimilarity], result of:
            0.07510403 = score(doc=957,freq=1.0), product of:
              0.27622506 = queryWeight, product of:
                3.3154366 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.019151473 = queryNorm
              0.27189434 = fieldWeight in 957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=957)
        0.24 = coord(6/25)
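
The final score of each entry in this list is the sum of its matching term weights multiplied by the coord factor, that is, the share of query terms found in the document. The sketch below recomputes the last entry (doc=957) from the six weights shown; the function name `coord_score` is hypothetical, and the total of 25 query terms is read off the `coord(6/25)` line.

```python
def coord_score(term_weights, total_query_terms):
    # coord = matched query terms / all query terms
    coord = len(term_weights) / total_query_terms
    return sum(term_weights) * coord, coord

# Term weights copied from the listing for doc=957:
# national, library, discontinued, german, series, indexing.
weights_957 = [
    0.059250627,
    0.014790722,
    0.14795652,
    0.07487134,
    0.07575802,
    0.07510403,
]

score_957, coord_957 = coord_score(weights_957, 25)
print(score_957, coord_957)
```

The result should match the listed 0.1074555 up to rounding. The same calculation explains why an entry with more matching terms but lower individual weights can still rank above one with fewer, heavier matches.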