Document (#40210)

Author
Toepfer, M.
Kempf, A.O.
Title
Automatische Indexierung auf Basis von Titeln und Autoren-Keywords : ein Werkstattbericht
Source
027.7 Zeitschrift für Bibliothekskultur. 4(2016), H.2
Year
2016
Abstract
Automatic methods are essential for libraries to cope with the subject indexing of steadily growing volumes of data. The Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft has been gathering experience with automatic indexing for some time and is building up its own expertise in this area. Because of legal restrictions, the approaches under investigation include some that work without access to full texts. This article gives an insight into an ongoing subproject that uses titles and author keywords to normalize the content-describing metadata after the fact to the Standard-Thesaurus Wirtschaft (STW). We explain the background of the work, examine the system architecture, and present first promising results of a document-oriented method.
In the following, we first explain the background of the current work. We draw on experience with machine-based methods in general and at the Deutsche Zentralbibliothek für Wirtschaftswissenschaften (ZBW) - Leibniz-Informationszentrum Wirtschaft in particular. We then give a concrete insight into an ongoing subproject in which, unlike in earlier work, the system architecture of the automatic indexing uses titles and author keywords together in order to achieve a subsequent normalization to the Standard-Thesaurus Wirtschaft (STW). In contrast to a static linking in the sense of a cross-concordance or vocabulary mapping, the approach now pursued is document-oriented and is therefore able to make context-dependent assignments. In addition to the system architecture, the article presents first experimental results, which already show clear improvements over title-based predictions.
Content
Contribution to a thematic focus on 'Computerlinguistik und Bibliotheken'. Cf.: http://0277.ch/ojs/index.php/cdrs_0277/article/view/156/354.
Theme
Automatic indexing (Automatisches Indexieren)

Similar documents (author)

  1. Kempf, G.: Klassifikationsprobleme der Rechtswissenschaft (1972) 5.23
    5.230789 = sum of:
      5.230789 = weight(author_txt:kempf in 4742) [ClassicSimilarity], result of:
        5.230789 = fieldWeight in 4742, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.369263 = idf(docFreq=27, maxDocs=44421)
          0.625 = fieldNorm(doc=4742)
    
  2. Kempf, A.: Thematischer Zugang zu Fachinformationen im Internet (1994) 5.23
    5.230789 = sum of:
      5.230789 = weight(author_txt:kempf in 589) [ClassicSimilarity], result of:
        5.230789 = fieldWeight in 589, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.369263 = idf(docFreq=27, maxDocs=44421)
          0.625 = fieldNorm(doc=589)
    
  3. Kempf, A.: Forstliche Klassifikation und Meta-Information zum Wald im Internet (1995) 5.23
    5.230789 = sum of:
      5.230789 = weight(author_txt:kempf in 3272) [ClassicSimilarity], result of:
        5.230789 = fieldWeight in 3272, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.369263 = idf(docFreq=27, maxDocs=44421)
          0.625 = fieldNorm(doc=3272)
    
  4. Kempf, A.: Advocating global forest issues on the Internet (1996) 5.23
    5.230789 = sum of:
      5.230789 = weight(author_txt:kempf in 93) [ClassicSimilarity], result of:
        5.230789 = fieldWeight in 93, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.369263 = idf(docFreq=27, maxDocs=44421)
          0.625 = fieldNorm(doc=93)
    
  5. Kempf, K.: Dalla Germania un esempio avanzato di sistema integrato (1997) 5.23
    5.230789 = sum of:
      5.230789 = weight(author_txt:kempf in 846) [ClassicSimilarity], result of:
        5.230789 = fieldWeight in 846, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.369263 = idf(docFreq=27, maxDocs=44421)
          0.625 = fieldNorm(doc=846)
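
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the tf and idf formulas that are consistent with the numbers shown in these trees (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical and are not part of any library; Lucene itself computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as it appears in the trees above: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf as it appears in the trees above: square root of the term frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (hypothetical helper name)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:kempf explanations above.
author_score = field_weight(freq=1.0, doc_freq=27, max_docs=44421, field_norm=0.625)
print(f"idf = {classic_idf(27, 44421):.6f}")
print(f"fieldWeight = {author_score:.6f}")
```

Because all five author hits share the same term frequency, document frequency, and field norm, they all receive the same score, which is why the list shows 5.23 throughout.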
    

Similar documents (content)

  1. Neubert, J.; Kempf, A.O.: Standard-Thesaurus Wirtschaft : nach Komplettüberarbeitung in Version 9.0 verfügbar (2015) 0.44
    0.44357222 = sum of:
      0.44357222 = product of:
        1.8482176 = sum of:
          0.087461725 = weight(abstract_txt:thesaurus in 3048) [ClassicSimilarity], result of:
            0.087461725 = score(doc=3048,freq=1.0), product of:
              0.090214565 = queryWeight, product of:
                1.0747403 = boost
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.01623428 = queryNorm
              0.96948564 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
          0.29453745 = weight(abstract_txt:zentralbibliothek in 3048) [ClassicSimilarity], result of:
            0.29453745 = score(doc=3048,freq=1.0), product of:
              0.2026866 = queryWeight, product of:
                1.6109339 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01623428 = queryNorm
              1.453167 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
          0.3062145 = weight(abstract_txt:leibniz in 3048) [ClassicSimilarity], result of:
            0.3062145 = score(doc=3048,freq=1.0), product of:
              0.20800886 = queryWeight, product of:
                1.6319473 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.01623428 = queryNorm
              1.4721224 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
          0.32247412 = weight(abstract_txt:informationszentrum in 3048) [ClassicSimilarity], result of:
            0.32247412 = score(doc=3048,freq=1.0), product of:
              0.21530853 = queryWeight, product of:
                1.6603354 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.01623428 = queryNorm
              1.4977304 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
          0.33506712 = weight(abstract_txt:wirtschaftswissenschaften in 3048) [ClassicSimilarity], result of:
            0.33506712 = score(doc=3048,freq=1.0), product of:
              0.22087803 = queryWeight, product of:
                1.6816727 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01623428 = queryNorm
              1.516978 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
          0.5024626 = weight(abstract_txt:wirtschaft in 3048) [ClassicSimilarity], result of:
            0.5024626 = score(doc=3048,freq=2.0), product of:
              0.2893791 = queryWeight, product of:
                2.7221608 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.01623428 = queryNorm
              1.7363473 = fieldWeight in 3048, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.1875 = fieldNorm(doc=3048)
        0.24 = coord(6/25)
    
  2. Dolud, L.; Kreis, C: ¬Die Crosskonkordanz Wirtschaft zwischen dem STW und der GND (2012) 0.32
    0.3161732 = sum of:
      0.3161732 = product of:
        1.12919 = sum of:
          0.20338401 = weight(abstract_txt:crosskonkordanz in 2716) [ClassicSimilarity], result of:
            0.20338401 = score(doc=2716,freq=3.0), product of:
              0.15620668 = queryWeight, product of:
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.01623428 = queryNorm
              1.3020186 = fieldWeight in 2716, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.036442384 = weight(abstract_txt:thesaurus in 2716) [ClassicSimilarity], result of:
            0.036442384 = score(doc=2716,freq=1.0), product of:
              0.090214565 = queryWeight, product of:
                1.0747403 = boost
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.01623428 = queryNorm
              0.40395233 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.17355788 = weight(abstract_txt:zentralbibliothek in 2716) [ClassicSimilarity], result of:
            0.17355788 = score(doc=2716,freq=2.0), product of:
              0.2026866 = queryWeight, product of:
                1.6109339 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01623428 = queryNorm
              0.8562869 = fieldWeight in 2716, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.12758937 = weight(abstract_txt:leibniz in 2716) [ClassicSimilarity], result of:
            0.12758937 = score(doc=2716,freq=1.0), product of:
              0.20800886 = queryWeight, product of:
                1.6319473 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.01623428 = queryNorm
              0.61338437 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.13436422 = weight(abstract_txt:informationszentrum in 2716) [ClassicSimilarity], result of:
            0.13436422 = score(doc=2716,freq=1.0), product of:
              0.21530853 = queryWeight, product of:
                1.6603354 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.01623428 = queryNorm
              0.6240543 = fieldWeight in 2716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.19744019 = weight(abstract_txt:wirtschaftswissenschaften in 2716) [ClassicSimilarity], result of:
            0.19744019 = score(doc=2716,freq=2.0), product of:
              0.22087803 = queryWeight, product of:
                1.6816727 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01623428 = queryNorm
              0.8938879 = fieldWeight in 2716, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
          0.25641188 = weight(abstract_txt:wirtschaft in 2716) [ClassicSimilarity], result of:
            0.25641188 = score(doc=2716,freq=3.0), product of:
              0.2893791 = queryWeight, product of:
                2.7221608 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.01623428 = queryNorm
              0.88607603 = fieldWeight in 2716, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=2716)
        0.28 = coord(7/25)
    
  3. Borst, T.; Löhden, A.; Neubert, J.; Pohl, A.: "Linked Open Data" im Fokus : Spannende Themen und Diskussionen bei der SWIB12 (2013) 0.22
    0.21514507 = sum of:
      0.21514507 = product of:
        1.0757253 = sum of:
          0.19635831 = weight(abstract_txt:zentralbibliothek in 339) [ClassicSimilarity], result of:
            0.19635831 = score(doc=339,freq=1.0), product of:
              0.2026866 = queryWeight, product of:
                1.6109339 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01623428 = queryNorm
              0.968778 = fieldWeight in 339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.125 = fieldNorm(doc=339)
          0.204143 = weight(abstract_txt:leibniz in 339) [ClassicSimilarity], result of:
            0.204143 = score(doc=339,freq=1.0), product of:
              0.20800886 = queryWeight, product of:
                1.6319473 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.01623428 = queryNorm
              0.981415 = fieldWeight in 339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.125 = fieldNorm(doc=339)
          0.21498276 = weight(abstract_txt:informationszentrum in 339) [ClassicSimilarity], result of:
            0.21498276 = score(doc=339,freq=1.0), product of:
              0.21530853 = queryWeight, product of:
                1.6603354 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.01623428 = queryNorm
              0.99848694 = fieldWeight in 339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.125 = fieldNorm(doc=339)
          0.22337808 = weight(abstract_txt:wirtschaftswissenschaften in 339) [ClassicSimilarity], result of:
            0.22337808 = score(doc=339,freq=1.0), product of:
              0.22087803 = queryWeight, product of:
                1.6816727 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01623428 = queryNorm
              1.0113187 = fieldWeight in 339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.125 = fieldNorm(doc=339)
          0.23686315 = weight(abstract_txt:wirtschaft in 339) [ClassicSimilarity], result of:
            0.23686315 = score(doc=339,freq=1.0), product of:
              0.2893791 = queryWeight, product of:
                2.7221608 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.01623428 = queryNorm
              0.818522 = fieldWeight in 339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.125 = fieldNorm(doc=339)
        0.2 = coord(5/25)
    
  4. Version 8.08 des Standard-Thesaurus Wirtschaft mit Mapping zu anderen Vokabularen veröffentlicht (2012) 0.19
    0.1912244 = sum of:
      0.1912244 = product of:
        0.7967683 = sum of:
          0.06312007 = weight(abstract_txt:thesaurus in 1007) [ClassicSimilarity], result of:
            0.06312007 = score(doc=1007,freq=3.0), product of:
              0.090214565 = queryWeight, product of:
                1.0747403 = boost
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.01623428 = queryNorm
              0.699666 = fieldWeight in 1007, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.17059 = idf(docFreq=685, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.12272395 = weight(abstract_txt:zentralbibliothek in 1007) [ClassicSimilarity], result of:
            0.12272395 = score(doc=1007,freq=1.0), product of:
              0.2026866 = queryWeight, product of:
                1.6109339 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01623428 = queryNorm
              0.6054863 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.12758937 = weight(abstract_txt:leibniz in 1007) [ClassicSimilarity], result of:
            0.12758937 = score(doc=1007,freq=1.0), product of:
              0.20800886 = queryWeight, product of:
                1.6319473 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.01623428 = queryNorm
              0.61338437 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.13436422 = weight(abstract_txt:informationszentrum in 1007) [ClassicSimilarity], result of:
            0.13436422 = score(doc=1007,freq=1.0), product of:
              0.21530853 = queryWeight, product of:
                1.6603354 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.01623428 = queryNorm
              0.6240543 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.1396113 = weight(abstract_txt:wirtschaftswissenschaften in 1007) [ClassicSimilarity], result of:
            0.1396113 = score(doc=1007,freq=1.0), product of:
              0.22087803 = queryWeight, product of:
                1.6816727 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01623428 = queryNorm
              0.6320742 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.20935942 = weight(abstract_txt:wirtschaft in 1007) [ClassicSimilarity], result of:
            0.20935942 = score(doc=1007,freq=2.0), product of:
              0.2893791 = queryWeight, product of:
                2.7221608 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.01623428 = queryNorm
              0.7234781 = fieldWeight in 1007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
        0.24 = coord(6/25)
    
  5. Borst, T.; Neubert, J.; Seiler, A.: Bibliotheken auf dem Weg in das Semantic Web : Bericht von der SWIB2010 in Köln - unterschiedliche Entwicklungsschwerpunkte (2011) 0.19
    0.18825193 = sum of:
      0.18825193 = product of:
        0.9412596 = sum of:
          0.17181352 = weight(abstract_txt:zentralbibliothek in 532) [ClassicSimilarity], result of:
            0.17181352 = score(doc=532,freq=1.0), product of:
              0.2026866 = queryWeight, product of:
                1.6109339 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01623428 = queryNorm
              0.84768075 = fieldWeight in 532, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.109375 = fieldNorm(doc=532)
          0.17862514 = weight(abstract_txt:leibniz in 532) [ClassicSimilarity], result of:
            0.17862514 = score(doc=532,freq=1.0), product of:
              0.20800886 = queryWeight, product of:
                1.6319473 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.01623428 = queryNorm
              0.8587381 = fieldWeight in 532, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.109375 = fieldNorm(doc=532)
          0.1881099 = weight(abstract_txt:informationszentrum in 532) [ClassicSimilarity], result of:
            0.1881099 = score(doc=532,freq=1.0), product of:
              0.21530853 = queryWeight, product of:
                1.6603354 = boost
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.01623428 = queryNorm
              0.87367606 = fieldWeight in 532, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9878955 = idf(docFreq=40, maxDocs=44421)
                0.109375 = fieldNorm(doc=532)
          0.19545582 = weight(abstract_txt:wirtschaftswissenschaften in 532) [ClassicSimilarity], result of:
            0.19545582 = score(doc=532,freq=1.0), product of:
              0.22087803 = queryWeight, product of:
                1.6816727 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01623428 = queryNorm
              0.88490385 = fieldWeight in 532, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.109375 = fieldNorm(doc=532)
          0.20725524 = weight(abstract_txt:wirtschaft in 532) [ClassicSimilarity], result of:
            0.20725524 = score(doc=532,freq=1.0), product of:
              0.2893791 = queryWeight, product of:
                2.7221608 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.01623428 = queryNorm
              0.7162067 = fieldWeight in 532, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.109375 = fieldNorm(doc=532)
        0.2 = coord(5/25)
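
The content-similarity trees above combine a query-side weight, a document-side weight, and a coordination factor. The sketch below shows how one term score and one document score fit together, assuming the same tf and idf formulas that match the displayed numbers. The function names are hypothetical, and the boost, queryNorm, and fieldNorm values are simply copied from the explanation for the first hit (doc 3048).

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf consistent with the trees above: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """score = queryWeight * fieldWeight for a single matching term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight


def document_score(term_scores, matched, total_terms):
    """Sum of term scores, scaled by coord = matched / total query terms."""
    return sum(term_scores) * (matched / total_terms)


# abstract_txt:thesaurus in doc 3048, values copied from the tree above.
thesaurus = term_score(freq=1.0, doc_freq=685, max_docs=44421,
                       boost=1.0747403, query_norm=0.01623428, field_norm=0.1875)

# The six term scores listed for doc 3048, combined with coord(6/25).
listed = [0.087461725, 0.29453745, 0.3062145, 0.32247412, 0.33506712, 0.5024626]
total = document_score(listed, matched=6, total_terms=25)

print(f"thesaurus term score = {thesaurus:.6f}")
print(f"document score = {total:.6f}")
```

The coordination factor explains why the second hit (7 of 25 query terms matched) can rank below the first (6 of 25): its field norms are smaller, so the individual term scores shrink by more than the higher coord value gains.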