Document (#37596)

Author
Geisriegler, E.
Title
Enriching electronic texts with semantic metadata : a use case for the historical Newspaper Collection ANNO (Austrian Newspapers Online) of the Austrian National Library
Imprint
Wien : Universität / ÖNB
Year
2012
Pages
345 pp.
Abstract
This master's thesis examines ways of enriching historical newspapers with semantic metadata. It also analyses what benefit researchers, especially those in the humanities, gain from enrichment with additional information sources. Following an account of the development of the interdisciplinary 'Digital Humanities', a use case was developed for the Austrian National Library's digital collection of historical newspapers (ANNO, AustriaN Newspapers Online), in which 'named entities' (persons, places, organisations and dates) were manually annotated in selected newspaper issues. Methodologically, the encoding was carried out with 'TEI', a document format for encoding and exchanging texts. In addition, for all annotated 'named entities', entries were sought in external databases such as Wikipedia, Wikipedia Personensuche, the former Personennamendatei and Schlagwortnormdatei (now the Gemeinsame Normdatei, GND), VIAF and the Bildarchiv Austria, and linked where available. A description of the results of the manual annotation of the newspaper pages concludes this part of the thesis. A further section compares the results of the manual annotation with those generated automatically by the German NER (Named Entity Recognition) and analyses their accuracy. Finally, the thesis presents some best-practice examples of encoded and enriched newspaper pages to illustrate the additional benefit that marking up 'named entities' and linking them to external information sources offers users.
Footnote
Vienna, Univ., Library and Information Studies programme, Master's thesis, 2012.
Theme
Newspapers
Location
A

Similar documents (content)

  1. Menzel, S.; Schnaitter, H.; Zinck, J.; Petras, V.; Neudecker, C.; Labusch, K.; Leitner, E.; Rehm, G.: Named Entity Linking mit Wikidata und GND : das Potenzial handkuratierter und strukturierter Datenquellen für die semantische Anreicherung von Volltexten (2021) 0.13
    0.12688926 = sum of:
      0.12688926 = product of:
        0.63444626 = sum of:
          0.09818944 = weight(abstract_txt:auszeichnung in 373) [ClassicSimilarity], result of:
            0.09818944 = score(doc=373,freq=1.0), product of:
              0.16537757 = queryWeight, product of:
                1.0644716 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.016354391 = queryNorm
              0.5937289 = fieldWeight in 373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=373)
          0.026328603 = weight(abstract_txt:durch in 373) [ClassicSimilarity], result of:
            0.026328603 = score(doc=373,freq=1.0), product of:
              0.09918037 = queryWeight, product of:
                1.4278063 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.016354391 = queryNorm
              0.26546183 = fieldWeight in 373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.0625 = fieldNorm(doc=373)
          0.048316978 = weight(abstract_txt:wurden in 373) [ClassicSimilarity], result of:
            0.048316978 = score(doc=373,freq=1.0), product of:
              0.14866441 = queryWeight, product of:
                1.7480744 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.016354391 = queryNorm
              0.32500702 = fieldWeight in 373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0625 = fieldNorm(doc=373)
          0.13640325 = weight(abstract_txt:entities in 373) [ClassicSimilarity], result of:
            0.13640325 = score(doc=373,freq=4.0), product of:
              0.18706979 = queryWeight, product of:
                1.9609126 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.016354391 = queryNorm
              0.72915703 = fieldWeight in 373, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.0625 = fieldNorm(doc=373)
          0.32520804 = weight(abstract_txt:named in 373) [ClassicSimilarity], result of:
            0.32520804 = score(doc=373,freq=5.0), product of:
              0.3411177 = queryWeight, product of:
                3.0575805 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.016354391 = queryNorm
              0.9533602 = fieldWeight in 373, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=373)
        0.2 = coord(5/25)
    
  2. Brogiato, H.P.; Horn, K.: Der historische Bildbestand im Institut für Länderkunde Leipzig : Aufbau eines digitalen Langzeitarchivs (2003) 0.08
    0.08368932 = sum of:
      0.08368932 = product of:
        0.41844657 = sum of:
          0.1151219 = weight(abstract_txt:bildarchiv in 1324) [ClassicSimilarity], result of:
            0.1151219 = score(doc=1324,freq=1.0), product of:
              0.15846452 = queryWeight, product of:
                1.0419858 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.016354391 = queryNorm
              0.72648376 = fieldWeight in 1324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=1324)
          0.047129076 = weight(abstract_txt:ergebnisse in 1324) [ClassicSimilarity], result of:
            0.047129076 = score(doc=1324,freq=1.0), product of:
              0.11007686 = queryWeight, product of:
                1.2281709 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.016354391 = queryNorm
              0.428147 = fieldWeight in 1324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.078125 = fieldNorm(doc=1324)
          0.05570227 = weight(abstract_txt:nutzen in 1324) [ClassicSimilarity], result of:
            0.05570227 = score(doc=1324,freq=1.0), product of:
              0.123051055 = queryWeight, product of:
                1.2985343 = boost
                5.794254 = idf(docFreq=365, maxDocs=44218)
                0.016354391 = queryNorm
              0.4526761 = fieldWeight in 1324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.794254 = idf(docFreq=365, maxDocs=44218)
                0.078125 = fieldNorm(doc=1324)
          0.032910753 = weight(abstract_txt:durch in 1324) [ClassicSimilarity], result of:
            0.032910753 = score(doc=1324,freq=1.0), product of:
              0.09918037 = queryWeight, product of:
                1.4278063 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.016354391 = queryNorm
              0.33182728 = fieldWeight in 1324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.078125 = fieldNorm(doc=1324)
          0.16758257 = weight(abstract_txt:historischer in 1324) [ClassicSimilarity], result of:
            0.16758257 = score(doc=1324,freq=1.0), product of:
              0.2564421 = queryWeight, product of:
                1.8745862 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016354391 = queryNorm
              0.6534909 = fieldWeight in 1324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=1324)
        0.2 = coord(5/25)
    
  3. Stelzenmüller, C.: Mashups in Bibliotheken : Untersuchung der Verbreitung von Mashups auf Webseiten wissenschaftlicher Bibliotheken und Erstellung eines praktischen Beispiels (2008) 0.07
    0.06534206 = sum of:
      0.06534206 = product of:
        0.4083879 = sum of:
          0.06788532 = weight(abstract_txt:arbeit in 3069) [ClassicSimilarity], result of:
            0.06788532 = score(doc=3069,freq=1.0), product of:
              0.10262992 = queryWeight, product of:
                1.1858991 = boost
                5.291659 = idf(docFreq=604, maxDocs=44218)
                0.016354391 = queryNorm
              0.66145736 = fieldWeight in 3069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.291659 = idf(docFreq=604, maxDocs=44218)
                0.125 = fieldNorm(doc=3069)
          0.08912363 = weight(abstract_txt:nutzen in 3069) [ClassicSimilarity], result of:
            0.08912363 = score(doc=3069,freq=1.0), product of:
              0.123051055 = queryWeight, product of:
                1.2985343 = boost
                5.794254 = idf(docFreq=365, maxDocs=44218)
                0.016354391 = queryNorm
              0.7242817 = fieldWeight in 3069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.794254 = idf(docFreq=365, maxDocs=44218)
                0.125 = fieldNorm(doc=3069)
          0.052657206 = weight(abstract_txt:durch in 3069) [ClassicSimilarity], result of:
            0.052657206 = score(doc=3069,freq=1.0), product of:
              0.09918037 = queryWeight, product of:
                1.4278063 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.016354391 = queryNorm
              0.53092366 = fieldWeight in 3069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.125 = fieldNorm(doc=3069)
          0.19872175 = weight(abstract_txt:informationsquellen in 3069) [ClassicSimilarity], result of:
            0.19872175 = score(doc=3069,freq=1.0), product of:
              0.21001664 = queryWeight, product of:
                1.6964365 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.016354391 = queryNorm
              0.9462191 = fieldWeight in 3069, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.125 = fieldNorm(doc=3069)
        0.16 = coord(4/25)
    
  4. Scholz, D.: Retrokonversion in der Zentralbibliothek der Universitätsbibliothek Dortmund : Abschlussbericht November 1995 bis April 2003 (2003) 0.06
    0.063854665 = sum of:
      0.063854665 = product of:
        0.31927332 = sum of:
          0.041727718 = weight(abstract_txt:wurde in 1941) [ClassicSimilarity], result of:
            0.041727718 = score(doc=1941,freq=5.0), product of:
              0.0834377 = queryWeight, product of:
                1.0692812 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.016354391 = queryNorm
              0.5001063 = fieldWeight in 1941, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.046875 = fieldNorm(doc=1941)
          0.05091399 = weight(abstract_txt:arbeit in 1941) [ClassicSimilarity], result of:
            0.05091399 = score(doc=1941,freq=4.0), product of:
              0.10262992 = queryWeight, product of:
                1.1858991 = boost
                5.291659 = idf(docFreq=604, maxDocs=44218)
                0.016354391 = queryNorm
              0.49609303 = fieldWeight in 1941, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.291659 = idf(docFreq=604, maxDocs=44218)
                0.046875 = fieldNorm(doc=1941)
          0.034201857 = weight(abstract_txt:durch in 1941) [ClassicSimilarity], result of:
            0.034201857 = score(doc=1941,freq=3.0), product of:
              0.09918037 = queryWeight, product of:
                1.4278063 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.016354391 = queryNorm
              0.34484503 = fieldWeight in 1941, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.046875 = fieldNorm(doc=1941)
          0.10249579 = weight(abstract_txt:wurden in 1941) [ClassicSimilarity], result of:
            0.10249579 = score(doc=1941,freq=8.0), product of:
              0.14866441 = queryWeight, product of:
                1.7480744 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.016354391 = queryNorm
              0.689444 = fieldWeight in 1941, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.046875 = fieldNorm(doc=1941)
          0.08993397 = weight(abstract_txt:zusätzlichen in 1941) [ClassicSimilarity], result of:
            0.08993397 = score(doc=1941,freq=1.0), product of:
              0.23805927 = queryWeight, product of:
                1.8061479 = boost
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.016354391 = queryNorm
              0.37777975 = fieldWeight in 1941, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.046875 = fieldNorm(doc=1941)
        0.2 = coord(5/25)
    
  5. Kugler, A.: Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen (2018) 0.06
    0.06164709 = sum of:
      0.06164709 = product of:
        0.38529432 = sum of:
          0.031102004 = weight(abstract_txt:wurde in 4595) [ClassicSimilarity], result of:
            0.031102004 = score(doc=4595,freq=1.0), product of:
              0.0834377 = queryWeight, product of:
                1.0692812 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.016354391 = queryNorm
              0.3727572 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.047129076 = weight(abstract_txt:ergebnisse in 4595) [ClassicSimilarity], result of:
            0.047129076 = score(doc=4595,freq=1.0), product of:
              0.11007686 = queryWeight, product of:
                1.2281709 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.016354391 = queryNorm
              0.428147 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.13948068 = weight(abstract_txt:zeitungen in 4595) [ClassicSimilarity], result of:
            0.13948068 = score(doc=4595,freq=1.0), product of:
              0.22690608 = queryWeight, product of:
                1.7633309 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.016354391 = queryNorm
              0.6147067 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.16758257 = weight(abstract_txt:historischer in 4595) [ClassicSimilarity], result of:
            0.16758257 = score(doc=4595,freq=1.0), product of:
              0.2564421 = queryWeight, product of:
                1.8745862 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016354391 = queryNorm
              0.6534909 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
        0.16 = coord(4/25)
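
The score trees above all follow the same arithmetic. As a minimal sketch, the Python below recomputes the 0.12688926 score of the first similar document (doc=373) from the numbers printed in its tree. The function names are hypothetical and chosen for illustration only. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that reproduce the tf and idf values shown in the listing; they are not taken from the catalogue's configuration. Small differences in the last digits are expected, because the listing appears to use single-precision values.

```python
import math

# Constants copied from the explain tree of the first similar document (doc=373).
MAX_DOCS = 44218
QUERY_NORM = 0.016354391
FIELD_NORM = 0.0625

# (term, freq in document, docFreq in index, query boost), as printed in the tree.
TERMS = [
    ("auszeichnung", 1.0, 8, 1.0644716),
    ("durch", 1.0, 1718, 1.4278063),
    ("wurden", 1.0, 662, 1.7480744),
    ("entities", 4.0, 351, 1.9609126),
    ("named", 5.0, 130, 3.0575805),
]


def idf(doc_freq, max_docs):
    """Inverse document frequency; assumed form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost):
    """Score of one term: queryWeight * fieldWeight, as in each 'product of' node."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, matched_clauses, total_clauses):
    """Sum of the term scores, scaled by the coord factor (matched / total)."""
    raw = sum(term_score(freq, doc_freq, boost) for _, freq, doc_freq, boost in terms)
    return raw * (matched_clauses / total_clauses)


# coord(5/25): five of the 25 query clauses matched this document.
score = document_score(TERMS, matched_clauses=5, total_clauses=25)
print(f"{score:.4f}")
```

The coord factor explains why documents 3 and 5 rank below document 2 despite having a comparable sum of term scores: they match only four of the 25 query clauses, so their sums are multiplied by 0.16 instead of 0.2.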