Document (#36384)

Author
Mühlberger, G.
Title
Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)
Source
Zeitschrift für Bibliothekswesen und Bibliographie. 58(2011) H.1, S.10-18
Year
2011
Abstract
OCR is a key technology that cannot be bypassed in the systematic digitization of historical newspapers. Although in many cases only a word accuracy of 80% or less can be achieved for newspapers of the 19th and 20th centuries, this error-prone full text still provides the basis for a whole range of interesting applications, from full-text search and indexing by search engines to online correction by users. The use of OCR does, however, require a rethink compared with conventional digitization projects: in project planning, in workflow design, in quality control, and in the design of long-term archiving and web presentation.
Form
Zeitungen
Object
OCR

Similar documents (author)

  1. Mühlberger, G.: ¬Der digitalisierte Nominalkatalog der Universitätsbibliothek Innsbruck (2004) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:mühlberger in 3200) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 3200, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=3200)
    
  2. Mühlberger, G.; Habitzel, K.: ¬Das digitalisierte Zeitungsausschnittarchiv : Im EU-Projekt LAURIN des Innsbrucker Zeitungsarchivs/IZA der Universität Innsbruck werden neue Wege der Archivierung und Bereitstellung gegangen (1998) 4.95
    4.954854 = sum of:
      4.954854 = weight(author_txt:mühlberger in 1829) [ClassicSimilarity], result of:
        4.954854 = fieldWeight in 1829, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.5 = fieldNorm(doc=1829)
    
  3. Mühlberger, G.; Klein, M.: Digitalisierte Zeitungsausschnitte im Internet : Das Innsbrucker Zeitungsarchiv zur deutsch- und fremdsprachigen Literatur bietet seine Sammlung online an: http://iza.uibk.ac.at/ (2001) 4.95
    4.954854 = sum of:
      4.954854 = weight(author_txt:mühlberger in 914) [ClassicSimilarity], result of:
        4.954854 = fieldWeight in 914, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.5 = fieldNorm(doc=914)
    
  4. Sigmund, K.; Dawson, J.; Mühlberger, K.: Kurt Gödel : Das Album - The Album (2006) 3.72
    3.7161405 = sum of:
      3.7161405 = weight(author_txt:mühlberger in 595) [ClassicSimilarity], result of:
        3.7161405 = fieldWeight in 595, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.375 = fieldNorm(doc=595)
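
The author scores above each reduce to one product, fieldWeight = tf * idf * fieldNorm. As a minimal sketch, the arithmetic can be reproduced in Python. It assumes the ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the values shown here. The function names are illustrative and not part of any Lucene API.

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency in the form that matches the explain
    # output above (assumed: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as listed in each explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Hit 1 (doc 3200): freq=1.0, docFreq=5, maxDocs=44421, fieldNorm=0.625
score = field_weight(1.0, 5, 44421, 0.625)
print(f"{score:.4f}")  # about 6.19, matching the first author hit
```

Because tf and idf are identical for all four author hits, the ranking differs only through fieldNorm (0.625, 0.5, 0.5, 0.375), which shrinks as the author field holds more names.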
    

Similar documents (content)

  1. Kugler, A.: Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen (2018) 0.25
    0.2504113 = sum of:
      0.2504113 = product of:
        0.89432603 = sum of:
          0.11614331 = weight(abstract_txt:volltext in 595) [ClassicSimilarity], result of:
            0.11614331 = score(doc=595,freq=2.0), product of:
              0.14287272 = queryWeight, product of:
                1.0017568 = boost
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.01938417 = queryNorm
              0.8129145 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.022143533 = weight(abstract_txt:wird in 595) [ClassicSimilarity], result of:
            0.022143533 = score(doc=595,freq=1.0), product of:
              0.07512846 = queryWeight, product of:
                1.0273179 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.01938417 = queryNorm
              0.2947423 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.12087078 = weight(abstract_txt:historischer in 595) [ClassicSimilarity], result of:
            0.12087078 = score(doc=595,freq=1.0), product of:
              0.18486048 = queryWeight, product of:
                1.1394877 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.01938417 = queryNorm
              0.65384865 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.17551823 = weight(abstract_txt:automatisierten in 595) [ClassicSimilarity], result of:
            0.17551823 = score(doc=595,freq=2.0), product of:
              0.18814877 = queryWeight, product of:
                1.1495776 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.01938417 = queryNorm
              0.93286943 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.037150204 = weight(abstract_txt:eine in 595) [ClassicSimilarity], result of:
            0.037150204 = score(doc=595,freq=2.0), product of:
              0.09637575 = queryWeight, product of:
                1.4250567 = boost
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.01938417 = queryNorm
              0.38547254 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.12066261 = weight(abstract_txt:digitalisierung in 595) [ClassicSimilarity], result of:
            0.12066261 = score(doc=595,freq=1.0), product of:
              0.23264211 = queryWeight, product of:
                1.807785 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01938417 = queryNorm
              0.5186619 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
          0.30183733 = weight(abstract_txt:zeitungen in 595) [ClassicSimilarity], result of:
            0.30183733 = score(doc=595,freq=1.0), product of:
              0.4907409 = queryWeight, product of:
                3.2156916 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.01938417 = queryNorm
              0.61506456 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.078125 = fieldNorm(doc=595)
        0.28 = coord(7/25)
    
  2. Neudecker, C.: Zur Kuratierung digitalisierter Dokumente mit Künstlicher Intelligenz : das Qurator-Projekt (2020) 0.13
    0.13259117 = sum of:
      0.13259117 = product of:
        0.6629559 = sum of:
          0.02657224 = weight(abstract_txt:wird in 1048) [ClassicSimilarity], result of:
            0.02657224 = score(doc=1048,freq=1.0), product of:
              0.07512846 = queryWeight, product of:
                1.0273179 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.01938417 = queryNorm
              0.35369074 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.09375 = fieldNorm(doc=1048)
          0.037884608 = weight(abstract_txt:durch in 1048) [ClassicSimilarity], result of:
            0.037884608 = score(doc=1048,freq=1.0), product of:
              0.09516872 = queryWeight, product of:
                1.1562446 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.01938417 = queryNorm
              0.39807835 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.09375 = fieldNorm(doc=1048)
          0.031522993 = weight(abstract_txt:eine in 1048) [ClassicSimilarity], result of:
            0.031522993 = score(doc=1048,freq=1.0), product of:
              0.09637575 = queryWeight, product of:
                1.4250567 = boost
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.01938417 = queryNorm
              0.3270843 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.09375 = fieldNorm(doc=1048)
          0.20477124 = weight(abstract_txt:digitalisierung in 1048) [ClassicSimilarity], result of:
            0.20477124 = score(doc=1048,freq=2.0), product of:
              0.23264211 = queryWeight, product of:
                1.807785 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01938417 = queryNorm
              0.8801985 = fieldWeight in 1048, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.09375 = fieldNorm(doc=1048)
          0.3622048 = weight(abstract_txt:zeitungen in 1048) [ClassicSimilarity], result of:
            0.3622048 = score(doc=1048,freq=1.0), product of:
              0.4907409 = queryWeight, product of:
                3.2156916 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.01938417 = queryNorm
              0.73807746 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.09375 = fieldNorm(doc=1048)
        0.2 = coord(5/25)
    
  3. Mikro-Univers : 2. Workshop "Digitalisierung, Erschließung, Internetpräsentation und Langzeitarchivierung" (2004) 0.11
    0.11210939 = sum of:
      0.11210939 = product of:
        0.40039068 = sum of:
          0.06968599 = weight(abstract_txt:volltext in 4042) [ClassicSimilarity], result of:
            0.06968599 = score(doc=4042,freq=2.0), product of:
              0.14287272 = queryWeight, product of:
                1.0017568 = boost
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.01938417 = queryNorm
              0.4877487 = fieldWeight in 4042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.01328612 = weight(abstract_txt:wird in 4042) [ClassicSimilarity], result of:
            0.01328612 = score(doc=4042,freq=1.0), product of:
              0.07512846 = queryWeight, product of:
                1.0273179 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.01938417 = queryNorm
              0.17684537 = fieldWeight in 4042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.018942304 = weight(abstract_txt:durch in 4042) [ClassicSimilarity], result of:
            0.018942304 = score(doc=4042,freq=1.0), product of:
              0.09516872 = queryWeight, product of:
                1.1562446 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.01938417 = queryNorm
              0.19903918 = fieldWeight in 4042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.0846254 = weight(abstract_txt:volltextsuche in 4042) [ClassicSimilarity], result of:
            0.0846254 = score(doc=4042,freq=1.0), product of:
              0.20489413 = queryWeight, product of:
                1.199644 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.01938417 = queryNorm
              0.41302013 = fieldWeight in 4042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.10620954 = weight(abstract_txt:digitalisierungsprojekten in 4042) [ClassicSimilarity], result of:
            0.10620954 = score(doc=4042,freq=1.0), product of:
              0.23839915 = queryWeight, product of:
                1.294017 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.01938417 = queryNorm
              0.4455114 = fieldWeight in 4042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.03524378 = weight(abstract_txt:eine in 4042) [ClassicSimilarity], result of:
            0.03524378 = score(doc=4042,freq=5.0), product of:
              0.09637575 = queryWeight, product of:
                1.4250567 = boost
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.01938417 = queryNorm
              0.36569136 = fieldWeight in 4042, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
          0.07239757 = weight(abstract_txt:digitalisierung in 4042) [ClassicSimilarity], result of:
            0.07239757 = score(doc=4042,freq=1.0), product of:
              0.23264211 = queryWeight, product of:
                1.807785 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01938417 = queryNorm
              0.31119716 = fieldWeight in 4042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.046875 = fieldNorm(doc=4042)
        0.28 = coord(7/25)
    
  4. Waidmann, S.: Erschließung historischer Bestände mittels Crowdsourcing : eine Analyse ausgewählter aktueller Projekte (2014) 0.11
    0.11113526 = sum of:
      0.11113526 = product of:
        0.5556763 = sum of:
          0.08212572 = weight(abstract_txt:volltext in 3460) [ClassicSimilarity], result of:
            0.08212572 = score(doc=3460,freq=1.0), product of:
              0.14287272 = queryWeight, product of:
                1.0017568 = boost
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.01938417 = queryNorm
              0.57481736 = fieldWeight in 3460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.078125 = fieldNorm(doc=3460)
          0.12087078 = weight(abstract_txt:historischer in 3460) [ClassicSimilarity], result of:
            0.12087078 = score(doc=3460,freq=1.0), product of:
              0.18486048 = queryWeight, product of:
                1.1394877 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.01938417 = queryNorm
              0.65384865 = fieldWeight in 3460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=3460)
          0.070593804 = weight(abstract_txt:durch in 3460) [ClassicSimilarity], result of:
            0.070593804 = score(doc=3460,freq=5.0), product of:
              0.09516872 = queryWeight, product of:
                1.1562446 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.01938417 = queryNorm
              0.7417753 = fieldWeight in 3460, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.078125 = fieldNorm(doc=3460)
          0.16142339 = weight(abstract_txt:korrektur in 3460) [ClassicSimilarity], result of:
            0.16142339 = score(doc=3460,freq=1.0), product of:
              0.2241855 = queryWeight, product of:
                1.2548487 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.01938417 = queryNorm
              0.72004384 = fieldWeight in 3460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3460)
          0.12066261 = weight(abstract_txt:digitalisierung in 3460) [ClassicSimilarity], result of:
            0.12066261 = score(doc=3460,freq=1.0), product of:
              0.23264211 = queryWeight, product of:
                1.807785 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01938417 = queryNorm
              0.5186619 = fieldWeight in 3460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=3460)
        0.2 = coord(5/25)
    
  5. Lepsky, K.: Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte (2006) 0.10
    0.10405017 = sum of:
      0.10405017 = product of:
        0.37160775 = sum of:
          0.07112296 = weight(abstract_txt:volltext in 80) [ClassicSimilarity], result of:
            0.07112296 = score(doc=80,freq=3.0), product of:
              0.14287272 = queryWeight, product of:
                1.0017568 = boost
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.01938417 = queryNorm
              0.49780643 = fieldWeight in 80, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.357662 = idf(docFreq=76, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.019176861 = weight(abstract_txt:wird in 80) [ClassicSimilarity], result of:
            0.019176861 = score(doc=80,freq=3.0), product of:
              0.07512846 = queryWeight, product of:
                1.0273179 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.01938417 = queryNorm
              0.2552543 = fieldWeight in 80, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.05459637 = weight(abstract_txt:erzielen in 80) [ClassicSimilarity], result of:
            0.05459637 = score(doc=80,freq=1.0), product of:
              0.17275304 = queryWeight, product of:
                1.1015404 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.01938417 = queryNorm
              0.3160371 = fieldWeight in 80, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.022323716 = weight(abstract_txt:durch in 80) [ClassicSimilarity], result of:
            0.022323716 = score(doc=80,freq=2.0), product of:
              0.09516872 = queryWeight, product of:
                1.1562446 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.01938417 = queryNorm
              0.2345699 = fieldWeight in 80, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.07052117 = weight(abstract_txt:volltextsuche in 80) [ClassicSimilarity], result of:
            0.07052117 = score(doc=80,freq=1.0), product of:
              0.20489413 = queryWeight, product of:
                1.199644 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.01938417 = queryNorm
              0.34418344 = fieldWeight in 80, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.029369816 = weight(abstract_txt:eine in 80) [ClassicSimilarity], result of:
            0.029369816 = score(doc=80,freq=5.0), product of:
              0.09637575 = queryWeight, product of:
                1.4250567 = boost
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.01938417 = queryNorm
              0.3047428 = fieldWeight in 80, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4888992 = idf(docFreq=3686, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
          0.104496874 = weight(abstract_txt:digitalisierung in 80) [ClassicSimilarity], result of:
            0.104496874 = score(doc=80,freq=3.0), product of:
              0.23264211 = queryWeight, product of:
                1.807785 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.01938417 = queryNorm
              0.44917437 = fieldWeight in 80, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0390625 = fieldNorm(doc=80)
        0.28 = coord(7/25)
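
The content scores above combine a per-term product (queryWeight * fieldWeight) with a coordination factor for the share of query terms that matched. The following sketch recomputes the total for the first content hit (Kugler, doc 595) from the numbers in its explain tree. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); queryNorm and the boosts are copied from the output rather than derived, and all names in the code are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.01938417   # copied from the explain output, not recomputed
FIELD_NORM = 0.078125     # fieldNorm(doc=595) for abstract_txt
QUERY_TERMS = 25          # denominator of coord(7/25)

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf form; it reproduces the idf values shown above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, freq, doc_freq, field_norm):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM             # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm   # fieldWeight
    return query_weight * field_weight

# (term, boost, freq, docFreq) for the seven matching terms in doc 595
matched_terms = [
    ("volltext",        1.0017568, 2.0, 76),
    ("wird",            1.0273179, 1.0, 2775),
    ("historischer",    1.1394877, 1.0, 27),
    ("automatisierten", 1.1495776, 2.0, 25),
    ("eine",            1.4250567, 2.0, 3686),
    ("digitalisierung", 1.807785,  1.0, 157),
    ("zeitungen",       3.2156916, 1.0, 45),
]

term_sum = sum(term_score(b, f, df, FIELD_NORM) for _, b, f, df in matched_terms)
coord = len(matched_terms) / QUERY_TERMS   # 7 of 25 query terms matched
total = term_sum * coord
print(f"{total:.4f}")  # about 0.25, the displayed score of hit 1
```

The coord factor explains why hits that match rare terms can still score low overall: with only 5 to 7 of 25 query terms present, the summed term scores are scaled down to 20 to 28 percent.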