Document (#39819)

Author
Baierer, K.
Zumstein, P.
Title
Verbesserung der OCR in digitalen Sammlungen von Bibliotheken
Source
027.7 Zeitschrift für Bibliothekskultur. 4(2016), no. 2
Year
2016
Abstract
Options for improving automatic text recognition (OCR) in digital collections, in particular through computational-linguistic methods, are described, and existing post-OCR methods are analysed. The current use of OCR in library practice differs substantially from these options, which come from research or from individual projects, and exploits their potential only in part.
Content
Contribution to a special issue on 'Computerlinguistik und Bibliotheken' (computational linguistics and libraries). Cf.: http://0277.ch/ojs/index.php/cdrs_0277/article/view/155/353.
Theme
Computational linguistics
Aid
OCR

Similar documents (author)

  1. Zumstein, P.: ¬Die Rolle des Semantic Web für Bibliotheken : Linked Open Data und mehr: Welche Strategien können hier die Bibliotheken in die Zukunft führen? (2012) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:zumstein in 3450) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 3450, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=3450)
    
  2. Zumstein, P.; Stöhr, M.: Zur Nachnutzung von bibliographischen Katalog- und Normdaten für die persönliche Literaturverwaltung und Wissensorganisation (2015) 4.95
    4.954854 = sum of:
      4.954854 = weight(author_txt:zumstein in 4192) [ClassicSimilarity], result of:
        4.954854 = fieldWeight in 4192, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.5 = fieldNorm(doc=4192)
    
  3. Kim, T.C.-w.K.; Zumstein, P.: Semiautomatische Katalogisierung und Normdatenverknüpfung mit Zotero im Index Theologicus (2016) 4.34
    4.3354974 = sum of:
      4.3354974 = weight(author_txt:zumstein in 4064) [ClassicSimilarity], result of:
        4.3354974 = fieldWeight in 4064, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.4375 = fieldNorm(doc=4064)
    
  4. Daquino, M.; Peroni, S.; Shotton, D.; Colavizza, G.; Ghavimi, B.; Lauscher, A.; Mayr, P.; Romanello, M.; Zumstein, P.: ¬The OpenCitations Data Model (2020) 2.17
    2.1677487 = sum of:
      2.1677487 = weight(author_txt:zumstein in 1039) [ClassicSimilarity], result of:
        2.1677487 = fieldWeight in 1039, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.21875 = fieldNorm(doc=1039)
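
The author scores above are single-term TF-IDF field weights. As a rough sketch of that arithmetic, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the listed values can be reproduced as follows. The function names are illustrative only, and the fieldNorm values are copied from the listing rather than derived, since Lucene stores them in a lossy encoded form.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm

# fieldNorm values taken from the listing (doc ids 3450, 4192, 4064, 1039)
field_norms = {3450: 0.625, 4192: 0.5, 4064: 0.4375, 1039: 0.21875}

author_scores = {
    doc: field_weight(freq=1.0, doc_freq=5, max_docs=44421, field_norm=norm)
    for doc, norm in field_norms.items()
}

for doc, score in author_scores.items():
    print(doc, round(score, 4))
```

The results should agree with the listed scores up to small rounding differences, because Lucene computes in 32-bit floats while Python uses 64-bit floats. The ranking here is driven entirely by fieldNorm: all four records match the same term once, so shorter author fields score higher.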
    

Similar documents (content)

  1. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.16
    0.15998925 = sum of:
      0.15998925 = product of:
        0.5713902 = sum of:
          0.031385474 = weight(abstract_txt:verfahren in 5197) [ClassicSimilarity], result of:
            0.031385474 = score(doc=5197,freq=1.0), product of:
              0.11616322 = queryWeight, product of:
                1.2490289 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.01613531 = queryNorm
              0.27018428 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.06347054 = weight(abstract_txt:methoden in 5197) [ClassicSimilarity], result of:
            0.06347054 = score(doc=5197,freq=4.0), product of:
              0.11702472 = queryWeight, product of:
                1.253652 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.01613531 = queryNorm
              0.54236865 = fieldWeight in 5197, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.054814935 = weight(abstract_txt:beschrieben in 5197) [ClassicSimilarity], result of:
            0.054814935 = score(doc=5197,freq=2.0), product of:
              0.13371253 = queryWeight, product of:
                1.34006 = boost
                6.184015 = idf(docFreq=248, maxDocs=44421)
                0.01613531 = queryNorm
              0.40994614 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.184015 = idf(docFreq=248, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.1194118 = weight(abstract_txt:automatischen in 5197) [ClassicSimilarity], result of:
            0.1194118 = score(doc=5197,freq=5.0), product of:
              0.16556084 = queryWeight, product of:
                1.4911351 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.01613531 = queryNorm
              0.72125626 = fieldWeight in 5197, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.05690966 = weight(abstract_txt:teilweise in 5197) [ClassicSimilarity], result of:
            0.05690966 = score(doc=5197,freq=1.0), product of:
              0.17273228 = queryWeight, product of:
                1.5230879 = boost
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.01613531 = queryNorm
              0.32946745 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.055679724 = weight(abstract_txt:möglichkeiten in 5197) [ClassicSimilarity], result of:
            0.055679724 = score(doc=5197,freq=1.0), product of:
              0.21448201 = queryWeight, product of:
                2.4002066 = boost
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.01613531 = queryNorm
              0.2596009 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.18971808 = weight(abstract_txt:verbesserung in 5197) [ClassicSimilarity], result of:
            0.18971808 = score(doc=5197,freq=3.0), product of:
              0.33673757 = queryWeight, product of:
                3.007454 = boost
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.01613531 = queryNorm
              0.5634004 = fieldWeight in 5197, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
        0.28 = coord(7/25)
    
  2. Schmitz, K.-D.: Wörterbuch, Thesaurus, Terminologie, Ontologie : Was tragen Terminologiewissenschaft und Informationswissenschaft zur Wissensordnung bei? (2006) 0.16
    0.1591217 = sum of:
      0.1591217 = product of:
        0.66300714 = sum of:
          0.057190176 = weight(abstract_txt:diesen in 75) [ClassicSimilarity], result of:
            0.057190176 = score(doc=75,freq=1.0), product of:
              0.1091718 = queryWeight, product of:
                1.2108586 = boost
                5.5877852 = idf(docFreq=451, maxDocs=44421)
                0.01613531 = queryNorm
              0.52385485 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5877852 = idf(docFreq=451, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
          0.06277095 = weight(abstract_txt:verfahren in 75) [ClassicSimilarity], result of:
            0.06277095 = score(doc=75,freq=1.0), product of:
              0.11616322 = queryWeight, product of:
                1.2490289 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.01613531 = queryNorm
              0.54036856 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
          0.06347054 = weight(abstract_txt:methoden in 75) [ClassicSimilarity], result of:
            0.06347054 = score(doc=75,freq=1.0), product of:
              0.11702472 = queryWeight, product of:
                1.253652 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.01613531 = queryNorm
              0.54236865 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
          0.06446632 = weight(abstract_txt:einzelnen in 75) [ClassicSimilarity], result of:
            0.06446632 = score(doc=75,freq=1.0), product of:
              0.118245535 = queryWeight, product of:
                1.2601742 = boost
                5.8153634 = idf(docFreq=359, maxDocs=44421)
                0.01613531 = queryNorm
              0.54519033 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8153634 = idf(docFreq=359, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
          0.148261 = weight(abstract_txt:nutzt in 75) [ClassicSimilarity], result of:
            0.148261 = score(doc=75,freq=1.0), product of:
              0.20602225 = queryWeight, product of:
                1.6633945 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.01613531 = queryNorm
              0.71963584 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
          0.26684818 = weight(abstract_txt:sammlungen in 75) [ClassicSimilarity], result of:
            0.26684818 = score(doc=75,freq=1.0), product of:
              0.38407466 = queryWeight, product of:
                3.2118926 = boost
                7.4110084 = idf(docFreq=72, maxDocs=44421)
                0.01613531 = queryNorm
              0.694782 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4110084 = idf(docFreq=72, maxDocs=44421)
                0.09375 = fieldNorm(doc=75)
        0.24 = coord(6/25)
    
  3. Plank, M.: AV-Portal für wissenschaftliche Filme : Analyse der Nutzerbedarfe (2010) 0.14
    0.13935475 = sum of:
      0.13935475 = product of:
        0.5806448 = sum of:
          0.040805038 = weight(abstract_txt:bibliotheken in 670) [ClassicSimilarity], result of:
            0.040805038 = score(doc=670,freq=1.0), product of:
              0.07865763 = queryWeight, product of:
                1.0278 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.01613531 = queryNorm
              0.5187677 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
          0.071010016 = weight(abstract_txt:forschung in 670) [ClassicSimilarity], result of:
            0.071010016 = score(doc=670,freq=1.0), product of:
              0.113800645 = queryWeight, product of:
                1.2362621 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.01613531 = queryNorm
              0.62398607 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
          0.07404896 = weight(abstract_txt:methoden in 670) [ClassicSimilarity], result of:
            0.07404896 = score(doc=670,freq=1.0), product of:
              0.11702472 = queryWeight, product of:
                1.253652 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.01613531 = queryNorm
              0.6327634 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
          0.118884645 = weight(abstract_txt:analysiert in 670) [ClassicSimilarity], result of:
            0.118884645 = score(doc=670,freq=1.0), product of:
              0.16045336 = queryWeight, product of:
                1.4679545 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.01613531 = queryNorm
              0.7409296 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
          0.124606006 = weight(abstract_txt:automatischen in 670) [ClassicSimilarity], result of:
            0.124606006 = score(doc=670,freq=1.0), product of:
              0.16556084 = queryWeight, product of:
                1.4911351 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.01613531 = queryNorm
              0.7526297 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
          0.15129012 = weight(abstract_txt:digitalen in 670) [ClassicSimilarity], result of:
            0.15129012 = score(doc=670,freq=1.0), product of:
              0.23740071 = queryWeight, product of:
                2.5251908 = boost
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.01613531 = queryNorm
              0.6372774 = fieldWeight in 670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.109375 = fieldNorm(doc=670)
        0.24 = coord(6/25)
    
  4. Lepsky, K.: Im Heuhaufen suchen - und finden : Automatische Erschließung von Internetquellen: Möglichkeiten und Grenzen (1998) 0.12
    0.124605015 = sum of:
      0.124605015 = product of:
        0.51918757 = sum of:
          0.04121931 = weight(abstract_txt:bibliotheken in 5655) [ClassicSimilarity], result of:
            0.04121931 = score(doc=5655,freq=2.0), product of:
              0.07865763 = queryWeight, product of:
                1.0278 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.01613531 = queryNorm
              0.5240345 = fieldWeight in 5655, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
          0.047658484 = weight(abstract_txt:diesen in 5655) [ClassicSimilarity], result of:
            0.047658484 = score(doc=5655,freq=1.0), product of:
              0.1091718 = queryWeight, product of:
                1.2108586 = boost
                5.5877852 = idf(docFreq=451, maxDocs=44421)
                0.01613531 = queryNorm
              0.43654573 = fieldWeight in 5655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5877852 = idf(docFreq=451, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
          0.052309126 = weight(abstract_txt:verfahren in 5655) [ClassicSimilarity], result of:
            0.052309126 = score(doc=5655,freq=1.0), product of:
              0.11616322 = queryWeight, product of:
                1.2490289 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.01613531 = queryNorm
              0.45030713 = fieldWeight in 5655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
          0.06077774 = weight(abstract_txt:anwendung in 5655) [ClassicSimilarity], result of:
            0.06077774 = score(doc=5655,freq=1.0), product of:
              0.12838472 = queryWeight, product of:
                1.3130912 = boost
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.01613531 = queryNorm
              0.4734032 = fieldWeight in 5655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
          0.094849445 = weight(abstract_txt:teilweise in 5655) [ClassicSimilarity], result of:
            0.094849445 = score(doc=5655,freq=1.0), product of:
              0.17273228 = queryWeight, product of:
                1.5230879 = boost
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.01613531 = queryNorm
              0.54911244 = fieldWeight in 5655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
          0.22237349 = weight(abstract_txt:sammlungen in 5655) [ClassicSimilarity], result of:
            0.22237349 = score(doc=5655,freq=1.0), product of:
              0.38407466 = queryWeight, product of:
                3.2118926 = boost
                7.4110084 = idf(docFreq=72, maxDocs=44421)
                0.01613531 = queryNorm
              0.57898504 = fieldWeight in 5655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4110084 = idf(docFreq=72, maxDocs=44421)
                0.078125 = fieldNorm(doc=5655)
        0.24 = coord(6/25)
    
  5. Kempf, A.O.: Automatische Indexierung in der sozialwissenschaftlichen Fachinformation : eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS (2012) 0.12
    0.12406637 = sum of:
      0.12406637 = product of:
        0.5169432 = sum of:
          0.052309126 = weight(abstract_txt:verfahren in 1903) [ClassicSimilarity], result of:
            0.052309126 = score(doc=1903,freq=1.0), product of:
              0.11616322 = queryWeight, product of:
                1.2490289 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.01613531 = queryNorm
              0.45030713 = fieldWeight in 1903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
          0.052892115 = weight(abstract_txt:methoden in 1903) [ClassicSimilarity], result of:
            0.052892115 = score(doc=1903,freq=1.0), product of:
              0.11702472 = queryWeight, product of:
                1.253652 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.01613531 = queryNorm
              0.45197386 = fieldWeight in 1903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
          0.06460002 = weight(abstract_txt:beschrieben in 1903) [ClassicSimilarity], result of:
            0.06460002 = score(doc=1903,freq=1.0), product of:
              0.13371253 = queryWeight, product of:
                1.34006 = boost
                6.184015 = idf(docFreq=248, maxDocs=44421)
                0.01613531 = queryNorm
              0.48312616 = fieldWeight in 1903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.184015 = idf(docFreq=248, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
          0.08491761 = weight(abstract_txt:analysiert in 1903) [ClassicSimilarity], result of:
            0.08491761 = score(doc=1903,freq=1.0), product of:
              0.16045336 = queryWeight, product of:
                1.4679545 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.01613531 = queryNorm
              0.5292355 = fieldWeight in 1903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
          0.15415996 = weight(abstract_txt:automatischen in 1903) [ClassicSimilarity], result of:
            0.15415996 = score(doc=1903,freq=3.0), product of:
              0.16556084 = queryWeight, product of:
                1.4911351 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.01613531 = queryNorm
              0.9311378 = fieldWeight in 1903, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
          0.10806437 = weight(abstract_txt:digitalen in 1903) [ClassicSimilarity], result of:
            0.10806437 = score(doc=1903,freq=1.0), product of:
              0.23740071 = queryWeight, product of:
                2.5251908 = boost
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.01613531 = queryNorm
              0.45519817 = fieldWeight in 1903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.078125 = fieldNorm(doc=1903)
        0.24 = coord(6/25)
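
The content scores combine several term scores and a coordination factor. The following sketch recomputes the total for the first content hit (doc 5197) under the same ClassicSimilarity assumptions: each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (sqrt(freq) * idf * fieldNorm), and the sum is scaled by coord = matched terms / query terms. The boost, queryNorm, fieldNorm and the 25 query terms are copied from the listing; the helper names are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.01613531  # copied from the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight

# (term, freq, docFreq, boost) for doc 5197, all with fieldNorm 0.046875
FIELD_NORM_5197 = 0.046875
terms_5197 = [
    ("verfahren", 1.0, 378, 1.2490289),
    ("methoden", 4.0, 370, 1.253652),
    ("beschrieben", 2.0, 248, 1.34006),
    ("automatischen", 5.0, 123, 1.4911351),
    ("teilweise", 1.0, 106, 1.5230879),
    ("möglichkeiten", 1.0, 474, 2.4002066),
    ("verbesserung", 3.0, 116, 3.007454),
]

matched_sum = sum(
    term_score(freq, doc_freq, boost, FIELD_NORM_5197)
    for _, freq, doc_freq, boost in terms_5197
)
coord = len(terms_5197) / 25  # 7 of 25 query terms matched
total = coord * matched_sum

print(round(matched_sum, 4), round(total, 4))
```

The computed sum and total should land close to the listed 0.5713902 and 0.15998925, again up to float32 rounding. The coord factor explains why a record matching seven rare terms still scores low overall: it is penalised for the 18 query terms it does not contain.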