Document (#28582)

Author
Lepsky, K.
Vorhauer, J.
Title
Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente
Source
ABI-Technik. 26(2006) no.1, pp.18-28
Year
2006
Abstract
Lingo is a freely available (open source) system for the automatic indexing of German-language texts. Its development prioritized high configurability and flexibility so that the system can serve a range of application scenarios. The article demonstrates the benefit of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo offers for improving retrieval is presented and illustrated with examples: base-form recognition (lemmatization), compound recognition and compound decomposition, word relationing, lexical and algorithmic multi-word group recognition, and OCR error correction. The open system architecture of lingo is described, and possible application scenarios and limits of use are identified.
Theme
Automatisches Indexieren
Object
Lingo

Similar documents (author)

  1. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 5.04
    5.039926 = sum of:
      5.039926 = weight(author_txt:lepsky in 5228) [ClassicSimilarity], result of:
        5.039926 = fieldWeight in 5228, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.063882 = idf(docFreq=37, maxDocs=44421)
          0.625 = fieldNorm(doc=5228)
    
  2. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 5.04
    5.039926 = sum of:
      5.039926 = weight(author_txt:lepsky in 7063) [ClassicSimilarity], result of:
        5.039926 = fieldWeight in 7063, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.063882 = idf(docFreq=37, maxDocs=44421)
          0.625 = fieldNorm(doc=7063)
    
  3. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 5.04
    5.039926 = sum of:
      5.039926 = weight(author_txt:lepsky in 840) [ClassicSimilarity], result of:
        5.039926 = fieldWeight in 840, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.063882 = idf(docFreq=37, maxDocs=44421)
          0.625 = fieldNorm(doc=840)
    
  4. Lepsky, K.: Bild und Wirklichkeit : die Wirklichkeit im Bild (1987) 5.04
    5.039926 = sum of:
      5.039926 = weight(author_txt:lepsky in 1414) [ClassicSimilarity], result of:
        5.039926 = fieldWeight in 1414, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.063882 = idf(docFreq=37, maxDocs=44421)
          0.625 = fieldNorm(doc=1414)
    
  5. Lepsky, K.: Ernst H. Gombrich : Theorie und Methode (1991) 5.04
    5.039926 = sum of:
      5.039926 = weight(author_txt:lepsky in 1753) [ClassicSimilarity], result of:
        5.039926 = fieldWeight in 1753, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.063882 = idf(docFreq=37, maxDocs=44421)
          0.625 = fieldNorm(doc=1753)
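
The author-match scores above all follow the same arithmetic, which can be reproduced with a minimal sketch. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the numbers listed here; other Lucene versions may compute idf slightly differently. The function names are illustrative and not part of any Lucene API.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the listing (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:lepsky entries above.
idf = classic_idf(doc_freq=37, max_docs=44421)
score = field_weight(freq=1.0, doc_freq=37, max_docs=44421, field_norm=0.625)
print(f"idf={idf:.6f} score={score:.6f}")
```

Because every author entry has the same term frequency, document frequency, and field norm, all five receive the same score; the listing rounds it to 5.04.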
    

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.26
    0.26440054 = sum of:
      0.26440054 = product of:
        0.82625175 = sum of:
          0.022986751 = weight(abstract_txt:mögliche in 2054) [ClassicSimilarity], result of:
            0.022986751 = score(doc=2054,freq=1.0), product of:
              0.085517354 = queryWeight, product of:
                1.0034862 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.012384532 = queryNorm
              0.26879632 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.016941817 = weight(abstract_txt:wird in 2054) [ClassicSimilarity], result of:
            0.016941817 = score(doc=2054,freq=5.0), product of:
              0.051411767 = queryWeight, product of:
                1.1003491 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.012384532 = queryNorm
              0.3295319 = fieldWeight in 2054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.0781105 = weight(abstract_txt:algorithmische in 2054) [ClassicSimilarity], result of:
            0.0781105 = score(doc=2054,freq=2.0), product of:
              0.15341417 = queryWeight, product of:
                1.3440548 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              0.5091479 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.05857556 = weight(abstract_txt:linguistisch in 2054) [ClassicSimilarity], result of:
            0.05857556 = score(doc=2054,freq=1.0), product of:
              0.15954389 = queryWeight, product of:
                1.3706429 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012384532 = queryNorm
              0.36714387 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.0628474 = weight(abstract_txt:lexikalische in 2054) [ClassicSimilarity], result of:
            0.0628474 = score(doc=2054,freq=1.0), product of:
              0.16720942 = queryWeight, product of:
                1.4031839 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.012384532 = queryNorm
              0.3758604 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.01579053 = weight(abstract_txt:open in 2054) [ClassicSimilarity], result of:
            0.01579053 = score(doc=2054,freq=1.0), product of:
              0.08388358 = queryWeight, product of:
                1.4055222 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.012384532 = queryNorm
              0.18824337 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.018674513 = weight(abstract_txt:source in 2054) [ClassicSimilarity], result of:
            0.018674513 = score(doc=2054,freq=1.0), product of:
              0.093809195 = queryWeight, product of:
                1.4863529 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.012384532 = queryNorm
              0.19906911 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.55232465 = weight(abstract_txt:lingo in 2054) [ClassicSimilarity], result of:
            0.55232465 = score(doc=2054,freq=4.0), product of:
              0.7670709 = queryWeight, product of:
                6.720274 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              0.72004384 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
        0.32 = coord(8/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.22
    0.22425216 = sum of:
      0.22425216 = product of:
        1.401576 = sum of:
          0.07488541 = weight(abstract_txt:automatische in 1401) [ClassicSimilarity], result of:
            0.07488541 = score(doc=1401,freq=1.0), product of:
              0.08654342 = queryWeight, product of:
                1.0094882 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.012384532 = queryNorm
              0.865293 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.1471152 = weight(abstract_txt:automatischen in 1401) [ClassicSimilarity], result of:
            0.1471152 = score(doc=1401,freq=1.0), product of:
              0.17103471 = queryWeight, product of:
                2.0069723 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.012384532 = queryNorm
              0.86014825 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.29585597 = weight(abstract_txt:indexierung in 1401) [ClassicSimilarity], result of:
            0.29585597 = score(doc=1401,freq=2.0), product of:
              0.24758245 = queryWeight, product of:
                2.957364 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.012384532 = queryNorm
              1.1949795 = fieldWeight in 1401, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.8837195 = weight(abstract_txt:lingo in 1401) [ClassicSimilarity], result of:
            0.8837195 = score(doc=1401,freq=1.0), product of:
              0.7670709 = queryWeight, product of:
                6.720274 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              1.1520702 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
        0.16 = coord(4/25)
    
  3. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.17
    0.17451371 = sum of:
      0.17451371 = product of:
        0.62326324 = sum of:
          0.0367788 = weight(abstract_txt:mögliche in 283) [ClassicSimilarity], result of:
            0.0367788 = score(doc=283,freq=1.0), product of:
              0.085517354 = queryWeight, product of:
                1.0034862 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.012384532 = queryNorm
              0.43007413 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.052951984 = weight(abstract_txt:automatische in 283) [ClassicSimilarity], result of:
            0.052951984 = score(doc=283,freq=2.0), product of:
              0.08654342 = queryWeight, product of:
                1.0094882 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.012384532 = queryNorm
              0.61185455 = fieldWeight in 283, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.043598536 = weight(abstract_txt:vordergrund in 283) [ClassicSimilarity], result of:
            0.043598536 = score(doc=283,freq=1.0), product of:
              0.09578639 = queryWeight, product of:
                1.0620284 = boost
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.012384532 = queryNorm
              0.4551642 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.012122577 = weight(abstract_txt:wird in 283) [ClassicSimilarity], result of:
            0.012122577 = score(doc=283,freq=1.0), product of:
              0.051411767 = queryWeight, product of:
                1.1003491 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.012384532 = queryNorm
              0.23579383 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.07447732 = weight(abstract_txt:linguistische in 283) [ClassicSimilarity], result of:
            0.07447732 = score(doc=283,freq=1.0), product of:
              0.13687955 = queryWeight, product of:
                1.2695608 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.012384532 = queryNorm
              0.54410845 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.1471152 = weight(abstract_txt:automatischen in 283) [ClassicSimilarity], result of:
            0.1471152 = score(doc=283,freq=4.0), product of:
              0.17103471 = queryWeight, product of:
                2.0069723 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.012384532 = queryNorm
              0.86014825 = fieldWeight in 283, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
          0.2562188 = weight(abstract_txt:indexierung in 283) [ClassicSimilarity], result of:
            0.2562188 = score(doc=283,freq=6.0), product of:
              0.24758245 = queryWeight, product of:
                2.957364 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.012384532 = queryNorm
              1.0348827 = fieldWeight in 283, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.0625 = fieldNorm(doc=283)
        0.28 = coord(7/25)
    
  4. Jersek, T.: Automatische DDC-Klassifizierung mit Lingo : Vorgehensweise und Ergebnisse (2012) 0.16
    0.15667135 = sum of:
      0.15667135 = product of:
        1.3055946 = sum of:
          0.03000185 = weight(abstract_txt:wird in 1122) [ClassicSimilarity], result of:
            0.03000185 = score(doc=1122,freq=2.0), product of:
              0.051411767 = queryWeight, product of:
                1.1003491 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.012384532 = queryNorm
              0.58356 = fieldWeight in 1122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.109375 = fieldNorm(doc=1122)
          0.18204577 = weight(abstract_txt:automatischen in 1122) [ClassicSimilarity], result of:
            0.18204577 = score(doc=1122,freq=2.0), product of:
              0.17103471 = queryWeight, product of:
                2.0069723 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.012384532 = queryNorm
              1.0643791 = fieldWeight in 1122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.109375 = fieldNorm(doc=1122)
          1.093547 = weight(abstract_txt:lingo in 1122) [ClassicSimilarity], result of:
            1.093547 = score(doc=1122,freq=2.0), product of:
              0.7670709 = queryWeight, product of:
                6.720274 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              1.425614 = fieldWeight in 1122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=1122)
        0.12 = coord(3/25)
    
  5. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.14
    0.13684793 = sum of:
      0.13684793 = product of:
        0.85529953 = sum of:
          0.037442707 = weight(abstract_txt:automatische in 4954) [ClassicSimilarity], result of:
            0.037442707 = score(doc=4954,freq=1.0), product of:
              0.08654342 = queryWeight, product of:
                1.0094882 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.012384532 = queryNorm
              0.4326465 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.08837195 = weight(abstract_txt:algorithmische in 4954) [ClassicSimilarity], result of:
            0.08837195 = score(doc=4954,freq=1.0), product of:
              0.15341417 = queryWeight, product of:
                1.3440548 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              0.5760351 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.104600884 = weight(abstract_txt:indexierung in 4954) [ClassicSimilarity], result of:
            0.104600884 = score(doc=4954,freq=1.0), product of:
              0.24758245 = queryWeight, product of:
                2.957364 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.012384532 = queryNorm
              0.42248908 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.624884 = weight(abstract_txt:lingo in 4954) [ClassicSimilarity], result of:
            0.624884 = score(doc=4954,freq=2.0), product of:
              0.7670709 = queryWeight, product of:
                6.720274 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012384532 = queryNorm
              0.8146366 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
        0.16 = coord(4/25)
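
The content scores combine three ingredients per matching term (queryWeight, fieldWeight, and a final coord factor), and the following sketch recomputes entry 4 (doc=1122) from the values printed above. It assumes the ClassicSimilarity definitions that reproduce this listing: queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and coord = matching terms / total query terms. All function and variable names are illustrative, not Lucene identifiers.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.012384532  # queryNorm from the listing


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the listing (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for entry 4, copied from the explain tree.
matched_terms = [
    ("wird", 2.0, 2775, 1.1003491),
    ("automatischen", 2.0, 123, 2.0069723),
    ("lingo", 2.0, 11, 6.720274),
]
FIELD_NORM = 0.109375  # fieldNorm(doc=1122)

total = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in matched_terms)
coord = len(matched_terms) / 25  # 3 of the 25 query terms matched
final_score = coord * total
print(f"sum={total:.6f} coord={coord:.2f} score={final_score:.6f}")
```

The coord factor explains why entry 4 ranks below entry 1 even though its summed term scores are higher: only 3 of 25 query terms match, compared with 8 of 25.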