Document (#40195)

Author
Bredack, J.
Title
Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich
Imprint
Trier : Universität / Fachbereich II
Year
2016
Pages
V, 98 pp.
Abstract
In this study, two systems were used to automatically extract multi-word terms (MWT) from a domain-specific document collection (full texts of the ACL Anthology Reference Corpus). The thematic spectrum covered all areas of natural language processing, in particular computational linguistics (CL) as an interdisciplinary science. The aim was to extract MWT that can serve as potential index terms in information retrieval (IR). These terms were meant to point to, or name, concepts, methods, procedures, and algorithms in CL and in adjacent fields such as linguistics and computer science.
TreeTagger and the indexing software Lingo were used as extraction systems. TreeTagger is based on a statistical tagging and chunking algorithm that automatically identifies and extracts noun phrases (NPs). It can be used in various natural language processing scenarios, primarily as a POS tagger for different languages. In contrast to TreeTagger, the indexing system Lingo works with electronic dictionaries and pattern-based matching. Lingo is a system geared toward automatic indexing and ships with a large number of modules that can be individually adapted to a specific task and coordinated with one another. The different processing approaches were clearly reflected in the result sets of the two systems. The low overlap between the result sets illustrates how differently the two systems operate, and this difference was described by example in a qualitative analysis. The thesis cannot conclusively determine which of the two systems should be preferred for generating index terms.
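The comparison described in the abstract can be pictured with a minimal sketch: one function collects multi-word candidates from POS-tagged tokens using a simple adjective/noun pattern, and another measures how much two result sets overlap. This is neither TreeTagger's chunker nor Lingo's dictionary matching. The function names, the Penn-style tags, the pattern, and the Jaccard measure are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def extract_by_pattern(tagged: List[Tuple[str, str]], max_len: int = 4) -> Set[str]:
    """Collect runs of adjectives/nouns that end in a noun and span 2..max_len tokens.

    `tagged` is a hypothetical list of (token, POS tag) pairs; tags starting
    with "JJ" (adjective) or "NN" (noun) are assumed, Penn Treebank style.
    """
    terms: Set[str] = set()
    run: List[Tuple[str, str]] = []
    # A sentinel with a non-matching tag flushes the final run.
    for token, tag in tagged + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((token.lower(), tag))
            continue
        # Drop trailing adjectives so every candidate ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if 2 <= len(run) <= max_len:
            terms.add(" ".join(tok for tok, _ in run))
        run = []
    return terms


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two result sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    sentence = [("Statistical", "JJ"), ("machine", "NN"), ("translation", "NN"),
                ("is", "VBZ"), ("hard", "JJ")]
    candidates = extract_by_pattern(sentence)
    print(candidates)
    print(overlap(candidates, {"machine translation"}))
```

A low overlap value between two such sets would correspond to the small agreement between result sets that the abstract reports, although the thesis itself may have measured agreement differently.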
Content
Written thesis (master's thesis) submitted for the degree of Master of Arts at Universität Trier, Fachbereich II, degree program in Computational Linguistics.
Theme
Automatic indexing
Computational linguistics
Object
Lingo
TreeTagger

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.34
    0.33690256 = sum of:
      0.33690256 = product of:
        0.93584037 = sum of:
          0.02762485 = weight(abstract_txt:können in 2054) [ClassicSimilarity], result of:
            0.02762485 = score(doc=2054,freq=5.0), product of:
              0.071092084 = queryWeight, product of:
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.015980398 = queryNorm
              0.38857841 = fieldWeight in 2054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.07384654 = weight(abstract_txt:extrahiert in 2054) [ClassicSimilarity], result of:
            0.07384654 = score(doc=2054,freq=2.0), product of:
              0.14750658 = queryWeight, product of:
                1.0185447 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.015980398 = queryNorm
              0.50063217 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.054927614 = weight(abstract_txt:wörterbüchern in 2054) [ClassicSimilarity], result of:
            0.054927614 = score(doc=2054,freq=1.0), product of:
              0.15256742 = queryWeight, product of:
                1.0358701 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015980398 = queryNorm
              0.36002192 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.06023329 = weight(abstract_txt:indexierungssystem in 2054) [ClassicSimilarity], result of:
            0.06023329 = score(doc=2054,freq=1.0), product of:
              0.16224042 = queryWeight, product of:
                1.0682033 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.015980398 = queryNorm
              0.37125948 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.085182734 = weight(abstract_txt:indexterme in 2054) [ClassicSimilarity], result of:
            0.085182734 = score(doc=2054,freq=2.0), product of:
              0.16224042 = queryWeight, product of:
                1.0682033 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.015980398 = queryNorm
              0.52504015 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.03423596 = weight(abstract_txt:wurden in 2054) [ClassicSimilarity], result of:
            0.03423596 = score(doc=2054,freq=3.0), product of:
              0.09725067 = queryWeight, product of:
                1.1695955 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.015980398 = queryNorm
              0.35203832 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.21883333 = weight(abstract_txt:extrahieren in 2054) [ClassicSimilarity], result of:
            0.21883333 = score(doc=2054,freq=5.0), product of:
              0.2825077 = queryWeight, product of:
                1.9934462 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.015980398 = queryNorm
              0.77461016 = fieldWeight in 2054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.051390324 = weight(abstract_txt:werden in 2054) [ClassicSimilarity], result of:
            0.051390324 = score(doc=2054,freq=8.0), product of:
              0.13259973 = queryWeight, product of:
                2.3654912 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015980398 = queryNorm
              0.3875598 = fieldWeight in 2054, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.32956573 = weight(abstract_txt:lingo in 2054) [ClassicSimilarity], result of:
            0.32956573 = score(doc=2054,freq=4.0), product of:
              0.4577023 = queryWeight, product of:
                3.1076105 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015980398 = queryNorm
              0.72004384 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
        0.36 = coord(9/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.17
    0.1677544 = sum of:
      0.1677544 = product of:
        0.838772 = sum of:
          0.04112943 = weight(abstract_txt:kann in 1401) [ClassicSimilarity], result of:
            0.04112943 = score(doc=1401,freq=1.0), product of:
              0.07299276 = queryWeight, product of:
                1.0132796 = boost
                4.507782 = idf(docFreq=1330, maxDocs=44421)
                0.015980398 = queryNorm
              0.56347275 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.507782 = idf(docFreq=1330, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.06325166 = weight(abstract_txt:wurden in 1401) [ClassicSimilarity], result of:
            0.06325166 = score(doc=1401,freq=1.0), product of:
              0.09725067 = queryWeight, product of:
                1.1695955 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.015980398 = queryNorm
              0.6503982 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.14894418 = weight(abstract_txt:automatische in 1401) [ClassicSimilarity], result of:
            0.14894418 = score(doc=1401,freq=1.0), product of:
              0.1721315 = queryWeight, product of:
                1.5560358 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.015980398 = queryNorm
              0.865293 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.05814152 = weight(abstract_txt:werden in 1401) [ClassicSimilarity], result of:
            0.05814152 = score(doc=1401,freq=1.0), product of:
              0.13259973 = queryWeight, product of:
                2.3654912 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015980398 = queryNorm
              0.43847388 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.5273052 = weight(abstract_txt:lingo in 1401) [ClassicSimilarity], result of:
            0.5273052 = score(doc=1401,freq=1.0), product of:
              0.4577023 = queryWeight, product of:
                3.1076105 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015980398 = queryNorm
              1.1520702 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
        0.2 = coord(5/25)
    
  3. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.16
    0.16299546 = sum of:
      0.16299546 = product of:
        0.6791478 = sum of:
          0.081654154 = weight(abstract_txt:bevorzugt in 4954) [ClassicSimilarity], result of:
            0.081654154 = score(doc=4954,freq=1.0), product of:
              0.14526919 = queryWeight, product of:
                1.0107905 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.015980398 = queryNorm
              0.5620886 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.03162583 = weight(abstract_txt:wurden in 4954) [ClassicSimilarity], result of:
            0.03162583 = score(doc=4954,freq=1.0), product of:
              0.09725067 = queryWeight, product of:
                1.1695955 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.015980398 = queryNorm
              0.3251991 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.07447209 = weight(abstract_txt:automatische in 4954) [ClassicSimilarity], result of:
            0.07447209 = score(doc=4954,freq=1.0), product of:
              0.1721315 = queryWeight, product of:
                1.5560358 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.015980398 = queryNorm
              0.4326465 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.08946388 = weight(abstract_txt:eingesetzt in 4954) [ClassicSimilarity], result of:
            0.08946388 = score(doc=4954,freq=1.0), product of:
              0.22266924 = queryWeight, product of:
                2.16753 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.015980398 = queryNorm
              0.40177926 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.02907076 = weight(abstract_txt:werden in 4954) [ClassicSimilarity], result of:
            0.02907076 = score(doc=4954,freq=1.0), product of:
              0.13259973 = queryWeight, product of:
                2.3654912 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015980398 = queryNorm
              0.21923694 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.37286106 = weight(abstract_txt:lingo in 4954) [ClassicSimilarity], result of:
            0.37286106 = score(doc=4954,freq=2.0), product of:
              0.4577023 = queryWeight, product of:
                3.1076105 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015980398 = queryNorm
              0.8146366 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
        0.24 = coord(6/25)
    
  4. Lepsky, K.: Automatisches Indexieren (2023) 0.16
    0.16183694 = sum of:
      0.16183694 = product of:
        0.6743206 = sum of:
          0.029650098 = weight(abstract_txt:können in 1782) [ClassicSimilarity], result of:
            0.029650098 = score(doc=1782,freq=1.0), product of:
              0.071092084 = queryWeight, product of:
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.015980398 = queryNorm
              0.4170661 = fieldWeight in 1782, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
          0.043624345 = weight(abstract_txt:kann in 1782) [ClassicSimilarity], result of:
            0.043624345 = score(doc=1782,freq=2.0), product of:
              0.07299276 = queryWeight, product of:
                1.0132796 = boost
                4.507782 = idf(docFreq=1330, maxDocs=44421)
                0.015980398 = queryNorm
              0.5976531 = fieldWeight in 1782, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.507782 = idf(docFreq=1330, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
          0.2503851 = weight(abstract_txt:indexterme in 1782) [ClassicSimilarity], result of:
            0.2503851 = score(doc=1782,freq=3.0), product of:
              0.16224042 = queryWeight, product of:
                1.0682033 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.015980398 = queryNorm
              1.5432968 = fieldWeight in 1782, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
          0.111708134 = weight(abstract_txt:automatische in 1782) [ClassicSimilarity], result of:
            0.111708134 = score(doc=1782,freq=1.0), product of:
              0.1721315 = queryWeight, product of:
                1.5560358 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.015980398 = queryNorm
              0.64896977 = fieldWeight in 1782, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
          0.16342485 = weight(abstract_txt:automatisch in 1782) [ClassicSimilarity], result of:
            0.16342485 = score(doc=1782,freq=2.0), product of:
              0.1760648 = queryWeight, product of:
                1.5737134 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.015980398 = queryNorm
              0.9282085 = fieldWeight in 1782, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
          0.07552804 = weight(abstract_txt:werden in 1782) [ClassicSimilarity], result of:
            0.07552804 = score(doc=1782,freq=3.0), product of:
              0.13259973 = queryWeight, product of:
                2.3654912 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015980398 = queryNorm
              0.56959426 = fieldWeight in 1782, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.09375 = fieldNorm(doc=1782)
        0.24 = coord(6/25)
    
  5. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.15
    0.14670931 = sum of:
      0.14670931 = product of:
        0.91693324 = sum of:
          0.16865322 = weight(abstract_txt:indexierungssystem in 872) [ClassicSimilarity], result of:
            0.16865322 = score(doc=872,freq=1.0), product of:
              0.16224042 = queryWeight, product of:
                1.0682033 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.015980398 = queryNorm
              1.0395266 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.13032615 = weight(abstract_txt:automatische in 872) [ClassicSimilarity], result of:
            0.13032615 = score(doc=872,freq=1.0), product of:
              0.1721315 = queryWeight, product of:
                1.5560358 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.015980398 = queryNorm
              0.7571314 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.1565618 = weight(abstract_txt:eingesetzt in 872) [ClassicSimilarity], result of:
            0.1565618 = score(doc=872,freq=1.0), product of:
              0.22266924 = queryWeight, product of:
                2.16753 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.015980398 = queryNorm
              0.70311373 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.46139205 = weight(abstract_txt:lingo in 872) [ClassicSimilarity], result of:
            0.46139205 = score(doc=872,freq=1.0), product of:
              0.4577023 = queryWeight, product of:
                3.1076105 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015980398 = queryNorm
              1.0080614 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
        0.16 = coord(4/25)
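
The score trees above are labelled ClassicSimilarity, Lucene's TF-IDF scoring. The arithmetic they show can be reproduced with a short sketch, using values copied from the first result (doc 2054, term "können"). The function names are hypothetical. The idf formula `1 + ln(maxDocs / (docFreq + 1))` is an assumption, chosen because it reproduces the idf values listed above.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; assumed form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    """Score of one term: queryWeight * fieldWeight, as in the explain tree."""
    tf = math.sqrt(freq)                          # tf(freq=5.0) = 2.236068
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm  # boost * idf * queryNorm
    field_weight = tf * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(sum_of_term_scores: float, matched: int, total: int) -> float:
    """Final score: sum of term scores scaled by coord(matched/total)."""
    return sum_of_term_scores * (matched / total)


if __name__ == "__main__":
    # Term "können" in doc 2054 (no boost line appears in the tree, so boost = 1.0).
    s = term_score(freq=5.0, doc_freq=1411, max_docs=44421,
                   query_norm=0.015980398, field_norm=0.0390625)
    print(s)
    # First result: listed term-score sum with coord(9/25).
    print(document_score(0.93584037, 9, 25))
```

Small differences in the last digits are expected, since Lucene computes these values in single precision while Python uses double precision.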