Document (#26336)

Author
Grün, S.
Title
Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo
Imprint
Köln : Fachhochschule, Institut für Informationswissenschaft
Year
2015
Pages
69 pp.
Abstract
In German, concepts can be expressed through compounds (Komposita) and through multi-word groups (Mehrwortgruppen). A multi-word group can often also be written as a single compound that refers to the same concept. This study analyzes multi-word groups that could also be compounds, with the aim of identifying such word sequences through patterns. The data analyzed were job advertisements from the career manager Placement24 GmbH. Multi-word groups were extracted algorithmically with the open-source software Lingo. Based on extensions and adjustments to the dictionaries and the words tagged in them, three- to five-word candidates were analyzed. Compounds were formed from the positively rated multi-word groups and then compared with the compounds identified in the job advertisements. The comparison showed that a large share of the newly generated compounds had not been produced by compound identification.
Content
Bachelor's thesis, Library Science degree program, Faculty of Information and Communication Sciences, Fachhochschule Köln
Theme
Automatic indexing
Object
Lingo

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.57
    0.5693233 = sum of:
      0.5693233 = product of:
        1.1860902 = sum of:
          0.01577183 = weight(abstract_txt:diese in 2054) [ClassicSimilarity], result of:
            0.01577183 = score(doc=2054,freq=4.0), product of:
              0.0465762 = queryWeight, product of:
                1.0313064 = boost
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.010419534 = queryNorm
              0.33862427 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.055336915 = weight(abstract_txt:extraktion in 2054) [ClassicSimilarity], result of:
            0.055336915 = score(doc=2054,freq=3.0), product of:
              0.09394828 = queryWeight, product of:
                1.0357027 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.010419534 = queryNorm
              0.58901465 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.019065749 = weight(abstract_txt:können in 2054) [ClassicSimilarity], result of:
            0.019065749 = score(doc=2054,freq=5.0), product of:
              0.04906538 = queryWeight, product of:
                1.0585059 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.010419534 = queryNorm
              0.38857841 = fieldWeight in 2054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.037909213 = weight(abstract_txt:wörterbüchern in 2054) [ClassicSimilarity], result of:
            0.037909213 = score(doc=2054,freq=1.0), product of:
              0.105296955 = queryWeight, product of:
                1.0964746 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010419534 = queryNorm
              0.36002192 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.0140580395 = weight(abstract_txt:basis in 2054) [ClassicSimilarity], result of:
            0.0140580395 = score(doc=2054,freq=2.0), product of:
              0.054350365 = queryWeight, product of:
                1.1140558 = boost
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.010419534 = queryNorm
              0.25865585 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.0610032 = weight(abstract_txt:algorithmisch in 2054) [ClassicSimilarity], result of:
            0.0610032 = score(doc=2054,freq=2.0), product of:
              0.11476542 = queryWeight, product of:
                1.144712 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010419534 = queryNorm
              0.5315469 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.01572773 = weight(abstract_txt:durch in 2054) [ClassicSimilarity], result of:
            0.01572773 = score(doc=2054,freq=2.0), product of:
              0.06704923 = queryWeight, product of:
                1.515473 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.010419534 = queryNorm
              0.2345699 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.02232349 = weight(abstract_txt:wurde in 2054) [ClassicSimilarity], result of:
            0.02232349 = score(doc=2054,freq=2.0), product of:
              0.084682 = queryWeight, product of:
                1.7031264 = boost
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.010419534 = queryNorm
              0.26361552 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.03544278 = weight(abstract_txt:wurden in 2054) [ClassicSimilarity], result of:
            0.03544278 = score(doc=2054,freq=3.0), product of:
              0.10067876 = queryWeight, product of:
                1.8570356 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.010419534 = queryNorm
              0.35203832 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.15163685 = weight(abstract_txt:lingo in 2054) [ClassicSimilarity], result of:
            0.15163685 = score(doc=2054,freq=4.0), product of:
              0.21059391 = queryWeight, product of:
                2.1929493 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010419534 = queryNorm
              0.72004384 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.07821617 = weight(abstract_txt:analysiert in 2054) [ClassicSimilarity], result of:
            0.07821617 = score(doc=2054,freq=3.0), product of:
              0.17065421 = queryWeight, product of:
                2.4177413 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.010419534 = queryNorm
              0.45833135 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.6795983 = weight(abstract_txt:mehrwortgruppen in 2054) [ClassicSimilarity], result of:
            0.6795983 = score(doc=2054,freq=13.0), product of:
              0.4869223 = queryWeight, product of:
                4.7157474 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010419534 = queryNorm
              1.3957016 = fieldWeight in 2054, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
        0.48 = coord(12/25)
    
  2. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.30
    0.2982438 = sum of:
      0.2982438 = product of:
        0.9320119 = sum of:
          0.015904857 = weight(abstract_txt:basis in 4954) [ClassicSimilarity], result of:
            0.015904857 = score(doc=4954,freq=1.0), product of:
              0.054350365 = queryWeight, product of:
                1.1140558 = boost
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.010419534 = queryNorm
              0.29263568 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.06651361 = weight(abstract_txt:bewerteten in 4954) [ClassicSimilarity], result of:
            0.06651361 = score(doc=4954,freq=1.0), product of:
              0.11197292 = queryWeight, product of:
                1.1306995 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.010419534 = queryNorm
              0.5940152 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.1380345 = weight(abstract_txt:kandidaten in 4954) [ClassicSimilarity], result of:
            0.1380345 = score(doc=4954,freq=4.0), product of:
              0.11476542 = queryWeight, product of:
                1.144712 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010419534 = queryNorm
              1.2027533 = fieldWeight in 4954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.02516437 = weight(abstract_txt:durch in 4954) [ClassicSimilarity], result of:
            0.02516437 = score(doc=4954,freq=2.0), product of:
              0.06704923 = queryWeight, product of:
                1.515473 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.010419534 = queryNorm
              0.37531185 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.05051229 = weight(abstract_txt:wurde in 4954) [ClassicSimilarity], result of:
            0.05051229 = score(doc=4954,freq=4.0), product of:
              0.084682 = queryWeight, product of:
                1.7031264 = boost
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.010419534 = queryNorm
              0.59649384 = fieldWeight in 4954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.03274064 = weight(abstract_txt:wurden in 4954) [ClassicSimilarity], result of:
            0.03274064 = score(doc=4954,freq=1.0), product of:
              0.10067876 = queryWeight, product of:
                1.8570356 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.010419534 = queryNorm
              0.3251991 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.1715575 = weight(abstract_txt:lingo in 4954) [ClassicSimilarity], result of:
            0.1715575 = score(doc=4954,freq=2.0), product of:
              0.21059391 = queryWeight, product of:
                2.1929493 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010419534 = queryNorm
              0.8146366 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
          0.43158412 = weight(abstract_txt:komposita in 4954) [ClassicSimilarity], result of:
            0.43158412 = score(doc=4954,freq=1.0), product of:
              0.70783716 = queryWeight, product of:
                6.963587 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010419534 = queryNorm
              0.6097223 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0625 = fieldNorm(doc=4954)
        0.32 = coord(8/25)
    
  3. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.29
    0.28567258 = sum of:
      0.28567258 = product of:
        1.4283628 = sum of:
          0.02523493 = weight(abstract_txt:diese in 1401) [ClassicSimilarity], result of:
            0.02523493 = score(doc=1401,freq=1.0), product of:
              0.0465762 = queryWeight, product of:
                1.0313064 = boost
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.010419534 = queryNorm
              0.54179883 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.05032874 = weight(abstract_txt:durch in 1401) [ClassicSimilarity], result of:
            0.05032874 = score(doc=1401,freq=2.0), product of:
              0.06704923 = queryWeight, product of:
                1.515473 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.010419534 = queryNorm
              0.7506237 = fieldWeight in 1401, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.06548128 = weight(abstract_txt:wurden in 1401) [ClassicSimilarity], result of:
            0.06548128 = score(doc=1401,freq=1.0), product of:
              0.10067876 = queryWeight, product of:
                1.8570356 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.010419534 = queryNorm
              0.6503982 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          0.24261896 = weight(abstract_txt:lingo in 1401) [ClassicSimilarity], result of:
            0.24261896 = score(doc=1401,freq=1.0), product of:
              0.21059391 = queryWeight, product of:
                2.1929493 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010419534 = queryNorm
              1.1520702 = fieldWeight in 1401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
          1.044699 = weight(abstract_txt:mehrwortgruppen in 1401) [ClassicSimilarity], result of:
            1.044699 = score(doc=1401,freq=3.0), product of:
              0.4869223 = queryWeight, product of:
                4.7157474 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010419534 = queryNorm
              2.1455147 = fieldWeight in 1401, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.125 = fieldNorm(doc=1401)
        0.2 = coord(5/25)
    
  4. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.19
    0.18524835 = sum of:
      0.18524835 = product of:
        1.1578022 = sum of:
          0.15494336 = weight(abstract_txt:extraktion in 872) [ClassicSimilarity], result of:
            0.15494336 = score(doc=872,freq=3.0), product of:
              0.09394828 = queryWeight, product of:
                1.0357027 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.010419534 = queryNorm
              1.6492411 = fieldWeight in 872, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.044198256 = weight(abstract_txt:wurde in 872) [ClassicSimilarity], result of:
            0.044198256 = score(doc=872,freq=1.0), product of:
              0.084682 = queryWeight, product of:
                1.7031264 = boost
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.010419534 = queryNorm
              0.5219321 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.2122916 = weight(abstract_txt:lingo in 872) [ClassicSimilarity], result of:
            0.2122916 = score(doc=872,freq=1.0), product of:
              0.21059391 = queryWeight, product of:
                2.1929493 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010419534 = queryNorm
              1.0080614 = fieldWeight in 872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
          0.746369 = weight(abstract_txt:mehrwortgruppen in 872) [ClassicSimilarity], result of:
            0.746369 = score(doc=872,freq=2.0), product of:
              0.4869223 = queryWeight, product of:
                4.7157474 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010419534 = queryNorm
              1.5328298 = fieldWeight in 872, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.109375 = fieldNorm(doc=872)
        0.16 = coord(4/25)
    
  5. Dzeyk, W.: Effektiv und nutzerfreundlich : Einsatz von semantischen Technologien und Usability-Methoden zur Verbesserung der medizinischen Literatursuche (2010) 0.10
    0.096183226 = sum of:
      0.096183226 = product of:
        0.40076345 = sum of:
          0.011152368 = weight(abstract_txt:diese in 416) [ClassicSimilarity], result of:
            0.011152368 = score(doc=416,freq=2.0), product of:
              0.0465762 = queryWeight, product of:
                1.0313064 = boost
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.010419534 = queryNorm
              0.23944351 = fieldWeight in 416, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3343906 = idf(docFreq=1582, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
          0.009940535 = weight(abstract_txt:basis in 416) [ClassicSimilarity], result of:
            0.009940535 = score(doc=416,freq=1.0), product of:
              0.054350365 = queryWeight, product of:
                1.1140558 = boost
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.010419534 = queryNorm
              0.1828973 = fieldWeight in 416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
          0.02724123 = weight(abstract_txt:durch in 416) [ClassicSimilarity], result of:
            0.02724123 = score(doc=416,freq=6.0), product of:
              0.06704923 = queryWeight, product of:
                1.515473 = boost
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.010419534 = queryNorm
              0.406287 = fieldWeight in 416, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.246169 = idf(docFreq=1728, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
          0.04176343 = weight(abstract_txt:wurde in 416) [ClassicSimilarity], result of:
            0.04176343 = score(doc=416,freq=7.0), product of:
              0.084682 = queryWeight, product of:
                1.7031264 = boost
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.010419534 = queryNorm
              0.4931795 = fieldWeight in 416, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
          0.0409258 = weight(abstract_txt:wurden in 416) [ClassicSimilarity], result of:
            0.0409258 = score(doc=416,freq=4.0), product of:
              0.10067876 = queryWeight, product of:
                1.8570356 = boost
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.010419534 = queryNorm
              0.40649888 = fieldWeight in 416, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2031856 = idf(docFreq=663, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
          0.26974007 = weight(abstract_txt:komposita in 416) [ClassicSimilarity], result of:
            0.26974007 = score(doc=416,freq=1.0), product of:
              0.70783716 = queryWeight, product of:
                6.963587 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010419534 = queryNorm
              0.38107646 = fieldWeight in 416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0390625 = fieldNorm(doc=416)
        0.24 = coord(6/25)
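
The explain trees above follow Lucene's ClassicSimilarity (TF-IDF) scheme, so their figures can be recomputed by hand. The following is a minimal sketch of that arithmetic, not the search engine's own code: the function and variable names are illustrative assumptions, all numeric inputs are copied from the trees, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the one documented for ClassicSimilarity. Small differences in the last digits are to be expected, since Lucene computes in 32-bit floats.

```python
import math

MAX_DOCS = 44421          # maxDocs printed in every idf(...) line
QUERY_NORM = 0.010419534  # queryNorm printed in every queryWeight block
QUERY_TERMS = 25          # denominator of every coord(n/25) line


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Weight of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def final_score(raw_sum: float, matched: int) -> float:
    """Document score: sum of term weights times coord(matched / 25)."""
    return raw_sum * matched / QUERY_TERMS


# One leaf of the tree: index term "diese" in doc 2054 (listed as 0.01577183).
diese_2054 = term_score(freq=4.0, doc_freq=1582, boost=1.0313064, field_norm=0.0390625)

# Same index term in two documents: "mehrwortgruppen" in doc 2054 and doc 1401.
mwg_2054 = term_score(freq=13.0, doc_freq=5, boost=4.7157474, field_norm=0.0390625)
mwg_1401 = term_score(freq=3.0, doc_freq=5, boost=4.7157474, field_norm=0.125)

# (doc id, sum of term weights, number of matched query terms), as listed above.
hits = [
    (2054, 1.1860902, 12),
    (4954, 0.9320119, 8),
    (1401, 1.4283628, 5),
    (872, 1.1578022, 4),
    (416, 0.40076345, 6),
]

ranked = sorted(hits, key=lambda h: final_score(h[1], h[2]), reverse=True)

for doc_id, raw_sum, matched in ranked:
    print(f"doc {doc_id}: {final_score(raw_sum, matched):.4f}")
```

Two effects become visible in this sketch. Document 1401 has the largest raw sum (1.4283628) but matches only 5 of the 25 query terms, so the coord factor moves it to third place. And `mehrwortgruppen` contributes more in document 1401 (freq 3) than in document 2054 (freq 13) because of the larger fieldNorm, which in ClassicSimilarity typically indicates a shorter field.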