Document (#26336)

Author
Grün, S.
Title
Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo
[Forming compound index terms on the basis of an algorithmic multi-word group analysis with Lingo]
Imprint
Köln : Fachhochschule, Institut für Informationswissenschaft
Year
2015
Pages
69 pp.
Abstract
In German, concepts can be expressed either as compounds or as multi-word groups. A multi-word group can, however, also be expressed as a compound itself and thus refer to the same concept. This study analyzes multi-word groups that can also be compounds, with the aim of identifying such word sequences through patterns. The data analyzed were job advertisements from the career manager Placement24 GmbH. Multi-word groups were extracted algorithmically with the open-source software Lingo. Candidates of three to five words were analyzed on the basis of extensions and adaptations to dictionaries and to the words tagged in them. Compounds were formed from the positively evaluated multi-word groups and compared with the compounds identified in the job advertisements. The comparison showed that most of the newly generated compounds had not been produced by compound identification.
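The abstract outlines a pattern-based pipeline. Tagged candidates of three to five words are matched against patterns, accepted groups are turned into compounds, and the result is compared with the compounds already identified in the texts. A minimal sketch of the last two steps follows. It does not use Lingo. The tags NOUN, PREP and ART are hypothetical stand-ins for dictionary tags, and only one illustrative pattern is accepted: head noun, function words, modifier noun. Linking elements (Fugenelemente such as the -s- in Vertriebsleiter) are ignored, so the sketch is only correct for modifiers that attach without one. The example phrases are invented and are not taken from the Placement24 data.

```python
from typing import Optional

# Hypothetical tags for illustration only; they are not Lingo's tagset.
FUNCTION_TAGS = {"PREP", "ART"}

Candidate = list[tuple[str, str]]  # (word, tag) pairs


def to_compound(candidate: Candidate) -> Optional[str]:
    """Map a group such as 'Arbeit im Team' to the compound 'Teamarbeit'.

    Accepts only HEAD-NOUN, one or more function words, MODIFIER-NOUN,
    with 3 to 5 words in total; returns None for every other group.
    Linking elements (Fugenelemente) are not inserted.
    """
    if not 3 <= len(candidate) <= 5:
        return None
    head, head_tag = candidate[0]
    modifier, modifier_tag = candidate[-1]
    if head_tag != "NOUN" or modifier_tag != "NOUN":
        return None
    if any(tag not in FUNCTION_TAGS for _, tag in candidate[1:-1]):
        return None
    # German compounds put the modifier first; the head is lower-cased.
    return modifier + head.lower()


def not_identified(generated: set[str], identified: set[str]) -> set[str]:
    """Generated compounds that compound identification did not find."""
    return generated - identified


groups = [
    [("Arbeit", "NOUN"), ("im", "PREP"), ("Team", "NOUN")],
    [("Kenntnisse", "NOUN"), ("in", "PREP"), ("Englisch", "NOUN")],
    [("sehr", "ADV"), ("gute", "ADJ"), ("Kenntnisse", "NOUN")],
]
generated = {c for g in groups if (c := to_compound(g)) is not None}
print(sorted(generated))  # ['Englischkenntnisse', 'Teamarbeit']
print(not_identified(generated, identified={"Teamarbeit"}))  # {'Englischkenntnisse'}
```

A practical version would need more patterns, such as adjective modifiers, and a rule for linking elements.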
Content
Bachelor's thesis, Library Science degree programme (Studiengang Bibliothekswesen), Fakultät für Informations- und Kommunikationswissenschaften, Fachhochschule Köln
Theme
Automatic indexing (Automatisches Indexieren)
Object
Lingo

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.57
    0.5690524 = sum of:
      0.5690524 = product of:
        1.1855259 = sum of:
          0.015794523 = weight(abstract_txt:diese in 1054) [ClassicSimilarity], result of:
            0.015794523 = score(doc=1054,freq=4.0), product of:
              0.04663115 = queryWeight, product of:
                1.0321356 = boost
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.010420751 = queryNorm
              0.33871186 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.055286203 = weight(abstract_txt:extraktion in 1054) [ClassicSimilarity], result of:
            0.055286203 = score(doc=1054,freq=3.0), product of:
              0.093911596 = queryWeight, product of:
                1.0357223 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010420751 = queryNorm
              0.58870476 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.019074231 = weight(abstract_txt:können in 1054) [ClassicSimilarity], result of:
            0.019074231 = score(doc=1054,freq=5.0), product of:
              0.049090765 = queryWeight, product of:
                1.0590065 = boost
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.010420751 = queryNorm
              0.3885503 = fieldWeight in 1054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.03787778 = weight(abstract_txt:wörterbüchern in 1054) [ClassicSimilarity], result of:
            0.03787778 = score(doc=1054,freq=1.0), product of:
              0.10526196 = queryWeight, product of:
                1.0965272 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010420751 = queryNorm
              0.35984302 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.014074568 = weight(abstract_txt:basis in 1054) [ClassicSimilarity], result of:
            0.014074568 = score(doc=1054,freq=2.0), product of:
              0.054404963 = queryWeight, product of:
                1.1148539 = boost
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.010420751 = queryNorm
              0.25870007 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.060956456 = weight(abstract_txt:algorithmisch in 1054) [ClassicSimilarity], result of:
            0.060956456 = score(doc=1054,freq=2.0), product of:
              0.1147321 = queryWeight, product of:
                1.1447909 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010420751 = queryNorm
              0.5312938 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.015751718 = weight(abstract_txt:durch in 1054) [ClassicSimilarity], result of:
            0.015751718 = score(doc=1054,freq=2.0), product of:
              0.06713219 = queryWeight, product of:
                1.5167351 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.010420751 = queryNorm
              0.23463732 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.022329021 = weight(abstract_txt:wurde in 1054) [ClassicSimilarity], result of:
            0.022329021 = score(doc=1054,freq=2.0), product of:
              0.08471468 = queryWeight, product of:
                1.7038199 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.010420751 = queryNorm
              0.26357913 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.035403445 = weight(abstract_txt:wurden in 1054) [ClassicSimilarity], result of:
            0.035403445 = score(doc=1054,freq=3.0), product of:
              0.10062645 = queryWeight, product of:
                1.8569508 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.010420751 = queryNorm
              0.35183042 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.15151112 = weight(abstract_txt:lingo in 1054) [ClassicSimilarity], result of:
            0.15151112 = score(doc=1054,freq=4.0), product of:
              0.21052392 = queryWeight, product of:
                2.1930544 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010420751 = queryNorm
              0.71968603 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.07836134 = weight(abstract_txt:analysiert in 1054) [ClassicSimilarity], result of:
            0.07836134 = score(doc=1054,freq=3.0), product of:
              0.170903 = queryWeight, product of:
                2.420021 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.010420751 = queryNorm
              0.45851353 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.6791056 = weight(abstract_txt:mehrwortgruppen in 1054) [ClassicSimilarity], result of:
            0.6791056 = score(doc=1054,freq=13.0), product of:
              0.48679435 = queryWeight, product of:
                4.7161374 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010420751 = queryNorm
              1.3950564 = fieldWeight in 1054, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
        0.48 = coord(12/25)
    
  2. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.30
    0.2980553 = sum of:
      0.2980553 = product of:
        0.9314228 = sum of:
          0.015923558 = weight(abstract_txt:basis in 3954) [ClassicSimilarity], result of:
            0.015923558 = score(doc=3954,freq=1.0), product of:
              0.054404963 = queryWeight, product of:
                1.1148539 = boost
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.010420751 = queryNorm
              0.29268575 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.06646147 = weight(abstract_txt:bewerteten in 3954) [ClassicSimilarity], result of:
            0.06646147 = score(doc=3954,freq=1.0), product of:
              0.11193909 = queryWeight, product of:
                1.1307708 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.010420751 = queryNorm
              0.5937289 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.13792872 = weight(abstract_txt:kandidaten in 3954) [ClassicSimilarity], result of:
            0.13792872 = score(doc=3954,freq=4.0), product of:
              0.1147321 = queryWeight, product of:
                1.1447909 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010420751 = queryNorm
              1.2021807 = fieldWeight in 3954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.025202747 = weight(abstract_txt:durch in 3954) [ClassicSimilarity], result of:
            0.025202747 = score(doc=3954,freq=2.0), product of:
              0.06713219 = queryWeight, product of:
                1.5167351 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.010420751 = queryNorm
              0.3754197 = fieldWeight in 3954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.050524812 = weight(abstract_txt:wurde in 3954) [ClassicSimilarity], result of:
            0.050524812 = score(doc=3954,freq=4.0), product of:
              0.08471468 = queryWeight, product of:
                1.7038199 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.010420751 = queryNorm
              0.5964115 = fieldWeight in 3954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.0327043 = weight(abstract_txt:wurden in 3954) [ClassicSimilarity], result of:
            0.0327043 = score(doc=3954,freq=1.0), product of:
              0.10062645 = queryWeight, product of:
                1.8569508 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.010420751 = queryNorm
              0.32500702 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.17141525 = weight(abstract_txt:lingo in 3954) [ClassicSimilarity], result of:
            0.17141525 = score(doc=3954,freq=2.0), product of:
              0.21052392 = queryWeight, product of:
                2.1930544 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010420751 = queryNorm
              0.81423175 = fieldWeight in 3954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.43126196 = weight(abstract_txt:komposita in 3954) [ClassicSimilarity], result of:
            0.43126196 = score(doc=3954,freq=1.0), product of:
              0.707641 = queryWeight, product of:
                6.964113 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010420751 = queryNorm
              0.6094361 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
        0.32 = coord(8/25)
    
  3. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.29
    0.28548893 = sum of:
      0.28548893 = product of:
        1.4274447 = sum of:
          0.025271237 = weight(abstract_txt:diese in 401) [ClassicSimilarity], result of:
            0.025271237 = score(doc=401,freq=1.0), product of:
              0.04663115 = queryWeight, product of:
                1.0321356 = boost
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.010420751 = queryNorm
              0.54193896 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.050405495 = weight(abstract_txt:durch in 401) [ClassicSimilarity], result of:
            0.050405495 = score(doc=401,freq=2.0), product of:
              0.06713219 = queryWeight, product of:
                1.5167351 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.010420751 = queryNorm
              0.7508394 = fieldWeight in 401, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.0654086 = weight(abstract_txt:wurden in 401) [ClassicSimilarity], result of:
            0.0654086 = score(doc=401,freq=1.0), product of:
              0.10062645 = queryWeight, product of:
                1.8569508 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.010420751 = queryNorm
              0.65001404 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.24241778 = weight(abstract_txt:lingo in 401) [ClassicSimilarity], result of:
            0.24241778 = score(doc=401,freq=1.0), product of:
              0.21052392 = queryWeight, product of:
                2.1930544 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010420751 = queryNorm
              1.1514976 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          1.0439416 = weight(abstract_txt:mehrwortgruppen in 401) [ClassicSimilarity], result of:
            1.0439416 = score(doc=401,freq=3.0), product of:
              0.48679435 = queryWeight, product of:
                4.7161374 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010420751 = queryNorm
              2.144523 = fieldWeight in 401, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
        0.2 = coord(5/25)
    
  4. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.19
    0.18511264 = sum of:
      0.18511264 = product of:
        1.156954 = sum of:
          0.15480137 = weight(abstract_txt:extraktion in 4872) [ClassicSimilarity], result of:
            0.15480137 = score(doc=4872,freq=3.0), product of:
              0.093911596 = queryWeight, product of:
                1.0357223 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010420751 = queryNorm
              1.6483734 = fieldWeight in 4872, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.04420921 = weight(abstract_txt:wurde in 4872) [ClassicSimilarity], result of:
            0.04420921 = score(doc=4872,freq=1.0), product of:
              0.08471468 = queryWeight, product of:
                1.7038199 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.010420751 = queryNorm
              0.52186006 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.21211556 = weight(abstract_txt:lingo in 4872) [ClassicSimilarity], result of:
            0.21211556 = score(doc=4872,freq=1.0), product of:
              0.21052392 = queryWeight, product of:
                2.1930544 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010420751 = queryNorm
              1.0075604 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.7458279 = weight(abstract_txt:mehrwortgruppen in 4872) [ClassicSimilarity], result of:
            0.7458279 = score(doc=4872,freq=2.0), product of:
              0.48679435 = queryWeight, product of:
                4.7161374 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010420751 = queryNorm
              1.5321212 = fieldWeight in 4872, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
        0.16 = coord(4/25)
    
  5. Dzeyk, W.: Effektiv und nutzerfreundlich : Einsatz von semantischen Technologien und Usability-Methoden zur Verbesserung der medizinischen Literatursuche (2010) 0.10
    0.096143104 = sum of:
      0.096143104 = product of:
        0.4005963 = sum of:
          0.011168415 = weight(abstract_txt:diese in 4416) [ClassicSimilarity], result of:
            0.011168415 = score(doc=4416,freq=2.0), product of:
              0.04663115 = queryWeight, product of:
                1.0321356 = boost
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.010420751 = queryNorm
              0.23950545 = fieldWeight in 4416, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3355117 = idf(docFreq=1573, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
          0.009952223 = weight(abstract_txt:basis in 4416) [ClassicSimilarity], result of:
            0.009952223 = score(doc=4416,freq=1.0), product of:
              0.054404963 = queryWeight, product of:
                1.1148539 = boost
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.010420751 = queryNorm
              0.18292859 = fieldWeight in 4416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.682972 = idf(docFreq=1111, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
          0.027282778 = weight(abstract_txt:durch in 4416) [ClassicSimilarity], result of:
            0.027282778 = score(doc=4416,freq=6.0), product of:
              0.06713219 = queryWeight, product of:
                1.5167351 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.010420751 = queryNorm
              0.4064038 = fieldWeight in 4416, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
          0.041773777 = weight(abstract_txt:wurde in 4416) [ClassicSimilarity], result of:
            0.041773777 = score(doc=4416,freq=7.0), product of:
              0.08471468 = queryWeight, product of:
                1.7038199 = boost
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.010420751 = queryNorm
              0.49311143 = fieldWeight in 4416, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.771292 = idf(docFreq=1017, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
          0.040880375 = weight(abstract_txt:wurden in 4416) [ClassicSimilarity], result of:
            0.040880375 = score(doc=4416,freq=4.0), product of:
              0.10062645 = queryWeight, product of:
                1.8569508 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.010420751 = queryNorm
              0.40625876 = fieldWeight in 4416, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
          0.26953873 = weight(abstract_txt:komposita in 4416) [ClassicSimilarity], result of:
            0.26953873 = score(doc=4416,freq=1.0), product of:
              0.707641 = queryWeight, product of:
                6.964113 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010420751 = queryNorm
              0.38089755 = fieldWeight in 4416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4416)
        0.24 = coord(6/25)
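Each tree above is a Lucene ClassicSimilarity (TF-IDF) explanation. Every leaf is queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. In these formulas, tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). fieldNorm is a length normalization based on 1/√(number of terms in the field) and is stored lossily in one byte. The leaves are summed, and the sum is multiplied by coord, the fraction of the 25 query terms that matched.

The sketch below recomputes leaves and final scores from numbers printed in the trees. It uses double precision, so it agrees with Lucene's float arithmetic only to about six significant digits. The dictionary keys are readability labels added here.

```python
import math

QUERY_TERMS = 25  # denominator of every coord(n/25) above


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def leaf_weight(freq: float, doc_freq: int, max_docs: int, boost: float,
                query_norm: float, field_norm: float) -> float:
    """One leaf of the explanation: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(raw_sum: float, matched: int) -> float:
    """Root of a tree: sum of leaf weights times coord(matched / 25)."""
    return raw_sum * matched / QUERY_TERMS


# "lingo" in documents 1054 (freq 4) and 401 (freq 1): the shorter abstract,
# with its larger fieldNorm, yields the higher weight despite the lower freq.
lingo_1054 = leaf_weight(4.0, 11, 44218, 2.1930544, 0.010420751, 0.0390625)
lingo_401 = leaf_weight(1.0, 11, 44218, 2.1930544, 0.010420751, 0.125)
print(f"{lingo_1054:.6f} {lingo_401:.6f}")  # 0.151511 0.242418

# (raw sum, matched terms) per document, copied from the trees above.
raw = {
    "Bredack 2013": (1.1855259, 12),
    "Grün 2017": (0.9314228, 8),
    "Glaesener 2012": (1.4274447, 5),
    "Bredack/Lepsky 2014": (1.156954, 4),
    "Dzeyk 2010": (0.4005963, 6),
}
scores = {doc: document_score(s, n) for doc, (s, n) in raw.items()}
for doc, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{doc:20} {score:.2f}")
```

The loop reproduces both the ranking and the two-decimal scores shown next to each title. Glaesener (2012) has the largest raw sum but matches only 5 of the 25 terms, so coord moves it to third place.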