Document (#30810)

Author
Granitzer, M.
Title
Statistische Verfahren der Textanalyse
Source
Semantic Web: Wege zur vernetzten Wissensgesellschaft. Ed. by T. Pellegrini and A. Blumauer
Imprint
Berlin : Springer
Year
2006
Pages
pp. 437-451
Series
X.media.press
Abstract
This article provides an overview of statistical methods of text analysis in the context of the Semantic Web. It opens with a discussion of methods and common techniques for preprocessing text, such as stemming and part-of-speech tagging. The representations introduced in this way serve as the basis for statistical feature analysis and for more advanced techniques such as information extraction and machine learning methods. These specialized techniques are presented in overview, with the aspects most important to the Semantic Web treated in detail. The article concludes with the application of the presented techniques to the creation and maintenance of ontologies and with pointers to further reading.
Theme
Computational linguistics
Semantic Web

Similar documents (content)

  1. Franke-Maier, M.; Beck, C.; Kasprzik, A.; Maas, J.F.; Pielmeier, S.; Wiesenmüller, H.: Ein Feuerwerk an Algorithmen und der Startschuss zur Bildung eines Kompetenznetzwerks für maschinelle Erschließung : Bericht zur Fachtagung Netzwerk maschinelle Erschließung an der Deutschen Nationalbibliothek am 10. und 11. Oktober 2019 (2020) 0.12
    0.1208898 = sum of:
      0.1208898 = product of:
        0.75556123 = sum of:
          0.1900035 = weight(abstract_txt:maschinelle in 851) [ClassicSimilarity], result of:
            0.1900035 = score(doc=851,freq=4.0), product of:
              0.12941475 = queryWeight, product of:
                1.0661026 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.015502731 = queryNorm
              1.4681749 = fieldWeight in 851, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.09375 = fieldNorm(doc=851)
          0.03906287 = weight(abstract_txt:sowie in 851) [ClassicSimilarity], result of:
            0.03906287 = score(doc=851,freq=1.0), product of:
              0.090160325 = queryWeight, product of:
                1.2584323 = boost
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.015502731 = queryNorm
              0.43326008 = fieldWeight in 851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.09375 = fieldNorm(doc=851)
          0.07578588 = weight(abstract_txt:verfahren in 851) [ClassicSimilarity], result of:
            0.07578588 = score(doc=851,freq=1.0), product of:
              0.1402485 = queryWeight, product of:
                1.5695359 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.015502731 = queryNorm
              0.54036856 = fieldWeight in 851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.09375 = fieldNorm(doc=851)
          0.45070896 = weight(abstract_txt:textanalyse in 851) [ClassicSimilarity], result of:
            0.45070896 = score(doc=851,freq=2.0), product of:
              0.36539295 = queryWeight, product of:
                2.5333908 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015502731 = queryNorm
              1.2334911 = fieldWeight in 851, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.09375 = fieldNorm(doc=851)
        0.16 = coord(4/25)
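
    The score tree above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes boost, queryNorm and fieldNorm directly from the listing. The helper names `idf` and `term_score` are made up for illustration.

```python
import math

# Constants copied from the explain output for doc 851.
MAX_DOCS = 44421
QUERY_NORM = 0.015502731
FIELD_NORM = 0.09375


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """score = queryWeight * fieldWeight for a single query term."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq in doc 851, docFreq, boost) as listed in the tree.
terms = [
    ("maschinelle", 4.0, 47, 1.0661026),
    ("sowie", 1.0, 1187, 1.2584323),
    ("verfahren", 1.0, 378, 1.5695359),
    ("textanalyse", 2.0, 10, 2.5333908),
]

raw_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
coord = 4 / 25  # 4 of the 25 query terms matched this document
total = raw_sum * coord

print(f"sum of term scores: {raw_sum:.6f}")
print(f"final score:        {total:.6f}")
```

    Under these assumptions the result agrees with the listed 0.1208898 up to float rounding. The same calculation applies to the other entries once their per-document values (freq, fieldNorm, coord) are substituted.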
    
  2. Stollberg, M.: Ontologiebasierte Wissensmodellierung : Verwendung als semantischer Grundbaustein des Semantic Web (2002) 0.11
    0.111052446 = sum of:
      0.111052446 = product of:
        0.46271855 = sum of:
          0.14151786 = weight(abstract_txt:ontologien in 495) [ClassicSimilarity], result of:
            0.14151786 = score(doc=495,freq=16.0), product of:
              0.120080106 = queryWeight, product of:
                1.0269343 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015502731 = queryNorm
              1.1785288 = fieldWeight in 495, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
          0.043664124 = weight(abstract_txt:abschluss in 495) [ClassicSimilarity], result of:
            0.043664124 = score(doc=495,freq=1.0), product of:
              0.13816139 = queryWeight, product of:
                1.1015404 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.015502731 = queryNorm
              0.3160371 = fieldWeight in 495, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
          0.04899591 = weight(abstract_txt:semantic in 495) [ClassicSimilarity], result of:
            0.04899591 = score(doc=495,freq=11.0), product of:
              0.08451929 = queryWeight, product of:
                1.2184285 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.015502731 = queryNorm
              0.5797009 = fieldWeight in 495, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
          0.028191196 = weight(abstract_txt:sowie in 495) [ClassicSimilarity], result of:
            0.028191196 = score(doc=495,freq=3.0), product of:
              0.090160325 = queryWeight, product of:
                1.2584323 = boost
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.015502731 = queryNorm
              0.31267852 = fieldWeight in 495, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
          0.09473235 = weight(abstract_txt:verfahren in 495) [ClassicSimilarity], result of:
            0.09473235 = score(doc=495,freq=9.0), product of:
              0.1402485 = queryWeight, product of:
                1.5695359 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.015502731 = queryNorm
              0.6754607 = fieldWeight in 495, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
          0.105617106 = weight(abstract_txt:techniken in 495) [ClassicSimilarity], result of:
            0.105617106 = score(doc=495,freq=1.0), product of:
              0.39519647 = queryWeight, product of:
                3.7260067 = boost
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.015502731 = queryNorm
              0.26725215 = fieldWeight in 495, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.0390625 = fieldNorm(doc=495)
        0.24 = coord(6/25)
    
  3. Rieger, B.B.: Unscharfe Semantik : die empirische Analyse, quantitative Beschreibung, formale Repräsentation und prozedurale Modellierung vager Wortbedeutungen in Texten (1990) 0.08
    0.07800266 = sum of:
      0.07800266 = product of:
        0.3250111 = sum of:
          0.032456737 = weight(abstract_txt:vorgestellten in 1209) [ClassicSimilarity], result of:
            0.032456737 = score(doc=1209,freq=1.0), product of:
              0.13155684 = queryWeight, product of:
                1.0748895 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015502731 = queryNorm
              0.24671265 = fieldWeight in 1209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
          0.022552958 = weight(abstract_txt:sowie in 1209) [ClassicSimilarity], result of:
            0.022552958 = score(doc=1209,freq=3.0), product of:
              0.090160325 = queryWeight, product of:
                1.2584323 = boost
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.015502731 = queryNorm
              0.2501428 = fieldWeight in 1209, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
          0.043755 = weight(abstract_txt:verfahren in 1209) [ClassicSimilarity], result of:
            0.043755 = score(doc=1209,freq=3.0), product of:
              0.1402485 = queryWeight, product of:
                1.5695359 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.015502731 = queryNorm
              0.31198192 = fieldWeight in 1209, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
          0.02576009 = weight(abstract_txt:überblick in 1209) [ClassicSimilarity], result of:
            0.02576009 = score(doc=1209,freq=1.0), product of:
              0.14208616 = queryWeight, product of:
                1.5797851 = boost
                5.8015704 = idf(docFreq=364, maxDocs=44421)
                0.015502731 = queryNorm
              0.18129908 = fieldWeight in 1209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8015704 = idf(docFreq=364, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
          0.10623312 = weight(abstract_txt:textanalyse in 1209) [ClassicSimilarity], result of:
            0.10623312 = score(doc=1209,freq=1.0), product of:
              0.36539295 = queryWeight, product of:
                2.5333908 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015502731 = queryNorm
              0.29073665 = fieldWeight in 1209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
          0.09425321 = weight(abstract_txt:statistische in 1209) [ClassicSimilarity], result of:
            0.09425321 = score(doc=1209,freq=1.0), product of:
              0.3862022 = queryWeight, product of:
                3.1898856 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.015502731 = queryNorm
              0.24405147 = fieldWeight in 1209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.03125 = fieldNorm(doc=1209)
        0.24 = coord(6/25)
    
  4. Reichenberger, K.: Kompendium semantische Netze : Konzepte, Technologie, Modellierung (2010) 0.07
    0.07251848 = sum of:
      0.07251848 = product of:
        0.45324054 = sum of:
          0.08491072 = weight(abstract_txt:ontologien in 413) [ClassicSimilarity], result of:
            0.08491072 = score(doc=413,freq=1.0), product of:
              0.120080106 = queryWeight, product of:
                1.0269343 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015502731 = queryNorm
              0.7071173 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.09375 = fieldNorm(doc=413)
          0.03906287 = weight(abstract_txt:sowie in 413) [ClassicSimilarity], result of:
            0.03906287 = score(doc=413,freq=1.0), product of:
              0.090160325 = queryWeight, product of:
                1.2584323 = boost
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.015502731 = queryNorm
              0.43326008 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.09375 = fieldNorm(doc=413)
          0.07578588 = weight(abstract_txt:verfahren in 413) [ClassicSimilarity], result of:
            0.07578588 = score(doc=413,freq=1.0), product of:
              0.1402485 = queryWeight, product of:
                1.5695359 = boost
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.015502731 = queryNorm
              0.54036856 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7639313 = idf(docFreq=378, maxDocs=44421)
                0.09375 = fieldNorm(doc=413)
          0.25348106 = weight(abstract_txt:techniken in 413) [ClassicSimilarity], result of:
            0.25348106 = score(doc=413,freq=1.0), product of:
              0.39519647 = queryWeight, product of:
                3.7260067 = boost
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.015502731 = queryNorm
              0.64140517 = fieldWeight in 413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.09375 = fieldNorm(doc=413)
        0.16 = coord(4/25)
    
  5. Budin, G.: Kommunikation in Netzwerken : Terminologiemanagement (2006) 0.07
    0.07157731 = sum of:
      0.07157731 = product of:
        0.44735822 = sum of:
          0.04727303 = weight(abstract_txt:semantic in 700) [ClassicSimilarity], result of:
            0.04727303 = score(doc=700,freq=1.0), product of:
              0.08451929 = queryWeight, product of:
                1.2184285 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.015502731 = queryNorm
              0.55931646 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.125 = fieldNorm(doc=700)
          0.052083828 = weight(abstract_txt:sowie in 700) [ClassicSimilarity], result of:
            0.052083828 = score(doc=700,freq=1.0), product of:
              0.090160325 = queryWeight, product of:
                1.2584323 = boost
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.015502731 = queryNorm
              0.5776801 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.621441 = idf(docFreq=1187, maxDocs=44421)
                0.125 = fieldNorm(doc=700)
          0.24496098 = weight(abstract_txt:repräsentationsformen in 700) [ClassicSimilarity], result of:
            0.24496098 = score(doc=700,freq=1.0), product of:
              0.20087913 = queryWeight, product of:
                1.3282338 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.015502731 = queryNorm
              1.2194446 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.125 = fieldNorm(doc=700)
          0.10304036 = weight(abstract_txt:überblick in 700) [ClassicSimilarity], result of:
            0.10304036 = score(doc=700,freq=1.0), product of:
              0.14208616 = queryWeight, product of:
                1.5797851 = boost
                5.8015704 = idf(docFreq=364, maxDocs=44421)
                0.015502731 = queryNorm
              0.7251963 = fieldWeight in 700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8015704 = idf(docFreq=364, maxDocs=44421)
                0.125 = fieldNorm(doc=700)
        0.16 = coord(4/25)