Document (#32783)

Author
Kumpe, D.
Title
Methoden zur automatischen Indexierung von Dokumenten [Methods for the automatic indexing of documents]
Imprint
Berlin : Technische Universität Berlin / Institut für Softwaretechnik und Theoretische Informatik, Computergestützte Informationssysteme
Year
2006
Pages
VII, 147 pp.
Abstract
This diploma thesis deals with the indexing of unstructured, natural-language documents. The growing flood of information and the number of published scientific reports and books make machine-based subject cataloguing necessary. To better understand the requirements for this, the thesis examines problems of written natural-language communication. It presents manual indexing techniques and documentation languages, and situates indexing thematically within the fields of subject cataloguing and information retrieval. It further examines the advantages and disadvantages of selected algorithms and evaluates how software products in the field of information retrieval work. The results of individual methods are demonstrated on sample documents. Using the European Migration Network project, it identifies problems and basic requirements for carrying out subject cataloguing and proposes possible solutions.
Content
Diploma thesis (Diplomarbeit)
Theme
Automatic indexing

Similar documents (content)

  1. El Jerroudi, F.: Inhaltliche Erschließung in Dokumenten-Management-Systemen, dargestellt am Beispiel der KRAFTWERKSSCHULE e.V (2007) 0.26
    0.26258528 = sum of:
      0.26258528 = product of:
        1.0941054 = sum of:
          0.0892926 = weight(abstract_txt:nachteile in 1527) [ClassicSimilarity], result of:
            0.0892926 = score(doc=1527,freq=1.0), product of:
              0.14747255 = queryWeight, product of:
                1.00729 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.018890454 = queryNorm
              0.6054863 = fieldWeight in 1527, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
          0.236912 = weight(abstract_txt:diplomarbeit in 1527) [ClassicSimilarity], result of:
            0.236912 = score(doc=1527,freq=5.0), product of:
              0.16528581 = queryWeight, product of:
                1.0663916 = boost
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.018890454 = queryNorm
              1.4333475 = fieldWeight in 1527, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
          0.16549431 = weight(abstract_txt:inhaltlichen in 1527) [ClassicSimilarity], result of:
            0.16549431 = score(doc=1527,freq=2.0), product of:
              0.22251344 = queryWeight, product of:
                1.7498146 = boost
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.018890454 = queryNorm
              0.74374974 = fieldWeight in 1527, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
          0.09256116 = weight(abstract_txt:werden in 1527) [ClassicSimilarity], result of:
            0.09256116 = score(doc=1527,freq=5.0), product of:
              0.15104976 = queryWeight, product of:
                2.279523 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.018890454 = queryNorm
              0.6127859 = fieldWeight in 1527, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
          0.2920894 = weight(abstract_txt:erschließung in 1527) [ClassicSimilarity], result of:
            0.2920894 = score(doc=1527,freq=6.0), product of:
              0.2579297 = queryWeight, product of:
                2.3073328 = boost
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.018890454 = queryNorm
              1.132438 = fieldWeight in 1527, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
          0.21775585 = weight(abstract_txt:dokumenten in 1527) [ClassicSimilarity], result of:
            0.21775585 = score(doc=1527,freq=2.0), product of:
              0.3058519 = queryWeight, product of:
                2.5125525 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.018890454 = queryNorm
              0.711965 = fieldWeight in 1527, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.078125 = fieldNorm(doc=1527)
        0.24 = coord(6/25)
    
  2. Halip, I.: Automatische Extrahierung von Schlagworten aus unstrukturierten Texten (2005) 0.22
    0.2153348 = sum of:
      0.2153348 = product of:
        0.76905286 = sum of:
          0.06250482 = weight(abstract_txt:nachteile in 986) [ClassicSimilarity], result of:
            0.06250482 = score(doc=986,freq=1.0), product of:
              0.14747255 = queryWeight, product of:
                1.00729 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.018890454 = queryNorm
              0.42384037 = fieldWeight in 986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.083140254 = weight(abstract_txt:eingeordnet in 986) [ClassicSimilarity], result of:
            0.083140254 = score(doc=986,freq=1.0), product of:
              0.17836504 = queryWeight, product of:
                1.1077807 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.018890454 = queryNorm
              0.46612418 = fieldWeight in 986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.087109335 = weight(abstract_txt:unstrukturierten in 986) [ClassicSimilarity], result of:
            0.087109335 = score(doc=986,freq=1.0), product of:
              0.18399751 = queryWeight, product of:
                1.1251357 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.018890454 = queryNorm
              0.4734267 = fieldWeight in 986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.04402169 = weight(abstract_txt:bereich in 986) [ClassicSimilarity], result of:
            0.04402169 = score(doc=986,freq=1.0), product of:
              0.14708102 = queryWeight, product of:
                1.4226309 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.018890454 = queryNorm
              0.2993023 = fieldWeight in 986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.07097697 = weight(abstract_txt:werden in 986) [ClassicSimilarity], result of:
            0.07097697 = score(doc=986,freq=6.0), product of:
              0.15104976 = queryWeight, product of:
                2.279523 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.018890454 = queryNorm
              0.4698913 = fieldWeight in 986, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.18668678 = weight(abstract_txt:dokumenten in 986) [ClassicSimilarity], result of:
            0.18668678 = score(doc=986,freq=3.0), product of:
              0.3058519 = queryWeight, product of:
                2.5125525 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.018890454 = queryNorm
              0.6103829 = fieldWeight in 986, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
          0.23461299 = weight(abstract_txt:indexierung in 986) [ClassicSimilarity], result of:
            0.23461299 = score(doc=986,freq=2.0), product of:
              0.44875938 = queryWeight, product of:
                3.5142746 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.018890454 = queryNorm
              0.52280354 = fieldWeight in 986, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.0546875 = fieldNorm(doc=986)
        0.28 = coord(7/25)
    
  3. Simon, D.: Anreicherung bibliothekarischer Titeldaten durch Tagging : Möglichkeiten und Probleme (2007) 0.21
    0.2141993 = sum of:
      0.2141993 = product of:
        0.8924971 = sum of:
          0.07587648 = weight(abstract_txt:vorgestellt in 1530) [ClassicSimilarity], result of:
            0.07587648 = score(doc=1530,freq=1.0), product of:
              0.14761421 = queryWeight, product of:
                1.4252071 = boost
                5.4828677 = idf(docFreq=501, maxDocs=44421)
                0.018890454 = queryNorm
              0.51401883 = fieldWeight in 1530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4828677 = idf(docFreq=501, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
          0.13976116 = weight(abstract_txt:untersucht in 1530) [ClassicSimilarity], result of:
            0.13976116 = score(doc=1530,freq=2.0), product of:
              0.17605068 = queryWeight, product of:
                1.5564414 = boost
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.018890454 = queryNorm
              0.79386896 = fieldWeight in 1530, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
          0.14042658 = weight(abstract_txt:inhaltlichen in 1530) [ClassicSimilarity], result of:
            0.14042658 = score(doc=1530,freq=1.0), product of:
              0.22251344 = queryWeight, product of:
                1.7498146 = boost
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.018890454 = queryNorm
              0.63109255 = fieldWeight in 1530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
          0.04967353 = weight(abstract_txt:werden in 1530) [ClassicSimilarity], result of:
            0.04967353 = score(doc=1530,freq=1.0), product of:
              0.15104976 = queryWeight, product of:
                2.279523 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.018890454 = queryNorm
              0.3288554 = fieldWeight in 1530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
          0.20236546 = weight(abstract_txt:erschließung in 1530) [ClassicSimilarity], result of:
            0.20236546 = score(doc=1530,freq=2.0), product of:
              0.2579297 = queryWeight, product of:
                2.3073328 = boost
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.018890454 = queryNorm
              0.784576 = fieldWeight in 1530, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
          0.2843939 = weight(abstract_txt:indexierung in 1530) [ClassicSimilarity], result of:
            0.2843939 = score(doc=1530,freq=1.0), product of:
              0.44875938 = queryWeight, product of:
                3.5142746 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.018890454 = queryNorm
              0.63373363 = fieldWeight in 1530, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.09375 = fieldNorm(doc=1530)
        0.24 = coord(6/25)
    
  4. Schwarzendorfer, H.: Inhaltliche Erschließung von Altbeständen in allgemeinen Bibliothekskatalogen : Bestandsaufnahme und Entwicklungsmöglichkeiten (2009) 0.17
    0.17224272 = sum of:
      0.17224272 = product of:
        0.86121356 = sum of:
          0.10116865 = weight(abstract_txt:vorgestellt in 585) [ClassicSimilarity], result of:
            0.10116865 = score(doc=585,freq=1.0), product of:
              0.14761421 = queryWeight, product of:
                1.4252071 = boost
                5.4828677 = idf(docFreq=501, maxDocs=44421)
                0.018890454 = queryNorm
              0.68535846 = fieldWeight in 585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4828677 = idf(docFreq=501, maxDocs=44421)
                0.125 = fieldNorm(doc=585)
          0.13176809 = weight(abstract_txt:untersucht in 585) [ClassicSimilarity], result of:
            0.13176809 = score(doc=585,freq=1.0), product of:
              0.17605068 = queryWeight, product of:
                1.5564414 = boost
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.018890454 = queryNorm
              0.74846685 = fieldWeight in 585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.125 = fieldNorm(doc=585)
          0.2647909 = weight(abstract_txt:inhaltlichen in 585) [ClassicSimilarity], result of:
            0.2647909 = score(doc=585,freq=2.0), product of:
              0.22251344 = queryWeight, product of:
                1.7498146 = boost
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.018890454 = queryNorm
              1.1899996 = fieldWeight in 585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.731654 = idf(docFreq=143, maxDocs=44421)
                0.125 = fieldNorm(doc=585)
          0.0936653 = weight(abstract_txt:werden in 585) [ClassicSimilarity], result of:
            0.0936653 = score(doc=585,freq=2.0), product of:
              0.15104976 = queryWeight, product of:
                2.279523 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.018890454 = queryNorm
              0.6200957 = fieldWeight in 585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.125 = fieldNorm(doc=585)
          0.26982063 = weight(abstract_txt:erschließung in 585) [ClassicSimilarity], result of:
            0.26982063 = score(doc=585,freq=2.0), product of:
              0.2579297 = queryWeight, product of:
                2.3073328 = boost
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.018890454 = queryNorm
              1.0461013 = fieldWeight in 585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.125 = fieldNorm(doc=585)
        0.2 = coord(5/25)
    
  5. Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.17
    0.17184906 = sum of:
      0.17184906 = product of:
        0.8592453 = sum of:
          0.12500964 = weight(abstract_txt:nachteile in 2755) [ClassicSimilarity], result of:
            0.12500964 = score(doc=2755,freq=1.0), product of:
              0.14747255 = queryWeight, product of:
                1.00729 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.018890454 = queryNorm
              0.84768075 = fieldWeight in 2755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.109375 = fieldNorm(doc=2755)
          0.12892298 = weight(abstract_txt:maschinelle in 2755) [ClassicSimilarity], result of:
            0.12892298 = score(doc=2755,freq=1.0), product of:
              0.15053439 = queryWeight, product of:
                1.017693 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.018890454 = queryNorm
              0.8564354 = fieldWeight in 2755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.109375 = fieldNorm(doc=2755)
          0.057952452 = weight(abstract_txt:werden in 2755) [ClassicSimilarity], result of:
            0.057952452 = score(doc=2755,freq=1.0), product of:
              0.15104976 = queryWeight, product of:
                2.279523 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.018890454 = queryNorm
              0.38366464 = fieldWeight in 2755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.109375 = fieldNorm(doc=2755)
          0.2155673 = weight(abstract_txt:dokumenten in 2755) [ClassicSimilarity], result of:
            0.2155673 = score(doc=2755,freq=1.0), product of:
              0.3058519 = queryWeight, product of:
                2.5125525 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.018890454 = queryNorm
              0.7048094 = fieldWeight in 2755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.109375 = fieldNorm(doc=2755)
          0.33179286 = weight(abstract_txt:indexierung in 2755) [ClassicSimilarity], result of:
            0.33179286 = score(doc=2755,freq=1.0), product of:
              0.44875938 = queryWeight, product of:
                3.5142746 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.018890454 = queryNorm
              0.73935586 = fieldWeight in 2755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.109375 = fieldNorm(doc=2755)
        0.2 = coord(5/25)
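The explain trees above follow Lucene's ClassicSimilarity: each weight(...) node is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), the matched terms are summed, and the sum is scaled by coord(k/25). The minimal Python sketch below recomputes a document's score from the printed values, assuming the formulas that these numbers are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). For example, docFreq=51 gives 1 + ln(44421/52), which is approximately 7.7502. queryNorm and fieldNorm are taken as given, because they depend on the full 25-term query and on index-time length encoding that the output does not show. The function names (`tf`, `idf`, `term_score`, `doc_score`) are hypothetical illustrations, not Lucene API.

```python
import math

# Collection-level constants copied from the explain output above.
MAX_DOCS = 44421          # maxDocs printed in every idf(...) line
QUERY_NORM = 0.018890454  # depends on all query terms, so taken as given
MAX_OVERLAP = 25          # denominator of coord(k/25)


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float,
               query_norm: float = QUERY_NORM) -> float:
    """One weight(...) node: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    # fieldNorm is a lossily encoded index-time length norm; use the printed value.
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def doc_score(terms: list[tuple[float, int, float, float]],
              max_overlap: int = MAX_OVERLAP) -> float:
    """Sum the matched term scores, then scale by coord = matched / total query terms."""
    total = sum(term_score(*t) for t in terms)
    coord = len(terms) / max_overlap
    return total * coord


# (boost, docFreq, freq, fieldNorm) for the six terms matched in doc 1527 (entry 1).
DOC_1527 = [
    (1.00729, 51, 1.0, 0.078125),     # nachteile
    (1.0663916, 32, 5.0, 0.078125),   # diplomarbeit
    (1.7498146, 143, 2.0, 0.078125),  # inhaltlichen
    (2.279523, 3617, 5.0, 0.078125),  # werden
    (2.3073328, 324, 6.0, 0.078125),  # erschließung
    (2.5125525, 191, 2.0, 0.078125),  # dokumenten
]

if __name__ == "__main__":
    print(round(doc_score(DOC_1527), 4))  # 0.2626, matching the printed 0.26258528
```

Because queryNorm and coord are shared by all candidates of the same query, the ranking among the five similar documents is driven by which terms each abstract matches, how often, and how short the abstract is (fieldNorm).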