Document (#26563)

Walther, R.
Möglichkeiten und Grenzen automatischer Klassifikationen von Web-Dokumenten
Bern : Rechts- und Wirtschaftswissenschaftlichen Fakultät
97 p.
Automatic classification of web and other text documents makes it possible to provide ordered access to internal and external company information. Research on automatic classification has intensified in recent years. The result is a variety of methods that are used in practice today, individually or in combination. This licentiate thesis examines, alongside general principles, several methods of automatic classification in more detail and discusses their possibilities and limits. It also presents the results of a survey of vendors of software solutions for the automatic classification of text documents. The findings serve myax internet AG as a basis for developing its own classification product.
Also available at:
Licentiate thesis at the Rechts- und Wirtschaftswissenschaftliche Fakultät, Universität Bern, Institut für Wirtschaftsinformatik (Prof. G. Knolmayer)
Automatic classification

Similar documents (author)

  1. Walther, J.: La construction d'un langage documentaire pluridisciplinaire (1992) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:walther in 2308) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 2308, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=2308)
  2. Walther, C.: Wie Deutschland zur Dezimalklassifikation kam (1957) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:walther in 5008) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 5008, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=5008)
  3. Walther, R.: In vierundzwanzig Bänden um die Welt : Die Neuauflage des 'Großen Brockhaus': wie die Enzyklopädie das Wissen der Gegenwart inventarisiert (1996) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:walther in 6348) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 6348, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=6348)
  4. Walther, R.: Wille und Kraft aller einzelnen Glieder : Mit Abschluß seiner 20. Auflage wird der 'Brockhaus' eingestellt (1999) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:walther in 4388) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 4388, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=4388)
  5. Walther, R.: Wanderung aus gestorbenen Systemen : Bibliotheken bemühen sich, digital archivierte Texte trotz des Wandels der Technik zugänglich zu halten (2003) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:walther in 2483) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 2483, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=2483)
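
The five author matches above share the same score because each breakdown multiplies the same three factors. A minimal sketch of that arithmetic follows, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical and chosen for illustration only. The fieldNorm value is taken from the listing as given and is not recomputed.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the breakdown above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:walther entries in the listing.
idf = classic_idf(doc_freq=14, max_docs=44421)
score = field_weight(freq=1.0, doc_freq=14, max_docs=44421, field_norm=0.625)
print(f"idf={idf:.4f} score={score:.4f}")
```

Since freq, docFreq, maxDocs and fieldNorm are identical for all five records, the author-based ranking cannot separate them; the listed order among them carries no relevance information.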

Similar documents (content)

  1. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.28
    0.27643812 = sum of:
      0.27643812 = product of:
        0.98727894 = sum of:
          0.066426545 = weight(abstract_txt:automatischer in 5197) [ClassicSimilarity], result of:
            0.066426545 = score(doc=5197,freq=1.0), product of:
              0.17003484 = queryWeight, product of:
                1.1307113 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.01804362 = queryNorm
              0.39066434 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.038983334 = weight(abstract_txt:möglichkeiten in 5197) [ClassicSimilarity], result of:
            0.038983334 = score(doc=5197,freq=1.0), product of:
              0.1501664 = queryWeight, product of:
                1.5027411 = boost
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.01804362 = queryNorm
              0.2596009 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.08887592 = weight(abstract_txt:methoden in 5197) [ClassicSimilarity], result of:
            0.08887592 = score(doc=5197,freq=4.0), product of:
              0.16386625 = queryWeight, product of:
                1.5697936 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.01804362 = queryNorm
              0.54236865 = fieldWeight in 5197, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.086848065 = weight(abstract_txt:dokumenten in 5197) [ClassicSimilarity], result of:
            0.086848065 = score(doc=5197,freq=2.0), product of:
              0.203306 = queryWeight, product of:
                1.7485292 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.01804362 = queryNorm
              0.42717904 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.1672088 = weight(abstract_txt:automatischen in 5197) [ClassicSimilarity], result of:
            0.1672088 = score(doc=5197,freq=5.0), product of:
              0.23182996 = queryWeight, product of:
                1.8671644 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.01804362 = queryNorm
              0.72125626 = fieldWeight in 5197, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.07612789 = weight(abstract_txt:automatische in 5197) [ClassicSimilarity], result of:
            0.07612789 = score(doc=5197,freq=1.0), product of:
              0.23461151 = queryWeight, product of:
                1.8783324 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.01804362 = queryNorm
              0.32448488 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.46280837 = weight(title_txt:klassifikation in 5197) [ClassicSimilarity], result of:
            0.46280837 = score(doc=5197,freq=1.0), product of:
              0.39074275 = queryWeight, product of:
                3.4281375 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.01804362 = queryNorm
              1.1844324 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.1875 = fieldNorm(doc=5197)
        0.28 = coord(7/25)
  2. Oberhauser, O.: Klassifikation in Online-Informationssystemen (1986) 0.24
    0.23886421 = sum of:
      0.23886421 = product of:
        1.4929013 = sum of:
          0.0460337 = weight(abstract_txt:sind in 1588) [ClassicSimilarity], result of:
            0.0460337 = score(doc=1588,freq=4.0), product of:
              0.07518275 = queryWeight, product of:
                1.0633026 = boost
                3.9186604 = idf(docFreq=2398, maxDocs=44421)
                0.01804362 = queryNorm
              0.6122907 = fieldWeight in 1588, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9186604 = idf(docFreq=2398, maxDocs=44421)
                0.078125 = fieldNorm(doc=1588)
          0.06497223 = weight(abstract_txt:möglichkeiten in 1588) [ClassicSimilarity], result of:
            0.06497223 = score(doc=1588,freq=1.0), product of:
              0.1501664 = queryWeight, product of:
                1.5027411 = boost
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.01804362 = queryNorm
              0.43266818 = fieldWeight in 1588, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.078125 = fieldNorm(doc=1588)
          0.1477397 = weight(abstract_txt:klassifikationen in 1588) [ClassicSimilarity], result of:
            0.1477397 = score(doc=1588,freq=1.0), product of:
              0.25966838 = queryWeight, product of:
                1.9760927 = boost
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.01804362 = queryNorm
              0.56895524 = fieldWeight in 1588, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.078125 = fieldNorm(doc=1588)
          1.2341557 = weight(title_txt:klassifikation in 1588) [ClassicSimilarity], result of:
            1.2341557 = score(doc=1588,freq=1.0), product of:
              0.39074275 = queryWeight, product of:
                3.4281375 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.01804362 = queryNorm
              3.1584864 = fieldWeight in 1588, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.5 = fieldNorm(doc=1588)
        0.16 = coord(4/25)
  3. Degens, P.O.: Hierarchische Klassifikation (1980) 0.23
    0.23113984 = sum of:
      0.23113984 = product of:
        1.9261653 = sum of:
          0.09096111 = weight(abstract_txt:möglichkeiten in 89) [ClassicSimilarity], result of:
            0.09096111 = score(doc=89,freq=1.0), product of:
              0.1501664 = queryWeight, product of:
                1.5027411 = boost
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.01804362 = queryNorm
              0.6057354 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5381527 = idf(docFreq=474, maxDocs=44421)
                0.109375 = fieldNorm(doc=89)
          0.29250965 = weight(abstract_txt:klassifikationen in 89) [ClassicSimilarity], result of:
            0.29250965 = score(doc=89,freq=2.0), product of:
              0.25966838 = queryWeight, product of:
                1.9760927 = boost
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.01804362 = queryNorm
              1.1264739 = fieldWeight in 89, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.109375 = fieldNorm(doc=89)
          1.5426946 = weight(title_txt:klassifikation in 89) [ClassicSimilarity], result of:
            1.5426946 = score(doc=89,freq=1.0), product of:
              0.39074275 = queryWeight, product of:
                3.4281375 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.01804362 = queryNorm
              3.948108 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.625 = fieldNorm(doc=89)
        0.12 = coord(3/25)
  4. Manecke, H.-J.: Klassifikation, Klassieren (2004) 0.21
    0.21177936 = sum of:
      0.21177936 = product of:
        1.764828 = sum of:
          0.02391981 = weight(abstract_txt:sind in 3902) [ClassicSimilarity], result of:
            0.02391981 = score(doc=3902,freq=3.0), product of:
              0.07518275 = queryWeight, product of:
                1.0633026 = boost
                3.9186604 = idf(docFreq=2398, maxDocs=44421)
                0.01804362 = queryNorm
              0.31815556 = fieldWeight in 3902, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9186604 = idf(docFreq=2398, maxDocs=44421)
                0.046875 = fieldNorm(doc=3902)
          0.19821359 = weight(abstract_txt:klassifikationen in 3902) [ClassicSimilarity], result of:
            0.19821359 = score(doc=3902,freq=5.0), product of:
              0.25966838 = queryWeight, product of:
                1.9760927 = boost
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.01804362 = queryNorm
              0.76333356 = fieldWeight in 3902, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.046875 = fieldNorm(doc=3902)
          1.5426946 = weight(title_txt:klassifikation in 3902) [ClassicSimilarity], result of:
            1.5426946 = score(doc=3902,freq=1.0), product of:
              0.39074275 = queryWeight, product of:
                3.4281375 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.01804362 = queryNorm
              3.948108 = fieldWeight in 3902, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.625 = fieldNorm(doc=3902)
        0.12 = coord(3/25)
  5. Krauth, J.: Evaluation von Verfahren der automatischen Klassifikation (1983) 0.18
    0.18413438 = sum of:
      0.18413438 = product of:
        1.5344532 = sum of:
          0.19940813 = weight(abstract_txt:automatischen in 111) [ClassicSimilarity], result of:
            0.19940813 = score(doc=111,freq=1.0), product of:
              0.23182996 = queryWeight, product of:
                1.8671644 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.01804362 = queryNorm
              0.86014825 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.125 = fieldNorm(doc=111)
          0.40942824 = weight(abstract_txt:klassifikationen in 111) [ClassicSimilarity], result of:
            0.40942824 = score(doc=111,freq=3.0), product of:
              0.25966838 = queryWeight, product of:
                1.9760927 = boost
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.01804362 = queryNorm
              1.576735 = fieldWeight in 111, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.282627 = idf(docFreq=82, maxDocs=44421)
                0.125 = fieldNorm(doc=111)
          0.92561674 = weight(title_txt:klassifikation in 111) [ClassicSimilarity], result of:
            0.92561674 = score(doc=111,freq=1.0), product of:
              0.39074275 = queryWeight, product of:
                3.4281375 = boost
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.01804362 = queryNorm
              2.3688648 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3169727 = idf(docFreq=217, maxDocs=44421)
                0.375 = fieldNorm(doc=111)
        0.12 = coord(3/25)
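
The content-based scores combine a per-term product of queryWeight and fieldWeight with a coordination factor. As a rough sketch, the following recomputes entry 5 (doc=111) from the numbers in the listing, again assuming the ClassicSimilarity formulas for tf and idf. The helper names `classic_idf` and `term_score` are hypothetical; boost, queryNorm, fieldNorm and the coord fraction are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.01804362   # queryNorm from the listing

def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# (boost, docFreq, freq, fieldNorm) for the three terms matched in doc=111.
terms = [
    (1.8671644, 123, 1.0, 0.125),  # abstract_txt:automatischen
    (1.9760927, 82, 3.0, 0.125),   # abstract_txt:klassifikationen
    (3.4281375, 217, 1.0, 0.375),  # title_txt:klassifikation
]

raw_sum = sum(term_score(*t) for t in terms)
coord = 3 / 25            # 3 of the 25 query terms matched
total = raw_sum * coord
print(f"sum={raw_sum:.4f} total={total:.4f}")
```

The coord factor explains why entry 1 ranks first despite the smallest raw sum: it matches 7 of 25 query terms, so its sum is scaled by 0.28 rather than 0.12 or 0.16.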