Document (#26666)

Author
Adams, K.C.
Title
Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources
Source
http://www.intelligentkm.com/feature/010101/feat1.shtml
Year
2003
Abstract
Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies, such as Inxight Software, Mohomine, and Metacode, claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology: rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
Theme
Automatic classification (Automatisches Klassifizieren)

Similar documents (author)

  1. Adams, J.A.: ¬The computer catalog : a democratic or authoritarian technology? (1988) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:adams in 420) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 420, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=420)
    
  2. Adams, B.: Stand der retrospektiven Katalogisierung in Deutschland : zum gegenwärtigen Stand der Diskussion (1992) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:adams in 2367) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 2367, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=2367)
    
  3. Adams, B.: Charles Ami Cutters 'Expansive classification' : eine kritische Darstellung (1965) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:adams in 4942) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 4942, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=4942)
    
  4. Adams, J.: ¬Le catalogue informatique (1989) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:adams in 5314) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 5314, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=5314)
    
  5. Adams, J.: Identifizierung für Waren mit Hilfe moderner Informationssysteme (1978) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:adams in 155) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 155, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=155)
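
The author-match scores above all follow the same pattern: fieldWeight = tf * idf * fieldNorm. As a minimal sketch, the Python below reproduces that arithmetic. It assumes tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred because they reproduce the numbers in the listing, not taken from the catalog's configuration. The function names are hypothetical, and fieldNorm is copied from the listing rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; formula inferred from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explanation tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:adams entries above.
author_idf = classic_idf(doc_freq=20, max_docs=44421)
author_score = field_weight(freq=1.0, doc_freq=20, max_docs=44421, field_norm=0.625)

print(f"idf   = {author_idf:.6f}")
print(f"score = {author_score:.6f}")
```

Because every author hit has the same term frequency, document frequency, and fieldNorm, all five entries receive the same score; only the internal document number differs.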
    

Similar documents (content)

  1. Savic, D.: Designing an expert system for classifying office documents (1994) 0.12
    0.11674931 = sum of:
      0.11674931 = product of:
        0.58374655 = sum of:
          0.028355196 = weight(abstract_txt:this in 2654) [ClassicSimilarity], result of:
            0.028355196 = score(doc=2654,freq=2.0), product of:
              0.066660665 = queryWeight, product of:
                1.3544143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020454086 = queryNorm
              0.42536622 = fieldWeight in 2654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.125 = fieldNorm(doc=2654)
          0.09639122 = weight(abstract_txt:example in 2654) [ClassicSimilarity], result of:
            0.09639122 = score(doc=2654,freq=1.0), product of:
              0.15070912 = queryWeight, product of:
                1.4400299 = boost
                5.1166763 = idf(docFreq=723, maxDocs=44421)
                0.020454086 = queryNorm
              0.63958454 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1166763 = idf(docFreq=723, maxDocs=44421)
                0.125 = fieldNorm(doc=2654)
          0.11980195 = weight(abstract_txt:technology in 2654) [ClassicSimilarity], result of:
            0.11980195 = score(doc=2654,freq=2.0), product of:
              0.15828663 = queryWeight, product of:
                1.8074633 = boost
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.020454086 = queryNorm
              0.7568672 = fieldWeight in 2654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.125 = fieldNorm(doc=2654)
          0.20185137 = weight(abstract_txt:automatic in 2654) [ClassicSimilarity], result of:
            0.20185137 = score(doc=2654,freq=1.0), product of:
              0.31079856 = queryWeight, product of:
                2.924531 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.020454086 = queryNorm
              0.64946043 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.125 = fieldNorm(doc=2654)
          0.13734679 = weight(abstract_txt:classification in 2654) [ClassicSimilarity], result of:
            0.13734679 = score(doc=2654,freq=1.0), product of:
              0.2752331 = queryWeight, product of:
                3.3706424 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.020454086 = queryNorm
              0.49901992 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.125 = fieldNorm(doc=2654)
        0.2 = coord(5/25)
    
  2. Broughton, V.: Finding Bliss on the Web : some problems of representing faceted terminologies in digital environments 0.11
    0.11091898 = sum of:
      0.11091898 = product of:
        0.4621624 = sum of:
          0.039596297 = weight(abstract_txt:tools in 519) [ClassicSimilarity], result of:
            0.039596297 = score(doc=519,freq=1.0), product of:
              0.11392812 = queryWeight, product of:
                1.252037 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.020454086 = queryNorm
              0.3475551 = fieldWeight in 519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
          0.021704925 = weight(abstract_txt:this in 519) [ClassicSimilarity], result of:
            0.021704925 = score(doc=519,freq=3.0), product of:
              0.066660665 = queryWeight, product of:
                1.3544143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020454086 = queryNorm
              0.3256032 = fieldWeight in 519, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
          0.060244516 = weight(abstract_txt:example in 519) [ClassicSimilarity], result of:
            0.060244516 = score(doc=519,freq=1.0), product of:
              0.15070912 = queryWeight, product of:
                1.4400299 = boost
                5.1166763 = idf(docFreq=723, maxDocs=44421)
                0.020454086 = queryNorm
              0.39974034 = fieldWeight in 519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1166763 = idf(docFreq=723, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
          0.1261571 = weight(abstract_txt:automatic in 519) [ClassicSimilarity], result of:
            0.1261571 = score(doc=519,freq=1.0), product of:
              0.31079856 = queryWeight, product of:
                2.924531 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.020454086 = queryNorm
              0.40591276 = fieldWeight in 519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
          0.09306099 = weight(abstract_txt:software in 519) [ClassicSimilarity], result of:
            0.09306099 = score(doc=519,freq=1.0), product of:
              0.27332938 = queryWeight, product of:
                3.0663018 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.020454086 = queryNorm
              0.34047198 = fieldWeight in 519, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
          0.12139856 = weight(abstract_txt:classification in 519) [ClassicSimilarity], result of:
            0.12139856 = score(doc=519,freq=2.0), product of:
              0.2752331 = queryWeight, product of:
                3.3706424 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.020454086 = queryNorm
              0.44107544 = fieldWeight in 519, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=519)
        0.24 = coord(6/25)
    
  3. Golub, K.: Automatic subject indexing of text (2019) 0.11
    0.10548022 = sum of:
      0.10548022 = product of:
        0.4395009 = sum of:
          0.018181428 = weight(abstract_txt:into in 268) [ClassicSimilarity], result of:
            0.018181428 = score(doc=268,freq=1.0), product of:
              0.07868402 = queryWeight, product of:
                1.040507 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.020454086 = queryNorm
              0.23106888 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.056977745 = weight(abstract_txt:document in 268) [ClassicSimilarity], result of:
            0.056977745 = score(doc=268,freq=4.0), product of:
              0.106149524 = queryWeight, product of:
                1.2085392 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.020454086 = queryNorm
              0.53676873 = fieldWeight in 268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.010025076 = weight(abstract_txt:this in 268) [ClassicSimilarity], result of:
            0.010025076 = score(doc=268,freq=1.0), product of:
              0.066660665 = queryWeight, product of:
                1.3544143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020454086 = queryNorm
              0.15038967 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.06056243 = weight(abstract_txt:automatically in 268) [ClassicSimilarity], result of:
            0.06056243 = score(doc=268,freq=1.0), product of:
              0.17549714 = queryWeight, product of:
                1.553949 = boost
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.020454086 = queryNorm
              0.3450907 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.1748084 = weight(abstract_txt:automatic in 268) [ClassicSimilarity], result of:
            0.1748084 = score(doc=268,freq=3.0), product of:
              0.31079856 = queryWeight, product of:
                2.924531 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.020454086 = queryNorm
              0.5624492 = fieldWeight in 268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.118945815 = weight(abstract_txt:classification in 268) [ClassicSimilarity], result of:
            0.118945815 = score(doc=268,freq=3.0), product of:
              0.2752331 = queryWeight, product of:
                3.3706424 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.020454086 = queryNorm
              0.43216392 = fieldWeight in 268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
        0.24 = coord(6/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.10
    0.10364193 = sum of:
      0.10364193 = product of:
        0.51820964 = sum of:
          0.015037613 = weight(abstract_txt:this in 3) [ClassicSimilarity], result of:
            0.015037613 = score(doc=3,freq=1.0), product of:
              0.066660665 = queryWeight, product of:
                1.3544143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020454086 = queryNorm
              0.2255845 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.09084365 = weight(abstract_txt:automatically in 3) [ClassicSimilarity], result of:
            0.09084365 = score(doc=3,freq=1.0), product of:
              0.17549714 = queryWeight, product of:
                1.553949 = boost
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.020454086 = queryNorm
              0.51763606 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.15138853 = weight(abstract_txt:automatic in 3) [ClassicSimilarity], result of:
            0.15138853 = score(doc=3,freq=1.0), product of:
              0.31079856 = queryWeight, product of:
                2.924531 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.020454086 = queryNorm
              0.48709533 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.15792973 = weight(abstract_txt:software in 3) [ClassicSimilarity], result of:
            0.15792973 = score(doc=3,freq=2.0), product of:
              0.27332938 = queryWeight, product of:
                3.0663018 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.020454086 = queryNorm
              0.5778001 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.1030101 = weight(abstract_txt:classification in 3) [ClassicSimilarity], result of:
            0.1030101 = score(doc=3,freq=1.0), product of:
              0.2752331 = queryWeight, product of:
                3.3706424 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.020454086 = queryNorm
              0.37426496 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
        0.2 = coord(5/25)
    
  5. Zhu, B.; Chen, H.: Information visualization (2004) 0.10
    0.10266879 = sum of:
      0.10266879 = product of:
        0.32083997 = sum of:
          0.04431641 = weight(abstract_txt:transform in 5276) [ClassicSimilarity], result of:
            0.04431641 = score(doc=5276,freq=1.0), product of:
              0.15473227 = queryWeight, product of:
                1.0317564 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.020454086 = queryNorm
              0.28640702 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.04502212 = weight(abstract_txt:promise in 5276) [ClassicSimilarity], result of:
            0.04502212 = score(doc=5276,freq=1.0), product of:
              0.15637061 = queryWeight, product of:
                1.0372043 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.020454086 = queryNorm
              0.2879193 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.016070263 = weight(abstract_txt:into in 5276) [ClassicSimilarity], result of:
            0.016070263 = score(doc=5276,freq=2.0), product of:
              0.07868402 = queryWeight, product of:
                1.040507 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.020454086 = queryNorm
              0.20423797 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.010852463 = weight(abstract_txt:this in 5276) [ClassicSimilarity], result of:
            0.010852463 = score(doc=5276,freq=3.0), product of:
              0.066660665 = queryWeight, product of:
                1.3544143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020454086 = queryNorm
              0.1628016 = fieldWeight in 5276, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.033980943 = weight(abstract_txt:increasing in 5276) [ClassicSimilarity], result of:
            0.033980943 = score(doc=5276,freq=1.0), product of:
              0.16331954 = queryWeight, product of:
                1.4990662 = boost
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.020454086 = queryNorm
              0.20806417 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3264427 = idf(docFreq=586, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.052566487 = weight(abstract_txt:market in 5276) [ClassicSimilarity], result of:
            0.052566487 = score(doc=5276,freq=1.0), product of:
              0.2184509 = queryWeight, product of:
                1.7337188 = boost
                6.160204 = idf(docFreq=254, maxDocs=44421)
                0.020454086 = queryNorm
              0.24063297 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.160204 = idf(docFreq=254, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.03743811 = weight(abstract_txt:technology in 5276) [ClassicSimilarity], result of:
            0.03743811 = score(doc=5276,freq=2.0), product of:
              0.15828663 = queryWeight, product of:
                1.8074633 = boost
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.020454086 = queryNorm
              0.23652099 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
          0.08059318 = weight(abstract_txt:software in 5276) [ClassicSimilarity], result of:
            0.08059318 = score(doc=5276,freq=3.0), product of:
              0.27332938 = queryWeight, product of:
                3.0663018 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.020454086 = queryNorm
              0.29485738 = fieldWeight in 5276, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0390625 = fieldNorm(doc=5276)
        0.32 = coord(8/25)
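
The content-match scores combine more factors than the author matches: each term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matching terms divided by query terms). The sketch below recomputes the first content hit (doc 2654) from the values listed in its explanation tree. The idf formula is the same inferred assumption, 1 + ln(maxDocs / (docFreq + 1)); boost, queryNorm, and fieldNorm are copied from the listing, and all function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.020454086   # copied from the listing
QUERY_TERMS = 25           # denominator of coord(n/25) in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency; formula inferred from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for the five terms matched in doc 2654.
matched_terms = [
    ("this",           1.3544143, 10885, 2.0),
    ("example",        1.4400299,   723, 1.0),
    ("technology",     1.8074633,  1668, 2.0),
    ("automatic",      2.924531,    668, 1.0),
    ("classification", 3.3706424,  2228, 1.0),
]
FIELD_NORM = 0.125  # fieldNorm(doc=2654), copied from the listing

raw_sum = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in matched_terms)
coord = len(matched_terms) / QUERY_TERMS
total = raw_sum * coord

print(f"sum of term scores = {raw_sum:.6f}")
print(f"coord              = {coord}")
print(f"final score        = {total:.6f}")
```

The coord factor explains why the fifth hit can rank below the first even though it matches more terms (8 of 25 versus 5 of 25): its fieldNorm is much smaller, so each matched term contributes less.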