Document (#37563)

Author
Hotho, A.
Bloehdorn, S.
Title
Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts
Source
Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
Imprint
Washington, DC : IEEE Computer Society
Year
2004
Pages
pp. 331-334
Abstract
Document representations for text classification are typically based on the classical bag-of-words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
Content
Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Theme
Automatic classification
Computational linguistics

Similar documents (content)

  1. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.13
    0.13227591 = sum of:
      0.13227591 = product of:
        0.5511496 = sum of:
          0.0456931 = weight(abstract_txt:mining in 51) [ClassicSimilarity], result of:
            0.0456931 = score(doc=51,freq=1.0), product of:
              0.13537358 = queryWeight, product of:
                1.0125291 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.021661961 = queryNorm
              0.33753335 = fieldWeight in 51, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
          0.02171181 = weight(abstract_txt:based in 51) [ClassicSimilarity], result of:
            0.02171181 = score(doc=51,freq=3.0), product of:
              0.07201126 = queryWeight, product of:
                1.0443733 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.021661961 = queryNorm
              0.30150574 = fieldWeight in 51, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
          0.05330733 = weight(abstract_txt:document in 51) [ClassicSimilarity], result of:
            0.05330733 = score(doc=51,freq=3.0), product of:
              0.13105725 = queryWeight, product of:
                1.4089191 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.021661961 = queryNorm
              0.4067484 = fieldWeight in 51, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
          0.15798809 = weight(abstract_txt:words in 51) [ClassicSimilarity], result of:
            0.15798809 = score(doc=51,freq=7.0), product of:
              0.20387332 = queryWeight, product of:
                1.7572589 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.021661961 = queryNorm
              0.7749326 = fieldWeight in 51, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
          0.13374615 = weight(abstract_txt:classification in 51) [ClassicSimilarity], result of:
            0.13374615 = score(doc=51,freq=13.0), product of:
              0.16990794 = queryWeight, product of:
                1.9647533 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.021661961 = queryNorm
              0.7871683 = fieldWeight in 51, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
          0.13870311 = weight(abstract_txt:text in 51) [ClassicSimilarity], result of:
            0.13870311 = score(doc=51,freq=13.0), product of:
              0.17408055 = queryWeight, product of:
                1.9887322 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021661961 = queryNorm
              0.7967754 = fieldWeight in 51, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=51)
        0.24 = coord(6/25)
    
  2. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.13
    0.12683561 = sum of:
      0.12683561 = product of:
        0.63417804 = sum of:
          0.07758536 = weight(abstract_txt:extracted in 1601) [ClassicSimilarity], result of:
            0.07758536 = score(doc=1601,freq=1.0), product of:
              0.1345131 = queryWeight, product of:
                1.009306 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.021661961 = queryNorm
              0.5767867 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.09375 = fieldNorm(doc=1601)
          0.02148912 = weight(abstract_txt:based in 1601) [ClassicSimilarity], result of:
            0.02148912 = score(doc=1601,freq=1.0), product of:
              0.07201126 = queryWeight, product of:
                1.0443733 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.021661961 = queryNorm
              0.2984133 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=1601)
          0.034889214 = weight(abstract_txt:approach in 1601) [ClassicSimilarity], result of:
            0.034889214 = score(doc=1601,freq=1.0), product of:
              0.099475354 = queryWeight, product of:
                1.2274768 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.021661961 = queryNorm
              0.35073224 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.09375 = fieldNorm(doc=1601)
          0.052760575 = weight(abstract_txt:document in 1601) [ClassicSimilarity], result of:
            0.052760575 = score(doc=1601,freq=1.0), product of:
              0.13105725 = queryWeight, product of:
                1.4089191 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.021661961 = queryNorm
              0.40257657 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=1601)
          0.4474538 = weight(abstract_txt:boosting in 1601) [ClassicSimilarity], result of:
            0.4474538 = score(doc=1601,freq=1.0), product of:
              0.5450297 = queryWeight, product of:
                2.8731985 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.021661961 = queryNorm
              0.8209714 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.09375 = fieldNorm(doc=1601)
        0.2 = coord(5/25)
    
  3. Perovsek, M.; Kranjc, J.; Erjavec, T.; Cestnik, B.; Lavrac, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.11
    0.11434207 = sum of:
      0.11434207 = product of:
        0.47642532 = sum of:
          0.13055171 = weight(abstract_txt:mining in 3697) [ClassicSimilarity], result of:
            0.13055171 = score(doc=3697,freq=4.0), product of:
              0.13537358 = queryWeight, product of:
                1.0125291 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.021661961 = queryNorm
              0.96438104 = fieldWeight in 3697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.017907599 = weight(abstract_txt:based in 3697) [ClassicSimilarity], result of:
            0.017907599 = score(doc=3697,freq=1.0), product of:
              0.07201126 = queryWeight, product of:
                1.0443733 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.021661961 = queryNorm
              0.24867775 = fieldWeight in 3697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.095640756 = weight(abstract_txt:corpora in 3697) [ClassicSimilarity], result of:
            0.095640756 = score(doc=3697,freq=1.0), product of:
              0.17463349 = queryWeight, product of:
                1.1500171 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.021661961 = queryNorm
              0.5476656 = fieldWeight in 3697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.03553174 = weight(abstract_txt:through in 3697) [ClassicSimilarity], result of:
            0.03553174 = score(doc=3697,freq=1.0), product of:
              0.11370682 = queryWeight, product of:
                1.3123474 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.021661961 = queryNorm
              0.31248558 = fieldWeight in 3697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.062178936 = weight(abstract_txt:document in 3697) [ClassicSimilarity], result of:
            0.062178936 = score(doc=3697,freq=2.0), product of:
              0.13105725 = queryWeight, product of:
                1.4089191 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.021661961 = queryNorm
              0.47444102 = fieldWeight in 3697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.13461459 = weight(abstract_txt:text in 3697) [ClassicSimilarity], result of:
            0.13461459 = score(doc=3697,freq=6.0), product of:
              0.17408055 = queryWeight, product of:
                1.9887322 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021661961 = queryNorm
              0.7732891 = fieldWeight in 3697, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
        0.24 = coord(6/25)
    
  4. Pearce, C.; Nicholas, C.: TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data (1996) 0.11
    0.11262913 = sum of:
      0.11262913 = product of:
        0.46928805 = sum of:
          0.017907599 = weight(abstract_txt:based in 4139) [ClassicSimilarity], result of:
            0.017907599 = score(doc=4139,freq=1.0), product of:
              0.07201126 = queryWeight, product of:
                1.0443733 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.021661961 = queryNorm
              0.24867775 = fieldWeight in 4139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
          0.07693896 = weight(abstract_txt:typically in 4139) [ClassicSimilarity], result of:
            0.07693896 = score(doc=4139,freq=1.0), product of:
              0.15105313 = queryWeight, product of:
                1.0695606 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.021661961 = queryNorm
              0.5093503 = fieldWeight in 4139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
          0.13525645 = weight(abstract_txt:corpora in 4139) [ClassicSimilarity], result of:
            0.13525645 = score(doc=4139,freq=2.0), product of:
              0.17463349 = queryWeight, product of:
                1.1500171 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.021661961 = queryNorm
              0.77451617 = fieldWeight in 4139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
          0.043967146 = weight(abstract_txt:document in 4139) [ClassicSimilarity], result of:
            0.043967146 = score(doc=4139,freq=1.0), product of:
              0.13105725 = queryWeight, product of:
                1.4089191 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.021661961 = queryNorm
              0.33548045 = fieldWeight in 4139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
          0.08530556 = weight(abstract_txt:words in 4139) [ClassicSimilarity], result of:
            0.08530556 = score(doc=4139,freq=1.0), product of:
              0.20387332 = queryWeight, product of:
                1.7572589 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.021661961 = queryNorm
              0.4184243 = fieldWeight in 4139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
          0.10991234 = weight(abstract_txt:text in 4139) [ClassicSimilarity], result of:
            0.10991234 = score(doc=4139,freq=4.0), product of:
              0.17408055 = queryWeight, product of:
                1.9887322 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021661961 = queryNorm
              0.6313878 = fieldWeight in 4139, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=4139)
        0.24 = coord(6/25)
    
  5. Lund, N.W.: Document, text and medium : concepts, theories and disciplines (2010) 0.10
    0.10307036 = sum of:
      0.10307036 = product of:
        0.42945987 = sum of:
          0.028782133 = weight(abstract_txt:approach in 149) [ClassicSimilarity], result of:
            0.028782133 = score(doc=149,freq=2.0), product of:
              0.099475354 = queryWeight, product of:
                1.2274768 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.021661961 = queryNorm
              0.28933933 = fieldWeight in 149, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
          0.024872217 = weight(abstract_txt:through in 149) [ClassicSimilarity], result of:
            0.024872217 = score(doc=149,freq=1.0), product of:
              0.11370682 = queryWeight, product of:
                1.3123474 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.021661961 = queryNorm
              0.2187399 = fieldWeight in 149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
          0.075387955 = weight(abstract_txt:document in 149) [ClassicSimilarity], result of:
            0.075387955 = score(doc=149,freq=6.0), product of:
              0.13105725 = queryWeight, product of:
                1.4089191 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.021661961 = queryNorm
              0.57522917 = fieldWeight in 149, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
          0.08444819 = weight(abstract_txt:words in 149) [ClassicSimilarity], result of:
            0.08444819 = score(doc=149,freq=2.0), product of:
              0.20387332 = queryWeight, product of:
                1.7572589 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.021661961 = queryNorm
              0.41421893 = fieldWeight in 149, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
          0.10880767 = weight(abstract_txt:text in 149) [ClassicSimilarity], result of:
            0.10880767 = score(doc=149,freq=8.0), product of:
              0.17408055 = queryWeight, product of:
                1.9887322 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021661961 = queryNorm
              0.6250421 = fieldWeight in 149, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
          0.1071617 = weight(abstract_txt:classical in 149) [ClassicSimilarity], result of:
            0.1071617 = score(doc=149,freq=1.0), product of:
              0.30107167 = queryWeight, product of:
                2.1354554 = boost
                6.5085106 = idf(docFreq=179, maxDocs=44421)
                0.021661961 = queryNorm
              0.35593417 = fieldWeight in 149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5085106 = idf(docFreq=179, maxDocs=44421)
                0.0546875 = fieldNorm(doc=149)
        0.24 = coord(6/25)
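
The score trees above follow the pattern reported by Lucene's ClassicSimilarity: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matching terms / query terms). The sketch below recomputes the score of entry 1 (doc 51) from the values printed in its tree. It assumes the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the printed numbers; the function and variable names are illustrative and not part of the catalog or of any library.

```python
import math

# Values copied from the explanation tree of entry 1 (doc 51).
MAX_DOCS = 44421
QUERY_NORM = 0.021661961
FIELD_NORM = 0.0546875      # fieldNorm(doc=51)
COORD = 6 / 25              # 6 of 25 query terms matched

# (term, termFreq in doc 51, docFreq in the index, query boost)
TERMS = [
    ("mining",         1.0,  251,  1.0125291),
    ("based",          3.0,  5005, 1.0443733),
    ("document",       3.0,  1647, 1.4089191),
    ("words",          7.0,  569,  1.7572589),
    ("classification", 13.0, 2228, 1.9647533),
    ("text",           13.0, 2122, 1.9887322),
]


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed ClassicSimilarity form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor (assumed ClassicSimilarity form)."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Contribution of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# Sum of per-term contributions, then scaled by coord.
raw = sum(term_score(freq, doc_freq, boost) for _, freq, doc_freq, boost in TERMS)
score = raw * COORD

# The listing reports 0.5511496 for the sum and 0.13227591 for the score;
# small differences are expected because Lucene computes in 32-bit floats.
print(f"sum of term scores: {raw:.6f}")
print(f"final score:        {score:.6f}")
```

Replacing TERMS, FIELD_NORM, and COORD with the values from entries 2 to 5 should reproduce their scores in the same way. The trees also show why entry 2 ranks second despite matching only five terms: the rare term "boosting" (docFreq=18) carries a high idf and boost.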