Document (#15562)

Author
May, A.D.
Title
Automatic classification of e-mail messages by message type
Source
Journal of the American Society for Information Science. 48(1997) no.1, pp.32-39
Year
1997
Abstract
This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of four classes: questions, responses, announcements, or administrative messages. A total of 1,372 messages were analyzed. The automatic classification of a message was based on string matching between the message text and predefined string sets for each of the message types. The system's ability to classify messages accurately was compared against manually assigned codes. A Cohen's kappa of .55 suggested statistical agreement between the automatic and the manually assigned codes.
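The abstract describes the method only at a high level. Each message is matched against a predefined set of strings per message type, and the resulting automatic labels are compared with manual codes using Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). The sketch below shows that pipeline under stated assumptions. The scoring rule (count cue-string hits per type and pick the type with the most), the `CUE_STRINGS` sets, the `classify` and `cohens_kappa` names, and the "unclassified" fallback are all hypothetical. This record does not give the paper's actual string sets or tie-breaking rules.

```python
from collections import Counter

# Hypothetical cue strings per message type (illustrative only; the
# paper's actual predefined string sets are not reproduced in this record).
CUE_STRINGS = {
    "question": ["?", "does anyone know", "can anyone"],
    "response": ["in reply to", "wrote:"],
    "announcement": ["call for papers", "conference", "workshop"],
    "administrative": ["unsubscribe", "subscribe", "list owner"],
}


def classify(text: str) -> str:
    """Label a message by counting case-insensitive cue-string hits per type.

    The type with the most hits wins (ties go to the type listed first);
    messages with no hits at all are labelled "unclassified".
    """
    lowered = text.lower()
    hits = {
        label: sum(lowered.count(cue) for cue in cues)
        for label, cues in CUE_STRINGS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unclassified"


def cohens_kappa(codes_a: list[str], codes_b: list[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two equal-length code lists."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical example messages and manual codes.
    messages = [
        "Does anyone know of a concordance program for Greek texts?",
        "In reply to the query about concordances: see the list archive.",
        "Call for papers: workshop on humanities computing.",
        "Please unsubscribe me from this list.",
    ]
    manual = ["question", "response", "announcement", "administrative"]
    automatic = [classify(m) for m in messages]
    print(automatic)
    print(f"kappa = {cohens_kappa(manual, automatic):.2f}")
```

Kappa is used rather than raw percent agreement because it discounts the agreement expected by chance from each coder's label distribution. This matters when one message type dominates the list traffic.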
Theme
Automatic classification

Similar documents (content)

  1. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 0.19
    0.18911423 = sum of:
      0.18911423 = product of:
        0.78797597 = sum of:
          0.018596929 = weight(abstract_txt:between in 2234) [ClassicSimilarity], result of:
            0.018596929 = score(doc=2234,freq=2.0), product of:
              0.06085511 = queryWeight, product of:
                1.2374836 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0142235635 = queryNorm
              0.30559355 = fieldWeight in 2234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
          0.08692188 = weight(abstract_txt:predefined in 2234) [ClassicSimilarity], result of:
            0.08692188 = score(doc=2234,freq=1.0), product of:
              0.1701201 = queryWeight, product of:
                1.4630318 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.0142235635 = queryNorm
              0.5109442 = fieldWeight in 2234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
          0.09757219 = weight(abstract_txt:mail in 2234) [ClassicSimilarity], result of:
            0.09757219 = score(doc=2234,freq=2.0), product of:
              0.18374701 = queryWeight, product of:
                2.15031 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0142235635 = queryNorm
              0.5310138 = fieldWeight in 2234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
          0.07354207 = weight(abstract_txt:assigned in 2234) [ClassicSimilarity], result of:
            0.07354207 = score(doc=2234,freq=1.0), product of:
              0.19173592 = queryWeight, product of:
                2.196558 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0142235635 = queryNorm
              0.3835592 = fieldWeight in 2234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
          0.15661561 = weight(abstract_txt:messages in 2234) [ClassicSimilarity], result of:
            0.15661561 = score(doc=2234,freq=1.0), product of:
              0.36330107 = queryWeight, product of:
                3.703138 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0142235635 = queryNorm
              0.4310904 = fieldWeight in 2234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
          0.35472727 = weight(abstract_txt:message in 2234) [ClassicSimilarity], result of:
            0.35472727 = score(doc=2234,freq=2.0), product of:
              0.54736364 = queryWeight, product of:
                5.2486053 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0142235635 = queryNorm
              0.6480651 = fieldWeight in 2234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=2234)
        0.24 = coord(6/25)
    
  2. Goren-Bar, D.; Kuflik, T.: Supporting user-subjective categorization with self-organizing maps and learning vector quantization (2005) 0.14
    0.13630247 = sum of:
      0.13630247 = product of:
        0.567927 = sum of:
          0.05614432 = weight(abstract_txt:system's in 4325) [ClassicSimilarity], result of:
            0.05614432 = score(doc=4325,freq=1.0), product of:
              0.12711792 = queryWeight, product of:
                1.2646761 = boost
                7.0667386 = idf(docFreq=102, maxDocs=44421)
                0.0142235635 = queryNorm
              0.44167116 = fieldWeight in 4325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0667386 = idf(docFreq=102, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
          0.02024428 = weight(abstract_txt:classification in 4325) [ClassicSimilarity], result of:
            0.02024428 = score(doc=4325,freq=1.0), product of:
              0.08113616 = queryWeight, product of:
                1.4288878 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0142235635 = queryNorm
              0.24950996 = fieldWeight in 4325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
          0.06899396 = weight(abstract_txt:mail in 4325) [ClassicSimilarity], result of:
            0.06899396 = score(doc=4325,freq=1.0), product of:
              0.18374701 = queryWeight, product of:
                2.15031 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0142235635 = queryNorm
              0.37548345 = fieldWeight in 4325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
          0.13204505 = weight(abstract_txt:manually in 4325) [ClassicSimilarity], result of:
            0.13204505 = score(doc=4325,freq=2.0), product of:
              0.22481105 = queryWeight, product of:
                2.3784814 = boost
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.0142235635 = queryNorm
              0.58736014 = fieldWeight in 4325, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
          0.1338838 = weight(abstract_txt:automatic in 4325) [ClassicSimilarity], result of:
            0.1338838 = score(doc=4325,freq=4.0), product of:
              0.2061462 = queryWeight, product of:
                2.7894864 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0142235635 = queryNorm
              0.64946043 = fieldWeight in 4325, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
          0.15661561 = weight(abstract_txt:messages in 4325) [ClassicSimilarity], result of:
            0.15661561 = score(doc=4325,freq=1.0), product of:
              0.36330107 = queryWeight, product of:
                3.703138 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0142235635 = queryNorm
              0.4310904 = fieldWeight in 4325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0625 = fieldNorm(doc=4325)
        0.24 = coord(6/25)
    
  3. Golub, K.: Automatic subject indexing of text (2019) 0.12
    0.11816589 = sum of:
      0.11816589 = product of:
        0.42202103 = sum of:
          0.031468797 = weight(abstract_txt:classes in 268) [ClassicSimilarity], result of:
            0.031468797 = score(doc=268,freq=1.0), product of:
              0.086415105 = queryWeight, product of:
                1.0427274 = boost
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.0142235635 = queryNorm
              0.36415854 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.035090353 = weight(abstract_txt:matching in 268) [ClassicSimilarity], result of:
            0.035090353 = score(doc=268,freq=1.0), product of:
              0.092924036 = queryWeight, product of:
                1.0812845 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0142235635 = queryNorm
              0.3776241 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.013150014 = weight(abstract_txt:between in 268) [ClassicSimilarity], result of:
            0.013150014 = score(doc=268,freq=1.0), product of:
              0.06085511 = queryWeight, product of:
                1.2374836 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0142235635 = queryNorm
              0.21608727 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.03506412 = weight(abstract_txt:classification in 268) [ClassicSimilarity], result of:
            0.03506412 = score(doc=268,freq=3.0), product of:
              0.08113616 = queryWeight, product of:
                1.4288878 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0142235635 = queryNorm
              0.43216392 = fieldWeight in 268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.07354207 = weight(abstract_txt:assigned in 268) [ClassicSimilarity], result of:
            0.07354207 = score(doc=268,freq=1.0), product of:
              0.19173592 = queryWeight, product of:
                2.196558 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0142235635 = queryNorm
              0.3835592 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.117758915 = weight(abstract_txt:string in 268) [ClassicSimilarity], result of:
            0.117758915 = score(doc=268,freq=1.0), product of:
              0.26242715 = queryWeight, product of:
                2.5697763 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0142235635 = queryNorm
              0.44872993 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.11594677 = weight(abstract_txt:automatic in 268) [ClassicSimilarity], result of:
            0.11594677 = score(doc=268,freq=3.0), product of:
              0.2061462 = queryWeight, product of:
                2.7894864 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0142235635 = queryNorm
              0.5624492 = fieldWeight in 268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
        0.28 = coord(7/25)
    
  4. Cortese, J.; Lustria, M.L.A.: Can tailoring increase elaboration of health messages delivered via an adaptive educational site on adolescent sexual health and decision making? (2012) 0.12
    0.11506471 = sum of:
      0.11506471 = product of:
        0.7191545 = sum of:
          0.03682946 = weight(abstract_txt:total in 1371) [ClassicSimilarity], result of:
            0.03682946 = score(doc=1371,freq=1.0), product of:
              0.082703985 = queryWeight, product of:
                1.0200915 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0142235635 = queryNorm
              0.4453166 = fieldWeight in 1371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.078125 = fieldNorm(doc=1371)
          0.091927595 = weight(abstract_txt:assigned in 1371) [ClassicSimilarity], result of:
            0.091927595 = score(doc=1371,freq=1.0), product of:
              0.19173592 = queryWeight, product of:
                2.196558 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0142235635 = queryNorm
              0.479449 = fieldWeight in 1371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.078125 = fieldNorm(doc=1371)
          0.27685988 = weight(abstract_txt:messages in 1371) [ClassicSimilarity], result of:
            0.27685988 = score(doc=1371,freq=2.0), product of:
              0.36330107 = queryWeight, product of:
                3.703138 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0142235635 = queryNorm
              0.7620674 = fieldWeight in 1371, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.078125 = fieldNorm(doc=1371)
          0.31353757 = weight(abstract_txt:message in 1371) [ClassicSimilarity], result of:
            0.31353757 = score(doc=1371,freq=1.0), product of:
              0.54736364 = queryWeight, product of:
                5.2486053 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0142235635 = queryNorm
              0.57281405 = fieldWeight in 1371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.078125 = fieldNorm(doc=1371)
        0.16 = coord(4/25)
    
  5. Sebastiani, F.: Classification of text, automatic (2006) 0.11
    0.11307853 = sum of:
      0.11307853 = product of:
        0.47116053 = sum of:
          0.05930432 = weight(abstract_txt:automated in 3) [ClassicSimilarity], result of:
            0.05930432 = score(doc=3,freq=2.0), product of:
              0.07985883 = queryWeight, product of:
                1.0023916 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.0142235635 = queryNorm
              0.7426144 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.047203198 = weight(abstract_txt:classes in 3) [ClassicSimilarity], result of:
            0.047203198 = score(doc=3,freq=1.0), product of:
              0.086415105 = queryWeight, product of:
                1.0427274 = boost
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.0142235635 = queryNorm
              0.5462378 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8265367 = idf(docFreq=355, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.03036642 = weight(abstract_txt:classification in 3) [ClassicSimilarity], result of:
            0.03036642 = score(doc=3,freq=1.0), product of:
              0.08113616 = queryWeight, product of:
                1.4288878 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0142235635 = queryNorm
              0.37426496 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.13038282 = weight(abstract_txt:predefined in 3) [ClassicSimilarity], result of:
            0.13038282 = score(doc=3,freq=1.0), product of:
              0.1701201 = queryWeight, product of:
                1.4630318 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.0142235635 = queryNorm
              0.7664163 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.103490934 = weight(abstract_txt:mail in 3) [ClassicSimilarity], result of:
            0.103490934 = score(doc=3,freq=1.0), product of:
              0.18374701 = queryWeight, product of:
                2.15031 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0142235635 = queryNorm
              0.56322515 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.100412846 = weight(abstract_txt:automatic in 3) [ClassicSimilarity], result of:
            0.100412846 = score(doc=3,freq=1.0), product of:
              0.2061462 = queryWeight, product of:
                2.7894864 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0142235635 = queryNorm
              0.48709533 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
        0.24 = coord(6/25)
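Each entry in the list above is a Lucene `ClassicSimilarity` explanation. For one matching query term, weight = queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The document score is the sum of the matching term weights multiplied by coord (matched query terms / total query terms). The sketch below recomputes these values from the numbers printed in the explanations. The function names are illustrative. Boost, queryNorm, and fieldNorm are taken as given from the output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), as printed in the explanations."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)


def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """weight = queryWeight * fieldWeight for one matching query term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_weights, matched, total):
    """Sum of matching term weights scaled by coord(matched/total)."""
    return sum(term_weights) * (matched / total)


if __name__ == "__main__":
    QUERY_NORM = 0.0142235635
    MAX_DOCS = 44421
    # Entry 1 (doc 2234): (freq, docFreq, boost) per matched term; fieldNorm 0.0625.
    terms_2234 = {
        "between": (2.0, 3804, 1.2374836),
        "predefined": (1.0, 33, 1.4630318),
        "mail": (2.0, 296, 2.15031),
        "assigned": (1.0, 260, 2.196558),
        "messages": (1.0, 121, 3.703138),
        "message": (2.0, 78, 5.2486053),
    }
    weights = [
        term_weight(f, df, MAX_DOCS, b, QUERY_NORM, 0.0625)
        for f, df, b in terms_2234.values()
    ]
    print(f"{document_score(weights, matched=6, total=25):.4f}")  # ≈ 0.1891
```

Small differences in the last digits are expected because Lucene computes these values in single-precision floats.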