Document (#15563)

Author
May, A.D.
Title
Automatic classification of e-mail messages by message type
Source
Journal of the American Society for Information Science. 48(1997) no.1, pp. 32-39
Year
1997
Abstract
This article describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of 4 classes: questions, responses, announcements, or administrative. A total of 1,372 messages were analyzed. The automatic classification of a message was based on string matching between the message text and predefined string sets for each of the message types. The system's ability to classify a message accurately was compared against manually assigned codes. A Cohen's Kappa of .55 suggested that there was statistical agreement between the automatic and manually assigned codes.
Theme
Automatic classification
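
The string-matching approach and the agreement statistic mentioned in the abstract can be illustrated with a minimal Python sketch. The string sets, the "most matches wins" rule, and the handling of unmatched messages are hypothetical placeholders, since the abstract does not state the article's actual sets or decision rules. The kappa function follows the standard definition of Cohen's Kappa (observed agreement corrected for chance agreement).

```python
from collections import Counter

# Hypothetical string sets for illustration only; the abstract does not
# list the predefined strings the article actually used.
STRING_SETS = {
    "question": ["does anyone know", "can anyone", "?"],
    "response": ["in reply to", "re:", "in response to"],
    "announcement": ["call for papers", "conference", "announcing"],
    "administrative": ["unsubscribe", "listserv", "digest"],
}


def classify(text, string_sets=STRING_SETS):
    """Return the class whose string set has the most matches in `text`.

    Assumption: ties go to the class listed first, and a message with
    no matches at all is left unclassified (None).
    """
    lowered = text.lower()
    counts = {
        label: sum(s in lowered for s in strings)
        for label, strings in string_sets.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def cohens_kappa(codes_a, codes_b):
    """Cohen's Kappa for two equally long sequences of nominal codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(codes_a)
    # Proportion of items on which both coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    messages = [
        "Does anyone know a good concordance tool?",
        "Please unsubscribe me from the listserv digest",
    ]
    automatic = [classify(m) for m in messages]
    manual = ["question", "administrative"]
    print(automatic, cohens_kappa(automatic, manual))
```

In a setting like the one described, `automatic` would hold the system's codes for all analyzed messages and `manual` the human-assigned codes, and the resulting kappa would be compared with the reported value of .55.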

Similar documents (content)

  1. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 0.19
    0.18937336 = sum of:
      0.18937336 = product of:
        0.7890557 = sum of:
          0.018675711 = weight(abstract_txt:between in 1234) [ClassicSimilarity], result of:
            0.018675711 = score(doc=1234,freq=2.0), product of:
              0.06100725 = queryWeight, product of:
                1.2401518 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0142038455 = queryNorm
              0.3061228 = fieldWeight in 1234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
          0.08669225 = weight(abstract_txt:predefined in 1234) [ClassicSimilarity], result of:
            0.08669225 = score(doc=1234,freq=1.0), product of:
              0.1697658 = queryWeight, product of:
                1.4628313 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.0142038455 = queryNorm
              0.5106579 = fieldWeight in 1234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
          0.09725543 = weight(abstract_txt:mail in 1234) [ClassicSimilarity], result of:
            0.09725543 = score(doc=1234,freq=2.0), product of:
              0.18329021 = queryWeight, product of:
                2.1495807 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0142038455 = queryNorm
              0.53060895 = fieldWeight in 1234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
          0.07358305 = weight(abstract_txt:assigned in 1234) [ClassicSimilarity], result of:
            0.07358305 = score(doc=1234,freq=1.0), product of:
              0.19174552 = queryWeight, product of:
                2.1986027 = boost
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0142038455 = queryNorm
              0.3837537 = fieldWeight in 1234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
          0.15727931 = weight(abstract_txt:messages in 1234) [ClassicSimilarity], result of:
            0.15727931 = score(doc=1234,freq=1.0), product of:
              0.3642097 = queryWeight, product of:
                3.7111244 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0142038455 = queryNorm
              0.43183723 = fieldWeight in 1234, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
          0.35556993 = weight(abstract_txt:message in 1234) [ClassicSimilarity], result of:
            0.35556993 = score(doc=1234,freq=2.0), product of:
              0.54805404 = queryWeight, product of:
                5.256671 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0142038455 = queryNorm
              0.64878625 = fieldWeight in 1234, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=1234)
        0.24 = coord(6/25)
    
  2. Goren-Bar, D.; Kuflik, T.: Supporting user-subjective categorization with self-organizing maps and learning vector quantization (2005) 0.14
    0.13632669 = sum of:
      0.13632669 = product of:
        0.56802785 = sum of:
          0.05598123 = weight(abstract_txt:system's in 3325) [ClassicSimilarity], result of:
            0.05598123 = score(doc=3325,freq=1.0), product of:
              0.12683088 = queryWeight, product of:
                1.2643917 = boost
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.0142038455 = queryNorm
              0.44138488 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.020223498 = weight(abstract_txt:classification in 3325) [ClassicSimilarity], result of:
            0.020223498 = score(doc=3325,freq=1.0), product of:
              0.081054576 = queryWeight, product of:
                1.4294629 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0142038455 = queryNorm
              0.2495047 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.06876998 = weight(abstract_txt:mail in 3325) [ClassicSimilarity], result of:
            0.06876998 = score(doc=3325,freq=1.0), product of:
              0.18329021 = queryWeight, product of:
                2.1495807 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0142038455 = queryNorm
              0.3751972 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.13202566 = weight(abstract_txt:manually in 3325) [ClassicSimilarity], result of:
            0.13202566 = score(doc=3325,freq=2.0), product of:
              0.22471683 = queryWeight, product of:
                2.3801367 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.0142038455 = queryNorm
              0.5875201 = fieldWeight in 3325, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.13374819 = weight(abstract_txt:automatic in 3325) [ClassicSimilarity], result of:
            0.13374819 = score(doc=3325,freq=4.0), product of:
              0.2059408 = queryWeight, product of:
                2.7906215 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0142038455 = queryNorm
              0.6494497 = fieldWeight in 3325, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.15727931 = weight(abstract_txt:messages in 3325) [ClassicSimilarity], result of:
            0.15727931 = score(doc=3325,freq=1.0), product of:
              0.3642097 = queryWeight, product of:
                3.7111244 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0142038455 = queryNorm
              0.43183723 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
        0.24 = coord(6/25)
    
  3. Golub, K.: Automatic subject indexing of text (2019) 0.12
    0.11805818 = sum of:
      0.11805818 = product of:
        0.42163637 = sum of:
          0.031409886 = weight(abstract_txt:classes in 5268) [ClassicSimilarity], result of:
            0.031409886 = score(doc=5268,freq=1.0), product of:
              0.0862795 = queryWeight, product of:
                1.0428526 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0142038455 = queryNorm
              0.3640481 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.03515983 = weight(abstract_txt:matching in 5268) [ClassicSimilarity], result of:
            0.03515983 = score(doc=5268,freq=1.0), product of:
              0.09301676 = queryWeight, product of:
                1.0828037 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0142038455 = queryNorm
              0.37799457 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.013205721 = weight(abstract_txt:between in 5268) [ClassicSimilarity], result of:
            0.013205721 = score(doc=5268,freq=1.0), product of:
              0.06100725 = queryWeight, product of:
                1.2401518 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0142038455 = queryNorm
              0.21646151 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.035028126 = weight(abstract_txt:classification in 5268) [ClassicSimilarity], result of:
            0.035028126 = score(doc=5268,freq=3.0), product of:
              0.081054576 = queryWeight, product of:
                1.4294629 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0142038455 = queryNorm
              0.4321548 = fieldWeight in 5268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.07358305 = weight(abstract_txt:assigned in 5268) [ClassicSimilarity], result of:
            0.07358305 = score(doc=5268,freq=1.0), product of:
              0.19174552 = queryWeight, product of:
                2.1986027 = boost
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0142038455 = queryNorm
              0.3837537 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.11742044 = weight(abstract_txt:string in 5268) [ClassicSimilarity], result of:
            0.11742044 = score(doc=5268,freq=1.0), product of:
              0.2618399 = queryWeight, product of:
                2.5692244 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0142038455 = queryNorm
              0.44844365 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.115829326 = weight(abstract_txt:automatic in 5268) [ClassicSimilarity], result of:
            0.115829326 = score(doc=5268,freq=3.0), product of:
              0.2059408 = queryWeight, product of:
                2.7906215 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0142038455 = queryNorm
              0.5624399 = fieldWeight in 5268, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
        0.28 = coord(7/25)
    
  4. Cortese, J.; Lustria, M.L.A.: Can tailoring increase elaboration of health messages delivered via an adaptive educational site on adolescent sexual health and decision making? (2012) 0.12
    0.11539856 = sum of:
      0.11539856 = product of:
        0.72124106 = sum of:
          0.036946654 = weight(abstract_txt:total in 371) [ClassicSimilarity], result of:
            0.036946654 = score(doc=371,freq=1.0), product of:
              0.08285272 = queryWeight, product of:
                1.0219332 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0142038455 = queryNorm
              0.4459317 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=371)
          0.09197881 = weight(abstract_txt:assigned in 371) [ClassicSimilarity], result of:
            0.09197881 = score(doc=371,freq=1.0), product of:
              0.19174552 = queryWeight, product of:
                2.1986027 = boost
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0142038455 = queryNorm
              0.4796921 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.078125 = fieldNorm(doc=371)
          0.2780332 = weight(abstract_txt:messages in 371) [ClassicSimilarity], result of:
            0.2780332 = score(doc=371,freq=2.0), product of:
              0.3642097 = queryWeight, product of:
                3.7111244 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0142038455 = queryNorm
              0.7633876 = fieldWeight in 371, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.078125 = fieldNorm(doc=371)
          0.3142824 = weight(abstract_txt:message in 371) [ClassicSimilarity], result of:
            0.3142824 = score(doc=371,freq=1.0), product of:
              0.54805404 = queryWeight, product of:
                5.256671 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0142038455 = queryNorm
              0.57345146 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=371)
        0.16 = coord(4/25)
    
  5. Sebastiani, F.: Classification of text, automatic (2006) 0.11
    0.112864934 = sum of:
      0.112864934 = product of:
        0.47027057 = sum of:
          0.05931602 = weight(abstract_txt:automated in 5003) [ClassicSimilarity], result of:
            0.05931602 = score(doc=5003,freq=2.0), product of:
              0.07984368 = queryWeight, product of:
                1.0032043 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0142038455 = queryNorm
              0.7429019 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.047114827 = weight(abstract_txt:classes in 5003) [ClassicSimilarity], result of:
            0.047114827 = score(doc=5003,freq=1.0), product of:
              0.0862795 = queryWeight, product of:
                1.0428526 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0142038455 = queryNorm
              0.5460721 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.030335248 = weight(abstract_txt:classification in 5003) [ClassicSimilarity], result of:
            0.030335248 = score(doc=5003,freq=1.0), product of:
              0.081054576 = queryWeight, product of:
                1.4294629 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0142038455 = queryNorm
              0.37425706 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.13003837 = weight(abstract_txt:predefined in 5003) [ClassicSimilarity], result of:
            0.13003837 = score(doc=5003,freq=1.0), product of:
              0.1697658 = queryWeight, product of:
                1.4628313 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.0142038455 = queryNorm
              0.76598686 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.103154965 = weight(abstract_txt:mail in 5003) [ClassicSimilarity], result of:
            0.103154965 = score(doc=5003,freq=1.0), product of:
              0.18329021 = queryWeight, product of:
                2.1495807 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0142038455 = queryNorm
              0.5627958 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.100311145 = weight(abstract_txt:automatic in 5003) [ClassicSimilarity], result of:
            0.100311145 = score(doc=5003,freq=1.0), product of:
              0.2059408 = queryWeight, product of:
                2.7906215 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0142038455 = queryNorm
              0.48708728 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
        0.24 = coord(6/25)
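
The score breakdowns in the list above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight × fieldWeight, and the sum is scaled by the coord factor. The sketch below recomputes the first entry (doc 1234) from the boost, docFreq, freq, fieldNorm, and queryNorm values shown in its breakdown. The function and variable names are illustrative, and small differences from the listed values are to be expected because Lucene works in 32-bit floats.

```python
import math

QUERY_NORM = 0.0142038455  # queryNorm as shown in the listing
MAX_DOCS = 44218           # maxDocs as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of term scores, scaled by coord(matched/total)."""
    raw = sum(term_score(freq, df, boost, field_norm) for _, freq, df, boost in terms)
    return raw * matched / total


# (term, freq, docFreq, boost) copied from the breakdown for doc 1234.
TERMS_1234 = [
    ("between", 2.0, 3764, 1.2401518),
    ("predefined", 1.0, 33, 1.4628313),
    ("mail", 2.0, 296, 2.1495807),
    ("assigned", 1.0, 258, 2.1986027),
    ("messages", 1.0, 119, 3.7111244),
    ("message", 2.0, 77, 5.256671),
]

# fieldNorm 0.0625 and coord(6/25) are also taken from the listing.
score_1234 = document_score(TERMS_1234, field_norm=0.0625, matched=6, total=25)

if __name__ == "__main__":
    print(f"doc 1234: {score_1234:.8f}")
```

The same functions apply to the other entries once their term tuples, fieldNorm, and coord values are substituted.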