Document (#34722)

Author
Pera, M.S.
Ng, Y.-K.
Title
SpamED : a spam E-mail detection approach based on phrase similarity
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.2, pp. 393-409
Year
2009
Abstract
E-mail messages are unquestionably one of the most popular communication media these days. They are not only fast and reliable but also generally free. Unfortunately, a significant number of the e-mail messages that users receive daily are spam. This is a nuisance, since reviewing and deleting spam messages wastes the user's time. In addition, spam messages consume resources such as storage, bandwidth, and computer-processing time. Many attempts have been made to eradicate spam; however, none has proven highly effective. In this article, we propose a spam e-mail detection approach, called SpamED, which uses the similarity of phrases in messages to detect spam. Experiments verify that SpamED, using trigrams in e-mail messages, not only minimizes false positives and false negatives in spam detection but also outperforms a number of existing e-mail filtering approaches, with a 96% accuracy rate.
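
The abstract does not say how phrase similarity is computed, so the following is only a rough illustration of the general idea of comparing messages by shared trigrams. It swaps in a plain Jaccard overlap of word trigrams, which is not the SpamED algorithm; the function names, the tokenizer, and the 0.5 threshold are all assumptions made for this sketch.

```python
import re

def word_trigrams(text):
    """Return the set of consecutive three-word phrases in a message."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def trigram_similarity(a, b):
    """Jaccard overlap of word trigrams (an illustrative stand-in measure)."""
    ta, tb = word_trigrams(a), word_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def looks_like_spam(message, known_spam, threshold=0.5):
    """Flag a message if it is close enough to any known spam message.

    The threshold is a hypothetical value, not one taken from the article.
    """
    return any(trigram_similarity(message, s) >= threshold for s in known_spam)

sim = trigram_similarity("win a free prize now", "win a free prize today")
flagged = looks_like_spam("win a free prize now", ["win a free prize today"])
```

In this toy case the two messages share two of four distinct trigrams, so the similarity is 0.5 and the message is flagged at the assumed threshold.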

Similar documents (author)

  1. Pera, M. Soledad => Soledad Pera, M.: 5.04
    5.0403857 = sum of:
      5.0403857 = weight(author_txt:pera in 3875) [ClassicSimilarity], result of:
        5.0403857 = fieldWeight in 3875, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=3875)
    
  2. Azpiazu, I.M.; Soledad Pera, M.: Is cross-lingual readability assessment possible? (2020) 4.16
    4.1581063 = sum of:
      4.1581063 = weight(author_txt:pera in 868) [ClassicSimilarity], result of:
        4.1581063 = fieldWeight in 868, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.4375 = fieldNorm(doc=868)
    
  3. Pera, M.S.; Lund, W.; Ng, Y.-K.: ¬A sophisticated library search strategy using folksonomies and similarity matching (2009) 3.56
    3.5640912 = sum of:
      3.5640912 = weight(author_txt:pera in 3939) [ClassicSimilarity], result of:
        3.5640912 = fieldWeight in 3939, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=3939)
    
  4. Denning, J.; Pera, M.S.; Ng, Y.-K.: ¬A readability level prediction tool for K-12 books (2016) 3.56
    3.5640912 = sum of:
      3.5640912 = weight(author_txt:pera in 3772) [ClassicSimilarity], result of:
        3.5640912 = fieldWeight in 3772, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=3772)
    
  5. Soledad Pera, M.; Ng, Y.-K.: Recommending books to be exchanged online in the absence of wish lists (2018) 3.56
    3.5640912 = sum of:
      3.5640912 = weight(author_txt:pera in 182) [ClassicSimilarity], result of:
        3.5640912 = fieldWeight in 182, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=182)
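
The author scores above can be recomputed by hand. The sketch below assumes the formulas that reproduce the numbers shown: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the field weight is tf × idf × fieldNorm. The function names are illustrative and are not Lucene API calls; Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math

def classic_tf(freq):
    # tf(freq=2.0) = 1.4142135 in the listing, i.e. sqrt(freq)
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # idf(docFreq=8, maxDocs=44421) = 9.504243 in the listing
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 (doc 3875): "pera" occurs twice, fieldNorm 0.375
w1 = field_weight(freq=2.0, doc_freq=8, max_docs=44421, field_norm=0.375)
# Entry 2 (doc 868): "pera" occurs once, fieldNorm 0.4375
w2 = field_weight(freq=1.0, doc_freq=8, max_docs=44421, field_norm=0.4375)

print(round(w1, 2), round(w2, 2))  # prints 5.04 4.16
```

Entries 3 to 5 share freq=1.0 and fieldNorm=0.375, which is why all three score 3.56.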
    

Similar documents (content)

  1. Ruano-Ordás, D.; Fdez-Riverola, F.; Méndez, J.R.: Using evolutionary computation for discovering spam patterns from e-mail samples (2018) 0.17
    0.17274965 = sum of:
      0.17274965 = product of:
        1.4395804 = sum of:
          0.282087 = weight(abstract_txt:messages in 88) [ClassicSimilarity], result of:
            0.282087 = score(doc=88,freq=2.0), product of:
              0.37016016 = queryWeight, product of:
                5.8236217 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.009215272 = queryNorm
              0.7620674 = fieldWeight in 88, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.078125 = fieldNorm(doc=88)
          0.15377373 = weight(abstract_txt:mail in 88) [ClassicSimilarity], result of:
            0.15377373 = score(doc=88,freq=1.0), product of:
              0.32762823 = queryWeight, product of:
                5.9178286 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.009215272 = queryNorm
              0.46935433 = fieldWeight in 88, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.078125 = fieldNorm(doc=88)
          1.0037197 = weight(abstract_txt:spam in 88) [ClassicSimilarity], result of:
            1.0037197 = score(doc=88,freq=4.0), product of:
              0.75366586 = queryWeight, product of:
                9.595268 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009215272 = queryNorm
              1.3317834 = fieldWeight in 88, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=88)
        0.12 = coord(3/25)
    
  2. Sedhai, S.; Sun, A.: ¬An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.13
    0.12856144 = sum of:
      0.12856144 = product of:
        1.0713453 = sum of:
          0.012319181 = weight(abstract_txt:only in 4683) [ClassicSimilarity], result of:
            0.012319181 = score(doc=4683,freq=1.0), product of:
              0.046533436 = queryWeight, product of:
                1.1921208 = boost
                4.235812 = idf(docFreq=1746, maxDocs=44421)
                0.009215272 = queryNorm
              0.26473826 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.235812 = idf(docFreq=1746, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.07558571 = weight(abstract_txt:detection in 4683) [ClassicSimilarity], result of:
            0.07558571 = score(doc=4683,freq=1.0), product of:
              0.17852572 = queryWeight, product of:
                2.8597872 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.009215272 = queryNorm
              0.42338836 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.98344046 = weight(abstract_txt:spam in 4683) [ClassicSimilarity], result of:
            0.98344046 = score(doc=4683,freq=6.0), product of:
              0.75366586 = queryWeight, product of:
                9.595268 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009215272 = queryNorm
              1.304876 = fieldWeight in 4683, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
        0.12 = coord(3/25)
    
  3. Zilberman, P.; Katz, G.; Shabtai, A.; Elovici, Y.: Analyzing group E-mail exchange to detect data leakage (2013) 0.12
    0.12490886 = sum of:
      0.12490886 = product of:
        0.5204536 = sum of:
          0.016975204 = weight(abstract_txt:approach in 2050) [ClassicSimilarity], result of:
            0.016975204 = score(doc=2050,freq=4.0), product of:
              0.036299493 = queryWeight, product of:
                1.0529021 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.009215272 = queryNorm
              0.467643 = fieldWeight in 2050, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
          0.011574694 = weight(abstract_txt:time in 2050) [ClassicSimilarity], result of:
            0.011574694 = score(doc=2050,freq=1.0), product of:
              0.044639252 = queryWeight, product of:
                1.1676055 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.009215272 = queryNorm
              0.2592941 = fieldWeight in 2050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
          0.012319181 = weight(abstract_txt:only in 2050) [ClassicSimilarity], result of:
            0.012319181 = score(doc=2050,freq=1.0), product of:
              0.046533436 = queryWeight, product of:
                1.1921208 = boost
                4.235812 = idf(docFreq=1746, maxDocs=44421)
                0.009215272 = queryNorm
              0.26473826 = fieldWeight in 2050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.235812 = idf(docFreq=1746, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
          0.071356416 = weight(abstract_txt:false in 2050) [ClassicSimilarity], result of:
            0.071356416 = score(doc=2050,freq=1.0), product of:
              0.15008338 = queryWeight, product of:
                2.1409376 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.009215272 = queryNorm
              0.47544518 = fieldWeight in 2050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
          0.10689434 = weight(abstract_txt:detection in 2050) [ClassicSimilarity], result of:
            0.10689434 = score(doc=2050,freq=2.0), product of:
              0.17852572 = queryWeight, product of:
                2.8597872 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.009215272 = queryNorm
              0.59876156 = fieldWeight in 2050, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
          0.30133373 = weight(abstract_txt:mail in 2050) [ClassicSimilarity], result of:
            0.30133373 = score(doc=2050,freq=6.0), product of:
              0.32762823 = queryWeight, product of:
                5.9178286 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.009215272 = queryNorm
              0.9197429 = fieldWeight in 2050, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0625 = fieldNorm(doc=2050)
        0.24 = coord(6/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.10
    0.095938995 = sum of:
      0.095938995 = product of:
        0.79949164 = sum of:
          0.012731402 = weight(abstract_txt:approach in 3) [ClassicSimilarity], result of:
            0.012731402 = score(doc=3,freq=1.0), product of:
              0.036299493 = queryWeight, product of:
                1.0529021 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.009215272 = queryNorm
              0.35073224 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.18452846 = weight(abstract_txt:mail in 3) [ClassicSimilarity], result of:
            0.18452846 = score(doc=3,freq=1.0), product of:
              0.32762823 = queryWeight, product of:
                5.9178286 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.009215272 = queryNorm
              0.56322515 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.6022318 = weight(abstract_txt:spam in 3) [ClassicSimilarity], result of:
            0.6022318 = score(doc=3,freq=1.0), product of:
              0.75366586 = queryWeight, product of:
                9.595268 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009215272 = queryNorm
              0.79907 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
        0.12 = coord(3/25)
    
  5. Goodman, J.; Heckerman, D.; Rounthwaite, R.: Schutzwälle gegen Spam (2005) 0.09
    0.08810187 = sum of:
      0.08810187 = product of:
        1.1012734 = sum of:
          0.1076416 = weight(abstract_txt:mail in 4696) [ClassicSimilarity], result of:
            0.1076416 = score(doc=4696,freq=1.0), product of:
              0.32762823 = queryWeight, product of:
                5.9178286 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.009215272 = queryNorm
              0.328548 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4696)
          0.99363184 = weight(abstract_txt:spam in 4696) [ClassicSimilarity], result of:
            0.99363184 = score(doc=4696,freq=8.0), product of:
              0.75366586 = queryWeight, product of:
                9.595268 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.009215272 = queryNorm
              1.3183984 = fieldWeight in 4696, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4696)
        0.08 = coord(2/25)
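
The content scores above combine three pieces per matching term (queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and their product), then multiply the sum over terms by coord, the fraction of the 25 query terms that matched. The sketch below recomputes entries 1 and 5 from the values in the listing. The function names are illustrative rather than Lucene API calls, and small differences in the last digits are expected because Lucene uses 32-bit floats.

```python
import math

QUERY_NORM = 0.009215272   # queryNorm, as shown in the listing
MAX_DOCS = 44421           # maxDocs, as shown in the listing
QUERY_TERMS = 25           # denominator of coord(n/25)

def idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm):
    """terms: list of (freq, doc_freq, boost) for each matching query term."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    coord = len(terms) / QUERY_TERMS
    return total * coord

# Entry 1 (doc 88): messages, mail, spam
score_88 = document_score(
    [(2.0, 121, 5.8236217), (1.0, 296, 5.9178286), (4.0, 23, 9.595268)],
    field_norm=0.078125,
)
# Entry 5 (doc 4696): mail, spam
score_4696 = document_score(
    [(1.0, 296, 5.9178286), (8.0, 23, 9.595268)],
    field_norm=0.0546875,
)
print(round(score_88, 2), round(score_4696, 2))
```

The coord factor explains the ranking: doc 4696 has a larger raw sum than doc 88 in this sketch's terms only before coord is applied to doc 2050 and others, and matching 2 of 25 terms (coord 0.08) instead of 3 of 25 (coord 0.12) pulls its final score down.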