Document (#38051)

Author
Zilberman, P.
Katz, G.
Shabtai, A.
Elovici, Y.
Title
Analyzing group E-mail exchange to detect data leakage
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.9, S.1768-1779
Year
2013
Abstract
Today's organizations spend a great deal of time and effort on e-mail leakage prevention. However, there are still no satisfactory solutions; addressing mistakes are not detected and in some cases correct recipients are wrongly marked as potential mistakes. In this article we present a new approach for preventing e-mail addressing mistakes in organizations. The approach is based on an analysis of e-mail exchanges among members of an organization and the identification of groups based on common topics. When a new e-mail is about to be sent, each recipient is analyzed. A recipient is approved if the e-mail's content belongs to at least one topic common to both the sender and the recipient. This can be applied even if the sender and recipient have never communicated directly before. The new approach was evaluated using the Enron e-mail data set and was compared with a well-known method for the detection of e-mail addressing mistakes. The results show that the proposed approach is capable of detecting 87% of nonlegitimate recipients while incorrectly classifying only 0.5% of the legitimate recipients. These results outperform previous work, which reports a detection rate of 82% without reference to the false positive rate.
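
The approval rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how topics are extracted or how an e-mail is assigned to them, so the sketch assumes topics are already available as sets of labels. The names `approve_recipients`, `user_topics` and `email_topics`, and the sample users, are hypothetical.

```python
from typing import Dict, Iterable, Set


def approve_recipients(
    email_topics: Set[str],
    sender: str,
    recipients: Iterable[str],
    user_topics: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Approve a recipient if the e-mail belongs to at least one topic
    that the sender and that recipient have in common.

    Assumption: topic membership is precomputed; the abstract does not
    describe how topics are derived from the e-mail exchange.
    """
    sender_topics = user_topics.get(sender, set())
    decisions = {}
    for recipient in recipients:
        shared = sender_topics & user_topics.get(recipient, set())
        # Approved only if the e-mail's topics overlap the shared topics.
        decisions[recipient] = bool(email_topics & shared)
    return decisions


# Hypothetical data: alice and bob share "gas-trading"; carol shares nothing.
user_topics = {
    "alice": {"gas-trading", "legal"},
    "bob": {"gas-trading"},
    "carol": {"hr"},
}
decisions = approve_recipients({"gas-trading"}, "alice", ["bob", "carol"], user_topics)
```

Because the check relies on shared topics rather than on past sender-recipient pairs, `bob` would be approved here even if `alice` had never written to him directly, which matches the property the abstract highlights.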

Similar documents (author)

  1. Katz, M.: Multimedia: the future of information delivery to homes and business (1993) 5.54
    5.5426593 = sum of:
      5.5426593 = weight(author_txt:katz in 6645) [ClassicSimilarity], result of:
        5.5426593 = fieldWeight in 6645, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.868255 = idf(docFreq=16, maxDocs=44421)
          0.625 = fieldNorm(doc=6645)
    
  2. Katz, B.: Community college reference services : a working guide for and by librarians (1992) 5.54
    5.5426593 = sum of:
      5.5426593 = weight(author_txt:katz in 729) [ClassicSimilarity], result of:
        5.5426593 = fieldWeight in 729, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.868255 = idf(docFreq=16, maxDocs=44421)
          0.625 = fieldNorm(doc=729)
    
  3. Katz, J.S.: Bibliometric standards : personal experience and lessons learned (1996) 5.54
    5.5426593 = sum of:
      5.5426593 = weight(author_txt:katz in 5126) [ClassicSimilarity], result of:
        5.5426593 = fieldWeight in 5126, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.868255 = idf(docFreq=16, maxDocs=44421)
          0.625 = fieldNorm(doc=5126)
    
  4. Katz, W.A.: Introduction to reference work (1997) 5.54
    5.5426593 = sum of:
      5.5426593 = weight(author_txt:katz in 2188) [ClassicSimilarity], result of:
        5.5426593 = fieldWeight in 2188, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.868255 = idf(docFreq=16, maxDocs=44421)
          0.625 = fieldNorm(doc=2188)
    
  5. Katz, W.A.: Introduction to reference work : Vol.1: Basic information sources; vol.2: Reference services and reference processes (1992) 5.54
    5.5426593 = sum of:
      5.5426593 = weight(author_txt:katz in 4364) [ClassicSimilarity], result of:
        5.5426593 = fieldWeight in 4364, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.868255 = idf(docFreq=16, maxDocs=44421)
          0.625 = fieldNorm(doc=4364)
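
The author scores above all follow the same pattern: fieldWeight = tf x idf x fieldNorm. The sketch below reproduces that arithmetic. It assumes the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which appear to match the listed values. The function names are illustrative, and `field_norm` is taken from the listing rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the older ClassicSimilarity form
    # (an assumption; it agrees with the idf values in the listing).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the "katz" author matches above.
author_score = field_weight(freq=1.0, doc_freq=16, max_docs=44421, field_norm=0.625)
print(author_score)
```

Lucene computes these values in single precision, so the result may differ from the listed 5.5426593 in the last digits.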
    

Similar documents (content)

  1. Pera, M.S.; Ng, Y.-K.: SpamED : a spam E-mail detection approach based on phrase similarity (2009) 0.14
    0.13836162 = sum of:
      0.13836162 = product of:
        0.69180804 = sum of:
          0.07313513 = weight(abstract_txt:false in 3721) [ClassicSimilarity], result of:
            0.07313513 = score(doc=3721,freq=2.0), product of:
              0.0870163 = queryWeight, product of:
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.011438793 = queryNorm
              0.8404762 = fieldWeight in 3721, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.078125 = fieldNorm(doc=3721)
          0.05400181 = weight(abstract_txt:rate in 3721) [ClassicSimilarity], result of:
            0.05400181 = score(doc=3721,freq=1.0), product of:
              0.11284321 = queryWeight, product of:
                1.6104691 = boost
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.011438793 = queryNorm
              0.47855613 = fieldWeight in 3721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.078125 = fieldNorm(doc=3721)
          0.103293136 = weight(abstract_txt:detection in 3721) [ClassicSimilarity], result of:
            0.103293136 = score(doc=3721,freq=2.0), product of:
              0.13800904 = queryWeight, product of:
                1.7810186 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.011438793 = queryNorm
              0.74845195 = fieldWeight in 3721, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.078125 = fieldNorm(doc=3721)
          0.02460498 = weight(abstract_txt:approach in 3721) [ClassicSimilarity], result of:
            0.02460498 = score(doc=3721,freq=1.0), product of:
              0.08418381 = queryWeight, product of:
                1.9671794 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011438793 = queryNorm
              0.29227686 = fieldWeight in 3721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=3721)
          0.43677297 = weight(abstract_txt:mail in 3721) [ClassicSimilarity], result of:
            0.43677297 = score(doc=3721,freq=6.0), product of:
              0.37990877 = queryWeight, product of:
                5.528259 = boost
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.011438793 = queryNorm
              1.1496786 = fieldWeight in 3721, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.0077353 = idf(docFreq=296, maxDocs=44421)
                0.078125 = fieldNorm(doc=3721)
        0.2 = coord(5/25)
    
  2. Cyr, S.; Choo, C.W.: ¬The individual and social dynamics of knowledge sharing : an exploratory study (2010) 0.07
    0.06564678 = sum of:
      0.06564678 = product of:
        0.54705656 = sum of:
          0.017223487 = weight(abstract_txt:approach in 606) [ClassicSimilarity], result of:
            0.017223487 = score(doc=606,freq=1.0), product of:
              0.08418381 = queryWeight, product of:
                1.9671794 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011438793 = queryNorm
              0.20459381 = fieldWeight in 606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=606)
          0.18361168 = weight(abstract_txt:recipients in 606) [ClassicSimilarity], result of:
            0.18361168 = score(doc=606,freq=1.0), product of:
              0.37048316 = queryWeight, product of:
                3.5739176 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.49560058 = fieldWeight in 606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=606)
          0.34622142 = weight(abstract_txt:recipient in 606) [ClassicSimilarity], result of:
            0.34622142 = score(doc=606,freq=2.0), product of:
              0.49397752 = queryWeight, product of:
                4.7652235 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.700885 = fieldWeight in 606, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=606)
        0.12 = coord(3/25)
    
  3. MacFarlane, A.; Missaoui, S.; Makri, S.; Gutierrez Lopez, M.: Sender vs. recipient-orientated information systems revisited (2022) 0.06
    0.060522784 = sum of:
      0.060522784 = product of:
        0.3782674 = sum of:
          0.020878017 = weight(abstract_txt:approach in 1608) [ClassicSimilarity], result of:
            0.020878017 = score(doc=1608,freq=2.0), product of:
              0.08418381 = queryWeight, product of:
                1.9671794 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011438793 = queryNorm
              0.24800514 = fieldWeight in 1608, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.046875 = fieldNorm(doc=1608)
          0.08264045 = weight(abstract_txt:sender in 1608) [ClassicSimilarity], result of:
            0.08264045 = score(doc=1608,freq=1.0), product of:
              0.21065131 = queryWeight, product of:
                2.2003753 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.011438793 = queryNorm
              0.3923092 = fieldWeight in 1608, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.046875 = fieldNorm(doc=1608)
          0.06490706 = weight(abstract_txt:addressing in 1608) [ClassicSimilarity], result of:
            0.06490706 = score(doc=1608,freq=1.0), product of:
              0.20527092 = queryWeight, product of:
                2.6602597 = boost
                6.7456408 = idf(docFreq=141, maxDocs=44421)
                0.011438793 = queryNorm
              0.31620193 = fieldWeight in 1608, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7456408 = idf(docFreq=141, maxDocs=44421)
                0.046875 = fieldNorm(doc=1608)
          0.20984189 = weight(abstract_txt:recipient in 1608) [ClassicSimilarity], result of:
            0.20984189 = score(doc=1608,freq=1.0), product of:
              0.49397752 = queryWeight, product of:
                4.7652235 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.4248005 = fieldWeight in 1608, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.046875 = fieldNorm(doc=1608)
        0.16 = coord(4/25)
    
  4. Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.06
    0.05949575 = sum of:
      0.05949575 = product of:
        0.49579793 = sum of:
          0.1058215 = weight(abstract_txt:rate in 1238) [ClassicSimilarity], result of:
            0.1058215 = score(doc=1238,freq=6.0), product of:
              0.11284321 = queryWeight, product of:
                1.6104691 = boost
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.011438793 = queryNorm
              0.93777466 = fieldWeight in 1238, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.110187255 = weight(abstract_txt:sender in 1238) [ClassicSimilarity], result of:
            0.110187255 = score(doc=1238,freq=1.0), product of:
              0.21065131 = queryWeight, product of:
                2.2003753 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.011438793 = queryNorm
              0.5230789 = fieldWeight in 1238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.27978918 = weight(abstract_txt:recipient in 1238) [ClassicSimilarity], result of:
            0.27978918 = score(doc=1238,freq=1.0), product of:
              0.49397752 = queryWeight, product of:
                4.7652235 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.56640065 = fieldWeight in 1238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
        0.12 = coord(3/25)
    
  5. Foster, J.: On the interpretative authority of information systems (1999) 0.06
    0.058266126 = sum of:
      0.058266126 = product of:
        0.48555106 = sum of:
          0.057123844 = weight(abstract_txt:approach in 1274) [ClassicSimilarity], result of:
            0.057123844 = score(doc=1274,freq=11.0), product of:
              0.08418381 = queryWeight, product of:
                1.9671794 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.011438793 = queryNorm
              0.6785609 = fieldWeight in 1274, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1274)
          0.18361168 = weight(abstract_txt:recipients in 1274) [ClassicSimilarity], result of:
            0.18361168 = score(doc=1274,freq=1.0), product of:
              0.37048316 = queryWeight, product of:
                3.5739176 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.49560058 = fieldWeight in 1274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1274)
          0.24481554 = weight(abstract_txt:recipient in 1274) [ClassicSimilarity], result of:
            0.24481554 = score(doc=1274,freq=1.0), product of:
              0.49397752 = queryWeight, product of:
                4.7652235 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.011438793 = queryNorm
              0.49560058 = fieldWeight in 1274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1274)
        0.12 = coord(3/25)
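
The content scores combine more factors: each term contributes queryWeight x fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched query terms divided by total query terms). The sketch below recomputes the score of the first content match (doc 3721) from the values in its explain tree. It assumes the same tf and idf definitions as the classic Lucene TF-IDF similarity. `QUERY_NORM`, the boosts and `field_norm` are copied from the listing, and the function names are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.011438793  # copied from the listing, not recomputed


def idf(doc_freq: int) -> float:
    # Assumed older ClassicSimilarity form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float, boost: float = 1.0) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM          # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 3721; "false" has no boost listed.
terms_3721 = [
    ("false", 2.0, 59, 1.0),
    ("rate", 1.0, 263, 1.6104691),
    ("detection", 2.0, 137, 1.7810186),
    ("approach", 1.0, 2864, 1.9671794),
    ("mail", 6.0, 296, 5.528259),
]

term_sum = sum(term_score(f, df, 0.078125, b) for _, f, df, b in terms_3721)
coord = 5 / 25  # 5 of the 25 query terms matched
score_3721 = term_sum * coord
print(score_3721)
```

The low coord factor (0.2 here, 0.12 to 0.16 for the other matches) explains why the displayed scores are small even when individual term weights are high.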