Document (#34100)

Author
Kucukyilmaz, T.
Cambazoglu, B.B.
Aykanat, C.
Can, F.
Title
Chat mining : Predicting user and message attributes in computer-mediated communication
Source
Information processing and management. 44(2008) no.4, pp. 1448-1466
Year
2008
Abstract
The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used for defining the chat mining problem. A term-based approach is used to investigate the user and message attributes in the context of vocabulary use while a style-based approach is used to examine the chat messages according to the variations in the authors' writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is exploited, and the effect of author attributes on computer-mediated communications is discussed.

Similar documents (author)

  1. Cambazoglu, B. Barla => Barla Cambazoglu, B.: 5.17
    5.1736655 = sum of:
      5.1736655 = weight(author_txt:cambazoglu in 2505) [ClassicSimilarity], result of:
        5.1736655 = fieldWeight in 2505, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.375 = fieldNorm(doc=2505)
    
  2. Arapakis, I.; Cambazoglu, B.B.; Lalmas, M.: On the feasibility of predicting popular news at cold start (2017) 3.66
    3.6583338 = sum of:
      3.6583338 = weight(author_txt:cambazoglu in 4595) [ClassicSimilarity], result of:
        3.6583338 = fieldWeight in 4595, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.375 = fieldNorm(doc=4595)
    
  3. Arapakis, I.; Lalmas, M.; Cambazoglu, B.B.; Marcos, M.-C.; Jose, J.M.: User engagement in online news : under the scope of sentiment, interest, affect, and gaze (2014) 3.05
    3.0486116 = sum of:
      3.0486116 = weight(author_txt:cambazoglu in 2497) [ClassicSimilarity], result of:
        3.0486116 = fieldWeight in 2497, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.3125 = fieldNorm(doc=2497)
    
  4. Kucukyilmaz, T.; Cambazoglu, B.B.; Aykanat, C.; Baeza-Yates, R.: A machine learning approach for result caching in web search engines (2017) 3.05
    3.0486116 = sum of:
      3.0486116 = weight(author_txt:cambazoglu in 100) [ClassicSimilarity], result of:
        3.0486116 = fieldWeight in 100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.3125 = fieldNorm(doc=100)
    
  5. Sarigil, E.; Sengor Altingovde, I.; Blanco, R.; Barla Cambazoglu, B.; Ozcan, R.; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results (2018) 2.44
    2.4388893 = sum of:
      2.4388893 = weight(author_txt:cambazoglu in 39) [ClassicSimilarity], result of:
        2.4388893 = fieldWeight in 39, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.25 = fieldNorm(doc=39)
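
The author-match scores above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these are assumed here because they reproduce the figures printed in the listing, not because the catalog states which Lucene version it runs. The function names are illustrative only.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that matches the listing
    # (assumption: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in each explain tree above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1: termFreq=2.0, docFreq=6, maxDocs=44421, fieldNorm=0.375
w1 = field_weight(2.0, 6, 44421, 0.375)
# Entry 5: termFreq=1.0, docFreq=6, maxDocs=44421, fieldNorm=0.25
w5 = field_weight(1.0, 6, 44421, 0.25)

print(round(w1, 2), round(w5, 2))
```

Since the term and its idf are the same in all five entries, the ranking here is driven only by the term frequency and the field norm; the listing does not show how fieldNorm itself is derived.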
    

Similar documents (content)

  1. Zheng, R.; Li, J.; Chen, H.; Huang, Z.: A framework for authorship identification of online messages : writing-style features and classification techniques (2006) 0.18
    0.17935279 = sum of:
      0.17935279 = product of:
        0.64054567 = sum of:
          0.019063286 = weight(abstract_txt:approach in 276) [ClassicSimilarity], result of:
            0.019063286 = score(doc=276,freq=2.0), product of:
              0.057649873 = queryWeight, product of:
                1.0795733 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014273873 = queryNorm
              0.33067352 = fieldWeight in 276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.032285653 = weight(abstract_txt:problem in 276) [ClassicSimilarity], result of:
            0.032285653 = score(doc=276,freq=2.0), product of:
              0.0819105 = queryWeight, product of:
                1.2868346 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014273873 = queryNorm
              0.3941577 = fieldWeight in 276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.036494184 = weight(abstract_txt:authors in 276) [ClassicSimilarity], result of:
            0.036494184 = score(doc=276,freq=2.0), product of:
              0.088882364 = queryWeight, product of:
                1.3404813 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.014273873 = queryNorm
              0.4105897 = fieldWeight in 276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.012453788 = weight(abstract_txt:based in 276) [ClassicSimilarity], result of:
            0.012453788 = score(doc=276,freq=1.0), product of:
              0.06260003 = queryWeight, product of:
                1.377799 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.014273873 = queryNorm
              0.1989422 = fieldWeight in 276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.014611423 = weight(abstract_txt:used in 276) [ClassicSimilarity], result of:
            0.014611423 = score(doc=276,freq=1.0), product of:
              0.06963623 = queryWeight, product of:
                1.4531692 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014273873 = queryNorm
              0.20982501 = fieldWeight in 276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.31038558 = weight(abstract_txt:messages in 276) [ClassicSimilarity], result of:
            0.31038558 = score(doc=276,freq=6.0), product of:
              0.2939392 = queryWeight, product of:
                2.9855704 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.014273873 = queryNorm
              1.0559516 = fieldWeight in 276, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
          0.21525174 = weight(abstract_txt:message in 276) [ClassicSimilarity], result of:
            0.21525174 = score(doc=276,freq=2.0), product of:
              0.33214524 = queryWeight, product of:
                3.173676 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.014273873 = queryNorm
              0.6480651 = fieldWeight in 276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=276)
        0.28 = coord(7/25)
    
  2. Miah, M.W.R.; Yearwood, J.; Kulkarni, S.: Constructing an inter-post similarity measure to differentiate the psychological stages in offensive chats (2015) 0.14
    0.14217097 = sum of:
      0.14217097 = product of:
        0.7108548 = sum of:
          0.0134797795 = weight(abstract_txt:approach in 2846) [ClassicSimilarity], result of:
            0.0134797795 = score(doc=2846,freq=1.0), product of:
              0.057649873 = queryWeight, product of:
                1.0795733 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014273873 = queryNorm
              0.2338215 = fieldWeight in 2846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=2846)
          0.012453788 = weight(abstract_txt:based in 2846) [ClassicSimilarity], result of:
            0.012453788 = score(doc=2846,freq=1.0), product of:
              0.06260003 = queryWeight, product of:
                1.377799 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.014273873 = queryNorm
              0.1989422 = fieldWeight in 2846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=2846)
          0.020663673 = weight(abstract_txt:used in 2846) [ClassicSimilarity], result of:
            0.020663673 = score(doc=2846,freq=2.0), product of:
              0.06963623 = queryWeight, product of:
                1.4531692 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014273873 = queryNorm
              0.29673737 = fieldWeight in 2846, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=2846)
          0.060527876 = weight(abstract_txt:mining in 2846) [ClassicSimilarity], result of:
            0.060527876 = score(doc=2846,freq=1.0), product of:
              0.15690862 = queryWeight, product of:
                1.7810509 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.014273873 = queryNorm
              0.3857524 = fieldWeight in 2846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=2846)
          0.6037297 = weight(abstract_txt:chat in 2846) [ClassicSimilarity], result of:
            0.6037297 = score(doc=2846,freq=4.0), product of:
              0.6216294 = queryWeight, product of:
                5.6051693 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.014273873 = queryNorm
              0.97120523 = fieldWeight in 2846, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.0625 = fieldNorm(doc=2846)
        0.2 = coord(5/25)
    
  3. Lewis, K.M.; DeGroote, S.L.: Digital reference access points : an analysis of usage (2008) 0.13
    0.12650178 = sum of:
      0.12650178 = product of:
        0.6325089 = sum of:
          0.0134797795 = weight(abstract_txt:approach in 1551) [ClassicSimilarity], result of:
            0.0134797795 = score(doc=1551,freq=1.0), product of:
              0.057649873 = queryWeight, product of:
                1.0795733 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014273873 = queryNorm
              0.2338215 = fieldWeight in 1551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=1551)
          0.020663673 = weight(abstract_txt:used in 1551) [ClassicSimilarity], result of:
            0.020663673 = score(doc=1551,freq=2.0), product of:
              0.06963623 = queryWeight, product of:
                1.4531692 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014273873 = queryNorm
              0.29673737 = fieldWeight in 1551, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=1551)
          0.019258022 = weight(abstract_txt:user in 1551) [ClassicSimilarity], result of:
            0.019258022 = score(doc=1551,freq=1.0), product of:
              0.083710775 = queryWeight, product of:
                1.5932696 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.014273873 = queryNorm
              0.23005427 = fieldWeight in 1551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0625 = fieldNorm(doc=1551)
          0.15220597 = weight(abstract_txt:message in 1551) [ClassicSimilarity], result of:
            0.15220597 = score(doc=1551,freq=1.0), product of:
              0.33214524 = queryWeight, product of:
                3.173676 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.014273873 = queryNorm
              0.45825124 = fieldWeight in 1551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=1551)
          0.4269014 = weight(abstract_txt:chat in 1551) [ClassicSimilarity], result of:
            0.4269014 = score(doc=1551,freq=2.0), product of:
              0.6216294 = queryWeight, product of:
                5.6051693 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.014273873 = queryNorm
              0.6867458 = fieldWeight in 1551, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.0625 = fieldNorm(doc=1551)
        0.2 = coord(5/25)
    
  4. Madden, A.D.: A definition of information (2000) 0.10
    0.10076364 = sum of:
      0.10076364 = product of:
        0.5038182 = sum of:
          0.028536756 = weight(abstract_txt:problem in 838) [ClassicSimilarity], result of:
            0.028536756 = score(doc=838,freq=1.0), product of:
              0.0819105 = queryWeight, product of:
                1.2868346 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014273873 = queryNorm
              0.34838948 = fieldWeight in 838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.078125 = fieldNorm(doc=838)
          0.032256607 = weight(abstract_txt:authors in 838) [ClassicSimilarity], result of:
            0.032256607 = score(doc=838,freq=1.0), product of:
              0.088882364 = queryWeight, product of:
                1.3404813 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.014273873 = queryNorm
              0.36291346 = fieldWeight in 838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.078125 = fieldNorm(doc=838)
          0.015567235 = weight(abstract_txt:based in 838) [ClassicSimilarity], result of:
            0.015567235 = score(doc=838,freq=1.0), product of:
              0.06260003 = queryWeight, product of:
                1.377799 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.014273873 = queryNorm
              0.24867775 = fieldWeight in 838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=838)
          0.15839297 = weight(abstract_txt:messages in 838) [ClassicSimilarity], result of:
            0.15839297 = score(doc=838,freq=1.0), product of:
              0.2939392 = queryWeight, product of:
                2.9855704 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.014273873 = queryNorm
              0.538863 = fieldWeight in 838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.078125 = fieldNorm(doc=838)
          0.26906466 = weight(abstract_txt:message in 838) [ClassicSimilarity], result of:
            0.26906466 = score(doc=838,freq=2.0), product of:
              0.33214524 = queryWeight, product of:
                3.173676 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.014273873 = queryNorm
              0.81008136 = fieldWeight in 838, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.078125 = fieldNorm(doc=838)
        0.2 = coord(5/25)
    
  5. Chuang, K.Y.; Yang, C.C.: Informational support exchanges using different computer-mediated communication formats in a social media alcoholism community (2014) 0.10
    0.09501693 = sum of:
      0.09501693 = product of:
        0.33934617 = sum of:
          0.02071858 = weight(abstract_txt:computer in 2179) [ClassicSimilarity], result of:
            0.02071858 = score(doc=2179,freq=1.0), product of:
              0.076780304 = queryWeight, product of:
                1.2458848 = boost
                4.317478 = idf(docFreq=1609, maxDocs=44421)
                0.014273873 = queryNorm
              0.2698424 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.317478 = idf(docFreq=1609, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.031795714 = weight(abstract_txt:author in 2179) [ClassicSimilarity], result of:
            0.031795714 = score(doc=2179,freq=1.0), product of:
              0.10215404 = queryWeight, product of:
                1.4370792 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014273873 = queryNorm
              0.31125262 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.014611423 = weight(abstract_txt:used in 2179) [ClassicSimilarity], result of:
            0.014611423 = score(doc=2179,freq=1.0), product of:
              0.06963623 = queryWeight, product of:
                1.4531692 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.014273873 = queryNorm
              0.20982501 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.040224917 = weight(abstract_txt:investigate in 2179) [ClassicSimilarity], result of:
            0.040224917 = score(doc=2179,freq=1.0), product of:
              0.11949231 = queryWeight, product of:
                1.5542573 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.014273873 = queryNorm
              0.33663186 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.019258022 = weight(abstract_txt:user in 2179) [ClassicSimilarity], result of:
            0.019258022 = score(doc=2179,freq=1.0), product of:
              0.083710775 = queryWeight, product of:
                1.5932696 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.014273873 = queryNorm
              0.23005427 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.08602315 = weight(abstract_txt:mediated in 2179) [ClassicSimilarity], result of:
            0.08602315 = score(doc=2179,freq=1.0), product of:
              0.19834445 = queryWeight, product of:
                2.002456 = boost
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.014273873 = queryNorm
              0.43370587 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
          0.12671438 = weight(abstract_txt:messages in 2179) [ClassicSimilarity], result of:
            0.12671438 = score(doc=2179,freq=1.0), product of:
              0.2939392 = queryWeight, product of:
                2.9855704 = boost
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.014273873 = queryNorm
              0.4310904 = fieldWeight in 2179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8974466 = idf(docFreq=121, maxDocs=44421)
                0.0625 = fieldNorm(doc=2179)
        0.28 = coord(7/25)
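
The content-similarity scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by coord, the fraction of query terms that matched. The sketch below recomputes the first content result (doc 276) from the values printed in its explain tree. It assumes the same tf and idf definitions as the listing implies (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the boost, queryNorm and fieldNorm values are copied from the listing, and all function and variable names are illustrative.

```python
import math

MAX_DOCS = 44421          # maxDocs, from the listing
QUERY_NORM = 0.014273873  # queryNorm, from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=276), from the listing

# (term, boost, docFreq, termFreq) for the seven terms matched in doc 276.
TERMS = [
    ("approach", 1.0795733, 2864, 2.0),
    ("problem",  1.2868346, 1396, 2.0),
    ("authors",  1.3404813, 1159, 2.0),
    ("based",    1.377799,  5005, 1.0),
    ("used",     1.4531692, 4205, 1.0),
    ("messages", 2.9855704,  121, 6.0),
    ("message",  3.173676,    78, 2.0),
]

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float) -> float:
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM           # query-side factor
    field_weight = math.sqrt(freq) * i * FIELD_NORM  # document-side factor
    return query_weight * field_weight

def explain_score(terms, total_query_terms: int) -> float:
    raw = sum(term_score(b, df, f) for _, b, df, f in terms)
    coord = len(terms) / total_query_terms  # e.g. 7/25 = 0.28
    return coord * raw

score = explain_score(TERMS, 25)
print(f"{score:.4f}")
```

Because idf enters both the query weight and the field weight, rare terms such as "messages" and "message" dominate the sum, which is why they account for most of the 0.18 score of the first result.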