Document (#37968)

Author
Ma, Z.
Sun, A.
Cong, G.
Title
On predicting the popularity of newly emerging hashtags in Twitter
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.7, pp. 1399-1410
Year
2013
Abstract
Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Theme
Automatic classification
Data Mining
Object
Twitter

Similar documents (content)

  1. Çelebi, A.; Özgür, A.: Segmenting hashtags and analyzing their grammatical structure (2018) 0.51
    0.5105737 = sum of:
      0.5105737 = product of:
        1.8234775 = sum of:
          0.02227111 = weight(abstract_txt:task in 221) [ClassicSimilarity], result of:
            0.02227111 = score(doc=221,freq=1.0), product of:
              0.07263658 = queryWeight, product of:
                1.2044414 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0122931525 = queryNorm
              0.3066101 = fieldWeight in 221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.06383779 = weight(abstract_txt:million in 221) [ClassicSimilarity], result of:
            0.06383779 = score(doc=221,freq=2.0), product of:
              0.11633294 = queryWeight, product of:
                1.524261 = boost
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0122931525 = queryNorm
              0.54875076 = fieldWeight in 221, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.081970155 = weight(abstract_txt:tweets in 221) [ClassicSimilarity], result of:
            0.081970155 = score(doc=221,freq=1.0), product of:
              0.17315352 = queryWeight, product of:
                1.8596176 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0122931525 = queryNorm
              0.47339582 = fieldWeight in 221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.58742845 = weight(abstract_txt:hashtag in 221) [ClassicSimilarity], result of:
            0.58742845 = score(doc=221,freq=5.0), product of:
              0.4308617 = queryWeight, product of:
                3.5927134 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0122931525 = queryNorm
              1.3633806 = fieldWeight in 221, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.087408066 = weight(abstract_txt:features in 221) [ClassicSimilarity], result of:
            0.087408066 = score(doc=221,freq=2.0), product of:
              0.21779163 = queryWeight, product of:
                3.9017785 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0122931525 = queryNorm
              0.40133804 = fieldWeight in 221, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.7941978 = weight(abstract_txt:hashtags in 221) [ClassicSimilarity], result of:
            0.7941978 = score(doc=221,freq=8.0), product of:
              0.49574685 = queryWeight, product of:
                4.449928 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0122931525 = queryNorm
              1.6020229 = fieldWeight in 221, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
          0.18636422 = weight(abstract_txt:twitter in 221) [ClassicSimilarity], result of:
            0.18636422 = score(doc=221,freq=1.0), product of:
              0.43179366 = queryWeight, product of:
                5.0863557 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0122931525 = queryNorm
              0.4316048 = fieldWeight in 221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0625 = fieldNorm(doc=221)
        0.28 = coord(7/25)
    
  2. Chang, H.-C.; Iyer, I.: Trends in Twitter hashtag applications : design features for value-added dimensions to future library catalogues (2012) 0.47
    0.47115335 = sum of:
      0.47115335 = product of:
        1.963139 = sum of:
          0.017216278 = weight(abstract_txt:content in 574) [ClassicSimilarity], result of:
            0.017216278 = score(doc=574,freq=1.0), product of:
              0.052724645 = queryWeight, product of:
                1.0261594 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0122931525 = queryNorm
              0.3265319 = fieldWeight in 574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
          0.14490412 = weight(abstract_txt:tweets in 574) [ClassicSimilarity], result of:
            0.14490412 = score(doc=574,freq=2.0), product of:
              0.17315352 = queryWeight, product of:
                1.8596176 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0122931525 = queryNorm
              0.83685344 = fieldWeight in 574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
          0.65676504 = weight(abstract_txt:hashtag in 574) [ClassicSimilarity], result of:
            0.65676504 = score(doc=574,freq=4.0), product of:
              0.4308617 = queryWeight, product of:
                3.5927134 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0122931525 = queryNorm
              1.5243058 = fieldWeight in 574, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
          0.07725855 = weight(abstract_txt:features in 574) [ClassicSimilarity], result of:
            0.07725855 = score(doc=574,freq=1.0), product of:
              0.21779163 = queryWeight, product of:
                3.9017785 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0122931525 = queryNorm
              0.3547361 = fieldWeight in 574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
          0.49637365 = weight(abstract_txt:hashtags in 574) [ClassicSimilarity], result of:
            0.49637365 = score(doc=574,freq=2.0), product of:
              0.49574685 = queryWeight, product of:
                4.449928 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0122931525 = queryNorm
              1.0012643 = fieldWeight in 574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
          0.57062155 = weight(abstract_txt:twitter in 574) [ClassicSimilarity], result of:
            0.57062155 = score(doc=574,freq=6.0), product of:
              0.43179366 = queryWeight, product of:
                5.0863557 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0122931525 = queryNorm
              1.3215144 = fieldWeight in 574, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.078125 = fieldNorm(doc=574)
        0.24 = coord(6/25)
    
  3. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.41
    0.4070425 = sum of:
      0.4070425 = product of:
        1.6960105 = sum of:
          0.008917611 = weight(abstract_txt:from in 3338) [ClassicSimilarity], result of:
            0.008917611 = score(doc=3338,freq=1.0), product of:
              0.034471706 = queryWeight, product of:
                1.0162135 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0122931525 = queryNorm
              0.25869364 = fieldWeight in 3338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
          0.037066314 = weight(abstract_txt:topics in 3338) [ClassicSimilarity], result of:
            0.037066314 = score(doc=3338,freq=1.0), product of:
              0.07784898 = queryWeight, product of:
                1.2469081 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.0122931525 = queryNorm
              0.47613102 = fieldWeight in 3338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
          0.394059 = weight(abstract_txt:hashtag in 3338) [ClassicSimilarity], result of:
            0.394059 = score(doc=3338,freq=1.0), product of:
              0.4308617 = queryWeight, product of:
                3.5927134 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0122931525 = queryNorm
              0.91458344 = fieldWeight in 3338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
          0.1311121 = weight(abstract_txt:features in 3338) [ClassicSimilarity], result of:
            0.1311121 = score(doc=3338,freq=2.0), product of:
              0.21779163 = queryWeight, product of:
                3.9017785 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0122931525 = queryNorm
              0.60200703 = fieldWeight in 3338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
          0.7295173 = weight(abstract_txt:hashtags in 3338) [ClassicSimilarity], result of:
            0.7295173 = score(doc=3338,freq=3.0), product of:
              0.49574685 = queryWeight, product of:
                4.449928 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0122931525 = queryNorm
              1.471552 = fieldWeight in 3338, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
          0.3953382 = weight(abstract_txt:twitter in 3338) [ClassicSimilarity], result of:
            0.3953382 = score(doc=3338,freq=2.0), product of:
              0.43179366 = queryWeight, product of:
                5.0863557 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0122931525 = queryNorm
              0.91557205 = fieldWeight in 3338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.09375 = fieldNorm(doc=3338)
        0.24 = coord(6/25)
    
  4. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming (2017) 0.34
    0.33909538 = sum of:
      0.33909538 = product of:
        1.4128975 = sum of:
          0.013773023 = weight(abstract_txt:content in 4683) [ClassicSimilarity], result of:
            0.013773023 = score(doc=4683,freq=1.0), product of:
              0.052724645 = queryWeight, product of:
                1.0261594 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0122931525 = queryNorm
              0.26122552 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.045140132 = weight(abstract_txt:million in 4683) [ClassicSimilarity], result of:
            0.045140132 = score(doc=4683,freq=1.0), product of:
              0.11633294 = queryWeight, product of:
                1.524261 = boost
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0122931525 = queryNorm
              0.38802537 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.21687263 = weight(abstract_txt:tweets in 4683) [ClassicSimilarity], result of:
            0.21687263 = score(doc=4683,freq=7.0), product of:
              0.17315352 = queryWeight, product of:
                1.8596176 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0122931525 = queryNorm
              1.2524875 = fieldWeight in 4683, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.09252006 = weight(abstract_txt:popularity in 4683) [ClassicSimilarity], result of:
            0.09252006 = score(doc=4683,freq=1.0), product of:
              0.214873 = queryWeight, product of:
                2.5371406 = boost
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.0122931525 = queryNorm
              0.4305802 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.62786853 = weight(abstract_txt:hashtags in 4683) [ClassicSimilarity], result of:
            0.62786853 = score(doc=4683,freq=5.0), product of:
              0.49574685 = queryWeight, product of:
                4.449928 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0122931525 = queryNorm
              1.2665104 = fieldWeight in 4683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
          0.41672304 = weight(abstract_txt:twitter in 4683) [ClassicSimilarity], result of:
            0.41672304 = score(doc=4683,freq=5.0), product of:
              0.43179366 = queryWeight, product of:
                5.0863557 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0122931525 = queryNorm
              0.96509767 = fieldWeight in 4683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0625 = fieldNorm(doc=4683)
        0.24 = coord(6/25)
    
  5. Yi, K.; Choi, N.; Kim, Y.S.: A content analysis of Twitter hyperlinks and their application in web resource indexing (2016) 0.21
    0.21168719 = sum of:
      0.21168719 = product of:
        1.0584359 = sum of:
          0.0059450744 = weight(abstract_txt:from in 4075) [ClassicSimilarity], result of:
            0.0059450744 = score(doc=4075,freq=1.0), product of:
              0.034471706 = queryWeight, product of:
                1.0162135 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0122931525 = queryNorm
              0.17246243 = fieldWeight in 4075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=4075)
          0.045140132 = weight(abstract_txt:million in 4075) [ClassicSimilarity], result of:
            0.045140132 = score(doc=4075,freq=1.0), product of:
              0.11633294 = queryWeight, product of:
                1.524261 = boost
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0122931525 = queryNorm
              0.38802537 = fieldWeight in 4075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.0625 = fieldNorm(doc=4075)
          0.1159233 = weight(abstract_txt:tweets in 4075) [ClassicSimilarity], result of:
            0.1159233 = score(doc=4075,freq=2.0), product of:
              0.17315352 = queryWeight, product of:
                1.8596176 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0122931525 = queryNorm
              0.66948277 = fieldWeight in 4075, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=4075)
          0.62786853 = weight(abstract_txt:hashtags in 4075) [ClassicSimilarity], result of:
            0.62786853 = score(doc=4075,freq=5.0), product of:
              0.49574685 = queryWeight, product of:
                4.449928 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0122931525 = queryNorm
              1.2665104 = fieldWeight in 4075, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=4075)
          0.2635588 = weight(abstract_txt:twitter in 4075) [ClassicSimilarity], result of:
            0.2635588 = score(doc=4075,freq=2.0), product of:
              0.43179366 = queryWeight, product of:
                5.0863557 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0122931525 = queryNorm
              0.61038136 = fieldWeight in 4075, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0625 = fieldNorm(doc=4075)
        0.2 = coord(5/25)
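
The score listings above all follow the same arithmetic, and a short sketch may make it easier to read them. The Python below recomputes the score of the first similar document (doc=221) from the values printed in its listing. It assumes the usual ClassicSimilarity relations that the listing itself suggests: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is boost * idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the summed term scores are multiplied by the coord factor. The function and variable names are hypothetical and chosen for illustration; boost, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

# Values shared by every term in the listing for doc=221 (copied from above).
MAX_DOCS = 44421
QUERY_NORM = 0.0122931525
FIELD_NORM_221 = 0.0625


def idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm,
               max_docs=MAX_DOCS, query_norm=QUERY_NORM):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for the seven matching terms of doc=221.
TERMS_221 = [
    ("task", 1.2044414, 893, 1.0),
    ("million", 1.524261, 242, 2.0),
    ("tweets", 1.8596176, 61, 1.0),
    ("hashtag", 3.5927134, 6, 5.0),
    ("features", 3.9017785, 1287, 2.0),
    ("hashtags", 4.449928, 13, 8.0),
    ("twitter", 5.0863557, 120, 1.0),
]

# Sum the per-term scores, then scale by coord = matched terms / query terms.
term_sum_221 = sum(
    term_score(boost, doc_freq, freq, FIELD_NORM_221)
    for _, boost, doc_freq, freq in TERMS_221
)
coord_221 = len(TERMS_221) / 25
score_221 = term_sum_221 * coord_221

print(f"sum of term scores: {term_sum_221:.7f}")
print(f"final score:        {score_221:.7f}")
```

The result should agree with the 0.5105737 reported for the first entry up to small rounding differences, since the search engine works in single precision while this sketch uses double precision. The same function applies to the other entries once their fieldNorm, term list and coord factor are substituted.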