Document (#40827)

Author
Bandaragoda, T.R.
Silva, D. de
Alahakoon, D.
Title
Automatic event detection in microblogs using incremental machine learning
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, p.2394-2411
Year
2017
Abstract
The global popularity of microblogs has led to an increasing accumulation of large volumes of text data on microblogging platforms such as Twitter. These corpora are untapped resources for understanding social expressions on diverse subjects. Microblog analysis aims to unlock the value of such expressions by discovering insights and events of significance hidden among swathes of text. Besides velocity, the key challenges in microblog analysis are diversity of content, brevity, absence of structure and time-sensitivity. In this paper, we propose an unsupervised incremental machine learning and event detection technique to address these challenges. The proposed technique separates a microblog discussion into topics to address the key problem of diversity. It maintains a record of the evolution of each topic over time. Brevity, time-sensitivity and unstructured nature are addressed by these individual topic pathways, which together generate a temporal, topic-driven structure of a microblog discussion. The proposed event detection method continuously monitors these topic pathways for events of significance, using multiple domain-independent event indicators. The autonomous nature of topic separation, topic pathway generation, new topic identification and event detection makes the proposed technique suitable for extensive applications in microblog analysis. We demonstrate these capabilities on tweets containing #microsoft and tweets containing #obama.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23896/full.
Theme
Internet

Similar documents (author)

  1. Silva, M.: Creating electronic environments for learning (1998) 4.69
    4.694883 = sum of:
      4.694883 = weight(author_txt:silva in 3785) [ClassicSimilarity], result of:
        4.694883 = fieldWeight in 3785, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5118127 = idf(docFreq=65, maxDocs=44421)
          0.625 = fieldNorm(doc=3785)
    
  2. Silva, A.J.: ¬Ein Netz von Erinnerungen (2018) 4.69
    4.694883 = sum of:
      4.694883 = weight(author_txt:silva in 194) [ClassicSimilarity], result of:
        4.694883 = fieldWeight in 194, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5118127 = idf(docFreq=65, maxDocs=44421)
          0.625 = fieldNorm(doc=194)
    
  3. Silva, A.J.: ¬Das Gedächtnisnetz (2018) 4.69
    4.694883 = sum of:
      4.694883 = weight(author_txt:silva in 421) [ClassicSimilarity], result of:
        4.694883 = fieldWeight in 421, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5118127 = idf(docFreq=65, maxDocs=44421)
          0.625 = fieldNorm(doc=421)
    
  4. Silva, A.M. Da -> Da Silva, A.M.: 3.98
    3.9837403 = sum of:
      3.9837403 = weight(author_txt:silva in 2167) [ClassicSimilarity], result of:
        3.9837403 = fieldWeight in 2167, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.5118127 = idf(docFreq=65, maxDocs=44421)
          0.375 = fieldNorm(doc=2167)
    
  5. Lucas da Silva, D. -> Silva, D.L da: 3.98
    3.9837403 = sum of:
      3.9837403 = weight(author_txt:silva in 1885) [ClassicSimilarity], result of:
        3.9837403 = fieldWeight in 1885, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.5118127 = idf(docFreq=65, maxDocs=44421)
          0.375 = fieldNorm(doc=1885)
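
The author scores above can be recomputed by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not Lucene's API. Lucene computes in 32-bit floats, so the trailing digits may differ slightly from the listing.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 3785): freq=1, docFreq=65, maxDocs=44421, fieldNorm=0.625.
# The listing reports 4.694883 for this entry.
entry_1 = field_weight(1.0, 65, 44421, 0.625)

# Entry 4 (doc 2167): freq=2, same idf, fieldNorm=0.375.
# The listing reports 3.9837403 for this entry.
entry_4 = field_weight(2.0, 65, 44421, 0.375)

print(entry_1, entry_4)
```

Entries 4 and 5 match the term twice but score lower than entries 1 to 3, because their smaller fieldNorm (0.375 against 0.625) outweighs the gain from tf = sqrt(2).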
    

Similar documents (content)

  1. Efron, M.: Information search and retrieval in microblogs (2011) 0.26
    0.26007023 = sum of:
      0.26007023 = product of:
        1.083626 = sum of:
          0.023679668 = weight(abstract_txt:discussion in 455) [ClassicSimilarity], result of:
            0.023679668 = score(doc=455,freq=1.0), product of:
              0.07372504 = queryWeight, product of:
                1.0371695 = boost
                5.1390233 = idf(docFreq=707, maxDocs=44421)
                0.013831991 = queryNorm
              0.32118896 = fieldWeight in 455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1390233 = idf(docFreq=707, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
          0.012681159 = weight(abstract_txt:analysis in 455) [ClassicSimilarity], result of:
            0.012681159 = score(doc=455,freq=1.0), product of:
              0.05565459 = queryWeight, product of:
                1.1036677 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.013831991 = queryNorm
              0.2278547 = fieldWeight in 455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
          0.018688004 = weight(abstract_txt:time in 455) [ClassicSimilarity], result of:
            0.018688004 = score(doc=455,freq=1.0), product of:
              0.07207262 = queryWeight, product of:
                1.2559519 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.013831991 = queryNorm
              0.2592941 = fieldWeight in 455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
          0.014064775 = weight(abstract_txt:these in 455) [ClassicSimilarity], result of:
            0.014064775 = score(doc=455,freq=1.0), product of:
              0.07070223 = queryWeight, product of:
                1.6059381 = boost
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.013831991 = queryNorm
              0.19892971 = fieldWeight in 455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
          0.25091302 = weight(abstract_txt:microblogs in 455) [ClassicSimilarity], result of:
            0.25091302 = score(doc=455,freq=3.0), product of:
              0.24660751 = queryWeight, product of:
                1.8969041 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013831991 = queryNorm
              1.0174589 = fieldWeight in 455, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
          0.76359946 = weight(abstract_txt:microblog in 455) [ClassicSimilarity], result of:
            0.76359946 = score(doc=455,freq=5.0), product of:
              0.592832 = queryWeight, product of:
                4.6502686 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013831991 = queryNorm
              1.2880536 = fieldWeight in 455, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=455)
        0.24 = coord(6/25)
    
  2. Jansen, B.J.; Zhang, M.; Sobel, K.; Chowdury, A.: Twitter power : tweets as electronic word of mouth (2009) 0.25
    0.2458654 = sum of:
      0.2458654 = product of:
        1.0244392 = sum of:
          0.04323209 = weight(abstract_txt:containing in 144) [ClassicSimilarity], result of:
            0.04323209 = score(doc=144,freq=1.0), product of:
              0.1101291 = queryWeight, product of:
                1.2676322 = boost
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.013831991 = queryNorm
              0.39255828 = fieldWeight in 144, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
          0.07695265 = weight(abstract_txt:expressions in 144) [ClassicSimilarity], result of:
            0.07695265 = score(doc=144,freq=2.0), product of:
              0.12838186 = queryWeight, product of:
                1.3686552 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.013831991 = queryNorm
              0.5994044 = fieldWeight in 144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
          0.107221544 = weight(abstract_txt:tweets in 144) [ClassicSimilarity], result of:
            0.107221544 = score(doc=144,freq=2.0), product of:
              0.16015579 = queryWeight, product of:
                1.5286692 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.013831991 = queryNorm
              0.66948277 = fieldWeight in 144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
          0.024360904 = weight(abstract_txt:these in 144) [ClassicSimilarity], result of:
            0.024360904 = score(doc=144,freq=3.0), product of:
              0.07070223 = queryWeight, product of:
                1.6059381 = boost
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.013831991 = queryNorm
              0.34455636 = fieldWeight in 144, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
          0.2897294 = weight(abstract_txt:microblogs in 144) [ClassicSimilarity], result of:
            0.2897294 = score(doc=144,freq=4.0), product of:
              0.24660751 = queryWeight, product of:
                1.8969041 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013831991 = queryNorm
              1.1748604 = fieldWeight in 144, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
          0.48294267 = weight(abstract_txt:microblog in 144) [ClassicSimilarity], result of:
            0.48294267 = score(doc=144,freq=2.0), product of:
              0.592832 = queryWeight, product of:
                4.6502686 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013831991 = queryNorm
              0.8146366 = fieldWeight in 144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=144)
        0.24 = coord(6/25)
    
  3. Paltoglou, G.: Sentiment-based event detection in Twitter (2016) 0.18
    0.18164498 = sum of:
      0.18164498 = product of:
        0.64873207 = sum of:
          0.017933868 = weight(abstract_txt:analysis in 4010) [ClassicSimilarity], result of:
            0.017933868 = score(doc=4010,freq=2.0), product of:
              0.05565459 = queryWeight, product of:
                1.1036677 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.013831991 = queryNorm
              0.3222352 = fieldWeight in 4010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.018688004 = weight(abstract_txt:time in 4010) [ClassicSimilarity], result of:
            0.018688004 = score(doc=4010,freq=1.0), product of:
              0.07207262 = queryWeight, product of:
                1.2559519 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.013831991 = queryNorm
              0.2592941 = fieldWeight in 4010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.07456549 = weight(abstract_txt:events in 4010) [ClassicSimilarity], result of:
            0.07456549 = score(doc=4010,freq=3.0), product of:
              0.10982034 = queryWeight, product of:
                1.265854 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.013831991 = queryNorm
              0.6789771 = fieldWeight in 4010, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.07581708 = weight(abstract_txt:tweets in 4010) [ClassicSimilarity], result of:
            0.07581708 = score(doc=4010,freq=1.0), product of:
              0.16015579 = queryWeight, product of:
                1.5286692 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.013831991 = queryNorm
              0.47339582 = fieldWeight in 4010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.09746315 = weight(abstract_txt:sensitivity in 4010) [ClassicSimilarity], result of:
            0.09746315 = score(doc=4010,freq=1.0), product of:
              0.18934691 = queryWeight, product of:
                1.6621543 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.013831991 = queryNorm
              0.51473325 = fieldWeight in 4010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.15341066 = weight(abstract_txt:detection in 4010) [ClassicSimilarity], result of:
            0.15341066 = score(doc=4010,freq=2.0), product of:
              0.25621328 = queryWeight, product of:
                2.7343748 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.013831991 = queryNorm
              0.59876156 = fieldWeight in 4010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
          0.21085382 = weight(abstract_txt:event in 4010) [ClassicSimilarity], result of:
            0.21085382 = score(doc=4010,freq=2.0), product of:
              0.3411842 = queryWeight, product of:
                3.527822 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.013831991 = queryNorm
              0.6180058 = fieldWeight in 4010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.0625 = fieldNorm(doc=4010)
        0.28 = coord(7/25)
    
  4. Kim, H.H.; Kim, Y.H.: ERP/MMR algorithm for classifying topic-relevant and topic-irrelevant visual shots of documentary videos (2019) 0.17
    0.17332275 = sum of:
      0.17332275 = product of:
        0.61900985 = sum of:
          0.012681159 = weight(abstract_txt:analysis in 358) [ClassicSimilarity], result of:
            0.012681159 = score(doc=358,freq=1.0), product of:
              0.05565459 = queryWeight, product of:
                1.1036677 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.013831991 = queryNorm
              0.2278547 = fieldWeight in 358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.043790087 = weight(abstract_txt:significance in 358) [ClassicSimilarity], result of:
            0.043790087 = score(doc=358,freq=1.0), product of:
              0.11107469 = queryWeight, product of:
                1.2730627 = boost
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.013831991 = queryNorm
              0.39424 = fieldWeight in 358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.045805212 = weight(abstract_txt:diversity in 358) [ClassicSimilarity], result of:
            0.045805212 = score(doc=358,freq=1.0), product of:
              0.1144567 = queryWeight, product of:
                1.2922984 = boost
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.013831991 = queryNorm
              0.40019688 = fieldWeight in 358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.036215574 = weight(abstract_txt:proposed in 358) [ClassicSimilarity], result of:
            0.036215574 = score(doc=358,freq=2.0), product of:
              0.08891641 = queryWeight, product of:
                1.3950149 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.013831991 = queryNorm
              0.4072991 = fieldWeight in 358, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.10847772 = weight(abstract_txt:detection in 358) [ClassicSimilarity], result of:
            0.10847772 = score(doc=358,freq=1.0), product of:
              0.25621328 = queryWeight, product of:
                2.7343748 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.013831991 = queryNorm
              0.42338836 = fieldWeight in 358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.14909616 = weight(abstract_txt:event in 358) [ClassicSimilarity], result of:
            0.14909616 = score(doc=358,freq=1.0), product of:
              0.3411842 = queryWeight, product of:
                3.527822 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.013831991 = queryNorm
              0.4369961 = fieldWeight in 358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
          0.22294393 = weight(abstract_txt:topic in 358) [ClassicSimilarity], result of:
            0.22294393 = score(doc=358,freq=8.0), product of:
              0.24954817 = queryWeight, product of:
                3.5698786 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013831991 = queryNorm
              0.89339036 = fieldWeight in 358, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=358)
        0.28 = coord(7/25)
    
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.14
    0.1416986 = sum of:
      0.1416986 = product of:
        0.5904108 = sum of:
          0.027883196 = weight(abstract_txt:address in 1051) [ClassicSimilarity], result of:
            0.027883196 = score(doc=1051,freq=1.0), product of:
              0.08221031 = queryWeight, product of:
                1.0952301 = boost
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.013831991 = queryNorm
              0.33916909 = fieldWeight in 1051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.018688004 = weight(abstract_txt:time in 1051) [ClassicSimilarity], result of:
            0.018688004 = score(doc=1051,freq=1.0), product of:
              0.07207262 = queryWeight, product of:
                1.2559519 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.013831991 = queryNorm
              0.2592941 = fieldWeight in 1051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.043050412 = weight(abstract_txt:events in 1051) [ClassicSimilarity], result of:
            0.043050412 = score(doc=1051,freq=1.0), product of:
              0.10982034 = queryWeight, product of:
                1.265854 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.013831991 = queryNorm
              0.39200762 = fieldWeight in 1051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.15341066 = weight(abstract_txt:detection in 1051) [ClassicSimilarity], result of:
            0.15341066 = score(doc=1051,freq=2.0), product of:
              0.25621328 = queryWeight, product of:
                2.7343748 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.013831991 = queryNorm
              0.59876156 = fieldWeight in 1051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.21085382 = weight(abstract_txt:event in 1051) [ClassicSimilarity], result of:
            0.21085382 = score(doc=1051,freq=2.0), product of:
              0.3411842 = queryWeight, product of:
                3.527822 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.013831991 = queryNorm
              0.6180058 = fieldWeight in 1051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.1365247 = weight(abstract_txt:topic in 1051) [ClassicSimilarity], result of:
            0.1365247 = score(doc=1051,freq=3.0), product of:
              0.24954817 = queryWeight, product of:
                3.5698786 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013831991 = queryNorm
              0.5470876 = fieldWeight in 1051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
        0.24 = coord(6/25)
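
The content scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by coord (matched query terms over total query terms). The sketch below recomputes the first content entry (doc 455) under the same assumed ClassicSimilarity definitions; the boosts, frequencies, queryNorm, fieldNorm and coord are copied from the listing, while the function and variable names are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.013831991   # queryNorm from the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=455)
COORD = 6 / 25             # 6 of 25 query terms matched

# (term, boost, freq in doc 455, docFreq), all taken from the explain tree.
TERMS = [
    ("discussion", 1.0371695, 1.0, 707),
    ("analysis",   1.1036677, 1.0, 3151),
    ("time",       1.2559519, 1.0, 1905),
    ("these",      1.6059381, 1.0, 5006),
    ("microblogs", 1.8969041, 3.0, 9),
    ("microblog",  4.6502686, 5.0, 11),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


def document_score() -> float:
    """Sum of the term scores, scaled by the coordination factor."""
    return COORD * sum(term_score(b, f, df) for _, b, f, df in TERMS)


# The listing reports 0.26007023 for this entry.
print(document_score())
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "microblog" (docFreq=11) dominate the sum, while common terms such as "these" contribute very little.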