Document (#41673)

Author
Zheng, X.
Sun, A.
Title
Collecting event-related tweets from twitter stream
Source
Journal of the Association for Information Science and Technology. 70(2019) no.2, pp.176-186
Year
2019
Abstract
Twitter provides a channel of collecting and publishing instant information on major events like natural disasters. However, information flow on Twitter is of great volume. For a specific event, messages collected from the Twitter Stream based on either location constraint or predefined keywords would contain a lot of noise. In this article, we propose a method to achieve both high-precision and high-recall in collecting event-related tweets. Our method involves an automatic keyword generation component, and an event-related tweet identification component. For keyword generation, we consider three properties of candidate keywords, namely relevance, coverage, and evolvement. The keyword updating mechanism enables our method to track the main topics of tweets along event development. To minimize annotation effort in identifying event-related tweets, we adopt active learning and incorporate multiple-instance learning which assigns labels to bags instead of instances (that is, individual tweets). Through experiments on two real-world events, we demonstrate the superiority of our method against state-of-the-art alternatives.
Content
Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24096.
Theme
Informetrics
Object
Twitter

Similar documents (author)

  1. Hill, L.L.; Zheng, Q.: Indirect geospatial referencing through place names in the digital library : Alexandria digital library experience with developing and implementing gazetteers (1999) 4.33
    4.326182 = sum of:
      4.326182 = weight(author_txt:zheng in 6543) [ClassicSimilarity], result of:
        4.326182 = score(doc=6543,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          4.3261824 = fieldWeight in 6543, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.5 = fieldNorm(doc=6543)
    
  2. Rada, R.; Bird, G.; Zheng, M.: Hypertext interchange using ICA (1995) 3.24
    3.2446365 = sum of:
      3.2446365 = weight(author_txt:zheng in 6806) [ClassicSimilarity], result of:
        3.2446365 = score(doc=6806,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          3.2446368 = fieldWeight in 6806, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.375 = fieldNorm(doc=6806)
    
  3. Rada, R.; Liu, Z.; Zheng, M.: Connecting educational information spaces (1997) 3.24
    3.2446365 = sum of:
      3.2446365 = weight(author_txt:zheng in 313) [ClassicSimilarity], result of:
        3.2446365 = score(doc=313,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          3.2446368 = fieldWeight in 313, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.375 = fieldNorm(doc=313)
    
  4. Hill, L.L.; Frew, J.; Zheng, Q.: Geographic names : the implementation of a gazetteer in a georeferenced digital library (1999) 3.24
    3.2446365 = sum of:
      3.2446365 = weight(author_txt:zheng in 1240) [ClassicSimilarity], result of:
        3.2446365 = score(doc=1240,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          3.2446368 = fieldWeight in 1240, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.375 = fieldNorm(doc=1240)
    
  5. Liu, X.; Zheng, W.; Fang, H.: ¬An exploration of ranking models and feedback method for related entity finding (2013) 3.24
    3.2446365 = sum of:
      3.2446365 = weight(author_txt:zheng in 2714) [ClassicSimilarity], result of:
        3.2446365 = score(doc=2714,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          3.2446368 = fieldWeight in 2714, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.375 = fieldNorm(doc=2714)
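
The score trees above all follow the same pattern: a term's score is the product of a query weight (idf times queryNorm) and a field weight (tf times idf times fieldNorm). A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption that appears consistent with the printed values; it is not taken from the catalog itself. The queryNorm and fieldNorm values are simply copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form; reproduces e.g. 8.652365 for docFreq=20, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # The listing shows tf(freq=2.0) = 1.4142135, i.e. the square root of freq.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # "queryWeight" in the tree
    field_weight = tf(freq) * term_idf * field_norm   # "fieldWeight" in the tree
    return query_weight * field_weight

# Entry 1 (doc=6543): freq=1.0, docFreq=20, fieldNorm=0.5
score_6543 = term_score(1.0, 20, 44218, 0.115575336, 0.5)
# Entry 2 (doc=6806): same term, fieldNorm=0.375
score_6806 = term_score(1.0, 20, 44218, 0.115575336, 0.375)
print(round(score_6543, 2), round(score_6806, 2))
```

Because every author match here has the same term, frequency, and idf, the only factor separating entry 1 from entries 2 to 5 is fieldNorm (0.5 versus 0.375).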
    

Similar documents (content)

  1. Bandaragoda, T.R.; Silva, D. de; Alahakoon, D.: Automatic event detection in microblogs using incremental machine learning (2017) 0.30
    0.30353698 = sum of:
      0.30353698 = product of:
        1.0840607 = sum of:
          0.022856787 = weight(abstract_txt:learning in 3826) [ClassicSimilarity], result of:
            0.022856787 = score(doc=3826,freq=1.0), product of:
              0.076977134 = queryWeight, product of:
                1.2967349 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.012495024 = queryNorm
              0.29692957 = fieldWeight in 3826, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.03796022 = weight(abstract_txt:generation in 3826) [ClassicSimilarity], result of:
            0.03796022 = score(doc=3826,freq=1.0), product of:
              0.107953675 = queryWeight, product of:
                1.5356387 = boost
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.012495024 = queryNorm
              0.35163435 = fieldWeight in 3826, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.07453012 = weight(abstract_txt:events in 3826) [ClassicSimilarity], result of:
            0.07453012 = score(doc=3826,freq=2.0), product of:
              0.13434747 = queryWeight, product of:
                1.7131094 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.012495024 = queryNorm
              0.5547564 = fieldWeight in 3826, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.038872045 = weight(abstract_txt:method in 3826) [ClassicSimilarity], result of:
            0.038872045 = score(doc=3826,freq=1.0), product of:
              0.13818255 = queryWeight, product of:
                2.4570384 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012495024 = queryNorm
              0.28130937 = fieldWeight in 3826, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.14113088 = weight(abstract_txt:twitter in 3826) [ClassicSimilarity], result of:
            0.14113088 = score(doc=3826,freq=1.0), product of:
              0.32641965 = queryWeight, product of:
                3.7763608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012495024 = queryNorm
              0.43236023 = fieldWeight in 3826, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.32899603 = weight(abstract_txt:tweets in 3826) [ClassicSimilarity], result of:
            0.32899603 = score(doc=3826,freq=2.0), product of:
              0.4906616 = queryWeight, product of:
                5.176442 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.012495024 = queryNorm
              0.6705152 = fieldWeight in 3826, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
          0.4397146 = weight(abstract_txt:event in 3826) [ClassicSimilarity], result of:
            0.4397146 = score(doc=3826,freq=4.0), product of:
              0.5021336 = queryWeight, product of:
                5.736416 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.012495024 = queryNorm
              0.8756924 = fieldWeight in 3826, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=3826)
        0.28 = coord(7/25)
    
  2. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.27
    0.2695384 = sum of:
      0.2695384 = product of:
        1.1230767 = sum of:
          0.11412108 = weight(abstract_txt:tweet in 2335) [ClassicSimilarity], result of:
            0.11412108 = score(doc=2335,freq=3.0), product of:
              0.12374997 = queryWeight, product of:
                1.1625935 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.012495024 = queryNorm
              0.9221907 = fieldWeight in 2335, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
          0.022856787 = weight(abstract_txt:learning in 2335) [ClassicSimilarity], result of:
            0.022856787 = score(doc=2335,freq=1.0), product of:
              0.076977134 = queryWeight, product of:
                1.2967349 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.012495024 = queryNorm
              0.29692957 = fieldWeight in 2335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
          0.031690206 = weight(abstract_txt:related in 2335) [ClassicSimilarity], result of:
            0.031690206 = score(doc=2335,freq=1.0), product of:
              0.1205901 = queryWeight, product of:
                2.295309 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.012495024 = queryNorm
              0.26279277 = fieldWeight in 2335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
          0.038872045 = weight(abstract_txt:method in 2335) [ClassicSimilarity], result of:
            0.038872045 = score(doc=2335,freq=1.0), product of:
              0.13818255 = queryWeight, product of:
                2.4570384 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012495024 = queryNorm
              0.28130937 = fieldWeight in 2335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
          0.34569865 = weight(abstract_txt:twitter in 2335) [ClassicSimilarity], result of:
            0.34569865 = score(doc=2335,freq=6.0), product of:
              0.32641965 = queryWeight, product of:
                3.7763608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012495024 = queryNorm
              1.059062 = fieldWeight in 2335, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
          0.5698379 = weight(abstract_txt:tweets in 2335) [ClassicSimilarity], result of:
            0.5698379 = score(doc=2335,freq=6.0), product of:
              0.4906616 = queryWeight, product of:
                5.176442 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.012495024 = queryNorm
              1.1613665 = fieldWeight in 2335, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=2335)
        0.24 = coord(6/25)
    
  3. Cotelo, J.M.; Cruz, F.L.; Troyano, J.A.: Dynamic topic-related tweet retrieval (2014) 0.24
    0.23603265 = sum of:
      0.23603265 = product of:
        0.98346937 = sum of:
          0.07699207 = weight(abstract_txt:instant in 1217) [ClassicSimilarity], result of:
            0.07699207 = score(doc=1217,freq=1.0), product of:
              0.11831295 = queryWeight, product of:
                1.136767 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.012495024 = queryNorm
              0.6507493 = fieldWeight in 1217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
          0.08235979 = weight(abstract_txt:tweet in 1217) [ClassicSimilarity], result of:
            0.08235979 = score(doc=1217,freq=1.0), product of:
              0.12374997 = queryWeight, product of:
                1.1625935 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.012495024 = queryNorm
              0.66553384 = fieldWeight in 1217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
          0.07922552 = weight(abstract_txt:related in 1217) [ClassicSimilarity], result of:
            0.07922552 = score(doc=1217,freq=4.0), product of:
              0.1205901 = queryWeight, product of:
                2.295309 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.012495024 = queryNorm
              0.65698195 = fieldWeight in 1217, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
          0.08416045 = weight(abstract_txt:method in 1217) [ClassicSimilarity], result of:
            0.08416045 = score(doc=1217,freq=3.0), product of:
              0.13818255 = queryWeight, product of:
                2.4570384 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012495024 = queryNorm
              0.60905266 = fieldWeight in 1217, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
          0.2494865 = weight(abstract_txt:twitter in 1217) [ClassicSimilarity], result of:
            0.2494865 = score(doc=1217,freq=2.0), product of:
              0.32641965 = queryWeight, product of:
                3.7763608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012495024 = queryNorm
              0.76431215 = fieldWeight in 1217, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
          0.41124505 = weight(abstract_txt:tweets in 1217) [ClassicSimilarity], result of:
            0.41124505 = score(doc=1217,freq=2.0), product of:
              0.4906616 = queryWeight, product of:
                5.176442 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.012495024 = queryNorm
              0.83814394 = fieldWeight in 1217, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=1217)
        0.24 = coord(6/25)
    
  4. Arakawa, Y.; Kameda, A.; Aizawa, A.; Suzuki, T.: Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets (2014) 0.19
    0.19187047 = sum of:
      0.19187047 = product of:
        0.7994603 = sum of:
          0.06588784 = weight(abstract_txt:tweet in 1307) [ClassicSimilarity], result of:
            0.06588784 = score(doc=1307,freq=1.0), product of:
              0.12374997 = queryWeight, product of:
                1.1625935 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.012495024 = queryNorm
              0.5324271 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.022856787 = weight(abstract_txt:learning in 1307) [ClassicSimilarity], result of:
            0.022856787 = score(doc=1307,freq=1.0), product of:
              0.076977134 = queryWeight, product of:
                1.2967349 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.012495024 = queryNorm
              0.29692957 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.024461562 = weight(abstract_txt:high in 1307) [ClassicSimilarity], result of:
            0.024461562 = score(doc=1307,freq=1.0), product of:
              0.08053928 = queryWeight, product of:
                1.3263991 = boost
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.012495024 = queryNorm
              0.30372214 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.038872045 = weight(abstract_txt:method in 1307) [ClassicSimilarity], result of:
            0.038872045 = score(doc=1307,freq=1.0), product of:
              0.13818255 = queryWeight, product of:
                2.4570384 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012495024 = queryNorm
              0.28130937 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.24444585 = weight(abstract_txt:twitter in 1307) [ClassicSimilarity], result of:
            0.24444585 = score(doc=1307,freq=3.0), product of:
              0.32641965 = queryWeight, product of:
                3.7763608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012495024 = queryNorm
              0.7488699 = fieldWeight in 1307, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.40293622 = weight(abstract_txt:tweets in 1307) [ClassicSimilarity], result of:
            0.40293622 = score(doc=1307,freq=3.0), product of:
              0.4906616 = queryWeight, product of:
                5.176442 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.012495024 = queryNorm
              0.82121 = fieldWeight in 1307, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
        0.24 = coord(6/25)
    
  5. Sedhai, S.; Sun, A.: ¬An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.17
    0.17464687 = sum of:
      0.17464687 = product of:
        1.091543 = sum of:
          0.11412108 = weight(abstract_txt:tweet in 3683) [ClassicSimilarity], result of:
            0.11412108 = score(doc=3683,freq=3.0), product of:
              0.12374997 = queryWeight, product of:
                1.1625935 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.012495024 = queryNorm
              0.9221907 = fieldWeight in 3683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.04634844 = weight(abstract_txt:keywords in 3683) [ClassicSimilarity], result of:
            0.04634844 = score(doc=3683,freq=1.0), product of:
              0.12332232 = queryWeight, product of:
                1.6413121 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.012495024 = queryNorm
              0.37583172 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.31557822 = weight(abstract_txt:twitter in 3683) [ClassicSimilarity], result of:
            0.31557822 = score(doc=3683,freq=5.0), product of:
              0.32641965 = queryWeight, product of:
                3.7763608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.012495024 = queryNorm
              0.96678686 = fieldWeight in 3683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.6154952 = weight(abstract_txt:tweets in 3683) [ClassicSimilarity], result of:
            0.6154952 = score(doc=3683,freq=7.0), product of:
              0.4906616 = queryWeight, product of:
                5.176442 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.012495024 = queryNorm
              1.254419 = fieldWeight in 3683, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
        0.16 = coord(4/25)
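
For the content-based matches, each document's final score is the sum of its per-term weights multiplied by a coordination factor, coord(matched/total), where total is the number of query terms (25 in this listing). The sketch below redoes that last step for entry 5 (doc=3683) using the four term weights printed above. The function name is hypothetical and the code only reproduces the arithmetic shown in the tree.

```python
def combined_score(term_weights, total_query_terms):
    # coord rewards documents that match a larger share of the query terms.
    coord = len(term_weights) / total_query_terms
    return sum(term_weights) * coord

# Term weights for doc=3683: tweet, keywords, twitter, tweets
weights_3683 = [0.11412108, 0.04634844, 0.31557822, 0.6154952]
score_3683 = combined_score(weights_3683, 25)
print(round(score_3683, 2))
```

This is why entry 5 ranks last despite a raw sum (1.091543) higher than entries 1, 3, and 4: it matches only 4 of the 25 query terms, so its coord factor is 0.16 rather than 0.24 or 0.28.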