Document (#39931)

Author
Daudaravicius, V.
Title
A framework for keyphrase extraction from scientific journals
Source
http://cs.unibo.it/save-sd/2016/papers/pdf/daudaravicius-savesd2016.pdf
Year
2016
Abstract
We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.
Content
Talk, "Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop co-located with the 25th International World Wide Web Conference April 11, 2016 - Montreal, Canada", Montreal 2016.
Theme
Automatic indexing
Field
Astronomy
Mathematics
Physics
Computer science

Similar documents (content)

  1. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.42
    0.4168362 = sum of:
      0.4168362 = product of:
        1.4887007 = sum of:
          0.050610803 = weight(abstract_txt:manually in 1601) [ClassicSimilarity], result of:
            0.050610803 = score(doc=1601,freq=1.0), product of:
              0.12185791 = queryWeight, product of:
                1.1497144 = boost
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.015949765 = queryNorm
              0.41532636 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.07140084 = weight(abstract_txt:specificity in 1601) [ClassicSimilarity], result of:
            0.07140084 = score(doc=1601,freq=1.0), product of:
              0.15328294 = queryWeight, product of:
                1.2894664 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015949765 = queryNorm
              0.46581078 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.07467032 = weight(abstract_txt:keywords in 1601) [ClassicSimilarity], result of:
            0.07467032 = score(doc=1601,freq=1.0), product of:
              0.1989758 = queryWeight, product of:
                2.077678 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.015949765 = queryNorm
              0.37527338 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.0803298 = weight(abstract_txt:extracted in 1601) [ClassicSimilarity], result of:
            0.0803298 = score(doc=1601,freq=1.0), product of:
              0.20890686 = queryWeight, product of:
                2.128896 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.015949765 = queryNorm
              0.38452446 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.081894405 = weight(abstract_txt:extraction in 1601) [ClassicSimilarity], result of:
            0.081894405 = score(doc=1601,freq=1.0), product of:
              0.21161075 = queryWeight, product of:
                2.142629 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.015949765 = queryNorm
              0.38700494 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.37205058 = weight(abstract_txt:keyphrase in 1601) [ClassicSimilarity], result of:
            0.37205058 = score(doc=1601,freq=2.0), product of:
              0.46070853 = queryWeight, product of:
                3.1614857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015949765 = queryNorm
              0.80756176 = fieldWeight in 1601, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.75774395 = weight(abstract_txt:keyphrases in 1601) [ClassicSimilarity], result of:
            0.75774395 = score(doc=1601,freq=7.0), product of:
              0.48754784 = queryWeight, product of:
                3.252271 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015949765 = queryNorm
              1.5541941 = fieldWeight in 1601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
        0.28 = coord(7/25)
    
  2. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.28
    0.28189695 = sum of:
      0.28189695 = product of:
        1.4094847 = sum of:
          0.03986318 = weight(abstract_txt:assigned in 290) [ClassicSimilarity], result of:
            0.03986318 = score(doc=290,freq=1.0), product of:
              0.10392967 = queryWeight, product of:
                1.0617759 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.015949765 = queryNorm
              0.3835592 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.050610803 = weight(abstract_txt:manually in 290) [ClassicSimilarity], result of:
            0.050610803 = score(doc=290,freq=1.0), product of:
              0.12185791 = queryWeight, product of:
                1.1497144 = boost
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.015949765 = queryNorm
              0.41532636 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6452217 = idf(docFreq=156, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.10559978 = weight(abstract_txt:keywords in 290) [ClassicSimilarity], result of:
            0.10559978 = score(doc=290,freq=2.0), product of:
              0.1989758 = queryWeight, product of:
                2.077678 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.015949765 = queryNorm
              0.5307167 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.45566705 = weight(abstract_txt:keyphrase in 290) [ClassicSimilarity], result of:
            0.45566705 = score(doc=290,freq=3.0), product of:
              0.46070853 = queryWeight, product of:
                3.1614857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015949765 = queryNorm
              0.9890571 = fieldWeight in 290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.75774395 = weight(abstract_txt:keyphrases in 290) [ClassicSimilarity], result of:
            0.75774395 = score(doc=290,freq=7.0), product of:
              0.48754784 = queryWeight, product of:
                3.252271 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015949765 = queryNorm
              1.5541941 = fieldWeight in 290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
        0.2 = coord(5/25)
    
  3. Zhang, Y.; Zhang, C.: Enhancing keyphrase extraction from microblogs using human reading time (2021) 0.26
    0.26390216 = sum of:
      0.26390216 = product of:
        1.0995923 = sum of:
          0.049483716 = weight(abstract_txt:extract in 1238) [ClassicSimilarity], result of:
            0.049483716 = score(doc=1238,freq=1.0), product of:
              0.120041974 = queryWeight, product of:
                1.1411157 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.015949765 = queryNorm
              0.41222012 = fieldWeight in 1238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.015374268 = weight(abstract_txt:from in 1238) [ClassicSimilarity], result of:
            0.015374268 = score(doc=1238,freq=2.0), product of:
              0.063035466 = queryWeight, product of:
                1.4322413 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015949765 = queryNorm
              0.2438987 = fieldWeight in 1238, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.0803298 = weight(abstract_txt:extracted in 1238) [ClassicSimilarity], result of:
            0.0803298 = score(doc=1238,freq=1.0), product of:
              0.20890686 = queryWeight, product of:
                2.128896 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.015949765 = queryNorm
              0.38452446 = fieldWeight in 1238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.14184527 = weight(abstract_txt:extraction in 1238) [ClassicSimilarity], result of:
            0.14184527 = score(doc=1238,freq=3.0), product of:
              0.21161075 = queryWeight, product of:
                2.142629 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.015949765 = queryNorm
              0.6703122 = fieldWeight in 1238, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.526159 = weight(abstract_txt:keyphrase in 1238) [ClassicSimilarity], result of:
            0.526159 = score(doc=1238,freq=4.0), product of:
              0.46070853 = queryWeight, product of:
                3.1614857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015949765 = queryNorm
              1.1420648 = fieldWeight in 1238, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
          0.28640032 = weight(abstract_txt:keyphrases in 1238) [ClassicSimilarity], result of:
            0.28640032 = score(doc=1238,freq=1.0), product of:
              0.48754784 = queryWeight, product of:
                3.252271 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015949765 = queryNorm
              0.5874302 = fieldWeight in 1238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=1238)
        0.24 = coord(6/25)
    
  4. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.25
    0.25375172 = sum of:
      0.25375172 = product of:
        1.2687585 = sum of:
          0.01087125 = weight(abstract_txt:from in 3106) [ClassicSimilarity], result of:
            0.01087125 = score(doc=3106,freq=1.0), product of:
              0.063035466 = queryWeight, product of:
                1.4322413 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015949765 = queryNorm
              0.17246243 = fieldWeight in 3106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.11360349 = weight(abstract_txt:extracted in 3106) [ClassicSimilarity], result of:
            0.11360349 = score(doc=3106,freq=2.0), product of:
              0.20890686 = queryWeight, product of:
                2.128896 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.015949765 = queryNorm
              0.5437997 = fieldWeight in 3106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.115816176 = weight(abstract_txt:extraction in 3106) [ClassicSimilarity], result of:
            0.115816176 = score(doc=3106,freq=2.0), product of:
              0.21161075 = queryWeight, product of:
                2.142629 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.015949765 = queryNorm
              0.5473076 = fieldWeight in 3106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.45566705 = weight(abstract_txt:keyphrase in 3106) [ClassicSimilarity], result of:
            0.45566705 = score(doc=3106,freq=3.0), product of:
              0.46070853 = queryWeight, product of:
                3.1614857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015949765 = queryNorm
              0.9890571 = fieldWeight in 3106, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.57280064 = weight(abstract_txt:keyphrases in 3106) [ClassicSimilarity], result of:
            0.57280064 = score(doc=3106,freq=4.0), product of:
              0.48754784 = queryWeight, product of:
                3.252271 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015949765 = queryNorm
              1.1748604 = fieldWeight in 3106, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
        0.2 = coord(5/25)
    
  5. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 0.20
    0.20331699 = sum of:
      0.20331699 = product of:
        1.2707312 = sum of:
          0.061854642 = weight(abstract_txt:extract in 665) [ClassicSimilarity], result of:
            0.061854642 = score(doc=665,freq=1.0), product of:
              0.120041974 = queryWeight, product of:
                1.1411157 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.015949765 = queryNorm
              0.5152751 = fieldWeight in 665, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
          0.019217836 = weight(abstract_txt:from in 665) [ClassicSimilarity], result of:
            0.019217836 = score(doc=665,freq=2.0), product of:
              0.063035466 = queryWeight, product of:
                1.4322413 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015949765 = queryNorm
              0.30487338 = fieldWeight in 665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
          0.56958383 = weight(abstract_txt:keyphrase in 665) [ClassicSimilarity], result of:
            0.56958383 = score(doc=665,freq=3.0), product of:
              0.46070853 = queryWeight, product of:
                3.1614857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015949765 = queryNorm
              1.2363214 = fieldWeight in 665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
          0.62007487 = weight(abstract_txt:keyphrases in 665) [ClassicSimilarity], result of:
            0.62007487 = score(doc=665,freq=3.0), product of:
              0.48754784 = queryWeight, product of:
                3.252271 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015949765 = queryNorm
              1.2718236 = fieldWeight in 665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
        0.16 = coord(4/25)
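
The score explanations above have the shape of Lucene's ClassicSimilarity (TF-IDF) output: each matching term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched terms out of 25 query terms). The Python sketch below recomputes the total for hit 5 (doc 665) from the figures printed in its explanation. The function names are hypothetical, the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that agree with the printed values, and boost, queryNorm, and fieldNorm are copied from the listing rather than derived. Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in every idf line
QUERY_NORM = 0.015949765  # queryNorm, as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed classic TF-IDF form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """One 'weight(...)' entry: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(term_scores, matched, query_terms):
    """Sum of the term scores, scaled by coord(matched/query_terms)."""
    return sum(term_scores) * (matched / query_terms)


# Matching terms for doc 665: (freq, docFreq, boost), copied from the listing.
DOC_665_TERMS = {
    "extract": (1.0, 164, 1.1411157),
    "from": (2.0, 7646, 1.4322413),
    "keyphrase": (3.0, 12, 3.1614857),
    "keyphrases": (3.0, 9, 3.252271),
}
FIELD_NORM_665 = 0.078125  # fieldNorm(doc=665), as printed

scores = [
    term_score(freq, doc_freq, boost, FIELD_NORM_665)
    for freq, doc_freq, boost in DOC_665_TERMS.values()
]
total = document_score(scores, matched=len(scores), query_terms=25)
print(f"doc 665 total: {total:.4f}")
```

The result can be compared against the 0.20331699 reported for hit 5. The same functions apply to the other hits once their term tuples and fieldNorm (0.0625 for hits 1 to 4) are substituted.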