Document (#39931)

Author
Daudaravicius, V.
Title
A framework for keyphrase extraction from scientific journals
Source
http://cs.unibo.it/save-sd/2016/papers/pdf/daudaravicius-savesd2016.pdf
Year
2016
Abstract
We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.
Content
Presentation at the workshop "Semantics, Analytics, Visualisation: Enhancing Scholarly Data", co-located with the 25th International World Wide Web Conference, April 11, 2016, Montreal, Canada.
Theme
Automatic indexing
Field
Astronomy
Mathematics
Physics
Computer science

Similar documents (content)

  1. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.42
    0.4163391 = sum of:
      0.4163391 = product of:
        1.4869254 = sum of:
          0.05063131 = weight(abstract_txt:manually in 601) [ClassicSimilarity], result of:
            0.05063131 = score(doc=601,freq=1.0), product of:
              0.1218741 = queryWeight, product of:
                1.1439131 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.016028417 = queryNorm
              0.41543946 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.07123994 = weight(abstract_txt:specificity in 601) [ClassicSimilarity], result of:
            0.07123994 = score(doc=601,freq=1.0), product of:
              0.15303156 = queryWeight, product of:
                1.2818223 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016028417 = queryNorm
              0.4655245 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.07497323 = weight(abstract_txt:keywords in 601) [ClassicSimilarity], result of:
            0.07497323 = score(doc=601,freq=1.0), product of:
              0.19948618 = queryWeight, product of:
                2.0697064 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.016028417 = queryNorm
              0.37583172 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.08057747 = weight(abstract_txt:extracted in 601) [ClassicSimilarity], result of:
            0.08057747 = score(doc=601,freq=1.0), product of:
              0.20930731 = queryWeight, product of:
                2.1200423 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.016028417 = queryNorm
              0.38497207 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.08183994 = weight(abstract_txt:extraction in 601) [ClassicSimilarity], result of:
            0.08183994 = score(doc=601,freq=1.0), product of:
              0.21148789 = queryWeight, product of:
                2.131057 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.016028417 = queryNorm
              0.38697222 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.37133837 = weight(abstract_txt:keyphrase in 601) [ClassicSimilarity], result of:
            0.37133837 = score(doc=601,freq=2.0), product of:
              0.46005723 = queryWeight, product of:
                3.1431005 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.016028417 = queryNorm
              0.8071569 = fieldWeight in 601, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.7563251 = weight(abstract_txt:keyphrases in 601) [ClassicSimilarity], result of:
            0.7563251 = score(doc=601,freq=7.0), product of:
              0.48687223 = queryWeight, product of:
                3.233403 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.016028417 = queryNorm
              1.5534366 = fieldWeight in 601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
        0.28 = coord(7/25)
    
  2. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.28
    0.28153735 = sum of:
      0.28153735 = product of:
        1.4076867 = sum of:
          0.039907414 = weight(abstract_txt:assigned in 5290) [ClassicSimilarity], result of:
            0.039907414 = score(doc=5290,freq=1.0), product of:
              0.10399226 = queryWeight, product of:
                1.0566663 = boost
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.016028417 = queryNorm
              0.3837537 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.05063131 = weight(abstract_txt:manually in 5290) [ClassicSimilarity], result of:
            0.05063131 = score(doc=5290,freq=1.0), product of:
              0.1218741 = queryWeight, product of:
                1.1439131 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.016028417 = queryNorm
              0.41543946 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.10602816 = weight(abstract_txt:keywords in 5290) [ClassicSimilarity], result of:
            0.10602816 = score(doc=5290,freq=2.0), product of:
              0.19948618 = queryWeight, product of:
                2.0697064 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.016028417 = queryNorm
              0.5315063 = fieldWeight in 5290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.45479476 = weight(abstract_txt:keyphrase in 5290) [ClassicSimilarity], result of:
            0.45479476 = score(doc=5290,freq=3.0), product of:
              0.46005723 = queryWeight, product of:
                3.1431005 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.016028417 = queryNorm
              0.9885613 = fieldWeight in 5290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.7563251 = weight(abstract_txt:keyphrases in 5290) [ClassicSimilarity], result of:
            0.7563251 = score(doc=5290,freq=7.0), product of:
              0.48687223 = queryWeight, product of:
                3.233403 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.016028417 = queryNorm
              1.5534366 = fieldWeight in 5290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
        0.2 = coord(5/25)
    
  3. Zhang, Y.; Zhang, C.: Enhancing keyphrase extraction from microblogs using human reading time (2021) 0.26
    0.26358822 = sum of:
      0.26358822 = product of:
        1.0982842 = sum of:
          0.04949707 = weight(abstract_txt:extract in 237) [ClassicSimilarity], result of:
            0.04949707 = score(doc=237,freq=1.0), product of:
              0.12004709 = queryWeight, product of:
                1.1353066 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.016028417 = queryNorm
              0.4123138 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
          0.015442977 = weight(abstract_txt:from in 237) [ClassicSimilarity], result of:
            0.015442977 = score(doc=237,freq=2.0), product of:
              0.06321446 = queryWeight, product of:
                1.4269415 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016028417 = queryNorm
              0.24429502 = fieldWeight in 237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
          0.08057747 = weight(abstract_txt:extracted in 237) [ClassicSimilarity], result of:
            0.08057747 = score(doc=237,freq=1.0), product of:
              0.20930731 = queryWeight, product of:
                2.1200423 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.016028417 = queryNorm
              0.38497207 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
          0.14175093 = weight(abstract_txt:extraction in 237) [ClassicSimilarity], result of:
            0.14175093 = score(doc=237,freq=3.0), product of:
              0.21148789 = queryWeight, product of:
                2.131057 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.016028417 = queryNorm
              0.67025554 = fieldWeight in 237, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
          0.5251518 = weight(abstract_txt:keyphrase in 237) [ClassicSimilarity], result of:
            0.5251518 = score(doc=237,freq=4.0), product of:
              0.46005723 = queryWeight, product of:
                3.1431005 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.016028417 = queryNorm
              1.1414922 = fieldWeight in 237, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
          0.28586406 = weight(abstract_txt:keyphrases in 237) [ClassicSimilarity], result of:
            0.28586406 = score(doc=237,freq=1.0), product of:
              0.48687223 = queryWeight, product of:
                3.233403 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.016028417 = queryNorm
              0.5871439 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=237)
        0.24 = coord(6/25)
    
  4. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.25
    0.25342712 = sum of:
      0.25342712 = product of:
        1.2671356 = sum of:
          0.010919834 = weight(abstract_txt:from in 2106) [ClassicSimilarity], result of:
            0.010919834 = score(doc=2106,freq=1.0), product of:
              0.06321446 = queryWeight, product of:
                1.4269415 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016028417 = queryNorm
              0.17274266 = fieldWeight in 2106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.11395375 = weight(abstract_txt:extracted in 2106) [ClassicSimilarity], result of:
            0.11395375 = score(doc=2106,freq=2.0), product of:
              0.20930731 = queryWeight, product of:
                2.1200423 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.016028417 = queryNorm
              0.5444327 = fieldWeight in 2106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.11573915 = weight(abstract_txt:extraction in 2106) [ClassicSimilarity], result of:
            0.11573915 = score(doc=2106,freq=2.0), product of:
              0.21148789 = queryWeight, product of:
                2.131057 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.016028417 = queryNorm
              0.54726136 = fieldWeight in 2106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.45479476 = weight(abstract_txt:keyphrase in 2106) [ClassicSimilarity], result of:
            0.45479476 = score(doc=2106,freq=3.0), product of:
              0.46005723 = queryWeight, product of:
                3.1431005 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.016028417 = queryNorm
              0.9885613 = fieldWeight in 2106, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.5717281 = weight(abstract_txt:keyphrases in 2106) [ClassicSimilarity], result of:
            0.5717281 = score(doc=2106,freq=4.0), product of:
              0.48687223 = queryWeight, product of:
                3.233403 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.016028417 = queryNorm
              1.1742878 = fieldWeight in 2106, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
        0.2 = coord(5/25)
    
  5. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 0.20
    0.20297317 = sum of:
      0.20297317 = product of:
        1.2685823 = sum of:
          0.06187134 = weight(abstract_txt:extract in 4665) [ClassicSimilarity], result of:
            0.06187134 = score(doc=4665,freq=1.0), product of:
              0.12004709 = queryWeight, product of:
                1.1353066 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.016028417 = queryNorm
              0.51539224 = fieldWeight in 4665, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
          0.019303722 = weight(abstract_txt:from in 4665) [ClassicSimilarity], result of:
            0.019303722 = score(doc=4665,freq=2.0), product of:
              0.06321446 = queryWeight, product of:
                1.4269415 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016028417 = queryNorm
              0.30536878 = fieldWeight in 4665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
          0.5684934 = weight(abstract_txt:keyphrase in 4665) [ClassicSimilarity], result of:
            0.5684934 = score(doc=4665,freq=3.0), product of:
              0.46005723 = queryWeight, product of:
                3.1431005 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.016028417 = queryNorm
              1.2357016 = fieldWeight in 4665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
          0.6189138 = weight(abstract_txt:keyphrases in 4665) [ClassicSimilarity], result of:
            0.6189138 = score(doc=4665,freq=3.0), product of:
              0.48687223 = queryWeight, product of:
                3.233403 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.016028417 = queryNorm
              1.2712038 = fieldWeight in 4665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.078125 = fieldNorm(doc=4665)
        0.16 = coord(4/25)
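
The score breakdowns above follow the TF-IDF arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the summed term scores are multiplied by coord (matched query terms / total query terms). The following is a minimal sketch that recomputes the score of entry 1 (doc 601) from the values in its listing. The function names and the TERMS_601 table are hypothetical helpers introduced only for this illustration, and fieldNorm is taken from the listing as given rather than derived. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44218
QUERY_NORM = 0.016028417
FIELD_NORM_601 = 0.0625  # fieldNorm(doc=601), taken as given


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """One 'weight(...)' node: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term scores, scaled by coord(matched/total_clauses)."""
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for _term, freq, doc_freq, boost in terms)
    return total * matched / total_clauses


# (term, freq, docFreq, boost) for entry 1, read off the listing.
TERMS_601 = [
    ("manually", 1.0, 155, 1.1439131),
    ("specificity", 1.0, 69, 1.2818223),
    ("keywords", 1.0, 293, 2.0697064),
    ("extracted", 1.0, 253, 2.1200423),
    ("extraction", 1.0, 245, 2.131057),
    ("keyphrase", 2.0, 12, 3.1431005),
    ("keyphrases", 7.0, 9, 3.233403),
]

score_601 = document_score(TERMS_601, FIELD_NORM_601, matched=7, total_clauses=25)
print(f"{score_601:.4f}")  # close to the 0.4163391 reported for entry 1
```

The same functions can be applied to the other entries by substituting their term tables, fieldNorm, and coord values. The rare terms "keyphrase" and "keyphrases" (docFreq 12 and 9) dominate every score because idf enters both queryWeight and fieldWeight.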