Document (#39833)

Author
Martinez-Romo, J.
Araujo, L.
Fernandez, A.D.
Title
SemGraph : extracting keyphrases following a novel semantic graph-based approach
Source
Journal of the Association for Information Science and Technology. 67(2016) no.1, pp.71-82
Year
2016
Abstract
Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the resulting graph is enriched with information from WordNet. We used the most recent standardized benchmark to evaluate the system's ability to detect the keyphrases that are part of the text. The result is a method that achieves improvements of 5.3% and 7.28% in F-measure over the two labeled sets of keyphrases used in the SemEval-2010 evaluation.
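The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the null model, so the sketch assumes one possible choice: words are placed in documents at random, and the number of documents shared by two words follows a hypergeometric distribution. The function names, the `alpha` threshold, and the toy documents are hypothetical, and the WordNet enrichment step is not covered.

```python
from itertools import combinations
from math import comb


def cooccurrence_p_value(n_docs, n_a, n_b, k):
    """P(X >= k) for X ~ Hypergeometric: the chance that two words with
    document frequencies n_a and n_b share at least k documents when they
    are placed at random (assumed null model, not taken from the article)."""
    total = comb(n_docs, n_b)
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, j) * comb(n_docs - n_a, n_b - j)
               for j in range(k, upper + 1))
    return tail / total


def build_graph(docs, alpha=0.05):
    """Link two words only if their co-occurrence is significant at alpha."""
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)

    # Document frequency of each word.
    df = {}
    for words in doc_sets:
        for w in words:
            df[w] = df.get(w, 0) + 1

    # Number of documents in which each word pair co-occurs.
    co = {}
    for words in doc_sets:
        for pair in combinations(sorted(words), 2):
            co[pair] = co.get(pair, 0) + 1

    # Keep only the pairs whose co-occurrence is unlikely under the null model.
    edges = {}
    for (a, b), k in co.items():
        p = cooccurrence_p_value(n_docs, df[a], df[b], k)
        if p <= alpha:
            edges[(a, b)] = p
    return edges


# Toy collection: "the" occurs everywhere, so its co-occurrences carry no signal.
docs = [["graph", "keyphrase", "the"]] * 4 + [["the", "other"]] * 4
edges = build_graph(docs)
print(edges)  # only ("graph", "keyphrase") survives, with p = 1/70
```

In this toy collection the pair ("graph", "the") co-occurs as often as ("graph", "keyphrase"), but it is discarded because a word present in every document co-occurs with everything by chance.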
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23365/abstract.
Theme
Automatic abstracting
Object
SemGraph

Similar documents (author)

  1. Martinez, H.; Castañeda Romero, G.J.; Fernandez, R.: Knowledge : typology and construction (2024) 2.40
    2.402295 = sum of:
      2.402295 = product of:
        3.6034424 = sum of:
          1.5180783 = weight(author_txt:martinez in 2378) [ClassicSimilarity], result of:
            1.5180783 = score(doc=2378,freq=1.0), product of:
              0.4836996 = queryWeight, product of:
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.05779477 = queryNorm
              3.1384735 = fieldWeight in 2378, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.375 = fieldNorm(doc=2378)
          2.085364 = weight(author_txt:fernandez in 2378) [ClassicSimilarity], result of:
            2.085364 = score(doc=2378,freq=1.0), product of:
              0.59772426 = queryWeight, product of:
                1.1116359 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.05779477 = queryNorm
              3.4888396 = fieldWeight in 2378, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.375 = fieldNorm(doc=2378)
        0.6666667 = coord(2/3)
    
  2. Fernandez, C.W.: Semantic relationships between title phrases and LCSH (1991) 1.16
    1.1585357 = sum of:
      1.1585357 = product of:
        3.475607 = sum of:
          3.475607 = weight(author_txt:fernandez in 634) [ClassicSimilarity], result of:
            3.475607 = score(doc=634,freq=1.0), product of:
              0.59772426 = queryWeight, product of:
                1.1116359 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.05779477 = queryNorm
              5.814733 = fieldWeight in 634, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.625 = fieldNorm(doc=634)
        0.33333334 = coord(1/3)
    
  3. Fernandez, F.S.; Moreno, A.G.: History of information science in Spain : a selected bibliography (1997) 0.93
    0.92682856 = sum of:
      0.92682856 = product of:
        2.7804856 = sum of:
          2.7804856 = weight(author_txt:fernandez in 1052) [ClassicSimilarity], result of:
            2.7804856 = score(doc=1052,freq=1.0), product of:
              0.59772426 = queryWeight, product of:
                1.1116359 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.05779477 = queryNorm
              4.6517863 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.5 = fieldNorm(doc=1052)
        0.33333334 = coord(1/3)
    
  4. Novaes, M. de Araujo => Araujo Novaes, M. de: 0.91
    0.9062432 = sum of:
      0.9062432 = product of:
        2.7187295 = sum of:
          2.7187295 = weight(author_txt:araujo in 819) [ClassicSimilarity], result of:
            2.7187295 = score(doc=819,freq=2.0), product of:
              0.63934374 = queryWeight, product of:
                1.1496862 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.05779477 = queryNorm
              4.252375 = fieldWeight in 819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.3125 = fieldNorm(doc=819)
        0.33333334 = coord(1/3)
    
  5. Araujo, A. de Freitas => Freitas Araujo, A. de: 0.91
    0.9062432 = sum of:
      0.9062432 = product of:
        2.7187295 = sum of:
          2.7187295 = weight(author_txt:araujo in 885) [ClassicSimilarity], result of:
            2.7187295 = score(doc=885,freq=2.0), product of:
              0.63934374 = queryWeight, product of:
                1.1496862 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.05779477 = queryNorm
              4.252375 = fieldWeight in 885, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.3125 = fieldNorm(doc=885)
        0.33333334 = coord(1/3)
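
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the summed term scores are scaled by coord(matched/total). As a rough check, the following sketch recomputes the score of the first author hit (doc 2378) from the values printed in its tree. The function names are illustrative and are not part of Lucene's API.

```python
from math import log, sqrt

MAX_DOCS = 44421  # maxDocs as printed in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, field_norm, query_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total_clauses):
    """Sum of the term scores scaled by the coordination factor."""
    return sum(term_scores) * matched / total_clauses


# Values copied from the explain tree of doc 2378.
QUERY_NORM = 0.05779477
martinez = term_score(1.0, 27, 0.375, QUERY_NORM)
fernandez = term_score(1.0, 10, 0.375, QUERY_NORM, boost=1.1116359)
score = document_score([martinez, fernandez], matched=2, total_clauses=3)
print(round(martinez, 4), round(fernandez, 4), round(score, 4))
```

Small differences in the last digits are expected, because Lucene computes these values in single precision while the sketch uses Python floats.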
    

Similar documents (content)

  1. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.27
    0.27372906 = sum of:
      0.27372906 = product of:
        1.3686452 = sum of:
          0.020971822 = weight(abstract_txt:text in 290) [ClassicSimilarity], result of:
            0.020971822 = score(doc=290,freq=2.0), product of:
              0.05871715 = queryWeight, product of:
                1.1404697 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012741045 = queryNorm
              0.3571669 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.024184227 = weight(abstract_txt:significant in 290) [ClassicSimilarity], result of:
            0.024184227 = score(doc=290,freq=1.0), product of:
              0.08135277 = queryWeight, product of:
                1.342417 = boost
                4.7564163 = idf(docFreq=1037, maxDocs=44421)
                0.012741045 = queryNorm
              0.29727602 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7564163 = idf(docFreq=1037, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.05901714 = weight(abstract_txt:algorithm in 290) [ClassicSimilarity], result of:
            0.05901714 = score(doc=290,freq=2.0), product of:
              0.11703784 = queryWeight, product of:
                1.6101428 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012741045 = queryNorm
              0.5042569 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.030201372 = weight(abstract_txt:semantic in 290) [ClassicSimilarity], result of:
            0.030201372 = score(doc=290,freq=1.0), product of:
              0.10799386 = queryWeight, product of:
                1.8942899 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.012741045 = queryNorm
              0.27965823 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          1.2342707 = weight(abstract_txt:keyphrases in 290) [ClassicSimilarity], result of:
            1.2342707 = score(doc=290,freq=7.0), product of:
              0.79415476 = queryWeight, product of:
                6.6316843 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012741045 = queryNorm
              1.5541941 = fieldWeight in 290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
        0.2 = coord(5/25)
    
  2. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.22
    0.21677983 = sum of:
      0.21677983 = product of:
        1.0838991 = sum of:
          0.014829318 = weight(abstract_txt:text in 2014) [ClassicSimilarity], result of:
            0.014829318 = score(doc=2014,freq=1.0), product of:
              0.05871715 = queryWeight, product of:
                1.1404697 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012741045 = queryNorm
              0.25255513 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.022086734 = weight(abstract_txt:main in 2014) [ClassicSimilarity], result of:
            0.022086734 = score(doc=2014,freq=1.0), product of:
              0.07657821 = queryWeight, product of:
                1.3024284 = boost
                4.61473 = idf(docFreq=1195, maxDocs=44421)
                0.012741045 = queryNorm
              0.28842062 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.61473 = idf(docFreq=1195, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.04271119 = weight(abstract_txt:semantic in 2014) [ClassicSimilarity], result of:
            0.04271119 = score(doc=2014,freq=2.0), product of:
              0.10799386 = queryWeight, product of:
                1.8942899 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.012741045 = queryNorm
              0.39549646 = fieldWeight in 2014, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.07125097 = weight(abstract_txt:statistically in 2014) [ClassicSimilarity], result of:
            0.07125097 = score(doc=2014,freq=1.0), product of:
              0.16719042 = queryWeight, product of:
                1.9244515 = boost
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.012741045 = queryNorm
              0.4261666 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.93302095 = weight(abstract_txt:keyphrases in 2014) [ClassicSimilarity], result of:
            0.93302095 = score(doc=2014,freq=4.0), product of:
              0.79415476 = queryWeight, product of:
                6.6316843 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012741045 = queryNorm
              1.1748604 = fieldWeight in 2014, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
        0.2 = coord(5/25)
    
  3. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.16
    0.15775974 = sum of:
      0.15775974 = product of:
        1.3146645 = sum of:
          0.038662322 = weight(abstract_txt:ability in 1601) [ClassicSimilarity], result of:
            0.038662322 = score(doc=1601,freq=1.0), product of:
              0.11122681 = queryWeight, product of:
                1.5696615 = boost
                5.561583 = idf(docFreq=463, maxDocs=44421)
                0.012741045 = queryNorm
              0.34759894 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.561583 = idf(docFreq=463, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.04173142 = weight(abstract_txt:algorithm in 1601) [ClassicSimilarity], result of:
            0.04173142 = score(doc=1601,freq=1.0), product of:
              0.11703784 = queryWeight, product of:
                1.6101428 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012741045 = queryNorm
              0.35656348 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          1.2342707 = weight(abstract_txt:keyphrases in 1601) [ClassicSimilarity], result of:
            1.2342707 = score(doc=1601,freq=7.0), product of:
              0.79415476 = queryWeight, product of:
                6.6316843 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012741045 = queryNorm
              1.5541941 = fieldWeight in 1601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
        0.12 = coord(3/25)
    
  4. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 0.13
    0.12868975 = sum of:
      0.12868975 = product of:
        1.0724146 = sum of:
          0.026214777 = weight(abstract_txt:text in 665) [ClassicSimilarity], result of:
            0.026214777 = score(doc=665,freq=2.0), product of:
              0.05871715 = queryWeight, product of:
                1.1404697 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012741045 = queryNorm
              0.4464586 = fieldWeight in 665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
          0.036175087 = weight(abstract_txt:method in 665) [ClassicSimilarity], result of:
            0.036175087 = score(doc=665,freq=2.0), product of:
              0.07277919 = queryWeight, product of:
                1.269711 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012741045 = queryNorm
              0.4970526 = fieldWeight in 665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
          1.0100248 = weight(abstract_txt:keyphrases in 665) [ClassicSimilarity], result of:
            1.0100248 = score(doc=665,freq=3.0), product of:
              0.79415476 = queryWeight, product of:
                6.6316843 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012741045 = queryNorm
              1.2718236 = fieldWeight in 665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=665)
        0.12 = coord(3/25)
    
  5. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 0.13
    0.12860961 = sum of:
      0.12860961 = product of:
        0.45932004 = sum of:
          0.049796563 = weight(abstract_txt:unsupervised in 3380) [ClassicSimilarity], result of:
            0.049796563 = score(doc=3380,freq=1.0), product of:
              0.104505815 = queryWeight, product of:
                1.0758618 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.012741045 = queryNorm
              0.47649562 = fieldWeight in 3380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.034201663 = weight(abstract_txt:significant in 3380) [ClassicSimilarity], result of:
            0.034201663 = score(doc=3380,freq=2.0), product of:
              0.08135277 = queryWeight, product of:
                1.342417 = boost
                4.7564163 = idf(docFreq=1037, maxDocs=44421)
                0.012741045 = queryNorm
              0.42041177 = fieldWeight in 3380, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7564163 = idf(docFreq=1037, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.05999332 = weight(abstract_txt:presence in 3380) [ClassicSimilarity], result of:
            0.05999332 = score(doc=3380,freq=1.0), product of:
              0.14908002 = queryWeight, product of:
                1.8172345 = boost
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.012741045 = queryNorm
              0.4024236 = fieldWeight in 3380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.04271119 = weight(abstract_txt:semantic in 3380) [ClassicSimilarity], result of:
            0.04271119 = score(doc=3380,freq=2.0), product of:
              0.10799386 = queryWeight, product of:
                1.8942899 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.012741045 = queryNorm
              0.39549646 = fieldWeight in 3380, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.07125097 = weight(abstract_txt:statistically in 3380) [ClassicSimilarity], result of:
            0.07125097 = score(doc=3380,freq=1.0), product of:
              0.16719042 = queryWeight, product of:
                1.9244515 = boost
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.012741045 = queryNorm
              0.4261666 = fieldWeight in 3380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8186655 = idf(docFreq=131, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.07482376 = weight(abstract_txt:extracting in 3380) [ClassicSimilarity], result of:
            0.07482376 = score(doc=3380,freq=1.0), product of:
              0.17273375 = queryWeight, product of:
                1.9560946 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012741045 = queryNorm
              0.43317392 = fieldWeight in 3380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
          0.12654257 = weight(abstract_txt:graph in 3380) [ClassicSimilarity], result of:
            0.12654257 = score(doc=3380,freq=1.0), product of:
              0.30892423 = queryWeight, product of:
                3.6994932 = boost
                6.553973 = idf(docFreq=171, maxDocs=44421)
                0.012741045 = queryNorm
              0.40962332 = fieldWeight in 3380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.553973 = idf(docFreq=171, maxDocs=44421)
                0.0625 = fieldNorm(doc=3380)
        0.28 = coord(7/25)