Document (#36666)

Author
Pirkola, A.
Title
Constructing topic-specific search keyphrase suggestion tools for Web information retrieval
Source
Information und Wissen: global, sozial und frei? Proceedings des 12. Internationalen Symposiums für Informationswissenschaft (ISI 2011) ; Hildesheim, 9-11 March 2011. Ed. by J. Griesbaum, T. Mandl and C. Womser-Hacker
Imprint
Boizenburg : VWH, Verl. W. Hülsbusch
Year
2010
Pages
pp. 172-183
Series
Schriften zur Informationswissenschaft; vol. 58
Abstract
We devised a method to extract keyphrases from Web pages to construct a keyphrase list for a specific topic. Keyphrases are identified, and out-of-topic phrases removed, based on their frequencies in text corpora that differ in the density of text discussing the topic. The list is intended as a search aid for Web information retrieval: the user can browse the list, identify different aspects of the topic, and select keyphrases from it (e.g. synonymous phrases) for a query. A keyphrase list containing a large set of keyphrases related to climate change was constructed using the proposed method. We argue that there is a need for such keyphrase suggestion tools, because the major Web search engines do not provide users with terminological search aids that help them identify different topic aspects and find synonyms.

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:pirkola in 4476) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 4476, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=4476)
    
  2. Pirkola, A.; Järvelin, K.: ¬The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:pirkola in 4088) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 4088, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=4088)
    
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:pirkola in 5907) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 5907, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=5907)
    
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:pirkola in 1074) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 1074, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=1074)
    
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.97
    2.9686446 = sum of:
      2.9686446 = weight(author_txt:pirkola in 3908) [ClassicSimilarity], result of:
        2.9686446 = fieldWeight in 3908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.3125 = fieldNorm(doc=3908)
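
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of the catalog software. The fieldNorm values are taken from the listing as given rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as printed in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm is read off the listing)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # (doc id, fieldNorm) pairs copied from the five author matches above.
    for doc_id, norm in [(4476, 0.625), (4088, 0.5), (5907, 0.5),
                         (1074, 0.375), (3908, 0.3125)]:
        print(doc_id, round(field_weight(1.0, 8, 44218, norm), 4))
```

Because every author match has termFreq=1.0 and the same idf, the ranking among these five entries is determined by fieldNorm alone.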
    

Similar documents (content)

  1. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.36
    0.3641912 = sum of:
      0.3641912 = product of:
        1.5174633 = sum of:
          0.0064551206 = weight(abstract_txt:that in 5290) [ClassicSimilarity], result of:
            0.0064551206 = score(doc=5290,freq=2.0), product of:
              0.030821742 = queryWeight, product of:
                1.0120112 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01285345 = queryNorm
              0.20943399 = fieldWeight in 5290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.021391788 = weight(abstract_txt:text in 5290) [ClassicSimilarity], result of:
            0.021391788 = score(doc=5290,freq=2.0), product of:
              0.059848774 = queryWeight, product of:
                1.1514331 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01285345 = queryNorm
              0.3574307 = fieldWeight in 5290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.022393469 = weight(abstract_txt:search in 5290) [ClassicSimilarity], result of:
            0.022393469 = score(doc=5290,freq=1.0), product of:
              0.09794707 = queryWeight, product of:
                2.083156 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.01285345 = queryNorm
              0.22862828 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.111181505 = weight(abstract_txt:phrases in 5290) [ClassicSimilarity], result of:
            0.111181505 = score(doc=5290,freq=1.0), product of:
              0.25899178 = queryWeight, product of:
                2.9335918 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01285345 = queryNorm
              0.42928585 = fieldWeight in 5290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.7526195 = weight(abstract_txt:keyphrases in 5290) [ClassicSimilarity], result of:
            0.7526195 = score(doc=5290,freq=7.0), product of:
              0.48448676 = queryWeight, product of:
                4.0123396 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01285345 = queryNorm
              1.5534366 = fieldWeight in 5290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
          0.6034219 = weight(abstract_txt:keyphrase in 5290) [ClassicSimilarity], result of:
            0.6034219 = score(doc=5290,freq=3.0), product of:
              0.6104042 = queryWeight, product of:
                5.200377 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01285345 = queryNorm
              0.9885613 = fieldWeight in 5290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=5290)
        0.24 = coord(6/25)
    
  2. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.36
    0.35932317 = sum of:
      0.35932317 = product of:
        1.49718 = sum of:
          0.0045644594 = weight(abstract_txt:that in 1012) [ClassicSimilarity], result of:
            0.0045644594 = score(doc=1012,freq=1.0), product of:
              0.030821742 = queryWeight, product of:
                1.0120112 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01285345 = queryNorm
              0.1480922 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.015126279 = weight(abstract_txt:text in 1012) [ClassicSimilarity], result of:
            0.015126279 = score(doc=1012,freq=1.0), product of:
              0.059848774 = queryWeight, product of:
                1.1514331 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01285345 = queryNorm
              0.25274166 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.01836642 = weight(abstract_txt:specific in 1012) [ClassicSimilarity], result of:
            0.01836642 = score(doc=1012,freq=1.0), product of:
              0.068116166 = queryWeight, product of:
                1.2283897 = boost
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.01285345 = queryNorm
              0.2696338 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.111181505 = weight(abstract_txt:phrases in 1012) [ClassicSimilarity], result of:
            0.111181505 = score(doc=1012,freq=1.0), product of:
              0.25899178 = queryWeight, product of:
                2.9335918 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01285345 = queryNorm
              0.42928585 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.5689269 = weight(abstract_txt:keyphrases in 1012) [ClassicSimilarity], result of:
            0.5689269 = score(doc=1012,freq=4.0), product of:
              0.48448676 = queryWeight, product of:
                4.0123396 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01285345 = queryNorm
              1.1742878 = fieldWeight in 1012, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
          0.7790144 = weight(abstract_txt:keyphrase in 1012) [ClassicSimilarity], result of:
            0.7790144 = score(doc=1012,freq=5.0), product of:
              0.6104042 = queryWeight, product of:
                5.200377 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01285345 = queryNorm
              1.2762271 = fieldWeight in 1012, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=1012)
        0.24 = coord(6/25)
    
  3. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.28
    0.284236 = sum of:
      0.284236 = product of:
        1.0151286 = sum of:
          0.008068901 = weight(abstract_txt:that in 1871) [ClassicSimilarity], result of:
            0.008068901 = score(doc=1871,freq=2.0), product of:
              0.030821742 = queryWeight, product of:
                1.0120112 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01285345 = queryNorm
              0.26179248 = fieldWeight in 1871, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.022958025 = weight(abstract_txt:specific in 1871) [ClassicSimilarity], result of:
            0.022958025 = score(doc=1871,freq=1.0), product of:
              0.068116166 = queryWeight, product of:
                1.2283897 = boost
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.01285345 = queryNorm
              0.33704224 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.02607139 = weight(abstract_txt:method in 1871) [ClassicSimilarity], result of:
            0.02607139 = score(doc=1871,freq=1.0), product of:
              0.07414297 = queryWeight, product of:
                1.281581 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.01285345 = queryNorm
              0.3516367 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.027991837 = weight(abstract_txt:search in 1871) [ClassicSimilarity], result of:
            0.027991837 = score(doc=1871,freq=1.0), product of:
              0.09794707 = queryWeight, product of:
                2.083156 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.01285345 = queryNorm
              0.28578535 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.13897689 = weight(abstract_txt:phrases in 1871) [ClassicSimilarity], result of:
            0.13897689 = score(doc=1871,freq=1.0), product of:
              0.25899178 = queryWeight, product of:
                2.9335918 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01285345 = queryNorm
              0.5366073 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.35557932 = weight(abstract_txt:keyphrases in 1871) [ClassicSimilarity], result of:
            0.35557932 = score(doc=1871,freq=1.0), product of:
              0.48448676 = queryWeight, product of:
                4.0123396 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01285345 = queryNorm
              0.7339299 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
          0.4354823 = weight(abstract_txt:keyphrase in 1871) [ClassicSimilarity], result of:
            0.4354823 = score(doc=1871,freq=1.0), product of:
              0.6104042 = queryWeight, product of:
                5.200377 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01285345 = queryNorm
              0.71343267 = fieldWeight in 1871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.078125 = fieldNorm(doc=1871)
        0.28 = coord(7/25)
    
  4. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.26
    0.26119012 = sum of:
      0.26119012 = product of:
        1.3059505 = sum of:
          0.012910241 = weight(abstract_txt:that in 601) [ClassicSimilarity], result of:
            0.012910241 = score(doc=601,freq=8.0), product of:
              0.030821742 = queryWeight, product of:
                1.0120112 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01285345 = queryNorm
              0.41886798 = fieldWeight in 601, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.02017355 = weight(abstract_txt:tools in 601) [ClassicSimilarity], result of:
            0.02017355 = score(doc=601,freq=1.0), product of:
              0.072514035 = queryWeight, product of:
                1.2674246 = boost
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.01285345 = queryNorm
              0.278202 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.027555227 = weight(abstract_txt:identify in 601) [ClassicSimilarity], result of:
            0.027555227 = score(doc=601,freq=1.0), product of:
              0.08926952 = queryWeight, product of:
                1.4062505 = boost
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.01285345 = queryNorm
              0.30867454 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.7526195 = weight(abstract_txt:keyphrases in 601) [ClassicSimilarity], result of:
            0.7526195 = score(doc=601,freq=7.0), product of:
              0.48448676 = queryWeight, product of:
                4.0123396 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01285345 = queryNorm
              1.5534366 = fieldWeight in 601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
          0.49269196 = weight(abstract_txt:keyphrase in 601) [ClassicSimilarity], result of:
            0.49269196 = score(doc=601,freq=2.0), product of:
              0.6104042 = queryWeight, product of:
                5.200377 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01285345 = queryNorm
              0.8071569 = fieldWeight in 601, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=601)
        0.2 = coord(5/25)
    
  5. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.19
    0.19219157 = sum of:
      0.19219157 = product of:
        1.2011974 = sum of:
          0.0064551206 = weight(abstract_txt:that in 2106) [ClassicSimilarity], result of:
            0.0064551206 = score(doc=2106,freq=2.0), product of:
              0.030821742 = queryWeight, product of:
                1.0120112 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01285345 = queryNorm
              0.20943399 = fieldWeight in 2106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.022393469 = weight(abstract_txt:search in 2106) [ClassicSimilarity], result of:
            0.022393469 = score(doc=2106,freq=1.0), product of:
              0.09794707 = queryWeight, product of:
                2.083156 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.01285345 = queryNorm
              0.22862828 = fieldWeight in 2106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.5689269 = weight(abstract_txt:keyphrases in 2106) [ClassicSimilarity], result of:
            0.5689269 = score(doc=2106,freq=4.0), product of:
              0.48448676 = queryWeight, product of:
                4.0123396 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01285345 = queryNorm
              1.1742878 = fieldWeight in 2106, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
          0.6034219 = weight(abstract_txt:keyphrase in 2106) [ClassicSimilarity], result of:
            0.6034219 = score(doc=2106,freq=3.0), product of:
              0.6104042 = queryWeight, product of:
                5.200377 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01285345 = queryNorm
              0.9885613 = fieldWeight in 2106, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=2106)
        0.16 = coord(4/25)
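
The content-similarity scores combine per-term weights with a coordination factor. The sketch below recomputes the score of the last entry (doc 2106) under the same assumed ClassicSimilarity formulas: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and the sum is multiplied by coord = matched terms / query terms. The boost, queryNorm, and fieldNorm values are copied from the listing; the function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.01285345   # queryNorm from the listing


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float,
               query_norm: float = QUERY_NORM, max_docs: int = MAX_DOCS) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float, matched: int, total: int) -> float:
    """Sum of term scores scaled by the coordination factor matched/total."""
    raw = sum(term_score(boost, df, freq, field_norm) for boost, df, freq in terms)
    return raw * (matched / total)


# (boost, docFreq, termFreq) for the four abstract_txt terms matched in doc 2106.
DOC_2106_TERMS = [
    (1.0120112, 11241, 2.0),  # that
    (2.083156, 3098, 1.0),    # search
    (4.0123396, 9, 4.0),      # keyphrases
    (5.200377, 12, 3.0),      # keyphrase
]

score_2106 = document_score(DOC_2106_TERMS, field_norm=0.0625, matched=4, total=25)

if __name__ == "__main__":
    print(round(score_2106, 4))
```

The rare terms "keyphrases" and "keyphrase" (docFreq 9 and 12) dominate the sum, which is why a frequent word such as "that" barely affects the ranking.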