Document (#36666)

Author
Pirkola, A.
Title
Constructing topic-specific search keyphrase suggestion tools for Web information retrieval
Source
Information und Wissen: global, sozial und frei? Proceedings des 12. Internationalen Symposiums für Informationswissenschaft (ISI 2011) ; Hildesheim, 9. - 11. März 2011. Hrsg.: J. Griesbaum, T. Mandl u. C. Womser-Hacker
Imprint
Boizenburg : VWH, Verl. W. Hülsbusch
Year
2010
Pages
S.172-183
Series
Schriften zur Informationswissenschaft; Bd.58
Abstract
We devised a method to extract keyphrases from the Web pages to construct a keyphrase list for a specific topic. The keyphrases are identified and out-of-topic phrases removed based on their frequencies in the text corpora of various densities of text discussing the topic. The list is intended as a search aid for Web information retrieval, so that the user can browse the list, identify different aspects of the topic, and select from it keyphrases (e.g. find synonymous phrases) for a query. A keyphrase list containing a large set of keyphrases related to climate change was constructed using the proposed method. We argue that there is a need for such keyphrase suggestion tools, because the major Web search engines do not provide users with such terminological search aids that help them identify different topic aspects and find synonyms.

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:pirkola in 5476) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 5476, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=5476)
    
  2. Pirkola, A.; Järvelin, K.: ¬The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:pirkola in 4156) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 4156, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=4156)
    
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:pirkola in 6907) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 6907, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=6907)
    
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.56
    3.5640912 = sum of:
      3.5640912 = weight(author_txt:pirkola in 2074) [ClassicSimilarity], result of:
        3.5640912 = fieldWeight in 2074, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=2074)
    
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.97
    2.9700758 = sum of:
      2.9700758 = weight(author_txt:pirkola in 4908) [ClassicSimilarity], result of:
        2.9700758 = fieldWeight in 4908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.3125 = fieldNorm(doc=4908)
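
The author-match scores above can be reproduced by hand. The sketch below assumes the tf and idf formulas that are consistent with the values shown in these trees (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not Lucene API calls. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that matches the
    # idf(docFreq=..., maxDocs=...) lines of the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in each "product of:" block.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from entry 1 (doc=5476): freq=1.0, docFreq=8, maxDocs=44421, fieldNorm=0.625
idf_pirkola = classic_idf(8, 44421)
score_entry_1 = field_weight(1.0, 8, 44421, 0.625)
print(round(idf_pirkola, 4), round(score_entry_1, 4))
```

Since tf and idf are identical for all five entries, the ranking in this list is determined by fieldNorm alone, which is larger for records with shorter author fields.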
    

Similar documents (content)

  1. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.36
    0.36450186 = sum of:
      0.36450186 = product of:
        1.5187578 = sum of:
          0.006414749 = weight(abstract_txt:that in 290) [ClassicSimilarity], result of:
            0.006414749 = score(doc=290,freq=2.0), product of:
              0.030687777 = queryWeight, product of:
                1.0107516 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01283813 = queryNorm
              0.20903271 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.021333361 = weight(abstract_txt:text in 290) [ClassicSimilarity], result of:
            0.021333361 = score(doc=290,freq=2.0), product of:
              0.05972939 = queryWeight, product of:
                1.1513573 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01283813 = queryNorm
              0.3571669 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.022318477 = weight(abstract_txt:search in 290) [ClassicSimilarity], result of:
            0.022318477 = score(doc=290,freq=1.0), product of:
              0.09771133 = queryWeight, product of:
                2.082589 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01283813 = queryNorm
              0.22841237 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.11134614 = weight(abstract_txt:phrases in 290) [ClassicSimilarity], result of:
            0.11134614 = score(doc=290,freq=1.0), product of:
              0.25920245 = queryWeight, product of:
                2.937523 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.01283813 = queryNorm
              0.4295721 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.75332904 = weight(abstract_txt:keyphrases in 290) [ClassicSimilarity], result of:
            0.75332904 = score(doc=290,freq=7.0), product of:
              0.48470718 = queryWeight, product of:
                4.0169964 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01283813 = queryNorm
              1.5541941 = fieldWeight in 290, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
          0.6040161 = weight(abstract_txt:keyphrase in 290) [ClassicSimilarity], result of:
            0.6040161 = score(doc=290,freq=3.0), product of:
              0.61069894 = queryWeight, product of:
                5.2064857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.01283813 = queryNorm
              0.9890571 = fieldWeight in 290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=290)
        0.24 = coord(6/25)
    
  2. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.36
    0.3596308 = sum of:
      0.3596308 = product of:
        1.4984617 = sum of:
          0.0045359125 = weight(abstract_txt:that in 2014) [ClassicSimilarity], result of:
            0.0045359125 = score(doc=2014,freq=1.0), product of:
              0.030687777 = queryWeight, product of:
                1.0107516 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01283813 = queryNorm
              0.14780845 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.015084964 = weight(abstract_txt:text in 2014) [ClassicSimilarity], result of:
            0.015084964 = score(doc=2014,freq=1.0), product of:
              0.05972939 = queryWeight, product of:
                1.1513573 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01283813 = queryNorm
              0.25255513 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.01824992 = weight(abstract_txt:specific in 2014) [ClassicSimilarity], result of:
            0.01824992 = score(doc=2014,freq=1.0), product of:
              0.06781606 = queryWeight, product of:
                1.2268243 = boost
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.01283813 = queryNorm
              0.26910913 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.11134614 = weight(abstract_txt:phrases in 2014) [ClassicSimilarity], result of:
            0.11134614 = score(doc=2014,freq=1.0), product of:
              0.25920245 = queryWeight, product of:
                2.937523 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.01283813 = queryNorm
              0.4295721 = fieldWeight in 2014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.56946325 = weight(abstract_txt:keyphrases in 2014) [ClassicSimilarity], result of:
            0.56946325 = score(doc=2014,freq=4.0), product of:
              0.48470718 = queryWeight, product of:
                4.0169964 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01283813 = queryNorm
              1.1748604 = fieldWeight in 2014, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
          0.77978146 = weight(abstract_txt:keyphrase in 2014) [ClassicSimilarity], result of:
            0.77978146 = score(doc=2014,freq=5.0), product of:
              0.61069894 = queryWeight, product of:
                5.2064857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.01283813 = queryNorm
              1.2768673 = fieldWeight in 2014, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=2014)
        0.24 = coord(6/25)
    
  3. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.28
    0.28441218 = sum of:
      0.28441218 = product of:
        1.0157578 = sum of:
          0.008018437 = weight(abstract_txt:that in 2871) [ClassicSimilarity], result of:
            0.008018437 = score(doc=2871,freq=2.0), product of:
              0.030687777 = queryWeight, product of:
                1.0107516 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01283813 = queryNorm
              0.2612909 = fieldWeight in 2871, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.0228124 = weight(abstract_txt:specific in 2871) [ClassicSimilarity], result of:
            0.0228124 = score(doc=2871,freq=1.0), product of:
              0.06781606 = queryWeight, product of:
                1.2268243 = boost
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.01283813 = queryNorm
              0.3363864 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.026020622 = weight(abstract_txt:method in 2871) [ClassicSimilarity], result of:
            0.026020622 = score(doc=2871,freq=1.0), product of:
              0.07403385 = queryWeight, product of:
                1.2818325 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.01283813 = queryNorm
              0.35146925 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.027898096 = weight(abstract_txt:search in 2871) [ClassicSimilarity], result of:
            0.027898096 = score(doc=2871,freq=1.0), product of:
              0.09771133 = queryWeight, product of:
                2.082589 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01283813 = queryNorm
              0.28551546 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.13918267 = weight(abstract_txt:phrases in 2871) [ClassicSimilarity], result of:
            0.13918267 = score(doc=2871,freq=1.0), product of:
              0.25920245 = queryWeight, product of:
                2.937523 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.01283813 = queryNorm
              0.53696513 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.35591453 = weight(abstract_txt:keyphrases in 2871) [ClassicSimilarity], result of:
            0.35591453 = score(doc=2871,freq=1.0), product of:
              0.48470718 = queryWeight, product of:
                4.0169964 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01283813 = queryNorm
              0.73428774 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
          0.43591112 = weight(abstract_txt:keyphrase in 2871) [ClassicSimilarity], result of:
            0.43591112 = score(doc=2871,freq=1.0), product of:
              0.61069894 = queryWeight, product of:
                5.2064857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.01283813 = queryNorm
              0.71379054 = fieldWeight in 2871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.078125 = fieldNorm(doc=2871)
        0.28 = coord(7/25)
    
  4. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.26
    0.2613587 = sum of:
      0.2613587 = product of:
        1.3067935 = sum of:
          0.012829498 = weight(abstract_txt:that in 1601) [ClassicSimilarity], result of:
            0.012829498 = score(doc=1601,freq=8.0), product of:
              0.030687777 = queryWeight, product of:
                1.0107516 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01283813 = queryNorm
              0.41806543 = fieldWeight in 1601, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.020128742 = weight(abstract_txt:tools in 1601) [ClassicSimilarity], result of:
            0.020128742 = score(doc=1601,freq=1.0), product of:
              0.072394066 = queryWeight, product of:
                1.2675573 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.01283813 = queryNorm
              0.27804407 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.027329134 = weight(abstract_txt:identify in 1601) [ClassicSimilarity], result of:
            0.027329134 = score(doc=1601,freq=1.0), product of:
              0.088765144 = queryWeight, product of:
                1.4035805 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.01283813 = queryNorm
              0.30788136 = fieldWeight in 1601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.75332904 = weight(abstract_txt:keyphrases in 1601) [ClassicSimilarity], result of:
            0.75332904 = score(doc=1601,freq=7.0), product of:
              0.48470718 = queryWeight, product of:
                4.0169964 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01283813 = queryNorm
              1.5541941 = fieldWeight in 1601, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
          0.49317712 = weight(abstract_txt:keyphrase in 1601) [ClassicSimilarity], result of:
            0.49317712 = score(doc=1601,freq=2.0), product of:
              0.61069894 = queryWeight, product of:
                5.2064857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.01283813 = queryNorm
              0.80756176 = fieldWeight in 1601, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=1601)
        0.2 = coord(5/25)
    
  5. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.19
    0.19235401 = sum of:
      0.19235401 = product of:
        1.2022126 = sum of:
          0.006414749 = weight(abstract_txt:that in 3106) [ClassicSimilarity], result of:
            0.006414749 = score(doc=3106,freq=2.0), product of:
              0.030687777 = queryWeight, product of:
                1.0107516 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01283813 = queryNorm
              0.20903271 = fieldWeight in 3106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.022318477 = weight(abstract_txt:search in 3106) [ClassicSimilarity], result of:
            0.022318477 = score(doc=3106,freq=1.0), product of:
              0.09771133 = queryWeight, product of:
                2.082589 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01283813 = queryNorm
              0.22841237 = fieldWeight in 3106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.56946325 = weight(abstract_txt:keyphrases in 3106) [ClassicSimilarity], result of:
            0.56946325 = score(doc=3106,freq=4.0), product of:
              0.48470718 = queryWeight, product of:
                4.0169964 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01283813 = queryNorm
              1.1748604 = fieldWeight in 3106, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
          0.6040161 = weight(abstract_txt:keyphrase in 3106) [ClassicSimilarity], result of:
            0.6040161 = score(doc=3106,freq=3.0), product of:
              0.61069894 = queryWeight, product of:
                5.2064857 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.01283813 = queryNorm
              0.9890571 = fieldWeight in 3106, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=3106)
        0.16 = coord(4/25)
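
The content-based scores combine several term matches. As a minimal sketch, assuming the same tf and idf formulas that fit the values in these trees, each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the sum is multiplied by the coord factor (matched terms / query terms). The helper names and the tuple layout are hypothetical; the numbers are copied from entry 5 (doc=3106).

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.01283813   # queryNorm from the listing


def idf(doc_freq: int) -> float:
    # Form that matches the idf(docFreq=..., maxDocs=44421) lines.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, boost, freq, docFreq) for the four matching terms of doc 3106
matches = [
    ("that", 1.0107516, 2.0, 11344),
    ("search", 2.082589, 1.0, 3123),
    ("keyphrases", 4.0169964, 4.0, 9),
    ("keyphrase", 5.2064857, 3.0, 12),
]
FIELD_NORM = 0.0625
QUERY_TERMS = 25          # denominator of coord(4/25)

raw_sum = sum(term_score(b, f, df, FIELD_NORM) for _, b, f, df in matches)
coord = len(matches) / QUERY_TERMS
total = raw_sum * coord
print(round(raw_sum, 4), coord, round(total, 4))
```

The coord factor explains why entry 3, with seven matching terms, still ranks below entries 1 and 2: its per-term contributions for "keyphrases" and "keyphrase" are smaller because each occurs only once in that abstract.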