Document (#37564)

Author
Huo, W.
Title
Automatic multi-word term extraction and its application to Web-page summarization
Imprint
Guelph, Ontario : University of Guelph
Year
2012
Pages
vii, 104 p.
Abstract
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
Content
A Thesis presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. See also: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
Theme
Computational linguistics

Similar documents (content)

  1. Xiong, S.; Ji, D.: Query-focused multi-document summarization using hypergraph-based ranking (2016) 0.31
    0.31096402 = sum of:
      0.31096402 = product of:
        1.1105858 = sum of:
          0.036481947 = weight(abstract_txt:learn in 3972) [ClassicSimilarity], result of:
            0.036481947 = score(doc=3972,freq=1.0), product of:
              0.074347064 = queryWeight, product of:
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.011836947 = queryNorm
              0.49069786 = fieldWeight in 3972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.012395561 = weight(abstract_txt:results in 3972) [ClassicSimilarity], result of:
            0.012395561 = score(doc=3972,freq=1.0), product of:
              0.045610618 = queryWeight, product of:
                1.1076845 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.011836947 = queryNorm
              0.2717692 = fieldWeight in 3972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.0329748 = weight(abstract_txt:document in 3972) [ClassicSimilarity], result of:
            0.0329748 = score(doc=3972,freq=2.0), product of:
              0.06950242 = queryWeight, product of:
                1.3673606 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011836947 = queryNorm
              0.47444102 = fieldWeight in 3972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.15398556 = weight(abstract_txt:summaries in 3972) [ClassicSimilarity], result of:
            0.15398556 = score(doc=3972,freq=1.0), product of:
              0.28005216 = queryWeight, product of:
                3.3616166 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011836947 = queryNorm
              0.549846 = fieldWeight in 3972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.30147848 = weight(abstract_txt:summarization in 3972) [ClassicSimilarity], result of:
            0.30147848 = score(doc=3972,freq=2.0), product of:
              0.3828771 = queryWeight, product of:
                4.5386615 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011836947 = queryNorm
              0.78740275 = fieldWeight in 3972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.30538663 = weight(abstract_txt:multi in 3972) [ClassicSimilarity], result of:
            0.30538663 = score(doc=3972,freq=2.0), product of:
              0.46537304 = queryWeight, product of:
                6.619386 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.011836947 = queryNorm
              0.656219 = fieldWeight in 3972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
          0.2678828 = weight(abstract_txt:word in 3972) [ClassicSimilarity], result of:
            0.2678828 = score(doc=3972,freq=2.0), product of:
              0.44585645 = queryWeight, product of:
                6.9264483 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011836947 = queryNorm
              0.60082746 = fieldWeight in 3972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3972)
        0.28 = coord(7/25)
    
  2. Chang, Y.-W.: Influence of human behavior and the principle of least effort on library and information science research (2016) 0.31
    0.31096402 = sum of:
      0.31096402 = product of:
        1.1105858 = sum of:
          0.036481947 = weight(abstract_txt:learn in 3973) [ClassicSimilarity], result of:
            0.036481947 = score(doc=3973,freq=1.0), product of:
              0.074347064 = queryWeight, product of:
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.011836947 = queryNorm
              0.49069786 = fieldWeight in 3973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.012395561 = weight(abstract_txt:results in 3973) [ClassicSimilarity], result of:
            0.012395561 = score(doc=3973,freq=1.0), product of:
              0.045610618 = queryWeight, product of:
                1.1076845 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.011836947 = queryNorm
              0.2717692 = fieldWeight in 3973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.0329748 = weight(abstract_txt:document in 3973) [ClassicSimilarity], result of:
            0.0329748 = score(doc=3973,freq=2.0), product of:
              0.06950242 = queryWeight, product of:
                1.3673606 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011836947 = queryNorm
              0.47444102 = fieldWeight in 3973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.15398556 = weight(abstract_txt:summaries in 3973) [ClassicSimilarity], result of:
            0.15398556 = score(doc=3973,freq=1.0), product of:
              0.28005216 = queryWeight, product of:
                3.3616166 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011836947 = queryNorm
              0.549846 = fieldWeight in 3973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.30147848 = weight(abstract_txt:summarization in 3973) [ClassicSimilarity], result of:
            0.30147848 = score(doc=3973,freq=2.0), product of:
              0.3828771 = queryWeight, product of:
                4.5386615 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011836947 = queryNorm
              0.78740275 = fieldWeight in 3973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.30538663 = weight(abstract_txt:multi in 3973) [ClassicSimilarity], result of:
            0.30538663 = score(doc=3973,freq=2.0), product of:
              0.46537304 = queryWeight, product of:
                6.619386 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.011836947 = queryNorm
              0.656219 = fieldWeight in 3973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
          0.2678828 = weight(abstract_txt:word in 3973) [ClassicSimilarity], result of:
            0.2678828 = score(doc=3973,freq=2.0), product of:
              0.44585645 = queryWeight, product of:
                6.9264483 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011836947 = queryNorm
              0.60082746 = fieldWeight in 3973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3973)
        0.28 = coord(7/25)
    
  3. Vilares, J.; Alonso, M.A.; Doval, Y.; Vilares, M.: Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval (2016) 0.31
    0.31096402 = sum of:
      0.31096402 = product of:
        1.1105858 = sum of:
          0.036481947 = weight(abstract_txt:learn in 3974) [ClassicSimilarity], result of:
            0.036481947 = score(doc=3974,freq=1.0), product of:
              0.074347064 = queryWeight, product of:
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.011836947 = queryNorm
              0.49069786 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.012395561 = weight(abstract_txt:results in 3974) [ClassicSimilarity], result of:
            0.012395561 = score(doc=3974,freq=1.0), product of:
              0.045610618 = queryWeight, product of:
                1.1076845 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.011836947 = queryNorm
              0.2717692 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.0329748 = weight(abstract_txt:document in 3974) [ClassicSimilarity], result of:
            0.0329748 = score(doc=3974,freq=2.0), product of:
              0.06950242 = queryWeight, product of:
                1.3673606 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011836947 = queryNorm
              0.47444102 = fieldWeight in 3974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.15398556 = weight(abstract_txt:summaries in 3974) [ClassicSimilarity], result of:
            0.15398556 = score(doc=3974,freq=1.0), product of:
              0.28005216 = queryWeight, product of:
                3.3616166 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011836947 = queryNorm
              0.549846 = fieldWeight in 3974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.30147848 = weight(abstract_txt:summarization in 3974) [ClassicSimilarity], result of:
            0.30147848 = score(doc=3974,freq=2.0), product of:
              0.3828771 = queryWeight, product of:
                4.5386615 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011836947 = queryNorm
              0.78740275 = fieldWeight in 3974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.30538663 = weight(abstract_txt:multi in 3974) [ClassicSimilarity], result of:
            0.30538663 = score(doc=3974,freq=2.0), product of:
              0.46537304 = queryWeight, product of:
                6.619386 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.011836947 = queryNorm
              0.656219 = fieldWeight in 3974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
          0.2678828 = weight(abstract_txt:word in 3974) [ClassicSimilarity], result of:
            0.2678828 = score(doc=3974,freq=2.0), product of:
              0.44585645 = queryWeight, product of:
                6.9264483 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011836947 = queryNorm
              0.60082746 = fieldWeight in 3974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3974)
        0.28 = coord(7/25)
    
  4. Pandey, S.; Khanna, P.; Yokota, H.: A semantics and image retrieval system for hierarchical image databases (2016) 0.31
    0.31096402 = sum of:
      0.31096402 = product of:
        1.1105858 = sum of:
          0.036481947 = weight(abstract_txt:learn in 4184) [ClassicSimilarity], result of:
            0.036481947 = score(doc=4184,freq=1.0), product of:
              0.074347064 = queryWeight, product of:
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.011836947 = queryNorm
              0.49069786 = fieldWeight in 4184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.012395561 = weight(abstract_txt:results in 4184) [ClassicSimilarity], result of:
            0.012395561 = score(doc=4184,freq=1.0), product of:
              0.045610618 = queryWeight, product of:
                1.1076845 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.011836947 = queryNorm
              0.2717692 = fieldWeight in 4184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.0329748 = weight(abstract_txt:document in 4184) [ClassicSimilarity], result of:
            0.0329748 = score(doc=4184,freq=2.0), product of:
              0.06950242 = queryWeight, product of:
                1.3673606 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011836947 = queryNorm
              0.47444102 = fieldWeight in 4184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.15398556 = weight(abstract_txt:summaries in 4184) [ClassicSimilarity], result of:
            0.15398556 = score(doc=4184,freq=1.0), product of:
              0.28005216 = queryWeight, product of:
                3.3616166 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011836947 = queryNorm
              0.549846 = fieldWeight in 4184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.30147848 = weight(abstract_txt:summarization in 4184) [ClassicSimilarity], result of:
            0.30147848 = score(doc=4184,freq=2.0), product of:
              0.3828771 = queryWeight, product of:
                4.5386615 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011836947 = queryNorm
              0.78740275 = fieldWeight in 4184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.30538663 = weight(abstract_txt:multi in 4184) [ClassicSimilarity], result of:
            0.30538663 = score(doc=4184,freq=2.0), product of:
              0.46537304 = queryWeight, product of:
                6.619386 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.011836947 = queryNorm
              0.656219 = fieldWeight in 4184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
          0.2678828 = weight(abstract_txt:word in 4184) [ClassicSimilarity], result of:
            0.2678828 = score(doc=4184,freq=2.0), product of:
              0.44585645 = queryWeight, product of:
                6.9264483 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011836947 = queryNorm
              0.60082746 = fieldWeight in 4184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=4184)
        0.28 = coord(7/25)
    
  5. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.31
    0.30638212 = sum of:
      0.30638212 = product of:
        0.9574442 = sum of:
          0.009494991 = weight(abstract_txt:these in 3105) [ClassicSimilarity], result of:
            0.009494991 = score(doc=3105,freq=1.0), product of:
              0.038184304 = queryWeight, product of:
                1.0135041 = boost
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.011836947 = queryNorm
              0.24866214 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1828754 = idf(docFreq=5006, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.048838824 = weight(abstract_txt:applies in 3105) [ClassicSimilarity], result of:
            0.048838824 = score(doc=3105,freq=1.0), product of:
              0.09030712 = queryWeight, product of:
                1.1021205 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.011836947 = queryNorm
              0.54080814 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.012395561 = weight(abstract_txt:results in 3105) [ClassicSimilarity], result of:
            0.012395561 = score(doc=3105,freq=1.0), product of:
              0.045610618 = queryWeight, product of:
                1.1076845 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.011836947 = queryNorm
              0.2717692 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.009280478 = weight(abstract_txt:from in 3105) [ClassicSimilarity], result of:
            0.009280478 = score(doc=3105,freq=1.0), product of:
              0.04304927 = queryWeight, product of:
                1.3179884 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.011836947 = queryNorm
              0.21557805 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.0329748 = weight(abstract_txt:document in 3105) [ClassicSimilarity], result of:
            0.0329748 = score(doc=3105,freq=2.0), product of:
              0.06950242 = queryWeight, product of:
                1.3673606 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011836947 = queryNorm
              0.47444102 = fieldWeight in 3105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.06450398 = weight(abstract_txt:generate in 3105) [ClassicSimilarity], result of:
            0.06450398 = score(doc=3105,freq=1.0), product of:
              0.13696602 = queryWeight, product of:
                1.9195062 = boost
                6.0281444 = idf(docFreq=290, maxDocs=44421)
                0.011836947 = queryNorm
              0.4709488 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0281444 = idf(docFreq=290, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.5640146 = weight(abstract_txt:summarization in 3105) [ClassicSimilarity], result of:
            0.5640146 = score(doc=3105,freq=7.0), product of:
              0.3828771 = queryWeight, product of:
                4.5386615 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011836947 = queryNorm
              1.4730957 = fieldWeight in 3105, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.21594097 = weight(abstract_txt:multi in 3105) [ClassicSimilarity], result of:
            0.21594097 = score(doc=3105,freq=1.0), product of:
              0.46537304 = queryWeight, product of:
                6.619386 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.011836947 = queryNorm
              0.4640169 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
        0.32 = coord(8/25)
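
The score trees above can be checked by hand. The following is a minimal Python sketch that recomputes the total for entry 1 (doc=3972) from the values printed in its tree. It assumes the usual ClassicSimilarity conventions, which agree with the printed numbers: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm. The function names and the TERMS table are illustrative and not part of the catalog software. A boost of 1.0 is assumed for "learn", since the tree prints no boost line for that term.

```python
import math

# Constants copied from the explain tree of entry 1 (doc=3972).
MAX_DOCS = 44421
QUERY_NORM = 0.011836947
FIELD_NORM = 0.078125

# (term, termFreq, docFreq, boost) as printed in the tree.
# Assumption: "learn" has boost 1.0 because no boost line is shown for it.
TERMS = [
    ("learn", 1.0, 225, 1.0),
    ("results", 1.0, 3724, 1.1076845),
    ("document", 2.0, 1647, 1.3673606),
    ("summaries", 1.0, 105, 3.3616166),
    ("summarization", 2.0, 96, 4.5386615),
    ("multi", 2.0, 317, 6.619386),
    ("word", 2.0, 524, 6.9264483),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost=1.0):
    """Per-term score: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, matched, total):
    """Sum of per-term scores, scaled by coord = matched / total."""
    coord = matched / total
    return coord * sum(term_score(f, df, b) for _, f, df, b in terms)


score = document_score(TERMS, matched=7, total=25)
print(f"{score:.6f}")  # compare with 0.31096402 in the tree for entry 1
```

Small differences in the last digits are expected, because the sketch uses double precision while the printed values appear to be single-precision floats.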