Document (#39732)

Author
Szlávik, Z.
Tombros, A.
Lalmas, M.
Title
Summarisation of the logical structure of XML documents
Source
Information processing and management. 48(2012) no.5, pp. 956-968
Year
2012
Abstract
Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, it is argued that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and ensures that the user's attention is focused on sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses methods for structure summarisation that use various features of XML elements to select the document portions on which a user's attention should be focused. An evaluation methodology for structure summarisation is also introduced, and summarisation results using various summariser versions are presented and compared to one another. We show that data sets used in information retrieval evaluation can be used effectively to produce high-quality (query-independent) structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures.
Content
Contribution to a special issue "Large-Scale and Distributed Systems for Information Retrieval". Cf.: doi:10.1016/j.ipm.2011.11.002.
Object
XML

Similar documents (author)

  1. Tombros, T.; Crestani, F.: Users' perception of relevance of spoken documents (2000) 1.86
    1.8585004 = sum of:
      1.8585004 = product of:
        3.7170007 = sum of:
          3.7170007 = weight(author_txt:tombros in 5996) [ClassicSimilarity], result of:
            3.7170007 = score(doc=5996,freq=1.0), product of:
              0.78217715 = queryWeight, product of:
                1.120441 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.073451154 = queryNorm
              4.7521214 = fieldWeight in 5996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.5 = fieldNorm(doc=5996)
        0.5 = coord(1/2)
    
  2. Tao, Y.; Tombros, A.: How collaborators make sense of tasks together : a comparative analysis of collaborative sensemaking behavior in collaborative information-seeking tasks (2017) 1.86
    1.8585004 = sum of:
      1.8585004 = product of:
        3.7170007 = sum of:
          3.7170007 = weight(author_txt:tombros in 4429) [ClassicSimilarity], result of:
            3.7170007 = score(doc=4429,freq=1.0), product of:
              0.78217715 = queryWeight, product of:
                1.120441 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.073451154 = queryNorm
              4.7521214 = fieldWeight in 4429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.5 = fieldNorm(doc=4429)
        0.5 = coord(1/2)
    
  3. Lalmas, M.: Logical models in information retrieval : introduction and overview (1998) 1.65
    1.6516032 = sum of:
      1.6516032 = product of:
        3.3032064 = sum of:
          3.3032064 = weight(author_txt:lalmas in 3668) [ClassicSimilarity], result of:
            3.3032064 = score(doc=3668,freq=1.0), product of:
              0.6230561 = queryWeight, product of:
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.073451154 = queryNorm
              5.3016195 = fieldWeight in 3668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.625 = fieldNorm(doc=3668)
        0.5 = coord(1/2)
    
  4. Lalmas, M.: XML information retrieval (2009) 1.65
    1.6516032 = sum of:
      1.6516032 = product of:
        3.3032064 = sum of:
          3.3032064 = weight(author_txt:lalmas in 867) [ClassicSimilarity], result of:
            3.3032064 = score(doc=867,freq=1.0), product of:
              0.6230561 = queryWeight, product of:
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.073451154 = queryNorm
              5.3016195 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.625 = fieldNorm(doc=867)
        0.5 = coord(1/2)
    
  5. Lalmas, M.: XML retrieval (2009) 1.65
    1.6516032 = sum of:
      1.6516032 = product of:
        3.3032064 = sum of:
          3.3032064 = weight(author_txt:lalmas in 998) [ClassicSimilarity], result of:
            3.3032064 = score(doc=998,freq=1.0), product of:
              0.6230561 = queryWeight, product of:
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.073451154 = queryNorm
              5.3016195 = fieldWeight in 998, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.625 = fieldNorm(doc=998)
        0.5 = coord(1/2)
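
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which are consistent with the idf values printed in the listing. The function names are hypothetical and chosen for illustration only; the input numbers are copied from entries 1 and 3.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in the listing
QUERY_NORM = 0.073451154  # queryNorm, as printed for the author query


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float,
               boost: float = 1.0) -> float:
    """score = queryWeight * fieldWeight for a single matching term."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Only one of the two queried author terms matches each document.
COORD = 1 / 2

# Entry 1: author_txt:tombros in doc 5996 (boost 1.120441, fieldNorm 0.5)
tombros_total = COORD * term_score(1.0, doc_freq=8, field_norm=0.5,
                                   boost=1.120441)

# Entry 3: author_txt:lalmas in doc 3668 (no boost line, fieldNorm 0.625)
lalmas_total = COORD * term_score(1.0, doc_freq=24, field_norm=0.625)

print(round(tombros_total, 4), round(lalmas_total, 4))
```

The two totals should agree with the listed 1.8585004 and 1.6516032 up to floating-point rounding, since Lucene computes these values in single precision.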
    

Similar documents (content)

  1. Salton, G.: Automatic text structuring and summarization (1997) 0.33
    0.3284308 = sum of:
      0.3284308 = product of:
        1.642154 = sum of:
          0.019261992 = weight(abstract_txt:methods in 1145) [ClassicSimilarity], result of:
            0.019261992 = score(doc=1145,freq=1.0), product of:
              0.04958673 = queryWeight, product of:
                1.0466913 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.011433583 = queryNorm
              0.38845056 = fieldWeight in 1145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.09375 = fieldNorm(doc=1145)
          0.010744513 = weight(abstract_txt:that in 1145) [ClassicSimilarity], result of:
            0.010744513 = score(doc=1145,freq=1.0), product of:
              0.048461426 = queryWeight, product of:
                1.7922336 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.011433583 = queryNorm
              0.22171268 = fieldWeight in 1145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=1145)
          0.2002475 = weight(abstract_txt:summaries in 1145) [ClassicSimilarity], result of:
            0.2002475 = score(doc=1145,freq=2.0), product of:
              0.21460003 = queryWeight, product of:
                2.6668365 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011433583 = queryNorm
              0.9331196 = fieldWeight in 1145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.09375 = fieldNorm(doc=1145)
          0.12678175 = weight(abstract_txt:structure in 1145) [ClassicSimilarity], result of:
            0.12678175 = score(doc=1145,freq=2.0), product of:
              0.21942148 = queryWeight, product of:
                4.4035754 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.011433583 = queryNorm
              0.5778001 = fieldWeight in 1145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.09375 = fieldNorm(doc=1145)
          1.2851182 = weight(abstract_txt:summarisation in 1145) [ClassicSimilarity], result of:
            1.2851182 = score(doc=1145,freq=3.0), product of:
              0.85870165 = queryWeight, product of:
                8.148751 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.011433583 = queryNorm
              1.496583 = fieldWeight in 1145, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.09375 = fieldNorm(doc=1145)
        0.2 = coord(5/25)
    
  2. White, R.W.; Jose, J.M.; Ruthven, I.: A task-oriented study on the influencing effects of query-biased summarisation in web searching (2003) 0.25
    0.25052965 = sum of:
      0.25052965 = product of:
        1.0438735 = sum of:
          0.010245648 = weight(abstract_txt:used in 2081) [ClassicSimilarity], result of:
            0.010245648 = score(doc=2081,freq=1.0), product of:
              0.04882949 = queryWeight, product of:
                1.2721039 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.011433583 = queryNorm
              0.20982501 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
          0.035655014 = weight(abstract_txt:user's in 2081) [ClassicSimilarity], result of:
            0.035655014 = score(doc=2081,freq=1.0), product of:
              0.09795784 = queryWeight, product of:
                1.471145 = boost
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.011433583 = queryNorm
              0.36398324 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
          0.03442308 = weight(abstract_txt:evaluation in 2081) [ClassicSimilarity], result of:
            0.03442308 = score(doc=2081,freq=2.0), product of:
              0.08693855 = queryWeight, product of:
                1.6974138 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.011433583 = queryNorm
              0.39594725 = fieldWeight in 2081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
          0.012406693 = weight(abstract_txt:that in 2081) [ClassicSimilarity], result of:
            0.012406693 = score(doc=2081,freq=3.0), product of:
              0.048461426 = queryWeight, product of:
                1.7922336 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.011433583 = queryNorm
              0.25601172 = fieldWeight in 2081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
          0.094397575 = weight(abstract_txt:summaries in 2081) [ClassicSimilarity], result of:
            0.094397575 = score(doc=2081,freq=1.0), product of:
              0.21460003 = queryWeight, product of:
                2.6668365 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011433583 = queryNorm
              0.4398768 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
          0.85674554 = weight(abstract_txt:summarisation in 2081) [ClassicSimilarity], result of:
            0.85674554 = score(doc=2081,freq=3.0), product of:
              0.85870165 = queryWeight, product of:
                8.148751 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.011433583 = queryNorm
              0.997722 = fieldWeight in 2081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=2081)
        0.24 = coord(6/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.21
    0.20673513 = sum of:
      0.20673513 = product of:
        1.0336757 = sum of:
          0.010595594 = weight(abstract_txt:also in 3054) [ClassicSimilarity], result of:
            0.010595594 = score(doc=3054,freq=1.0), product of:
              0.049935117 = queryWeight, product of:
                1.2864252 = boost
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.011433583 = queryNorm
              0.21218722 = fieldWeight in 3054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.052985363 = weight(abstract_txt:produce in 3054) [ClassicSimilarity], result of:
            0.052985363 = score(doc=3054,freq=2.0), product of:
              0.101247914 = queryWeight, product of:
                1.4956464 = boost
                5.920724 = idf(docFreq=323, maxDocs=44421)
                0.011433583 = queryNorm
              0.523323 = fieldWeight in 3054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.920724 = idf(docFreq=323, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.018951537 = weight(abstract_txt:that in 3054) [ClassicSimilarity], result of:
            0.018951537 = score(doc=3054,freq=7.0), product of:
              0.048461426 = queryWeight, product of:
                1.7922336 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.011433583 = queryNorm
              0.39106438 = fieldWeight in 3054, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.094397575 = weight(abstract_txt:summaries in 3054) [ClassicSimilarity], result of:
            0.094397575 = score(doc=3054,freq=1.0), product of:
              0.21460003 = queryWeight, product of:
                2.6668365 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011433583 = queryNorm
              0.4398768 = fieldWeight in 3054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.85674554 = weight(abstract_txt:summarisation in 3054) [ClassicSimilarity], result of:
            0.85674554 = score(doc=3054,freq=3.0), product of:
              0.85870165 = queryWeight, product of:
                8.148751 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.011433583 = queryNorm
              0.997722 = fieldWeight in 3054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
        0.2 = coord(5/25)
    
  4. Endres-Niggemeyer, B.: Summarising text for intelligent communication : results of the Dagstuhl seminar (1994) 0.17
    0.17163329 = sum of:
      0.17163329 = product of:
        0.71513873 = sum of:
          0.016051661 = weight(abstract_txt:methods in 481) [ClassicSimilarity], result of:
            0.016051661 = score(doc=481,freq=1.0), product of:
              0.04958673 = queryWeight, product of:
                1.0466913 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.011433583 = queryNorm
              0.3237088 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
          0.019234417 = weight(abstract_txt:particular in 481) [ClassicSimilarity], result of:
            0.019234417 = score(doc=481,freq=1.0), product of:
              0.055942018 = queryWeight, product of:
                1.1117444 = boost
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.011433583 = queryNorm
              0.34382772 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
          0.019866776 = weight(abstract_txt:order in 481) [ClassicSimilarity], result of:
            0.019866776 = score(doc=481,freq=1.0), product of:
              0.057161514 = queryWeight, product of:
                1.1237967 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.011433583 = queryNorm
              0.3475551 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
          0.032729305 = weight(abstract_txt:attention in 481) [ClassicSimilarity], result of:
            0.032729305 = score(doc=481,freq=1.0), product of:
              0.07973396 = queryWeight, product of:
                1.3272648 = boost
                5.254162 = idf(docFreq=630, maxDocs=44421)
                0.011433583 = queryNorm
              0.4104814 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.254162 = idf(docFreq=630, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
          0.00895376 = weight(abstract_txt:that in 481) [ClassicSimilarity], result of:
            0.00895376 = score(doc=481,freq=1.0), product of:
              0.048461426 = queryWeight, product of:
                1.7922336 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.011433583 = queryNorm
              0.18476056 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
          0.6183028 = weight(abstract_txt:summarisation in 481) [ClassicSimilarity], result of:
            0.6183028 = score(doc=481,freq=1.0), product of:
              0.85870165 = queryWeight, product of:
                8.148751 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.011433583 = queryNorm
              0.72004384 = fieldWeight in 481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=481)
        0.24 = coord(6/25)
    
  5. Sparck Jones, K.: Automatic summarising : the state of the art (2007) 0.17
    0.16596079 = sum of:
      0.16596079 = product of:
        0.82980394 = sum of:
          0.01280706 = weight(abstract_txt:used in 1932) [ClassicSimilarity], result of:
            0.01280706 = score(doc=1932,freq=1.0), product of:
              0.04882949 = queryWeight, product of:
                1.2721039 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.011433583 = queryNorm
              0.26228127 = fieldWeight in 1932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.078125 = fieldNorm(doc=1932)
          0.06803459 = weight(abstract_txt:evaluation in 1932) [ClassicSimilarity], result of:
            0.06803459 = score(doc=1932,freq=5.0), product of:
              0.08693855 = queryWeight, product of:
                1.6974138 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.011433583 = queryNorm
              0.7825595 = fieldWeight in 1932, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.078125 = fieldNorm(doc=1932)
          0.01266253 = weight(abstract_txt:that in 1932) [ClassicSimilarity], result of:
            0.01266253 = score(doc=1932,freq=2.0), product of:
              0.048461426 = queryWeight, product of:
                1.7922336 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.011433583 = queryNorm
              0.2612909 = fieldWeight in 1932, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=1932)
          0.11799697 = weight(abstract_txt:summaries in 1932) [ClassicSimilarity], result of:
            0.11799697 = score(doc=1932,freq=1.0), product of:
              0.21460003 = queryWeight, product of:
                2.6668365 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011433583 = queryNorm
              0.549846 = fieldWeight in 1932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=1932)
          0.6183028 = weight(abstract_txt:summarisation in 1932) [ClassicSimilarity], result of:
            0.6183028 = score(doc=1932,freq=1.0), product of:
              0.85870165 = queryWeight, product of:
                8.148751 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.011433583 = queryNorm
              0.72004384 = fieldWeight in 1932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=1932)
        0.2 = coord(5/25)
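
For the content matches, the final score is the sum of the per-term scores multiplied by the coord factor (matched terms / query terms). As a rough sketch under the same ClassicSimilarity assumptions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the score of entry 1 (Salton, doc 1145) can be recomputed from the values printed in its listing. Variable and function names are hypothetical.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in the listing
QUERY_NORM = 0.011433583  # queryNorm, as printed for the abstract query
FIELD_NORM = 0.09375      # fieldNorm(doc=1145)
QUERY_TERM_COUNT = 25     # denominator of coord(5/25)

# (term, termFreq, docFreq, boost), copied from the listing for doc 1145
TERMS = [
    ("methods", 1.0, 1915, 1.0466913),
    ("that", 1.0, 11344, 1.7922336),
    ("summaries", 2.0, 105, 2.6668365),
    ("structure", 2.0, 1545, 4.4035754),
    ("summarisation", 3.0, 11, 8.148751),
]


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """queryWeight * fieldWeight for one matching term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


per_term = {term: term_score(freq, df, boost)
            for term, freq, df, boost in TERMS}

# coord rewards documents that match more of the query terms.
coord = len(TERMS) / QUERY_TERM_COUNT
total = sum(per_term.values()) * coord

for term, value in sorted(per_term.items(), key=lambda kv: -kv[1]):
    print(f"{term:14s} {value:.6f}")
print(f"total          {total:.6f}")
```

The breakdown makes it easy to see that the rare, heavily boosted term "summarisation" accounts for most of the score, while a common word such as "that" contributes very little.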