Document (#39731)

Author
Portier, P.-E.
Chatti, N.
Calabretto, S.
Egyed-Zsigmond, E.
Pinon, J.-M.
Title
Modeling, encoding and querying multi-structured documents
Source
Information processing and management. 48(2012) no.5, pp.931-955
Year
2012
Abstract
The issue of multi-structured documents became prominent with the emergence of the digital humanities as a field of practice. Many distinct structures may be defined simultaneously on the same original content to suit different documentary tasks. For example, a document may have both a structure for the logical organization of content (logical structure) and a structure expressing a set of content formatting rules (physical structure). In this paper, we present MSDM, a generic model for multi-structured documents, in which several important features are established. We also address the problem of efficiently encoding multi-structured documents by introducing MultiX, a new XML formalism based on the MSDM model. Finally, we propose a library of XQuery functions for querying MultiX documents. We illustrate all the contributions with a use case based on a fragment of an old manuscript.
Content
Contribution to a special issue "Large-Scale and Distributed Systems for Information Retrieval". Cf.: doi:10.1016/j.ipm.2011.11.004.
Object
XQuery
XML

Similar documents (content)

  1. Hocine, A.; Lo, M.; Smadhi, S.: Information retrieval on the Web : an approach using a base of concepts and XML (2000) 0.20
    0.20124568 = sum of:
      0.20124568 = product of:
        0.6288928 = sum of:
          0.020215426 = weight(abstract_txt:based in 1149) [ClassicSimilarity], result of:
            0.020215426 = score(doc=1149,freq=2.0), product of:
              0.05748188 = queryWeight, product of:
                1.0431104 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017312262 = queryNorm
              0.35168344 = fieldWeight in 1149, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.057775725 = weight(abstract_txt:emergence in 1149) [ClassicSimilarity], result of:
            0.057775725 = score(doc=1149,freq=1.0), product of:
              0.11576377 = queryWeight, product of:
                1.0467335 = boost
                6.388262 = idf(docFreq=202, maxDocs=44421)
                0.017312262 = queryNorm
              0.49908295 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.388262 = idf(docFreq=202, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.028001862 = weight(abstract_txt:model in 1149) [ClassicSimilarity], result of:
            0.028001862 = score(doc=1149,freq=1.0), product of:
              0.089993335 = queryWeight, product of:
                1.3051786 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.017312262 = queryNorm
              0.31115484 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.10381099 = weight(abstract_txt:logical in 1149) [ClassicSimilarity], result of:
            0.10381099 = score(doc=1149,freq=1.0), product of:
              0.2155665 = queryWeight, product of:
                2.0200188 = boost
                6.1641335 = idf(docFreq=253, maxDocs=44421)
                0.017312262 = queryNorm
              0.48157293 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1641335 = idf(docFreq=253, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.04854285 = weight(abstract_txt:content in 1149) [ClassicSimilarity], result of:
            0.04854285 = score(doc=1149,freq=1.0), product of:
              0.1486619 = queryWeight, product of:
                2.0545194 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.017312262 = queryNorm
              0.3265319 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.073372155 = weight(abstract_txt:structure in 1149) [ClassicSimilarity], result of:
            0.073372155 = score(doc=1149,freq=1.0), product of:
              0.21550131 = queryWeight, product of:
                2.856306 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017312262 = queryNorm
              0.34047198 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.15535992 = weight(abstract_txt:documents in 1149) [ClassicSimilarity], result of:
            0.15535992 = score(doc=1149,freq=4.0), product of:
              0.24114138 = queryWeight, product of:
                3.3780856 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017312262 = queryNorm
              0.64426905 = fieldWeight in 1149, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
          0.14181381 = weight(abstract_txt:structured in 1149) [ClassicSimilarity], result of:
            0.14181381 = score(doc=1149,freq=1.0), product of:
              0.33438087 = queryWeight, product of:
                3.5579553 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.017312262 = queryNorm
              0.42410865 = fieldWeight in 1149, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=1149)
        0.32 = coord(8/25)
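
Each `weight(...)` entry in the explanation above can be recomputed from its leaf values. The following is a minimal sketch, assuming the listing follows the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not part of the catalog system.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # ClassicSimilarity term frequency factor
    return math.sqrt(freq)

def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    # weight      = queryWeight * fieldWeight
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Leaf values copied from the entry for abstract_txt:based in doc 1149
w_based = term_weight(
    freq=2.0,
    doc_freq=5005,
    max_docs=44421,
    boost=1.0431104,
    query_norm=0.017312262,
    field_norm=0.078125,
)
print(w_based)  # should agree with 0.020215426 up to float rounding
```

Small differences in the last digits are expected, since Lucene computes these values in single precision while Python uses double precision.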
    
  2. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.17
    0.17024632 = sum of:
      0.17024632 = product of:
        0.6080226 = sum of:
          0.01617234 = weight(abstract_txt:based in 1459) [ClassicSimilarity], result of:
            0.01617234 = score(doc=1459,freq=2.0), product of:
              0.05748188 = queryWeight, product of:
                1.0431104 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017312262 = queryNorm
              0.28134674 = fieldWeight in 1459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.050091255 = weight(abstract_txt:model in 1459) [ClassicSimilarity], result of:
            0.050091255 = score(doc=1459,freq=5.0), product of:
              0.089993335 = queryWeight, product of:
                1.3051786 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.017312262 = queryNorm
              0.5566107 = fieldWeight in 1459, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.11744873 = weight(abstract_txt:logical in 1459) [ClassicSimilarity], result of:
            0.11744873 = score(doc=1459,freq=2.0), product of:
              0.2155665 = queryWeight, product of:
                2.0200188 = boost
                6.1641335 = idf(docFreq=253, maxDocs=44421)
                0.017312262 = queryNorm
              0.5448376 = fieldWeight in 1459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1641335 = idf(docFreq=253, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.03883428 = weight(abstract_txt:content in 1459) [ClassicSimilarity], result of:
            0.03883428 = score(doc=1459,freq=1.0), product of:
              0.1486619 = queryWeight, product of:
                2.0545194 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.017312262 = queryNorm
              0.26122552 = fieldWeight in 1459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.11739545 = weight(abstract_txt:structure in 1459) [ClassicSimilarity], result of:
            0.11739545 = score(doc=1459,freq=4.0), product of:
              0.21550131 = queryWeight, product of:
                2.856306 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017312262 = queryNorm
              0.54475516 = fieldWeight in 1459, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.10763652 = weight(abstract_txt:documents in 1459) [ClassicSimilarity], result of:
            0.10763652 = score(doc=1459,freq=3.0), product of:
              0.24114138 = queryWeight, product of:
                3.3780856 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017312262 = queryNorm
              0.4463627 = fieldWeight in 1459, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.16044402 = weight(abstract_txt:structured in 1459) [ClassicSimilarity], result of:
            0.16044402 = score(doc=1459,freq=2.0), product of:
              0.33438087 = queryWeight, product of:
                3.5579553 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.017312262 = queryNorm
              0.47982416 = fieldWeight in 1459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
        0.28 = coord(7/25)
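
The top-level score of an entry is the sum of its term weights multiplied by the coordination factor, that is, the share of query terms that matched. A minimal sketch using the seven term weights listed for doc 1459; the helper name `document_score` is hypothetical and the term count of 25 is taken from the `coord(7/25)` line.

```python
def document_score(term_weights, total_query_terms):
    # coord = matched terms / total query terms
    coord = len(term_weights) / total_query_terms
    return sum(term_weights) * coord

# Term weights copied from the explanation for doc 1459
weights_1459 = [
    0.01617234,   # based
    0.050091255,  # model
    0.11744873,   # logical
    0.03883428,   # content
    0.11739545,   # structure
    0.10763652,   # documents
    0.16044402,   # structured
]

score_1459 = document_score(weights_1459, total_query_terms=25)
print(score_1459)  # should agree with 0.17024632 up to float rounding
```

The two-decimal figure shown next to each title (here 0.17) is this value rounded.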
    
  3. Corby, O.; Dieng, R.; Hébert, C.: A conceptual graph model for W3C resource description framework (2000) 0.14
    0.13677938 = sum of:
      0.13677938 = product of:
        0.6838969 = sum of:
          0.12994854 = weight(abstract_txt:expressing in 6086) [ClassicSimilarity], result of:
            0.12994854 = score(doc=6086,freq=1.0), product of:
              0.15879542 = queryWeight, product of:
                1.2259387 = boost
                7.48196 = idf(docFreq=67, maxDocs=44421)
                0.017312262 = queryNorm
              0.81833935 = fieldWeight in 6086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.48196 = idf(docFreq=67, maxDocs=44421)
                0.109375 = fieldNorm(doc=6086)
          0.16430824 = weight(abstract_txt:formalism in 6086) [ClassicSimilarity], result of:
            0.16430824 = score(doc=6086,freq=1.0), product of:
              0.1856792 = queryWeight, product of:
                1.3256577 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.017312262 = queryNorm
              0.88490385 = fieldWeight in 6086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.109375 = fieldNorm(doc=6086)
          0.067959994 = weight(abstract_txt:content in 6086) [ClassicSimilarity], result of:
            0.067959994 = score(doc=6086,freq=1.0), product of:
              0.1486619 = queryWeight, product of:
                2.0545194 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.017312262 = queryNorm
              0.45714468 = fieldWeight in 6086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.109375 = fieldNorm(doc=6086)
          0.21292816 = weight(abstract_txt:querying in 6086) [ClassicSimilarity], result of:
            0.21292816 = score(doc=6086,freq=1.0), product of:
              0.27807105 = queryWeight, product of:
                2.2942603 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.017312262 = queryNorm
              0.76573294 = fieldWeight in 6086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.109375 = fieldNorm(doc=6086)
          0.10875195 = weight(abstract_txt:documents in 6086) [ClassicSimilarity], result of:
            0.10875195 = score(doc=6086,freq=1.0), product of:
              0.24114138 = queryWeight, product of:
                3.3780856 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017312262 = queryNorm
              0.45098835 = fieldWeight in 6086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.109375 = fieldNorm(doc=6086)
        0.2 = coord(5/25)
    
  4. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.13
    0.13228011 = sum of:
      0.13228011 = product of:
        0.55116713 = sum of:
          0.014294465 = weight(abstract_txt:based in 1995) [ClassicSimilarity], result of:
            0.014294465 = score(doc=1995,freq=1.0), product of:
              0.05748188 = queryWeight, product of:
                1.0431104 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017312262 = queryNorm
              0.24867775 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.056003723 = weight(abstract_txt:model in 1995) [ClassicSimilarity], result of:
            0.056003723 = score(doc=1995,freq=4.0), product of:
              0.089993335 = queryWeight, product of:
                1.3051786 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.017312262 = queryNorm
              0.6223097 = fieldWeight in 1995, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.0970857 = weight(abstract_txt:content in 1995) [ClassicSimilarity], result of:
            0.0970857 = score(doc=1995,freq=4.0), product of:
              0.1486619 = queryWeight, product of:
                2.0545194 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.017312262 = queryNorm
              0.6530638 = fieldWeight in 1995, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.073372155 = weight(abstract_txt:structure in 1995) [ClassicSimilarity], result of:
            0.073372155 = score(doc=1995,freq=1.0), product of:
              0.21550131 = queryWeight, product of:
                2.856306 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017312262 = queryNorm
              0.34047198 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.109856054 = weight(abstract_txt:documents in 1995) [ClassicSimilarity], result of:
            0.109856054 = score(doc=1995,freq=2.0), product of:
              0.24114138 = queryWeight, product of:
                3.3780856 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017312262 = queryNorm
              0.455567 = fieldWeight in 1995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.20055503 = weight(abstract_txt:structured in 1995) [ClassicSimilarity], result of:
            0.20055503 = score(doc=1995,freq=2.0), product of:
              0.33438087 = queryWeight, product of:
                3.5579553 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.017312262 = queryNorm
              0.5997802 = fieldWeight in 1995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
        0.24 = coord(6/25)
    
  5. Calì, A.; Gottlob, G.; Pieris, A.: The return of the entity-relationship model : ontological query answering (2012) 0.13
    0.12733011 = sum of:
      0.12733011 = product of:
        0.53054214 = sum of:
          0.06205743 = weight(abstract_txt:prominent in 1434) [ClassicSimilarity], result of:
            0.06205743 = score(doc=1434,freq=1.0), product of:
              0.14088938 = queryWeight, product of:
                1.1547525 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.017312262 = queryNorm
              0.4404692 = fieldWeight in 1434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
          0.03168049 = weight(abstract_txt:model in 1434) [ClassicSimilarity], result of:
            0.03168049 = score(doc=1434,freq=2.0), product of:
              0.089993335 = queryWeight, product of:
                1.3051786 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.017312262 = queryNorm
              0.35203153 = fieldWeight in 1434, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
          0.1327811 = weight(abstract_txt:formalism in 1434) [ClassicSimilarity], result of:
            0.1327811 = score(doc=1434,freq=2.0), product of:
              0.1856792 = queryWeight, product of:
                1.3256577 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.017312262 = queryNorm
              0.7151103 = fieldWeight in 1434, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
          0.12365215 = weight(abstract_txt:fragment in 1434) [ClassicSimilarity], result of:
            0.12365215 = score(doc=1434,freq=1.0), product of:
              0.22309175 = queryWeight, product of:
                1.4530867 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.017312262 = queryNorm
              0.5542659 = fieldWeight in 1434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
          0.121673234 = weight(abstract_txt:querying in 1434) [ClassicSimilarity], result of:
            0.121673234 = score(doc=1434,freq=1.0), product of:
              0.27807105 = queryWeight, product of:
                2.2942603 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.017312262 = queryNorm
              0.4375617 = fieldWeight in 1434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
          0.058697727 = weight(abstract_txt:structure in 1434) [ClassicSimilarity], result of:
            0.058697727 = score(doc=1434,freq=1.0), product of:
              0.21550131 = queryWeight, product of:
                2.856306 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017312262 = queryNorm
              0.27237758 = fieldWeight in 1434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=1434)
        0.24 = coord(6/25)