Document (#27539)

Author
Trotman, A.
Title
Searching structured documents
Source
Information processing and management. 40(2004) no.4, p.619-632
Year
2004
Abstract
Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
Theme
Markup languages

Similar documents (content)

  1. Crestani, F.; Vegas, J.; Fuente, P. de la: A graphical user interface for the retrieval of hierarchically structured documents (2004) 0.27
    0.26664254 = sum of:
      0.26664254 = product of:
        0.9522948 = sum of:
          0.02061085 = weight(abstract_txt:search in 3555) [ClassicSimilarity], result of:
            0.02061085 = score(doc=3555,freq=1.0), product of:
              0.07218821 = queryWeight, product of:
                1.2159482 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.016244696 = queryNorm
              0.28551546 = fieldWeight in 3555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.051271904 = weight(abstract_txt:documents in 3555) [ClassicSimilarity], result of:
            0.051271904 = score(doc=3555,freq=3.0), product of:
              0.091892816 = queryWeight, product of:
                1.3719008 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016244696 = queryNorm
              0.55795336 = fieldWeight in 3555, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.034950316 = weight(abstract_txt:structure in 3555) [ClassicSimilarity], result of:
            0.034950316 = score(doc=3555,freq=1.0), product of:
              0.10265255 = queryWeight, product of:
                1.449996 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.016244696 = queryNorm
              0.34047198 = fieldWeight in 3555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.053226154 = weight(abstract_txt:retrieval in 3555) [ClassicSimilarity], result of:
            0.053226154 = score(doc=3555,freq=4.0), product of:
              0.09798573 = queryWeight, product of:
                1.735038 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.016244696 = queryNorm
              0.5432031 = fieldWeight in 3555, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.16380018 = weight(abstract_txt:document in 3555) [ClassicSimilarity], result of:
            0.16380018 = score(doc=3555,freq=6.0), product of:
              0.19932945 = queryWeight, product of:
                2.8574765 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.016244696 = queryNorm
              0.821756 = fieldWeight in 3555, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.0997571 = weight(abstract_txt:searching in 3555) [ClassicSimilarity], result of:
            0.0997571 = score(doc=3555,freq=1.0), product of:
              0.297901 = queryWeight, product of:
                4.278372 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.016244696 = queryNorm
              0.3348666 = fieldWeight in 3555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
          0.5286783 = weight(abstract_txt:structured in 3555) [ClassicSimilarity], result of:
            0.5286783 = score(doc=3555,freq=5.0), product of:
              0.55748004 = queryWeight, product of:
                6.321653 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.016244696 = queryNorm
              0.9483358 = fieldWeight in 3555, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=3555)
        0.28 = coord(7/25)
    
  2. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.23
    0.22885497 = sum of:
      0.22885497 = product of:
        0.7151718 = sum of:
          0.049141582 = weight(abstract_txt:adapted in 1459) [ClassicSimilarity], result of:
            0.049141582 = score(doc=1459,freq=1.0), product of:
              0.118658386 = queryWeight, product of:
                1.1023415 = boost
                6.6262937 = idf(docFreq=159, maxDocs=44421)
                0.016244696 = queryNorm
              0.41414335 = fieldWeight in 1459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6262937 = idf(docFreq=159, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.07675678 = weight(abstract_txt:tree in 1459) [ClassicSimilarity], result of:
            0.07675678 = score(doc=1459,freq=2.0), product of:
              0.1267847 = queryWeight, product of:
                1.1394633 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.016244696 = queryNorm
              0.60541046 = fieldWeight in 1459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.041017525 = weight(abstract_txt:documents in 1459) [ClassicSimilarity], result of:
            0.041017525 = score(doc=1459,freq=3.0), product of:
              0.091892816 = queryWeight, product of:
                1.3719008 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016244696 = queryNorm
              0.4463627 = fieldWeight in 1459, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.055920508 = weight(abstract_txt:structure in 1459) [ClassicSimilarity], result of:
            0.055920508 = score(doc=1459,freq=4.0), product of:
              0.10265255 = queryWeight, product of:
                1.449996 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.016244696 = queryNorm
              0.54475516 = fieldWeight in 1459, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.03687616 = weight(abstract_txt:retrieval in 1459) [ClassicSimilarity], result of:
            0.03687616 = score(doc=1459,freq=3.0), product of:
              0.09798573 = queryWeight, product of:
                1.735038 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.016244696 = queryNorm
              0.37634215 = fieldWeight in 1459, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.056926653 = weight(abstract_txt:precision in 1459) [ClassicSimilarity], result of:
            0.056926653 = score(doc=1459,freq=1.0), product of:
              0.16489954 = queryWeight, product of:
                1.8377721 = boost
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.016244696 = queryNorm
              0.3452202 = fieldWeight in 1459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.13104014 = weight(abstract_txt:document in 1459) [ClassicSimilarity], result of:
            0.13104014 = score(doc=1459,freq=6.0), product of:
              0.19932945 = queryWeight, product of:
                2.8574765 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.016244696 = queryNorm
              0.6574048 = fieldWeight in 1459, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
          0.26749238 = weight(abstract_txt:structured in 1459) [ClassicSimilarity], result of:
            0.26749238 = score(doc=1459,freq=2.0), product of:
              0.55748004 = queryWeight, product of:
                6.321653 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.016244696 = queryNorm
              0.47982416 = fieldWeight in 1459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.0625 = fieldNorm(doc=1459)
        0.32 = coord(8/25)
    
  3. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.22
    0.21553704 = sum of:
      0.21553704 = product of:
        0.76977515 = sum of:
          0.041863333 = weight(abstract_txt:documents in 1995) [ClassicSimilarity], result of:
            0.041863333 = score(doc=1995,freq=2.0), product of:
              0.091892816 = queryWeight, product of:
                1.3719008 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016244696 = queryNorm
              0.455567 = fieldWeight in 1995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.034950316 = weight(abstract_txt:structure in 1995) [ClassicSimilarity], result of:
            0.034950316 = score(doc=1995,freq=1.0), product of:
              0.10265255 = queryWeight, product of:
                1.449996 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.016244696 = queryNorm
              0.34047198 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.026613077 = weight(abstract_txt:retrieval in 1995) [ClassicSimilarity], result of:
            0.026613077 = score(doc=1995,freq=1.0), product of:
              0.09798573 = queryWeight, product of:
                1.735038 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.016244696 = queryNorm
              0.27160156 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.16535467 = weight(abstract_txt:corpus in 1995) [ClassicSimilarity], result of:
            0.16535467 = score(doc=1995,freq=3.0), product of:
              0.20058858 = queryWeight, product of:
                2.0269127 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.016244696 = queryNorm
              0.8243474 = fieldWeight in 1995, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.06687114 = weight(abstract_txt:document in 1995) [ClassicSimilarity], result of:
            0.06687114 = score(doc=1995,freq=1.0), product of:
              0.19932945 = queryWeight, product of:
                2.8574765 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.016244696 = queryNorm
              0.33548045 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.0997571 = weight(abstract_txt:searching in 1995) [ClassicSimilarity], result of:
            0.0997571 = score(doc=1995,freq=1.0), product of:
              0.297901 = queryWeight, product of:
                4.278372 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.016244696 = queryNorm
              0.3348666 = fieldWeight in 1995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
          0.3343655 = weight(abstract_txt:structured in 1995) [ClassicSimilarity], result of:
            0.3343655 = score(doc=1995,freq=2.0), product of:
              0.55748004 = queryWeight, product of:
                6.321653 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.016244696 = queryNorm
              0.5997802 = fieldWeight in 1995, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=1995)
        0.28 = coord(7/25)
    
  4. Skov, M.; Larsen, B.; Ingwersen, P.: Inter and intra-document contexts applied in polyrepresentation for best match IR (2008) 0.16
    0.16486579 = sum of:
      0.16486579 = product of:
        0.5888064 = sum of:
          0.037846744 = weight(abstract_txt:supporting in 3117) [ClassicSimilarity], result of:
            0.037846744 = score(doc=3117,freq=1.0), product of:
              0.0996976 = queryWeight, product of:
                1.0104371 = boost
                6.0738463 = idf(docFreq=277, maxDocs=44421)
                0.016244696 = queryNorm
              0.3796154 = fieldWeight in 3117, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0738463 = idf(docFreq=277, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.061405316 = weight(abstract_txt:unstructured in 3117) [ClassicSimilarity], result of:
            0.061405316 = score(doc=3117,freq=1.0), product of:
              0.13765849 = queryWeight, product of:
                1.1873218 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.016244696 = queryNorm
              0.44606996 = fieldWeight in 3117, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.023318518 = weight(abstract_txt:search in 3117) [ClassicSimilarity], result of:
            0.023318518 = score(doc=3117,freq=2.0), product of:
              0.07218821 = queryWeight, product of:
                1.2159482 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.016244696 = queryNorm
              0.3230239 = fieldWeight in 3117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.042580925 = weight(abstract_txt:retrieval in 3117) [ClassicSimilarity], result of:
            0.042580925 = score(doc=3117,freq=4.0), product of:
              0.09798573 = queryWeight, product of:
                1.735038 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.016244696 = queryNorm
              0.4345625 = fieldWeight in 3117, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.080506444 = weight(abstract_txt:precision in 3117) [ClassicSimilarity], result of:
            0.080506444 = score(doc=3117,freq=2.0), product of:
              0.16489954 = queryWeight, product of:
                1.8377721 = boost
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.016244696 = queryNorm
              0.4882151 = fieldWeight in 3117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.07565606 = weight(abstract_txt:document in 3117) [ClassicSimilarity], result of:
            0.07565606 = score(doc=3117,freq=2.0), product of:
              0.19932945 = queryWeight, product of:
                2.8574765 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.016244696 = queryNorm
              0.3795528 = fieldWeight in 3117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
          0.26749238 = weight(abstract_txt:structured in 3117) [ClassicSimilarity], result of:
            0.26749238 = score(doc=3117,freq=2.0), product of:
              0.55748004 = queryWeight, product of:
                6.321653 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.016244696 = queryNorm
              0.47982416 = fieldWeight in 3117, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.0625 = fieldNorm(doc=3117)
        0.28 = coord(7/25)
    
  5. Sevigny, M.; Marcoux, Y.: Construction et evaluation d'un prototype d'interface-utilisateurs pour l'interrogation de bases de documents structures (1996) 0.16
    0.1614645 = sum of:
      0.1614645 = product of:
        0.8073225 = sum of:
          0.07752501 = weight(abstract_txt:sgml in 752) [ClassicSimilarity], result of:
            0.07752501 = score(doc=752,freq=1.0), product of:
              0.122715496 = queryWeight, product of:
                1.1210284 = boost
                6.738623 = idf(docFreq=142, maxDocs=44421)
                0.016244696 = queryNorm
              0.63174593 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.738623 = idf(docFreq=142, maxDocs=44421)
                0.09375 = fieldNorm(doc=752)
          0.03552222 = weight(abstract_txt:documents in 752) [ClassicSimilarity], result of:
            0.03552222 = score(doc=752,freq=1.0), product of:
              0.091892816 = queryWeight, product of:
                1.3719008 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.016244696 = queryNorm
              0.38656145 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.09375 = fieldNorm(doc=752)
          0.06387139 = weight(abstract_txt:retrieval in 752) [ClassicSimilarity], result of:
            0.06387139 = score(doc=752,freq=4.0), product of:
              0.09798573 = queryWeight, product of:
                1.735038 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.016244696 = queryNorm
              0.6518438 = fieldWeight in 752, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=752)
          0.13898905 = weight(abstract_txt:document in 752) [ClassicSimilarity], result of:
            0.13898905 = score(doc=752,freq=3.0), product of:
              0.19932945 = queryWeight, product of:
                2.8574765 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.016244696 = queryNorm
              0.697283 = fieldWeight in 752, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=752)
          0.49141487 = weight(abstract_txt:structured in 752) [ClassicSimilarity], result of:
            0.49141487 = score(doc=752,freq=3.0), product of:
              0.55748004 = queryWeight, product of:
                6.321653 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.016244696 = queryNorm
              0.8814932 = fieldWeight in 752, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.09375 = fieldNorm(doc=752)
        0.2 = coord(5/25)
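
The score breakdowns above all follow the same arithmetic, which can be sketched in a few lines of Python. This is a minimal sketch, not the catalog's actual code: the function names (`idf`, `term_score`, `document_score`) are hypothetical, and the formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))` are assumptions chosen because they reproduce the tf and idf values printed in the listing. The input numbers are copied from the entry for doc 3555.

```python
import math

# Constants taken from the explain output above.
MAX_DOCS = 44421
QUERY_NORM = 0.016244696


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf formula; it reproduces e.g. 3.654598 for docFreq=3123."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm,
               query_norm=QUERY_NORM, max_docs=MAX_DOCS):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm           # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    coord = matched / query_terms
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for freq, doc_freq, boost in terms)
    return coord * total


# (termFreq, docFreq, boost) for the seven matching terms of doc 3555.
DOC_3555_TERMS = [
    (1.0, 3123, 1.2159482),  # search
    (3.0, 1954, 1.3719008),  # documents
    (1.0, 1545, 1.449996),   # structure
    (4.0, 3732, 1.735038),   # retrieval
    (6.0, 1647, 2.8574765),  # document
    (1.0, 1660, 4.278372),   # searching
    (5.0, 529, 6.321653),    # structured
]

score_3555 = document_score(DOC_3555_TERMS, field_norm=0.078125,
                            matched=7, query_terms=25)
print(round(score_3555, 2))  # prints 0.27
```

The same functions should apply to the other four entries once their term tuples, `fieldNorm`, and `coord` fraction are substituted. The per-term boosts are read directly from the listing; how they were derived is not stated in the output.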