Document (#28458)

Author
Hoenkamp, E.
Title
Unitary operators on the document space
Source
Journal of the American Society for Information Science and Technology. 54(2003) no.4, pp.314-320
Year
2003
Abstract
When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
Footnote
Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
Theme
Retrieval algorithms
Object
Latent Semantic Indexing
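
Illustration
The abstract names the Haar transform as a unitary operator that can be computed in linear time. The record does not reproduce the article's own formulation, so the following is only a minimal sketch of the generic textbook orthonormal Haar transform, applied to two hypothetical term-count vectors (`doc_a`, `doc_b`). The function names and the data are assumptions made for illustration. The sketch shows the property the abstract relies on: because the transform is unitary (orthogonal for real vectors), inner products between document vectors are unchanged.

```python
import math


def haar_transform(vec):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Each pass replaces adjacent pairs by a scaled sum and a scaled difference,
    then recurses on the sums only, so the total work is n + n/2 + ... = O(n).
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in vec]
    scale = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        sums = [(out[2 * i] + out[2 * i + 1]) / scale for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / scale for i in range(half)]
        # Coarse coefficients go first; detail coefficients stay fixed from now on.
        out[:length] = sums + diffs
        length = half
    return out


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical term-count vectors over an 8-word vocabulary (not from the article).
doc_a = [2, 0, 1, 0, 0, 3, 0, 1]
doc_b = [1, 1, 0, 0, 0, 2, 0, 0]

before = dot(doc_a, doc_b)
after = dot(haar_transform(doc_a), haar_transform(doc_b))
print(before, after)  # the two values agree up to floating-point rounding
```

Keeping only the leading (coarse) coefficients of the transformed vectors would give a lower-resolution view of the same documents, which is one plausible reading of the "multiresolution representation" mentioned in the abstract; how the article itself selects coefficients is not stated in this record.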

Similar documents (content)

  1. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.21
    0.21293613 = sum of:
      0.21293613 = product of:
        0.6654254 = sum of:
          0.092825495 = weight(abstract_txt:parsing in 1161) [ClassicSimilarity], result of:
            0.092825495 = score(doc=1161,freq=1.0), product of:
              0.15330735 = queryWeight, product of:
                1.0165654 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.01945868 = queryNorm
              0.6054863 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.033268645 = weight(abstract_txt:than in 1161) [ClassicSimilarity], result of:
            0.033268645 = score(doc=1161,freq=2.0), product of:
              0.0773526 = queryWeight, product of:
                1.0211895 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.01945868 = queryNorm
              0.43009087 = fieldWeight in 1161, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.03283333 = weight(abstract_txt:indexing in 1161) [ClassicSimilarity], result of:
            0.03283333 = score(doc=1161,freq=1.0), product of:
              0.09660614 = queryWeight, product of:
                1.1412249 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.01945868 = queryNorm
              0.33986792 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.06741858 = weight(abstract_txt:representation in 1161) [ClassicSimilarity], result of:
            0.06741858 = score(doc=1161,freq=2.0), product of:
              0.12387145 = queryWeight, product of:
                1.2922736 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.01945868 = queryNorm
              0.54426247 = fieldWeight in 1161, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.13965465 = weight(abstract_txt:technique in 1161) [ClassicSimilarity], result of:
            0.13965465 = score(doc=1161,freq=4.0), product of:
              0.15976381 = queryWeight, product of:
                1.4676013 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01945868 = queryNorm
              0.874132 = fieldWeight in 1161, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.05930653 = weight(abstract_txt:documents in 1161) [ClassicSimilarity], result of:
            0.05930653 = score(doc=1161,freq=2.0), product of:
              0.13018179 = queryWeight, product of:
                1.6225184 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.01945868 = queryNorm
              0.455567 = fieldWeight in 1161, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.06315621 = weight(abstract_txt:document in 1161) [ClassicSimilarity], result of:
            0.06315621 = score(doc=1161,freq=1.0), product of:
              0.18825601 = queryWeight, product of:
                2.252985 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01945868 = queryNorm
              0.33548045 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
          0.17696196 = weight(abstract_txt:space in 1161) [ClassicSimilarity], result of:
            0.17696196 = score(doc=1161,freq=2.0), product of:
              0.29697147 = queryWeight, product of:
                2.8297055 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.01945868 = queryNorm
              0.59588873 = fieldWeight in 1161, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=1161)
        0.32 = coord(8/25)
    
  2. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.14
    0.14160879 = sum of:
      0.14160879 = product of:
        0.39335772 = sum of:
          0.035344448 = weight(abstract_txt:computed in 2211) [ClassicSimilarity], result of:
            0.035344448 = score(doc=2211,freq=1.0), product of:
              0.14835161 = queryWeight, product of:
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.01945868 = queryNorm
              0.23824781 = fieldWeight in 2211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.009409795 = weight(abstract_txt:than in 2211) [ClassicSimilarity], result of:
            0.009409795 = score(doc=2211,freq=1.0), product of:
              0.0773526 = queryWeight, product of:
                1.0211895 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.01945868 = queryNorm
              0.12164807 = fieldWeight in 2211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.04249996 = weight(abstract_txt:concepts in 2211) [ClassicSimilarity], result of:
            0.04249996 = score(doc=2211,freq=8.0), product of:
              0.10567782 = queryWeight, product of:
                1.1936054 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.01945868 = queryNorm
              0.40216538 = fieldWeight in 2211, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.019068856 = weight(abstract_txt:representation in 2211) [ClassicSimilarity], result of:
            0.019068856 = score(doc=2211,freq=1.0), product of:
              0.12387145 = queryWeight, product of:
                1.2922736 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.01945868 = queryNorm
              0.15394068 = fieldWeight in 2211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.055861864 = weight(abstract_txt:technique in 2211) [ClassicSimilarity], result of:
            0.055861864 = score(doc=2211,freq=4.0), product of:
              0.15976381 = queryWeight, product of:
                1.4676013 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01945868 = queryNorm
              0.3496528 = fieldWeight in 2211, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.04975864 = weight(abstract_txt:matching in 2211) [ClassicSimilarity], result of:
            0.04975864 = score(doc=2211,freq=2.0), product of:
              0.1863476 = queryWeight, product of:
                1.5850055 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.01945868 = queryNorm
              0.26702055 = fieldWeight in 2211, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.03354884 = weight(abstract_txt:documents in 2211) [ClassicSimilarity], result of:
            0.03354884 = score(doc=2211,freq=4.0), product of:
              0.13018179 = queryWeight, product of:
                1.6225184 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.01945868 = queryNorm
              0.25770763 = fieldWeight in 2211, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.025262484 = weight(abstract_txt:document in 2211) [ClassicSimilarity], result of:
            0.025262484 = score(doc=2211,freq=1.0), product of:
              0.18825601 = queryWeight, product of:
                2.252985 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01945868 = queryNorm
              0.13419218 = fieldWeight in 2211, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
          0.12260284 = weight(abstract_txt:space in 2211) [ClassicSimilarity], result of:
            0.12260284 = score(doc=2211,freq=6.0), product of:
              0.29697147 = queryWeight, product of:
                2.8297055 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.01945868 = queryNorm
              0.41284385 = fieldWeight in 2211, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.03125 = fieldNorm(doc=2211)
        0.36 = coord(9/25)
    
  3. Martin, D.I.; Berry, M.W.: Latent Semantic Indexing (2009) 0.13
    0.13375945 = sum of:
      0.13375945 = product of:
        0.5573311 = sum of:
          0.04643334 = weight(abstract_txt:indexing in 821) [ClassicSimilarity], result of:
            0.04643334 = score(doc=821,freq=2.0), product of:
              0.09660614 = queryWeight, product of:
                1.1412249 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.01945868 = queryNorm
              0.48064584 = fieldWeight in 821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
          0.069827326 = weight(abstract_txt:technique in 821) [ClassicSimilarity], result of:
            0.069827326 = score(doc=821,freq=1.0), product of:
              0.15976381 = queryWeight, product of:
                1.4676013 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01945868 = queryNorm
              0.437066 = fieldWeight in 821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
          0.07084661 = weight(abstract_txt:underlying in 821) [ClassicSimilarity], result of:
            0.07084661 = score(doc=821,freq=1.0), product of:
              0.16131479 = queryWeight, product of:
                1.4747077 = boost
                5.6215343 = idf(docFreq=436, maxDocs=44421)
                0.01945868 = queryNorm
              0.43918237 = fieldWeight in 821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6215343 = idf(docFreq=436, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
          0.083872095 = weight(abstract_txt:documents in 821) [ClassicSimilarity], result of:
            0.083872095 = score(doc=821,freq=4.0), product of:
              0.13018179 = queryWeight, product of:
                1.6225184 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.01945868 = queryNorm
              0.64426905 = fieldWeight in 821, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
          0.10938977 = weight(abstract_txt:document in 821) [ClassicSimilarity], result of:
            0.10938977 = score(doc=821,freq=3.0), product of:
              0.18825601 = queryWeight, product of:
                2.252985 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01945868 = queryNorm
              0.5810692 = fieldWeight in 821, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
          0.17696196 = weight(abstract_txt:space in 821) [ClassicSimilarity], result of:
            0.17696196 = score(doc=821,freq=2.0), product of:
              0.29697147 = queryWeight, product of:
                2.8297055 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.01945868 = queryNorm
              0.59588873 = fieldWeight in 821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=821)
        0.24 = coord(6/25)
    
  4. Kiren, T.; Shoaib, M.: A novel ontology matching approach using key concepts (2016) 0.12
    0.123226404 = sum of:
      0.123226404 = product of:
        0.616132 = sum of:
          0.070688896 = weight(abstract_txt:computed in 3589) [ClassicSimilarity], result of:
            0.070688896 = score(doc=3589,freq=1.0), product of:
              0.14835161 = queryWeight, product of:
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.01945868 = queryNorm
              0.47649562 = fieldWeight in 3589, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=3589)
          0.06719834 = weight(abstract_txt:concepts in 3589) [ClassicSimilarity], result of:
            0.06719834 = score(doc=3589,freq=5.0), product of:
              0.10567782 = queryWeight, product of:
                1.1936054 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.01945868 = queryNorm
              0.63587934 = fieldWeight in 3589, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.0625 = fieldNorm(doc=3589)
          0.0790006 = weight(abstract_txt:technique in 3589) [ClassicSimilarity], result of:
            0.0790006 = score(doc=3589,freq=2.0), product of:
              0.15976381 = queryWeight, product of:
                1.4676013 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01945868 = queryNorm
              0.4944837 = fieldWeight in 3589, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0625 = fieldNorm(doc=3589)
          0.19903456 = weight(abstract_txt:matching in 3589) [ClassicSimilarity], result of:
            0.19903456 = score(doc=3589,freq=8.0), product of:
              0.1863476 = queryWeight, product of:
                1.5850055 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.01945868 = queryNorm
              1.0680822 = fieldWeight in 3589, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=3589)
          0.2002096 = weight(abstract_txt:space in 3589) [ClassicSimilarity], result of:
            0.2002096 = score(doc=3589,freq=4.0), product of:
              0.29697147 = queryWeight, product of:
                2.8297055 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.01945868 = queryNorm
              0.67417115 = fieldWeight in 3589, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=3589)
        0.2 = coord(5/25)
    
  5. Kiren, T.: A clustering based indexing technique of modularized ontologies for information retrieval (2017) 0.12
    0.122307524 = sum of:
      0.122307524 = product of:
        0.43681258 = sum of:
          0.062296864 = weight(abstract_txt:indexing in 399) [ClassicSimilarity], result of:
            0.062296864 = score(doc=399,freq=10.0), product of:
              0.09660614 = queryWeight, product of:
                1.1412249 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.01945868 = queryNorm
              0.64485407 = fieldWeight in 399, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.039038707 = weight(abstract_txt:concepts in 399) [ClassicSimilarity], result of:
            0.039038707 = score(doc=399,freq=3.0), product of:
              0.10567782 = queryWeight, product of:
                1.1936054 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.01945868 = queryNorm
              0.36941248 = fieldWeight in 399, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.07256669 = weight(abstract_txt:technique in 399) [ClassicSimilarity], result of:
            0.07256669 = score(doc=399,freq=3.0), product of:
              0.15976381 = queryWeight, product of:
                1.4676013 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.01945868 = queryNorm
              0.4542123 = fieldWeight in 399, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.025161631 = weight(abstract_txt:documents in 399) [ClassicSimilarity], result of:
            0.025161631 = score(doc=399,freq=1.0), product of:
              0.13018179 = queryWeight, product of:
                1.6225184 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.01945868 = queryNorm
              0.19328073 = fieldWeight in 399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.07798168 = weight(abstract_txt:words in 399) [ClassicSimilarity], result of:
            0.07798168 = score(doc=399,freq=2.0), product of:
              0.219639 = queryWeight, product of:
                2.107508 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.01945868 = queryNorm
              0.35504478 = fieldWeight in 399, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.05358982 = weight(abstract_txt:document in 399) [ClassicSimilarity], result of:
            0.05358982 = score(doc=399,freq=2.0), product of:
              0.18825601 = queryWeight, product of:
                2.252985 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.01945868 = queryNorm
              0.2846646 = fieldWeight in 399, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
          0.106177166 = weight(abstract_txt:space in 399) [ClassicSimilarity], result of:
            0.106177166 = score(doc=399,freq=2.0), product of:
              0.29697147 = queryWeight, product of:
                2.8297055 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.01945868 = queryNorm
              0.35753322 = fieldWeight in 399, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.046875 = fieldNorm(doc=399)
        0.28 = coord(7/25)
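
The score breakdowns above follow a fixed pattern: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor. The sketch below recomputes the score of the first entry (doc 1161) from the values printed in its breakdown. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are inferred from the numbers in the listing and agree with them. The function and variable names are hypothetical, and the code is a plain-Python illustration, not a call into the search engine itself.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in every idf line
QUERY_NORM = 0.01945868   # queryNorm, as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """One 'weight(...)' line: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total, field_norm):
    """Sum of term scores multiplied by coord(matched/total)."""
    raw = sum(term_score(freq, df, boost, field_norm) for _, freq, df, boost in terms)
    return raw * (matched / total)


# (term, freq, docFreq, boost) copied from the breakdown of doc 1161.
TERMS_1161 = [
    ("parsing", 1.0, 51, 1.0165654),
    ("than", 2.0, 2461, 1.0211895),
    ("indexing", 1.0, 1557, 1.1412249),
    ("representation", 2.0, 875, 1.2922736),
    ("technique", 4.0, 448, 1.4676013),
    ("documents", 2.0, 1954, 1.6225184),
    ("document", 1.0, 1647, 2.252985),
    ("space", 2.0, 548, 2.8297055),
]

# fieldNorm(doc=1161) = 0.078125 and coord(8/25), both from the listing.
score_1161 = document_score(TERMS_1161, matched=8, total=25, field_norm=0.078125)
print(round(score_1161, 4))  # the listing reports 0.21293613 for this entry
```

The engine works in single-precision floats, so a recomputation in double precision can differ from the printed figures in the last few digits.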