Document (#32960)

Author
Kontostathis, A.
Pottenger, W.M.
Title
A framework for understanding Latent Semantic Indexing (LSI) performance
Source
Information processing and management. 42(2006) no.1, p.56-73
Year
2006
Abstract
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term-by-dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the Singular Value Decomposition (SVD) algorithm that forms the foundation for LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
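
The abstract does not define second-order term co-occurrence. As a minimal sketch under one common reading (two terms never share a document, but each shares a document with the same third term), the idea can be illustrated in Python. The toy documents and the function names `first_order` and `second_order` are assumptions made for this illustration; they are not taken from the paper.

```python
from itertools import combinations

# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"latent", "semantic"},
    {"semantic", "indexing"},
    {"retrieval", "ranking"},
]

def first_order(docs):
    """Pairs of terms that appear together in at least one document."""
    pairs = set()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            pairs.add((a, b))
    return pairs

def second_order(docs):
    """Pairs that never co-occur directly but share a co-occurring term.

    Assumption: directly co-occurring pairs are excluded here; other
    definitions may count them as well.
    """
    direct = first_order(docs)
    neighbours = {}
    for a, b in direct:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    linked = set()
    for terms in neighbours.values():
        for a, b in combinations(sorted(terms), 2):
            if (a, b) not in direct:
                linked.add((a, b))
    return linked

print(second_order(docs))  # "indexing" and "latent" are linked via "semantic"
```

In this toy corpus "latent" and "indexing" never appear in the same document, yet both appear with "semantic", so they form the only second-order pair.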
Footnote
Contribution within a thematic focus: "Formal Methods for Information Retrieval"
Object
Latent Semantic Indexing

Similar documents (content)

  1. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.26
    0.25898102 = sum of:
      0.25898102 = product of:
        0.80931574 = sum of:
          0.09619915 = weight(abstract_txt:decomposition in 308) [ClassicSimilarity], result of:
            0.09619915 = score(doc=308,freq=1.0), product of:
              0.19440854 = queryWeight, product of:
                1.4517992 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.016913477 = queryNorm
              0.49482986 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.11084606 = weight(abstract_txt:singular in 308) [ClassicSimilarity], result of:
            0.11084606 = score(doc=308,freq=1.0), product of:
              0.21367219 = queryWeight, product of:
                1.5220288 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.016913477 = queryNorm
              0.5187669 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.031917877 = weight(abstract_txt:indexing in 308) [ClassicSimilarity], result of:
            0.031917877 = score(doc=308,freq=1.0), product of:
              0.117390744 = queryWeight, product of:
                1.5954412 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.016913477 = queryNorm
              0.27189434 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.08507304 = weight(abstract_txt:semantic in 308) [ClassicSimilarity], result of:
            0.08507304 = score(doc=308,freq=6.0), product of:
              0.1241906 = queryWeight, product of:
                1.6409987 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016913477 = queryNorm
              0.68501997 = fieldWeight in 308, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.05405633 = weight(abstract_txt:performance in 308) [ClassicSimilarity], result of:
            0.05405633 = score(doc=308,freq=2.0), product of:
              0.132383 = queryWeight, product of:
                1.6942598 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.016913477 = queryNorm
              0.40833285 = fieldWeight in 308, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.0536692 = weight(abstract_txt:understanding in 308) [ClassicSimilarity], result of:
            0.0536692 = score(doc=308,freq=1.0), product of:
              0.19001666 = queryWeight, product of:
                2.4860241 = boost
                4.5191154 = idf(docFreq=1315, maxDocs=44421)
                0.016913477 = queryNorm
              0.28244472 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5191154 = idf(docFreq=1315, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.2295233 = weight(abstract_txt:latent in 308) [ClassicSimilarity], result of:
            0.2295233 = score(doc=308,freq=3.0), product of:
              0.3032415 = queryWeight, product of:
                2.564237 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.016913477 = queryNorm
              0.7568994 = fieldWeight in 308, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.14803077 = weight(abstract_txt:term in 308) [ClassicSimilarity], result of:
            0.14803077 = score(doc=308,freq=3.0), product of:
              0.2851995 = queryWeight, product of:
                3.5168452 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.016913477 = queryNorm
              0.5190429 = fieldWeight in 308, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
        0.32 = coord(8/25)
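
The per-term lines in the explanation above can be recomputed from their leaf values. The following Python sketch assumes the relationships that the tree itself displays: queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and tf = sqrt(termFreq). The idf formula 1 + ln(maxDocs / (docFreq + 1)) is inferred from the numbers shown, not confirmed by the record. The function names are illustrative, and small differences from the listed values are expected because the listing uses single-precision floats.

```python
import math

QUERY_NORM = 0.016913477  # queryNorm, as listed in the explanation
MAX_DOCS = 44421          # maxDocs, as listed in the explanation

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Inferred from the listed values: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, term_freq, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # queryWeight
    field_weight = math.sqrt(term_freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight

# "decomposition" in doc 308: boost 1.4517992, docFreq 43, freq 1, fieldNorm 0.0625
decomposition = term_score(1.4517992, 43, 1.0, 0.0625)

# "semantic" in doc 308: boost 1.6409987, docFreq 1375, freq 6, fieldNorm 0.0625
semantic = term_score(1.6409987, 1375, 6.0, 0.0625)

print(round(decomposition, 6), round(semantic, 6))
```

The two results can be compared against the listed scores 0.09619915 and 0.08507304 for document 308.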
    
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.25
    0.25484106 = sum of:
      0.25484106 = product of:
        0.9101466 = sum of:
          0.12024893 = weight(abstract_txt:decomposition in 1690) [ClassicSimilarity], result of:
            0.12024893 = score(doc=1690,freq=1.0), product of:
              0.19440854 = queryWeight, product of:
                1.4517992 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.016913477 = queryNorm
              0.6185373 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.13855757 = weight(abstract_txt:singular in 1690) [ClassicSimilarity], result of:
            0.13855757 = score(doc=1690,freq=1.0), product of:
              0.21367219 = queryWeight, product of:
                1.5220288 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.016913477 = queryNorm
              0.6484586 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.056423374 = weight(abstract_txt:indexing in 1690) [ClassicSimilarity], result of:
            0.056423374 = score(doc=1690,freq=2.0), product of:
              0.117390744 = queryWeight, product of:
                1.5954412 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.016913477 = queryNorm
              0.48064584 = fieldWeight in 1690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.07519465 = weight(abstract_txt:semantic in 1690) [ClassicSimilarity], result of:
            0.07519465 = score(doc=1690,freq=3.0), product of:
              0.1241906 = queryWeight, product of:
                1.6409987 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016913477 = queryNorm
              0.6054778 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.047779497 = weight(abstract_txt:performance in 1690) [ClassicSimilarity], result of:
            0.047779497 = score(doc=1690,freq=1.0), product of:
              0.132383 = queryWeight, product of:
                1.6942598 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.016913477 = queryNorm
              0.36091867 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.28690413 = weight(abstract_txt:latent in 1690) [ClassicSimilarity], result of:
            0.28690413 = score(doc=1690,freq=3.0), product of:
              0.3032415 = queryWeight, product of:
                2.564237 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.016913477 = queryNorm
              0.94612426 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.18503848 = weight(abstract_txt:term in 1690) [ClassicSimilarity], result of:
            0.18503848 = score(doc=1690,freq=3.0), product of:
              0.2851995 = queryWeight, product of:
                3.5168452 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.016913477 = queryNorm
              0.64880365 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
        0.28 = coord(7/25)
    
  3. Berry, M.W.; Dumais, S.T.; O'Brien, G.W.: Using linear algebra for intelligent information retrieval (1995) 0.22
    0.22484839 = sum of:
      0.22484839 = product of:
        0.80302995 = sum of:
          0.114517815 = weight(abstract_txt:vectors in 3206) [ClassicSimilarity], result of:
            0.114517815 = score(doc=3206,freq=1.0), product of:
              0.18818133 = queryWeight, product of:
                1.4283582 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.016913477 = queryNorm
              0.60855037 = fieldWeight in 3206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.12024893 = weight(abstract_txt:decomposition in 3206) [ClassicSimilarity], result of:
            0.12024893 = score(doc=3206,freq=1.0), product of:
              0.19440854 = queryWeight, product of:
                1.4517992 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.016913477 = queryNorm
              0.6185373 = fieldWeight in 3206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.19594999 = weight(abstract_txt:singular in 3206) [ClassicSimilarity], result of:
            0.19594999 = score(doc=3206,freq=2.0), product of:
              0.21367219 = queryWeight, product of:
                1.5220288 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.016913477 = queryNorm
              0.91705894 = fieldWeight in 3206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.056423374 = weight(abstract_txt:indexing in 3206) [ClassicSimilarity], result of:
            0.056423374 = score(doc=3206,freq=2.0), product of:
              0.117390744 = queryWeight, product of:
                1.5954412 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.016913477 = queryNorm
              0.48064584 = fieldWeight in 3206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.043413654 = weight(abstract_txt:semantic in 3206) [ClassicSimilarity], result of:
            0.043413654 = score(doc=3206,freq=1.0), product of:
              0.1241906 = queryWeight, product of:
                1.6409987 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016913477 = queryNorm
              0.34957278 = fieldWeight in 3206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.16564418 = weight(abstract_txt:latent in 3206) [ClassicSimilarity], result of:
            0.16564418 = score(doc=3206,freq=1.0), product of:
              0.3032415 = queryWeight, product of:
                2.564237 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.016913477 = queryNorm
              0.5462451 = fieldWeight in 3206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
          0.106832005 = weight(abstract_txt:term in 3206) [ClassicSimilarity], result of:
            0.106832005 = score(doc=3206,freq=1.0), product of:
              0.2851995 = queryWeight, product of:
                3.5168452 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.016913477 = queryNorm
              0.37458694 = fieldWeight in 3206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=3206)
        0.28 = coord(7/25)
    
  4. Rishel, T.; Perkins, L.A.; Yenduri, S.; Zand, F.: Determining the context of text using augmented latent semantic indexing (2007) 0.20
    0.19798365 = sum of:
      0.19798365 = product of:
        0.8249319 = sum of:
          0.12024893 = weight(abstract_txt:decomposition in 2316) [ClassicSimilarity], result of:
            0.12024893 = score(doc=2316,freq=1.0), product of:
              0.19440854 = queryWeight, product of:
                1.4517992 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.016913477 = queryNorm
              0.6185373 = fieldWeight in 2316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.13855757 = weight(abstract_txt:singular in 2316) [ClassicSimilarity], result of:
            0.13855757 = score(doc=2316,freq=1.0), product of:
              0.21367219 = queryWeight, product of:
                1.5220288 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.016913477 = queryNorm
              0.6484586 = fieldWeight in 2316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.09707588 = weight(abstract_txt:semantic in 2316) [ClassicSimilarity], result of:
            0.09707588 = score(doc=2316,freq=5.0), product of:
              0.1241906 = queryWeight, product of:
                1.6409987 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016913477 = queryNorm
              0.7816685 = fieldWeight in 2316, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.047779497 = weight(abstract_txt:performance in 2316) [ClassicSimilarity], result of:
            0.047779497 = score(doc=2316,freq=1.0), product of:
              0.132383 = queryWeight, product of:
                1.6942598 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.016913477 = queryNorm
              0.36091867 = fieldWeight in 2316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.08998168 = weight(abstract_txt:algorithm in 2316) [ClassicSimilarity], result of:
            0.08998168 = score(doc=2316,freq=1.0), product of:
              0.20188649 = queryWeight, product of:
                2.092269 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016913477 = queryNorm
              0.44570434 = fieldWeight in 2316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.33128837 = weight(abstract_txt:latent in 2316) [ClassicSimilarity], result of:
            0.33128837 = score(doc=2316,freq=4.0), product of:
              0.3032415 = queryWeight, product of:
                2.564237 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.016913477 = queryNorm
              1.0924902 = fieldWeight in 2316, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
        0.24 = coord(6/25)
    
  5. Deerwester, S.C.; Dumais, S.T.; Landauer, T.K.; Furnas, G.W.; Harshman, R.A.: Indexing by latent semantic analysis (1990) 0.20
    0.19762385 = sum of:
      0.19762385 = product of:
        0.70579946 = sum of:
          0.16195264 = weight(abstract_txt:vectors in 3399) [ClassicSimilarity], result of:
            0.16195264 = score(doc=3399,freq=2.0), product of:
              0.18818133 = queryWeight, product of:
                1.4283582 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.016913477 = queryNorm
              0.86062014 = fieldWeight in 3399, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.12024893 = weight(abstract_txt:decomposition in 3399) [ClassicSimilarity], result of:
            0.12024893 = score(doc=3399,freq=1.0), product of:
              0.19440854 = queryWeight, product of:
                1.4517992 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.016913477 = queryNorm
              0.6185373 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.13855757 = weight(abstract_txt:singular in 3399) [ClassicSimilarity], result of:
            0.13855757 = score(doc=3399,freq=1.0), product of:
              0.21367219 = queryWeight, product of:
                1.5220288 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.016913477 = queryNorm
              0.6484586 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.03989735 = weight(abstract_txt:indexing in 3399) [ClassicSimilarity], result of:
            0.03989735 = score(doc=3399,freq=1.0), product of:
              0.117390744 = queryWeight, product of:
                1.5954412 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.016913477 = queryNorm
              0.33986792 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.043413654 = weight(abstract_txt:semantic in 3399) [ClassicSimilarity], result of:
            0.043413654 = score(doc=3399,freq=1.0), product of:
              0.1241906 = queryWeight, product of:
                1.6409987 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016913477 = queryNorm
              0.34957278 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.094897255 = weight(abstract_txt:values in 3399) [ClassicSimilarity], result of:
            0.094897255 = score(doc=3399,freq=1.0), product of:
              0.20917363 = queryWeight, product of:
                2.1296947 = boost
                5.807065 = idf(docFreq=362, maxDocs=44421)
                0.016913477 = queryNorm
              0.45367694 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.807065 = idf(docFreq=362, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
          0.106832005 = weight(abstract_txt:term in 3399) [ClassicSimilarity], result of:
            0.106832005 = score(doc=3399,freq=1.0), product of:
              0.2851995 = queryWeight, product of:
                3.5168452 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.016913477 = queryNorm
              0.37458694 = fieldWeight in 3399, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=3399)
        0.28 = coord(7/25)
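
The top-level number for each similar document is the sum of its matching term scores multiplied by the coordination factor coord(matched/total). As a rough check using the values listed for document 3399 (Deerwester et al.), the arithmetic can be sketched in Python. The variable names are illustrative, and the last digits may differ slightly from the listing because of single-precision rounding.

```python
import math

# Term scores listed for doc 3399, keyed by query term.
term_scores_3399 = {
    "vectors": 0.16195264,
    "decomposition": 0.12024893,
    "singular": 0.13855757,
    "indexing": 0.03989735,
    "semantic": 0.043413654,
    "values": 0.094897255,
    "term": 0.106832005,
}

TOTAL_QUERY_TERMS = 25  # denominator shown in coord(7/25)

def final_score(term_scores, total_terms=TOTAL_QUERY_TERMS):
    # coord rewards documents that match more of the query terms.
    coord = len(term_scores) / total_terms
    return math.fsum(term_scores.values()) * coord

score_3399 = final_score(term_scores_3399)
print(round(score_3399, 4))
```

The result agrees with the listed 0.19762385 to within rounding, which is displayed as 0.20 in the result heading.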