Document (#28686)

Author
Efron, M.
Title
Eigenvalue-based model selection during Latent Semantic Indexing
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.9, pp.969-988
Year
2005
Abstract
In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.
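The core idea of parallel analysis (keep an eigenvalue only if it beats what term independence would produce) can be illustrated with a short sketch. This is NOT Efron's APA procedure, whose exact confidence-interval construction is not given in this record; it is a generic permutation variant of Horn's parallel analysis, assuming a documents-by-terms matrix `X` (rows are documents, columns are terms) and a one-sided percentile of the null eigenvalues as the cutoff. The function name `parallel_analysis_k` and its parameters are hypothetical.

```python
import numpy as np


def parallel_analysis_k(X, n_reps=100, percentile=95.0, seed=0):
    """Hypothetical sketch: estimate k via permutation-based parallel analysis.

    Assumes X is an (n_docs, n_terms) array with terms as columns.
    Returns how many leading observed correlation eigenvalues exceed the
    chosen percentile of the matching null eigenvalues.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Drop constant columns: their correlations are undefined.
    X = X[:, X.std(axis=0) > 0]
    # Observed eigenvalues of the term-term correlation matrix, largest first.
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

    null = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):
        # Shuffle each column independently: every term keeps its own
        # distribution, but co-occurrence between terms is destroyed
        # (a term-independence null).
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[r] = np.linalg.eigvalsh(np.corrcoef(Xp, rowvar=False))[::-1]

    # Upper percentile of each ranked null eigenvalue (one-sided cutoff).
    thresh = np.percentile(null, percentile, axis=0)

    # Keep leading eigenvalues until the first one falls within the null range.
    k = 0
    for o, t in zip(obs, thresh):
        if o <= t:
            break
        k += 1
    return k


# Usage on synthetic data with three planted latent dimensions.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 40))
X = latent @ loadings + 0.5 * rng.normal(size=(500, 40))
print(parallel_analysis_k(X))
```

Horn's original formulation compares each observed eigenvalue with the mean of the null eigenvalues; using an upper percentile instead turns each comparison into a one-sided nonparametric test, which is closer in spirit to the confidence-interval approach the abstract describes. Permuting columns, rather than simulating Gaussian noise, keeps each term's marginal distribution intact, which matters for sparse, skewed term counts.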
Object
Latent Semantic Indexing

Similar documents (author)

  1. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.10
    
  2. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 6.10
    
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.10
    
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.10
    
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.88
    

Similar documents (content)

  1. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.10
    
  2. Cribbin, T.: Discovering latent topical structure by second-order similarity analysis (2011) 0.09
    
  3. He, X.; Cai, D.; Liu, H.; Ma, W.Y.: Locality preserving indexing for document representation (2004) 0.09
    
  4. Cheung, C.M.K.; Lee, M.K.O.: The structure of Web-based information systems satisfaction : testing of competing models (2008) 0.08
    
  5. Zhan, J.; Loh, H.T.: Using latent semantic indexing to improve the accuracy of document clustering (2007) 0.08