Document (#37096)

Author
Das, A.
Jain, A.
Title
Indexing the World Wide Web : the journey so far
Source
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Imprint
Hershey, PA : IGI Publishing
Year
2012
Pages
pp. 1-28
Abstract
In this chapter, the authors describe the key indexing components of today's web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand-new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs for making the best use of machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for newly emerging data forms conclude the chapter.
Footnote
Cf.: http://www.igi-global.com/book/next-generation-search-engines/64418.
Theme
Search engines
Object
Google

Similar documents (author)

  1. Jain, H.C.: Colon Classification : a review article (1964) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:jain in 1951) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 1951, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=1951)
    
  2. Jain, A.K.: Image data compression : a review (1981) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:jain in 310) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 310, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=310)
    
  3. Jain, R.: Visual information retrieval in digital libraries (1997) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:jain in 760) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 760, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=760)
    
  4. Jain, P.: An empirical study of knowledge management in academic libraries in East and Southern Africa (2007) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:jain in 1864) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 1864, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=1864)
    
  5. Saggi, M.K.; Jain, S.: A survey towards an integration of big data analytics to big insights for value-creation (2018) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:jain in 53) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 53, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=53)
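
The author-based scores above appear to follow Lucene's classic TF-IDF scoring (ClassicSimilarity, as named in each line): for a single matching term, the score equals fieldWeight = tf × idf × fieldNorm. The Python sketch below reproduces these figures, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the definitions used by pre-7.0 Lucene and consistent with the "idf(docFreq=..., maxDocs=...)" labels. The function names are hypothetical, and fieldNorm is copied from the output because Lucene stores it in a lossy one-byte encoding.

```python
import math

def classic_tf(freq: float) -> float:
    """tf factor (assumed): square root of the raw term frequency."""
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf factor (assumed pre-7.0 Lucene form): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm; fieldNorm is taken from the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Items 1-4 have fieldNorm 0.625 and item 5 has fieldNorm 0.5;
# each contains "jain" once, and "jain" occurs in 9 of 44421 documents.
for norm in (0.625, 0.5):
    print(round(field_weight(1.0, 9, 44421, norm), 4))  # prints 5.8743, then 4.6994
```

Because all five hits share the same tf and idf, their ranking here is decided entirely by fieldNorm.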
    

Similar documents (content)

  1. Ceri, S.; Bozzon, A.; Brambilla, M.; Della Valle, E.; Fraternali, P.; Quarteroni, S.: Web Information Retrieval (2013) 0.12
    0.122137114 = sum of:
      0.122137114 = product of:
        0.436204 = sum of:
          0.050880764 = weight(abstract_txt:cover in 2082) [ClassicSimilarity], result of:
            0.050880764 = score(doc=2082,freq=1.0), product of:
              0.14402962 = queryWeight, product of:
                1.0203593 = boost
                6.4597206 = idf(docFreq=188, maxDocs=44421)
                0.021851687 = queryNorm
              0.35326597 = fieldWeight in 2082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4597206 = idf(docFreq=188, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.07050337 = weight(abstract_txt:grown in 2082) [ClassicSimilarity], result of:
            0.07050337 = score(doc=2082,freq=1.0), product of:
              0.17901495 = queryWeight, product of:
                1.1375536 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.021851687 = queryNorm
              0.39384067 = fieldWeight in 2082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.05827217 = weight(abstract_txt:search in 2082) [ClassicSimilarity], result of:
            0.05827217 = score(doc=2082,freq=10.0), product of:
              0.092200555 = queryWeight, product of:
                1.15454 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021851687 = queryNorm
              0.6320154 = fieldWeight in 2082, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.041829683 = weight(abstract_txt:data in 2082) [ClassicSimilarity], result of:
            0.041829683 = score(doc=2082,freq=4.0), product of:
              0.11483992 = queryWeight, product of:
                1.5780991 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.021851687 = queryNorm
              0.36424342 = fieldWeight in 2082, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.09579091 = weight(abstract_txt:chapter in 2082) [ClassicSimilarity], result of:
            0.09579091 = score(doc=2082,freq=1.0), product of:
              0.27667862 = queryWeight, product of:
                2.0 = boost
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.021851687 = queryNorm
              0.34621724 = fieldWeight in 2082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.05676397 = weight(abstract_txt:authors in 2082) [ClassicSimilarity], result of:
            0.05676397 = score(doc=2082,freq=1.0), product of:
              0.22344553 = queryWeight, product of:
                2.2012718 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.021851687 = queryNorm
              0.2540394 = fieldWeight in 2082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
          0.062163122 = weight(abstract_txt:indexing in 2082) [ClassicSimilarity], result of:
            0.062163122 = score(doc=2082,freq=1.0), product of:
              0.2612911 = queryWeight, product of:
                2.7486503 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021851687 = queryNorm
              0.23790754 = fieldWeight in 2082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2082)
        0.28 = coord(7/25)
    
  2. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.10
    0.09590791 = sum of:
      0.09590791 = product of:
        0.47953954 = sum of:
          0.052649368 = weight(abstract_txt:search in 1007) [ClassicSimilarity], result of:
            0.052649368 = score(doc=1007,freq=4.0), product of:
              0.092200555 = queryWeight, product of:
                1.15454 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021851687 = queryNorm
              0.5710309 = fieldWeight in 1007, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.029878344 = weight(abstract_txt:data in 1007) [ClassicSimilarity], result of:
            0.029878344 = score(doc=1007,freq=1.0), product of:
              0.11483992 = queryWeight, product of:
                1.5780991 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.021851687 = queryNorm
              0.26017386 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.19352686 = weight(abstract_txt:chapter in 1007) [ClassicSimilarity], result of:
            0.19352686 = score(doc=1007,freq=2.0), product of:
              0.27667862 = queryWeight, product of:
                2.0 = boost
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.021851687 = queryNorm
              0.69946444 = fieldWeight in 1007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.11468054 = weight(abstract_txt:authors in 1007) [ClassicSimilarity], result of:
            0.11468054 = score(doc=1007,freq=2.0), product of:
              0.22344553 = queryWeight, product of:
                2.2012718 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.021851687 = queryNorm
              0.5132371 = fieldWeight in 1007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.08880446 = weight(abstract_txt:indexing in 1007) [ClassicSimilarity], result of:
            0.08880446 = score(doc=1007,freq=1.0), product of:
              0.2612911 = queryWeight, product of:
                2.7486503 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021851687 = queryNorm
              0.33986792 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
        0.2 = coord(5/25)
    
  3. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.10
    0.09514178 = sum of:
      0.09514178 = product of:
        0.59463614 = sum of:
          0.12086291 = weight(abstract_txt:trade in 2310) [ClassicSimilarity], result of:
            0.12086291 = score(doc=2310,freq=1.0), product of:
              0.17901495 = queryWeight, product of:
                1.1375536 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.021851687 = queryNorm
              0.6751554 = fieldWeight in 2310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.09375 = fieldNorm(doc=2310)
          0.20993732 = weight(abstract_txt:offs in 2310) [ClassicSimilarity], result of:
            0.20993732 = score(doc=2310,freq=1.0), product of:
              0.25867453 = queryWeight, product of:
                1.3674266 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.021851687 = queryNorm
              0.81158864 = fieldWeight in 2310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=2310)
          0.050705235 = weight(abstract_txt:data in 2310) [ClassicSimilarity], result of:
            0.050705235 = score(doc=2310,freq=2.0), product of:
              0.11483992 = queryWeight, product of:
                1.5780991 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.021851687 = queryNorm
              0.4415297 = fieldWeight in 2310, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=2310)
          0.2131307 = weight(abstract_txt:indexing in 2310) [ClassicSimilarity], result of:
            0.2131307 = score(doc=2310,freq=4.0), product of:
              0.2612911 = queryWeight, product of:
                2.7486503 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021851687 = queryNorm
              0.815683 = fieldWeight in 2310, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=2310)
        0.16 = coord(4/25)
    
  4. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.09
    0.091978796 = sum of:
      0.091978796 = product of:
        0.383245 = sum of:
          0.026324684 = weight(abstract_txt:search in 1101) [ClassicSimilarity], result of:
            0.026324684 = score(doc=1101,freq=1.0), product of:
              0.092200555 = queryWeight, product of:
                1.15454 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021851687 = queryNorm
              0.28551546 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
          0.046791766 = weight(abstract_txt:world in 1101) [ClassicSimilarity], result of:
            0.046791766 = score(doc=1101,freq=1.0), product of:
              0.13529167 = queryWeight, product of:
                1.3985491 = boost
                4.426988 = idf(docFreq=1442, maxDocs=44421)
                0.021851687 = queryNorm
              0.34585845 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.426988 = idf(docFreq=1442, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
          0.062314633 = weight(abstract_txt:wide in 1101) [ClassicSimilarity], result of:
            0.062314633 = score(doc=1101,freq=1.0), product of:
              0.16376388 = queryWeight, product of:
                1.5386904 = boost
                4.8705935 = idf(docFreq=925, maxDocs=44421)
                0.021851687 = queryNorm
              0.38051513 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8705935 = idf(docFreq=925, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
          0.029878344 = weight(abstract_txt:data in 1101) [ClassicSimilarity], result of:
            0.029878344 = score(doc=1101,freq=1.0), product of:
              0.11483992 = queryWeight, product of:
                1.5780991 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.021851687 = queryNorm
              0.26017386 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
          0.13684416 = weight(abstract_txt:chapter in 1101) [ClassicSimilarity], result of:
            0.13684416 = score(doc=1101,freq=1.0), product of:
              0.27667862 = queryWeight, product of:
                2.0 = boost
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.021851687 = queryNorm
              0.49459606 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
          0.08109139 = weight(abstract_txt:authors in 1101) [ClassicSimilarity], result of:
            0.08109139 = score(doc=1101,freq=1.0), product of:
              0.22344553 = queryWeight, product of:
                2.2012718 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.021851687 = queryNorm
              0.36291346 = fieldWeight in 1101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.078125 = fieldNorm(doc=1101)
        0.24 = coord(6/25)
    
  5. Head, A.J.: A question of interface design : how do online service GUIs measure up? (1997) 0.09
    0.088777 = sum of:
      0.088777 = product of:
        0.55485624 = sum of:
          0.12709293 = weight(abstract_txt:newly in 1427) [ClassicSimilarity], result of:
            0.12709293 = score(doc=1427,freq=1.0), product of:
              0.1670361 = queryWeight, product of:
                1.0988348 = boost
                6.9565353 = idf(docFreq=114, maxDocs=44421)
                0.021851687 = queryNorm
              0.76087105 = fieldWeight in 1427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9565353 = idf(docFreq=114, maxDocs=44421)
                0.109375 = fieldNorm(doc=1427)
          0.14100674 = weight(abstract_txt:trade in 1427) [ClassicSimilarity], result of:
            0.14100674 = score(doc=1427,freq=1.0), product of:
              0.17901495 = queryWeight, product of:
                1.1375536 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.021851687 = queryNorm
              0.78768134 = fieldWeight in 1427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.109375 = fieldNorm(doc=1427)
          0.24492685 = weight(abstract_txt:offs in 1427) [ClassicSimilarity], result of:
            0.24492685 = score(doc=1427,freq=1.0), product of:
              0.25867453 = queryWeight, product of:
                1.3674266 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.021851687 = queryNorm
              0.9468534 = fieldWeight in 1427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.109375 = fieldNorm(doc=1427)
          0.041829683 = weight(abstract_txt:data in 1427) [ClassicSimilarity], result of:
            0.041829683 = score(doc=1427,freq=1.0), product of:
              0.11483992 = queryWeight, product of:
                1.5780991 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.021851687 = queryNorm
              0.36424342 = fieldWeight in 1427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.109375 = fieldNorm(doc=1427)
        0.16 = coord(4/25)
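
In the content-based list, each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The summed contributions are then multiplied by coord, the fraction of the 25 query terms that the document matches. The Python sketch below recomputes item 2 (doc 1007) from the figures printed above. It again assumes the pre-7.0 Lucene definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm, fieldNorm and the per-term boosts are copied from the output rather than derived, and the TermMatch class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

MAX_DOCS = 44421          # "maxDocs" from the explain output
QUERY_NORM = 0.021851687  # "queryNorm", copied from the output (not recomputed)

@dataclass
class TermMatch:
    freq: float    # termFreq of the term in the abstract field
    doc_freq: int  # number of documents containing the term
    boost: float   # query-time boost shown in the explain output

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed pre-7.0 Lucene idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(m: TermMatch, field_norm: float, query_norm: float = QUERY_NORM) -> float:
    query_weight = m.boost * idf(m.doc_freq) * query_norm
    field_weight = math.sqrt(m.freq) * idf(m.doc_freq) * field_norm
    return query_weight * field_weight

def doc_score(matches: list[TermMatch], field_norm: float, total_query_terms: int) -> float:
    # coord rewards documents that match more of the query terms
    coord = len(matches) / total_query_terms
    return coord * sum(term_score(m, field_norm) for m in matches)

# Item 2 (doc 1007): five of the 25 query terms match
berry_browne = [
    TermMatch(freq=4.0, doc_freq=3123, boost=1.15454),    # search
    TermMatch(freq=1.0, doc_freq=4320, boost=1.5780991),  # data
    TermMatch(freq=2.0, doc_freq=214, boost=2.0),         # chapter
    TermMatch(freq=2.0, doc_freq=1159, boost=2.2012718),  # authors
    TermMatch(freq=1.0, doc_freq=1557, boost=2.7486503),  # indexing
]
score = doc_score(berry_browne, field_norm=0.078125, total_query_terms=25)
print(f"{score:.6f}")  # prints 0.095908
```

The coord factor explains why item 1 ranks first despite a smaller fieldNorm: it matches 7 of the 25 terms, compared with 5 for item 2.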