Document (#26669)

Author
Koch, T.
Ardö, A.
Noodén, L.
Title
The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1
Source
http://www.lub.lu.se/desire/DESIRE36a-WP1.html
Year
1999
Abstract
This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcomes of two different approaches to collecting the documents. These two methods were selected from seven general methodologies for building robot-generated subject indices, which are presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two approaches, in spite of using basically the same services as starting points for the harvesting process. An intellectual (manual) evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the two approaches was the coverage of the resulting database.
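The abstract reports low overlap between the seed link collections and between the two harvested databases, but does not state how overlap was measured. The following is a minimal sketch, assuming each collection is a list of URLs and that overlap is counted on normalized URLs. The normalization rules, the function names `normalize_url` and `overlap_stats`, and the example URLs are hypothetical and are not taken from the working paper.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme and host, drop the
    fragment, and strip a trailing slash from the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def overlap_stats(a, b):
    """Compare two URL collections after normalization.

    Returns the number of shared URLs, the Jaccard overlap
    (shared / union), and the share of each collection that
    also occurs in the other one.
    """
    set_a = {normalize_url(u) for u in a}
    set_b = {normalize_url(u) for u in b}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "share_of_a": len(shared) / len(set_a) if set_a else 0.0,
        "share_of_b": len(shared) / len(set_b) if set_b else 0.0,
    }


# Made-up example collections, for illustration only.
seeds_a = ["http://Example.org/eng/", "http://example.org/a#top", "http://x.org/b"]
seeds_b = ["http://example.org/eng", "http://y.org/c"]
print(overlap_stats(seeds_a, seeds_b))
```

Reporting the one-sided shares alongside the Jaccard value matters when the collections differ in size: a small collection can be largely contained in a large one while their Jaccard overlap stays low.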
Theme
Automatic classification
Internet
Object
DESIRE

Similar documents (author)

  1. Ardö, A.; Koch, T.: Lunds Universitets Elektroniska Bibliotek : Del.2: Gopher, World Wide Web (WWW). Planerade projekt (1993) 5.90
    
  2. Ardö, A.; Koch, T.: Wide-area information server (WAIS) as the hub of an electronic library service at Lund University (1993) 5.90
    
  3. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 5.90
    
  4. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 5.90
    
  5. Koch, T.; Ardö, A.; Brümmer, A.: The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 4.43
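The scores printed next to each similar document are Lucene ClassicSimilarity (TF-IDF) scores. A minimal sketch of the per-term formula, assuming the standard ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and plugging in the statistics the catalog's search engine reported for the author matches: "koch" (docFreq 74) and "ardö" (docFreq 10, query boost about 1.26), with maxDocs 44421, queryNorm about 0.0722 and fieldNorm 0.5. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # boost * idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Statistics as reported by the catalog's search engine (inputs, not derived here).
MAX_DOCS = 44421
QUERY_NORM = 0.07218137

koch = term_score(1.0, 74, MAX_DOCS, QUERY_NORM, field_norm=0.5)
ardo = term_score(1.0, 10, MAX_DOCS, QUERY_NORM, field_norm=0.5, boost=1.2599673)
print(f"koch={koch:.4f} ardo={ardo:.4f} total={koch + ardo:.2f}")
```

The rarer author name carries the larger share of the 5.90 total because idf grows as docFreq shrinks. Using field_norm=0.375 instead (a longer author field with three names) yields the 4.43 shown for entry 5.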
    

Similar documents (content)

  1. Ardö, A.; Godby, J.; Houghton, A.; Koch, T.; Reighart, R.; Thompson, R.; Vizine-Goetz, D.: Browsing engineering resources on the Web : a general knowledge organization scheme (Dewey) vs. a special scheme (EI) (2000) 0.25
    
  2. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.18
    
  3. MacCain, K.W.: Descriptor and citation retrieval in the medical behavioral sciences literature : retrieval overlaps and novelty distribution (1989) 0.15
    
  4. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.15
    
  5. Kimmel, S.: WWW search tools in reference services (1997) 0.14
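For the content-based list, the engine reported that each document's per-term scores are summed and then multiplied by a coordination factor, coord = matched clauses / query clauses, with 25 clauses in this query (older Lucene releases apply coord when scoring Boolean queries). A minimal sketch with hypothetical function names, fed with the per-term scores the engine reported for the Kimmel (1997) entry ("subject", "generated", "robot"):

```python
def coord(matched: int, total_clauses: int) -> float:
    """Coordination factor: fraction of query clauses that matched the document."""
    return matched / total_clauses


def boolean_score(term_scores: list[float], total_clauses: int) -> float:
    """Sum the matching clauses' scores, then scale the sum by coord."""
    return sum(term_scores) * coord(len(term_scores), total_clauses)


# Per-term scores reported for Kimmel (1997): "subject", "generated", "robot".
kimmel_terms = [0.05965455, 0.1668687, 0.97121865]
print(f"{boolean_score(kimmel_terms, 25):.2f}")
```

This explains the ranking: Kimmel's summed term weight (about 1.20) is higher than that of the first entry (about 1.06), but only 3 of 25 clauses match, so coord scales it to 0.12 of its value, while the first entry matches 6 clauses and keeps 0.24.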