Document (#33499)

Craven, T.C.
Determining authorship of Web pages
Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad
New Delhi : Ess Ess Publications
The study investigated how far authors can be assigned to Web pages, either through normal browsing or through browsing assisted by simple automatic extraction. Candidate strings for 1000 pages were extracted automatically from title elements, meta-tags, and address-like and copyright-like passages. 539 of the pages produced at least one candidate: 310 candidates came from titles, 66 from meta-tags, 91 from address-like passages, and 259 from copyright-like passages. An assistant attempted to identify personal authors for 943 pages by examining the pages themselves and related pages; this identified authors for 90 pages from which no candidate strings had been extracted. Specific problems are noted, and some refinements to the extraction methods are suggested.
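The abstract names four sources of candidate author strings but does not give the exact extraction rules. The following is a minimal sketch of that kind of extraction using only Python's standard `html.parser` and `re` modules. It is not the study's method. Every rule in it is an assumption: the set of meta names (`AUTHOR_META_NAMES`), the possessive-title pattern (`TITLE_RE`), the treatment of the HTML `<address>` element as the "address-like passage", and the copyright pattern (`COPYRIGHT_RE`). The names `CandidateExtractor` and `extract_candidates` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed meta names that signal authorship (hypothetical choice, not from the study).
AUTHOR_META_NAMES = {"author", "dc.creator", "creator"}

# Assumed title rule: possessive titles such as "Jane Doe's Home Page" yield "Jane Doe".
TITLE_RE = re.compile(r"^(.+?)['’]s\s+(?:home\s*page|web\s*page|page|site)\b", re.IGNORECASE)

# Assumed copyright rule: "Copyright © 1999 Jane Doe." yields "Jane Doe".
COPYRIGHT_RE = re.compile(
    r"(?:(?:©|\(c\)|copyright)\s*)+(?:\d{4}(?:\s*[-,]\s*\d{4})*\s*)?(?:by\s+)?([^.\n<|]{2,60})",
    re.IGNORECASE,
)


class CandidateExtractor(HTMLParser):
    """Collects (source, candidate) pairs from title, meta, <address> and copyright text."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &copy; arrives as ©
        self.candidates = []
        self._in_title = False
        self._address_parts = None  # a list while inside <address>, otherwise None
        self._texts = []  # every text run, scanned for copyright-like passages in close()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "address":
            self._address_parts = []
        elif tag == "meta":
            name = (attrs.get("name") or "").strip().lower()
            content = (attrs.get("content") or "").strip()
            if name in AUTHOR_META_NAMES and content:
                self.candidates.append(("meta", content))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "address" and self._address_parts is not None:
            text = " ".join(self._address_parts).strip()
            if text:
                self.candidates.append(("address", text))
            self._address_parts = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self._in_title:
            m = TITLE_RE.match(text)
            if m:
                self.candidates.append(("title", m.group(1).strip()))
        if self._address_parts is not None:
            self._address_parts.append(text)
        self._texts.append(text)

    def close(self):
        super().close()  # flush any buffered text through handle_data first
        for text in self._texts:
            for m in COPYRIGHT_RE.finditer(text):
                self.candidates.append(("copyright", m.group(1).strip()))


def extract_candidates(html):
    """Return (source, candidate string) pairs for one page, in discovery order."""
    parser = CandidateExtractor()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Each candidate is tagged with its source, so per-source tallies like those in the abstract can be counted directly. These rules are deliberately crude. For example, a copyright line with no terminating period would capture trailing text such as "All rights reserved". This is the kind of problem that would call for the refinements the abstract mentions.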

Similar documents (author)

  1. Craven, T.C.: An online index entry format based on multiple search terms (1987) 5.21
  2. Craven, T.C.: Adapting of string indexing systems for retrieval using proximity operators (1988) 5.21
  3. Craven, T.C.: Customized extracts based on Boolean queries and sentence dependency structures (1989) 5.21
  4. Craven, T.C.: Research in document classification and indexing (Canada) 1971-1980 (1981) 5.21
  5. Craven, T.C.: NEPHIS: a nested phrase indexing system (1977) 5.21

Similar documents (content)

  1. Craven, T.C.: 'DESCRIPTION' META tags in locally linked web pages (2001) 0.15
  2. Bar-Ilan, J.: The Web as an information source on informetrics? : A content analysis (2000) 0.13
  3. Turner, T.P.; Brackbill, L.: Rising to the top : evaluating the use of HTML META tag to improve retrieval of World Wide Web documents through Internet search engines (1998) 0.13
  4. Ajiferuke, I.; Wolfram, D.: Analysis of Web page image tag distribution characteristics (2005) 0.12
  5. Craven, T.C.: Variations in use of meta tag descriptions by Web pages in different languages (2004) 0.10