Document (#19812)

Author
Cheng, K.-H.
Title
Automatic identification for topics of electronic documents
Source
Bulletin of the Library Association of China. 1997, no.59, Dec., pp.43-58
Year
1997
Abstract
With the rapid rise in the number of electronic documents on the Internet, how to assign topics to documents effectively has become an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being.
Footnote
[In Chinese]
Theme
Automatic indexing
Internet
Computational linguistics

Similar documents (author)

  1. Cheng, L.R.L.: Beyond bilingualism : a quest for communicative competence (1996) 5.21
    5.2088575 = sum of:
      5.2088575 = weight(author_txt:cheng in 5291) [ClassicSimilarity], result of:
        5.2088575 = fieldWeight in 5291, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.625 = fieldNorm(doc=5291)
    
  2. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 4.17
    4.167086 = sum of:
      4.167086 = weight(author_txt:cheng in 2256) [ClassicSimilarity], result of:
        4.167086 = fieldWeight in 2256, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.5 = fieldNorm(doc=2256)
    
  3. Cheng, L.-y.: On bibliographic(al) control (1998) 4.17
    4.167086 = sum of:
      4.167086 = weight(author_txt:cheng in 4376) [ClassicSimilarity], result of:
        4.167086 = fieldWeight in 4376, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.5 = fieldNorm(doc=4376)
    
  4. Harter, S.P.; Cheng, Y.-R.: Colinked descriptors : improving vocabulary selection for end-user searching (1996) 3.65
    3.6462004 = sum of:
      3.6462004 = weight(author_txt:cheng in 4284) [ClassicSimilarity], result of:
        3.6462004 = fieldWeight in 4284, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.4375 = fieldNorm(doc=4284)
    
  5. Cheng, W.-N.; Khoo, C.S.G.: Information and argument structures in Sociology research abstracts (2018) 3.65
    3.6462004 = sum of:
      3.6462004 = weight(author_txt:cheng in 750) [ClassicSimilarity], result of:
        3.6462004 = fieldWeight in 750, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.334172 = idf(docFreq=28, maxDocs=44421)
          0.4375 = fieldNorm(doc=750)
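
The author scores above can be reproduced with a few lines of arithmetic. The sketch below assumes the classic TF-IDF form that the listed values are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The idf formula is inferred from the numbers shown here, not taken from the catalog's configuration, and the function names are hypothetical. The fieldNorm value is copied from the listing rather than recomputed.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency (assumed form, inferred from the listing)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values for the first author match (doc 5291), copied from the listing.
score = field_weight(freq=1.0, doc_freq=28, max_docs=44421, field_norm=0.625)
print(round(score, 4))
```

The result should agree with the reported 5.2088575 up to single-precision rounding. Only fieldNorm differs among the five author matches, which is why their ranking follows it: 0.625, 0.5, 0.5, 0.4375, 0.4375.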
    

Similar documents (content)

  1. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.20
    0.1975444 = sum of:
      0.1975444 = product of:
        0.987722 = sum of:
          0.25868663 = weight(abstract_txt:verbs in 3434) [ClassicSimilarity], result of:
            0.25868663 = score(doc=3434,freq=2.0), product of:
              0.23691113 = queryWeight, product of:
                1.5160474 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.018974507 = queryNorm
              1.0919142 = fieldWeight in 3434, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.09375 = fieldNorm(doc=3434)
          0.051573634 = weight(abstract_txt:electronic in 3434) [ClassicSimilarity], result of:
            0.051573634 = score(doc=3434,freq=1.0), product of:
              0.12834373 = queryWeight, product of:
                1.5780559 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.018974507 = queryNorm
              0.4018399 = fieldWeight in 3434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.09375 = fieldNorm(doc=3434)
          0.1299191 = weight(abstract_txt:identification in 3434) [ClassicSimilarity], result of:
            0.1299191 = score(doc=3434,freq=1.0), product of:
              0.23761371 = queryWeight, product of:
                2.1471915 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.018974507 = queryNorm
              0.546766 = fieldWeight in 3434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.09375 = fieldNorm(doc=3434)
          0.33689988 = weight(abstract_txt:nouns in 3434) [ClassicSimilarity], result of:
            0.33689988 = score(doc=3434,freq=1.0), product of:
              0.4484941 = queryWeight, product of:
                2.9499414 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.018974507 = queryNorm
              0.7511802 = fieldWeight in 3434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.09375 = fieldNorm(doc=3434)
          0.21064277 = weight(abstract_txt:word in 3434) [ClassicSimilarity], result of:
            0.21064277 = score(doc=3434,freq=1.0), product of:
              0.41317165 = queryWeight, product of:
                4.0041957 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.018974507 = queryNorm
              0.50981903 = fieldWeight in 3434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.09375 = fieldNorm(doc=3434)
        0.2 = coord(5/25)
    
  2. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17027313 = sum of:
      0.17027313 = product of:
        0.6081183 = sum of:
          0.048746813 = weight(abstract_txt:frequency in 188) [ClassicSimilarity], result of:
            0.048746813 = score(doc=188,freq=2.0), product of:
              0.12361002 = queryWeight, product of:
                1.0950825 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.018974507 = queryNorm
              0.39435974 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.03469183 = weight(abstract_txt:taking in 188) [ClassicSimilarity], result of:
            0.03469183 = score(doc=188,freq=1.0), product of:
              0.124141686 = queryWeight, product of:
                1.0974351 = boost
                5.9616747 = idf(docFreq=310, maxDocs=44421)
                0.018974507 = queryNorm
              0.27945352 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9616747 = idf(docFreq=310, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.014527348 = weight(abstract_txt:that in 188) [ClassicSimilarity], result of:
            0.014527348 = score(doc=188,freq=5.0), product of:
              0.058605827 = queryWeight, product of:
                1.3060237 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018974507 = queryNorm
              0.2478823 = fieldWeight in 188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.06387677 = weight(abstract_txt:assign in 188) [ClassicSimilarity], result of:
            0.06387677 = score(doc=188,freq=1.0), product of:
              0.18649255 = queryWeight, product of:
                1.3450882 = boost
                7.3070183 = idf(docFreq=80, maxDocs=44421)
                0.018974507 = queryNorm
              0.34251648 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3070183 = idf(docFreq=80, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.06495955 = weight(abstract_txt:identification in 188) [ClassicSimilarity], result of:
            0.06495955 = score(doc=188,freq=1.0), product of:
              0.23761371 = queryWeight, product of:
                2.1471915 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.018974507 = queryNorm
              0.273383 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.102661856 = weight(abstract_txt:documents in 188) [ClassicSimilarity], result of:
            0.102661856 = score(doc=188,freq=5.0), product of:
              0.23753934 = queryWeight, product of:
                3.036112 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018974507 = queryNorm
              0.43218887 = fieldWeight in 188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
          0.27865416 = weight(abstract_txt:word in 188) [ClassicSimilarity], result of:
            0.27865416 = score(doc=188,freq=7.0), product of:
              0.41317165 = queryWeight, product of:
                4.0041957 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.018974507 = queryNorm
              0.6744271 = fieldWeight in 188, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.046875 = fieldNorm(doc=188)
        0.28 = coord(7/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.15
    0.14645943 = sum of:
      0.14645943 = product of:
        0.7322971 = sum of:
          0.01531317 = weight(abstract_txt:that in 1643) [ClassicSimilarity], result of:
            0.01531317 = score(doc=1643,freq=2.0), product of:
              0.058605827 = queryWeight, product of:
                1.3060237 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018974507 = queryNorm
              0.2612909 = fieldWeight in 1643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.15243255 = weight(abstract_txt:verbs in 1643) [ClassicSimilarity], result of:
            0.15243255 = score(doc=1643,freq=1.0), product of:
              0.23691113 = queryWeight, product of:
                1.5160474 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.018974507 = queryNorm
              0.6434166 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.108265914 = weight(abstract_txt:identification in 1643) [ClassicSimilarity], result of:
            0.108265914 = score(doc=1643,freq=1.0), product of:
              0.23761371 = queryWeight, product of:
                2.1471915 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.018974507 = queryNorm
              0.45563832 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.2807499 = weight(abstract_txt:nouns in 1643) [ClassicSimilarity], result of:
            0.2807499 = score(doc=1643,freq=1.0), product of:
              0.4484941 = queryWeight, product of:
                2.9499414 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.018974507 = queryNorm
              0.6259835 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.17553562 = weight(abstract_txt:word in 1643) [ClassicSimilarity], result of:
            0.17553562 = score(doc=1643,freq=1.0), product of:
              0.41317165 = queryWeight, product of:
                4.0041957 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.018974507 = queryNorm
              0.42484915 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
        0.2 = coord(5/25)
    
  4. Green, R.: WordNet (2009) 0.13
    0.13291302 = sum of:
      0.13291302 = product of:
        0.8307064 = sum of:
          0.012993655 = weight(abstract_txt:that in 696) [ClassicSimilarity], result of:
            0.012993655 = score(doc=696,freq=1.0), product of:
              0.058605827 = queryWeight, product of:
                1.3060237 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018974507 = queryNorm
              0.22171268 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.18291906 = weight(abstract_txt:verbs in 696) [ClassicSimilarity], result of:
            0.18291906 = score(doc=696,freq=1.0), product of:
              0.23691113 = queryWeight, product of:
                1.5160474 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.018974507 = queryNorm
              0.77209985 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.33689988 = weight(abstract_txt:nouns in 696) [ClassicSimilarity], result of:
            0.33689988 = score(doc=696,freq=1.0), product of:
              0.4484941 = queryWeight, product of:
                2.9499414 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.018974507 = queryNorm
              0.7511802 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.29789382 = weight(abstract_txt:word in 696) [ClassicSimilarity], result of:
            0.29789382 = score(doc=696,freq=2.0), product of:
              0.41317165 = queryWeight, product of:
                4.0041957 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.018974507 = queryNorm
              0.7209929 = fieldWeight in 696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
        0.16 = coord(4/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.13
    0.12679589 = sum of:
      0.12679589 = product of:
        0.63397944 = sum of:
          0.05559914 = weight(abstract_txt:numbers in 4267) [ClassicSimilarity], result of:
            0.05559914 = score(doc=4267,freq=1.0), product of:
              0.120942526 = queryWeight, product of:
                1.0832022 = boost
                5.8843565 = idf(docFreq=335, maxDocs=44421)
                0.018974507 = queryNorm
              0.45971537 = fieldWeight in 4267, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8843565 = idf(docFreq=335, maxDocs=44421)
                0.078125 = fieldNorm(doc=4267)
          0.010828045 = weight(abstract_txt:that in 4267) [ClassicSimilarity], result of:
            0.010828045 = score(doc=4267,freq=1.0), product of:
              0.058605827 = queryWeight, product of:
                1.3060237 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018974507 = queryNorm
              0.18476056 = fieldWeight in 4267, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4267)
          0.108265914 = weight(abstract_txt:identification in 4267) [ClassicSimilarity], result of:
            0.108265914 = score(doc=4267,freq=1.0), product of:
              0.23761371 = queryWeight, product of:
                2.1471915 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.018974507 = queryNorm
              0.45563832 = fieldWeight in 4267, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.078125 = fieldNorm(doc=4267)
          0.108215086 = weight(abstract_txt:documents in 4267) [ClassicSimilarity], result of:
            0.108215086 = score(doc=4267,freq=2.0), product of:
              0.23753934 = queryWeight, product of:
                3.036112 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.018974507 = queryNorm
              0.455567 = fieldWeight in 4267, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=4267)
          0.35107124 = weight(abstract_txt:word in 4267) [ClassicSimilarity], result of:
            0.35107124 = score(doc=4267,freq=4.0), product of:
              0.41317165 = queryWeight, product of:
                4.0041957 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.018974507 = queryNorm
              0.8496983 = fieldWeight in 4267, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=4267)
        0.2 = coord(5/25)
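
The content scores combine several term matches. As a rough illustration, the sketch below recomputes the score of the first content match (doc 3434) from the boost, freq, and docFreq values in the listing. It assumes that each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm, and that the sum is scaled by coord = matched terms / query terms. The idf formula is inferred from the numbers shown, and the function and variable names are hypothetical.

```python
import math

QUERY_NORM = 0.018974507  # queryNorm, copied from the listing
MAX_DOCS = 44421          # maxDocs, copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(boost, freq, doc_freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, freq, docFreq) for doc 3434, copied from the listing.
terms = [
    ("verbs", 1.5160474, 2.0, 31),
    ("electronic", 1.5780559, 1.0, 1660),
    ("identification", 2.1471915, 1.0, 353),
    ("nouns", 2.9499414, 1.0, 39),
    ("word", 4.0041957, 1.0, 524),
]
FIELD_NORM = 0.09375          # fieldNorm(doc=3434)
MATCHED, QUERY_TERMS = 5, 25  # coord(5/25)

raw_sum = sum(term_score(b, f, df, FIELD_NORM) for _, b, f, df in terms)
score = raw_sum * MATCHED / QUERY_TERMS
print(round(raw_sum, 4), round(score, 4))
```

The two printed values should be close to the 0.987722 sum and the 0.1975444 final score reported for that record. The coord factor explains why a document matching only a few of the 25 query terms ends up with a low overall score even when individual term weights are high.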