Document (#18399)

Author
Adamson, G.W.
Boreham, J.
Title
The use of an association measure based on character structure to identify semantically related pairs of words and document titles
Source
Information storage and retrieval. 10(1974), pp. 253-260
Year
1974
Abstract
An automatic classification technique has been developed, based on the character structure of words. Dice's similarity coefficient is computed from the number of matching digrams in pairs of character strings, and used to cluster sets of character strings. A sample of words from a chemical data base was chosen to contain certain stems derived from the names of chemical elements. They were successfully clustered into groups of semantically related words. Each cluster is characterised by the root word from which all its members are derived. A second example of titles from Mathematical Reviews was clustered into well-defined classes, which compare favourably with the subject groupings of Mathematical Reviews.
Theme
Automatic classification
Field
Chemistry
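
The association measure named in the abstract, Dice's coefficient over matching digrams (adjacent character pairs), can be illustrated with a minimal sketch. The function names, the lower-casing step, the multiset counting of repeated digrams, and the example words are assumptions made for illustration; the record does not state how the authors handled these details.

```python
from collections import Counter


def digrams(text):
    """Return a multiset of adjacent character pairs (assumed: case-insensitive)."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a, b):
    """Dice's coefficient: 2 * shared digrams / (digrams in a + digrams in b)."""
    da, db = digrams(a), digrams(b)
    total = sum(da.values()) + sum(db.values())
    if total == 0:
        # Strings shorter than two characters have no digrams to compare.
        return 0.0
    shared = sum((da & db).values())  # multiset intersection
    return 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical example words, not taken from the paper's sample.
    print(dice("sulphur", "sulphuric"))
    print(dice("night", "nacht"))
```

Words that share a stem share most of their digrams and so score close to 1, which is what lets a clustering step group them around a common root word.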

Similar documents (content)

  1. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.18
    0.17679758 = sum of:
      0.17679758 = product of:
        0.7366566 = sum of:
          0.07478328 = weight(abstract_txt:coefficient in 226) [ClassicSimilarity], result of:
            0.07478328 = score(doc=226,freq=1.0), product of:
              0.15400098 = queryWeight, product of:
                1.1623913 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.017051795 = queryNorm
              0.48560262 = fieldWeight in 226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
          0.099772975 = weight(abstract_txt:pairs in 226) [ClassicSimilarity], result of:
            0.099772975 = score(doc=226,freq=1.0), product of:
              0.2351468 = queryWeight, product of:
                2.031305 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.017051795 = queryNorm
              0.4243008 = fieldWeight in 226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
          0.016750006 = weight(abstract_txt:from in 226) [ClassicSimilarity], result of:
            0.016750006 = score(doc=226,freq=1.0), product of:
              0.09712264 = queryWeight, product of:
                2.0641243 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.017051795 = queryNorm
              0.17246243 = fieldWeight in 226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
          0.12569048 = weight(abstract_txt:strings in 226) [ClassicSimilarity], result of:
            0.12569048 = score(doc=226,freq=1.0), product of:
              0.2742829 = queryWeight, product of:
                2.19384 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.017051795 = queryNorm
              0.45825124 = fieldWeight in 226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
          0.16970816 = weight(abstract_txt:words in 226) [ClassicSimilarity], result of:
            0.16970816 = score(doc=226,freq=3.0), product of:
              0.29270843 = queryWeight, product of:
                3.2050753 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.017051795 = queryNorm
              0.5797857 = fieldWeight in 226, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
          0.24995168 = weight(abstract_txt:character in 226) [ClassicSimilarity], result of:
            0.24995168 = score(doc=226,freq=2.0), product of:
              0.43374503 = queryWeight, product of:
                3.9015563 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017051795 = queryNorm
              0.5762641 = fieldWeight in 226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=226)
        0.24 = coord(6/25)
    
  2. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.14
    0.14418647 = sum of:
      0.14418647 = product of:
        0.4505827 = sum of:
          0.017091969 = weight(abstract_txt:into in 1337) [ClassicSimilarity], result of:
            0.017091969 = score(doc=1337,freq=2.0), product of:
              0.06973878 = queryWeight, product of:
                1.1062232 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.017051795 = queryNorm
              0.24508555 = fieldWeight in 1337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.06209998 = weight(abstract_txt:groupings in 1337) [ClassicSimilarity], result of:
            0.06209998 = score(doc=1337,freq=1.0), product of:
              0.164819 = queryWeight, product of:
                1.2025254 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017051795 = queryNorm
              0.3767768 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.017696004 = weight(abstract_txt:related in 1337) [ClassicSimilarity], result of:
            0.017696004 = score(doc=1337,freq=1.0), product of:
              0.08992348 = queryWeight, product of:
                1.2561519 = boost
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.017051795 = queryNorm
              0.19678959 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.01979551 = weight(abstract_txt:structure in 1337) [ClassicSimilarity], result of:
            0.01979551 = score(doc=1337,freq=1.0), product of:
              0.0969023 = queryWeight, product of:
                1.3039852 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017051795 = queryNorm
              0.20428318 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.044939056 = weight(abstract_txt:titles in 1337) [ClassicSimilarity], result of:
            0.044939056 = score(doc=1337,freq=1.0), product of:
              0.16738078 = queryWeight, product of:
                1.7137932 = boost
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.017051795 = queryNorm
              0.26848397 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.021758895 = weight(abstract_txt:from in 1337) [ClassicSimilarity], result of:
            0.021758895 = score(doc=1337,freq=3.0), product of:
              0.09712264 = queryWeight, product of:
                2.0641243 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.017051795 = queryNorm
              0.22403526 = fieldWeight in 1337, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.16327672 = weight(abstract_txt:strings in 1337) [ClassicSimilarity], result of:
            0.16327672 = score(doc=1337,freq=3.0), product of:
              0.2742829 = queryWeight, product of:
                2.19384 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.017051795 = queryNorm
              0.5952858 = fieldWeight in 1337, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
          0.1039246 = weight(abstract_txt:words in 1337) [ClassicSimilarity], result of:
            0.1039246 = score(doc=1337,freq=2.0), product of:
              0.29270843 = queryWeight, product of:
                3.2050753 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.017051795 = queryNorm
              0.35504478 = fieldWeight in 1337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.046875 = fieldNorm(doc=1337)
        0.32 = coord(8/25)
    
  3. Chen, T.T.: The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.13
    0.13444923 = sum of:
      0.13444923 = product of:
        0.4801758 = sum of:
          0.10575953 = weight(abstract_txt:coefficient in 3775) [ClassicSimilarity], result of:
            0.10575953 = score(doc=3775,freq=2.0), product of:
              0.15400098 = queryWeight, product of:
                1.1623913 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.017051795 = queryNorm
              0.6867458 = fieldWeight in 3775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.07716662 = weight(abstract_txt:clustered in 3775) [ClassicSimilarity], result of:
            0.07716662 = score(doc=3775,freq=1.0), product of:
              0.15725584 = queryWeight, product of:
                1.1746109 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.017051795 = queryNorm
              0.4907075 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.023594672 = weight(abstract_txt:related in 3775) [ClassicSimilarity], result of:
            0.023594672 = score(doc=3775,freq=1.0), product of:
              0.08992348 = queryWeight, product of:
                1.2561519 = boost
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.017051795 = queryNorm
              0.2623861 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.045715775 = weight(abstract_txt:structure in 3775) [ClassicSimilarity], result of:
            0.045715775 = score(doc=3775,freq=3.0), product of:
              0.0969023 = queryWeight, product of:
                1.3039852 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.017051795 = queryNorm
              0.4717718 = fieldWeight in 3775, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.10490427 = weight(abstract_txt:derived in 3775) [ClassicSimilarity], result of:
            0.10490427 = score(doc=3775,freq=3.0), product of:
              0.16858497 = queryWeight, product of:
                1.719947 = boost
                5.7482243 = idf(docFreq=384, maxDocs=44421)
                0.017051795 = queryNorm
              0.6222635 = fieldWeight in 3775, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7482243 = idf(docFreq=384, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.08953492 = weight(abstract_txt:cluster in 3775) [ClassicSimilarity], result of:
            0.08953492 = score(doc=3775,freq=1.0), product of:
              0.21877219 = queryWeight, product of:
                1.9593034 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.017051795 = queryNorm
              0.409261 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.033500012 = weight(abstract_txt:from in 3775) [ClassicSimilarity], result of:
            0.033500012 = score(doc=3775,freq=4.0), product of:
              0.09712264 = queryWeight, product of:
                2.0641243 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.017051795 = queryNorm
              0.34492487 = fieldWeight in 3775, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
        0.28 = coord(7/25)
    
  4. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.12155786 = sum of:
      0.12155786 = product of:
        0.75973666 = sum of:
          0.016114462 = weight(abstract_txt:into in 206) [ClassicSimilarity], result of:
            0.016114462 = score(doc=206,freq=1.0), product of:
              0.06973878 = queryWeight, product of:
                1.1062232 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.017051795 = queryNorm
              0.23106888 = fieldWeight in 206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.016750006 = weight(abstract_txt:from in 206) [ClassicSimilarity], result of:
            0.016750006 = score(doc=206,freq=1.0), product of:
              0.09712264 = queryWeight, product of:
                2.0641243 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.017051795 = queryNorm
              0.17246243 = fieldWeight in 206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.29394317 = weight(abstract_txt:words in 206) [ClassicSimilarity], result of:
            0.29394317 = score(doc=206,freq=9.0), product of:
              0.29270843 = queryWeight, product of:
                3.2050753 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.017051795 = queryNorm
              1.0042183 = fieldWeight in 206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.432929 = weight(abstract_txt:character in 206) [ClassicSimilarity], result of:
            0.432929 = score(doc=206,freq=6.0), product of:
              0.43374503 = queryWeight, product of:
                3.9015563 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017051795 = queryNorm
              0.9981187 = fieldWeight in 206, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
        0.16 = coord(4/25)
    
  5. Beagle, D.: Visualizing keyword distribution across multidisciplinary c-space (2003) 0.11
    0.10766538 = sum of:
      0.10766538 = product of:
        0.33645433 = sum of:
          0.011394645 = weight(abstract_txt:into in 2202) [ClassicSimilarity], result of:
            0.011394645 = score(doc=2202,freq=2.0), product of:
              0.06973878 = queryWeight, product of:
                1.1062232 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.017051795 = queryNorm
              0.16339037 = fieldWeight in 2202, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.03605754 = weight(abstract_txt:diagrams in 2202) [ClassicSimilarity], result of:
            0.03605754 = score(doc=2202,freq=1.0), product of:
              0.15031576 = queryWeight, product of:
                1.1483992 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.017051795 = queryNorm
              0.23987862 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.041399986 = weight(abstract_txt:groupings in 2202) [ClassicSimilarity], result of:
            0.041399986 = score(doc=2202,freq=1.0), product of:
              0.164819 = queryWeight, product of:
                1.2025254 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017051795 = queryNorm
              0.25118455 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.011797336 = weight(abstract_txt:related in 2202) [ClassicSimilarity], result of:
            0.011797336 = score(doc=2202,freq=1.0), product of:
              0.08992348 = queryWeight, product of:
                1.2561519 = boost
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.017051795 = queryNorm
              0.13119306 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.079265036 = weight(abstract_txt:titles in 2202) [ClassicSimilarity], result of:
            0.079265036 = score(doc=2202,freq=7.0), product of:
              0.16738078 = queryWeight, product of:
                1.7137932 = boost
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.017051795 = queryNorm
              0.47356117 = fieldWeight in 2202, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.06331075 = weight(abstract_txt:cluster in 2202) [ClassicSimilarity], result of:
            0.06331075 = score(doc=2202,freq=2.0), product of:
              0.21877219 = queryWeight, product of:
                1.9593034 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.017051795 = queryNorm
              0.28939122 = fieldWeight in 2202, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.008375003 = weight(abstract_txt:from in 2202) [ClassicSimilarity], result of:
            0.008375003 = score(doc=2202,freq=1.0), product of:
              0.09712264 = queryWeight, product of:
                2.0641243 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.017051795 = queryNorm
              0.08623122 = fieldWeight in 2202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
          0.08485408 = weight(abstract_txt:words in 2202) [ClassicSimilarity], result of:
            0.08485408 = score(doc=2202,freq=3.0), product of:
              0.29270843 = queryWeight, product of:
                3.2050753 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.017051795 = queryNorm
              0.28989285 = fieldWeight in 2202, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.03125 = fieldNorm(doc=2202)
        0.32 = coord(8/25)
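
The score trees above follow the TF-IDF arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight × fieldWeight, and the sum is scaled by the coord factor. The sketch below recomputes the score of the first entry (doc 226) from the values printed in its tree. The function and variable names are illustrative assumptions, and the inputs are copied from the listing rather than read from an index.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in the listing
QUERY_NORM = 0.017051795  # queryNorm, as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 226, copied from the first entry.
DOC_226_TERMS = [
    ("coefficient", 1.0, 50, 1.1623913),
    ("pairs", 1.0, 135, 2.031305),
    ("from", 1.0, 7646, 2.0641243),
    ("strings", 1.0, 78, 2.19384),
    ("words", 3.0, 569, 3.2050753),
    ("character", 2.0, 177, 3.9015563),
]
FIELD_NORM_226 = 0.0625
COORD_226 = 6 / 25  # 6 of the 25 query terms matched


def doc_226_score():
    total = sum(
        term_score(freq, doc_freq, boost, FIELD_NORM_226)
        for _, freq, doc_freq, boost in DOC_226_TERMS
    )
    return total * COORD_226


if __name__ == "__main__":
    print(round(doc_226_score(), 4))
```

Because the listing prints single-precision values, a recomputation in double precision agrees with the displayed score of about 0.1768 only up to small rounding differences.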