Document (#13778)

Author
Tsujii, J.-I.
Title
Automatic acquisition of semantic collocation from corpora
Source
Machine translation. 10(1995) no.3, pp.219-258
Year
1995
Abstract
Proposes a method for automatically acquiring linguistic knowledge from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm uses gradual approximation, so that the acquired linguistic knowledge converges step by step towards the desired results. The first experiment revealed the characteristics of the algorithm, and the remaining experiments demonstrated its effectiveness on a real corpus.
Theme
Automatic indexing

Similar documents (content)

  1. Sánchez-de-Madariaga, R.; Fernández-del-Castillo, J.R.: The bootstrapping of the Yarowsky algorithm in real corpora (2009) 0.18
    0.17842706 = sum of:
      0.17842706 = product of:
        0.89213526 = sum of:
          0.06285545 = weight(abstract_txt:real in 3451) [ClassicSimilarity], result of:
            0.06285545 = score(doc=3451,freq=2.0), product of:
              0.08937358 = queryWeight, product of:
                5.304538 = idf(docFreq=599, maxDocs=44421)
                0.016848514 = queryNorm
              0.703289 = fieldWeight in 3451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.304538 = idf(docFreq=599, maxDocs=44421)
                0.09375 = fieldNorm(doc=3451)
          0.03984438 = weight(abstract_txt:knowledge in 3451) [ClassicSimilarity], result of:
            0.03984438 = score(doc=3451,freq=1.0), product of:
              0.11984198 = queryWeight, product of:
                2.0056748 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.016848514 = queryNorm
              0.33247432 = fieldWeight in 3451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.09375 = fieldNorm(doc=3451)
          0.14453575 = weight(abstract_txt:acquisition in 3451) [ClassicSimilarity], result of:
            0.14453575 = score(doc=3451,freq=1.0), product of:
              0.24716333 = queryWeight, product of:
                2.3518112 = boost
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.016848514 = queryNorm
              0.5847783 = fieldWeight in 3451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.09375 = fieldNorm(doc=3451)
          0.41031903 = weight(abstract_txt:corpora in 3451) [ClassicSimilarity], result of:
            0.41031903 = score(doc=3451,freq=4.0), product of:
              0.3121727 = queryWeight, product of:
                2.6430652 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.016848514 = queryNorm
              1.3143975 = fieldWeight in 3451, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.09375 = fieldNorm(doc=3451)
          0.2345806 = weight(abstract_txt:algorithm in 3451) [ClassicSimilarity], result of:
            0.2345806 = score(doc=3451,freq=2.0), product of:
              0.31013373 = queryWeight, product of:
                3.2264917 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016848514 = queryNorm
              0.7563853 = fieldWeight in 3451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.09375 = fieldNorm(doc=3451)
        0.2 = coord(5/25)
    
  2. Ibekwe-SanJuan, F.: Constructing and maintaining knowledge organization tools : a symbolic approach (2006) 0.16
    0.16171725 = sum of:
      0.16171725 = product of:
        0.5775616 = sum of:
          0.007299205 = weight(abstract_txt:from in 595) [ClassicSimilarity], result of:
            0.007299205 = score(doc=595,freq=1.0), product of:
              0.048369657 = queryWeight, product of:
                1.0403918 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016848514 = queryNorm
              0.15090463 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.06890888 = weight(abstract_txt:automatic in 595) [ClassicSimilarity], result of:
            0.06890888 = score(doc=595,freq=2.0), product of:
              0.1714863 = queryWeight, product of:
                1.958958 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.016848514 = queryNorm
              0.40183315 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.0569324 = weight(abstract_txt:knowledge in 595) [ClassicSimilarity], result of:
            0.0569324 = score(doc=595,freq=6.0), product of:
              0.11984198 = queryWeight, product of:
                2.0056748 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.016848514 = queryNorm
              0.47506225 = fieldWeight in 595, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.07854362 = weight(abstract_txt:corpus in 595) [ClassicSimilarity], result of:
            0.07854362 = score(doc=595,freq=1.0), product of:
              0.23575626 = queryWeight, product of:
                2.2968998 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.016848514 = queryNorm
              0.33315602 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.08431252 = weight(abstract_txt:acquisition in 595) [ClassicSimilarity], result of:
            0.08431252 = score(doc=595,freq=1.0), product of:
              0.24716333 = queryWeight, product of:
                2.3518112 = boost
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.016848514 = queryNorm
              0.3411207 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.13683869 = weight(abstract_txt:algorithm in 595) [ClassicSimilarity], result of:
            0.13683869 = score(doc=595,freq=2.0), product of:
              0.31013373 = queryWeight, product of:
                3.2264917 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016848514 = queryNorm
              0.44122478 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
          0.14472628 = weight(abstract_txt:linguistic in 595) [ClassicSimilarity], result of:
            0.14472628 = score(doc=595,freq=2.0), product of:
              0.3219398 = queryWeight, product of:
                3.2873306 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.016848514 = queryNorm
              0.44954452 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.0546875 = fieldNorm(doc=595)
        0.28 = coord(7/25)
    
  3. Cui, H.; Heidorn, P.B.: The reusability of induced knowledge for the automatic semantic markup of taxonomic descriptions (2007) 0.13
    0.12813148 = sum of:
      0.12813148 = product of:
        0.64065737 = sum of:
          0.016683897 = weight(abstract_txt:from in 1084) [ClassicSimilarity], result of:
            0.016683897 = score(doc=1084,freq=4.0), product of:
              0.048369657 = queryWeight, product of:
                1.0403918 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016848514 = queryNorm
              0.34492487 = fieldWeight in 1084, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=1084)
          0.055686783 = weight(abstract_txt:automatic in 1084) [ClassicSimilarity], result of:
            0.055686783 = score(doc=1084,freq=1.0), product of:
              0.1714863 = queryWeight, product of:
                1.958958 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.016848514 = queryNorm
              0.32473022 = fieldWeight in 1084, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=1084)
          0.046008326 = weight(abstract_txt:knowledge in 1084) [ClassicSimilarity], result of:
            0.046008326 = score(doc=1084,freq=3.0), product of:
              0.11984198 = queryWeight, product of:
                2.0056748 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.016848514 = queryNorm
              0.38390827 = fieldWeight in 1084, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0625 = fieldNorm(doc=1084)
          0.08976413 = weight(abstract_txt:corpus in 1084) [ClassicSimilarity], result of:
            0.08976413 = score(doc=1084,freq=1.0), product of:
              0.23575626 = queryWeight, product of:
                2.2968998 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.016848514 = queryNorm
              0.38074973 = fieldWeight in 1084, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0625 = fieldNorm(doc=1084)
          0.43251422 = weight(abstract_txt:corpora in 1084) [ClassicSimilarity], result of:
            0.43251422 = score(doc=1084,freq=10.0), product of:
              0.3121727 = queryWeight, product of:
                2.6430652 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.016848514 = queryNorm
              1.3854966 = fieldWeight in 1084, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.0625 = fieldNorm(doc=1084)
        0.2 = coord(5/25)
    
  4. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.12
    0.11767974 = sum of:
      0.11767974 = product of:
        0.5883987 = sum of:
          0.01474662 = weight(abstract_txt:from in 1643) [ClassicSimilarity], result of:
            0.01474662 = score(doc=1643,freq=2.0), product of:
              0.048369657 = queryWeight, product of:
                1.0403918 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016848514 = queryNorm
              0.30487338 = fieldWeight in 1643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.09780814 = weight(abstract_txt:intervention in 1643) [ClassicSimilarity], result of:
            0.09780814 = score(doc=1643,freq=1.0), product of:
              0.17075025 = queryWeight, product of:
                1.3822166 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.016848514 = queryNorm
              0.57281405 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.15868208 = weight(abstract_txt:corpus in 1643) [ClassicSimilarity], result of:
            0.15868208 = score(doc=1643,freq=2.0), product of:
              0.23575626 = queryWeight, product of:
                2.2968998 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.016848514 = queryNorm
              0.6730768 = fieldWeight in 1643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.17096625 = weight(abstract_txt:corpora in 1643) [ClassicSimilarity], result of:
            0.17096625 = score(doc=1643,freq=1.0), product of:
              0.3121727 = queryWeight, product of:
                2.6430652 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.016848514 = queryNorm
              0.5476656 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.14619562 = weight(abstract_txt:linguistic in 1643) [ClassicSimilarity], result of:
            0.14619562 = score(doc=1643,freq=1.0), product of:
              0.3219398 = queryWeight, product of:
                3.2873306 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.016848514 = queryNorm
              0.45410857 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
        0.2 = coord(5/25)
    
  5. Anguiano Peña, G.; Naumis Peña, C.: Method for selecting specialized terms from a general language corpus (2015) 0.11
    0.11237746 = sum of:
      0.11237746 = product of:
        0.56188726 = sum of:
          0.01474662 = weight(abstract_txt:from in 3196) [ClassicSimilarity], result of:
            0.01474662 = score(doc=3196,freq=2.0), product of:
              0.048369657 = queryWeight, product of:
                1.0403918 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016848514 = queryNorm
              0.30487338 = fieldWeight in 3196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=3196)
          0.04695705 = weight(abstract_txt:knowledge in 3196) [ClassicSimilarity], result of:
            0.04695705 = score(doc=3196,freq=2.0), product of:
              0.11984198 = queryWeight, product of:
                2.0056748 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.016848514 = queryNorm
              0.39182472 = fieldWeight in 3196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.078125 = fieldNorm(doc=3196)
          0.11220516 = weight(abstract_txt:corpus in 3196) [ClassicSimilarity], result of:
            0.11220516 = score(doc=3196,freq=1.0), product of:
              0.23575626 = queryWeight, product of:
                2.2968998 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.016848514 = queryNorm
              0.47593716 = fieldWeight in 3196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=3196)
          0.24178281 = weight(abstract_txt:corpora in 3196) [ClassicSimilarity], result of:
            0.24178281 = score(doc=3196,freq=2.0), product of:
              0.3121727 = queryWeight, product of:
                2.6430652 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.016848514 = queryNorm
              0.77451617 = fieldWeight in 3196, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=3196)
          0.14619562 = weight(abstract_txt:linguistic in 3196) [ClassicSimilarity], result of:
            0.14619562 = score(doc=3196,freq=1.0), product of:
              0.3219398 = queryWeight, product of:
                3.2873306 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.016848514 = queryNorm
              0.45410857 = fieldWeight in 3196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.078125 = fieldNorm(doc=3196)
        0.2 = coord(5/25)
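
The score breakdowns above follow the usual ClassicSimilarity (TF-IDF) pattern: each matching term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms divided by total query terms). The following is a minimal sketch that recomputes the score of entry 5 (doc=3196) from the numbers printed in its breakdown. The function names are illustrative, not part of any library, and the calculation uses double precision, so the last digits may differ slightly from the single-precision values shown above.

```python
import math

# Constants copied from the breakdown of entry 5 (doc=3196).
MAX_DOCS = 44421
QUERY_NORM = 0.016848514
FIELD_NORM = 0.078125
TOTAL_QUERY_TERMS = 25  # denominator of coord(5/25)


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(term_scores, total_query_terms):
    """Sum of the term scores, scaled by coord = matched / total."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# (term, termFreq, docFreq, boost) as listed for doc=3196.
matched_terms = [
    ("from", 2.0, 7646, 1.0403918),
    ("knowledge", 2.0, 3480, 2.0056748),
    ("corpus", 1.0, 272, 2.2968998),
    ("corpora", 2.0, 108, 2.6430652),
    ("linguistic", 1.0, 360, 3.2873306),
]

scores = [term_score(freq, df, boost) for _, freq, df, boost in matched_terms]
total = document_score(scores, TOTAL_QUERY_TERMS)
print(f"recomputed score for doc 3196: {total:.4f}")
```

Because coord rewards documents that match more of the query terms, entry 2 (coord 7/25) outranks entry 3 (coord 5/25) even though entry 3 has the larger unscaled sum.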