Document (#14647)

Author
Jacquemin, C.
Title
What is the tree that we see through the window : a linguistic approach to windowing and term variation
Source
Information processing and management. 32(1996) no.4, pp.445-458
Year
1996
Abstract
Provides a linguistic approach to text windowing through an extraction of term variants with the help of a partial parser. The syntactic grounding of the method ensures that words observed within restricted spans are lexically related and that spurious word co-occurrences are ruled out with a good level of confidence. The system is computationally tractable on large corpora and large lists of terms. Gives illustrative examples of term variation from a large medical corpus. An experimental evaluation of the method shows that only a small proportion of co-occurring words are lexically related and motivates the call for natural language parsing techniques in text windowing.
Theme
Computerlinguistik (computational linguistics)

Similar documents (content)

  1. Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.22
    0.22139028 = sum of:
      0.22139028 = product of:
        0.61497295 = sum of:
          0.11205753 = weight(abstract_txt:variants in 1119) [ClassicSimilarity], result of:
            0.11205753 = score(doc=1119,freq=2.0), product of:
              0.16911088 = queryWeight, product of:
                1.0259691 = boost
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.021986837 = queryNorm
              0.6626276 = fieldWeight in 1119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.024068685 = weight(abstract_txt:through in 1119) [ClassicSimilarity], result of:
            0.024068685 = score(doc=1119,freq=1.0), product of:
              0.09627919 = queryWeight, product of:
                1.0947872 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.021986837 = queryNorm
              0.24998845 = fieldWeight in 1119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.03509749 = weight(abstract_txt:text in 1119) [ClassicSimilarity], result of:
            0.03509749 = score(doc=1119,freq=2.0), product of:
              0.09826636 = queryWeight, product of:
                1.1060276 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021986837 = queryNorm
              0.3571669 = fieldWeight in 1119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.107542284 = weight(abstract_txt:parser in 1119) [ClassicSimilarity], result of:
            0.107542284 = score(doc=1119,freq=1.0), product of:
              0.20730369 = queryWeight, product of:
                1.1359313 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.021986837 = queryNorm
              0.5187669 = fieldWeight in 1119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.027830118 = weight(abstract_txt:related in 1119) [ClassicSimilarity], result of:
            0.027830118 = score(doc=1119,freq=1.0), product of:
              0.10606551 = queryWeight, product of:
                1.149081 = boost
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.021986837 = queryNorm
              0.2623861 = fieldWeight in 1119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.034247156 = weight(abstract_txt:method in 1119) [ClassicSimilarity], result of:
            0.034247156 = score(doc=1119,freq=1.0), product of:
              0.121799976 = queryWeight, product of:
                1.2313659 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.021986837 = queryNorm
              0.2811754 = fieldWeight in 1119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.08171997 = weight(abstract_txt:words in 1119) [ClassicSimilarity], result of:
            0.08171997 = score(doc=1119,freq=2.0), product of:
              0.17262605 = queryWeight, product of:
                1.4659417 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.021986837 = queryNorm
              0.47339305 = fieldWeight in 1119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.104461566 = weight(abstract_txt:linguistic in 1119) [ClassicSimilarity], result of:
            0.104461566 = score(doc=1119,freq=2.0), product of:
              0.20332551 = queryWeight, product of:
                1.5909607 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.021986837 = queryNorm
              0.51376516 = fieldWeight in 1119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
          0.08794814 = weight(abstract_txt:term in 1119) [ClassicSimilarity], result of:
            0.08794814 = score(doc=1119,freq=2.0), product of:
              0.20752434 = queryWeight, product of:
                1.9685374 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021986837 = queryNorm
              0.42379674 = fieldWeight in 1119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.0625 = fieldNorm(doc=1119)
        0.36 = coord(9/25)
    
  2. Bowker, L.: A corpus-based investigation of variation in the organization of medical terms (2000) 0.15
    0.14620884 = sum of:
      0.14620884 = product of:
        0.7310442 = sum of:
          0.029541807 = weight(abstract_txt:approach in 1092) [ClassicSimilarity], result of:
            0.029541807 = score(doc=1092,freq=1.0), product of:
              0.08422895 = queryWeight, product of:
                1.0239865 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.021986837 = queryNorm
              0.35073224 = fieldWeight in 1092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.09375 = fieldNorm(doc=1092)
          0.037226513 = weight(abstract_txt:text in 1092) [ClassicSimilarity], result of:
            0.037226513 = score(doc=1092,freq=1.0), product of:
              0.09826636 = queryWeight, product of:
                1.1060276 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.021986837 = queryNorm
              0.3788327 = fieldWeight in 1092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=1092)
          0.110798225 = weight(abstract_txt:linguistic in 1092) [ClassicSimilarity], result of:
            0.110798225 = score(doc=1092,freq=1.0), product of:
              0.20332551 = queryWeight, product of:
                1.5909607 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.021986837 = queryNorm
              0.5449303 = fieldWeight in 1092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.09375 = fieldNorm(doc=1092)
          0.42155546 = weight(abstract_txt:variation in 1092) [ClassicSimilarity], result of:
            0.42155546 = score(doc=1092,freq=5.0), product of:
              0.28978983 = queryWeight, product of:
                1.8993504 = boost
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.021986837 = queryNorm
              1.4546938 = fieldWeight in 1092, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.09375 = fieldNorm(doc=1092)
          0.13192222 = weight(abstract_txt:term in 1092) [ClassicSimilarity], result of:
            0.13192222 = score(doc=1092,freq=2.0), product of:
              0.20752434 = queryWeight, product of:
                1.9685374 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021986837 = queryNorm
              0.6356951 = fieldWeight in 1092, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.09375 = fieldNorm(doc=1092)
        0.2 = coord(5/25)
    
  3. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.14
    0.14133239 = sum of:
      0.14133239 = product of:
        0.58888495 = sum of:
          0.034815352 = weight(abstract_txt:approach in 5394) [ClassicSimilarity], result of:
            0.034815352 = score(doc=5394,freq=2.0), product of:
              0.08422895 = queryWeight, product of:
                1.0239865 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.021986837 = queryNorm
              0.41334188 = fieldWeight in 5394, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
          0.14007191 = weight(abstract_txt:variants in 5394) [ClassicSimilarity], result of:
            0.14007191 = score(doc=5394,freq=2.0), product of:
              0.16911088 = queryWeight, product of:
                1.0259691 = boost
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.021986837 = queryNorm
              0.8282845 = fieldWeight in 5394, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
          0.030085858 = weight(abstract_txt:through in 5394) [ClassicSimilarity], result of:
            0.030085858 = score(doc=5394,freq=1.0), product of:
              0.09627919 = queryWeight, product of:
                1.0947872 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.021986837 = queryNorm
              0.31248558 = fieldWeight in 5394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
          0.042808946 = weight(abstract_txt:method in 5394) [ClassicSimilarity], result of:
            0.042808946 = score(doc=5394,freq=1.0), product of:
              0.121799976 = queryWeight, product of:
                1.2313659 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.021986837 = queryNorm
              0.35146925 = fieldWeight in 5394, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
          0.20646033 = weight(abstract_txt:linguistic in 5394) [ClassicSimilarity], result of:
            0.20646033 = score(doc=5394,freq=5.0), product of:
              0.20332551 = queryWeight, product of:
                1.5909607 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.021986837 = queryNorm
              1.0154177 = fieldWeight in 5394, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
          0.13464256 = weight(abstract_txt:term in 5394) [ClassicSimilarity], result of:
            0.13464256 = score(doc=5394,freq=3.0), product of:
              0.20752434 = queryWeight, product of:
                1.9685374 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021986837 = queryNorm
              0.64880365 = fieldWeight in 5394, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=5394)
        0.24 = coord(6/25)
    
  4. Bodoff, D.; Kambil, A.: Partial coordination : I. The best of pre-coordination and post-coordination (1998) 0.12
    0.12149229 = sum of:
      0.12149229 = product of:
        0.60746145 = sum of:
          0.024618173 = weight(abstract_txt:approach in 3322) [ClassicSimilarity], result of:
            0.024618173 = score(doc=3322,freq=1.0), product of:
              0.08422895 = queryWeight, product of:
                1.0239865 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.021986837 = queryNorm
              0.29227686 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=3322)
          0.15786393 = weight(abstract_txt:motivates in 3322) [ClassicSimilarity], result of:
            0.15786393 = score(doc=3322,freq=1.0), product of:
              0.23074703 = queryWeight, product of:
                1.1984408 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.021986837 = queryNorm
              0.6841428 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.078125 = fieldNorm(doc=3322)
          0.2854172 = weight(abstract_txt:spurious in 3322) [ClassicSimilarity], result of:
            0.2854172 = score(doc=3322,freq=2.0), product of:
              0.2718051 = queryWeight, product of:
                1.3007005 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.021986837 = queryNorm
              1.0500803 = fieldWeight in 3322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=3322)
          0.061826203 = weight(abstract_txt:large in 3322) [ClassicSimilarity], result of:
            0.061826203 = score(doc=3322,freq=1.0), product of:
              0.17814335 = queryWeight, product of:
                1.8238703 = boost
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.021986837 = queryNorm
              0.3470587 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.078125 = fieldNorm(doc=3322)
          0.07773591 = weight(abstract_txt:term in 3322) [ClassicSimilarity], result of:
            0.07773591 = score(doc=3322,freq=1.0), product of:
              0.20752434 = queryWeight, product of:
                1.9685374 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021986837 = queryNorm
              0.37458694 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=3322)
        0.2 = coord(5/25)
    
  5. Srihari, R.K.: Computational models for integrating linguistic and visual information : a survey (1994/95) 0.12
    0.11710685 = sum of:
      0.11710685 = product of:
        0.5855342 = sum of:
          0.16329816 = weight(abstract_txt:computationally in 2312) [ClassicSimilarity], result of:
            0.16329816 = score(doc=2312,freq=1.0), product of:
              0.20900059 = queryWeight, product of:
                1.1405709 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.021986837 = queryNorm
              0.7813287 = fieldWeight in 2312, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.09375 = fieldNorm(doc=2312)
          0.04174518 = weight(abstract_txt:related in 2312) [ClassicSimilarity], result of:
            0.04174518 = score(doc=2312,freq=1.0), product of:
              0.10606551 = queryWeight, product of:
                1.149081 = boost
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.021986837 = queryNorm
              0.39357919 = fieldWeight in 2312, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.198178 = idf(docFreq=1813, maxDocs=44421)
                0.09375 = fieldNorm(doc=2312)
          0.1830155 = weight(abstract_txt:spans in 2312) [ClassicSimilarity], result of:
            0.1830155 = score(doc=2312,freq=1.0), product of:
              0.22550277 = queryWeight, product of:
                1.1847439 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.021986837 = queryNorm
              0.81158864 = fieldWeight in 2312, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=2312)
          0.08667712 = weight(abstract_txt:words in 2312) [ClassicSimilarity], result of:
            0.08667712 = score(doc=2312,freq=1.0), product of:
              0.17262605 = queryWeight, product of:
                1.4659417 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.021986837 = queryNorm
              0.50210917 = fieldWeight in 2312, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=2312)
          0.110798225 = weight(abstract_txt:linguistic in 2312) [ClassicSimilarity], result of:
            0.110798225 = score(doc=2312,freq=1.0), product of:
              0.20332551 = queryWeight, product of:
                1.5909607 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.021986837 = queryNorm
              0.5449303 = fieldWeight in 2312, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.09375 = fieldNorm(doc=2312)
        0.2 = coord(5/25)
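
The score breakdowns above all follow the same arithmetic, and a short sketch may make them easier to read. It assumes the classic Lucene TF-IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which reproduce the idf values shown in the listing. The function names are hypothetical and chosen only for illustration. The input numbers are copied from the first entry (doc=1119).

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the classic Lucene formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq) in the listing
    idf = classic_idf(doc_freq, max_docs)       # idf(docFreq, maxDocs)
    query_weight = boost * idf * query_norm     # "queryWeight, product of"
    field_weight = tf * idf * field_norm        # "fieldWeight, product of"
    return query_weight * field_weight


def document_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum of the matching term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Term "variants" in doc 1119, values taken from the listing above.
variants = term_score(freq=2.0, doc_freq=66, max_docs=44421,
                      boost=1.0259691, query_norm=0.021986837,
                      field_norm=0.0625)

# The nine per-term scores listed for doc 1119, combined with coord(9/25).
doc_1119 = document_score(
    [0.11205753, 0.024068685, 0.03509749, 0.107542284, 0.027830118,
     0.034247156, 0.08171997, 0.104461566, 0.08794814],
    matched=9, total=25,
)

print(f"variants term score: {variants:.8f}")
print(f"document score:      {doc_1119:.8f}")
```

Under these assumptions the two printed values agree with the 0.11205753 and 0.22139028 reported for the first entry, up to floating-point rounding. The coord factor explains why an entry with a high raw sum can still rank lower: matching 5 of the 25 query terms scales the sum by 0.2, whereas matching 9 of 25 scales it by 0.36.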