Document (#13221)

Author
Huffman, G.D.
Vital, D.A.
Bivins, R.G.
Title
Generating indices with lexical association methods : term uniqueness
Source
Information processing and management. 26(1990) no.4, p.549-558
Year
1990
Abstract
A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary: the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 7,5% of the uniqueness words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination value methods were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid) methods. An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
Theme
Retrieval studies
Indexing studies

Similar documents (content)

  1. Bilal, D.: Ranking, relevance judgment, and precision of information retrieval on children's queries : evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and ask Kids (2012) 0.16
    0.15870093 = sum of:
      0.15870093 = product of:
        0.56678903 = sum of:
          0.05945052 = weight(abstract_txt:word in 1393) [ClassicSimilarity], result of:
            0.05945052 = score(doc=1393,freq=3.0), product of:
              0.11541499 = queryWeight, product of:
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.021223523 = queryNorm
              0.5151022 = fieldWeight in 1393, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.10173125 = weight(abstract_txt:precision in 1393) [ClassicSimilarity], result of:
            0.10173125 = score(doc=1393,freq=8.0), product of:
              0.11907075 = queryWeight, product of:
                1.015714 = boost
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.021223523 = queryNorm
              0.85437644 = fieldWeight in 1393, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.06300947 = weight(abstract_txt:produced in 1393) [ClassicSimilarity], result of:
            0.06300947 = score(doc=1393,freq=3.0), product of:
              0.11997636 = queryWeight, product of:
                1.0195693 = boost
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.021223523 = queryNorm
              0.5251824 = fieldWeight in 1393, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.05291343 = weight(abstract_txt:ranking in 1393) [ClassicSimilarity], result of:
            0.05291343 = score(doc=1393,freq=2.0), product of:
              0.12224549 = queryWeight, product of:
                1.0291657 = boost
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.021223523 = queryNorm
              0.43284568 = fieldWeight in 1393, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.079261325 = weight(abstract_txt:retrieved in 1393) [ClassicSimilarity], result of:
            0.079261325 = score(doc=1393,freq=4.0), product of:
              0.12702419 = queryWeight, product of:
                1.0490885 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.021223523 = queryNorm
              0.62398607 = fieldWeight in 1393, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.031585447 = weight(abstract_txt:were in 1393) [ClassicSimilarity], result of:
            0.031585447 = score(doc=1393,freq=1.0), product of:
              0.1574819 = queryWeight, product of:
                2.0232282 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.021223523 = queryNorm
              0.20056558 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.17883754 = weight(abstract_txt:relevancy in 1393) [ClassicSimilarity], result of:
            0.17883754 = score(doc=1393,freq=1.0), product of:
              0.3970712 = queryWeight, product of:
                2.2716882 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.021223523 = queryNorm
              0.4503916 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
        0.28 = coord(7/25)
    
  2. Lucas, W.; Topi, H.: Form and function : the impact of query term and operator usage on Web search results (2002) 0.13
    0.13387667 = sum of:
      0.13387667 = product of:
        0.6693833 = sum of:
          0.018247142 = weight(abstract_txt:user in 1198) [ClassicSimilarity], result of:
            0.018247142 = score(doc=1198,freq=1.0), product of:
              0.07931668 = queryWeight, product of:
                1.0153056 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.021223523 = queryNorm
              0.23005427 = fieldWeight in 1198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0625 = fieldNorm(doc=1198)
          0.045292187 = weight(abstract_txt:retrieved in 1198) [ClassicSimilarity], result of:
            0.045292187 = score(doc=1198,freq=1.0), product of:
              0.12702419 = queryWeight, product of:
                1.0490885 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.021223523 = queryNorm
              0.35656348 = fieldWeight in 1198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1198)
          0.051049788 = weight(abstract_txt:were in 1198) [ClassicSimilarity], result of:
            0.051049788 = score(doc=1198,freq=2.0), product of:
              0.1574819 = queryWeight, product of:
                2.0232282 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.021223523 = queryNorm
              0.3241629 = fieldWeight in 1198, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0625 = fieldNorm(doc=1198)
          0.45702046 = weight(abstract_txt:relevancy in 1198) [ClassicSimilarity], result of:
            0.45702046 = score(doc=1198,freq=5.0), product of:
              0.3970712 = queryWeight, product of:
                2.2716882 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.021223523 = queryNorm
              1.1509786 = fieldWeight in 1198, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=1198)
          0.097773716 = weight(abstract_txt:terms in 1198) [ClassicSimilarity], result of:
            0.097773716 = score(doc=1198,freq=3.0), product of:
              0.22335786 = queryWeight, product of:
                2.6025767 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.021223523 = queryNorm
              0.43774468 = fieldWeight in 1198, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=1198)
        0.2 = coord(5/25)
    
  3. Dumais, S.T.: Latent semantic analysis (2003) 0.13
    0.13050398 = sum of:
      0.13050398 = product of:
        0.40782493 = sum of:
          0.033971727 = weight(abstract_txt:word in 3462) [ClassicSimilarity], result of:
            0.033971727 = score(doc=3462,freq=3.0), product of:
              0.11541499 = queryWeight, product of:
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.021223523 = queryNorm
              0.29434413 = fieldWeight in 3462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.020392288 = weight(abstract_txt:generated in 3462) [ClassicSimilarity], result of:
            0.020392288 = score(doc=3462,freq=1.0), product of:
              0.11844994 = queryWeight, product of:
                1.0130627 = boost
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.021223523 = queryNorm
              0.17215954 = fieldWeight in 3462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.02125331 = weight(abstract_txt:unique in 3462) [ClassicSimilarity], result of:
            0.02125331 = score(doc=3462,freq=1.0), product of:
              0.12176111 = queryWeight, product of:
                1.0271248 = boost
                5.5855756 = idf(docFreq=452, maxDocs=44421)
                0.021223523 = queryNorm
              0.17454924 = fieldWeight in 3462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5855756 = idf(docFreq=452, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.032026414 = weight(abstract_txt:retrieved in 3462) [ClassicSimilarity], result of:
            0.032026414 = score(doc=3462,freq=2.0), product of:
              0.12702419 = queryWeight, product of:
                1.0490885 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.021223523 = queryNorm
              0.25212845 = fieldWeight in 3462, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.05637775 = weight(abstract_txt:vocabulary in 3462) [ClassicSimilarity], result of:
            0.05637775 = score(doc=3462,freq=4.0), product of:
              0.168257 = queryWeight, product of:
                1.4787716 = boost
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.021223523 = queryNorm
              0.33506927 = fieldWeight in 3462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.032499645 = weight(abstract_txt:association in 3462) [ClassicSimilarity], result of:
            0.032499645 = score(doc=3462,freq=1.0), product of:
              0.18500085 = queryWeight, product of:
                1.5506058 = boost
                5.6215343 = idf(docFreq=436, maxDocs=44421)
                0.021223523 = queryNorm
              0.17567295 = fieldWeight in 3462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6215343 = idf(docFreq=436, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.1176926 = weight(abstract_txt:lexical in 3462) [ClassicSimilarity], result of:
            0.1176926 = score(doc=3462,freq=3.0), product of:
              0.33293536 = queryWeight, product of:
                2.4019494 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.021223523 = queryNorm
              0.35349983 = fieldWeight in 3462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
          0.0936112 = weight(abstract_txt:terms in 3462) [ClassicSimilarity], result of:
            0.0936112 = score(doc=3462,freq=11.0), product of:
              0.22335786 = queryWeight, product of:
                2.6025767 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.021223523 = queryNorm
              0.41910863 = fieldWeight in 3462, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.03125 = fieldNorm(doc=3462)
        0.32 = coord(8/25)
    
  4. Amolochitis, E.; Christou, I.T.; Tan, Z.-H.; Prasad, R.: A heuristic hierarchical scheme for academic search and retrieval (2013) 0.13
    0.12740877 = sum of:
      0.12740877 = product of:
        0.45503134 = sum of:
          0.036494285 = weight(abstract_txt:user in 3711) [ClassicSimilarity], result of:
            0.036494285 = score(doc=3711,freq=4.0), product of:
              0.07931668 = queryWeight, product of:
                1.0153056 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.021223523 = queryNorm
              0.46010855 = fieldWeight in 3711, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.04110563 = weight(abstract_txt:precision in 3711) [ClassicSimilarity], result of:
            0.04110563 = score(doc=3711,freq=1.0), product of:
              0.11907075 = queryWeight, product of:
                1.015714 = boost
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.021223523 = queryNorm
              0.3452202 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.04157547 = weight(abstract_txt:produced in 3711) [ClassicSimilarity], result of:
            0.04157547 = score(doc=3711,freq=1.0), product of:
              0.11997636 = queryWeight, product of:
                1.0195693 = boost
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.021223523 = queryNorm
              0.34653053 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.07406338 = weight(abstract_txt:ranking in 3711) [ClassicSimilarity], result of:
            0.07406338 = score(doc=3711,freq=3.0), product of:
              0.12224549 = queryWeight, product of:
                1.0291657 = boost
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.021223523 = queryNorm
              0.6058578 = fieldWeight in 3711, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.045292187 = weight(abstract_txt:retrieved in 3711) [ClassicSimilarity], result of:
            0.045292187 = score(doc=3711,freq=1.0), product of:
              0.12702419 = queryWeight, product of:
                1.0490885 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.021223523 = queryNorm
              0.35656348 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.09027506 = weight(abstract_txt:subset in 3711) [ClassicSimilarity], result of:
            0.09027506 = score(doc=3711,freq=1.0), product of:
              0.20117904 = queryWeight, product of:
                1.3202624 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.021223523 = queryNorm
              0.44872993 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
          0.12622532 = weight(abstract_txt:terms in 3711) [ClassicSimilarity], result of:
            0.12622532 = score(doc=3711,freq=5.0), product of:
              0.22335786 = queryWeight, product of:
                2.6025767 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.021223523 = queryNorm
              0.56512594 = fieldWeight in 3711, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=3711)
        0.28 = coord(7/25)
    
  5. Coladangelo, L.P.: Organizing controversy : toward cultural hospitality in controlled vocabularies through semantic annotation (2021) 0.12
    0.12442394 = sum of:
      0.12442394 = product of:
        0.44437122 = sum of:
          0.01596625 = weight(abstract_txt:user in 1579) [ClassicSimilarity], result of:
            0.01596625 = score(doc=1579,freq=1.0), product of:
              0.07931668 = queryWeight, product of:
                1.0153056 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.021223523 = queryNorm
              0.20129749 = fieldWeight in 1579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.078990676 = weight(abstract_txt:subset in 1579) [ClassicSimilarity], result of:
            0.078990676 = score(doc=1579,freq=1.0), product of:
              0.20117904 = queryWeight, product of:
                1.3202624 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.021223523 = queryNorm
              0.39263868 = fieldWeight in 1579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.0697639 = weight(abstract_txt:vocabulary in 1579) [ClassicSimilarity], result of:
            0.0697639 = score(doc=1579,freq=2.0), product of:
              0.168257 = queryWeight, product of:
                1.4787716 = boost
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.021223523 = queryNorm
              0.41462705 = fieldWeight in 1579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.030365763 = weight(abstract_txt:methods in 1579) [ClassicSimilarity], result of:
            0.030365763 = score(doc=1579,freq=1.0), product of:
              0.13400829 = queryWeight, product of:
                1.5238763 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.021223523 = queryNorm
              0.22659616 = fieldWeight in 1579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.031585447 = weight(abstract_txt:were in 1579) [ClassicSimilarity], result of:
            0.031585447 = score(doc=1579,freq=1.0), product of:
              0.1574819 = queryWeight, product of:
                2.0232282 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.021223523 = queryNorm
              0.20056558 = fieldWeight in 1579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.11891225 = weight(abstract_txt:lexical in 1579) [ClassicSimilarity], result of:
            0.11891225 = score(doc=1579,freq=1.0), product of:
              0.33293536 = queryWeight, product of:
                2.4019494 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.021223523 = queryNorm
              0.35716316 = fieldWeight in 1579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
          0.098786935 = weight(abstract_txt:terms in 1579) [ClassicSimilarity], result of:
            0.098786935 = score(doc=1579,freq=4.0), product of:
              0.22335786 = queryWeight, product of:
                2.6025767 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.021223523 = queryNorm
              0.442281 = fieldWeight in 1579, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1579)
        0.28 = coord(7/25)
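
The score trees above all follow the same arithmetic, which can be reproduced by hand. The following is a minimal Python sketch, not the catalog's actual code, that recomputes the first entry (doc 1393) from the frequencies, document frequencies, boosts and norms printed in its tree. The function names are hypothetical. The idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption that happens to match the idf values displayed; the remaining products (tf as the square root of freq, queryWeight, fieldWeight, coord) are read directly off the tree.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed formula; it reproduces the idf values printed in the trees.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in each tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm               # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the tree for doc 1393.
MAX_DOCS = 44421
QUERY_NORM = 0.021223523
FIELD_NORM = 0.0546875

# term -> (termFreq, docFreq, boost); "word" has no boost line, so 1.0 is used.
terms_1393 = {
    "word":      (3.0, 524,  1.0),
    "precision": (8.0, 481,  1.015714),
    "produced":  (3.0, 471,  1.0195693),
    "ranking":   (2.0, 447,  1.0291657),
    "retrieved": (4.0, 401,  1.0490885),
    "were":      (1.0, 3083, 2.0232282),
    "relevancy": (1.0, 31,   2.2716882),
}

scores = {
    term: term_score(freq, doc_freq, boost, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for term, (freq, doc_freq, boost) in terms_1393.items()
}

# coord(7/25): 7 of the 25 query terms matched this document.
coord = len(terms_1393) / 25
total = coord * sum(scores.values())

for term, value in scores.items():
    print(f"{term:10s} {value:.8f}")
print(f"total      {total:.8f}")
```

The printed total should agree with the 0.15870093 shown for doc 1393 up to floating-point rounding, since the listing appears to use single-precision values. The same function applies to the other entries once their fieldNorm and coord values are substituted.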