Document (#16040)

Author
Almerri, J.
McGregor, D.R.
Title
Codon signatures : a document retrieval method
Source
Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
Imprint
London : Taylor Graham
Year
1996
Pages
pp. 154-173
Abstract
The performance of an information retrieval system depends on its ability to distinguish between relevant and non-relevant documents in response to users' information needs. Proposes a new method called Codon Signatures (CS) that is able to use a relationship between terms and concepts. The Codon Signature is designed to improve retrieval performance (recall and precision) by creating the Codon structure, a representation of semantic meaning in context. It is also designed to reduce the amount of storage space required by the index. Presents a theoretical analysis of CS parameters and performance. The method was tested against 3 document base collections and gave acceptable results regarding information effectiveness and efficiency, compared to a conventional Signature Files method.

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.45
    0.4486104 = sum of:
      0.4486104 = product of:
        1.121526 = sum of:
          0.054192573 = weight(abstract_txt:files in 1303) [ClassicSimilarity], result of:
            0.054192573 = score(doc=1303,freq=2.0), product of:
              0.107092835 = queryWeight, product of:
                1.0163455 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.018404953 = queryNorm
              0.5060336 = fieldWeight in 1303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.040687624 = weight(abstract_txt:storage in 1303) [ClassicSimilarity], result of:
            0.040687624 = score(doc=1303,freq=1.0), product of:
              0.11145994 = queryWeight, product of:
                1.0368611 = boost
                5.8406816 = idf(docFreq=350, maxDocs=44421)
                0.018404953 = queryNorm
              0.3650426 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8406816 = idf(docFreq=350, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.045920968 = weight(abstract_txt:efficiency in 1303) [ClassicSimilarity], result of:
            0.045920968 = score(doc=1303,freq=1.0), product of:
              0.12082346 = queryWeight, product of:
                1.0795351 = boost
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.018404953 = queryNorm
              0.38006666 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.016879097 = weight(abstract_txt:between in 1303) [ClassicSimilarity], result of:
            0.016879097 = score(doc=1303,freq=1.0), product of:
              0.07811241 = queryWeight, product of:
                1.2275414 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.018404953 = queryNorm
              0.21608727 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.008670547 = weight(abstract_txt:information in 1303) [ClassicSimilarity], result of:
            0.008670547 = score(doc=1303,freq=1.0), product of:
              0.057352014 = queryWeight, product of:
                1.288238 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.018404953 = queryNorm
              0.15118122 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.032339502 = weight(abstract_txt:document in 1303) [ClassicSimilarity], result of:
            0.032339502 = score(doc=1303,freq=1.0), product of:
              0.12049697 = queryWeight, product of:
                1.5246291 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.018404953 = queryNorm
              0.26838437 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.044584136 = weight(abstract_txt:retrieval in 1303) [ClassicSimilarity], result of:
            0.044584136 = score(doc=1303,freq=3.0), product of:
              0.118467025 = queryWeight, product of:
                1.8514864 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018404953 = queryNorm
              0.37634215 = fieldWeight in 1303, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.060401957 = weight(abstract_txt:performance in 1303) [ClassicSimilarity], result of:
            0.060401957 = score(doc=1303,freq=1.0), product of:
              0.20919518 = queryWeight, product of:
                2.460354 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.018404953 = queryNorm
              0.28873494 = fieldWeight in 1303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.6691007 = weight(abstract_txt:signature in 1303) [ClassicSimilarity], result of:
            0.6691007 = score(doc=1303,freq=7.0), product of:
              0.4747324 = queryWeight, product of:
                3.0262206 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.018404953 = queryNorm
              1.409427 = fieldWeight in 1303, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
          0.14874887 = weight(abstract_txt:method in 1303) [ClassicSimilarity], result of:
            0.14874887 = score(doc=1303,freq=4.0), product of:
              0.2645126 = queryWeight, product of:
                3.1945841 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.018404953 = queryNorm
              0.5623508 = fieldWeight in 1303, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=1303)
        0.4 = coord(10/25)
    
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.30
    0.30052996 = sum of:
      0.30052996 = product of:
        1.2522082 = sum of:
          0.05740121 = weight(abstract_txt:efficiency in 1690) [ClassicSimilarity], result of:
            0.05740121 = score(doc=1690,freq=1.0), product of:
              0.12082346 = queryWeight, product of:
                1.0795351 = boost
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.018404953 = queryNorm
              0.47508332 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0810666 = idf(docFreq=275, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.02109887 = weight(abstract_txt:between in 1690) [ClassicSimilarity], result of:
            0.02109887 = score(doc=1690,freq=1.0), product of:
              0.07811241 = queryWeight, product of:
                1.2275414 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.018404953 = queryNorm
              0.2701091 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.07001708 = weight(abstract_txt:document in 1690) [ClassicSimilarity], result of:
            0.07001708 = score(doc=1690,freq=3.0), product of:
              0.12049697 = queryWeight, product of:
                1.5246291 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.018404953 = queryNorm
              0.5810692 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.07550245 = weight(abstract_txt:performance in 1690) [ClassicSimilarity], result of:
            0.07550245 = score(doc=1690,freq=1.0), product of:
              0.20919518 = queryWeight, product of:
                2.460354 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.018404953 = queryNorm
              0.36091867 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.31612036 = weight(abstract_txt:signature in 1690) [ClassicSimilarity], result of:
            0.31612036 = score(doc=1690,freq=1.0), product of:
              0.4747324 = queryWeight, product of:
                3.0262206 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.018404953 = queryNorm
              0.6658917 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.7120683 = weight(abstract_txt:signatures in 1690) [ClassicSimilarity], result of:
            0.7120683 = score(doc=1690,freq=3.0), product of:
              0.5656155 = queryWeight, product of:
                3.3032146 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.018404953 = queryNorm
              1.2589265 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
        0.24 = coord(6/25)
    
  3. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.29
    0.2863161 = sum of:
      0.2863161 = product of:
        1.1929837 = sum of:
          0.057479903 = weight(abstract_txt:files in 3417) [ClassicSimilarity], result of:
            0.057479903 = score(doc=3417,freq=1.0), product of:
              0.107092835 = queryWeight, product of:
                1.0163455 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.018404953 = queryNorm
              0.5367297 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.061031442 = weight(abstract_txt:storage in 3417) [ClassicSimilarity], result of:
            0.061031442 = score(doc=3417,freq=1.0), product of:
              0.11145994 = queryWeight, product of:
                1.0368611 = boost
                5.8406816 = idf(docFreq=350, maxDocs=44421)
                0.018404953 = queryNorm
              0.5475639 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8406816 = idf(docFreq=350, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.09701852 = weight(abstract_txt:document in 3417) [ClassicSimilarity], result of:
            0.09701852 = score(doc=3417,freq=4.0), product of:
              0.12049697 = queryWeight, product of:
                1.5246291 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.018404953 = queryNorm
              0.80515313 = fieldWeight in 3417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.038611 = weight(abstract_txt:retrieval in 3417) [ClassicSimilarity], result of:
            0.038611 = score(doc=3417,freq=1.0), product of:
              0.118467025 = queryWeight, product of:
                1.8514864 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018404953 = queryNorm
              0.3259219 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.09060294 = weight(abstract_txt:performance in 3417) [ClassicSimilarity], result of:
            0.09060294 = score(doc=3417,freq=1.0), product of:
              0.20919518 = queryWeight, product of:
                2.460354 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.018404953 = queryNorm
              0.43310243 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
          0.84823996 = weight(abstract_txt:signature in 3417) [ClassicSimilarity], result of:
            0.84823996 = score(doc=3417,freq=5.0), product of:
              0.4747324 = queryWeight, product of:
                3.0262206 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.018404953 = queryNorm
              1.786775 = fieldWeight in 3417, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.09375 = fieldNorm(doc=3417)
        0.24 = coord(6/25)
    
  4. Burgin, R.: The Monte Carlo method and the evaluation of retrieval system performance (1999) 0.21
    0.2097486 = sum of:
      0.2097486 = product of:
        0.7491021 = sum of:
          0.05116204 = weight(abstract_txt:between in 3946) [ClassicSimilarity], result of:
            0.05116204 = score(doc=3946,freq=3.0), product of:
              0.07811241 = queryWeight, product of:
                1.2275414 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.018404953 = queryNorm
              0.6549797 = fieldWeight in 3946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.17013896 = weight(abstract_txt:distinguish in 3946) [ClassicSimilarity], result of:
            0.17013896 = score(doc=3946,freq=2.0), product of:
              0.1581167 = queryWeight, product of:
                1.2349519 = boost
                6.9565353 = idf(docFreq=114, maxDocs=44421)
                0.018404953 = queryNorm
              1.0760341 = fieldWeight in 3946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9565353 = idf(docFreq=114, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.13225889 = weight(abstract_txt:acceptable in 3946) [ClassicSimilarity], result of:
            0.13225889 = score(doc=3946,freq=1.0), product of:
              0.16842316 = queryWeight, product of:
                1.2745652 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.018404953 = queryNorm
              0.78527737 = fieldWeight in 3946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.0151734585 = weight(abstract_txt:information in 3946) [ClassicSimilarity], result of:
            0.0151734585 = score(doc=3946,freq=1.0), product of:
              0.057352014 = queryWeight, product of:
                1.288238 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.018404953 = queryNorm
              0.26456714 = fieldWeight in 3946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.100726284 = weight(abstract_txt:retrieval in 3946) [ClassicSimilarity], result of:
            0.100726284 = score(doc=3946,freq=5.0), product of:
              0.118467025 = queryWeight, product of:
                1.8514864 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018404953 = queryNorm
              0.85024744 = fieldWeight in 3946, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.14948721 = weight(abstract_txt:performance in 3946) [ClassicSimilarity], result of:
            0.14948721 = score(doc=3946,freq=2.0), product of:
              0.20919518 = queryWeight, product of:
                2.460354 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.018404953 = queryNorm
              0.7145825 = fieldWeight in 3946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
          0.13015527 = weight(abstract_txt:method in 3946) [ClassicSimilarity], result of:
            0.13015527 = score(doc=3946,freq=1.0), product of:
              0.2645126 = queryWeight, product of:
                3.1945841 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.018404953 = queryNorm
              0.49205697 = fieldWeight in 3946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.109375 = fieldNorm(doc=3946)
        0.28 = coord(7/25)
    
  5. Yang, L.; Ji, D.; Leong, M.: Document reranking by term distribution and maximal marginal relevance for Chinese information retrieval (2007) 0.18
    0.17789486 = sum of:
      0.17789486 = product of:
        0.55592144 = sum of:
          0.048548095 = weight(abstract_txt:recall in 1907) [ClassicSimilarity], result of:
            0.048548095 = score(doc=1907,freq=1.0), product of:
              0.10805678 = queryWeight, product of:
                1.0209093 = boost
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.018404953 = queryNorm
              0.44928318 = fieldWeight in 1907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.050418034 = weight(abstract_txt:against in 1907) [ClassicSimilarity], result of:
            0.050418034 = score(doc=1907,freq=1.0), product of:
              0.11081397 = queryWeight, product of:
                1.0338521 = boost
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.018404953 = queryNorm
              0.45497906 = fieldWeight in 1907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.0108381845 = weight(abstract_txt:information in 1907) [ClassicSimilarity], result of:
            0.0108381845 = score(doc=1907,freq=1.0), product of:
              0.057352014 = queryWeight, product of:
                1.288238 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.018404953 = queryNorm
              0.18897653 = fieldWeight in 1907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.08084876 = weight(abstract_txt:document in 1907) [ClassicSimilarity], result of:
            0.08084876 = score(doc=1907,freq=4.0), product of:
              0.12049697 = queryWeight, product of:
                1.5246291 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.018404953 = queryNorm
              0.6709609 = fieldWeight in 1907, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.071653925 = weight(abstract_txt:relevant in 1907) [ClassicSimilarity], result of:
            0.071653925 = score(doc=1907,freq=2.0), product of:
              0.14007604 = queryWeight, product of:
                1.6438345 = boost
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.018404953 = queryNorm
              0.5115359 = fieldWeight in 1907, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.032175828 = weight(abstract_txt:retrieval in 1907) [ClassicSimilarity], result of:
            0.032175828 = score(doc=1907,freq=1.0), product of:
              0.118467025 = queryWeight, product of:
                1.8514864 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018404953 = queryNorm
              0.27160156 = fieldWeight in 1907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.07550245 = weight(abstract_txt:performance in 1907) [ClassicSimilarity], result of:
            0.07550245 = score(doc=1907,freq=1.0), product of:
              0.20919518 = queryWeight, product of:
                2.460354 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.018404953 = queryNorm
              0.36091867 = fieldWeight in 1907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
          0.1859361 = weight(abstract_txt:method in 1907) [ClassicSimilarity], result of:
            0.1859361 = score(doc=1907,freq=4.0), product of:
              0.2645126 = queryWeight, product of:
                3.1945841 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.018404953 = queryNorm
              0.7029385 = fieldWeight in 1907, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=1907)
        0.32 = coord(8/25)
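
The score breakdowns above follow the classic TF-IDF arithmetic that the [ClassicSimilarity] label points to. The sketch below is only an illustration of that arithmetic, not the catalog's own code: the function names are hypothetical, and the formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are inferred from the listed numbers. It recomputes the "files" term score for doc 1303 and the total for doc 1690 from the values printed in the listings.

```python
import math

# Hypothetical helper names; formulas inferred from the figures in the listings.

def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

MAX_DOCS = 44421          # maxDocs as printed in every idf line
QUERY_NORM = 0.018404953  # queryNorm as printed in every queryWeight block

# Term "files" in doc 1303 (first entry of similar document 1).
files_1303 = term_score(freq=2.0, doc_freq=393, max_docs=MAX_DOCS,
                        boost=1.0163455, query_norm=QUERY_NORM,
                        field_norm=0.0625)

# Document total: sum of the matching term scores times coord(matched/total).
# The six values are the term scores listed for doc 1690 (similar document 2).
term_scores_1690 = [0.05740121, 0.02109887, 0.07001708,
                    0.07550245, 0.31612036, 0.7120683]
doc_score_1690 = sum(term_scores_1690) * (6 / 25)

print(files_1303)
print(doc_score_1690)
```

Up to floating-point rounding, the two printed values agree with the 0.054192573 and 0.30052996 shown in the listings. The coord factor explains why a document matching many query terms (10 of 25 for the first hit) can outrank one with a larger raw sum but fewer matching terms.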