Document (#30262)

Author
Rodríguez, A.
Carazo, J.M.
Trelles-Salazar, O.
Title
Mining association rules from biological databases
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.5, p.493-504
Year
2005
Abstract
We present a novel application of knowledge discovery technology to bioinformatics, a developing and challenging application area. The methodology identifies relationships between low-magnitude similarity (LMS) sequence patterns and other well-contrasted protein characteristics, such as those described by database annotations. We start by identifying these signals inside protein sequences through exhaustive database searching and automatic pattern recognition strategies. In a second step we address the discovery of association rules that allow sequences holding LMS signals to be tagged with consequent functional keywords. We have designed our own algorithm for discovering association rules to meet the special needs of bioinformatics problems, where the patterns sought lie in sparse datasets and are uncommon and thus difficult to locate. Computational efficiency has been verified with both synthetic and real biological data, showing that the algorithm is well suited to this application area compared to state-of-the-art algorithms. The usefulness of the method is confirmed by its ability to produce previously unknown and useful knowledge in the area of biological sequence analysis. In addition, we introduce a new and promising application of the rule extraction algorithm to gene expression databases.
Footnote
Contribution to a special issue on bioinformatics
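
For readers unfamiliar with the term used in the abstract, the following is a minimal sketch of generic support/confidence association-rule mining over item pairs. It is not the authors' algorithm, which the record does not describe. The function name `mine_rules`, the thresholds, and the toy items (`signal_A`, `kw_kinase`, and so on) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for rules between single items, using plain pair counting."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical toy data: each set is one sequence with a signal and a keyword.
toy = [
    {"signal_A", "kw_kinase"},
    {"signal_A", "kw_kinase"},
    {"signal_A", "kw_membrane"},
    {"signal_B", "kw_membrane"},
]
rules = mine_rules(toy, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Pair counting like this scales poorly and, with a fixed support threshold, tends to miss rare patterns, which is the difficulty the abstract says motivated a purpose-built algorithm.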

Similar documents (author)

  1. Rodríguez, E.E.: Consolidated edition of ISBD, International Standard Bibliographic Description : a standard to trust, a quality brand (2014) 4.84
    4.841027 = sum of:
      4.841027 = weight(author_txt:rodríguez in 1996) [ClassicSimilarity], result of:
        4.841027 = score(doc=1996,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.12910482 = queryNorm
          4.8410273 = fieldWeight in 1996, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.625 = fieldNorm(doc=1996)
    
  2. Zazo Rodríguez, A.F. -> Rodríguez, A.F.Z.: 4.79
    4.7923717 = sum of:
      4.7923717 = weight(author_txt:rodríguez in 1935) [ClassicSimilarity], result of:
        4.7923717 = score(doc=1935,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.12910482 = queryNorm
          4.792372 = fieldWeight in 1935, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.4375 = fieldNorm(doc=1935)
    
  3. Rodríguez, E.M.M. -> Méndez Rodríguez, E.M.: 4.79
    4.7923717 = sum of:
      4.7923717 = weight(author_txt:rodríguez in 2856) [ClassicSimilarity], result of:
        4.7923717 = score(doc=2856,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.12910482 = queryNorm
          4.792372 = fieldWeight in 2856, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.4375 = fieldNorm(doc=2856)
    
  4. Rodríguez, Z.Chinchilla -> Chinchilla Rodríguez, Z.: 4.79
    4.7923717 = sum of:
      4.7923717 = weight(author_txt:rodríguez in 67) [ClassicSimilarity], result of:
        4.7923717 = score(doc=67,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.12910482 = queryNorm
          4.792372 = fieldWeight in 67, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.4375 = fieldNorm(doc=67)
    
  5. Rodríguez Z. Chinchilla- -> Chinchilla-Rodríguez, Z.: 4.11
    4.107747 = sum of:
      4.107747 = weight(author_txt:rodríguez in 795) [ClassicSimilarity], result of:
        4.107747 = score(doc=795,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.12910482 = queryNorm
          4.1077476 = fieldWeight in 795, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            7.7456436 = idf(docFreq=51, maxDocs=44218)
            0.375 = fieldNorm(doc=795)
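
The author-match scores above can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes `fieldNorm` and `queryNorm` directly from the explanation output rather than deriving them. The function names are illustrative, not part of any library.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """score = queryWeight * fieldWeight for a single term."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Entry 1 (doc 1996): freq=1.0, fieldNorm=0.625
score_1 = term_score(1.0, 51, 44218, field_norm=0.625, query_norm=0.12910482)
# Entries 2-4 (e.g. doc 1935): freq=2.0, fieldNorm=0.4375
score_2 = term_score(2.0, 51, 44218, field_norm=0.4375, query_norm=0.12910482)
print(round(score_1, 4), round(score_2, 4))
```

The values agree with the listed 4.841027 and 4.7923717 up to floating-point rounding. Entries 2 to 5 match the term twice, but their lower `fieldNorm` values offset the higher tf.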
    

Similar documents (content)

  1. Shachak, A.: Diffusion pattern of the use of genomic databases and analysis of biological sequences from 1970-2003 : bibliographic record analysis of 12 journals (2006) 0.40
    0.39946938 = sum of:
      0.39946938 = product of:
        1.2483418 = sum of:
          0.015976937 = weight(abstract_txt:well in 4906) [ClassicSimilarity], result of:
            0.015976937 = score(doc=4906,freq=1.0), product of:
              0.065435596 = queryWeight, product of:
                1.0062007 = boost
                3.9066048 = idf(docFreq=2416, maxDocs=44218)
                0.01664677 = queryNorm
              0.2441628 = fieldWeight in 4906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9066048 = idf(docFreq=2416, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.046052568 = weight(abstract_txt:databases in 4906) [ClassicSimilarity], result of:
            0.046052568 = score(doc=4906,freq=4.0), product of:
              0.08348996 = queryWeight, product of:
                1.136566 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.01664677 = queryNorm
              0.5515941 = fieldWeight in 4906, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.039412342 = weight(abstract_txt:patterns in 4906) [ClassicSimilarity], result of:
            0.039412342 = score(doc=4906,freq=1.0), product of:
              0.11946477 = queryWeight, product of:
                1.3595572 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.01664677 = queryNorm
              0.32990766 = fieldWeight in 4906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.08506895 = weight(abstract_txt:sequence in 4906) [ClassicSimilarity], result of:
            0.08506895 = score(doc=4906,freq=1.0), product of:
              0.1995258 = queryWeight, product of:
                1.7570215 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.01664677 = queryNorm
              0.42635563 = fieldWeight in 4906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.15483199 = weight(abstract_txt:sequences in 4906) [ClassicSimilarity], result of:
            0.15483199 = score(doc=4906,freq=2.0), product of:
              0.23607436 = queryWeight, product of:
                1.9111817 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.01664677 = queryNorm
              0.6558611 = fieldWeight in 4906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.3363244 = weight(abstract_txt:bioinformatics in 4906) [ClassicSimilarity], result of:
            0.3363244 = score(doc=4906,freq=4.0), product of:
              0.31427073 = queryWeight, product of:
                2.2051063 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.01664677 = queryNorm
              1.0701741 = fieldWeight in 4906, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.2094856 = weight(abstract_txt:protein in 4906) [ClassicSimilarity], result of:
            0.2094856 = score(doc=4906,freq=1.0), product of:
              0.36384895 = queryWeight, product of:
                2.3726742 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01664677 = queryNorm
              0.5757488 = fieldWeight in 4906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
          0.36118895 = weight(abstract_txt:biological in 4906) [ClassicSimilarity], result of:
            0.36118895 = score(doc=4906,freq=5.0), product of:
              0.350226 = queryWeight, product of:
                2.851001 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.01664677 = queryNorm
              1.0313025 = fieldWeight in 4906, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=4906)
        0.32 = coord(8/25)
    
  2. Rapp, B.A.; Wheeler, D.L.: Bioinformatics resources from the National Center for Biotechnology Information : an integrated foundation for discovery (2005) 0.20
    0.20451267 = sum of:
      0.20451267 = product of:
        0.85213614 = sum of:
          0.109509364 = weight(abstract_txt:gene in 5265) [ClassicSimilarity], result of:
            0.109509364 = score(doc=5265,freq=2.0), product of:
              0.14874163 = queryWeight, product of:
                1.0727013 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.01664677 = queryNorm
              0.73623884 = fieldWeight in 5265, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
          0.021008657 = weight(abstract_txt:database in 5265) [ClassicSimilarity], result of:
            0.021008657 = score(doc=5265,freq=1.0), product of:
              0.07853873 = queryWeight, product of:
                1.10235 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.01664677 = queryNorm
              0.26749423 = fieldWeight in 5265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
          0.023026284 = weight(abstract_txt:databases in 5265) [ClassicSimilarity], result of:
            0.023026284 = score(doc=5265,freq=1.0), product of:
              0.08348996 = queryWeight, product of:
                1.136566 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.01664677 = queryNorm
              0.27579704 = fieldWeight in 5265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
          0.1701379 = weight(abstract_txt:sequence in 5265) [ClassicSimilarity], result of:
            0.1701379 = score(doc=5265,freq=4.0), product of:
              0.1995258 = queryWeight, product of:
                1.7570215 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.01664677 = queryNorm
              0.85271126 = fieldWeight in 5265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
          0.10948275 = weight(abstract_txt:sequences in 5265) [ClassicSimilarity], result of:
            0.10948275 = score(doc=5265,freq=1.0), product of:
              0.23607436 = queryWeight, product of:
                1.9111817 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.01664677 = queryNorm
              0.46376383 = fieldWeight in 5265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
          0.4189712 = weight(abstract_txt:protein in 5265) [ClassicSimilarity], result of:
            0.4189712 = score(doc=5265,freq=4.0), product of:
              0.36384895 = queryWeight, product of:
                2.3726742 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01664677 = queryNorm
              1.1514976 = fieldWeight in 5265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=5265)
        0.24 = coord(6/25)
    
  3. Artymiuk, P.J.; Spriggs, R.V.; Willett, P.: Graph theoretic methods for the analysis of structural relationships in biological macromolecules (2005) 0.16
    0.15666385 = sum of:
      0.15666385 = product of:
        0.78331923 = sum of:
          0.04926543 = weight(abstract_txt:patterns in 5258) [ClassicSimilarity], result of:
            0.04926543 = score(doc=5258,freq=1.0), product of:
              0.11946477 = queryWeight, product of:
                1.3595572 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.01664677 = queryNorm
              0.41238457 = fieldWeight in 5258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.078125 = fieldNorm(doc=5258)
          0.10633619 = weight(abstract_txt:sequence in 5258) [ClassicSimilarity], result of:
            0.10633619 = score(doc=5258,freq=1.0), product of:
              0.1995258 = queryWeight, product of:
                1.7570215 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.01664677 = queryNorm
              0.53294456 = fieldWeight in 5258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.078125 = fieldNorm(doc=5258)
          0.13196991 = weight(abstract_txt:algorithm in 5258) [ClassicSimilarity], result of:
            0.13196991 = score(doc=5258,freq=2.0), product of:
              0.20935439 = queryWeight, product of:
                2.204267 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01664677 = queryNorm
              0.63036615 = fieldWeight in 5258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=5258)
          0.21020275 = weight(abstract_txt:bioinformatics in 5258) [ClassicSimilarity], result of:
            0.21020275 = score(doc=5258,freq=1.0), product of:
              0.31427073 = queryWeight, product of:
                2.2051063 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.01664677 = queryNorm
              0.6688588 = fieldWeight in 5258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=5258)
          0.28554493 = weight(abstract_txt:biological in 5258) [ClassicSimilarity], result of:
            0.28554493 = score(doc=5258,freq=2.0), product of:
              0.350226 = queryWeight, product of:
                2.851001 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.01664677 = queryNorm
              0.81531614 = fieldWeight in 5258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.078125 = fieldNorm(doc=5258)
        0.2 = coord(5/25)
    
  4. Toldo, L.; Rippmann, F.: Integrated bioinformatics application for automated target discovery. (2005) 0.13
    0.12867594 = sum of:
      0.12867594 = product of:
        0.6433797 = sum of:
          0.019971173 = weight(abstract_txt:well in 5260) [ClassicSimilarity], result of:
            0.019971173 = score(doc=5260,freq=1.0), product of:
              0.065435596 = queryWeight, product of:
                1.0062007 = boost
                3.9066048 = idf(docFreq=2416, maxDocs=44218)
                0.01664677 = queryNorm
              0.3052035 = fieldWeight in 5260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9066048 = idf(docFreq=2416, maxDocs=44218)
                0.078125 = fieldNorm(doc=5260)
          0.026260821 = weight(abstract_txt:database in 5260) [ClassicSimilarity], result of:
            0.026260821 = score(doc=5260,freq=1.0), product of:
              0.07853873 = queryWeight, product of:
                1.10235 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.01664677 = queryNorm
              0.33436778 = fieldWeight in 5260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.078125 = fieldNorm(doc=5260)
          0.10633619 = weight(abstract_txt:sequence in 5260) [ClassicSimilarity], result of:
            0.10633619 = score(doc=5260,freq=1.0), product of:
              0.1995258 = queryWeight, product of:
                1.7570215 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.01664677 = queryNorm
              0.53294456 = fieldWeight in 5260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.078125 = fieldNorm(doc=5260)
          0.19353998 = weight(abstract_txt:sequences in 5260) [ClassicSimilarity], result of:
            0.19353998 = score(doc=5260,freq=2.0), product of:
              0.23607436 = queryWeight, product of:
                1.9111817 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.01664677 = queryNorm
              0.81982636 = fieldWeight in 5260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.078125 = fieldNorm(doc=5260)
          0.29727155 = weight(abstract_txt:bioinformatics in 5260) [ClassicSimilarity], result of:
            0.29727155 = score(doc=5260,freq=2.0), product of:
              0.31427073 = queryWeight, product of:
                2.2051063 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.01664677 = queryNorm
              0.94590914 = fieldWeight in 5260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=5260)
        0.2 = coord(5/25)
    
  5. Warner, J.: Analogies between linguistics and information theory (2007) 0.11
    0.11336203 = sum of:
      0.11336203 = product of:
        0.56681013 = sum of:
          0.0755897 = weight(abstract_txt:contrasted in 138) [ClassicSimilarity], result of:
            0.0755897 = score(doc=138,freq=1.0), product of:
              0.14636934 = queryWeight, product of:
                1.0641127 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.01664677 = queryNorm
              0.5164313 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.039412342 = weight(abstract_txt:patterns in 138) [ClassicSimilarity], result of:
            0.039412342 = score(doc=138,freq=1.0), product of:
              0.11946477 = queryWeight, product of:
                1.3595572 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.01664677 = queryNorm
              0.32990766 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.08506895 = weight(abstract_txt:sequence in 138) [ClassicSimilarity], result of:
            0.08506895 = score(doc=138,freq=1.0), product of:
              0.1995258 = queryWeight, product of:
                1.7570215 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.01664677 = queryNorm
              0.42635563 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.2189655 = weight(abstract_txt:sequences in 138) [ClassicSimilarity], result of:
            0.2189655 = score(doc=138,freq=4.0), product of:
              0.23607436 = queryWeight, product of:
                1.9111817 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.01664677 = queryNorm
              0.92752767 = fieldWeight in 138, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.14777364 = weight(abstract_txt:signals in 138) [ClassicSimilarity], result of:
            0.14777364 = score(doc=138,freq=1.0), product of:
              0.28832546 = queryWeight, product of:
                2.1121223 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.01664677 = queryNorm
              0.5125237 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
        0.2 = coord(5/25)
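
The content-match totals combine the per-term scores with a coordination factor, as the `coord(n/25)` lines indicate. The sketch below assumes the final score is the sum of the matching term scores multiplied by matched terms divided by query terms. The term scores are copied from entry 1 above, and `combine` is an illustrative name.

```python
def combine(term_scores, max_overlap):
    """Sum the matching term scores and scale by coord = matched / max_overlap."""
    coord = len(term_scores) / max_overlap
    return sum(term_scores) * coord


# Per-term scores for entry 1 (doc 4906), in the order listed above:
# well, databases, patterns, sequence, sequences, bioinformatics, protein, biological
shachak_terms = [
    0.015976937,
    0.046052568,
    0.039412342,
    0.08506895,
    0.15483199,
    0.3363244,
    0.2094856,
    0.36118895,
]
total = combine(shachak_terms, max_overlap=25)
print(round(total, 4))
```

With 8 of 25 query terms matched, coord is 0.32, and the result agrees with the listed 0.39946938 up to rounding.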