Document (#30262)

Author
Rodríguez, A.
Carazo, J.M.
Trelles-Salazar, O.
Title
Mining association rules from biological databases
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.5, p.493-504
Year
2005
Abstract
We present a novel application of knowledge discovery technology to bioinformatics, a developing and challenging application area. This methodology allows the identification of relationships between low-magnitude similarity (LMS) sequence patterns and other well-contrasted protein characteristics, such as those described by database annotations. We start by identifying these signals inside protein sequences through exhaustive database searching and automatic pattern recognition strategies. In a second step we address the discovery of association rules that allow sequences holding LMS signals to be tagged with consequent functional keywords. We have designed our own algorithm for discovering association rules, meeting the special needs of bioinformatics problems, where the patterns we search for lie in sparse datasets and are uncommon and thus difficult to locate. Computational efficiency has been verified with both synthetic and real biological data, showing that the algorithm is well suited to this application area compared to state-of-the-art algorithms. The usefulness of the method is confirmed by its ability to produce previously unknown and useful knowledge in the area of biological sequence analysis. In addition, we introduce a new and promising application of the rule extraction algorithm to gene expression databases.
Footnote
Contribution to a special issue on bioinformatics
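
The abstract describes rules that tag sequences holding LMS signals with functional keywords, but it does not spell out the authors' algorithm. As a rough illustration only, the sketch below computes the two standard association-rule measures, support and confidence, on a toy dataset. The measures are assumed here rather than taken from the record, and the item names (`LMS_A`, `kw:kinase`, and so on) and the helper functions are hypothetical.

```python
# Generic support/confidence sketch; NOT the algorithm from the paper.
# Each "transaction" is the set of items observed for one protein sequence:
# hypothetical LMS pattern identifiers plus hypothetical annotation keywords.
transactions = [
    {"LMS_A", "kw:kinase"},
    {"LMS_A", "kw:kinase", "kw:membrane"},
    {"LMS_A", "kw:transport"},
    {"LMS_B", "kw:membrane"},
    {"kw:kinase"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Rule "LMS_A => kw:kinase": 2 of 5 sequences hold both, 3 of 5 hold LMS_A.
rule_support = support(transactions, {"LMS_A", "kw:kinase"})
rule_confidence = confidence(transactions, {"LMS_A"}, {"kw:kinase"})
print(rule_support, round(rule_confidence, 3))
```

In sparse data with uncommon patterns, as the abstract describes, rules of interest would have low support, which is why a fixed high support threshold tends to miss them.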

Similar documents (author)

  1. Rodríguez, E.E.: Consolidated edition of ISBD, International Standard Bibliographic Description : a standard to trust, a quality brand (2014) 4.84
    4.84389 = sum of:
      4.84389 = weight(author_txt:rodríguez in 2996) [ClassicSimilarity], result of:
        4.84389 = fieldWeight in 2996, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.750224 = idf(docFreq=51, maxDocs=44421)
          0.625 = fieldNorm(doc=2996)
    
  2. Zazo Rodríguez, A.F. -> Rodríguez, A.F.Z.: 4.80
    4.7952065 = sum of:
      4.7952065 = weight(author_txt:rodríguez in 1934) [ClassicSimilarity], result of:
        4.7952065 = fieldWeight in 1934, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.750224 = idf(docFreq=51, maxDocs=44421)
          0.4375 = fieldNorm(doc=1934)
    
  3. Rodríguez, E.M.M. -> Méndez Rodríguez, E.M.: 4.80
    4.7952065 = sum of:
      4.7952065 = weight(author_txt:rodríguez in 2855) [ClassicSimilarity], result of:
        4.7952065 = fieldWeight in 2855, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.750224 = idf(docFreq=51, maxDocs=44421)
          0.4375 = fieldNorm(doc=2855)
    
  4. Rodríguez, Z.Chinchilla -> Chinchilla Rodríguez, Z.: 4.80
    4.7952065 = sum of:
      4.7952065 = weight(author_txt:rodríguez in 1067) [ClassicSimilarity], result of:
        4.7952065 = fieldWeight in 1067, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.750224 = idf(docFreq=51, maxDocs=44421)
          0.4375 = fieldNorm(doc=1067)
    
  5. Rodríguez Z. Chinchilla- -> Chinchilla-Rodríguez, Z.: 4.11
    4.110177 = sum of:
      4.110177 = weight(author_txt:rodríguez in 794) [ClassicSimilarity], result of:
        4.110177 = fieldWeight in 794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.750224 = idf(docFreq=51, maxDocs=44421)
          0.375 = fieldNorm(doc=794)
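
The author scores above can be reproduced from the leaf values printed in each tree. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and are not Lucene API calls. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the listing."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 2996): freq=1, fieldNorm=0.625; listing shows 4.84389
first = field_weight(1.0, 51, 44421, 0.625)
# Entry 2 (doc 1934): freq=2, fieldNorm=0.4375; listing shows 4.7952065
second = field_weight(2.0, 51, 44421, 0.4375)
# Entry 5 (doc 794): freq=2, fieldNorm=0.375; listing shows 4.110177
fifth = field_weight(2.0, 51, 44421, 0.375)
print(first, second, fifth)
```

Entries 2 to 4 match the term twice, yet they score below entry 1 because their smaller fieldNorm (0.4375 against 0.625) outweighs the gain from the higher tf.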
    

Similar documents (content)

  1. Shachak, A.: Diffusion pattern of the use of genomic databases and analysis of biological sequences from 1970-2003 : bibliographic record analysis of 12 journals (2006) 0.40
    0.3996326 = sum of:
      0.3996326 = product of:
        1.2488519 = sum of:
          0.015936073 = weight(abstract_txt:well in 5906) [ClassicSimilarity], result of:
            0.015936073 = score(doc=5906,freq=1.0), product of:
              0.0653363 = queryWeight, product of:
                1.0070765 = boost
                3.9025342 = idf(docFreq=2437, maxDocs=44421)
                0.016624376 = queryNorm
              0.24390839 = fieldWeight in 5906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9025342 = idf(docFreq=2437, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.046114832 = weight(abstract_txt:databases in 5906) [ClassicSimilarity], result of:
            0.046114832 = score(doc=5906,freq=4.0), product of:
              0.08358098 = queryWeight, product of:
                1.1390399 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.016624376 = queryNorm
              0.5517384 = fieldWeight in 5906, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.039211083 = weight(abstract_txt:patterns in 5906) [ClassicSimilarity], result of:
            0.039211083 = score(doc=5906,freq=1.0), product of:
              0.11908021 = queryWeight, product of:
                1.3595806 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.016624376 = queryNorm
              0.32928297 = fieldWeight in 5906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.0852887 = weight(abstract_txt:sequence in 5906) [ClassicSimilarity], result of:
            0.0852887 = score(doc=5906,freq=1.0), product of:
              0.199907 = queryWeight, product of:
                1.7615671 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.016624376 = queryNorm
              0.42664188 = fieldWeight in 5906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.15520675 = weight(abstract_txt:sequences in 5906) [ClassicSimilarity], result of:
            0.15520675 = score(doc=5906,freq=2.0), product of:
              0.2364998 = queryWeight, product of:
                1.9160224 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.016624376 = queryNorm
              0.6562659 = fieldWeight in 5906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.33705536 = weight(abstract_txt:bioinformatics in 5906) [ClassicSimilarity], result of:
            0.33705536 = score(doc=5906,freq=4.0), product of:
              0.31478533 = queryWeight, product of:
                2.2105098 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.016624376 = queryNorm
              1.0707467 = fieldWeight in 5906, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.20991711 = weight(abstract_txt:protein in 5906) [ClassicSimilarity], result of:
            0.20991711 = score(doc=5906,freq=1.0), product of:
              0.36441723 = queryWeight, product of:
                2.3783987 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.016624376 = queryNorm
              0.5760351 = fieldWeight in 5906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.36012197 = weight(abstract_txt:biological in 5906) [ClassicSimilarity], result of:
            0.36012197 = score(doc=5906,freq=5.0), product of:
              0.34960195 = queryWeight, product of:
                2.8531048 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.016624376 = queryNorm
              1.0300914 = fieldWeight in 5906, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
        0.32 = coord(8/25)
    
  2. Rapp, B.A.; Wheeler, D.L.: Bioinformatics resources from the National Center for Biotechnology Information : an integrated foundation for discovery (2005) 0.20
    0.2049631 = sum of:
      0.2049631 = product of:
        0.85401297 = sum of:
          0.10975227 = weight(abstract_txt:gene in 265) [ClassicSimilarity], result of:
            0.10975227 = score(doc=265,freq=2.0), product of:
              0.14898963 = queryWeight, product of:
                1.075346 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.016624376 = queryNorm
              0.7366437 = fieldWeight in 265, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.021043848 = weight(abstract_txt:database in 265) [ClassicSimilarity], result of:
            0.021043848 = score(doc=265,freq=1.0), product of:
              0.07864126 = queryWeight, product of:
                1.104868 = boost
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.016624376 = queryNorm
              0.26759297 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.023057416 = weight(abstract_txt:databases in 265) [ClassicSimilarity], result of:
            0.023057416 = score(doc=265,freq=1.0), product of:
              0.08358098 = queryWeight, product of:
                1.1390399 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.016624376 = queryNorm
              0.2758692 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.1705774 = weight(abstract_txt:sequence in 265) [ClassicSimilarity], result of:
            0.1705774 = score(doc=265,freq=4.0), product of:
              0.199907 = queryWeight, product of:
                1.7615671 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.016624376 = queryNorm
              0.85328376 = fieldWeight in 265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.10974775 = weight(abstract_txt:sequences in 265) [ClassicSimilarity], result of:
            0.10974775 = score(doc=265,freq=1.0), product of:
              0.2364998 = queryWeight, product of:
                1.9160224 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.016624376 = queryNorm
              0.46405008 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.41983423 = weight(abstract_txt:protein in 265) [ClassicSimilarity], result of:
            0.41983423 = score(doc=265,freq=4.0), product of:
              0.36441723 = queryWeight, product of:
                2.3783987 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.016624376 = queryNorm
              1.1520702 = fieldWeight in 265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
        0.24 = coord(6/25)
    
  3. Artymiuk, P.J.; Spriggs, R.V.; Willett, P.: Graph theoretic methods for the analysis of structural relationships in biological macromolecules (2005) 0.16
    0.15660043 = sum of:
      0.15660043 = product of:
        0.78300214 = sum of:
          0.049013857 = weight(abstract_txt:patterns in 258) [ClassicSimilarity], result of:
            0.049013857 = score(doc=258,freq=1.0), product of:
              0.11908021 = queryWeight, product of:
                1.3595806 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.016624376 = queryNorm
              0.41160372 = fieldWeight in 258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.078125 = fieldNorm(doc=258)
          0.10661088 = weight(abstract_txt:sequence in 258) [ClassicSimilarity], result of:
            0.10661088 = score(doc=258,freq=1.0), product of:
              0.199907 = queryWeight, product of:
                1.7615671 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.016624376 = queryNorm
              0.53330237 = fieldWeight in 258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.078125 = fieldNorm(doc=258)
          0.1320164 = weight(abstract_txt:algorithm in 258) [ClassicSimilarity], result of:
            0.1320164 = score(doc=258,freq=2.0), product of:
              0.20944309 = queryWeight, product of:
                2.2083294 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016624376 = queryNorm
              0.63032115 = fieldWeight in 258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=258)
          0.21065958 = weight(abstract_txt:bioinformatics in 258) [ClassicSimilarity], result of:
            0.21065958 = score(doc=258,freq=1.0), product of:
              0.31478533 = queryWeight, product of:
                2.2105098 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.016624376 = queryNorm
              0.66921663 = fieldWeight in 258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=258)
          0.2847014 = weight(abstract_txt:biological in 258) [ClassicSimilarity], result of:
            0.2847014 = score(doc=258,freq=2.0), product of:
              0.34960195 = queryWeight, product of:
                2.8531048 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.016624376 = queryNorm
              0.8143587 = fieldWeight in 258, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.078125 = fieldNorm(doc=258)
        0.2 = coord(5/25)
    
  4. Toldo, L.; Rippmann, F.: Integrated bioinformatics application for automated target discovery. (2005) 0.13
    0.12895238 = sum of:
      0.12895238 = product of:
        0.6447619 = sum of:
          0.01992009 = weight(abstract_txt:well in 260) [ClassicSimilarity], result of:
            0.01992009 = score(doc=260,freq=1.0), product of:
              0.0653363 = queryWeight, product of:
                1.0070765 = boost
                3.9025342 = idf(docFreq=2437, maxDocs=44421)
                0.016624376 = queryNorm
              0.30488548 = fieldWeight in 260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9025342 = idf(docFreq=2437, maxDocs=44421)
                0.078125 = fieldNorm(doc=260)
          0.026304808 = weight(abstract_txt:database in 260) [ClassicSimilarity], result of:
            0.026304808 = score(doc=260,freq=1.0), product of:
              0.07864126 = queryWeight, product of:
                1.104868 = boost
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.016624376 = queryNorm
              0.3344912 = fieldWeight in 260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.078125 = fieldNorm(doc=260)
          0.10661088 = weight(abstract_txt:sequence in 260) [ClassicSimilarity], result of:
            0.10661088 = score(doc=260,freq=1.0), product of:
              0.199907 = queryWeight, product of:
                1.7615671 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.016624376 = queryNorm
              0.53330237 = fieldWeight in 260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.078125 = fieldNorm(doc=260)
          0.19400845 = weight(abstract_txt:sequences in 260) [ClassicSimilarity], result of:
            0.19400845 = score(doc=260,freq=2.0), product of:
              0.2364998 = queryWeight, product of:
                1.9160224 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.016624376 = queryNorm
              0.8203324 = fieldWeight in 260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.078125 = fieldNorm(doc=260)
          0.29791766 = weight(abstract_txt:bioinformatics in 260) [ClassicSimilarity], result of:
            0.29791766 = score(doc=260,freq=2.0), product of:
              0.31478533 = queryWeight, product of:
                2.2105098 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.016624376 = queryNorm
              0.9464153 = fieldWeight in 260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=260)
        0.2 = coord(5/25)
    
  5. Warner, J.: Analogies between linguistics and information theory (2007) 0.11
    0.11324964 = sum of:
      0.11324964 = product of:
        0.5662482 = sum of:
          0.07575838 = weight(abstract_txt:contrasted in 1138) [ClassicSimilarity], result of:
            0.07575838 = score(doc=1138,freq=1.0), product of:
              0.14661469 = queryWeight, product of:
                1.0667409 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.016624376 = queryNorm
              0.51671755 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.039211083 = weight(abstract_txt:patterns in 1138) [ClassicSimilarity], result of:
            0.039211083 = score(doc=1138,freq=1.0), product of:
              0.11908021 = queryWeight, product of:
                1.3595806 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.016624376 = queryNorm
              0.32928297 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.0852887 = weight(abstract_txt:sequence in 1138) [ClassicSimilarity], result of:
            0.0852887 = score(doc=1138,freq=1.0), product of:
              0.199907 = queryWeight, product of:
                1.7615671 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.016624376 = queryNorm
              0.42664188 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.2194955 = weight(abstract_txt:sequences in 1138) [ClassicSimilarity], result of:
            0.2194955 = score(doc=1138,freq=4.0), product of:
              0.2364998 = queryWeight, product of:
                1.9160224 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.016624376 = queryNorm
              0.92810017 = fieldWeight in 1138, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.14649454 = weight(abstract_txt:signals in 1138) [ClassicSimilarity], result of:
            0.14649454 = score(doc=1138,freq=1.0), product of:
              0.2867134 = queryWeight, product of:
                2.109644 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.016624376 = queryNorm
              0.5109442 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
        0.2 = coord(5/25)
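
The content scores combine a per-term queryWeight and fieldWeight, sum the matching terms, and scale the sum by the coord factor (matched query terms over total query terms). The sketch below recomputes entry 5 (doc 1138) from the boost, idf, and tf values printed in its tree. The function names are illustrative, not Lucene API calls, and small differences in the last digits are expected because Lucene uses single precision.

```python
def term_score(boost, idf, query_norm, tf, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf * query_norm   # boost * idf * queryNorm
    field_weight = tf * idf * field_norm      # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, query_norm, field_norm, matched, total):
    """Sum the per-term scores, then apply coord(matched/total)."""
    raw = sum(
        term_score(boost, idf, query_norm, tf, field_norm)
        for boost, idf, tf in terms.values()
    )
    return raw * (matched / total)


QUERY_NORM = 0.016624376   # queryNorm, identical for every term in the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=1138)

# (boost, idf, tf) copied from the tree for doc 1138
terms_1138 = {
    "contrasted": (1.0667409, 8.267481, 1.0),
    "patterns": (1.3595806, 5.2685275, 1.0),
    "sequence": (1.7615671, 6.82627, 1.0),
    "sequences": (1.9160224, 7.4248013, 2.0),
    "signals": (2.109644, 8.175107, 1.0),
}

# The listing shows 0.11324964 for this document (5 of 25 query terms matched).
score_1138 = document_score(terms_1138, QUERY_NORM, FIELD_NORM, matched=5, total=25)
print(score_1138)
```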