Document (#33010)

Author
O'Kane, K.C.
Lockner, M.J.
Title
Indexing genomic sequence libraries
Source
Information processing and management. 41(2005) no.2, pp.265-274
Year
2005
Abstract
This paper describes an extensible, open-source (GPL) data repository and retrieval system that supports fast, efficient, keyword-based retrieval of genomic sequences from multiple libraries; retrieved sequences are post-processed by FASTA, Smith-Waterman, and other analysis software. The application is implemented for Linux and written in Mumps, C, and C++, with supporting components that include the Berkeley Data Base, the Perl Compatible Regular Expression Library, GLADE, and tools such as FASTA, Smith-Waterman, and modules from EMBOSS. The package can quickly index data sets of up to 256 terabytes using a B-tree-based multi-dimensional data model. An example is presented that indexes the text of the full NCBI GenBank library.
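
The abstract mentions keyword-based retrieval over a B-tree-based multi-dimensional data model. As a rough illustration only, the Python sketch below keeps tuple keys of the form (keyword, library, accession) in sorted order and answers a keyword query with a range scan, which is the access pattern an ordered multi-dimensional store makes cheap. This is not the authors' implementation: the class name, method names, library labels, and accession IDs are all hypothetical, and a sorted list stands in for a real B-tree.

```python
import bisect


class SortedKeyIndex:
    """Toy stand-in for a B-tree: tuple keys are kept in sorted order."""

    def __init__(self):
        self._keys = []

    def add(self, keyword, library, accession):
        """Insert a (keyword, library, accession) key, ignoring duplicates."""
        key = (keyword.lower(), library, accession)
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def lookup(self, keyword):
        """Return (library, accession) pairs for a keyword via a range scan."""
        prefix = keyword.lower()
        # A 1-tuple sorts before every longer tuple that starts with it.
        pos = bisect.bisect_left(self._keys, (prefix,))
        hits = []
        while pos < len(self._keys) and self._keys[pos][0] == prefix:
            hits.append(self._keys[pos][1:])
            pos += 1
        return hits


# Made-up entries, for illustration only.
index = SortedKeyIndex()
index.add("kinase", "libA", "ACC0002")
index.add("Kinase", "libA", "ACC0001")
index.add("kinase", "libB", "ACC0003")
index.add("ribosomal", "libA", "ACC0004")
hits = index.lookup("kinase")
```

Because the keys are ordered, all entries for one keyword sit next to each other, grouped by library; the sequences found this way would then be handed to an external tool for alignment.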

Similar documents (content)

  1. Shachak, A.: Diffusion pattern of the use of genomic databases and analysis of biological sequences from 1970-2003 : bibliographic record analysis of 12 journals (2006) 0.16
    0.16478607 = sum of:
      0.16478607 = product of:
        0.8239303 = sum of:
          0.056764483 = weight(abstract_txt:sequence in 5906) [ClassicSimilarity], result of:
            0.056764483 = score(doc=5906,freq=1.0), product of:
              0.13304949 = queryWeight, product of:
                1.072276 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.01817704 = queryNorm
              0.42664188 = fieldWeight in 5906, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.010014293 = weight(abstract_txt:that in 5906) [ClassicSimilarity], result of:
            0.010014293 = score(doc=5906,freq=2.0), product of:
              0.047907777 = queryWeight, product of:
                1.1144577 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01817704 = queryNorm
              0.20903271 = fieldWeight in 5906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.037283976 = weight(abstract_txt:data in 5906) [ClassicSimilarity], result of:
            0.037283976 = score(doc=5906,freq=2.0), product of:
              0.1266641 = queryWeight, product of:
                2.092458 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.01817704 = queryNorm
              0.29435313 = fieldWeight in 5906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.20659785 = weight(abstract_txt:sequences in 5906) [ClassicSimilarity], result of:
            0.20659785 = score(doc=5906,freq=2.0), product of:
              0.31480813 = queryWeight, product of:
                2.3325875 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.01817704 = queryNorm
              0.6562659 = fieldWeight in 5906, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
          0.5132697 = weight(abstract_txt:genomic in 5906) [ClassicSimilarity], result of:
            0.5132697 = score(doc=5906,freq=3.0), product of:
              0.50446236 = queryWeight, product of:
                2.952768 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01817704 = queryNorm
              1.0174589 = fieldWeight in 5906, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=5906)
        0.2 = coord(5/25)
    
  2. Trotman, A.: Searching structured documents (2004) 0.11
    0.108503364 = sum of:
      0.108503364 = product of:
        0.5425168 = sum of:
          0.071680486 = weight(abstract_txt:tree in 3538) [ClassicSimilarity], result of:
            0.071680486 = score(doc=3538,freq=1.0), product of:
              0.13395411 = queryWeight, product of:
                1.0759151 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.01817704 = queryNorm
              0.53511226 = fieldWeight in 3538, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.078125 = fieldNorm(doc=3538)
          0.032467857 = weight(abstract_txt:retrieval in 3538) [ClassicSimilarity], result of:
            0.032467857 = score(doc=3538,freq=3.0), product of:
              0.069017746 = queryWeight, product of:
                1.0921829 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01817704 = queryNorm
              0.4704277 = fieldWeight in 3538, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3538)
          0.008851468 = weight(abstract_txt:that in 3538) [ClassicSimilarity], result of:
            0.008851468 = score(doc=3538,freq=1.0), product of:
              0.047907777 = queryWeight, product of:
                1.1144577 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01817704 = queryNorm
              0.18476056 = fieldWeight in 3538, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3538)
          0.03295469 = weight(abstract_txt:data in 3538) [ClassicSimilarity], result of:
            0.03295469 = score(doc=3538,freq=1.0), product of:
              0.1266641 = queryWeight, product of:
                2.092458 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.01817704 = queryNorm
              0.26017386 = fieldWeight in 3538, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=3538)
          0.3965623 = weight(abstract_txt:smith in 3538) [ClassicSimilarity], result of:
            0.3965623 = score(doc=3538,freq=2.0), product of:
              0.4190151 = queryWeight, product of:
                2.6911 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.01817704 = queryNorm
              0.9464153 = fieldWeight in 3538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=3538)
        0.2 = coord(5/25)
    
  3. Rapp, B.A.; Wheeler, D.L.: Bioinformatics resources from the National Center for Biotechnology Information : an integrated foundation for discovery (2005) 0.11
    0.10692455 = sum of:
      0.10692455 = product of:
        0.44551897 = sum of:
          0.05037312 = weight(abstract_txt:repository in 265) [ClassicSimilarity], result of:
            0.05037312 = score(doc=265,freq=1.0), product of:
              0.12286494 = queryWeight, product of:
                1.0304192 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.01817704 = queryNorm
              0.40998775 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.050781973 = weight(abstract_txt:expression in 265) [ClassicSimilarity], result of:
            0.050781973 = score(doc=265,freq=1.0), product of:
              0.12352887 = queryWeight, product of:
                1.0331995 = boost
                6.5775037 = idf(docFreq=167, maxDocs=44421)
                0.01817704 = queryNorm
              0.41109398 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5775037 = idf(docFreq=167, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.11352897 = weight(abstract_txt:sequence in 265) [ClassicSimilarity], result of:
            0.11352897 = score(doc=265,freq=4.0), product of:
              0.13304949 = queryWeight, product of:
                1.072276 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.01817704 = queryNorm
              0.85328376 = fieldWeight in 265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.014996262 = weight(abstract_txt:retrieval in 265) [ClassicSimilarity], result of:
            0.014996262 = score(doc=265,freq=1.0), product of:
              0.069017746 = queryWeight, product of:
                1.0921829 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01817704 = queryNorm
              0.21728125 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.069751926 = weight(abstract_txt:data in 265) [ClassicSimilarity], result of:
            0.069751926 = score(doc=265,freq=7.0), product of:
              0.1266641 = queryWeight, product of:
                2.092458 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.01817704 = queryNorm
              0.5506843 = fieldWeight in 265, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
          0.14608674 = weight(abstract_txt:sequences in 265) [ClassicSimilarity], result of:
            0.14608674 = score(doc=265,freq=1.0), product of:
              0.31480813 = queryWeight, product of:
                2.3325875 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.01817704 = queryNorm
              0.46405008 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=265)
        0.24 = coord(6/25)
    
  4. Michon, J.: Biomedicine and the Semantic Web : a knowledge model for visual phenotype (2006) 0.10
    0.099851735 = sum of:
      0.099851735 = product of:
        0.4160489 = sum of:
          0.011573717 = weight(abstract_txt:library in 371) [ClassicSimilarity], result of:
            0.011573717 = score(doc=371,freq=1.0), product of:
              0.05807029 = queryWeight, product of:
                1.0018252 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.01817704 = queryNorm
              0.19930531 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
          0.056764483 = weight(abstract_txt:sequence in 371) [ClassicSimilarity], result of:
            0.056764483 = score(doc=371,freq=1.0), product of:
              0.13304949 = queryWeight, product of:
                1.072276 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.01817704 = queryNorm
              0.42664188 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
          0.014996262 = weight(abstract_txt:retrieval in 371) [ClassicSimilarity], result of:
            0.014996262 = score(doc=371,freq=1.0), product of:
              0.069017746 = queryWeight, product of:
                1.0921829 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01817704 = queryNorm
              0.21728125 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
          0.010014293 = weight(abstract_txt:that in 371) [ClassicSimilarity], result of:
            0.010014293 = score(doc=371,freq=2.0), product of:
              0.047907777 = queryWeight, product of:
                1.1144577 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01817704 = queryNorm
              0.20903271 = fieldWeight in 371, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
          0.02636375 = weight(abstract_txt:data in 371) [ClassicSimilarity], result of:
            0.02636375 = score(doc=371,freq=1.0), product of:
              0.1266641 = queryWeight, product of:
                2.092458 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.01817704 = queryNorm
              0.20813909 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
          0.2963364 = weight(abstract_txt:genomic in 371) [ClassicSimilarity], result of:
            0.2963364 = score(doc=371,freq=1.0), product of:
              0.50446236 = queryWeight, product of:
                2.952768 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.01817704 = queryNorm
              0.5874302 = fieldWeight in 371, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=371)
        0.24 = coord(6/25)
    
  5. Tsai, R.T.-H.; Chiu, B.; Wu, C.-E.: Visual webpage block importance prediction using conditional random fields (2011) 0.09
    0.094582744 = sum of:
      0.094582744 = product of:
        0.39409477 = sum of:
          0.019936899 = weight(abstract_txt:based in 924) [ClassicSimilarity], result of:
            0.019936899 = score(doc=924,freq=3.0), product of:
              0.057858884 = queryWeight, product of:
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.01817704 = queryNorm
              0.344578 = fieldWeight in 924, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
          0.11352897 = weight(abstract_txt:sequence in 924) [ClassicSimilarity], result of:
            0.11352897 = score(doc=924,freq=4.0), product of:
              0.13304949 = queryWeight, product of:
                1.072276 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.01817704 = queryNorm
              0.85328376 = fieldWeight in 924, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
          0.081097215 = weight(abstract_txt:tree in 924) [ClassicSimilarity], result of:
            0.081097215 = score(doc=924,freq=2.0), product of:
              0.13395411 = queryWeight, product of:
                1.0759151 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.01817704 = queryNorm
              0.60541046 = fieldWeight in 924, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
          0.0070811743 = weight(abstract_txt:that in 924) [ClassicSimilarity], result of:
            0.0070811743 = score(doc=924,freq=1.0), product of:
              0.047907777 = queryWeight, product of:
                1.1144577 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.01817704 = queryNorm
              0.14780845 = fieldWeight in 924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
          0.02636375 = weight(abstract_txt:data in 924) [ClassicSimilarity], result of:
            0.02636375 = score(doc=924,freq=1.0), product of:
              0.1266641 = queryWeight, product of:
                2.092458 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.01817704 = queryNorm
              0.20813909 = fieldWeight in 924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
          0.14608674 = weight(abstract_txt:sequences in 924) [ClassicSimilarity], result of:
            0.14608674 = score(doc=924,freq=1.0), product of:
              0.31480813 = queryWeight, product of:
                2.3325875 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.01817704 = queryNorm
              0.46405008 = fieldWeight in 924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=924)
        0.24 = coord(6/25)
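
The score traces above can be checked by hand. The sketch below is a minimal Python reconstruction of the arithmetic for the first entry (doc 5906), assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the boost, queryNorm, fieldNorm, and coord values are copied from the trace rather than derived, and the function names are illustrative.

```python
import math

MAX_DOCS = 44421         # maxDocs, as reported in the trace
QUERY_NORM = 0.01817704  # queryNorm, as reported in the trace


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord(matched / query_terms)."""
    total = sum(term_score(f, df, b, field_norm) for _, f, df, b in terms)
    return total * (matched / query_terms)


# (term, freq, docFreq, boost) for doc 5906, copied from the trace.
TERMS_5906 = [
    ("sequence", 1.0, 130, 1.072276),
    ("that", 2.0, 11344, 1.1144577),
    ("data", 2.0, 4320, 2.092458),
    ("sequences", 2.0, 71, 2.3325875),
    ("genomic", 3.0, 9, 2.952768),
]

score_5906 = document_score(TERMS_5906, field_norm=0.0625, matched=5, query_terms=25)
print(f"{score_5906:.6f}")
```

The result should agree with the reported 0.16478607 up to small rounding differences, since the trace was computed in single precision. The rare term "genomic" (docFreq=9) contributes most of the sum, which is why this record ranks first.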