Bioinformatics at USDA-ARS Livestock Issues Research Unit

Bioinformatics at USDA-ARS Livestock Issues Research Unit. Scot E. Dowd, Joaquin Zaragoza, Mel Oliver, and Paxton Payton. Projects. Future: Interactive neural-network-based models to describe and predict gene expression in livestock and pathogens

Presentation Transcript


  1. Bioinformatics at USDA-ARS Livestock Issues Research Unit. Scot E. Dowd, Joaquin Zaragoza, Mel Oliver, and Paxton Payton

  2. Projects • Future: Interactive neural-network-based models to describe and predict gene expression in livestock and pathogens • Present: Various projects in various states, leading to the future • Molecular Modeling • Gene Finding • Distributed BLAST • Whole Genome Comparison • Functional Genomics and Pathways • Pathway- or system-targeted microarray design

  3. Functional Genomics • Functional genomics/Gene Ontology: a controlled vocabulary • Define, annotate, categorize, and describe large genetic datasets (e.g. EST, mRNA) • We have developed a custom curated database for functional-domain BLAST (regular BLAST and RPS-BLAST using KOG, COG, Pfam, HMMER, and SMART domains) • Ultimately this will become a comprehensive .NET suite of analyses for microarray design, from new sequence all the way to result visualization.

  4. Ontology • Annotation – propagation of error in definitions • Ca

  5. BLAST: need for speed (II) • We are working with roughly 5,000 to 100,000 queries against 1 GB databases • 1 query takes a fairly fast PC 3 minutes to complete • Dual 3.2 GHz Xeon • 6 GB RAM • RAID 0 SCSI-320 HD • Other methods: MPI-BLAST, WU-BLAST, threaded BLAST, SGE-BLAST, commercial TURBO BLAST, DNAstar, etc.
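
To make the scale on slide 5 concrete, the following rough Python estimate multiplies the quoted query counts by the quoted 3 minutes per query. It assumes every query takes the same time and runs one after another on a single machine, which is a simplification; the function name `serial_hours` is purely illustrative.

```python
MINUTES_PER_QUERY = 3  # figure quoted on the slide for one query on one fast PC


def serial_hours(n_queries, minutes_per_query=MINUTES_PER_QUERY):
    """Wall-clock hours to run n_queries back to back on a single machine."""
    return n_queries * minutes_per_query / 60


if __name__ == "__main__":
    # Lower and upper ends of the query range given on the slide.
    for n in (5_000, 100_000):
        hours = serial_hours(n)
        print(f"{n:>7} queries: {hours:,.0f} hours (about {hours / 24:.0f} days)")
```

Under these assumptions the lower end of the range is already about 250 hours of serial compute, and the upper end about 5,000 hours, which is the motivation for distributing the work.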

  6. BLAST ALGORITHM • Cgtcgctcgctgtaagtac – query, e.g. a 1000-letter word • Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990) A basic local alignment search tool. Journal of Molecular Biology 215, 403-410. • Which database sequence is most similar to my query? • Databases: one of ours is 60 GB worth of letters • BLAST generates statistics based upon similarity and substitution probabilities; in simplest form, purine to purine is better than purine to pyrimidine • Slide along the 4 GB database, find a word match, and try to extend it
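
The "find a word match and try to extend" idea on slide 6 can be sketched in a few lines of Python. This is a toy illustration, not the NCBI implementation: it extends only to the right of each seed, has no gapped alignment or statistics, and the scores in `pair_score` (including the milder penalty for same-class substitutions, echoing the purine/pyrimidine remark) are made-up values chosen for the example.

```python
PURINES = {"A", "G"}


def pair_score(a, b):
    """Toy scores: match +2, same-class substitution -1, cross-class -3."""
    if a == b:
        return 2
    same_class = (a in PURINES) == (b in PURINES)
    return -1 if same_class else -3


def find_seeds(query, subject, word_size):
    """Yield (query_pos, subject_pos) for every exact word shared by both."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            yield i, j


def extend_right(query, subject, i, j, drop=6):
    """Ungapped extension from a seed; stop once the score falls
    more than `drop` below the best score seen so far."""
    score = best = best_len = k = 0
    while i + k < len(query) and j + k < len(subject):
        score += pair_score(query[i + k], subject[j + k])
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > drop:
            break
    return best, best_len


def best_hit(query, subject, word_size=4):
    """Best (score, length) over all seeds, or (0, 0) if no seed exists."""
    hits = [extend_right(query, subject, i, j)
            for i, j in find_seeds(query, subject, word_size)]
    return max(hits, default=(0, 0))


if __name__ == "__main__":
    print(best_hit("ACGTACGTAC", "TTTTACGTACGTACTTTT"))
```

Indexing the query words once and then sliding along the subject is what lets the search skip most of a large database cheaply; only positions with a shared word pay the cost of extension.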

  7. BLASTX as example: translate the query into 6 reading frames, then search the database with these 6 sequences using a word size of 3. • Time to BLAST • Up to a point, decreased time correlated with the number of slaves available • Average test machines (2.4 GHz/1 GB RAM/SATA150) • e.g. 90 seq/13 CPU/3 min vs 90 seq/1 CPU/38.5 min (roughly a 12.8-fold speedup), 350 MB db, gigabit LAN
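
The six-frame translation that BLASTX performs on a nucleotide query (slide 7) can be illustrated with a short Python sketch. It uses the standard genetic code and marks any codon containing an unexpected character as `X`; the function names are illustrative and this is not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Map frame label (+1..+3 forward, -1..-3 reverse) to its protein string."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(f"{frame:+d}: {protein}")
```

Each of the six protein strings is then searched against the protein database, which is why a BLASTX query costs noticeably more than a single nucleotide search of the same length.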

  8. .NET Distributed BLAST • Take advantage of unused laboratory compute resources • Provide an easy, powerful tool for distributing BLAST • Target environment • Windows LAN • Current open-source distributed BLAST applications • Require a server-class master or a version of UNIX • Difficult to set up, configure databases, compile, and submit jobs • No fault tolerance for large jobs

  9. W.ND BLAST: A bioinformatician promoting Windows? • .NET C# • First tests: Condor, MPI, a ported remote shell • Contractor • Project Manager • Database formatter • Worker machines • Job leasing • Output processing HT backend apps

  10. Gotta GUI

  11. Database formatter

  12. Functionality • Network bandwidth would eventually become limiting • Fault tolerant to worker failure • Resumes upon reboot if the Contractor fails • No statistical problems with search results • Complete BLAST database on each worker node if resources allow • Easy to install, a breeze to use

  13. .NET Distributed BLAST • Queue at each node • Contractor allows a maximum of two query sequences in each node’s queue • Ensures the application waits a minimal amount of time between completing one job and starting the next • Thread per node • Makes use of .NET asynchronous delegates (AD); scalability ??? • Thread invokes BLAST on the remote node • Upon completion, the remote node sends a “finished” message to the Contractor • The Contractor collects results and performs a validity check • Once results are verified, the remote worker starts BLAST on the queued sequence and the Contractor prepares and allocates the next job
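
The queueing scheme on slide 13 can be sketched with Python threads standing in for the .NET asynchronous delegates. Everything here is an assumption made for illustration: `run_blast` is a placeholder that does no real search, the round-robin assignment is a guess at the dispatch order, and the validity check is reduced to "result is non-empty". Only the bound of two queued sequences per node and the one-thread-per-node layout come from the slide.

```python
import queue
import threading


def run_blast(seq):
    """Hypothetical stand-in for invoking BLAST on a worker node."""
    return f"hits-for-{seq}"


def node_thread(node_queue, results, lock):
    """One thread per node: take queued sequences until the sentinel arrives."""
    while True:
        seq = node_queue.get()
        if seq is None:          # sentinel: the Contractor has no more work
            break
        output = run_blast(seq)
        if output:               # placeholder for the Contractor's validity check
            with lock:
                results[seq] = output


def contractor(sequences, n_nodes=3, queue_depth=2):
    """Feed sequences to per-node queues that hold at most queue_depth items."""
    results, lock = {}, threading.Lock()
    queues = [queue.Queue(maxsize=queue_depth) for _ in range(n_nodes)]
    threads = [threading.Thread(target=node_thread, args=(q, results, lock))
               for q in queues]
    for t in threads:
        t.start()
    for i, seq in enumerate(sequences):
        queues[i % n_nodes].put(seq)   # blocks while that node's queue is full
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = contractor([f"seq{i}" for i in range(10)])
    print(len(done), "sequences completed")
```

Keeping a short queue at each node means a worker always has its next sequence on hand when it finishes, while the bound of two limits how much work is stranded if that node fails.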

  14. .NET Distributed BLAST • Fault tolerance, revisited • Task migration handled through application-level checkpointing • When a worker encounters a fault or crashes, the Contractor redirects the failed node's sequence to another worker node • Minimal loss of time • Integrating QoS functionality: currently in the works • Decrease priority when the workstation is in use, based upon a remote system call checking CPU %, memory, etc. • GUI allows increasing or decreasing priority: rev gauges and throttles • Storage requirement limitations: redirect query to another database source (working with the 10-connection limitation in XP Pro)
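
The task-migration behaviour on slide 14 can be illustrated with a minimal Python sketch in which a failed worker's sequence is put back at the front of the pending list and handed to another node. The names `dispatch`, `WorkerFailed`, and `flaky_run` are hypothetical, the failure is simulated, and the loop is sequential for clarity; the real system would detect crashes over the network and run nodes concurrently.

```python
from collections import deque


class WorkerFailed(Exception):
    """Raised when a worker node faults or crashes mid-job (simulated here)."""


def flaky_run(node, seq):
    """Hypothetical job runner: node 'B' always crashes, others succeed."""
    if node == "B":
        raise WorkerFailed(node)
    return f"{seq}@{node}"


def dispatch(sequences, workers, run=flaky_run):
    """Run every sequence; on a failure, drop the node and retry elsewhere."""
    pending = deque(sequences)
    alive = list(workers)
    done = {}                      # completed results act as the checkpoint
    turn = 0
    while pending:
        if not alive:
            raise RuntimeError("no worker nodes left")
        seq = pending.popleft()
        node = alive[turn % len(alive)]
        turn += 1
        try:
            done[seq] = run(node, seq)
        except WorkerFailed:
            alive.remove(node)
            pending.appendleft(seq)   # only the in-flight sequence is redone
    return done


if __name__ == "__main__":
    print(dispatch(["s1", "s2", "s3", "s4"], ["A", "B", "C"]))
```

Because results are recorded per sequence, a crash costs at most the one search that was in flight on the failed node, which matches the "minimal loss of time" point on the slide.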

  15. Future Directions • Quality of Service • Allow Contractor to set priority for application • Contractor Fault Tolerance • Large Network Optimization • Sub Contractors • Asynchronous delegate thread limit: ewww kewl WEB SERVICE! • Shadow (Sub) Contractors: network load balance

  16. The End! • Questions? • Suggestions? • Advice? • Even Criticism?
