
Using the Starbridge Systems FPGA-based Hypercomputer for Cancer Research: Experiences of a Computational Chemist/Biologist. Jack Collins, Ph.D., Advanced Biomedical Computing Center, SAIC/National Cancer Institute, Frederick, MD.


Presentation Transcript


  1. Using the Starbridge Systems FPGA-based Hypercomputer for Cancer Research: Experiences of a computational chemist/biologist • Jack Collins, Ph.D. • Advanced Biomedical Computing Center • SAIC/National Cancer Institute • Frederick, MD

  2. Motivation • New technologies in proteomics, genomics, and imaging are providing more data and challenging the conventional wisdom of biologists. Computational biologists must develop more realistic and precise models of biological systems at the cellular and network levels to help make sense of this new data. • Biomedical research needs to measure performance in “Heartbeats to Solution” E195/MAPLD2004

  3. Biological Applications • Systems Biology • Correlated networks of cells and biological processes • Reaction pathways/cascades • Properties of cell/bacterial/viral populations (Biodefense) • Bacterial virulence factors • Generating diversity by changing immune signature • Environmental Adaptation of cells/pathogens • Drug Resistance (Cancer, HIV, Bacteria) • Nano-systems/nano-technology • Statistical fluctuations must be included in models • Single cells -- Essentially nano-systems • Machinery within cells/nucleus • Non-equilibrium dynamics

  4. Cellular Processes (Examples) • DNA Replication • Interactions with proteins and small molecules • Transcription Factors • Gene Regulation • RNA • Editing, interference, protein synthesis • Regulatory feedback • Kinase Pathways/Cascades

  5. Modeling Reactions/Pathways: Discrete Processes • Must study populations of cells/molecules, but the mean behavior depends on the states of the individual entities. • Low copy number of cells/molecules • Variation in copy number • Relatively slow reaction rates • Varied conditions/environments • “Activation potential” to reaction

  6. Simulation Methods • Stochastic Simulations • Deterministic modeling • Mean behavior of large numbers – often small numbers of biological components • Fluctuations are important • Boolean Networks • Lack of experimental rate constants
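To illustrate why fluctuations matter at low copy numbers, the following is a minimal sketch of a stochastic simulation of a single reaction A + B -> C using Gillespie's direct method. It is not the Viva implementation from the talk; the function name, the rate value, and the starting copy numbers are all hypothetical and chosen only for illustration.

```python
import random


def gillespie_a_plus_b_to_c(n_a, n_b, rate, t_end, rng):
    """Simulate A + B -> C one reaction event at a time (Gillespie direct method).

    All parameter values are illustrative assumptions, not numbers from the talk.
    Returns a list of (time, n_a, n_b, n_c) tuples, one per event.
    """
    t, n_c = 0.0, 0
    trajectory = [(t, n_a, n_b, n_c)]
    while n_a > 0 and n_b > 0:
        # Propensity: probability per unit time that some A reacts with some B.
        propensity = rate * n_a * n_b
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(propensity)
        if t > t_end:
            break
        n_a, n_b, n_c = n_a - 1, n_b - 1, n_c + 1
        trajectory.append((t, n_a, n_b, n_c))
    return trajectory


if __name__ == "__main__":
    # Ten runs from the same starting point; only the random seed differs.
    for seed in range(10):
        final = gillespie_a_plus_b_to_c(10, 15, 0.01, 5.0, random.Random(seed))[-1]
        print(f"seed {seed}: C = {final[3]}, last event at t = {final[0]:.2f}")
```

Each run starts from the same state, yet the amount of C at the end of the time window differs from seed to seed. A deterministic rate equation would report only the mean of those outcomes.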

  7. Why use FPGAs? • Current Computational Limitations • Can only model relatively modest systems • Computational Efficiency • Inherent parallelism in molecular reactions • Scalability • Use multiple FPGAs to simultaneously model hundreds of reactions • Looking to Future • Computational power rapidly growing • Price/Performance

  8. Smith-Waterman Update (Proof of Concept) • Total # Operations / Second • 1 Smith-Waterman step includes: • 25 Logic Operations (adds, compares; mostly 26-27 bit ops, some single-bit ops) • 13 Data Reorder Operations (Move, Combine…) • 11 Data Store Operations (Assignment) • Logic Operations Only: • 25 Ops * 25 MHz * 448 Smith-Waterman kernels = 280 Billion Operations / Second • Logic & Data Operations: • 49 Ops * 25 MHz * 448 Smith-Waterman kernels = approximately 550 Billion Operations / Second • Total Aggregate Communications Bandwidth of Systolic Array • 12 * 88 * 25 MHz = 26.4 Gb/s plus 7 * 22 * 50 MHz = 7.7 Gb/s, for a total of 34.1 Gb/s • Resources Consumed / Resources Available • PE2 – PE7: 60% to 70% consumed • PE1: 20% consumed; XPE: 5%; XPR: 0.1% • DMA transfer between host PC and FPGAs • Initial results: 210 Mb/sec (FPGA->X86)
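The throughput figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated on the slide; the variable names are my own, and the steps-per-second line assumes each kernel completes one Smith-Waterman step per clock, which is what the slide's formula implies but does not state outright.

```python
# Figures taken from the slide.
CLOCK_HZ = 25_000_000          # 25 MHz
KERNELS = 448                  # Smith-Waterman kernels
LOGIC_OPS, REORDER_OPS, STORE_OPS = 25, 13, 11


def ops_per_second(ops_per_step, clock_hz=CLOCK_HZ, kernels=KERNELS):
    """Operations per second = ops per step * clock rate * number of kernels."""
    return ops_per_step * clock_hz * kernels


logic_only = ops_per_second(LOGIC_OPS)
logic_and_data = ops_per_second(LOGIC_OPS + REORDER_OPS + STORE_OPS)

# Assumption: one step per kernel per clock tick.
steps_per_second = CLOCK_HZ * KERNELS

# Aggregate bandwidth, using the two products exactly as given on the slide.
bandwidth_bps = 12 * 88 * 25_000_000 + 7 * 22 * 50_000_000

if __name__ == "__main__":
    print(f"logic only:       {logic_only / 1e9:.1f} billion ops/s")
    print(f"logic and data:   {logic_and_data / 1e9:.1f} billion ops/s")
    print(f"steps per second: {steps_per_second / 1e9:.1f} billion")
    print(f"bandwidth:        {bandwidth_bps / 1e9:.1f} Gb/s")
```

The second bandwidth term works out to 7.7 Gb/s, which is what makes the stated 34.1 Gb/s total add up.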

  9. Smith-Waterman (cont.) • See poster by Jim Yardley, SBS • Opportunities to further optimize the algorithm include: • Increasing the number of SW_Iterations that can be done in parallel (up to 100 Billion Smith-Waterman steps/second) • Increasing the clock speed of the hardware (up to 1 Trillion Smith-Waterman steps/second) • Friendlier User Interface
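For readers unfamiliar with the algorithm, a "Smith-Waterman step" can be read as one cell update of the local-alignment scoring matrix. The following software sketch shows that update in plain Python. It assumes a linear gap penalty and illustrative scoring values; the slides do not give the scoring scheme used in the FPGA kernels, so none of these parameters should be taken as the author's.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two sequences.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only the previous row is kept, so memory use is O(len(target)).
    """
    best = 0
    prev = [0] * (len(target) + 1)
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            # One "step": combine the three neighbouring cells.
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            cell = max(0, diag, up, left)   # scores are clamped at zero
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal are independent of one another. That independence is what lets a systolic array of kernels update many cells per clock.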

  10. Viva Environment • VIVA GRAPHICAL LANGUAGE • Capture natively parallel code • Accommodate data of any type, size, or precision • Tune algorithms for speed of execution or conservation of hardware resources • VIVA EDITOR • Call Viva algorithms from legacy code such as C, C++, or Fortran • Interactively debug code • Import/Export EDIF files • VIVA COMPILER/SYNTHESIZER • Program multi-million gate designs • Compile hardware designs quickly for efficient development • VIVA LIBRARIES • Reuse flexible Viva objects which accept any data type or size • Target any hardware platform with a ‘System Description’ • Prototype Viva on any x86-based Windows machine

  11. Viva as a Modeling Language? • Programming FPGAs has generally been the domain of engineers. • Viva • “Pseudo-graphical language” – Map Model to Viva • Inherent parallelism of Model can map to FPGAs • Recursion of model • Document Code • Use the underlying elements of Viva™ to create an environment that the bio-informatician/computational biologist can use to program the FPGA hardware • Build Library Elements/Modules specific to Model

  12. Libraries for Biology/Biochemistry • Known Reaction Processes • Conditional Elements to relate the reactions to each other • Outputs to visualize the reactions • Built-in Infrastructure for handling I/O • Minimize Learning Curve for Modeling Biological Processes

  13. Ease of Programming? Library Creation • Examples of simple reactions programmed in Viva by a relatively novice user over a few days. • A  B • A  B • AB • A + B → C

  18. Programming Style: Program Design • Multiple ways to package logic • No Unique Solution • Which is “best” depends on “user” • Simplicity vs. Functionality • Ease of Debugging • Ease of Documenting

  21. Output Interfaces • Efficient computation of the model is useless if you can’t see the results • Interface into COM objects • Integrate Data Analysis and Visualization

  25. Lessons Learned • Timing is Everything! • Building large systems of reactions is complex, so a general system must consider both efficiency (minimizing clock ticks) and stability of computation (consistent results, achieved by keeping latency and synchronization in check). • Many ways to package the logic • Not all are equal! • Simplicity vs. Functionality • Document your Code! • Bugs are often subtle • Potential is Enormous

  26. Obvious Extensions • Timing & Synchronization • Finite State Machine • State of system may depend on several variables or conditions • Not all conditions need to be completely known • Some may be “black boxes” that produce a signal • Go-Done-Busy-Wait
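One way to picture a Go-Done-Busy-Wait handshake as a finite state machine is the clocked sketch below. The signal semantics here are my assumptions, not Viva's documented behavior: Go starts a computation, Busy stays high while the block is occupied, Done pulses for one tick when the result is released, and Wait from downstream holds the result back. The class and state names are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    BUSY = "busy"
    HOLD = "hold"   # result ready, but downstream has asserted Wait


class HandshakeBlock:
    """A "black box" with a fixed latency, driven one clock tick at a time.

    Signal meanings are assumed for illustration only.
    """

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least one tick")
        self.latency = latency
        self.state = State.IDLE
        self.remaining = 0

    def tick(self, go, wait):
        """Advance one clock tick and return the (busy, done) outputs."""
        done = False
        if self.state is State.IDLE:
            if go:
                self.state = State.BUSY
                self.remaining = self.latency
        elif self.state is State.BUSY:
            self.remaining -= 1
            if self.remaining == 0:
                if wait:
                    self.state = State.HOLD
                else:
                    self.state = State.IDLE
                    done = True
        elif self.state is State.HOLD:
            if not wait:
                self.state = State.IDLE
                done = True
        busy = self.state is not State.IDLE
        return busy, done


if __name__ == "__main__":
    block = HandshakeBlock(latency=2)
    go_inputs = [True, False, False, False, False]
    wait_inputs = [False, False, True, True, False]
    for tick, (go, wait) in enumerate(zip(go_inputs, wait_inputs), start=1):
        print(tick, block.tick(go, wait))
```

Because the consumer only needs the Done pulse, the internals of the block can remain unknown, which matches the slide's point that some conditions may be black boxes that simply produce a signal.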

  27. Future Directions • User-Friendly Interfaces to Applications • Expand Application Areas • Imaging • Pattern Recognition/Clustering/Data Mining • Expand Libraries for Reactions/Pathways • “Tinker-Toy” Modeling • Work with Vendor to bring FPGA solutions to wider community of computational biologists • Faster Application Development Time • Debugging and Documentation

  28. Acknowledgements • Starbridge Systems • Kent Gilson • Jim Yardley • Fred Geiger • NCI for Support • Stan Burt, Director ABCC
