Phyloinformatics of Neuraminidase at Micro and Macro Levels using Grid-enabled HPC Technologies

Presentation Transcript


  1. Phyloinformatics of Neuraminidase at Micro and Macro Levels using Grid-enabled HPC Technologies
     B. Schmidt (UNSW); D.T. Singh (Genvea Biosciences); R. Trehan, T. Bretschneider (NTU, Singapore)

  2. Contents
     • H5N1 Genetics
     • H5N1 Phyloinformatics
     • Design Principles of Quascade
     • H5N1 Phyloinformatics with Quascade
     • Results
     • Conclusion and Future Work

  3. H5N1 Genetics
     • Belongs to the Influenza A virus type
     • Segmented RNA genome
     • 8 genes, 11 proteins
     • Classification based on:
       • Hemagglutinin (HA): 15 subtypes
       • Neuraminidase (NA): 9 subtypes
     • Genetic variations in HA/NA
       • Genetic drift
         • Point mutations
         • 1918 Spanish flu
       • Genetic shift
         • Reassortment of the segmented genome
         • 1957, 1968, 1997 pandemics
         • 2003 Z strain of H5N1

  4. H5N1 Phyloinformatics
     • Essential to monitor newly emerging strains
     • Molecular evolution at gene and genome level
     • Phylogenetic analysis for determining the origin of new strains
     • Phylogenetics
       • How fast do proteins evolve?
       • What is the best method to measure the evolution?
       • How to obtain the best phylogenetic tree?
     • Phylogenetic algorithms
       • Character-based
         • Maximum Parsimony (MP), Maximum Likelihood (ML)
       • Distance-based
         • UPGMA, Neighbor-Joining (NJ)
       • Bayesian MCMC-based
         • MrBayes, BEAST
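To make the distance-based family on this slide concrete, here is a minimal UPGMA sketch in Python. It is an illustration only, not the implementation used in the talk; the taxon names and distances are made up, and the function name `upgma` and its input layout are assumptions of this sketch.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


# Hypothetical toy distances (not taken from the NA data sets).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

The sketch returns only the topology; branch lengths, which a production tool would also report, are left out to keep it short.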

  5. Quascade – User Interface
     [Figure: example processing pipeline, with the communication links between boxes labeled]
     • A data-flow tool in which each black box represents a Java object, and the objects run on different computers
     • Objects are assigned to available computers automatically (or manually if required)
     • Communication between objects is handled transparently
     • Objects are configured before run-time

  6. Object Features
     [Figure: three boxes, each labeled "Java Object"]
     • Coding in regular Java / C / C++
     • Persistent: activated whenever all data inputs are present
     • No explicit messaging protocol required
     • No distributed computing concepts need to be understood
     • Objects automatically or manually assigned to computers / CPU cores
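The "activated whenever all data inputs are present" behavior can be sketched in a few lines. The following is a single-process Python illustration of that firing rule only; it is not Quascade's API (the slides do not show one), and the class `Node`, its methods, and the two example steps are hypothetical names chosen for this sketch.

```python
class Node:
    """A processing step that fires once every named input has arrived."""

    def __init__(self, name, inputs, func):
        self.name = name
        self.inputs = tuple(inputs)
        self.func = func
        self.pending = {}    # input name -> value received so far
        self.listeners = []  # (downstream node, input name) pairs

    def connect(self, node, input_name):
        """Route this node's output to one input of a downstream node."""
        self.listeners.append((node, input_name))

    def receive(self, input_name, value):
        self.pending[input_name] = value
        if all(k in self.pending for k in self.inputs):
            args = [self.pending[k] for k in self.inputs]
            self.pending = {}  # persistent: wait for the next batch
            result = self.func(*args)
            for node, port in self.listeners:
                node.receive(port, result)


# Hypothetical two-step pipeline: sort some items, then record them.
results = []
sort_step = Node("sort", ["seqs"], lambda seqs: sorted(seqs))
report = Node(
    "report",
    ["aligned", "label"],
    lambda aligned, label: results.append((label, aligned)),
)
sort_step.connect(report, "aligned")

report.receive("label", "core set")   # one input present: nothing fires yet
sort_step.receive("seqs", ["b", "a"])  # fires sort, which completes report
print(results)
```

The step code itself contains no messaging logic; the wiring lives entirely in `connect`, which is the property the slide emphasizes. Distribution across machines is deliberately not modeled here.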

  7. Phyloinformatics Workflow with Quascade

  8. Parallelized Phyloinformatics Workflow

  9. Data and Algorithms
     • Data
       • Core Group: 22 H5N1 NA sequences from Swiss-Prot and TrEMBL
       • Medium Set: 581 H5N1 NA sequences from UniProt
       • Large Set: 909 Influenza A NA sequences from UniProt
     • Algorithms
       • ProtDist
       • NJ
       • UPGMA
       • ProtPars
       • ProtML
       • MrBayes

  10. Runtime and Scalability (NA Bird Flu Protein)
     [Figure: two bar charts, "Distance-based workflow" and "MP workflow". y-axis: processing time [h], 0 to 400; x-axis: 909 sequences, 581 sequences; legend: 25 processors vs. 1 processor. Bar labels surviving in the transcript: 360, 360, 145, 140, 16, 16, 6, 5; their assignment to individual bars is not recoverable.]
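The scalability claim can be checked with a short speedup calculation. This sketch assumes, since the chart itself is lost, that 360 h is the 1-processor time and 16 h the 25-processor time for the 909-sequence set (both values appear once per chart); the 581-sequence bars are not used because their assignment is ambiguous.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor time to parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / processors


# Assumed reading of the chart: 360 h on 1 processor, 16 h on 25 processors.
s = speedup(360, 16)
e = efficiency(360, 16, 25)
print(f"speedup = {s}, efficiency = {e}")
```

Under that reading the workflow runs 22.5 times faster on 25 processors, a parallel efficiency of 90%.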

  11. MrBayes – Tree, Core Set

  12. Analysis and Observations
     • Clustering possibilities
       • Temporal, host-based, geographical
     • Algorithms
       • MrBayes and ProtML are the most consistent in their performance
       • Both are too compute-intensive for the larger “macro” sets
     • Observed pattern
       • All phylograms yielded geography-based rather than time-based clustering
       • Host ranges vary along the clustered clades
         • The same strain, with identical NA sequences, can infect different hosts
         • NA may not be the sole factor determining the diverse host range
       • Acquisition or loss of glycan sites seems to play a critical role in the molecular evolution of H5N1 NA
       • Identifying “bridging isolates” may help in rapid monitoring and in developing a global-scale warning system for H5N1
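The slides do not say how glycan sites were identified. One common approach, shown here as an assumption rather than the authors' method, is to scan for the N-linked glycosylation sequon N-X-S/T (X being any residue except proline) and compare site positions between two aligned sequences. The function names and the toy sequences are hypothetical.

```python
import re

# N-X-[S/T] with X != P; the lookahead lets overlapping sequons all match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """Return 1-based positions of the N in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def site_changes(old, new):
    """Compare two sequences that share coordinates (assumed pre-aligned).

    Returns (gained, lost): sequon positions present only in `new`,
    and positions present only in `old`.
    """
    before = set(sequon_positions(old))
    after = set(sequon_positions(new))
    return sorted(after - before), sorted(before - after)


# Made-up peptide fragments, not real NA sequences.
print(sequon_positions("MNGSANPTNNTT"))
print(site_changes("MNGSA", "MNGAA"))
```

A sequon marks only a potential glycosylation site; whether it is actually glycosylated depends on structural context that this scan ignores.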

  13. Conclusion and Future Work
     • Quascade
       • New graphical data-flow tool for designing pipelines / workflows that are automatically grid-enabled
       • Supports implicit high-performance parallelization
       • Supports persistent components
       • Can be used with Java / C / C++ code or application binaries
     • H5N1 Phyloinformatics
       • Can take advantage of the workflow system and HPC
       • Can be easily used and modified by biologists
       • Uses H5N1 NA sequences to better understand the evolution of H5N1
       • Analysis of H5N1 NA data with different algorithms indicates spatial clustering by geographical distribution rather than by time or host
     • Future work
       • Studies in conjunction with other proteins such as HA and polymerase, and also at gene and genome level
