D. Ludviga, IMCS UL (SigmaNet)
Overview of application CoPS (Comparison of Protein Structures)
2nd BG-II AHM, 13.05.2009, Riga, Latvia
Outline
• About CoPS (scientific value)
• What's new?
• Challenges (mentioned during the 1st AHM)
• Our solution
• Collaboration possibilities
About CoPS (scientific value)
• Started at the beginning of BG-II as the pilot application
• Developed by Dr. Natalja Kurbatova and Assoc. Prof. Juris Viksna
• Field: Bioinformatics
“It has taken biologists some 230 years to identify and describe three quarters of a million insects; if there are indeed at least thirty million ... then, working as they have in the past, insect taxonomists have ten thousand years of employment ahead of them.” R. Leakey and R. Lewin
About CoPS
• Assumption: protein structures have evolved by a stepwise process, each step involving a small change in the structure.
• Protein structures are compared using the Evolutionary Secondary Structures Matching (ESSM) algorithm.
• ESSM was created for pairwise comparison of structures; it allows fold mutations to be identified and the evolutionary relationship between proteins to be estimated.
• To explore the evolution of protein structures, an all-against-all comparison has to be done.
• Application needs:
  • Protein database (data set description files are stored)
    • PDB (3D), FASTA (.txt), structural elements
    • size ~8 GB (~2.3 GB if compressed)
  • Total number of tasks: 20 451 945, divided into 410 files
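The all-against-all workload described above can be sketched as follows. This is only a minimal illustration, assuming that each task is one unordered pair of distinct structures and that tasks are split evenly across files; the function names and the toy identifiers are hypothetical and are not taken from CoPS.

```python
from itertools import combinations, islice
import math


def all_against_all(ids):
    """Yield every unordered pair of distinct structure identifiers."""
    return combinations(ids, 2)


def split_into_files(pairs, n_files, total):
    """Split an iterable of pairs into at most n_files chunks of near-equal size."""
    per_file = math.ceil(total / n_files)
    it = iter(pairs)
    chunks = []
    while True:
        chunk = list(islice(it, per_file))
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


# Toy data set: 6 hypothetical structure identifiers give 15 pairs.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
total = math.comb(len(ids), 2)
chunks = split_into_files(all_against_all(ids), n_files=4, total=total)

# With the figures from the slide (20 451 945 tasks, 410 files),
# an even split puts at most this many comparisons in one file:
tasks_per_file = math.ceil(20_451_945 / 410)
print(len(chunks), tasks_per_file)
```

Under an even split, each of the 410 files would carry just under 50 000 pair comparisons; how the real task files were balanced is not stated in the slides.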
About CoPS
• Application consists of:
  • jdl.essm - JDL file for submitting the ESSM (CoPS) job
  • essm.sh - shell script that is executed on the WN (worker node) once the job starts
  • database.tar.gz - archive of the protein database with protein descriptions, which is extracted on the WN before anything else starts
  • essm.linux - statically compiled 32-bit ESSM (CoPS) executable that runs on Scientific Linux [CERN] 4
  • pairs.txt - sample calculation file that contains the pair comparisons
• At the end of each job, the result file pairs.result is generated
• The results are afterwards visualized using a self-made tool
  • developed using one of the GRADE components
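The job flow listed above (extract the database, run the executable, collect pairs.result) can be sketched in Python. This is not the content of essm.sh, which the slides do not show; `run_job` and its parameters are hypothetical, and the command line of essm.linux is unknown, so the caller has to supply it.

```python
import subprocess
import tarfile
from pathlib import Path


def run_job(workdir, archive, command, result_name="pairs.result"):
    """Unpack the database archive, run the comparison, return the result path.

    `command` is the full argument list for the executable. The real
    essm.linux arguments are not given in the slides (assumption: the
    executable writes `result_name` into the working directory).
    """
    workdir = Path(workdir)
    # Step 1: extract the protein database before anything else starts.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir)
    # Step 2: run the executable inside the working directory;
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(command, cwd=workdir, check=True)
    # Step 3: the job is expected to leave its result file behind.
    result = workdir / result_name
    if not result.exists():
        raise FileNotFoundError(f"{result_name} was not produced")
    return result
```

Failing loudly when the result file is missing makes a broken job visible on the worker node instead of returning an empty output to the user.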
What's new?
• Developed (results received)
  • ~2 weeks
• Implemented in Migrating Desktop
• Presented/demonstrated at the OGF25/EGEE Users Forum in Catania, Italy
• Demo
Challenges and our solution
• Challenges:
  • Transporting the data
    • 410 x 2.3 GB ≈ 950 GB
  • VOMS proxy
• Solutions:
  • The needed data was installed in the software directories of separate clusters (“devoted” protein clusters were set up)
  • MyProxy
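The data-transport estimate above can be checked with a few lines of arithmetic. The sketch assumes one job per task file, as the 410 x 2.3 GB figure implies; the number of clusters is an invented value used only to show the contrast, and the helper name is hypothetical.

```python
ARCHIVE_GB = 2.3  # compressed database size (from the slides)
N_JOBS = 410      # one job per task file (implied by 410 x 2.3 GB)


def transfer_volume_gb(n_transfers, archive_gb=ARCHIVE_GB):
    """Total data moved if the archive is shipped n_transfers times."""
    return n_transfers * archive_gb


# Archive shipped with every job.
per_job = transfer_volume_gb(N_JOBS)

# Archive installed once per cluster; 5 is an assumed value, not from the slides.
n_clusters = 5
preinstalled = transfer_volume_gb(n_clusters)

print(f"per-job shipping: {per_job:.0f} GB")
print(f"pre-installed on {n_clusters} clusters: {preinstalled:.1f} GB")
```

The per-job total comes to about 943 GB, which the slide rounds to roughly 950 GB; installing the database once per cluster reduces the transfer to a small multiple of the archive size.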
Results
• The results of the ESSM algorithm were successfully used to explore the CATH fold space, using fold space graphs to represent the comparison results and estimating "evolution distance" on the basis of observed changes.
• The results obtained with the application can be regarded as a few steps toward the creation of a general protein evolution model.
Collaboration
“Computer science is no more about computers than astronomy is about telescopes” E. W. Dijkstra
• Continue collaboration with biologists at LU
• Set up a VO or just devoted servers:
  • PDB can be installed in a cluster's VO software directory
  • This speeds up execution of jobs and avoids per-job download and extraction of these databases
Thank you!