
National e-Science Centre Local Developments

Dr Richard Sinnott (Technical Director, National e-Science Centre; Deputy Director (Technical), Bioinformatics Research Centre, University of Glasgow; ros@dcs.gla.ac.uk) and Dr Dave Berry (Research Manager). 5th February 2004.


Presentation Transcript


  1. National e-Science Centre Local Developments • Dr Richard Sinnott, Technical Director, National e-Science Centre; Deputy Director (Technical), Bioinformatics Research Centre, University of Glasgow (ros@dcs.gla.ac.uk) • Dr Dave Berry, Research Manager, National e-Science Centre, University of Edinburgh (daveb@nesc.ac.uk) • 5th February 2004

  2. Overview • NeSC Role in UK e-Science • NeSC Edinburgh developments • e-Science Institute • Infrastructure/set-up • Projects • Plans • NeSC Glasgow developments • Infrastructure/set-up • Projects • Plans • Conclusions

  3. NeSC’s Role • Help coordinate and lead the UK e-Science Programme • Community building activities, regional support & outreach • Grid building as a member of the Engineering Task Force • Skill building through training events & support centre • Help establish the UK’s international role • International meetings, standardisation work & presentations • Undertake R&D projects • To deliver reliable middleware • To engage industry • To stimulate the uptake of e-Science technology and methods • Run the e-Science Institute • Knowledge building through workshops and conferences • Research visitors and events

  4. NeSC at Edinburgh: Recent Developments • Globus Alliance • Digital Curation Centre • Edinburgh, Glasgow, UKOLN, CCLRC • New e-Science Lecturer (Particle Physics) • Training Team • PPARC and EGEE funding • Manager + 4 trainers • Europe-wide role • DAI Two (Extension of OGSA-DAI) • OGSA Test Grid

  5. Digital Curation Centre (diagram labels) • communities of practice: users • curation organisations • community support & outreach • Collaborative Associates Network of Data Organisations • management & co-ordination • research collaborators • services • research • development • testbeds & tools • Industry • standards bodies

  6. e-Science Institute • A meeting place • The focus for presenting UK e-Science • Visiting researchers • Collaborate in our research and development • Engage in and develop our event programme • Build bridges with their community • Visits last between one week and six months • Research-oriented event programme • e-Science research topics • Training to e-Science research teams

  7. eSI Workshops • Space for real work • Crossing communities • Creativity: new strategies and solutions • Written reports • Scientific Data Mining, Integration and Visualisation • Grid Information Systems • Portals and Portlets • Virtual Observatory as a Data Grid • Imaging, Medical Analysis and Grid Environments • Open Issues in Grid Scheduling • Data Provenance & Annotation • e-Science Workflow Services • GeoSciences & Scottish Bioinformatics Forum • Suggestions always welcome! http://www.nesc.ac.uk/events/

  8. Projects • OGSA-DAI/DAIT, MS.NETGrid, SunDCG, GridWeaver, BRIDGES, PGPGrid, FirstDIG, ODD-Genes • EGEE, NextGrid • OGSA Test Grid, IBM Early Evaluation • edikt • Publishing Scientific Data • GridPP, AstroGrid, QCDGrid, RealityGrid Portal • Biological Spatio-Temporal Databases • CoAKTinG, Grid-enabled Modelling Tools and Databases for Neuroinformatics, e-Diamond • Dynamic Configuration of Grid Fabrics, Dependable Grid Services, Deductive Synthesis Techniques, Inferring QoS Properties for Grid Applications, Mobile Resource Guarantees • TIES, TIES-II

  9. The Virtual Observatory • International Virtual Observatory Alliance: UK, Australia, EU, China, Canada, Italy, Germany, Japan, Korea, US, Russia, France, India • How to integrate many multi-TB collections of heterogeneous data distributed globally? • Sociological and technological challenges to be met

  10. Data Services • GGF Data Access and Integration Svcs (DAIS) • OGSI-compliant interfaces to access relational and XML databases • Needs to be generalized to encompass other data sources (see next slide…) • Generalized DAIS becomes the foundation for: • Replication: Data located in multiple locations • Federation: Composition of multiple sources • Provenance: How was data generated?

  11. Data Access & Integration Services • Diagram components: Client, Registry, Factory, Grid Data Service, XML / Relational database • Legend: SOAP/HTTP, service creation, API interactions • 1a. Request to Registry for sources of data about “x” • 1b. Registry responds with Factory handle • 2a. Request to Factory for access to database • 2b. Factory creates GridDataService to manage access • 2c. Factory returns handle of GDS to client • 3a. Client queries GDS with XPath, SQL, etc. • 3b. GDS interacts with database • 3c. Results of query returned to client as XML
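
The registry, factory and data-service steps on this slide can be sketched in a few lines of Python. This is only an illustration of the interaction pattern, not OGSA-DAI code: the class names `Registry`, `Factory` and `GridDataService`, the in-memory SQLite database and the sample `genes` table are all assumptions made for the example, and plain method calls stand in for the SOAP/HTTP messages.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql):
        # Step 3b: the service, not the client, interacts with the database.
        cursor = self._connection.execute(sql)
        columns = [column[0] for column in cursor.description]
        # Step 3c: results are returned to the client as XML.
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a data service on request."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # Steps 2b and 2c: create the service and hand it back to the client.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry: maps a topic to the factory that serves it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        # Steps 1a and 1b: look up sources of data about a topic.
        return self._factories[topic]


def demo():
    # Made-up sample data, held in memory so the sketch is self-contained.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE genes (name TEXT, chromosome INTEGER)")
    connection.execute("INSERT INTO genes VALUES ('Ace', 10)")

    registry = Registry()
    registry.register("genes", Factory(connection))

    factory = registry.find("genes")      # steps 1a, 1b
    service = factory.create_service()    # steps 2a, 2b, 2c
    return service.query("SELECT name, chromosome FROM genes")  # steps 3a to 3c
```

Calling `demo()` walks through all three numbered stages. The point of the indirection is that the client only ever holds handles: it never opens the database connection itself.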

  12. edikt • The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB • SHEFC funded research and development grant • 3 years funding: May 2002 – 2005 • +3 years funding upon successful project and review • Diagram labels: Standards, E-Science Apps, CS Research, Grid Services for e-Science Data Management, Commercial SW components and skills • Edikt project: Requirements analysis, Technology matchmaking, Gap filling, Rigorous engineering

  13. ELDAS – Data Access Service • Implemented using Enterprise Java Beans • Data Access Components interface to distinct DBMSs • Accessible as a grid data service or a web data service • Java Framework: ELDAS runs anywhere • ELDAS suitable for grid & web • Diagram labels: Web User1, Grid User1, Grid User2, Grid Proxy, Web Servlet, EJB - DAS, DAC (×4), Xindice DB, MySQL DB, DB2 DB, Oracle 9i DB

  14. BinX – accessing legacy binary data • The Problem: • Many binary data files • Applications must “know” the data format • Binary data formats are machine-specific • The Solution: • Write a “stand-aside” format description in XML • Provide a library to • Interpret the description • Provide file access across different machines • Build higher-level services • Diagram labels: simulations, Binary Data File (×3), BinX file describes binary file structure, BinX Library, e-Science Application
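
The "stand-aside description" idea can be illustrated with a small Python sketch. The XML vocabulary below (`dataset`, `field`, `byteOrder`, `int32`, `float64`) is invented for this example and is not the real BinX schema; the function `read_record` is likewise hypothetical. The sketch only shows the principle: the byte order and field layout live in the description, so the reading code needs no built-in knowledge of the file format or of the machine that wrote it.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description format, NOT the actual BinX schema.
DESCRIPTION = """
<dataset byteOrder="big">
  <field name="step" type="int32"/>
  <field name="temperature" type="float64"/>
</dataset>
"""

# Map the illustrative type names onto struct format codes.
STRUCT_CODES = {"int32": "i", "float64": "d"}


def read_record(description_xml, data):
    """Decode one record from `data` using the layout in `description_xml`."""
    root = ET.fromstring(description_xml)
    # An explicit byte-order prefix makes the result machine-independent.
    prefix = ">" if root.get("byteOrder") == "big" else "<"
    fields = root.findall("field")
    layout = prefix + "".join(STRUCT_CODES[f.get("type")] for f in fields)
    values = struct.unpack(layout, data[:struct.calcsize(layout)])
    return {f.get("name"): value for f, value in zip(fields, values)}


def demo():
    # Simulate a binary file written on a big-endian machine.
    data = struct.pack(">id", 7, 21.5)
    return read_record(DESCRIPTION, data)
```

Changing the file layout then means editing the description rather than the application, which is the decoupling the slide argues for.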

  15. NeSC at Glasgow • E-Science Hub • Externally • Glasgow end of NeSC • Involved in UK wide activities • ETF: In May 2003 became first UK e-Science Centre to run integration tests across every site of the UK (Level 2) Grid. Therefore 100% access to UK Grid resources at this time • Public visibility of NeSC • responsible for NeSC web site • Internally • Focal point for e-Science research/activities at Glasgow • Work closely with foundation departments • Department of Computing Science • Department of Physics & Astronomy • Also working closely with other groups including • Bioinformatics Research Centre • Electronics and Electrical Engineering • Biostatistics, …

  16. Glasgow e-Science Investment • Major investment by university • 230 m² of newly renovated floor space in Kelvin Building • offices • access grid facility • training room • equipped with 20 PCs/server for training courses • Funding Technical Director

  17. Resource Consolidation at Glasgow • Building around ScotGrid • Providing shared Grid resource for wide variety of scientists inside/outside Glasgow • Particle physicists, computer scientists, electronic engineers, bioinformaticians, … • Focal point, knowledge pool, primary resource for e-Science activity at Glasgow • Target shares • 60% PP, 20% Bioinf, 20% open share… • Shared Resources (diagram labels: CDF, BIO, LHC): Disk ~15TB, CPU ~330 1GHz • Hardware • 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory • 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory • 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet • 1TB disk • LTO/Ultrium Tape Library • Cisco ethernet switches • IBM X Series 370 PIII Xeon with 32 x 512 MB RAM • 5TB FastT500 disk 70 x 73.4 GB IBM FC Hot-Swap HDD • eDIKT 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory • eDIKT 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory • CDF 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory • CDF 7.5TB Raid disk
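
As a rough worked example of the target shares, the snippet below splits the approximate CPU count quoted on this slide (~330) according to the 60/20/20 percentages. The function name `allocate` is hypothetical, and the slide does not say how shares are actually enforced on ScotGrid; this is arithmetic only.

```python
# Target shares from the slide, in whole percent.
TARGET_SHARES = {"particle physics": 60, "bioinformatics": 20, "open share": 20}


def allocate(total_cpus, shares):
    """Split `total_cpus` between groups according to percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100%")
    # Integer arithmetic avoids floating-point rounding surprises.
    return {group: total_cpus * percent // 100 for group, percent in shares.items()}


def demo():
    # 330 is the approximate figure from the slide, not an exact count.
    return allocate(330, TARGET_SHARES)
```

With about 330 CPUs this gives roughly 198 for particle physics and 66 each for bioinformatics and the open share.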

  18. Projects with NeSC Glasgow Involvement • DCC • National Digital Curation Centre • AMUSE • Autonomous Management of Ubiquitous Systems for e-Health • P2Popt • Performance measurement & mgt of 2-Layer Peer to Peer NWs… • PGPGrid • Peppers Ghost Productions • Equator • Environmental e-Science Interdisciplinary Research Project • BPS • Biochemical Pathway Simulator • BRIDGES

  19. Overview of BRIDGES • Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES) • NeSC (Edinburgh and Glasgow) and IBM • 2 year project started 1st October 2003 • Supporting project for CFG project • Generating data on hypertension • Rat, Mouse, Human genome databases • Variety of tools used • BLAST, FASTA, MPsrch, BLAT, Gene Prediction, visualisation, … • Variety of data sources and formats • Microarray data, genome DBs, project partner research data, medical records, … • Aim is integrated infrastructure supporting • Data federation • Security

  20. CFG Partner Distribution • Partner sites: Edinburgh, Glasgow, Leicester, Oxford, Netherlands, London • Diagram labels: Public curated data, Shared data, Private data (×6)

  21. Problems specific to Bio-Community • DBs growing exponentially!!! (chart: PDB Content Growth) • Bibliographic (MedLine, …) • Amino Acid Seq (SWISS-PROT, …) • 3D Molecular Structure (PDB, …) • Nucleotide Seq (GenBank, EMBL, …) • Biochemical Pathways (KEGG, WIT…) • Molecular Classifications (SCOP, CATH,…) • Motif Libraries (PROSITE, Blocks, …) • …

  22. Arabidopsis thaliana • Buchnera sp. APS • Yersinia pestis • Aquifex aeolicus • Archaeoglobus fulgidus • Borrelia burgdorferi • Mycobacterium tuberculosis • Vibrio cholerae • Thermoplasma acidophilum • Caenorhabditis elegans • Campylobacter jejuni • Chlamydia pneumoniae • Drosophila melanogaster • Escherichia coli • Neisseria meningitidis Z2491 • Plasmodium falciparum • Ureaplasma urealyticum • Helicobacter pylori • Mycobacterium leprae • Pseudomonas aeruginosa • mouse • Bacillus subtilis • Thermotoga maritima • Xylella fastidiosa • Rickettsia prowazekii • Saccharomyces cerevisiae • Salmonella enterica • rat • More genomes …

  23. Complexity of Biological Data • Tissues • Cell • Organisms • Organs • Protein functions • Protein Structures • Physiology • Gene expressions • Nucleotide structures • Cell signalling • Nucleotide sequences • Protein-protein interaction (pathways) • distributed, heterogeneous, dirty, ... now link it together!

  24. BRIDGES Data Integration/Federation • Local repository being developed • Populated with data that cannot be federated • e.g. public data sets with no programmatic interface • Shared data sets of CFG scientists • Security through • X.509 PKI (authentication) • PERMIS (authorisation) • Will make use of e-Science technologies (OGSA-DAI/DAIT, ELDAS, IBM’s DiscoveryLink) • Automatically keep data fresh/up to date • Web (Grid) services offered that allow use of these local data sets • For example for visualising, searching, querying, … • Example usage scenario …
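
The split between authentication and authorisation can be made concrete with a toy Python sketch. It does not use PERMIS or any real X.509 library: it assumes authentication has already produced a certificate distinguished name, and the `Identity` class, the `POLICY` table, the example names and the data-set labels are all hypothetical. It shows only a per-user, per-site access decision with a default of deny.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Outcome of authentication (assumed already done, e.g. via X.509)."""
    distinguished_name: str
    site: str


# Hypothetical policy: (user, site) -> data sets that user may read.
POLICY = {
    ("CN=Alice Example", "Glasgow"): {"public", "shared", "glasgow-private"},
    ("CN=Bob Example", "Oxford"): {"public", "shared"},
}


def is_authorised(identity, dataset):
    """Authorisation step: decide whether this identity may read `dataset`."""
    # Unknown (user, site) pairs get an empty set, so the default is deny.
    allowed = POLICY.get((identity.distinguished_name, identity.site), set())
    return dataset in allowed


def demo():
    alice = Identity("CN=Alice Example", "Glasgow")
    bob = Identity("CN=Bob Example", "Oxford")
    return (
        is_authorised(alice, "glasgow-private"),
        is_authorised(bob, "glasgow-private"),
        is_authorised(bob, "shared"),
    )
```

Keeping the policy table separate from the identity check mirrors the slide's division of labour: one mechanism establishes who the caller is, another decides what that caller may see.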

  25. System Usage Scenario • Browser based clients… • BRIDGES Portal • Secure access for CFG VO • Personalised Services • Generic services used by other projects • Java App downloaded (via WebStart) • Push relevant data onto ScotGrid for BLAST’ing • Up to date results input to DB • Remote data in Oracle, DB2, Sybase, Excel, flat files, XML... • Secure Data Repository • Shared/Private Data Sets • Authorisation: per user, per site • Other diagram labels: BLAST, Smith W, SV, wrappers, DL Client, Site X, OGSA-DAI

  26. Conclusions • NeSC continues to provide leadership in UK e-Science • Difficult with multitude of scientific research areas, heterogeneity of systems and fluidity of technologies • GT2, GT3, WSRF, GT4…? • Closer working with GridPP beneficial for everyone • move towards Production Grid • ScotGrid a good model for co-operation • Planning for soft landing through diversification and more integration into university • MRC bids, BBSRC bids, EPSRC bids, … • UK e-Science operating as community for upcoming DTI funding opportunities • Plans for developing Grid Computing teaching modules as part of advanced MSc

  27. Website • National e-Science Centre http://www.nesc.ac.uk/ • Mission, Background, Foundation • Locations, Staff, Resources, Projects • Register interest, Mailing lists, NeSCForge • Regional associations and Collaborations • News, Notices • Presentations & Lectures http://www.nesc.ac.uk/presentations/ • e-Science Institute http://www.nesc.ac.uk/esi/ • Mission, Events (Future and Past) • Register for Events, Visitor Programme • UK e-Science • Map and Index of Centres http://www.nesc.ac.uk/centres/ • Technical Papers http://www.nesc.ac.uk/technical_papers/ • Index of >100 Projects http://www.nesc.ac.uk/projects/ • Task Forces http://www.nesc.ac.uk/teams/ • General Information • Glossary, Bibliography, Who’s who • E-Science job vacancies

  28. Questions…?
