
Grid Activity at CCS

Grid Activity at CCS. Toshiyuki Amagasa, Center for Computational Sciences, University of Tsukuba. About Myself. Name: Toshiyuki Amagasa. Affiliation: Division of Computational Informatics, Center for Computational Sciences


Presentation Transcript


  1. Grid Activity at CCS • Toshiyuki Amagasa, Center for Computational Sciences, University of Tsukuba

  2. About Myself • Name • Toshiyuki Amagasa • Affiliation: • Division of Computational Informatics, Center for Computational Sciences • Department of Computer Science, Graduate School of Systems and Information Engineering • Area of research • Data engineering • Database system • Recent topics • XML databases • Parallel XML query processing • OLAP analysis for XML • Web information extraction for XML • Databases in scientific applications • Faceted navigation for QCDml • Meteorological database

  3. ILDG-JP Members • Prof. Mitsuhisa Sato (Director, CCS) • Prof. Tomoteru Yoshie (CCS) • Prof. Osamu Tatebe (CCS) • Dr. Naoya Ukita (CCS) • Prof. Toshiyuki Amagasa (CCS)

  4. Talk Outline • Current Status of ILDG • A Brief History of JLDG • An Overview of JLDG • Development of a New ILDG Client • Faceted Navigation of QCDml • Conclusions and Future Work

  5. Current Status of JLDG

  6. A Brief History of JLDG (1/3) • Hepnet-J/sc, 2002- (SINET GbE private network) • Widely-distributed file system • Network backbone: Super SINET VPN • Institutes / Universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U. • Objective and Implementation • Data sharing among institutes / universities whose administrative policies are not homogeneous, while maintaining security • Mirroring among FSs attached to SCs with administrative • [Figure: Hepnet-J/sc connecting file servers at CCP @Tsukuba, CRC @KEK, YITP @Kyoto, and RCNP @Osaka; machines shown: CP-PACS, SR8000, SX-5]

  7. A Brief History of JLDG (2/3) • Problems • Growing cost of managing data location • A dataset may be distributed across several disks. • It is hard for users to remember the location of data and mirrors. • No concept of users and user groups • Hard to support multiple research groups. • Necessary functionalities • A flat data-sharing system with no space limit (or one that can be extended at any time) • User and user-group management across several organizations → Japan Lattice Data Grid (JLDG) • Project launched in November 2005 • Operation started in March 2007

  8. A Brief History of JLDG (3/3) • JLDG v1 started operation in May 2008 • Available datasets • CP-PACS Nf=2 QCD configuration • 8,000 files, 1.5 TBytes • CP-PACS/JLQCD Nf=2+1 QCD configuration • 21,000 files, 6 TBytes • PACS-CS Nf=2+1 32^3 x 64 lattice QCD configuration • 2,600 files, 3 TBytes • JLDG v2 started operation in December 2009 • Storing and sharing research data generated in daily research activities • Data sharing within a research group

  9. An Overview of JLDG • A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics • Sharing simulation data computed by SCs for several months to several years. • Data files are distributed; replicas are created if necessary. • A user does not need to know file locations. Files can be accessed very quickly if the site has replicas. • Storage space can be added incrementally during operation. • [Figure: Gfarm file system spanning Kanazawa, KEK, Kyoto, Hiroshima, Tsukuba, and Osaka over the SINET3 network, with a connection to ILDG; www.jldg.org]

  10. Software Components • Globus Toolkit V4 (ANL) www.globus.org • GSI authentication, proxy user certificate creation • GridFTP server / client • VOMS (EDG) • VO management • Naregi-CA (Naregi) www.naregi.org • User / host certificate creation • Gfarm file system (U. of Tsukuba) datafarm.apgrid.org • Widely-distributed file system • Uberftp (NCSA) • http://dims.ncsa.uiuc.edu/set/uberftp/ • Interactive GridFTP client

  11. Gfarm Distributed File System • An open-source distributed file system • A global namespace to unify storage systems • Scalable I/O performance exploiting data access locality • Automated replica selection for fault-tolerance and load-balancing • [Figure: global namespace rooted at /gfarm (directories ggf, jp, aist, gtrc; files file1 to file4) mapped onto the Gfarm file system, with replica creation]
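The replica-selection idea on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not Gfarm's actual algorithm or API: the class `ToyNamespace`, the host names, and the rule "prefer a local replica, otherwise the least-loaded live host" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Toy global namespace: logical path -> hosts that hold a replica."""
    replicas: dict = field(default_factory=dict)
    load: dict = field(default_factory=dict)  # number of reads served per host

    def add_replica(self, path, host):
        """Register that `host` stores a copy of `path`."""
        hosts = self.replicas.setdefault(path, [])
        if host not in hosts:
            hosts.append(host)
        self.load.setdefault(host, 0)

    def select_replica(self, path, client_host, down=()):
        """Pick a host to read `path` from, skipping hosts listed in `down`."""
        candidates = [h for h in self.replicas.get(path, []) if h not in down]
        if not candidates:
            raise FileNotFoundError(path)
        if client_host in candidates:
            chosen = client_host  # data access locality
        else:
            # load balancing: least-loaded host, ties broken by name
            chosen = min(candidates, key=lambda h: (self.load[h], h))
        self.load[chosen] += 1
        return chosen


ns = ToyNamespace()
ns.add_replica("/gfarm/jp/file1", "tsukuba")  # hypothetical path and hosts
ns.add_replica("/gfarm/jp/file1", "kek")
```

A client at a site with a replica reads locally; any other client is sent to the least-loaded surviving host, which is where the fault-tolerance and load-balancing claims come from in this toy model.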

  12. Summary • JLDG • A brief history • An overview • Used as an infrastructure for daily research activity • Hands-on meeting held on 27 Jan. 2009, successfully completed with 19 attendees

  13. Development of a New ILDG Client

  14. Int’l Lattice Data Grid (ILDG) • A data grid for sharing Lattice QCD configurations • File Formats in ILDG • Configuration binary • LIME (Lattice QCD Interchange Message Encapsulation) • Metadata (QCDml) • ensemble XML • configuration XML • LFN (Logical File Name) • Identifier for configuration binary • [Figure: one ensemble XML, identified by its markovChainURI, linked to several configuration XML files, each pointing through an LFN to a configuration binary]

  15. QCDml Ensemble XML
      <markovChain xmlns="…">
        <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI>
        <management>
          <revisions>1</revisions>
          <collaboration>CP-PACS</collaboration>
          <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
          <ensembleLabel>B1800</ensembleLabel>
          <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference>
          <archiveHistory>
            <elem>
              <revision>1</revision>
              <revisionAction>add</revisionAction>
              <participant>
                <name>T.Yoshie</name>
                <institution>Center for Computational Sciences, University of Tsukuba</institution>
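As a rough illustration of how a client might read such an ensemble document, the sketch below parses a shortened, closed copy of the excerpt with Python's standard library. The namespace URI is a placeholder (the slide elides it), and the helper names `local_name` and `first_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's excerpt; the xmlns value is a placeholder.
SAMPLE = """<markovChain xmlns="http://example.org/hypothetical-qcdml-namespace">
  <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI>
  <management>
    <collaboration>CP-PACS</collaboration>
    <ensembleLabel>B1800</ensembleLabel>
  </management>
</markovChain>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def first_text(root, name):
    """Text of the first element (in document order) whose local name matches."""
    for elem in root.iter():
        if local_name(elem.tag) == name:
            return (elem.text or "").strip()
    return None


root = ET.fromstring(SAMPLE)
summary = {
    name: first_text(root, name)
    for name in ("markovChainURI", "collaboration", "ensembleLabel")
}
```

Matching on local names keeps the sketch independent of the exact QCDml namespace, at the cost of ignoring it entirely.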

  16. Typical Use Case of ILDG • [Figure labels: LFN (Logical File Name), SURL (Site URL), TURL (Transfer URL), VOMS, Authentication]

  17. Difficulties in Finding Desired Configuration • Directly use a query language (XQuery / XPath) • A simple example:
      /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
      • Knowledge of XML, QCDml, and XQuery (XPath) is needed. • Hard to get the whole picture of available data. • Hierarchical list • Easy to use. • Needs a huge screen to show the entire list. • Still difficult to get the whole picture of the data.
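To make the XPath example concrete, here is an approximate Python equivalent of its filter: keep an ensemble if some descendant named `beta` has a numeric value above 4 and some descendant named `collaboration` equals `CSSM`. It is a sketch, not a full XPath evaluator: text is whitespace-stripped before comparison, and the sample documents are simplified, hypothetical structures rather than valid QCDml.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def matches(root, min_beta=4.0, collaboration="CSSM"):
    """Approximate the slide's XPath predicate on one parsed document."""
    if local_name(root.tag) != "markovChain":
        return False
    has_beta = False
    has_collaboration = False
    for elem in root.iter():
        if elem is root:
            continue  # the descendant axis excludes the context node itself
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if name == "beta":
            try:
                has_beta = has_beta or float(text) > min_beta
            except ValueError:
                pass  # non-numeric text never satisfies the comparison
        elif name == "collaboration" and text == collaboration:
            has_collaboration = True
    return has_beta and has_collaboration


# Simplified, hypothetical documents for illustration only.
MATCH_XML = (
    "<markovChain><management><collaboration>CSSM</collaboration></management>"
    "<physics><action><gluon><DBW2GluonAction><beta>8.5</beta>"
    "</DBW2GluonAction></gluon></action></physics></markovChain>"
)
LOW_BETA_XML = MATCH_XML.replace("8.5", "3.9")
OTHER_COLLAB_XML = MATCH_XML.replace("CSSM", "CP-PACS")
```

Even this small filter needs knowledge of the element names and of the namespace handling, which is the usability problem the slide points out.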

  18. Basic Idea • Applying a faceted-navigation interface to browse QCDml ensemble XML data.

  19. Faceted-Navigation • What is “faceted-navigation”? • A scheme for browsing objects with attributes. • Successfully used in some applications, such as Apple iTunes. • Procedure • A user selects a value in a facet • To select a set of objects of interest • The system updates the list of objects, the list of facets, and their respective values • (Repeat) • Example • The Flamenco Search http://flamenco.berkeley.edu/
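The select-and-update loop described above can be sketched in a few lines. This is a minimal in-memory illustration; the records in `ENSEMBLES` and the function names are made up for the example and do not come from the system presented here.

```python
from collections import Counter

# Made-up records; each key plays the role of a facet.
ENSEMBLES = [
    {"collaboration": "CSSM", "nf": "2", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "nf": "0", "size": "8/8/8/16"},
    {"collaboration": "CP-PACS", "nf": "2", "size": "12/12/12/24"},
]


def select(objects, selections):
    """Keep the objects that match every (facet, value) chosen so far."""
    return [
        obj for obj in objects
        if all(obj.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(objects, facets):
    """For each facet, count how many remaining objects carry each value."""
    return {
        facet: Counter(obj[facet] for obj in objects if facet in obj)
        for facet in facets
    }


# One navigation step: the user picks collaboration = CSSM,
# then the system recomputes the remaining objects and facet values.
remaining = select(ENSEMBLES, {"collaboration": "CSSM"})
updated = facet_counts(remaining, ["nf", "size"])
```

Repeating the step with a growing `selections` dictionary narrows the set further, and the counts give the user the "population" of each value before they click.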

  20. The Flamenco Search http://flamenco.berkeley.edu/

  21. The Flamenco Search http://flamenco.berkeley.edu/

  22. The Flamenco Search http://flamenco.berkeley.edu/

  23. Faceted-Navigation • Good features • Users have the freedom to choose a facet • cf. Hierarchical list • Gives a big picture of the dataset • Available values along with their population • Effective • Busch’s Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.
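One plausible reading of the figure quoted above, assuming the four facets are independent and every combination of values identifies a distinct cell, is a simple product:

```latex
\underbrace{10 \times 10 \times 10 \times 10}_{\text{4 facets, 10 values each}} = 10^{4} = 10{,}000
```

Under that assumption, four selections are enough to narrow 10,000 evenly spread objects down to a single one.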

  24. Technical Challenges • How to define facets? • How to extract values according to the facets? • How to achieve quick response from the database for improving user experience?

  25. Choosing the Facets • Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe. • Selected elements from QCDml ensemble XML • Regional grid • Collaboration • Project name • Number of flavors • Time • Parameters • Lattice size • Gluon action • Parameters • Quark action • Parameters

  26. Extracting Values from a Facet (1/3) • Extract text values • Collaboration, e.g. CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, … • Project name, e.g. 2+1 DWF 2+1 Dynamical AsqTAD Baryon Resonances Dynamical FLIC Studies Electromagnetic Form Factors FLIC Overlap Studies Flux Tube Test Gluon Propagator Long_aqstad_run Pentaquark Volume Dependence … • Need substring extraction • Date, e.g. <date>2007-02-26T21:39:33+09:00</date> gives 2007; values: 2000, 2005, 2006, 2007, 2008
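The substring extraction for the date facet can be sketched as follows. The original system does this in XQuery; this Python version is only an illustration, and the function name `extract_year` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def extract_year(date_text):
    """Return the four-digit year prefix of an ISO 8601 timestamp, or None."""
    match = re.match(r"\s*(\d{4})-\d{2}-\d{2}T", date_text or "")
    return match.group(1) if match else None


# The <date> element shown on the slide.
date_elem = ET.fromstring("<date>2007-02-26T21:39:33+09:00</date>")
year = extract_year(date_elem.text)
```

Keeping the year as a string matches how the other facet values are stored as text.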

  27. Extracting Values from a Facet (2/3) • Need text value generation • Lattice size, e.g. 12 / 12 / 12 / 24
      <physics>
        <size>
          <elem> <name>X</name> <length>12</length> </elem>
          <elem> <name>Y</name> <length>12</length> </elem>
          <elem> <name>Z</name> <length>12</length> </elem>
          <elem> <name>T</name> <length>24</length> </elem>
          …
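Generating the lattice-size label from the `<size>` element could look like the sketch below. It assumes a namespace-free copy of the fragment on the slide and the axis order X, Y, Z, T; the function name `lattice_size_label` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace-free, closed copy of the <size> fragment from the slide.
SIZE_XML = """<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>"""


def lattice_size_label(size_elem, order=("X", "Y", "Z", "T"), sep=" / "):
    """Join the per-axis lengths into one display string such as '12 / 12 / 12 / 24'."""
    lengths = {
        elem.findtext("name", "").strip(): elem.findtext("length", "").strip()
        for elem in size_elem.findall("elem")
    }
    return sep.join(lengths[axis] for axis in order)


label = lattice_size_label(ET.fromstring(SIZE_XML))
```

Looking lengths up by axis name rather than by position keeps the label stable even if a file lists the axes in a different order.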

  28. Extracting Values from a Facet (3/3) • Gluon action / Quark action • An element name itself represents a value → Extract the element name as a value of a facet
      <action> <gluon> <iwasakiRGGluonAction> <glossary>http://www.jldg.org/JLDG/...
      <action> <gluon> <DBW2GluonAction> <glossary>www.lqcd.org/ildg/pla...
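Taking an element name as the facet value can be sketched like this. The fragment is a simplified, closed version of the slide's excerpt (the `<glossary>` children are omitted), and `action_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def action_name(action_elem, kind):
    """Name of the first child element under <gluon> or <quark>, or None."""
    for child in action_elem:
        if local_name(child.tag) == kind:
            for grandchild in child:
                return local_name(grandchild.tag)
    return None


# Simplified fragment: empty action elements stand in for the full content.
ACTION_XML = (
    "<action>"
    "<gluon><iwasakiRGGluonAction/></gluon>"
    "<quark><fatLinkIrrelevantCloverQuarkAction/></quark>"
    "</action>"
)
action = ET.fromstring(ACTION_XML)
gluon = action_name(action, "gluon")
quark = action_name(action, "quark")
```

Because the value is the tag name rather than text content, no list of known actions has to be maintained in the extraction code.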

  29. QCDml Faceted Navigation I/F: System Configuration • [Figure labels: Facet Navigation System (PHP + SQL + XQuery); Web Server (Apache); Facet Database, RDBMS (MySQL); XML DB (eXist); QCDml Ensemble (ILDG) & Configuration (JLDG); Facet extraction (XQuery); Downloading Ensemble XML; ILDG regional grids: USQCD, JLDG, LDG, UKQCD, CSSM]

  30. Database Design (1/2) • Use RDBMS for quick response • Use fixed relational schema for extensibility
      *************************** 1. row ***************************
           uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
      property: rgrid
         value: cssm
      *************************** 2. row ***************************
           uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
      property: collaboration
         value: CSSM
      *************************** 3. row ***************************
           uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
      property: projectName
         value: Dynamical FLIC Studies
      *************************** 4. row ***************************
           uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
      property: date
         value: 2007
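The (uri, property, value) layout shown in these rows can be tried out with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for the MySQL server named in the talk; the table name `facet` and the helper functions are assumptions, while the column names and sample rows follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed three-column schema: a new facet is just a new `property` value.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")

URI_1 = "mc://cssm/su3b08500k1300s12t24DBW2FLIC"
URI_2 = "mc://cssm/su3b09836s8t16DBW2"
conn.executemany(
    "INSERT INTO facet (uri, property, value) VALUES (?, ?, ?)",
    [
        (URI_1, "rgrid", "cssm"),
        (URI_1, "collaboration", "CSSM"),
        (URI_1, "projectName", "Dynamical FLIC Studies"),
        (URI_1, "date", "2007"),
        (URI_2, "collaboration", "CSSM"),
    ],
)


def value_counts(connection, prop):
    """Values of one facet with the number of ensembles carrying each."""
    return connection.execute(
        "SELECT value, COUNT(DISTINCT uri) FROM facet "
        "WHERE property = ? GROUP BY value ORDER BY value",
        (prop,),
    ).fetchall()


def uris_with(connection, prop, value):
    """Ensemble URIs whose facet `prop` has the given value."""
    rows = connection.execute(
        "SELECT DISTINCT uri FROM facet WHERE property = ? AND value = ? ORDER BY uri",
        (prop, value),
    ).fetchall()
    return [uri for (uri,) in rows]
```

Both queries touch only one narrow table, which is consistent with the slide's goal of quick responses, and adding a facet later requires no `ALTER TABLE`.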

  31. Database Design (2/2) • Store preformatted text for improving rendering performance
      *************************** 1. row ***************************
      collaboration: CSSM
               size: 12/12/12/24
                uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
                 nf: 2
               gact: DBW2GluonAction (beta=8.5)
               qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
      *************************** 2. row ***************************
      collaboration: CSSM
               size: 8/8/8/16
                uri: mc://cssm/su3b09836s8t16DBW2
                 nf:
               gact: DBW2GluonAction (beta=9.836)
               qact:
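The preformatted `gact` and `qact` strings above could be produced once at load time by something like the following sketch, so that the web page only has to print them. The function name `format_action` is hypothetical; the output format is copied from the rows shown.

```python
def format_action(name, params):
    """Render an action name and its parameters as 'Name (k1=v1/k2=v2)'."""
    if not name:
        return ""  # an ensemble without this action gets an empty column
    if not params:
        return name
    inner = "/".join(f"{key}={value}" for key, value in params.items())
    return f"{name} ({inner})"


# Parameter values are kept as strings so trailing zeros survive.
gact = format_action("DBW2GluonAction", {"beta": "8.5"})
qact = format_action(
    "fatLinkIrrelevantCloverQuarkAction", {"nf": "2", "kappa": "0.1300"}
)
```

Storing the rendered text trades a little storage and some duplication for avoiding per-request formatting.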

  32. A Screenshot of the System

  33. Conclusion and Future Work • Conclusion • Current Status of ILDG • Development of a New ILDG Client • Future work • Exploring more opportunities to apply data engineering techniques in various e-Science fields. • Data mining • Data integration • …

  34. Thank you very much for your kind attention • Questions should be addressed to amagasa@cs.tsukuba.ac.jp
