1 / 27

Porting Applications to EU-IndiaGrid

This workshop discusses concepts, job categories, levels of gridification, and software remote installation for porting applications to EU-IndiaGrid. It also covers application toolkits, higher-level grid services, and grid applications development.

gravley
Download Presentation

Porting Applications to EU-IndiaGrid

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Porting applications to EU-IndiaGrid: EGEEMarco Verlato (@pd.infn.it) EU-IndiaGrid Workshop 16-18 April 2007 Bangalore, India

  2. Outline • Concepts • Job categories • Levels of Gridification • Software remote installation • Applications invocation

  3. Application Application toolkits, ….. Higher-level grid services (brokering,…) Basic Grid services:AA, job submission, info, … Grid Applications Development Where computer science meets the application communities! VO-specific developments built on higher-level tools and core services Makes Grid services useable by non-specialists Grids provide the compute and data storage resources Production grids provide these core services.

  4. Job concept • gLite middleware follows the job submission concept for application execution and resource sharing • A job is a self contained entity that packages and conveys all the required information and artifacts for the successful remote execution of an application • Executable files • Input/Output data • Parameters • Environment • Infrastructure Requirements • Workflows

  5. Type of applications • The work done to port the application depends on its characteristics, complexity and the level of interactions with external grid services • Size of input/output data • Metadata request • Access to external software (database, libraries...)

  6. Applications categories • Simulation • Bulk Processing • Responsive Apps. • Workflow • Parallel Jobs

  7. Simulation • Examples • LHC Monte Carlo simulation • Fusion • Characteristics • Jobs are CPU-intensive • Large number of independent jobs • Run by few (expert) users • Small input; large output • Needs • Batch-system services • Minimal data management for storage of results ATLAS ITER

  8. Bulk Processing • Examples • HEP processing of raw data, analysis • Earth observation data processing • Characteristics • Widely-distributed input data • Significant amount of input and output data • Needs • Job management tools (workload management) • Meta-data services • More sophisticated data management

  9. Responsive Apps. (I) • Examples • Prototyping new applications • Monitoring grid operations • Characteristics • Small amounts of input and output data • Not CPU-intensive • Short response time (few minutes) • Needs • Configuration which allows “immediate” execution (QoS) • Services must treat jobs with minimum latency

  10. Responsive Apps. (II) • Grid as a backend infrastructure: • gPTM3D: interactive analysis of medical images • GPS@: bioinformatics via web portal • GATE: radiotherapy planning • DILIGENT: digital libraries • Characteristics • Rapid response: a human waiting for the result! • Many small but CPU-intensive tasks • User is not aware of “grid”! • Needs • Interfacing (data & computing) with non-grid application or portal • User and rights management between front-end and grid

  11. Workflow • Examples • “Bronze Standard”: medical images registration • Flood prediction • Characteristics • Use of grid and non-grid services • Complex set of algorithms for the analysis • Complex dependencies between individual tasks • Needs • Tools for managing the workflow itself • Standard interfaces for services (I.e. web- services)

  12. Parallel Jobs • Examples • Climate modeling • Earthquake analysis • Computational chemistry • Characteristics • Many interdependent, communicating tasks • Many CPUs needed simultaneously • Use of MPI libraries • Needs • Configuration of resources for flexible use of MPI • Pre-installation of optimized MPI libraries

  13. Universal Needs • Security • Ability to control access to services and to data • Fine-grained access control lists Encryption & logging for more demanding disciplines • Access control consistently implemented over all services • VO Management • Management of users, groups, and roles • Changing the priority of jobs for different users, groups, roles • Quota management for users, groups, roles • Definition and access to special resources • Application-level services • Responsive queues (guaranteed, low-latency execution)

  14. Levels of “Gridification” • No development:wrap existing applications as jobs. No source code modification is required • Minor modifications:the application exposes minimal interaction with the grid services (e.g. Data Managements) • Major modifications:a wide portion of the code is rewritten to adopt to the new environment (e.g. parallelization, metadata, information) • Pure grid applications:developed from scratch. Extensively exploit existing grid services to provide new capabilities customized for a specific domain (e.g. metadata, job management, credential management)

  15. No development • It is the simplest case : the application has not external dependencies, just needs computational power • Being sure that the application runs on the Worker Node, the executable, and the input files it may need are carried in the Input Sandbox and therefore executed • Although the application can be directly run, it’s recommended to invoke it from a supplied script which prepare the environment before the execution

  16. Minor modifications • The application has minimal interactions with grid service : it is self consistent but... • e.g. takes input files from an SE • output files are stored on a SE • Perform I/O from a metadata catalog • the application itself can be downloaded from a SE • User provides an adapter script which executes needed tasks, set the environment and run the application

  17. Major modifications • A part of the application software is rewritten taking care of the new environment : e.g. because of parallelization, interaction at run time with grid services... • Use of API • Web Service invoked • CLI wrap • Source code recompiled taking into account this

  18. Pure grid application • Entirely developed from scratch taking into account grid capabilities • Better exploits grid facilities • Needs a deep knowledge of grid environment and possibilities in order to engineer entirely the application logic over a grid environment • Very rare to meet...

  19. Additional software installed • Independently from the integration level, applications may need external software (libraries) that, as application itself, can be provided in several ways • at run time • carrying out it within InputSandbox • downloading it from a Storage Element • permanent install on the CE • as tar.gz installed by the VO software manager • as rpm being part of the gLite release

  20. Software installation on CE • The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install Application Software in the different sites. • The ESM is generally acknowledged through a VOMS role (SoftwareManager) • The ESM can manage (install, validate, remove...) Experiment Software on a site at any time through a normal Grid job, without previous communication to the site administrators. • Such job has in general no scheduling priorities and will be treated as any other job of the same VO in the same queue. • lcg-asis is a good generator for such installation jobs

  21. Software installation on CE • The site provides a dedicated space where each supported VO can install or remove software. The amount of available space must be negotiated between the VO and the site • An environmental variable holds the path for the location of a such space. Its format is the following: VO_<name_of_VO>_SW_DIR

  22. Software validation • Once the software is installed, the ESM can perform the validation • The validation is a process verifying that the software was properly installed and works as expected • It can be performed in the same job, after installation (validation on the fly) or later on, via a dedicated job.

  23. Software validation • If the ESM judges that the software was properly installed he publishes in the Information System (IS) the tag which identifies unequivocally the software, e.g.:VO-cms-CMSSW_1_2_2 • Such a tag is added to the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS, using the GRIS running in the CE • Jobs requesting for a particular piece of software can be directed to the appropriate CE just by setting special requirements on the JDL, e.g.: Requirements=Member("VO-cms-CMSSW_1_2_2", other.GlueHostApplicationSoftwareRunTimeEnvironment);

  24. Invoke Application • From the UI • Command Line Interfaces / Scripts • APIs • Higher level tools • From portals • Accessible from any browser • Tailored to applications

  25. References (I) https://grid.ct.infn.it/twiki/bin/view/GILDA/APIUsage

  26. References (II) https://glite-demo.ct.infn.it/

  27. Acknowledgements • Thanks to Mike Mineter (NeSC), Valeria Ardizzone and Emidio Giorgio of the INFN GILDA team

More Related