    1. Grid Analysis Environment and the Ultralight Project

    2. Outline

    GAE Overview
    GAE System
    Ultralight
    UAE

    3. Grid Analysis Environment

    4. Goal

    Provide a transparent environment for physicists to perform their analysis (batch/interactive) in a distributed, dynamic environment:
    - Identify your data (Catalogs)
    - Submit your (complex) job (Scheduling, Workflow, JDL)
    - Get “fair” access to resources (Priority, Accounting)
    - Monitor job progress (Monitor, Steering)
    - Get the results (Storage, Retrieval)
    - Repeat the process and refine results
    Support data transfers ranging from the (predictable) movement of large-scale (simulated) data to the highly dynamic analysis tasks initiated by rapidly changing teams of scientists.

    [Diagram: Network, Compute, and Storage resources]

    5. System View

    [Architecture diagram: Resources at the base; a Service Oriented Architecture of frameworks; Local Services, High Level Services, and Global Services; a (Domain) Portal and (Domain) Applications on top; Monitoring, Development, Deployment, Testing, and Support/Feedback as system stages. Interface specifications!]

    6. System View (Details)

    Domains
    - Virtual Organization and role management
    Service Oriented Architecture
    - Authorized access: access control management (groups/individuals); a toy role-check sketch follows below
    - Discoverable
    - Protocols (XML-RPC, SOAP, ...)
    - Service version management
    - Frameworks: Clarens, MonALISA, ...
    Monitoring
    - End-to-end monitoring, collecting and disseminating information
    - Provide visualization of monitor data to users
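
    To make the access-control point concrete, here is a minimal Python sketch of a VO/role check on service methods. The ACL layout and the method names are illustrative assumptions, not the Clarens implementation.

    # Hypothetical sketch of VO/role-based access control on web service
    # methods; the ACL layout is an assumption, not Clarens's API.
    ACL = {
        "catalog.query":  {"vos": {"cms"}, "roles": {"user", "admin"}},
        "service.deploy": {"vos": {"cms"}, "roles": {"admin"}},
    }

    def authorized(method_name, vo, role):
        """Allow the call only if the caller's VO and role match the ACL."""
        entry = ACL.get(method_name)
        return entry is not None and vo in entry["vos"] and role in entry["roles"]

    # A physicist with the 'user' role may query catalogs...
    assert authorized("catalog.query", "cms", "user")
    # ...but may not deploy new services.
    assert not authorized("service.deploy", "cms", "user")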

    7. System View (Details)

    Local Services (Local View)
    - Local Catalogs, Storage Systems, Task Tracking (Single User Tasks), Policies, Job Submission
    Global Services (Global View)
    - Discovery Service, Global Catalogs, Job Tracking (Multiple User Tasks), Policies
    High Level Services (“Autonomous”)
    - Act on monitor data with a global view: Scheduling, Data Transfer, Network Optimization, Task Tracking (many users)

    8. System View (Details)

    (Domain) Portal
    - One-stop shop for applications/users to access and use Grid resources
    - Task Tracking (Single User Tasks)
    - Graphical User Interface
    - User session logging (provide feedback when failures occur)
    (Domain) Applications
    - ORCA/COBRA, IGUANA, PHYSH, ...

    10. Peer 2 Peer System

    Allow a “Peer-to-Peer” configuration to be built, with associated robustness and scalability features: discovery of services, no single point of failure.

    [Speaker notes] This slide shows that there is no single instance of a service; services are distributed across many service-providing sites, and the services at one site can interact with services at another site. Peer-to-peer is used to provide dynamic service discovery in order to prevent a single point of failure, and to allow easy integration of new services and service providers into the system. The term "super peer" describes the natural tendency of the system to direct more requests to peers that have more resource availability (disk space, bandwidth, computing nodes); that is, not all peers are created equal, and the more resource-affluent peers tend to handle more of the grid-wide load than the resource-limited peers (a toy routing sketch follows below). P2P capabilities are being added to the Clarens service servers, as an addition to the existing security and service-hosting capabilities. This makes Clarens a key component of the CAIGEE architecture. Factors motivating the design choices: P2P technologies are widely used for creating ad hoc and dynamic groups of resource providers (file sharing), and while still fairly young, P2P has proven that it can scale to the size the GAE aims for.
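
    As a toy illustration of the “super peer” behaviour described above (not actual Clarens/P2P code), the following Python sketch routes requests to peers with probability proportional to a simple capacity score; the peer list and the weighting are assumptions.

    import random

    # Super-peer style request routing: peers advertising more free resources
    # receive proportionally more of the grid-wide load. The peers and the
    # capacity score are illustrative assumptions.
    peers = [
        {"host": "tier2-a", "disk_gb": 2000, "cpus": 64},  # resource-affluent
        {"host": "tier2-b", "disk_gb": 500,  "cpus": 16},
        {"host": "lab-pc",  "disk_gb": 50,   "cpus": 2},   # resource-limited
    ]

    def pick_peer(peers):
        """Choose a peer with probability proportional to its capacity score."""
        weights = [p["disk_gb"] + 10 * p["cpus"] for p in peers]
        return random.choices(peers, weights=weights, k=1)[0]

    # Over many requests, "tier2-a" behaves as a super peer and takes most load.
    counts = {p["host"]: 0 for p in peers}
    for _ in range(10_000):
        counts[pick_peer(peers)["host"]] += 1
    print(counts)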

    11. Framework (Clarens)

    - Authentication (X509)
    - Access control on Web Services
    - Remote file access (with access control)
    - Discovery of Web Services and software
    - Shell service: shell-like access to remote machines (managed by access control lists)
    - Proxy certificate functionality
    - Group management: VO and role management
    - Good performance of the Web Service framework
    - Integration with MonALISA
    [Diagram: a third-party application or client talks over http/https (XML-RPC, SOAP, Java RMI, JSON-RPC, ...) to a Clarens web server hosting the services; a hedged client sketch follows below]
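
    Since Clarens exposes services over XML-RPC on http/https with X509-based authentication, a client call might look roughly like the Python sketch below; the server URL, certificate path, and method names are illustrative assumptions, not a documented Clarens endpoint.

    import ssl
    import xmlrpc.client

    # Rough sketch of an XML-RPC call to a Clarens server over https,
    # authenticating with an X509 (proxy) certificate. URL, certificate
    # path, and method names are assumptions for illustration.
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile="/tmp/x509up_u1000")  # e.g. a grid proxy

    server = xmlrpc.client.ServerProxy(
        "https://clarens.example.org:8443/clarens/", context=ctx)

    # Discover available services, then list a remote directory
    # (both method names are hypothetical).
    print(server.discovery.find("catalog"))
    print(server.file.ls("/store/user/analysis"))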

    12. MonALISA: Monitoring Agents using a Large Integrated Services Architecture

    - MonALISA is able to dynamically register & discover services
    - Based on a multi-threaded engine; very scalable
    - Services are self-describing
    - Code updates: automatic & secure
    - Dynamic configuration for services; secure admin interface
    - Active filter agents process data for application-specific monitoring (toy sketch below)
    - Mobile agents: decision support, global optimisations
    - Fully distributed, no single point of failure!
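
    As a rough illustration of the active-filter-agent idea (not MonALISA's actual API), this Python sketch reduces a stream of raw monitor values to application-specific alarm events; the metric names and the threshold are assumptions.

    # Toy active filter agent: consume raw monitor data and forward only the
    # application-specific events a subscriber cares about. Metric names and
    # the threshold are illustrative, not MonALISA's API.
    raw_stream = [
        {"node": "cit-01", "metric": "load",    "value": 0.4},
        {"node": "cit-02", "metric": "load",    "value": 7.9},
        {"node": "cit-01", "metric": "disk_gb", "value": 12.0},
    ]

    def load_filter(stream, threshold=5.0):
        """Yield only overloaded-node events; drop everything else."""
        for event in stream:
            if event["metric"] == "load" and event["value"] > threshold:
                yield {"alarm": "overload", **event}

    for alarm in load_filter(raw_stream):
        print(alarm)  # only the cit-02 event passes the filter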

    Example: Sphinx Scheduling Service
    [Diagram: a flexible system in which grid clients send requests/job submissions through a Clarens Web Service to the Scheduling Service, which consults the MonALISA monitoring repository and per-resource MonALISA monitoring services to dispatch work to grid services and grid resources]
    Functions like a nerve centre:
    - Data warehouse: policies, accounting, grid weather, resource properties and status, request tracking, workflows, etc.
    - Applies data mining methods; “?” recommendation engine (a toy scheduling sketch follows below)
    - Clarens WS backbone; MonALISA monitoring backbone
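
    A policy-based scheduler of this kind ranks candidate resources using monitor data under policy constraints. The Python sketch below is a toy version of that decision step; the site metrics, fair-share policy, and scoring function are hypothetical, not SPHINX code.

    # Toy policy-based scheduling decision driven by monitor data. The sites,
    # the fair-share policy, and the score are illustrative assumptions.
    sites = [
        {"name": "caltech", "free_cpus": 120, "queue_len": 4,  "vo_share_used": 0.30},
        {"name": "ufl",     "free_cpus": 40,  "queue_len": 1,  "vo_share_used": 0.95},
        {"name": "uerj",    "free_cpus": 200, "queue_len": 30, "vo_share_used": 0.10},
    ]

    def schedule(job_cpus, sites, max_share=0.9):
        """Pick the best-scoring site among those that satisfy policy."""
        candidates = [s for s in sites
                      if s["free_cpus"] >= job_cpus         # enough capacity
                      and s["vo_share_used"] < max_share]   # VO fair-share policy
        if not candidates:
            return None  # no site satisfies policy: queue or reject the request
        return max(candidates, key=lambda s: s["free_cpus"] - 5 * s["queue_len"])

    print(schedule(job_cpus=32, sites=sites)["name"])  # -> 'caltech'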

    13. Example: Sphinx scheduler

    14. (Physics) Analysis on the Grid

    [Workflow diagram, steps 1–9: Client Application, Discovery, Catalogs, Planner/Scheduler, Monitor, Information, Policy, Job Submission, Storage Management, Execution, Steering, Dataset service, Data Transfer; a client-side sketch follows below]
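
    Reading the diagram as a pipeline, a client-side sketch of the flow might look like the following Python; every function below is a stub standing in for a real grid service (catalog, scheduler, submission, monitor, transfer) and is purely illustrative.

    # Hypothetical end-to-end sketch of the analysis flow in the diagram;
    # all stubs are assumptions standing in for real grid services.
    def lookup_datasets(query):        # discovery + catalogs
        return [f"/store/{query}/part{i}" for i in range(3)]

    def plan_job(job_spec, datasets):  # planner/scheduler (monitor + policy)
        return {"spec": job_spec, "inputs": datasets, "site": "caltech"}

    def submit(plan):                  # job submission + execution
        return {"id": 42, "plan": plan, "progress": 0.0}

    def poll_and_steer(job):           # monitoring + steering
        while job["progress"] < 1.0:
            job["progress"] += 0.25    # a real client would query the monitor
        return job

    def fetch_results(job):            # dataset service + data transfer
        return [f"{path}.histograms" for path in job["plan"]["inputs"]]

    job = poll_and_steer(submit(plan_job("my-analysis", lookup_datasets("zmumu"))))
    print(fetch_results(job))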

    15. GAE Related Projects

    DISUN (deployment): Deployment and Support for Distributed Scientific Analysis
    Ultralight (development): treating the network as a resource; “vertically” integrated monitor information; multi-user, resource-constrained view
    MCPS (development): provides Clarens-based Web Services for batch analysis (workflow)
    SPHINX (development): policy-based scheduling (global service), exposed as a Clarens Web Service and using MonALISA monitor information
    SRM/dCache (development): service-based data transfer (local service)
    Lambda Station (development): authorized programmability of routers using MonALISA & Clarens
    PHYSH: Clarens-based services for command-line user analysis
    CRAB: client to support user analysis using the Clarens framework

    16. Combining Grid Projects into Grid Analysis Environment

    [Diagram: DISUN, Ultralight, MCPS, the Privilege Project, Clarens applications, MonALISA applications, OSG, SPHINX, SRM/dCache, Lambda Station, PHEDEX, CRAB, PHYSH, Condor, and policy are combined, on top of frameworks such as MonALISA and Clarens, into the Grid Analysis Environment through development, deployment, testing, and support/feedback stages]
    GAE focuses on integration.

    19. UltraLight Testbed

    Now to the Driving Application Work… ?

    22. Current Deployment

    - Clarens has been deployed on 30+ machines. Sites: Caltech, Florida, Fermilab, CERN, Pakistan, INFN, UERJ, USP.
    - Multiple service instances have been deployed on several Clarens servers; different sets of service instances are deployed on each server to mimic a realistic distributed service environment.
    - CMS (ORCA, COBRA, IGUANA, ...) and LCG (POOL, SEAL, ...) software is installed on the Caltech GAE testbed, which serves as an environment for integrating applications as web services into the Clarens framework.
    - Working with CERN to have the GAE components included in the CMS software distribution.
    - GAE components are being integrated into the DPE and VDT distributions used in US-CMS and the greater OSG community.
    - Demonstrated a distributed multi-user GAE prototype at SC03 and SC04.
    - PHEDEX is deployed at Caltech, UFL, and UCSD and is transferring data.
    - UFL is submitting analysis jobs with CRAB.

    23. UltraLight Plans

    UltraLight envisions a 4-year program to deliver a new, high-performance, network-integrated infrastructure:
    - Phase I (12 months): deploy the initial network infrastructure and bring up the first services.
    - Phase II (18 months): implement all the needed services and extend the infrastructure to additional sites (we are entering this phase starting approximately this summer).
    - Phase III (18 months): complete UltraLight, focusing on the transition to production in support of LHC physics, plus eVLBI astronomy.

    24. Brazil Plans

    - Join the OSG-ITB at UERJ and USP: basic installation initially, then add more services (GUMS, Discovery) one by one.
    - Validate that CMS MOP jobs can run at UERJ and USP.
    - Start deploying some CMS + Ultralight services: Clarens, PHEDEX, BOSS, Sphinx?

    25. CMS MOP Jobs

    26. OSG Grid Catalog

    27. Lessons learned

    Quality of (the) service(s):
    - A lot of exception handling is needed for robust services (graceful failure of services).
    - Timeouts are important (a failover sketch follows below).
    - Very good performance is needed for composite services.
    - A discovery service enables location-independent service composition.
    - Semantics of services are important (different name, namespace, and/or WSDL).
    Web service design:
    - Not every application is developed with a web service interface in mind.
    - Interfaces of 3rd-party applications change: rapid application development.
    Social engineering:
    - Finding out what people want/need.
    - Overlapping functionality of applications (but not the same interfaces!).
    - There is not one single solution for CMS.
    - Not every problem has a technical solution; conventions are also important.
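
    The timeout and graceful-failure lessons translate into client code along these lines; the replica endpoints and the method name in this Python sketch are hypothetical.

    import socket
    import xmlrpc.client

    # Call each replica of a service with a timeout and fail over gracefully;
    # a slow or broken replica must not hang a composite service. Endpoints
    # and method name are illustrative assumptions.
    ENDPOINTS = [
        "https://clarens1.example.org:8443/clarens/",
        "https://clarens2.example.org:8443/clarens/",
    ]

    def call_with_failover(method, *args, timeout=10):
        """Try each endpoint in turn, returning the first successful result."""
        socket.setdefaulttimeout(timeout)  # crude global timeout for xmlrpc
        for url in ENDPOINTS:
            try:
                return getattr(xmlrpc.client.ServerProxy(url), method)(*args)
            except (OSError, xmlrpc.client.Error) as exc:
                print(f"{url} failed ({exc}); trying next replica")
        return None  # degrade gracefully instead of crashing the caller

    result = call_with_failover("discovery.find", "catalog")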
