    1. Grid Analysis Environment and the Ultralight Project

    2. Outline

    GAE Overview
    GAE System
    Ultralight
    UAE

    3. Grid Analysis Environment

    4. Goal

    Provide a transparent environment for physicists to perform their analysis (batch/interactive) in a distributed, dynamic environment:
    - Identify your data (Catalogs)
    - Submit your (complex) job (Scheduling, Workflow, JDL)
    - Get “fair” access to resources (Priority, Accounting)
    - Monitor job progress (Monitor, Steering)
    - Get the results (Storage, Retrieval)
    - Repeat the process and refine results
    Support data transfers ranging from the (predictable) movement of large-scale (simulated) data to the highly dynamic analysis tasks initiated by rapidly changing teams of scientists.

    [Diagram: Network, Compute, and Storage resources]

    5. System View

    [Architecture diagram: Resources at the base; a Service Oriented Architecture of frameworks; Local Services, High Level Services, and Global Services; a (Domain) Portal and (Domain) Applications on top; Monitoring, Development, Deployment, Testing, and Support/Feedback as system stages. Interface specifications!]

    6. System View (Details)

    Domains
    - Virtual Organization and role management
    Service Oriented Architecture
    - Authorized access: access control management (groups/individuals); a toy role-check sketch follows below
    - Discoverable
    - Protocols (XML-RPC, SOAP, ...)
    - Service version management
    - Frameworks: Clarens, MonALISA, ...
    Monitoring
    - End-to-end monitoring, collecting and disseminating information
    - Provide visualization of monitor data to users
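
    To make the access-control point concrete, here is a minimal Python sketch of a VO/role check on service methods. The ACL layout and the method names are illustrative assumptions, not the Clarens implementation.

    # Hypothetical sketch of VO/role-based access control on web service
    # methods; the ACL layout is an assumption, not Clarens's API.
    ACL = {
        "catalog.query":  {"vos": {"cms"}, "roles": {"user", "admin"}},
        "service.deploy": {"vos": {"cms"}, "roles": {"admin"}},
    }

    def authorized(method_name, vo, role):
        """Allow the call only if the caller's VO and role match the ACL."""
        entry = ACL.get(method_name)
        return entry is not None and vo in entry["vos"] and role in entry["roles"]

    # A physicist with the 'user' role may query catalogs...
    assert authorized("catalog.query", "cms", "user")
    # ...but may not deploy new services.
    assert not authorized("service.deploy", "cms", "user")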

    7. System View (Details)

    Local Services (Local View)
    - Local Catalogs, Storage Systems, Task Tracking (Single User Tasks), Policies, Job Submission
    Global Services (Global View)
    - Discovery Service, Global Catalogs, Job Tracking (Multiple User Tasks), Policies
    High Level Services (“Autonomous”)
    - Act on monitor data with a global view: Scheduling, Data Transfer, Network Optimization, Task Tracking (many users)

    8. System View (Details)

    (Domain) Portal
    - One-stop shop for applications/users to access and use Grid resources
    - Task Tracking (Single User Tasks)
    - Graphical User Interface
    - User session logging (provide feedback when failures occur)
    (Domain) Applications
    - ORCA/COBRA, IGUANA, PHYSH, ...

    10. Peer 2 Peer System

    Allow a “Peer-to-Peer” configuration to be built, with associated robustness and scalability features: discovery of services, no single point of failure.

    [Speaker notes] This slide shows that there is no single instance of a service; services are distributed across many service-providing sites, and the services at one site can interact with services at another site. Peer-to-peer is used to provide dynamic service discovery in order to prevent a single point of failure, and to allow easy integration of new services and service providers into the system. The term "super peer" describes the natural tendency of the system to direct more requests to peers that have more resource availability (disk space, bandwidth, computing nodes); that is, not all peers are created equal, and the more resource-affluent peers tend to handle more of the grid-wide load than the resource-limited peers (a toy routing sketch follows below). P2P capabilities are being added to the Clarens service servers, as an addition to the existing security and service-hosting capabilities. This makes Clarens a key component of the CAIGEE architecture. Factors motivating the design choices: P2P technologies are widely used for creating ad hoc and dynamic groups of resource providers (file sharing), and while still fairly young, P2P has proven that it can scale to the size the GAE aims for.
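
    As a toy illustration of the “super peer” behaviour described above (not actual Clarens/P2P code), the following Python sketch routes requests to peers with probability proportional to a simple capacity score; the peer list and the weighting are assumptions.

    import random

    # Super-peer style request routing: peers advertising more free resources
    # receive proportionally more of the grid-wide load. The peers and the
    # capacity score are illustrative assumptions.
    peers = [
        {"host": "tier2-a", "disk_gb": 2000, "cpus": 64},  # resource-affluent
        {"host": "tier2-b", "disk_gb": 500,  "cpus": 16},
        {"host": "lab-pc",  "disk_gb": 50,   "cpus": 2},   # resource-limited
    ]

    def pick_peer(peers):
        """Choose a peer with probability proportional to its capacity score."""
        weights = [p["disk_gb"] + 10 * p["cpus"] for p in peers]
        return random.choices(peers, weights=weights, k=1)[0]

    # Over many requests, "tier2-a" behaves as a super peer and takes most load.
    counts = {p["host"]: 0 for p in peers}
    for _ in range(10_000):
        counts[pick_peer(peers)["host"]] += 1
    print(counts)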

    11. Framework (Clarens)

    - Authentication (X509)
    - Access control on Web Services
    - Remote file access (with access control)
    - Discovery of Web Services and software
    - Shell service: shell-like access to remote machines (managed by access control lists)
    - Proxy certificate functionality
    - Group management: VO and role management
    - Good performance of the Web Service framework
    - Integration with MonALISA
    [Diagram: a third-party application or client talks over http/https (XML-RPC, SOAP, Java RMI, JSON-RPC, ...) to a Clarens web server hosting the services; a hedged client sketch follows below]
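
    Since Clarens exposes services over XML-RPC on http/https with X509-based authentication, a client call might look roughly like the Python sketch below; the server URL, certificate path, and method names are illustrative assumptions, not a documented Clarens endpoint.

    import ssl
    import xmlrpc.client

    # Rough sketch of an XML-RPC call to a Clarens server over https,
    # authenticating with an X509 (proxy) certificate. URL, certificate
    # path, and method names are assumptions for illustration.
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile="/tmp/x509up_u1000")  # e.g. a grid proxy

    server = xmlrpc.client.ServerProxy(
        "https://clarens.example.org:8443/clarens/", context=ctx)

    # Discover available services, then list a remote directory
    # (both method names are hypothetical).
    print(server.discovery.find("catalog"))
    print(server.file.ls("/store/user/analysis"))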

    12. MonALISA: Monitoring Agents using a Large Integrated Services Architecture

    - MonALISA is able to dynamically register & discover services
    - Based on a multi-threaded engine; very scalable
    - Services are self-describing
    - Code updates: automatic & secure
    - Dynamic configuration for services; secure admin interface
    - Active filter agents process data for application-specific monitoring (toy sketch below)
    - Mobile agents: decision support, global optimisations
    - Fully distributed, no single point of failure!
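
    As a rough illustration of the active-filter-agent idea (not MonALISA's actual API), this Python sketch reduces a stream of raw monitor values to application-specific alarm events; the metric names and the threshold are assumptions.

    # Toy active filter agent: consume raw monitor data and forward only the
    # application-specific events a subscriber cares about. Metric names and
    # the threshold are illustrative, not MonALISA's API.
    raw_stream = [
        {"node": "cit-01", "metric": "load",    "value": 0.4},
        {"node": "cit-02", "metric": "load",    "value": 7.9},
        {"node": "cit-01", "metric": "disk_gb", "value": 12.0},
    ]

    def load_filter(stream, threshold=5.0):
        """Yield only overloaded-node events; drop everything else."""
        for event in stream:
            if event["metric"] == "load" and event["value"] > threshold:
                yield {"alarm": "overload", **event}

    for alarm in load_filter(raw_stream):
        print(alarm)  # only the cit-02 event passes the filter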

    Example: Sphinx Scheduling Service
    [Diagram: a flexible system in which grid clients send requests/job submissions through a Clarens Web Service to the Scheduling Service, which consults the MonALISA monitoring repository and per-resource MonALISA monitoring services to dispatch work to grid services and grid resources]
    Functions like a nerve centre:
    - Data warehouse: policies, accounting, grid weather, resource properties and status, request tracking, workflows, etc.
    - Applies data mining methods; “?” recommendation engine (a toy scheduling sketch follows below)
    - Clarens WS backbone; MonALISA monitoring backbone
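
    A policy-based scheduler of this kind ranks candidate resources using monitor data under policy constraints. The Python sketch below is a toy version of that decision step; the site metrics, fair-share policy, and scoring function are hypothetical, not SPHINX code.

    # Toy policy-based scheduling decision driven by monitor data. The sites,
    # the fair-share policy, and the score are illustrative assumptions.
    sites = [
        {"name": "caltech", "free_cpus": 120, "queue_len": 4,  "vo_share_used": 0.30},
        {"name": "ufl",     "free_cpus": 40,  "queue_len": 1,  "vo_share_used": 0.95},
        {"name": "uerj",    "free_cpus": 200, "queue_len": 30, "vo_share_used": 0.10},
    ]

    def schedule(job_cpus, sites, max_share=0.9):
        """Pick the best-scoring site among those that satisfy policy."""
        candidates = [s for s in sites
                      if s["free_cpus"] >= job_cpus         # enough capacity
                      and s["vo_share_used"] < max_share]   # VO fair-share policy
        if not candidates:
            return None  # no site satisfies policy: queue or reject the request
        return max(candidates, key=lambda s: s["free_cpus"] - 5 * s["queue_len"])

    print(schedule(job_cpus=32, sites=sites)["name"])  # -> 'caltech'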

    13. Example: Sphinx scheduler

    14. (Physics) Analysis on the Grid

    [Workflow diagram, steps 1–9: Client Application, Discovery, Catalogs, Planner/Scheduler, Monitor, Information, Policy, Job Submission, Storage Management, Execution, Steering, Dataset service, Data Transfer; a client-side sketch follows below]
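
    Reading the diagram as a pipeline, a client-side sketch of the flow might look like the following Python; every function below is a stub standing in for a real grid service (catalog, scheduler, submission, monitor, transfer) and is purely illustrative.

    # Hypothetical end-to-end sketch of the analysis flow in the diagram;
    # all stubs are assumptions standing in for real grid services.
    def lookup_datasets(query):        # discovery + catalogs
        return [f"/store/{query}/part{i}" for i in range(3)]

    def plan_job(job_spec, datasets):  # planner/scheduler (monitor + policy)
        return {"spec": job_spec, "inputs": datasets, "site": "caltech"}

    def submit(plan):                  # job submission + execution
        return {"id": 42, "plan": plan, "progress": 0.0}

    def poll_and_steer(job):           # monitoring + steering
        while job["progress"] < 1.0:
            job["progress"] += 0.25    # a real client would query the monitor
        return job

    def fetch_results(job):            # dataset service + data transfer
        return [f"{path}.histograms" for path in job["plan"]["inputs"]]

    job = poll_and_steer(submit(plan_job("my-analysis", lookup_datasets("zmumu"))))
    print(fetch_results(job))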

    15. GAE Related Projects

    DISUN (deployment): Deployment and Support for Distributed Scientific Analysis
    Ultralight (development): treating the network as a resource; “vertically” integrated monitor information; multi-user, resource-constrained view
    MCPS (development): provides Clarens-based Web Services for batch analysis (workflow)
    SPHINX (development): policy-based scheduling (global service), exposed as a Clarens Web Service and using MonALISA monitor information
    SRM/dCache (development): service-based data transfer (local service)
    Lambda Station (development): authorized programmability of routers using MonALISA & Clarens
    PHYSH: Clarens-based services for command-line user analysis
    CRAB: client to support user analysis using the Clarens framework

    16. Combining Grid Projects into Grid Analysis Environment

    [Diagram: DISUN, Ultralight, MCPS, the Privilege Project, Clarens applications, MonALISA applications, OSG, SPHINX, SRM/dCache, Lambda Station, PHEDEX, CRAB, PHYSH, Condor, and policy are combined, on top of frameworks such as MonALISA and Clarens, into the Grid Analysis Environment through development, deployment, testing, and support/feedback stages]
    GAE focuses on integration.

    19. UltraLight Testbed

    Now to the Driving Application Work… ?

    22. Current Deployment

    - Clarens has been deployed on 30+ machines. Sites: Caltech, Florida, Fermilab, CERN, Pakistan, INFN, UERJ, USP.
    - Multiple service instances have been deployed on several Clarens servers; different sets of service instances are deployed on each server to mimic a realistic distributed service environment.
    - CMS (ORCA, COBRA, IGUANA, ...) and LCG (POOL, SEAL, ...) software is installed on the Caltech GAE testbed, which serves as an environment for integrating applications as web services into the Clarens framework.
    - Working with CERN to have the GAE components included in the CMS software distribution.
    - GAE components are being integrated into the DPE and VDT distributions used in US-CMS and the greater OSG community.
    - Demonstrated a distributed multi-user GAE prototype at SC03 and SC04.
    - PHEDEX is deployed at Caltech, UFL, and UCSD and is transferring data.
    - UFL is submitting analysis jobs with CRAB.

    23. UltraLight Plans

    UltraLight envisions a 4-year program to deliver a new, high-performance, network-integrated infrastructure:
    - Phase I (12 months): deploy the initial network infrastructure and bring up the first services.
    - Phase II (18 months): implement all the needed services and extend the infrastructure to additional sites (we are entering this phase starting approximately this summer).
    - Phase III (18 months): complete UltraLight, focusing on the transition to production in support of LHC physics, plus eVLBI astronomy.

    24. Brazil Plans

    - Join the OSG-ITB at UERJ and USP: basic installation initially, then add more services (GUMS, Discovery) one by one.
    - Validate that CMS MOP jobs can run at UERJ and USP.
    - Start deploying some CMS + Ultralight services: Clarens, PHEDEX, BOSS, Sphinx?

    25. CMS MOP Jobs

    26. OSG Grid Catalog

    27. Lessons learned

    Quality of (the) service(s):
    - A lot of exception handling is needed for robust services (graceful failure of services).
    - Timeouts are important (a failover sketch follows below).
    - Very good performance is needed for composite services.
    - A discovery service enables location-independent service composition.
    - Semantics of services are important (different name, namespace, and/or WSDL).
    Web service design:
    - Not every application is developed with a web service interface in mind.
    - Interfaces of 3rd-party applications change: rapid application development.
    Social engineering:
    - Finding out what people want/need.
    - Overlapping functionality of applications (but not the same interfaces!).
    - There is not one single solution for CMS.
    - Not every problem has a technical solution; conventions are also important.
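
    The timeout and graceful-failure lessons translate into client code along these lines; the replica endpoints and the method name in this Python sketch are hypothetical.

    import socket
    import xmlrpc.client

    # Call each replica of a service with a timeout and fail over gracefully;
    # a slow or broken replica must not hang a composite service. Endpoints
    # and method name are illustrative assumptions.
    ENDPOINTS = [
        "https://clarens1.example.org:8443/clarens/",
        "https://clarens2.example.org:8443/clarens/",
    ]

    def call_with_failover(method, *args, timeout=10):
        """Try each endpoint in turn, returning the first successful result."""
        socket.setdefaulttimeout(timeout)  # crude global timeout for xmlrpc
        for url in ENDPOINTS:
            try:
                return getattr(xmlrpc.client.ServerProxy(url), method)(*args)
            except (OSError, xmlrpc.client.Error) as exc:
                print(f"{url} failed ({exc}); trying next replica")
        return None  # degrade gracefully instead of crashing the caller

    result = call_with_failover("discovery.find", "catalog")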
