A Framework for Collaborative Distributed Simulation over the Grid


Presentation Transcript


  1. A Framework for Collaborative Distributed Simulation over the Grid. Stephen John Turner, Parallel & Distributed Computing Centre, Nanyang Technological University, Singapore

  2. Outline • Background • Distributed Simulation • Grid Computing • Motivation • Research Challenges • HLA-based Distributed Simulation • Grid Services and Service Discovery • Load Management System • Grid Enabled HLA/RTI • Conclusions Brunel

  3. Distributed Simulation • Provides a way of linking simulation components (federates) of various types at possibly different locations to create a common virtual environment (federation)

  4. Example Application Areas • Battlefield Simulation: linking different types of forces at multiple physical locations to create a realistic and complex virtual world • Supply Chain Simulation: managing material and information flow, from manufacturers through distributors to customers • Air Traffic Control: simulating airports and airspace sectors to provide faster-than-real-time simulation for “what-if” analysis • Multi-player Internet Games: involving a massively multi-player (~10,000) virtual world

  5. High Level Architecture [Figure. Labels: Federation; FOM; SOM (one per federate); FED; HLA Rules (Federations); HLA Rules (Federates); Simulations; Simulation Surrogates; Passive Viewers; Interface; Run-Time Infrastructure (RTI) with Federation Management, Declaration Management, Object Management, Ownership Management, Time Management and Data Distribution Management]

  6. High Level Architecture • Features of High Level Architecture • Each federate has a simulation object model (SOM) defining the data to be shared with other federates, allowing reuse in different federations • The federation (set of federates) has a common federation object model (FOM) • HLA supports distributed simulations linking the federates of a federation over a LAN or the Internet • Time Management can be used to ensure the correct ordering of events • HLA is an IEEE (1516) and OMG standard
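
As a rough illustration of the Time Management point on slide 6, the sketch below releases events in timestamp order from a priority queue, whatever order they arrived in. It is a simplification and not the HLA/RTI API: `TimestampOrderedQueue`, `receive` and `deliver_up_to` are hypothetical names, and the coordination of time advances between federates that a real RTI performs is left out.

```python
import heapq
import itertools


class TimestampOrderedQueue:
    """Hypothetical helper: releases events in non-decreasing timestamp order."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so events with equal timestamps keep their arrival order.
        self._arrival = itertools.count()

    def receive(self, timestamp, payload):
        """Queue an event; arrival order does not have to match timestamp order."""
        heapq.heappush(self._heap, (timestamp, next(self._arrival), payload))

    def deliver_up_to(self, granted_time):
        """Pop every queued event whose timestamp is <= granted_time."""
        delivered = []
        while self._heap and self._heap[0][0] <= granted_time:
            timestamp, _, payload = heapq.heappop(self._heap)
            delivered.append((timestamp, payload))
        return delivered


queue = TimestampOrderedQueue()
queue.receive(7.0, "truck arrives")   # arrives first, but has the latest timestamp
queue.receive(3.0, "order placed")
queue.receive(5.0, "order shipped")
early = queue.deliver_up_to(5.0)
late = queue.deliver_up_to(10.0)
```

Events received out of order are still handed to the simulation in timestamp order, which is the property the slide attributes to Time Management.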

  7. Grid Computing • Grid technology is the next step in the evolution of computing, enabling new forms of collaboration through the seamless sharing of distributed computing and data resources • Communities can share geographically distributed resources for their common purpose

  8. Grid Computing [Figure. Labels: Web Services; Grid Services; OGSA; OGSI; Globus Toolkit]

  9. Motivation • Collaborative Simulation Development: the development of complex simulations usually requires collaborative effort from analysts with different domain knowledge and expertise, possibly at different locations • Sharing of Computing Resources: simulation systems often require huge computing resources, and the participants in the simulation and/or the data sets required may also be geographically distributed

  10. Motivation • HLA-based Distributed Simulation on the Grid • HLA defines a standard for reuse and interoperability • Grid technologies enable collaboration and the use of distributed computing resources • Collaborative • Distributed • Complex & Multi-dimensional

  11. Simulation Life Cycle [Figure. Labels: Service/Model Discovery; Service/Model Composition; Security; Execution; Resource Management; Semantic Interfaces; Policies; Workflow]

  12. Research Challenges • Service/Model Discovery • Based on requirements, “suitable” component models are selected to form an overall simulation • Research Issues • How are simulation models registered as grid services? • How are simulation models discovered? • How are the interfaces defined? • Are the simulation models HLA compliant? • Do they conform to any standard reference models (e.g. HLA-CSPIF)?

  13. Research Challenges • Service/Model Composition • Checking semantic interoperability between individual component simulation models from different sources • Research Issues • Can the output of one simulation model feed into the input of another? • How is the workflow of the configuration described? • What are the mechanisms for verifying the correctness of the simulation?

  14. Research Challenges • Security • Simulation partners should be allowed to specify selective access to their simulation models • Research Issues • Does a user have access to a particular simulation model or data? • Can a user selectively share sensitive data with different partners? • Does the simulation model originate from a trusted partner? • Must the model be executed on a particular resource?

  15. Research Challenges • Execution • Simulation partners may obtain computing resources from the Grid to supplement their needs • Research Issues • How can the different simulation runs be partitioned onto the available computing resources? • What mechanisms should be used for scheduling and load management of simulations on the Grid? • What kinds of fault-tolerance mechanisms are required?

  16. Simulation Life Cycle [Figure, annotated “Main work”. Labels: Service/Model Discovery; Service/Model Composition; Security; Execution; Resource Management; Semantic Interfaces; Policies; Workflow]

  17. HLA-based Distributed Simulation • Discovery and Composition of Models • Discovery of Resources • Management of Simulation Execution [Figure. Labels: RTI; Model Factory; federate]

  18. Grid Services and Service Discovery (1) Query Index Service for RTI Service handle for federation (2) Create RtiExec if necessary and get endpoint used by RtiExec (3) Query Index Service for Federate Factory Service handle (4) Create Federate Service and Federate Process (5) Federate Processes join federation [Figure with the steps numbered 1 to 5]

  19. Grid Services and Service Discovery (3) Query Index Service for Federate Factory Service handle (4) Create Federate Service and Federate Process (4a) Federate Service can query Index Service for RtiExec endpoint (5) Federate Processes join federation [Figure with the steps numbered 3, 4, 4a and 5]
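
The discovery steps on slides 18 and 19 can be sketched roughly as follows. This is a toy, in-memory model and not Globus or HLA code: `IndexService`, `RtiService`, `FederateFactory`, `start_federation`, the registry keys and the endpoint string are all hypothetical names chosen for illustration.

```python
class IndexService:
    """Hypothetical in-memory stand-in for a Grid index service."""

    def __init__(self):
        self._entries = {}

    def register(self, name, value):
        self._entries[name] = value

    def query(self, name):
        return self._entries.get(name)


class RtiService:
    """Hypothetical RTI service that creates an RtiExec on demand."""

    def __init__(self, index):
        self.index = index
        self.endpoint = None
        self.joined = []

    def ensure_rtiexec(self, federation):
        if self.endpoint is None:  # step 2: create RtiExec only if necessary
            self.endpoint = "rtiexec://example-host/" + federation  # made-up endpoint
            self.index.register("rtiexec:" + federation, self.endpoint)
        return self.endpoint


class FederateFactory:
    """Hypothetical factory that creates a federate service and process."""

    def __init__(self, index):
        self.index = index

    def create_federate(self, name, federation):
        # Step 4a: the federate service looks up the RtiExec endpoint itself.
        endpoint = self.index.query("rtiexec:" + federation)
        return {"name": name, "federation": federation, "endpoint": endpoint}


def start_federation(index, federation, federate_names):
    rti = index.query("rti-service")               # step 1: find the RTI service
    endpoint = rti.ensure_rtiexec(federation)      # step 2: RtiExec and its endpoint
    factory = index.query("federate-factory")      # step 3: find the federate factory
    federates = [factory.create_federate(name, federation)
                 for name in federate_names]       # step 4: create the federates
    for federate in federates:                     # step 5: federates join
        rti.joined.append(federate["name"])
    return endpoint, federates


index = IndexService()
index.register("rti-service", RtiService(index))
index.register("federate-factory", FederateFactory(index))
endpoint, federates = start_federation(index, "supply-chain", ["factory", "distributor"])
```

The point of the sketch is that the caller only needs to know the index service; every other handle and the RtiExec endpoint are looked up at run time.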

  20. Load Management System (LMS) • Use Grid software for • Authentication, • Resource Discovery, Allocation & Monitoring, and • Facilitating Federate Migration [Figure. Labels: Load Management System (LMS); Simulations; Simulation Surrogates; Passive Viewers; Interface; Run-Time Infrastructure (RTI) with Federation Management, Declaration Management, Object Management, Ownership Management, Time Management and Data Distribution Management]

  21. Load Management System [Figure. Labels: Load Management System (LMS); federate; High Speed Myrinet Switch; Resource Discovery, Allocation & Monitoring; Globus; Run Time Infrastructure]

  22. SimKernel • Simulation code extended with two interfaces: • One for communicating with Run-Time Infrastructure (RTI) • One for communicating with Load Management System (LMS) [Figure. Labels: LMS; Simulation Code; FederateAmbassador; LMClient; RTIambassador; federate; RTI; Shared Data]

  23. SimKernel [Figure. Labels: Design; Implementation; Execution; Sub-Model; map; LM; LMClient; federate; SIMKernel; RTI]

  24. Federate • Each federate contains two threads: a simulation thread (SimKernel) and a load management thread (LMClient) • SimKernel processes simulation events as defined by the user and communicates with RTI • LMClient works with Load Manager (LM) to perform federate migration • receives instruction from LM • stops SimKernel • gets SimKernel execution state • transfers SimKernel configuration and execution state
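
A minimal sketch of the LMClient steps listed on slide 24, assuming a very simple `SimKernel` whose whole state is a list of processed and pending events. The class layouts, method names and the `send` callback are hypothetical; the slides do not show the real SimKernel or LMClient interfaces, and the RTI side is omitted.

```python
import threading


class SimKernel:
    """Hypothetical simulation thread body: processes events until told to stop."""

    def __init__(self, events, config):
        self.config = config
        self.pending = list(events)
        self.processed = []
        self._stop = threading.Event()
        self._lock = threading.Lock()

    def run(self):
        while not self._stop.is_set():
            with self._lock:
                if not self.pending:
                    break
                self.processed.append(self.pending.pop(0))

    def stop(self):
        self._stop.set()

    def get_state(self):
        with self._lock:
            return {"processed": list(self.processed), "pending": list(self.pending)}


class LMClient:
    """Hypothetical load management side: reacts to a migrate instruction."""

    def __init__(self, kernel, kernel_thread, send):
        self.kernel = kernel
        self.kernel_thread = kernel_thread
        self.send = send  # callback standing in for the transfer to the destination

    def on_migrate_instruction(self):
        self.kernel.stop()                   # stop SimKernel
        self.kernel_thread.join()            # wait until it has really stopped
        state = self.kernel.get_state()      # get SimKernel execution state
        package = {"config": self.kernel.config, "state": state}
        self.send(package)                   # transfer configuration and state
        return package


kernel = SimKernel(range(1000), config={"federate": "factory"})
thread = threading.Thread(target=kernel.run)
thread.start()
transferred = []
client = LMClient(kernel, thread, transferred.append)
package = client.on_migrate_instruction()
```

How many events have been processed when the kernel stops depends on thread timing, but processed and pending events together always account for every original event, so nothing is lost in the hand-over.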

  25. Load Manager • Constantly monitors and collects load information from each participating computing node • Runs a load balancing algorithm to determine which federate should migrate from which host to which destination • Communicates with the LMClients at both the source and destination hosts until migration succeeds
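
The slides do not say which load balancing algorithm the Load Manager runs. As one possible illustration only, the sketch below uses a simple threshold heuristic: move one federate from the most loaded host that has federates to the least loaded host, but only if the load gap is large enough. The function name, the data layout and the `threshold` value are assumptions.

```python
def choose_migration(host_loads, placements, threshold=0.25):
    """Pick one migration, or None if the hosts are balanced enough.

    host_loads: host name -> load in [0, 1]
    placements: host name -> list of federate names running there
    Returns (federate, source_host, destination_host) or None.
    """
    # Only hosts that actually run a federate can be a migration source.
    sources = [host for host in host_loads if placements.get(host)]
    if not sources:
        return None
    source = max(sources, key=lambda host: host_loads[host])
    destination = min(host_loads, key=lambda host: host_loads[host])
    if source == destination:
        return None
    if host_loads[source] - host_loads[destination] < threshold:
        return None  # gap too small to justify the migration overhead
    return placements[source][0], source, destination


loads = {"node-a": 0.9, "node-b": 0.2, "node-c": 0.5}
placements = {"node-a": ["fed1", "fed2"], "node-b": ["fed3"], "node-c": []}
decision = choose_migration(loads, placements)
balanced = choose_migration({"node-a": 0.5, "node-b": 0.4},
                            {"node-a": ["fed1"], "node-b": ["fed2"]})
```

Here `decision` names the federate, source and destination that the Load Manager would then pass to the LMClients on both hosts; `balanced` is `None` because the gap is below the threshold.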

  26. Migration Approaches • Federation-wide synchronization [Figure. Labels: federate; Federation-Wide Save; Federate Migration; Federation-Wide Restore; Costly Operation!]

  27. Migration Approaches • Communication among federates: • Messages may be lost in transit during migration [Figure. Labels: federate; publish; subscribe; unsubscribe; msg; network; resign; join]

  28. Our Approach • We developed an algorithm aiming to: • Provide transparent migration, and • Minimize the migration overhead • Run two instances of the migrating federate until event integrity is ensured • No synchronization or FTP communication is required • Implementation is specific to federates based on SimKernel

  29. Federate Migration [Sequence diagram. Participants: migrating federate; LMClient @source; Load Manager; RTI; LMClient @destination; new restarting federate. Message labels (order not recoverable from the transcript): Req_migrate; requestInformation; returnInformation; joinFederation; pub/sub Interaction; getMsgCount; recvMsgCount; flushQueueRequest; receivedInteraction; sendOutgoingEvents; notifyMissingMsg; missingMsg; collect; returnStatus; suspend; resignFederationExec; restore; resume; migrationSucceeded. Annotation: Latency period]

  30. Experimental Results

  31. Grid Enabled HLA/RTI [Figure. Labels: Client 1; …; Client n; Grid Network; Resource; RtiExec; FedExec1…m; Proxies…; Federation 1; Federation m]

  32. Design • Grid Services: indexing, discovery, resource management, monitoring services … [Figure. Labels: Client; Simulation Code; Grid-enabled HLA API; Globus; Grid Network; Grid Services; Resource; Proxy; Proxies & Federates; Grid-enabled API; HLA API; RTI on LAN]

  33. Discussion • Advantages • Avoids some firewall issues as client communicates with proxy via grid services • Client application code can run on heterogeneous platforms • Provides easy migration of client code; the proxy does not need to be migrated • Disadvantages • Overhead of communication as all simulation events use grid services

  34. Conclusions • Work Done: • Developed a simple prototype using Globus for resource discovery, allocation and federate deployment (DS-RT ’02) • Developed SimKernel framework to allow the modeler to concentrate on the simulation, rather than implementation (DS-RT ’03) • Developed a federate migration protocol without using federation synchronization (ICCS ’04) • Developed Grid Service and Service Discovery Framework (submitted to DS-RT ’04)

  35. Conclusions • Future Work: • Service/model discovery • Service/model composition • Grid workflow languages • Grid enabled HLA/RTI • Performance measurement • Alternative communication mechanisms • Migration and fault tolerance • Integration of sub-projects • Convert to GT4 (WS-RF)

  36. Thank you for your attention! Questions & Answers. The HLA defines a standard for the construction of large-scale distributed simulations, while Grid technologies enable collaboration and the use of distributed computing resources and facilitate access to geographically distributed data sets.
