
Dynamic Fault Tolerant Grid Workflow in the Water Threat Management Project



Presentation Transcript


  1. Young Suk Moon • Dynamic Fault Tolerant Grid Workflow in the Water Threat Management Project

  2. Urban Water Distribution Systems • Supplying water • Pipe networks • Redundant flow paths • Millions of pipes • http://www.crwr.utexas.edu/gis/gishydro03/Classroom/trmproj/Garcia-Fresca/UrbanRecharge.htm

  3. Water Threat Management Project • Analyzing contamination in water distribution systems (WDSs) • EPANET simulation (developed at the U.S. Environmental Protection Agency) • [Figure: Sensor Data, Optimization Engine (find the contaminant source; find the optimal solution), Simulation Engine (MPI), Middleware, multiple EPANET instances, Grid Resources. From my presentation slides in the project/thesis seminar class.]

  4. Project Requirements for WTM • Change the MPI system into a loosely coupled system • Execute EPANET in parallel • Support a large number of evaluations • Integrate fault tolerance
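The requirements above (a loosely coupled system, parallel EPANET runs, many evaluations, built-in fault tolerance) can be illustrated with a minimal Python sketch. It assumes that each evaluation is independent of the others. `run_simulation` is a hypothetical placeholder for launching one EPANET run. `evaluate_all`, `max_workers`, and `max_retries` are illustrative names and are not part of the project's middleware. Failed evaluations are simply resubmitted, which is one basic form of fault tolerance.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple


def run_simulation(candidate: int) -> float:
    """Hypothetical stand-in for one EPANET evaluation of a candidate solution."""
    return float(candidate * candidate)


def evaluate_all(
    candidates: Iterable[int],
    simulate: Callable[[int], float] = run_simulation,
    max_workers: int = 4,
    max_retries: int = 2,
) -> Tuple[Dict[int, float], List[int]]:
    """Run independent evaluations in parallel and resubmit the ones that fail.

    Returns the collected results and the candidates that still failed
    after max_retries extra attempts.
    """
    results: Dict[int, float] = {}
    pending = list(candidates)
    for _attempt in range(max_retries + 1):
        if not pending:
            break
        failed: List[int] = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Loose coupling: no ordering or communication between workers.
            futures = {pool.submit(simulate, c): c for c in pending}
            for fut in as_completed(futures):
                c = futures[fut]
                try:
                    results[c] = fut.result()
                except Exception:
                    failed.append(c)  # e.g. a lost node; retry in the next round
        pending = failed
    return results, pending


results, unfinished = evaluate_all(range(5))
# results == {0: 0.0, 1: 1.0, 2: 4.0, 3: 9.0, 4: 16.0}; unfinished == []
```

A thread pool is enough here because a real evaluation would mostly wait on an external process. In an actual grid deployment, the pool would presumably be replaced by job submission to remote resources.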

  5. Fault Tolerant Strategies • Replication • Run the same job on multiple machines concurrently • Fast, but less reliable, and needs enough spare resources • Checkpoint-restart • Periodically store the current computation state • Slow (checkpoint overhead), but more reliable
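The checkpoint-restart trade-off can be seen in a toy Python sketch. The "job" is just a running sum, and the checkpoint is a JSON file. Both are assumptions for illustration and are not how EPANET state is actually saved. `stop_at` is a hypothetical parameter that simulates a failure, and the file name is arbitrary.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoints(
    total_steps: int, path: str, every: int = 10, stop_at: Optional[int] = None
) -> Optional[int]:
    """Toy long-running job: sums 0..total_steps-1 with periodic checkpoints.

    If a checkpoint file exists, the job restarts from the saved state.
    stop_at simulates a failure by abandoning the run at that step (returns None).
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # restart from the last saved state
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            return None  # simulated crash: work since the last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            with open(path, "w") as f:
                json.dump(state, f)  # checkpoint overhead is paid here
    return state["acc"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt.json")
run_with_checkpoints(100, ckpt, stop_at=25)  # "fails" after the step-20 checkpoint
total = run_with_checkpoints(100, ckpt)      # resumes at step 20, redoes steps 20-24
# total == 4950
```

A smaller `every` means less lost work after a failure, but the state is written more often. That is the overhead the slide refers to.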

  6. Fault Tolerant Strategies in the Project • Replication • A job is submitted as replicas to multiple nodes, which run the same job concurrently • Multiple queuing • A set of jobs is submitted to multiple nodes, each with a different job order, so that different jobs run concurrently • [Figure: assignment of jobs j1, j2, j3 to machines under replication and multiple queuing]
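Replication and multiple queuing can be contrasted with a small Python simulation. The slide does not say how the per-node job orders are chosen, so a rotation of the job list is assumed here. The round-based `simulate` function is also a simplification: in each round, each live machine takes the next job that no machine has finished yet.

```python
from typing import Container, Dict, List, Set


def replicate(job: str, machines: List[str]) -> Dict[str, List[str]]:
    """Replication: every machine runs the same job concurrently."""
    return {m: [job] for m in machines}


def multiple_queues(jobs: List[str], machines: List[str]) -> Dict[str, List[str]]:
    """Multiple queuing: every machine gets all jobs, each in a rotated order (assumed)."""
    n = len(jobs)
    return {m: jobs[i % n:] + jobs[:i % n] for i, m in enumerate(machines)}


def simulate(queues: Dict[str, List[str]], failed: Container[str] = frozenset()) -> Set[str]:
    """Round-based run over the live machines; returns the set of finished jobs."""
    done: Set[str] = set()
    remaining = {m: list(q) for m, q in queues.items() if m not in failed}
    while any(remaining.values()):
        for q in remaining.values():
            while q and q[0] in done:
                q.pop(0)  # skip jobs another machine already finished
            if q:
                done.add(q.pop(0))
    return done


jobs = ["j1", "j2", "j3"]
machines = ["m1", "m2", "m3"]
queues = multiple_queues(jobs, machines)
# m1 -> [j1, j2, j3], m2 -> [j2, j3, j1], m3 -> [j3, j1, j2]
print(simulate(queues, failed={"m1", "m2"}) == set(jobs))  # True: m3 still finishes all jobs
print(simulate(replicate("j1", machines), failed={"m1", "m2"}))  # {'j1'}
```

Under multiple queuing, any single surviving machine eventually covers every job. Under replication, any single surviving replica is enough to finish that one job.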

  7. System Design • Figure from my pre-proposal

  8. Model Description • $J = \{j_1, j_2, j_3, \dots, j_n\}$: the set of jobs • $Q = \{q_1, q_2, q_3, \dots, q_m\}$: the set of queues • $R = \{r_1, r_2, r_3, \dots, r_l\}$: the set of resources

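  9. Model Description • Mapping jobs to queues • $f_t : J \to Q$ • $f_t(j) = \{\, j \in J \mid \forall q\ \exists j,\ f_t(j) = q \in Q \,\}$, i.e. every queue receives at least one job at time $t$ • Mapping queues to available resources • $g_t : Q \to \mathcal{P}(R)$ • $g_t(q) = \{\, q \in Q \mid \forall q\ \exists R_a,\ g_t(q) = R_a \in \mathcal{P}(R) \,\}$, i.e. each queue is assigned a subset $R_a \subseteq R$ of available resources • Mapping a queue to a resource • $h_t : (Q, R_a) \to Q \times R_a$ • $h_t(q, r) = \{\, q \in Q,\ r \in R_a^t \mid (q, r) \in Q \times R_a^t \,\}$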
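The three mappings can be made concrete with a small Python sketch. The slides do not fix a policy for any of them. Round-robin assignment of jobs to queues, and of available resources to queues, is therefore assumed purely for illustration. The function names mirror f_t, g_t, and h_t, but the string-based data types are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def f_t(jobs: List[str], queues: List[str]) -> Dict[str, str]:
    """f_t : J -> Q as a round-robin assignment (assumed policy).

    Every queue receives at least one job whenever len(jobs) >= len(queues).
    """
    return {j: queues[i % len(queues)] for i, j in enumerate(jobs)}


def g_t(queues: List[str], available: List[str]) -> Dict[str, FrozenSet[str]]:
    """g_t : Q -> P(R); splits the resources available at time t among the queues."""
    subsets: Dict[str, set] = {q: set() for q in queues}
    for k, r in enumerate(available):
        subsets[queues[k % len(queues)]].add(r)
    return {q: frozenset(rs) for q, rs in subsets.items()}


def h_t(q: str, r: str, g: Dict[str, FrozenSet[str]]) -> Tuple[str, str]:
    """h_t : (Q, R_a) -> Q x R_a; binds queue q to one resource r from its subset."""
    if r not in g[q]:
        raise ValueError(f"{r} is not available to queue {q}")
    return (q, r)


J = ["j1", "j2", "j3", "j4", "j5"]
Q = ["q1", "q2"]
f = f_t(J, Q)
g = g_t(Q, ["r1", "r2", "r3"])  # resources available at time t
print(set(f.values()) == set(Q))          # True: every queue receives a job
print(sorted(g["q1"]), sorted(g["q2"]))   # ['r1', 'r3'] ['r2']
print(h_t("q2", "r2", g))                 # ('q2', 'r2')
```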

  10. An example of the Dynamic Fault Tolerance Selection Algorithm
      na: number of available nodes
      nr: number of jobs that can be run in parallel
      while (any job remains)
          na ← check resource availability
          nr ← check job parallelism
          if nr < na < 2nr then
              do partial replication and partial queuing
          else if na ≥ 2nr then
              do full replication
          else
              do queuing
          end if
      end while
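The selection algorithm above translates almost directly into Python. The decision rule follows the slide exactly. The three callbacks (`check_availability`, `check_parallelism`, `dispatch`) are hypothetical hooks standing in for the middleware, and `dispatch` is assumed to return the jobs that are still unfinished after one round.

```python
from typing import Callable, List

PARTIAL = "partial replication + partial queuing"
FULL = "full replication"
QUEUING = "queuing"


def select_strategy(na: int, nr: int) -> str:
    """Decision rule from the slide: na = available nodes, nr = parallel jobs."""
    if nr < na < 2 * nr:
        return PARTIAL
    elif na >= 2 * nr:
        return FULL  # at least two nodes per runnable job
    else:
        return QUEUING  # na <= nr: not enough nodes to replicate


def run(
    jobs: List[str],
    check_availability: Callable[[], int],
    check_parallelism: Callable[[List[str]], int],
    dispatch: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Main loop; returns the strategy chosen in each round.

    Assumes check_parallelism returns >= 1 while jobs remain and that
    dispatch makes progress; otherwise the loop would not terminate.
    """
    remaining = list(jobs)
    chosen: List[str] = []
    while remaining:
        na = check_availability()
        nr = check_parallelism(remaining)
        strategy = select_strategy(na, nr)
        chosen.append(strategy)
        remaining = dispatch(strategy, remaining)
    return chosen


print(select_strategy(na=3, nr=4))  # queuing
print(select_strategy(na=6, nr=4))  # partial replication + partial queuing
print(select_strategy(na=8, nr=4))  # full replication
```

Because na and nr are re-read in every iteration, the strategy can change from round to round as nodes join or fail. This appears to be the "dynamic" part of the selection.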

  11. Resource Selection • There are a number of ways to choose resources • Minimization functions related to: • Performance of resources • Temperature of resources • Laplace’s equation
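One possible reading of "minimization functions" is a weighted cost over normalized resource metrics, with the lowest-cost nodes selected first. The sketch below takes that reading. The metric fields (`runtime_s`, `temperature_c`), the default weights, and normalization by the pool maximum are all assumptions for illustration. The Laplace's-equation approach mentioned on the slide is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    name: str
    runtime_s: float      # hypothetical: measured time of one EPANET evaluation
    temperature_c: float  # hypothetical: reported node temperature (assumed > 0)


def select_resources(
    pool: List[Resource], k: int, w_perf: float = 0.7, w_temp: float = 0.3
) -> List[str]:
    """Return the names of the k resources with the lowest weighted cost.

    Each metric is divided by its maximum over the (non-empty) pool so that
    seconds and degrees can be combined; the weights are illustrative.
    """
    max_rt = max(r.runtime_s for r in pool)
    max_tc = max(r.temperature_c for r in pool)

    def cost(r: Resource) -> float:
        return w_perf * r.runtime_s / max_rt + w_temp * r.temperature_c / max_tc

    return [r.name for r in sorted(pool, key=cost)[:k]]


pool = [Resource("n1", 10.0, 70.0), Resource("n2", 5.0, 80.0), Resource("n3", 20.0, 40.0)]
print(select_resources(pool, k=2))                           # ['n2', 'n1']
print(select_resources(pool, k=2, w_perf=0.0, w_temp=1.0))   # ['n3', 'n1']
```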

  12. References • G. von Laszewski, K. Mahinthakumar, R. Ranjithan, D. Brill, J. Uber, K. Harrison, S. Sreepathi, and E. Zechman, “An Adaptive Cyberinfrastructure for Threat Management in Urban Water Distribution Systems,” in Proceedings of ICCS 2006, vol. 3993, 2006, pp. 401–. • S. Sreepathi, “Cyberinfrastructure for Contamination Source Characterization in Water Distribution Systems,” Master’s thesis, North Carolina State University, 2006. • L. Ramakrishnan and D. A. Reed, “Performability Modeling for Scheduling and Fault Tolerance Strategies for Scientific Workflows,” in Proceedings of the 17th International Symposium on High Performance Distributed Computing, Boston, MA, USA: ACM, June 2008, pp. 23–34. • G. Wrzesińska, R. V. van Nieuwpoort, J. Maassen, T. Kielmann, and H. E. Bal, “Fault-tolerant Scheduling of Fine-grained Tasks in Grid Environments,” International Journal of High Performance Computing Applications, vol. 20, no. 1, February 2006, pp. 103–114. • “Laplace’s Equation,” Wolfram MathWorld, http://mathworld.wolfram.com/LaplacesEquation.html
