
Robust Resource Allocation of DAGs in a Heterogeneous Multi-core System


Presentation Transcript


  1. Luis Diego Briceño, Jay Smith, H. J. Siegel, Anthony A. Maciejewski, Paul Maxwell, Russ Wakefield, Abdulla Al-Qawasmeh, Ron C. Chiang, and Jiayin Li outline • motivation and introduction • system model • robustness • example of heuristic • results and conclusions Robust Resource Allocation of DAGs in a Heterogeneous Multi-core System Supported by the NSF under grants CNS-0615170 and CNS-0905399

  2. Motivation • need to execute applications on satellite data • satellite data is processed in a heterogeneous computing system • results are needed before a deadline [figure: satellite data flowing into a multi-core heterogeneous data processing system that produces application results before the deadline]

  3. Problem Statement • multiple applications (for this presentation consider one) • each application is a DAG of tasks • a set of applications must complete before a deadline Δ • the completion time of an application must be robust against uncertainties in the estimated execution times of its tasks • actual time is data dependent • goal: robust resource allocation of data and tasks to a heterogeneous multi-core system to meet deadline Δ for the applications [figure: example DAG of tasks t1,α–t7,α for application α against a timeline ending at Δ]

  4. Environment • consider a heterogeneous environment used to analyze satellite imaging • based on commodity hardware • these environments require analysis of large data sets • environment similar to systems in use at • National Center for Atmospheric Research (NCAR) • DigitalGlobe • static resource allocation • estimated time to compute a task is known in advance

  5. Contributions • model and simulation of a complex multi-core-based data processing environment that executes data-intensive applications • multi-core machines • RAM management • hard drive management • parallel tasks • satellite data placement • a robustness metric for this environment • resource allocation heuristics to maximize robustness using this metric

  6. System Model — Satellite Data Placement • satellite data is split into smaller subsets and distributed among the hard drives of the compute nodes • a processing element (PE) is a core • PEj,k — PE k on compute node j (1–8 per node) • PEs within a compute node are homogeneous • no multi-tasking within a PE [figure: satellite data distributed to compute node j, which contains HDj, RAMj, and PEj,1 … PEj,8]

  7. System Model — Processing • tasks execute on processing elements (PEs) • required input data must be present in RAM to execute a task • input data sets are staged from HDj to RAMj • task 1 (t1) can start execution once its input data sets are in RAM • the result is stored in RAMj • RAM space is limited [figure: input data sets staged from HDj into RAMj on compute node j so task t1 can execute on a PE]

  8. System Model — RAM Management • RAM has a fixed capacity • 160 Gbytes (based on the DigitalGlobe computer center) • assume 152 Gbytes available for data • a typical data set was from 1 Gbyte to 32 Gbytes • data sets can be swapped in and out of RAM if needed later • all input data sets must be in RAM before task execution • data sets must remain in RAM until execution is finished • space must be reserved in RAM for the result

  9. System Model — Storage • satellite data sets allocated prior to task execution • two scenarios for satellite data allocation • determined by the heuristic • randomly assigned (pre-determined) • inter-task data is transmitted if destination is not equal to source

  10. System Model — Applications • each application appα must complete before Δ • appα is divided into Tα tasks (the tasks form a DAG) • each task requires satellite data sets or produced inter-task data sets • ti,α is the ith task in application α • each task produces other data items (e.g., data 7) • the last task produces a result [figure: example DAG for appα with tasks t1,α, t2,α, and t3,α, satellite data sets 1, 4, and 6, inter-task data 2, 3, and 7, and a final result]
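The application model above (tasks with satellite or inter-task inputs, one produced data item each, a DAG structure, and a deadline Δ) can be captured with a small data structure; a minimal sketch, with names that are illustrative rather than from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One task t_i,alpha in an application DAG."""
    task_id: int
    app_id: int
    input_data: list                  # satellite or inter-task data set ids
    output_data: str                  # data item this task produces
    predecessors: list = field(default_factory=list)

@dataclass
class Application:
    """An application app_alpha: a DAG of T_alpha tasks and a deadline."""
    app_id: int
    tasks: list
    deadline: float                   # the common deadline Delta

# example: t3 consumes the outputs of t1 and t2 and produces the result
t1 = Task(1, 0, input_data=["sat1"], output_data="d2")
t2 = Task(2, 0, input_data=["sat4", "sat6"], output_data="d7")
t3 = Task(3, 0, input_data=["d2", "d7"], output_data="result",
          predecessors=[1, 2])
app = Application(0, [t1, t2, t3], deadline=100.0)
```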

  11. System Model — Computation Parallelism • 50% of tasks are parallelizable • tasks are only parallelizable across PEs in the same compute node • parallel time = sequential time / divider • parallel execution time is used to model different speed-ups • two types of parallelizable tasks • 25% good parallel tasks • 25% average parallel tasks • divider values chosen arbitrarily for the simulation study
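The divider model on this slide is a one-line calculation; a minimal sketch (the specific divider values below are illustrative, since the slide notes they were chosen arbitrarily for the simulation study):

```python
def parallel_time(sequential_time, divider, parallelizable):
    """Slide-11 model: a parallelizable task's execution time is its
    sequential time divided by a speed-up 'divider'; non-parallelizable
    tasks run at their sequential time."""
    if not parallelizable:
        return sequential_time
    return sequential_time / divider

# e.g. a "good" parallel task with divider 4 vs. an "average" one with 2
assert parallel_time(80.0, 4.0, True) == 20.0
assert parallel_time(80.0, 2.0, True) == 40.0
assert parallel_time(80.0, 4.0, False) == 80.0
```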

  12. Robustness — Three Questions • What behavior of the system makes it robust? • all applications finish before Δ • What uncertainties is the system robust against? • differences between actual and estimated execution times • assume communication times are fixed • Quantitatively, exactly how robust is the system? • the smallest common percentage increase (ρ) in all task execution times that causes the makespan to exceed Δ • note: in a real system, the execution times of all tasks will not increase by the same common percentage • ρ is just a mathematical value used as a robustness measure
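Because communication times are held fixed and only execution times scale, the makespan is non-decreasing in the common scale factor, so ρ can be found by a simple search for the deadline crossing. A minimal sketch, assuming the caller supplies a `makespan_at(rho)` function for the current resource allocation (the function name and toy system are illustrative, not from the paper):

```python
def robustness_rho(makespan_at, deadline, tol=1e-6):
    """Smallest common multiplicative increase rho of all task execution
    times that pushes the makespan past the deadline Delta.

    makespan_at(rho) must return the makespan when every task's execution
    time is rho * its estimate (communication times stay fixed), and is
    assumed monotone non-decreasing in rho.
    """
    lo, hi = 1.0, 2.0
    if makespan_at(lo) > deadline:
        return lo                         # already past the deadline
    while makespan_at(hi) <= deadline:
        hi *= 2.0                         # grow until Delta is violated
    while hi - lo > tol:                  # binary search on the crossing
        mid = 0.5 * (lo + hi)
        if makespan_at(mid) > deadline:
            hi = mid
        else:
            lo = mid
    return hi

# toy system: 10 time units of scaled compute plus 5 fixed units of
# communication; with Delta = 35 the crossing is at rho = 3
rho = robustness_rho(lambda r: 10.0 * r + 5.0, 35.0)
assert abs(rho - 3.0) < 1e-3
```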

  13. Robustness — Example • assume 3 applications • blue (tasks b, d, g, and h), green (a, e, and i), and pink (c and f) [figure: two Gantt charts over PE1,1, PE2,1, and PE3,1 — the makespan based on estimated task times, and the makespan when task times equal ρ ∙ the estimated task times, which reaches Δ]

  14. Related Work • a significant amount of research assigns a DAG to a heterogeneous computing system • several critical path heuristics • robustness in resource allocation • our research considers the robustness of the allocation of DAGs • two makespan-minimization heuristics from the literature were adapted for this paper • the heuristics were originally meant to minimize makespan • the adapted heuristics can handle memory, satellite data placement, and robustness • the Dynamic Available Tasks Critical Path (DATCP) heuristic will be explained today

  15. Dynamic Available Tasks Critical Path (DATCP) — outline • calculate the critical path for each application • for each task, from texit to tentry • edge labels are the average transfer time/byte between any two nodes ∙ data size • determine the maximum time from any successor (child) node to texit (maxtime) • a task's critical path value is the sum of its task data and satellite data transfer times, maxtime, and the average execution time of ti [figure: example DAG annotated with each task's critical path value and average execution time]

  16. Dynamic Available Tasks Critical Path (DATCP) — outline • (1) calculate the critical path for each application • (2) dynamically create a list of all tasks available for mapping • (3) determine the task with the longest critical path from the list of available tasks • (4) the task ti determined in (3) is assigned to the PE that gives the maximum system robustness based on the partial mapping • repeat steps (2)–(4) until all tasks are mapped [figure: example DAG annotated with each task's critical path value and average execution time]
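The critical path values of step (1) can be computed with one backward pass from texit, following the rule on slide 15 (a task's value is its input transfer time, plus its average execution time, plus the maximum time from any successor to texit). A minimal sketch under that rule; the dictionary-based inputs are illustrative:

```python
def critical_path_values(succ, avg_exec, edge_cost, input_cost):
    """Critical path value of each task, computed from t_exit back to
    t_entry.

    succ[t]           -> list of successor (child) task ids of t
    avg_exec[t]       -> average execution time of t
    edge_cost[(t, s)] -> average transfer time of the data t sends to s
    input_cost[t]     -> transfer time of t's satellite input data
    """
    cp = {}

    def value(t):
        if t not in cp:
            # maxtime: longest time from any child to t_exit
            maxtime = max((edge_cost[(t, s)] + value(s) for s in succ[t]),
                          default=0.0)
            cp[t] = input_cost[t] + avg_exec[t] + maxtime
        return cp[t]

    for t in succ:
        value(t)
    return cp

# tiny DAG: 1 -> 2 -> 3 and 1 -> 3
succ = {1: [2, 3], 2: [3], 3: []}
avg_exec = {1: 5.0, 2: 4.0, 3: 6.0}
edge_cost = {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 1.0}
input_cost = {1: 2.0, 2: 0.0, 3: 0.0}
cp = critical_path_values(succ, avg_exec, edge_cost, input_cost)
assert cp[3] == 6.0           # exit task: just its execution time
assert cp[2] == 11.0          # 4 + (1 + 6)
assert cp[1] == 19.0          # 2 + 5 + max(1 + 11, 2 + 6)
```

Step (3) then simply picks the available task with the largest `cp` value.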


  20. DATCP — Memory Management • determine available space in RAM • decide if the required task and the input data can be stored in RAM immediately • if there is not enough space • heuristic checks when the task's input data sets can be moved into memory • heuristic schedules task to start execution at that time • if incoming data is from another compute node • send it to destination compute node’s RAM • if there is no space in RAM then send to the HD
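The RAM check above can be sketched as a walk over the compute node's projected free-memory timeline: find the first time the task's input data sets plus its reserved result fit, and schedule the task to start then. A minimal sketch, assuming a simple list of (time, free bytes) breakpoints; all names and sizes are illustrative:

```python
def earliest_start(ram_free_at, input_size, result_size, ready_time):
    """First time at or after ready_time when a task's input data sets
    plus its reserved result space fit in the compute node's RAM.

    ram_free_at: time-ordered list of (time, free_bytes) breakpoints in
    the node's projected free-RAM timeline.
    Returns None if the data never fits (it must wait on the HD).
    """
    required = input_size + result_size   # result space is reserved too
    for time, free in ram_free_at:
        if time >= ready_time and free >= required:
            return time
    return None

# 152 GB of RAM: a task needing 32 GB of input + 8 GB of result space is
# ready at t=10 but must wait until t=50, when a large working set frees up
timeline = [(0.0, 22e9), (50.0, 152e9)]
assert earliest_start(timeline, 32e9, 8e9, 10.0) == 50.0
```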

  21. DATCP — Parallelizable Tasks • two approaches are studied • no parallelization • “max” approach • the heuristic always parallelizes across multiple PEs within a compute node • determine the system robustness for each possible assignment • determine the compute node with the most PEs that have the same maximum robustness • map the task to all PEs that have the same robustness value within this compute node

  22. DATCP — Satellite Data Placement • two methods • random placement (pre-determined) • heuristic-determined placement: the first time a satellite data set is required, that data set and the task that requires it are mapped together • the task is assigned to the PE that maximizes robustness • the storage location of the satellite data set has not been previously determined • the satellite data set is stored on the HD of this PE's compute node

  23. Results • DATCP 1: max parallel with satellite mapping • DATCP 2: max parallel with random satellite mapping • DATCP 3: no parallelism with random satellite mapping • HRD 1: satellite data (SD) placement based on first task placement with duplication • HRD 2: SD placement based on first task placement with no duplication • HRD 3: SD placement based on reference count with no duplication • HRD 4: random SD placement with duplication • HRD 5: random SD placement with no duplication

  24. Plot of Makespan vs. Robustness

  25. Conclusions • derived a metric to measure robustness • the interdependency of tasks within applications complicates the derivation of a robustness metric • DATCP has the highest average robustness values • the initial ordering created by DATCP is much better than the order created by HRD • if the DATCP order is used in HRD, then the results of HRD are significantly improved • satellite data placement did not have any apparent effect on robustness

  26. Questions?
