Chapter – 5.2 Static Process Scheduling

Anjum Reyaz-Ahmed

Outline
• Part I: Static Process Scheduling
  • Precedence process model
  • Communication system model
• Part II: Current Literature Review
  • "Optimizing Static Job Scheduling in a Network of Heterogeneous Computers," ICPP 2000
  • "Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems," DATE 2005
  • "White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling," DATE 2009
• Part III: Future Research Initiatives

Static Process Scheduling
• Given a set of partially ordered tasks, define a mapping of processes to processors before the processes execute.
• Cost model: CPU cost and communication cost, both of which must be specified a priori.
• Goal: minimize the overall finish time (makespan) on a non-preemptive multiprocessor system of identical processors.
• Except for some very restricted cases, scheduling to optimize the makespan is NP-complete.
• Heuristic solutions are usually proposed.
[Chow and Johnson 1997]

Precedence Process Model
• This model describes scheduling for a program that consists of several sub-tasks; the schedulable unit is the sub-task.
• The program is represented by a directed acyclic graph (DAG).
• Precedence constraints among the tasks in a program are explicitly specified.
• Critical path: the longest execution path in the DAG, often used as a reference when comparing the performance of heuristic algorithms.
[Chow and Johnson 1997]
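
The critical-path definition above can be illustrated with a short sketch. The task names, execution times, and edges below are hypothetical (they are not the figure from Chow and Johnson), and communication overhead is ignored, so the result is only the longest chain of execution times.

```python
from functools import lru_cache

def critical_path(exec_time, successors):
    """Return (length, path) of the longest execution path in a task DAG."""
    @lru_cache(maxsize=None)
    def longest_from(task):
        # Longest path that starts at `task`, found by recursing on successors.
        best_len, best_path = 0, ()
        for nxt in successors.get(task, ()):
            length, path = longest_from(nxt)
            if length > best_len:
                best_len, best_path = length, path
        return exec_time[task] + best_len, (task,) + best_path

    # The critical path may start at any task, so take the best over all of them.
    return max((longest_from(t) for t in exec_time), key=lambda r: r[0])

# Hypothetical example DAG: execution time per task and precedence edges.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 6, "E": 6}
successors = {"A": ["C", "D"], "B": ["D"], "C": ["E"], "D": ["E"]}

length, path = critical_path(exec_time, successors)
print(length, path)
```

No schedule can finish earlier than the critical-path length, which is why it serves as a yardstick for heuristic schedules.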

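Precedence Process and Communication System Models
[Figure: precedence process graph and communication system graph. Labels: execution time; number of messages to communicate; communication overhead for one message.]
• Communication overhead between two tasks = number of messages to communicate × communication overhead for one message.
• Example: communication overhead for A (on P1) and E (on P3) = 4 * 2 = 8.
[Chow and Johnson 1997]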

contd.
• Scheduling goal: minimize the makespan.
• Algorithms:
  • List Scheduling (LS): communication overhead is not considered. It uses a simple greedy heuristic: no processor remains idle if there is an available task that it could process.
  • Extended List Scheduling (ELS): the actual schedule that results from LS once communication is taken into account.
  • Earliest Task First scheduling (ETF): the earliest schedulable task (with communication delay considered) is scheduled first.
[Chow and Johnson 1997]
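
A minimal sketch of the LS rule follows, under stated assumptions: the task graph, the priority order, and the function name `list_schedule` are hypothetical, execution times are positive, and communication overhead is ignored, as LS prescribes. ELS and ETF would additionally charge communication delays, which this sketch does not attempt.

```python
def list_schedule(exec_time, preds, priority, n_procs):
    """Greedy list scheduling that ignores communication overhead.

    Returns (makespan, placement); placement maps task -> (processor, start).
    """
    finish = {}                   # task -> finish time of already placed tasks
    placement = {}
    free_at = [0] * n_procs       # time at which each processor becomes idle
    remaining = list(priority)    # unscheduled tasks, highest priority first
    now = 0
    while remaining:
        for proc in range(n_procs):
            if free_at[proc] > now:
                continue          # processor is still busy
            # First task in priority order whose predecessors have all finished.
            ready = next((t for t in remaining
                          if all(finish.get(q, float("inf")) <= now
                                 for q in preds.get(t, ()))), None)
            if ready is None:
                break             # nothing is available at this instant
            remaining.remove(ready)
            placement[ready] = (proc, now)
            finish[ready] = now + exec_time[ready]
            free_at[proc] = finish[ready]
        busy_until = [t for t in free_at if t > now]
        if not busy_until:
            raise ValueError("no runnable task: check the precedence graph")
        now = min(busy_until)     # jump to the next completion event
    return max(finish.values()), placement

# Hypothetical example: five tasks on two identical processors.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 6, "E": 6}
preds = {"C": ["A"], "D": ["A", "B"], "E": ["C", "D"]}
makespan, placement = list_schedule(exec_time, preds, ["A", "B", "C", "D", "E"], 2)
print(makespan, placement)
```

The processor loop is what enforces the greedy rule: an idle processor is given a task whenever one is ready, and time only advances when nothing more can be placed.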

Makespan Calculation for LS, ELS, and ETF [Chow and Johnson 1997]

Communication Process Model
• There are no precedence constraints among processes.
• Processes are modeled by an undirected graph G: nodes represent processes, and the weight on an edge is the amount of communication (number of messages) between the two connected processes.
• Process execution cost may also be specified to handle more general cases.
• Scheduling goal: maximize resource utilization.
[Chow and Johnson 1997]

contd.
• The problem is to find an optimal assignment of m processes to P processors with respect to a target function defined over the following terms:
  • P: a set of processors.
  • ej(pi): computation cost of executing process pi on processor Pj.
  • ci,j(pi,pj): communication overhead between processes pi and pj.
• Assume a uniform communication speed between processors.
[Chow and Johnson 1997]
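
The formula for the target function is not reproduced here. A form consistent with the definitions above, offered as an assumption rather than as the textbook's exact expression, adds the computation cost of every process on its assigned processor to the communication overhead of every pair of communicating processes placed on different processors. The notation a(i), the index of the processor to which pi is assigned, and E(G), the edge set of the graph, are introduced here for illustration only.

```latex
\mathrm{Cost}(G, P) \;=\; \sum_{i=1}^{m} e_{a(i)}(p_i)
\;+\; \sum_{\substack{(p_i,\,p_j) \in E(G) \\ a(i) \neq a(j)}} c_{i,j}(p_i, p_j)
```

Processes placed on the same processor are assumed to communicate at no cost, which is why the second sum only ranges over pairs with a(i) ≠ a(j).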

• This is referred to as the module allocation problem.
• It is NP-complete except for a few cases:
  • For P = 2, Stone suggested a polynomial-time solution using the Ford-Fulkerson maximum-flow algorithm.
  • For some special graph topologies such as trees, Bokhari's algorithm can be used.
• Known result: the mapping problem for an arbitrary number of processors is NP-complete.
[Chow and Johnson 1997]
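
For very small instances, the module allocation problem can simply be enumerated, which also makes its exponential growth visible. The sketch below is a brute-force baseline, not one of the cited algorithms. The process names, cost tables, and function names are hypothetical, and communication between processes on the same processor is assumed to cost nothing.

```python
from itertools import product

def allocation_cost(assign, exec_cost, comm_cost):
    """Total cost of an assignment: computation plus cross-processor communication."""
    compute = sum(exec_cost[p][assign[p]] for p in assign)
    comm = sum(w for (a, b), w in comm_cost.items() if assign[a] != assign[b])
    return compute + comm

def best_allocation(exec_cost, comm_cost, processors):
    """Try every mapping of processes to processors (len(processors) ** m of them)."""
    procs = sorted(exec_cost)
    best = None
    for choice in product(processors, repeat=len(procs)):
        assign = dict(zip(procs, choice))
        cost = allocation_cost(assign, exec_cost, comm_cost)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

# Hypothetical data: execution cost of each process on processors A and B,
# and communication cost between pairs of processes.
exec_cost = {
    "p1": {"A": 5, "B": 10},
    "p2": {"A": 2, "B": 8},
    "p3": {"A": 9, "B": 3},
    "p4": {"A": 7, "B": 4},
}
comm_cost = {("p1", "p2"): 6, ("p2", "p3"): 4, ("p3", "p4"): 8, ("p1", "p4"): 1}

cost, assign = best_allocation(exec_cost, comm_cost, ["A", "B"])
print(cost, assign)
```

With m processes and |P| processors the loop visits |P|^m assignments, so this is only practical as a reference for checking heuristics on toy inputs.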

Stone's two-processor model to achieve minimum total execution and communication cost
• Example:
  • Partition the graph by drawing a line that cuts through some edges.
  • The result is two disjoint graphs, one for each processor.
  • Set of removed edges → cut set.
  • Cost of the cut set → sum of the weights of its edges, i.e. the total interprocess communication cost between the processors.
  • The cost of the cut set is 0 if all processes are assigned to the same node.
  • Computation constraints (no more than k, distribute evenly…)
• Example:
  • Maximum flow and minimum cut in a commodity-flow network.
  • Find the maximum flow from source to destination.
[Chow and Johnson 1997]

Maximum Flow Algorithm in Solving the Scheduling Problem [Chow and Johnson 1997]

Minimum-Cost Cut
• Only the cuts that separate A and B are feasible.
[Chow and Johnson 1997]
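
The two-processor idea can be sketched as a maximum-flow computation. The sketch assumes the usual construction: each process is linked to processor node A with the cost of running it on B, and to B with the cost of running it on A, so that a cut separating A from B costs exactly as much as the assignment it induces. The augmenting-path search uses breadth-first search (the Edmonds-Karp variant of Ford-Fulkerson). All data and the function name are hypothetical.

```python
from collections import defaultdict, deque

def stone_two_processor(exec_cost, comm_cost, src="A", dst="B"):
    """Return (minimum total cost, assignment) for two processors via min cut."""
    cap = defaultdict(lambda: defaultdict(int))   # residual capacities
    for p, cost in exec_cost.items():
        cap[src][p] += cost[dst]   # this arc is cut when p runs on dst
        cap[p][dst] += cost[src]   # this arc is cut when p runs on src
    for (a, b), w in comm_cost.items():
        cap[a][b] += w             # undirected edge as two directed arcs
        cap[b][a] += w

    def reachable():
        # Breadth-first search over arcs with remaining capacity.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if dst not in parent:
            break                  # no augmenting path left: flow is maximal
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= push
            cap[w][u] += push
        flow += push

    # Nodes still reachable from src form the src side of a minimum cut.
    assign = {p: (src if p in parent else dst) for p in exec_cost}
    return flow, assign

# Hypothetical data: execution costs on A and B, and pairwise communication costs.
exec_cost = {
    "p1": {"A": 5, "B": 10},
    "p2": {"A": 2, "B": 8},
    "p3": {"A": 9, "B": 3},
    "p4": {"A": 7, "B": 4},
}
comm_cost = {("p1", "p2"): 6, ("p2", "p3"): 4, ("p3", "p4"): 8, ("p1", "p4"): 1}

cost, assign = stone_two_processor(exec_cost, comm_cost)
print(cost, assign)
```

By the max-flow min-cut theorem the returned flow value equals the weight of the cheapest cut separating A from B, and hence the minimum total execution and communication cost for this input.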

Generalized solution for more than two processors
• Stone uses a repetitive approach based on the two-processor algorithm to solve n-processor problems.
• Treat (n-1) processors as one super-processor.
• The processors in the super-processor are then broken down further, based on the results of the previous step.
[Chow and Johnson 1997]

Other Heuristics
• Separate the optimization of computation and communication.
• Assume communication delay is the more significant cost:
  • Merge processes with higher interprocess interaction into clusters of processes.
  • Clusters of processes are then assigned to the processor that minimizes the computation cost.
• With the reduced problem size, the optimal solution is relatively easier to find (exhaustive search).
• A simple heuristic: merge processes if their communication cost is higher than a threshold C.
• Constraints can also be put on the total computation of a cluster, to prevent over-clustering.
[Chow and Johnson 1997]
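
The threshold heuristic can be sketched with a union-find structure: any two processes joined by an edge heavier than C end up in the same cluster. The edge weights below are hypothetical and the function name `cluster_processes` is introduced here for illustration. Merging is transitive in this sketch, and it does not include the cap on total cluster computation mentioned above.

```python
def cluster_processes(processes, comm_cost, threshold):
    """Merge processes whose communication cost exceeds `threshold`."""
    parent = {p: p for p in processes}

    def find(p):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), weight in comm_cost.items():
        if weight > threshold:
            parent[find(a)] = find(b)   # union the two clusters

    clusters = {}
    for p in processes:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical communication graph over six processes.
comm_cost = {
    (1, 6): 12, (2, 4): 12, (3, 5): 11,
    (1, 2): 6, (2, 3): 8, (4, 5): 3, (4, 6): 5,
}
clusters = cluster_processes(range(1, 7), comm_cost, threshold=9)
print(clusters)
```

Once the clusters are formed, the remaining problem has only as many units as there are clusters, so assigning them to processors by exhaustive search becomes feasible.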

Cluster of Processes
• For C = 9, we get three clusters: (2,4), (1,6), and (3,5).
• Clusters (2,4) and (1,6) must be mapped to processors A and B.
• Cluster (3,5) can be assigned to A or B, but is assigned to A because of the lower communication cost.
• Total cost = 41 (computation cost = 17 on A and 14 on B; communication cost = 10).
[Figure: cluster graph with nodes (2,4), (1,6), and (3,5) and edge weights 6, 11, and 4]
[Chow and Johnson 1997]

Part II: Current Literature Review

Optimizing Static Job Scheduling in a Network of Heterogeneous Computers
Xueyan Tang & Samuel T. Chanson, IEEE 2000
Summary:
• Studies static job scheduling schemes in a network of computers with different speeds.
• Optimization techniques are proposed for workload allocation and job dispatching.
• The proposed job dispatching algorithm is an extension of the traditional round-robin scheme.
[Tang & Chanson 2000]

Optimization for Workload Allocation
• A fraction αi of all the jobs is sent to computer ci,
• where 0 ≤ αi ≤ 1 and the fractions sum to 1 (Σ αi = 1).
[Tang & Chanson 2000]

Simple Weighted Workload Allocation
• The amount of workload for each computer is proportional to its processing speed.
• All computers are equally utilized.
• Does not provide the best performance.
[Tang & Chanson 2000]
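
The proportional rule can be written out in a few lines. The speeds and arrival rate below are hypothetical, and the utilization check assumes that a computer's speed is its service rate in jobs per unit time, which is a simplification introduced here.

```python
from fractions import Fraction

def weighted_allocation(speeds):
    """Workload fraction for each computer, proportional to its speed."""
    total = sum(speeds)
    return [Fraction(s, total) for s in speeds]

# Hypothetical speeds for four computers (jobs per unit time).
speeds = [1, 2, 2, 5]
alpha = weighted_allocation(speeds)

# Assumed model: utilization = (share of arrivals) / (service rate).
arrival_rate = Fraction(5)
utilization = [a * arrival_rate / s for a, s in zip(alpha, speeds)]
print(alpha, utilization)
```

Every computer ends up with the same utilization, arrival_rate / sum(speeds), which is the "equally utilized" property stated above.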

Dynamic Least-Load Scheduling
• It is beneficial to allocate a disproportionately high fraction of the workload to the more powerful computers.
• Assign each new job to the machine with the least normalized load.
• When jobs are moved from a slow machine to a fast machine, the slow machine's utilization decreases a lot, whereas the utilization of the fast machine does not increase by much.
[Tang & Chanson 2000]

Optimizing Technique for Job Dispatching
• Random-Based Job Dispatching: a newly arrived job is scheduled to run on a randomly selected computer.
• Round-Robin-Based Job Dispatching: the objective is to smooth the inter-arrival intervals of consecutive jobs.
• Example: suppose there are 4 computers c1, c2, c3, and c4 with workload fractions 1/8, 1/8, 1/4, and 1/2, respectively. Dispatching sequence: c4, c3, c4, c2, c4, c3, c4, c1, c4, c3, c4, c2, c4, c3, c4, c1, …
[Tang & Chanson 2000]
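
The dispatching algorithm of Tang & Chanson is not spelled out in these slides, so the sketch below substitutes a different, well-known technique, smooth weighted round-robin, to show how a deterministic dispatcher can honor workload fractions while spreading each computer's jobs out. The integer weights 1, 1, 2, 4 stand for the fractions 1/8, 1/8, 1/4, 1/2. The order it produces is not claimed to match the sequence listed above.

```python
from collections import Counter

def smooth_weighted_round_robin(weights, n_jobs):
    """Dispatch n_jobs so that each computer receives jobs in proportion to its weight."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}   # running credit per computer
    order = []
    for _ in range(n_jobs):
        for name, w in weights.items():
            current[name] += w                # every computer earns its weight
        chosen = max(current, key=current.get)
        current[chosen] -= total              # the chosen one pays for the job
        order.append(chosen)
    return order

# Integer weights corresponding to workload fractions 1/8, 1/8, 1/4, 1/2.
weights = {"c1": 1, "c2": 1, "c3": 2, "c4": 4}
order = smooth_weighted_round_robin(weights, 8)
counts = Counter(order)
print(order, counts)
```

Over any full cycle of 8 jobs each computer receives exactly its share, and the heavily weighted computer is interleaved with the others rather than served in a burst, which is the smoothing goal stated above.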

Summary
• The key idea in optimizing the workload allocation scheme is to send a disproportionately high fraction of the workload to the most powerful computers.
• An analytical model is developed to derive the optimized allocation strategy mathematically.
• For job dispatching, an algorithm that extends round-robin to the general case is presented.
[Tang & Chanson 2000]

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems
V. Izosimov, P. Pop, P. Eles & Z. Peng, DATE, IEEE 2005
Synopsis:
• Re-execution and replication are used for tolerating transient faults.
• Processes are statically scheduled, and communications are performed using the time-triggered protocol.
[Izosimov et al. 2005]

System Architecture
• Each node has a CPU and a communication controller running independently.
• Time-Triggered Communication Protocol
[Izosimov et al. 2005]

Fault-Tolerance Mechanisms
• Re-execution
• Active Replication
[Izosimov et al. 2005]

Summary
• Addresses the optimization of distributed embedded systems for fault tolerance.
• Two fault-tolerance mechanisms:
  • Re-execution: time redundancy
  • Active replication: space redundancy
[Izosimov et al. 2005]

White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling
A. Viehl, M. Pressler & O. Bringmann, DATE, IEEE 2009
Synopsis:
• A novel approach for integrating cooperative and static non-preemptive scheduling into formal white-box analysis is presented.
[Viehl et al. 2009]

Future Research Initiatives
• Use AI techniques for static scheduling:
  • Genetic algorithms
  • Simulated annealing

References:
• Randy Chow & Theodore Johnson. "Distributed Operating Systems & Algorithms". pp. 156-163, Addison-Wesley, 1997.
• Xueyan Tang & Samuel T. Chanson. "Optimizing Static Job Scheduling in a Network of Heterogeneous Computers". pp. 373-382, ICPP, IEEE, 2000.
• Viacheslav Izosimov, Paul Pop, Petru Eles & Zebo Peng. "Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems". Design, Automation and Test in Europe (DATE), IEEE, 2005.
• Alexander Viehl, Michael Pressler & Oliver Bringmann. "White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling". Design, Automation and Test in Europe (DATE), IEEE, 2009.

Thank you!!
