A Scheduling Service Oriented Approach for Workflow Scheduling, by Conan Fan Li. Supervisor: Dr. Wendy MacCaull. Committee member: Dr. Man Lin. Committee member: Dr. Iker Gondra.
An SSO approach for workflow scheduling • Initiatives • AIF project: building decision support through dynamic workflow systems, with academia and industry working together for better healthcare • Workflow engines are not naturally built for scheduling problems
What is a workflow? • A workflow or a workflow model is a depiction of a business process composed of a sequence of operations (tasks). • Tasks are connected in the form of a directed graph to provide an abstraction of the real work for further assessment.
Why workflow? • Abstraction & visualization • Clarity & consistency • Automation
What is scheduling? • The process of deciding how to allocate resources to a number of tasks in order to achieve one or more objectives. • Two main applications
Application of scheduling • Manufacturing (e.g., a car factory)
Application of scheduling • Service industry (e.g., gate assignments at an airport)
Basics of Scheduling • A scheduling problem can be described by a triplet α | β | γ. • α describes the machine environment • β describes the processing characteristics and constraints • γ describes the objective • The triplet Jm | prec | Cmax describes a job shop scheduling problem with precedence constraints and an objective to minimize the makespan
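As a small illustration of the notation, the triplet can be held in a simple record. This is only a sketch: the names `Problem` and `parse_triplet` are hypothetical and are not part of the approach described in the slides.

```python
from typing import NamedTuple, Tuple


class Problem(NamedTuple):
    machine_env: str               # alpha, e.g. "Jm" for a job shop
    constraints: Tuple[str, ...]   # beta, e.g. ("prec",)
    objective: str                 # gamma, e.g. "Cmax"


def parse_triplet(text: str) -> Problem:
    """Split 'alpha | beta | gamma' into its three fields."""
    alpha, beta, gamma = (part.strip() for part in text.split("|"))
    # beta may list several comma-separated entries, or be empty
    constraints = tuple(c.strip() for c in beta.split(",") if c.strip())
    return Problem(alpha, constraints, gamma)


problem = parse_triplet("Jm | prec | Cmax")
```

Here `problem.machine_env` is `"Jm"`, `problem.constraints` is `("prec",)` and `problem.objective` is `"Cmax"`.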
The SSO approach • An approach that maps a scheduling problem onto a workflow model that consists of tasks with built-in services for scheduling.
Definitions • A schedule has unforced idleness if some machines idle when there are jobs waiting for processing. A schedule is non-delay if unforced idleness is prohibited. • Examples of possible objective functions to be minimized are: • Makespan: completion time of the last job to leave the system. • Maximum lateness: the worst violation of the due dates. • Total weighted completion time: the sum of the weighted completion times of the n jobs. • ... • A multi-instance task is a task that may have multiple distinct execution instances running concurrently within the same workflow case.
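The three objective functions listed above can be written down directly. The sketch below assumes each job is summarized by its completion time, due date and weight; the record name `JobResult` and its fields are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobResult:
    completion: float    # completion time of the job
    due: float           # due date of the job
    weight: float = 1.0  # weight of the job


def makespan(jobs: List[JobResult]) -> float:
    """Completion time of the last job to leave the system."""
    return max(j.completion for j in jobs)


def max_lateness(jobs: List[JobResult]) -> float:
    """Worst violation of the due dates (lateness = completion - due)."""
    return max(j.completion - j.due for j in jobs)


def total_weighted_completion(jobs: List[JobResult]) -> float:
    """Sum of the weighted completion times of the n jobs."""
    return sum(j.weight * j.completion for j in jobs)


# Illustrative values only
jobs = [JobResult(7, 10, 1), JobResult(16, 12, 2), JobResult(22, 30, 1)]
```

For these illustrative values the makespan is 22, the maximum lateness is 4 and the total weighted completion time is 61.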
Definitions • We say a job is waiting when it is neither assigned to a machine nor finished. • We say a job is independent when it has no precedence constraint or its precedence constraint is satisfied (i.e., the preceding job is finished). • We say a job is machine-ready when its required machine is free. • We say a job is enabled when it is waiting, independent and machine-ready.
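These four definitions translate into simple predicates. The following is a minimal sketch; the `Job` record and its field names are assumptions made for illustration and each job is assumed to have at most one predecessor.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Job:
    name: str
    machine: int                       # the machine this job requires
    predecessor: Optional[str] = None  # name of the preceding job, if any
    assigned: bool = False
    finished: bool = False


def is_waiting(job: Job) -> bool:
    return not job.assigned and not job.finished


def is_independent(job: Job, jobs: Dict[str, Job]) -> bool:
    return job.predecessor is None or jobs[job.predecessor].finished


def is_machine_ready(job: Job, busy_machines: Set[int]) -> bool:
    return job.machine not in busy_machines


def is_enabled(job: Job, jobs: Dict[str, Job], busy_machines: Set[int]) -> bool:
    return (is_waiting(job)
            and is_independent(job, jobs)
            and is_machine_ready(job, busy_machines))


jobs = {"Job0": Job("Job0", machine=0),
        "Job1": Job("Job1", machine=1, predecessor="Job0")}
```

With no machine busy, `Job0` is enabled while `Job1` is not, because its predecessor has not finished yet.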
Definitions • Schedule-flow is a framework designed to model scheduling processes in workflow. Schedule-flow patterns are an extension to the workflow language formalism Workflow Patterns. By assembling and modifying the existing workflow patterns, schedule-flow introduces new patterns that carry particular responsibilities and services in scheduling systems.
Definitions • A* search uses a distance-plus-cost evaluation function f(x) to determine the order in which the search visits nodes in the fringe. The evaluation function is the sum of two functions: • the cost function g(x), which is the cost from the starting node to the current node • and a heuristic estimate h(x) of the distance from the current node to the goal.
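To make the role of f(x) = g(x) + h(x) concrete, here is a generic A* sketch on a small hand-made graph. The graph, the heuristic table and the function name `a_star` are illustrative assumptions and are unrelated to the scheduling data in the slides.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path, visiting nodes in order of f = g + h."""
    fringe = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {}
    while fringe:
        f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue  # already expanded this node at no greater cost
        best_g[node] = g
        for nxt, cost in graph.get(node, []):
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None


# Edge lists: node -> [(neighbour, edge cost)]
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)]}
# Heuristic estimates of the remaining distance to "D" (never overestimating)
h = {"A": 2, "B": 2, "C": 1, "D": 0}

result = a_star(graph, h, "A", "D")
```

On this graph the search returns cost 3 along the path A, B, C, D.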
Definitions • We say an event e=(m,j,start,end) is a future event of schedule S if e.start >= S.clock
Relation • Scheduling is a ___ and a workflow is to represent a ___. • Why use workflow to model scheduling? • Workflow is concise, comprehensive and high-level • Scheduling is diverse, technical and low-level • We want to visualize the scheduling process • Workflow has a wider audience than scheduling does. That is why we need to bridge them.
Attempt • Case: a simple job shop scheduling problem • Objective: minimize makespan (Cmax)
Attempt • 1. Messy • What you see is messy • What you do not see is messier
Attempt • 2. Multi-instance tasks are confusing and high-maintenance • 3. Need to incorporate smart choices • We may choose Job0 to process first. Why does this seem like a smart choice?
Attempt • Finish Job0 • Clock+=7 • Job0.waiting=False
Attempt • 4. Options for unforced idleness • We want to assign as many jobs as possible before processing
Attempt • We may choose Job4 to process
Attempt • Finish Job1 • Clock=7+9=16 • Job1.waiting=False • (Note: Job4 has been processed for 9 time units)
Attempt • No other jobs are available to process except Job4
Attempt • Finish Job4 • Clock=16+(15-9)=22 • Job4.waiting=False
Attempt • Assign Job2
Attempt • … we get this: (57, {<0,0,0,7>, <0,4,7,22>, <0,2,22,43>, <1,1,7,16>, <1,5,22,52>, <1,3,52,57>}) • The first element (57) is the clock attribute of the schedule • An event: <machine, job, start, end>
Attempt • [Figure: Gantt chart of this schedule, with forced idleness marked]
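The makespan and the idle intervals can be read off the event set directly. The sketch below takes the events from the slide as <machine, job, start, end> tuples; the function names are hypothetical, and `idle_gaps` only reports gaps that occur before a machine's last job ends.

```python
from collections import defaultdict

# Events from the slide: (machine, job, start, end)
EVENTS = [(0, 0, 0, 7), (0, 4, 7, 22), (0, 2, 22, 43),
          (1, 1, 7, 16), (1, 5, 22, 52), (1, 3, 52, 57)]


def makespan(events):
    """Completion time of the last job to leave the system."""
    return max(end for _, _, _, end in events)


def idle_gaps(events):
    """Per machine, the intervals with no job running before its last job ends."""
    by_machine = defaultdict(list)
    for machine, _, start, end in events:
        by_machine[machine].append((start, end))
    gaps = {}
    for machine, spans in by_machine.items():
        clock, gaps[machine] = 0, []
        for start, end in sorted(spans):
            if start > clock:
                gaps[machine].append((clock, start))
            clock = max(clock, end)
    return gaps


span = makespan(EVENTS)
gaps = idle_gaps(EVENTS)
```

For this schedule the makespan is 57, machine 0 has no gaps, and machine 1 idles during (0, 7) and (16, 22).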
Problems • The size issue • The size of the resulting workflow grows with the size of the problem (number of jobs and number of machines). There are also too many variables to configure inside the workflow. This is neither concise nor comprehensive. • The complication of multi-instance tasks • Users may not understand when to use them • The lack of heuristic incorporation • When several jobs are presented to a machine, there should be a mechanism to decide which one appears to be the best option
Problems • The lack of options for unforced idleness • There should be an easy way of expressing that we do not want to allow unforced idleness, that is, when a machine is free, we always try to assign a job to it if possible. • The lack of comparison and sorting • We do not want the workflow to stop as soon as it finds one feasible schedule. Instead, we need it to compare all the schedules and present the best one. There should be a mechanism to easily compare and sort the schedules.
Proposal • The current workflow components are clearly not sufficient for constructing sophisticated schedulers. Therefore, we need a set of new workflow patterns to provide the services we need in scheduling. We call the extension schedule-flow patterns.
Schedule-flow • Present the data (jobs and machines) in a single file instead of mapping each one of them to a task in the workflow. This way, the size of the workflow will not be proportional to the size of the scheduling problem. More importantly, the same workflow can now work with different sets of data. • Eliminate the use of multi-instance tasks. Instead, we use a data structure called “fringe” (a collection of ideas, see A*), which is implemented as a priority queue. Different execution instances (schedules) will be evaluated first and then pushed into the fringe. • Heuristics may be given by users regarding the preference of assignment. For example, we may want to assign the jobs with the shortest processing times first (SPT).
Schedule-flow • By default, we do not allow unforced idleness. We would like to keep the machines as busy as possible. • “For many of the models that have regular objective functions, there are optimal schedules that are non-delay” • The fringe provides options for its priority rule, which is the order in which the schedules are sorted.
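A fringe with a configurable priority rule could look like the sketch below, built on Python's `heapq`. The class name `Fringe`, its methods and the `priority` parameter are assumptions for illustration, not the implementation used in the project.

```python
import heapq
from itertools import count


class Fringe:
    """Priority queue of schedules; a lower priority value is popped first."""

    def __init__(self, priority=lambda item: item):
        self._priority = priority  # the priority rule (hypothetical parameter)
        self._heap = []
        self._tie = count()        # breaks ties so items are never compared directly

    def push(self, items):
        for item in items:
            heapq.heappush(self._heap, (self._priority(item), next(self._tie), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Schedules represented here as (name, evaluation value) pairs for illustration
fringe = Fringe(priority=lambda schedule: schedule[1])
fringe.push([("S1", 7), ("S2", 21), ("S3", 15)])
order = [fringe.pop()[0] for _ in range(3)]
```

With the evaluation value as the priority rule, the schedules come out in the order S1, S3, S2. Passing a different `priority` function changes the sort order without touching the rest of the workflow.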
Schedule-flow • For the same problem, use schedule-flow • Built-in variables: fringe, current
Schedule-flow task components • Creation: does nothing • Pop: current=fringe.pop() • Selection: by default, select jobs that are enabled (or independent, machine-ready…) • Allocation: generate schedules by assigning jobs to corresponding machines • Process: generate one schedule by advancing time until any job is finished • Push: fringe.push(schedules)
Schedule-flow condition components • Exist: tests if the input exists • Continue: by default, tests if the fringe is empty. Other options may be used, such as limiting the execution time or the number of schedules generated…
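The components above can be wired into a single search loop. The sketch below is one possible reading, under several assumptions: the wiring (Pop, Selection, Exist, then Allocation or Process, then Push) is guessed, since the diagram is not reproduced here; the job data is read off the events of the earlier schedule, with the precedence pairs assumed; and the fringe is ordered by the clock only (heuristic 0).

```python
import heapq
from itertools import count

# Assumed data: job -> (machine, processing time, predecessor)
JOBS = {0: (0, 7, None), 1: (1, 9, 0), 2: (0, 21, None),
        3: (1, 5, 2), 4: (0, 15, None), 5: (1, 30, 4)}


def select(clock, events):
    """Selection: jobs that are waiting, independent and machine-ready."""
    assigned = {e[1] for e in events}
    finished = {e[1] for e in events if e[3] <= clock}
    busy = {e[0] for e in events if e[3] > clock}
    return [j for j, (m, _, pred) in JOBS.items()
            if j not in assigned and (pred is None or pred in finished)
            and m not in busy]


def allocate(clock, events, enabled):
    """Allocation: one new schedule per enabled job, started at the current clock."""
    return [(clock, events + ((JOBS[j][0], j, clock, clock + JOBS[j][1]),))
            for j in enabled]


def process(clock, events):
    """Process: advance the clock until the next running job finishes."""
    return (min(e[3] for e in events if e[3] > clock), events)


def run():
    tie = count()
    fringe = [(0, next(tie), (0, ()))]  # Creation: S0 with no jobs assigned
    while fringe:                       # Continue: stop when the fringe is empty
        _, _, (clock, events) = heapq.heappop(fringe)  # Pop
        if len(events) == len(JOBS) and all(e[3] <= clock for e in events):
            return clock, sorted(events)  # every job is finished
        enabled = select(clock, events)   # Selection
        if enabled:                       # Exist
            new = allocate(clock, events, enabled)
        else:
            new = [process(clock, events)]
        for schedule in new:              # Push, ordered by f = g = clock
            heapq.heappush(fringe, (schedule[0], next(tie), schedule))
    return None


result = run()
```

Under these assumptions the first complete schedule popped has clock 57 and the same six events as the schedule found by hand. Because the fringe is ordered by the clock, no non-delay schedule of this data finishes earlier.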
Use Schedule-flow in practice • Design a schedule-flow with a graphic editor • We used YAWL in this case.
Use Schedule-flow in practice • Provide the files • Job file (e.g., “jobs.csv”) • Machine file • Schedule-flow XML file generated by the graphic editor • Parse the schedule-flow file and detect schedule-flow components by matching names. • We used a Python script to parse YAWL’s XML file. • Example task element from the XML file:
<task id="POP_93">
  <name>POP</name>
  <flowsInto>
    <nextElementRef id="SELECT_87" />
  </flowsInto>
  <join code="xor" />
  <split code="and" />
</task>
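The name-matching step might be sketched with the standard library's `xml.etree.ElementTree`, as below. This is not the project's script: the wrapping `<net>` element, the set of component names other than POP, and the function name are assumptions. Real YAWL files typically declare an XML namespace, in which case the tag lookups would need to account for it.

```python
import xml.etree.ElementTree as ET

# The task element from the slide, wrapped in a hypothetical root element
SNIPPET = """<net>
  <task id="POP_93">
    <name>POP</name>
    <flowsInto>
      <nextElementRef id="SELECT_87" />
    </flowsInto>
    <join code="xor" />
    <split code="and" />
  </task>
</net>"""

# Assumed component names; only POP appears verbatim in the slide's XML
COMPONENTS = {"CREATE", "POP", "SELECT", "ALLOCATE", "PROCESS", "PUSH"}


def _code(task, tag):
    """Return the 'code' attribute of a child element, or None if it is absent."""
    element = task.find(tag)
    return element.get("code") if element is not None else None


def detect_components(xml_text):
    """Map task id -> component info for tasks whose name matches a component."""
    root = ET.fromstring(xml_text)
    found = {}
    for task in root.iter("task"):
        name = (task.findtext("name") or "").strip().upper()
        if name in COMPONENTS:
            found[task.get("id")] = {
                "component": name,
                "next": [ref.get("id") for ref in task.iter("nextElementRef")],
                "join": _code(task, "join"),
                "split": _code(task, "split"),
            }
    return found


components = detect_components(SNIPPET)
```

For the snippet above, `components` holds one entry, `POP_93`, recognized as a POP component that flows into `SELECT_87`.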
Use Schedule-flow in practice • Configure the components, objective and heuristic • In this case, every component uses the default setting • The objective is to minimize the makespan, so the cost function is g(S)=S.clock • The heuristic is initially 0 (f(x)=g(x), see A*). For this case, we set the heuristic function to return the processing time of the future event (see future event) with the earliest end time • Bind: automatically connect the components according to the schedule-flow • Run the schedule-flow
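The cost and heuristic functions configured here can be sketched as follows. The `Event` and `Schedule` records are hypothetical containers, and returning 0 when a schedule has no future event is an assumption.

```python
from collections import namedtuple

Event = namedtuple("Event", "machine job start end")
Schedule = namedtuple("Schedule", "clock events")


def g(schedule):
    """Cost function: the clock of the schedule."""
    return schedule.clock


def h(schedule):
    """Heuristic: processing time of the future event with the earliest end time."""
    future = [e for e in schedule.events if e.start >= schedule.clock]
    if not future:
        return 0  # assumption: no future event means no estimate
    first = min(future, key=lambda e: e.end)
    return first.end - first.start


def f(schedule):
    return g(schedule) + h(schedule)


# Schedule at clock 0 with Job0 assigned to machine 0
s_start = Schedule(0, [Event(0, 0, 0, 7)])
# Schedule at clock 7: Job0 finished, Job4 and Job1 just assigned
s_later = Schedule(7, [Event(0, 0, 0, 7), Event(0, 4, 7, 22), Event(1, 1, 7, 16)])
```

For `s_start` the evaluation is 0+7=7. For `s_later` the future events are those of Job4 and Job1; Job1 ends first, so the evaluation is 7+9=16.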
Schedule-flow simulation S0 is the initial schedule where no jobs are assigned. S0={ }
Schedule-flow simulation • S1: (0, {<0,0,0,7>}) • S2: (0, {<0,2,0,21>}) • S3: (0, {<0,4,0,15>})
Schedule-flow simulation • f(S1)=0+7=7 • f(S3)=0+15=15 • f(S2)=0+21=21 • Fringe order: S1: (0, {<0,0,0,7>}), S3: (0, {<0,4,0,15>}), S2: (0, {<0,2,0,21>})