2011 4th International Conference on Biomedical Engineering and Informatics (BMEI)
A Recursive Algorithm for Scheduling of Tasks in a Heterogeneous Distributed Environment
Yan Kang
Department of Software Engineering
School of Software, Yunnan University
Kunming, Yunnan, China

Ying Lin
Department of Information Security
School of Software, Yunnan University
Kunming, Yunnan, China
978-1-4244-9352-4/11/$26.00 ©2011 IEEE

Abstract—Optimal scheduling of parallel tasks with precedence constraints is critical for achieving high performance in heterogeneous computing systems. The application scheduling problem is known to be NP-complete in general cases, and its complexity increases when task scheduling has to be done in a heterogeneous environment. This paper presents a recursive task scheduling algorithm for a bounded number of heterogeneous processors running on a network of heterogeneous systems. It is a three-phase task scheduling algorithm. The task-prioritizing phase computes the downward rank of each task and assigns a priority to every task. The processor selection phase schedules each task onto the processor that gives the latest start time for the task. The moving phase moves all possible tasks left until the start time of the entry task is zero. The performance of the algorithm is illustrated by comparing the schedule length ratio and the frequency of best results with those of existing effective scheduling algorithms, namely Heterogeneous Earliest Finish Time and Iterative List Scheduling.

Keywords-scheduling algorithm; heterogeneous; recursive; distributed system

I. INTRODUCTION

A heterogeneous environment, in which diverse sets of resources are interconnected by a high-speed network, enables cost-effective use of the underlying parallelism for scientific and commercial applications such as weather prediction, image processing, high-definition television, and real-time and distributed database systems. Efficient scheduling of an application's tasks on the available resources is critical for achieving high performance.

The general task scheduling problem comprises assigning the tasks of an application to suitable processors and ordering the task executions on each resource. The problem of optimal scheduling of tasks with precedence relationships has, in the most general case, been proven to be NP-complete [1,2], and optimal solutions can be achieved only if adequate time is available for an exhaustive search. Consequently, many heuristics have been proposed that give an approximate optimization in polynomial time. An application is characterized by a directed acyclic graph (DAG) in which the nodes represent application tasks and the edges represent inter-task data dependencies. The objective of task scheduling is to map the tasks onto the processors and order their execution so that the task precedence requirements are satisfied and a minimum overall completion time is obtained.

Because of its key importance for performance, this problem has been extensively studied, and various algorithms have been proposed in the literature [3-17], mainly for systems with homogeneous processors. These heuristics fall into a variety of categories, such as list scheduling algorithms, clustering algorithms [7-9], genetic algorithms [10-14] and task-duplication-based algorithms [15-17]. In list scheduling algorithms [4-6], the tasks in a list are assigned priorities and are mapped to the processors in descending order of priority. List scheduling algorithms are generally preferred since they generate good-quality schedules with lower complexity. Several list scheduling variants have been proposed for heterogeneous systems, for example Mapping Heuristic (MH) [4], Levelized-MinTime (LMT) [5], Dynamic-Level Scheduling (DLS), Heterogeneous Earliest Finish Time (HEFT) [6] and Critical Path On a Processor (CPOP). The HEFT algorithm significantly outperforms the DLS, MH, LMT and CPOP algorithms in terms of average schedule length ratio, speedup, etc.

Only a few algorithms in the literature address heterogeneous processors. We present a recursive heuristic scheduling algorithm for a bounded number of heterogeneous processors, with the objective of simultaneously achieving high performance and fast scheduling time. At each step, the recursive algorithm selects the task with the highest downward rank value and assigns it to the processor that maximizes its latest start time; finally, it moves all tasks towards the left as early as possible. The recursive algorithm schedules the tasks onto the processors from the exit task to the entry task, whereas conventional list scheduling algorithms such as HEFT, CPOP and LMT schedule the tasks from the entry task to the exit task. It is a three-phase task scheduling algorithm for a bounded number of heterogeneous processors, and it achieves high performance in terms of both performance metrics (schedule length ratio, speedup, efficiency and frequency of best results) and a cost metric (scheduling time).

II. TASK SCHEDULING PROBLEM

A scheduling system model consists of an application, a target computing system and a performance criterion for scheduling. An application program is represented by a directed acyclic graph (DAG), G = (V, E), where V is the set of v task nodes and E is the set of e directed communication edges between the tasks. Each edge e_{i,j} represents the precedence constraint that task v_j cannot be scheduled until task v_i has been completed; hence v_i is a predecessor of v_j and v_j is a
successor of v_i. In a given task graph, a task without any parent is called an entry task, and a task without any child is called an exit task. Without loss of generality, it is assumed that the DAG has one entry task and one exit task. If there is more than one exit (entry) task, they are connected to a pseudo-exit (pseudo-entry) task with zero computation time and zero communication time. A heterogeneous computing system P consists of a set of p independent processors of different types, which are assumed to be fully interconnected by an arbitrary network. The estimated execution time w_{i,j} to complete task v_i on processor p_j may differ from processor to processor, depending on each processor's computational capability.

D is a v × v matrix of communication data, where d_{i,j} is the amount of data to be transmitted from task v_i to task v_j. The communication cost of edge e_{i,k}, which transfers data from task v_i (scheduled on processor p_m) to task v_k (scheduled on processor p_n), is

$$c_{i,k} = \frac{d_{i,k}}{r_{m,n}} \tag{1}$$

where r_{m,n} is the communication speed of the link between processors p_m and p_n. The communication time also depends on the channel initialization at both the sender processor p_m and the receiver processor p_n, in addition to the transfer time on the channel. In general this initialization cost is a dominant factor and can be assumed to be independent of the source and destination processors; in this study, however, the channel initialization time is assumed to be negligible. When tasks v_i and v_k are scheduled on the same processor, d_{i,k} is taken as 0, so the communication cost vanishes. Further, for illustration, we assume that the data transfer rate of each link is 1.0, so the communication cost equals the amount of data to be transferred.

The objective of the task scheduling problem is to schedule the tasks of an application onto the processors such that the schedule length is minimized.

III. HEFT SCHEDULING ALGORITHM

The Heterogeneous Earliest Finish Time (HEFT) algorithm is a two-phase task scheduling algorithm for a bounded number of heterogeneous processors. The first phase, namely the task-prioritizing phase, assigns a priority to every task. To do so, the upward rank of each task is computed: the upward rank of a task is the length of its critical path to the exit task, i.e., the highest sum of communication times and average execution times along any path from that task to the exit task. Priorities are then assigned based on the upward rank. The second phase (processor selection phase) schedules each task onto the processor that gives the earliest finish time for the task. It uses an insertion-based policy, which considers inserting a task into the earliest idle time slot between two already-scheduled tasks on a processor; the slot must be at least as long as the computation cost of the task to be scheduled, and scheduling in the slot must preserve the precedence constraints. The time complexity of HEFT is O(v^2 × p), where v is the number of tasks in a dense graph and p is the number of processors. HEFT significantly outperforms the dynamic-level scheduling (DLS) algorithm, the mapping heuristic (MH) and the levelized-min time (LMT) algorithm in terms of average schedule length ratio, speedup and so on.

IV. A RECURSIVE SCHEDULING ALGORITHM

We note that the list scheduling algorithms above consider only a locally optimal choice: they schedule the current task onto the processor that gives the earliest finish time for that task. As a result, a list scheduling algorithm cannot obtain the optimal solution for some scheduling problems. Fig. 1 shows a DAG with four tasks and four edges, and two processors are available in the heterogeneous computing system. Table 1 shows the computation time of each task on every processor. For simplicity, we assume homogeneous communication, and the communication times are as labeled on the edges in Fig. 1. Table 2 shows the start and finish times of all the tasks as obtained by the HEFT algorithm. The optimal schedule length is 48, which is less than the schedule length of 59 obtained by HEFT. Moreover, the optimal schedule length cannot be obtained merely by changing the order of the tasks.

Fig 1. A sample task graph with 4 tasks

TABLE 1 Computation time of Fig 1

V    1      2      3      4
P1   16     13     11     8
P2   9      19     19     17

TABLE 2 Schedule of Fig 1 by HEFT algorithm

V    1      2      3      4
P1                 21-32  47-55
P2   0-9    9-28

TABLE 3 Optimal schedule of Fig 1

V    1      2      3      4
P1   0-16   16-29  29-40  40-48
P2

We now present a recursive algorithm that schedules the tasks onto the processors from the exit task to the entry task, whereas list scheduling algorithms such as HEFT, CPOP and LMT normally schedule the tasks from the entry task to the exit task. The recursive algorithm orders the tasks by computing the downward rank of each task, whereas the other list algorithms normally order the tasks by computing the upward rank. It is a three-phase task scheduling algorithm for a bounded number of heterogeneous processors.

Task prioritizing phase:
The first phase, namely the task-prioritizing phase, assigns a priority to every task based on its downward rank. To assign priorities, the downward rank of each task is computed as the
critical path of that task, i.e., the highest sum of communication time weights and execution time weights along any path from the entry task to that task. The priority of task v_i is

$$\mathrm{rank}_d(v_i) = \bar{w}_i + \max_{v_j \in \mathrm{pred}(v_i)} \left( \mathrm{rank}_d(v_j) + \bar{c}_{j,i} \right), \qquad \mathrm{rank}_d(v_{\mathrm{entry}}) = \bar{w}_{\mathrm{entry}} \tag{2}$$

while the priority of task v_i under the HEFT strategy is

$$\mathrm{rank}_u(v_i) = \bar{w}_i + \max_{v_j \in \mathrm{succ}(v_i)} \left( \bar{c}_{i,j} + \mathrm{rank}_u(v_j) \right), \qquad \mathrm{rank}_u(v_{\mathrm{exit}}) = \bar{w}_{\mathrm{exit}} \tag{3}$$

where pred(v_i) is the set of immediate predecessors of task v_i, succ(v_i) is the set of immediate successors of task v_i, $\bar{c}_{i,j}$ is the average communication cost of edge e_{i,j} (equal to d_{i,j} when every link has transfer rate 1.0), and $\bar{w}_i$, the time weight of task v_i, is the average computation cost of task v_i:

$$\bar{w}_i = \frac{1}{p} \sum_{j=1}^{p} w_{i,j} \tag{4}$$

Processor selection phase:
The second phase (processor selection phase) of our recursive algorithm schedules each task v_i onto the processor p_j that gives the latest start time for the task, whereas the second phase of HEFT schedules each task onto the processor that gives the earliest finish time. LST(v_i, p_j) and LFT(v_i, p_j) are the Latest Start Time and Latest Finish Time of task v_i on p_j, respectively. For the exit task v_exit, LFT(v_exit, p_j) = deadline; for the other tasks in the graph, the LST and LFT values are computed recursively, starting from the exit task, as shown in (5) and (6). To compute the LFT and LST of a task v_i, all immediate successor tasks of v_i must already have been scheduled.

$$\mathrm{LFT}(v_i, p_j) = \min\left\{ \mathrm{AvailS}(v_i, p_j),\ \min_{v_k \in \mathrm{succ}(v_i)} \left( \mathrm{LST}(v_k) - c_{i,k} \right) \right\} \tag{5}$$

$$\mathrm{LST}(v_i, p_j) = \mathrm{LFT}(v_i, p_j) - w_{i,j} \tag{6}$$

where succ(v_i) is the set of immediate successor tasks of v_i, LST(v_k) is the latest start time of successor v_k on the processor to which it has already been assigned (c_{i,k} is zero when that processor is p_j), and AvailS(v_i, p_j) is the start time of the task most recently assigned to processor p_j, i.e., the latest time by which v_i can finish on p_j.

HEFT, in contrast, schedules each task onto the processor that gives the earliest finish time for the task. EST(v_i, p_j) and EFT(v_i, p_j) are the Earliest Start Time and Earliest Finish Time of task v_i on p_j, respectively. For the entry task v_entry, EST(v_entry, p_j) = 0; for the other tasks in the graph, the EST and EFT values are computed recursively, starting from the entry task, as shown in (7) and (8). To compute the EST and EFT of a task v_i, all immediate predecessor tasks of v_i must already have been scheduled.

$$\mathrm{EST}(v_i, p_j) = \max\left\{ \mathrm{Avail}(v_i, p_j),\ \max_{v_m \in \mathrm{pred}(v_i)} \left( \mathrm{AFT}(v_m) + c_{m,i} \right) \right\} \tag{7}$$

$$\mathrm{EFT}(v_i, p_j) = w_{i,j} + \mathrm{EST}(v_i, p_j) \tag{8}$$

where pred(v_i) is the set of immediate predecessor tasks of v_i, AFT(v_m) is the actual finish time of the already-scheduled predecessor v_m, and Avail(v_i, p_j) is the earliest time at which processor p_j has completed the execution of its last assigned task or, under the insertion-based scheduling policy, the start of an idle slot between already-assigned tasks. The inner max block in the EST equation returns the ready time, i.e., the time when all the data needed by v_i has arrived at processor p_j. After all tasks in a graph are scheduled, the schedule length (i.e., the overall completion time) is the actual finish time of the exit task.

Moving phase:
The third phase (moving phase) of our recursive algorithm moves the tasks towards the left until the start time of the entry task is zero. The actual start time and actual finish time of each task are given as

$$\mathrm{AST}(v_i) = \mathrm{LST}(v_i, p_j) - \mathrm{LST}(v_{\mathrm{entry}}, p_e) \tag{9}$$

$$\mathrm{AFT}(v_i) = \mathrm{AST}(v_i) + w_{i,j} \tag{10}$$

where p_j and p_e are the processors to which v_i and the entry task v_entry were assigned in the processor selection phase.

The procedure for the recursive scheduling algorithm is given as follows:

  Compute the priorities of the tasks with Eq. (2), downward from the entry task
  Sort the tasks into a scheduling list in non-increasing order of priority values
  While the scheduling list is not empty do
  begin
      Remove the first task v_i from the scheduling list
      Assign task v_i to the processor that maximizes the LST of v_i
  end
  Move all tasks towards the left as early as possible
  Return the current schedule length

Our algorithm thus proceeds from the exit task to the entry task, assigning each selected task to the processor that maximizes its latest start time.

V. PERFORMANCE ANALYSES AND DISCUSSION

Fig. 2 shows a DAG with ten tasks and 15 edges, and three processors are available in the heterogeneous computing system. Table 1 shows the computation time of each task on every processor. For simplicity, we assume homogeneous communication, and the communication times are as labeled on the edges in Fig. 2. Tables 2 and 3 show the start and finish times of all the tasks as obtained by the HEFT algorithm and by the recursive algorithm, respectively. The schedule length obtained by the recursive algorithm is 84, which is less than the schedule length of 106 obtained by HEFT.

Fig 2. Example of a task graph with 10 tasks
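The three phases described in Section IV can be sketched in Python. This is a minimal, non-authoritative sketch that rests on several assumptions not stated in the paper: the downward rank takes the form of Eq. (2) with the average weights of Eq. (4), the backward pass never inserts a task into an idle slot, AvailS is tracked as the start time of the most recently assigned task on each processor, ties between processors go to the lower index, and the moving phase is the uniform shift of Eqs. (9) and (10). The function name `recursive_schedule` and the example graph are hypothetical: the computation times are taken from Table 1 of Fig. 1, but the diamond-shaped edge set and the cost of 10 on every edge are invented, because the edge labels of Fig. 1 are not reproduced in the text.

```python
from collections import defaultdict


def recursive_schedule(w, edges, deadline=0.0):
    """Hypothetical sketch of backward (exit-to-entry) scheduling.

    w[i][j]  -- computation time of task i on processor j (w_{i,j})
    edges    -- {(i, k): c_ik}, communication cost of edge e_{i,k}; the
                transfer rate is assumed to be 1.0, so cost == d_{i,k}
    deadline -- LFT of the exit task; any value works, phase 3 shifts it away
    Assumes one entry task, one exit task and no insertion into idle slots.
    """
    n, p = len(w), len(w[0])
    pred, succ = defaultdict(list), defaultdict(list)
    for i, k in edges:
        succ[i].append(k)
        pred[k].append(i)

    # Phase 1: downward rank as in Eq. (2), using the averages of Eq. (4).
    w_avg = [sum(row) / p for row in w]
    rank = {}

    def rank_d(i):
        if i not in rank:
            rank[i] = w_avg[i] + max(
                (rank_d(j) + edges[(j, i)] for j in pred[i]), default=0.0)
        return rank[i]

    # Non-increasing rank places every task after all of its successors.
    order = sorted(range(n), key=rank_d, reverse=True)

    # Phase 2: walk from the exit task back to the entry task and put each
    # task on the processor that maximizes its LST (Eqs. (5) and (6)).
    avail_s = [deadline] * p  # start of the earliest task placed on each processor
    proc, lst = {}, {}
    for i in order:
        best_lst, best_j = None, None
        for j in range(p):
            lft = avail_s[j]
            for k in succ[i]:
                c = 0.0 if proc[k] == j else edges[(i, k)]  # free on same processor
                lft = min(lft, lst[k] - c)
            cand = lft - w[i][j]
            if best_lst is None or cand > best_lst:
                best_lst, best_j = cand, j
        lst[i], proc[i] = best_lst, best_j
        avail_s[best_j] = best_lst

    # Phase 3: shift all tasks left so the entry task starts at 0 (Eqs. (9), (10)).
    entry = next(i for i in range(n) if not pred[i])
    shift = lst[entry]
    ast = {i: lst[i] - shift for i in range(n)}
    aft = {i: ast[i] + w[i][proc[i]] for i in range(n)}
    return max(aft.values()), proc, ast, aft


# Computation times from Table 1 of Fig. 1 (tasks 1-4 become indices 0-3);
# the diamond edge set and the cost of 10 per edge are assumptions.
w = [[16, 9], [13, 19], [11, 19], [8, 17]]
edges = {(0, 1): 10, (0, 2): 10, (1, 3): 10, (2, 3): 10}
length, proc, ast, aft = recursive_schedule(w, edges)
print(length, proc)  # expected: 48.0, with every task on processor index 0 (P1)
```

Under these assumed edges the sketch places every task on P1 and returns a schedule length of 48, the same length as the optimal schedule in Table 3 of Fig. 1 (with tasks 2 and 3 in the opposite order); other edge costs can change both the assignment and the length. Reading the moving phase as a left-compaction pass, where each task starts as soon as its processor and inputs allow, would be an alternative design.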
TABLE 1 COMPUTATION TIME OF FIG 2

V    1   2   3   4   5   6   7   8   9   10
P1   14  13  11  13  12  13  7   5   18  21
P2   16  19  13  8   13  16  15  11  12  7
P3   9   18  19  17  10  9   11  14  20  16

TABLE 2 SCHEDULE OF FIG 2 BY HEFT ALGORITHM

V    1      2      3      4      5      6      7      8      9      10
P1                 21-32                       32-39  53-58
P2                        18-26  26-39                       48-60  99-106
P3   0-9    9-27                        27-36

TABLE 3 SCHEDULE OF FIG 2 BY THE RECURSIVE ALGORITHM

P1   34-47, 18-31, 39-54, 66-77
P2   26-39, 54-66, 77-84
P3   0-9, 31-41, 22-31

In fact, this strategy can cooperate with other scheduling algorithms to decrease the schedule lengths they obtain.

We present a comparative evaluation, by simulation, of the proposed recursive algorithm and existing algorithms for heterogeneous systems, such as ILS (iterative list scheduling), HEFT and CPOP, on DAGs with various characteristics. For this purpose, we consider two sets of graphs as the workload for testing the algorithms: randomly generated task graphs and graphs that represent some numerical real-world problems. We used Intel Xeon processors with 1 GHz speed for our experiments.

The performance and cost of the algorithms were compared over a set of experiments with various graph characteristics. We investigate how the various parameters of the algorithm affect the degree to which the schedules are improved by the recursive strategy.

In addition to randomly generated DAGs, we also ran the recursive algorithm on two real-world problems: DSP [18] and Gaussian elimination [19]. The experiments show that the average improvement ratio increases with the percentage of improved cases when the ratio of communication cost to execution time is large. The results show that the recursive algorithm is more effective when the ratio of the number of tasks to the number of processors is neither too small nor too large.

VI. CONCLUSION

In this paper, a recursive algorithm for heterogeneous distributed computing systems is proposed and studied. We use a three-phase recursive policy. The first phase, namely the task-prioritizing phase, computes the downward rank of each task and assigns a priority to every task. The second phase (processor selection phase) schedules each task onto the processor that gives the latest start time for the task. The third phase (moving phase) moves all tasks towards the left as early as possible. The time complexity of the recursive algorithm is O(v^2 × p), where v is the number of tasks in a dense graph and p is the number of processors. We observe the percentage of cases that result in a shorter schedule length and the average improvement ratio for randomly generated task graphs under various parameters and for two real applications. When the ratio of task communication cost to computation cost is small, the recursive algorithm does not perform well; but when this ratio exceeds a certain threshold, the final schedule is improved in most of the simulated cases. We also observe that both the percentage of cases in which the final schedule length is shorter than the initial one and the average improvement ratio are sensitive to the graph structure and to the initial schedule. Generally, the recursive algorithm makes larger improvements when the initial schedule is long, and it is more effective when the graph structure is more flexible.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (Grant No. 60763008) and by “CDIO-based software system modeling and design research and implementation” (Grant No. Rj14).

REFERENCES

[1] Graham, R.L., E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, “Optimization and approximation in deterministic sequencing and scheduling: A survey,” Ann. Discrete Math., pp. 287-326, 1979.
[2] Casavant, T. and J.A. Kuhl, “Taxonomy of scheduling in general purpose distributed memory systems,” IEEE Trans. Software Eng., vol. 14, pp. 141-154, 1988.
[3] Hui, C.C. and S.T. Chanson, “Allocating task interaction graphs to processors in heterogeneous networks,” IEEE Trans. Parallel and Distributed Systems, vol. 8, pp. 908-926, 1997.
[4] El-Rewini, H. and T.G. Lewis, “Scheduling parallel program tasks onto arbitrary target machines,” J. Parallel and Distributed Computing, vol. 9, pp. 138-153, 1990.
[5] Iverson, M., F. Ozguner and G. Follen, “Parallelizing existing applications in a distributed heterogeneous environment,” Proc. Heterogeneous Computing Workshop, pp. 93-100, 1995.
[6] Topcuoglu, H., S. Hariri and M.Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 3, 2002.
[7] Kafil, M. and I. Ahmad, “Optimal task assignment in heterogeneous distributed computing systems,” IEEE Concurrency, vol. 6, pp. 42-51, 1998.
[8] Ranaweera, A. and D.P. Agrawal, “A task duplication based algorithm for heterogeneous systems,” Proc. IPDPS, pp. 445-450, 2000.
[9] Boeres, C., J. Viterbo Filho and V.E.F. Rebello, “A cluster-based strategy for scheduling task on heterogeneous processors,” Proc. 16th Symp. on Computer Architecture and High Performance Computing (SBAC-PAD), 2004.
[10] Wang, L., H.J. Siegel, V.P. Roychowdhury and A.A. Maciejewski, “Task matching and scheduling in heterogeneous computing environments using a genetic algorithm-based approach,” J. Parallel and Distributed Computing, vol. 47, pp. 8-22, 1997.
[11] Dhodhi, M.K., I. Ahmad and A. Yatama, “An integrated technique for task matching and scheduling onto distributed heterogeneous computing systems,” J. Parallel and Distributed Computing, vol. 62, pp. 1338-1361, 2002.
[12] Kim, S.C. and S. Lee, “Push-pull: Guided search DAG scheduling for heterogeneous clusters,” Proc. Intl. Conf. Parallel Processing (ICPP’05), 2005.
[13] Wu, A.S., H. Yu, S. Jin and K.-C. Lin, “An incremental genetic algorithm approach to multiprocessor scheduling,” IEEE Trans. Parallel and Distributed Systems, vol. 15, pp. 824-834, 2004.
[14] Braun, T.D., H.J. Siegel, N. Beck, L.L. Bölöni et al., “A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems,” Proc. 8th Workshop on Heterogeneous Processing, pp. 15-29, 1999.
[15] Ahmad, I. and Y. Kwok, “On exploiting task duplication in parallel program scheduling,” IEEE Trans. Parallel and Distributed Systems, vol. 9, pp. 872-892, 1998.
[16] Baskiyar, S. and P.C. SaiRanga, “Scheduling directed a-cyclic task graphs on heterogeneous network of workstations to minimize schedule length,” Proc. ICPPW, 2003.
[17] Bajaj, R. and D.P. Agrawal, “Improving scheduling of tasks in a heterogeneous environment,” IEEE Trans. Parallel and Distributed Systems, vol. 15, pp. 107-118, 2004.
[18] Wu, M.-Y. and D.D. Gajski, “Hypertool: a programming aid for message-passing systems,” IEEE Trans. Parallel and Distributed Systems, vol. 1, no. 3, pp. 330-343, 1990.
[19] Yang, T. and A. Gerasoulis, “DSC: scheduling parallel tasks on an unbounded number of processors,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 9, pp. 951-967, 1994.