Design and Implementation of a Flexible Cluster-Scheduling Framework
Publication Type: MS Thesis
In the past, centralized supercomputers were the main source of computing power for those needing hundreds to thousands of processor hours. The schedulers for these systems were usually first-in-first-out (FIFO) queues, and reservations for special allocations had to be arranged by contacting the administrators. As powerful workstations became more common and people realized how many cycles were going unused, systems such as Condor emerged to harvest those idle cycles. Now, however, small clusters of twenty to a hundred or more dedicated compute nodes are becoming more common. These clusters are owned by diverse organizations with varied scheduling needs. Applying the FIFO schedulers of supercomputers or the cycle-harvesting schedulers designed for idle workstations often leads to scheduling policies that are suboptimal for the owners of the cluster. For this reason, we have designed and implemented a flexible cluster-scheduling framework. This framework allows for easy implementation of different scheduling strategies. It provides a robust system for changing the information stored about jobs, how jobs are scheduled, and how jobs are monitored. Furthermore, it allows for the implementation of scheduling strategies that understand the run-time systems of the applications running on the cluster. Such strategies can use advanced features, such as checkpointing and the shrinking and expanding of jobs, to make the best scheduling decisions possible.
Pauli, Esteban, "Design and Implementation of a Flexible Cluster-Scheduling Framework", MS Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 2006.