Faucets: Shared Computing Power

The Faucets client interface.

As scientists write more and more parallel applications that require large amounts of computational resources, large parallel machines will become more commonplace. However, these machines are often prohibitively expensive for any single user and so must be shared by many users. Furthermore, the machines tend to go through periods of low and high utilization. During the periods of high utilization, users might not be able to get enough computational resources to get results in time to meet a deadline. During the periods of low utilization, the computational power sits idle while someone across the country may be wishing they had just a few more nodes so that they could run their program.

The Faucets system addresses this imbalance in utilization by allowing subscribers to share their resources. In the Faucets system, computational power is viewed as a utility: a supercomputer produces the utility and its users consume it. When users find that the cluster they normally use is not providing enough of the utility, they can "borrow" some from another subscribing cluster that is currently underused. Later, when the situation is reversed, a user of the second cluster might run an application on the first. Faucets provides both the technical framework and the economic model to facilitate the discovery, distribution, and pricing of computational power.

As part of our Faucets project, we present several complementary ideas and their implementation that improve the efficiency of parallel servers in the Grid environment:

  • Bartering: a system that permits cluster maintainers to exchange computational power with each other. Jobs are submitted to the Faucets system with a Quality of Service requirement, and subscribing clusters return bids; the best bid that meets all criteria is selected. When the bidding cluster successfully runs an application, that cluster is awarded bartering units, which its users can later trade for the use of resources on other subscribing clusters.
  • Adaptive Jobs: a system that supports adaptive jobs (also known as malleable and/or evolving jobs) that can shrink and expand to a variable number of processors at run-time, and an Adaptive Job Scheduler for timeshared parallel machines that exploits this ability.
  • On-Demand Computing: users can get the resources they need when they need them. Furthermore, they can specify that their jobs need to be run immediately. High priority jobs can preempt low priority jobs so that computational resources are used as efficiently as possible.
  • Scheduler: we provide a scheduling system for clusters which can either be used in conjunction with Faucets or as a stand-alone scheduler.
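The bidding step described under Bartering can be illustrated with a minimal sketch. It is not the Faucets implementation: the QoS fields, the `Bid` structure, the rule that the "best" bid is the cheapest qualifying one, and the idea that the submitter's home cluster is debited are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QoSRequirement:
    """Hypothetical QoS contract; the fields are illustrative only."""
    min_processors: int
    deadline_hours: float
    max_price: float  # in bartering units


@dataclass
class Bid:
    """Hypothetical bid returned by a subscribing cluster."""
    cluster: str
    processors: int
    completion_hours: float
    price: float  # in bartering units


def meets_requirement(bid: Bid, qos: QoSRequirement) -> bool:
    """A bid qualifies only if it satisfies every QoS criterion."""
    return (bid.processors >= qos.min_processors
            and bid.completion_hours <= qos.deadline_hours
            and bid.price <= qos.max_price)


def select_bid(bids: List[Bid], qos: QoSRequirement) -> Optional[Bid]:
    """Pick the cheapest qualifying bid (assumed meaning of "best").

    Ties on price go to the earlier completion time.
    Returns None when no bid meets the requirement.
    """
    qualifying = [b for b in bids if meets_requirement(b, qos)]
    if not qualifying:
        return None
    return min(qualifying, key=lambda b: (b.price, b.completion_hours))


def settle(balances: Dict[str, float], home_cluster: str, winner: Bid) -> None:
    """Credit the winning cluster after a successful run.

    Debiting the submitter's home cluster is an assumption of this sketch.
    """
    balances[home_cluster] = balances.get(home_cluster, 0.0) - winner.price
    balances[winner.cluster] = balances.get(winner.cluster, 0.0) + winner.price


qos = QoSRequirement(min_processors=64, deadline_hours=12.0, max_price=100.0)
bids = [
    Bid("cluster-a", 128, 6.0, 90.0),
    Bid("cluster-b", 64, 10.0, 70.0),
    Bid("cluster-c", 32, 4.0, 40.0),  # rejected: too few processors
]
winner = select_bid(bids, qos)
balances = {"home": 100.0}
if winner is not None:
    settle(balances, "home", winner)
```

Here `cluster-c` offers the lowest price but fails the processor requirement, so the selection falls to the cheapest bid that meets every criterion.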
Adaptive jobs should be written to use our version of MPI (called AMPI for Adaptive MPI), Charm++, or other supported languages.
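To make the shrink-and-expand idea concrete, the following sketch recomputes processor allocations whenever the set of running jobs changes. It is a simplified illustration, not the Faucets Adaptive Job Scheduler: the `AdaptiveJob` type, its `min_procs` and `max_procs` fields, and the round-robin expansion policy are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AdaptiveJob:
    """Hypothetical adaptive job that can run on min_procs..max_procs."""
    name: str
    min_procs: int
    max_procs: int


def allocate(jobs: List[AdaptiveJob], total_procs: int) -> Dict[str, int]:
    """Assign processors to jobs, considered in arrival order.

    Each job is first admitted at its minimum size while room remains.
    Leftover processors are then handed out one at a time, round-robin,
    until every admitted job is at its maximum or nothing is left.
    Jobs that do not fit at their minimum are omitted (they would wait).
    """
    alloc: Dict[str, int] = {}
    admitted: List[AdaptiveJob] = []
    free = total_procs

    # Admission pass: shrink everyone to the minimum so new jobs can fit.
    for job in jobs:
        if job.min_procs <= free:
            alloc[job.name] = job.min_procs
            free -= job.min_procs
            admitted.append(job)

    # Expansion pass: grow jobs toward their maximum with spare processors.
    progress = True
    while free > 0 and progress:
        progress = False
        for job in admitted:
            if free > 0 and alloc[job.name] < job.max_procs:
                alloc[job.name] += 1
                free -= 1
                progress = True
    return alloc


a = AdaptiveJob("a", min_procs=8, max_procs=64)
b = AdaptiveJob("b", min_procs=16, max_procs=32)

alone = allocate([a], total_procs=64)       # "a" expands to the whole machine
shared = allocate([a, b], total_procs=64)   # "a" shrinks when "b" arrives
```

Calling `allocate` again after a job arrives or finishes yields the new target sizes; a real scheduler would then ask each running job to shrink or expand to match.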

Technical Overview of the Faucets System
Cluster Scheduler Manual
 

News
Oct. 19, 2005: The source code available for download through this page has been updated to match the version in CVS. This version fixes some bugs and adds more options to the cluster scheduler.

The Faucets project recently received funding through the NCSA/UIUC Faculty Fellows Program. Read the official announcement.

Software
Below are the download links and installation instructions. Please note that this software is still in development. There are still many things to fix and features to add.
People
Papers
  • 04-09    L. V. Kale, Sameer Kumar, Jayant DeSouza, Mani Potnuru, and Sindhura Bandhakavi, Faucets: Efficient Resource Allocation on the Computational Grid, Proceedings of the 2004 International Conference on Parallel Processing (ICPP 2004), 15-18 August 2004, Montreal, Quebec, Canada.
  • 03-14    Sindhura Bandhakavi, Analyzing Bidding Strategies For Schedulers In A Simulated Multiple-Cluster Market Driven Environment, Master's Thesis, Dept. of Computer Science, University of Illinois, 2003.
  • 03-01    L. V. Kale, Sameer Kumar, Jayant DeSouza, Mani Potnuru, and Sindhura Bandhakavi, Faucets: Efficient Resource Allocation on the Computational Grid, PPL Technical Report 03-01, University of Illinois at Urbana-Champaign, Mar 2003.
  • 02-01    L. V. Kale, Sameer Kumar, and Jayant DeSouza, A Malleable-Job System for Timeshared Parallel Machines, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2002), May 21-24, 2002, Berlin, Germany.
  • 01-06    Sameer Kumar, Thesis: An Adaptive Job Scheduler for Timeshared Parallel Machines, Master's Thesis, Dept. of Computer Science, University of Illinois, 2001.
  • 00-02    L. V. Kale, Sameer Kumar, and Jayant DeSouza, An Adaptive Job Scheduler for Timeshared Parallel Machines, PPL Technical Report 00-02, University of Illinois at Urbana-Champaign, Sep 2000.

This page is maintained by Eric Bohm.