1. Cluster Manager (CM)
    1. Overview
    2. The Cluster Manager executes jobs and their associated processes on the physical cluster. The main components of the Cluster Manager are the CM Database and the Scheduler.

    3. CM Database
    4. The CM Database stores job information so that jobs run efficiently and can be restarted if an error occurs. The fields of this database correspond closely to the scheduler's Job class. Instructions for creating the database are provided in the Installation and Usage section (Section 1.4). While a job is running, the processors it uses are controlled by the bitmap field in the database.
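
To show what the bitmap field can hold, here is a minimal C++ sketch. It assumes the field is stored as a string of '0'/'1' characters, with one character per processor. The class name NodeBitmap and its methods are hypothetical and are not part of the Cluster Manager source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: records which processors a job occupies and
// serializes that set as '0'/'1' characters (processor 0 first), a form
// that could be stored in a text column of the CM Database.
class NodeBitmap {
public:
    explicit NodeBitmap(std::size_t num_nodes) : bits_(num_nodes, false) {}

    // Mark processor `node` as used by (true) or released from (false) the job.
    void set(std::size_t node, bool in_use) {
        assert(node < bits_.size() && "processor index out of range");
        bits_[node] = in_use;
    }

    bool test(std::size_t node) const {
        assert(node < bits_.size() && "processor index out of range");
        return bits_[node];
    }

    // Number of processors currently assigned to the job.
    std::size_t count() const {
        std::size_t n = 0;
        for (bool b : bits_) n += b ? 1 : 0;
        return n;
    }

    // Encode the bitmap, e.g. processors 1 and 2 of 4 -> "0110".
    std::string to_string() const {
        std::string s;
        s.reserve(bits_.size());
        for (bool b : bits_) s.push_back(b ? '1' : '0');
        return s;
    }

    // Rebuild a bitmap from its encoded form, e.g. when restarting after an error.
    static NodeBitmap from_string(const std::string& s) {
        NodeBitmap bm(s.size());
        for (std::size_t i = 0; i < s.size(); ++i) bm.bits_[i] = (s[i] == '1');
        return bm;
    }

private:
    std::vector<bool> bits_;
};
```

Text storage keeps the field readable in a graphical database tool. A packed integer column would be more compact, but it limits the cluster size to the integer's width.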

    5. Scheduler
    6. The Scheduler determines how, when, and whether jobs run. This includes deciding when each job starts and how many processors it runs on, and which ones. If a job is migratable, the scheduler expands and contracts it as necessary. The scheduling strategy is selected at compile time. Three strategies are provided, each derived from the class SchedulingStrategy: a Basic strategy, a Limit FIFO strategy, and a Gantt Chart strategy. Their details and differences are described in Section 1.3.3.

      1. Abstract SchedulingStrategy Class
      2. Because the strategies inherit from SchedulingStrategy, each must implement the four abstract functions it declares. The scheduler calls these functions to make scheduling decisions.

      3. Functions
      4. addjob(char *jobname, int numnodes) - This adds a job to the strategy with a specified number of nodes.

        removejob(char *jobname, int numnodes) - This function removes a job from the strategy.

        is_available(Job *j, Job *waitq, Job *runq) - This function tests whether the cluster can accept the job and returns a floating point value. A value of -1 means the job cannot be accepted. Any other value can be interpreted by the cluster daemon for bidding purposes. Generally, the value returned is the cluster utilization.

        allocate_nodes(Job *waitq, Job *runq) - This function is called to allocate nodes using the selected strategy.
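
The four functions above can be sketched as a C++ abstract base class. This is an illustration, not the actual header. The Job struct is a hypothetical stand-in for the scheduler's Job class. The void return types of addjob, removejob, and allocate_nodes are assumptions, since only is_available's floating point return is documented. ToyStrategy is a hypothetical example. It accepts a job only when the job's minnodes fit into the idle nodes, and otherwise reports utilization.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal Job record; the real Job class has more fields.
// Jobs are chained into queues through `next`, matching the Job *waitq /
// Job *runq parameters listed above.
struct Job {
    std::string name;
    int minnodes = 1;
    int nodes_in_use = 0;
    Job* next = nullptr;
};

// Sketch of the abstract base class; the void return types are assumptions.
class SchedulingStrategy {
public:
    explicit SchedulingStrategy(int num_nodes) : num_nodes_(num_nodes) {
        assert(num_nodes > 0 && "a cluster needs at least one node");
    }
    virtual ~SchedulingStrategy() = default;

    virtual void addjob(char* jobname, int numnodes) = 0;
    virtual void removejob(char* jobname, int numnodes) = 0;
    // -1 if the job cannot be accepted; otherwise a value for bidding.
    virtual float is_available(Job* j, Job* waitq, Job* runq) = 0;
    virtual void allocate_nodes(Job* waitq, Job* runq) = 0;

protected:
    int num_nodes_;
};

// Hypothetical toy strategy that only tracks how many nodes are busy.
class ToyStrategy : public SchedulingStrategy {
public:
    using SchedulingStrategy::SchedulingStrategy;

    void addjob(char* /*jobname*/, int numnodes) override { busy_ += numnodes; }
    void removejob(char* /*jobname*/, int numnodes) override { busy_ -= numnodes; }

    float is_available(Job* j, Job* /*waitq*/, Job* /*runq*/) override {
        if (j->minnodes > num_nodes_ - busy_) return -1.0f;  // cannot accept
        return static_cast<float>(busy_) / num_nodes_;       // current utilization
    }

    // Walk the wait queue in order and start each job whose minnodes still fit.
    void allocate_nodes(Job* waitq, Job* /*runq*/) override {
        for (Job* j = waitq; j != nullptr; j = j->next) {
            if (j->nodes_in_use == 0 && j->minnodes <= num_nodes_ - busy_) {
                j->nodes_in_use = j->minnodes;
                busy_ += j->minnodes;
            }
        }
    }

    int busy() const { return busy_; }

private:
    int busy_ = 0;
};
```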

      5. Strategies
      6. The following subsections describe the Basic, Limit FIFO, and Gantt Chart strategies. Each strategy schedules jobs using information provided by the user. This information is listed in the strategy's Requirements subsection. Each strategy's Overview subsection describes how it prioritizes jobs.

        1. Basic Strategy
          1. Requirements
          2. The basic strategy needs only the job's minimum and maximum number of nodes. If these values are not provided, the minimum defaults to one node and the maximum defaults to the cluster's total number of nodes.

          3. Overview
          4. The basic strategy is a simple FIFO strategy: jobs are run in the order they are submitted, with no exceptions. When jobs are waiting in the queue, the scheduler tries to 'make room' for them by taking nodes from migratable jobs. It never reduces a job below its minnodes value.
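
The 'make room' step can be sketched as follows. The sketch assumes each running job records its current node count, its minnodes, and whether it is migratable. RunningJob and make_room are hypothetical names. Shrinking jobs in the order they appear is also an assumption, and the real scheduler may choose which jobs to contract differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical view of a running job, for illustration only.
struct RunningJob {
    int nodes;        // nodes currently assigned
    int minnodes;     // lower bound the job must never drop below
    bool migratable;  // only migratable (adaptive) jobs can be contracted
};

// Try to free `needed` nodes by contracting migratable running jobs, never
// taking a job below its minnodes. Applies the contraction and returns true
// only if enough nodes can be reclaimed; otherwise leaves every job unchanged.
bool make_room(std::vector<RunningJob>& running, int needed) {
    assert(needed >= 0);
    int reclaimable = 0;
    for (const RunningJob& r : running)
        if (r.migratable) reclaimable += r.nodes - r.minnodes;
    if (reclaimable < needed) return false;

    for (RunningJob& r : running) {
        if (needed == 0) break;
        if (!r.migratable) continue;
        int take = std::min(r.nodes - r.minnodes, needed);
        r.nodes -= take;  // contract this job
        needed -= take;
    }
    return true;
}
```

The sketch sums the reclaimable nodes before shrinking anything. As a result, no job is contracted unless the waiting job can actually start.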

        2. Limit FIFO (first in first out) Strategy
          1. Requirements
          2. Limit FIFO requires minnodes, maxnodes, and runtime values. If they are not provided, the defaults are one node for minnodes, the cluster's total number of nodes for maxnodes, and 4 hours for the runtime.

          3. Overview
          4. Limit FIFO is a modified first in, first out policy with three subpolicies. To select a subpolicy, comment out the policies that will not be used in LimitFIFO.C. The default is Policy 3. Limit FIFO accepts a job if enough nodes are available to meet the job's minimum number of nodes (minnodes).

            Policy 1: This policy sets an absolute limit on the number of jobs a user can run. The default limit for all users is set in limitFIFO.h (DEFUALT_MAXJOBS). Separate per-user limits can be specified in a file whose name is set in limitFIFO.h (USAGE_LIMIT_FILENAME).

            Policy 2: This policy gives priority to the user who occupies the fewest nodes. The next job to run belongs to the user with the minimum number of nodes in use. This policy is not greedy. If that job cannot run, the other waiting jobs stall until sufficient processors are available.

            Policy 3: Like Policy 2, this policy gives priority to the user who occupies the fewest nodes. However, this policy is greedy. If the highest priority job cannot run because of insufficient resources, the scheduler tries to accommodate the next highest priority job. This may lead to starvation of large jobs in the presence of a stream of small jobs.
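
The three subpolicies can be sketched in C++ as two small functions. The sketch assumes the scheduler can look up, for each user, the number of running jobs and the number of nodes in use. under_job_limit corresponds to Policy 1, where max_jobs stands in for DEFUALT_MAXJOBS or a per-user limit from the USAGE_LIMIT_FILENAME file. pick_next covers Policies 2 and 3, and its greedy flag chooses between them. All names are hypothetical, and breaking ties by submission order is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical waiting-job record; the vector holds jobs in submission order.
struct WaitingJob {
    std::string user;
    int minnodes;
};

// Policy 1 (sketch): may this user start another job under the per-user limit?
bool under_job_limit(const std::map<std::string, int>& running_jobs,
                     const std::string& user, int max_jobs) {
    auto it = running_jobs.find(user);
    int count = (it == running_jobs.end()) ? 0 : it->second;
    return count < max_jobs;
}

// Policies 2 and 3 (sketch): return the index of the next job to start, or -1.
// Priority goes to the user with the fewest nodes in use; ties keep FIFO order.
// Non-greedy (Policy 2): if the top-priority job does not fit, nothing starts.
// Greedy (Policy 3): fall through to the next job that fits.
int pick_next(const std::vector<WaitingJob>& waitq,
              const std::map<std::string, int>& nodes_in_use,
              int free_nodes, bool greedy) {
    assert(free_nodes >= 0);
    auto used = [&](const std::string& u) {
        auto it = nodes_in_use.find(u);
        return it == nodes_in_use.end() ? 0 : it->second;
    };
    std::vector<int> order;
    for (int i = 0; i < static_cast<int>(waitq.size()); ++i) order.push_back(i);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return used(waitq[a].user) < used(waitq[b].user);
    });
    for (int idx : order) {
        if (waitq[idx].minnodes <= free_nodes) return idx;
        if (!greedy) return -1;  // stall until enough processors are free
    }
    return -1;
}
```

With 4 free nodes, an 8-node job from an idle user blocks the queue under Policy 2. Under Policy 3, a 2-node job from a busier user runs instead, which is the starvation risk described above.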

        3. Gantt Chart Strategy
          1. Requirements
          2. The Gantt Chart Strategy requires runtime, softdeadline, harddeadline, minnodes, and maxnodes values. If they are not provided, the defaults are one node for minnodes, the cluster's total number of nodes for maxnodes, 4 hours for the runtime, and one million seconds (about 278 hours) for both the soft and hard deadlines.
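
The defaults above can be expressed as a small sketch, assuming -1 marks a value the user did not provide. GanttRequest and apply_defaults are hypothetical names. The constants show that 4 hours is 14,400 seconds and that 1,000,000 seconds is about 277.8 hours, which the text rounds to 278.

```cpp
#include <cassert>

// Hypothetical request record; -1 means "not provided by the user".
// Times are in seconds.
struct GanttRequest {
    int minnodes = -1, maxnodes = -1;
    long runtime = -1, softdeadline = -1, harddeadline = -1;
};

constexpr long kDefaultRuntime  = 4L * 60 * 60;  // 4 hours = 14400 s
constexpr long kDefaultDeadline = 1000000L;      // 10^6 s, about 278 hours

// Replace every missing value with the documented default.
GanttRequest apply_defaults(GanttRequest r, int cluster_nodes) {
    assert(cluster_nodes > 0);
    if (r.minnodes < 0) r.minnodes = 1;
    if (r.maxnodes < 0) r.maxnodes = cluster_nodes;
    if (r.runtime < 0) r.runtime = kDefaultRuntime;
    if (r.softdeadline < 0) r.softdeadline = kDefaultDeadline;
    if (r.harddeadline < 0) r.harddeadline = kDefaultDeadline;
    return r;
}
```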

          3. Overview
          4. The Gantt Chart Strategy is a deadline-driven strategy that uses a Gantt chart to schedule jobs. The scheduling process is complicated. For more information, read Sindura Bandhakavi's paper, "Analyzing Bidding Strategies For Schedulers In A Simulated Multiple-Cluster Market Driven Environment" [1].

            Jobs are submitted with a runtime, two deadline values, and the minimum and maximum number of nodes. The job's runtime is the maximum length of time the job can run on the standard cluster. All times are measured in seconds. If a job runs longer than its runtime value, the job is terminated and the user is charged the bid price. It is therefore very important to slightly overestimate the runtime value.

            The deadline values should not be confused with the runtime value. Runtime is the actual running time of the process. The deadlines are the times within which that runtime must be scheduled. They are named the soft-deadline and the hard-deadline. If a job completes by its soft-deadline, the cluster receives the full amount negotiated for the job. The hard-deadline is the traditional deadline: the job must complete by it to satisfy the quality of service contract. The cluster receives no Barter Units for a job completed at the hard-deadline. For a job that completes after the soft-deadline and before the hard-deadline, the cluster receives a linearly decreasing number of Barter Units. Pricing is calculated in the cluster daemon and is reviewed in the next section.

            In summary, the Gantt Chart Strategy provides scheduling with a deadline.
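
The payoff rule described above is a piecewise-linear function of the completion time. This is only a sketch: the actual pricing is computed in the cluster daemon, and barter_units is a hypothetical name. Returning 0 for a job that finishes after the hard-deadline is an assumption that the text does not state.

```cpp
#include <cassert>

// Sketch of the payoff rule (all times in seconds on the same clock).
// `full_price` is the amount negotiated for the job.
//   completes by the soft-deadline          -> full_price
//   completes at/after the hard-deadline    -> 0 (after: assumed)
//   completes in between                    -> decreases linearly to 0
double barter_units(double completion, double soft, double hard, double full_price) {
    assert(soft <= hard);
    if (completion <= soft) return full_price;
    if (completion >= hard) return 0.0;
    return full_price * (hard - completion) / (hard - soft);
}
```

Because the first two checks cover every case where soft equals hard, the division never sees a zero denominator.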

    7. Installation and Usage
      1. Installing the cluster manager
      2. Installing the cluster manager can be a complicated process. Detailed instructions are in the INSTALL file in the cluster_manager directory. This section paraphrases them.

        1. Before Installing
        2. The cluster manager needs two external components: a database, and charm components for adaptive jobs. The user may also need to open a port in the firewall of the cluster's head node.

          1. Database
          2. The instructions in INSTALL use a MySQL database, which can be downloaded from www.mysql.com. Many distributions provide a graphical database tool, which is useful for database maintenance.

          3. Charm Components
          4. Download the latest charm distribution at http://charm.cs.uiuc.edu/download. Follow the installation instructions in the README file in the downloaded charm directory.

        3. Installing
        4. Follow the instructions in the INSTALL file in the cluster_manager directory to install the cluster manager. To change the scheduling strategy, you must edit the scheduler source code before compiling the scheduler (step 2). In the scheduler constructor, a new strategy object is assigned to the scheduler's strategy pointer. The default strategy is GanttChartStrategy. To use a different strategy, change the type of strategy that is created:

          strategy = new GanttChartStrategy(num_nodes);   // Gantt Chart strategy (default)
          // strategy = new BasicStrategy(num_nodes);     // Basic strategy
          // strategy = new limitFIFO(num_nodes);         // Limit FIFO strategy
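
A single line selects the strategy because the scheduler only calls the strategy through the SchedulingStrategy base-class pointer. The following is a minimal sketch of that arrangement. It assumes stub strategy classes that only report a name, and the name() method is hypothetical. It also uses std::unique_ptr in place of the raw pointer assignment used in the scheduler constructor.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stub strategies standing in for the real classes; only a name is modeled.
struct SchedulingStrategy {
    virtual ~SchedulingStrategy() = default;
    virtual std::string name() const = 0;  // hypothetical, for illustration
};
struct BasicStrategy : SchedulingStrategy {
    explicit BasicStrategy(int /*num_nodes*/) {}
    std::string name() const override { return "Basic"; }
};
struct GanttChartStrategy : SchedulingStrategy {
    explicit GanttChartStrategy(int /*num_nodes*/) {}
    std::string name() const override { return "GanttChart"; }
};

// The rest of the scheduler sees only the base-class interface, so editing
// the one `new` expression below is enough to switch strategies.
class Scheduler {
public:
    explicit Scheduler(int num_nodes) {
        assert(num_nodes > 0);
        strategy.reset(new GanttChartStrategy(num_nodes));  // default
        // strategy.reset(new BasicStrategy(num_nodes));
    }
    const SchedulingStrategy& current() const { return *strategy; }

private:
    std::unique_ptr<SchedulingStrategy> strategy;
};
```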

      3. Use
      4. Using the Cluster Manager involves two components: the scheduler and the client.

        1. Using the Scheduler
        2. To start the scheduler: > ./startScheduler

          To abort the scheduler: > ./abortScheduler

          To shut down the scheduler gracefully: > ./fshutdown

        3. Using the Client
        4. The client is used to interface with the scheduler. The Cluster Daemon uses the client's commands to query, submit, and monitor jobs. The following commands are used in the Faucets system:

          fsub - submits a job to the Cluster Manager.

          fsub +n2 +ppn2 ./hello -stdout out -runtime 1:0:0

          fquery - tests if the Cluster Manager can accept the job.

          fquery ./hello +n2 +ppn2 -runtime 1:0:0 100

          fkill - kills a running job

          fkill 10881235

          fkill -u jbmeyer

          fjobs - lists the status of a running job
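
The -runtime argument in the fsub and fquery examples takes a colon-separated value (1:0:0). This sketch assumes the fields are hours, minutes, and seconds, and converts the value to the seconds used elsewhere in this section. runtime_to_seconds is a hypothetical helper, not part of the client.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Convert an "H:M:S" runtime string (as in `-runtime 1:0:0`) to seconds.
// Assumes the fields are hours, minutes, and seconds; returns -1 when the
// input is malformed.
long runtime_to_seconds(const std::string& spec) {
    std::istringstream in(spec);
    long h = 0, m = 0, s = 0;
    char c1 = 0, c2 = 0;
    if (!(in >> h >> c1 >> m >> c2 >> s) || c1 != ':' || c2 != ':') return -1;
    if (in.peek() != std::char_traits<char>::eof()) return -1;  // trailing junk
    if (h < 0 || m < 0 || s < 0 || m >= 60 || s >= 60) return -1;
    return h * 3600 + m * 60 + s;
}
```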