Work is defined as the amount of computational resources and time needed to complete a job. It is similar to the physical definition of work: power multiplied by time. Power can be thought of as processor power; more processors yield more power. Time is the runtime of a job. Work is a static value: every job requires a fixed amount of work, and a cluster must provide exactly that amount for the job to complete. If a job can be expanded to run on more processors, the computational power increases, and the cluster can use less time to achieve the same work value. GanttChartStrategy relies on this assumption in its scheduling decisions. BidSim calculates work by multiplying the minimum number of processors by the runtime.
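As a minimal sketch of the calculation described above, the following Python snippet computes work as minimum processors multiplied by runtime, then derives the runtime on a larger allocation while holding work constant. The function names, the parameter names `min_procs` and `max_procs`, and the job numbers are illustrative assumptions, not BidSim's actual code.

```python
def work(min_procs: int, runtime: float) -> float:
    """Work as described in the text: minimum processors times runtime."""
    return min_procs * runtime


def runtime_on(procs: int, total_work: float, min_procs: int, max_procs: int) -> float:
    """Runtime on `procs` processors, assuming the work value stays constant.

    The allowed processor range is an assumed input; outside it the
    constant-work assumption is not applied.
    """
    if not min_procs <= procs <= max_procs:
        raise ValueError("processor count outside the job's allowed range")
    return total_work / procs


# Hypothetical job: at least 4 processors, 100 time units at that size.
job_work = work(min_procs=4, runtime=100.0)  # 400.0 processor-time units
print(runtime_on(8, job_work, min_procs=4, max_procs=16))  # 50.0
```

Doubling the processors from 4 to 8 halves the runtime from 100 to 50 because the work value of 400 is held fixed.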
Why Do We Need A Work Evaluation Framework?
Clusters used in the Faucets system can be very different from each other. Some clusters' processors run faster than others. Some have more memory and faster networks. Some processors have larger caches and different instruction sets. Some clusters are not single-processor-per-node machines and have many processors on one node sharing memory. The Faucets scheduler can only schedule on nodes and does not use processors in allocation.
Another assumption that BidSim makes is that all applications scale perfectly when the number of processors is expanded or contracted. For example, if BidSim expanded a job to twice as many processors, the job would execute twice as fast and complete in half the time. This holds as long as the job stays between the minproc and maxproc values given to BidSim. Nearly all parallel application scientists would be envious of this type of scalability. In practice, very few applications scale perfectly over even a short range of processors. In a real system, the scalability of each individual application must be accounted for to prevent problems similar to those created by differences in computational power.
In Faucets, such assumptions will lower utilization and create high prices on fast clusters and cause premature termination of applications on slow machines. The reason is that Faucets calculates work in a similar way to BidSim: work is minnodes multiplied by runtime. If a user supplies an average runtime value, a fast cluster has to schedule the full requested runtime to honor its quality of service (QOS) contract, even though it will finish the computation sooner than the provided runtime value. The extra length causes a higher bid, since most clusters in Faucets bid on the requested runtime value, and a lower utilization, since more jobs could have been accepted in the unused time. Slow clusters also have a problem with an average runtime value. Clusters are only required to ensure that the QOS contract is met. If a job runs past its runtime value, the job can be terminated. An average runtime value will not give slow clusters enough time to complete jobs and will result in prematurely terminated jobs. Clearly, a solution is needed for the differences in computational power among clusters in Faucets.
Solution
The problem of implementing the GanttChartStrategy in Faucets can be equated to finding how fast a cluster's computational power can perform the work of a given application. In BidSim, given equal processors, work done on one cluster takes the same time as work done on another cluster. Faucets does not provide this luxury. Instead, we define two entities: the standard node and the work rate.
The standard node is used to evaluate different cluster node speeds. The standard node is a machine with one CPU per node, 1000 BogoMIPS, 256 KB of cache, and 512 MB of RAM. A node's work rate is the computational power of a cluster's node compared to the standard node. A node with a work rate of 2 would, on average, complete applications twice as fast as the standard node. The speedup function is the scaling of an application given a specific number of nodes.
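To illustrate how a work rate could be applied, the following Python sketch converts a runtime measured on the standard node into an expected runtime on a node with a different work rate. The function name `expected_runtime`, the simple division by the work rate, and the sample values are assumptions made for illustration; they follow from the statement that a work rate of 2 means running twice as fast on average, not from the Faucets source.

```python
def expected_runtime(standard_runtime: float, work_rate: float) -> float:
    """Estimate runtime on a node, given the runtime on the standard node.

    Assumes runtime is inversely proportional to the node's work rate.
    """
    if work_rate <= 0:
        raise ValueError("work rate must be positive")
    return standard_runtime / work_rate


# Hypothetical job that needs 120 time units on the standard node.
standard_runtime = 120.0
for name, rate in [("fast", 2.0), ("standard", 1.0), ("slow", 0.5)]:
    print(name, expected_runtime(standard_runtime, rate))
# fast 60.0
# standard 120.0
# slow 240.0
```

Under this assumption, a fast cluster can reserve less time than the standard-node runtime, and a slow cluster can see in advance that it needs more.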