HPC-Colony: Services and Interfaces for Very Large Systems
Sayantan Chakravorty | Celso Mendes | Laxmikant Kale | Terry Jones | Andrew Tauferner | Todd Inglett | Jose Moreira
OSR Special Issue on HEC OS/Runtimes 2006
Publication Type: Paper
Repository URL: OSR2006
Traditional full-featured operating systems are known to have properties that limit the scalability of distributed memory parallel programs, the most common programming paradigm utilized in high-end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To make a general-purpose operating system scale to such levels, new technology is required for parallel resource management and global system management (including fault management). In this paper, we describe the shortcomings of full-featured operating systems and runtime systems and discuss an approach to scale such systems to one hundred thousand processors with both scalable parallel application performance and efficient system management.
Sayantan Chakravorty and Celso L. Mendes and Laxmikant V. Kale and Terry Jones and Andrew Tauferner and Todd Inglett and Jose Moreira, "HPC-Colony: Services and Interfaces for Very Large Systems", ACM SIGOPS Operating Systems Review: Operating and Runtime Systems for High-end Computing Systems, vol. 40, April 2006.