Colony II is a computer science project that investigates and demonstrates the
effectiveness of innovative system software technologies on leadership-class
machines. It researches and develops system software that enables
general-purpose operating and runtime systems for tens of thousands of processors.
To make a general-purpose operating system scale to such levels, new technology
is required for fault management, resource management for parallel load balancing,
resource management for changing the set of processors allocated to a job, scalable
peer-to-peer communication systems, and running Linux on one million nodes.
Publications:
Esteban Meneses, Celso L. Mendes and Laxmikant V. Kale.
Team-based Message-Logging: Preliminary Results.
Proceedings of the 3rd Workshop on Resiliency in High Performance Computing at CCGrid'2010,
Melbourne, Australia, May 2010.
Abhinav Bhatele, Eric Bohm and Laxmikant V. Kale.
Optimizing Communication for Charm++ Applications by Reducing Network Contention.
Accepted for publication in Concurrency and Computation: Practice and Experience (EuroPar special issue), 2010.
Gengbin Zheng, Esteban Meneses, Abhinav Bhatele and Laxmikant V. Kale.
Hierarchical Load Balancing for Large Scale Supercomputers.
Accepted at the Third International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2),
San Diego, CA, September 2010.