A Scalable Double In-memory Checkpoint and Restart Scheme towards Exascale
This talk described recent progress in optimizing the in-memory checkpoint/restart fault-tolerance scheme, demonstrating scalable performance on up to 64K cores of a Blue Gene/P machine.
