Run-time Support for Controlling Communication-Induced Memory Fluctuation

PPL Paper Number: 06-06
PPL CVS: ReduceMem

Authors:
Yan Shi, Gengbin Zheng, Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

PPL Technical Report


Abstract

Many parallel applications require a large volume of transient memory to hold data received through communication, and therefore exhibit communication-induced fluctuations in memory usage. Even when an application's persistent working data fit in physical memory, the transient peak in memory usage can still cause disk swapping or even out-of-memory errors. In this paper, we address these problems with runtime support for controlling communication-induced memory fluctuation. The idea is to impose runtime flow control on large data transfers, thereby bounding the peak transient memory consumed by communication. We explore this idea with both send-based and fetch-based low-level communication primitives, and we implement the runtime support in the Charm++ integrated runtime environment. We evaluate this runtime system on a set of real applications and show considerable performance improvements.
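To make the flow-control idea concrete, here is a minimal sketch of a receiver-side throttle for fetch-based transfers. It is not the Charm++ API or the authors' implementation. The class `TransferThrottle` and its methods `announce` and `release` are hypothetical names, and the sketch assumes the receiver learns the size of each large message in advance (for example, from a small control message) and can defer the actual fetch. Fetches start in FIFO order only while the bytes in flight stay within a fixed budget, which caps the transient memory held by communication buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical receiver-side throttle for fetch-based large transfers.
// Assumption: each large message is announced (with its size) before its
// payload is fetched, so the fetch itself can be postponed.
class TransferThrottle {
 public:
  // Callback that actually issues the fetch for message `id`.
  using StartFn = std::function<void(int id, std::size_t bytes)>;

  TransferThrottle(std::size_t budgetBytes, StartFn start)
      : budget_(budgetBytes), start_(std::move(start)) {
    assert(budgetBytes > 0);
  }

  // A large transfer was announced; its fetch starts now or is queued.
  void announce(int id, std::size_t bytes) {
    pending_.push_back({id, bytes});
    drain();
  }

  // The fetched buffer for `id` was consumed and freed; reclaim its budget.
  void release(int id) {
    auto it = active_.find(id);
    if (it == active_.end()) return;  // unknown or already released
    inFlight_ -= it->second;
    active_.erase(it);
    drain();
  }

  std::size_t inFlight() const { return inFlight_; }
  std::size_t peak() const { return peak_; }
  std::size_t queued() const { return pending_.size(); }

 private:
  struct Request {
    int id;
    std::size_t bytes;
  };

  // Start queued fetches in FIFO order while they fit in the budget.
  // A request larger than the whole budget is admitted only when nothing
  // else is in flight, so an oversized message cannot wait forever.
  void drain() {
    while (!pending_.empty()) {
      const Request r = pending_.front();
      const bool fits = inFlight_ + r.bytes <= budget_;
      if (!fits && inFlight_ != 0) break;
      pending_.pop_front();
      inFlight_ += r.bytes;
      peak_ = std::max(peak_, inFlight_);
      active_[r.id] = r.bytes;
      start_(r.id, r.bytes);  // state is consistent before the callback runs
    }
  }

  std::size_t budget_;
  StartFn start_;
  std::size_t inFlight_ = 0;
  std::size_t peak_ = 0;
  std::deque<Request> pending_;
  std::unordered_map<int, std::size_t> active_;
};
```

With a 100-byte budget, announcing two 60-byte messages starts the first fetch and queues the second until `release` is called on the first, so at most 60 bytes are in flight at once instead of 120. FIFO admission is a simple choice that avoids starving large messages; a real runtime might instead prioritize messages that unblock computation.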

