Run-time Support for Controlling Communication-Induced Memory Fluctuation
Authors:
Yan Shi, Gengbin Zheng, Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
PPL Technical Report
Many parallel applications need a large volume of transient memory to hold data received through communication, so their memory usage fluctuates with their communication pattern. Even when such an application's persistent working data fits in physical memory, the transient peak in memory usage can still cause disk swapping or even out-of-memory errors. In this paper, we address these problems with runtime support for controlling communication-induced memory fluctuation. The idea is to impose runtime flow control on large data transfers, thereby controlling the peak transient memory consumed by communication. We explore the idea with both send-based and fetch-based low-level communication primitives. We implement this runtime support in the Charm++ integrated runtime environment. We test the runtime system with a set of real applications and show considerable performance improvements.
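To make the flow-control idea concrete, the following is a minimal, language-neutral sketch of receiver-side admission control for the fetch-based case. It is not the Charm++ implementation and uses no Charm++ API. The class `FetchFlowController`, its methods, the FIFO admission policy, and the handling of oversized transfers are all assumptions made for illustration. A sender announces a large transfer, and the receiver fetches it only once the transfer fits in a fixed transient-memory budget.

```python
from collections import deque


class FetchFlowController:
    """Hypothetical receiver-side admission control for large transfers."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes  # cap on transient buffer memory
        self.in_use = 0                   # bytes held by admitted transfers
        self.peak = 0                     # highest value in_use has reached
        self.pending = deque()            # announced transfers waiting for room

    def announce(self, transfer_id, size):
        """A sender announces a large message; return the ids admitted now."""
        self.pending.append((transfer_id, size))
        return self._admit()

    def consumed(self, size):
        """The application released a buffer; return the ids admitted now."""
        self.in_use -= size
        return self._admit()

    def _admit(self):
        admitted = []
        # FIFO admission (an assumed policy): stop at the first transfer
        # that does not fit. A transfer larger than the whole budget is
        # admitted only when nothing else is in flight, so it cannot
        # wait forever.
        while self.pending:
            transfer_id, size = self.pending[0]
            fits = self.in_use + size <= self.budget_bytes
            if not fits and self.in_use > 0:
                break
            self.pending.popleft()
            self.in_use += size
            self.peak = max(self.peak, self.in_use)
            admitted.append(transfer_id)
        return admitted
```

With a budget of 100 bytes, announcing transfers of 60, 60, and 30 bytes admits only the first one. The other two are fetched after the first buffer is released, so the peak transient usage is 90 bytes, whereas accepting all three on arrival would need 150 bytes. A send-based variant would presumably apply the same accounting on the sender side, delaying the send instead of the fetch.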