Accelerating messages by avoiding copies using RDMA in an asynchronous parallel runtime system
Thesis 2017
Publication Type: Paper
Abstract
With the advent of Exascale computing, the number and size of messages are expected to
increase greatly. One-sided communication on hardware that supports Remote Direct Memory
Access (RDMA) is the natural choice for large messages, as it has been shown to reduce
latency and increase bandwidth for large payloads in High Performance Computing (HPC)
networks. RDMA enables the network to bypass the Operating System and perform data
transfers without involving the Central Processing Unit (CPU). In addition to not
consuming CPU cycles, RDMA also benefits from zero-copy networking, in which the data
being transferred is not copied between the layers of the network stack.
Since memory performance lags significantly behind CPU performance, it has been observed
that memory-intensive operations reduce application performance and increase energy
consumption. For this reason, reducing memory pressure by saving the cost of allocation
and copying helps improve application performance significantly.
The asynchronous message-sending paradigm in Charm++ makes a copy of the payload
at the sender side. It also requires copying the data from the message into the user’s data
structure at the receiver side. As the payload gets larger, the cost of these allocations
and copies increases proportionally. In this thesis, we show the benefits of avoiding
the copies at both the sender and receiver sides using RDMA on different applications. We
also discuss the design of the zero-copy user-level Application Programming Interface (API)
in Charm++, along with the underlying RDMA implementations for different networks in
today’s supercomputers.
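The difference between the two send paths can be pictured with a small single-process sketch. It is an illustration only, not the Charm++ API or the thesis implementation: `wire_transfer`, `copying_send`, and `zero_copy_send` are hypothetical names, and an in-memory byte move stands in for the network transfer. Each function returns the number of bytes copied on the hosts in addition to that transfer.

```python
# Illustrative model only; none of these names are Charm++ or RDMA APIs.

def wire_transfer(src, dst):
    """Stand-in for the network moving len(src) bytes from sender to receiver."""
    dst[:len(src)] = src


def copying_send(payload, user_buffer):
    """Model the copying path: payload -> message -> wire -> message -> user buffer."""
    extra_bytes_copied = 0

    # Sender side: allocate a message and copy the payload into it.
    message = bytearray(payload)
    extra_bytes_copied += len(payload)

    # The message crosses the network into a receiver-side message buffer.
    received = bytearray(len(message))
    wire_transfer(message, received)

    # Receiver side: copy out of the message into the user's data structure.
    user_buffer[:len(received)] = received
    extra_bytes_copied += len(received)

    return extra_bytes_copied


def zero_copy_send(payload, user_buffer):
    """Model the zero-copy path: the transfer reads the source buffer and
    writes the destination buffer directly, with no staging messages."""
    wire_transfer(memoryview(payload), memoryview(user_buffer))
    return 0
```

In this model a 1 KiB payload incurs 2 KiB of extra host-side copying on the copying path and none on the zero-copy path, which is the proportional cost the abstract refers to.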