BigNetSim - Parallel Interconnection Network Simulation



BigNetSim is a parallel simulator built on top of POSE, a parallel discrete event simulation environment developed using Charm++ at the Parallel Programming Laboratory. The BigNetSim architecture is shown in the following diagram:

[Figure: BigNetSim architecture]

BigNetSim is an effort to simulate large current and future computer systems in order to study the behavior of applications developed for them. It simulates, in reasonable detail, an integrated model of computation (processors) and communication (interconnection networks). Our earlier work on computation simulation for performance prediction, BigSim, assumed fixed message latencies.

The BigSim emulator was the first phase of our performance prediction system. Charm++ and AMPI applications can be compiled to run on this emulator as though it were the target architecture. The emulator captures a collection of tasks (blocks of computation and communication) on a number of processors (objects), along with their dependencies, and writes these tasks to log files. These application tasks are translated into discrete events, each with a timestamp, an originating object, and a destination object. BigNetSim reads the logs and simulates the execution of the original tasks by elapsing time, satisfying dependencies, and spawning additional tasks by passing messages through a detailed network contention model. This produces corrected times for each event, which can be used to analyze the application's performance on the target machine.
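The replay step described above can be sketched as a small discrete event loop: a task becomes ready once messages from all of its dependencies have arrived, waits for its processor, runs for its logged duration, and then sends messages whose arrival times include a network delay. This is a minimal illustration under stated assumptions, not BigNetSim's implementation: the `Task` record, the 1D ring hop count, and the latency parameters (`alpha`, `per_hop`, `bw`) are hypothetical stand-ins for the emulator's log format and for the detailed network contention model, which this sketch does not model.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    """One logged block of computation (hypothetical log format)."""
    name: str
    proc: int                 # processor (object) that ran the task
    duration: float           # computation time recorded by the emulator
    deps: list = field(default_factory=list)  # tasks whose messages it waits for
    msg_bytes: int = 0        # bytes sent to each dependent task


def ring_hops(src: int, dst: int, n: int) -> int:
    """Hop count on a 1D ring of n processors (assumed stand-in topology)."""
    d = abs(src - dst) % n
    return min(d, n - d)


def msg_latency(src, dst, nbytes, n, alpha=1.0, per_hop=0.5, bw=100.0):
    """Toy latency model: startup + per-hop delay + serialization time.
    It ignores contention, unlike a detailed network model."""
    if src == dst:
        return 0.0  # local message, no network traversal
    return alpha + per_hop * ring_hops(src, dst, n) + nbytes / bw


def replay(tasks, n_procs):
    """Replay logged tasks and return the corrected finish time of each."""
    by_name = {t.name: t for t in tasks}
    dependents = defaultdict(list)
    for t in tasks:
        for d in t.deps:
            dependents[d].append(t.name)

    remaining = {t.name: len(t.deps) for t in tasks}
    ready_at = {t.name: 0.0 for t in tasks}
    proc_free = defaultdict(float)  # time at which each processor is idle
    finish = {}

    # Event queue of (timestamp, sequence, task): "task is ready to run".
    events, seq = [], 0
    for t in tasks:
        if not t.deps:
            heapq.heappush(events, (0.0, seq, t.name))
            seq += 1

    while events:
        now, _, name = heapq.heappop(events)
        task = by_name[name]
        start = max(now, proc_free[task.proc])  # wait for the processor
        finish[name] = start + task.duration
        proc_free[task.proc] = finish[name]
        # Send a message to each dependent; it arrives after the network delay.
        for dname in dependents[name]:
            dst = by_name[dname]
            arrival = finish[name] + msg_latency(
                task.proc, dst.proc, task.msg_bytes, n_procs)
            ready_at[dname] = max(ready_at[dname], arrival)
            remaining[dname] -= 1
            if remaining[dname] == 0:  # all dependencies satisfied
                heapq.heappush(events, (ready_at[dname], seq, dname))
                seq += 1
    return finish


if __name__ == "__main__":
    tasks = [
        Task("A", proc=0, duration=2.0, msg_bytes=100),
        Task("B", proc=1, duration=3.0, deps=["A"]),
        Task("C", proc=0, duration=1.0, deps=["A"]),
    ]
    print(replay(tasks, n_procs=4))  # {'A': 2.0, 'C': 3.0, 'B': 7.5}
```

In this example, B runs on a different processor from A, so its start is delayed by the modeled message latency (2.5 time units), while C shares A's processor and starts as soon as A finishes. Swapping `msg_latency` for a contention-aware network model is what turns fixed-latency predictions into corrected ones.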

At a high level, the design is highly modular: new topologies and routing algorithms can easily be plugged into the system. We use virtual cut-through packet switching with credit-based flow control to keep track of packets in the network. The system supports virtual topologies for virtual channel routing, which is essential for deadlock-free routing algorithms on most topologies. The implemented topologies include N-dimensional meshes and tori, N-dimensional hypercubes, k-ary n-trees, and hybrid topologies. All topologies have both physical and virtual channel routing algorithms, and most routing algorithms are adaptive.

To support adaptivity based on network load, we developed a contention model and a load model for the interconnection network (IN). Each port of a switch holds information that is updated dynamically and fed to the routing engine, which uses it to make informed decisions that minimize contention. The load model maintains load information for each of the switch's neighbors, while the contention model tracks the number of packets contending for a particular output port of a switch.
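As a rough illustration of how per-port state can drive an adaptive routing decision, the sketch below considers only the minimal (productive) output ports of a switch in an N-dimensional torus, keeps those whose downstream buffer still has a credit, and picks the one with the fewest contending packets. The `Port` fields, the `(dimension, direction)` port naming, and the tie-breaking rule (first minimum in dimension order) are assumptions made for illustration, not BigNetSim's routing engine. The sketch uses only a contention count (no separate load model), and it omits virtual channels, which a real torus router needs for deadlock freedom.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Per-output-port state fed to the routing engine (hypothetical fields)."""
    credits: int          # free buffer slots at the downstream switch
    contention: int = 0   # packets currently contending for this port


def productive_ports(cur, dst, dims):
    """Output ports (dim, direction) that move a packet closer to dst on an
    N-dimensional torus; both directions count when the distances tie."""
    ports = []
    for d, (c, t, k) in enumerate(zip(cur, dst, dims)):
        if c == t:
            continue
        fwd = (t - c) % k    # hops needed going in the + direction
        back = (c - t) % k   # hops needed going in the - direction
        if fwd <= back:
            ports.append((d, +1))
        if back <= fwd:
            ports.append((d, -1))
    return ports


def route(cur, dst, dims, ports):
    """Pick the least-contended productive port that has a free credit.
    Returns None if the packet has arrived or must wait for credits."""
    candidates = [p for p in productive_ports(cur, dst, dims)
                  if ports[p].credits > 0]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: ports[p].contention)
    ports[best].credits -= 1  # credit-based flow control: reserve a slot
    return best


def return_credit(ports, port):
    """Called when the downstream switch frees the reserved buffer slot."""
    ports[port].credits += 1


if __name__ == "__main__":
    # 4x4 torus; every output port starts with two downstream buffer slots.
    ports = {(d, s): Port(credits=2) for d in range(2) for s in (+1, -1)}
    ports[(0, +1)].contention = 5
    ports[(1, -1)].contention = 1
    print(route((0, 0), (1, 3), (4, 4), ports))  # (1, -1)
```

From (0, 0) to (1, 3), both +x and -y are minimal moves; the packet takes -y because fewer packets contend for it. If that port ran out of credits, the same call would fall back to +x rather than block. On a mesh there are no wraparound links, so only the direction of `t - c` would be productive in each dimension.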

Publications:
Evaluating HPC Networks via Simulation of Parallel Workloads [SC 2016]
Optimization of Communication Intensive Applications on HPC Networks [PhD Thesis 2016]
Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations [PADABS, Euro-Par 2015] (Bilge Acun, Nikhil Jain, Abhinav Bhatele, Misbah Mubarak, Christopher Carothers, Laxmikant Kale)
Scaling an Optimistic Parallel Simulation of Large-scale Interconnection Networks [WSC 2005]
Low Diameter Regular Graph as a Network Topology in Direct and Hybrid Interconnection Networks [MS Thesis 2005]
Performance Prediction using Simulation of Large-scale Interconnection Networks in POSE [PADS 2005]