Optimizing Matrix Transpose on Torus Interconnects
Venkatesan Chakaravarthy | Nikhil Jain | Yogish Sabharwal
International European Conference on Parallel and Distributed Computing (Euro-Par) 2010
Publication Type: Paper
Abstract
Matrix transpose is a fundamental matrix operation that arises in many scientific and engineering applications. Communication is the main bottleneck in performing matrix transpose on most multiprocessor systems. In this paper, we focus on torus interconnection networks and propose application-level routing techniques that improve load balancing, resulting in better performance. Our basic idea is to route the data via carefully selected intermediate nodes. However, applying this technique to every message may worsen congestion. We overcome this issue by employing the routing only for a selected set of communicating pairs. We implement our optimizations on the Blue Gene/P supercomputer and demonstrate up to 35% improvement in performance.
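
The idea in the abstract can be illustrated with a small link-load counter. The following is a minimal sketch and not the paper's algorithm. It assumes an n x n 2D torus with one matrix block per node, so that node (i, j) sends to node (j, i). It also assumes dimension-ordered shortest-path routing for each leg of a message. The names `ring_steps`, `xy_route`, `link_loads`, and the `pick_intermediate` callback are hypothetical and exist only for this illustration.

```python
from collections import Counter


def ring_steps(a, b, n):
    """Positions visited going from a to b along the shorter way around a ring of size n."""
    fwd = (b - a) % n
    step = 1 if fwd <= n - fwd else -1
    count = fwd if step == 1 else n - fwd
    return [(a + step * k) % n for k in range(1, count + 1)]


def xy_route(src, dst, n):
    """Directed links used by a dimension-ordered route (dimension 0 first, then dimension 1)."""
    links = []
    cur = src
    for x in ring_steps(src[0], dst[0], n):
        nxt = (x, cur[1])
        links.append((cur, nxt))
        cur = nxt
    for y in ring_steps(src[1], dst[1], n):
        nxt = (cur[0], y)
        links.append((cur, nxt))
        cur = nxt
    return links


def link_loads(n, pick_intermediate=None):
    """Count messages per directed link for the transpose pattern (i, j) -> (j, i).

    pick_intermediate(src, dst, n) is a hypothetical policy hook: it returns an
    intermediate node for pairs that should be rerouted, or None to route directly.
    """
    loads = Counter()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal nodes keep their own block
            src, dst = (i, j), (j, i)
            mid = pick_intermediate(src, dst, n) if pick_intermediate else None
            if mid is None:
                path = xy_route(src, dst, n)
            else:
                path = xy_route(src, mid, n) + xy_route(mid, dst, n)
            loads.update(path)
    return loads


if __name__ == "__main__":
    n = 8
    direct = link_loads(n)
    # Assumed example policy: reroute only pairs above the diagonal via node (i, i).
    selective = link_loads(n, lambda s, d, n: (s[0], s[0]) if s[0] < s[1] else None)
    print("max link load, direct:   ", max(direct.values()))
    print("max link load, selective:", max(selective.values()))
```

Comparing the maximum link load for different `pick_intermediate` policies gives a rough way to see how rerouting only some pairs changes congestion. The example policy is an arbitrary choice for demonstration and is not taken from the paper.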