Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell
Authors:
David M. Kunzman and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University
of Illinois at Urbana-Champaign
To appear in the Proceedings of the 22nd Annual International Conference for High Performance Computing, Networking, Storage, and Analysis 2009
Supercomputing 2009 (SC09), Portland, Oregon, USA, Novermber 2009
While accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a framework for parallel programming so that programmers can easily take advantage of the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. Using this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to efficiently time-share one or more SPEs by interleaving work from multiple libraries. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. When compared to a single Xeon core utilizing streaming SIMD extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). In comparison, a similar speedup of 5.89 is achieved using six Xeon (x86) cores.