Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell

PPL Paper Number: 09-10
PPL CVS: 200911_AccelSC09

Authors:
David M. Kunzman and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

To appear in the Proceedings of the 22nd Annual International Conference for High Performance Computing, Networking, Storage, and Analysis 2009
Supercomputing 2009 (SC09), Portland, Oregon, USA, November 2009


Abstract

While accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a framework for parallel programming so that programmers can take advantage of the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. Using this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to efficiently time-share one or more SPEs by interleaving work from multiple libraries. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. When compared to a single Xeon core utilizing streaming SIMD extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). In comparison, a similar speedup of 5.89 is achieved using six Xeon (x86) cores.

