Grid computing offers a model for solving large-scale scientific
problems by uniting computational resources owned by multiple
organizations to form a single cohesive resource for the duration of
individual jobs. Despite the appeal of using Grid computing to solve
large problems, its use has been hindered by the challenges involved
in developing applications that can run efficiently in Grid
environments. Such challenges include efficiently mapping work to
heterogeneous resources in a constantly-changing environment and
dealing with the effects of cross-site communication latencies.
This talk describes how a message-driven execution model,
implemented in the Charm++ and Adaptive MPI runtime systems, can
address many of the challenges involved in deploying tightly-coupled parallel
applications in Grid computing environments. Because the techniques
described in this talk are implemented at the runtime system level, they
are available to applications with little or no modification to the
application software.

Often, in a large parallel program, the pattern of communication
changes dramatically as the program runs. For this reason, there may
be no single programming paradigm that suits the entire application
well. The Charm++ runtime system provides a way for the programmer to
seamlessly use the most appropriate paradigm for each phase of the
program. I present ParFUM, a framework for unstructured meshes, as an
example of this technique; it takes advantage of three ways of
writing parallel programs: MPI, Charm++, and MSA.

In this talk we discuss our research efforts at PPL that provide the
programmer with a global view of data (Multiphase Shared Arrays) and a
global view of control (Charisma) to complement Charm++'s local view of data
and local view of control. These higher-level languages are built on the
same adaptive run-time system and aim to provide higher productivity for
parallel application development.

NAMD is a widely used program for molecular dynamics simulations of large
biomolecular systems. Running on everything from laptops to supercomputers,
NAMD must provide robustly scalable parallelism in the hands of ordinary biomedical
researchers. The talk will discuss how NAMD is currently being adapted to run on
both upcoming petascale machines and commodity clusters with NVIDIA CUDA graphics
processors.

PetaFLOPS-class computers are currently being developed and even larger computers are
being planned. Our BigSim project is aimed at developing tools that allow one to develop,
debug, tune, and predict the performance of applications at scale before such machines are
available, so that the applications can be ready when the machine first becomes operational.
It also allows easier "offline" experimentation with parallel performance tuning strategies ---
without using the full parallel computer. To the machine architects, BigSim provides a method
for modeling the impact of architectural choices (including the communication network) on actual,
full-scale applications. In this talk, we will present our simulation framework which consists
of an emulator and a simulator; we will focus on the recent progress in integrating instruction
level simulation with our framework.

The Cell processor, jointly developed by IBM, Sony, and Toshiba, has great computational
power when compared to other commodity processors. For this reason, we would like to
adapt the Charm++ runtime system to be able to harness the power of the Cell. While the
design of the Cell processor is the source of this computational power, it is also the
source of several difficulties, notably in programmability and portability. In this talk,
we will be discussing our efforts to port the Charm++ runtime system, and thus the Charm++
programming model, to Cell-based platforms. There are several aspects of the Charm++
programming model that make it a good fit for the Cell processor, including: data
encapsulation, virtualization, message queue peek-ahead, and so on. We also believe that
Charm++ can help with several of the difficulties facing the Cell.

The scalability of performance tools in high performance computing has
been lagging behind the growth in the sizes of supercomputers and the
applications that run on them. The volume of performance trace data
generated easily becomes unmanageable without appropriate controls as we
scale upwards. At the same time, the amount of information that has to be
presented to a human analyst can also become overwhelming.
We present techniques used to address the above problems and enhance
the scalability of Projections, a performance instrumentation and
visualization framework for the migratable object programming model
Charm++. Projections provides multiple resolutions of performance
data. We couple this feature with the use of heuristics and clustering
algorithms at application runtime to provide powerful mechanisms to
reduce performance data volume and analysis time while preserving data
relevance. We employ similar heuristics and algorithms for enhanced
interactive post-mortem visualization and analysis assistance to a
human analyst.

Important scientific problems can be treated via
ab initio based molecular modeling approaches wherein atomic forces are
derived from an energy function that explicitly considers
the electrons. The Car-Parrinello ab initio molecular dynamics method (CPAIMD)
is widely used to study systems containing hundreds to thousands of atoms. However,
CPAIMD's impact has been limited by the difficulty of
scaling the technique beyond a number of processors roughly equal to the number of
electronic states, until recent efforts by ourselves and others.
CPAIMD computations involve a large number of interdependent phases
with high communication overhead including
multiple concurrent sparse 3D-Fast-Fourier Transforms (3D-FFTs),
non-square matrix multiplies, and a few concurrent dense 3D-FFTs.
Using Charm++ and its concept of processor virtualization, the phases are
discretized into a large number of virtual processors which are, in
turn, mapped flexibly onto physical processors, thereby
allowing significant interleaving of work.
Interleaving is enhanced through both architecturally independent methods
and network topology aware mapping techniques. Algorithmic and
Blue Gene/L specific optimizations are employed to scale CPAIMD to 20480 nodes,
about 30 times the number of electronic states in the largest benchmark system.

The enormous dynamic range involved in forming galaxies in their
cosmological context continues to tax the capabilities of the largest
computers available. Galaxies are influenced by the gravitational
forces originating tens of megaparsecs away, while the star formation
process which ultimately leads to galaxies being visible occurs on
sub-parsec scales. The large range in length scales also implies a
large range in timescales; hence spatially and temporally adaptive
algorithms are needed to perform these calculations efficiently.
Strategies for using the capabilities of the Charm++ system to tackle
these challenges will be discussed.

In order to scale cosmological simulations to PetaScale computers, we built
ChaNGa: a cosmological simulator which has recently been released to the public.
This simulator has shown outstanding scalability to machines with tens of thousands of processors.
In this talk I will describe the main features of this code based on the Charm++
Runtime System, focusing in particular on issues regarding load balancing and
multistepping. I will accompany my description with results from
large machines, such as IBM-BlueGene/L, Cray-XT3 and commodity clusters.

Projections is a performance tool used to analyze Charm++ applications.
The tutorial will guide the attendee through the basics of
instrumentation, trace-generation and various visualization features.
We will use a mid-sized NAMD dataset as a case study and will also explore
more advanced features to aid the identification of bottlenecks on larger
machines.

This tutorial will present the basics of Adaptive MPI (AMPI), an
MPI implementation over Charm++. It will show how to run regular
MPI applications under AMPI, and how to modify existing MPI
applications to benefit from the extensions provided by AMPI.
Those extensions include features such as checkpoint support,
automatic load balancing, asynchronous communication, and others.