next up previous
Next: 3 Integrated debugging system Up: Debugging Support for Charm++ Previous: 1 Introduction



2 Charm++

CHARM++ [10,8,9] is an object-oriented parallel programming language based on C++. CHARM++ is built on Converse [7], a message-passing layer that supports multi-lingual interoperability. CHARM++ supports a variety of distributed and shared memory machines, and directly supports Linux or Windows clusters of PCs connected using Ethernet or Myrinet, Alpha servers using Quadrics interconnects, SGI shared-memory machines, the Cray T3E, or any machine using MPI or pthreads.

The execution model of CHARM++ is message-driven [4]: Converse treats the parallel machine as a collection of nodes that communicate primarily via messages. Each node consists of a number of processors that share memory. When a message arrives at a processor, it triggers the execution of the handler function specified by the message [14]. The message is a contiguous sequence of bytes with two parts: the header and the data. The header contains a handler number, which identifies the handler function to execute when the message arrives. Converse maintains a table mapping handler numbers to function pointers, and each processor has its own copy of this mapping.
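The handler-table dispatch described above can be sketched in a few lines of C++. This is an illustration under stated assumptions, not Converse's actual code: the `MessageHeader` layout and the names `HandlerTable`, `registerHandler`, `deliver`, `makeMessage`, and `payloadOf` are hypothetical. Only the idea of a contiguous message whose header carries a handler number that indexes a per-processor table of function pointers comes from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical header layout: just the handler number.
struct MessageHeader {
    std::uint32_t handler;  // index into the per-processor handler table
};

using Handler = void (*)(const std::vector<char>& msg);

// One instance per processor: each processor holds its own copy of the mapping.
class HandlerTable {
public:
    // Registers fn and returns the handler number assigned to it.
    std::uint32_t registerHandler(Handler fn) {
        table_.push_back(fn);
        return static_cast<std::uint32_t>(table_.size() - 1);
    }

    // Reads the handler number from the header and calls the matching function.
    void deliver(const std::vector<char>& msg) const {
        MessageHeader hdr;
        std::memcpy(&hdr, msg.data(), sizeof hdr);
        assert(hdr.handler < table_.size());
        table_[hdr.handler](msg);
    }

private:
    std::vector<Handler> table_;
};

// Builds one contiguous byte sequence: header first, then the data part.
std::vector<char> makeMessage(std::uint32_t handler, const std::string& payload) {
    std::vector<char> msg(sizeof(MessageHeader) + payload.size());
    MessageHeader hdr{handler};
    std::memcpy(msg.data(), &hdr, sizeof hdr);
    std::memcpy(msg.data() + sizeof hdr, payload.data(), payload.size());
    return msg;
}

// Returns the data part of a message (everything after the header).
std::string payloadOf(const std::vector<char>& msg) {
    return std::string(msg.begin() + sizeof(MessageHeader), msg.end());
}
```

Because the handler number travels inside the message, the receiving processor needs no other context to decide what code to run; it only requires that every processor registers handlers in a consistent way.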

Communication primitives place messages in the scheduler queues of remote processors, where the scheduler thread finds and processes them. The Converse scheduler serves not only as a message receiver but also as the central allocator of CPU time: locally generated messages and messages from the network contend for scheduling time in the same way.

The parallel programming model of CHARM++ is based on the concept of processor virtualization [6], where the programmer divides the work into a large number of pieces called virtual processors or parallel objects, and lets the runtime system map these pieces to processors. Communication between pieces is based on virtual addresses managed by the runtime system [10], so the system can migrate pieces of the computation between processors without changing the way the pieces communicate, and hence without changing the programmer's view of the computation. The number of pieces a computation is broken into is typically independent of, and normally much larger than, the number of processors. The pieces of the computation are implemented by the programmer as parallel objects, which in CHARM++ are regular C++ objects. As regular C++ objects, CHARM++ parallel objects can contain public and private data and methods as usual.
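The following sketch illustrates why addressing objects by virtual ID lets the runtime migrate them without changing how senders communicate. The `Runtime` class, its round-robin initial placement, and the single global location table are simplifying assumptions for illustration, not the CHARM++ runtime's actual mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical runtime mapping virtual object IDs to physical processors.
// Senders name objects only by ID; processor numbers stay hidden.
class Runtime {
public:
    explicit Runtime(int procs) : inbox_(static_cast<std::size_t>(procs)) {
        assert(procs > 0);
    }

    // Initial placement: round-robin over processors (an assumed policy).
    void create(int objId) {
        int pe = nextPe_;
        nextPe_ = (nextPe_ + 1) % numProcs();
        location_[objId] = pe;
    }

    // Moves an object; callers of send() need not change.
    void migrate(int objId, int newPe) { location_.at(objId) = newPe; }

    // Delivers a message to whichever processor currently hosts objId.
    void send(int objId, const std::string& msg) {
        inbox_.at(static_cast<std::size_t>(location_.at(objId))).push_back(msg);
    }

    int whereIs(int objId) const { return location_.at(objId); }
    const std::vector<std::string>& inbox(int pe) const {
        return inbox_.at(static_cast<std::size_t>(pe));
    }
    int numProcs() const { return static_cast<int>(inbox_.size()); }

private:
    std::map<int, int> location_;                  // object ID -> processor
    std::vector<std::vector<std::string>> inbox_;  // per-processor messages
    int nextPe_ = 0;
};
```

For example, creating eight objects on two processors and then migrating one of them changes only the location table; the sender keeps calling `send` with the same object ID before and after the move.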

A machine-generated ``proxy'' C++ object is used to invoke methods on these parallel objects from other processors. As in Smalltalk, we use the term ``send an object a message'' to refer to remote method invocation via this proxy object. In accord with the message-driven execution model of CHARM++, all computations are initiated in response to messages being received. Method calls in CHARM++ are non-blocking: they are asynchronous method invocations [13], so the caller does not wait for the method to execute or to return a value. Because these remote methods can be called from ``outside'', they are called entry methods or entry points.
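To show what an asynchronous, proxy-based invocation means, here is a hand-written stand-in for a generated proxy. It is not CHARM++'s generated code: `Counter`, `CounterProxy`, and `MessageQueue` are hypothetical, and the target is held by pointer in one process, whereas a real proxy would address a possibly remote object through the runtime. The essential behavior is that calling a method on the proxy only queues a message and returns immediately, with no return value.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for the runtime's message queue.
struct MessageQueue {
    std::deque<std::function<void()>> pending;

    // Executes queued invocations in order; returns how many ran.
    int drain() {
        int n = 0;
        while (!pending.empty()) {
            auto f = std::move(pending.front());
            pending.pop_front();
            f();
            ++n;
        }
        return n;
    }
};

// An ordinary C++ class playing the role of a parallel object.
class Counter {
public:
    void add(int x) { total_ += x; }  // plays the role of an entry method
    int total() const { return total_; }
private:
    int total_ = 0;
};

// Hand-written proxy: each call becomes a queued message, so the caller
// continues before the method body has executed.
class CounterProxy {
public:
    CounterProxy(Counter& target, MessageQueue& q) : target_(&target), q_(&q) {}

    void add(int x) {  // returns void: the caller gets no result back
        Counter* t = target_;
        q_->pending.push_back([t, x] { t->add(x); });
    }

private:
    Counter* target_;
    MessageQueue* q_;
};
```

After two calls such as `p.add(5); p.add(7);` the counter is still zero; its state changes only when the queued messages are processed.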

In CHARM++, parallel objects are normally stored in an Array [10]. An Array is a collection of parallel objects keyed by an ``array index''. The size of an array is not fixed, and it is not constrained in any way by features of the underlying parallel machine such as the number of processors or nodes. Each element of an array has a globally unique index, and messages are addressed to that index. Most of the data in CHARM++ programs is stored in array elements.
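A minimal model of an index-keyed collection whose size is neither fixed nor tied to the machine. The two-dimensional `Index2D` type and the `ElementArray` class are assumptions chosen for illustration, and everything lives in one process; the sketch shows only that elements can be inserted and removed at arbitrary indices and that messages are addressed by index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 2D index type used for illustration.
using Index2D = std::pair<int, int>;

struct Element {
    std::vector<std::string> received;  // messages delivered to this element
};

// Sparse, growable collection keyed by index; no fixed size and no
// relation to the number of processors.
class ElementArray {
public:
    void insert(Index2D idx) { elems_.emplace(idx, Element{}); }
    void remove(Index2D idx) { elems_.erase(idx); }
    bool contains(Index2D idx) const { return elems_.count(idx) != 0; }
    std::size_t size() const { return elems_.size(); }

    // Messages name an index, not a processor or a memory address.
    void send(Index2D idx, std::string msg) {
        elems_.at(idx).received.push_back(std::move(msg));
    }

    const Element& at(Index2D idx) const { return elems_.at(idx); }

private:
    std::map<Index2D, Element> elems_;
};
```

Indices need not be contiguous: elements at `{0, 0}` and `{1000, 7}` can coexist without allocating anything in between.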

2.1 Existing debugging support


Syncprint

The simplest debugging support provided by CHARM++ is an ordered parallel logging facility, enabled by the command-line parameter ``+syncprint''. It enforces causal ordering by making each printf call block the calling object until its output has been queued at a central location. This global ordering slows down output, but guarantees that debugging statements never appear out of order.
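The following deterministic model sketches why blocking until output reaches a central queue preserves causal order. It is not the CHARM++ implementation: the `Output` struct and `printDemo` function are hypothetical, and the unsynchronized alternative, with per-processor buffers flushed in an arbitrary order standing in for unpredictable network timing, is an assumption made for contrast.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of two output strategies on a parallel machine.
struct Output {
    std::vector<std::string> central;               // what the user sees
    std::vector<std::vector<std::string>> buffered; // per-processor buffers
    bool sync;                                      // +syncprint enabled?

    Output(int procs, bool syncprint) : buffered(procs), sync(syncprint) {}

    // With syncprint, the call returns only after the line is in the central
    // log; otherwise the line is buffered locally and forwarded later.
    void print(int pe, const std::string& line) {
        if (sync) central.push_back(line);
        else buffered[pe].push_back(line);
    }

    // Forwards leftover buffered lines; highest processor first is an
    // arbitrary stand-in for unpredictable arrival order.
    void flush() {
        for (int pe = static_cast<int>(buffered.size()) - 1; pe >= 0; --pe) {
            for (const auto& l : buffered[pe]) central.push_back(l);
            buffered[pe].clear();
        }
    }
};

// An object on PE 0 prints, then sends a message to PE 1, whose handler
// prints on receipt. Returns the lines in the order the user sees them.
std::vector<std::string> printDemo(bool syncprint) {
    Output out(2, syncprint);
    out.print(0, "PE0: sending");
    // ... the message travels to PE 1 and its handler runs ...
    out.print(1, "PE1: received");
    out.flush();
    return out.central;
}
```

With `printDemo(true)` the send is always logged before the receipt; with `printDemo(false)` the receipt can appear first, which is exactly the misleading output that +syncprint rules out.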

Standalone mode

Another simple feature is the ability to run a parallel program serially, in a single process. This ``stand-alone'' mode allows programmers to debug CHARM++ programs on their local workstation using their favorite serial debugger, such as the graphical debugger included with Microsoft's Visual C++. Because of processor virtualization in CHARM++, a program running on a single processor is not limited to a single object or flow of control, so this simple feature works for real programs and has allowed us to catch a number of bugs. This method provides no true concurrency, however, because CHARM++ switches between the in-process flows of control cooperatively.

Multiple sequential debuggers

We begin to track down concurrency bugs by spawning a separate sequential debugger, such as gdb or dbx, on each process of a parallel job using the command-line run-time option ``++debug''. Each debugger runs in its own window and shows the terminal output of its parallel process. Because every process needs its own window, this method becomes unusable beyond a few dozen processors.

Record and replay

Bugs due to message ordering can be extremely difficult to track down, because message ordering on many parallel machines is nondeterministic [16,1]. CHARM++ provides a ``record and replay'' mechanism that allows a user to record and later reproduce a program's order of message arrivals, which can help catch message ordering bugs. The key idea is to tag messages at the sender and to record the message execution order to a file using the sender-generated tags. CHARM++ tags each message with the sending processor and an ``outgoing message count'' sequence number. The same message executions can therefore be replayed by processing incoming messages in file order, as long as senders tag their messages the same way on re-execution. Because CHARM++ scheduling is deterministic and non-preemptive, the only nondeterminism in CHARM++ programs comes from message arrival order. Thus, if the senders also process their own incoming messages in file order, they send the same messages with the same tags, and the entire program repeats itself exactly.
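The tagging and replay logic described above can be sketched as follows. The class names `Sender`, `Recorder`, and `Replayer` are hypothetical, the per-processor file is modeled as an in-memory vector, and message bodies are plain strings. Only the tag format (sending processor plus outgoing message count) and the rule of executing messages in recorded order come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sender-generated tag: (sending processor, outgoing message count).
using Tag = std::pair<int, std::uint64_t>;

struct Message {
    Tag tag;
    std::string body;
};

// Each sender numbers its outgoing messages; a deterministic re-execution
// makes every sender produce the same tags again.
class Sender {
public:
    explicit Sender(int pe) : pe_(pe) {}
    Message send(std::string body) { return Message{{pe_, next_++}, std::move(body)}; }
private:
    int pe_;
    std::uint64_t next_ = 0;
};

// Record mode: note each message's tag in execution order
// (the vector stands in for this processor's record file).
class Recorder {
public:
    void execute(const Message& m, std::vector<std::string>& out) {
        log_.push_back(m.tag);
        out.push_back(m.body);
    }
    const std::vector<Tag>& log() const { return log_; }
private:
    std::vector<Tag> log_;
};

// Replay mode: hold arriving messages until the next recorded tag is
// available, then execute messages strictly in recorded order.
class Replayer {
public:
    explicit Replayer(std::vector<Tag> order) : order_(std::move(order)) {}

    void arrive(Message m, std::vector<std::string>& out) {
        Tag t = m.tag;
        held_.emplace(t, std::move(m));
        while (pos_ < order_.size()) {
            auto it = held_.find(order_[pos_]);
            if (it == held_.end()) break;  // next recorded message not here yet
            out.push_back(it->second.body);
            held_.erase(it);
            ++pos_;
        }
    }

    bool done() const { return pos_ == order_.size(); }

private:
    std::vector<Tag> order_;
    std::map<Tag, Message> held_;
    std::size_t pos_ = 0;
};
```

In the replay run, messages may arrive in any order; the replayer simply buffers early arrivals, so the execution sequence it produces always matches the recorded one.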

To enable the required tracing for record and replay, a CHARM++ program is linked with the option ``-tracemode recordreplay'' and run with the ``+record'' option, which writes the message order of each processor to a separate file. The same execution order can then be replayed using the ``+replay'' runtime option, which can be combined with the other debugging tools in CHARM++.





January 23, 2004