We have created a new debugging system with a number of useful features for CHARM++ programmers. The system includes a Java GUI client program which runs on the programmer's desktop, and a CHARM++ parallel program which acts as a server. The client and server need not be on the same machine, and communicate over the network using a secure protocol described in Section 4.2.
The system provides the new features described below.
The debugging client provides these features through extensive support built into the CHARM++ runtime. The parallel runtime is uniquely positioned to supply this debugging information, because it operates much closer to the application level than the machine binary that sequential debuggers work with.
The CHARM++ programmer starts the debugger client from the command line, passing the program to be debugged, its arguments, and the number of processing elements it should run on. Alternatively, the program and its arguments can be set through a menu item in the debugger GUI. The menu usage is shown in Figure 1.
Once the debugger client's GUI loads, the programmer starts program execution by clicking the Start button. The program begins in a frozen state, displaying the ``user'' and ``system'' entry points as a list of checkboxes. ``System'' entry points belong to libraries and CHARM++ code, while ``user'' entry points are defined in the application program being debugged. The programmer sets and removes breakpoints by checking and unchecking the checkboxes corresponding to the entry points, then begins execution by clicking the Continue button. The program freezes when a breakpoint is reached. Figure 2 shows a snapshot of the debugger when a breakpoint is reached.
The server runtime inserts a breakpoint by modifying the CHARM++ entry handler table, a table of function pointers that normally point directly to the application's entry method code. By overwriting an entry method's table entry with a pointer to debugging code, the next message that attempts to execute that method is routed to the debugging runtime instead. This is far more efficient than our previous implementation [15], which checked every incoming message against a list of breakpoints. In fact, the new mechanism imposes no overhead when unused, so it can be permanently enabled rather than requiring a special debug build.
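The table-swap technique can be illustrated with a small, self-contained sketch. It is not the CHARM++ implementation: `Message`, `handlerTable`, `originalHandlers`, `debugStub`, `setBreakpoint`, `resumeHeld`, and the sample entry method `addEntry` are all hypothetical names chosen for illustration. The point it shows is that dispatch is a single indirect call through the table whether or not a breakpoint is set, so breakpoints cost nothing until one is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical message: which entry method to run, plus a small payload.
struct Message {
    std::size_t entryIndex;
    int payload;
};

using EntryHandler = void (*)(const Message&);

// Hypothetical entry handler table: one function pointer per entry method.
std::vector<EntryHandler> handlerTable;
// Copy of the original pointers so breakpoints can be cleared later.
std::vector<EntryHandler> originalHandlers;
// Messages intercepted by breakpoints, kept so they can be inspected or resumed.
std::vector<Message> heldMessages;

// Hypothetical example application entry method: accumulates payloads.
int total = 0;
void addEntry(const Message& m) { total += m.payload; }

// Debug stub installed in place of a real entry method. It does not run the
// method; it holds the message so the debugger can report the breakpoint.
void debugStub(const Message& m) { heldMessages.push_back(m); }

// Register an entry method and return its index in the table.
std::size_t registerEntry(EntryHandler h) {
    handlerTable.push_back(h);
    originalHandlers.push_back(h);
    return handlerTable.size() - 1;
}

void setBreakpoint(std::size_t ep) {
    assert(ep < handlerTable.size());
    handlerTable[ep] = debugStub;            // overwrite the table entry
}

void clearBreakpoint(std::size_t ep) {
    assert(ep < handlerTable.size());
    handlerTable[ep] = originalHandlers[ep]; // restore the real method
}

// Scheduler dispatch: one indirect call whether or not any breakpoint is set,
// so there is no per-message breakpoint check.
void deliver(const Message& m) { handlerTable[m.entryIndex](m); }

// "Continue": run each held message with its real entry method, leaving the
// breakpoint installed so the next message to that method stops again.
void resumeHeld() {
    std::vector<Message> pending;
    pending.swap(heldMessages);
    for (const Message& m : pending) originalHandlers[m.entryIndex](m);
}
```

In this sketch the previous list-based design would correspond to a lookup inside `deliver` on every message; moving the decision into the table removes that cost from the common path.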
Clicking the Freeze button stops the selected processors before they start their next message and drains their network queues. The Continue button resumes execution, and the Quit button exits the debugged program.
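A minimal sketch of the freeze behaviour follows, under the assumption that "draining" means moving pending network messages into the local scheduler queue, where they remain available for inspection. `Processor`, `network`, `schedulerQueue`, `executed`, `frozen`, and `step` are hypothetical names, and strings stand in for real messages. Because the frozen flag is checked only between messages, a message that is already running completes before the processor stops.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of one processor's scheduler state.
struct Processor {
    std::deque<std::string> network;        // messages not yet pulled off the network
    std::deque<std::string> schedulerQueue; // messages waiting for the scheduler
    std::vector<std::string> executed;      // log of messages that actually ran
    bool frozen = false;

    // Move everything currently on the network into the scheduler queue.
    void drainNetwork() {
        while (!network.empty()) {
            schedulerQueue.push_back(network.front());
            network.pop_front();
        }
    }

    // One scheduler step: always drain the network, but start the next
    // message only when not frozen. Returns true if a message was executed.
    bool step() {
        drainNetwork();
        if (frozen || schedulerQueue.empty()) return false;
        executed.push_back(schedulerQueue.front());
        schedulerQueue.pop_front();
        return true;
    }
};
```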
Entities (for instance, array elements) and their contents on any processor can be viewed at any point, as illustrated in Figure 3. The CHARM++ PUP framework, described in Section 4.1, is used to retrieve and format program entities.

The Converse scheduler, which is the core of CHARM++, processes a pool of messages held in queues on each processor [7]. These messages may be generated locally or arrive from remote processors. In CHARM++, a message can correspond to an entry method invocation, a ready thread, a message sent to a ready thread, or a previously posted handler. A message is a chunk of memory consisting of a header and data. The debugger allows the user to freeze the program and inspect the messages in the queues: from the data part of each CHARM++ message, the debugging framework decodes the destination object, the method being invoked, and its parameters for the user to interpret.
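To make the decoding step concrete, the sketch below packs a message as a header followed by its parameters in one contiguous buffer, then turns it back into readable text. The layout is an assumption for illustration only; the real CHARM++ message envelope differs, and `Header`, `entryNames`, `makeMessage`, and `describeMessage` are hypothetical names. Parameters are limited to 32-bit integers to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message header; not the real CHARM++ envelope.
struct Header {
    std::int32_t destObject; // e.g. index of the target array element
    std::int32_t entryIndex; // which entry method to invoke
    std::int32_t nParams;    // number of int parameters in the data part
};

// Hypothetical entry-method name table used to make the dump readable.
const std::vector<std::string> entryNames = {"Worker::start", "Worker::recvData"};

// Build a message: header followed by the parameters in one buffer.
std::vector<unsigned char> makeMessage(std::int32_t obj, std::int32_t ep,
                                       const std::vector<std::int32_t>& params) {
    Header h{obj, ep, static_cast<std::int32_t>(params.size())};
    std::vector<unsigned char> buf(sizeof h + params.size() * sizeof(std::int32_t));
    std::memcpy(buf.data(), &h, sizeof h);
    if (!params.empty())
        std::memcpy(buf.data() + sizeof h, params.data(),
                    params.size() * sizeof(std::int32_t));
    return buf;
}

// Decode a queued message into text: destination, method, and parameters.
std::string describeMessage(const std::vector<unsigned char>& buf) {
    if (buf.size() < sizeof(Header)) return "malformed message";
    Header h;
    std::memcpy(&h, buf.data(), sizeof h);
    if (h.nParams < 0 ||
        buf.size() < sizeof h + static_cast<std::size_t>(h.nParams) * sizeof(std::int32_t))
        return "malformed message";

    std::ostringstream out;
    out << "to object " << h.destObject << ", method ";
    if (h.entryIndex >= 0 && static_cast<std::size_t>(h.entryIndex) < entryNames.size())
        out << entryNames[h.entryIndex];
    else
        out << "#" << h.entryIndex; // unknown method: show its raw index
    out << "(";
    for (std::int32_t i = 0; i < h.nParams; ++i) {
        std::int32_t v;
        std::memcpy(&v, buf.data() + sizeof h + i * sizeof v, sizeof v);
        out << (i ? ", " : "") << v;
    }
    out << ")";
    return out.str();
}
```

For example, a message built with `makeMessage(3, 1, {42, 7})` is described as `to object 3, method Worker::recvData(42, 7)`.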
Individual processes of the CHARM++ program can be attached to instances of gdb during execution, as shown in Figure 4. The older ``++debug'' option provides the same ability, but it always starts a sequential debugger for every process, whereas the new interface can start the debugger on a subset of processors.
January 23, 2004