What's New:
1-20-2002 : There are a few changes to the Blue Gene emulator API. The new API makes it easier to port other programming models, including Charm++, to the emulator. See the Changes section for details.
3-25-2001 : The Blue Gene emulator has been completely rewritten on top of Converse instead of Charm++, while the API supported by the original emulator is kept without major changes. Implementing the emulator on Converse, a lower-level communication library, gives better performance by avoiding cross-layer overhead. The switch to Converse also makes it possible to port the Charm++ parallel language onto the emulator.
New features in the Converse Blue Gene emulator include thread-committed messages, which can be sent to a specific thread in a Blue Gene node, and broadcast at both the Blue Gene node level and the thread level.
Changes:
1-20-2002: In the new API, most function calls remain the same, but there are a few changes:
(i) The message format has changed. A user-defined message must now include CmiBlueGeneMsgHeaderSizeBytes bytes of pre-allocated space as a message header, which is used by the runtime communication library.
(ii) Once you pass a message to a packet send call, you can no longer use or free it in user code. The system holds the message pointer until it is done with it. This transfer of ownership avoids buffer copying in the emulator and improves performance.
(iii) Handler registration calls must be made in BgNodeStart, because the new API creates a handler table for each Blue Gene node instead of one for each emulator machine.
(iv) Node private macros (Bnv) are implemented: node private variables can now be declared individually, instead of being declared together in one big data structure and accessed through functions.
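Items (i) and (ii) above might look as follows in user code. This is only a minimal sketch: the fallback value of CmiBlueGeneMsgHeaderSizeBytes, the RingMsg type, and the helpers ring_msg_new, ring_msg_hand_off and toy_send are assumptions made for illustration. In a real program the macro comes from the emulator headers and the send call is one of the packet send functions of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in so the sketch compiles without the emulator headers.
 * The real value is defined by the emulator; 16 is only a placeholder. */
#ifndef CmiBlueGeneMsgHeaderSizeBytes
#define CmiBlueGeneMsgHeaderSizeBytes 16
#endif

/* Hypothetical user message: the reserved header space is placed first,
 * followed by the application payload. */
typedef struct {
    char header[CmiBlueGeneMsgHeaderSizeBytes]; /* used by the runtime */
    int  iteration;                             /* application data */
} RingMsg;

/* Allocate a message with the header space zeroed. Returns NULL on failure. */
RingMsg *ring_msg_new(int iteration)
{
    RingMsg *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    memset(m->header, 0, sizeof m->header);
    m->iteration = iteration;
    return m;
}

/* Toy stand-in for a packet send call: the "system" takes the message
 * and releases it once it is done with it. */
void toy_send(char *data, int numbytes)
{
    (void)numbytes;
    free(data);
}

/* Model of the ownership rule in (ii): after handing the message to a send
 * call, the caller drops its pointer and never uses or frees it again. */
void ring_msg_hand_off(RingMsg **m, void (*send)(char *data, int numbytes))
{
    assert(m != NULL && *m != NULL);
    send((char *)*m, (int)sizeof **m);
    *m = NULL; /* the caller no longer owns the message */
}
```

Clearing the caller's pointer right after the send is one simple way to make accidental reuse of a handed-off message fail early.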
Objectives:
The Blue Gene emulator environment is designed
with the following objectives:
(i) To support a realistic Blue Gene API
on existing parallel machines.
(ii) To obtain first-order performance estimates
of algorithms.
(iii) To facilitate implementations of alternate
programming models for Blue Gene; Charm++ can be one of the parallel languages on top of the emulator.
The "Blue Gene" machine supported by the emulator
consists of a three-dimensional grid of 1-chip nodes. The user may specify the
size of the machine along each dimension (e.g. 34x34x36).
Each chip supports k threads (e.g. 200), each with its own integer
unit. The proximity of an integer unit to individual memory modules
within a chip is not currently modeled.
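To get a feel for the scale of such a machine, the grid arithmetic can be sketched as below. The MachineShape type and all helper names are hypothetical, and the z-fastest linearization of coordinates into a serial number is an assumption for illustration; the emulator's own numbering may differ.

```c
#include <assert.h>

/* Hypothetical description of an emulated machine. Not an emulator type. */
typedef struct {
    int sx, sy, sz;       /* nodes along each dimension */
    int threads_per_node; /* k in the text */
} MachineShape;

long node_count(MachineShape m)
{
    return (long)m.sx * m.sy * m.sz;
}

long thread_count(MachineShape m)
{
    return node_count(m) * m.threads_per_node;
}

/* Assumed linearization: z varies fastest, then y, then x. */
long node_serial(MachineShape m, int x, int y, int z)
{
    assert(x >= 0 && x < m.sx && y >= 0 && y < m.sy && z >= 0 && z < m.sz);
    return ((long)x * m.sy + y) * m.sz + z;
}

/* Inverse of node_serial. */
void node_coords(MachineShape m, long serial, int *x, int *y, int *z)
{
    assert(serial >= 0 && serial < node_count(m));
    *z = (int)(serial % m.sz);
    *y = (int)((serial / m.sz) % m.sy);
    *x = (int)(serial / ((long)m.sz * m.sy));
}
```

For the 34x34x36 example with 200 threads per node, node_count gives 41616 nodes and thread_count gives 8323200 threads.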
The API supported by the emulator can be broken
down into several components:
Level 0: Low-level API for chip-to-chip communication
Level 1a: Mid-level API that supports local micro-tasking with a chip level scheduler
Level 1b: Features such as read-only variables, reductions,
broadcasts, distributed tables, get/put operations
Level 2: Migratable objects with automatic load
balancing support
Of these, the first two have been implemented.
A simple time stamping algorithm, without error correction, has been
implemented. More sophisticated timing algorithms, specifically
aimed at error correction, more sophisticated features (1b, 2 and others),
and libraries of commonly needed parallel operations are proposed
as future work.
The following sections define the appropriate
parts of the API, with example programs and instructions for executing
them.
Blue Gene API: Level 0
void addBgNodeInbuffer(bgMsg *msgPtr, int nodeID)
void addBgThreadMessage(bgMsg *msgPtr, int threadID)
void addBgNodeMessage(bgMsg *msgPtr)
CmiHandler msgHandlerFunc(char *msg)
void BgSendNonLocalPacket(int x, int y, int z,
int threadID, int handlerID, WorkType type, int numbytes, char * data)
void BgSendLocalPacket(int threadID, int handlerID,
WorkType type, int numbytes, char * data)
boolean checkReady()
bgMsg * getFullBuffer()
typedef void (*BgHandler)(void*)
Initialization API: Level 1a
All the functions defined in API Level 0 are used internally to
implement Blue Gene node communication and worker threads.
From this level on, the functions are exposed to users for writing Blue Gene
programs on the emulator. For each Blue Gene node, execution starts at BgNodeStart(int argc, char **argv). After execution completes, the user program
invokes a function to shut down the emulator (BgShutdown(), listed with the functions below).
Handler Function API: Level 1a
The following functions can be called in the user's application program to retrieve Blue Gene machine information, get thread execution time, and perform
communication.
Handler declarations
void BgEmulatorInit(int argc, char **argv)
void *BgNodeStart(int argc, char **argv)
Handler Function 1, void handlerName(char *info)
sample application 2
sample application 3
Blue Gene Programming Environment
The basic philosophy of the Blue Gene Emulator is to hide the intricate details of the Blue Gene machine from the application developer. The
application developer only needs to provide initialization details (the
Blue Gene dimensions and the number of communication and worker threads) and
handler functions, and gets results as though running on a real
machine. Communication, thread creation, time stamping, etc. are done
by the emulator.
addBgNodeInbuffer
(Low-level primitive invoked by the Blue Gene emulator to put a message into the inbuffer queue of a node.)
msgPtr - pointer to the message to be sent to the target node
nodeID - node ID of the target node; this is the serial number of the Blue Gene node within the emulator's physical node.
addBgThreadMessage
(Add a message to a thread's affinity queue. These messages can only be executed by the specific thread indicated by threadID.)
addBgNodeMessage
(Add a message to a node's non-affinity queue. These messages can be executed by any thread in the node.)
msgHandlerFunc
(Handler that processes the message.)
BgSendNonLocalPacket
(Chip-to-chip communication function. It sends a message to Node[x][y][z].)
threadID - thread for which this is an affinity message; -1 means any thread.
handlerID - ID of the handler that executes on this message
type - defines whether the handler is to be executed by a communication thread or a worker thread
numbytes - size of the message
data - pointer to the message to be sent
BgSendLocalPacket
(Create a micro-task, i.e. work for some thread in the same node as the invoking thread. Arguments have the same meaning as for BgSendNonLocalPacket described above.)
checkReady
(Invoked by a communication thread to see if there is any unattended message in the inBuffer.)
getFullBuffer
(Invoked by a communication thread to retrieve the unattended message in the inBuffer.)
BgHandler
(Represents a handler function that returns nothing and takes a (void *).)
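The difference between a thread's affinity queue and a node's non-affinity queue can be pictured as a routing decision on threadID. The following is a toy model only, not the emulator's code; ANY_THREAD, QueueKind and choose_queue are hypothetical names introduced for this sketch.

```c
#include <assert.h>

#define ANY_THREAD (-1) /* hypothetical name for the threadID value -1 */

/* Hypothetical labels for the two kinds of queue inside a node. */
typedef enum {
    NODE_QUEUE,            /* non-affinity: any thread may run the message */
    THREAD_AFFINITY_QUEUE  /* only the named thread may run the message */
} QueueKind;

/* Decide which queue a message belongs to, given the threadID it was
 * sent with and the number of threads in the node. */
QueueKind choose_queue(int threadID, int numThreads)
{
    if (threadID == ANY_THREAD)
        return NODE_QUEUE;
    assert(threadID >= 0 && threadID < numThreads);
    return THREAD_AFFINITY_QUEUE;
}
```

For example, choose_queue(-1, 4) selects the node queue, while choose_queue(2, 4) selects the affinity queue of thread 2.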
Because the emulator machine emulates several Blue Gene nodes on
each physical node, the emulator program defines the function
BgEmulatorInit(int argc, char **argv) to initialize each emulator
node. In this function, the user program can define the Blue Gene machine size
and the number of communication/worker threads, and check the command line arguments.
The size of the Blue Gene machine being emulated and the number of threads per
node are determined either by command line arguments or by calling the following
functions.
void BgSetSize(int sx, int sy, int sz);
(set the Blue Gene machine size.)
void BgSetNumWorkThread(int num);
(set the number of worker threads per node.)
void BgSetNumCommThread(int num);
(set the number of communication threads per node.)
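A user-supplied BgEmulatorInit built from these three calls might look like the sketch below. The bodies of BgSetSize, BgSetNumWorkThread and BgSetNumCommThread here are stand-ins that merely record their arguments, so that the example compiles without the emulator; the 4x4x4 size and the thread counts are arbitrary example values.

```c
#include <assert.h>

/* Stand-ins with the signatures listed above. In a real program these
 * come from the emulator; here they only record the values they get. */
static int g_sx, g_sy, g_sz, g_work, g_comm;

void BgSetSize(int sx, int sy, int sz)
{
    assert(sx > 0 && sy > 0 && sz > 0);
    g_sx = sx; g_sy = sy; g_sz = sz;
}

void BgSetNumWorkThread(int num)
{
    assert(num > 0);
    g_work = num;
}

void BgSetNumCommThread(int num)
{
    assert(num > 0);
    g_comm = num;
}

/* User-supplied initialization: a 4x4x4 machine with 8 worker threads
 * and 1 communication thread per node (example values only). */
void BgEmulatorInit(int argc, char **argv)
{
    (void)argc; /* command line arguments are ignored in this sketch */
    (void)argv;
    BgSetSize(4, 4, 4);
    BgSetNumWorkThread(8);
    BgSetNumCommThread(1);
}
```

A fuller version would inspect argc and argv first and fall back to such defaults only when no size is given on the command line.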
User message handler functions are registered with the Blue Gene emulator via:
int BgRegisterHandler(BgHandler h);
(register a handler h and return the global identifier for that handler.)
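The idea of registering a handler and later selecting it by the returned identifier can be illustrated with a toy table. The BgRegisterHandler body below is a stand-in, not the emulator's implementation; returning the table index as the identifier, the MAX_HANDLERS limit, and the toy_dispatch helper are assumptions of this sketch. Per the 1-20-2002 changes, the registration calls in a real program belong in BgNodeStart.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*BgHandler)(void *); /* as declared in API Level 0 */

/* Toy handler table. The emulator keeps its own table for each
 * Blue Gene node; MAX_HANDLERS is an arbitrary limit for the sketch. */
#define MAX_HANDLERS 16
static BgHandler g_handlers[MAX_HANDLERS];
static int g_numHandlers = 0;

/* Stand-in: store the handler and return its index as the identifier. */
int BgRegisterHandler(BgHandler h)
{
    assert(h != NULL && g_numHandlers < MAX_HANDLERS);
    g_handlers[g_numHandlers] = h;
    return g_numHandlers++;
}

/* Hypothetical helper: run the handler registered under handlerID. */
void toy_dispatch(int handlerID, void *msg)
{
    assert(handlerID >= 0 && handlerID < g_numHandlers);
    g_handlers[handlerID](msg);
}

/* Two example handlers that record which one ran. */
void startHandler(void *msg) { *(int *)msg = 1; }
void workHandler(void *msg)  { *(int *)msg = 2; }
```

The identifier returned at registration is what a sender would later pass as handlerID in a packet send call.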
Similar to pthread's thread-specific data, each Blue Gene node can have its
own node-specific data associated with it. To do this, the user defines the node-specific variables encapsulated in a struct definition and registers
a pointer to the data with the emulator using the following function:
void BgSetNodeData(char *data);
To retrieve the node-specific data, call:
char *BgGetNodeData();
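The register-then-retrieve pattern for node-specific data can be sketched as follows. The BgSetNodeData and BgGetNodeData bodies are single-node stand-ins that keep just one pointer, so the example compiles on its own; the NodeData struct and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Single-node stand-ins with the signatures above. The emulator keeps
 * one pointer per Blue Gene node; this toy keeps exactly one. */
static char *g_nodeData = NULL;
void BgSetNodeData(char *data) { g_nodeData = data; }
char *BgGetNodeData(void)      { return g_nodeData; }

/* Hypothetical node-specific variables for an application. */
typedef struct {
    int iterations; /* completed iterations on this node */
    int localMax;   /* largest value seen on this node */
} NodeData;

/* Would typically run in BgNodeStart: allocate the struct and register
 * it. Returns 0 on success, -1 if allocation fails. */
int node_data_init(void)
{
    NodeData *nd = malloc(sizeof *nd);
    if (nd == NULL)
        return -1;
    nd->iterations = 0;
    nd->localMax = 0;
    BgSetNodeData((char *)nd);
    return 0;
}

/* Would typically run in a handler: fetch the struct and update it. */
void count_iteration(void)
{
    NodeData *nd = (NodeData *)BgGetNodeData();
    assert(nd != NULL);
    nd->iterations++;
}
```

The casts to and from char * are needed because the two API functions carry the data as an untyped byte pointer.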
A set of Bnv macros is implemented to add flexibility to declaring and using node private data:
BnvDeclare(int, data);
BnvStaticDeclare(int, data);
BnvInitialize(int, data);
BnvExtern(int, data);
BnvAccess(data);
void BgShutdown()
void BgGetSize(int *sx, int *sy, int *sz);
int BgGetNumWorkThread();
int BgGetNumCommThread();
int BgGetThreadID();
int BgGetGlobalThreadID();
double BgGetTime();
void BgSendPacket(int x, int y, int z, int threadID, int handlerID, WorkType type, int numbytes, char* data);
Writing a Blue Gene Application
Application Skeleton
Struct definitions encapsulating node-specific variables
BgEmulatorInit
(Set the Blue Gene machine configuration parameters, including the size and the per-node thread configuration. Note that under the 1-20-2002 API, handlers must be registered in BgNodeStart rather than in this function; see Changes.)
BgNodeStart
(The usual practice in this function is to send an initial message to trigger the
execution.
You can also register node-specific data in this function.)
Handler Function 2, void handlerName(char *info)
..
Handler Function N, void handlerName(char *info)
----------------------------------------------------------
sample application 1
/* Application:
 *   Each node, starting at [0,0,0], sends a packet to the next node in ring order.
 *   After node [0,0,0] gets a message from the last node in the ring, the first
 *   iteration ends.
 *   After 20000 iterations the execution ends.
 */
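The "next node in ring order" step of this application can be sketched in isolation. The ordering used here (z advances first, then y, then x, wrapping back to [0,0,0]) is an assumption; the sample's actual ordering is not stated. ring_next and ring_length are hypothetical helper names.

```c
#include <assert.h>

/* Compute the successor of node [x,y,z] in an sx x sy x sz machine.
 * Assumed ordering: z advances first, then y, then x, with wrap-around
 * from the last node back to [0,0,0]. */
void ring_next(int sx, int sy, int sz, int x, int y, int z,
               int *nx, int *ny, int *nz)
{
    assert(x >= 0 && x < sx && y >= 0 && y < sy && z >= 0 && z < sz);
    *nx = x;
    *ny = y;
    *nz = z + 1;
    if (*nz == sz) { *nz = 0; (*ny)++; }
    if (*ny == sy) { *ny = 0; (*nx)++; }
    if (*nx == sx) { *nx = 0; }
}

/* Count the hops a packet needs to travel from [0,0,0] all the way
 * around and back to [0,0,0]; one such trip is one iteration. */
int ring_length(int sx, int sy, int sz)
{
    int x = 0, y = 0, z = 0, hops = 0;
    do {
        ring_next(sx, sy, sz, x, y, z, &x, &y, &z);
        hops++;
    } while (x != 0 || y != 0 || z != 0);
    return hops;
}
```

In an emulator program, each handler would pass the coordinates produced by such a function to a packet send call, and node [0,0,0] would count completed trips until the iteration limit is reached.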
/* Application:
 *   Find the maximum element.
 *   Each node computes the maximum of its elements and the max values it received
 *   from other nodes and sends the result to the next node in the reduction sequence.
 *
 *   Reduction Sequence: Reduce max data to X-Y Plane
 *                       Reduce max data to Y Axis
 *                       Reduce max data to origin.
 */
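One possible reading of this reduction sequence is sketched below as a serial model. The exact choice of target node at each step (one hop toward z == 0, then one hop toward x == 0, then one hop toward y == 0) is an assumption, as are the names reduction_target and reduce_max; the sample program itself may organize the sends differently.

```c
#include <assert.h>

/* Assumed reduction sequence: along z to the X-Y plane (z == 0), then
 * along x to the Y axis (x == 0), then along y to the origin.
 * Returns 1 and fills in the target, or 0 if the node is the origin. */
int reduction_target(int x, int y, int z, int *tx, int *ty, int *tz)
{
    *tx = x; *ty = y; *tz = z;
    if (z > 0) { *tz = z - 1; return 1; }
    if (x > 0) { *tx = x - 1; return 1; }
    if (y > 0) { *ty = y - 1; return 1; }
    return 0;
}

/* Serial model of the reduction. val holds one local maximum per node,
 * stored as val[(x * sy + y) * sz + z]. Nodes are visited in an order
 * that guarantees each node has heard from all its senders before it
 * forwards its own value. Returns the value that reaches the origin. */
int reduce_max(int *val, int sx, int sy, int sz)
{
    int x, y, z, tx, ty, tz;
    assert(sx > 0 && sy > 0 && sz > 0);
    for (y = sy - 1; y >= 0; y--)
        for (x = sx - 1; x >= 0; x--)
            for (z = sz - 1; z >= 0; z--) {
                int mine = val[(x * sy + y) * sz + z];
                if (reduction_target(x, y, z, &tx, &ty, &tz)) {
                    int *dst = &val[(tx * sy + ty) * sz + tz];
                    if (mine > *dst)
                        *dst = mine; /* receiver keeps the larger value */
                }
            }
    return val[0];
}
```

In the emulator, each assignment to dst would instead be a message to the target node, whose handler compares the received value with its own before forwarding.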
/* Application:
 *   Find the number of primes in a given range.
 */
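The computation behind this application can be sketched without the emulator: split the range into one block per node, count primes in each block, and add the counts. How the sample program actually divides the work is not described here, so the block partition in node_range is an assumption, and all function names are hypothetical.

```c
#include <assert.h>

/* Trial-division primality test; adequate for a small demonstration. */
static int is_prime(long n)
{
    long d;
    if (n < 2)
        return 0;
    for (d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Count the primes in the closed range [lo, hi]. */
long count_primes(long lo, long hi)
{
    long n, count = 0;
    for (n = lo; n <= hi; n++)
        count += is_prime(n);
    return count;
}

/* Hypothetical block partition: the sub-range handled by node `rank`
 * out of `nodes` when [lo, hi] is split as evenly as possible. */
void node_range(long lo, long hi, int rank, int nodes, long *nlo, long *nhi)
{
    long total = hi - lo + 1;
    assert(nodes > 0 && rank >= 0 && rank < nodes && total >= 0);
    *nlo = lo + total * rank / nodes;
    *nhi = lo + total * (rank + 1) / nodes - 1;
}
```

Each node would call count_primes on its own sub-range, and the per-node counts would then be summed, for instance with a reduction toward the origin.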
Compiling and Running
Download the source code, and see the README and Makefile
for instructions on compiling and running.