OpenAtom
Version1.5a
|
Public Member Functions | |
PairCalculator (CProxy_InputDataHandler< CollatorType, CollatorType > inProxy, const pc::pcConfig _cfg) | |
Entry Method. (obviously) More... | |
PairCalculator (CkMigrateMessage *) | |
Constructor for migration. | |
~PairCalculator () | |
Destructor (nothing needs to be done?) More... | |
CollatorType * | leftHandler () const |
Returns a pointer to the collator that will buffer the left matrix data (only for use by the corresponding InputDataHandler chare) | |
CollatorType * | rightHandler () const |
Returns a pointer to the collator that will buffer the right matrix data (only for use by the corresponding InputDataHandler chare) | |
void | phantomDone () |
Entry Method. To handle correct startup for symm PC w/phantoms. More... | |
void | acceptLeftData (paircalcInputMsg *msg) |
Entry Method. Method to send in the complete block of the left matrix. More... | |
void | acceptRightData (paircalcInputMsg *msg) |
Entry Method. Method to send in the complete block of the right matrix. More... | |
void | launchComputations (paircalcInputMsg *aMsg) |
NOT an entry method. Called locally from the acceptData* methods to launch the appropriate number-crunching method. More... | |
void | multiplyForward (bool flag_dp) |
Forward path multiply driver. Prepares matrices, calls DGEMM, contributes results to Ortho subTiles and also passes relevant data to phantom PC chares. More... | |
void | multiplyForwardRDMA () |
Entry Method. Simply redirects call to multiplyForward() | |
void | contributeSubTiles (internalType *fullOutput) |
Piece up a tile and send all the pieces as this PC's contribution to the Ortho chares. More... | |
void | sendTiles (bool flag_dp) |
Contribute orthoGrainSized tiles of data (that are ready?) to the corresponding ortho chares. More... | |
void | collectTile (bool doMatrix1, bool doMatrix2, bool doOrthoT, int orthoX, int orthoY, int orthoGrainSizeX, int orthoGrainSizeY, int numRecdBW, int matrixSize, internalType *matrix1, internalType *matrix2) |
Receive data from ortho chares and copy into matrix. | |
void | initGRed (initGRedMsg *msg) |
Entry Method. Initializes the section cookie and the reduction client. Called on startup as the chare world is being created. More... | |
void | acceptOrthoT (multiplyResultMsg *msg) |
Entry Method. During dynamics, each Ortho calls this on the Asymm loop PC instances to send its share of T back to avoid a race condition between Gamma and T. More... | |
void | multiplyResult (multiplyResultMsg *msg) |
Entry Method. Backward path multiplication. More... | |
void | multiplyPsiV () |
Dynamics: PsiV Tolerance correction loop called on symmetric instances. Technically, backward path. More... | |
void | bwMultiplyDynOrthoT () |
Multiplies Fpsi by T (from Ortho) More... | |
void | multiplyResultI (multiplyResultMsg *msg) |
Entry Method. Simply forwards the call to multiplyResult(). Dont seem to be any instances in source which call this method. Check. More... | |
void | bwbarrier (CkReductionMsg *msg) |
Entry Method. a debugging tool: a barrier at the end of the backward path before anything is sent over to GSP. More... | |
void | bwMultiplyHelper (int size, internalType *matrix1, internalType *matrix2, internalType *amatrix, internalType *amatrix2, bool unitcoef, int m_in, int n_in, int k_in, int BNAoffset, int BNCoffset, int BTAoffset, int BTCoffset, int orthoX, int orthoY, double beta, int ogx, int ogy) |
multiplyPsiV() and multiplyResult() call this to perform the matrix multiply math on the backward path. This calls DGEMM routines for the same. More... | |
void | bwSendHelper (int orthoX, int orthoY, int sizeX, int sizeY, int ogx, int ogy) |
Called on the normal backward path (non-psiV) to set up the data sends to GSpace. | |
void | sendBWResult (sendBWsignalMsg *msg) |
Entry Method. Send the results via multiple reductions as triggered by a prioritized message. More... | |
void | sendBWResultDirect (sendBWsignalMsg *msg) |
Entry Method. More... | |
void | sendBWResultColumn (bool other, int startGrain, int endGrain) |
Send the result for this column. | |
void | sendBWResultColumnDirect (bool other, int startGrain, int endGrain) |
void | initResultSection (initResultMsg *msg) |
Entry Method. Initialize the section cookie for each slice of the result. More... | |
void | pup (PUP::er &) |
PUP method. More... | |
void | reorder (int *offsetMap, int *revOffsetMap, double *data, double *scratch) |
void | dumpMatrix (const char *, double *, int, int, int xstart=0, int ystart=0, int xtra1=0, int xtra2=0) |
void | dumpMatrix (const char *, complex *, int, int, int xstart=0, int ystart=0, int xtra1=0, int xtra2=0) |
void | dumpMatrixComplex (const char *, complex *, int, int, int xstart=0, int ystart=0, int iter=0) |
void | dgemmSplitBwdM (int m, int n, int k, char *trans, char *transT, double *alpha, double *A, double *B, double *bt, double *C) |
void | dgemmSplitFwdStreamMK (int m, int n, int k, char *trans, char *transT, double *alpha, double *A, int *lda, double *B, int *ldb, double *C, int *ldc) |
void | dgemmSplitFwdStreamNK (int m, int n, int k, char *trans, char *transT, double *alpha, double *A, int *lda, double *B, int *ldb, double *C, int *ldc) |
void | ResumeFromSync () |
void | lbsync () |
Entry Method. Something to sync? | |
Private Member Functions | |
void | enqueueBWsend (bool unitcoef, int priority=1) |
Schedules the entry methods that send out the results to GSpace with appropriate priority. More... | |
void | cleanupAfterBWPath () |
Cleans up at end of an iteration (fw-bw computation loop); frees mem, resets counters etc. More... | |
void | copyIntoTiles (double *source, double **dest, int sourceRows, int sourceCols, int *offsetsRow, int *offsetsCol, int *touched, int tileSize, int tilesPerRow) |
Copy the results from outdata1 and outdata2 into the tiles. More... | |
Private Attributes | |
pc::pcConfig | cfg |
A private copy of the input configurations. | |
CProxy_InputDataHandler < CollatorType, CollatorType > | myMsgHandler |
A handle to the co-located chare array that handles data input. | |
CollatorType * | leftCollator |
Data collators for the left and right matrix blocks. | |
CollatorType * | rightCollator |
bool | isLeftReady |
Flags indicating if the left and right matrix blocks have been received. | |
bool | isRightReady |
int | numRecd |
number of messages received | |
int | numRecdBW |
number of messages received BW | |
int | numRecdBWOT |
number of messages received BW orthoT | |
int | numExpected |
number of messages expected all | |
int | numExpectedX |
number of messages expected x-axis | |
int | numExpectedY |
number of messages expected y-axis | |
int | grainSizeX |
number of states per chare x-axis | |
int | grainSizeY |
number of states per chare y-axis | |
int | orthoGrainSizeRemX |
sgrainSizeX % orthoGrainSize | |
int | orthoGrainSizeRemY |
sgrainSizeY % orthoGrainSize | |
int | blkSize |
number points in gspace plane | |
int | numPoints |
number of points in this chunk | |
int | streamCaughtR |
number of rows caught so far R stream | |
int | streamCaughtL |
number of rows caught so far L stream | |
int | numRecLeft |
number of rows so far total left | |
int | numRecRight |
number of rows so far total right | |
int | gemmSplitFWk |
number of rows in split FW dgemm | |
int | gemmSplitFWm |
number of columns in split FW dgemm | |
int | gemmSplitBW |
number of rows in split BW dgemm | |
int * | LeftOffsets |
index numbers of caught stream elements | |
int * | RightOffsets |
index numbers of caught stream elements | |
int * | LeftRev |
reverse index numbers of caught stream elements | |
int * | RightRev |
reverse index numbers of caught stream elements | |
double ** | outTiles |
in output streaming we populate the tiles directly | |
int * | touchedTiles |
tracker to detect when tiles are full | |
bool | notOnDiagonal |
being on or off diagonal changes many things | |
bool | symmetricOnDiagonal |
diagonal symmetric special case | |
bool | expectOrthoT |
Is true only in asymmetric, dynamics scenario. For PC instances in the asymmetric chare array, orthoT should arrive before end of fwd path. | |
bool | amPhantom |
consolidate thisIndex.x<thisIndex.y && cfg.isSymmetric && phantomsym | |
CkArrayID | cb_aid |
bw path callback array ID | |
int | cb_ep |
bw path callback entry point | |
int | cb_ep_tol |
bw path callback entry point for psiV tolerance | |
bool | existsLeft |
inDataLeft allocated | |
bool | existsRight |
inDataRight allocated | |
bool | existsOut |
outData allocated | |
bool | existsNew |
newData allocated | |
bool | resumed |
have resumed from load balancing | |
inputType * | mynewData |
results of bw multiply | |
inputType * | othernewData |
results of sym off diagonal multiply, | |
internalType * | inDataLeft |
or the C=-1 inRight orthoT +c in dynamics More... | |
internalType * | inDataRight |
the input pair to be transformed | |
paircalcInputMsg * | msgLeft |
paircalcInputMsg * | msgRight |
Incoming messages with left and right matrix data that are kept around so that we can directly compute on them. | |
internalType * | outData |
results of fw multiply | |
int | actionType |
matrix usage control [NORMAL, KEEPORTHO, PSIV] | |
double * | allCaughtLeft |
unordered rows of FW input | |
double * | allCaughtRight |
unordered rows of FW input | |
internalType * | inResult1 |
accumulate ortho or lambda | |
internalType * | inResult2 |
used in gamma calc (non minimization) | |
int | rck |
count of received cookies | |
CkGroupID | mCastGrpIdOrtho |
group id for multicast manager ortho | |
int | numOrthoCol |
sGrainSizeX/orthoGrainSize | |
int | numOrthoRow |
sGrainSizeY/orthoGrainSize | |
int | numOrtho |
number of orthos in our grain = numOrthoCol*numOrthoRow | |
CkGroupID | mCastGrpId |
group id for multicast manager bw | |
CkSectionInfo * | resultCookies |
array of bw path section cookies | |
CkSectionInfo * | otherResultCookies |
extra array of bw path section cookies for sym off diag, or dynamics | |
CkCallback * | orthoCB |
forward path callbacks | |
CkSectionInfo * | orthoCookies |
forward path reduction cookie | |
int * | columnCount |
count of processed rows in BW by column | |
int * | columnCountOther |
count of processed rows in BW by column | |
Definition at line 305 of file ckPairCalculator.h.
PairCalculator::PairCalculator | ( | CProxy_InputDataHandler< CollatorType, CollatorType > | inProxy, |
const pc::pcConfig | _cfg | ||
) |
Entry Method. (obviously)
constructor
–—— Setup the forward path input message handling –——
Set isDataReady flags to false only for those inputs this chare will be getting
Phantoms will get only a right matrix input from their mirror chares
Non-phantoms will always get at least a left matrix input
Chares on the chare array diagonals will get ONLY a left matrix (which will also be the right).
Setup the callbacks
Create a string that holds the chare ID and pass it to the message handlers
Create the message handlers for the left and right input matrix blocks
This is the first point during execution when I can supply my InputDataHandler with pointers to the msg handlers, hence it is (now) safe to insert the [w,x,y,z]th element of the InputDataHandler chare array (as it will immediately clamor for access to these handlers)
Definition at line 80 of file ckPairCalculator.C.
References allCaughtLeft, allCaughtRight, amPhantom, cp::paircalc::pcConfig::areBWTilesCollected, cp::paircalc::pcConfig::arePhantomsOn, cb_aid, cb_ep, cb_ep_tol, cfg, columnCount, columnCountOther, cp::paircalc::pcConfig::conserveMemory, existsLeft, existsNew, existsOut, existsRight, expectOrthoT, gemmSplitBW, cp::paircalc::pcConfig::gemmSplitFWk, gemmSplitFWk, cp::paircalc::pcConfig::grainSize, grainSizeX, grainSizeY, cp::paircalc::pcConfig::gSpaceAID, cp::paircalc::pcConfig::gSpaceEP, inDataLeft, inDataRight, inResult1, inResult2, cp::paircalc::pcConfig::isBWstreaming, cp::paircalc::pcConfig::isDynamics, cp::paircalc::pcConfig::isLBon, isLeftReady, cp::paircalc::pcConfig::isSymmetric, leftCollator, msgRight, myMsgHandler, mynewData, notOnDiagonal, numExpected, numExpectedX, numExpectedY, numOrtho, numOrthoCol, numOrthoRow, numPoints, numRecd, numRecdBW, numRecdBWOT, numRecLeft, numRecRight, cp::paircalc::pcConfig::numStates, orthoCB, orthoCookies, cp::paircalc::pcConfig::orthoGrainSize, orthoGrainSizeRemX, orthoGrainSizeRemY, othernewData, otherResultCookies, outData, cp::paircalc::pcConfig::PsiVEP, resultCookies, resumed, streamCaughtL, streamCaughtR, symmetricOnDiagonal, and touchedTiles.
PairCalculator::~PairCalculator | ( | ) |
Destructor (nothing needs to be done?)
destructor
Definition at line 332 of file ckPairCalculator.C.
References cfg, existsLeft, existsNew, existsOut, existsRight, inDataLeft, inDataRight, cp::paircalc::pcConfig::isSymmetric, msgRight, mynewData, notOnDiagonal, orthoCB, orthoCookies, othernewData, otherResultCookies, outData, and resultCookies.
void PairCalculator::acceptLeftData | ( | paircalcInputMsg * | msg | ) |
Entry Method. Method to send in the complete block of the left matrix.
accept data for left side of pair
Assert that data is a valid pointer
Assert that numRows is as expected
Check data validity
Once the basic checks have passed, and if we're debugging print status info
Set member data pertinent to the left block
If all data is ready
Trigger the computation
Definition at line 478 of file ckPairCalculator.C.
References paircalcInputMsg::blkSize, cfg, paircalcInputMsg::data(), existsLeft, cp::paircalc::MessageDataCollator< msgType, dataType >::getSampleMsg(), inDataLeft, isLeftReady, cp::paircalc::pcConfig::isSymmetric, launchComputations(), leftCollator, paircalcInputMsg::numCols(), numExpectedX, numPoints, numRecd, and paircalcInputMsg::numRows().
void PairCalculator::acceptOrthoT | ( | multiplyResultMsg * | msg | ) |
Entry Method. During dynamics, each Ortho calls this on the Asymm loop PC instances to send its share of T back to avoid a race condition between Gamma and T.
Basically calls collectTile to accept incoming ortho data from an Ortho chare.
Do not delete msg. Its a nokeep. /////////////////////////////////////////////////////////////////////////// ollect orthoT from matrix2
?
Definition at line 1030 of file ckPairCalculator.C.
References actionType, cfg, collectTile(), expectOrthoT, grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, numOrtho, numOrthoCol, numOrthoRow, numRecd, numRecdBWOT, cp::paircalc::pcConfig::numStates, cp::paircalc::pcConfig::orthoGrainSize, orthoGrainSizeRemX, and orthoGrainSizeRemY.
void PairCalculator::acceptRightData | ( | paircalcInputMsg * | msg | ) |
Entry Method. Method to send in the complete block of the right matrix.
accept data for right side of pair
Assert that data is a valid pointer
Assert that numRows is as expected
Check data validity
Once the basic checks have passed, and if we're debugging print status info
Set member data pertinent to the right block
If all data is ready
Phantom chares already have correct info in the incoming msg. Only non-phantoms have to be handled
Trigger the computation
Definition at line 538 of file ckPairCalculator.C.
References amPhantom, paircalcInputMsg::blkSize, cfg, paircalcInputMsg::data(), existsRight, cp::paircalc::MessageDataCollator< msgType, dataType >::getSampleMsg(), inDataRight, isLeftReady, cp::paircalc::pcConfig::isSymmetric, launchComputations(), msgRight, paircalcInputMsg::numCols(), numExpectedY, numPoints, numRecd, and paircalcInputMsg::numRows().
void PairCalculator::bwbarrier | ( | CkReductionMsg * | msg | ) |
Entry Method. a debugging tool: a barrier at the end of the backward path before anything is sent over to GSP.
heap hack for minimzation only
Definition at line 1701 of file ckPairCalculator.C.
References enqueueBWsend().
void PairCalculator::bwMultiplyDynOrthoT | ( | ) |
Multiplies Fpsi by T (from Ortho)
Ensure that we're the correct paircalc instance
Ensure that we really have received all the OrthoT pieces
Reset the counter
Definition at line 1972 of file ckPairCalculator.C.
References blkSize, cfg, existsNew, expectOrthoT, grainSizeX, grainSizeY, inDataRight, inResult2, cp::paircalc::pcConfig::isSymmetric, mynewData, notOnDiagonal, cp::paircalc::pcConfig::numChunks, numExpectedX, numExpectedY, numOrtho, numPoints, numRecdBWOT, othernewData, and streamCaughtR.
void PairCalculator::bwMultiplyHelper | ( | int | size, |
internalType * | matrix1, | ||
internalType * | matrix2, | ||
internalType * | amatrix, | ||
internalType * | amatrix2, | ||
bool | unitcoef, | ||
int | m_in, | ||
int | n_in, | ||
int | k_in, | ||
int | BNAoffset, | ||
int | BNCoffset, | ||
int | BTAoffset, | ||
int | BTCoffset, | ||
int | orthoX, | ||
int | orthoY, | ||
double | beta, | ||
int | orthoGrainSizeX, | ||
int | orthoGrainSizeY | ||
) |
multiplyPsiV() and multiplyResult() call this to perform the matrix multiply math on the backward path. This calls DGEMM routines for the same.
PRE: amatrix contains the entire matrix for the multiplication.
matrix contains just what was sent in the trigger message.
omes in with a minus sign
Definition at line 1716 of file ckPairCalculator.C.
References actionType, amPhantom, cp::paircalc::pcConfig::areBWTilesCollected, cp::paircalc::pcConfig::arePhantomsOn, blkSize, cfg, collectTile(), existsNew, existsOut, existsRight, cp::paircalc::pcConfig::grainSize, grainSizeX, grainSizeY, inDataLeft, inDataRight, cp::paircalc::pcConfig::isSymmetric, mynewData, notOnDiagonal, cp::paircalc::pcConfig::numChunks, numExpectedX, numExpectedY, numOrtho, numPoints, numRecdBW, cp::paircalc::pcConfig::orthoGrainSize, othernewData, outData, streamCaughtR, and symmetricOnDiagonal.
Referenced by multiplyPsiV(), and multiplyResult().
|
private |
Cleans up at end of an iteration (fw-bw computation loop); frees mem, resets counters etc.
Resets counters, zeroes out data structures, frees mem when appropriate.
Readies other components (collators) for the next iteration. Ends the backward path substep timing. Paircalc behavior under different cfg.conserveMemory settings are defined solely in this method.
Reset the backward path tile counter
For all mem usage modes
If we're stream computing, zero out these containers
Phantom chares will get a fresh right matrix every time in an incoming msg. Hence, delete it irrespective of cfg.conserveMemory or RDMA settings
For only the speed-optimized (high-mem) mode
<
<
For the normal and strict low-mem modes
These deletes lived in sendBWResult() and sendBWResultDirect()
For the strictest of the low-mem footprint modes,
Delete the input matrices and they'll get reallocated on the next pass. Only do this if we dont use RDMA because we dont want to setup an RDMA channel every iteration
If this chare received a right matrix message and it still exists
Delete the storage for the output matrix too
Definition at line 1483 of file ckPairCalculator.C.
References actionType, amPhantom, cfg, columnCount, columnCountOther, cp::paircalc::pcConfig::conserveMemory, existsLeft, existsNew, existsOut, existsRight, cp::paircalc::MessageDataCollator< msgType, dataType >::expectNext(), inDataLeft, inDataRight, cp::paircalc::pcConfig::isBWstreaming, cp::paircalc::pcConfig::isSymmetric, leftCollator, msgRight, mynewData, notOnDiagonal, numExpectedX, numExpectedY, numOrthoCol, numPoints, numRecdBW, othernewData, and outData.
Referenced by multiplyResult(), sendBWResult(), and sendBWResultDirect().
void PairCalculator::contributeSubTiles | ( | internalType * | fullOutput | ) |
Piece up a tile and send all the pieces as this PC's contribution to the Ortho chares.
Done: necessary changes: 1: border tiles are not uniform size 2: offset calculation must handle non-uniformity solutions, use sGrainSize for all row skips use orthoGrainSizeX or orthoGrainSizeY as needed for row and column iteration
Definition at line 948 of file ckPairCalculator.C.
References cfg, grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, mCastGrpIdOrtho, numOrtho, numOrthoCol, numOrthoRow, orthoCB, orthoCookies, cp::paircalc::pcConfig::orthoGrainSize, orthoGrainSizeRemX, and orthoGrainSizeRemY.
Referenced by multiplyForward().
|
private |
Copy the results from outdata1 and outdata2 into the tiles.
Iterate through the source array, look up the destination row in offsetsRow, destination col in offsetsCol. This will be the destination row and column for the output if the output were considered as a single matrix. Use tileSize to map these values into the destination tile.
Definition at line 2573 of file ckPairCalculator.C.
References numOrtho.
|
inlineprivate |
Schedules the entry methods that send out the results to GSpace with appropriate priority.
Create a signal message
Collapse this into 1 flag
Either reduce the results or sum them direct in GSpace
Definition at line 1087 of file ckPairCalculator.C.
References amPhantom, cp::paircalc::pcConfig::arePhantomsOn, cfg, cp::paircalc::pcConfig::isOutputReduced, cp::paircalc::pcConfig::isSymmetric, notOnDiagonal, and cp::paircalc::pcConfig::shouldDelayBWsend.
Referenced by bwbarrier(), multiplyPsiV(), and multiplyResult().
void PairCalculator::initGRed | ( | initGRedMsg * | msg | ) |
Entry Method. Initializes the section cookie and the reduction client. Called on startup as the chare world is being created.
initialize the ortho reduction trees and cookies
Do not delete msg. Its a nokeep.
Definition at line 370 of file ckPairCalculator.C.
References cp::paircalc::pcConfig::arePhantomsOn, cfg, cp::paircalc::pcConfig::instanceIndex, cp::paircalc::pcConfig::isSymmetric, mCastGrpIdOrtho, notOnDiagonal, numOrtho, numOrthoCol, numRecd, cp::paircalc::pcConfig::numStates, orthoCB, orthoCookies, cp::paircalc::pcConfig::orthoGrainSize, and cp::paircalc::pcConfig::uponSetupCompletion.
void PairCalculator::initResultSection | ( | initResultMsg * | msg | ) |
Entry Method. Initialize the section cookie for each slice of the result.
initialize the multicast tree and cookies for backward path
Do not delete msg. Its a nokeep.
Definition at line 431 of file ckPairCalculator.C.
References cfg, cp::paircalc::pcConfig::grainSize, cp::paircalc::pcConfig::isSymmetric, mCastGrpId, otherResultCookies, rck, and resultCookies.
void PairCalculator::launchComputations | ( | paircalcInputMsg * | aMsg | ) |
NOT an entry method. Called locally from the acceptData* methods to launch the appropriate number-crunching method.
once the data has arrived, launch the forward path
Ensure that we're really ready to launch computations
< Flag to make sure we cleanup only if FW path starts
expectOrthoT is false in any scenario other than asymmetric, dynamics. numRecdBWOT is equal to numOrtho only when it is asymm, dynamics and T has been received completely (from Ortho). So this condition, invokes multiplyForward() on all cases except (asymm, dynamics when T has not been received)
In that exception scenario, we dont do anything now. Later, when all of T is received, both multiplyForward() and bwMultiplyDynOrthoT() are called. Look for these calls in acceptOrthoT().
Do nothing for the phantom chare, non-psiv loops. Computation will be triggered only in the backward path
Reset the counters and flags for the next iteration
If asymm,dyn and T has not yet arrived completely, fw path has not yet been triggered. In this case numRecd will be reset in acceptOrthoT() after the fw path has been triggered. numRecd is reset here for all other cases.
All non-phantoms should expect left matrix data again
All non(symm, on-diagonal) chares should expect right matrix data again
Definition at line 602 of file ckPairCalculator.C.
References actionType, amPhantom, paircalcInputMsg::blkSize, blkSize, cfg, cp::paircalc::MessageDataCollator< msgType, dataType >::expectNext(), expectOrthoT, isLeftReady, cp::paircalc::pcConfig::isSymmetric, leftCollator, numExpected, numExpectedX, numExpectedY, numOrtho, numRecd, numRecdBWOT, and symmetricOnDiagonal.
Referenced by acceptLeftData(), and acceptRightData().
void PairCalculator::multiplyForward | ( | bool | flag_dp | ) |
Forward path multiply driver. Prepares matrices, calls DGEMM, contributes results to Ortho subTiles and also passes relevant data to phantom PC chares.
Forward path multiply.
Essentially does a simple GEMM on the input matrices.
A = [numExpectedX, numPoints] (left matrix, input) B = [numExpectedY, numPoints] (right matrix, input) C = [numExpectedX, numExpectedY] (output matrix, also called S matrix)
In terms of the input matrix dimensions, we need: [numExpectedX, numExpectedY] = [numExpectedX, numPoints] X [numPoints, numExpectedY]
GEMM performs: C = alpha * op(A).op(B) + beta * C where the matrix dimensions are: op(A) = [m,k] op(B) = [k,n] C = [m,n] with m = numExpectedX k = numPoints n = numExpectedY
At first glance, it appears we need to transpose matrix B to make this work. However, since GEMMs are fortran routines, they have a transposed view of the data i.e, a matrix M[r,c] in C++ appears to be a matrix M'[c,r] if the raw array is directly passed into fortran.
The input arrays passed directly into fortran routines will appear to be of dimensions [numPoints, numExpectedX] and [numPoints, numExpectedY]. Hence, to achieve the desired multiply, we transpose the first matrix (A). This gives us the solution matrix we want in one step.
If this is an asymmetric loop, dynamics case AND Ortho has already sent T, call bwMultiplyDynOrthoT() as we must also multiply orthoT by Fpsi
Definition at line 831 of file ckPairCalculator.C.
References cp::paircalc::pcConfig::arePhantomsOn, blkSize, cfg, contributeSubTiles(), paircalcInputMsg::data(), existsOut, existsRight, expectOrthoT, grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, msgRight, notOnDiagonal, numExpectedX, numExpectedY, numOrtho, numPoints, numRecdBWOT, outData, and symmetricOnDiagonal.
Referenced by multiplyForwardRDMA().
void PairCalculator::multiplyPsiV | ( | ) |
Dynamics: PsiV Tolerance correction loop called on symmetric instances. Technically, backward path.
Tolerance correction PsiV Backward path multiplication This is the same as the regular one matrix backward path with the following exceptions:
We do not need to go through the all of multiplyresult for PsiV. All we really need is the setup for multiplyHelper
Schedule the entry methods that will send the bw results out
Definition at line 1134 of file ckPairCalculator.C.
References actionType, amPhantom, cp::paircalc::pcConfig::arePhantomsOn, blkSize, bwMultiplyHelper(), cfg, paircalcInputMsg::data(), enqueueBWsend(), existsRight, grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, msgRight, notOnDiagonal, numExpectedY, numPoints, and outData.
void PairCalculator::multiplyResult | ( | multiplyResultMsg * | msg | ) |
Entry Method. Backward path multiplication.
Backward path multiplication.
Involves more GEMMs
The basic idea is to apply the orthonormalized 'correction' matrix to the input matrices from the forward path and ship them back to GSpace.
For the symmetric instance, since the input matrices (left and right) are the same, there's only one input matrix to apply the correction to. We achieve better load balance by making the phantom chares also participate in this process. This is the reason the phantoms exist. And this is why input data is shipped off to the phantoms at the end of the forward path.
For symm instance: phantoms do: inDataRight . transpose(amatrix) non-phantoms do: inDataLeft . amatrix
For asymm instance: inDataLeft . amatrix
Increment the number of tiles received in the backward path
Get the indices of the ortho that chare that sent this data to us
If I am a phantom chare and have not yet received the right matrix data from my non-phantom mirror
<
If the grain size for paircalc and ortho are the same
else, if this is a PsiV loop
else, if PC is collecting all the tiles from Ortho
If we have all the input we need (received all the tiles, or streaming the computations)
If we're stream computing without any barriers, simply send whatever results are ready now
If we're done with all the paircalc work in this loop (iteration), then cleanup
If all the computations are done, and we're either collecting tiles or are barriered, then we need to send our results to GSpace now
If we're barriered on the backward path
If we're not streaming, delete inResult*
Definition at line 1207 of file ckPairCalculator.C.
References actionType, amPhantom, cp::paircalc::pcConfig::areBWTilesCollected, cp::paircalc::pcConfig::arePhantomsOn, bwMultiplyHelper(), bwSendHelper(), cfg, cleanupAfterBWPath(), collectTile(), cp::paircalc::pcConfig::conserveMemory, enqueueBWsend(), existsRight, expectOrthoT, cp::paircalc::pcConfig::grainSize, grainSizeX, grainSizeY, inResult1, inResult2, cp::paircalc::pcConfig::isBWbarriered, cp::paircalc::pcConfig::isBWstreaming, cp::paircalc::pcConfig::isSymmetric, notOnDiagonal, numOrtho, numOrthoCol, numOrthoRow, numPoints, numRecdBW, cp::paircalc::pcConfig::numStates, cp::paircalc::pcConfig::orthoGrainSize, orthoGrainSizeRemX, orthoGrainSizeRemY, and symmetricOnDiagonal.
Referenced by multiplyResultI().
void PairCalculator::multiplyResultI | ( | multiplyResultMsg * | msg | ) |
Entry Method. Simply forwards the call to multiplyResult(). Dont seem to be any instances in source which call this method. Check.
PairCalculator::multiplyResult(int size, double *matrix1, double *matrix2)
Do not delete msg. Its a nokeep.
Definition at line 1119 of file ckPairCalculator.C.
References multiplyResult().
void PairCalculator::phantomDone | ( | ) |
Entry Method. To handle correct startup for symm PC w/phantoms.
setup of phantom is done
Definition at line 424 of file ckPairCalculator.C.
References cfg, cp::paircalc::pcConfig::instanceIndex, numOrtho, and cp::paircalc::pcConfig::uponSetupCompletion.
void PairCalculator::pup | ( | PUP::er & | p | ) |
PUP method.
pack and unpack
Definition at line 228 of file ckPairCalculator.C.
References actionType, amPhantom, cp::paircalc::pcConfig::areBWTilesCollected, cb_aid, cb_ep, cb_ep_tol, cfg, columnCount, columnCountOther, cp::paircalc::pcConfig::conserveMemory, existsLeft, existsNew, existsOut, existsRight, expectOrthoT, cp::paircalc::pcConfig::grainSize, grainSizeX, grainSizeY, inDataLeft, inDataRight, cp::paircalc::pcConfig::isBWstreaming, cp::paircalc::pcConfig::isLBon, cp::paircalc::pcConfig::isSymmetric, mCastGrpId, mCastGrpIdOrtho, mynewData, notOnDiagonal, cp::paircalc::pcConfig::numChunks, numExpected, numExpectedX, numExpectedY, numOrtho, numOrthoCol, numOrthoRow, numPoints, numRecd, numRecdBW, numRecdBWOT, cp::paircalc::pcConfig::numStates, orthoCB, orthoCookies, orthoGrainSizeRemX, orthoGrainSizeRemY, othernewData, otherResultCookies, outData, rck, resultCookies, resumed, and symmetricOnDiagonal.
void PairCalculator::sendBWResult | ( | sendBWsignalMsg * | msg | ) |
Entry Method. Send the results via multiple reductions as triggered by a prioritized message.
This is an entry method to allow us to delay this outbound communication to minimize brain dead BG/L interference we have a signal to prioritize this.
If we're done with all the paircalc work in this loop (iteration), then cleanup
Definition at line 2459 of file ckPairCalculator.C.
References actionType, amPhantom, cb_aid, cb_ep, cb_ep_tol, cfg, cleanupAfterBWPath(), grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, mCastGrpId, mynewData, numOrtho, numPoints, numRecdBW, othernewData, otherResultCookies, and resultCookies.
void PairCalculator::sendBWResultDirect | ( | sendBWsignalMsg * | msg | ) |
Entry Method.
Determine the callback to use
Off-diagonal PCs in the symm (and asymm, in dynamics) instance, have to handle othernewdata
If we're done with all the paircalc work in this loop (iteration), then cleanup
Definition at line 2364 of file ckPairCalculator.C.
References actionType, amPhantom, cb_aid, cb_ep, cb_ep_tol, cfg, cleanupAfterBWPath(), grainSizeX, grainSizeY, cp::paircalc::pcConfig::isSymmetric, mynewData, numOrtho, numPoints, numRecdBW, othernewData, and cp::paircalc::pcConfig::resultMsgPriority.
void PairCalculator::sendTiles | ( | bool | flag_dp | ) |
Contribute orthoGrainSized tiles of data (that are ready?) to the corresponding ortho chares.
Definition at line 738 of file ckPairCalculator.C.
References cfg, cp::paircalc::pcConfig::isSymmetric, mCastGrpIdOrtho, numOrtho, numOrthoCol, orthoCB, orthoCookies, cp::paircalc::pcConfig::orthoGrainSize, orthoGrainSizeRemX, orthoGrainSizeRemY, outTiles, and touchedTiles.
|
private |
or the C=-1 inRight orthoT +c in dynamics
the input pair to be transformed
Definition at line 462 of file ckPairCalculator.h.
Referenced by acceptLeftData(), bwMultiplyHelper(), cleanupAfterBWPath(), PairCalculator(), pup(), and ~PairCalculator().