Ortho is decomposed by orthoGrainSize.

#include "debug_flags.h"
#include "orthoConfig.h"
#include "ortho.decl.h"
#include "pcSectionManager.h"
#include "CLA_Matrix.h"
#include "ckcallback-ccs.h"

Classes
class	initCookieMsg

class	orthoMtrigger

class	Ortho
	For definition of CkDataMsg.

Macros
#define	INVSQR_TOLERANCE 1.0e-15

#define	INVSQR_MAX_ITER 10

#define	myabs std::abs

Variables
bool	fakeTorus

int	numPes

Detailed Description

Ortho is decomposed by orthoGrainSize.

We restrict orthoGrainSize to be a factor of sGrainSize so that there are no section overlap issues. This leaves us with ortho sections that need only a simple tiling split of the sgrain sections, mirrored by a stitching of the submatrix inputs on the backward path.
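As a minimal sketch of the divisibility rule above, the helper below checks that orthoGrainSize is a factor of sGrainSize and returns how many ortho tiles span one edge of an sgrain section. The function name `orthoTilesPerSGrainEdge` is hypothetical and does not appear in ortho.h; it only illustrates the arithmetic.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper (not part of ortho.h): number of ortho tiles along one
// edge of a single sgrain section, assuming orthoGrainSize divides sGrainSize.
int orthoTilesPerSGrainEdge(int sGrainSize, int orthoGrainSize) {
    if (sGrainSize <= 0 || orthoGrainSize <= 0 ||
        sGrainSize % orthoGrainSize != 0) {
        // A non-factor grain size would make ortho tiles straddle sgrain sections.
        throw std::invalid_argument(
            "orthoGrainSize must be a positive factor of sGrainSize");
    }
    return sGrainSize / orthoGrainSize;
}
```

With sGrainSize = 8 and orthoGrainSize = 2, each sgrain section would split into a 4 x 4 grid of ortho tiles.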

This can be accomplished manually within the current codebase, at the cost of some replicated data and computation to handle the splitting/stitching operations.

A more efficient implementation would adopt the multicast manager group model of building a tree of participants for these operations. The reduction side from the PC would be broken up into multiple reductions, one for each orthograin within the sgrain, with a separate contribution for each orthograin. The multicast requires us to stitch together the input matrices into one per sgrain section. This might be accomplished in two stages: one in which the stitching is done, and a second in which the stitched sGrainSize matrices are multicast. The alternative is to multicast just the orthograin submatrices where needed and have each scalc do its own strided-copy stitching. As stitching is not computationally intensive, this may be the simplest and fastest solution. This second approach lets the reductions and multicasts be mirror uses of the same tree: each small ortho can run as soon as it gets its input, while the scalcs have to assemble their inputs from multiple multicasts.
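The strided-copy stitching mentioned above could look roughly like the following. This is an illustrative sketch only: `stitchTile`, its parameters, and the row-major `std::vector<double>` layout are assumptions made for the example, not the data structures actually used by the scalc objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy one row-major orthograin tile into its place in a
// larger row-major sgrain matrix. Each tile row lands destCols elements after
// the previous one, which is what makes the copy "strided".
void stitchTile(std::vector<double>& dest, std::size_t destCols,
                std::size_t rowOffset, std::size_t colOffset,
                const std::vector<double>& tile,
                std::size_t tileRows, std::size_t tileCols) {
    assert(tile.size() == tileRows * tileCols);
    assert(colOffset + tileCols <= destCols);
    assert((rowOffset + tileRows) * destCols <= dest.size());
    for (std::size_t r = 0; r < tileRows; ++r) {
        for (std::size_t c = 0; c < tileCols; ++c) {
            dest[(rowOffset + r) * destCols + colOffset + c] =
                tile[r * tileCols + c];
        }
    }
}
```

Each scalc would call something like this once per incoming multicast, so the stitched input is complete only after all of its orthograin tiles have arrived.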

Implementing this requires that each ortho object participate in a section which has a section multicast client directed to the sGrainSize PC section. Conversely, the PC sGrainSize elements will have an array of section cookies, one for each of the subsections for all orthograin elements within the sgrain. The forward path of the PC will contribute its orthograin tile (via a strided contribute), which will end up at the correct ortho object.

Note: these PC sections must include all 4th dim blocks.

OrthoHelper can be used to perform the 2nd of the multiplies in the 3-step S->T process in parallel with the 3rd multiply. If used, the results of multiply 1 are sent from ortho[x,y] to orthoHelper[x,y]. The results are then returned to ortho[x,y]. Whichever of step 2 or step 3 finishes last will then trigger step 4. Due to the copy and communication overhead, this is only worth doing if the number of processors is greater than 2 * the number of ortho chares.
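The rule of thumb in the last sentence can be written as a one-line predicate. The name `orthoHelperWorthwhile` and its parameters are hypothetical, introduced only to make the threshold concrete; how the real code decides whether to enable OrthoHelper is not stated here.

```cpp
// Hypothetical predicate: true when there are more than twice as many
// processors as ortho chares, the condition under which the text above
// suggests OrthoHelper may pay for its copy and communication overhead.
bool orthoHelperWorthwhile(int numProcessors, int numOrthoChares) {
    return numProcessors > 2 * numOrthoChares;
}
```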

Allowing sGrainSize choices for which nstates % sGrainSize != 0 forces us to handle remainder logic. To avoid overlap/straddle issues between ortho and PC, we still enforce sGrainSize % orthoGrainSize == 0. The complexity cost comes in two forms:

Ortho tiles are no longer guaranteed to be of uniform size. The remainder states, which reside in the last row and column, result in tiles larger than orthoGrainSize * orthoGrainSize.
Ortho tiles are not guaranteed to be square. Ortho tiles for the last row and column of PC will have M x N size where M != N.

The total multiply itself will still of course be nstates X nstates.
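To make the remainder arithmetic concrete, here is a small sketch under one stated assumption: the nstates % sGrainSize leftover states are folded entirely into the last ortho tile along each dimension. That is one reading of the description above, not a confirmed detail of the implementation, and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: number of ortho tiles along one edge of the full
// nstates x nstates matrix, assuming leftover states do not get a tile of
// their own but are absorbed by the last tile.
int numOrthoTilesPerEdge(int nstates, int sGrainSize, int orthoGrainSize) {
    assert(sGrainSize > 0 && orthoGrainSize > 0 && nstates >= sGrainSize);
    assert(sGrainSize % orthoGrainSize == 0);  // rule enforced in the text
    return (nstates / sGrainSize) * (sGrainSize / orthoGrainSize);
}

// Hypothetical helper: extent (rows or columns) of the tile at tileIndex.
// Only the last tile grows, by the nstates % sGrainSize remainder.
int orthoTileExtent(int tileIndex, int nstates, int sGrainSize,
                    int orthoGrainSize) {
    const int numTiles =
        numOrthoTilesPerEdge(nstates, sGrainSize, orthoGrainSize);
    assert(tileIndex >= 0 && tileIndex < numTiles);
    const int remainder = nstates % sGrainSize;
    return (tileIndex == numTiles - 1) ? orthoGrainSize + remainder
                                       : orthoGrainSize;
}
```

For nstates = 10, sGrainSize = 4, and orthoGrainSize = 2, this gives four tiles per edge with extents 2, 2, 2, and 4. A tile in the last row but not the last column is then 4 x 2, which is the non-square case, and the extents still sum to nstates in each dimension.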

Definition in file ortho.h.

Macro Definition Documentation

#define INVSQR_TOLERANCE 1.0e-15

Todo: Temporary, until the Ortho classes live within namespace ortho.

Definition at line 85 of file ortho.h.

Referenced by Ortho::Ortho().
