OpenAtom  Version1.5a
ortho.h
Go to the documentation of this file.
1 /*****************************************************************************
2  * $Source$
3  * $Author$
4  * $Date$
5  * $Revision$
6  *****************************************************************************/
7 
8 //////////////////////////////////////////////////////////////////////////////
9 //////////////////////////////////////////////////////////////////////////////
10 //////////////////////////////////////////////////////////////////////////////
11 /** \file ortho.h
12  *
13  * Ortho is decomposed by orthoGrainSize.
14  *
15  * We restrict orthograin to be a factor of sGrainsize then we have
16  * no section overlap issues. Thereby leaving us with ortho sections
17  * that need a simple tiling split of the sgrain sections. Mirrored
18  * by a stitching of the submatrix inputs for the backward path.
19  *
20  * This can be accomplished manually within the current codebase with
21  * some waste in data replication and computation replication to
22  * handle the splitting/stiching operations.
23  *
24  * A more efficient implementation would adopt the multicast manager
25  * group model of building a tree of participants for these
26  * operations. The reduction side from the PC would be broken up into
27  * multiple reductions, one for each orthograin within the sgrain.
28  * With a separate contribution for each orthograin. The multicast
29  * requires us to stitch together the input matrices into one per
30  * sgrain section. This might be accomplished in two stages, one in
31  * which the stitching is done, and a second in which the stitched
32  * sgrainsize matrices are multicast. The alternative is to just
33  * multicast the orthograin submatrices where needed and have each
34  * scalc do its strided copying stitching. As stitching is not
35  * computationally intensive, this may be the simplest and fastest
36  * solution. The second approach allows you to simply use the
37  * reductions and multicasts as mirror uses of the tree. Where each
38  * little ortho can run once it gets its input, while the scalcs would
39  * have to assemble their inputs from multiple multicasts.
40  *
41  * Implementation details for this require that each ortho object
42  * participate in a section which has a section multicast client
43  * directed to the sGrainSize PC section. The converse PC sGrainSize
44  * elements will have an array of section cookies, one for each of the
45  * subsections for all orthograin elements within the sGrain. The
46  * forward path of the PC will contribute its orthograin tile (via a
47  * strided contribute) which will end up at the correct ortho object.
48  *
49  * Note: these PC sections must include all 4th dim blocks.
50  *
51  * OrthoHelper can be used to perform the 2nd of the multiplies in the
52  * 3 step S->T process in parallel with the 3rd multiply. If used,
53  * the results of multiply 1 are sent from ortho[x,y] to
54  * orthoHelper[x,y]. The results are then returned to ortho[x,y].
55  * The last of step2 or step3 will then trigger step4. Due to the
56  * copy and communication overhead this is only worth doing if the
57  * number of processors is greater than 2 * the number of ortho
58  * chares.
59  *
60  * Allowing sgrainsize choices which are nstates % sgrainsize != 0
61  * forces us to handle remainder logic. To avoid overlap/straddle
62  * issues between ortho and PC, we still enforce sgrainssize %
63  * orthograinsize ==0. Complexity cost here comes in two forms.
64  * 1. Now ortho tiles are not guaranteed to be of uniform size. The
65  * remainder states which will reside in the last row and column will
66  * result in tiles larger than orthograinsize*orthograinsize.
67  * 2. Ortho tiles are not guaranteed to be square. Ortho tiles for
68  * the last row and column of PC will have M x N size where M != N.
69  *
70  * The total multiply itself will still of course be nstates X
71  * nstates.
72  ******************************************************************************/
73 
74 #include "debug_flags.h"
75 #include "orthoConfig.h"
76 #include "ortho.decl.h"
77 #include "pcSectionManager.h"
78 #include "CLA_Matrix.h"
79 using namespace cp::ortho; ///< @todo: Temporary, till Ortho classes live within namespace ortho
80 
81 #ifndef _ortho_h_
82 #define _ortho_h_
83 
84 
85 #define INVSQR_TOLERANCE 1.0e-15
86 #define INVSQR_MAX_ITER 10
87 
88 #ifdef CP_PAIRCALC_USES_COMPLEX_MATH
89 #define myabs abs
90 #else
91 #define myabs std::abs
92 #endif
93 
94 extern bool fakeTorus;
95 extern int numPes;
97 };
98 
100 };
101 
102 /** @addtogroup Ortho
103  @{
104 */
105 class Ortho : public CBase_Ortho
106 {
107  public:
108  /// Default empty constructor. For?
109  Ortho() {}
110  Ortho(CkMigrateMessage *m){}
111  ~Ortho();
112  Ortho(int m, int n,
116  orthoConfig &_cfg,
117  CkArrayID step2Helper,
118  int timeKeep, CkGroupID _oMCastGID, CkGroupID _oRedGID);
119 
120  /// Trigger the creation of appropriate sections of paircalcs to talk to. Also setup internal comm sections
121  void makeSections(const pc::pcConfig &cfgSymmPC, const pc::pcConfig &cfgAsymmPC, CkArrayID symAID, CkArrayID asymAID);
122 
123  /// Symmetric PCs contribute data that is summed via this reduction to deposit a portion of the S matrix with this ortho, triggering S->T
124  void start_calc(CkReductionMsg *msg);
125  /// Triggers the matrix multiplies in step 1 of an ortho iteration.
126  void do_iteration(void);
127  /// Used when array broadcasts in ortho are delegated to comlib/CkMulticast so as to not involve all PEs in bcast
128  void do_iteration(orthoMtrigger *m) { do_iteration(); /* do not delete nokeep msg */ }
129  /// Triggers step 2, and optionally step 3 (if ortho helpers are being used)
130  void step_2();
131  /// Receives the results of the call to OrthoHelper from step_2
132  void recvStep2(CkDataMsg *msg); //double *step2result, int size);
133  /// Triggers step 3 in the S->T process
134  void step_3();
135  /// Computes square of the residuals and contributes to a reduction rooted at Ortho(0,0)::collect_error()
136  void tolerance_check();
137  /// Computes the RMS error and either launches the next ortho iteration (if needed) or calls collect_results
138  void collect_error(CkReductionMsg *msg);
139  /// Used when ortho redn/bcasts are delegated to comlib/CkMulticast because charm array broadcasts involve all PEs
140  void collect_results(orthoMtrigger *m) { collect_results(); /* do not delete nokeep msg */ }
141  /// Computes walltimes, prints simulation status msgs and calls resume() if more openatom iterations are needed
142  void collect_results(void);
143  /// Sends results (T matrix) to the symm PCs (and also the asymms if this is dynamics)
144  void resume();
145  /// Used in dynamics, to send the results of S->T to the asymm paircalc instance which will use them
146  void sendOrthoTtoAsymm();
147 
148  // Accepts lamda reduced from the asymm PC instance. In min, acts as via point and mcasts lambda back to the asymm PCs. In dynamics, triggers computation of gamma = lambda x T
149  void acceptSectionLambda(CkReductionMsg *msg);
150  /// Used in dynamics, to accept computed gamma and send it to the asymm PC instance. Also sends T if it hasnt yet been sent
151  void gamma_done();
152 
153  /// S should be equal to 2I. This returns max value of deviation from that in this ortho's portion of the S matrix.
154  inline double array_diag_max(int sizem, int sizen, internalType *array);
155  /// Called on ortho(0,0). Checks if PsiV update is needed based on the max deviation in S received via a redn across all orthos. Notifies GSpaceDriver if so. Called periodically in start_calc only for dynamics
156  void maxCheck(CkReductionMsg *msg);
157  /// Once all GSpaceDriver chares are notified, they resume Ortho execution via a redn broadcast to this method
158  void resumeV(CkReductionMsg *msg);
159 
160  /// Dumps the T matrix to an appropriately named file
161  void print_results(void);
162  /// pack/unpack method
163  virtual void pup(PUP::er &p);
164  void orthoCookieinit(initCookieMsg *msg) { CkGetSectionInfo(orthoCookie,msg); delete msg; }
165  /// called from each CLA_Matrix array (3 per multiplication, 3 mults)
166  void all_ready() { if(++num_ready == 9) thisProxy.ready(); }
167  /// Startup/Init synchronization. When all elements (PC, CLA_Matrix etc) are ready, first iteration is triggered
168  void ready()
169  {
170  // got_start comes from upstream PC reduction the last of got_start and
171  // when all the CLA_Matrix arrays are ready, computation for first iteration is triggered
172  num_ready = 10;
173  if(got_start)
174  do_iteration();
175  }
176 
177  /// Static methods used as callbacks. Could be replaced by CkCallbacks
178  static inline void step_2_cb(void *obj) { ((Ortho*) obj)->step_2(); }
179  static inline void step_3_cb(void *obj) { ((Ortho*) obj)->step_3(); }
180  static inline void gamma_done_cb(void *obj) { ((Ortho*) obj)->gamma_done(); }
181  static inline void tol_cb(void *obj)
182  {
183  ((Ortho*) obj)->step3done=true;
184  if(((Ortho*) obj)->parallelStep2)
185  {
186  //if step2 is done do this now, otherwise step2 will trigger
187  if(((Ortho*) obj)->step2done)
188  ((Ortho*) obj)->tolerance_check();
189  }
190  else
191  ((Ortho*) obj)->tolerance_check();
192  }
193 
194  bool parallelStep2;
195  bool step2done;
196  bool step3done;
197 
198  private:
199  orthoConfig cfg;
200  int timeKeep;
201  internalType *orthoT; // only used on [0,0]
202  internalType *ortho; //only used on [0,0]
203  int numGlobalIter; // global leanCP iterations
204  // used in each element
205  int iterations; //local inv_sq iterations
206  CProxySection_Ortho multiproxy;
207  CkSectionInfo orthoCookie;
208  int num_ready;
209  bool got_start;
210  int lbcaught;
211  /// Section of symmetric PC chare array used by an Ortho chare
213  /// Section of asymmetric PC chare array used by an Ortho chare
215  /// Group IDs for the multicast manager groups
216  CkGroupID oMCastGID, oRedGID;
217  /// The proxy of the step 2 helper chare array
218  CProxy_OrthoHelper step2Helper;
219  bool toleranceCheckOrthoT; //trigger tolerance failure PsiV conditions
220  internalType *A, *B, *C, *tmp_arr;
221  int step;
222  int m, n;
223  double invsqr_tolerance;
224  int invsqr_max_iter;
225  CLA_Matrix_interface matA1, matB1, matC1, matA2, matB2, matC2, matA3, matB3, matC3;
226  #ifdef _CP_ORTHO_DEBUG_COMPARE_TMAT_
227  double *savedtmat;
228  #endif
229  #ifdef _CP_ORTHO_DEBUG_COMPARE_LMAT_
230  double *savedlmat;
231  #endif
232  #ifdef _CP_ORTHO_DEBUG_COMPARE_SMAT_
233  double *savedsmat;
234  #endif
235  #ifdef _CP_ORTHO_DEBUG_COMPARE_GMAT_
236  double *savedgmat;
237  #endif
238 };
239 
240 
241 
242 
243 #include "ckcallback-ccs.h" ///< For definition of CkDataMsg
244 
245 /**
246  * Result of 0.5 * S3 * S2 arrives from helper
247  */
248 inline void Ortho::recvStep2(CkDataMsg *msg)//double *step2result, int size)
249 {
250  // copy our data into the tmp_arr
251  CmiMemcpy(tmp_arr, msg->getData(), m * n * sizeof(internalType));
252  step2done=true;
253  delete msg;
254  //End of iteration check
255  if(step3done)
256  tolerance_check();
257 }
258 
259 
260 
261 
262 inline void Ortho::print_results(void)
263 {
264  char outname[80];
265  snprintf(outname,80,"tmatrix_t:%d_%d_%d.out",numGlobalIter,thisIndex.x,thisIndex.y);
266  FILE *outfile = fopen(outname, "w");
267  for(int i=0; i<m; i++)
268  for(int j=0; j<n; j++)
269  fprintf(outfile, "%d %d %.12g \n",i+thisIndex.x*n+1,j+thisIndex.y*n+1, A[i*n+j]);
270  fclose(outfile);
271 }
272 
273 
274 
275 
276 /**
277  * OrthoT tolerance check util return max value
278  */
279 inline double Ortho::array_diag_max(int sizem, int sizen, internalType *array)
280 {
281  double absval, max_ret;
282  if(thisIndex.x!=thisIndex.y)
283  { //not diagonal
284  max_ret = myabs(array[0]);
285  for(int i=0;i<sizem;i++)
286  {
287  for(int j=0;j<sizen;j++)
288  {
289  absval = myabs(array[i*sizen+j]);
290  max_ret = (max_ret>absval) ? max_ret : absval;
291  }
292  }//endfor
293  }
294  else
295  { //on diagonal
296  max_ret = myabs(array[0]-2.0);
297  for(int i=0;i<sizem;i++)
298  {
299  for(int j=0;j<sizen;j++)
300  {
301  absval = myabs(array[i*sizen+j]);
302  if(i == j)
303  absval = myabs(absval - 2.0);
304  max_ret = (max_ret>absval) ? max_ret : absval;
305  }
306  }//endfor
307  }//endif
308  return max_ret;
309 }//end routine
310 
311 #endif // #ifndef _ortho_h_
312 
void ready()
Startup/Init synchronization. When all elements (PC, CLA_Matrix etc) are ready, first iteration is tr...
Definition: ortho.h:168
PCSectionManager asymmSectionMgr
Section of asymmetric PC chare array used by an Ortho chare.
Definition: ortho.h:214
For definition of CkDataMsg.
Definition: ortho.h:105
CProxy_OrthoHelper step2Helper
The proxy of the step 2 helper chare array.
Definition: ortho.h:218
Class that manages the paircalc sections that each Ortho chare communicates with. ...
PCSectionManager symmSectionMgr
Section of symmetric PC chare array used by an Ortho chare.
Definition: ortho.h:212
void do_iteration(orthoMtrigger *m)
Used when array broadcasts in ortho are delegated to comlib/CkMulticast so as to not involve all PEs ...
Definition: ortho.h:128
static void step_2_cb(void *obj)
Static methods used as callbacks. Could be replaced by CkCallbacks.
Definition: ortho.h:178
Ortho()
Default empty constructor. For?
Definition: ortho.h:109
Configuration settings for the ortho world.
Definition: orthoConfig.h:17
void print_results(void)
Dumps the T matrix to an appropriately named file.
Definition: ortho.h:262
bool fakeTorus
readonly defined in cpaimd.C
Definition: cpaimd.C:172
void collect_results(orthoMtrigger *m)
Used when ortho redn/bcasts are delegated to comlib/CkMulticast because charm array broadcasts involv...
Definition: ortho.h:140
CkGroupID oMCastGID
Group IDs for the multicast manager groups.
Definition: ortho.h:216
double array_diag_max(int sizem, int sizen, internalType *array)
S should be equal to 2I. This returns max value of deviation from that in this ortho's portion of the...
Definition: ortho.h:279
void all_ready()
called from each CLA_Matrix array (3 per multiplication, 3 mults)
Definition: ortho.h:166
Dumb structure that holds all the configuration inputs required for paircalc instantiation, functioning and interaction.
Definition: pcConfig.h:23
void recvStep2(CkDataMsg *msg)
Receives the results of the call to OrthoHelper from step_2.
Definition: ortho.h:248
Useful debugging flags.