OpenAtom  Version1.5a
cpaimd.C
Go to the documentation of this file.
1 /** \file cpaimd.C
2  * cpaimd-charm-driver
3  * \brief
4  * This file contains cpaimd-charm-driver main. It creates and
5  * initializes all the arrays and libraries.
6  */
7 
8 /** \mainpage
9 
10  * OpenAtom is a **Car-Parrinello Ab-Initio Molecular Dynamics**
11  * software developed by the **Parallel Programing Laboratory, UIUC**
12  * in collaboration with **IBM, NYU, and Yale**. See also the
13  * [OpenAtom webpage](http://Charm.cs.illinois.edu/OpenAtom).
14  *
15  * ###Introduction###
16  * OpenAtom is parallel simulation software for studying atomic and molecular
17  * systems based on quantum chemical principles. In contrast to classical
18  * computational molecular dynamics based on Newtonian mechanims, it uses
19  * the Car-Parrinello Ab Initio Molecular Dynamics (CPAIMD) approach.
20  * Instead of using an empirical force function, the CPAIMD algorithm
21  * computes the forces acting on each atom as the summation of multiple
22  * terms derived from plane-wave density functional theory. This allows
23  * OpenAtom to study complex atomic and electronic physics in semiconductor,
24  * metallic, biological and other molecular systems.
25  *
26  * OpenAtom is implemented on top of [Charm++](http://Charm.cs.illinois.edu),
27  * which is an over-decomposition based parallel programming framework that provides
28  *
29  * support for message-driven execution of migratable entities empowered by an
30  * adaptive runtime system. Charm++ encourages decomposition of parallel computation
31  * using units that are natural to the application domain, instead of dividing
32  * data into as many pieces as processors. In particular, OpenAtom decomposes the data
33  * and the computation across a number of *chare* objects, whose type and/or number
34  * only depend on the CPAIMD algorithm and the desired grainsize. This allows OpenAtom
35  * to exploit the underlying mathematics via a seamless mix of both data and functional
36  * decompositions resulting in greater expressed parallelism, and several overlapping
37  * phases of computation combined with a longer critical path of dependent computations.
38  * The current implementation of OpenAtom in Charm++ is highly scalable, and has
39  * exhibited portable performance across three generations of the IBM Blue Gene family,
40  * apart from other supercomputing platforms.
41  *
42  * ###Downloading OpenAtom
43  * OpenAtom is hosted using git and can be downloaded using the following command:
44  *
45  * git clone http://Charm.cs.uiuc.edu/gerrit/openatom.git
46  *
47  * Recent commit history can be view
48  * [here](http://\charm.cs.illinois.edu/cgi-bin/gitweb.cgi?p=openatom.git;a=summary).
49  *
50  * You will also need to download either a stable version of Charm++ from this
51  * [weblink](http://\charm.cs.uiuc.edu/software) or the nightly version using the
52  * following command:
53  *
54  * git clone http://Charm.cs.uiuc.edu/gerrit/charm.git
55  *
56  * Sample data sets can be obtained using one of the following commands:
57  *
58  * git clone http://Charm.cs.uiuc.edu/gerrit/datasets/openatom/water_32M_10Ry.git
59  * git clone http://Charm.cs.uiuc.edu/gerrit/datasets/openatom/water_32M_70Ry.git
60  * git clone http://Charm.cs.uiuc.edu/gerrit/datasets/openatom/water_64M_70Ry.git
61  * git clone http://Charm.cs.uiuc.edu/gerrit/datasets/openatom/water_128_70Ry.git
62  *
63  * ###Compilation###
64  * Before OpenAtom is compiled, one needs to get access to a compiled version of
65  * Charm++. Detailed instructions on compiling Charm++ can be obtained
66  * [here](http://\charm.cs.illinois.edu/manuals/html/charm++/A.html). On a typical
67  * 64-bit linux machine, Charm++ can be compiled using the following command (executed
68  * within Charm++ directory):
69  *
70  * ./build charm++ net-linux-x86_64 --with-production -j8 (production version)
71  * ./build charm++ net-linux-x86_64 -j8 -g (debug version)
72  *
73  * You will also need FFTW library, configured for double precision, to compile OpenAtom.
74  *
75  * The INSTALL file in OpenAtom provides detailed instruction for its compilation. Here
76  * is a quick summary:
77  *
78  * 1. Copy a machine specific configuration file (*config.MACHINE.mk*) from
79  * the *makefiles* directory to the OpenAtom base directory and rename it to
80  * *config.mk*.
81  * 2. Update the values of **CHARMBASE** and **FFT_HOME** in the
82  * beginning of the newly created *config.mk*. If FFTW was built
83  * with --enable-type-prefix, you must enable DUAL_FFTW, otherwise
84  * set DUAL_FFTW_OFF.
85  * 3. Customize the config.mk for any desired compilation/link flags etc.
86  * 4. Now type "make", which should create a binary called *OpenAtom* in *build*
87  * directory on successful compilation.
88  *
89  *###Running OpenAtom###
90  *
91  *Before executing OpenAtom, obtain a dataset using the *git* command
92  *mentioned in the download section, and either place it in the *data*
93  *directory, or modify the **w3210** variable in *config.mk*.
94  *
95  * If the dataset uses the old format, you will need to execute
96  * *setup* in dataset directory. *setup* is located in *BASEDIR/util*
97  * directory.
98  *
99  * OpenAtom is to be executed as a Charm++ application, which is
100  * explained in detail at
101  * [this link](http://Charm.cs.illinois.edu/manuals/html/charm++/C.html). The
102  * general syntax is as follows:
103  *
104  * ./charmrun +p<N> ./OpenAtom <path to cpaimd config> <path to water input> (N
105  * is the number of processors to execute the job on)
106  *
107  * On most machine layers (excluding net), a Charm++ application can
108  * be launched in the same manner as an MPI application. For example,
109  * if built on top of MPI, one can launch OpenAtom as follows
110  *
111  * mpirun -np <N> ./OpenAtom <path to cpaimd config> <path to water input>
112  *
113  * In the datasets downloaded using the command listed above, these
114  * files are located in *regression* directory. You can also use
115  * *tidy* located in the *util* directory to clean up the dataset
116  * directory before performing a new run.
117  *
118  * See \ref GSpaceDriver::driveGSpace for SDAG flow of control.
119  *
120  * ![Overview Of OpenAtom Control Flow](controlFlowAmongstChareArrays_small.gif)
121  */
122 
123 #include "cpaimd.h"
124 
125 #include "InstanceController.h"
126 #include "ENL_EKE_Collector.h"
127 #include "pcCreationManager.h"
128 #include "eesCache.h"
129 #include "AtomsCache.h"
130 #include "AtomsCompute.h"
131 #include "energyGroup.h"
132 
134 #include "cp_state_ctrl/CP_State_ParticlePlane.h"
136 #include "structure_factor/StructFactorCache.h"
137 #include "structure_factor/StructureFactor.h"
138 #include "paircalc/pcMapConfig.h"
139 #include "load_balance/PeList.h"
140 #include "utility/MapFile.h"
141 #include "PIBeadAtoms.h"
142 #include "utility/util.h"
143 #include "src_piny_physics_v1.0/include/class_defs/Interface_ctrl.h"
144 #include "src_piny_physics_v1.0/include/class_defs/PINY_INIT/PhysicsParamTrans.h"
145 #include "src_piny_physics_v1.0/include/class_defs/PINY_INIT/PhysicsAtomPosInit.h"
146 
147 #include "MeshStreamingStrategy.h"
148 #include "MultiRingMulticast.h"
149 #include "OneTimeMulticastStrategy.h"
150 #include "TopoManager.h"
151 #include "TimeKeeper.h"
152 
153 #include "charm++.h"
154 #include "PhysScratchCache.h"
155 #include <cmath>
156 #include <unistd.h>
157 #include <iostream>
158 #include <sstream>
159 #include <string>
160 #include <assert.h>
161 int TimeKeeperID=0;
162 std::vector <std::string> TimeKeeperNames;
163 UberCollection thisInstance;
164 //////////////////////////////////////////////////////////////////////////////
165 /**
166  * \addtogroup mapping
167  * torus_vars Defining the size of the torus, handy when debugging
168  * torus map logic on non torus architectures.
169  */
170 /**@{*/
171 int numPes;
173 extern void initFFTLock(void);
174 
175 /**@}*/
176 
177 /**@defgroup piny_vars piny_vars
178  * \brief Defining all Charm++ readonly variables for PINY physics
179  *
180  */
181 //////////////////////////////////////////////////////////////////////////////
182 /** \addtogroup piny_vars
183 **@{*/
184 extern MDINTEGRATE readonly_mdintegrate;
185 extern MDATOMS readonly_mdatoms;
186 extern MDINTER readonly_mdinter;
187 extern MDINTRA readonly_mdintra;
188 extern GENERAL_DATA readonly_general_data;
189 extern CP readonly_cp;
190 /**@}*/
191 //////////////////////////////////////////////////////////////////////////////
192 
193 
194 #include "mapvariables.h"
195 
196 
197 //////////////////////////////////////////////////////////////////////////////
198 /// readonly globals
199 
200 double Timer;
201 int nstates;
202 int sizeX;
203 int nchareG;
204 int Ortho_UE_step2;
205 int Ortho_UE_step3;
206 int Ortho_UE_error;
207 bool Ortho_use_local_cb;
208 int done_init=0;
209 int planes_per_pe;
210 
211 //////////////////////////////////////////////////////////////////////////////
212 
213 #include "commlibhandles.h"
214 
215 /// Multicast manager group that handles many mcast/redns in the code. Grep for info
216 CkGroupID mCastGrpId;
217 CkReduction::reducerType complexVectorAdderType;
218 //////////////////////////////////////////////////////////////////////////////
219 
220 /// The build system should define this macro to be the commit identifier
221 #ifndef OPENATOM_REVISION
222  #define OPENATOM_REVISION Unknown
223 #endif
224 #define _QUOTEIT(x) #x
225 #define INQUOTES(x) _QUOTEIT(x)
226 
227 /// A global constant for use in the code
228 const char OpenAtomRevision[] = INQUOTES(OPENATOM_REVISION);
229 
230 
231 
232 //////////////////////////////////////////////////////////////////////////////
233 //////////////////////////////////////////////////////////////////////////////
234 //////////////////////////////////////////////////////////////////////////////
235 
236 /**
237  * \brief The Main of CPAIMD, it calls all the init functions.
238  */
239 main::main(CkArgMsg *msg) {
240  topoMgr = NULL;
241 //////////////////////////////////////////////////////////////////////////////
242 /**
243  # Sequential startup within Main */
244 /* Check arguments : Verbose output about startup procedures */
245 
246  done_init=0;
247  if (msg->argc < 3) {
248  CkAbort("Usage: cpaimd.x cpaimd_config pinysystem.input");
249  }//endif
250  CkPrintf("Executing OpenAtom: BINARY - %s\n", msg->argv[0]);
251  CkPrintf("Binary produced from source-tree at commit: %s\n",OpenAtomRevision);
252  CkPrintf("\n");
253  PRINT_LINE_STAR;
254  CkPrintf("Starting Cpaimd-Charm-Driver Setup Phase\n");
255  PRINT_LINE_DASH;
256  CkPrintf(" Cpaimd-Charm-Driver running on %d processors. \n", CkNumPes());
257  CkPrintf(" Reading Physics input from %s\n",msg->argv[2]);
258  CkPrintf(" Reading Driver input from %s\n",msg->argv[1]);
259 
260  PRINT_LINE_DASH; CkPrintf("\n");
261 
262 //////////////////////////////////////////////////////////////////////////////
263 /* check the debug flags for consistency*/
264 
265 #ifdef _CP_DEBUG_NON_LOCAL_ONLY_
266 #ifdef _CP_DEBUG_SFNL_OFF_
267  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
268  CkPrintf("You can't test non-local by itself and turn it off\n");
269  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
270  CkExit();
271 #endif
272 #endif
273 
274 #ifdef _CP_DEBUG_NON_LOCAL_ONLY_
275 #ifndef _CP_DEBUG_VKS_OFF_
276  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
277  CkPrintf("You can't test non-local by itself with vks on\n");
278  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
279  CkExit();
280 #endif
281 #endif
282 
283 #ifdef _CP_DEBUG_VKS_ONLY_
284 #ifdef _CP_DEBUG_VKS_OFF_
285  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
286  CkPrintf("You can't test vks by itself and turn it off\n");
287  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
288  CkExit();
289 #endif
290 #endif
291 
292 #ifdef _CP_DEBUG_VKS_ONLY_
293 #ifndef _CP_DEBUG_SFNL_OFF_
294  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
295  CkPrintf("You can't test vks by itself with SNFL on\n");
296  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
297  CkExit();
298 #endif
299 #endif
300 
301 #ifdef _CP_DEBUG_NON_LOCAL_VKS_ONLY_
302 #ifdef _CP_DEBUG_SFNL_OFF_
303  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
304  CkPrintf("You can't test vks and SNFL only without sfnl \n");
305  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
306  CkExit();
307 #endif
308 #ifdef _CP_DEBUG_VKS_OFF_
309  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
310  CkPrintf("You can't test vks and SNFL only without vks \n");
311  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
312  CkExit();
313 #endif
314 #endif
315 
316 //////////////////////////////////////////////////////////////////////////////
317 
318 
319 /** @defgroup startup startup
320  * \brief Startup parses the physics simulation input, parses the parallel driver config file, constructs chares, computes chare placement, coordinates the launch of chares in parallel, the reading of input, and initiates computation when everything is initialized and ready to start timestepping.
321 
322 
323 1) Read PINY config and the rest of the physical system parameter
324  files to set the constants which determine problem size. These are
325  stored in the CPcharmParaInfo class object named sim (for
326  simulation).*/
327 /**@{*/
328 
329  /* Invoke PINY input class */
330 
331  CkCallback piny_callback (CkCallback::ignore);
332  Interface_ctrl piny_interface (msg->argv[2],piny_callback);
333 
334  CPcharmParaInfo *sim = CPcharmParaInfo::get();
335  PhysicsParamTransfer::ParaInfoInit(sim);
336 
337  int ibinary_opt = sim->ibinary_opt;
338  int natm_nl = sim->natm_nl;
339  int ees_eext_opt = sim->ees_eext_on;
340  int nchareRhoRHart = sim->ngrid_eext_c;
341  int fftopt = sim->fftopt;
342  int natm_typ = sim->natm_typ;
343  /**@}*/
344 //////////////////////////////////////////////////////////////////////////////
345 /* Invoke parallel driver input class */
346 /** \addtogroup startup
347 2) Read the cpaimd_config file which determines parallel
348  decomposition. This is mostly stored in the config object, but a
349  bunch of readonly globals also get instantiated with this
350  information.
351  */
352 /**@{*/
353 
354  PRINT_LINE_STAR;
355  CkPrintf("Cpaimd-Charm-Driver input started \n");
356  PRINT_LINE_DASH; CkPrintf("\n");
357  Timer=CmiWallTimer();
358  double phase1start=Timer;
359  //numPes = 2048;
360  numPes=CkNumPes();
361  config.readConfig(msg->argv[1],sim->nstates,sim->sizeX,sim->sizeY,sim->sizeZ,
362  sim->ntime,ibinary_opt,natm_nl,fftopt,numPes,natm_typ,
363  ees_eext_opt,sim->gen_wave,sim->ncoef, sim->cp_min_opt, sim->ngrid_eext_c,
364  sim->doublepack,sim->pi_beads,sim->nkpoint,sim->ntemper,sim->nspin);
365 
366  fakeTorus = config.fakeTorus>0;
367 
368  if(fakeTorus)
369  {
370  numPes=config.torusDimNX * config.torusDimNY * config.torusDimNZ * config.torusDimNT;
371  CkPrintf("numpes set to %d by faketorus\n",numPes);
372  }
373  else if (CkNumPes() != config.numPes)
374  {
375  numPes=config.numPes;
376  CkPrintf("numpes set to %d by config file\n",numPes);
377  }
378  CkPrintf("for numInstances %d numPes %d numPesPerInstance is %d \n",config.numInstances, config.numPes, config.numPesPerInstance);
379  mapOffsets=new inttriple[config.numInstances];
380  int numSfGrps = config.numSfGrps; // local copies are nice
381  int doublePack = config.doublePack;
382  nchareG = config.nchareG;
383  sim->nchareG = nchareG;
384 
385  nstates = config.nstates; // globals : avail on all procs
386  sizeX = sim->sizeX;
387 
388  double newtime= CmiWallTimer();
389 
390  PRINT_LINE_DASH;
391  CkPrintf("Cpaimd-Charm-Driver input completed in %g\n",newtime-Timer);
392  PRINT_LINE_STAR; CkPrintf("\n");
393  Timer=newtime;
394 
395  // user event trace setup
397  // TODO timekeeper registerees will need to distinguish by instance
398  // timekeeper itself doesn't care.
399  TimeKeeperProxy = CProxy_TimeKeeper::ckNew();
400  // Create a multicast manager group that will handle many mcast/redns
401 
402  mCastGrpId = CProxy_CkMulticastMgr::ckNew(config.numMulticastMsgs);
403 
404  // Create a paircalc config object for the symmetric PC instance
405  pc::pcConfig cfgSymmPC;
406  // Create a paircalc config object for the asymmetric PC instance
407  pc::pcConfig cfgAsymmPC;
408 
409  /* choose whether ortho should use local callback */
410  Ortho_use_local_cb = true;
411  paircalcstartup(&cfgSymmPC, &cfgAsymmPC, sim, doublePack);
412 
413 //////////////////////////////////////////////////////////////////////////////
414 /// Compute structure factor grp parameters and static map for chare arrays
415 /**@}*/
416  sim->numSfGrps = numSfGrps;
417  int natm_nl_grp_max;
418  PhysicsParamTransfer::get_Sfgrp_max(natm_nl,config.numSfGrps,
419  &natm_nl_grp_max);
420  sim->natm_nl_grp_max = natm_nl_grp_max;
421  /* @} */
423 
424  PhysicsParamTransfer::control_new_mapping_function(sim,doublePack);
425 
426  make_rho_runs(sim);
427 
428  #include "initializeUber.C"
429 
430  pScratchProxy = CProxy_PhysScratchCache::ckNew();
431 
432 
433  mainProxy=thishandle;
434  // make one controller chare per instance
435  instControllerProxy= CProxy_InstanceController::ckNew(config.numInstances);
436  instControllerProxy.doneInserting();
437 
438  // make one controller temper
439  temperControllerProxy= CProxy_TemperController::ckNew(1);
440  temperControllerProxy.doneInserting();
441 
442  // make one collector per uberKmax
443  CkArrayOptions enlopts(config.UberKmax);
444  ENLEKECollectorProxy= CProxy_ENL_EKE_Collector::ckNew(config.UberImax*config.UberJmax*config.UberMmax, config.UberKmax, enlopts);
445  ENLEKECollectorProxy.doneInserting();
446 
447  /**@}*/
448 //////////////////////////////////////////////////////////////////////////////
449 /* Create the multicast/reduction manager for array sections
450  Create the parainfo group from sim
451  Initialize chare arrays for real and g-space of states
452 */
453 /** \addtogroup startup
454  3) Topological map setup : we initialize several structures which will be useful for using network topology for more optimal chare placement */
455 /**@{*/
456 
457  CkPrintf("Initializing TopoManager\n");
458  if(config.fakeTorus) {
459  CkPrintf("Initializing TopoManager with fakeTorus\n");
460  topoMgr = new TopoManager(config.torusDimNX, config.torusDimNY,
461  config.torusDimNZ, config.torusDimNT);
462  }
463  else {
464  topoMgr = new TopoManager();
465  }
466  CkPrintf(" Torus %d x %d x %d nodes %d x %d x %d VN %d DimNT %d .........\n",
467  topoMgr->getDimX(), topoMgr->getDimY(), topoMgr->getDimZ(),
468  topoMgr->getDimNX(), topoMgr->getDimNY(), topoMgr->getDimNZ(),
469  topoMgr->hasMultipleProcsPerNode(), topoMgr->getDimNT());
470  if(config.torusMap==1) {
471  PRINT_LINE_STAR; CkPrintf("\n");
472  CkPrintf(" Topology Sensitive Mapping being done for RSMap, GSMap, ....\n");
473  CkPrintf(" ......., PairCalc, RhoR, RhoG and RhoGHart .........\n\n");
474  PRINT_LINE_STAR; CkPrintf("\n");
475  }
476 
477  int l=config.Gstates_per_pe;
478  int m, pl, pm;
479  pl = nstates / l;
480  pm = config.numPesPerInstance / pl;
481  if(pm == 0) {
482  CkPrintf("Choose a larger Gstates_per_pe than %d such that { (no. of processors [%d] / no. of Instances [%d]) / (no. of states [%d] / Gstates_per_pe [%d]) } is > 0 \n",
483  l, config.numPes, config.numInstances, nstates, l);
484  CkAssert( pm > 0);
485  }
486  m = config.nchareG / pm;
487 
488  planes_per_pe=m;
489  if(planes_per_pe <= 0) {
490  CkPrintf("Choose a smaller Gstates_per_pe than %d such that { (no. of processors [%d] / no. of Instances [%d]) / (no. of states [%d] / Gstates_per_pe [%d]) } is > 0 and config.nchareG [%d] / pm [%d] {where pm = config.numPesPerInstance [%d]/ pl [%d] } > 0\n",
491  l, config.numPes, config.numInstances, nstates, l, config.nchareG, pm, config.numPesPerInstance, pl);
492  CkAssert( m > 0);
493  }
494 
495  // multiple instance mapping breaks if there isn't a topomanager
496  CkPrintf("Initializing PeList\n");
497 
498  PeList *gfoo=NULL;
499  PeList *rfoo=NULL;
500  int x, y, z;
501  PeListFactory *peList4PCmapping;
502  if(!config.loadMapFiles && config.useCuboidMap)
503  {
504  if( config.numPesPerInstance % config.nchareG != 0)
505  {
506  CkPrintf("To use CuboidMap nchareG %d should be chosen as a factor of numprocs %d / numInstances %d = %d\n",config.nchareG, config.numPes, config.numInstances, config.numPesPerInstance);
507  CkExit();
508  }
509  int procsPerPlane = config.numPesPerInstance / nchareG;
510  int bx, by, bz;
511  if(config.torusMap == 1) {
512  boxSize = procsPerPlane;
513  int order;
514 
515  // correction to accomodate multiple instances
516  int dimNX, dimNY, dimNZ, dimNT;
517  int longDim = -1, maxD;
518 
519  dimNX = topoMgr->getDimNX();
520  dimNY = topoMgr->getDimNY();
521  dimNZ = topoMgr->getDimNZ();
522  dimNT = topoMgr->getDimNT();
523  int x1 = dimNX, y1 = dimNY, z1 = dimNZ;
524 
525  maxD = dimNX;
526  longDim = 1;
527 
528  if (dimNY > maxD) { maxD = dimNY; longDim = 2; }
529  if (dimNZ > maxD) { maxD = dimNZ; longDim = 3; }
530 
531  for(int i=0; i<config.numInstances; i++) {
532  switch(longDim) {
533  case 1:
534  x = i*(maxD/config.numInstances); y = 0; z = 0;
535  x1 = dimNX / config.numInstances;
536  break;
537  case 2:
538  x = 0; y = i*(maxD/config.numInstances); z = 0;
539  y1 = dimNY / config.numInstances;
540  break;
541  case 3:
542  x = 0; y = 0; z = i*(maxD/config.numInstances);
543  z1 = dimNZ / config.numInstances;
544  break;
545  }
546  // printf("Origin %d %d %d Size %d %d %d\n", x, y, z, x1, y1, z1);
547  }
548 
549  CkPrintf(" Box per Instance: nodes %d x %d x %d .........\n", x1, y1, z1);
550  if(findCuboid(bx, by, bz, order, x1, y1, z1, topoMgr->getDimNT(), boxSize, topoMgr->hasMultipleProcsPerNode()))
551  {
552  CkPrintf("Using %d, %d, %d dimensions for box %d mapping order %d\n", bx, by, bz, boxSize, order);
553  gfoo = new PeList(bx, by, bz, order, x1, y1, z1, dimNT); // heap it
554  peList4PCmapping = new PeListFactory(bx,by,bz,order,x1,y1,z1,dimNT);
555  }
556  else
557  {
558  peList4PCmapping = new PeListFactory(config.numPes);
559  CkPrintf("no box for %d\n", boxSize);
560  config.useCuboidMap = 0;
561  gfoo = new PeList(config.numPes); // heap it
562  }
563  }
564  else
565  {
566  peList4PCmapping = new PeListFactory(config.numPes);
567  // just split by numInstances
568  x=numInst*config.numPesPerInstance;
569  y=0;
570  z=0;
571  gfoo = new PeList(config.numPes); // heap it
572  }
573  }
574  else
575  {
576  gfoo = new PeList(config.numPes); // heap it
577  peList4PCmapping = new PeListFactory(config.numPes);
578  }
579  if(!config.loadMapFiles && config.useCuboidMapRS)
580  rfoo = new PeList(*gfoo);
581  else
582  rfoo = new PeList(config.numPes); // heap it
583 
584  computeMapOffsets();
585  /* these really don't need to be different */
586  availGlobG = rfoo;
587  availGlobR = gfoo;
588  newtime = CmiWallTimer();
589  CkPrintf("Pelist initialized in %g with %d elements\n", newtime-Timer, availGlobG->count());
590  // availGlobG->dump();
591  Timer = newtime;
592  /**@}*/
593  /*
594 /////////////////////////////////////////////////////////////////////////////==
595 Per Instance startup BEGIN
596 /////////////////////////////////////////////////////////////////////////////==
597  */
598  // used as a signal during startup, to handle things, such as map
599  // creation, which will only be done for the first instance.
600 
601  // maps will have a transform function to compute the placement
602  // for instances after the first.
603 
604 /** \addtogroup startup
605 4) Parallel Object Proxy Creation
606  We loop through a four deep nested launcher by integral, kpoint,
607  temper, and spin to construct the appropriate chare arrays for each
608  instance.
609  + We setup the State chares (GS, RS, PP, RPP) along with their maps.
610  + We setup the Paircalc and ortho chares (SymPC, AsymmPC, ortho, CLA)
611  + We setup the Density chares (rho, rho*hart)
612  + We setup the non-local chares for Ees (if configured)
613 
614  In each case we are constructing the proxy and calling for parallel
615  object construction of the elements in each of those arrays. Note that while we are inside Main the chares for PE 0 will be constructed inline in program order. No assumptions should be made about chares constructed on other PEs until we exit main and pass control to the Charm++ scheduler for parallel launch.
616  */
617  /**@{*/
618  CkPrintf("NumInstances %d: Beads %d * Kpoints %d * Tempers %d * Spin %d\n",config.numInstances, config.UberImax, config.UberJmax, config.UberKmax,config.UberMmax);
619  for(int integral=0; integral< config.UberImax; integral++)
620  {
621  for(int kpoint=0; kpoint< config.UberJmax; kpoint++)
622  {
623  for(int temper=0; temper< config.UberKmax; temper++)
624  {
625  for(int spin=0; spin< config.UberMmax; spin++) {
626  // for each new instance we need a new Uber Index
627  CkVec <int> peUsedBySF;
628  CkVec <int> peUsedByNLZ;
629  CkVec <int> planeUsedByNLZ;
630 
631  UberIndex thisInstanceIndex(integral, kpoint, temper, spin); // Internal labels{x,y,z,s}
632  thisInstance=UberCollection(thisInstanceIndex);
633  UberAlles.push_back(thisInstance);// collection of proxies for all instances
634 
635  //////////////////////////////////////////////////////////////////////////////
636  // We will need a different one of these per instance
637  // Transfer parameters from physics to driver
638  // read in atoms : create atoms group
639  control_physics_to_driver(thisInstance);
640 
641  //////////////////////////////////////////////////////////////////////////////
642 
643  if(config.UberImax>1) // handle Path Integrals
644  init_PIBeads(sim, thisInstance);
645 
646  // and then we make the usual set of chares to which we pass
647  // the Uber Index.
648  init_state_chares(natm_nl,natm_nl_grp_max,numSfGrps,doublePack,sim, thisInstance);
649 
650  //////////////////////////////////////////////////////////////////////////////
651  // Create a paircalc/ortho bubble (symm and asymm pcs, ortho and related frills)
652  cp::ortho::orthoConfig orthoCfg;
653  orthostartup(&orthoCfg, &cfgSymmPC, &cfgAsymmPC, sim, peList4PCmapping);
654 
655  //////////////////////////////////////////////////////////////////////////////
656  // compute the location for the non-local Z reduction roots for each plane
657  // this can then be used in exclusion mapping to avoid overloading them
658  int *usedProc= new int[config.numPesPerInstance];
659  memset(usedProc, 0, sizeof(int)*config.numPesPerInstance);
660  int charperpe = nstates/(config.numPesPerInstance);
661  if(nstates % config.numPesPerInstance != 0) charperpe++;
662  if(charperpe<1) charperpe=1;
663  for(int state=0; state<nstates; state++) {
664  int plane = nchareG-1;
665  while(plane >= 0) {
666  bool used = false;
667  int thisstateplaneproc = GSImaptable[thisInstance.getPO()].get(state,plane) % config.numPesPerInstance;
668  if(usedProc[thisstateplaneproc]>charperpe) {
669  used=true;
670  }
671  if(!used || plane==0) {
672  peUsedByNLZ.push_back(thisstateplaneproc);
673  planeUsedByNLZ.push_back(plane);
674  usedProc[thisstateplaneproc]++;
675  plane=-1;
676  }
677  plane--;
678  }
679  }
680  peUsedByNLZ.quickSort();
681  delete [] usedProc;
682  UpeUsedByNLZ.push_back(peUsedByNLZ);
683  UplaneUsedByNLZ.push_back(planeUsedByNLZ);
684  // CkPrintf("UplaneUsedByNLZ length now %d\n",UplaneUsedByNLZ.length());
685  // Create mapping classes for Paircalcular
686 
687  //////////////////////////////////////////////////////////////////////////////
688  // Initialize the density chare arrays
689  init_rho_chares(sim, thisInstance);
690 
691  CmiNetworkProgressAfter(1);
692  //////////////////////////////////////////////////////////////////////////////
693  // Initialize commlib strategies for later association and delegation
694  if(sim->ees_nloc_on)
695  init_eesNL_chares( natm_nl, natm_nl_grp_max, doublePack, excludePes, sim, thisInstance);
696  CmiNetworkProgressAfter(1);
697  firstInstance=false;
698  numInst++;
699  }
700  }
701  }
702  // now safe to init atom bead commanders
703  UatomsComputeProxy[thisInstance.getPO()].init();
704  } // end of per instance init
705  //////////////////////////////////////////////////////////////////////////////
706  // Initialize commlib strategies for later association and delegation
707  if(config.numInstances>1)
708  CkPrintf("WARNING!!! Commlib does not work for multiple instances\n");
709 
710  init_commlib_strategies(sim->nchareRhoG, sim->sizeZ,nchareRhoRHart, thisInstance);
711 
712  TimeKeeperProxy.init();
713 
714 //////////////////////////////////////////////////////////////////////////////
715 /// clean up
716 
717  delete msg;
718  // delete sim;
719  delete rfoo;
720  delete gfoo;
721  delete excludePes;
722  // delete availGlobR;
723  // delete availGlobG;
724 
725 //////////////////////////////////////////////////////////////////////////////
726 
727  newtime=CmiWallTimer();
728  PRINT_LINE_DASH;
729  CkPrintf("Cpaimd-Charm-Driver setup phase completed in %g \n",newtime-phase1start);
730  PRINT_LINE_STAR; CkPrintf("\n");
731  PRINT_LINE_STAR;
732  PRINT_LINE_DASH;CkPrintf("\n");
733  CkPrintf("user mem %d\n",CmiMemoryUsage());
734  Timer=newtime;
735  /**@}*/
736 //////////////////////////////////////////////////////////////////////////////
737  }// end Main
738 //////////////////////////////////////////////////////////////////////////////
739 
740 //////////////////////////////////////////////////////////////////////////////
741 ///////////////////////////////////////////////////////////////////////////cc
742 //////////////////////////////////////////////////////////////////////////////
743 /**
744  * Cleanup stuff in the hopes of getting clean valgrind
745  */
747  if (config.useCommlib) {
748  if(config.usePairEtoM){
749  }//endif
750  }//endif
751 }//end routine
752 //////////////////////////////////////////////////////////////////////////////
753 
754 
755 //////////////////////////////////////////////////////////////////////////////
756 /**
757  * Initialize Commlib communication strategies
758  */
759 //////////////////////////////////////////////////////////////////////////////
760 //////////////////////////////////////////////////////////////////////////////
761 //////////////////////////////////////////////////////////////////////////////
762 void init_commlib_strategies(int numRhoG, int numReal, int numRhoRhart, UberCollection thisInstance){
763 //////////////////////////////////////////////////////////////////////////////
764 
765  PRINT_LINE_STAR;
766  PRINTF("Building Commlib strategies\n");
767  PRINT_LINE_DASH;printf("\n");
768  int i = 0;
769  int j = 0;
770  int rhoGhelpers = config.rhoGHelpers;
771  int numRhoGHart = rhoGhelpers*numRhoG;
772  int nchareHartAtmT = config.nchareHartAtmT;
773 //////////////////////////////////////////////////////////////////////////////
774 #ifdef USE_COMLIB
775  if (config.useCommlib) {
776 
777  /*
778  //StreamingStrategy *cmstrat = new StreamingStrategy(0.1,10);
779  MeshStreamingStrategy *cmstrat = new MeshStreamingStrategy(1,5);
780  gAsymInstance= ComlibRegister(cmstrat);
781 
782  //StreamingStrategy *csymstrat = new StreamingStrategy(0.1,10);
783  MeshStreamingStrategy *csymstrat = new MeshStreamingStrategy(1,5);
784  gSymInstance= ComlibRegister(csymstrat);
785  */
786  //////////////////////////////////////////////////////////////////
787  CkArrayIndexMax *rhoGElements=NULL;
788  CkArrayIndexMax *rhoRealElements = NULL;
789  if(config.useRInsRhoGP)
790  {
791  CkPrintf("Making real_strategy with :");
792  CkPrintf("src numReal %d dest numRhoG %d and numHartG %d numHartR %d\n",
793  numReal,numRhoG,numRhoGHart,numRhoRhart);
794  //--------------------------------------------------------------
795  // For rho(r) to rho(g)
796  rhoGElements = new CkArrayIndexMax[numRhoG];
797  for (i = 0; i < numRhoG; i++) {
798  //CkArrayIndex2D idx2d(i,0);
799  CkArrayIndex1D idx1d(i);
800  rhoGElements[i] = idx1d;
801  }//endfor
802 
803  rhoRealElements = new CkArrayIndexMax[numReal];
804  for(i = 0; i < numReal; i++) {
805  //CkArrayIndex2D idx2d(i,0);
806  CkArrayIndex1D idx1d(i);
807  rhoRealElements[i] = idx1d;
808  }//endfor
809 #ifdef OLD_COMMLIB
810  CharmStrategy *real_strat;
811 #else
812  Strategy *real_strat;
813 #endif
814  real_strat= new EachToManyMulticastStrategy
815  (USE_DIRECT, UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(),
816  numReal, rhoRealElements, numRhoG, rhoGElements);
817  commRealInstance= ComlibRegister(real_strat);
818  delete [] rhoGElements;
819  delete [] rhoRealElements;
820  }
821  //--------------------------------------------------------------
822  // For drho(r)/dx to igx*rho(g)
823  if(config.useRInsIGXRhoGP)
824  {
825  rhoGElements = new CkArrayIndexMax[numRhoG];
826  for (i = 0; i < numRhoG; i++) {
827  rhoGElements[i] = CkArrayIndex1D(i);
828  }//endfor
829 
830  rhoRealElements = new CkArrayIndexMax[numReal];
831  for(i = 0; i < numReal; i++) {
832  rhoRealElements[i] = CkArrayIndex1D(i);
833  }//endfor
834 #ifdef OLD_COMMLIB
835  CharmStrategy *real_strat_igx;
836 #else
837  Strategy *real_strat_igx;
838 #endif
839  real_strat_igx= new EachToManyMulticastStrategy
840  (USE_DIRECT, UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(),
841  numReal, rhoRealElements, numRhoG,rhoGElements);
842  commRealIGXInstance= ComlibRegister(real_strat_igx);
843  delete [] rhoGElements;
844  delete [] rhoRealElements;
845  }
846  if(config.useRInsIGYRhoGP)
847  {
848  //--------------------------------------------------------------
849  // For drho(r)/dy to igy*rho(g)
850  rhoGElements = new CkArrayIndexMax[numRhoG];
851  for (i = 0; i < numRhoG; i++) {
852  rhoGElements[i] = CkArrayIndex1D(i);
853  }//endfor
854 
855  rhoRealElements = new CkArrayIndexMax[numReal];
856  for(i = 0; i < numReal; i++) {
857  rhoRealElements[i] = CkArrayIndex1D(i);
858  }//endfor
859 #ifdef OLD_COMMLIB
860  CharmStrategy *real_strat_igy;
861 #else
862  Strategy *real_strat_igy;
863 #endif
864  real_strat_igy= new EachToManyMulticastStrategy
865  (USE_DIRECT, UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(),
866  numReal, rhoRealElements, numRhoG,rhoGElements);
867  commRealIGYInstance= ComlibRegister(real_strat_igy);
868 
869  delete [] rhoGElements;
870  delete [] rhoRealElements;
871  }
872  //--------------------------------------------------------------
873  // For drho(r)/dz to igz*rho(g)
874  if(config.useRInsIGZRhoGP)
875  {
876  rhoGElements = new CkArrayIndexMax[numRhoG];
877  for (i = 0; i < numRhoG; i++) {
878  rhoGElements[i] = CkArrayIndex1D(i);
879  }//endfor
880 
881  rhoRealElements = new CkArrayIndexMax[numReal];
882  for(i = 0; i < numReal; i++) {
883  rhoRealElements[i] = CkArrayIndex1D(i);
884  }//endfor
885 #ifdef OLD_COMMLIB
886  CharmStrategy *real_strat_igz;
887 #else
888  Strategy *real_strat_igz;
889 #endif
890  real_strat_igz= new EachToManyMulticastStrategy
891  (USE_DIRECT, UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(),
892  numReal, rhoRealElements, numRhoG,rhoGElements);
893  commRealIGZInstance= ComlibRegister(real_strat_igz);
894  delete [] rhoGElements;
895  delete [] rhoRealElements;
896  }
897  //--------------------------------------------------------------
898  // For hartree-Ext(g) to vks(r)
899  if(config.useGHartInsRhoRP)
900  {
901  rhoGElements = new CkArrayIndexMax[numRhoGHart*nchareHartAtmT];
902  for (j= 0; j < nchareHartAtmT; j++) {
903  for (i = 0; i < numRhoGHart; i++) {
904  rhoGElements[i+j*numRhoGHart] = CkArrayIndex1D(i+j*numRhoGHart);
905  }//endfor
906  }
907  rhoRealElements = new CkArrayIndexMax[numReal];
908  for(i = 0; i < numReal; i++) {
909  rhoRealElements[i] = CkArrayIndex1D(i);
910  }//endfor
911 #ifdef OLD_COMMLIB
912  CharmStrategy *gstrathart = new EachToManyMulticastStrategy
913 #else
914  Strategy *gstrathart = new EachToManyMulticastStrategy
915 #endif
916 
917  (USE_DIRECT, UrhoGHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
918  numRhoGHart, rhoGElements, numReal, rhoRealElements);
919  commGHartInstance = ComlibRegister(gstrathart);
920  delete [] rhoGElements;
921  delete [] rhoRealElements;
922  }
923  //--------------------------------------------------------------
924  // vks(g), igxrho igyrho igzrho and white byrd to g-space
925  if(config.useGIns0RhoRP)
926  {
927 
928  rhoGElements = new CkArrayIndexMax[numRhoG];
929  for (i = 0; i < numRhoG; i++) {
930  rhoGElements[i] = CkArrayIndex1D(i);
931  }//endfor
932 
933  rhoRealElements = new CkArrayIndexMax[numReal];
934  for(i = 0; i < numReal; i++) {
935  rhoRealElements[i] = CkArrayIndex1D(i);
936  }//endfor
937 #ifdef OLD_COMMLIB
938  CharmStrategy *gstrat0;
939 #else
940  Strategy *gstrat0;
941 #endif
942  gstrat0 = new EachToManyMulticastStrategy
943  (USE_DIRECT, UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
944  numRhoG, rhoGElements, numReal, rhoRealElements);
945  commGInstance0 = ComlibRegister(gstrat0);
946  delete [] rhoGElements;
947  delete [] rhoRealElements;
948  }
949  //--------------------------------------------------------------
950  // rhog sends to rhor : div_x rho
951  if(config.useGIns1RhoRP)
952  {
953  rhoGElements = new CkArrayIndexMax[numRhoG];
954  for (i = 0; i < numRhoG; i++) {
955  rhoGElements[i] = CkArrayIndex1D(i);
956  }
957 
958  rhoRealElements = new CkArrayIndexMax[numReal];
959  for(i = 0; i < numReal; i++) {
960  rhoRealElements[i] = CkArrayIndex1D(i);
961  }
962 #ifdef OLD_COMMLIB
963  CharmStrategy *gstrat1;
964 #else
965  Strategy *gstrat1;
966 #endif
967 
968  gstrat1 = new EachToManyMulticastStrategy
969  (USE_DIRECT, UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
970  numRhoG, rhoGElements, numReal, rhoRealElements);
971  commGInstance1 = ComlibRegister(gstrat1);
972  delete [] rhoGElements;
973  delete [] rhoRealElements;
974  }
975  //--------------------------------------------------------------
976  // rhog sends to rhor : div_y rho
977  if(config.useGIns2RhoRP)
978  {
979  rhoGElements = new CkArrayIndexMax[numRhoG];
980  for (i = 0; i < numRhoG; i++) {
981  rhoGElements[i] = CkArrayIndex1D(i);
982  }
983 
984  rhoRealElements = new CkArrayIndexMax[numReal];
985  for(i = 0; i < numReal; i++) {
986  rhoRealElements[i] = CkArrayIndex1D(i);
987  }
988 #ifdef OLD_COMMLIB
989  CharmStrategy *gstrat2;
990 #else
991  Strategy *gstrat2;
992 #endif
993 
994  gstrat2 = new EachToManyMulticastStrategy
995  (USE_DIRECT, UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
996  numRhoG, rhoGElements, numReal, rhoRealElements);
997  commGInstance2 = ComlibRegister(gstrat2);
998  delete [] rhoGElements;
999  delete [] rhoRealElements;
1000  }
1001  //--------------------------------------------------------------
1002  // rhog sends to rhor : div_z rho
1003  if(config.useGIns3RhoRP)
1004  {
1005  rhoGElements = new CkArrayIndexMax[numRhoG];
1006  for (i = 0; i < numRhoG; i++) {
1007  rhoGElements[i] = CkArrayIndex1D(i);
1008  }
1009 
1010  rhoRealElements = new CkArrayIndexMax[numReal];
1011  for(i = 0; i < numReal; i++) {
1012  rhoRealElements[i] = CkArrayIndex1D(i);
1013  }
1014 #ifdef OLD_COMMLIB
1015  CharmStrategy *gstrat3;
1016 #else
1017  Strategy *gstrat3;
1018 #endif
1019 
1020  gstrat3 = new EachToManyMulticastStrategy
1021  (USE_DIRECT, UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
1022  numRhoG, rhoGElements, numReal, rhoRealElements);
1023  commGInstance3 = ComlibRegister(gstrat3);
1024  delete [] rhoGElements;
1025  delete [] rhoRealElements;
1026  }
1027  //--------------------------------------------------------------
1028  // For white-byrd : rhog sends to rhor : white byrd
1029  if(config.useGByrdInsRhoRBP)
1030  {
1031  rhoGElements = new CkArrayIndexMax[numRhoG];
1032  for (i = 0; i < numRhoG; i++) {
1033  rhoGElements[i] = CkArrayIndex1D(i);
1034  }
1035 
1036  rhoRealElements = new CkArrayIndexMax[numReal];
1037  for(i = 0; i < numReal; i++) {
1038  rhoRealElements[i] = CkArrayIndex1D(i);
1039  }
1040 #ifdef OLD_COMMLIB
1041  CharmStrategy *gstratByrd;
1042 #else
1043  Strategy *gstratByrd;
1044 #endif
1045 
1046  gstratByrd = new EachToManyMulticastStrategy
1047  (USE_DIRECT, UrhoGProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRealProxy[thisInstance.proxyOffset].ckGetArrayID(),
1048  numRhoG, rhoGElements, numReal, rhoRealElements);
1049  commGByrdInstance = ComlibRegister(gstratByrd);
1050  delete [] rhoGElements;
1051  delete [] rhoRealElements;
1052  }
1053  //--------------------------------------------------------------
1054  // For rhogHart send to rhorHart SF atmtyp
1055  if(config.useGHartInsRHart && false)
1056  {
1057  rhoGElements = new CkArrayIndexMax[numRhoGHart];
1058  for (i = 0; i < numRhoGHart; i++) {
1059  rhoGElements[i] = CkArrayIndex1D(i);
1060  }
1061 
1062  rhoRealElements = new CkArrayIndexMax[numRhoRhart];
1063  for(i = 0; i < numRhoRhart; i++) {
1064  rhoRealElements[i] = CkArrayIndex3D(i,0,0);
1065  }
1066 #ifdef OLD_COMMLIB
1067  CharmStrategy *gstratEext0;
1068 #else
1069  Strategy *gstratEext0;
1070 #endif
1071 
1072  gstratEext0 = new EachToManyMulticastStrategy
1073  (USE_DIRECT, UrhoGHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(),
1074  numRhoGHart, rhoGElements, numRhoRhart, rhoRealElements);
1075 
1076  commGHartRHartIns0 = ComlibRegister(gstratEext0);
1077  }
1078  //--------------------------------------------------------------
1079  // For rhogHart send to rhoRHart SF tot
1080  if(config.useGHartInsRHart && false)
1081  {
1082  rhoGElements = new CkArrayIndexMax[numRhoGHart];
1083  for (i = 0; i < numRhoGHart; i++) {
1084  rhoGElements[i] = CkArrayIndex1D(i);
1085  }
1086 
1087  rhoRealElements = new CkArrayIndexMax[numRhoRhart];
1088  for(i = 0; i < numRhoRhart; i++) {
1089  rhoRealElements[i] = CkArrayIndex1D(i);
1090  }
1091 #ifdef OLD_COMMLIB
1092  CharmStrategy *gstratEext1;
1093 #else
1094  Strategy *gstratEext1;
1095 #endif
1096 
1097  gstratEext1 = new EachToManyMulticastStrategy
1098  (USE_DIRECT, UrhoGHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoRHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(),
1099  numRhoGHart, rhoGElements, numRhoRhart, rhoRealElements);
1100 
1101  commGHartRHartIns1 = ComlibRegister(gstratEext1);
1102  }
1103  //--------------------------------------------------------------
1104  // For rhoRHart send to rhoGHart SF
1105  if(config.useRHartInsGHart && false){
1106  rhoGElements = new CkArrayIndexMax[numRhoGHart];
1107  for (i = 0; i < numRhoGHart; i++) {
1108  rhoGElements[i] = CkArrayIndex1D(i);
1109  }//endfor
1110 
1111  rhoRealElements = new CkArrayIndexMax[numRhoRhart];
1112  for(i = 0; i < numRhoRhart; i++) {
1113  rhoRealElements[i] = CkArrayIndex1D(i);
1114  }//endfor
1115 #ifdef OLD_COMMLIB
1116  CharmStrategy *real_strat_eext;
1117 #else
1118  Strategy *real_strat_eext;
1119 #endif
1120 
1121  real_strat_eext = new EachToManyMulticastStrategy
1122  (USE_DIRECT, UrhoRHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(), UrhoGHartExtProxy[thisInstance.proxyOffset].ckGetArrayID(),
1123  numRhoRhart, rhoRealElements, numRhoGHart,rhoGElements);
1124  commRHartGHartIns = ComlibRegister(real_strat_eext);
1125  }
1126 
1127  }//endif : use commlib
1128 #endif
1129 
1130 
1131  //////////////////////////////////////////////////////////////////////////////
1132  // Real state space to gspace state and particle plane comm.
1133 
1134 #ifdef USE_COMLIB
1135  if (config.useCommlibMulticast) {
1136 #ifdef OLD_COMMLIB
1137  DirectMulticastStrategy *dstrat = new DirectMulticastStrategy
1138  (UrealSpacePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1139 
1140  RingMulticastStrategy *rstrat = new RingMulticastStrategy
1141  (UrealSpacePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1142 
1143  RingMulticastStrategy *r1strat = new RingMulticastStrategy
1144  (UparticlePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1145 
1146  MultiRingMulticast *mr1strat = new MultiRingMulticast
1147  (UparticlePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1148 
1149  DirectMulticastStrategy *ppdstrat = new DirectMulticastStrategy
1150  (UrealParticlePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1151 
1152  RingMulticastStrategy *pprstrat = new RingMulticastStrategy
1153  (UrealParticlePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1154  MultiRingMulticast *ppmr1strat = new MultiRingMulticast
1155  (UrealParticlePlaneProxy[thisInstance.proxyOffset].ckGetArrayID(),1);
1156 #else
1157  DirectMulticastStrategy *dstrat = new DirectMulticastStrategy();
1158 
1159  OneTimeMulticastStrategy *rstrat = new OneTimeMulticastStrategy();
1160 
1161  RingMulticastStrategy *r1strat = new RingMulticastStrategy();
1162 
1163  RingMulticastStrategy *mr1strat = new RingMulticastStrategy();
1164 
1165  DirectMulticastStrategy *ppdstrat = new DirectMulticastStrategy();
1166 
1167  RingMulticastStrategy *pprstrat = new RingMulticastStrategy();
1168  RingMulticastStrategy *ppmr1strat = new RingMulticastStrategy();
1169 #endif
1170 
1171  mcastInstance.reserve(config.UberJmax);
1172  //multiring should be good on large runs, but not on BG/L
1173  if(CkNumNodes()>64){
1174  for (int kp = 0; kp < config.UberJmax; kp++){
1175  mcastInstance.push_back(ComlibRegister(dstrat));
1176  }
1177  mcastInstancePP=ComlibRegister(mr1strat);
1178  mcastInstanceRPP=ComlibRegister(ppdstrat);
1179  mcastInstancemRPP=ComlibRegister(ppmr1strat);
1180  }else{
1181  for (int kp = 0; kp < config.UberJmax; kp++){
1182  mcastInstance.push_back(ComlibRegister(rstrat));
1183  }
1184  mcastInstancePP=ComlibRegister(r1strat);
1185  mcastInstanceRPP=ComlibRegister(pprstrat);
1186  mcastInstancemRPP=ComlibRegister(ppmr1strat);
1187  }//endif
1188  }
1189 #endif
1190 
1191 //////////////////////////////////////////////////////////////////////////////
1192 }//end routine
1193 //////////////////////////////////////////////////////////////////////////////
1194 
1195 
1196 //////////////////////////////////////////////////////////////////////////////
1197 //////////////////////////////////////////////////////////////////////////////
1198 ///Create the PIBeadAtoms array
1199 //////////////////////////////////////////////////////////////////////////////
1200 //////////////////////////////////////////////////////////////////////////////
1201 //////////////////////////////////////////////////////////////////////////////
1202 
1204 {
1205 
1206  if(thisInstance.idxU.x>0)
1207  { // the set of chares being created is for a non-zero PI all PI
1208  // use the same PIBeadAtoms we simply direct the proxyoffset here to
1209  // the one for the 0th bead.
1210  CkPrintf("Constructing PIMD Bead proxies for non zero instances\n");
1211  UberCollection zeroBeadInstance=thisInstance;
1212  zeroBeadInstance.idxU.x=0;
1213  int proxyOffset=zeroBeadInstance.setPO();
1214  UPIBeadAtomsProxy.push_back(UPIBeadAtomsProxy[proxyOffset]);
1215  }
1216  else
1217  {
1218  int natm = sim->natm_tot;
1219  CkPrintf("Constructing PIMD Bead array\n");
1220  CkArrayOptions opts(natm);
1221  UPIBeadAtomsProxy.push_back( CProxy_PIBeadAtoms::ckNew(thisInstance,config.UberImax,natm,opts));
1222  }
1223 
1224  //TODO: we should have a map for this array, the default scheme would
1225  //result in these chares being within the 0th Bead partition which is
1226  //not an optimal choice
1227 }
1228 //////////////////////////////////////////////////////////////////////////////
1229 
1230 ///Create the array elements for the GSpace, Particle and Real Space planes
1231 //////////////////////////////////////////////////////////////////////////////
1232 //////////////////////////////////////////////////////////////////////////////
1233 //////////////////////////////////////////////////////////////////////////////
1234 
1235 void init_state_chares(int natm_nl,int natm_nl_grp_max,int numSfGrps,
1236  int doublePack, CPcharmParaInfo *sim, UberCollection thisInstance)
1237 //////////////////////////////////////////////////////////////////////////////
1238  { //begin routine
1239 //////////////////////////////////////////////////////////////////////////////
1240 /*
1241  * Set up the map variables used to control the location of the
1242  * 2d-array elements over processors
1243 */
1244 //////////////////////////////////////////////////////////////////////////////
1245 /// Useful Local variables
1246 
1247 
1248  int ngrida = sim->sizeX;
1249  int ngridb = sim->sizeY;
1250  int ngridc = sim->sizeZ;
1251 
1252  int ngridaNl = sim->ngrid_nloc_a;
1253  int ngridbNl = sim->ngrid_nloc_b;
1254  int ngridcNl = sim->ngrid_nloc_c;
1255  int ees_nonlocal_on = sim->ees_nloc_on;
1256 
1257  int ngridaEext = sim->ngrid_eext_a;
1258  int ngridbEext = sim->ngrid_eext_b;
1259  int ngridcEext = sim->ngrid_eext_c;
1260  int ees_eext_on = sim->ees_eext_on;
1261 
1262  int nchareG = sim->nchareG;
1263  int nchareR = sim->sizeZ;
1264  int nchareRPP = ngridcNl;
1265 
1266  int numIterNL = sim->nlIters;
1267 
1268  int nchareRhoG = sim->nchareRhoG;
1269  int rhoGHelpers = config.rhoGHelpers;
1270  int nchareGHart = rhoGHelpers*nchareRhoG;
1271  int nchareRHart = ngridcEext;
1272 
1273  int Rstates_per_pe = config.Rstates_per_pe;
1274  int Gstates_per_pe = config.Gstates_per_pe;
1275  int sGrainSize = config.sGrainSize;
1276  int numChunks = config.numChunks;
1277  int nkpoint = sim->nkpoint;
1278 
1279  //Need our maps and groups to exist before anyone tries to use them
1280 
1281  //////////////////////////////////////////////////////////////////////////////
1282 
1283  //////////////////////////////////////////////////////////////////////////////
1284  //need some groups to exist before we kick off the state which use them
1285  //--------------------------------------------------------------------------------
1286  // Groups : no placement required
1287 
1288  UsfCacheProxy.push_back( CProxy_StructFactCache::ckNew(numSfGrps,natm_nl,natm_nl_grp_max, thisInstance));
1289  if(firstInstance) CkPrintf("created sfcache proxy\n");
1290  UsfCompProxy.push_back(CProxy_StructureFactor::ckNew());
1291  if(firstInstance) CkPrintf("created sfcomp proxy\n");
1292 
1293  if(thisInstance.idxU.y>0 || thisInstance.idxU.s >0)
1294  { // the set of chares being created is for a non-zero kpoint
1295  // all k-points and spins use the same atoms and energies
1296  // we simply direct the proxyoffset here to the one for
1297  // the 0th kpoint
1298 
1299  UberCollection zeroKpointInstance=thisInstance;
1300  zeroKpointInstance.idxU.y=0;
1301  int proxyOffset=zeroKpointInstance.setPO();
1302  // At some future point we could split the cache proxy so that
1303  // it could be shared across pretty much any kind of instance.
1304  // Currently we need different ones for beads and tempers due to
1305  // some atom bits that are blended in with the rest of the cache
1306  // stuff due to modularity failure in its design. If eesCache
1307  // proxy eats all your memory, complain to Glenn.
1308  UeesCacheProxy.push_back(UeesCacheProxy[proxyOffset]);
1309  }
1310  else
1311  {
1312  UeesCacheProxy.push_back(CProxy_eesCache::ckNew(nchareRPP,nchareG,nchareRHart,nchareGHart,
1313  nstates,nchareRhoG, nkpoint, thisInstance));
1314  }
1315 
1316  if(firstInstance) CkPrintf("created eescache proxy\n");
1317 
1318  int nchareRRhoTot = nchareR*(config.rhoRsubplanes);
1319  int nchareRHartTot = nchareRHart*(config.rhoRsubplanes);
1320 
1321  int *numGState = sim->nlines_per_chareG;
1322  int *numGNL = sim->nlines_per_chareG;
1323  int *numGRho = sim->nlines_per_chareRhoG;
1324  int *numGEext = sim->nlines_per_chareRhoGEext;
1325 
1326  if(firstInstance)
1327  {
1328  int *numRXState = new int [nchareR];
1329  int *numRYState = new int [nchareR];
1330  int *numRYStateLower = new int [nchareR];
1331  int *numRXNL = new int [nchareRPP];
1332  int *numRYNL = new int [nchareRPP];
1333  int *numRYNLLower = new int [nchareRPP];
1334  int nplane_x_use = sim->nplane_x;
1335  if(!config.doublePack){nplane_x_use = (nplane_x_use+1)/2;}
1336  for(int i=0;i<nchareR;i++){
1337  numRXState[i] = sim->sizeY;
1338  numRYState[i] = nplane_x_use;
1339  numRYStateLower[i] = nplane_x_use - 1;
1340  }//endif
1341  for(int i=0;i<nchareRPP;i++){
1342  numRXNL[i] = ngridbNl;
1343  numRYNL[i] = nplane_x_use;
1344  numRYNLLower[i] = nplane_x_use - 1;
1345  }//endfor
1346  int *numRXRho = new int [nchareRRhoTot];
1347  int *numRYRho = new int [nchareRRhoTot];
1348  int *numRXEext = new int [nchareRHartTot];
1349  int *numRYEext = new int [nchareRHartTot];
1350  int *numSubGx = sim->numSubGx;
1351  create_Rho_fft_numbers(nchareR,nchareRHart,config.rhoRsubplanes,
1352  sim->nplane_rho_x,sim->sizeY,ngridbEext,
1353  numRXRho,numRYRho,numRXEext,numRYEext,numSubGx);
1354  UfftCacheProxy.push_back(CProxy_FFTcache::ckNew(
1355  sim->sizeX,sim->sizeY,sim->sizeZ,
1356  ngridaEext,ngridbEext,ngridcEext,ees_eext_on,
1357  ngridaNl, ngridbNl, ngridcNl, ees_nonlocal_on,
1358  sim->nlines_max, sim->nlines_max_rho,
1359  config.nchareG,nchareR,
1360  config.nchareG,nchareRPP,
1361  nchareRhoG, nchareR, nchareRRhoTot,
1362  nchareGHart, nchareRHart,nchareRHartTot,
1363  numGState, numRXState, numRYState,numRYStateLower,
1364  numGNL, numRXNL, numRYNL, numRYNLLower,
1365  numGRho, numRXRho, numRYRho,
1366  numGEext, numRXEext, numRYEext,
1367  config.fftopt,config.fftprogresssplitReal,config.fftprogresssplit,
1368  config.rhoRsubplanes, thisInstance));
1369  CkPrintf("created fftcache proxy\n");
1370  delete [] numRXState;
1371  delete [] numRYState;
1372  delete [] numRYStateLower;
1373  delete [] numRXNL;
1374  delete [] numRYNL;
1375  delete [] numRYNLLower;
1376  delete [] numRXRho;
1377  delete [] numRYRho;
1378  delete [] numRXEext;
1379  delete [] numRYEext;
1380  }
1381  else
1382  { // direct to 0th proxy
1383  UfftCacheProxy.push_back(UfftCacheProxy[0]);
1384  }
1385 
1386  //////////////////////////////////////////////////////////////////////////////
1387  // Instantiate the Chares with placement determined by the maps
1388 
1389  //---------------------------------------------------------------------------
1390  // state g-space
1391 
1392  PRINT_LINE_STAR;
1393  PRINTF("Building G-space (%d %d) and R-space (%d %d/%d) state Chares\n",
1394  nstates, nchareG, nstates, nchareR, nchareRPP);
1395  PRINT_LINE_DASH;printf("\n");
1396  availGlobG->reset();
1397  double newtime=CmiWallTimer();
1398  int x=mapOffsets[numInst].getx();
1399  int y=mapOffsets[numInst].gety();
1400  int z=mapOffsets[numInst].getz();
1401 
1402  /** addtogroup mapping */
1403  /**@{*/
1404  if (firstInstance) {
1405  GSImaptable[numInst].buildMap(nstates, nchareG);
1406  int success = 0;
1407  if(config.loadMapFiles) {
1408  int size[2];
1409  size[0] = nstates; size[1] = nchareG;
1410  MapFile *mf = new MapFile("GSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1411 #ifdef USE_INT_MAP
1412  success = mf->loadMap("GSMap", &GSImaptable[0]);
1413 #else
1414  success = mf->loadMap("GSMap", &GSmaptable);
1415 #endif
1416  delete mf;
1417  }
1418 
1419  if(success == 0) {
1420  GSMapTable gsTable = GSMapTable(&GSImaptable[0], &GSImaptable[numInst], availGlobG,
1421  nchareG, nstates, Gstates_per_pe, config.useCuboidMap, numInst, x, y, z);
1422  }
1423 
1424  CkPrintf("GSMap created in %g\n", newtime-Timer);
1425  } else {
1426  if((CkNumPes()==1) && (config.fakeTorus!=1))
1427  GSImaptable[numInst]=GSImaptable[0];
1428  else
1429  GSImaptable[numInst].translate(&GSImaptable[0], x,y,z, config.torusMap==1);
1430 
1431  }
1432 
1433  // there is only one IntMap per chare type, but each instance has
1434  // its own map group
1435  CProxy_GSMap gsMap = CProxy_GSMap::ckNew(thisInstance);
1436  // CkArrayOptions gSpaceOpts(nstates,nchareG);
1437  /**@}*/
1438  /** addtogroup GSpaceState */
1439  /**@{*/
1440  CkArrayOptions gSpaceOpts(nstates,nchareG);
1441  std::string forwardname("GSpaceForward");
1442  std::ostringstream fwdstrm;
1443  fwdstrm << forwardname << "." << thisInstance.idxU.x << "." << thisInstance.idxU.y << "." << thisInstance.idxU.z;
1444  int gforward=keeperRegister(fwdstrm.str());
1445  std::string backwardname("GSpaceBackward");
1446  std::ostringstream bwdstrm;
1447  bwdstrm << backwardname << "." << thisInstance.idxU.x << "." << thisInstance.idxU.y << "." << thisInstance.idxU.z;
1448  int gbackward=keeperRegister(bwdstrm.str());
1449  gSpaceOpts.setMap(gsMap);
1450  gSpaceOpts.setAnytimeMigration(false);
1451  gSpaceOpts.setStaticInsertion(true);
1452  UgSpacePlaneProxy.push_back(CProxy_CP_State_GSpacePlane::ckNew(sizeX, 1, 1, sGrainSize, gforward, gbackward, thisInstance, gSpaceOpts));
1453  UgSpacePlaneProxy[thisInstance.proxyOffset].doneInserting();
1454  // CkPrintf("{%d} main uGSpacePlaneProxy[%d] is %d\n",thisInstance.proxyOffset,thisInstance.proxyOffset,CkGroupID(UgSpacePlaneProxy[thisInstance.proxyOffset].ckGetArrayID()).idx);
1455  /**@}*/
1456  //--------------------------------------------------------------------------------
1457  // Bind the GSpaceDriver array to the GSpacePlane array so that they migrate together
1458  CkArrayOptions gspDriverOpts(nstates,nchareG);
1459  gspDriverOpts.setAnytimeMigration(false);
1460  gspDriverOpts.setStaticInsertion(true);
1461 
1462  gspDriverOpts.bindTo(UgSpacePlaneProxy[thisInstance.proxyOffset]);
1463  UgSpaceDriverProxy.push_back( CProxy_GSpaceDriver::ckNew(thisInstance,gspDriverOpts) );
1464  UgSpaceDriverProxy[thisInstance.proxyOffset].doneInserting();
1465  /**@}*/
1466  //--------------------------------------------------------------------------------
1467  // We bind the particlePlane array to the gSpacePlane array migrate together
1468  /** addtogroup Particle */
1469  /**@{*/
1470  // CkArrayOptions particleOpts(nstates,nchareG);
1471  CkArrayOptions particleOpts(nstates,nchareG);
1472  particleOpts.setAnytimeMigration(false);
1473  particleOpts.setStaticInsertion(true);
1474  particleOpts.setMap(gsMap); // the maps for both the arrays are the same
1475  particleOpts.bindTo(UgSpacePlaneProxy[thisInstance.proxyOffset]);
1476  UparticlePlaneProxy.push_back(CProxy_CP_State_ParticlePlane::ckNew(
1477  nchareG, sim->sizeY, sim->sizeZ,ngridaNl,ngridbNl,ngridcNl,
1478  1, numSfGrps, natm_nl, natm_nl_grp_max, nstates,
1479  nchareG, Gstates_per_pe, numIterNL, ees_nonlocal_on,
1480  thisInstance, particleOpts));
1481  UparticlePlaneProxy[thisInstance.proxyOffset].doneInserting();
1482 
1483  if(config.dumpMapFiles) {
1484  int size[2];
1485  size[0] = nstates; size[1] = nchareG;
1486  MapFile *mf = new MapFile("GSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1487 #ifdef USE_INT_MAP
1488  mf->dumpMap(&GSImaptable[thisInstance.getPO()], thisInstance.getPO());
1489 #else
1490  mf->dumpMap(&GSmaptable);
1491 #endif
1492  delete mf;
1493  }
1494 
1495  if(config.dumpMapCoordFiles) {
1496  // if someone wants to dump nontopo maps, they'll need a manager
1497  if(topoMgr == NULL)
1498  topoMgr = new TopoManager();
1499 
1500  int size[2];
1501  size[0] = nstates; size[1] = nchareG;
1502  MapFile *mf = new MapFile("GSMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1503 #ifdef USE_INT_MAP
1504  mf->dumpMapCoords(&GSImaptable[thisInstance.getPO()], thisInstance.getPO());
1505 #else
1506  mf->dumpMapCoords(&GSmaptable);
1507 #endif
1508  delete mf;
1509  }
1510 
1511  //---------------------------------------------------------------------------
1512  // state r-space
1513 
1514  // correction to accomodate multiple instances
1515  /**@}*/
1516  /** addtogroup mapping */
1517  /**@{*/
1518  if(firstInstance) {
1519  RSImaptable[numInst].buildMap(nstates, nchareR);
1520  Timer=CmiWallTimer();
1521  availGlobR->reset();
1522 
1523  int success = 0;
1524  if(config.loadMapFiles) {
1525  int size[2];
1526  size[0] = nstates; size[1] = nchareR;
1527  MapFile *mf = new MapFile("RSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1528 #ifdef USE_INT_MAP
1529  success = mf->loadMap("RSMap", &RSImaptable[0]);
1530 #else
1531  success = mf->loadMap("RSMap", &RSmaptable);
1532 #endif
1533  delete mf;
1534  }
1535  if(success == 0) {
1536 #ifdef USE_INT_MAP
1537  RSMapTable RStable= RSMapTable(&RSImaptable[0], &RSImaptable[numInst], availGlobR,
1538  nstates, nchareR, Rstates_per_pe, config.useCuboidMapRS,
1539  &GSImaptable[0], config.nchareG, numInst, x, y, z);
1540 #else
1541  RSMapTable RStable= RSMapTable(&RSmaptable, availGlobR, nstates, nchareR,
1542  Rstates_per_pe, config.useCuboidMapRS, &GSmaptable, config.nchareG);
1543 #endif
1544  }
1545  newtime=CmiWallTimer();
1546  CkPrintf("RSMap created in %g\n", newtime-Timer);
1547  Timer=newtime;
1548  } else {
1549  if((CkNumPes()==1) && (config.fakeTorus!=1))
1550  RSImaptable[numInst]=RSImaptable[0];
1551  else
1552  RSImaptable[numInst].translate(&RSImaptable[0], x,y,z, config.torusMap==1);
1553  CkPrintf("RSMap instance %d created in %g\n", numInst, newtime-Timer);
1554  }
1555 
1556  CProxy_RSMap rsMap= CProxy_RSMap::ckNew(thisInstance);
1557  // CkArrayOptions realSpaceOpts(nstates,nchareR);
1558  CkArrayOptions realSpaceOpts(nstates,nchareR);
1559  realSpaceOpts.setMap(rsMap);
1560  realSpaceOpts.setAnytimeMigration(false);
1561  realSpaceOpts.setStaticInsertion(true);
1562 
1563  /**@}*/
1564  int rforward=keeperRegister(std::string("RealSpaceForward"));
1565  int rbackward=keeperRegister(std::string("RealSpaceBackward"));
1566  /** \addtogroup RealSpaceState */
1567  /**@{*/
1568  UrealSpacePlaneProxy.push_back( CProxy_CP_State_RealSpacePlane::ckNew(1, 1, ngrida, ngridb, ngridc, rforward, rbackward, thisInstance, realSpaceOpts));
1569  UrealSpacePlaneProxy[thisInstance.proxyOffset].doneInserting();
1570 
1571  if(config.dumpMapFiles) {
1572  int size[2];
1573  size[0] = nstates; size[1] = nchareR;
1574  MapFile *mf = new MapFile("RSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1575 #ifdef USE_INT_MAP
1576  mf->dumpMap(&RSImaptable[thisInstance.getPO()], thisInstance.getPO());
1577 #else
1578  mf->dumpMap(&RSmaptable);
1579 #endif
1580  delete mf;
1581  }
1582 
1583  if(config.dumpMapCoordFiles) {
1584  int size[2];
1585  size[0] = nstates; size[1] = nchareR;
1586  MapFile *mf = new MapFile("RSMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1587 #ifdef USE_INT_MAP
1588  mf->dumpMapCoords(&RSImaptable[thisInstance.getPO()], thisInstance.getPO());
1589 #else
1590  mf->dumpMapCoords(&RSmaptable);
1591 #endif
1592  delete mf;
1593 
1594  }
1595  /**@}*/
1596  //--------------------------------------------------------------------------------
1597  // state r-particleplane
1598 
1599  availGlobR->reset();
1600 
1601  //--------------------------------------------------------------------------------
1602  // Do a fancy dance to determine placement of structure factors
1603  CkPrintf("Making SF non-EES\n");
1604  if(!ees_nonlocal_on)
1605  {
1606  CkPrintf("Making SF non-EES\n");
1607  int *nsend = new int[nchareG];
1608  int **listpe = new int * [nchareG];
1609  int numproc = config.numPes;
1610  int *gspace_proc = new int [numproc];
1611 
1612  for(int i =0;i<numproc;i++){gspace_proc[i]=0;}
1613  for(int j=0;j<nchareG;j++){
1614  listpe[j]= new int[nstates];
1615  nsend[j]=0;
1616  for(int i=0;i<nstates;i++){
1617  listpe[j][i] = gsprocNum(sim, i, j, thisInstance.getPO());
1618  gspace_proc[listpe[j][i]]+=1;
1619  }//endfor
1620  lst_sort_clean(nstates, &nsend[j], listpe[j]);
1621  }//endfor
1622 
1623  FILE *fp = fopen("gspplane_proc_distrib.out","w");
1624  for(int i=0;i<numproc;i++){
1625  fprintf(fp,"%d %d\n",i,gspace_proc[i]);
1626  }//endfor
1627  fclose(fp);
1628  delete [] gspace_proc;
1629 
1630  int minsend=nstates;
1631  int maxsend=0;
1632  double avgsend=0.0;
1633  int chareG_use=0;
1634  PRINT_LINE_STAR;
1635  CkPrintf("Structure factor chareG dests\n");
1636  CkPrintf("Number of g-space chares : %d\n",nchareG);
1637  PRINT_LINE_DASH;
1638  for(int lsi=0;lsi<nchareG;lsi++){
1639  chareG_use++;
1640  CkPrintf("chareG [%d] nsend %d\n",lsi,nsend[lsi]);
1641  if(nsend[lsi]>maxsend)
1642  maxsend=nsend[lsi];
1643  if(nsend[lsi]<minsend)
1644  minsend=nsend[lsi];
1645  avgsend+=(double) nsend[lsi];
1646 #ifdef SF_SEND_LIST_OUT
1647  for(int lsj=0;lsj<nsend[lsi];lsj++){
1648  CkPrintf("[%d %d] pe %d\n",lsi,lsj,listpe[lsi][lsj]);
1649  }//endfor
1650 #endif
1651  }//endfor
1652  PRINT_LINE_DASH;
1653  CkPrintf("SFSends min %d max %d avg %g\n",minsend,maxsend,avgsend/(double)chareG_use);
1654  PRINT_LINE_STAR;
1655 
1656  //--------------------------------------------------------------------------------
1657  // Insert the objects into the StructureFactor array
1658 
1659  int dupmax=maxsend; // there is no point in ever having more than that
1660  if(config.numSfDups<dupmax){dupmax=config.numSfDups;}
1661  config.numSfDups = dupmax;
1662  int numSfDups = dupmax;
1663  CkPrintf("real numSfdups is %d based on maxsend of %d\n",numSfDups, maxsend);
1664  CkVec <int> peUsedBySF;
1665  for (int dup=0; dup<dupmax; dup++){
1666  for (int x = 0; x < nchareG; x += 1){
1667  int num_dup, istart, iend;
1668  get_grp_params( nsend[x], numSfDups, dup, x ,&num_dup, &istart, &iend);
1669  int pe_ind=istart;
1670  if(x%2==0)
1671  pe_ind=iend;
1672  for (int AtmGrp=0; AtmGrp<numSfGrps; AtmGrp++){
1673  UsfCompProxy[thisInstance.proxyOffset](AtmGrp, x, dup).insert(numSfGrps,numSfDups,
1674  natm_nl_grp_max, num_dup, &(listpe[x][istart]),
1675  thisInstance,atmGrpMap(istart, num_dup, nsend[x], listpe[x],AtmGrp,dup,x));
1676  peUsedBySF.push_back(atmGrpMap(istart, num_dup, nsend[x], listpe[x], AtmGrp, dup,x));
1677 
1678  pe_ind++;
1679  if(pe_ind>nsend[x]){ pe_ind=0;}
1680  }//endfor : AtmGrp
1681  }//endfor : chareG
1682  }//endfor : Dups
1683  UsfCompProxy[thisInstance.proxyOffset].doneInserting();
1684  UpeUsedBySF.push_back(peUsedBySF);
1685  for(int j=0;j<nchareG;j++){delete [] listpe[j];}
1686  delete [] listpe;
1687  delete [] nsend;
1688  }
1689 
1690  //////////////////////////////////////////////////////////////////////////////
1691  // Set some com strategy of Sameer
1692  if(firstInstance) {
1693 
1694 #ifdef USE_COMLIB
1695  if(config.useCommlib) {
1696  // TODO: do we need to do this for each Uber instance?
1697  // technically they are operating on mutually exclusive process
1698  // sets, so modulo insane commlib global state, it should be
1699  // safe to reuse these across ubers
1700 
1701  CkPrintf("Making State streaming strats useGssRP = %d, useMssGP = %d, useGssRPP = %d, useMssGPP = %d\n",(int)config.useGssInsRealP,(int)config.useMssInsGP,(int)config.useGssInsRealPP,(int)config.useMssInsGPP);
1702  //mstrat->enableShortArrayMessagePacking();
1703  //rspaceState to gspaceState : gspaceState to rspaceState
1704 
1705  if(config.useGssInsRealP)
1706  {
1707  StreamingStrategy *gmstrat = new StreamingStrategy(config.gStreamPeriod,
1708  config.gBucketSize);
1709  gssInstance= ComlibRegister(gmstrat);
1710  }
1711  //mstrat->enableShortArrayMessagePacking();
1712  //rPPState to gPPState : gPPState to rPPState
1713  if(config.useMssInsGP){
1714  StreamingStrategy *mstrat = new StreamingStrategy(config.rStreamPeriod,
1715  config.rBucketSize);
1716 
1717  mssInstance= ComlibRegister(mstrat);
1718  }
1719  if (config.useMssInsGPP){
1720  StreamingStrategy *rpmstrat = new StreamingStrategy(config.rStreamPeriod,
1721  config.rBucketSize);
1722 
1723  mssPInstance= ComlibRegister(rpmstrat);
1724  }
1725  if (config.useGssInsRealPP){
1726  StreamingStrategy *gpmstrat = new StreamingStrategy(config.gStreamPeriod,
1727  config.gBucketSize);
1728  gssPInstance= ComlibRegister(gpmstrat);
1729  }
1730  }//endif
1731 #endif
1732 
1733  }
1734 //////////////////////////////////////////////////////////////////////////////
1735 
1736  printf("\n");
1737 
1738  PRINT_LINE_DASH;
1739  PRINTF("Completed G-space/R-space state chare array build\n");
1740  PRINT_LINE_STAR;printf("\n");
1741 
1742 //----------------------------------------------------------------------------
1743  }//end routine
1744 //////////////////////////////////////////////////////////////////////////////
1745 
1746 /** /addtogroup Particle */
1747 /**@{*/
1748 //////////////////////////////////////////////////////////////////////////////
1749 /// Creating arrays CP_StateRealParticlePlane
1750 //////////////////////////////////////////////////////////////////////////////
1751 //////////////////////////////////////////////////////////////////////////////
1752 //////////////////////////////////////////////////////////////////////////////
1753 void init_eesNL_chares(int natm_nl,int natm_nl_grp_max,
1754  int doublePack, PeList *exclusion, CPcharmParaInfo *sim,
1755  UberCollection thisInstance)
1756 //////////////////////////////////////////////////////////////////////////////
1757  { //begin routine
1758 //////////////////////////////////////////////////////////////////////////////
1759 /*
1760  * Set up the map variables used to control the location of the
1761  * 2d-array elements over processors
1762 */
1763 //////////////////////////////////////////////////////////////////////////////
1764 /// Useful Local variables
1765 
1766 
1767  int ngridaNl = sim->ngrid_nloc_a;
1768  int ngridbNl = sim->ngrid_nloc_b;
1769  int ngridcNl = sim->ngrid_nloc_c;
1770  int ees_nonlocal_on = sim->ees_nloc_on;
1771 
1772  int nchareG = sim->nchareG;
1773  int nchareRPP = ngridcNl;
1774 
1775  int numIterNL = sim->nlIters;
1776  int zmatSizeMax = sim->nmem_zmat_max;
1777 
1778  PeList *nlexcludePes;
1779  if(config.useRhoExclusionMap)
1780  nlexcludePes=exclusion;
1781  else if(config.useReductionExclusionMap)
1782  nlexcludePes= new PeList(UpeUsedByNLZ[thisInstance.proxyOffset]);
1783  else
1784  nlexcludePes= new PeList(0);
1785  if(config.excludePE0 && ! config.loadMapFiles)
1786  nlexcludePes->appendOne(0);
1787  int Rstates_per_pe = config.Rstates_per_pe;
1788  availGlobG->reset();
1789  double newtime=CmiWallTimer();
1790 #ifdef USE_INT_MAP
1791  RPPImaptable[thisInstance.getPO()].buildMap(nstates, nchareRPP);
1792 #endif
1793  if(firstInstance)
1794  {
1795  int success = 0;
1796  if(config.loadMapFiles) {
1797  int size[2];
1798  size[0] = nstates; size[1] = nchareRPP;
1799  MapFile *mf = new MapFile("RPPMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1800 #ifdef USE_INT_MAP
1801  success = mf->loadMap("RPPMap", &RPPImaptable[thisInstance.getPO()]);
1802 #else
1803  success = mf->loadMap("RPPMap", &RPPmaptable);
1804 #endif
1805  delete mf;
1806  }
1807 
1808  if(success == 0) {
1809 #ifdef USE_INT_MAP
1810  RPPMapTable RPPtable= RPPMapTable(&RPPImaptable[thisInstance.getPO()], availGlobG, nlexcludePes,
1811  nstates, nchareRPP, Rstates_per_pe,
1812  boxSize, config.useCuboidMap,
1813  config.nchareG, &GSImaptable[thisInstance.getPO()]);
1814 #else
1815  RPPMapTable RPPtable= RPPMapTable(&RPPmaptable, availGlobG, nlexcludePes,
1816  nstates, nchareRPP, Rstates_per_pe,
1817  boxSize, config.useCuboidMap,
1818  config.nchareG, &GSmaptable);
1819 #endif
1820  }
1821  }
1822  else
1823  {
1824  int x=mapOffsets[numInst].getx();
1825  int y=mapOffsets[numInst].gety();
1826  int z=mapOffsets[numInst].getz();
1827  if((CkNumPes()==1) && (config.fakeTorus!=1))
1828  RPPImaptable[numInst]=RPPImaptable[0];
1829  else
1830  RPPImaptable[numInst].translate(&RPPImaptable[0], x,y,z, config.torusMap==1);
1831  }
1832  CProxy_RPPMap rspMap= CProxy_RPPMap::ckNew(thisInstance);
1833  newtime=CmiWallTimer();
1834  CmiNetworkProgressAfter(0);
1835  CkPrintf("RPPMap created in %g\n",newtime-Timer);
1836 
1837  if(config.dumpMapFiles) {
1838  int size[2];
1839  size[0] = nstates; size[1] = nchareRPP;
1840  MapFile *mf = new MapFile("RPPMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1841 #ifdef USE_INT_MAP
1842  mf->dumpMap(&RPPImaptable[thisInstance.getPO()], thisInstance.getPO());
1843 #else
1844  mf->dumpMap(&RPPmaptable);
1845 #endif
1846  delete mf;
1847  }
1848  if(config.dumpMapCoordFiles) {
1849  int size[2];
1850  size[0] = nstates; size[1] = nchareRPP;
1851  MapFile *mf = new MapFile("RPPMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
1852 #ifdef USE_INT_MAP
1853  mf->dumpMapCoords(&RPPImaptable[thisInstance.getPO()], thisInstance.getPO());
1854 #else
1855  mf->dumpMapCoords(&RPPmaptable);
1856 #endif
1857  delete mf;
1858  }
1859 
1860  if(config.useRhoExclusionMap)
1861  {
1862  nlexcludePes=NULL;
1863  }
1864  else
1865  {
1866  if(nlexcludePes!=NULL)
1867  delete nlexcludePes;
1868  }
1869  if(config.excludePE0 && !config.loadMapFiles)
1870  {
1871  if(nlexcludePes==NULL)
1872  nlexcludePes=new PeList(0);
1873  nlexcludePes->appendOne(0);
1874  }
1875  Timer=newtime;
1876  CkArrayOptions pRealSpaceOpts(nstates,ngridcNl);
1877  pRealSpaceOpts.setMap(rspMap);
1878  pRealSpaceOpts.setAnytimeMigration(false);
1879  pRealSpaceOpts.setStaticInsertion(true);
1880  UrealParticlePlaneProxy.push_back(CProxy_CP_State_RealParticlePlane::ckNew(
1881  ngridaNl,ngridbNl,ngridcNl,
1882  numIterNL,zmatSizeMax,Rstates_per_pe,
1883  nchareG,ees_nonlocal_on, thisInstance,
1884  pRealSpaceOpts));
1885  UrealParticlePlaneProxy[thisInstance.proxyOffset].doneInserting();
1886  printf("\n");
1887  PRINT_LINE_DASH;
1888  PRINTF("Completed RealParticle chare array build\n");
1889  PRINT_LINE_STAR;printf("\n");
1890 
1891 }
1892 /**@}*/
1893 
1894 /** \addtogroup Density */
1895 /**@{*/
1896 //////////////////////////////////////////////////////////////////////////////
1897 /// Creating arrays CP_Rho_GSpacePlane, CP_Rho_GSpacePlaneHelper
1898 /// and CP_Rho_RealSpacePlane
1899 //////////////////////////////////////////////////////////////////////////////
1900 //////////////////////////////////////////////////////////////////////////////
1901 //////////////////////////////////////////////////////////////////////////////
1903 //////////////////////////////////////////////////////////////////////////////
1904 {//begin routine
1905  //////////////////////////////////////////////////////////////////////////////
1906  /*
1907  * create the array for real-space densities (two-dimensional chare array)
1908  */
1909  //////////////////////////////////////////////////////////////////////////////
1910  // Chare array sizes and offsets
1911  int ngrid_eext_a = sim->ngrid_eext_a;
1912  int ngrid_eext_b = sim->ngrid_eext_b;
1913  int ngrid_eext_c = sim->ngrid_eext_c;
1914  int ees_eext_on = sim->ees_eext_on;
1915  int ees_nonlocal_on = sim->ees_nloc_on;
1916  int natmTyp = sim->natm_typ;
1917  int nchareRhoG = sim->nchareRhoG;
1918  int nchareRhoR = sim->sizeZ;
1919  int rhoGHelpers = config.rhoGHelpers;
1920  int nchareHartAtmT = config.nchareHartAtmT;
1921  int nchareRhoGHart = rhoGHelpers*nchareRhoG;
1922  int nchareRhoRHart = ngrid_eext_c;
1923 
1924 
1925  if(thisInstance.idxU.y>0)
1926  { // the set of chares being created is for a non-zero kpoint
1927  // all k-points use the same rho, therefore we do not make new
1928  // chares, we simply direct the proxyoffset here to the one for
1929  // the 0th kpoint and return
1930 
1931 
1932  UberCollection zeroKpointInstance=thisInstance;
1933  zeroKpointInstance.idxU.y=0;
1934  int proxyOffset=zeroKpointInstance.setPO();
1935  UrhoRealProxy.push_back(UrhoRealProxy[proxyOffset]);
1936  UrhoGProxy.push_back(UrhoGProxy[proxyOffset]);
1937  UrhoGHartExtProxy.push_back(UrhoGHartExtProxy[proxyOffset]);
1938  if(ees_eext_on){
1939  UrhoRHartExtProxy.push_back(UrhoRHartExtProxy[proxyOffset]);
1940  }
1941  UlsRhoGProxy.push_back(UlsRhoGProxy[proxyOffset]);
1942  UlsRhoRealProxy.push_back(UlsRhoRealProxy[proxyOffset]);
1943 
1944  return 1;
1945  }
1946 
1947 
1948 
1949 
1950  //////////////////////////////////////////////////////////////////////////////
1951  // Output to the screen
1952 
1953  PRINT_LINE_STAR;
1954  CkPrintf("Building RhoR, RhoG, RhoGHartExt, RhoRHartExt Chares %d %d %d %d natmtyp:%d\n",
1955  nchareRhoR,nchareRhoG,nchareRhoGHart,nchareRhoRHart,natmTyp);
1956  PRINT_LINE_DASH;printf("\n");
1957 
1958  //////////////////////////////////////////////////////////////////////////////
1959  // Nuke some procs from the list : reset, nuke, reset if you run out
1960 
1961 
1962  PeList *RhoAvail=NULL;
1963  if(!config.loadMapFiles)
1964  {
1965  availGlobR->reset();
1966  // make RhoAvail based on the pes used by RS
1967  RhoAvail=new PeList(*availGlobR);
1968  }
1969  //------------------------------------------------------------------------
1970  // subtract processors used by other nonscaling chares (i.e., non local reduceZ)
1971  excludePes= new PeList(0);
1972  if(config.excludePE0 && !config.loadMapFiles)
1973  excludePes->appendOne(0);
1974 
1975  // avoid co-mapping with the particle plane non-local reduction roots
1976  if(config.useReductionExclusionMap && !config.loadMapFiles)
1977  {
1978  if( nchareRhoR*config.rhoRsubplanes+UpeUsedByNLZ[thisInstance.proxyOffset].size() <
1979  RhoAvail->count()){
1980 
1981  CkPrintf("subtracting %d NLZ nodes from %d for RhoR Map\n",
1982  UpeUsedByNLZ[thisInstance.proxyOffset].size(),RhoAvail->count());
1983  // nlz.dump();
1984  *RhoAvail-*excludePes; //unary minus operator defined in PeList.h
1985  RhoAvail->reindex();
1986  CkPrintf("Leaving %d for RhoR Map\n",RhoAvail->count());
1987  }//endif
1988 
1989  //------------------------------------------------------------------------
1990  // subtract processors used by other nonscaling chares
1991 
1992  if(ees_nonlocal_on==0){
1993  if( nchareRhoR*config.rhoRsubplanes+UpeUsedBySF[thisInstance.proxyOffset].size()<RhoAvail->count()){
1994  CkPrintf("subtracting %d SF nodes from %d for RhoR Map\n",
1995  UpeUsedBySF[thisInstance.proxyOffset].size(),RhoAvail->count());
1996  PeList sf(UpeUsedBySF[thisInstance.proxyOffset]);
1997  *RhoAvail-sf;
1998  RhoAvail->reindex();
1999  }//endif
2000  }//endif
2001  }
2002  if(!config.loadMapFiles && RhoAvail->count()>2 ) { RhoAvail->reindex(); }
2003 
2004  //////////////////////////////////////////////////////////////////////////////
2005  // Maps and options
2006  //CkPrintf("RhoR map for %d x %d=%d chares, using %d procs\n",nchareRhoR, config.rhoRsubplanes, nchareRhoR*config.rhoRsubplanes, RhoAvail->count());
2007 
2008  //---------------------------------------------------------------------------
2009  // rho RS
2010  if(firstInstance)
2011  {
2012  RhoRSImaptable[thisInstance.getPO()].buildMap(nchareRhoR, config.rhoRsubplanes);
2013  int success = 0;
2014  if(config.loadMapFiles) {
2015  int size[2];
2016  size[0] = nchareRhoR; size[1] = config.rhoRsubplanes;
2017  MapFile *mf = new MapFile("RhoRSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2018  success = mf->loadMap("RhoRSMap", &RhoRSImaptable[thisInstance.getPO()]);
2019  delete mf;
2020  }
2021  if(success == 0) {
2022  RhoRSMapTable RhoRStable(&RhoRSImaptable[thisInstance.getPO()], RhoAvail, nchareRhoR, config.rhoRsubplanes, config.nstates, config.useCentroidMapRho, &RSImaptable[thisInstance.getPO()], excludePes);
2023  }
2024  }
2025  else
2026  {
2027  int x=mapOffsets[numInst].getx();
2028  int y=mapOffsets[numInst].gety();
2029  int z=mapOffsets[numInst].getz();
2030  if((CkNumPes()==1) && (config.fakeTorus!=1))
2031  RhoRSImaptable[numInst]=RhoRSImaptable[0];
2032  else
2033  RhoRSImaptable[numInst].translate(&RhoRSImaptable[0], x,y,z, config.torusMap==1);
2034  }
2035 
2036  CProxy_RhoRSMap rhorsMap = CProxy_RhoRSMap::ckNew(thisInstance);
2037  CkArrayOptions rhorsOpts(nchareRhoR, config.rhoRsubplanes);
2038  //CkArrayOptions rhorsOpts;
2039  rhorsOpts.setMap(rhorsMap);
2040  rhorsOpts.setAnytimeMigration(false);
2041  rhorsOpts.setStaticInsertion(true);
2042  if(config.dumpMapFiles) {
2043  int size[2];
2044  size[0] = nchareRhoR; size[1] = config.rhoRsubplanes;
2045  MapFile *mf = new MapFile("RhoRSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2046 #ifdef USE_INT_MAP
2047  mf->dumpMap(&RhoRSImaptable[thisInstance.getPO()], thisInstance.getPO());
2048 #else
2049  mf->dumpMap(&RhoRSmaptable);
2050 #endif
2051  delete mf;
2052  }
2053  if(config.dumpMapCoordFiles) {
2054  int size[2];
2055  size[0] = nchareRhoR; size[1] = config.rhoRsubplanes;
2056  MapFile *mf = new MapFile("RhoRSMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2057 #ifdef USE_INT_MAP
2058  mf->dumpMapCoords(&RhoRSImaptable[thisInstance.getPO()], thisInstance.getPO());
2059 #else
2060  mf->dumpMapCoords(&RhoRSmaptable);
2061 #endif
2062  delete mf;
2063  }
2064  CmiNetworkProgressAfter(0);
2065  //---------------------------------------------------------------------------
2066  // rho GS
2067  // if there aren't enough free procs refresh the RhoAvail list;
2068  if(!config.loadMapFiles && nchareRhoG>RhoAvail->count())
2069  {
2070  CkPrintf("refreshing avail list count %d less than rhog %d\n",RhoAvail->count(), nchareRhoG);
2071  RhoAvail->reset();
2072  }
2073  if(firstInstance)
2074  {
2075  RhoGSImaptable[thisInstance.getPO()].buildMap(nchareRhoG, 1);
2076  int success = 0;
2077  if(config.loadMapFiles) {
2078  int size[2];
2079  size[0] = nchareRhoG; size[1] = 1;
2080  MapFile *mf = new MapFile("RhoGSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2081  success = mf->loadMap("RhoGSMap", &RhoGSImaptable[thisInstance.getPO()]);
2082  delete mf;
2083  }
2084  if(success == 0) {
2085  RhoGSMapTable RhoGStable(&RhoGSImaptable[thisInstance.getPO()], RhoAvail,nchareRhoG, config.useCentroidMapRho, &RhoRSImaptable[thisInstance.getPO()], excludePes);
2086 
2087  }
2088  }
2089  else
2090  {
2091  int x=mapOffsets[numInst].getx();
2092  int y=mapOffsets[numInst].gety();
2093  int z=mapOffsets[numInst].getz();
2094  if((CkNumPes()==1) && (config.fakeTorus!=1))
2095  RhoGSImaptable[numInst]=RhoGSImaptable[0];
2096  else
2097  RhoGSImaptable[numInst].translate(&RhoGSImaptable[0], x,y,z, config.torusMap==1);
2098  }
2099 
2100 
2101  CProxy_RhoGSMap rhogsMap = CProxy_RhoGSMap::ckNew(thisInstance);
2102  CkArrayOptions rhogsOpts(nchareRhoG,1);
2103  //CkArrayOptions rhogsOpts;
2104  rhogsOpts.setMap(rhogsMap);
2105  rhogsOpts.setAnytimeMigration(false);
2106  rhogsOpts.setStaticInsertion(true);
2107  if(config.dumpMapFiles) {
2108  int size[2];
2109  size[0] = nchareRhoG; size[1] = 1;
2110  MapFile *mf = new MapFile("RhoGSMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2111 #ifdef USE_INT_MAP
2112  mf->dumpMap(&RhoGSImaptable[thisInstance.getPO()], thisInstance.getPO());
2113 #else
2114  mf->dumpMap(&RhoGSmaptable);
2115 #endif
2116  delete mf;
2117  }
2118  if(config.dumpMapCoordFiles) {
2119  int size[2];
2120  size[0] = nchareRhoG; size[1] = 1;
2121  MapFile *mf = new MapFile("RhoGSMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2122 #ifdef USE_INT_MAP
2123  mf->dumpMapCoords(&RhoGSImaptable[thisInstance.getPO()], thisInstance.getPO());
2124 #else
2125  mf->dumpMapCoords(&RhoGSmaptable);
2126 #endif
2127  delete mf;
2128  }
2129 
2130 
2131  //---------------------------------------------------------------------------
2132  // rho RHart
2133  // if there aren't enough free procs refresh the avail list;
2134  if(!config.loadMapFiles && nchareRhoRHart*nchareHartAtmT > RhoAvail->count())
2135  RhoAvail->reset();
2136  CkArrayOptions rhorhartOpts(nchareRhoRHart, config.rhoRsubplanes, nchareHartAtmT);
2137  //CkArrayOptions rhorhartOpts;
2138  rhorhartOpts.setAnytimeMigration(false);
2139  rhorhartOpts.setStaticInsertion(true);
2140  if(ees_eext_on) {
2141  if(firstInstance)
2142  {
2143  RhoRHartImaptable[thisInstance.getPO()].buildMap(nchareRhoRHart, config.rhoRsubplanes, nchareHartAtmT);
2144  int success = 0;
2145  if(config.loadMapFiles) {
2146  int size[3];
2147  size[0] = nchareRhoRHart; size[1] = config.rhoRsubplanes;
2148  size[2] = nchareHartAtmT;
2149  MapFile *mf = new MapFile("RhoRHartMap", 3, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2150  success = mf->loadMap("RhoRHartMap", &RhoRHartImaptable[thisInstance.getPO()]);
2151  delete mf;
2152  }
2153  if(success == 0) {
2154  RhoRHartMapTable RhoRHarttable(&RhoRHartImaptable[thisInstance.getPO()], RhoAvail,
2155  nchareRhoRHart, config.rhoRsubplanes,
2156  config.nchareHartAtmT, excludePes);
2157  }
2158  }
2159  else
2160  {
2161  int x=mapOffsets[numInst].getx();
2162  int y=mapOffsets[numInst].gety();
2163  int z=mapOffsets[numInst].getz();
2164  if((CkNumPes()==1) && (config.fakeTorus!=1))
2165  RhoRHartImaptable[numInst]=RhoRHartImaptable[0];
2166  else
2167  RhoRHartImaptable[numInst].translate(&RhoRHartImaptable[0], x,y,z, config.torusMap==1);
2168  }
2169 
2170 
2171  CProxy_RhoRHartMap rhorHartMap = CProxy_RhoRHartMap::ckNew(thisInstance);
2172  rhorhartOpts.setMap(rhorHartMap);
2173 
2174  if(config.dumpMapFiles) {
2175  int size[3];
2176  size[0] = nchareRhoRHart; size[1] = config.rhoRsubplanes,
2177  size[2] = nchareHartAtmT;
2178  MapFile *mf = new MapFile("RhoRHartMap", 3, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2179 #ifdef USE_INT_MAP
2180  mf->dumpMap(&RhoRHartImaptable[thisInstance.getPO()], thisInstance.getPO());
2181 #else
2182  mf->dumpMap(&RhoRHartmaptable);
2183 #endif
2184  delete mf;
2185  }
2186  if(config.dumpMapCoordFiles) {
2187  int size[3];
2188  size[0] = nchareRhoRHart; size[1] = config.rhoRsubplanes,
2189  size[2] = nchareHartAtmT;
2190  MapFile *mf = new MapFile("RhoRHartMap_coord", 3, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2191 #ifdef USE_INT_MAP
2192  mf->dumpMapCoords(&RhoRHartImaptable[thisInstance.getPO()], thisInstance.getPO());
2193 #else
2194  mf->dumpMapCoords(&RhoRHartmaptable);
2195 #endif
2196  delete mf;
2197  }
2198  } //endif : ees_ext_on
2199  CkPrintf("RhoRHartMap built %d x %d x %d\n",nchareRhoRHart, config.rhoRsubplanes, config.nchareHartAtmT);
2200  //---------------------------------------------------------------------------
2201  // rho GHart
2202  // if there aren't enough free procs refresh the avail list;
2203  if(!config.loadMapFiles && nchareRhoGHart>RhoAvail->count())
2204  RhoAvail->reset();
2205  if(firstInstance)
2206  {
2207  RhoGHartImaptable[thisInstance.getPO()].buildMap(nchareRhoGHart, nchareHartAtmT);
2208  int success = 0;
2209  if(config.loadMapFiles) {
2210  int size[2];
2211  size[0] = nchareRhoGHart; size[1] = nchareHartAtmT;
2212  MapFile *mf = new MapFile("RhoGHartMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2213  success = mf->loadMap("RhoGHartMap", &RhoGHartImaptable[thisInstance.getPO()]);
2214  delete mf;
2215  }
2216  if(success == 0) {
2217 
2218  MapType3 *RhoRHartImaptablep=NULL;
2219  if(ees_eext_on)
2220  RhoRHartImaptablep=&RhoRHartImaptable[thisInstance.getPO()];
2221  RhoGHartMapTable RhoGHarttable(&RhoGHartImaptable[thisInstance.getPO()], RhoAvail,
2222  nchareRhoGHart, config.nchareHartAtmT,
2223  config.useCentroidMapRho,
2224  RhoRHartImaptablep, excludePes);
2225 
2226  }
2227  }
2228  else
2229  {
2230  int x=mapOffsets[numInst].getx();
2231  int y=mapOffsets[numInst].gety();
2232  int z=mapOffsets[numInst].getz();
2233  if((CkNumPes()==1) && (config.fakeTorus!=1))
2234  RhoGHartImaptable[numInst]=RhoGHartImaptable[0];
2235  else
2236  RhoGHartImaptable[numInst].translate(&RhoGHartImaptable[0], x,y,z, config.torusMap==1);
2237  }
2238 
2239 
2240  CProxy_RhoGHartMap rhogHartMap = CProxy_RhoGHartMap::ckNew(thisInstance);
2241  CkArrayOptions rhoghartOpts(nchareRhoGHart, nchareHartAtmT);
2242  // CkArrayOptions rhoghartOpts;
2243  rhoghartOpts.setMap(rhogHartMap);
2244  rhoghartOpts.setAnytimeMigration(false);
2245  rhoghartOpts.setStaticInsertion(true);
2246  if(config.dumpMapFiles) {
2247  int size[2];
2248  size[0] = nchareRhoGHart; size[1] = nchareHartAtmT;
2249  MapFile *mf = new MapFile("RhoGHartMap", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2250 #ifdef USE_INT_MAP
2251  mf->dumpMap(&RhoGHartImaptable[thisInstance.getPO()], thisInstance.getPO());
2252 #else
2253  mf->dumpMap(&RhoGHartmaptable);
2254 #endif
2255  delete mf;
2256  }
2257  if(config.dumpMapCoordFiles) {
2258  int size[2];
2259  size[0] = nchareRhoGHart; size[1] = nchareHartAtmT;
2260  MapFile *mf = new MapFile("RhoGHartMap_coord", 2, size, config.numPes, "TXYZ", 2, 1, 1, 1);
2261 #ifdef USE_INT_MAP
2262  mf->dumpMapCoords(&RhoGHartImaptable[thisInstance.getPO()], thisInstance.getPO());
2263 #else
2264  mf->dumpMapCoords(&RhoGHartmaptable);
2265 #endif
2266  delete mf;
2267  }
2268 
2269  //////////////////////////////////////////////////////////////////////////////
2270  // Instantiate the chares
2271 
2272  bool dummy = true;
2273 
2274  //--------------------------------------------------------------------------
2275 
2276  // insert rhoreal
2277  int rhokeeper= keeperRegister(std::string("Density"));
2278  // rhorsopts contains the nchareRhoR, rhoRSubplanes, the maps, and will make all at once
2279  UrhoRealProxy.push_back(CProxy_CP_Rho_RealSpacePlane::ckNew(sizeX,dummy,
2280  ees_eext_on, ngrid_eext_c,
2281  rhokeeper,
2282  thisInstance,
2283  rhorsOpts));
2284  /* for (int i = 0; i < nchareRhoR; i++){
2285  for (int j = 0; j < config.rhoRsubplanes; j++){
2286  UrhoRealProxy[thisInstance.proxyOffset](i,j).insert(sizeX,dummy, ees_eext_on, ngrid_eext_c, rhokeeper);
2287  } //endfor
2288  } //endfor
2289  */
2290  UrhoRealProxy[thisInstance.proxyOffset].doneInserting();
2291  /// @todo: valgrind complains of a tiny memleak here. Check if callbacks get destroyed properly.
2292  UrhoRealProxy[thisInstance.proxyOffset].ckSetReductionClient( new CkCallback(CkIndex_InstanceController::printEnergyEexc(NULL),CkArrayIndex1D(thisInstance.proxyOffset),instControllerProxy));
2293  CmiNetworkProgressAfter(0);
2294  //--------------------------------------------------------------------------
2295  // insert rhog
2296  UrhoGProxy.push_back(CProxy_CP_Rho_GSpacePlane::ckNew(sizeX, 1,
2297  1, dummy, thisInstance,
2298  rhogsOpts));
2299 /* for (int i = 0; i < nchareRhoG; i++){
2300  UrhoGProxy[thisInstance.proxyOffset](i,0).insert(sizeX, 1,1,dummy );
2301  }//endfor
2302 */
2303  UrhoGProxy[thisInstance.proxyOffset].doneInserting();
2304  CmiNetworkProgressAfter(0);
2305  //--------------------------------------------------------------------------
2306  // insert rhoghart
2307  UrhoGHartExtProxy.push_back(CProxy_CP_Rho_GHartExt::ckNew(ngrid_eext_a,ngrid_eext_b,
2308  ngrid_eext_c,ees_eext_on,
2309  natmTyp, thisInstance,
2310  rhoghartOpts));
2311  /*
2312  for (int k = 0; k < nchareHartAtmT; k++){
2313  for (int i = 0; i < nchareRhoGHart; i++){
2314  UrhoGHartExtProxy[thisInstance.proxyOffset](i,k).insert(ngrid_eext_a,ngrid_eext_b,
2315  ngrid_eext_c,ees_eext_on,natmTyp);
2316  }//endfor
2317  }//endfor
2318  */
2319  /// @todo: valgrind complains of a tiny memleak here. Check if callbacks get destroyed properly.
2320  UrhoGHartExtProxy[thisInstance.proxyOffset].ckSetReductionClient(new CkCallback(CkIndex_InstanceController::printEnergyHart(NULL),CkArrayIndex1D(thisInstance.proxyOffset),instControllerProxy));
2321  UrhoGHartExtProxy[thisInstance.proxyOffset].doneInserting();
2322  CmiNetworkProgressAfter(0);
2323  //--------------------------------------------------------------------------
2324  // insert rhoRhart
2325  if(ees_eext_on){
2326  UrhoRHartExtProxy.push_back(CProxy_CP_Rho_RHartExt::ckNew(ngrid_eext_a,ngrid_eext_b,
2327  ngrid_eext_c,ees_eext_on,
2328  natmTyp,thisInstance,
2329  rhorhartOpts));
2330  /*
2331  for (int k = 0; k < nchareHartAtmT; k++){
2332  for (int i = 0; i < nchareRhoRHart; i++){
2333  for (int j = 0; j < config.rhoRsubplanes; j++){
2334  UrhoRHartExtProxy[thisInstance.proxyOffset](i,j,k).insert(ngrid_eext_a,ngrid_eext_b,ngrid_eext_c,
2335  ees_eext_on,natmTyp);
2336  }//endfor
2337  }//endfor
2338  }//endfor
2339  */
2340  UrhoRHartExtProxy[thisInstance.proxyOffset].doneInserting();
2341  }//endif
2342  CmiNetworkProgressAfter(0);
2343  ///////////////////////////////////////////////////////////////////////////==
2344  // Output to the screen
2345  // need to add maps for these, for now just let em default
2346  // IF some condition which triggers QMMM
2347  int nchareRhoGLSP=1;
2348  int nchareRhoRealLSP=1;
2349  int nchareRhoRealLSPsubplanes=1;
2350  CkArrayOptions lspgspOpts(nchareRhoGLSP);
2351  CkArrayOptions lsprealOpts(nchareRhoRealLSP, nchareRhoRealLSPsubplanes);
2352  UlsRhoGProxy.push_back(CProxy_CP_LargeSP_RhoGSpacePlane::ckNew(thisInstance,lspgspOpts));
2353  UlsRhoRealProxy.push_back(CProxy_CP_LargeSP_RhoRealSpacePlane::ckNew(thisInstance,lsprealOpts));
2354  printf("\n");
2355  PRINT_LINE_DASH;
2356  PRINTF("Completed G-space/R-space Rho chare array build\n");
2357  PRINT_LINE_STAR;printf("\n");
2358  if(RhoAvail!=NULL)
2359  delete RhoAvail;
2360 
2361  return 1;
2362  ///////////////////////////////////////////////////////////////////////////==
2363 }//end routine
2364 //////////////////////////////////////////////////////////////////////////////
2365 /**@}*/
2366 
2367 
2368 //////////////////////////////////////////////////////////////////////////////
2369 /// Get the atoms and the parainfo
2370 //////////////////////////////////////////////////////////////////////////////
2371 //////////////////////////////////////////////////////////////////////////////
2372 //////////////////////////////////////////////////////////////////////////////
2374 //////////////////////////////////////////////////////////////////////////////
2375 /// make a group : create a proxy for the atom class and also a reduction client
2376 
2377  // Make groups for the atoms and energies
2378  if(thisInstance.idxU.y>0|| thisInstance.idxU.s>0)
2379  { // the set of chares being created is for a non-zero kpoint
2380  // all k-points use the same atoms and energies
2381  // we simply direct the proxyoffset here to the one for
2382  // the 0th kpoint
2383  // same holds true for spin
2384  UberCollection zeroKpointInstance=thisInstance;
2385  zeroKpointInstance.idxU.y=0;
2386  zeroKpointInstance.idxU.s=0;
2387  int proxyOffset=zeroKpointInstance.setPO();
2388  UatomsCacheProxy.push_back(UatomsCacheProxy[proxyOffset]);
2389  UatomsComputeProxy.push_back(UatomsComputeProxy[proxyOffset]);
2390  UegroupProxy.push_back(UegroupProxy[proxyOffset]);
2391  }
2392  else
2393  {
2394  int ibead = thisInstance.idxU.x;
2395  int itemper = thisInstance.idxU.z;
2396  PhysicsAtomPosInit *PhysicsAtom = new PhysicsAtomPosInit(ibead,itemper);
2397  int natm = PhysicsAtom->natm_tot;
2398  int natm_nl = PhysicsAtom->natm_nl;
2399  int len_nhc = PhysicsAtom->len_nhc;
2400  int iextended_on = PhysicsAtom->iextended_on;
2401  int cp_min_opt = PhysicsAtom->cp_min_opt;
2402  int cp_wave_opt = PhysicsAtom->cp_wave_opt;
2403  int isokin_opt = PhysicsAtom->isokin_opt;
2404  double kT = PhysicsAtom->kT;
2405 
2406  Atom *atoms = new Atom[natm];
2407  AtomNHC *atomsNHC = new AtomNHC[natm];
2408 
2409  PhysicsAtom->DriverAtomInit(natm,atoms,atomsNHC,ibead,itemper);
2410  UegroupProxy.push_back(CProxy_EnergyGroup::ckNew(thisInstance));
2411  // FIXME, this needs a real computation
2412  // also we need a real map
2413  int nChareAtoms=(config.numPesPerInstance<natm) ? config.numPesPerInstance : natm;
2414 
2415  if (firstInstance) {
2416  // build the base atom map
2417  AtomImaptable[numInst].buildMap(nChareAtoms);
2418  availGlobG->reset();
2419  AtomMapTable aTable = AtomMapTable(&AtomImaptable[numInst], availGlobG,
2420  numInst,nChareAtoms);
2421 
2422 
2423  } else {
2424  int x=mapOffsets[numInst].getx();
2425  int y=mapOffsets[numInst].gety();
2426  int z=mapOffsets[numInst].getz();
2427  if((CkNumPes()==1) && (config.fakeTorus!=1))
2428  AtomImaptable[numInst]=AtomImaptable[0];
2429  else
2430  AtomImaptable[numInst].translate(&AtomImaptable[0], x,y,z, config.torusMap==1);
2431  }
2432  CProxy_AtomComputeMap aMap = CProxy_AtomComputeMap::ckNew(thisInstance);
2433  CkArrayOptions atomOpts(nChareAtoms);
2434  atomOpts.setMap(aMap);
2435  atomOpts.setAnytimeMigration(false);
2436  atomOpts.setStaticInsertion(true);
2437  UatomsCacheProxy.push_back( CProxy_AtomsCache::ckNew(natm,natm_nl,
2438  atoms,thisInstance));
2439  UatomsComputeProxy.push_back( CProxy_AtomsCompute::ckNew(natm,natm_nl,
2440  len_nhc,
2441  iextended_on,
2442  cp_min_opt,cp_wave_opt,isokin_opt,
2443  kT,atoms,
2444  atomsNHC,
2445  nChareAtoms,
2446  thisInstance,
2447  atomOpts
2448  ));
2449  delete [] atoms;
2450  delete [] atomsNHC;
2451  delete PhysicsAtom;
2452  }
2453 
2454 
2455 //----------------------------------------------------------------------------
2456  }//end routine
2457 //////////////////////////////////////////////////////////////////////////////
2458 
2459 
2460 //////////////////////////////////////////////////////////////////////////////
2461 //////////////////////////////////////////////////////////////////////////////
2462 //////////////////////////////////////////////////////////////////////////////
2463 static CkReductionMsg *complexSum(int nMsg, CkReductionMsg **msgs){
2464 //////////////////////////////////////////////////////////////////////////////
2465  // get the size of the messages from the first message
2466  int matrixSize = msgs[0]->getSize() / sizeof(complex);
2467  complex *matrix = new complex[matrixSize];
2468  int i, m;
2469  for (m = 0; m < nMsg; m++) {
2470  // sanity check
2471  CkAssert(msgs[m]->getSize() == matrixSize * sizeof(complex));
2472  complex *data = (complex *) msgs[m]->getData();
2473  for (i = 0; i < matrixSize; i++)
2474  matrix[i] += data[i];
2475  }
2476  CkReductionMsg *ret = CkReductionMsg::buildNew(matrixSize *
2477  sizeof(complex), matrix);
2478  delete [] matrix;
2479  return ret;
2480 }
2481 //////////////////////////////////////////////////////////////////////////////
2482 
2483 
2484 //////////////////////////////////////////////////////////////////////////////
2485 //////////////////////////////////////////////////////////////////////////////
2486 //////////////////////////////////////////////////////////////////////////////
2487 void get_grp_params(int natm_nl, int numSfDups, int indexSfGrp, int planeIndex,
2488  int *n_ret, int *istrt_ret, int *iend_ret)
2489 //////////////////////////////////////////////////////////////////////////////
2490  {// begin routine
2491 //////////////////////////////////////////////////////////////////////////////
2492 
2493  int n = (natm_nl/numSfDups);
2494  int m = (natm_nl % numSfDups);
2495 
2496  int istrt = n*indexSfGrp;
2497  if(indexSfGrp>=m){istrt += m;}
2498  if(indexSfGrp<m) {istrt += indexSfGrp;}
2499  if(indexSfGrp<m) {n++;}
2500  int iend = n+istrt;
2501  if(numSfDups>natm_nl)
2502  if(m>=indexSfGrp)
2503  {
2504  n=0;
2505  istrt=natm_nl+1;
2506  iend=natm_nl;
2507  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
2508  CkPrintf("Redundant DupSF for chare-G %d\n",planeIndex);
2509  CkPrintf("At present this hangs, so out you go\n");
2510  CkPrintf("@@@@@@@@@@@@@@@@@@@@_error_@@@@@@@@@@@@@@@@@@@@@@\n");
2511  // CkExit();
2512  }
2513 
2514  (*n_ret) = n;
2515  (*istrt_ret) = istrt;
2516  (*iend_ret) = iend;
2517 
2518 //---------------------------------------------------------------------------
2519  }//end routine
2520 //////////////////////////////////////////////////////////////////////////////
2521 
2522 
2523 /** \addtogroup mapping */
2524 /**@{*/
2525 //////////////////////////////////////////////////////////////////////////////
2526 //////////////////////////////////////////////////////////////////////////////
2527 //////////////////////////////////////////////////////////////////////////////
2528 int atmGrpMap(int istart, int nsend, int listsize, int *listpe, int AtmGrp,
2529  int dup, int planeIndex)
2530 //////////////////////////////////////////////////////////////////////////////
2531  {// begin routine
2532 //////////////////////////////////////////////////////////////////////////////
2533 
2534  int numSfDups=config.numSfDups;
2535 
2536  if(listsize <= numSfDups){ //dup list not unique
2537  return listpe[(dup % listsize)];
2538  }//endif
2539 
2540  if(nsend <= config.numSfGrps){ //atom group list not unique
2541  return listpe[(istart + (AtmGrp % nsend))];
2542  }//endif
2543 
2544  // nsend must be > numSfGrps and its list is unique
2545  return listpe[(istart + ((AtmGrp + config.numSfGrps * planeIndex)%nsend))];
2546 
2547 //---------------------------------------------------------------------------
2548  }// end routine
2549 //////////////////////////////////////////////////////////////////////////////
2550 
2551 
2552 //////////////////////////////////////////////////////////////////////////////
2553 //////////////////////////////////////////////////////////////////////////////
2554 //////////////////////////////////////////////////////////////////////////////
2555 int gsprocNum(CPcharmParaInfo *sim, int state, int plane, int numInst) {
2556  int proc;
2557  proc = GSImaptable[numInst].get(state, plane);
2558  if(config.fakeTorus)
2559  proc=proc%CkNumPes();
2560  return(proc);
2561 }
2562 //////////////////////////////////////////////////////////////////////////////
2563 
2564 
2565 //////////////////////////////////////////////////////////////////////////////
2566 //////////////////////////////////////////////////////////////////////////////
2567 //////////////////////////////////////////////////////////////////////////////
2568 void mapOutput()
2569 //////////////////////////////////////////////////////////////////////////////
2570  {//begin routine
2571 //////////////////////////////////////////////////////////////////////////////
2572 
2573 //----------------------------------------------------------------------------
2574  }//end routine
2575 //////////////////////////////////////////////////////////////////////////////
2576 
2577 
2578 //////////////////////////////////////////////////////////////////////////////
2579 //////////////////////////////////////////////////////////////////////////////
2580 //////////////////////////////////////////////////////////////////////////////
2581 /// return the cuboid x,y,z of a subpartition exactly matching that volume
2582 //////////////////////////////////////////////////////////////////////////////
2583 bool findCuboid(int &x, int &y, int &z, int &order, int maxX, int maxY, int maxZ, int maxT, int volume, int vn){
2584 //////////////////////////////////////////////////////////////////////////////
2585  int maxD=maxX;
2586  int minD=maxX;
2587  int minX=maxX;
2588  int minY=maxY;
2589  int minZ=maxZ;
2590  // vn=0;
2591  if(config.forceMappingAxis>-1)
2592  {
2593  order=config.forceMappingAxis;
2594  if (config.forceMappingAxis==0)
2595  minD=minX;
2596  if (config.forceMappingAxis==1)
2597  minD=maxY;
2598  if (config.forceMappingAxis==2)
2599  minD=maxZ;
2600  }
2601  else
2602  {
2603  order=0;
2604  maxD = (maxY>maxD) ? maxY : maxD;
2605  maxD = (maxZ>maxD) ? maxZ : maxD;
2606  if(vn)
2607  { // using Y as the prism axis seems to suck
2608  maxD = (maxY>maxD) ? maxY : maxD;
2609  maxD = (maxZ>maxD) ? maxZ : maxD;
2610  minD = (maxY<minD) ? maxY : minD;
2611  minD = (maxZ<maxD) ? maxZ : minD;
2612 
2613  }
2614  else
2615  {
2616  maxD = (maxY>maxD) ? maxY : maxD;
2617  maxD = (maxZ>maxD) ? maxZ : maxD;
2618  minD = (maxY<minD) ? maxY : minD;
2619  minD = (maxZ<minD) ? maxZ : minD;
2620 
2621  }
2622  }
2623  // CkPrintf("minD %d maxD %d\n",minD, maxD);
2624  if(config.useCuboidMapRS)
2625  {
2626  CkPrintf("Using long prisms for useCuboidMapRS\n");
2627  }
2628 
2629  // We were reducing the volume by maxT and then finding the dimensions of the
2630  // box in terms of the no. of nodes and not processors
2631 
2632  // That worked ok on BG/L, but BG/P maxT of 4 distorts the long axis logic.
2633  // in TXYZ you have a 32x8x16 for 1 Rack or 32x8x32 for 2 Rack
2634  // in XYZT you have 8x8x64 for 1 Rack or 8x8x128 for 2 Rack
2635 
2636  // The latter case makes clear the bizarre and unreal distortion created by
2637  // pretending that maxT is just a multiplier along one dimension when
2638  // considering actual box size. What was merely odd on BG/L is now
2639  // seriously peculiar. Also default TXYZ mappings and default Order 0
2640  // mappings to use the X as long axis for prisms fail to work right because
2641  // we aren't really spanning the X*maxT dimension with these long prisms.
2642 
2643  int redVol = volume / maxT;
2644  double cubert= cbrt((double) redVol);
2645  int cubetrunc= (int) cubert;
2646  x=y=z=cubetrunc;
2647  if(cubetrunc>minD)
2648  cubetrunc=minD;
2649  if(cubetrunc>maxY)
2650  cubetrunc=maxY;
2651  if(cubetrunc>maxZ)
2652  cubetrunc=maxZ;
2653  if(redVol==x*y*z && !config.useCuboidMapRS)
2654  return true;
2655  bool switchSet=false;
2656  CkAssert(redVol>0);
2657  switch (redVol) // for the common values we just pick cuboids we like
2658  {
2659  case 1:
2660  x=1; y=1; z=1; switchSet=true; break;
2661  case 2:
2662  x=2; y=1; z=1; switchSet=true; break;
2663  case 3:
2664  x=3; y=1; z=1; switchSet=true; break;
2665  case 4:
2666  x=2; y=2; z=1; switchSet=true; break;
2667  case 5:
2668  x=5; y=1; z=1; switchSet=true; break;
2669  case 6:
2670  x=3; y=2; z=1; switchSet=true; break;
2671  case 7:
2672  x=7; y=1; z=1; switchSet=true; break;
2673  case 8:
2674  if(config.useCuboidMapRS)
2675  {
2676  if(minD>=2)
2677 
2678  { x=2; y=2; z=2; switchSet=true; break;}
2679 /* no evidence that these other schemes help
2680  if(minD==4)
2681  { x=4; y=2; z=1; switchSet=true; break;}
2682  if(minD>=8)
2683  { x=8; y=1; z=1; switchSet=true; break;}
2684 */
2685  }
2686  case 9:
2687  x=3; y=3; z=1; switchSet=true; break;
2688  case 10:
2689  x=5; y=2; z=1; switchSet=true; break;
2690  case 12:
2691  x=2; y=3; z=2; switchSet=true; break;
2692  case 14:
2693  x=7; y=2; z=1; switchSet=true; break;
2694  case 15:
2695  x=5; y=3; z=1; switchSet=true; break;
2696  case 16:
2697  if(config.useCuboidMapRS)
2698  {
2699  if(minD>=8)
2700  { x=8; y=2; z=1; switchSet=true; break;}
2701  }
2702  x=4; y=2; z=2; switchSet=true; break;
2703  case 18:
2704  x=3; y=3; z=2; switchSet=true; break;
2705  case 20:
2706  x=5; y=2; z=2; switchSet=true; break;
2707  case 21:
2708  x=7; y=3; z=1; switchSet=true; break;
2709  case 24:
2710  x=4; y=3; z=2; switchSet=true; break;
2711  case 25:
2712  x=5; y=5; z=1; switchSet=true; break;
2713  case 27:
2714  x=3; y=3; z=3; switchSet=true; break;
2715  case 28:
2716  x=7; y=2; z=2; switchSet=true; break;
2717  case 30:
2718  x=5; y=2; z=2; switchSet=true; break;
2719  case 32:
2720  if(config.useCuboidMapRS)
2721  {
2722  if(minD==8)
2723  { x=8; y=2; z=2; switchSet=true; break;}
2724  if(minD==16)
2725  { x=16; y=2; z=1; switchSet=true; break;}
2726  if(minD>=32)
2727  { x=32; y=1; z=1; switchSet=true; break;}
2728 
2729  }
2730  x=4; y=2; z=4; switchSet=true; break;
2731  case 35:
2732  x=7; y=5; z=1; switchSet=true; break;
2733  case 36:
2734  x=4; y=3; z=3; switchSet=true; break;
2735  case 40:
2736  x=5; y=4; z=2; switchSet=true; break;
2737  case 42:
2738  x=7; y=3; z=2; switchSet=true; break;
2739  case 43:
2740  x=7; y=3; z=2; switchSet=true; break;
2741  case 45:
2742  x=5; y=3; y=3; switchSet=true; break;
2743  case 48:
2744  x=4; y=3; z=4; switchSet=true; break;
2745  case 50:
2746  x=5; y=5; z=2; switchSet=true; break;
2747  case 54:
2748  x=6; y=3; z=3; switchSet=true; break;
2749  case 56:
2750  x=7; y=4; z=2; switchSet=true; break;
2751  case 60:
2752  x=5; y= 4; z=3; switchSet=true; break;
2753  case 64:
2754  if(config.useCuboidMapRS)
2755  {
2756  if(minD==8)
2757  { x=8; y=4; z=2; switchSet=true; break;}
2758  if(minD==16)
2759  { x=16; y=2; z=2; switchSet=true; break;}
2760  if(minD==32)
2761  { x=32; y=2; z=1; switchSet=true; break;}
2762  if(minD>=64)
2763  { x=64; y=1; z=1; switchSet=true; break;}
2764  }
2765  x=4; y=4; z=4; switchSet=true; break;
2766  case 128:
2767  if(config.useCuboidMapRS)
2768  {
2769  if(minD==8)
2770  { x=8; y=4; z=4; switchSet=true; break;}
2771  if(minD==16)
2772  {x=16; y=4; z=2; switchSet=true; break;}
2773  if(minD==32)
2774  { x=32; y=2; z=2; switchSet=true; break; }
2775  if(minD==64)
2776  { x=64; y=2; z=1; switchSet=true; break; }
2777  if(minD>=128)
2778  { x=128; y=1; z=1; switchSet=true; break; }
2779  }
2780  x=8; y=4; z=4; switchSet=true; break;
2781  case 256:
2782  if(config.useCuboidMapRS)
2783  {
2784  if(minD==8)
2785  { x=8; y=8; z=4; switchSet=true; break;}
2786  if(minD==16)
2787  { x=16; y=4; z=4; switchSet=true; break;}
2788  if(minD==32)
2789  { x=32; y=4; z=2; switchSet=true; break;}
2790  if(minD==64)
2791  { x=64; y=2; z=2; switchSet=true; break;}
2792  if(minD>=128)
2793  { x=64; y=2; z=1; switchSet=true; break;}
2794  }
2795  x=8; y=8; z=4; switchSet=true; break;
2796  case 512:
2797  if(config.useCuboidMapRS)
2798  {
2799  if(minD==8)
2800  { x=8; y=8; z=8; switchSet=true; break;}
2801  if(minD==16)
2802  { x=16; y=4; z=8; switchSet=true; break;}
2803  if(minD==32)
2804  { x=32; y=4; z=4; switchSet=true; break;}
2805  if(minD==64)
2806  { x=64; y=4; z=2; switchSet=true; break;}
2807  if(minD==128)
2808  { x=128; y=2; z=2; switchSet=true; break;}
2809  if(minD>=256)
2810  { x=256; y=2; z=1; switchSet=true; break;}
2811  }
2812  x=8; y=8; z=8; switchSet=true; break;
2813 
2814  default:
2815  break;
2816  }
2817  if(switchSet && config.forceMappingAxis>-1)
2818  {
2819 
2820  switch(config.forceMappingAxis)
2821  {
2822  case 0:
2823  return true; //no change
2824  case 1:
2825  {
2826  order=1; //YXZ
2827  int swap=x;
2828  x=y;
2829  y=swap;
2830  return true;
2831  }
2832  case 2:
2833  {
2834  order=2; //ZXY
2835  int swap=x;
2836  x=z;
2837  z=swap;
2838  return true;
2839  }
2840  }
2841  }
2842  else if (switchSet)
2843  {
2844  // now correct the x,y,z to put long prism axis along the
2845  // smallest torus dimension which will fit.
2846  if(x==maxX)
2847  return true;
2848  if(x==maxY)
2849  { // change to Y
2850  order=1; //YXZ
2851  int swap=x;
2852  x=y;
2853  y=swap;
2854  return true;
2855  }
2856  if(x==maxZ)
2857  { // change to Z
2858  order=2; //ZXY
2859  int swap=x;
2860  x=z;
2861  z=swap;
2862  return true;
2863  }
2864  // if we're here then we don't have a spanning prism
2865  // just pick the smallest which will fit.
2866  if(x<maxX)
2867  return true;
2868  if(x<maxY)
2869  { // change to Y
2870  order=1; //YXZ
2871  int swap=x;
2872  x=y;
2873  y=swap;
2874  return true;
2875  }
2876  if(x<maxZ)
2877  { // change to Z
2878  order=2; //ZXY
2879  int swap=x;
2880  x=z;
2881  z=swap;
2882  return true;
2883  }
2884  }
2885  // its something weird so try a best fit
2886  int start=cubetrunc-1;
2887  if(config.useCuboidMapRS)
2888  x = (redVol>=maxX) ? maxX : cubetrunc;
2889  else
2890  x=cubetrunc;
2891  for(; x<=maxX;x++)
2892  {
2893  for(y=start; y<=maxY;y++)
2894  {
2895  for(z=start; z<=maxZ;z++)
2896  {
2897  if(redVol==x*y*z)
2898  return true;
2899  }
2900  }
2901  }
2902  return false;
2903 
2904 
2905 }
2906 //////////////////////////////////////////////////////////////////////////////
2907 /**@}*/
2908 //////////////////////////////////////////////////////////////////////////////
2909 //////////////////////////////////////////////////////////////////////////////
2910 //////////////////////////////////////////////////////////////////////////////
2911 CkReduction::reducerType sumFastDoubleType;
2912 void registersumFastDouble(void){
2913  sumFastDoubleType=CkReduction::addReducer(sumFastDouble);
2914 }
2915 //////////////////////////////////////////////////////////////////////////////
2916 
2917 //////////////////////////////////////////////////////////////////////////////
2918 //////////////////////////////////////////////////////////////////////////////
2919 //////////////////////////////////////////////////////////////////////////////
2920 /// sum together matrices of doubles
2921 /// possibly faster than sum_double due to minimizing copies
2922 /// and using sameer's fastadd
2923 //////////////////////////////////////////////////////////////////////////////
2924 inline CkReductionMsg *sumFastDouble(int nMsg, CkReductionMsg **msgs){
2925 
2926  int size0=msgs[0]->getSize();
2927  int size=size0/sizeof(double);
2928 
2929  double *inmatrix;
2930  // int progcount=0;
2931 
2932  double *ret=(double *)msgs[0]->getData();
2933 
2934  for(int i=1; i<nMsg;i++)
2935  {
2936  inmatrix=(double *) msgs[i]->getData();
2937  for(int d=0;d<size;d++)
2938  ret[d]+=inmatrix[d];
2939  }
2940  return CkReductionMsg::buildNew(size0,ret);
2941 }
2942 //////////////////////////////////////////////////////////////////////////////
2943 
2944 
2945 //////////////////////////////////////////////////////////////////////////////
2946 //////////////////////////////////////////////////////////////////////////////
2947 //////////////////////////////////////////////////////////////////////////////
2948 void create_Rho_fft_numbers(int nchareR,int nchareRHart,int rhoRsubplanes,
2949  int nplane, int sizeY, int ngridbHart,
2950  int *numRXRho,int *numRYRho,int *numRXEext,int *numRYEext,
2951  int *numSubGx)
2952 //////////////////////////////////////////////////////////////////////////////
2953  {//begin routine
2954 //////////////////////////////////////////////////////////////////////////////
2955 /// Simple case : everyone is the same size
2956 
2957  if(rhoRsubplanes==1){
2958 
2959  for(int i=0;i<nchareR;i++){
2960  numRXRho[i] = sizeY;
2961  numRYRho[i] = nplane;
2962  }//endfor
2963 
2964  for(int i=0;i<nchareRHart;i++){
2965  numRXEext[i] = ngridbHart;
2966  numRYEext[i] = nplane;
2967  }//endfor
2968 
2969  }//endif
2970 
2971 //////////////////////////////////////////////////////////////////////////////
2972 /// Subplane case : different sizes with subplanes label
2973 
2974  if(rhoRsubplanes>1){
2975  int div,rem;
2976 
2977  //---------------------------------------
2978  // how many x FFTs for Rho
2979  div = (sizeY / rhoRsubplanes);
2980  rem = (sizeY % rhoRsubplanes);
2981  for(int j=0;j<rhoRsubplanes;j++){
2982  int mySizeY = (j < rem ? div+1 : div);
2983  for(int i=0;i<nchareR;i++){
2984  int ind = j*nchareR + i;
2985  numRXRho[ind] = mySizeY;
2986  }//endfor
2987  }//endfor
2988 
2989  //---------------------------------------
2990  // how many y FFTs for Rho
2991  for(int j=0;j<rhoRsubplanes;j++){
2992  int myNplane = numSubGx[j];
2993  for(int i=0;i<nchareR;i++){
2994  int ind = j*nchareR + i;
2995  numRYRho[ind] = myNplane;
2996  }//endfor
2997  }//endfor
2998 
2999  //---------------------------------------
3000  // how many x FFTs for HartEExt EES
3001  div = (ngridbHart / rhoRsubplanes);
3002  rem = (ngridbHart % rhoRsubplanes);
3003  for(int j=0;j<rhoRsubplanes;j++){
3004  int myNgridb = (j < rem ? div+1 : div);
3005  for(int i=0;i<nchareRHart;i++){
3006  int ind = j*nchareRHart + i;
3007  numRXEext[ind] = myNgridb;
3008  }//endfor
3009  }//endfor
3010 
3011  //---------------------------------------
3012  // how many y FFTs for HartEExt EES
3013  for(int j=0;j<rhoRsubplanes;j++){
3014  int myNplane = numSubGx[j];
3015  for(int i=0;i<nchareRHart;i++){
3016  int ind = j*nchareRHart + i;
3017  numRYEext[ind] = myNplane;
3018  }//endfor
3019  }//endfor
3020 
3021  }//endif
3022 
3023 //---------------------------------------------------------------------------
3024  }//end routine
3025 //////////////////////////////////////////////////////////////////////////////
3026 
3027 
3029 {
3030 //////////////////////////////////////////////////////////////////////////////
3031 /// Set user trace events for projections optimizations
3032 
3033 
3034  traceRegisterUserEvent("doRealFwFFT", doRealFwFFT_);
3035 
3036  traceRegisterUserEvent("GspaceFwFFT", GspaceFwFFT_);
3037 
3038  traceRegisterUserEvent("fwFFTGtoR0", fwFFTGtoR0_);
3039  traceRegisterUserEvent("fwFFTGtoRnot0", fwFFTGtoRnot0_);
3040  traceRegisterUserEvent("OrthoDGEMM1", OrthoDGEMM1_);
3041  traceRegisterUserEvent("GradCorrGGA", GradCorrGGA_);
3042  traceRegisterUserEvent("WhiteByrdFFTX", WhiteByrdFFTX_);
3043  traceRegisterUserEvent("doRealBwFFT", doRealBwFFT_);
3044  traceRegisterUserEvent("WhiteByrdFFTY", WhiteByrdFFTY_);
3045  traceRegisterUserEvent("WhiteByrdFFTZ", WhiteByrdFFTZ_);
3046  traceRegisterUserEvent("PostByrdfwFFTGtoR", PostByrdfwFFTGtoR_);
3047  traceRegisterUserEvent("RhoRtoGFFT", RhoRtoGFFT_);
3048  traceRegisterUserEvent("BwFFTRtoG", BwFFTRtoG_);
3049  traceRegisterUserEvent("OrthoDGEMM2", OrthoDGEMM2_);
3050  traceRegisterUserEvent("ByrdanddoFwFFTGtoR",ByrdanddoFwFFTGtoR_);
3051  traceRegisterUserEvent("eesHartExcG",eesHartExcG_);
3052  traceRegisterUserEvent("eesEwaldG",eesEwaldG_);
3053  traceRegisterUserEvent("eesAtmForcR",eesAtmForcR_);
3054  traceRegisterUserEvent("eesAtmBspline",eesAtmBspline_);
3055  traceRegisterUserEvent("eesZmatR",eesZmatR_);
3056  traceRegisterUserEvent("eesEnergyAtmForcR",eesEnergyAtmForcR_);
3057  traceRegisterUserEvent("eesProjG",eesProjG_);
3058  traceRegisterUserEvent("doNlFFTGtoR",doNlFFTGtoR_);
3059  traceRegisterUserEvent("doNlFFTRtoG",doNlFFTRtoG_);
3060  traceRegisterUserEvent("eesPsiForcGspace",eesPsiForcGspace_);
3061  traceRegisterUserEvent("enlMatrixCalc",enlMatrixCalc_);
3062  traceRegisterUserEvent("enlAtmForcCalc",enlAtmForcCalc_);
3063  traceRegisterUserEvent("enlForcCalc",enlForcCalc_);
3064  traceRegisterUserEvent("doEextFFTRtoG",doEextFFTRtoG_);
3065  traceRegisterUserEvent("doEextFFTGtoR",doEextFFTGtoR_);
3066 
3067  traceRegisterUserEvent("HartExcVksG",HartExcVksG_);
3068  traceRegisterUserEvent("divRhoVksGspace",divRhoVksGspace_);
3069  traceRegisterUserEvent("GspaceBwFFT", GspaceBwFFT_);
3070  traceRegisterUserEvent("DoFFTContribute", DoFFTContribute_);
3071  traceRegisterUserEvent("IntegrateModForces", IntegrateModForces_);
3072  traceRegisterUserEvent("Scalcmap", Scalcmap_);
3073  traceRegisterUserEvent("AcceptStructFact", AcceptStructFact_);
3074  traceRegisterUserEvent("doEextFFTGxtoRx", doEextFFTGxtoRx_);
3075  traceRegisterUserEvent("doEextFFTRytoGy", doEextFFTRytoGy_);
3076  traceRegisterUserEvent("doRhoFFTRytoGy", doRhoFFTRytoGy_);
3077  traceRegisterUserEvent("doRhoFFTGxtoRx", doRhoFFTGxtoRx_);
3078 
3079  traceRegisterUserEvent("GSProcnum", 10000);
3080  traceRegisterUserEvent("RSProcnum", 20000);
3081  traceRegisterUserEvent("SCProcnum", 30000);
3082  traceRegisterUserEvent("GHartAtmForcCopy",GHartAtmForcCopy_);
3083  traceRegisterUserEvent("GHartAtmForcSend",GHartAtmForcSend_);
3084  Ortho_UE_step2 = traceRegisterUserEvent("Ortho step 2");
3085  Ortho_UE_step3 = traceRegisterUserEvent("Ortho step 3");
3086  Ortho_UE_error = traceRegisterUserEvent("Ortho error");
3087 
3088 }
3089 
3090 void computeMapOffsets()
3091 {
3092  int x, y, z, x1, y1, z1;
3093  // correction to accomodate multiple instances
3094  for(int thisInst=1; thisInst<config.numInstances; thisInst++)
3095  {
3096  if(config.torusMap == 1) {
3097  int dimNX, dimNY, dimNZ;
3098  int longDim = -1, maxD;
3099 
3100  x1 = dimNX = topoMgr->getDimNX();
3101  y1 = dimNY = topoMgr->getDimNY();
3102  z1 = dimNZ = topoMgr->getDimNZ();
3103 
3104  maxD = dimNX;
3105  longDim = 1;
3106 
3107  if (dimNY > maxD) { maxD = dimNY; longDim = 2; }
3108  if (dimNZ > maxD) { maxD = dimNZ; longDim = 3; }
3109 
3110  switch(longDim) {
3111  case 1:
3112  x = thisInst*(maxD/config.numInstances); y = 0; z = 0;
3113  x1 = dimNX / config.numInstances;
3114  break;
3115  case 2:
3116  x = 0; y = thisInst*(maxD/config.numInstances); z = 0;
3117  y1 = dimNY / config.numInstances;
3118  break;
3119  case 3:
3120  x = 0; y = 0; z = thisInst*(maxD/config.numInstances);
3121  z1 = dimNZ / config.numInstances;
3122  break;
3123  }
3124  }
3125  else
3126  {
3127  // just split by numInstances
3128  x=thisInst*config.numPesPerInstance;
3129  y=0;
3130  z=0;
3131  }
3132  mapOffsets[thisInst]=inttriple(x,y,z);
3133  }
3134 }
3135 
3136 
3137 void paircalcstartup(pc::pcConfig *cfgSymmPC, pc::pcConfig *cfgAsymmPC, CPcharmParaInfo *sim, int doublePack)
3138 {
3139 
3140 
3141 
3142  // Stuff it with the actual configurations
3143  cfgSymmPC->isDynamics = (sim->cp_min_opt==1)? false: true;
3144  cfgSymmPC->useComplexMath = false;
3145 
3146  cfgSymmPC->numPlanes = config.nchareG;
3147  cfgSymmPC->numStates = nstates;
3148  cfgSymmPC->grainSize = config.sGrainSize;
3149  cfgSymmPC->orthoGrainSize = config.orthoGrainSize;
3150 
3151  cfgSymmPC->conserveMemory = config.conserveMemory;
3152  cfgSymmPC->isLBon = config.lbpaircalc;
3153 
3154  cfgSymmPC->areBWTilesCollected= config.PCCollectTiles;
3155  cfgSymmPC->isBWstreaming = config.PCstreamBWout;
3156  cfgSymmPC->isBWbarriered = config.useBWBarrier;
3157  cfgSymmPC->shouldDelayBWsend = config.PCdelayBWSend;
3158  cfgSymmPC->isInputMulticast = !config.usePairDirectSend;
3159  cfgSymmPC->isOutputReduced = !config.gSpaceSum;
3160  cfgSymmPC->inputSpanningTreeFactor = config.PCSpanFactor;
3161 
3162  cfgSymmPC->gemmSplitFWk = config.gemmSplitFWk;
3163  cfgSymmPC->gemmSplitFWm = config.gemmSplitFWm;
3164  cfgSymmPC->gemmSplitBW = config.gemmSplitBW;
3165 
3166 
3167 
3168  // Configurations specific to the symmetric PC instance
3169  cfgSymmPC->isSymmetric = true;
3170  cfgSymmPC->arePhantomsOn = config.phantomSym;
3171  cfgSymmPC->numChunks = config.numChunksSym;
3172  cfgSymmPC->isDoublePackOn = doublePack;
3173  cfgSymmPC->inputMsgPriority = config.psipriority;
3174  cfgSymmPC->resultMsgPriority = config.gsfftpriority;
3175 
3176  // Copy baseline parameters from symm, then override for asymm case
3177  *cfgAsymmPC=*cfgSymmPC;
3178 
3179  // Configurations specific to the asymmetric PC instance
3180  cfgAsymmPC->isSymmetric = false;
3181  cfgAsymmPC->arePhantomsOn = false;
3182  cfgAsymmPC->numChunks = config.numChunksAsym;
3183  cfgAsymmPC->isDoublePackOn = 0;
3184  cfgAsymmPC->inputMsgPriority = config.lambdapriority;
3185  cfgAsymmPC->resultMsgPriority = config.lambdapriority+2;
3186 
3187  // Configure the GSpace entry methods that the PCs will callback
3188  if(cfgSymmPC->isOutputReduced)
3189  {
3190  cfgSymmPC->gSpaceEP = CkIndex_CP_State_GSpacePlane::acceptNewPsi ((CkReductionMsg*)NULL);
3191  cfgSymmPC->PsiVEP = CkIndex_CP_State_GSpacePlane::acceptNewPsiV((CkReductionMsg*)NULL);
3192  }
3193  else
3194  {
3195  cfgSymmPC->gSpaceEP = CkIndex_CP_State_GSpacePlane::acceptNewPsi ((partialResultMsg*)NULL);
3196  cfgSymmPC->PsiVEP = CkIndex_CP_State_GSpacePlane::acceptNewPsiV((partialResultMsg*)NULL);
3197  }
3198 
3199  if(cfgAsymmPC->isOutputReduced)
3200  {
3201  cfgAsymmPC->gSpaceEP = CkIndex_CP_State_GSpacePlane::acceptLambda ((CkReductionMsg*)NULL);
3202  cfgAsymmPC->PsiVEP = 0;
3203  }
3204  else
3205  {
3206  cfgAsymmPC->gSpaceEP = CkIndex_CP_State_GSpacePlane::acceptLambda ((partialResultMsg*)NULL);
3207  cfgAsymmPC->PsiVEP = 0;
3208  }
3209 
3210 #ifdef _CP_SUBSTEP_TIMING_
3211  //symmetric AKA Psi
3212  cfgSymmPC->forwardTimerID = keeperRegister("Sym Forward");
3213  cfgSymmPC->backwardTimerID = keeperRegister("Sym Backward");
3214  cfgSymmPC->beginTimerCB = CkCallback(CkIndex_TimeKeeper::collectStart(NULL),0,TimeKeeperProxy);
3215  cfgSymmPC->endTimerCB = CkCallback(CkIndex_TimeKeeper::collectEnd(NULL),0,TimeKeeperProxy);
3216  //asymmetric AKA Lambda AKA Gamma
3217  cfgAsymmPC->forwardTimerID = keeperRegister("Asym Forward");
3218  cfgAsymmPC->backwardTimerID = keeperRegister("Asym Backward");
3219  cfgAsymmPC->beginTimerCB = CkCallback(CkIndex_TimeKeeper::collectStart(NULL),0,TimeKeeperProxy);
3220  cfgAsymmPC->endTimerCB = CkCallback(CkIndex_TimeKeeper::collectEnd(NULL),0,TimeKeeperProxy);
3221 #endif
3222 }
3223 
3224 void orthostartup( cp::ortho::orthoConfig *orthoCfg, pc::pcConfig *cfgSymmPC, pc::pcConfig *cfgAsymmPC, CPcharmParaInfo *sim, PeListFactory *peList4PCmapping)
3225 {
3226  // Blame Ram for ugly crime against readability. More
3227  // redundant config objects and builders doesn't help
3228  // global clarity at all.
3229  // EJB: Sequestered in this function call to sweep it under the rug
3230 
3231  orthoCfg->isDynamics = (sim->cp_min_opt==1)? false: true;
3232  orthoCfg->isGenWave = (sim->gen_wave==1)? true: false;
3233  orthoCfg->numStates = config.nstates;
3234  orthoCfg->grainSize = config.orthoGrainSize;
3235  orthoCfg->instanceIndex = thisInstance.getPO();
3236  orthoCfg->maxTolerance = sim->tol_norb;
3237  orthoCfg->uponToleranceFailure = CkCallback(CkIndex_GSpaceDriver::needUpdatedPsiV(), UgSpaceDriverProxy[thisInstance.getPO()]);
3238 
3239  // Fill in the paircalc configs that are instance dependent
3240  cfgSymmPC->gSpaceAID = UgSpacePlaneProxy[thisInstance.getPO()].ckGetArrayID();
3241  cfgAsymmPC->gSpaceAID = UgSpacePlaneProxy[thisInstance.getPO()].ckGetArrayID();
3242  cfgSymmPC->instanceIndex = thisInstance.getPO();
3243  cfgAsymmPC->instanceIndex = thisInstance.getPO();
3244  // Init the post-init callbacks that the paircalcs will trigger (after ortho<-->PC comm setup)
3245  cfgSymmPC->uponSetupCompletion = CkCallback(CkIndex_InstanceController::doneInit(NULL),CkArrayIndex1D(thisInstance.getPO()),instControllerProxy.ckGetArrayID());
3246  cfgAsymmPC->uponSetupCompletion = CkCallback(CkIndex_InstanceController::doneInit(NULL),CkArrayIndex1D(thisInstance.getPO()),instControllerProxy.ckGetArrayID());
3247 
3248  // Identify who is the owner for this bubble
3249  CkCallback pcHandleCB(CkIndex_CP_State_GSpacePlane::acceptPairCalcAIDs(0), UgSpacePlaneProxy[thisInstance.getPO()]);
3250 
3251  // Fill out a structure with all configs needed for PC mapping
3252  cp::startup::PCMapConfig pcMapCfg(boxSize,
3253  *peList4PCmapping,
3254  &GSImaptable[thisInstance.getPO()],
3255  (config.torusMap == 1),
3256  (config.fakeTorus == 1),
3257  mapOffsets[numInst]);
3258 
3259  // Delegate the actual construction/initialization to a creation manager
3260  cp::startup::PCCreationManager pcCreator(*cfgSymmPC, *cfgAsymmPC, *orthoCfg);
3261  pcCreator.build(pcHandleCB, pcMapCfg);
3262 }
3263 
3264 //////////////////////////////////////////////////////////////////////////////
3265 #include "CPcharmParaInfo.def.h"
3266 #include "timeKeeper.def.h"
3267 #include "startupMessages.def.h"
3268 #include "cp_state_ctrl/CP_State_GSpacePlane.h"
3269 #include "cpaimd.def.h"
3270 
3271 //////////////////////////////////////////////////////////////////////////////
bool isSymmetric
Is this a symmetric or asymmetric paircalc instance.
Definition: pcConfig.h:30
void make_rho_runs(CPcharmParaInfo *sim)
== rho space run descriptors for spherical cutoff fft support
Definition: util.C:34
int grainSize
The block size for parallelization.
Definition: orthoConfig.h:31
~main()
Cleanup stuff in the hopes of getting clean valgrind.
Definition: cpaimd.C:746
holds the UberIndex and the offset for proxies
Definition: Uber.h:68
const char OpenAtomRevision[]
A global constant for use in the code.
Definition: cpaimd.C:228
int gSpaceEP
The entry point to which this instance should send results to.
Definition: pcConfig.h:63
int inputMsgPriority
The priority (set by GSpace) of the input messages.
Definition: pcConfig.h:87
int numPlanes
The total number of planes in the system.
Definition: pcConfig.h:39
Accepts reduction of forces from AtomsCache, integrates forces to produce new coordinates, distributes new coordinates to AtomsCache.
void paircalcstartup(pc::pcConfig *cfgSymmPC, pc::pcConfig *cfgAsymmPC, CPcharmParaInfo *sim, int doublePack)
stuff to be include before the decl or else
Definition: cpaimd.C:3137
Definition: PeList.h:33
bool isDoublePackOn
Is double-packing on?
Definition: pcConfig.h:79
void setTraceUserEvents()
Definition: cpaimd.C:3028
void init_commlib_strategies(int numRhoG, int numReal, int numRhoRhart, UberCollection thisInstance)
Initialize Commlib communication strategies.
Definition: cpaimd.C:762
int numStates
The number of states in the simulation (the dimension of the input square matrix) ...
Definition: orthoConfig.h:29
bool isBWbarriered
Should we impose a hard barrier in the BW path to sync all PC chares?
Definition: pcConfig.h:101
CkArrayID gSpaceAID
The array ID of the GSpace chare array this instance talks to.
Definition: pcConfig.h:61
collection point for global scope map, proxy, and Uber read-only variables
double maxTolerance
The tolerance threshold for the S->T iterations in Ortho at which to trigger a PsiV update...
Definition: orthoConfig.h:36
void init_eesNL_chares(int natm_nl, int natm_nl_grp_max, int doublePack, PeList *exclusion, CPcharmParaInfo *sim, UberCollection thisInstance)
/addtogroup Particle
Definition: cpaimd.C:1753
bool isInputMulticast
Will the input data be multicast to PC sections or sent directly (p2p)
Definition: pcConfig.h:83
Author: Abhinav S Bhatele Date Created: December 28th, 2006.
int inputSpanningTreeFactor
The branching factor of the spanning trees that carry the input msgs.
Definition: pcConfig.h:85
A container for assorted mapping inputs to pass around easily.
Definition: pcMapConfig.h:15
Configuration settings for the ortho world.
Definition: orthoConfig.h:17
CkCallback uponSetupCompletion
Callback to trigger at the end of a paircalc array's init.
Definition: pcConfig.h:59
void init_PIBeads(CPcharmParaInfo *sim, UberCollection thisInstance)
Create the PIBeadAtoms array.
Definition: cpaimd.C:1203
CkGroupID mCastGrpId
Multicast manager group that handles many mcast/redns in the code. Grep for info. ...
Definition: cpaimd.C:216
int gemmSplitFWk
{ BGL's painful NIC forces us to split long computations.
Definition: pcConfig.h:121
A place to collect substep times.
int numStates
The total number of states in the system.
Definition: pcConfig.h:41
Add type declarations for simulationConstants class (readonly vars) and once class for each type of o...
void init_state_chares(int natm_nl, int natm_nl_grp_max, int numSfGrps, int doublePack, CPcharmParaInfo *sim, UberCollection thisInstance)
Create the array elements for the GSpace, Particle and Real Space planes ////////////////////////////...
Definition: cpaimd.C:1235
bool isDynamics
Is this a minimization or dynamics run.
Definition: orthoConfig.h:21
int conserveMemory
The mem footprint vs performance setting for the paircalcs (tribool)
Definition: pcConfig.h:75
Hacky solution to passing a PeList to GSpace(0,0) for use in paircalc mapping without actually having...
Definition: PeList.h:423
bool fakeTorus
readonly defined in cpaimd.C
Definition: cpaimd.C:172
#define IntegrateModForces_
200-300 reserved for paircalculator
Definition: cpaimd.h:676
int grainSize
The grain size along the states dimensions (plural) (number of states per PC chare) ...
Definition: pcConfig.h:45
bool isDynamics
Is this a minimization or dynamics run.
Definition: pcConfig.h:28
int PsiVEP
The entry point to which this instance should send PsiV tolerance update results to.
Definition: pcConfig.h:65
int instanceIndex
The proxyOffset value of thisInstance of OpenAtom computations.
Definition: pcConfig.h:57
bool isOutputReduced
Should the results from each PC chare be reduced or delivered individually to GSpace?
Definition: pcConfig.h:91
int numChunks
The number of chunks (4th dimension of decomposition)
Definition: pcConfig.h:43
bool useComplexMath
Should the actual matrix multiplies be handled as real math or complex math.
Definition: pcConfig.h:34
Definition: Atoms.h:129
Some basic data structures and the array map classes are defined here.
Definition: Atoms.h:20
Definition: Uber.h:36
void build(CkCallback cb, const PCMapConfig &mapCfg)
CkCallback uponToleranceFailure
Callback to notify bubble owner (GSpace) that a tolerance update is needed.
Definition: orthoConfig.h:38
bool isLBon
Should the paircalcs worry about load balancing.
Definition: pcConfig.h:77
bool shouldDelayBWsend
Should we tweak msg priority to delay msgs to GSpace carrying the results?
Definition: pcConfig.h:103
Dumb structure that holds all the configuration inputs required for paircalc instantiation, functioning and interaction.
Definition: pcConfig.h:23
bool isBWstreaming
Should this instance stream the result fragments in the BW path as they become ready?
Definition: pcConfig.h:99
int orthoGrainSize
The grain size along the states dimensions for Ortho chares.
Definition: pcConfig.h:52
bool findCuboid(int &x, int &y, int &z, int &order, int maxX, int maxY, int maxZ, int maxT, int volume, int vn)
return the cuboid x,y,z of a subpartition exactly matching that volume
Definition: cpaimd.C:2583
bool arePhantomsOn
If this is a symmetric instance, should it use phantom chares to balance the BW path.
Definition: pcConfig.h:32
double Timer
readonly globals
Definition: cpaimd.C:200
void create_line_decomp_descriptor(CPcharmParaInfo *sim)
/////////////////////////////////////////////////////////////////////////cc
Definition: util.C:1651
#define OPENATOM_REVISION
The build system should define this macro to be the commit identifier.
Definition: cpaimd.C:222
CkReductionMsg * sumFastDouble(int nMsg, CkReductionMsg **msgs)
sum together matrices of doubles possibly faster than sum_double due to minimizing copies and using s...
Definition: cpaimd.C:2924
Manages the creation of a complete paircalc bubble that includes two paircalc instances (symmetric an...
void control_physics_to_driver(UberCollection thisInstance)
Get the atoms and the parainfo //////////////////////////////////////////////////////////////////////...
Definition: cpaimd.C:2373
bool isGenWave
If, this is a minimization run, is it for generating the system wave functions?
Definition: orthoConfig.h:23
int resultMsgPriority
If shouldDelayBWsend, what priority should this instance use for the result msgs. ...
Definition: pcConfig.h:93
int init_rho_chares(CPcharmParaInfo *sim, UberCollection thisInstance)
Creating arrays CP_Rho_GSpacePlane, CP_Rho_GSpacePlaneHelper and CP_Rho_RealSpacePlane.
Definition: cpaimd.C:1902
Author: Eric J Bohm Date Created: May 31, 2006.
bool areBWTilesCollected
Should this instance collect result fragments and send them out together in the BW path...
Definition: pcConfig.h:97
void create_Rho_fft_numbers(int nchareR, int nchareRHart, int rhoRsubplanes, int nplane, int sizeY, int ngridbHart, int *numRXRho, int *numRYRho, int *numRXEext, int *numRYEext, int *numSubGx)
Definition: cpaimd.C:2948