r2 - 25 Jun 2008 - 20:49:28 - EricBohmYou are here: CharmWiki >  Main Web > CenterForPetascaleComputing > WorkShopJun24Notes
 

-- EricBohm - 24 Jun 2008

Thomas Dunning:

  • Center is critical to Illinois strategy to become the academic center for extreme scaling computing.
  • Allocations on Abe will be available
    • 50% of Abe is for IACAT, apply for time through Kale and D Johnson

General:

David Padua: Transforming Programs Automatically

  • Compilers
    • Vikram A will be contributing a multicore oriented compiler.
    • ETA ~ 1 year
  • Generators
    • schemes for specifying algorithms and then having the code expressed for target machines or tasks
  • Libraries
    • Short term work can begin on hand/self tuned libraries oriented towards IACAT-CPC needs.
    • Auto tuning BLAS routines. (A Duchateau)
      • strategy searches unrolling space and loop parameters around innermost loop.
      • works by annotation, good for use on critical path computation

Ralph Johnson: Refactoring

  • Original design no longer sufficient for current needs.
  • Flossing vs Root Canal. Incremental refactoring vs major refactoring
    • Have automated test suite
    • make small change (15 minutes to 1-2 hours)
    • test change to verify
    • break large changes into smaller changes
      • when adding a feature, do the refactoring first, won't need new tests.
    • Optimizations
      • qualitatively different in that they add no feature and usually make the code harder to understand
      • these manual changes are actually a kind of program transformation
      • a refactoring aware version control system can propagate changes forward down branches
  • Ask Ralph to help with refactoring issues, improving documentation, and adapting to new technologies
  • Automated/interactive refactoring tools (Photran) not yet for general consumption, ~ 1 year

Sanjay Kale: Charm++ and its use in scalable CSE applications

  • overdecomposition
    • express as much parallelism as there is available, more than the number of cores
  • adaptive runtime system
  • dynamic load balancing
  • performance tools, scalable with automated data mining to highlight performance bottlenecks
  • new "languages"
  • early performance development via BigSim
  • PMPI layer available for using projections tool with MPI codes (without AMPI) idooley@uiuc.edu

David Ceperly: QMCPack

  • QMC stochastic sampling to solve many body Schrodinger
  • Monte Carlo methods to use random walk approaches
  • Branches occur which split walkers which creates some imbalance
  • Walkers are weakly coupled, not communication bound
  • Computing from the determinant e.g. 100x100 done ~ 10^12 times
    • Lapack sherman morrison single column update for matrix inverse to get the determinant
    • can be done using 3D spline interpolation table requiring a few hundred operations
    • has memory constraint due to table size
  • computational bias is 1/number of walkers.
    • you need enough to approach normal noise
    • too many consumes more resource with little benefit
  • will need another level of decomposition, parallelization within walkers
  • cpu time per walker per DMC strp grows N^3
  • will need fault tolerance
  • need SMP awareness
    • memory issues with size of one-body orbitals
    • finer level parallelism, perhaps 1 walker per node
  • Padua suggests a code walkthrough
    • D Johnson recommends profiling first to identify code segments for a walkthrough

Duane Johnson: KKR (hybrid real and k-space) order-N DFT code

  • overall characteristics
    • green's function, but without eigenvalue spectra etc
    • screened structure constants (sparsity)
    • sparse matrix technique ( memory and size)
    • Iterative inversion technique (size)
    • hybrid k-space/r-space (accuracy)
    • r-space only (speed)
    • Coherent potential approx (disorder)
  • complex non-hermitian matrices
  • G= G_0 (1- t G_0)^-1 (repeated until convergence) is the core of the computational time
    • evaluated at each energy
    • G_0 structure
    • t is
  • create reference state system using square well
    • solve for true system relative to reference system G=G_r[I - (t -T_r)G_r]-1
    • G_r is sparse <= 1% nonzero for N=2000. this is the whole point of doing it this way.
    • sadly, off diagonals same magnitude as on diagonal, which makes banded approaches unsuitable
  • parallelize by atom vectors, or groups of atom vectors
  • pre-conditioners have so far not yielded good results
  • need iterative schemes for non-hermitian case with good scaling properties

Paul Ricker: FLASH

  • AMR
  • Type 1a supernovae, novae, x-ray bursts
  • grid determined by power spectrum initial fluctuation
  • physics being modeled
    • hydrodynamics, euler equations, magnetohydrodynamics, multifuild advection
    • Reactions: coupled ODE
    • Gravity: Poisson equation (heavy computational component)
    • Particles: Vlasov-Boltzmann equation
  • Modular framework, managed by python script to mix and match for specified target
  • multigrid poisson solving has complex storage issues with ghost/guard boundary exchange
  • use space filling curve for load balancing in Paramesh
    • worth discussing a non-SFC solution (oct-tree?)
  • multigrid poisson solver has discontinuity at fine grid boundaries
    • oct-tree intpolate on all boundary points, residual on composite sources single layer potential
  • weak scaling to 128k BG/L, flat for Eulerian, less optimal for lagrangian
  • good strong scaling on itanium,
  • strong scaling on Jaguar good to about 8x minimum number to fit
  • Load balancing
    • multigrid poisson solver
    • movement of particles among processors
    • addition of time refinement
    • coping with hybrid architectures
  • results out
    • efficient I/O for raw checkpoint data (one run can produce terabytes of data)
    • fault tolerance (error recovery)
    • on-the fly analysis to reduce I/O requirements
  • migrate to AMPI
  • Tune multigrid
    • multiblock FFT solver
  • hybrid parallelization for multi-core
    • re-vectorization (sort of like it was way back in the T1 days)

Jim Phillips: Towards Petascale Biomolecular Simulations with NAMD

  • All atom MD long time scales (10 ns minumum, more is better), usually biological systems in liquid
  • 20,000 users
  • 10-20 MB data for 100,000 atoms
  • Full electrostatics, multiple timestepping for long range electrostatics
  • Hybrid spatial decomposition, compute objects for pairwise patch force computation
    • over decomposed
  • Uses dynamic load balancing based on instrumentation of computation time per object (Charm++)
  • Jim notes that automatic refactoring/transformation tools could be useful for MPI codes
    • e.g. a get based implementation could be swapped for a put based implementation if the underlying fabric has faster puts than gets
  • Challenges derive from determining the origin of performance problems and locating the appropriate workaround
    • this should be done automatically for the end user

Tools for OpenMP? optimization should be studied. (Acumem, plus some direct help from Intel are worth pursuing).

Commonalites:

  • What differential is being solved?
  • What kernel contains the computation?
  • Which of these characterizes performance?
    • Flow Equation
    • Poisson Equation,
    • FFTW
    • Linear Algebra routines
  • How do we identify problems in the code
  • how do we identify the real cause of the problem?
  • A tool to figure out which cores to avoid on a target machine with daemon interference would be helpful

Key needs:

  • D Johnson will consult with the application teams to identify their specific key needs wrt libraries, compilers, etc
  • Kale will consult with the CS teams to identify specific techniques for IACAT-CPC use
Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r2 < r1 | More topic actions

tip TWiki Tip of the Day
InterwikiPlugin links
The InterwikiPlugin automatically links ExternalSite:Page text to external sites based on aliases defined ... Read on Read more

 
Powered by CharmWiki
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding CharmWiki? Send feedback