----------------------- StarSlice perf. tests ------------------- old and new meshes are named ".noboite" (tetmesh format) in ~olawlor/csar/meshing/starslice/transfer/ so you've got to do ln -s ~olawlor/csar/meshing/starslice/transfer/*.noboite . Transfer from old to new, with fabricated solution data Read input file old.noboite: 18232 nodes, 86906 tets Read input file new.noboite: 23274 nodes, 117787 tets -- Serial tests on thrift4 (AthlonXP2000, 1GB ram) -- (-memory os is *much* better than our builtin allocator-- the builtin version doesn't seem to release metis' huge buffers when they get freed (sbrk issue?)) Everything is built with -O -g +p1 +vp1 Finding bounding boxes took 0.154904 s (+0.000193 s imbalance) Colliding bounding boxes took 13.764658 s (+0.000318 s imbalance) Extracting collision list took 0.434714 s (+0.000228 s imbalance) Rank 0: 5597657 collisions, Finding communication size took 0.120343 s (+0.000155 s imbalance) Creating outgoing messages took 0.116760 s (+0.000185 s imbalance) Isend took 0.000211 s (+0.000060 s imbalance) Recv took 0.000079 s (+0.000053 s imbalance) Wait took 0.000077 s (+0.000051 s imbalance) Transferring solution took 48.980991 s (+0.000217 s imbalance) (lots of warnings about little boundary elements that are getting clipped off) Normalizing transfer took 0.057111 s (+0.000184 s imbalance) [64.517 s] Transferred. +p1 +vp2 (peak memory is close to 500MB during collide (!)) Similar overall, collide is slightly slower (16s now). Send/recv lists are pretty dang big-- 5.5m collisions for only 100K elements.... [78.622 s] Transferred, counting 11s in metis. Varying numbers of processes on single-PE thrift4: Input/partition: log_thrift4_1:[0.826 s] Input files read log_thrift4_2:[11.004 s] Input files read log_thrift4_4:[15.652 s] Input files read log_thrift4_8:[11.917 s] Input files read log_thrift4_16:[12.692 s] Input files read -> This is surpisingly slow. Metis isn't such a problem, but node adjacency spits out too many links. Changing the adjacency to face-based both speeds up the solution and cuts the memory dramatically (to like 3s!). Collide: log_thrift4_1:Colliding bounding boxes took 13.621231 s (+0.000344 s imbalance) log_thrift4_2:Colliding bounding boxes took 21.385291 s (+0.064712 s imbalance) log_thrift4_4:Colliding bounding boxes took 22.907772 s (+1.533167 s imbalance) log_thrift4_8:Colliding bounding boxes took 20.893643 s (+3.414591 s imba log_thrift4_16:Colliding bounding boxes took 17.842731 s (+12.608363 s imbalance) Total time: log_thrift4_1:[63.437 s] Transferred. log_thrift4_2:[83.291 s] Transferred. log_thrift4_4:[91.333 s] Transferred. log_thrift4_8:[87.812 s] Transferred. log_thrift4_16:[96.161 s] Transferred. -> This is excellent overall scaling. Be interesting to see how the real parallel performance is. ------------------ Magic Intersection ------------------- To avoid numerical difficulties in my intersection routines, switching to Magic Software's intersection routines, which are volume-based and hence less likely to get confused by roundoff. They're initially almost exactly the same speed under -O: Transferring solution took 47.655317 s (+0.000120 s imbalance) A -pg run shows (with a *4x* slowdown from -pg...): % cumulative self self total time seconds seconds calls ms/call ms/call name 15.91 14.54 14.54 42604238 0.00 0.00 SplitAndDecompose(Mgc::Tetrahedron, Mgc::Plane const &, vector > &) 11.62 25.16 10.62 793922392 0.00 0.00 Mgc::Vector3::Vector3(Mgc::Vector3 const &) 6.65 31.24 6.08 40391633 0.00 0.00 vector >::_M_insert_a ux(Mgc::Tetrahedron *, Mgc::Tetrahedron const &) 6.22 36.93 5.69 80783266 0.00 0.00 Mgc::Tetrahedron * __uninitialized_copy_aux(Mgc::Tetrahedron *, Mgc::Tetrahedron *, Mgc::Tetrahedron *, __false_type) 6.15 42.55 5.62 5597657 0.00 0.01 Mgc::FindIntersection(Mgc::Tetrahedron const &, Mgc::Tetrahedron co nst &, vector > &) 5.67 47.73 5.18 376371632 0.00 0.00 Mgc::Vector3::operator=(Mgc::Vector3 const &) 5.02 52.32 4.59 5597657 0.00 0.01 getSharedVolumeMgc(TetMesh const &, TetMesh const &, int const *, i nt const *) 4.57 56.50 4.18 198405237 0.00 0.00 Mgc::Vector3::Dot(Mgc::Vector3 const &) const 4.47 60.59 4.09 258061106 0.00 0.00 Mgc::Vector3::Vector3(double, double, double) 3.85 64.11 3.52 CollideOctant::divide(int) 3.22 67.05 2.94 simpleFindCollisions(CollideOctant const &, CollisionList &) 3.11 69.89 2.84 108600622 0.00 0.00 Mgc::operator*(double, Mgc::Vector3 const &) 2.78 72.43 2.54 5833145 0.00 0.00 Mgc::Tetrahedron * __uninitialized_copy_aux(Mgc::Tetrahedron const *, Mgc::Tetrahedron const *, Mgc::Tetrahedron *, __false_type) 2.63 74.83 2.40 22390628 0.00 0.00 vector >::operator=(v Original: Transferring solution took 47.655317 s (+0.000120 s imbalance) inline Vector3 constructor: Transferring solution took 35.354978 s (+0.000114 s imbalance) early-exit main intersection loop: Transferring solution took 34.663323 s (+0.000121 s imbalance) parameter-passing early-exit optimization on tet parameter: Transferring solution took 34.083170 s (+0.000121 s imbalance) vector-copy optimization: Transferring solution took 27.907812 s (+0.000120 s imbalance) fewer copies in caller code: Transferring solution took 26.393417 s (+0.000120 s imbalance) Vector3::Dot inlining: (I'm sure glad CkVector is almost all inline!) Transferring solution took 22.342098 s (+0.000116 s imbalance) Early-exit test for nonintersecting tets (*halves* the SplitAndDecompose calls!) Transferring solution took 19.053882 s (+0.000121 s imbalance) Final -pg run: % cumulative self self total time seconds seconds calls ms/call ms/call name 22.82 7.60 7.60 25265028 0.00 0.00 SplitAndDecompose(Mgc::Tetrahedron const &, Mgc::Plane const &, vec tor > &) 11.08 11.29 3.69 CollideOctant::divide(int) 8.80 14.22 2.93 5597657 0.00 0.00 getSharedVolumeMgc(TetMesh const &, TetMesh const &, int const *, i nt const *) 7.90 16.85 2.63 simpleFindCollisions(CollideOctant const &, CollisionList &) 5.68 18.74 1.89 11195314 0.00 0.00 Mgc::Tetrahedron::GetPlanes(Mgc::Plane *) const 4.83 20.35 1.61 14480365 0.00 0.00 vector >::_M_insert_a ux(Mgc::Tetrahedron *, Mgc::Tetrahedron const &) 4.77 21.94 1.59 28960730 0.00 0.00 Mgc::Tetrahedron * __uninitialized_copy_aux(Mgc::Tetrahedron *, Mgc::Tetrahedron *, Mgc::Tetrahedron *, __false_type) 4.41 23.41 1.47 5597657 0.00 0.00 Mgc::FindIntersection(Mgc::Tetrahedron const &, Mgc::Tetrahedron co nst &, vector > &) 3.81 24.68 1.27 8419254 0.00 0.00 allOutside(Mgc::Plane const *, int, Mgc::Tetrahedron const &)