Topic: OpenMP parallelization
We should refactor some assembly methods at the engineering model level. Consider, for example, the implementation of EngngModel::assembleVectorFromBC (simplified):
for ( int i = 1; i <= nbc; ++i ) {
    GeneralBoundaryCondition *bc = domain->giveBc(i);
    if ( ActiveBoundaryCondition *abc = dynamic_cast< ActiveBoundaryCondition * >(bc) ) {
        va.assembleFromActiveBC(answer, *abc, tStep, mode, s, eNorms);
    } else if ( BodyLoad *bodyLoad = dynamic_cast< BodyLoad * >(bc) ) { // Body load:
        // compute contribution of bodyLoad
        answer.assemble(...);
    } else if ( SurfaceLoad *sLoad = dynamic_cast< SurfaceLoad * >(bc) ) { // Surface load:
        // compute contribution of sLoad
        answer.assemble(...);
    }
}
The problem with this code is the following: if we mark the loop over the BCs as parallel (using #pragma omp parallel for), we have branching inside, with independent sections in which the actual assembly into the destination is performed (the answer.assemble calls). Even if we mark these sections as critical, we can get a wrong result, because individual threads processing different BCs can simultaneously enter different (and mutually independent) critical sections and assemble into the same vector concurrently. (Strictly speaking, this applies to named critical sections; unnamed omp critical sections all share a single global lock, but serializing every assembly behind one global lock is exactly what we would like to avoid.)
I see two different solutions:
1) make the FloatArray assemble method thread safe (my preferred solution)
2) split the loop into independent loops over the particular BC types (surface, edge, etc.) and parallelize each of these loops
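To illustrate option 1, here is a minimal sketch of what a thread-safe assemble could look like. This is a hypothetical, simplified stand-in for the real FloatArray, not the actual OOFEM class: the storage, the 1-based loc convention, and the use of a per-object std::mutex (instead of an OpenMP construct, so the sketch compiles without OpenMP) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for FloatArray with a thread-safe
// assemble(); names and signatures are illustrative only.
class FloatArray {
    std::vector<double> values;
    std::mutex guard; // serializes all scatter-adds into *this* vector
public:
    explicit FloatArray(std::size_t n) : values(n, 0.0) {}

    // Scatter-add 'local' into global positions given by 'loc'
    // (1-based; 0 means "no equation" and is skipped). Because the
    // lock is owned by the vector itself, concurrent calls coming from
    // different code paths (body load, surface load, ...) still
    // exclude each other -- unlike separate named critical sections.
    void assemble(const std::vector<double> &local, const std::vector<int> &loc) {
        std::lock_guard<std::mutex> hold(guard);
        for (std::size_t i = 0; i < local.size(); ++i) {
            if (loc[i] > 0) values[loc[i] - 1] += local[i];
        }
    }

    double at(int i) const { return values[i - 1]; } // 1-based access
};
```

The key design point is that the synchronization lives inside the destination object, so the assembly loop needs no critical sections at all, regardless of which branch produced the contribution.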
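And a sketch of option 2, again under assumed simplified types (BodyLoad, SurfaceLoad, and computeContribution are illustrative placeholders, and the vector assembly is reduced to a scalar sum to keep the example short): the mixed loop is replaced by one homogeneous loop per BC class, so each loop can be parallelized with a single unnamed critical section or, as in the comments below, an OpenMP reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative placeholder BC types, not the actual OOFEM classes.
struct BodyLoad    { double computeContribution() const { return 1.0; } };
struct SurfaceLoad { double computeContribution() const { return 2.0; } };

// One homogeneous loop per BC type; each loop body assembles in only
// one way, so a single synchronization policy per loop suffices.
double assembleSplit(const std::vector<BodyLoad> &body,
                     const std::vector<SurfaceLoad> &surf) {
    double answer = 0.0;

    // #pragma omp parallel for reduction(+:answer)
    for (std::size_t i = 0; i < body.size(); ++i)
        answer += body[i].computeContribution();

    // #pragma omp parallel for reduction(+:answer)
    for (std::size_t i = 0; i < surf.size(); ++i)
        answer += surf[i].computeContribution();

    return answer;
}
```

The price of this option is that the BCs must first be pre-sorted by type into separate containers, which touches more of the surrounding code than option 1.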
Actually, we can apply the same principle to sparse matrices: they can implement a much smarter locking policy than marking the whole assembly as a critical section.
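For instance, a sparse matrix could lock per row, so two threads collide only when they touch the same row instead of serializing the whole assembly. Below is a minimal sketch of such a policy; the map-per-row storage, the interface, and the class name are all illustrative assumptions, not the OOFEM SparseMtrx API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

// Sketch of a finer-grained locking policy for sparse matrix assembly:
// one mutex per row. Storage and interface are illustrative only.
class RowLockedSparseMtrx {
    std::vector<std::map<int, double>> rows; // row -> (col -> value)
    std::vector<std::mutex> rowLocks;        // one lock per row
public:
    explicit RowLockedSparseMtrx(std::size_t n) : rows(n), rowLocks(n) {}

    // Add a dense local matrix into global positions 'loc' (1-based,
    // 0 = skip), locking one row at a time so concurrent assembles
    // into disjoint rows proceed fully in parallel.
    void assemble(const std::vector<std::vector<double>> &local,
                  const std::vector<int> &loc) {
        for (std::size_t i = 0; i < loc.size(); ++i) {
            if (loc[i] <= 0) continue;
            std::lock_guard<std::mutex> hold(rowLocks[loc[i] - 1]);
            for (std::size_t j = 0; j < loc.size(); ++j)
                if (loc[j] > 0)
                    rows[loc[i] - 1][loc[j] - 1] += local[i][j];
        }
    }

    double at(int i, int j) const { // 1-based; 0.0 if not stored
        auto it = rows[i - 1].find(j - 1);
        return it == rows[i - 1].end() ? 0.0 : it->second;
    }
};
```

For real sparse formats the same idea would lock ranges of the fixed sparsity-pattern storage rather than a map, but the granularity argument is the same.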
Any comments or suggestions are welcome.