====== Parallelizing serial problem (static load balancing) ======

An efficient parallel algorithm requires that all steps of the solution procedure are parallelized. In a typical problem this involves the assembly of the characteristic problem, its solution, and the postprocessing of results. These steps are facilitated by the fact that, on each partition, OOFEM assigns unique local code numbers to the local nodes as well as to the shared nodes.

===== Parallel assembly =====

Provided that the partitioning of the problem is given, each processor can assemble its local contribution to the global characteristic equation. This part is practically identical to the serial assembly process, so the corresponding code can be shared between the serial and the parallel parts of the algorithm. After the local contributions have been assembled, the entries corresponding to shared nodes have to be exchanged with the neighbouring partitions. For this purpose, each partition keeps a send map and a receive map for every remote partition: the send map contains the shared node numbers, for which the exchange, in terms of sending local contributions to a particular remote partition, is required, while the receive map contains the shared node numbers, for which the exchange, in terms of receiving the contributions from a particular remote partition, is required. The exchange process then consists in packing the local contributions according to the send map and sending them to the corresponding partitions, followed by receiving the remote partitions' contributions and unpacking them according to the receive maps.

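To make the role of these maps more concrete, the following sketch shows what the maps kept by one partition for a single remote partition might contain. The mesh, node numbers, and partition numbers are invented purely for illustration.

<code cpp>
#include "intarray.h"   // OOFEM integer array

// Hypothetical two-partition mesh: partition 0 shares its local nodes 3 and 7
// with partition 1. The maps that partition 0 keeps for remote partition 1
// are then plain lists of these shared node numbers:
IntArray sendMap(2), recvMap(2);
sendMap.at(1) = 3;  sendMap.at(2) = 7;   // contributions of nodes 3 and 7 are sent to partition 1
recvMap.at(1) = 3;  recvMap.at(2) = 7;   // contributions for nodes 3 and 7 are received from partition 1
</code>

In practice these maps are built by the communicator itself (see the setUpCommunicationMaps call in the initialization example below); they are shown here only to illustrate what the pack and unpack methods iterate over.
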
The user is required to provide the following methods:
  - A **packing method** that wraps the local contributions (for example, load vector contributions) according to the send map into a communication buffer. The packed contributions are then sent to the corresponding (remote) partitions, where they are received and unwrapped by an **unpacking method** according to the receive maps.
  - As the packing/unpacking methods typically need access to engineering-model-specific data, they are implemented as methods of the particular engineering model and passed to the communicator as member-function pointers (see the packLoad/unpackLoad example below).

When assembling sparse matrices in parallel, no specific action is needed, provided that a suitable sparse matrix representation is used (e.g. SMT_PetscMtrx), since such a representation handles the exchange of shared-node entries internally.

===== Parallel solution and postprocessing =====

Provided that a parallel sparse matrix representation (e.g. SMT_PetscMtrx) and a corresponding parallel solver are used, the solution of the linear system proceeds in parallel without any further changes to the user code. Postprocessing of local results (updating nodes and elements, computing strains and stresses, etc.) is performed on each partition in the same way as in the serial case.

===== Example =====
First, we have to create an instance of the //CommunicatorBuff// and //ProblemCommunicator// classes. This is conveniently done when initializing the engineering model from its input record:
<code cpp>
IRResultType
MyEngngModel :: initializeFrom(InputRecord *ir)
{
    // problem specific part here

    // initialize communicator if in parallel mode
#ifdef __PARALLEL_MODE
    // first create send and receive communication buffers; here we use fixed-length
    // static buffers (CBT_static), whose size needs to be determined
    // (the constructor arguments below follow the OOFEM 2.x API and may differ between versions)
    commBuff = new CommunicatorBuff( this->giveNumberOfProcesses(), CBT_static );
    // and create the communicator operating in node-cut mode
    // (data of shared nodes are exchanged between partitions)
    communicator = new ProblemCommunicator( this, commBuff, this->giveRank(),
                                            this->giveNumberOfProcesses(),
                                            ProblemCommMode__NODE_CUT );
    // set up the send/receive communication maps describing the shared-node exchange
    communicator->setUpCommunicationMaps( this, true );
#endif

    return IRRT_OK;
}
</code>
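
The example above assumes that //commBuff// and //communicator// are attributes of MyEngngModel. A minimal sketch of the corresponding declarations (the exact placement in your engineering model class is up to you) might look as follows:

<code cpp>
class MyEngngModel : public EngngModel
{
protected:
#ifdef __PARALLEL_MODE
    /// Communication buffers used by the communicator (created in initializeFrom).
    CommunicatorBuff *commBuff;
    /// Communicator handling the shared-node data exchange between partitions.
    ProblemCommunicator *communicator;
#endif
    // ... remaining problem specific attributes and methods ...
};
</code>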

The following method illustrates how the assembly of a vector (here the load vector) is changed to support parallel assembly. It uses the communicator created above to exchange and sum the contributions of shared nodes:

<code cpp>
void
MyEngngModel :: assembleLoadVector(FloatArray &loadVector,
                                   Domain *sourceDomain, TimeStep *tStep)
{
    EModelDefaultEquationNumbering en;

    // resize load vector first to accommodate all local entries
    // (the exact accessor for the number of local equations may differ between OOFEM versions)
    loadVector.resize( this->giveNumberOfEquations(EID_MomentumBalance) );
    // prepare for assembly
    loadVector.zero();

    // assemble element part of load vector
    this->assembleVectorFromElements(loadVector, tStep, EID_MomentumBalance,
                                     ElementForceLoadVector, VM_Total, en, sourceDomain);
    // assemble nodal part
    this->assembleVectorFromDofManagers(loadVector, tStep, EID_MomentumBalance,
                                        NodalLoadVector, VM_Total, en, sourceDomain);

#ifdef __PARALLEL_MODE
    // parallel section (compiles only when parallel support is configured)
    // pack all data: pass pointer to engngModel, local vector, and packing method;
    // this will call the pack method once for each remote partition
    communicator->packAllData(this,
                              & loadVector,
                              & MyEngngModel :: packLoad);

    // send the packed data to remote partitions and receive their data
    // (LoadExchangeTag is an integer message tag identifying this particular exchange)
    communicator->initExchange(LoadExchangeTag);

    // unpack all received data: pointer to engngModel, local vector, and unpacking method
    communicator->unpackAllData(this,
                                & loadVector,
                                & MyEngngModel :: unpackLoad);
    // finish exchange
    communicator->finishExchange();
#endif
}
</code>

The following code illustrates the implementation of the pack and unpack methods. Each method takes two parameters: a vector containing the source/destination values and a //ProcessCommunicator// instance, which represents the communication with one particular remote partition:

<code cpp>
int
MyEngngModel :: packLoad(FloatArray *src, ProcessCommunicator &processComm)
{
    /** This method wraps the shared-node entries of the given vector into a
        communication buffer. The buffer is an attribute of the given
        ProcessCommunicator, which uniquely determines the remote partition
        receiving the data.
    */
    int result = 1;
    int i, size;
    int j, ndofs, eqNum;
    Domain *domain = this->giveDomain(1);
    IntArray const *toSendMap = processComm.giveToSendMap();
    ProcessCommunicatorBuff *pcbuff = processComm.giveProcessCommunicatorBuff();
    DofManager *dman;
    Dof *jdof;

    size = toSendMap->giveSize();
    for ( i = 1; i <= size; i++ ) { // loop over send map entries (shared nodes)
        dman = domain->giveDofManager( toSendMap->at(i) );
        ndofs = dman->giveNumberOfDofs();
        for ( j = 1; j <= ndofs; j++ ) { // loop over shared node DOFs
            jdof = dman->giveDof(j);
            if ( jdof->isPrimaryDof() && ( eqNum = jdof->__giveEquationNumber() ) ) {
                // if DOF is primary (not linked) and no BC applied (equation number assigned)
                // then pack its corresponding value in the given vector into the buffer
                // (the packDouble/unpackDouble buffer calls follow the OOFEM 2.x API)
                result &= pcbuff->packDouble( src->at(eqNum) );
            }
        }
    }

    return result;
}


int
MyEngngModel :: unpackLoad(FloatArray *dest, ProcessCommunicator &processComm)
{
    /** This method unpacks the received shared-node entries into the given vector.
        The receive buffer is an attribute of the given ProcessCommunicator, which
        uniquely determines the remote partition sending the data.
    */
    int result = 1;
    int i, size;
    int j, ndofs, eqNum;
    Domain *domain = this->giveDomain(1);
    dofManagerParallelMode dofmanmode;
    IntArray const *toRecvMap = processComm.giveToRecvMap();
    ProcessCommunicatorBuff *pcbuff = processComm.giveProcessCommunicatorBuff();
    DofManager *dman;
    Dof *jdof;
    double value;

    size = toRecvMap->giveSize();
    for ( i = 1; i <= size; i++ ) { // loop over receive map entries (shared nodes)
        dman = domain->giveDofManager( toRecvMap->at(i) );
        ndofs = dman->giveNumberOfDofs();
        dofmanmode = dman->giveParallelMode();
        for ( j = 1; j <= ndofs; j++ ) { // loop over shared node DOFs
            jdof = dman->giveDof(j);
            if ( jdof->isPrimaryDof() && ( eqNum = jdof->__giveEquationNumber() ) ) {
                // if DOF is primary (not linked) and no BC applied (equation number assigned)
                // then add the received buffer value to the corresponding vector entry
                result &= pcbuff->unpackDouble(value);
                if ( dofmanmode == DofManager_shared ) {
                    // remote contribution is added to the local one
                    dest->at(eqNum) += value;
                } else {
                    _error3( "unpackLoad: unknown parallel mode of dof manager %d (global %d)",
                             dman->giveNumber(), dman->giveGlobalNumber() );
                }
            }
        }
    }

    return result;
}
</code>
Provided that a suitable sparse matrix representation is used (e.g. SMT_PetscMtrx), neither the assembly of the sparse matrix nor the solution of the resulting linear system requires any parallel-specific code. The following sketch of a solveYourselfAt method puts all the pieces together:
<code cpp>
void
MyEngngModel :: solveYourselfAt(TimeStep *tStep)
{
    // create components of characteristic equation
    // (member attributes assumed: stiffnessMatrix, loadVector, displacementVector;
    //  sparseMtrxType comes from the input record, e.g. SMT_PetscMtrx)
    if ( stiffnessMatrix == NULL ) {
        stiffnessMatrix = CreateUsrDefSparseMtrx(sparseMtrxType);
    }
    loadVector.resize( this->giveNumberOfEquations(EID_MomentumBalance) );
    loadVector.zero();
    displacementVector.resize( loadVector.giveSize() );
    displacementVector.zero();

    // initialize profile of stiffness matrix
    stiffnessMatrix->buildInternalStructure(this, 1, EID_MomentumBalance,
                                            EModelDefaultEquationNumbering());

    // assemble stiffness matrix and load vector
    stiffnessMatrix->zero();
    this->assemble(stiffnessMatrix, tStep, EID_MomentumBalance, StiffnessMatrix,
                   EModelDefaultEquationNumbering(), this->giveDomain(1));
    this->assembleLoadVector(loadVector, this->giveDomain(1), tStep);

    // get numerical method to solve the problem; a sparse linear system solver is assumed
    SparseLinearSystemNM *nMethod =
        static_cast< SparseLinearSystemNM * >( this->giveNumericalMethod(tStep) );
    // solve the problem (yes, this solves the linear system in parallel!)
    nMethod->solve(stiffnessMatrix, & loadVector, & displacementVector);
    // postprocess results: update nodes, elements, compute strains, stresses, etc.
    this->updateYourself(tStep);
    // and we are done!
}
</code>
====== Further reading ======