Automatic block threading
DIY can execute multiple in-memory blocks concurrently when it is given more than one thread. As with the automatic switch between in-core and out-of-core execution, this requires no change to the program logic, and no recompilation if the number of threads is supplied as a command-line argument.
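One way to supply the thread count without recompiling is to read it from the command line. The following is a minimal sketch using only the C++ standard library; `parse_num_threads` is a hypothetical helper, not part of DIY, and taking the value from the first argument is an assumption made for illustration.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of DIY): read the thread count from the
// first command-line argument, falling back to `fallback` when the argument
// is missing or is not a positive integer that fits in an int.
int parse_num_threads(int argc, char* argv[], int fallback = 1)
{
    if (argc < 2)
        return fallback;

    char* end   = nullptr;
    long  value = std::strtol(argv[1], &end, 10);

    // Reject empty input, trailing characters, and out-of-range counts
    if (end == argv[1] || *end != '\0' || value < 1 || value > INT_MAX)
        return fallback;

    return static_cast<int>(value);
}
```

The result of `parse_num_threads(argc, argv)` could then be passed as the num_threads argument of diy::Master instead of a hard-coded constant.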
DIY's threading is fairly coarse-grained: one thread executes an entire callback function passed to diy::Master::foreach(). Because each block is a separate object and all state is maintained in the block, DIY's automatic threading is inherently thread-safe. Within a block callback function, the user is also free to write finer-grained custom threaded code, provided that adequate hardware resources exist for both DIY's threads and the user's, and that there are no conflicts between different thread libraries (DIY uses pthreads).
DIY's threading feature can also be combined with out-of-core execution.
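The thread-safety argument above can be illustrated outside of DIY. The sketch below uses std::thread from the standard library, not DIY's own scheduling; `DemoBlock`, `sum_values`, and `run_concurrently` are hypothetical names introduced only for this example. Each thread runs a callback on a different block, and because the callback touches only the block it is given, no locking is needed.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical block type for illustration: all state lives inside the block
struct DemoBlock
{
    std::vector<int> values;
    long             sum = 0;
};

// Callback that reads and writes only the block it is given
void sum_values(DemoBlock& b)
{
    b.sum = std::accumulate(b.values.begin(), b.values.end(), 0L);
}

// Run the callback on every block, one thread per block. No locks are needed
// because no two threads touch the same block.
void run_concurrently(std::vector<DemoBlock>& blocks)
{
    std::vector<std::thread> workers;
    workers.reserve(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); ++i)
        workers.emplace_back(sum_values, std::ref(blocks[i]));
    for (auto& w : workers)
        w.join();
}
```

If a callback instead wrote to a global variable or to another block, this guarantee would no longer hold and the user would need explicit synchronization.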
Block
No changes are required to the block structure described in the Initialization module.
struct Block
{
    static void* create()          { return new Block; }                 // allocate a new block
    static void  destroy(void* b)  { delete static_cast<Block*>(b); }    // free a block

    // your user-defined member functions
    ...
    // your user-defined data members
    ...
};
Master
Recall from Initialization that diy::Master owns and manages the blocks that are assigned to the current MPI process. It also executes callback functions on blocks, which is where computations on blocks occur. To execute callback functions on multiple blocks resident in memory concurrently, simply change the num_threads argument to Master to a value greater than 1. For example, assume we have a program with 32 total blocks, run on 8 MPI processes. DIY's Assigner will assign 32 / 8 = 4 blocks to each MPI process. Assume all 4 blocks fit in memory, and that we can allow DIY 2 threads for executing callback functions. In this case, rather than stepping serially through the 4 blocks in a process each time a callback function is called, DIY will make only two iterations through the local blocks; in each iteration two blocks will execute their callbacks concurrently.
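The arithmetic in this example can be written out as a short sketch. It is a simplified model of the description above, not DIY's actual scheduler, and the function names `blocks_per_process` and `passes_over_local_blocks` are hypothetical.

```cpp
#include <cassert>

// Blocks per process when the blocks divide evenly among the processes,
// as in the 32-block, 8-process example.
int blocks_per_process(int nblocks, int nprocs)
{
    assert(nprocs > 0 && nblocks % nprocs == 0);
    return nblocks / nprocs;
}

// Number of passes over the local blocks when up to num_threads callbacks
// run at once (ceiling division).
int passes_over_local_blocks(int local_blocks, int num_threads)
{
    assert(num_threads > 0);
    return (local_blocks + num_threads - 1) / num_threads;
}
```

With 32 blocks on 8 processes, `blocks_per_process(32, 8)` gives 4, and `passes_over_local_blocks(4, 2)` gives 2, compared with 4 passes for a single thread.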
If Block is defined as above, the first part of the code looks like this.
#include <diy/assigner.hpp>
#include <diy/master.hpp>

int main(int argc, char* argv[])
{
    diy::mpi::environment     env(argc, argv);            // diy's version of MPI_Init
    diy::mpi::communicator    world;                      // diy's version of MPI communicator

    int                       nprocs      = 8;            // total number of MPI ranks
    int                       nblocks     = 32;           // total number of blocks in global domain
    int                       num_threads = 2;            // number of threads DIY is allowed to use

    diy::ContiguousAssigner   assigner(nprocs, nblocks);  // assign blocks to MPI ranks

    diy::Master               master(world,               // communicator
                                     num_threads,         // number of threads DIY is allowed to use
                                     -1,                  // all blocks remain in memory in this example
                                     &Block::create,      // block create function
                                     &Block::destroy);    // block destroy function
    ...