Writing out-of-core algorithms
Out-of-core algorithms typically differ from their in-core counterparts, and much research has been devoted to out-of-core algorithms for specific classes of problems. In DIY, switching from in-core to out-of-core is as simple as changing the number of blocks allowed to reside in memory. If that number is a command-line argument, no recompilation is needed. This is another advantage of the block-parallel programming model: when a program is written in terms of logical blocks, the DIY runtime is free to move blocks and their associated message queues in and out of core with no change to the program logic.
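As an illustration of the command-line approach, the following is a minimal sketch of reading the in-memory block count at startup. The helper parse_mem_blocks is hypothetical and not part of DIY; it assumes the count is passed as the first argument and falls back to -1 (all blocks in core) when the argument is missing or invalid.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of DIY): read the number of in-memory blocks
// from the first command-line argument. -1 means "keep all blocks in core".
int parse_mem_blocks(int argc, const char* const argv[])
{
    const int all_in_core = -1;
    if (argc < 2)
        return all_in_core;              // no argument given: stay in core

    assert(argv[1] != nullptr);
    char* end   = nullptr;
    long  value = std::strtol(argv[1], &end, 10);

    // Reject non-numeric input and values that make no sense as a block count
    // (only -1 or a value >= 1 is accepted).
    if (end == argv[1] || *end != '\0')
        return all_in_core;
    if (value == 0 || value < -1 || value > INT_MAX)
        return all_in_core;

    return static_cast<int>(value);
}
```

The returned value could then be passed as the mem_blocks argument when constructing diy::Master, so the same binary runs in-core or out-of-core depending on how it is launched.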
Block
Recall from the Initialization module that if blocks are intended to be moved in and out of core,
then the block must define save and load functions in addition to create and destroy. (All four
functions could also be defined globally, outside of the block, if you wish.)
struct Block
{
    static void* create()                                   { return new Block; }
    static void  destroy(void* b)                           { delete static_cast<Block*>(b); }
    static void  save(const void* b, diy::BinaryBuffer& bb) { diy::save(bb, *static_cast<const Block*>(b)); }
    static void  load(void* b, diy::BinaryBuffer& bb)       { diy::load(bb, *static_cast<Block*>(b)); }
    // your user-defined member functions
    ...
    // your user-defined data members
    ...
};
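To make the role of save and load concrete, here is a minimal, self-contained sketch of the round trip a block goes through when it is written out of core and read back. It does not use DIY: ToyBuffer is a hypothetical stand-in for diy::BinaryBuffer, and ToyBlock with its gid and values members is invented for illustration. A real DIY block would delegate to diy::save and diy::load as shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for diy::BinaryBuffer: a growable byte array
// with a read cursor. Not DIY's actual class.
struct ToyBuffer
{
    std::vector<char> bytes;
    std::size_t       pos = 0;

    void write(const void* src, std::size_t n)
    {
        if (n == 0) return;
        const char* p = static_cast<const char*>(src);
        bytes.insert(bytes.end(), p, p + n);
    }

    void read(void* dst, std::size_t n)
    {
        if (n == 0) return;
        assert(pos + n <= bytes.size());   // never read past the saved data
        std::memcpy(dst, bytes.data() + pos, n);
        pos += n;
    }
};

// Hypothetical block with the same four static functions as Block above.
struct ToyBlock
{
    int                 gid = 0;
    std::vector<double> values;

    static void* create()         { return new ToyBlock; }
    static void  destroy(void* b) { delete static_cast<ToyBlock*>(b); }

    // Write the members in a fixed order: gid, element count, elements.
    static void save(const void* b, ToyBuffer& bb)
    {
        const ToyBlock* blk = static_cast<const ToyBlock*>(b);
        std::size_t     n   = blk->values.size();
        bb.write(&blk->gid, sizeof blk->gid);
        bb.write(&n, sizeof n);
        bb.write(blk->values.data(), n * sizeof(double));
    }

    // Read the members back in exactly the order save wrote them.
    static void load(void* b, ToyBuffer& bb)
    {
        ToyBlock*   blk = static_cast<ToyBlock*>(b);
        std::size_t n   = 0;
        bb.read(&blk->gid, sizeof blk->gid);
        bb.read(&n, sizeof n);
        blk->values.resize(n);
        bb.read(blk->values.data(), n * sizeof(double));
    }
};
```

The key property is that load restores exactly what save wrote, so a block that was destroyed after being saved can be recreated later with identical state.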
Master
Recall from Initialization that diy::Master owns and manages the blocks that are assigned to the
current MPI process. For out-of-core operation, the storage object and the load and save functions are mandatory.
Master manages loading and saving blocks, executing their callback functions, and exchanging data between them,
including when blocks are out of core.
To initiate out-of-core operation, simply change the mem_blocks argument to Master from -1 (meaning all blocks in
core) to a value greater than or equal to 1. For example, assume we have a program with 32 total blocks, run on 8 MPI
processes. DIY's Assigner will assign 32 / 8 = 4 blocks to each MPI process. If we only have sufficient memory to hold
2 blocks at a time, setting mem_blocks = 2 is all that is needed; DIY does the rest.
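The arithmetic in this example can be sketched as follows. The struct RankLayout and the function layout_for_rank are hypothetical helpers, not DIY's Assigner. They assume a contiguous assignment, and for block counts that do not divide evenly they give the leftover blocks to the lowest ranks, which is only one possible policy and may differ from what DIY does.

```cpp
#include <cassert>

// Hypothetical summary of one rank's blocks (not a DIY type).
struct RankLayout
{
    int first_gid;     // global id of the first block on this rank
    int local_blocks;  // number of blocks assigned to this rank
    int in_core;       // blocks that can be resident in memory at once
    int out_of_core;   // blocks that must live in storage at any given time
};

// Hypothetical helper: contiguous assignment of nblocks to nprocs ranks.
// mem_blocks == -1 means "all blocks in core", as in the text above.
RankLayout layout_for_rank(int nblocks, int nprocs, int rank, int mem_blocks)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && nblocks >= 0);
    assert(mem_blocks == -1 || mem_blocks >= 1);

    int base  = nblocks / nprocs;
    int extra = nblocks % nprocs;  // assumption: lowest ranks take one extra block

    int local = base + (rank < extra ? 1 : 0);
    int first = rank * base + (rank < extra ? rank : extra);

    // The in-memory limit cannot exceed the number of local blocks.
    int in_core = (mem_blocks == -1 || mem_blocks > local) ? local : mem_blocks;

    return RankLayout{first, local, in_core, local - in_core};
}
```

With 32 blocks, 8 ranks, and mem_blocks = 2, calling layout_for_rank(32, 8, 3, 2) yields 4 local blocks starting at global id 12, of which 2 are in core and 2 are out of core at any time.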
If Block is defined as above, the first part of the code looks like this.
#include <diy/assigner.hpp>
#include <diy/master.hpp>
#include <diy/storage.hpp>                               // diy::FileStorage

int main(int argc, char* argv[])
{
    diy::mpi::environment  env(argc, argv);              // diy's version of MPI_Init
    diy::mpi::communicator world;                        // diy's version of MPI communicator
    diy::FileStorage       storage("./DIY.XXXXXX");      // storage location for out-of-core blocks

    int nprocs     = 8;                                  // total number of MPI ranks
    int nblocks    = 32;                                 // total number of blocks in global domain
    int mem_blocks = 2;                                  // number of blocks that will fit in memory

    diy::ContiguousAssigner assigner(nprocs, nblocks);   // assign blocks to MPI ranks

    diy::Master master(world,                            // communicator
                       1,                                // use 1 thread to execute blocks
                       mem_blocks,                       // # blocks in memory
                       &Block::create,                   // block create function
                       &Block::destroy,                  // block destroy function
                       &storage,                         // storage location for out-of-core blocks
                       &Block::save,                     // block save function for out-of-core blocks
                       &Block::load);                    // block load function for out-of-core blocks
...