Writing out-of-core algorithms

Typically, out-of-core algorithms are distinct from their in-core counterparts, and much research has been conducted on out-of-core algorithms for specific classes of problems. In DIY, switching from in-core to out-of-core is as simple as changing the number of blocks allowed to reside in memory. If this is a command-line argument, no recompilation is needed. This is another advantage of the block parallel programming model. When a program is written in terms of logical blocks, the DIY runtime is free to migrate blocks and their associated message queues in and out-of-core, with no change to the program logic.

Block

Recall from the Initialization module that if blocks are intended to be moved in and out-of-core, then the block must define save and load functions in addition to create and destroy (create, destroy, save, and load could also be defined globally outside of the block, if you wish).

struct Block
{
  static void*    create()                                    { return new Block; }
  static void     destroy(void* b)                            { delete static_cast<Block*>(b); }
  static void     save(const void* b, diy::BinaryBuffer& bb)  { diy::save(bb, *static_cast<const Block*>(b)); }
  static void     load(void* b, diy::BinaryBuffer& bb)        { diy::load(bb, *static_cast<Block*>(b)); }

  // your user-defined member functions
  ...

  // your user-defined data members
  ...
}

Master

Recall from Initialization that diy::Master owns and manages the blocks that are assigned to the current MPI process. For out-of-core operation, the storage object and the load and save objects are mandatory. Master manages loading/saving blocks, executing their callback functions, and exchanging data between them including when blocks are out-of-core.

To initiate out-of-core operation, simply change the mem_blocks argument to Master from -1 (meaning all blocks in core) to a value greater than or equal to 1. For example, assume we have a program with 32 total blocks, run on 8 MPI processes. DIY's Assigner will assign 32 / 8 = 4 blocks to each MPI process. If we only have sufficient memory to hold 2 blocks at a time in memory, setting memblocks = 2 is all that is needed; DIY does the rest.

If Block is defined as above, the first part of the code looks like this.

#include <diy/assigner.hpp>
#include <diy/master.hpp>

int main(int argc, char* argv[])
{
  diy::mpi::environment     env(argc, argv);             // diy's version of MPI_Init
  diy::mpi::communicator    world;                       // diy's version of MPI communicator
  diy::FileStorage          storage("./DIY.XXXXXX");     // storage location for out-of-core blocks

  int                       nprocs      = 8;             // total number of MPI ranks
  int                       nblocks     = 32;            // total number of blocks in global domain
  int                       mem_blocks  = 2;             // number of blocks that will fit in memory

  diy::ContiguousAssigner   assigner(nprocs, nblocks);   // assign blocks to MPI ranks

  diy::Master               master(world,                // communicator
                                   1,                    // use 1 thread to execute blocks
                                   mem_blocks,           // # blocks in memory
                                   &Block::create,       // block create function
                                   &Block::destroy,      // block destroy function
                                   &storage,             // storage location for out-of-core blocks
                                   &Block::save,         // block save function for out-of-core blocks
                                   &Block::load);        // block load function for out-of-core blocks

...