Dynamic Load Balancing
Dynamic load balancing automatically redistributes blocks across processes to balance computational workload. This is useful when different blocks require different amounts of computation time, leading to some processes finishing much earlier than others.
In many parallel applications, the computational cost of processing different blocks varies significantly. Without load balancing, some processes may become idle while others are still processing their expensive blocks. DIY's dynamic load balancing solves this by:
- Monitoring workload across processes
- Identifying overloaded processes (those with blocks requiring high computation)
- Moving blocks from overloaded to underloaded processes
- Continuing computation with improved load balance
Dynamic load balancing is most beneficial in applications where:
- Different blocks have varying computational complexity
- The computational cost cannot be easily predicted beforehand
- The workload may change during execution
Dynamic load balancing replaces the foreach operation with dynamic_foreach that automatically combines dynamic load balancing with block computation.
The dynamic_foreach() method executes blocks while monitoring workload and moving blocks between processes as needed.
void dynamic_foreach(const F& compute_function,
const G& work_function,
DynamicAssigner& dynamic_assigner,
float sample_frac,
float quantile)
Parameters:
compute_function: Callback function that performs computation on each blockwork_function: Callback function that returns estimated work for each block, in whatever units the user choosesdynamic_assigner: DynamicAssigner object that tracks block locationssample_frac: Fraction of processes to sample for load information (0.0-1.0)quantile: Cutoff threshold above which blocks are considered for moving (0.0-1.0)
Parameter Recommendations
sample_frac: Controls the fraction of processes sampled for load information
0.5f(default): Sample half the processes - good balance between accuracy and communication overhead- Lower values (e.g.,
0.3f): Less communication overhead but less accurate load information - Higher values (e.g.,
0.8f): More accurate load information but higher communication overhead
quantile: Controls the threshold for identifying overloaded blocks
0.8f(default): Move blocks from top 20% most loaded processes - moderate load balancing- Lower values (e.g.,
0.6f): More aggressive load balancing, moves more blocks - Higher values (e.g.,
0.9f): Less aggressive load balancing, moves fewer blocks
Work Estimation Function
A callback function that returns the estimated computational work for each block. The work can be any user-defined measure (e.g., number of operations, estimated time, complexity units).
Compute Callback Function
A user-defined function that performs the actual computation on each block. This is the same type of function used with regular foreach().
void compute(Block* b, const diy::Master::ProxyWithLink& cp, int max_time)
{
// Perform the actual computation for this block
// Use b->data to access block data
// Use cp for communication if needed
}
DynamicAssigner
The diy::DynamicAssigner is a special assigner that tracks the current location of blocks as they move between processes during load balancing.
The user creates the dynamic assigner and initializes it with the local block id's before starting the computation.
Parameters:
world: MPI communicatorworld.size(): Total number of MPI processesnblocks: Total number of blocks in the global domain
Complete Example
Below are the key snippets from examples/load_balancing/dynamic.cpp showing the essential load balancing components.
Block Structure with Work Information
struct Block
{
static void* create() { return new Block; }
static void destroy(void* b) { delete static_cast<Block*>(b); }
// Block data
int gid;
diy::Work estimated_work; // Estimated work for this block
// ... other user data
};
Work Assignment
// Assign work to blocks (e.g., based on data size, complexity, etc.)
master.foreach([&](Block* b, const diy::Master::ProxyWithLink& cp)
{ b->assign_work(cp, 0, noise_factor, distribution, generator); });
Work Estimation Function
Compute Function
void Block::compute(const diy::Master::ProxyWithLink&, int max_time, int)
{
// Simulate computation proportional to work amount
unsigned int usec = max_time * act_work * 10000L;
std::this_thread::sleep_for(std::chrono::microseconds(usec));
}
Dynamic Load Balancing Execution
// Create and initialize dynamic assigner
diy::DynamicAssigner dynamic_assigner(world, world.size(), nblocks);
diy::record_local_gids(master, dynamic_assigner);
world.barrier();
// Execute with dynamic load balancing
master.dynamic_foreach(
[&](Block* b, const diy::Master::ProxyWithLink& cp)
{ b->compute(cp, max_time, n); }, // compute function
&get_block_work, // work estimation function
dynamic_assigner, // dynamic assigner
sample_frac, // sample fraction (e.g., 0.5f)
quantile); // quantile (e.g., 0.8f)
The dynamic load balancing will automatically move blocks from overloaded processes to underloaded ones during execution, improving overall performance and utilization.