Overview
In MapReduce pattern the mapping is a process of splitting the initial task into sub-tasks and assigning them to the grid nodes. Mapping generally involves the splitting logic itself, mapping sub-tasks to the nodes including load balancing, and potential failover and collision resolution. In conventional approach the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes and this process is initially controlled by the task. The later has fundamental advantage that was largely missing in grid computing frameworks before GridGain:
|
GridGain approach of giving task the control of sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to non-deterministic nature of execution on the grid. Not having this capability significantly narrows deployment options where optimal performance and scalability can be achieved. |
Early And Late Load Balancing
The sequence of steps described below shows when Early and Late load balancing policies come into play:
- Someone calls Grid.execute(..) passing grid task and its argument to initiate grid task execution in the system.
- Method GridTask.map(..) will be called on the task to perform the initial mapping. This method is responsible for taking a task, splitting it into number of sub-tasks and mapping every sub-task with one or more grid nodes. This method returns set of 'sub-task:node' pairs. This is what we call Early Load Balancing as it is done right during initial mapping operation and with only information available at the execution initiation time (see Load Balancing SPI documentation).
- Once mapping is done the sub-tasks will travel to respective remote nodes for execution.
- When sub-task arrives to the destination grid node it will be subject for collision (scheduling) resolution via Collision SPI. This SPI is called every time when new sub-task arrived, existing sub-task finished its execution or a metrics update is received (with every heartbeat). Collision SPI looks into the queue of its sub-tasks (including a newly received one, if any) and can either cancel sub-task, leave it waiting in the queue, transfer it to another node for execution, or start its execution locally. This is what we call Late Load Balancing. This load balancing happens later in the process of execution and it happens on destination node right where sub-task is about to get executed.
The important characteristic of the late load balancing is that there can be a significant time difference between mapping (early load balancing) and actual time when execution of the sub-task commences on the remote node - and late load balancing allows to account for this non-deterministic aspect of grid execution and potentially re-balance the sub-task on the grid.
For example, our Job Stealing Collision SPI does exactly that. It monitors number of queued sub-tasks on each node and preemptively moves waiting sub-tasks from "busy" node to the "idle" node for execution.
Load balancing capabilities in GridGain are more of the advanced features and not everyone would need them. For example, in homogeneous grid with homogeneous tasks load balancing achieved naturally. However, in many other cases when conditions are more real-life - sophisticated load balancing capabilities are about the only way to get the most out of your grid.
For more information on MapReduce refer to Map/Reduce: Simplified Data Processing on Large Clusters
article from Google.
