Overview
MapReduce is a distributed computing paradigm that lets you map a task into smaller jobs based on some key, execute those jobs on grid nodes, and reduce the job results into a single task result.
The diagram below explains how MapReduce works, using the Shape Counter example. Given a collection of shapes, we split the collection into two parts and send each part to a grid node. Each node counts the shapes it receives and returns that count to the caller. The caller then adds up the results received from the remote nodes and returns the reduced result to the user (the counts are displayed next to every shape).
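The Shape Counter flow described above can be sketched in plain Java. This is a minimal single-process illustration, not GridGain code: the class and method names (`ShapeCounterSketch`, `split`, `countJob`, `reduce`, `countShapes`) are hypothetical, and the "nodes" are simulated by ordinary method calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, single-process sketch of the Shape Counter example.
class ShapeCounterSketch {
    /** Map step: split the collection into `parts` roughly equal chunks, one per node. */
    static <T> List<List<T>> split(List<T> shapes, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = shapes.size();
        for (int i = 0; i < parts; i++) {
            int from = i * size / parts;
            int to = (i + 1) * size / parts;
            chunks.add(new ArrayList<>(shapes.subList(from, to)));
        }
        return chunks;
    }

    /** Job body: what each node would run on the chunk it receives. */
    static <T> int countJob(List<T> chunk) {
        return chunk.size();
    }

    /** Reduce step: the caller adds up the per-node counts. */
    static int reduce(List<Integer> partialCounts) {
        int total = 0;
        for (int count : partialCounts) {
            total += count;
        }
        return total;
    }

    /** Full flow: split, run one job per chunk, then reduce the partial counts. */
    static int countShapes(List<String> shapes, int nodes) {
        List<Integer> partialCounts = new ArrayList<>();
        for (List<String> chunk : split(shapes, nodes)) {
            // On a real grid this call would execute remotely on a node.
            partialCounts.add(countJob(chunk));
        }
        return reduce(partialCounts);
    }

    public static void main(String[] args) {
        List<String> shapes = Arrays.asList("circle", "square", "triangle", "circle", "square");
        System.out.println(countShapes(shapes, 2)); // prints 5
    }
}
```

With five shapes and two nodes, the chunks hold two and three shapes, so the partial counts 2 and 3 reduce to 5.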

In GridGain, the MapReduce paradigm is implemented via GridTask.
Map Operation
The GridTask.map(..) method splits a task into multiple instances of GridJob.
Result Operation
Upon completion of any job, the GridTask.result(..) method is invoked. It is responsible for telling GridGain whether to Wait for more job results, Reduce now, or Failover this job to another node.
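The three-way decision described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Policy` enum, the `onJobResult` method, and its parameters are hypothetical names, not the GridGain API, and the rule used (fail over on error, reduce once all results are in, otherwise wait) is only one possible policy.

```java
// Hypothetical sketch of a per-result decision; not the GridGain API.
class ResultPolicySketch {
    /** The three outcomes named in the text. */
    enum Policy { WAIT, REDUCE, FAILOVER }

    /**
     * Decide what to do after a job finishes.
     *
     * @param jobFailed whether the job that just completed reported a failure
     * @param received  number of successful results received so far, including this one
     * @param expected  total number of jobs the task was split into
     */
    static Policy onJobResult(boolean jobFailed, int received, int expected) {
        if (jobFailed) {
            return Policy.FAILOVER; // retry this job on another node
        }
        if (received >= expected) {
            return Policy.REDUCE;   // all results are in, aggregate now
        }
        return Policy.WAIT;         // keep waiting for the remaining jobs
    }

    public static void main(String[] args) {
        System.out.println(onJobResult(false, 1, 2));
        System.out.println(onJobResult(false, 2, 2));
        System.out.println(onJobResult(true, 1, 2));
    }
}
```

A task could also choose to reduce early, for example when a search job has already found its answer, by returning the reduce outcome before all results have arrived.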
Reduce Operation
This operation takes the results from the remote jobs and reduces them into one aggregate result, which is then returned to the user.
Pull vs. Push MapReduce
One of the fundamental differences between GridGain's implementation of MapReduce and those in existing or legacy systems like Sun GridEngine, GigaSpaces, Hadoop and Globus is the cardinality, or type, of the mapping operation. In the conventional approach, the worker nodes pull sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes, and this process is initially controlled by the task. The push approach has a fundamental advantage that was largely missing in grid computing frameworks before GridGain:

GridGain's approach of giving the task control over sub-task distribution enables early and late load balancing algorithms. This helps adapt task execution to the non-deterministic nature of execution on the grid. Without this capability, the range of deployment options in which optimal performance and scalability can be achieved narrows significantly.
This unique property of GridGain's MapReduce implementation has a profound effect on the ability to develop grid applications with advanced load balancing, failover and collision resolution logic.
See Early And Late Load Balancing for more information.
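To make the push model more concrete, here is a minimal sketch in which the task itself decides which node receives each job. It is an illustration under stated assumptions, not GridGain's implementation: the class `PushMappingSketch`, the method `mapJobs`, and the "least loaded node first" rule are all hypothetical, and the load figures are plain integers supplied by the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of push-style mapping with a load-aware policy.
class PushMappingSketch {
    /**
     * Assign every job to the node with the lowest current load.
     * Ties are broken by node name so the result is deterministic.
     */
    static Map<String, List<String>> mapJobs(List<String> jobs, Map<String, Integer> nodeLoad) {
        if (nodeLoad.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, Integer> load = new HashMap<>(nodeLoad); // working copy
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String job : jobs) {
            String target = null;
            for (String node : new TreeSet<>(load.keySet())) {
                if (target == null || load.get(node) < load.get(target)) {
                    target = node;
                }
            }
            assignment.computeIfAbsent(target, k -> new ArrayList<>()).add(job);
            load.merge(target, 1, Integer::sum); // the pushed job adds to that node's load
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeLoad = new HashMap<>();
        nodeLoad.put("A", 2);
        nodeLoad.put("B", 0);
        List<String> jobs = new ArrayList<>();
        jobs.add("j1");
        jobs.add("j2");
        jobs.add("j3");
        System.out.println(mapJobs(jobs, nodeLoad)); // prints {A=[j3], B=[j1, j2]}
    }
}
```

In a pull model the equivalent decision would be made by whichever worker happens to take the next job from a shared queue, so the task has no opportunity to apply a policy like this at mapping time.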
