Overview
When working with large amounts of data, it is often desirable to split the data across multiple nodes for processing, so that every node becomes responsible for caching a portion of the data. This approach lets you load most (if not all) of your database data into cache and then collocate your computation logic with the data it needs to process. Why? To avoid redundant movement of data between caching nodes, which can often kill performance and bring a server to its knees.
In GridGain, this design is achieved by using Affinity Load Balancing and integrating with distributed caches and data grids.
Affinity Load Balancing
Affinity load balancing in GridGain is provided via GridAffinityLoadBalancingSpi.
The diagram below illustrates the difference between using data grids without and with GridGain. The left side shows the execution flow without GridGain: a remote data server is queried for data, and the data is then delivered to the caller (master) node. This is faster than DB access, but it results in unnecessary network traffic.
On the right-hand side, you can see the value that GridGain brings to the picture. The whole computation logic, together with the data access logic, is brought to the data server for local execution. Assuming that serializing the computation logic is much lighter than serializing the data, network traffic in this case is minimal. Your computation may also access data from both Node 2 and Node 3. In that case, GridGain splits your computation into logical jobs and routes each job to the corresponding data server, so that all computations remain local. If one of the data server nodes crashes, your jobs are automatically failed over to other nodes, which lets you fail over logic together with data (rather than only the data failover provided by data grids or distributed caches).
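The splitting step described above can be illustrated with a small, self-contained Java sketch. It does not use the GridGain API: the class AffinitySplitSketch, the node names, and the modulo-based affinity function are all assumptions made for illustration. The sketch only shows the general idea of grouping data keys by the node assumed to cache them, so that each group can run as one logical job on that node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the GridGain API.
final class AffinitySplitSketch {
    /** Maps a data key to the node assumed to cache it (simple modulo affinity). */
    static String nodeFor(Object key, List<String> nodes) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    /** Groups keys into one logical job per owning node. */
    static Map<String, List<Integer>> splitByNode(List<Integer> keys, List<String> nodes) {
        Map<String, List<Integer>> jobs = new LinkedHashMap<>();
        for (Integer key : keys) {
            // All keys owned by the same node end up in the same job.
            jobs.computeIfAbsent(nodeFor(key, nodes), n -> new ArrayList<>()).add(key);
        }
        return jobs;
    }

    public static void main(String[] args) {
        // Assumed node names, loosely matching Node 2 and Node 3 in the diagram.
        List<String> nodes = Arrays.asList("node2", "node3");
        Map<String, List<Integer>> jobs = splitByNode(Arrays.asList(1, 2, 3, 4, 5), nodes);
        jobs.forEach((node, keys) -> System.out.println(node + " processes keys " + keys));
    }
}
```

A modulo over the key hash is the simplest possible affinity function. It is used here only because it is deterministic, so the caller and the data servers agree on which node owns which key.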

Data Grid Integration
GridGain does not come with its own data cache implementation; instead, it integrates with existing data cache and data grid solutions. This gives users the freedom to use pretty much any distributed cache implementation they like.
For example, GridGain comes with the JBoss Cache Data Partitioning Example, which demonstrates how to use JBoss Cache with Affinity Load Balancing. JBoss Cache does not ship with data partitioning features out of the box; partitioning data in JBoss Cache becomes possible only through GridGain Affinity Load Balancing, provided by GridAffinityLoadBalancingSpi.
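To see how affinity routing alone can partition a cache that has no partitioning support of its own, consider the following minimal Java sketch. It is not the shipped example and uses neither the JBoss Cache nor the GridGain API: plain HashMap instances stand in for per-node local caches, and the class PartitionedCacheSketch and its methods are hypothetical names. Because every read and write for a key is routed to the same owning node, each local cache ends up holding only its own portion of the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain HashMaps stand in for per-node local caches.
final class PartitionedCacheSketch {
    private final List<Map<String, String>> localCaches = new ArrayList<>();

    PartitionedCacheSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            localCaches.add(new HashMap<>());
        }
    }

    /** Affinity function (an assumption): index of the node that owns the key. */
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), localCaches.size());
    }

    /** Writes go only to the owning node's local cache. */
    void put(String key, String value) {
        localCaches.get(ownerOf(key)).put(key, value);
    }

    /** Reads are routed to the same owning node, so they stay local to it. */
    String get(String key) {
        return localCaches.get(ownerOf(key)).get(key);
    }

    /** Number of entries held by one node's local cache. */
    int sizeOnNode(int node) {
        return localCaches.get(node).size();
    }
}
```

The local caches themselves know nothing about partitioning; the partitioning emerges entirely from the routing decision made before each cache access.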
