Dashboard > GridGain User Guide > Table Of Contents > Developers Guide > Load Balancing SPI > GridAffinityLoadBalancingSpi
GridAffinityLoadBalancingSpi
Added by morpheus, last edited by morpheus on May 02, 2008  (view change)
Labels: 
(None)


Click on Javadoc link to open Javadoc documentation.

Package

org.gridgain.grid.spi.loadbalancing.affinity Javadoc

Available starting with GridGain

Description

GridAffinityLoadBalancingSpi Javadoc uses data affinity for routing jobs to remote nodes. It provides ability to collocate computations with data. This SPI is best used with distributed caches for which it is really important that computation is routed exactly to the node on which data is cached. Many data cache schemes can take advantage of this SPI, distributed, or invalidation based. The real value is that you now can partition your database between data servers and hence load the whole database into memory for faster access.

Architecture and Deployment

The diagram below illustrates the difference between using data grids without and with GridGain. The left side shows execution flow without GridGain, in which a remote data server is queried for data, the data is then delivered to caller (master) node, which is faster than DB access, but results into unnecessary network traffic.

On the right hand side, you can see the value that GridGain brings to the picture. The whole computation logic together with data access logic is brought to data server for local execution. Assuming that serialization of computation logic is much lighter than serializing data, the network traffic in this case is minimal. Also, your computation may access data from both, Node 2 and Node 3. In this case, GridGain will split your computation into logical jobs and route appropriate logical jobs to the corresponding data servers to ensure that all computations still remain local. Now, if one of the data server nodes crashes, your jobs will be automatically failed-over to other nodes, which allows you to fail-over logic together with data (not just data fail-over provided by data grids or distributed caches).

Coding Example

To use load balancers for your job routing, in your GridTask.map(List, Ojbect) Javadoc implementation use load balancer to find out the node this job should be routed to (see GridLoadBalancerResource Javadoc documentation for information on how a load balancer can be injected into your task. However, the preferred way here is to use GridTaskSplitAdapter Javadoc , as it will handle affinity assignment of jobs to nodes automatically. Node that when working with affinity load balancing, your task's map(..) or split(..) methods should return GridAffinityJob Javadoc instances instead of GridJob Javadoc ones. GridAffinityJob Javadoc adds one additional method to grid job: GridAffinityJob.getAffinityKey() Javadoc which will allow GridGain to properly route the job with the same key to the same grid node every time. In case if regular GridJob is returned, not the GridAffinityJob, it will be routed to a randomly picked node.

Here is an example of a grid task that uses affinity load balancing. Note how load balancing jobs is absolutely transparent to the user and is simply a matter of proper grid configuration.

public class MyFooBarAffinityTask extends GridTaskSplitAdapter<List<Integer>,Object> {
    // For this example we receive a list of cache keys and for every key
    // create a job that accesses it.
    @Override
    protected Collection<? extends GridJob> split(int gridSize, List<Integer> cacheKeys) throws GridException {
        List<MyGridAffinityJob> jobs = new ArrayList<MyGridAffinityJob>(gridSize);
 
        for (Inteter cacheKey : cacheKeys) {
            jobs.add(new MyGridAffinityJob(cacheKey));
        }

        // Node assignment via load balancer 
        // happens automatically.
        return jobs;
    }
    ...
}

Here is the example of grid jobs created by the task above:

public class MyGridAffinityJob extends GridAffinityJobAdapter<Integer, Serializable> {
    public MyGridAffinityJob(Integer cacheKey) {
        // Pass cache key as a job argument.
        super(cacheKey);
    }

    public Serializable execute() throws GridException {
        ...
        // Access data by the same key returned
        // in 'getAffinityKey()' method.
        mycache.get(getAffinityKey());
        ...
    }
}

Also note that there may be cases where your underlying cache product supports multiple caches and you need to cache data with identical keys in those caches. Although, it still may be OK to return the same key from GridAffinityJob.getAffinityKey() Javadoc method for either cache, you may wish to change your affinity key method as follows to make sure that affinity load balancing for one cache is independent from another:

public class MyFooBarAffintyJob extends GridAffinityJobAdapter<String, Integer> {
    public MyFooBarAffinity(String cacheName, Integer cacheKey) {
        // Construct affinity key by concatenating cache name
        // and affinity key. Note that we also pass cacheKey as
        // argument to access from execute method.
        super(cacheName + '.' + cacheKey, cacheKey);
    }

    @Override
    pubic Serializable execute() {
       ...
       // Access data from your cache by the cache key.
       // The main point to note here is that the same 
       // affinity key always corresponds to the same 
       // cache key.
       Integer cacheKey = getArgument();

       Object data = someCache.get(cacheKey);
       ...
       // Do computations.
    }
}

Configuration

The following configuration parameters can be used to configure GridAffinityLoadBalancingSpi Javadoc

Setter Method Description Optional Default
setVirtualNodeCount(int) Javadoc Configuration parameter for the number of virtual nodes for Consistent Hashing algorithm. The larger the virtual node count, the more even the data affinity distribution is across nodes. The value should usually be larger than 500. The default value of 1000 is good enough for most grid deployments. Consistent Hashing generally yields between 2% and 4% standard deviation for equal data affinity distribution. If you set the value too large (larger than several thousands), it may cause performance degradation Yes Default value is 1000 which is good enough for most grid deployments.
setAffinityNodeAttributes(Map<String, ? extends Serializable>) Javadoc All nodes that want to participate in affinity load balancing should have these attributes. This is useful when grid is segmented, for example, into clients and servers, and client nodes will never cache any data. Yes Default value is null which means that all nodes will be included.

Please pay specific attention to the number of virtual nodes for Consistent Hashing algorithm (see setVirtualNodeCount(int) Javadoc ). Generally, the larger the virtual node count, the more even the data distribution is across nodes. For best affinity distribution the value should usually be larger than 500. The default value of 1000 is good enough for most grid deployments. If you set the value too large (larger than several thousands), it may cause performance degradation. Consistent Hashing algorithm generally yields between 2% and 4% standard deviation for equal data affinity distribution.

You can use virtual node count to distribute load in uneven grid. Since the larger the virtual node count is, the more data will be stored on that node (which leads to more jobs sent to that node), nodes that have higher Memory or CPU capacity should have larger virtual node count value.

When configuring virtual node count, it is common to assign a certain number of virtual nodes to a single unit of capacity characteristic. For example, if you have 3 nodes in the grid: N1, N2, and N3, and nodes N1 and N2 have 2GB of memory and node N3 has 3GB of memory, then, to ease up calculations, for every 1GB of memory on a node you could assign 500 virtual nodes. As a result, nodes N1 and N2 should be assigned 1000 virtual nodes and node N3 should be assigned 1500 virtual nodes.

Examples

For an example of data affinity with JBoss Cache refer to Affinity MapReduce with JBoss Cache documentation on Wiki or under your GridGain installation.

As any GridGain SPI, GridAffinityLoadBalancingSpi Javadoc SPI can be configured either directly from code or from Spring configuration file. Here is an example of GridAffinityLoadBalancingSpi configuration from code:

GridAffinityLoadBalancingSpi spi = new GridAffnityLoadBalancingSpi();

// Change number of virtual nodes.
spi.setVirtualNodeCount(1500);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default load balancing SPI.
cfg.setLoadBalancingSpi(spi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="loadBalancingSpi">
        <bean class="org.gridgain.grid.spi.loadbalancing.affinity.GridAffinityLoadBalancingSpi">
            <property name="virtualNodeCount" value="1500"/>

            <!--
                If your grid is segmented via node attributes,
                then provide all attributes a node should have
                in order to be considered by affinity load balancer.
            -->
            <property name="affinityNodeAttributes">
                <map>
                    <entry key="node.segment" value="foobar"/>
                </map>
            </property>
        </bean>
    </property>
    ...
</bean>


For more information about using Spring framework for configuration click here.

Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.2.10 Build:#528 Nov 29, 2006) - Bug/feature request - Contact Administrators