Dashboard > GridGain User Guide > Table Of Contents > Developers Guide > Collision SPI > GridJobStealingCollisionSpi
GridJobStealingCollisionSpi
Added by morpheus, last edited by morpheus on Dec 23, 2008  (view change)
Labels: 
(None)


Click on Javadoc link to open Javadoc documentation.

Package

org.gridgain.grid.spi.collision.fifoqueue Javadoc

Available starting with GridGain

Description

GridJobStealingCollisionSpi Javadoc supports job stealing from over-utilized nodes to under-utilized nodes. This SPI is especially useful if you have some jobs within task complete fast, and others sitting in the waiting queue on slower nodes. In such case, the waiting jobs will be stolen from slower node and moved to the fast under-utilized node.

The design and ideas for this SPI are significantly influenced by Java Fork/Join Framework authored by Doug Lea and planned for Java 7. GridJobStealingCollisionSpi took similar concepts and applied them to the grid (as opposed to within VM support planned in Java 7).

Quite often grids are deployed across many computers some of which will always be more powerful than others. This SPI helps you avoid jobs being stuck at a slower node, as they will be stolen by a faster node. In the following picture when Node3 becomes free, it steals Job13 and Job23 from Node1 and Node2 respectively.

Usage

Note that this SPI must always be used in conjunction with GridJobStealingFailoverSpi Javadoc . The responsibility of Job Stealing Failover SPI is to properly route stolen jobs to the nodes that initially requested (stole) these jobs. The GridJobStealingCollisionSpi maintains a counter of how many times a jobs was stolen and hence traveled to another node, and it will not allow a job to be stolen if this counter exceeds a certain threshold. The threshold value is configured in GridJobStealingCollisionSpi.

Keep in mind that collision resolution happens on job executing nodes (workers), and failover happens on task-initiating node (master). So, if you have a case where a group of nodes is responsible only for sending tasks (masters) and another group is responsible for executing jobs (workers), it should be sufficient to configure GridJobStealingFailoverSpi on master nodes only and GridJobStealingCollisionSpi on worker nodes only. You should also take a look at setStealingEnabled(boolean) and setStealingAttributes(Map) configuration properties as they also allow you to control which nodes participate in job stealing.

Disable Job Stealing

Use GridJobStealingDisabled Javadoc annotation to disable job stealing and make sure that the jobs get executed exactly on the node they were mapped to. If job fails on the selected node it will be failed over as usual according to the configured failover policy in Failover SPI.

Configuration

The following configuration parameters can be used to configure GridJobStealingCollisionSpi Javadoc

Setter Method Description Optional Default
setActiveJobsThreshold(int) Javadoc Sets number of jobs that are allowed to be executed in parallel on this node. Node that this attribute may be different for different grid nodes as stronger nodes may be able to execute more jobs in parallel. Yes 95, specified in GridJobStealingCollisionSpi.DFLT_ACTIVE_JOBS_NUM Javadoc .
setWaitJobsThreshold(int) Javadoc Sets wait jobs threshold. If number of jobs in the waiting queue goes below this threshold, then implementation will attempt to steal jobs from other, more over-loaded nodes. Note this value may be different (but does not have to be) for different nodes in the grid. You may wish to give stronger nodes a smaller waiting threshold so they can start stealing jobs from other nodes sooner. Yes 0, specified by GridJobStealingCollisionSpi.DFLT_WAIT_JOBS_NUM Javadoc
setMessageExpireTime(long) Javadoc Message expire time configuration parameter. If no response is received from a busy node to a job stealing request, then implementation will assume that message never got there, or that remote node does not have this node included into topology of any of the jobs it has. In any case, job steal request will be resent (potentially to another node). Yes 1,000 milliseconds (or 1 second), specified by GridJobStealingCollisionSpi.DFLT_MSG_EXPIRE_TIME Javadoc
setMaximumStealingAttempts(int) Javadoc Sets maximum number of attempts for a single job to be stolen. Once a job reaches this threshold, not more attempts will be made by other nodes to steal it. Note that this attribute should be the same on all nodes. Yes 5, specified in GridJobStealingCollisionSpi.DFLT_MAX_STEALING_ATTEMPTS Javadoc
setStealingEnabled(boolean) Javadoc Enables/disables job stealing on this node. Yes true
setStealingAttributes(Map) Javadoc Enables stealing to/from only nodes that have given attributes set. Yes empty map

Examples

As any GridGain SPI, GridJobStealingCollisionSpi Javadoc SPI can be configured either directly from code or from Spring configuration file. Here is an example of GridJobStealingCollisionSpi SPI configuration from code:

GridJobStealingCollisionSpi spi = new GridJobStealingCollisionSpi();
 
// Configure number of waiting jobs
// in the queue for job stealing.
spi.setWaitJobsThreshold(10);

// Configure message expire time (in milliseconds).
spi.setMessageExpireTime(500);

// Configure number of active jobs that are allowed to execute 
// in parallel. This number should usually be equal to the number 
// of threads in the pool (default is 100).
spi.setActiveJobsThreshold(50);

// Configure maximum stealing attempts.
spi.setMaximumStealingAttempts(10);

// Enable stealing.
spi.setStealingEnabled(true);

// Set stealing attribute to steal from/to nodes that have it.
spi.setStealingAttributes(Collections.singletonMap("node.segment", "foobar"));

GridConfigurationAdapter cfg = new GridConfigurationAdapter();
 
// Override default Collision SPI.
cfg.setCollisionSpi(spi);

or from Spring configuration file

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
            <property name="activeJobsThreshold" value="100"/>
            <property name="waitJobsThreshold" value="0"/>
            <property name="messageExpireTime" value="1000"/>
            <property name="maximumStealingAttempts" value="10"/>
            <property name="stealingEnabled" value="true"/>
            <property name="stealingAttributes">
                <map>
                    <entry key="node.segment" value="foobar"/>
                </map>
            </property>
        </bean>
    </property>
    ...
</bean>


For more information about using Spring framework for configuration click here.

Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.2.10 Build:#528 Nov 29, 2006) - Bug/feature request - Contact Administrators