Description
Grid jobs are said to be in collision when a job arrives on a node that already has one or more jobs either waiting or executing on it. Job collision resolution provides the means to resolve this collision by allowing you to:
- put the newly arrived job into the waiting queue
- schedule it for immediate execution
- cancel it (and preempt it by failing it over to another node)
- wake up an already waiting job from the queue and schedule it for immediate execution
Like almost all kernel-level functionality in GridGain, collision resolution is designed as an SPI (Service Provider Interface). It consists of a public API and several implementations. As always, several pre-built implementations ship with GridGain and are available to the developer, and custom ones can easily be built.
Collision resolution is generally referred to as late load balancing because it happens late in the execution process, when a job has already arrived on its destination node. In effect, it load balances jobs within the context of a given node. Note that early load balancing is handled by the load balancing SPI and occurs during the initial mapping phase of the MapReduce process.
The collision SPI doesn't expose any public APIs and works implicitly behind the scenes. As with any SPI, a developer can provide their own implementation and plug it into GridGain.
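To make the idea of a pluggable collision policy concrete, here is a minimal, self-contained sketch in plain Java. It does not use the GridGain API: the names CollisionSketch, JobContext, CollisionResolver and FifoResolver, as well as the parallelLimit and waitLimit parameters, are hypothetical and exist only for illustration. The sketch assumes a simple FIFO policy that activates waiting jobs while execution slots are free and cancels the newest arrivals once the waiting queue is full. Failover of cancelled jobs to another node is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are NOT the GridGain collision SPI API.
final class CollisionSketch {
    /** Possible states of a job on the local node in this toy model. */
    enum State { WAITING, ACTIVE, CANCELLED }

    /** Hypothetical handle to a job that has arrived on the node. */
    static final class JobContext {
        final String name;
        State state = State.WAITING; // newly arrived jobs start in the waiting queue

        JobContext(String name) { this.name = name; }

        void activate() { state = State.ACTIVE; }    // schedule for immediate execution
        void cancel()   { state = State.CANCELLED; } // reject; failover is not modelled here
    }

    /** Hypothetical plug-in point, invoked whenever a job arrives on the node. */
    interface CollisionResolver {
        void onCollision(List<JobContext> waiting, List<JobContext> active);
    }

    /** Assumed FIFO policy: at most parallelLimit active and waitLimit waiting jobs. */
    static final class FifoResolver implements CollisionResolver {
        private final int parallelLimit;
        private final int waitLimit;

        FifoResolver(int parallelLimit, int waitLimit) {
            this.parallelLimit = parallelLimit;
            this.waitLimit = waitLimit;
        }

        @Override
        public void onCollision(List<JobContext> waiting, List<JobContext> active) {
            // Wake up the oldest waiting jobs while there are free execution slots.
            while (!waiting.isEmpty() && active.size() < parallelLimit) {
                JobContext job = waiting.remove(0);
                job.activate();
                active.add(job);
            }
            // Cancel the most recent arrivals that do not fit into the waiting queue.
            while (waiting.size() > waitLimit) {
                waiting.remove(waiting.size() - 1).cancel();
            }
        }
    }

    /** Sends jobs a, b, c, d to the node one by one and reports their final states. */
    static List<String> demo(int parallelLimit, int waitLimit) {
        List<JobContext> all = new ArrayList<>();
        List<JobContext> waiting = new ArrayList<>();
        List<JobContext> active = new ArrayList<>();
        CollisionResolver resolver = new FifoResolver(parallelLimit, waitLimit);
        for (String name : new String[] {"a", "b", "c", "d"}) {
            JobContext job = new JobContext(name);
            all.add(job);
            waiting.add(job);                      // the job arrives on the node
            resolver.onCollision(waiting, active); // the policy resolves the collision
        }
        List<String> result = new ArrayList<>();
        for (JobContext job : all) {
            result.add(job.name + "=" + job.state);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [a=ACTIVE, b=WAITING, c=WAITING, d=CANCELLED]
        System.out.println(demo(1, 2));
    }
}
```

With parallelLimit set to 1 the sketch serializes jobs on the node. A limit at least as large as the number of arriving jobs lets all of them run in parallel. Swapping in a different CollisionResolver changes the policy without touching the code that delivers jobs, which is the point of putting collision resolution behind an SPI.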
Collision SPI
The collision SPI regulates how grid jobs are executed when they arrive on a destination node. In general, a grid node will have multiple jobs arriving for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are multiple possible strategies for dealing with this situation: all jobs can proceed in parallel; jobs can be serialized, i.e., only one job can execute at any given point in time; only a certain number or certain types of grid jobs can proceed in parallel; and so on:
