GridGain
GridGain is a grid computing product that can be described by three key characteristics: open source, Java, and computational grids.
| GridGain Characteristics |
Open Source
We chose an open source license from the start. Our product is built on other open source software, and we also believe that the open source model is ideal for distributed middleware technologies such as grid computing. Per-CPU licensing, by contrast, would penalize the growth of a grid and significantly reduce the advantage of grid computing as a highly distributed computing model.
Java 5
Not only did we build our product in Java 5, we also made it a natural extension of current Java development methodology: we integrate with Spring and its dependency injection mechanism; we use a JEE-like annotation-based resource injection, deployment model, and POJO ("Plain Old Java Object") programming model; we use AOP-based grid-enabling; and we provide JMX-based management. Note that GridGain is fully compatible with the recently released Java 6 as well.
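The AOP-based grid-enabling idea can be illustrated with a standard JDK dynamic proxy. This is a minimal sketch under stated assumptions, not GridGain's implementation: the `@GridEnabled` annotation, `ReportService`, and `GridEnablingProxy` are hypothetical names, and a local `ExecutorService` stands in for remote grid nodes. Calls to annotated methods are intercepted and handed to the executor, while all other calls run in place, so the POJO itself contains no grid-specific code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical marker annotation (not part of GridGain's API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GridEnabled {}

// Plain business interface; only buildReport() is marked for grid execution.
interface ReportService {
    @GridEnabled
    String buildReport(String name);

    String version();
}

// Plain POJO implementation with no grid-specific code.
class SimpleReportService implements ReportService {
    public String buildReport(String name) { return "report:" + name; }
    public String version() { return "1.0"; }
}

class GridEnablingProxy {
    // Wraps target so that @GridEnabled calls are submitted to the executor
    // (standing in for remote nodes) and the caller blocks for the result.
    // Exceptions from the target surface wrapped, which is fine for a sketch.
    static <T> T wrap(T target, Class<T> iface, ExecutorService grid) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isAnnotationPresent(GridEnabled.class)) {
                Callable<Object> job = () -> method.invoke(target, args);
                return grid.submit(job).get();
            }
            return method.invoke(target, args); // non-annotated: run locally
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService grid = Executors.newFixedThreadPool(2);
        try {
            ReportService svc = wrap(new SimpleReportService(), ReportService.class, grid);
            System.out.println(svc.buildReport("q1")); // report:q1 (ran on a pool thread)
            System.out.println(svc.version());         // 1.0 (ran on the caller's thread)
        } finally {
            grid.shutdown();
        }
    }
}
```

The design point is that grid-enabling is applied from the outside: the caller receives an object of the same interface, and only the wrapping step knows about the executor.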
Computational Grids
Most grid computing products fall into one of two categories, data grids and computational grids, and products in the two categories are usually designed very differently. Data grids essentially solve the distributed data caching problem (storing large data sets in a distributed manner for high availability), while computational grids solve the distributed computation problem (executing computations in parallel).
GridGain is a state-of-the-art computational grid product.
What Can GridGain Do?
GridGain is a computational grid product. In a nutshell, it allows you to parallelize the execution of a piece of code across a set of computing resources. A computing resource can be a laptop, desktop computer, workstation, rack server, mainframe, or any other computing device with a Java 5 (or higher) compatible JVM. The set of computing resources can be homogeneous (same kind) or heterogeneous (different kinds), and the resources can be located locally, within an enterprise, globally, or in any combination of these.
Note about utility or on-demand computing:
Utility or on-demand computing can often be viewed as a special case of such parallelization. In many cases a utility computing scenario simply assumes that a whole task is moved to a remote grid for execution. Remote grids in such cases often provide fine-grained SLAs (Service Level Agreements) covering various services: making computing and other resources available on demand, seamless fault tolerance for hardware failures, disaster recovery, pay-per-use rental, etc.
An important point, despite anything you may have heard so far, is that GridGain, and computational grids in general, can be used for a wide variety of tasks applicable to the entire range of businesses, from a small startup to a large corporation.
Business Use Case Example
Almost everybody can imagine the use of grid computing in complex scenarios like oil and gas exploration, financial risk computations, weather forecasting, etc. However, it is just as easy to see how grid computing technology can be used in much simpler scenarios.
Let's take a look at a problem that clearly demonstrates this:
Say your business has a well-designed and well-implemented computerized process that takes ~30 seconds to complete. I'm sure almost anyone can find such a process in their organization: log analysis, archive search, incoming transaction file processing, compression/decompression, etc. Now, what if you need to web-enable this process to add a new line of service? To make it suitable for the web you need to shrink its running time to roughly 5 seconds (the usual "attention span" of web users).
This simple business requirement presents an unexpectedly hard problem: how do you make something that was already well designed run 6 times faster?
You can try the obvious approach and replace all your hardware with dual- or quad-CPU machines. But in most cases this is simply impossible for a variety of reasons. And even if it is possible, the performance gain is far from linear. And even if it were linear, a quad CPU can give at most a 4-times performance jump, and we need 6.
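The arithmetic behind this requirement can be written out explicitly. The first line follows directly from the figures above; the second line is an added illustration using Amdahl's law, with a hypothetical parallelizable fraction p = 0.9, of why adding CPUs to a single machine rarely scales linearly.

```latex
\begin{aligned}
S_{\text{required}} &= \frac{T_{\text{current}}}{T_{\text{target}}} = \frac{30\,\text{s}}{5\,\text{s}} = 6,
  \qquad S_{\text{quad}} \le N = 4 < 6 \\
S(N) &= \frac{1}{(1 - p) + p/N},
  \qquad S(4)\big|_{p = 0.9} = \frac{1}{0.1 + 0.225} \approx 3.08
\end{aligned}
```

Under that assumption, even with 90% of the work parallelizable, four CPUs give roughly a 3x gain, well short of the 6x the web requirement calls for.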
GridGain (and computational grids in general) offers about the only available solution to this problem short of "rip & replace" (which in and of itself may or may not solve the performance problem).
| GridGain Presents the Solution: GridGain allows you to split this problem into a number of sub-tasks, execute the sub-tasks in parallel, and aggregate their results to get the final result in a fraction of the time. |
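The split, execute, and aggregate steps can be sketched with nothing but `java.util.concurrent`. This is not GridGain's task API; it is a minimal single-JVM illustration in which a thread pool stands in for grid nodes, and `SplitAggregate.countMatches` is a hypothetical helper that counts records containing a search string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitAggregate {
    // Split: divide records into `parts` contiguous chunks.
    // Execute: count matching records in each chunk in parallel.
    // Aggregate: sum the partial counts.
    static long countMatches(List<String> records, String needle, int parts,
                             ExecutorService workers) throws Exception {
        List<Future<Long>> partials = new ArrayList<>();
        int size = records.size();
        for (int i = 0; i < parts; i++) {
            int from = (int) ((long) size * i / parts);
            int to = (int) ((long) size * (i + 1) / parts);
            List<String> chunk = records.subList(from, to); // read-only view
            Callable<Long> job = () -> chunk.stream().filter(r -> r.contains(needle)).count();
            partials.add(workers.submit(job));
        }
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get(); // waits for each sub-task
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(i % 10 == 0 ? "ERROR " + i : "OK " + i);
        }
        ExecutorService workers = Executors.newFixedThreadPool(6);
        try {
            System.out.println(countMatches(records, "ERROR", 6, workers)); // 100
        } finally {
            workers.shutdown();
        }
    }
}
```

On a real grid the chunks would be shipped to other machines, but the shape of the code (split, run independently, combine) stays the same.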
Now, several facts are important to note:
- You can use exactly the same type of hardware you are running your current task on (or even slower hardware).
- Depending on the task, this solution often scales linearly, so in most cases you would need just 6 processing nodes to get a 6-times performance increase.
- In many cases the necessary computing resources are already available, as they can be "drawn" from existing desktop computers, servers or mainframes.
- The overall cost of this solution could be just the cost of grid-enabling this task, since GridGain is a free open source product and additional computing resources can easily be drawn from the existing pool.
- Since simplicity of grid-enabling is a key characteristic of GridGain, the total actual cost can in many cases be just several days of work for one engineer.
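The "6 nodes for 6 times" estimate can be turned into a small calculation. The cost model is an assumption for illustration only: wall-clock time with n nodes is taken to be the serial time divided by n, plus a fixed overhead for splitting, shipping, and aggregating the work. With zero overhead it reproduces the 6-node figure above; any overhead pushes the count up.

```java
class NodeEstimate {
    // Assumed cost model: time(n) = serialSeconds / n + overheadSeconds.
    // Returns the smallest n for which time(n) <= targetSeconds.
    static int nodesNeeded(double serialSeconds, double targetSeconds, double overheadSeconds) {
        double budget = targetSeconds - overheadSeconds; // time left for the actual work
        if (budget <= 0) {
            throw new IllegalArgumentException("overhead alone exceeds the target time");
        }
        return (int) Math.ceil(serialSeconds / budget);
    }

    public static void main(String[] args) {
        System.out.println(nodesNeeded(30, 5, 0.0)); // 6: ideal linear scaling
        System.out.println(nodesNeeded(30, 5, 0.5)); // 7: 30 / 4.5 = 6.67, rounded up
        System.out.println(nodesNeeded(30, 5, 1.0)); // 8: 30 / 4.0 = 7.5, rounded up
    }
}
```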
Real-Life Examples
Here are several real-life examples of how GridGain can be used for everyday tasks:
| Task | Grid Enabling |
|---|---|
| Web-Site Log Analysis | These tasks can sometimes take hours, depending on how much log data is produced. However, systems can usually be configured to produce separate log files based on size or time, which makes grid-enabling this task with GridGain rather trivial. |
| "Deep" Searches | These searches can cover past history, backups, archives, or time-sensitive data. "Deep" searches usually span multiple "locations", and in most cases each location can be searched separately, which makes grid-enabling very easy. |
| Large-File Processing | Processing large files (hundreds of thousands of individual records) is something many companies face. Although such processes are usually batch-oriented, operational concerns force many companies to speed them up as much as possible. Processing a file with a large number of records can usually be grid-enabled very easily, either by splitting the file into multiple sub-files or by processing subgroups of records in the same file in parallel. |
| Complex Build Process | Most of us are familiar with Ant-based build processes that take minutes upon minutes to complete. Many tasks in an Ant build file could be executed in parallel, yet typical builds run them one after another. Fortunately, using GridGain you can easily grid-enable Ant and perform your build much faster. |
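The web-site log analysis row can be sketched in plain Java: each log file becomes one sub-task, and the per-file results are merged afterwards. This is a hypothetical single-JVM illustration rather than GridGain code: in-memory lists stand in for log files, a thread pool stands in for grid nodes, and the line format (HTTP status code as the last field) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LogAnalysis {
    // Counts status codes in one log file. Each line is assumed to end
    // with the HTTP status code, e.g. "GET /index.html 200".
    static Map<String, Integer> countStatuses(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue; // skip blank lines
            String[] fields = trimmed.split("\\s+");
            counts.merge(fields[fields.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    // One sub-task per log file; the per-file maps are merged into one result.
    static Map<String, Integer> analyze(List<List<String>> logFiles, ExecutorService workers)
            throws Exception {
        List<Future<Map<String, Integer>>> partials = new ArrayList<>();
        for (List<String> file : logFiles) {
            partials.add(workers.submit(() -> countStatuses(file)));
        }
        Map<String, Integer> total = new HashMap<>();
        for (Future<Map<String, Integer>> partial : partials) {
            partial.get().forEach((status, n) -> total.merge(status, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("GET /index.html 200", "GET /missing 404"),
                Arrays.asList("GET /index.html 200", "POST /api 500", "GET /about 200"));
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            Map<String, Integer> totals = analyze(files, workers);
            System.out.println(totals.get("200") + " " + totals.get("404") + " " + totals.get("500")); // 3 1 1
        } finally {
            workers.shutdown();
        }
    }
}
```

Because log files are already separate units, no splitting logic is needed at all; the only extra work is the merge step.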
What Can't GridGain Do?
There are a number of general misconceptions around grid computing products, and they apply to GridGain as well.
GridGain Is Not:
GridGain Is Not a Hardware Virtualization Solution
Although GridGain can provide a unified programming and computing model over a very diverse set of computing resources, it is not a hardware virtualization solution in the purest sense.
When people say hardware virtualization, they usually mean something like VMware and Xen: software that can run multiple operating systems at the same time and in this way virtualizes the underlying hardware from the user's point of view.
GridGain provides virtualization on a different level. It virtualizes a set of potentially very different computing resources by providing a consistent and uniform Java programming model for writing software, whether it is going to run on one computer or on hundreds of them.
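The idea of one programming model for one computer or hundreds can be illustrated with a small abstraction. In this sketch `SplitTask`, `TaskRunner`, and `RangeSum` are hypothetical names, not GridGain's classes: the task only says how to split work and how to combine results, and the runner accepts any `ExecutorService`, whether it has one thread or many. Running the same task on both kinds of executor yields the same answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task abstraction: it describes how to split and how to reduce,
// but nothing about where, or on how many workers, the pieces run.
interface SplitTask<R> {
    List<Callable<R>> split();

    R reduce(List<R> partialResults);
}

class TaskRunner {
    // Runs all sub-tasks on whatever executor is supplied, then reduces.
    static <R> R run(SplitTask<R> task, ExecutorService executor) throws Exception {
        List<R> results = new ArrayList<>();
        for (Future<R> f : executor.invokeAll(task.split())) {
            results.add(f.get());
        }
        return task.reduce(results);
    }
}

// Example task: sum of 1..n, split into `parts` contiguous ranges.
class RangeSum implements SplitTask<Long> {
    private final long n;
    private final int parts;

    RangeSum(long n, int parts) {
        this.n = n;
        this.parts = parts;
    }

    public List<Callable<Long>> split() {
        List<Callable<Long>> jobs = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            long from = n * i / parts + 1;
            long to = n * (i + 1) / parts;
            jobs.add(() -> {
                long sum = 0;
                for (long k = from; k <= to; k++) sum += k;
                return sum;
            });
        }
        return jobs;
    }

    public Long reduce(List<Long> partialResults) {
        long total = 0;
        for (long p : partialResults) total += p;
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService oneWorker = Executors.newSingleThreadExecutor();
        ExecutorService manyWorkers = Executors.newFixedThreadPool(8);
        try {
            // Same task code, different amounts of computing resources.
            System.out.println(TaskRunner.run(new RangeSum(1_000_000, 8), oneWorker));   // 500000500000
            System.out.println(TaskRunner.run(new RangeSum(1_000_000, 8), manyWorkers)); // 500000500000
        } finally {
            oneWorker.shutdown();
            manyWorkers.shutdown();
        }
    }
}
```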
GridGain Is Not an ESB Solution
ESBs (Enterprise Service Buses) usually connect multiple computers, providing a unified messaging and service-invocation platform, and are thus often confused with grid computing.
Although with some imagination one can be morphed into the other, in their purest forms these two categories of middleware solve different problems.
Often, if not in most cases, you will see an ESB (or a similar, more specific messaging technology such as JMS or TIBCO) used alongside grid computing, with the ESB acting as the medium (or bus) through which grid nodes exchange data.
GridGain Is Not a Data Grid Solution
GridGain is a computational grid product. It allows for parallelization of resource-intensive tasks. Data grids do the same with data: they parallelize (replicate) data over the grid, providing high availability (HA) of the data by replicating (or caching) it across a set of data storage resources.
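Data-grid replication can be reduced to a toy model that shows why it provides high availability: as long as one replica survives, reads still succeed. This is a hypothetical single-JVM sketch; `ReplicatedStore` and its methods are illustrative names, and a real data grid would also handle networking, consistency, and re-replication after failures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, single-process model of a replicated data grid: every write goes to all
// live replicas, and a read is served by the first live replica.
// Not thread-safe, and not modeled on any particular product's API.
class ReplicatedStore {
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final boolean[] live;

    ReplicatedStore(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
        live = new boolean[replicaCount];
        Arrays.fill(live, true);
    }

    void put(String key, String value) {
        for (int i = 0; i < replicas.size(); i++) {
            if (live[i]) replicas.get(i).put(key, value);
        }
    }

    String get(String key) {
        for (int i = 0; i < replicas.size(); i++) {
            if (live[i]) return replicas.get(i).get(key);
        }
        throw new IllegalStateException("no live replica");
    }

    // Simulates the loss of one storage resource.
    void fail(int replica) {
        live[replica] = false;
    }

    public static void main(String[] args) {
        ReplicatedStore store = new ReplicatedStore(3);
        store.put("account:42", "balance=100");
        store.fail(0);
        store.fail(1);
        System.out.println(store.get("account:42")); // balance=100, served by replica 2
    }
}
```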
As you can see, the problems these two categories of products solve are quite different.
Despite their differences, data and computational grids are almost always used together. Indeed, almost any task needs some data to execute, and data grids provide an efficient way to get this data in a distributed environment.
