GridGain In A Glance
Added by architect, last edited by morpheus on Apr 11, 2008


Overview

GridGain is a grid computing product that can be described by three key characteristics: open source, Java and computational grids.

GridGain Characteristics

Open Source

We decided on an open source license from the get-go. Our product is built on other open source software, and we believe that the open source model is ideal for distributed middleware technologies like grid computing: per-CPU licensing would penalize the growth of a grid, significantly reducing the advantage of grid computing as a highly distributed computing model.

Java 5

Not only did we build our product in Java 5, we also made it a natural extension of current Java development methodology: we integrate with Spring and its dependency injection mechanism; we use annotation-based resource injection, a deployment model and a POJO ("Plain Old Java Object") programming model similar to those of JEE; we use AOP-based grid-enabling; and we provide JMX-based management. Note that GridGain is also fully compatible with the recently released Java 6.

Computational Grids

Most grid computing products fall into two categories, data grids and computational grids, which usually differ greatly in how they are designed. Data grids essentially solve the distributed data caching problem (storing large sets of data in a distributed manner for high availability), while computational grids solve the distributed computation problem (executing computations in parallel).

GridGain is a state-of-the-art computational grid product.

What GridGain Can Do

GridGain is a computational grid product. In a nutshell, it allows you to parallelize the execution of a piece of code across a set of computing resources. A computing resource can be a laptop, desktop computer, workstation, rack server, mainframe or any other computing device with a Java 5 or higher compatible JVM. The set of computing resources can be homogeneous (same kind) or heterogeneous (different kinds), and can be located locally, within an enterprise, globally or in any combination of the above.

Note about utility or on-demand computing:

Often utility or on-demand computing can be viewed as a special case of such parallelization. In many cases a utility computing scenario simply assumes that a whole task is moved to a remote grid for execution. Remote grids in such cases often provide fine-grained SLAs (Service Level Agreements), including various services that make computing and other resources available on demand, seamless fault tolerance for hardware failures, disaster recovery, pay-per-use renting, etc.

An important point is that GridGain, and computational grids in general, can be used for a wide variety of tasks applicable to the entire range of businesses, from a small startup to a large corporation.

Grid Computing for Small and Medium Businesses

GridGain can be used for a wide variety of tasks applicable to the entire range of businesses from a small startup to a large corporation.

Business Use Case Example

Almost everybody can imagine the use of grid computing in complex scenarios like oil and gas exploration, financial risk computations, weather forecasting, etc. However, it is just as easy to see how grid computing technology can be used in much simpler scenarios.

Let's take a look at an example that demonstrates this:

Say your business has a well-designed and well-implemented computerized process that takes ~30 seconds to complete. Almost anyone can find such a process in their organization: log analysis, archive search, incoming transaction file processing, compression/decompression, etc. Now, what if you need to web-enable this process to add a new line of service? To make it suitable for the web you need to shrink the time to roughly 5 seconds (the usual "attention span" of web users).

This simple business requirement presents an unexpectedly hard problem: how do you make something that was already well designed run 6 times faster?

You can try the obvious and replace all your hardware with, for example, dual- or quad-CPU machines. But in most cases this is simply impossible for a variety of reasons. Even if it is possible, the performance gain is far from linear. And even if it were linear, a quad-CPU machine can give at most a 4 times performance jump, and we need 6.
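The arithmetic behind this argument can be sketched in a few lines of Java. The sketch below is only an illustration: it uses Amdahl's law, which the text above does not mention, as one common model of why gains are "far from linear", and the 0.9 parallel fraction is an assumed figure, not a measurement. The class and method names are hypothetical.

```java
final class SpeedupEstimate {

    // Speedup needed to shrink a run from currentSeconds to targetSeconds.
    static double required(double currentSeconds, double targetSeconds) {
        return currentSeconds / targetSeconds;
    }

    // Amdahl's law: best-case speedup on `processors` CPUs when only
    // `parallelFraction` (between 0 and 1) of the work can run in parallel.
    static double amdahl(double parallelFraction, int processors) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // The example from the text: 30 seconds must become 5 seconds.
        System.out.println("required speedup:       " + required(30, 5));
        // Perfectly linear scaling on a quad-CPU machine still tops out at 4.
        System.out.println("quad CPU, ideal:        " + amdahl(1.0, 4));
        // 0.9 is an assumed parallel fraction, chosen only for illustration.
        System.out.println("quad CPU, 90% parallel: " + amdahl(0.9, 4));
    }
}
```

Under these assumptions even the ideal quad-CPU case falls short of the required factor of 6, which is the point the paragraph above makes.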

Computational grids are about the only solution available for this problem short of "rip & replace" (which in and of itself may or may not solve the performance problem).

GridGain Presents the Solution

GridGain allows you to split this problem into a number of sub-tasks, execute the sub-tasks in parallel, and aggregate their results to get the final result in a fraction of the time.

Now, several facts are important to note:

  • You can use exactly the same type of hardware you are running your current task on (or even slower).
  • Depending on the task, this solution often scales linearly, so in many cases you would need just 6 processing nodes to get a 6 times performance increase.
  • In many cases the necessary computing resources are already available, as they can be "drawn" from existing desktop computers, servers or mainframes.
  • The overall cost of this solution could be only the cost of grid-enabling the task, since GridGain is a free open source product and additional computing resources can easily be "drawn" from the already existing pool.
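The split, execute and aggregate pattern described above can be sketched with nothing but the JDK. This is not GridGain's API: worker threads in a local `ExecutorService` stand in for grid nodes, and the class name, method name and the sum-of-squares workload are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitAggregateSketch {

    // Splits the inclusive range [from, to] into `parts` sub-ranges, sums the
    // squares of each sub-range on its own worker thread (a stand-in for a
    // grid node), then adds the partial sums together.
    static long sumOfSquares(int from, int to, int parts) {
        ExecutorService workers = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> partials = new ArrayList<Future<Long>>();
            int total = to - from + 1;
            int start = from;
            for (int i = 0; i < parts; i++) {
                // Spread the remainder over the first few sub-tasks.
                int size = total / parts + (i < total % parts ? 1 : 0);
                final int lo = start;
                final int hi = start + size - 1;
                start = hi + 1;
                // Execute step: each sub-task runs independently of the others.
                partials.add(workers.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (long n = lo; n <= hi; n++) {
                            sum += n * n;
                        }
                        return sum;
                    }
                }));
            }
            // Aggregate step: combine the partial results into the final one.
            long result = 0;
            for (Future<Long> partial : partials) {
                result += partial.get();
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        // Six sub-tasks, mirroring the six nodes in the example above.
        System.out.println(sumOfSquares(1, 100, 6)); // prints 338350
    }
}
```

On a real grid the sub-tasks would travel to other machines rather than to local threads, but the shape of the code (split, run independently, combine) stays the same.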

Real-Life Examples

Here are several real-life examples of how GridGain can be used for everyday tasks:

Task: Web-Site Log Analysis
Grid Enabling: These tasks can sometimes take hours depending on how much log data is produced, yet systems can usually be configured to produce separate log files based on size or time, which makes grid-enabling this task rather trivial in GridGain.

Task: "Deep" Searches
Grid Enabling: These searches can cover past history, backups, archives or time-sensitive data. "Deep" searches usually search multiple "locations". In most cases each location can be searched separately, which makes grid-enabling very easy.

Task: Large-File Processing
Grid Enabling: Processing large files (hundreds of thousands of individual records) is something many companies face. Although such processing is usually batch-oriented, operational concerns force many companies to speed it up as much as possible. Processing a file with a large number of records can usually be grid-enabled very easily, either by splitting the file into multiple sub-files or by processing subgroups of records in the same file in parallel.

Task: Complex Build Process
Grid Enabling: Most of us are familiar with Ant-based build processes that take minutes upon minutes to complete. Yet many tasks in an Ant build file could be executed in parallel, if only Ant allowed it. Fortunately, using GridGain you can easily grid-enable Ant and perform your build much faster.
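For the large-file case, the step of dividing records into subgroups might look like the sketch below. It assumes the records have already been read into a list; the `RecordPartitioner` class and its `partition` method are hypothetical helpers, not part of GridGain. Each resulting subgroup could then be handed to a separate grid job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecordPartitioner {

    // Splits `records` into at most `groups` contiguous sub-lists of nearly
    // equal size. Fewer sub-lists are returned when there are fewer records
    // than groups, so no sub-list is ever empty.
    static <T> List<List<T>> partition(List<T> records, int groups) {
        if (groups < 1) {
            throw new IllegalArgumentException("groups must be >= 1");
        }
        List<List<T>> result = new ArrayList<List<T>>();
        int total = records.size();
        int start = 0;
        for (int i = 0; i < groups && start < total; i++) {
            // Spread the remainder over the first few groups.
            int size = total / groups + (i < total % groups ? 1 : 0);
            // Copy the slice so each subgroup is independent of the source list.
            result.add(new ArrayList<T>(records.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10");
        for (List<String> group : partition(records, 3)) {
            System.out.println(group);
        }
    }
}
```

Contiguous subgroups keep the original record order within each group, which matters if results have to be written back in file order.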

What GridGain Cannot Do

There are a number of general misconceptions about grid computing products, and they apply to GridGain as well.

GridGain Is Not:
  • Hardware Virtualization Solution
  • ESB Solution

GridGain Is Not a Hardware Virtualization Solution

Although GridGain can provide a unified programming and computing model over a very diverse set of computing resources, it is not a hardware virtualization solution in the purest sense.

When people say hardware virtualization they usually mean something like VMware or Xen: software that can run multiple operating systems at the same time and in this way virtualizes the underlying hardware from the user's point of view.

GridGain provides virtualization on a different level. It virtualizes a set of potentially very different computing resources by providing a consistent and uniform Java programming model for writing software, whether it is going to run on one computer or on hundreds of them.

GridGain Is Not an ESB Solution

ESBs (Enterprise Service Buses) usually connect multiple computers, providing a unified messaging and service-invocation platform, and are thus often confused with grid computing.

Although with some imagination one can be morphed into the other, in their purest forms these two categories of middleware solutions solve different problems.

Often, if not in most cases, you will see an ESB (or a similar, more specific messaging technology like JMS or TIBCO) used alongside grid computing, with the ESB acting as the medium (or bus) through which grid nodes exchange data.

Note that GridGain provides native integration with Mule ESB, a leading open source ESB solution.

GridGain And C, C++ or .NET

There is a huge amount of software written in C/C++, .NET and Fortran. Interoperability between any grid computing platform and existing libraries in C/C++, Fortran, Win32 and .NET languages is essential.

GridGain relies on the standard Java Native Interface (JNI) for communication with C/C++ or any other language that has a C-compatible binding, such as Fortran, Win32 and .NET-based languages.

In this scenario GridGain plays the role of distribution middleware that is responsible for all aspects of running a task on the grid except for the actual grid job body. Java code has two options for running a non-Java grid job body:

  • Execute non-Java code in-process by calling a native method (assuming the non-Java code has a C-compatible binding). The upside of this approach is that arguments can be passed in the usual Java fashion via method parameters. The significant drawback is that a failure in the native method usually halts (crashes) the JVM.
  • Execute non-Java code in an external process. This option is usually preferable as it shields the JVM process and the GridGain node code from failures of the non-Java code. The cost is that this method requires non-standard parameter passing (such as serializing parameters into command line arguments).

In many real-life situations the second approach works better. Many C/C++ and especially Fortran libraries are designed as (or at least can be run as) individual programs or scripts that naturally take parameters from either a file or the command line. These programs can easily be called from GridGain jobs, and external process invocation shields GridGain from library failures.
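A minimal sketch of the external-process option, using only the standard `ProcessBuilder` class, is shown below. Nothing here is GridGain API: the `ExternalJobSketch` class and its methods are hypothetical, and the JVM's own `java` launcher is used as a stand-in for a real C/C++ or Fortran program simply because it is guaranteed to be present.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExternalJobSketch {

    // Runs an external program, passing its parameters as command line
    // arguments, and returns the exit code. If the program crashes, only the
    // child process dies; the calling JVM just sees a non-zero exit code.
    static int runExternal(String executable, String... params) {
        List<String> command = new ArrayList<String>();
        command.add(executable);
        command.addAll(Arrays.asList(params));
        try {
            Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
            // Drain the output so the child cannot block on a full pipe.
            InputStream output = process.getInputStream();
            byte[] buffer = new byte[4096];
            while (output.read(buffer) != -1) {
                // Output is discarded in this sketch.
            }
            return process.waitFor();
        } catch (IOException e) {
            throw new IllegalStateException("could not run " + executable, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for a native program: the launcher of the currently running JVM.
    static String javaLauncher() {
        return System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
    }

    public static void main(String[] args) {
        int exitCode = runExternal(javaLauncher(), "-version");
        System.out.println("exit code: " + exitCode);
    }
}
```

A real job would typically also capture the program's output (or read a result file) and turn it into the job's return value; this sketch only reports the exit code.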

GridGain And Multi-Core CPUs

Multi-core CPUs are one of the most important recent innovations for grid computing (second, probably, only to high-performance interconnect solutions like InfiniBand).

The raw performance increase per grid node is only part of the advantage multi-core CPUs bring. More important is the fact that a single processing resource can now support a much larger number of processes, and threads per process, without the usual performance degradation due to excessive thread context switching. For example, a quad-CPU server can simultaneously run 4 grid nodes (assuming it has 4 network ports) sharing the memory, disks and other hardware components with virtually no loss in performance compared to 4 individual servers of similar configuration. This alone drastically reduces the cost of a 4-node configuration compared to 4 individual servers, taking into account hardware and administration costs, energy consumption, etc.

It is also important to note that GridGain provides a consistent and uniform programming model whether you run on single-, dual- or quad-CPU servers, making migration from one type to another 100% transparent.

In a nutshell, multi-core CPUs make grid computing even more accessible to small and mid-sized businesses: a 64-node grid will now cost you less than $20K total for hardware and software, a proposition unheard of just 5 years ago.
