News Items for March, 2008

Mar 02, 2008

  2008/03/02

One of the fundamental differences between GridGain's implementation of MapReduce and those in existing or legacy systems such as Sun Grid Engine, GigaSpaces, Hadoop and Globus is the cardinality, or type, of the mapping operation.

In the MapReduce pattern, mapping is the process of splitting the initial task into sub-tasks and assigning them to grid nodes. Mapping generally involves the splitting logic itself, the assignment of sub-tasks to nodes (including load balancing), and potential failover and collision resolution. In the conventional approach, worker nodes pull sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes, and this process is initially controlled by the task. The latter has a fundamental advantage that was largely missing in grid computing frameworks before GridGain:

GridGain's approach of giving the task control over sub-task distribution enables both early and late load balancing algorithms. This helps adapt task execution to the non-deterministic nature of execution on the grid. Without this capability, the range of deployments where optimal performance and scalability can be achieved narrows significantly.

This unique property of GridGain's MapReduce implementation has a profound effect on the ability to develop grid applications with advanced load balancing, failover and collision resolution logic. Let me describe early and late load balancing in detail by walking through the grid task execution sequence in GridGain, where the difference becomes apparent:

  1. Someone calls Grid.execute(...), passing a grid task and its argument, to initiate grid task execution in the system.
  2. Method map(...) is called on the task to perform the initial mapping. This method is responsible for taking a task, splitting it into a number of sub-tasks and mapping every sub-task to one or more grid nodes. It returns a set of {sub-task, node} pairs. This is what we call early load balancing: it is done during the initial mapping operation, using only the information available at the time execution is initiated.
  3. Once mapping is done, the sub-tasks travel to their respective remote nodes for execution.
  4. When a sub-task arrives at the destination grid node, it is subject to collision (scheduling) resolution via the collision SPI. This SPI is called every time a new sub-task arrives, an existing sub-task finishes its execution, or a metrics update is received (with every heartbeat). The collision SPI looks into the queue of its sub-tasks (including a newly received one, if any) and can cancel a sub-task, leave it waiting in the queue, transfer it to another node for execution, or start its execution locally. This is what we call late load balancing. It happens later in the execution process, on the destination node, right where the sub-task is about to be executed. The important characteristic of late load balancing is that there can be a significant time difference between mapping (early load balancing) and the moment execution of the sub-task actually commences on the remote node. Late load balancing makes it possible to account for this non-deterministic aspect of grid execution and, potentially, to re-balance the sub-task on the grid.
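The early load balancing described in step 2 can be illustrated with a minimal, self-contained sketch. The types `NodeInfo`, `Assignment` and `EarlyMapper` are hypothetical stand-ins invented for this example; they are not GridGain API classes, and the "least queued jobs" rule is just one assumed balancing policy. The point is only that the mapping step produces {sub-task, node} pairs from information known at initiation time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a grid node; NOT a GridGain type.
class NodeInfo {
    final String id;
    int queuedJobs; // load as known at mapping time (assumed metric)

    NodeInfo(String id, int queuedJobs) {
        this.id = id;
        this.queuedJobs = queuedJobs;
    }
}

// Hypothetical {sub-task, node} pair produced by the mapping step.
class Assignment {
    final String subTask;
    final String nodeId;

    Assignment(String subTask, String nodeId) {
        this.subTask = subTask;
        this.nodeId = nodeId;
    }
}

class EarlyMapper {
    // Pairs every sub-task with the node that looks least loaded right now.
    // Each assignment bumps the node's expected load so that later
    // sub-tasks in the same mapping pass spread across the grid.
    static List<Assignment> map(List<String> subTasks, List<NodeInfo> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes to map to");
        }
        List<Assignment> pairs = new ArrayList<>();
        for (String subTask : subTasks) {
            NodeInfo target = nodes.get(0);
            for (NodeInfo n : nodes) {
                if (n.queuedJobs < target.queuedJobs) {
                    target = n;
                }
            }
            pairs.add(new Assignment(subTask, target.id));
            target.queuedJobs++;
        }
        return pairs;
    }
}
```

Because the decision uses only a snapshot of node load, it can be stale by the time a sub-task actually starts, which is the gap that late load balancing is meant to cover.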

For example, our job stealing collision SPI does exactly that. It monitors the number of queued sub-tasks on each node and preemptively moves waiting sub-tasks from a "busy" node to an "idle" node for execution.
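The job stealing idea can be sketched in a few lines. This is a simplified, centralized illustration under stated assumptions: the class `JobStealingSketch` and its `rebalance` method are hypothetical and not part of GridGain, and the real collision SPI runs on each node and reacts to metrics updates rather than seeing every queue at once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of job stealing; NOT the GridGain collision SPI.
class JobStealingSketch {
    // Moves one waiting sub-task at a time from the longest queue to the
    // shortest one until queue lengths differ by at most one.
    // Returns the number of sub-tasks that were moved.
    static int rebalance(Map<String, Deque<String>> waitingByNode) {
        int moved = 0;
        while (true) {
            Deque<String> busiest = null;
            Deque<String> idlest = null;
            for (Deque<String> queue : waitingByNode.values()) {
                if (busiest == null || queue.size() > busiest.size()) {
                    busiest = queue;
                }
                if (idlest == null || queue.size() < idlest.size()) {
                    idlest = queue;
                }
            }
            // Stop when there are no nodes or the load is already even enough.
            if (busiest == null || busiest.size() - idlest.size() <= 1) {
                return moved;
            }
            // Steal the sub-task that would otherwise wait the longest.
            idlest.addLast(busiest.pollLast());
            moved++;
        }
    }

    // Small helper to build a queue of waiting sub-tasks.
    static Deque<String> queueOf(String... subTasks) {
        Deque<String> queue = new ArrayDeque<>();
        for (String s : subTasks) {
            queue.addLast(s);
        }
        return queue;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> grid = new LinkedHashMap<>();
        grid.put("busy", queueOf("j1", "j2", "j3", "j4"));
        grid.put("idle", queueOf());
        int moved = rebalance(grid);
        System.out.println(moved + " moved: " + grid);
    }
}
```

Only waiting sub-tasks are moved here; a sub-task that has already started stays where it is, which matches the description of moving waiting work from busy to idle nodes.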

Load balancing capabilities in GridGain are among its more advanced features, and not everyone will need them. For example, in a homogeneous grid with homogeneous tasks, load balancing is achieved naturally. However, in the many cases where conditions are closer to real life, sophisticated load balancing capabilities are about the only way to get the most out of your grid.

Enjoy grid computing!

Posted at 02 Mar @ 11:41 PM by Nikita Ivanov | 0 comments

We've been receiving requests for public SVN access for quite some time. Recently we redesigned how our online resources are presented on the website and put them all on one page. This is the list of all online resources for GridGain that are available to everyone:

  • Wiki Documentation
    That's where you can find all the developer documentation for GridGain: development guide, getting started, installation instructions, etc.
  • Javadoc
    Standard Javadoc documentation for the latest release. We take great care and pride in producing our Javadoc, as most developers use Javadoc as their main source of documentation.
  • JIRA
    Issue and bug tracking system that is open to the public and is also used internally. A great tool to see what bugs and issues we have, what has been closed and what is being worked on right now.
  • SVN Access
    Standard access to our SVN repository in read-only mode. You can see all the latest changes.
  • WebSVN browser
    Default web viewer for our SVN repository. You don't have to check out the entire project to see a certain file, its history of changes or the commit log.
  • Online Forums
    Our online forum is the best place to find answers to all your GridGain questions. If you can't find one, just post there and we or other members will respond promptly.

Enjoy grid computing with GridGain!

Posted at 02 Mar @ 11:43 PM by Nikita Ivanov | 0 comments

Quite often I am asked what the most important feature of GridGain is, or which feature I like the most. My answer usually comes down to one characteristic of GridGain that is unique and often overlooked in grid computing: developer productivity.

The whole idea behind GridGain came from the frustration of working in Java with something like Globus or Sun Grid Engine. These things were so out of touch with modern Java development that using them seemed almost contrived. All the things we've come to expect and appreciate, like lightweight containers, IoC, AOP, convention over configuration, a light deployment process, meta-programming with annotations, and simple and powerful APIs, were "missing in action".

Lack of focus on developer productivity is what largely created the painful stigma that grid computing is a complex and very expensive proposition. In the minds of many mid-level managers, grid computing is still firmly associated with hiring IBM Global Services to help with the "monumental" undertaking of installing and configuring a Globus grid infrastructure...

So, when we started GridGain, our goal #1 was to develop a grid computing infrastructure that would be as productive to use as Spring (or Seam or Grails, if you fast forward to our times). Here's what GridGain 2.0 provides when it comes to developer productivity, that is, your productivity as a software engineer:

  • Zero deployment with p2p class loading
    This is the key feature that boosts developer productivity more than anything else. In a nutshell, when you are writing a grid application with GridGain you don't copy files, you don't run special Ant scripts, you don't FTP anything, you don't start, restart or re-provision anything, and you don't go to a GUI console to do something. You just click "Run" and the code with your latest changes just works across the grid.
  • Many grid nodes per computer or even per JVM
    Let me ask you a question: what other product allows you to run multiple independent grid nodes in parallel on the same computer? How about in the same VM? Can you run 10 nodes in the same VM and debug your complex distributed algorithms without leaving your favorite Java IDE? Yes, with GridGain you can.
  • Convention over configuration
    GridGain works out of the box without a single line of configuration that needs to be changed. And this is not a special toy setup: it is a setup that would work for many production environments and for most development setups too. Moreover, we took great care in thinking through what the default configuration means for each and every one of our SPIs, ensuring that you, the developer, spend your time coding and not fiddling with configuration unless it is absolutely necessary.
  • XML and non-XML configuration
    GridGain uses Spring XML beans as its configuration medium. If you know Spring XML beans, you already know how to configure GridGain. At the same time, you don't have to know Spring at all. GridGain can equally be configured via Java code, as our entire configuration is concentrated in a single interface:
    • You can inject it via Spring, which is the default IoC framework
    • You can inject it via any other IoC container or framework
    • You can configure it in Java without any XML or IoC
  • Strongly typed Java 5 interfaces
    We take strong typing seriously, and GridGain uses parameterized types in many places. That is a great benefit for developers, as you don't need to guess and rely on the runtime or JUnit tests to catch something that any IDE will highlight as you are typing...
  • Best Javadoc you have ever seen
    We take pride in our Javadoc. We know that most of the developers using GridGain look at it every time they need answers about the API, and we wanted to make our Javadoc simple and effective API documentation. It's well organized, has useful UML diagrams for all classes, is cross-linked with our wiki, and has generous coding and configuration examples.

Posted at 02 Mar @ 11:44 PM by Nikita Ivanov | 0 comments

Cloud computing terminology has been rising in popularity lately, and rising rather fast:

Here are some of my thoughts:

  1. I think Amazon first used the word "cloud" in their Elastic Compute Cloud (EC2) offering (I may be wrong about it being first). When I first heard it, it sounded silly but catchy, like the mturk they have. Technically, EC2 is a Xen infrastructure for rent with on-demand provisioning capabilities. Nothing more, nothing less.
  2. I have no idea what cloud computing actually is. I can guess – but I don't care. I still prefer good old Grid Computing as industry-accepted terminology.
  3. The grid computing arena seems to go through a superficial name change every year or two: HPC -> Grid Computing -> Utility or On-Demand Computing -> Virtualization -> Cloud Computing. Not only do these changes spring up in the blogosphere, but sometimes companies change their entire messaging overnight. A case in point is DataSynapse, which changed its message from grid computing to virtualization literally overnight (entire website, PRs, white papers, etc.) about 2 years ago and now seems to be going through the same change again. Comical...
  4. As expected, certain companies claim they invented cloud computing way before we knew about cloud computing. In separate news, Al Gore claims he invented the Internet before... it was invented.
  5. The motives are always clear for companies that chase the latest FUD in naming: they are attempting to differentiate on naming and positioning while struggling to make a difference in the technology or features of the product.

I think the "Cloud" FUD will subside and something else will grab the attention. We already had "swarm computing", so maybe it will be "mist computing" or "puff computing" or even "haze infrastructure".

Posted at 02 Mar @ 11:50 PM by Nikita Ivanov | 0 comments