December 2007
I was at the QCon conference about a month ago, and two of our friends had booths there: Terracotta and GigaSpaces. It occurred to me there that there is one significant philosophical difference between our products (that is, between GridGain on one side and Terracotta and GigaSpaces on the other). Now, these products differ on all levels (just consider their different technological backgrounds: byte-code augmentation and JavaSpaces).
But one difference is rather startling: we at GridGain went to great pains to provide developers with very comprehensive information about grid topology (grid nodes, their logical and physical properties, APIs around them, etc.), while Terracotta and GigaSpaces went to great pains to hide this information from developers. Obviously, in both cases there was a significant rationale behind it.
I'll leave it to the respective vendors to describe their ideas, but we at the GridGain project believe that grid topology information (nodes' identification and their logical and physical properties), with grid nodes as first-class citizens in GridGain that participate in all major APIs, is critical for GridGain's design, its deep features and its power. We simply think that hiding this information from the user abstracts too much. You effectively dumb down your product, leaving it only for rather simple use cases, at least as far as the computational grid aspects go.
See what we do in terms of exposing grid topology at GridGain: http://www.gridgain.org/downloads.hml
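Here is what "topology as a first-class citizen" lets a developer do: choose the nodes for a task by their properties. This minimal sketch does not use the GridGain API. `NodeInfo`, `Topology.select` and the attribute names `os` and `cpus` are hypothetical stand-ins for whatever node metadata a real grid exposes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical, simplified view of a grid node: a unique ID plus named attributes. */
final class NodeInfo {
    final String id;
    private final Map<String, String> attributes;

    NodeInfo(String id, Map<String, String> attributes) {
        this.id = id;
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    /** Returns the attribute value, or null if the node does not report it. */
    String attr(String name) {
        return attributes.get(name);
    }
}

final class Topology {
    /** Returns the IDs of the nodes accepted by the filter, in topology order. */
    static List<String> select(List<NodeInfo> nodes, Predicate<NodeInfo> filter) {
        return nodes.stream().filter(filter).map(n -> n.id).collect(Collectors.toList());
    }

    /** Hypothetical example filter: Linux nodes reporting at least minCpus CPUs. */
    static Predicate<NodeInfo> linuxWithCpus(int minCpus) {
        return n -> "linux".equals(n.attr("os"))
                && n.attr("cpus") != null
                && Integer.parseInt(n.attr("cpus")) >= minCpus;
    }
}
```

Because nodes and their properties are visible, the developer decides where work runs. A product that hides topology has to make that choice on the developer's behalf.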
I heard this idea probably 3-4 years ago, when somebody mentioned to me that the biggest grid in existence is the grid made up of all the cell phones out there. Think about it: every cell phone acts as a grid node. It has certain (albeit minimal) processing power and can easily communicate with any other cell phone in the world peer-to-peer (well, in a practical sense, almost easily). And there is a special class of grid tasks that fits this type of grid perfectly: hyper-parallel grid tasks. These are tasks that can easily be split into thousands or even tens of thousands of small micro-jobs, where each micro-job takes 2-3 seconds to be delivered to the processing device and another 5-10 seconds to be processed and to send an optional result back.
Now, 3-4 years ago there were several obvious problems with this picture:
- Phones were hopelessly underpowered, and the work performed by a grid of hundreds of cell phones could easily be done by a single workstation.
  - Not so today. Phones are quickly getting more and more powerful, and there is a class of tasks that fits naturally into this profile (human image recognition, for example).
- Connectivity was so slow that it basically negated the whole parallelization effect, because the grid would have spent all its time communicating.
  - Not so today. 3G phones are (finally) coming to the US and within 2-3 years will dominate the phone market. With 3G connectivity the problem is basically solved, since you get office-grade connectivity on your cell phone.
- Phones had proprietary system APIs, and developing uniform grid middleware was very hard, if not impossible altogether.
  - Not so today, and that's the key.
The last point is probably the crucial one. Hardware is inevitably going to catch up with demand, and consumers are clearly demanding more processing power and faster communication. Unified APIs, however, were never a priority for vendors. It took Google (with its endless budget and market force) to create unified Java APIs for mobile devices.
And I think this now makes it possible to come back to the idea of ad-hoc swarm grid computing with mobile devices. We at the GridGain project are exploring the possibilities of such integration.
I have been a member of JSR-107 "JCache" for a number of years now. It is one of the oldest JSRs, and activity on it has been very uneven. At times, especially with an influx of new members, the activity level spikes and then quickly subsides. Still, the JCache APIs are not ready.
There are several objective reasons for that:
- Caching technologies are very different. Just look at GigaSpaces, Terracotta or Coherence. It is hard to bridge them, even with an optional API that each vendor could implement in addition to its proprietary one (and what do you do with Terracotta, which has no explicit API at all?)
- Oracle started this JSR, Tangosol's Cameron picked it up and drove it to its current near-completion state, then Oracle acquired Tangosol, and I don't know what the priorities are now...
- A lot has happened in the data and computational grid field, and the industry has simply changed over the last 5-6 years
- JCP is very ineffective when it comes to running a JSR. Done right, it is basically a full-time job, but somehow it is "billed" as some sort of a hobby
Regardless of these objective difficulties, I would like to propose 3 steps to invigorate JSR-107:
- Rename it to the "Java Data Grid" JSR and refocus it on the data grid aspects of caching. Get GigaSpaces and Terracotta more involved (in addition to the existing members)
- Introduce a sister JSR, "Java Compute Grid", to complement the data grid effort. Get Tangosol (Oracle), GridGain, GigaSpaces and Terracotta on board, as well as legacy vendors (if they wish to participate). A clear separation between data and compute grids will give vendors more flexibility in where and how they plug in their own implementations.
- Provide tight integration between the two as a key architectural point. This is critical, since integrating different products for data and computational grids is usually where things get complicated
What do you think? Comments are welcome!
Lately I've had several interesting discussions on how to process massive amounts of data on the grid (specifically, with GridGain). Imagine that you have, say, 100TB of data, either in files (thousands of files on NAS) or in a database (spread over dozens of instances and NAS). Let's say you are storing textual blogs and you need to calculate a tag cloud (i.e. find the 20 most frequent tags in those blogs). What's the best approach?
Now, the general approach is rather obvious. You split the processing of the entire set into separate jobs, where each job finds the 20 most frequent tags in its own subset of the data. You execute these jobs in parallel and then aggregate the results from all of them by finding the 20 most frequent tags in the aggregated set. That's your cloud. (Note that statistically this algorithm is not exact: a tag that is moderately frequent in every subset but never makes any subset's local top 20 gets undercounted or missed entirely. But for tag cloud calculation on a reasonably small grid it is more than enough.)
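The split/aggregate steps above can be sketched on a single JVM, with a thread pool standing in for grid nodes. This is an illustration only. `TagCloud` and its methods are hypothetical names, chunks are in-memory lists rather than files on NAS, and ties between equally frequent tags are broken alphabetically (an assumption the post doesn't make).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

final class TagCloud {
    /** The "job": count tags in one chunk and keep only the chunk-local top n. */
    static Map<String, Integer> localTopN(List<String> tags, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return topN(counts, n);
    }

    /** Highest counts first; ties broken alphabetically so results are deterministic. */
    static Map<String, Integer> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    /** Split: one job per chunk on a thread pool. Aggregate: sum partial counts, take top n again. */
    static Map<String, Integer> compute(List<List<String>> chunks, int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> chunk : chunks) {
                partials.add(pool.submit(() -> localTopN(chunk, n)));
            }
            Map<String, Integer> merged = new HashMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                partial.get().forEach((tag, count) -> merged.merge(tag, count, Integer::sum));
            }
            return topN(merged, n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while aggregating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a tag-counting job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Take `n = 2` with the chunks `[java, grid, java, jsr]`, `[grid, java, cache]` and `[grid, grid]`. The result is `grid=4, java=2`, but the true count for `java` is 3. The second chunk dropped `java` from its local top 2, which is exactly the statistical inaccuracy the approach accepts.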
The key to an effective implementation is to avoid transferring massive amounts of data, i.e. to move processing as close as possible to where the data is stored. That is what we call an affinity split in GridGain.
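An affinity split boils down to grouping the work by where the data lives, so that each job reads only local data. The sketch below assumes two hypothetical inputs: a `fileToHost` map saying which host stores each file, and a list of hosts that run grid nodes. Files on hosts without a grid node fall back to the least-loaded node and have to be read remotely. This illustrates the idea only and is not GridGain's affinity API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AffinitySplit {
    /**
     * Assigns each file to the grid node running on the host that stores it.
     * Files on hosts without a grid node go to the node with the fewest files so far.
     */
    static Map<String, List<String>> assign(Map<String, String> fileToHost, List<String> gridHosts) {
        if (gridHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one grid host");
        }
        Map<String, List<String>> jobs = new TreeMap<>();
        for (String host : gridHosts) {
            jobs.put(host, new ArrayList<>());
        }
        List<String> remote = new ArrayList<>();
        // Iterate in sorted file order so the plan is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(fileToHost).entrySet()) {
            List<String> local = jobs.get(e.getValue());
            if (local != null) {
                local.add(e.getKey());   // co-located: the job reads this file without crossing the network
            } else {
                remote.add(e.getKey());  // no grid node on that host: the file must be read remotely
            }
        }
        for (String file : remote) {
            String target = Collections.min(jobs.keySet(),
                    Comparator.comparingInt(h -> jobs.get(h).size()));
            jobs.get(target).add(file);
        }
        return jobs;
    }
}
```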
If the data is in files, for example, you can run multiple grid instances per core on each box in the grid, since you'll be spending a significant amount of time in network I/O accessing the NAS. So, with 50 modern dual-core blades you can have a grid with 300 nodes in it (6 grid instances per blade, i.e. 3 per core). If a single computer takes 25 minutes to scan the entire set and calculate the tag cloud, our grid should do it in about 5 seconds, assuming the work scales linearly.
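The numbers in the previous paragraph work out as follows. This assumes perfectly linear scaling, with no NAS or network bottleneck and negligible job start-up cost:

$$
N = 50\ \text{blades} \times 6\ \text{instances per blade} = 300\ \text{nodes}
$$

$$
T_{\text{grid}} \approx \frac{T_{\text{single}}}{N} = \frac{25 \times 60\ \text{s}}{300} = \frac{1500\ \text{s}}{300} = 5\ \text{s}
$$

Treat 5 seconds as the ideal case. With 300 nodes reading concurrently, the shared NAS may well become the limiting factor.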
If your data is in a database, things get trickier. Still, depending on the specifics of how your data is stored, you will likely get a linear performance increase. In some cases you can implement the scanning logic in the database itself (like PL/SQL for Oracle), but in all cases your database design can greatly reduce the time needed to find the most frequent words (keep updating frequencies on insert, run regular batch scans, etc.). Again, avoiding data transfers from one physical location to another is the key here. If you have 30 database instances, for example, you may have 10 grid nodes per blade hitting one database instance (if a good portion of the work happens in the database layer). All in all, you should get a pretty much linear performance increase.
As you can see, a lot depends on the specifics of the task. GridGain provides a lot in terms of basic middleware plumbing, including:
- failover (what if a database connection gets lost)
- collision resolution and load balancing (what if one of the blades in your grid is busy with other work)
- dynamic grid topology awareness (which grid nodes are there and what their characteristics are)
- split and aggregate support
- basic data communication, among many other things.

Notice that I didn't mention data grids, which are usually the obvious choice for storing massive amounts of data on the grid. This topic is much talked about, and I wanted to show a simpler example. Fundamentally, the idea of splitting the workload won't change; only the specifics of how the affinity split works would.
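Failover is the simplest of these to illustrate. This minimal sketch assumes that a job is just a function of the node it runs on, and that a failure surfaces as a `RuntimeException` (for example, a lost database connection). `Failover.runWithFailover` is a hypothetical helper, not GridGain's failover SPI.

```java
import java.util.List;
import java.util.function.Function;

final class Failover {
    /**
     * Tries the job on each candidate node in order, up to maxAttempts nodes.
     * A RuntimeException on one node (e.g. a lost database connection) moves the job to the next one.
     */
    static <R> R runWithFailover(List<String> nodes, Function<String, R> job, int maxAttempts) {
        RuntimeException lastFailure = null;
        int attempts = 0;
        for (String node : nodes) {
            if (attempts == maxAttempts) {
                break;
            }
            attempts++;
            try {
                return job.apply(node);
            } catch (RuntimeException e) {
                lastFailure = e; // remember why this node failed, then fail over
            }
        }
        throw new IllegalStateException("job failed on " + attempts + " node(s)", lastFailure);
    }
}
```

A real grid would also choose the next node based on load and topology rather than a fixed list order. That choice is where failover meets load balancing.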
Every four minutes someone around the globe starts GridGain. In and of itself this number is not that impressive, and we have plenty of statistics, from usage to website hits, but this one remains my favorite. We released the first major version of GridGain just four and a half months ago, and today GridGain gets started every four minutes...
For a young project and very specialized software, this is certainly great progress in my view. See for yourself what makes GridGain one of the leading Java grid computing frameworks: http://www.gridgain.org
Let me ask you a question: how much time do you spend on Javadoc development in your project, and how much do you customize it? Nothing at all, or very little... That's usually the answer. The standard Javadoc created by default is good enough for most of us. But Javadoc is where users of your project will spend most of their time when searching for documentation for your software, so it may be prudent to spend a bit more time improving it.
We at GridGain collectively spent weeks, if not months, optimizing and improving our Javadoc: how it looks, how it works, what it shows and how it shows it. I've collected several tips through trial and error as well as from the feedback we get on our project. Here's my list of suggestions (NOT including how to write good Javadoc comments; that's a topic for a separate discussion altogether):
- Fix the nagging "constant scrollbar" issue that is still present in JDK 5. I'm not sure about JDK 6, but I won't be surprised if it is still there. The easiest way to fix it is to post-process all HTML files after the Javadoc is generated (search Google for the specific HTML fix).
- Fix the awful visual design of standard Javadoc: thick lines, excessive underlining, horrible fonts and colors. Open your main website: does your Javadoc have the same color palette, fonts and design cues? It should. Even better, open any of your favorite Web 2.0 websites. Does your Javadoc look as polished and neat? It should. Note that some of these issues cannot be fixed just by changing the existing CSS classes provided by standard Javadoc. In some cases, you will have to embed new CSS classes or styles via post-processing.
- Standard Javadoc is too textual. Try to use a delicate measure of icons and images to provide better visual cues. However, using graphical art requires taste and understanding and usually demands a good grasp of web design basics (color matching, good DHTML skills, picking an appropriate icon collection, etc.).
- Always indicate the version of the product in the header and the footer of the Javadoc. You want to make sure people know which version of your product the Javadoc corresponds to.
- Always indicate the name of the product and the company behind it in the footer (including copyright and an optional license statement). The logo should also be present in the footer.
- Use yFiles (or similar) to automatically provide UML diagrams for your packages, classes and interfaces. This helps greatly in understanding the structure of the component you are trying to describe. Remember: a picture is sometimes worth a thousand words.
- On every page, prominently cross-link all of your online development resources, such as issue tracking, online forums, wiki documentation, etc. These links should be right in the face of anyone looking at the Javadoc. Also, if possible, include search across all of these online resources. Most people don't know the specific location of the information they are looking for and will use search most of the time.
- Create your Javadoc groups correctly so that they form a logical table of contents. Note that they may or may not correspond to the actual packages. Make sure to always provide a short and descriptive package.html. Remember that the first sentence in package.html (up to the first period) goes into the package summary.
- Always mark (with some style of graphical art) the starting point in the Javadoc (usually a group) where users should start looking for documentation when they see the Javadoc for the first time. It is an often forgotten but very important idea. How many times have you opened some Javadoc, faced dozens upon dozens of packages and had no idea where to start? A clearly marked starting point is a great "ice breaker" for Javadoc.
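The post-processing step mentioned in the list can be a small program run after `javadoc` finishes. The post doesn't spell out the scrollbar fix itself, so this sketch shows the general pattern with a different, hypothetical change. It injects an extra stylesheet into every generated page; the stylesheet is assumed to be named `custom.css`, copied into the Javadoc root, and the pages are assumed to be UTF-8. The relative `../` prefix matters because Javadoc writes package pages into nested directories.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JavadocPostProcessor {
    /** Hypothetical stylesheet name; copy the file into the Javadoc root yourself. */
    static final String CSS_FILE = "custom.css";
    private static final Pattern HEAD_END = Pattern.compile("(?i)</head>");

    /** Relative prefix from a page back to the Javadoc root, e.g. "../../" for org/x/Foo.html. */
    static String prefixFor(Path root, Path page) {
        int depth = root.relativize(page).getNameCount() - 1;
        return String.join("", Collections.nCopies(depth, "../"));
    }

    /** Inserts a link to the extra stylesheet just before </head>; pages without a head are left alone. */
    static String patch(String html, String prefix) {
        if (html.contains(CSS_FILE)) {
            return html; // already processed: running the tool twice changes nothing
        }
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            return html;
        }
        String link = "<link rel=\"stylesheet\" type=\"text/css\" href=\"" + prefix + CSS_FILE + "\">\n";
        return html.substring(0, m.start()) + link + html.substring(m.start());
    }

    /** Rewrites every .html file under the Javadoc output directory; returns the number of changed files. */
    static int processTree(Path root) throws IOException {
        List<Path> pages;
        try (Stream<Path> files = Files.walk(root)) {
            pages = files.filter(p -> p.toString().endsWith(".html")).collect(Collectors.toList());
        }
        int changed = 0;
        for (Path page : pages) {
            String before = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            String after = patch(before, prefixFor(root, page));
            if (!after.equals(before)) {
                Files.write(page, after.getBytes(StandardCharsets.UTF_8));
                changed++;
            }
        }
        return changed;
    }
}
```

The same walk-and-patch loop works for any other fix the list calls for, such as adding a version footer or cross-links. Only `patch` changes.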
As an example, check out the GridGain Javadoc and see if these tips make sense.
With the next point release of GridGain we will provide much better support for work stealing resource utilization algorithms on the grid.
Work stealing is not a new concept and has actually been proposed for inclusion in Java 7 (http://gee.cs.oswego.edu/dl/papers/fj.pdf), although only for local threads within a VM. In a distributed environment, however, it becomes a much more complicated issue.
Here's an example of how a work stealing algorithm would work. Imagine you have a grid with 3 nodes, each configured to execute only one grid job at a time. Two of the nodes are each executing one job and have 5 jobs in their local queues awaiting execution (6 jobs total on each node). The 3rd node has just finished executing its last job and has no jobs waiting: it has just been freed up. In this case the work stealing logic should detect the situation in real time. It should grab 2 waiting jobs from each of the two loaded nodes, leaving every node with 4 jobs total, and move them to the freed-up node. The result is even resource utilization across the grid, all pretty much transparent to the developer of the grid job.
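The three-node scenario above can be simulated in a few lines. This is a single-process sketch under simplifying assumptions, not GridGain's work stealing implementation. `SimNode` and `WorkStealing.steal` are hypothetical, and each node has one execution slot. Only waiting (not running) jobs may be moved: the idle node repeatedly takes a job from the tail of the most loaded node's queue until it reaches the grid-wide average.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

/** Hypothetical simulated node: one execution slot plus a local queue of waiting jobs. */
final class SimNode {
    final String name;
    boolean running;                                  // is the single slot busy?
    final Deque<String> waiting = new ArrayDeque<>(); // jobs queued locally

    SimNode(String name) {
        this.name = name;
    }

    int total() {
        return waiting.size() + (running ? 1 : 0);
    }
}

final class WorkStealing {
    /**
     * Called when `thief` runs out of work: it takes waiting (never running) jobs
     * from the most loaded node until it reaches the grid-wide average.
     * Returns the number of jobs moved.
     */
    static int steal(SimNode thief, List<SimNode> grid) {
        int sum = 0;
        for (SimNode node : grid) {
            sum += node.total();
        }
        int target = sum / grid.size(); // even share per node, rounded down
        int moved = 0;
        while (thief.total() < target) {
            SimNode victim = Collections.max(grid, Comparator.comparingInt(SimNode::total));
            if (victim == thief || victim.waiting.isEmpty() || victim.total() <= target) {
                break; // nothing sensible left to steal
            }
            thief.waiting.addLast(victim.waiting.pollLast()); // steal from the tail of the victim's queue
            moved++;
        }
        return moved;
    }
}
```

Starting from 6, 6 and 0 jobs, the idle node takes two jobs from each loaded node, leaving 4, 4 and 4. In a real grid the hard parts are the ones this sketch skips: detecting the idle node in real time, moving jobs over the network, and coping with jobs that finish or fail while they are being moved.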
Ok, we omitted plenty of interesting details and made a lot of assumptions, but the example should be self-evident. Algorithms like this are very important, and what makes them even more complicated is that they can rarely be implemented in a generic fashion. We at GridGain spent weeks upon weeks designing APIs that combine substantial out-of-the-box value and functionality with extensibility and configurability. The goal is that complex resource management scenarios like work stealing can be easily supported and easily customized.
As a side note, in the end it fit perfectly into our SPI-driven design. It once again proved the worth of the approach we took almost 2 years ago, at the outset of GridGain product development. We (and developers) can change the behavior of our product almost beyond recognition, without any changes to the core or developer APIs, simply by providing different implementations of existing SPIs. You've seen this level of customization in Java IDEs like Eclipse; now you have it in open source grid computing middleware.
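The SPI idea can be shown in miniature: the core depends only on an interface, and behavior changes by swapping implementations. `LoadBalancingSpi`, the two implementations and `MiniGrid` are hypothetical names chosen for illustration. GridGain's actual SPIs are richer and differ in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical SPI: decides which node runs the next job. */
interface LoadBalancingSpi {
    String pickNode(List<String> nodes, Map<String, Integer> load);
}

/** Implementation 1: plain round robin, ignores load. */
final class RoundRobinSpi implements LoadBalancingSpi {
    private int next;

    @Override
    public String pickNode(List<String> nodes, Map<String, Integer> load) {
        return nodes.get(Math.floorMod(next++, nodes.size()));
    }
}

/** Implementation 2: the node with the fewest jobs wins (first one in list order on ties). */
final class LeastLoadedSpi implements LoadBalancingSpi {
    @Override
    public String pickNode(List<String> nodes, Map<String, Integer> load) {
        return Collections.min(nodes, Comparator.comparingInt(n -> load.getOrDefault(n, 0)));
    }
}

/** The "core": its code never changes, only the SPI instance handed to it. */
final class MiniGrid {
    private final List<String> nodes;
    private final Map<String, Integer> load;
    private final LoadBalancingSpi balancer;

    MiniGrid(List<String> nodes, Map<String, Integer> initialLoad, LoadBalancingSpi balancer) {
        this.nodes = new ArrayList<>(nodes);
        this.load = new HashMap<>(initialLoad);
        this.balancer = balancer;
    }

    /** Routes one job; the routing code is identical no matter which SPI is plugged in. */
    String submit() {
        String node = balancer.pickNode(nodes, load);
        load.merge(node, 1, Integer::sum); // record the new job against the chosen node
        return node;
    }
}
```

Suppose the starting load is `n1=5, n2=0, n3=2`. The round-robin grid sends three jobs to `n1`, `n2` and `n3`, while the least-loaded grid sends all three to `n2`. That is a different behavior with no change to `MiniGrid`.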
There is general, widespread confusion about how grid computing, and specifically the split/aggregate technique, can help the scalability and performance of enterprise applications. Subconsciously we can all feel that grid computing improves scalability and reduces latencies, but few of us can clearly articulate how...
Let's first define what scalability and performance generally mean. Scalability means the ability to process more transactions without increasing overall system response time. Performance, or transaction latency, generally means how fast the system can process a single transaction under a certain load.
It's important to note that low latency doesn't imply better scalability, and highly scalable systems can have horrible performance. Each particular system will exhibit its own pattern of how these two metrics correlate.
What is Split/Aggregate? As I mentioned in an earlier post, there's significant confusion over the difference between Master/Worker and Split/Aggregate. Although the difference may seem minor, I think it's important nonetheless. The Split/Aggregate design pattern, unlike Master/Worker, concentrates on the notion of splitting data or computation into multiple sub-elements rather than on the notion of remote execution.
There are two major types of grid computing: data grids and computational grids. Data grids split (partition) data, and computational grids split (parallelize) processing. When you use data partitioning exclusively, you can usually only achieve better scalability. Why? Because with partitioning you can process more transactions in the same period of time, since each grid node processes its own subset of the data in parallel. But if the processing logic remains unchanged, you can't significantly improve performance, i.e. how fast a single transaction gets processed.
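A minimal sketch of data partitioning helps show why it buys throughput but not latency. It assumes simple hash-modulo placement; real data grids typically use more elaborate schemes such as consistent hashing. Every key has exactly one owner, so more nodes means more transactions processed side by side, but any single transaction still runs on one node at the same speed. `Partitioner` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Partitioner {
    /** Owner partition of a key; floorMod keeps negative hash codes in range. */
    static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Groups keys by owning partition; each owner processes only its own group, in parallel with the others. */
    static Map<Integer, List<String>> split(Collection<String> keys, int partitions) {
        Map<Integer, List<String>> owners = new TreeMap<>();
        for (String key : keys) {
            owners.computeIfAbsent(partition(key, partitions), p -> new ArrayList<>()).add(key);
        }
        return owners;
    }
}
```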
That's where computational grids come into play. Computational grids provide the capability to split, or parallelize, the execution of a single transaction, leading to dramatically improved performance and reduced latencies. What's more, computational grids usually lead to better scalability as well, since you can now fine-tune data partitioning and resource utilization.
Data and computational grids approach their tasks very differently, and their designs usually vary dramatically. Yet you will need both if you want to achieve the real grid effect for your application.
In GridGain we provide what we believe is the best computational grid framework for Java. But we've chosen not to re-implement data grids and instead to integrate with the leading solutions: Coherence, GigaSpaces and JBoss Cache. Most of the time no integration is really required, as data grids are substantially different in design and can easily be used alongside computational grids.
I stopped by QCon last week to listen to the scalability and performance panel. There were guys from GigaSpaces, Terracotta, 3Tera and Tangosol (Oracle), as well as from eBay and Orbitz, to mix some real-life customers into the vendor list.
Well, the talk was predictably boring. Every vendor (with some exceptions) pontificated on every thought they had had lately about scalability, latency and performance, routinely mixing it up with plugs for their products. The messages were rather long, inconsistent and confusing to a certain degree (I'll give it to GigaSpaces's Nati for the ability to lose me several times over the course of one of the longest answers). I'm pretty sure that people in the audience were no closer to answering this rather important question after the panel than they were before.
I was surprised that no one had a simple, concise and clearly articulated message (and that's what we strive for at GridGain): if you need scalability and performance, split your workload and parallelize it. That's all!
The closest we got to it was when a guy from eBay (notice, not a vendor!) said that they have a motto in the company: "if you can't split it, you can't scale it"...
Now, many will say that this is an oversimplification and that there are many concerns that need to be addressed. And I say: rubbish. There are always going to be many concerns to address, no matter what you do in software. But fundamentally, splitting and parallelizing your workload is enough of an architectural decision to gain scalability and performance. And it is simple to understand and usually very simple to implement, given the right tools.
Yes, you will usually need to co-locate the data and the processing logic (affinity splitting, sticky connections, etc.). And yes, you will need to split the processing logic as well (and not only the data) to reduce latency. But all of these are tactical decisions that will be driven by your specific requirements.
My humble advice is to, frankly, stop listening to vendors (including me) and start applying common sense to what you do in your software. If anything, listen to and learn from real-life practitioners like eBay, Google and Amazon on the topic of scalability and performance, as they have more relevance to your work than most middleware vendors. And remember what Antoine de Saint-Exupéry once said: "A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away".
We are working on integration between GridGain and the Amazon EC2 (Elastic Compute Cloud) computing infrastructure. Amazon EC2 offers an exciting mix of on-demand infrastructure and a very cheap price point that makes a small grid available even to individual developers for just a few hundred dollars per month. We'll provide not just an image that you can upload to Amazon EC2 but very deep, native integration. Your Amazon EC2 images will become a natural part of your grid, and you will keep the on-demand flavor of the Amazon EC2 infrastructure.
This work is still in the design phase, but a few obvious extension points are the topology and load balancing SPIs. Our SPI-based architecture is helping our product more and more with every release: it provides the kind of painless integration that we can keep adding incrementally, release after release.
Stay tuned for more news!
Ok, what I want to do today is give you a real-life chronicle of grid computing usage. And just like the demos in my presentations, this is a genuinely real-life example.
In demos, I don't show code in PowerPoint and don't open Eclipse or IDEA full of pre-built source code. I open a bare Eclipse, create an empty project, type in the code and show how simple grid computing can really be. There are no cut-outs or paste-ins; you always see everything it takes to build a simple grid application.
So, today I will chronicle the real life example of how you can apply grid computing in your life.
In the recently released GridGain 1.6 we are shipping native integration with JUnit 3/4. It basically allows you to run JUnit tests on the grid with very minimal modifications to your code while preserving the full semantics of a local run. You just click the Run button, watch your green bar go and get all the log output in the console window as if you were running all the tests locally. They just run faster, much faster...
So, we were preparing this integration, writing and debugging code, and one day everything basically worked. Dmitriy called me with an idea for testing it further: an obvious and natural way to prove that it actually does what we claim and to see how easy it is to migrate to the grid. Let's migrate some of our main JUnit tests (several hundred) to run on the grid. It's a great test not only of the JUnit integration but also of how well our message of simplicity and advanced features holds up...
It was about 8pm.
The idea looked natural and fun, so we agreed in a second and decided to time it just for the fun of it. At about 9:30pm that evening I logged in to our data center provider's site and spent 15 minutes ordering 2 more boxes for this test (Dual Core 2.13, 2GB RAM, Fedora Core 6). It cost us $220/month for the two boxes plus a $500 one-time setup fee. That's the entire hardware cost.
The next day around lunchtime I got an email from the data center saying the boxes were ready, with FTP and ssh accounts attached. Our engineer spent another 30 minutes installing the latest JDK and GridGain 1.6 on both machines.
So, at this point we had collectively spent about an hour of time and ~$720 for the first month including setup fees. We had 3 boxes for our JUnit tests (we already had one, on which we were running Bamboo for our builds).
Now the fun began. We asked one of our engineers, who hadn't been working on the JUnit integration code, to take our test suites (we use JUnit 3 and have several hundred tests) and migrate most of them. Some of our tests can't be migrated, since they test the JUnit distribution functionality itself. He started in the morning around 10am Moscow time... Around 11:45am the same day I got a message on IM: "Nikita, basically works, all pass. I don't use Bamboo box - splitting on two new ones only. We got down to 25 mins from ~40". I also verified it by glancing at our Bamboo build graph: the last bar was roughly half the size of previous runs.
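The post doesn't say how the tests were divided between the two new boxes. One illustrative way to do such a split is the classic greedy "longest processing time first" heuristic: sort test classes by expected duration and always give the next one to the least busy box. This is an assumption for illustration, not how GridGain's JUnit integration works, and the durations in the check below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TestSplitter {
    /** Greedy LPT: longest test class first, each to the box with the least total time so far. */
    static List<List<String>> split(Map<String, Integer> secondsByTest, int boxes) {
        List<List<String>> plan = new ArrayList<>();
        for (int i = 0; i < boxes; i++) {
            plan.add(new ArrayList<>());
        }
        int[] totals = new int[boxes];
        List<Map.Entry<String, Integer>> tests = new ArrayList<>(secondsByTest.entrySet());
        // Longest first; ties broken by name so the plan is deterministic.
        tests.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        for (Map.Entry<String, Integer> test : tests) {
            int best = 0;
            for (int b = 1; b < boxes; b++) {
                if (totals[b] < totals[best]) {
                    best = b;
                }
            }
            plan.get(best).add(test.getKey());
            totals[best] += test.getValue();
        }
        return plan;
    }

    /** Wall-clock time of a plan: the total of the slowest box. */
    static int makespan(List<List<String>> plan, Map<String, Integer> secondsByTest) {
        int worst = 0;
        for (List<String> box : plan) {
            int sum = 0;
            for (String test : box) {
                sum += secondsByTest.get(test);
            }
            worst = Math.max(worst, sum);
        }
        return worst;
    }
}
```

Take hypothetical durations A=300, B=240, C=200, D=160 and E=100 seconds on 2 boxes. The plan is [A, D] (460 s) and [B, C, E] (540 s), so the suite finishes in 540 s instead of 1000 s run serially. The slowest box always sets the wall-clock time.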
So, here we go: $720 and about 2 hours lighter, we've got our tests running almost twice as fast. We cut in half the time it takes to see integration issues and fix build problems, and we improved our visibility into the project's state. What's even more remarkable is that we can spend another $220/month and get our tests to run in close to 10 minutes, making the whole project 4 times more agile than originally. With such an improvement in the project's dynamics we can now adjust our processes. For example, we can allow last-minute commits and fixes, because we can retest the entire product end to end in about 10 minutes flat. Or we can provide patches to our enterprise support clients in less than an hour, including fixing, testing, building and sending.
Now, that's powerful.
I love the economics of this example. This is exactly where I see the value of the new wave of grid computing products. Sure, you can use GridGain for all the traditional heavy uses (and most people do), but it is the small things, like running JUnit tests, that really underscore GridGain's strength.
I almost put this blog post under the offbeat topic, as I am going to share some of my personal thoughts about seed funding, investment overall and our project's growth.
We have had several serious conversations lately about seed or 1st round funding for the GridGain project. The GridGain project has been operationally cash positive from day one, and we don't really need money to continue development at the current pace. However, the business side of the project has been growing much faster than we can keep up with, and that is where additional investment can be beneficial.
Seed or first round funding is generally the most expensive in terms of the valuation of the business. Valuing any business at that early stage is really a guessing game. At the same time, a seed round is usually needed to complete a working prototype or proof of concept. We, however, have successfully launched a product on the market with a rapidly growing user base, and we have enough resources to run a decent marketing campaign to increase awareness of our project.
So, in general I was always against any seed funding, and the GridGain project was bootstrapped in the best traditions of shoestring startups. However, as an open source project we really depend on a large community of users, and in order to create and sustain this community we need more resources. Look at Mule and Spring: they never lacked market awareness, but nonetheless took substantial investments lately to further their businesses. This can be a good indicator for us.
We are still debating whether or not to pursue funding, and at this stage it almost entirely depends on who we would be dealing with on the other side. It is my strong opinion that at the early stage of a business, the non-monetary assets that investors bring to the boardroom are the most important ones. We'll see how it goes...
I came across an interesting research paper by Doug Lea that describes a proposed new feature for Java 7 as part of JSR-166y (http://jcp.org/en/jsr/detail?id=166): a fork/join framework in the java.util.concurrent package or, as it is better known in the grid world, Map/Reduce or Split/Aggregate.
The proposed feature only deals with execution on the local VM, basically providing better support for multicore CPUs. It's obviously not a grid in a technical sense, but it is interesting to see grid computing ideas getting into mainstream Java tooling such as the JDK. That will expose the idea to millions of developers and will certainly help clear the fog of complexity around grid computing in general.
We at GridGain are really excited about this. First of all, it closely mimics our API as far as splitting and aggregation go, and it is quite trivial to implement the proposed APIs using GridGain and gain traditional distributed grid computing capabilities (and we'll try to do that in some point release). Second, we were going to provide "local split" in the near future for better utilization of modern multicore CPUs, and it's nice that Java 7 is going in the same direction. But we'll be able to provide it for Java 5 and up.
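For comparison, here is what split/aggregate looks like with the fork/join framework that eventually shipped in Java 7 (`java.util.concurrent.ForkJoinPool` and `RecursiveTask`). The sketch counts occurrences of a tag by recursively halving the array: `fork()` is the split, and `join()` is the aggregate. `CountTask` and the threshold of 1,000 elements are illustrative choices, and `ForkJoinPool.commonPool()` requires Java 8 or later.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Counts elements equal to `tag` in tags[from, to) by recursive splitting. */
final class CountTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, just loop sequentially

    private final String[] tags;
    private final String tag;
    private final int from;
    private final int to;

    CountTask(String[] tags, String tag, int from, int to) {
        this.tags = tags;
        this.tag = tag;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long count = 0;
            for (int i = from; i < to; i++) {
                if (tag.equals(tags[i])) {
                    count++;
                }
            }
            return count;
        }
        int mid = (from + to) >>> 1;
        CountTask left = new CountTask(tags, tag, from, mid);
        left.fork();                                              // split: left half runs asynchronously
        long right = new CountTask(tags, tag, mid, to).compute(); // right half in the current thread
        return right + left.join();                               // aggregate the two partial counts
    }

    static long count(String[] tags, String tag) {
        return ForkJoinPool.commonPool().invoke(new CountTask(tags, tag, 0, tags.length));
    }
}
```

The shape matches the grid version: only the transport differs. Fork/join splits across threads of one VM, while a grid splits across VMs on different machines.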
Overall, I think there is a growing realization that parallel processing, and grid computing specifically, will become a more and more important aspect of software engineering as we struggle to reconcile Moore's law with the ever-increasing demand for processing power.
This is a rather rare blog post for me, in which I'm going to talk about the success we have had as a project.
GridGain had its major release in July 2007, less than 4 months ago. Since then we have had 2 point releases with a lot of fixes and a couple of new integrations and features, notably JUnit integration. Some may argue that JUnit integration is one of the coolest features we have in GridGain, and I would not disagree: nothing comes close to the value/cost ratio this integration provides.
We've had over 4000 downloads, and our product is actively used by more than 400 users and companies worldwide. The numbers are relatively small, but they are growing faster every month, and we are a project just roughly 4 months old with almost zero budget for marketing or sales. We blog about our product, we give talks at conferences and we nurture our first users on our support forums with great care. The way I see our support forums, people are basically giving us ideas and testing for free by trying a new and, frankly, unproven product. We at GridGain are sincerely grateful to all our early adopters. We are all engineers at heart regardless of our titles, and nothing gives us more satisfaction than seeing someone use our product and like it.
We are still a small project with a small company behind it. But we have big ideas and plans. We think the potential for grid computing is huge, and our focus on small to mid-size businesses is proving to be fruitful. In the last 6 months we've seen people using our product in ways we could not have imagined. This clearly shows the power of grid computing once it is cleared of over-complication and over-generalization. When you distill an idea to its very core and remove all the gunk of legacy mistakes and approaches, you get clean and elegant software that is a pleasure to use.
The biggest adrenaline rush we get as a team is when we receive a forum post or an email from someone who decided to drop us a line just to say how much he or she loves our product... We are getting 2-3 emails like that per week now. I think each such email re-energizes the whole team for the next week. Working 15 hours a day for months on end, we certainly appreciate this energy boost.
We just released a nice screencast (http://www.gridgain.com/screencasts.html) for our JUnit integration with GridGain. In about 10 minutes you can see all it takes to write a simple test suite and run it on the grid with basically one simple annotation. Take a look (http://www.gridgain.com/screencasts.html)!
In the last several weeks we presented on grid computing and GridGain at two conferences: Colorado Software Summit 2007 and Silicon Valley Code Camp 2007. These two conferences are very different, but nonetheless we are getting similar feedback from both audiences, and very positive feedback at that. There is genuine interest in grid computing and in a product that can deliver a crisp and unambiguous implementation of this concept.
The biggest kick though we are getting from our implementation of peer class loading (similar feature can be found in Jini/JavaSpaces but less powerful). It allows the developer to bypass any deployment steps when working with GridGain. Think about it: you change the code, you click "Run" button in any IDE and your just modified code works on the grid without a single Ant script, starting, stopping, copying or anything else - exactly as if you were developing non-distributed local application. What's even more, if you want to debug your grid application locally you can start as many grid nodes in the same VM and debug everything locally. Now, that's powerful.
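GridGain's peer class loading internals aren't described in this post. The sketch below is therefore only a rough illustration of the underlying JVM idea, under stated assumptions. A node that receives a class's bytecode from a peer can define that class locally with a custom `ClassLoader`, with no deployment step. Here the "peer" is simulated by reading the bytes from the local classpath. `fetchBytesFromPeer`, `PeerClassLoader` and `Greeting` are hypothetical names, and the code assumes a modern JDK (9+).

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Constructor;
import java.util.function.Supplier;

// Hypothetical job class whose bytecode is "shipped" to another node.
class Greeting implements Supplier<String> {
    @Override
    public String get() {
        return "loaded by " + getClass().getClassLoader().getClass().getSimpleName();
    }
}

public class PeerClassLoadingSketch {

    // Simulated peer: a real grid would request these bytes over the network from
    // the node that originated the task; here they are read from the classpath.
    static byte[] fetchBytesFromPeer(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader origin = PeerClassLoadingSketch.class.getClassLoader();
        try (InputStream in = origin.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Peer has no class " + className);
            }
            return in.readAllBytes();
        }
    }

    // Defines one named class from peer-supplied bytes; everything else is delegated as usual.
    static class PeerClassLoader extends ClassLoader {
        private final String peerClassName;

        PeerClassLoader(ClassLoader parent, String peerClassName) {
            super(parent);
            this.peerClassName = peerClassName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(peerClassName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        byte[] bytes = fetchBytesFromPeer(name);
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Loads Greeting through the peer loader (no deployment step) and runs it.
    public static String runWithPeerLoadedClass() {
        String name = Greeting.class.getName();
        ClassLoader loader = new PeerClassLoader(PeerClassLoadingSketch.class.getClassLoader(), name);
        try {
            Class<?> cls = loader.loadClass(name);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            // The peer-loaded Greeting lives in a different runtime package, so its
            // package-private constructor must be made accessible explicitly.
            ctor.setAccessible(true);
            @SuppressWarnings("unchecked")
            Supplier<String> job = (Supplier<String>) ctor.newInstance();
            return job.get();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithPeerLoadedClass()); // loaded by PeerClassLoader
    }
}
```

The shared `Supplier` interface is resolved through the parent loader, so the caller can use the peer-loaded class without knowing where its bytecode came from.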
We have been asked about LGPL vs. other open-source licenses a few times before. Our usual answer was that we like LGPL and that we are basically as free as JBoss (if you can use JBoss, you can use us). At GridGain we chose LGPL from the very beginning, 2 or 3 years ago, and we still believe it was the right choice. LGPL is community friendly and business friendly, and it gives us some minimal protection as well. Indeed, after you have invested over a million dollars in product development and are giving it away to the community for free, you want at least minimally reasonable protection, and LGPL provides it.
A week or so ago I saw a post from James Strachan on our forums. He liked our product and wanted to provide some integration between GridGain and Apache Camel/ActiveMQ. Since the ASF expressly prohibits the use of LGPL software in Apache projects, he could not do it.
We had thought about Apache licensing before, mainly, of course, because Spring is Apache-licensed. But we just could not come up with a reasonable scheme. Well, thanks to James, we now have an elegant dual-licensing scheme with Apache 2.0. What James suggested in his post was to extract just the APIs (basically interfaces, enumerations, annotations and exceptions), put them into a separate JAR and license this JAR under the Apache 2.0 license. This way, James argued, Apache projects can link against the GridGain API binaries and build all sorts of SPI implementations, while we keep the implementation details under LGPL. It is an elegant and wonderfully simple idea. Thanks, James!
We were in the midst of the GridGain 1.6 release, with all the new work we've done to provide JUnit grid integration and many other enhancements, so I was naturally a bit worried about introducing this build change just days before going GA. But I decided to bite the bullet. I added an @Apache20LicenseCompatible annotation and marked with it every interface, enumeration, annotation and exception that we wanted to dual-license. Then I changed the Ant build to create a separate JAR that includes only the elements carrying this annotation. All in all, 45 minutes of work including all the testing. Done!
Apache 2.0 dual-licensing is now part of GridGain 1.6. Go ahead, download it and enjoy grid computing with GridGain (now with an Apache 2.0 license too).
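The post doesn't show how the build picks out the annotated elements. Here is a minimal sketch of that selection step. It assumes the annotation is retained at runtime so the choice can be made with reflection; the real build was Ant-based and may well have worked differently. `SampleApi`, `SampleStatus`, `SampleImpl` and `apiJarEntries` are hypothetical names, and only the annotation name comes from the post.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.stream.Collectors;

// Marker named in the post; RUNTIME retention and TYPE target are assumptions for this sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Apache20LicenseCompatible {}

// Hypothetical API elements: these would go into the Apache 2.0 JAR.
@Apache20LicenseCompatible
interface SampleApi {
    String execute(String arg);
}

@Apache20LicenseCompatible
enum SampleStatus { RUNNING, FINISHED }

// Hypothetical implementation class: stays LGPL-only.
class SampleImpl implements SampleApi {
    @Override
    public String execute(String arg) {
        return arg.toUpperCase();
    }
}

public class ApiJarFilter {

    // Returns the .class entry names that belong in the separate API JAR.
    public static List<String> apiJarEntries(List<Class<?>> compiled) {
        return compiled.stream()
                .filter(c -> c.isAnnotationPresent(Apache20LicenseCompatible.class))
                .map(c -> c.getName().replace('.', '/') + ".class")
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Class<?>> compiled = List.of(SampleApi.class, SampleStatus.class, SampleImpl.class);
        // In the default package this prints [SampleApi.class, SampleStatus.class]
        System.out.println(apiJarEntries(compiled));
    }
}
```

Keeping the decision in an annotation means the API/implementation boundary lives in the source next to each type, rather than in a hand-maintained file list in the build.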
There has been a lot of chatter on blogs lately about the Master/Worker pattern. It has come primarily from various grid vendors (and now I'm adding to the overall noise with my blog here).
Coming from grid vendors, it is surely a sign of either grasping for air or a basic misunderstanding, and I don't get it. Many people, in my opinion, are simply confusing Master/Worker (a.k.a. Client/Server for those who are old enough to remember the early nineties) with the Map/Reduce or Split/Aggregate patterns.
So, what is Master/Worker or Client/Server? In its basic definition, it is a design pattern in which a master sends a task to one or more workers, and the workers execute the task and send the results back to the master. The key here is that the task is executed by a different entity, the worker, at the request of the master. Curiously, it looks a lot like good old RPC, and that is exactly what most grid vendors (except for GridGain) are providing: an RPC.
What I think Master/Worker aficionados are missing is that the key to compute grids is the ability to split a task into logical sub-tasks, execute them in parallel and aggregate the results back. The word split is the key here, not the fact that sub-tasks will somehow get onto remote nodes. That is why in the grid world we have the Map/Reduce and Split/Aggregate patterns, which make the splitting decision explicit, because it is the key to achieving a performance increase on the grid.
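To make the distinction concrete, here is a plain-JDK sketch of split/aggregate. It is not GridGain's API. The task is explicitly split into one sub-job per word, the sub-jobs run in parallel on a thread pool that stands in for grid nodes, and the partial results are aggregated. `split` and `countCharacters` are hypothetical names. A Master/Worker call, by contrast, would ship the whole phrase to a single worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitAggregateSketch {

    // Split: the explicit decision of how the task is divided into sub-jobs.
    public static List<Callable<Integer>> split(String phrase) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            jobs.add(() -> word.length()); // each sub-job counts one word's characters
        }
        return jobs;
    }

    // Execute sub-jobs in parallel (a thread pool stands in for grid nodes), then aggregate.
    public static int countCharacters(String phrase, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(split(phrase))) {
                total += result.get(); // aggregate step
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countCharacters("Hello Grid Enabled World", 4)); // 21
    }
}
```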
Now, I've heard several times that you can "stretch" Master/Worker to include splitting as well. In some abstract case I guess you can. But it is not as simple as it seems: a lot goes into the design to properly support split and aggregate logic. And even then, why create all this confusion? The fact that your product doesn't have natural support for splitting is not a good reason...
Take a look at GridGain and enjoy an elegant Map/Reduce implementation. Decide for yourself if proper Map/Reduce is close to Client/Server.
I just could not resist this topic.
I've read Dan Cirulli's blog (from Digipede, a commercial .NET grid computing framework and a very nice one), and this Windows-only product still... doesn't support MS Vista. That's just startling to me.
GridGain, a Java-centric grid computing framework, has supported Windows Vista since its first release (and there was nothing special about supporting it: we just tested it and made sure everything worked as expected). Is it easier to build Vista-compatible products in Java than in .NET? I don't know; we don't actively develop in .NET anymore.
In my presentations I usually tell a real-life story about the power of Java's cross-platform capabilities. Unlike GUIs, where Java is not 100% reliably cross-platform, the server side is pretty much solid, and here's the story.
When we released GridGain 1.0, we didn't have formal support for Mac OS X. Put simply, nobody on the team had a Mac laptop or an iMac. We tested on Fedora, Ubuntu, Solaris, Windows XP/Vista, HP-UX, etc., but not on Mac OS X. So we released, but I didn't feel comfortable leaving the many Mac OS X diehards unable to use GridGain (even the very first version). And I knew that the number of people using Mac OS X for Java development was growing fast and simply could not be ignored anymore.
So I went to eBay and bought a decent iMac. It arrived, I unpacked it and installed Java 5 and GridGain 1.0. Then I ran all of our roughly 500 JUnit tests on it. Everything... worked... perfectly... the first time! I re-ran them again and again; something just had to be broken... You can't just run a complex middleware system with all its functional and scalability tests on a new OS without something breaking?!?
Well, to make a long story short we never had any problems with Mac OS X.
I have taken a few days off from blogging. We had a major release, GridGain 1.6, and I had some traveling to do. I've got plenty of interesting topics to discuss and a number of exciting news items to report... I'll be getting back into the groove gradually, though.
Right now I'm enjoying the beautiful scenery of the Rockies and the wonderful Colorado Software Summit 2007 conference, where I'm presenting this week. The conference is top-notch: organization, speakers, venue, food; just about everything has a professional touch. It also has the feel of a tight-knit community, which is always great. Kudos to the organizers.
Our demonstrations are getting a great reception. "Java Grid Computing With AOP" is the leader of my two presentations in terms of the feedback I'm getting. The other presentation, a comparative analysis of three open source grid computing projects, is rather heavy lifting, as I need to keep folks entertained through 90 minutes of pure speaking... I'll try to spice it up a little for the following days.
This Saturday I will be presenting at Silicon Valley Code Camp 2007, right at home in Silicon Valley. No travel this time: I will be giving a talk on "Java Grid Computing with AOP", and I always look forward to talking about our project. If you are around the Bay Area this weekend, stop by SVCC 2007. The list of sessions is really exciting, and you certainly get a feel of being on the cutting edge there.
When will grid computing reach its Tipping Point and become contagious? Or will it ever? We have seen a slow start, an unexpected pick-up and rapid growth with JEE frameworks (Spring), JEE application servers (JBoss) and ESBs (Mule). It is fair to say that each of these examples initially spread exclusively through a word-of-mouth (i.e. blogging) epidemic. Each of these technologies went through a set of well-known adoption stages: from innovators, to early adopters, and finally to a broader mass audience and "plateau" adoption.
For grid computing this is an open question, and one on which I certainly have a strong bias.
I do believe that grid computing today is still used largely by innovators. Innovators are the ones who constantly try new things, love to experiment with bleeding-edge technologies and are prepared for uncertain results and a potential windfall. Products that existed up until a year or two ago required a lot of technological faith and significant commitment, with questionable returns (commercial returns, anyway), and that's where innovators find both their advantage and their disadvantage. It is interesting to note that being an innovator does not mean being a big, established company or a stealth-mode startup: both qualify, as does pretty much everything in between, because this stage is driven primarily by individuals, by people who understand technology best and like to take risks.
Speaking of innovators, many consider academia the default innovator stage for a technology. I don't believe so. In the academic world many things are tried, and it usually has nothing to do with taking risks, expecting uncertain results and hoping for a windfall. It has more to do with genuine, broad research and interest in different things.
In the last couple of years grid computing technology has started inching towards early adopters, a group of users that is much larger than the innovators and more conservative, but still considered cutting edge by the overall majority. This dynamic can be directly attributed to a new generation of grid computing products such as GridGain, GigaSpaces, Terracotta and Digipede. Early adopters have a critical advantage over innovators: they have the innovators' experience. In every industry and every segment there are companies that consider IT one of their key weapons against competitors. These companies can be big or small, young or old, but you will always see new stuff being tried and used there. It is a much more calculated risk than the one taken by innovators, but still a risk.
It is during this stage, the early adopters stage, that the quality of the products, among many other factors, starts to matter. Innovators can bear pretty much anything as long as the base idea is sound and the product minimally works. Innovators usually don't have time pressure or expectations about the usability or stability of the product. Things change rather drastically for early adopters. I would go as far as saying that the quality of the product is the key to a successful transition to early adopters. Usability, features, stability, overall quality, documentation: all the things that define a software product now matter, and matter progressively more as more people start using the product.
This is also the time in the product life-cycle when you cultivate the strongest communicators, or mavens, for your product. It is simply impossible to induce the necessary passion and excitement about a product or technology without a sense of overall quality and "craftsmanship". And without communicators, the small group of people who can push your product over the tipping point, it will never see rapid growth and adoption.
Not every technology or product can spread by word of mouth and see rapid, epidemic-like growth. In fact, very few ever will. In the next couple of years we'll see whether grid computing as a technology has what it takes to become an epidemic in IT.
We recently got an interesting question on our forum from one of our users. The question relates to a use case where a task gets split into a very large number of jobs (thousands) and each job returns a significant payload. The GridTask#result(...) callback method provides the set of all previously received job results, and if the number of these results is large and each result's payload is significant in size, one can easily run out of memory (even on a high-end box).
Now, this is an interesting problem. GridGain internally maintains a list of all received job results to simplify this work for the developer, who doesn't need to do it manually. The downside, of course, at least in the current version, is that the developer doesn't have control over this process: the system just silently accumulates those results and provides them to the developer with each callback.
One way to handle the situation where the total size of all results coming back from a split exceeds the memory limit is to store the actual payload in external storage (for example, via checkpoints or directly in a file system/RDBMS) and return only keys as the job results. This drastically reduces the amount of data passed over the network and stored in memory on the receiving node. During aggregation the developer then uses those keys to retrieve the data.
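A minimal sketch of that "return only the key" approach, using plain Java rather than GridGain's checkpoint API. A `ConcurrentHashMap` stands in for the shared storage (checkpoints, shared file system or RDBMS in a real deployment). `runJob`, `aggregate` and `SHARED_STORE` are hypothetical names. Because everything here runs in one JVM, the stand-in store itself holds all payloads. The point is only that the aggregating side receives small keys and fetches one payload at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PayloadByKeySketch {

    // Stand-in for shared storage (checkpoints, shared file system, RDBMS, ...).
    static final Map<String, byte[]> SHARED_STORE = new ConcurrentHashMap<>();

    // Job side: produce a large payload, park it in shared storage, return only the key.
    public static String runJob(int jobId, int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        Arrays.fill(payload, (byte) 1);
        String key = "job-" + jobId + "-" + UUID.randomUUID();
        SHARED_STORE.put(key, payload);
        return key; // the only thing that would travel back to the task's node
    }

    // Aggregation side: fetch payloads by key one at a time and clean up after each.
    public static long aggregate(List<String> keys) {
        long sum = 0;
        for (String key : keys) {
            byte[] payload = SHARED_STORE.remove(key);
            for (byte b : payload) {
                sum += b;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(runJob(i, 1_000_000));
        }
        System.out.println(aggregate(keys)); // 10000000
    }
}
```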
In GridGain 1.6 we are introducing a special annotation, @GridTaskNoResultCache, that you can attach to a task to disable the caching of received job results. This way you can further limit memory consumption in the above use case. One usage pattern would be to store every received job result right away in some data grid (e.g., a named distributed cache) or external storage, and during aggregation retrieve the stored job results and, from them, the actual payload. This is a two-step process, but it keeps the node's memory consumption to a minimum.
As a side note, I would like to point out that one of the key rules in dealing with massive data payloads in grids is to... never send them over the network. Networks, even local gigabit ones, are very inefficient at sending large payloads from node to node compared to shared resources like shared file systems, distributed caches or even an RDBMS: store the payload on a shared resource and send only the key.
Lately I have been involved in a very interesting project that employs SEDA, the Staged Event-Driven Architecture. It gave me an opportunity to learn about SEDA, which is becoming very popular in the ETL world (Spring has a SEDA-based ETL project in the works, for example), and to see its pros and cons in detail. What is SEDA? SEDA is the product of very interesting PhD work by Matt Welsh. In a nutshell, with SEDA you divide or partition your workload into a series of stages connected by queues (a pipeline), where each stage performs certain processing on the data it receives from its upstream parent stages and then sends it further downstream to the next stage or stages.
One of the original problems SEDA helps to solve is that in real life stages are never homogeneous in what they do, and if one stage is slower than the others, it naturally becomes a bottleneck, starving all downstream stages and wasting their resources. Moreover, the bottleneck also leads to excessive queuing and buffering of incoming messages.
The key ingredient SEDA provides to solve this problem is the ability to communicate feedback between a stage and its upstream senders, so-called "backpressure". With this communication link a stage can ask upstream stages to "slow down", creating a compound effect that balances the load across the entire pipeline.
Still, even with basic SEDA in place, the limiting factor of this architecture is that overall processing goes only as fast as the slowest stage. SEDA basically ensures that the pipeline is loaded as heavily as the slowest stage can handle without being overloaded. In many cases this is not enough.
In real-life situations SEDA mostly deals with balancing memory consumption and DB activity. It doesn't directly deal with CPU load or provide a way to push performance beyond what an optimally loaded pipeline achieves. That's why I think SEDA and traditional grid computing are very complementary technologies: while SEDA provides a nice architectural basis for ETL processes, grid computing provides the necessary brute force for those stages that need a performance increase through parallelization.
Now, with both SEDA and grid computing in place, you can create a truly scalable architecture. Not only can you balance passive or dependent resources like memory, DB or I/O activity, but you can balance CPU load as well. For example, GridGain provides a highly efficient and elegant API for splitting a task into sub-tasks for parallel processing that can easily react to "backpressure" communication from SEDA and, for example, increase or decrease the number of splits to "speed up" or "slow down" the parallelized processing.
Needless to say, with this approach you also remove the main limitation of SEDA (being only as fast as the slowest stage), since you can increase the performance of that stage through parallelization.
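This sketch uses the JDK's `ExecutorCompletionService`, not the GridGain annotation. It shows the idea behind disabling the result cache: each job result is folded into a running aggregate as soon as it arrives and then dropped, instead of being kept in an ever-growing list. `sumAsResultsArrive` and the simulated payloads are hypothetical. One caveat applies: results that finish before they are taken still wait in the completion queue.

```java
import java.util.Arrays;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NoResultCacheSketch {

    // Folds each job result into a running total as soon as it arrives,
    // instead of keeping a list of all results around for every callback.
    public static long sumAsResultsArrive(int jobs, int workers, int payloadLength) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CompletionService<long[]> completion = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 0; i < jobs; i++) {
                final long value = i;
                completion.submit(() -> {
                    long[] payload = new long[payloadLength]; // simulated large job result
                    Arrays.fill(payload, value);
                    return payload;
                });
            }
            long total = 0;
            for (int i = 0; i < jobs; i++) {
                long[] payload = completion.take().get(); // next finished job, in completion order
                for (long v : payload) {
                    total += v;
                }
                // No reference to payload is kept, so it can be garbage-collected right away.
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumAsResultsArrive(100, 4, 10_000)); // 49500000
    }
}
```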
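A minimal sketch of staged processing with backpressure. It does not reproduce SEDA's actual admission-control machinery. Backpressure here takes its simplest form: a bounded `BlockingQueue` between stages, whose `put()` blocks when the downstream stage falls behind. That way a slow sink throttles everything upstream instead of letting messages pile up. The stage names and the upper-casing "processing" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SedaBackpressureSketch {

    // Three stages connected by bounded queues: source -> transform -> slow sink.
    // Optional.empty() is used as the end-of-stream marker.
    public static List<String> run(List<String> input, int queueCapacity, long sinkDelayMillis) {
        BlockingQueue<Optional<String>> toTransform = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<Optional<String>> toSink = new ArrayBlockingQueue<>(queueCapacity);
        List<String> output = Collections.synchronizedList(new ArrayList<>());

        Thread transform = new Thread(() -> {
            try {
                while (true) {
                    Optional<String> item = toTransform.take();
                    toSink.put(item.map(String::toUpperCase)); // blocks if the sink is behind
                    if (!item.isPresent()) {
                        return; // end marker forwarded downstream
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sink = new Thread(() -> {
            try {
                while (true) {
                    Optional<String> item = toSink.take();
                    if (!item.isPresent()) {
                        return;
                    }
                    Thread.sleep(sinkDelayMillis); // deliberately the slowest stage
                    output.add(item.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        transform.start();
        sink.start();
        try {
            for (String s : input) {
                // put() blocks while toTransform is full: backpressure reaches the source.
                toTransform.put(Optional.of(s));
            }
            toTransform.put(Optional.empty());
            transform.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(output);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c", "d", "e"), 2, 10)); // [A, B, C, D, E]
    }
}
```

With a capacity of 2, at most a handful of items are ever buffered, however fast the source is. That bounded buffering is the behavior the text describes.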
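A minimal sketch of a stage that turns a backpressure signal into a split count. The post doesn't specify the policy, so the linear rule here is an assumption: the fuller the downstream queue, the fewer parallel sub-jobs are produced, and vice versa. `splitCount` and `split` are hypothetical names, not GridGain API.

```java
import java.util.ArrayList;
import java.util.List;

public class AdaptiveSplitSketch {

    // Backpressure signal -> number of splits (assumed linear policy, capacity > 0):
    // an idle downstream queue means "speed up", a full one means "slow down".
    public static int splitCount(int downstreamQueued, int downstreamCapacity, int maxSplits) {
        int queued = Math.max(0, Math.min(downstreamQueued, downstreamCapacity));
        int free = downstreamCapacity - queued;
        return Math.max(1, maxSplits * free / downstreamCapacity);
    }

    // Splits a batch into roughly equal sub-lists, one per parallel sub-job.
    public static <T> List<List<T>> split(List<T> batch, int parts) {
        int n = Math.max(1, Math.min(parts, batch.size()));
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int from = i * batch.size() / n;
            int to = (i + 1) * batch.size() / n;
            chunks.add(batch.subList(from, to));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> batch = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(splitCount(0, 100, 16));   // 16: downstream idle, speed up
        System.out.println(splitCount(50, 100, 16));  // 8
        System.out.println(splitCount(100, 100, 16)); // 1: downstream full, slow down
        System.out.println(split(batch, splitCount(50, 100, 16)).size()); // 8
    }
}
```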
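A small worked example of the "only as fast as the slowest stage" point. It assumes each stage's capacity scales linearly with the number of parallel workers given to it, which is an idealization that ignores coordination overhead. The pipeline's throughput is the minimum over stages of workers × per-worker rate. Parallelizing the bottleneck stage raises that minimum until some other stage becomes the limit. The rates are made-up numbers for illustration.

```java
public class PipelineThroughputSketch {

    // Throughput of a pipeline is bounded by its slowest stage:
    // stage i can process at most parallelism[i] * perWorkerRate[i] items per second.
    public static double throughput(double[] perWorkerRate, int[] parallelism) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < perWorkerRate.length; i++) {
            min = Math.min(min, perWorkerRate[i] * parallelism[i]);
        }
        return min;
    }

    public static void main(String[] args) {
        double[] rates = {100.0, 20.0, 50.0}; // hypothetical items/sec for one worker per stage
        System.out.println(throughput(rates, new int[] {1, 1, 1})); // 20.0: stage 2 is the bottleneck
        System.out.println(throughput(rates, new int[] {1, 4, 1})); // 50.0: stage 2 now does 80, stage 3 limits
    }
}
```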
Louis Cheskin was born in Ukraine and immigrated to America in the 1920s. For anyone who has studied marketing, the name Cheskin is immediately associated with... packaging. Cheskin was the first marketing practitioner who fully understood the value of packaging and how it is perceived, and he defined "sensual transitivity", or in layman's terms, "we really do care about the packaging of the product". The company Cheskin founded some 60 years ago (outside of San Francisco, in Redwood Shores, and still carrying his name) is a household name in branding and packaging marketing to this day.
Cheskin put his name firmly on the map by solving the "margarine problem". In the late 1930s margarine wasn't selling well. Various reports and studies indicated that this product had nowhere to go but down. Cheskin was hired to understand the main reasons margarine was not selling while butter, for example, was selling particularly well.
Through a number of focus group studies (a novel idea in its own right) and research into the matter, he made the following recommendations:
- make margarine yellow so it would look like real butter
- package it into fancy foil (foil was a sign of a quality back then)
- name it "Imperial Margarine" to give it a more upscale sense
It is startling to note that there is absolutely nothing about improving the quality, taste or anything else specific to the product (except for the color). Well, to make a long story short, margarine sales rebounded soundly and Cheskin never looked back.
Fast forward to our time.
I like to give a similar concept a simpler name, perception, because it adds a time component to instant sensual transitivity: in many cases perception is built over time. I often say "perception is everything". Let me give you an example. Say you have a software product in a competitive field, and this product has a comprehensive UI as a significant component (e.g., a Java application server with a Web-based administration console). I would claim that if your UI looks particularly unappealing from a visual perspective (style, graphic art, colors, overall visual design; notice, nothing about its functionality), this perception will undoubtedly spill over to the entire product, even if the product has nothing to do with visuals. That's what I call perception. And in a highly competitive market this can be enough to tank sales.
This "spill", or sensual transitivity, happens on a subconscious level and often cannot be explained logically or through reasoning, yet these snap, rapid cognitions are extremely hard to overcome and are thus very important. The Honda Civic is not the cheapest, prettiest, most reliable, most stylish, most powerful or most feature-rich car, but if you look at Honda's sales numbers, you will clearly understand the power of the perception this car has built over the years.
How often have you stumbled upon a website, documentation, Javadoc, screenshot, presentation, etc. that just doesn't look professional enough, and had the immediate gut reaction that the stuff "behind" it is probably shady too? It happens to me a lot. I have observed this reaction several times with open source projects on sourceforge.net. They all feel the same, with a cookie-cutter look that comes across as rather cheap and unappealing, and a bunch of flashing ads on every page. It creates a strong perception that a project on sourceforge.net cannot be innovative, unique or in any way distinguished, which is a very damning characterization for software. However, simply by creating a decent 5-10 page website for your project with next to no investment, you can avoid most of this perception problem and concentrate on the project itself.
It all sounds simple and rather obvious. And it is, for the many who understand and care about perception. If you are an engineer, an open source project founder or an aspiring entrepreneur, you just cannot afford to ignore perception; you need to start building a positive perception to augment your overall product.
See www.gridgain.org for an attempt to build a positive perception of one grid computing product in particular. At GridGain we pay attention to all the details, starting with the website and pictures and ending with code structure and the grammar and punctuation of comments (we run a grammar checker on all comments). Just look at our Javadoc: we've worked days and nights to make it easier to read and easier to navigate between Javadoc, wiki, JIRA and online forums, and we included a UML diagram for each class and interface to ease understanding of the static composition. We synchronized the colors of our website, wiki, JIRA and forums and customized their UI appearance. Little things, but they give overall continuity to our various web assets. All in all, these small things help us create a positive perception of the product.
I have known for quite some time that things weren't going well at all at ActiveGrid: the product was stagnating, there was no news, etc. Now top management has quietly been replaced (gone are Peter Yared and Jeff Veis), and there is a new product direction (they bought an Ajax company for its Dojo tool): near-death, VC-driven maneuvering...
Back in 2004 they were flying really high. They got $3M in VC funding just as the VC market started to wake up, and the idea was fresh: combining grid computing with emerging Web 2.0 infrastructure/applications. They also pitched themselves as a commercial open-source company (it worked very well back then, I guess).
From the get-go the product seemed rather strange: a grid server (some kind of transactional middleware, which is already an odd combination of grid and transactions) and a GUI-based application builder. Now, when I see an infrastructure company building GUI development tools, I really cringe: haven't we seen the BEA Workshop disaster? I mean, why do these people think that Java developers will just abandon Eclipse or IDEA for Java development? Anyway...
In 2005-2006 ActiveGrid turned away from Java and switched to LAMP, which was, I guess, a cool thing at the time, and they became "Enterprise LAMP", going lightweight by dropping Java (which probably meant heavy). A good idea in general, but a horrible business move in my estimation, which proved to be true.
Now, in September 2007, ActiveGrid announced the acquisition of TurboAjax (pretty much a one-man operation). TurboAjax is a nice Ajax GUI builder for the Dojo framework (the same people contribute to both projects). Where does this lead ActiveGrid? I don't know. One thing I know for sure: away from grid computing...
Oddly enough, Peter Yared, now the former CEO of ActiveGrid, is working on his next Web 2.0 mashups, etc. venture, called "wdgtbldr.com" (I had to copy and paste it to make sure I got it right). Well, good luck.
Recently, companies like DataSynapse, United Devices (now with Univa) and some others moved from being grid computing companies to... application virtualization companies (in the case of DataSynapse it happened literally overnight: website updated, promos updated, PR issued, paid articles published).
The reason for this move is classic: these businesses gave in to younger companies in the grid computing area with products that are simpler and cheaper. Among these companies are GridGain, Terracotta, GigaSpaces, Tangosol and a number of open source projects.
Why did it happen? With cheaper and simpler products hitting the market, it became increasingly hard to grow the business and maintain margins for companies like DataSynapse that have a significant run-rate, obligations and expectations, so they had to move on to the "next level" and distance themselves from upstart companies to retain their growth rate. It is the classic Innovator's Dilemma that can be found in almost any business.
DataSynapse, United Devices and Platform Computing simply had no choice if they wanted to survive. It's not that their products were particularly bad or overly expensive; they just got hit by the natural evolution of the marketplace: Disruptive Technology. Grid computing is becoming commoditized, you need a completely different business model to operate in a commoditized market, and it is almost impossible to slim a business down to the level necessary to compete with up-and-coming younger companies developing a disruptive technology.
Look at what happened to BEA Systems, for example, a company I really admired during the late nineties. They were making billions (literally) selling a J2EE application server and were considered an example of business and technological excellence, but as the JBoss project and business matured and gained popularity, they had to move up to "liquid computing". In that transition they almost folded, lost practically all their reputation in the developer community, lost almost all their bright people, and I have no idea how they are making ends meet today. Yet JBoss was just a tiny speck, practically a dozen-person operation, compared to billion-dollar BEA Systems. That's Disruptive Technology in full force.
Market forces are never instantaneous; they have a lot of inertia built in. They are like a tsunami that takes hours to reach the shore and is almost undetectable to the naked eye at sea, but when it hits the shore, the damage is usually catastrophic.
I'm not going to discuss the merits of Test-Driven Development (TDD), whether or not you really write tests first, and how often. I think we can all agree that unit testing (as part of an overall, much larger testing strategy) is a key component of any serious software project and probably the most important aspect of any Agile project, where frequent commits, a build per commit and instant feedback from the build are the key ingredients.
Lately I'm seeing (figuratively speaking) more and more projects that have hundreds upon hundreds of unit tests. The usual environment for these tests has developers running some subset of the unit tests locally on their workstations as they develop, while the build happens on some juiced-up server where all JUnits are run every time a build is made.
What I still see is that people run builds only at night or just a few times per day. The reason? "The freaking thing takes X hours to complete(!!)" is the usual response. Teams that have only nightly or otherwise infrequent builds are highly inefficient:
- It takes a whole day of commits before they can even spot an integration problem or a problem in some subsystem that nobody cared to run tests for
- As a consequence, fixing the problem by definition takes more than one calendar day, and often the actual place (and the troublesome code revision) is hidden by subsequent commits
- Fixing a broken build becomes torture, as you need to wait the same X hours to confirm whether or not you fixed the build
- Visibility into the project is delayed by at least a day
- Employing off-shore teams gets ugly, as they are often given a broken build or produce one, depending on which side you are looking from
But what can you do? "It takes Ant about 10 minutes to build the branch and then the JUnits run for 5 hours straight. Thank God if nothing is broken, otherwise we have to wait another 5 hours for a re-run..." is the usual sentiment. Think about it: how often have you heard this explanation...
One thing you can do rather quickly (in a matter of hours in most cases) is to grid-enable your JUnits and run those unit tests that can run concurrently in parallel on the grid. I would go on record saying that over 90% of your JUnits would usually qualify to run concurrently. That means you have an embarrassingly parallel problem on your hands, and products like GridGain can easily solve it for you.
With the upcoming release of GridGain 1.6 you can grid-enable your JUnits literally in minutes (see my blog post from Monday). Install GridGain on as many nodes as you like (simple unzipping takes less than a minute per node), then replace your test suite class with ours. In most cases that's all there is to it. Note that JUnit 3 and JUnit 4 will be supported out of the box.
After GridGain 1.6 is out I will come out with a screencast that shows all these simple steps. Meanwhile, you can download GridGain 1.5 and enjoy grid computing in its original, intended form.
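Why parallel tests pay off, with simple arithmetic: if the 5 hours of JUnits from the quote above were spread evenly over 10 nodes, the ideal would be about 30 minutes. That figure ignores overhead and the tests that must stay serial. The sketch below is not GridGain's JUnit integration; it only shows the embarrassingly parallel core with plain JDK classes. Independent tests, modeled as hypothetical `Runnable`s rather than real JUnit classes, are submitted to a thread pool that stands in for grid nodes, and failures are collected by name. It assumes the tests share no mutable state, which is exactly the property that makes them safe to run concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTestRunnerSketch {

    // Runs independent tests concurrently and returns the names of failed ones, sorted.
    public static List<String> runAll(Map<String, Runnable> tests, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            Map<String, Future<?>> running = new TreeMap<>();
            for (Map.Entry<String, Runnable> test : tests.entrySet()) {
                running.put(test.getKey(), pool.submit(test.getValue()));
            }
            List<String> failed = new ArrayList<>();
            for (Map.Entry<String, Future<?>> test : running.entrySet()) {
                try {
                    test.getValue().get(); // waits for this test to finish
                } catch (ExecutionException e) {
                    // An AssertionError (or any exception) thrown by the test surfaces here.
                    failed.add(test.getKey());
                }
            }
            return failed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> tests = new HashMap<>();
        tests.put("testMax", () -> { if (Math.max(3, 7) != 7) throw new AssertionError("max"); });
        tests.put("testBroken", () -> { throw new AssertionError("expected failure"); });
        tests.put("testSlow", () -> {
            try {
                Thread.sleep(100); // simulates a long-running but independent test
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(runAll(tests, 4)); // [testBroken]
    }
}
```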
I have recently read the book "Blink". It is a fascinating book about the power of rapid snap decisions, or what's called "rapid cognition". We usually discard these often seemingly intuitive, fleeting and subconscious decisions because we don't believe they can be correct: we didn't spend enough time analyzing the problem, we didn't look deep enough, we didn't do any of the things we usually associate with a well-thought-out and soundly made decision. Yet, as Blink's author shows, a snap, unconscious decision, or thin slicing, can often be the best decision to make: quick and correct.
As I was reading this book, it occurred to me that I often do thin slicing when I'm looking at someone's code. I do it a lot (be it code at work, in our project, in open source projects, examples, blogs, etc.), and I usually make up my mind about the author of that particular code in maybe the first 20-30 seconds of looking at it.
This has all the familiar characteristics of rapid cognition:
- It's quick
- Most of the time I'm strongly convinced about my decision
- I cannot easily (or cannot at all) explain why I came to this conclusion, yet somehow I strongly believe it's right
Third point is the most important one in my opinion. Subconscious decisions are hard explain; they happen behind locked door of our conscious, yet they are not random or spontaneous. Our brain works extremely fast using things like previous associations and experiences, and often subconsciously we make a decision long before we can properly articulate it in a traditional way of logic and reason. Again, in many situations we reach the same conclusion just much later...
After reading the book I tried to understand how I reach such a strong conclusion about the author's professional level by just spending less than one minute of looking at his or her code.
When I look at the software source code everything in the code carries intricate small pieces of information to me: naming and naming convention, overall consistency, punctuation and grammar in comments, indentation, code structure and data structure, etc. All of them are not equal in their relative weight and I usually omit actual correctness of the code - but all of them create tapestry like picture of the code.
Each of these characteristics can be carefully analyzed independently and over significant period of time one can come up with definitive mathematical score - but somehow our brain is capable of reaching the same conclusion in just a few short glances...
I picked up the idea of comparing human performance to CPU performance from another blog. I think it is a great and fresh way to look at developer productivity in general: what kinds of tradeoffs can we make between the two? Any at all? Should human performance always trump CPU performance? Or the other way around?
Common sense and past experience will probably tell us that human performance should almost always be the deciding factor. Indeed, we moved from assembly language to C and C++, and then to Java, losing raw CPU performance all the way but gaining immensely in human performance while building more and more complex software systems. The fact that we can build these complex systems at all proves that the tradeoff was right.
Another obvious observation is that Moore's law helps a lot in compensating for whatever we lose by using higher-level abstractions or more productive development tools instead of extracting every last bit of hardware juice.
Market trends tell us a similar story. Look at Spring and how its singular vision of simplified enterprise Java development (compared primarily to the clunky EJB2 model) led it to its current popularity. Look at EJB3 vs. EJB2 for a similar story.
Our own project is based on the same idea. Grid computing, like no other software technology, almost prides itself on its current complexity and clunkiness. Although the ideas behind it are rather simple and old, it has historically been marred by overgeneralized implementations and the sting of academia.
A new generation of grid computing products, such as GridGain and Terracotta, is emerging that puts simple and productive usability - the human performance - above any other concern. Judging by our past experience, it should be a good tradeoff...
The "Java Grid Computing with AOP" vendor presentation will be shown this Friday at the 8th IEEE/ACM International Conference on Grid Computing (Grid 2007) (http://www.grid2007.org/). We are really excited about this conference, as it is focused exclusively on grid computing. There will be plenty of fine folks from academia and business, and the talks look very interesting. This year the conference is hosted in Austin, Texas, which should be a fun place to visit.
This post was inspired by a funny year-old montage of Steve Ballmer's intriguing performance at a Microsoft developer conference.
As funny as it is (and Ballmer's dancing is truly startling), the montage actually hits an important point: "developers, developers, developers" could really be the most important words for Microsoft, or for many other software technology companies.
I was once at a presentation on startup marketing where the speaker made a very simple but powerful observation: when you are building your software, always visualize the customer you are building it for. It may sound like a cliché, but it really does help to put a real face on the mythical customer...
At GridGain our customer is a developer. If you develop in Java and you care about the performance of your application, you are our potential customer (we don't sell software - it's open source and free - but nonetheless). Just as the Ruby on Rails committers develop for themselves, we develop GridGain for us, for developers. We have practically no marketing push for this feature or that feature. We try to see how we would use GridGain ourselves and make it fit that use in the best way.
Of course, we are not alone in this. A good example is JIRA/Wiki/Bamboo from Atlassian. Why do they fit so well, and why do developers love them? Because they're developed by the same developers who use them, not by marketing types dictating which features have to be in. Yet another example is distributed JUnit. The tests for the GridGain project take upwards of an hour every time Bamboo kicks in (and it kicks in for each commit). There are several frameworks for distributing JUnit tests, but none of them is seriously acceptable. We need it - so we are building it right now. It's already working in the lab; wait and see how simple and transparent it will be. It will truly revolutionize the way we run tests forever...
Obviously, not all software can be developed this way. But when it can, it's the best way.
Posted at 09:57AM Sep 15, 2007 by Nikita Ivanov in Technology
You may not realize it, but you already have it on your desktop... Rapid application development (RAD) tools are essential for quick prototyping, piloting and further development of software systems. That's why things like Flex, Silverlight and .NET are so productive.
With GridGain you don't even have to learn a new tool: your Eclipse, IDEA or NetBeans is a RAD tool for grid computing. Most people who are evaluating or using GridGain don't pay much attention to the fact that GridGain allows you to run multiple grid nodes on the same computer or even in the same VM.
Running multiple grid nodes in the same VM is an enormous productivity boost. You can easily test multi-node configurations in isolation (for example, session-based algorithms where jobs exchange session attributes or synchronize with each other). You can conveniently debug your multi-node setup within the comfortable confines of your favorite Java IDE - no more awkward remote debugging or scanning of endless log files.
Other things that people take for granted are the simple deployment model and peer class loading. Unlike practically any other grid computing solution out there, GridGain lets you deploy a grid task with one line of code (say, in your test), and once this task travels to other nodes for execution (local or remote), they will peer-class-load it from the node it was deployed on - leaving you, the developer, to simply work as if everything you do is local.
Rapid application development for the grid is a reality. With GridGain you can quickly prototype, develop and test applications on the grid.
That's what the WTF entry on Technorati says about "GridGain". I think it describes it somewhat accurately, although the similarities are rather superficial. GridGain is all about doing for grid computing what Spring and JBoss accomplished for J2EE just a few short years ago. Hm... there goes another "is like" comparison.
While we are at it, let me put it yet another way: "With GridGain we've taken Saint-Exupéry's principle quite literally" - perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.
Have fun and enjoy grid computing!
I frequently hear that our users can't achieve the ideal split: basically, splitting the task across a homogeneous grid and having every grid node complete its sub-task in the same amount of time (give or take a small deviation).
It may look rather obvious for a homogeneous grid and embarrassingly parallel problems - but this simplicity can be deceiving. Here's a good story.
I once wrote a very simple grid example that counted the prime numbers in a given range. I used an optimized, non-trivial algorithm but still managed to fit it into ~50 lines of code - indeed a very simple example. I had two identical PCs and I was up and running in about an hour (including all the debugging, etc.).
For small ranges everything worked fine; I was getting a 2x speedup. However, the bigger the range I picked, the bigger the discrepancy I got. I checked and re-checked the code and everything seemed fine. But for large ranges I was sometimes getting a ~15-20% difference in timing between the two nodes!
I spent the next day searching for a bug in our kernel. I traced everything and, although GridGain's guts are complex, for this trivial example I could not find anything wrong.
To make a long story short, the answer was very revealing. My grid task took a range, divided it into a number of equal sub-ranges (one per available grid node), and each grid node searched for primes in its sub-range. What I missed is the simple fact that prime numbers are not distributed uniformly, so some sub-ranges have significantly more or fewer primes than others. And the performance of my optimized prime-searching algorithm depended heavily on how many primes there were in the range.
That taught me a good lesson: in a distributed parallel processing environment like grid computing, even simple tasks can pose a challenge.
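To make the imbalance concrete, here is a minimal sketch (not the original ~50-line example, whose optimized algorithm isn't shown) that splits a range into equal sub-ranges, one per hypothetical node, and counts the primes in each. It uses plain trial division, which is an assumption chosen for brevity.

```java
import java.util.Arrays;

/**
 * Sketch: split [from, to) into equal-width sub-ranges, one per hypothetical
 * grid node, and count the primes in each sub-range.
 * Trial division is used for simplicity; it is not the post's optimized algorithm.
 */
public class PrimeSplit {

    // Simple trial-division primality test.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Splits [from, to) into `parts` equal-width sub-ranges; the last one absorbs any remainder.
    static long[][] split(long from, long to, int parts) {
        long width = (to - from) / parts;
        long[][] ranges = new long[parts][];
        for (int i = 0; i < parts; i++) {
            long lo = from + i * width;
            long hi = (i == parts - 1) ? to : lo + width;
            ranges[i] = new long[] {lo, hi};
        }
        return ranges;
    }

    // Counts primes in the half-open range [lo, hi).
    static int countPrimes(long lo, long hi) {
        int count = 0;
        for (long n = lo; n < hi; n++) {
            if (isPrime(n)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two "nodes", one equal-width sub-range each.
        for (long[] r : split(0, 100, 2)) {
            System.out.println(Arrays.toString(r) + " -> " + countPrimes(r[0], r[1]) + " primes");
        }
    }
}
```

For [0, 100) split across two nodes this prints 15 primes for [0, 50) and 10 for [50, 100). By the prime number theorem, prime density near n falls off roughly as 1/ln n, so equal-width sub-ranges do not contain equal numbers of primes, and any algorithm whose cost depends on that count will not finish them in equal time.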
I have been thinking lately about how to improve our message: how to make it crisper, and how to make it describe our product in the best way in as few words as possible.
That exercise left me with the following observations:
- we get confused with data grids
- we get confused with virtualization
- we get confused with VM clusterization (Azul or Terracotta)
- we get confused with JGroups and other reliable messaging
- and many don't get what the hell we are doing at all...
For about six months after the release of our software, our positioning was based on three simple facts:
- Focused on computational grids - and being the best at it
- Made in and for Java
- Open source and professionally supported
However, that left out one particular characteristic of our product, one that almost everyone using it points out and one we worked really hard to achieve: simplicity of use. It may sound clichéd and overused, but it is nonetheless hugely important. If there is one plague that cripples most of today's software, it is overcomplexity and bloat.
I am an avid collector of old 8-bit and 16-bit home computers. That's how I got into software engineering, after all. I have mint C64/128s and Amigas and a range of Atari XE128/ST/STe machines with lots of software. I do C64 assembly for fun. And I am still amazed by how much you can achieve with so little... I think, proportionally, we have gotten a lot worse today.
With GridGain we try to bring a "small-town" approach to the fairly complex problem of grid computing, a type of distributed computing. Look at the source code of GridGain: it's pretty compact for what it does. You really need to know just one interface to use GridGain. I could easily port it to the Amiga platform if I had just a bit more time.
I sort of have a dream that people will look at grid computing in much the same way as we look today at JSP or JMS - technologies that are everyday tools for all enterprise Java developers, yet were considered advanced and complex just a short 7-8 years ago...
In the end I liked this tag line: "GridGain - Grid Computing Made Simple".
I saw a reply on TSS from Ari (of Terracotta) and it occurred to me that there is some confusion about what Terracotta is and how it relates to products/technologies like GridGain, GigaSpaces or Coherence.
I can blame the Terracotta folks themselves for that, because their product offering is rather confusing and you constantly have to read between the lines. I challenge you to go to their website and, within one minute, tell me what their products do...
Based on Ari's reply, I kind of agree with him that it's just wrong to compare Terracotta with GridGain (or GigaSpaces or Coherence, for that matter). Terracotta is not a grid computing technology, and it's pretty far away from one. Terracotta is a VM clusterization technology, basically providing a virtual heap and distributed thread synchronization via byte-code augmentation. This is a far cry from grid computing products; rather, it is a technology that could potentially be USED TO DEVELOP products like GridGain - not a functional replacement for them.
This is an important distinction because marketing messages and positioning can sometimes be confusing and blur the lines, often on purpose. This is by no means to say that Terracotta is a particularly bad product - but it is low-level middleware, while GridGain is a higher-level grid computing product.
I said almost two years ago that Terracotta seems like Azul's technology implemented at the software level. I still don't get the value proposition of such trickery with Java semantics and wholesale distribution of data... The first rule of distributed programming is "Don't distribute".
The Terracotta guys are rigorously working through the $13.5M they got a year or so ago, and they are making every effort to fix the perception of their technology, which in my opinion is a "Look, Ma, no hands!" kind of product.
One of the most frequent questions we get here on the GridGain project is: how can I use grid computing? If I'm not a big Wall Street firm, searching for oil or building spacecraft, what can it do for me?
The answer can be very long. When possible, I prefer to give a short one and just describe a specific and simple real-life use case. Here's one...
There's a startup that's trying to build a social community around open source projects and the people who participate in them. One of the main functions of this website (its backend systems) is to constantly scan SVN/CVS repositories and build its own data warehouse from this harvesting process. Scanning SVN/CVS repositories (and building the necessary data from them) is an embarrassingly parallel problem, and it's ideally solved by a computational grid framework like GridGain.
GridGain provides the following:
- A uniform programming model, whether you run on a laptop in Eclipse or on hundreds of nodes in a production environment
- Unparalleled developer productivity - a trademark of GridGain technology
- Seamless scalability: buy a server, install Java and GridGain, start the node - you are done.
- Rich grid computing features such as failover, collision resolution, connected tasks, scheduling, etc.
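The SVN/CVS scanning job described above has a split-and-aggregate shape. The sketch below shows that shape on a single JVM, with a fixed thread pool standing in for grid nodes; it does not use the GridGain API, and `scanRepository` with its fake commit counts is a hypothetical placeholder for the real harvesting logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Single-JVM sketch of split/aggregate for the repository-scanning use case.
 * A fixed thread pool stands in for grid nodes; this is not GridGain code.
 */
public class RepoScanSketch {

    // Hypothetical placeholder: a real scanner would read commits from SVN/CVS.
    // Here it just looks up a fake commit count (0 if the repository is unknown).
    static int scanRepository(String repoUrl, Map<String, Integer> fakeData) {
        return fakeData.getOrDefault(repoUrl, 0);
    }

    // Splits the work into one job per repository and sums the job results.
    static int scanAll(List<String> repos, Map<String, Integer> fakeData, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: each repository is scanned independently (embarrassingly parallel).
            List<Future<Integer>> futures = new ArrayList<>();
            for (String repo : repos) {
                futures.add(pool.submit(() -> scanRepository(repo, fakeData)));
            }
            // Aggregate: wait for every job and combine the partial results.
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for scan results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a scan job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> fakeData = Map.of("svn://a", 120, "svn://b", 45, "cvs://c", 300);
        System.out.println(scanAll(List.of("svn://a", "svn://b", "cvs://c"), fakeData, 2)); // prints 465
    }
}
```

Because no job depends on any other, moving from this thread pool to real grid nodes changes where the jobs run, not how the work is divided or combined.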
We've been asked several times already why we don't ship a fully compilable distribution of GridGain. We do ship the entire source code, as required by the LGPL - but we don't ship all the 3rd-party libraries and build scripts required to build the entire project.
One of the obvious reasons is that we integrate with WebLogic, JBoss, WebSphere, Coherence, Mule, etc., and shipping all these products is impractical (and in many cases impossible license-wise).
But my view is: why do you need it? When was the last time you rebuilt, say, Mule or JBoss? From my point of view, the availability of the source code is a great documentation and debugging tool. I can hardly imagine anyone seriously modifying the GridGain source base - and, of course, if you happen to have a real need to do it, you can easily set it up in less than 20 minutes (you just need to download whatever dependencies you require). And if you are really itching to develop grid computing middleware - join our project!
We take great care of the source code that we ship. It's well documented with Javadoc, well structured and rigorously code-reviewed, and we inject file names and line numbers into all our exceptions so that if you get a stack trace you can easily glance into the source code to see what is happening.
I just realized that if you use the full Globus stack or Sun Grid Engine (SGE) you cannot have Windows nodes in the grid, as both technologies have some Unix-only components. I understand the origins of it, and I understand that you can limit certain things to get by, but it is still rather startling.
This highlights the power of Java's WORA (write once, run anywhere) concept. No matter how much bashing it has taken over the years, it does work, and it works almost flawlessly in the middleware world.
GridGain is a fairly complex product, and I frankly don't remember the last time we had an OS-related issue in our tests. Developers on our team work in mixed Linux (Ubuntu, Fedora, SUSE) and Windows (XP/Vista) environments. About the only place where we deal with OS specifics is our installer - and that's about it.
Guess what - we run perfectly on Mac OS X too - first time we tried it all tests worked flawlessly. Speaks volumes about Java on the server side.
With GridGain 1.5 we integrate natively with Coherence (funny that we started this work when it was Tangosol's product and finished it when it became Oracle's new addition to its Fusion middleware stack).
Coherence is a unique product in its category. It is really one of the few Java middleware products left that consistently beat all open source alternatives, and people happily pay per-CPU license money for it (which has become very expensive). You could sum up Coherence as great design, excellent performance, a brilliant spokesman and... horrendous coding style (did you see those brackets...).
And, if your name is Bill Burke, you have got yourself a life-time partner in online sparring.
Regardless, Coherence is almost a de facto standard in data grids, and it was key for us to have a native integration with it. Coherence has some rudimentary features for performing basic computational grid tasks, but they are rather an afterthought - Coherence has always been and will always be a great distributed caching technology.
GridGain 1.5 provides Coherence-based SPI implementations for communication and discovery. What it means for you as a developer is that when you use these SPIs, all grid communication goes through the Coherence networking protocol, and GridGain uses Coherence's cluster capabilities for grid node discovery. What it means for you as a network administrator is that if you already have Coherence deployed, you need to do exactly nothing to your network to add computational grid capabilities with GridGain, as the existing network protocol, settings and configurations will be reused.
On the surface, GridGain almost becomes an extension of Coherence functionality. This duo is very powerful: most of the time you need data to perform computational tasks and caching becomes essential to maintain the performance advantage you got by splitting and parallelizing your work load.
We are working on a loader for Coherence to further simplify the startup of GridGain within Coherence. Stay tuned for that in GridGain 1.5 point releases.
Download GridGain and Coherence (eval) and play with this powerful duet. And as always - enjoy grid computing!
Some time ago I blogged about GridGain being developer-friendly and how we augment exceptions and assertions at build time. Here are a couple more features that fall into the same category.
Grid Nodes
In GridGain, you can start multiple grid nodes (in the default configuration) on multiple computers (naturally), on the same computer (more interesting) and in the same VM (that's pretty cool).
If you are using the default configuration (basically, IP-multicast discovery and TCP/IP communication), you don't even have to change any configuration to start many nodes on the same computer. The only addition needed to start multiple nodes in the same VM is to supply a unique name for each grid instance. In most cases you can just keep double-clicking the gridgain.{sh|bat} script and it will keep starting grid nodes.
Now, why is that important? Anybody who has done their own share of debugging in a distributed environment will immediately appreciate the ability to run the same scenario locally, especially in the same VM. Debugging and testing productivity goes up something like 10-fold in this case.
Peer Class (Re)Deployment
This is the feature I often talk about. It is probably the biggest boost to developer productivity among all the features provided by GridGain. It basically allows you to work as if in a local environment, yet seamlessly run the application on the grid: change, recompile, run, and your latest changes are running on the grid. No special deployment, Ant scripts, copying, restarting or anything else. It works independently of any IDE, so you can be an Eclipse, NetBeans or IDEA fan, or an emacs or vi diehard.
You can watch our screencast to see peer class loading in action.
As always - enjoy grid computing with GridGain
GridGain has fairly advanced and flexible support for preemption, or preemptive scheduling. Preemption is basically when a grid job decides, or gets instructed, that it needs to stop its execution on the current node, save its state, migrate to another node and continue its execution on the new node from the saved state (a distributed continuation). For example, the collision SPI can decide that a certain job needs to stop and move to another node. Another example is when a job monitors its execution performance, determines that its current pace is not enough, and decides to preemptively move to another, presumably faster, node.
Preemption is a complex multi-step process and GridGain supports it with several components working in ensemble:
- Collision SPI provides a centralized place that every job passes through for resource contention, where it can be acted upon (buffered, executed or cancelled)
- Failover SPI provides means of re-mapping given job to a new node
- Checkpoint SPI provides means for storing and retrieving intermediate state
- GridJob cancellation and session attributes provide all the necessary API mechanics for stopping a job and passing any necessary user information between SPIs, the job and the job's siblings executing on other nodes, if required.
Preemption is not a trivial mechanism, but it can be the only solution in systems with complex resource management requirements.
One real-life example where preemption support is required is priority-based near real-time (nRT) grid tasks. In nRT cases a certain task must be given all local resources to be completed within a given QoS specification. The only way, short of outright cancelling all jobs currently running on this node, is to preempt them onto other node(s) without losing the work that has been performed up to this moment.
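To show how these pieces fit together, here is a hypothetical, in-process sketch of preemption as a distributed continuation. None of the names are GridGain APIs: a map stands in for the checkpoint SPI, `pickNextNode` for the failover SPI, and the `preemptAt` parameter for the collision SPI (or the job itself) deciding that the job must stop. A real job would also be serialized and shipped to the new node; here the "nodes" are just labels.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-process sketch of preemption: stop, checkpoint, fail over,
 * resume from the checkpoint. These classes and methods are not GridGain APIs.
 */
public class PreemptionSketch {

    // Stand-in for checkpoint storage: job id -> {nextIndex, partialSum}.
    static final Map<String, long[]> checkpoints = new HashMap<>();

    // Stand-in for failover: pick the first node other than the current one.
    static String pickNextNode(List<String> nodes, String current) {
        for (String n : nodes) {
            if (!n.equals(current)) return n;
        }
        throw new IllegalStateException("no other node available");
    }

    /**
     * Sums 1..n, resuming from a checkpoint if one exists. When the loop reaches
     * preemptAt, the job saves its state and returns null to signal "preempted".
     * Pass -1 (or any value outside 1..n) to run to completion.
     */
    static Long runJob(String jobId, long n, long preemptAt) {
        long[] state = checkpoints.getOrDefault(jobId, new long[] {1, 0});
        long i = state[0];
        long sum = state[1];
        while (i <= n) {
            if (i == preemptAt) {
                checkpoints.put(jobId, new long[] {i, sum}); // save intermediate state
                return null;                                 // stop executing on this node
            }
            sum += i;
            i++;
        }
        checkpoints.remove(jobId); // finished: the checkpoint is no longer needed
        return sum;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-A", "node-B");
        String node = "node-A";

        // First attempt on node-A is preempted halfway through.
        Long result = runJob("job-1", 100, 50);
        if (result == null) {
            node = pickNextNode(nodes, node);
            // Resumed on node-B from the saved state, with no further preemption.
            result = runJob("job-1", 100, -1);
        }
        System.out.println(node + " finished with sum " + result); // node-B finished with sum 5050
    }
}
```

The point of the checkpoint is that node-B starts at 50 with the partial sum already computed on node-A, so none of the earlier work is lost.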
I've gotten some questions on my previous post on the same topic, basically asking for a more specific explanation of why it is critical to expose the grid topology. I guess when you work in this field for a long time you become blind to certain things - hence I will try to provide more detailed explanations below...
Example #1: Your grid is heterogeneous, i.e. nodes differ in terms of CPU, memory, network bandwidth, IO, etc. In this situation the split should always be "weighted" by the nodes' characteristics, meaning that the original task cannot be split into equal jobs; rather, each job should be of the appropriate "size". Failure to do so will result in an unbalanced grid and drastically reduced performance.
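A weighted split can be sketched as below. The per-node weights (say, CPU counts or a benchmark score) are assumed inputs, since how they are read from node attributes is outside the sketch, and the largest-remainder rounding is one reasonable choice, not necessarily what GridGain does.

```java
import java.util.Arrays;

/**
 * Sketch of a weighted split: divide totalUnits of work among nodes in
 * proportion to per-node weights. Uses the largest-remainder method so the
 * job sizes always add up to totalUnits.
 */
public class WeightedSplit {

    static int[] split(int totalUnits, double[] weights) {
        double weightSum = Arrays.stream(weights).sum();
        int[] sizes = new int[weights.length];
        double[] remainders = new double[weights.length];
        int assigned = 0;

        // First pass: give each node the integer part of its proportional share.
        for (int i = 0; i < weights.length; i++) {
            double share = totalUnits * weights[i] / weightSum;
            sizes[i] = (int) Math.floor(share);
            remainders[i] = share - sizes[i];
            assigned += sizes[i];
        }

        // Second pass: hand leftover units to the nodes with the largest remainders.
        for (int left = totalUnits - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < weights.length; i++) {
                if (remainders[i] > remainders[best]) best = i;
            }
            sizes[best]++;
            remainders[best] = -1; // each node receives at most one extra unit
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical grid: a 4-core server, an 8-core server and a 2-core desktop.
        System.out.println(Arrays.toString(split(1000, new double[] {4, 8, 2}))); // [286, 571, 143]
    }
}
```

For 1,000 work units over nodes weighted 4, 8 and 2 this yields [286, 571, 143]: the sizes stay proportional to the weights and always sum to the total, which an equal split of 333/333/334 would not achieve on this grid.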
Example #2: Your task distribution is heterogeneous, i.e. not all grid nodes can or should execute the jobs from this task. In other words, when you split, you pick only a specific (often dynamically changing) subset of grid nodes for your task distribution. This scenario arises in practically any situation: you have existing silos of blades that handle specific types of tasks, you have a sub-grid dedicated to a new application, a specific segment of your network is dedicated to on-demand computing only, etc. Without this topology awareness, these cases are really hard or impossible to implement.
The examples I provided are fairly basic, and each real-life application would likely have its own quirks and requirements. Here's one real-life example from GridGain usage: our user wanted to do dynamic CPU scavenging, but only for desktops in a specific subnet; all other desktops in the office should automatically be made available to the grid from 9pm to 6am (basically, overnight) and not be available during the rest of the day. With GridGain this is a matter of 5-10 lines of code - and it's beautifully elegant.
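The office scenario reduces to a predicate over a node's address and the current time. The sketch below is hypothetical: class and method names are illustrative, not GridGain APIs, the subnet value is made up, and it reads "CPU scavenging" loosely by treating desktops in the subnet as always eligible (a real scavenger would also check whether the machine is idle).

```java
import java.time.LocalTime;

/**
 * Hypothetical sketch of the availability rule: desktops in one subnet are
 * always eligible, every other desktop only between 21:00 and 06:00.
 * Not GridGain code; how a node's address is obtained is assumed.
 */
public class NodeAvailability {

    // Parses a dotted-quad IPv4 address into an int (no validation, for brevity).
    static int toInt(String ipv4) {
        int result = 0;
        for (String p : ipv4.split("\\.")) {
            result = (result << 8) | Integer.parseInt(p);
        }
        return result;
    }

    // True if ip lies inside the CIDR block, e.g. "10.1.2.0/24".
    static boolean inSubnet(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    // True from 21:00 (inclusive) to 06:00 (exclusive); the window wraps past midnight.
    static boolean isOvernight(LocalTime t) {
        return !t.isBefore(LocalTime.of(21, 0)) || t.isBefore(LocalTime.of(6, 0));
    }

    // Desktops in the scavenging subnet are always eligible; all others only overnight.
    static boolean isAvailable(String ip, String scavengingSubnet, LocalTime now) {
        return inSubnet(ip, scavengingSubnet) || isOvernight(now);
    }

    public static void main(String[] args) {
        String subnet = "10.1.2.0/24"; // hypothetical scavenging subnet
        System.out.println(isAvailable("10.1.2.17", subnet, LocalTime.of(12, 0)));  // true
        System.out.println(isAvailable("10.1.3.17", subnet, LocalTime.of(12, 0)));  // false
        System.out.println(isAvailable("10.1.3.17", subnet, LocalTime.of(23, 30))); // true
    }
}
```

Applied at split time to each node in the current topology, a predicate like this picks the subset of nodes that should receive jobs, which is only possible when the topology is exposed to the developer.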
