
Thread: Peer class loading timeouts on EC2 GG 2.1.0

sharakan

Posts: 3
Registered: 12/29/09
Posted: Dec 29, 2009 9:41 AM
We're working on ramping up to run some 40,000 tasks on an EC2 cloud, using the GG 2.1.0 AMI to run 19 instances, along with an OpenMQ instance that receives all the tasks.

This works fine as long as I'm only running 10 or so tasks, but as soon as I ramp up to hundreds, I start getting lots of peer class loading errors, like the one attached at the bottom of this post. After some time it does start running my tasks, but the ones that failed due to class loading stay failed (my task is a subclass of GridTaskSplitAdapter and does not override result()). For example, I ran 280 tasks, and only 29 executed successfully.

For the moment, I've overridden the GridTaskSplitAdapter.result() method so that for ANY GridException we indicate that the task should be failed over. This doesn't seem like a long-term solution, though, so I'm hoping someone can give me an idea of what my options are for getting around this. I'd like to stick with peer class loading, since the code we're running in this case will keep changing and I'd rather not have to create a new image, deploy our code to it, etc. I imagine one option is to build our own GG image with a much higher peer class loading timeout; is that the best solution?
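The workaround described above (fail over on any job exception instead of letting the inherited result() rethrow it) can be sketched as a standalone decision function. This is a minimal sketch, not GridGain code: ResultPolicy, JobResult, defaultResult, failoverResult and FailoverPolicySketch are hypothetical stand-ins, assumed to mirror GridGain 2.x's GridJobResultPolicy (WAIT, REDUCE, FAILOVER) and GridJobResult; in a real task the logic would live in the result(..) override of the GridTaskSplitAdapter subclass, and the exact 2.1.0 signatures are not verified here.

```java
import java.util.ArrayList;
import java.util.List;

public class FailoverPolicySketch {

    // Hypothetical stand-in for GridGain's job result policy (WAIT, REDUCE, FAILOVER).
    enum ResultPolicy { WAIT, REDUCE, FAILOVER }

    // Hypothetical stand-in for the per-job result handed to result(..):
    // either a value or the exception the remote job raised.
    static final class JobResult {
        final Object data;
        final Exception exception;

        JobResult(Object data, Exception exception) {
            this.data = data;
            this.exception = exception;
        }
    }

    // Roughly what the inherited result() does for a user exception like the one
    // in this post: rethrow it, which fails the whole task.
    static ResultPolicy defaultResult(JobResult res, List<JobResult> received) {
        if (res.exception != null) {
            throw new IllegalStateException("Remote job threw user exception", res.exception);
        }
        return ResultPolicy.WAIT;
    }

    // The workaround: any job exception is failed over to another node
    // instead of failing the task; successful results keep waiting for the rest.
    static ResultPolicy failoverResult(JobResult res, List<JobResult> received) {
        return res.exception != null ? ResultPolicy.FAILOVER : ResultPolicy.WAIT;
    }

    public static void main(String[] args) {
        List<JobResult> received = new ArrayList<>();
        JobResult ok = new JobResult("done", null);
        JobResult failed = new JobResult(null, new Exception("Task was not deployed or was redeployed"));

        System.out.println(failoverResult(ok, received));     // WAIT
        System.out.println(failoverResult(failed, received)); // FAILOVER
        try {
            defaultResult(failed, received);
        } catch (IllegalStateException e) {
            System.out.println("task failed: " + e.getMessage());
        }
    }
}
```

Because every exception is treated the same way, a job that fails for a persistent reason (for example, a real bug in the task code) would be retried on other nodes instead of surfacing, subject to whatever limits the configured failover SPI applies. That is part of why this looks like a stopgap rather than a fix for the underlying timeouts.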

Exception:
----------
Type: org.gridgain.grid.GridException
Message: Remote job threw user exception (override or implement GridTask.result(..) method if you would like to have automatic failover for this exception).
Documentation: http://wiki.gridgain.org
Stack trace: 
    at org.gridgain.grid.GridTaskAdapter.result(GridTaskAdapter.java:109)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.result(GridTaskWorker.java:617)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:546)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.processJobExecuteResponse(GridTaskProcessor.java:886)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:851)
    at org.gridgain.grid.kernal.managers.communication.GridCommunicationManager.unwindMessageSet(GridCommunicationManager.java:767)
    at org.gridgain.grid.kernal.managers.communication.GridCommunicationManager.access$3300(GridCommunicationManager.java:45)
    at org.gridgain.grid.kernal.managers.communication.GridCommunicationManager$6.body(GridCommunicationManager.java:706)
    at org.gridgain.grid.util.runnable.GridRunnable$1.run(GridRunnable.java:142)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
    at org.gridgain.grid.util.runnable.GridRunnablePool$1.run(GridRunnablePool.java:80)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
 
Caused By:
----------
Type: org.gridgain.grid.GridException
Message: Task was not deployed or was redeployed since task execution (either received a stale message in which case you should increase GridConfiguration.getPeerClassLoadingTimeout() configuration parameter, or encountered some invalid condition, like internal or user code version mismatch) [taskName=com.rtrms.instrument.model.convergence.RunConvergenceTask, taskClsName=com.rtrms.instrument.model.convergence.RunConvergenceTask, codeVer=0, clsLdrId=db63985c-5b30-4e7c-a805-73d2741431ff, seqNum=1]
Documentation: http://wiki.gridgain.org
Stack trace: 
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1063)
    at org.gridgain.grid.kernal.managers.communication.GridCommunicationManager$4.body(GridCommunicationManager.java:589)
    at org.gridgain.grid.util.runnable.GridRunnable$1.run(GridRunnable.java:142)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
    at org.gridgain.grid.util.runnable.GridRunnablePool$1.run(GridRunnablePool.java:80)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
morpheus

Posts: 1,842
Registered: 8/14/07
Re: Peer class loading timeouts on EC2 GG 2.1.0
Posted: Dec 30, 2009 12:50 PM in response to: sharakan
Looks like peer class loading requests are timing out. Try increasing the GridConfiguration.getNetworkTimeout() configuration property. Also, we strongly recommend using 64-bit High IO instances from Amazon.

--Best
enewett

Posts: 1
Registered: 1/26/10
Re: Peer class loading timeouts on EC2 GG 2.1.0
Posted: Jan 26, 2010 10:16 AM in response to: sharakan
Amazon EC2 will not allow us to launch 64-bit High IO instances from the GridGain community AMI (ami-00678569) because that AMI is 32-bit. Is there a 64-bit version of this AMI available?
dozer

Posts: 23
Registered: 8/23/07
Re: Peer class loading timeouts on EC2 GG 2.1.0
Posted: Feb 1, 2010 6:28 AM in response to: enewett
I think you can take the GridGain 2.1.1 EC2 image, start it, change the configuration (increase the P2P timeout), and create your own EC2 image from it. After that, you can use your own image.
Let me know if you have some questions about this.