04 November 2006

Java Retry Framework

This week, one developer on my team needed to add retry logic to a section of code so that it would be retried after a deadlock. We spent some time discussing the options:
  • Add a "retry count" to the object (which implements Runnable). Increment the counter on failure; if the counter is less than N, place the Runnable back in the queue. After N retries, log a failure and give up.
  • Set up a separate thread and, upon failure, queue the object to run again after a period of time. Continue to retry every M seconds, up to N times, then give up.
  • Similar to the above, but initially sleep for a long period of time (an hour), then try N times over M minutes before giving up.
  • Increase the wait interval, M, between retries; keep trying until the interval reaches a defined maximum.
  • ...etc.
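Of the options above, the growing-interval one is the least obvious to write by hand. Here is a minimal sketch, assuming a shared ScheduledExecutorService and an attempt expressed as a BooleanSupplier that returns true on success; BackoffRetrier, submit and onGiveUp are hypothetical names, not part of any library:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: retries an attempt with a doubling delay and gives up
// once the next delay would exceed maxDelayMillis.
final class BackoffRetrier {
    private final ScheduledExecutorService scheduler;
    private final long maxDelayMillis;

    BackoffRetrier(ScheduledExecutorService scheduler, long maxDelayMillis) {
        this.scheduler = scheduler;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Runs the first attempt right away on the scheduler's thread.
    void submit(BooleanSupplier attempt, long initialDelayMillis, Runnable onGiveUp) {
        scheduler.execute(() -> tryOnce(attempt, initialDelayMillis, onGiveUp));
    }

    private void tryOnce(BooleanSupplier attempt, long delayMillis, Runnable onGiveUp) {
        boolean succeeded;
        try {
            succeeded = attempt.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // e.g. the transaction was chosen as a deadlock victim
        }
        if (succeeded) {
            return;
        }
        if (delayMillis > maxDelayMillis) {
            onGiveUp.run(); // the interval has grown past the limit: stop retrying
            return;
        }
        long nextDelay = delayMillis * 2; // double the wait before the following retry
        scheduler.schedule(() -> tryOnce(attempt, nextDelay, onGiveUp),
                delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

With an initial delay of 10 ms and a maximum of 40 ms, an attempt that always fails runs four times (waiting 10, 20 and 40 ms in between) before onGiveUp is called.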
Since the first option was the easiest and accomplished what we wanted, we chose it. A discussion about the retry logic scattered throughout our system led to a hunt to see if someone had written a Retry Framework that we could leverage.
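The option we picked can be sketched roughly as follows. This assumes worker threads that drain a BlockingQueue<Runnable> and a deadlock that surfaces as a RuntimeException from the wrapped work; RetryingTask and maxRetries are hypothetical names:

```java
import java.util.concurrent.BlockingQueue;
import java.util.logging.Logger;

// Hypothetical sketch of the "retry count" option: the task counts its own
// failures and puts itself back on the work queue until the limit is hit.
final class RetryingTask implements Runnable {
    private static final Logger LOG = Logger.getLogger(RetryingTask.class.getName());

    private final Runnable work;                 // the deadlock-prone unit of work
    private final BlockingQueue<Runnable> queue; // the queue the worker threads drain
    private final int maxRetries;                // N in the description above
    // Updated by whichever worker runs the task; the BlockingQueue hand-off
    // makes the new value visible to the next worker that takes the task.
    private int retryCount;

    RetryingTask(Runnable work, BlockingQueue<Runnable> queue, int maxRetries) {
        this.work = work;
        this.queue = queue;
        this.maxRetries = maxRetries;
    }

    @Override
    public void run() {
        try {
            work.run();
        } catch (RuntimeException e) {
            retryCount++;
            if (retryCount < maxRetries) {
                if (!queue.offer(this)) { // a bounded queue may be full
                    LOG.warning("Could not re-queue task after failure: " + e);
                }
            } else {
                LOG.warning("Giving up after " + retryCount + " failures: " + e);
            }
        }
    }

    int retryCount() {
        return retryCount;
    }
}
```

Because the task re-queues itself, no extra thread is needed. The trade-off is that the retry runs as soon as a worker picks it up again, with no delay to let the competing transaction finish.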

As far as I could determine, there isn't one. Hasn't everyone at one point had to write some sort of retry logic? Why isn't there a framework of some sort for that?

4 comments:

straun said...

The problem with a retry framework is that there is no real de facto technique for 'doing something later' in Java yet.

If you had such a technique, then writing a retry framework would be a sensible step.

Candidates for the technique to do something later are:

Quartz Jobs
WebLogic 'send later' JMS messages
...and other persistent events

straun said...

... as an afterthought

JMX timers are J2SE-based, and if you could find a sensible way to persist them so that they survive a JVM restart, then a retry framework is only a day's coding away.
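For reference, the JMX timer in the JDK is javax.management.timer.Timer. A minimal sketch of using it to fire a one-shot retry, with the task carried as the notification's user data; JmxRetryScheduler and RETRY_TYPE are hypothetical names. The scheduled notifications live only in memory, which is exactly the persistence gap described above:

```java
import java.util.Date;
import javax.management.NotificationListener;
import javax.management.timer.Timer;

// Hypothetical sketch: use the JDK's JMX timer to re-run a task once, later.
// Notifications are held in memory only, so a JVM restart loses them.
final class JmxRetryScheduler {
    static final String RETRY_TYPE = "retry.attempt"; // arbitrary notification type

    private final Timer timer = new Timer(); // javax.management.timer.Timer, not java.util.Timer

    JmxRetryScheduler() {
        // The task to re-run travels as the notification's user data.
        NotificationListener listener =
                (notification, handback) -> ((Runnable) notification.getUserData()).run();
        timer.addNotificationListener(listener,
                notification -> RETRY_TYPE.equals(notification.getType()), null);
        timer.start();
    }

    // Schedules a single retry delayMillis from now; returns the timer's notification id.
    Integer scheduleRetry(Runnable task, long delayMillis) {
        Date when = new Date(System.currentTimeMillis() + delayMillis);
        return timer.addNotification(RETRY_TYPE, "retry", task, when);
    }

    void shutdown() {
        timer.stop();
        timer.removeAllNotifications();
    }
}
```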

idcmp said...

There isn't a de facto way of doing something later (you're right), but there are a few common-sense ways, like Runnables, and a stab at a retry framework could promote a common way of writing logic that a developer wants to keep attempting until it succeeds...
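A minimal sketch of what such a common way might look like for a Callable, assuming a fixed delay between attempts; Retry.call is a hypothetical helper, not an existing library API:

```java
import java.util.concurrent.Callable;

// Hypothetical helper: call the task until it succeeds or maxAttempts is
// reached, sleeping delayMillis between attempts. Rethrows the last failure.
final class Retry {
    private Retry() {}

    static <T> T call(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // never retry through an interrupt
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // no sleep after the final attempt
                }
            }
        }
        throw last; // non-null: at least one attempt ran and failed
    }
}
```

A Runnable can be adapted with Executors.callable(runnable) if the same helper should cover both.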

Christopher said...

I'm currently working on a project with a few JBoss clusters and Tangosol caches on the server side, and a Swing client for the users.

To handle connection issues and other remote explosions, I decided to place every task that interacts with a remote resource inside a Callable object and pass that object to a task service.

The components:

Manager
* facade to control access to remote resources

ManagerTaskService
* uses backing ExecutorService for async tasks

TaskFailedHandler
* controls reconnect/retry logic

ManagerConnection
* Maintains connection state safely in concurrent environments. Uses notify/wait for responsiveness, and immutable state objects.
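TaskFailedHandler itself isn't shown. Judging only from how invokeAndWait uses it, its shape might be something like the following; the interfaces, the factory, and the "reconnect on IOException" policy are assumptions for illustration, not the commenter's code:

```java
import java.io.IOException;

// Hypothetical shape, inferred from the calls made in invokeAndWait.
interface TaskFailedHandler {
    void exceptionThrown(Exception ex);
    boolean shouldReconnect(Exception ex);
    boolean shouldRetry(Exception ex);
}

interface TaskFailedHandlerFactory {
    TaskFailedHandler newInstance();
}

// Example policy: reconnect on I/O-level failures and retry any failure up to
// maxFailures times. One instance per task, so the counter is not shared.
final class LimitedRetryHandler implements TaskFailedHandler {
    private final int maxFailures;
    private int failures;

    LimitedRetryHandler(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    @Override
    public void exceptionThrown(Exception ex) {
        failures++; // record every failure before the policy questions are asked
    }

    @Override
    public boolean shouldReconnect(Exception ex) {
        return ex instanceof IOException; // java.rmi.RemoteException is an IOException
    }

    @Override
    public boolean shouldRetry(Exception ex) {
        return failures < maxFailures;
    }

    static TaskFailedHandlerFactory factory(int maxFailures) {
        return () -> new LimitedRetryHandler(maxFailures);
    }
}
```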

Synchronous tasks are executed using ManagerTaskService.invokeAndWait, which handles waiting for the manager to be ready, deciding whether a failed task should be retried, and deciding whether the manager needs to reconnect.

Asynchronous tasks are wrapped by a Callable that runs invokeAndWait, and then added to the ExecutorService.
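A sketch of that asynchronous wrapping, assuming a generic stand-in (SyncInvoker) for ManagerTaskService.invokeAndWait; AsyncTaskService and invokeLater are hypothetical names:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: the asynchronous path reuses the synchronous,
// retrying call by running it on the backing ExecutorService.
final class AsyncTaskService {
    // Stand-in for ManagerTaskService.invokeAndWait.
    interface SyncInvoker {
        <V> V invokeAndWait(String name, Callable<V> task);
    }

    private final SyncInvoker sync;
    private final ExecutorService executor;

    AsyncTaskService(SyncInvoker sync, ExecutorService executor) {
        this.sync = sync;
        this.executor = executor;
    }

    <V> Future<V> invokeLater(String name, Callable<V> task) {
        // The wait/retry/reconnect loop runs on a pool thread, not the caller's
        // (for a Swing client, not the event dispatch thread).
        Callable<V> wrapper = () -> sync.invokeAndWait(name, task);
        return executor.submit(wrapper);
    }
}
```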

Here's the code I used for invokeAndWait:

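public <CVal> CVal invokeAndWait(final String name, final Callable<CVal> task) {
    // One handler per call, so failure state is tracked per task.
    TaskFailedHandler failureHandler = failureHandlerFactory.newInstance();

    for (;;) {
        try {
            manager.waitFor();      // block until the manager is ready
            return task.call();     // success: hand the result back to the caller
        }
        catch (Exception ex) {
            failureHandler.exceptionThrown(ex);
            if (failureHandler.shouldReconnect(ex)) {
                manager.requestReconnect();
            }
            if (!failureHandler.shouldRetry(ex)) {
                log.error("invokeAndWait - not retrying due to TaskFailedHandler policy - " + name);
                return null;        // the caller sees null when the policy gives up
            }
            // otherwise loop: wait for the manager again and retry the task
        }
    }
}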