Friday, 7 March 2008

Parallel Processing in Java EE 5

There are a number of options, and which one is right depends on your situation:

- Use JMS and message-driven POJOs/beans

This is a strategy that is often used. You can choose whether or not to make the JMS queue persistent. If the queue is not persistent and the server fails, messages can be lost, so some of your processing is not guaranteed to complete. If you need a guarantee that all the processing will be performed, make the queue persistent.

- Java 5 Concurrency API

The concurrency API was added in Java 5. It allows you to execute work in parallel safely within a J2EE container. However, for typical Master/Worker pattern processing, the API exposes you to more complexity than you need.
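As a rough illustration of what the concurrency API looks like in use, here is a minimal sketch that splits a sum into chunks and runs them on a fixed thread pool. The class and method names (`ParallelSum`, `sumTo`) are hypothetical and chosen only for this example. Note that threads created by such a pool are not managed by the application server, so treat this as a plain Java sketch rather than a recommended container setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the names are illustrative only.
class ParallelSum {

    // Sums 1..n by splitting the range into at most `chunks` tasks
    // and running them on a fixed-size thread pool.
    static long sumTo(int n, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            int size = (n + chunks - 1) / chunks; // ceiling division
            for (int start = 1; start <= n; start += size) {
                final int from = start;
                final int to = Math.min(n, start + size - 1);
                tasks.add(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (int i = from; i <= to; i++) {
                            sum += i;
                        }
                        return sum;
                    }
                });
            }
            long total = 0;
            // invokeAll blocks until every task has finished.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100, 4)); // prints 5050
    }
}
```

Even for this small job, the caller has to manage the pool lifecycle, the task splitting and the exception handling, which is the extra complexity referred to above.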

- Work Manager API - JSR 237

WebLogic has a Work Manager API (available in WebLogic 9). I like the API, as it is clean and simple to use. If you are a J2EE purist, however, and want your J2EE code to remain portable, then this is not for you.

- Java EE 6

Java EE 6 will include an evolved version of JSR 237. Doug Lea is working on this.

- Use Terracotta (or GigaSpaces) and distribute your units of work across JVMs. Grid products like these two seem to be gaining rapidly in popularity. The technology is being used in investment banks, where systems process high volumes of transactions per second, and it fits the trend of scaling by having more processors do the work (dual core, quad core etc.). Terracotta works by using bytecode instrumentation (basically taking your Java class and inserting some extra logic to allow synchronization etc. across multiple JVMs). Terracotta supports the Master/Worker pattern to allow you to distribute your units of work across multiple JVMs.

- Use business process management software (jBPM / WS-BPEL), which provides parallelism. WebLogic Integration, for instance, allows you to define a 'flow', which is some processing that can be performed in parallel.

- Parallel processing frameworks like JPPF, GridGain, Hadoop.

- Spring Batch
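The Master/Worker pattern mentioned in several of the options above can be sketched inside a single JVM with two blocking queues. This is only an illustration under stated assumptions: the names (`MasterWorkerSketch`, `sumOfSquares`, `STOP`) are hypothetical, the "work" is simply squaring integers, and no Terracotta or grid API is used. A product like Terracotta would be the piece that shares such queues across JVMs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-JVM sketch of the Master/Worker pattern.
class MasterWorkerSketch {

    // "Poison pill" telling a worker to stop; assumed never to be a real work unit.
    private static final int STOP = Integer.MIN_VALUE;

    // The master hands each unit to the workers and adds up the squared results.
    static long sumOfSquares(List<Integer> units, int workerCount) {
        final BlockingQueue<Integer> work = new LinkedBlockingQueue<Integer>();
        final BlockingQueue<Long> results = new LinkedBlockingQueue<Long>();

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            int unit = work.take();
                            if (unit == STOP) {
                                return;
                            }
                            results.put((long) unit * unit);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            workers[i].start();
        }

        try {
            // Master: publish the work, then one stop marker per worker.
            for (Integer unit : units) {
                work.put(unit);
            }
            for (int i = 0; i < workerCount; i++) {
                work.put(STOP);
            }
            // Master: collect exactly one result per unit of work.
            long total = 0;
            for (int i = 0; i < units.size(); i++) {
                total += results.take();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4), 2)); // prints 30
    }
}
```

The master never needs to know which worker handled which unit; it only counts results, which is what makes the pattern straightforward to spread over more workers.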


Alex Miller said...

You could just not use JavaEE and use Terracotta instead... :)

pveentjer said...

A lot depends on the situation. There are a lot of parallel processing frameworks out there like JPPF, GigaSpaces, GridGain, Hadoop etc. If you want to process batches and are using Spring, you could have a look at Spring Batch.

If you want to have more control you can design something yourself and have it clustered with a transparent distribution technology like Terracotta.

I have used java.util.concurrent in combination with a (not very frequently accessed) database in quite a few enterprisey applications and it certainly can do the job.

So it depends.

Chris said...

For message processing (JMS) type scenarios, have you come across Spring's DefaultMessageListenerContainer? This builds on util.concurrent, but can also make use of the WorkManager API where available, so that your threads can be executed as managed threads on recent WebLogic and WebSphere app servers.