On Camel and OSS: October 2010

From recent posts on the camel users@ mailing list there seems to be some confusion related to the use of getIn() and getOut() on a Camel Exchange. The question is relevant to developers who want to implement a processor and need to understand what should happen with the result of their processing. There is a wiki page that attempts to explain the usage of the two methods, but sadly that mostly fuels the confusion than clarifies anything. The page will be corrected soon, you may see a newer, corrected version, but at the time of my writing current is version 4 (which was recently improved and a bit more accurate than version 2).

So the idea (deeply rooted in some older discussion about what the Exchange api should be) is that you should use getIn(), because, uhm, "it's often easier just to adjust the IN message directly" and that has something to do with MEPs and a poor relative of camel called JBI. Before we get into more details, the reality is much simpler, using getIn() or getOut() has not much to do with MEPs and you should do what's right, which is process the in message (most likely), generate an out message (if needed) or update the in (sometimes).

Camel is one of the integration space best frameworks (I'd think the best, but then you'll say I'm biased) and it is true that it was influenced by JBI. Some of its roots are in Apache Servicemix and three Camel PMC members have been at some point involved in drafting the jbi specs: James with jbi 1.0 and Guillaume and Bruce with jbi 2.0. One of the main goals of Camel is to make integration simpler and more intuitive than jbi (and complement jbi in some cases).

Now, just because the jbi 2.0 was inactive for a good while doesn't mean that the Camel Exchange API is broken as some may say. In particular, the jbi 1.0 spec says that "The message exchange model is based on the web services description language 2.0 [WSDL 2.0] or 1.1 [WSDL 1.1]" (jbi-1.0-fr-spec, pg 5) and WSDL is pretty much alive and kicking. Then there is the discussion about the MEPs, which is 100% correct and 100% irrelevant. The reason is that the MEPs refers to the route not what happens within a particular Processor. You could use the exact same Processor on routes that are in-only or in-out.

The concept though is much older than WSDL also. When implementing a function one would pass arguments and return a value. Think of a process() function that gets a Message as an argument and returns another Message.

Message process(Message in) throws Exception;

So the question then becomes should I return the same in Message or another one? Well, that's entirely up to you and what the process() function is supposed to do. The only thing that happens in Camel is that we wrap the two messages in an Exchange (plus the exception and some other metadata, because we needed an execution context) and that's what we pass to process(), so the function signature becomes:

 void process(Exchange exchange);

... which is pretty much the definition of the Processor interface. Or you can think of the exchange as being the stack frame prepared for the function execution in the first case (not much of a stretch).

One important thing to note, that may clarify a bit the confusion is that getOut() will not really return the out message, but rather will lazily create an empty DefaultMessage for you (on the first call, and the existing out message on subsequent calls). Therefore there is no out message until you create it by calling getOut(), the assumption being that you called getOut() with the intention of returning the new distinct Message resulted from your processing. As a consequence the assertion getOut() ==null will always fail, because the getOut() call will create a message. To address that there is a hasOut() method in Exchange you could use to tell you if you created the out message (in case you don't know that already).

As you probably know, what camel does is to pass an Exchange along the processor chain in a route and pass the out message of the previous processor as an in message for the next. If the previous processor did not produce an out (there are such processors, a logger for example, logs the in, but does not produce an out), then pass the previous in as an in message for the next processor. During its lifetime, an Exchange starts with an in, goes down the processor chain, outs are created and they replace the ins and the old ins are lost in history. And here's where the mep comes into the picture. If the route is in-out, at the end of processing chain, the last message in the exchange is passed back by the consumer to the caller, otherwise the last message is simply ignored (with an in-only mep, the caller is not waiting for an answer).

A problem faced by users is that if they call getOut() and produce a new message, the headers from the in message get lost (well, the whole message gets lost). One thing to know is that if one needs to access some value at different points during processing, the Exchange has a property map. Objects in that map stay with the Exchange, unlike headers that are tied to messages (and probably have no semantics outside the message). As we saw, messages come and go during processing. Exchange properties don't. Now if you want headers to be preserved, you probably have a very specific processor, that can only exist in particular places in a route and make heavy assumptions about the type of in message they receive. If that's the case, yes, you probably only need to slightly modify the in message and since you did not create an out the modified in will be passed further down the route. If you need to update a header you set on a previous processor (such as a timestamp) but don't really rely on the in message per se, you might consider using exchange properties instead of headers. Consider also that even if you modify the in message to preserve the headers, other processors down the route may produce an out (and hence replace the current in) anyway so headers could still get lost down the route.

That's pretty much it. Let's take a look at a few examples. A throttler processor will most likely not care about the in nor generate an out, it will only introduce a delay in processing (which may be adaptive and may depend on a header or property). A profiler processor will probably update timestamps in exchange properties, again ignoring the in, not producing an out and couldn't care less what the exchange pattern of the exchange is. A logger processor will log the in, won't produce an out, mep is again irrelevant. A crypto processor (partial message protection) will use credentials from the exchange properties to alter parts of the in message, sign/verify/encrypt/decrypt (may be either headers or parts of the body), will likely not produce an out. A proxy processor that allows you to integrate some legacy technology into a camel route, will use the input, format it according to the needs of the technology, use the proprietary api, get the result and put it as a new out on the exchange, possibly along with specific headers. A processor that fetches records from a database will most likely put the result set into a new out message as well. So again, should you use getIn() or getOut()? Things depend much on what your processor needs to do.

Don't do what's easy, do what's right!

On Camel and OSS

Sunday, October 31, 2010

Apache Camel 2.5.0 Released

Wednesday, October 6, 2010

Should you getIn or getOut?

Tuesday, October 5, 2010

Take the Apache Camel Survey!