Background
When I started my career in the early 2000s, the emerging hotness was enterprise Java. The mantra of the day was extreme loose coupling at every level: to avoid “vendor lock-in”, and to support the fantasy that you might one day swap out your database or messaging middleware without significant rewrites in your application code. I seldom if ever saw that actually happen, but supporting the idea led to a notorious level of complexity and abstraction.
Over the past 5-10 years, I’ve witnessed the pendulum swinging hard in the opposite direction. Even the most conservative of companies have moved more and more things to a public cloud, and often embraced managed services over installing and maintaining those components themselves. Given the fierce price war between AWS, Azure, and GCP, vendor lock-in became less scary. In this era, we saw a lot more shops embrace tighter coupling between applications and their dependencies.
I’m starting to wonder though, whether this trend may have reached its peak. As the largest and most conservative companies finally enter these waters, you see more and more marketing around “hybrid cloud” services, blending public cloud offerings with data or systems that must remain on-prem. After a decade at pure-SaaS companies, my last shop was a mixed model: hosting its own SaaS product, but also offering custom on-prem installations for key clients (relying on different infrastructure components).
Amazon competitors grow by the day as Amazon encroaches into more industries, and those competitors often insist on SaaS vendors hosting a version of their product on a non-Amazon cloud. With the passage of GDPR in Europe, customers have increasing demands for software to be hosted on different clouds in different jurisdictions. So on and so forth… it feels like “cloud agnostic” is shifting from an afterthought, to a legitimate architectural requirement for more shops.
Introducing Apache Camel
However, I don’t see us going back to OSGi components or AbstractFactoryFactory classes in order to meet the challenges of plugging the same codebase into multiple different environments. The extreme loose coupling of classic enterprise architecture, and even the term “enterprise” itself, has simply lost too much mindshare among developers under 40 (and even many of us on the wrong side of that line!).
One tool that I have seen help, and that I don’t think gets anywhere near the level of attention it deserves, is Apache Camel. In my mind, the problem is one of marketing. A newcomer could easily read the Camel website for a half-hour, and not come away with any basic understanding of what Camel even is. It’s described as an “open source integration framework”, and “an implementation of the Enterprise Integration Patterns from Gregor Hohpe and Bobby Woolf’s book”. But once again, how many developers under the age of 40 have even heard of the terms “EIP” or “EAI” today?
Let me take a very informal swag at this. Apache Camel is basically:
- a Java library,
- that allows you to shuttle data from a source to a destination,
- perhaps doing transformation or processing with it along the way,
- where the source and destination are defined by URI strings.
Here’s a simple example of a Camel “route”, a mini data pipeline that defines the flow from a source to a destination:
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

CamelContext context = new DefaultCamelContext();
RouteBuilder route = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("aws-sqs://my-queue?region=us-east-1&accessKey=abc&secretKey=xyz&concurrentConsumers=20&attributeNames=All&messageAttributeNames=All")
            .to("kafka://canopy-dev-eventpipeline?brokers=mybroker.com:9092&securityProtocol=SASL_SSL&saslMechanism=PLAIN&saslJaasConfig=org.apache.kafka.common.security.plain.PlainLoginModule required username=\"myuser\" password=\"mypassword\";&sslEndpointAlgorithm=https");
    }
};
context.addRoutes(route);
context.start();
This code creates a message consumer for an Amazon SQS queue, and passes each message received onward to an Apache Kafka topic. All of the necessary information about this queue and topic is captured in Camel URI strings, which are passed to the from(...) and to(...) builder methods.
Let’s look at a slightly more advanced example:
// in addition to the imports above:
import org.apache.camel.Exchange;
import org.apache.camel.Message;
import org.apache.camel.Predicate;
import org.apache.camel.Processor;

CamelContext context = new DefaultCamelContext();
RouteBuilder route = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("aws-sqs://my-queue?region=us-east-1&accessKey=abc&secretKey=xyz&concurrentConsumers=20&attributeNames=All&messageAttributeNames=All")
            .process(new Processor() {
                @Override
                public void process(Exchange exchange) throws Exception {
                    Message msg = exchange.getIn();
                    // ENRICHMENT, TRANSFORMATION, ETC...
                }
            })
            .choice()
                .when(new Predicate() {
                    @Override
                    public boolean matches(Exchange exchange) {
                        Message msg = exchange.getIn();
                        // EXAMINE THE MESSAGE, AND RETURN true OR false...
                        return false;  // placeholder
                    }
                })
                    .log("Sending this message to the HTTP web hook")
                    .setHeader(Exchange.HTTP_METHOD, constant("POST"))
                    .to("http://myserver.com/mywebhook")
                .otherwise()
                    .log("Sending this message to the Kafka topic")
                    .to("kafka://canopy-dev-eventpipeline?brokers=mybroker.com:9092&securityProtocol=SASL_SSL&saslMechanism=PLAIN&saslJaasConfig=org.apache.kafka.common.security.plain.PlainLoginModule required username=\"myuser\" password=\"mypassword\";&sslEndpointAlgorithm=https");
    }
};
context.addRoutes(route);
context.start();
Here we see some additional functionality that a Camel route can provide:
- Adding a Processor implementation at any step(s) along the way, either to enrich or transform the message, or to perform some other side effect. (A more compact, lambda-based version of this route appears just after this list.)
- A fork in the pipeline. By using the choice() construct, you can direct a particular message to a different destination, based on the result of a Predicate examining the message and returning true or false.
- Performing simple transformations directly with the route builder DSL. Here, a message for which the Predicate resolves true will be delivered to an HTTP endpoint rather than the Kafka topic. We use the setHeader(...) builder method to tell Camel’s HTTP component to use a “POST” for this request.
- Coding log output directly in the route builder DSL, with the log(...) method.
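Incidentally, since Processor and Predicate are both single-method interfaces, Java 8+ lets you write these inline steps as lambdas. Here’s a minimal sketch of the same fork in that style (the “priority” header is purely hypothetical, standing in for whatever your real Predicate would examine; the endpoint URIs are condensed):

RouteBuilder route = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("aws-sqs://my-queue?region=us-east-1&accessKey=abc&secretKey=xyz")
            .process(exchange -> {
                // ENRICHMENT, TRANSFORMATION, ETC...
            })
            .choice()
                // Hypothetical "priority" header; substitute your own logic
                .when(exchange -> "high".equals(exchange.getIn().getHeader("priority", String.class)))
                    .setHeader(Exchange.HTTP_METHOD, constant("POST"))
                    .to("http://myserver.com/mywebhook")
                .otherwise()
                    .to("kafka://canopy-dev-eventpipeline?brokers=mybroker.com:9092");
    }
};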
A Camel route doesn’t HAVE to be an end-to-end data pipeline. You can define routes to be triggered on-demand, by messages passed from your application code. Here are two routes that the application can invoke, using a Camel ProducerTemplate, to send an inventory record to an IBM AS/400 midrange server.
// in addition to the imports above:
import java.util.HashMap;
import java.util.Map;
import org.apache.camel.impl.DefaultProducerTemplate;

CamelContext context = new DefaultCamelContext();
RouteBuilder synchronousRoute = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("direct:inventory-blocking")
            .to("jt400://MYUSER:MYPASSWORD@INVENTORYSYSTEM/QSYS.LIB/INVENTORY.LIB/INCOMING.DTAQ");
    }
};
RouteBuilder asyncRoute = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("seda:inventory-async")
            .to("jt400://MYUSER:MYPASSWORD@INVENTORYSYSTEM/QSYS.LIB/INVENTORY.LIB/INCOMING.DTAQ");
    }
};
context.addRoutes(synchronousRoute);
context.addRoutes(asyncRoute);
context.start();

DefaultProducerTemplate producerTemplate = new DefaultProducerTemplate(context);
producerTemplate.start();
...
String inventoryRecordCsv = ...
Map<String, Object> headers = new HashMap<>();
// This line will block, until the message is delivered.
producerTemplate.sendBodyAndHeaders("direct:inventory-blocking", inventoryRecordCsv, headers);
// This line will return right away, with message delivery occurring in a background thread.
producerTemplate.sendBodyAndHeaders("seda:inventory-async", inventoryRecordCsv, headers);
The direct and seda URIs are not continuous listeners, like a message queue consumer or HTTP endpoint. Rather, they are event-based Camel components that are meant to be invoked via a ProducerTemplate object. The direct component is for traditional, blocking I/O, where the send operation does not return until the message is sent. The seda component delegates the delivery to a background thread, returning control back to the calling thread immediately.
How This Helps
So how can this help you design and write more cloud agnostic applications? The answer is two-fold:
- It provides a single, thin abstraction over the actions of retrieving data from, or pushing data to, just about any external source. This is not JMS… an abstraction over a few message queue systems, which leaves you in the lurch if you need to migrate to an AMQP broker, Apache Kafka or Pulsar, etc. This is a consistent API for integrating nearly anything. Switch from publishing Kafka messages to writing records to an S3-based data lake, simply by swapping out the URI string.
- It allows you to externalize the integration source and destination. In practice, you probably wouldn’t hardcode these Camel URI strings directly in your application source. Rather, you would load them from a config properties file. Done that way, you can deploy the same code in different environments, wired to different external components, purely through configuration (see the sketch below).
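Here’s a minimal sketch of that idea, assuming a hypothetical routes.properties file with source.uri and destination.uri keys:

// routes.properties (hypothetical):
//   source.uri=aws-sqs://my-queue?region=us-east-1&accessKey=abc&secretKey=xyz
//   destination.uri=kafka://canopy-dev-eventpipeline?brokers=mybroker.com:9092

Properties props = new Properties();
try (FileInputStream in = new FileInputStream("routes.properties")) {
    props.load(in);
}
String sourceUri = props.getProperty("source.uri");
String destinationUri = props.getProperty("destination.uri");

CamelContext context = new DefaultCamelContext();
context.addRoutes(new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        // The route logic never changes; only the configured endpoints do
        from(sourceUri).to(destinationUri);
    }
});
context.start();

Camel also ships a properties component that supports {{source.uri}}-style placeholders directly inside endpoint URIs, if you’d rather not wire this up by hand.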
As a side benefit, Camel makes local development and testing easier as well. There are components for consuming from plain text files, writing output to the application log, as well as straight-up mocks. So, for example, a developer can do initial testing of a Kafka-based message pipeline without even having Kafka present.
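For instance, here’s a minimal sketch using Camel’s built-in direct and mock components in place of SQS and Kafka, with the standard MockEndpoint test tooling:

import org.apache.camel.ProducerTemplate;
import org.apache.camel.component.mock.MockEndpoint;

CamelContext context = new DefaultCamelContext();
context.addRoutes(new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        // Same route shape as the SQS-to-Kafka example, with test doubles swapped in
        from("direct:in").to("mock:out");
    }
});
context.start();

// Assert on what the route delivered, without any real broker present
MockEndpoint mock = context.getEndpoint("mock:out", MockEndpoint.class);
mock.expectedMessageCount(1);

ProducerTemplate template = context.createProducerTemplate();
template.sendBody("direct:in", "test message");

mock.assertIsSatisfied();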
SEDA routes also give you an easy way to manage concurrency, without having to manually set up ExecutorService instances or thread pooling.
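For instance, inside a RouteBuilder, a single URI option is enough to fan the work out across multiple threads:

// Five threads consume from this in-memory queue concurrently,
// with no ExecutorService wiring on your part
from("seda:inventory-async?concurrentConsumers=5")
    .to("jt400://MYUSER:MYPASSWORD@INVENTORYSYSTEM/QSYS.LIB/INVENTORY.LIB/INCOMING.DTAQ");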
Nothing is perfect, though. There ARE some caveats and considerations you should keep in mind as you explore this:
- Although Camel is a very thin abstraction, it IS an abstraction. It’s a new layer that developers will have to know and think about. So if you introduce it, then be ready to spend the next 12 months getting blamed for every stacktrace that has the word “camel” anywhere in it!
- Since Camel IS such a thin abstraction, it can deceive you into feeling like you no longer need to understand the components underneath. Some examples that I have encountered:
  - The Amazon client for SQS, by default, does not consume message headers unless you change some settings to make it so. The Camel component for SQS does not try to alter this default behavior. So even though the Camel API can trick developers into assuming that they’ll always receive message headers, those actually won’t be populated unless you add &attributeNames=All&messageAttributeNames=All to your endpoint URI string.
  - When you use Apache Kafka, you need to set a “key” on each message, so that Kafka can deterministically direct it toward the right topic partition. Camel doesn’t try to figure this out and set it for you (it would probably be a bad thing if it tried!). So your code needs to set a key header, with the name "kafka.KEY" (or just use the KafkaConstants.KEY constant), as in the sketch just below. Even if you might be swapping out Kafka for something else in your configuration, your code may always need to set this header in case Kafka is being used.
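As a sketch, setting that key in a route might look like this (the “orderId” header is purely hypothetical; use whatever value should drive partition affinity):

import org.apache.camel.component.kafka.KafkaConstants;

from("direct:orders")
    // Kafka partitions deterministically by key; Camel won't pick one for you
    .process(exchange -> exchange.getIn().setHeader(
            KafkaConstants.KEY,
            exchange.getIn().getHeader("orderId", String.class)))
    .to("kafka://canopy-dev-eventpipeline?brokers=mybroker.com:9092");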
- The Camel documentation has come a long way in recent years, but occasionally I still wish it had more detail on a given component parameter. Also, a lot of the community are die-hard greybeards, who still answer StackOverflow questions with chunks of old-school Spring XML config. So you might have to be willing to dig a little, and work to translate information into something more applicable to your application’s framework.
- That being said, Camel offers a highly-opinionated Spring Boot starter, if you are working with that framework and want much of the boilerplate to be handled magically.
- Even if you would like to avoid too much “magic”, the “camel-core” dependency alone provides all the tools you need to play nice with a modern application. You can bind Spring beans or any plain Java object to a route, to declare which method gets invoked when a message is consumed. On that method’s signature, you can use @Headers and @Body annotations to denote which parameters should be set with the headers and body for each message (see the sketch after this list).
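Here’s a minimal sketch of that binding style (the InventoryHandler class and its method name are just illustrative):

import java.util.Map;
import org.apache.camel.Body;
import org.apache.camel.Headers;

public class InventoryHandler {
    // Camel's bean binding maps the consumed message onto these parameters
    public void handle(@Body String body, @Headers Map<String, Object> headers) {
        // HANDLE THE MESSAGE...
    }
}

// ... and in a RouteBuilder:
from("aws-sqs://my-queue?region=us-east-1&accessKey=abc&secretKey=xyz")
    .bean(new InventoryHandler(), "handle");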
Overall, Apache Camel is a powerful tool for integrating with a world of different sources and targets, under a single, configuration-driven, consistent API. It doesn’t receive anywhere near the attention in the cloud-native world that it does in the enterprise space… and frankly, people are missing out. If you find your application services integrating with an increasing number of external components, then it’s definitely worth your time to check out this library and get to know it better.