Batch processing in EDA (Event Driven Architectures) | ODP
Batch processing in event-driven
architectures
asantuy@gmail.com
• Establishing the role of batch processes in service-oriented
architectures is a challenging problem.
• Classical SOAs fit better with online processing.
• Batch is often used to work around technical restrictions
that are unrelated to business logic, and it sometimes
forces duplicating the implementation.
Batch as integration mechanism
Batch is often used to synchronize the state of two
systems in order to keep their data consistent,
• Using interchange files (text, XML, binary, …)
• By direct updates.
Batch processes were ubiquitously used in old legacy
systems as an integration mechanism, and they usually
prevented those systems from evolving.
Typical service contracts are not well suited to defining
massive operations.
The REST style (resource-based) is ill suited to bulk
operations:
• How to output large amounts of data?
• How to handle list parameters?
As a consequence of the shared-nothing principle, datastores
cannot be directly accessed by other services to execute
complex queries.
For similar reasons, joining data directly from different
datastores is not possible.
Iterative access to services may be slow because of
roundtrips.
Because of this, retrieving large amounts of data from services
is frequently infeasible:
• Retrieve parent entities
• Retrieve child entities (for each parent)
• Retrieve some description (for each child)
• Check some condition (for each child)
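The nested retrieval above is the classic "N+1" access pattern: each level of the loop multiplies the number of service roundtrips. A minimal sketch (names and numbers are illustrative, not from the original deck) makes the cost concrete:

```python
# Hypothetical sketch of the nested retrieval above: each level of the
# loop multiplies the number of service roundtrips (the "N+1" problem).

def count_roundtrips(num_parents, children_per_parent):
    calls = 1                                   # retrieve parent entities
    calls += num_parents                        # retrieve children (per parent)
    calls += num_parents * children_per_parent  # retrieve description (per child)
    calls += num_parents * children_per_parent  # check condition (per child)
    return calls

# With 100 parents and 10 children each, one logical query
# costs 2101 individual service calls.
print(count_roundtrips(100, 10))  # → 2101
```

Even with modest latency per call, thousands of roundtrips for a single logical query quickly become infeasible.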
To overcome these difficulties, real-time event processing is
preferred to batch integration in event-driven architectures.
[Diagram: events producer → queue → multiple events consumers]
The idea is to replace one big scheduled process by
lots of small incremental updates happening
online.
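This replacement of one scheduled job by a stream of small updates can be sketched as follows (the account/balance names are illustrative assumptions, not part of the deck):

```python
# Sketch of replacing a nightly batch job with small incremental
# updates: each event is applied as it arrives, so the consumer's
# state stays continuously close to the producer's.
import queue

events = queue.Queue()
for change in [("alice", 10), ("bob", 5), ("alice", -3)]:
    events.put(change)   # produced online by the OLTP service

balances = {}            # the consumer's materialized state
while not events.empty():
    account, delta = events.get()
    balances[account] = balances.get(account, 0) + delta

print(balances)  # → {'alice': 7, 'bob': 5}
```

Instead of recomputing everything once a day, the consumer's view is always at most one event behind.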
This approach offers a number of advantages:
• Events are objects closer to business semantics than
records in a file.
• The same event can be used by multiple consumers.
• New consumers can be added transparently.
• Event consumption can be delayed, then acting as
bulk updates.
• As OLTP services should be mainly autonomous, all the
logic to process an event should be contained in the service
itself (the "smart endpoints, dumb pipes" principle).
• In this kind of system, events tend to be sparse in time.
• Integration between OLTP systems can be accomplished
by using AMQP-style middleware.
When data flows from an OLTP system to a data
warehouse or reporting repository, the number of
events may be so large that some kind of event-streaming
platform may be needed.
In this case, secondary streams may be needed in order
to enrich or transform the primary data (as in an ETL
pipeline).
[Diagram: events producers → primary queue → enricher,
fed by a secondary queue carrying additional information]
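The enrichment step can be sketched as a join between the primary stream and reference data built from the secondary stream. All names here are assumptions for illustration, not a real API:

```python
# Illustrative sketch of an enricher: events from the primary stream
# are joined with reference data from a secondary stream, as in an
# ETL pipeline.
reference = {}  # built from the secondary stream (additional information)
for product_id, name in [("p1", "keyboard"), ("p2", "mouse")]:
    reference[product_id] = name

def enrich(event):
    # Attach the secondary data to the primary event before forwarding it.
    return {**event, "product_name": reference.get(event["product_id"], "unknown")}

primary = [{"product_id": "p1", "qty": 2}, {"product_id": "p2", "qty": 1}]
enriched = [enrich(e) for e in primary]
print(enriched[0]["product_name"])  # → keyboard
```

In a real pipeline the reference table would itself be kept up to date by consuming the secondary queue rather than built once up front.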
Classical batch processing may still be necessary to transfer information to
external systems not connected by the event processing infrastructure.
[Diagram: data extraction through a transfer channel,
crossing the platform boundary]
In these cases, a data lake fed by the event stream can be used
to obtain the needed records. This pattern can be used as a
method to release the transactional services from the burden
of batch work.
Safety measures
A number of mechanisms can be used to prevent the
consumer from being flooded when the event rate
exceeds its processing capacity.
• Big-data solutions like Kafka can store large numbers of
events until the consumer is ready to process them.
• Each consumer accesses the queue independently, starting
from the last event it received.
• A consumer can go offline and reconnect later at the
same point in the queue.
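These three properties can be sketched with an append-only log and per-consumer offsets, in the style of Kafka's consumer model (this is an illustrative toy, not the Kafka API):

```python
# Minimal sketch of log-based consumption: the broker keeps an
# append-only log, and each consumer tracks its own offset, so it
# can go offline and later resume from the same point.
log = []                      # append-only event log, retained by the broker

def produce(event):
    log.append(event)

class Consumer:
    def __init__(self):
        self.offset = 0       # each consumer remembers its own position

    def poll(self):
        events = log[self.offset:]
        self.offset = len(log)
        return events

produce("e1"); produce("e2")
slow = Consumer()
assert slow.poll() == ["e1", "e2"]
produce("e3")                 # events accumulate while the consumer is away
assert slow.poll() == ["e3"]  # it resumes from its own offset
```

Because the log, not the consumer, holds the events, adding a second consumer with its own offset requires no change to the producer.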
Back pressure mechanisms can avoid consumer flooding when
events are pushed to the consumer.
[Diagram: the consumer pulls from the queue with request(n)]
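The request(n) interaction in the diagram is the pull-based back-pressure style popularized by Reactive Streams: the consumer signals how many events it can handle, and the queue never delivers more. A hedged sketch (not a real Reactive Streams implementation):

```python
# Sketch of pull-based back pressure: the consumer asks for at most
# n events with request(n), so the queue never pushes more than the
# consumer can currently handle.
class BoundedQueue:
    def __init__(self, events):
        self.events = list(events)

    def request(self, n):
        # Deliver at most n events; the rest stay buffered in the queue.
        batch, self.events = self.events[:n], self.events[n:]
        return batch

q = BoundedQueue(range(10))
assert q.request(3) == [0, 1, 2]   # consumer signals capacity for 3
assert q.request(3) == [3, 4, 5]   # it requests more only when ready
```

The key inversion is that demand flows upstream: the producer side reacts to the consumer's requests instead of pushing at its own rate.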
Periodic batch processes
Some systems may need to perform bulk operations over
their own data in order to meet business requirements.
These operations are usually executed on a periodic basis:
• Analytical processing (data mining, machine learning, …).
• Bulk updates.
Events can be used to separate scheduling from the service
itself; several services can be coordinated by this mechanism.
[Diagram: scheduler → queue → multiple events consumers]
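Separating scheduling from the services can be sketched as a scheduler that only publishes a "tick" event, while each subscribed service runs its own bulk work in response. The service names below are illustrative assumptions:

```python
# Sketch of event-driven scheduling: the scheduler knows nothing
# about the services; it just publishes a tick event, and every
# subscribed service reacts by running its own periodic work.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def publish(event):
    for handler in subscribers:
        handler(event)

ran = []
subscribe(lambda e: ran.append(("reporting", e)))   # hypothetical service
subscribe(lambda e: ran.append(("cleanup", e)))     # hypothetical service

publish("nightly-tick")   # several services coordinated by one event
print(ran)
```

Adding another periodic job then means subscribing one more consumer, with no change to the scheduler.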
An example deployment
http://www.confluent.io/blog/stream-data-platform-1/
Thanks to all
