Batch processing in EDA (Event Driven Architectures) | ODP
Batch processing in event-driven
architectures
asantuy@gmail.com
• Establishing the role of batch processes in service-oriented
architectures is a challenging problem.
• Classical SOAs fit better with online processing.
• Batch is often used to work around technical restrictions
that are unrelated to business logic, and it sometimes
forces duplicating the implementation.
Batch as integration mechanism
Batch is often used to synchronize the state of two
systems in order to keep their data consistent,
• Using interchange files (text, XML, binary, …)
• By direct updates.
Batch processes were ubiquitously used in old legacy
systems as an integration mechanism, and they usually
prevented those systems from evolving.
Typical service contracts are not well suited to defining
massive operations.
The REST style (resource-based) is ill suited to bulk
operations:
• How to output large amounts of data?
• How to handle list parameters?
As a consequence of the shared-nothing principle, datastores
cannot be directly accessed by other services to execute
complex queries.
For similar reasons, joining data directly from different
datastores is not possible.
Iterative access to services may be slow because of
roundtrips.
Because of this, retrieving large amounts of data from services
is frequently infeasible:
• Retrieve parent entities
• Retrieve child entities (for each parent)
• Retrieve some description (for each child)
• Check some condition (for each child)
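The nested retrieval above is the classic "N+1" access pattern: each level of the loop multiplies the number of service roundtrips. A minimal sketch (names and numbers are illustrative, not from the original deck) makes the cost concrete:

```python
# Hypothetical sketch of the nested retrieval above: each level of the
# loop multiplies the number of service roundtrips (the "N+1" problem).

def count_roundtrips(num_parents, children_per_parent):
    calls = 1                                   # retrieve parent entities
    calls += num_parents                        # retrieve children (per parent)
    calls += num_parents * children_per_parent  # retrieve description (per child)
    calls += num_parents * children_per_parent  # check condition (per child)
    return calls

# With 100 parents and 10 children each, one logical query
# costs 2101 individual service calls.
print(count_roundtrips(100, 10))  # → 2101
```

Even with modest latency per call, thousands of roundtrips for a single logical query quickly become infeasible.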
To overcome these difficulties, real-time event processing is
preferred to batch integration in event-driven architectures.
[Diagram: events producer → queue → multiple events consumers]
The idea is to replace one big scheduled process by
lots of small incremental updates happening
online.
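This replacement of one scheduled job by a stream of small updates can be sketched as follows (the account/balance names are illustrative assumptions, not part of the deck):

```python
# Sketch of replacing a nightly batch job with small incremental
# updates: each event is applied as it arrives, so the consumer's
# state stays continuously close to the producer's.
import queue

events = queue.Queue()
for change in [("alice", 10), ("bob", 5), ("alice", -3)]:
    events.put(change)   # produced online by the OLTP service

balances = {}            # the consumer's materialized state
while not events.empty():
    account, delta = events.get()
    balances[account] = balances.get(account, 0) + delta

print(balances)  # → {'alice': 7, 'bob': 5}
```

Instead of recomputing everything once a day, the consumer's view is always at most one event behind.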
This approach offers a number of advantages:
• Events are objects closer to business semantics than
records in a file.
• The same event can be used by multiple consumers.
• New consumers can be added transparently.
• Event consumption can be delayed, then acting as
bulk updates.
• As OLTP services should be mainly autonomous, all the
logic to process an event should be contained in the service
itself (the "smart endpoints, dumb pipes" principle).
• In this kind of system, events tend to be sparse in time.
• Integration between OLTP systems can be accomplished
by using AMQP-style middleware.
When data flows from an OLTP system to a data
warehouse or reporting repository, the number of
events may be so large that some kind of event-streaming
platform may be needed.
In this case, secondary streams may be needed in order
to enrich or transform the primary data (as in an ETL
pipeline).
[Diagram: events producers → primary queue → enricher,
fed by a secondary queue carrying additional information]
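The enrichment step can be sketched as a join between the primary stream and reference data built from the secondary stream. All names here are assumptions for illustration, not a real API:

```python
# Illustrative sketch of an enricher: events from the primary stream
# are joined with reference data from a secondary stream, as in an
# ETL pipeline.
reference = {}  # built from the secondary stream (additional information)
for product_id, name in [("p1", "keyboard"), ("p2", "mouse")]:
    reference[product_id] = name

def enrich(event):
    # Attach the secondary data to the primary event before forwarding it.
    return {**event, "product_name": reference.get(event["product_id"], "unknown")}

primary = [{"product_id": "p1", "qty": 2}, {"product_id": "p2", "qty": 1}]
enriched = [enrich(e) for e in primary]
print(enriched[0]["product_name"])  # → keyboard
```

In a real pipeline the reference table would itself be kept up to date by consuming the secondary queue rather than built once up front.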
Classical batch processing may still be necessary to transfer information to
external systems not connected by the event processing infrastructure.
[Diagram: data extraction through a transfer channel,
crossing the platform boundary]
In these cases, a data lake fed by the event stream can be used
to obtain the needed records. This pattern can be used as a
method to release the transactional services from the burden
of batch work.
Safety measures
A number of mechanisms can be used to prevent the
consumer from being flooded when the event rate
exceeds its processing capacity.
• Big-data solutions like Kafka can store large numbers of
events until the consumer is ready to process them.
• Each consumer accesses the queue independently, starting
from the last event it received.
• A consumer can go offline and reconnect later at the
same point in the queue.
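These three properties can be sketched with an append-only log and per-consumer offsets, in the style of Kafka's consumer model (this is an illustrative toy, not the Kafka API):

```python
# Minimal sketch of log-based consumption: the broker keeps an
# append-only log, and each consumer tracks its own offset, so it
# can go offline and later resume from the same point.
log = []                      # append-only event log, retained by the broker

def produce(event):
    log.append(event)

class Consumer:
    def __init__(self):
        self.offset = 0       # each consumer remembers its own position

    def poll(self):
        events = log[self.offset:]
        self.offset = len(log)
        return events

produce("e1"); produce("e2")
slow = Consumer()
assert slow.poll() == ["e1", "e2"]
produce("e3")                 # events accumulate while the consumer is away
assert slow.poll() == ["e3"]  # it resumes from its own offset
```

Because the log, not the consumer, holds the events, adding a second consumer with its own offset requires no change to the producer.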
Back pressure mechanisms can avoid consumer flooding when
events are pushed to the consumer.
[Diagram: the consumer pulls from the queue with request(n)]
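The request(n) interaction in the diagram is the pull-based back-pressure style popularized by Reactive Streams: the consumer signals how many events it can handle, and the queue never delivers more. A hedged sketch (not a real Reactive Streams implementation):

```python
# Sketch of pull-based back pressure: the consumer asks for at most
# n events with request(n), so the queue never pushes more than the
# consumer can currently handle.
class BoundedQueue:
    def __init__(self, events):
        self.events = list(events)

    def request(self, n):
        # Deliver at most n events; the rest stay buffered in the queue.
        batch, self.events = self.events[:n], self.events[n:]
        return batch

q = BoundedQueue(range(10))
assert q.request(3) == [0, 1, 2]   # consumer signals capacity for 3
assert q.request(3) == [3, 4, 5]   # it requests more only when ready
```

The key inversion is that demand flows upstream: the producer side reacts to the consumer's requests instead of pushing at its own rate.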
Periodic batch processes
Some systems may need to perform bulk operations over
their own data in order to meet business requirements.
These operations are usually executed on a periodic basis:
• Analytical processing (data mining, machine learning, …).
• Bulk updates.
Events can be used to separate scheduling from the service
itself; several services can be coordinated by this mechanism.
[Diagram: scheduler → queue → multiple events consumers]
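Separating scheduling from the services can be sketched as a scheduler that only publishes a "tick" event, while each subscribed service runs its own bulk work in response. The service names below are illustrative assumptions:

```python
# Sketch of event-driven scheduling: the scheduler knows nothing
# about the services; it just publishes a tick event, and every
# subscribed service reacts by running its own periodic work.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def publish(event):
    for handler in subscribers:
        handler(event)

ran = []
subscribe(lambda e: ran.append(("reporting", e)))   # hypothetical service
subscribe(lambda e: ran.append(("cleanup", e)))     # hypothetical service

publish("nightly-tick")   # several services coordinated by one event
print(ran)
```

Adding another periodic job then means subscribing one more consumer, with no change to the scheduler.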
An example deployment
http://www.confluent.io/blog/stream-data-platform-1/
Thanks to all
