Parallel processing with
      Spring Batch
     Lessons learned
Morten Andersen-Gott
•   Manager at Accenture Norway
•   30 years old
•   Been using Spring Batch since 1.0M2 (2007)
•   Member of JavaZone program committee
    – http://tinyurl.com/javaever
    – http://tinyurl.com/thestreaming
    – http://tinyurl.com/ladyjava




     mortenag
     www.github.com/magott
     www.andersen-gott.com
A BoF focusing on the problems

•   Functional background for the batch
•   Short introduction to Spring Batch
•   Even shorter on Hibernate
•   The problems
•   The problems
•   The problems
Norwegian Public Service Pension Fund (SPK)
• Norway’s main provider of public occupational
  pensions
• Also provides housing loans and insurance schemes
• Membership of the Norwegian Public Service Pension
  Fund is obligatory for government employees
• Stats
   –   950,000 members across 1600 organisations
   –   Approx 138,000 receive a retirement pension
   –   Approx 58,000 receive a disability pension
   –   950,000 members have total accrued pension entitlements
       in the Norwegian Public Service Pension Fund of 339
       billion kroner.
Background
• Every year the parliament sets the basic amount
  of the national insurance
• This amount is a constant used in calculation of all
  benefits
• When the basic amount is changed, all benefits
  must be recalculated
• It’s more complex than a constant in an algorithm
   – Rules are the result of political games over the last 50 years
   – Complex rules
   – Lots of exemptions
Execution time requirements

• SPK’s recalculation batch must run
  – After the basic amount is set
  – After the Labour and Welfare Administration
    has done its calculation
  – Before the pensions are due next month
• Window of 1 week
  – Will ideally only run during weekends
  – Cannot run while case workers are doing
    their job
Spring Batch

• Framework for developing batch
  applications
• Implements batch semantics
  – Steps, Chunks, Stop, Restart, Skips, Retries
  – Partitioning, Multithreading, Parallel steps
A step…
The components
• ItemReader
  – read() returns one row at a time
  – Step is completed once read() returns null
• ItemProcessor
  – process(item) item is return value from read()
  – Business logic goes here
  – Items can be filtered out by returning null
• ItemWriter
  – write(list) list of items returned from
    process()
The code
<job id="foo">
   <step id="fileImport">
     <tasklet>
          <chunk commit-interval="10"
            reader="reader"
            processor="processor"
            writer="writer"/>
     </tasklet>
   </step>
</job>
<bean id="reader" class="..."/>
<bean id="processor" class="..."/>
<bean id="writer" class="..."/>
Chunk
• A chunk is a unit of work
  – Executes within a transaction
  – Size is defined by number of items read
• A step is divided into chunks by the
  framework
• When x is the chunk size
  – read() is called x times
  – process() is called x times
  – write() is called once with a list where list.size()==x
    (minus filtered items)
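
The chunk rules above can be sketched in plain Java. This is a simplified stand-in, not Spring Batch's actual implementation: `ChunkLoopSketch` and `runStep` are hypothetical names, the reader/processor/writer are modelled with `java.util.function` types, and transactions are only indicated by comments.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified stand-in for a chunk-oriented step (hypothetical names, no real transactions).
class ChunkLoopSketch {

    /** Runs read/process/write in chunks and returns the number of chunks written. */
    static <I, O> int runStep(Supplier<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int commitInterval) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {                    // outer loop: one iteration per chunk
            List<O> outputs = new ArrayList<>();
            int read = 0;
            while (read < commitInterval) {     // inner loop: read + process up to x items
                I item = reader.get();
                if (item == null) {             // null from the reader ends the step
                    exhausted = true;
                    break;
                }
                read++;
                O out = processor.apply(item);
                if (out != null) {              // null from the processor filters the item out
                    outputs.add(out);
                }
            }
            if (read > 0) {
                writer.accept(outputs);         // the writer is called once per chunk
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7).iterator();
        List<List<Integer>> written = new ArrayList<>();
        Supplier<Integer> reader = () -> rows.hasNext() ? rows.next() : null;
        Function<Integer, Integer> processor = n -> n % 2 == 0 ? null : n * 10; // drop even ids
        int chunks = runStep(reader, processor, written::add, 3);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

With seven rows and a commit interval of 3, the last chunk is shorter than the others, and filtered items shrink the list handed to the writer without changing how many items were read.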
Scaling
Multi-threaded step
[Diagram: a table of nine rows (Id 1 to 9: Paul, John, Lisa, Simon, Rick, Julia, Olivia, Scott, Martin) feeds a single step execution with one Reader, Processor and Writer, run by threads T1..*]
Partitioned Step
[Diagram: nine rows (Id 1 to 9) are split across three step executions, each with its own Reader, Processor and Writer and its own thread: T1 takes rows 1 to 3 (Paul, John, Lisa), T2 takes rows 4 to 6 (Simon, Rick, Julia), T3 takes rows 7 to 9 (Olivia, Scott, Martin)]
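
A minimal sketch of how a table could be split into contiguous id ranges before the step executes, one range per partition. The class and method names are hypothetical and this is plain Java rather than Spring Batch's partitioning API; it also assumes ids are dense, which a real staging table may not guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of range partitioning (illustrative names, not Spring Batch API).
class RangePartitionSketch {

    /** Splits ids minId..maxId (inclusive) into up to gridSize contiguous {from, to} ranges. */
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        Map<String, long[]> ranges = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;   // ceiling division
        long from = minId;
        for (int i = 1; i <= gridSize && from <= maxId; i++) {
            long to = Math.min(from + size - 1, maxId);
            ranges.put("partition" + i, new long[] {from, to});
            from = to + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        partition(1, 9, 3).forEach((name, r) ->
                System.out.println(name + ": rows " + r[0] + " to " + r[1]));
    }
}
```

Each range would then be handed to its own reader instance, so the threads never share reader state.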
Hibernate 101

• Proxy
• Session cache
• Flushing
  – Queries
  – Commit
The pension recalculation batch
• Step 1
  – Stored procedure
  – Populate staging table
• Step 2 (rules engine)
  – Read from staging
  – Fetch additional data
  – Call out to rules engine
  – Store result as pending pensions
• Step 3
  – Loop through pending pensions and activate
• Step 4
  – Create manual tasks for failed items
Staging table

• A batch processing pattern
• A simple table with an identity column,
  functional key and processing status
• Restart is easy
  – Select soc_sec from staging where
    processed=false
• An easy way to get parallel processing
  capabilities
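
To illustrate why restart is easy with this pattern, here is an in-memory sketch. The `StagingRow` fields mirror the columns named above (identity, functional key, processing status), but the class, the field names and the `run` method are assumptions for illustration; a real batch would run the query against the database and commit the status update with each chunk.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a staging table; names are assumptions for illustration.
class StagingRestartSketch {

    static final class StagingRow {
        final long id;          // identity column
        final String socSec;    // functional key
        boolean processed;      // processing status

        StagingRow(long id, String socSec) {
            this.id = id;
            this.socSec = socSec;
        }
    }

    /** In-memory equivalent of: select soc_sec from staging where processed=false */
    static List<StagingRow> unprocessed(List<StagingRow> staging) {
        List<StagingRow> todo = new ArrayList<>();
        for (StagingRow row : staging) {
            if (!row.processed) {
                todo.add(row);
            }
        }
        return todo;
    }

    /** Handles at most maxItems rows, simulating a run that stops early. Returns rows handled. */
    static int run(List<StagingRow> staging, int maxItems) {
        int handled = 0;
        for (StagingRow row : unprocessed(staging)) {
            if (handled == maxItems) {
                break;
            }
            row.processed = true;   // in a real batch this update commits with the chunk
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        List<StagingRow> staging = new ArrayList<>();
        for (long id = 1; id <= 5; id++) {
            staging.add(new StagingRow(id, "soc-" + id));
        }
        System.out.println("first run handled " + run(staging, 2));
        System.out.println("restart handled " + run(staging, Integer.MAX_VALUE));
    }
}
```

The restart needs no bookkeeping beyond the status column: it simply selects whatever is still unprocessed.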
Our staging table(s)
[Diagram: Step 2 with its Reader, Processor and Writer, labelled "Updates family status", alongside ILog and Sybase]
Hibernate in the reader

• Using a separate stateful session in
  reader
  – Not bound to active transaction
  – Is never flushed
  – clear() is called before commit
• Entities are written to database using a
  (different) transaction bound session
  – Used by processor and writer
Problems?
What happened?

org.hibernate.HibernateException: Illegal
  attempt to associate a collection with two
  open sessions
Why

• Family.errors and Family.persons are
  attached to reader’s session
• Attempting to attach them to transaction
  bound session
• Hibernate will have none of that!
Problems?
The solution?
Stateless sessions
Hibernate in the reader (Second attempt)


• Let’s try stateless session
• Default behavior in Spring Batch’s
  Hibernate ItemReaders
  – useStatelessSession="true"
• LEFT JOIN FETCH for eager loading
  collections
  – Avoiding LazyInitializationExceptions
Problems?
What happened?

• org.hibernate.HibernateException: cannot
  simultaneously fetch multiple bags
Why

• Hibernate is unable to resolve the
  Cartesian product
  – Throws exception to avoid duplicates
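
A small worked example of the Cartesian product, assuming a family with three persons and two errors (the data is made up). It only models the rows a SQL join over both collections would return and is not Hibernate code: an ordered collection that allows duplicates cannot tell repeated rows from real duplicates, while a Set collapses them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Models the result rows of joining one parent to two child collections (illustrative only).
class CartesianFetchSketch {

    /** Every person row is repeated once per error row: persons x errors rows in total. */
    static List<String[]> joinRows(List<String> persons, List<String> errors) {
        List<String[]> rows = new ArrayList<>();
        for (String person : persons) {
            for (String error : errors) {
                rows.add(new String[] {person, error});
            }
        }
        return rows;
    }

    /** Collecting the person column into a duplicate-allowing list keeps the repeats. */
    static List<String> personsAsBag(List<String[]> rows) {
        List<String> bag = new ArrayList<>();
        for (String[] row : rows) {
            bag.add(row[0]);
        }
        return bag;
    }

    /** Collecting the same column into a Set collapses the repeats. */
    static Set<String> personsAsSet(List<String[]> rows) {
        Set<String> set = new LinkedHashSet<>();
        for (String[] row : rows) {
            set.add(row[0]);
        }
        return set;
    }

    public static void main(String[] args) {
        List<String[]> rows = joinRows(List.of("Paul", "John", "Lisa"), List.of("E1", "E2"));
        System.out.println("rows: " + rows.size());
        System.out.println("as bag: " + personsAsBag(rows));
        System.out.println("as set: " + personsAsSet(rows));
    }
}
```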
You are using it wrong!
Curing the "you are using it wrong" syndrome
Hibernate in the reader (The third attempt)

• Examine the object graph
• Replace List with Set
  – Only one eagerly loaded collection may be of
    type List
• This works…
  – ..for a while
  – We’ll revisit Hibernate later…
Exception resilience

• New demands sneak in
  – The batch should not abort
     • Not under any circumstance
• The batch should deal with
  – Missing or functionally corrupt data
  – Programming errors
  – ...
Pseudo code
try {
    // do data and business operations
} catch (Exception e) {
    // add error to staging & continue
    family.addError(createError(e));
}
Problems?
Overzealous exception handling

an assertion failure occurred (this may
  indicate a bug in Hibernate, but is more
  likely due to unsafe use of the session)
Overzealous exception handling: The solution

• Some exceptions MUST result in a rollback
  – E.g. StaleStateException
• Configure the framework to do retry/skip
  for these
• Only catch exceptions you know you can
  handle in a meaningful way
  – Nothing new here
  – Do not succumb to crazy requirements
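
A minimal sketch of the "only catch what you can handle" rule. `MissingDataException`, `calculate` and the item values are hypothetical; the `IllegalStateException` merely stands in for an exception such as StaleStateException that must reach the framework so the transaction can be rolled back.

```java
import java.util.ArrayList;
import java.util.List;

// Catch only the exception we can handle meaningfully; let everything else propagate.
class SelectiveCatchSketch {

    /** Hypothetical business exception for missing or functionally corrupt data. */
    static class MissingDataException extends RuntimeException {
        MissingDataException(String message) {
            super(message);
        }
    }

    /** Returns the result, or null (item filtered out) after recording a known business error. */
    static String process(String item, List<String> errors) {
        try {
            return calculate(item);
        } catch (MissingDataException e) {
            errors.add(item + ": " + e.getMessage());  // record and continue
            return null;
        }
        // No catch (Exception e): anything unexpected must trigger a rollback upstream.
    }

    /** Stand-in for the data and business operations. */
    static String calculate(String item) {
        if (item.isEmpty()) {
            throw new MissingDataException("no data");
        }
        if (item.equals("stale")) {
            throw new IllegalStateException("simulated stale state");
        }
        return item.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(process("paul", errors));
        System.out.println(process("", errors) + " " + errors);
    }
}
```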
Time to turn on parallelization

• We chose partitioned over multi-threaded
  step
  – No need for a thread safe reader
    • Step scope
    • Each partition gets a new instance of the reader
  – Page lock contention is less likely
    • Row 1 in partition 1 not adjacent to row 1 in
      partition 2
Deadlocks

• Legacy database using page locking
• Normalized database
• Relevant data for one person is spread
  across a number of tables
• Different threads will access same data
  pages
• Deadlocks will occur
Page locking
ID   NAME    Thread
1    Paul    T1
2    John    T2 waiting
3    Simon
4    Scott   T1 waiting

5    Lisa    T2
6    Jack
7    Nina
8    Linda   T3
DeadlockLoserDataAccessException
Retry to the rescue
<step id="step2">
 <tasklet>
   <chunk reader="reader" processor="processor" writer="writer"
      commit-interval="10" retry-limit="10">
         <retryable-exception-classes>
            <include class="…DeadlockLoserDataAccessException"/>
         </retryable-exception-classes>
   </chunk>
  </tasklet>
</step>
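
What the retry configuration asks of the framework can be sketched as a plain loop. This is not how Spring Batch implements it: `DeadlockLoserException` is a hypothetical stand-in for the Spring exception, `withRetry` is an illustrative name, and the sketch assumes the limit counts total attempts.

```java
import java.util.function.Supplier;

// Plain-Java sketch of retrying a unit of work on one specific exception type.
class RetrySketch {

    /** Hypothetical stand-in for Spring's DeadlockLoserDataAccessException. */
    static class DeadlockLoserException extends RuntimeException {
    }

    /** Runs work, retrying on DeadlockLoserException until maxAttempts attempts are used up. */
    static <T> T withRetry(Supplier<T> work, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.get();
            } catch (DeadlockLoserException e) {
                if (attempt >= maxAttempts) {
                    throw e;            // give up: the limit is exhausted
                }
                // Otherwise try again. A real step rolls back the chunk's transaction first.
            }
            // Any other exception is not retryable here and propagates immediately.
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new DeadlockLoserException();   // lose the deadlock twice
            }
            return "committed";
        }, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```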
The mystery exception
Problems?
Two weeks later..
#42
Problems?
Retry of Id=42: StaleStateException
What do we do?
What we should have done weeks ago
We ditch Hibernate
…well, almost anyway
Removing Hibernate from the reader

• ItemReader is re-configured to use JDBC
• Fetches primary key from family staging
  table
• ItemProcessor fetches staging object
  graph
  – Uses primary key to fetch graph with
    Hibernate
• Primary keys are immutable and stateless
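
The split of responsibilities can be sketched as follows. The names are hypothetical and the lookup is a plain `Map` standing in for the Hibernate fetch; the point is only that the reader hands out immutable keys, so no session-bound state leaves it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Reader emits primary keys only; the processor loads the object graph by key.
class KeyOnlyReaderSketch {

    /** Stand-in for a JDBC-backed reader over the staging table's primary keys. */
    static final class KeyReader {
        private final Iterator<Long> keys;

        KeyReader(List<Long> keys) {
            this.keys = keys.iterator();
        }

        /** Returns the next key, or null when the input is exhausted. */
        Long read() {
            return keys.hasNext() ? keys.next() : null;
        }
    }

    /** Processor side: load by key (in the real batch, inside the chunk's transaction). */
    static String process(Long familyId, Function<Long, String> loadByKey) {
        return loadByKey.apply(familyId);
    }

    public static void main(String[] args) {
        Map<Long, String> staging = Map.of(1L, "family-1", 2L, "family-2");
        KeyReader reader = new KeyReader(List.of(1L, 2L));
        for (Long id = reader.read(); id != null; id = reader.read()) {
            System.out.println(process(id, staging::get));
        }
    }
}
```

Because a key carries no state, re-processing an item after a rollback simply loads a fresh graph in the new transaction.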
End result

•   Performance requirement was 48 hrs
•   Completed in 16 hrs
•   Used 12 threads
•   C-version used 1 thread and ran for 1 week
    – Stopped each morning, started each evening
• A batch that scales with the infrastructure
    – Number of threads is configurable in .properties
What we would have done differently

• Switched from partitioned to multi-threaded
  step
  – All work is shared among threads
  – All threads will run until batch completes
  – Avoid idle threads towards the end
  – With partitioning some partitions finished well
    before others
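
A toy calculation of the idle-thread effect, under strong simplifying assumptions: item costs are made-up numbers, and chunking, transactions and database contention are ignored. It compares the finish time of fixed contiguous partitions with threads that each pick the next item as soon as they are free.

```java
// Toy comparison of static partitions versus a shared work queue (made-up costs).
class WorkSharingSketch {

    /** Finish time when items are split into contiguous partitions, one per thread. */
    static long partitionedMakespan(long[] costs, int threads) {
        int size = (costs.length + threads - 1) / threads;   // ceiling division
        long worst = 0;
        for (int start = 0; start < costs.length; start += size) {
            long sum = 0;
            for (int i = start; i < Math.min(start + size, costs.length); i++) {
                sum += costs[i];
            }
            worst = Math.max(worst, sum);    // the slowest partition decides the finish time
        }
        return worst;
    }

    /** Finish time when each item goes to whichever thread becomes free first. */
    static long sharedMakespan(long[] costs, int threads) {
        long[] busyUntil = new long[threads];
        for (long cost : costs) {
            int next = 0;
            for (int t = 1; t < threads; t++) {
                if (busyUntil[t] < busyUntil[next]) {
                    next = t;
                }
            }
            busyUntil[next] += cost;
        }
        long worst = 0;
        for (long b : busyUntil) {
            worst = Math.max(worst, b);
        }
        return worst;
    }

    public static void main(String[] args) {
        long[] costs = {1, 1, 1, 1, 1, 1, 9, 9, 9};   // the expensive items cluster at the end
        System.out.println("partitioned: " + partitionedMakespan(costs, 3));
        System.out.println("shared:      " + sharedMakespan(costs, 3));
    }
}
```

When the expensive items cluster in one partition, two of the three threads sit idle while the third finishes; with shared work all threads stay busy until the end.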
Recommendations
• Do not use Hibernate in the ItemReader
• Test parallelization early
• Monitor your SQLs
  – Frequently called
  – Long running
• Become friends with your DBA
• There is no reason to let Java be the
  bottleneck
  – Increase thread count until DB becomes the
    bottleneck
Thanks!



@mortenag
www.github.com/magott
www.andersen-gott.com
Pro tip: @BatchSize
• If one lazily loaded entity is fetched, they
  are all fetched – in one query


Editor's Notes

  • #4 Trials and tribulations. Fairy tale.
  • #5 Offers a lot of different benefits for employees or former employees of the public sector in Norway. Plays a major part in the personal economy for a lot of Norwegian citizens.
  • #6 To reflect changes in the consumer price index. Benefits.
  • #7 The Labour and Welfare Administration provides the minimum pensions for all Norwegian citizens. We rely on the calculations from the Welfare Administration in our batch.
  • #9 2 loops. Outer: loops while read returns data. Inner: loops over read+process according to commit-interval.
  • #15 Data is partitioned before the step executes. Threads are processing isolated data sets. Examples of partitioning strategies:
    – One file per partition
    – Partitioning a table: Partition 1: Row 1 – Row 1000, Partition 2: Row 1001 – Row 2000, Partition 3: Row 2001 – Row 3000
  • #48 Indicates that the session contained stale data. Thrown when a version number or timestamp check fails. Also occurs if we try to delete or update a row that does not exist. Started doing the numbers: how much can we do within 48 hours with a single thread? We knew that two threads would never access the same row. Same page, sure, but never the same row.
  • #62 The ItemReader is forward-only. We can’t roll back a cursor next(). Spring Batch stores all read items in a retry cache until the transaction is committed. Uses the cache for retries.
  • #68 Hibernate throws an exception when reality does not match its expectations.