JavaOne 2015 [CON3339]
Real-World Batch Processing
with Java EE [CON3339]
Arshal Ameen (@AforArsh)
Hirofumi Iwasaki (@HirofumiIwasaki)
Financial Services Department,
Rakuten, Inc.
2
Agenda
What’s Batch ?
History of batch frameworks
Types of batch frameworks
Best practices
Demo
Conclusion
3
“Batch”
Batch processing is the execution of a series of
programs ("jobs") on a computer without manual
intervention.
Jobs are set up so they can be run to completion
without human interaction. All input parameters are
predefined through scripts, command-line arguments,
control files, or job control language. This is in contrast
to "online" or interactive programs which prompt the
user for such input. A program takes a set of data
files as input, processes the data, and produces a
set of output data files.
- From Wikipedia
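A minimal sketch of the definition above: a job that takes its parameters from the command line, reads an input file, processes each record, and writes an output file without prompting anyone. The class name `UppercaseJob` and the upper-casing step are hypothetical placeholders, not something from the talk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical job: upper-cases every line of an input file.
class UppercaseJob {

    // Reads all input records, processes each one, and writes the output file.
    // Returns the number of records written.
    static int run(Path input, Path output) throws IOException {
        List<String> processed = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            processed.add(line.toUpperCase(Locale.ROOT));
        }
        Files.write(output, processed, StandardCharsets.UTF_8);
        return processed.size();
    }

    // All parameters come from the command line; nothing is prompted for.
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: UppercaseJob <input-file> <output-file>");
            System.exit(2);
        }
        int count = run(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("records written: " + count);
    }
}
```

A scheduler such as cron could start this with fixed arguments, which is what makes it a batch program rather than an interactive one.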
4
Batch vs Real-time
Real-time
Short running (nanoseconds to seconds)
JSF, EJB, etc.
Fixed at deploy
Runs immediately
Batch
Long running (minutes to hours)
JBatch (JSR 352), EJB, POJO, etc.
Sometimes “job net” or “job stream” reconfiguration required
Runs per second, minute, hour, day, week, month, etc.
5
Batch vs Real-time Details
                   Batch                          Real-time
Trigger            Scheduler                      On demand
UI support         Optional                       Sometimes UI needed
Availability       Normal                         High
Input data         Small to large                 Small
Transaction time   Minutes, hours, days, weeks…   ns, ms, s
Transaction cycle  Bulk (chunk) operation         Per item
6
Batch app categories
• File driven: records or values are retrieved from files
• Database driven: rows or values are retrieved from a database
• Message driven: messages are retrieved from a message queue
• Combination
7
Batch procedure
Stream
Job A
Input A
Process A
Output A
Job B
Input B
Process B
Output B
Job C
Input C
Process C
Output C …
The term “job net” or “job stream” comes from the JCL era (JCL itself doesn’t provide it).
Card
/Step
9
Simple History of Batch Processing in Enterprise
1950 1960 1970 1980 1990 2000 2010
JCL
J2EE
MS-DOS
Bat
UNIX
Sh
Mainframe
COBOL Java
JSR 352
Java EE
Win NT
Bat
Bash
C
CP/M Sub
PowerShell
FORTRAN
BASIC
VB C#
PL/I
Hadoop
11
Super Legacy Batch Script (1960’s – 1990’s)
JCL
//ZD2015BZ JOB (ZD201010),'ZD2015BZ',GROUP=PP1,
// CLASS=A,MSGCLASS=H,NOTIFY=ZD2015,MSGLEVEL=(1,1)
//********************************************************
//* Unloading data procedure
//********************************************************
//UNLDP EXEC PGM=UNLDP,TIME=20
//STEPLIB DD DSN=ZD.DBMST.LOAD,DISP=SHR
// DD DSN=ZB.PPDBL.LOAD,DISP=SHR
// DD DSN=ZA.COBMT.LOAD,DISP=SHR
//CPT871I1 DD DSN=P201.IN1,DISP=SHR
//CUU091O1 DD DSN=P201.ULO1,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(010,10),RLSE),UNIT=SYSDA,
// DCB=(RECFM=FB,LRECL=016,BLKSIZE=1600)
//SYSOUT DD SYSOUT=*
JES
COBOL
Call
Input
Output
Proc
12
Legacy Batch Script (1980’s – 2000’s)
Windows Task Scheduler
command.com Bat File
Bash Shell Script
Linux Cron
Call Call
13
Modern Batch Implementation
or
.NET Framework
(ignore now)
14
Java Batch Design patterns
1. POJO
2. Custom Framework
3. EJB / CDI
4. EJB with embedded container
5. JSR-352
15
1. POJO Batch with PreparedStatement object
✦ Create a connection and an SQL statement with placeholders.
✦ Set auto-commit to false using setAutoCommit().
✦ Create a PreparedStatement object using one of the prepareStatement() methods.
✦ Add as many parameter sets as you like to the batch using the addBatch() method on the created statement object.
✦ Execute the batched statements using the executeBatch() method on the created statement object, calling commit() after every chunk to persist the changes.
16
1. Batch with PreparedStatement object
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

Connection conn = DriverManager.getConnection("jdbc:~~~~~~~");
conn.setAutoCommit(false);
String query = "INSERT INTO User(id, first, last, age) "
        + "VALUES(?, ?, ?, ?)";
PreparedStatement pstmt = conn.prepareStatement(query);
for (int i = 0; i < userList.size(); i++) {
    User usr = userList.get(i);
    pstmt.setInt(1, usr.getId());
    pstmt.setString(2, usr.getFirst());
    pstmt.setString(3, usr.getLast());
    pstmt.setInt(4, usr.getAge());
    pstmt.addBatch();
    // Flush and commit after every 20th row (i is zero-based).
    if ((i + 1) % 20 == 0) {
        pstmt.executeBatch();
        conn.commit();
    }
}
// Flush and commit the rows left over from the last partial chunk.
pstmt.executeBatch();
conn.commit(); ....
✓ Most efficient for batch SQL statements.
✓ All manual operations.
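The commit interval in the loop above (flush every 20 rows, then flush the remainder) is easy to get wrong, so it can help to isolate it. The following is a sketch only: `ChunkBuffer` is a hypothetical helper, and the `flusher` callback stands in for the `executeBatch()` plus `commit()` pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them over in fixed-size chunks.
class ChunkBuffer<T> {
    private final int chunkSize;
    private final Consumer<List<T>> flusher; // e.g. executeBatch() + commit()
    private final List<T> pending = new ArrayList<>();

    ChunkBuffer(int chunkSize, Consumer<List<T>> flusher) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.flusher = flusher;
    }

    // Adds one item and flushes as soon as a full chunk has accumulated.
    void add(T item) {
        pending.add(item);
        if (pending.size() == chunkSize) {
            flush();
        }
    }

    // Flushes whatever is pending; call once after the loop for the last partial chunk.
    void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With 45 items and a chunk size of 20, the flusher is called three times, with 20, 20, and 5 items.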
17
1. Benefits of Prepared Statements
Execution
Planning & Optimization of
data retrieval path
Compilation of SQL query
Parsing of SQL query
Execution
Create
PreparedStatement
✓ Prevents SQL injection
✓ Dynamic queries
✓ Faster
✓ Object oriented
✘ FORWARD_ONLY result set
✘ IN clause limitation
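To illustrate the SQL injection point, here is a small sketch contrasting string concatenation with a bound parameter. The class `InjectionDemo` and the query are hypothetical; the `User` table and `last` column are borrowed from the INSERT example in this deck.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class InjectionDemo {

    // Unsafe: user input becomes part of the SQL text itself.
    static String concatenated(String lastName) {
        return "SELECT id FROM User WHERE last = '" + lastName + "'";
    }

    // Safer: the SQL text is fixed; the input is bound as a parameter value.
    static boolean existsByLastName(Connection conn, String lastName) throws SQLException {
        String sql = "SELECT id FROM User WHERE last = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, lastName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // The quote in the input closes the string literal and adds a new condition.
        System.out.println(concatenated("x' OR '1'='1"));
        // prints: SELECT id FROM User WHERE last = 'x' OR '1'='1'
    }
}
```

In the concatenated version the input rewrites the WHERE clause; in the prepared version the same input is only ever compared as a value.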
18
2. Custom framework via servlets
Pros
✦ Customizability, full control
Cons
✘ Tied to container or framework
✘ Sometimes poor transaction management
✘ Poor job control and monitoring
✘ No standard
19
3. Batch using EJB or CDI
Java EE App Server
@Stateless
/ @Dependent
EJB / CDI Batch
EJB
@Remote
or REST
client
Remote
Call
Database
Input
Output
Job
Scheduler
Remote
trigger
Other
System
Process
MQ
@Stateless
/ @Dependent
EJB / CDI
Use EJB Timer
@Schedule to
auto-trigger
20
3. Why EJB / CDI?
EJB
/CDI
Client
1. Remote Invocation
EJB
/CDI
2. Automatic Transaction Management
Database
(BEGIN)
(COMMIT)
EJB
only
EJB EJB
EJB
Instance
Pool
Activate
3. Instance Pooling for Faster Operation
RMI-IIOP (EJB only)
SOAP
REST
Web Socket
EJB
only
Client
4. Security Management
21
3. EJB / CDI Pros
• Easiest to implement
• Batch with PreparedStatement in EJB works well in Java EE 6 for database batch operations
• Container-managed transactions (CMT) or @Transactional on CDI: automatic transaction management
• EJB has integrated security management
• EJB has instance pooling: faster business logic execution
22
3. EJB / CDI cons
• EJB pools are not sized correctly for batch by default
• Set hard limits for the number of batches running at a time
• CMT / CDI @Transactional is sometimes not efficient for bulk operations; need to combine custom scoping with “REQUIRES_NEW” as the transaction type
• EJB passivation: stateful session beans go passive at the wrong intervals
• JPA EntityManager and entities are not efficient for batch operations
• Memory constraints on session beans: need to be tweaked for larger jobs
• Abnormal end of a batch might shut down the JVM
• When terminated immediately, the app server also gets killed
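One way to picture the "hard limit on the number of batches running at a time" is a counting semaphore in front of the job entry point. This is a plain-JDK sketch under that assumption; `BatchGate` is a hypothetical name, and an application server would normally enforce the limit through its own pool settings instead.

```java
import java.util.concurrent.Semaphore;

// Hypothetical gate that refuses to start more than maxConcurrent jobs at once.
class BatchGate {
    private final Semaphore permits;

    BatchGate(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the job if a slot is free; returns false without running it otherwise.
    boolean tryRun(Runnable job) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            // Always give the slot back, even if the job throws.
            permits.release();
        }
    }
}
```

Rejecting the extra job outright, rather than queuing it, keeps a misfiring scheduler from piling up work behind a long-running batch.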
23
4. Batch using EJB / CDI on Embedded container
Embedded EJB
Container
@Stateless / @Dependent
EJB / CDI Batch
Database
Input
Output
Job
Scheduler
Remote
trigger
Other
System
Process
MQ
Self
boot
24
4. How ?
pom.xml (case of GlassFish)
<dependency>
<groupId>org.glassfish.main.extras</groupId>
<artifactId>glassfish-embedded-all</artifactId>
<version>4.1</version>
<scope>test</scope>
</dependency>
EJB / CDI
import javax.ejb.Stateless;

// EJB variant shown; the CDI variant uses @Dependent @Transactional instead.
@Stateless
public class SampleClass {
    public String hello(String message) {
        return "Hello " + message;
    }
}
25
4. How (Part 2)
JUnit Test Case
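import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
import javax.ejb.embeddable.EJBContainer;
import javax.naming.Context;
import javax.naming.NamingException;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SampleClassTest {
    private static EJBContainer ejbContainer;
    private static Context ctx;
    @BeforeClass
    public static void setUpClass() throws Exception {
        // Boot the embedded container once for all tests.
        ejbContainer = EJBContainer.createEJBContainer();
        ctx = ejbContainer.getContext();
    }
    @AfterClass
    public static void tearDownClass() throws Exception {
        ejbContainer.close();
    }
    @Test
    public void hello() throws NamingException {
        SampleClass sample = (SampleClass)
                ctx.lookup("java:global/classes/SampleClass");
        assertNotNull(sample);
        String result = sample.hello("World");
        assertNotNull(result);
        assertTrue(result.endsWith("World"));
    }
}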
26
4. Should I use embedded container ?
Pros
✦ Quick to start (~10s)
✦ Efficient for batch implementations
✦ Embedded container uses less disk space and main memory
✦ Allows maximum reusability of enterprise components
Cons
✘ Inbound RMI-IIOP calls are not supported (on EJB)
✘ Message-driven beans (MDBs) are not supported
✘ Cannot be clustered for high availability
27
5. JSR-352
Implement
artifacts
Orchestrate
execution
Execute
28
5. Programming model
• Chunk and batchlet models
• Chunk: reader, processor, writer
• Batchlets: DYOT step, invoke and return code upon completion, stoppable
• Contexts: for runtime info and interim data persistence
• Callback hooks (listeners) for lifecycle events
• Parallel processing on jobs and steps
• Flow: one or more steps executed sequentially
• Split: collection of concurrently executed flows
• Partitioning: each step runs on multiple instances with unique properties
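The chunk model (reader, processor, writer) can be pictured as a simple loop. What follows is a plain-Java illustration of that loop, not the javax.batch API: `ChunkStep`, its parameters, and the use of `Iterator`, `Function`, and `Consumer` as stand-ins for reader, processor, and writer are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration of a chunk step; not the javax.batch API.
class ChunkStep {

    // Reads items one at a time, processes each, and writes them in groups of itemCount.
    // Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int itemCount) {
        if (itemCount <= 0) {
            throw new IllegalArgumentException("itemCount must be positive");
        }
        int chunks = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {
                writer.accept(new ArrayList<>(buffer)); // a real runtime would commit here
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) { // last partial chunk
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }
}
```

Five input items with an item count of 2 produce three writer calls: two full chunks and one chunk holding the single remaining item.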
29
5. Batch Chunks
30
5. Programming model
• Job operator: job management
• Job repository
• JobInstance: basically run()
• JobExecution: attempt to run()
• StepExecution: attempt to run() a step in a job
JobOperator jo = BatchRuntime.getJobOperator();
// start() returns the ID of the newly created job execution.
long executionId = jo.start("sample", new Properties());
31
5. JSR-352
Chunk
32
5. Programming model
• JSL (Job Specification Language): XML-based batch job definition
33
5. JCL & JSL
JCL JSR 352 “JSL”
//ZD2015BZ JOB (ZD201010),'ZD2015BZ',GROUP=PP1,
// CLASS=A,MSGCLASS=H,NOTIFY=ZD2015,MSGLEVEL=(1,1)
//********************************************************
//* Unloading data procedure
//********************************************************
//UNLDP EXEC PGM=UNLDP,TIME=20
//STEPLIB DD DSN=ZD.DBMST.LOAD,DISP=SHR
// DD DSN=ZB.PPDBL.LOAD,DISP=SHR
// DD DSN=ZA.COBMT.LOAD,DISP=SHR
//CPT871I1 DD DSN=P201.IN1,DISP=SHR
//CUU091O1 DD DSN=P201.ULO1,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(010,10),RLSE),UNIT=SYSDA,
// DCB=(RECFM=FB,LRECL=016,BLKSIZE=1600)
//SYSOUT DD SYSOUT=*
JES Java EE App Server
1970’s 2010’s
<?xml version="1.0" encoding="UTF-8"?>
<job id="my-chunk" xmlns="http://xmlns.jcp.org/xml/ns/javaee"
version="1.0">
<properties>
<property name="inputFile" value="input.txt"/>
<property name="outputFile" value="output.txt"/>
</properties>
<step id="step1">
<chunk item-count="20">
<reader ref="myChunkReader"/>
<processor ref="myChunkProcessor"/>
<writer ref="myChunkWriter"/>
</chunk>
</step>
</job>
COBOL JSR 352 Chunk or Batchlet
Input
Output
Proc
Call Call
34
5. Spring Batch 3.0 (JSR-352)
35
5. Spring Batch
• API for building batch components integrated with the Spring framework
• Implementations for readers and writers
• A DSL (JSL) for configuring batch components
• Tasklets (Spring batchlet): collections of custom batch steps/tasks
• Flexibility to define complex steps
• Job repository implementation
• Batch process lifecycle management made a bit easier
36
5. Main differences
            Spring            JSR-352
DI          Bean definitions  Job definition (optional)
Properties  Any type          String only
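Because JSR-352 job parameters travel as `java.util.Properties` (see the `jo.start(...)` call earlier), typed values have to be encoded as strings on the way in and parsed on the way out. A minimal sketch of that round trip follows; the `JobParams` class and the parameter names `businessDate` and `chunkSize` are hypothetical.

```java
import java.time.LocalDate;
import java.util.Properties;

// Hypothetical parameter names; only java.util.Properties itself is standard.
class JobParams {

    // Encodes typed values as strings before handing them to the batch runtime.
    static Properties encode(LocalDate businessDate, int chunkSize) {
        Properties props = new Properties();
        props.setProperty("businessDate", businessDate.toString()); // ISO-8601 text
        props.setProperty("chunkSize", Integer.toString(chunkSize));
        return props;
    }

    // Decodes the strings back into typed values inside the job.
    static LocalDate businessDate(Properties props) {
        return LocalDate.parse(props.getProperty("businessDate"));
    }

    static int chunkSize(Properties props) {
        // Falls back to 20 when the parameter is absent.
        return Integer.parseInt(props.getProperty("chunkSize", "20"));
    }
}
```

Keeping the encode and decode logic in one place avoids scattering string parsing across batch artifacts.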
37
Appendix: Apache Hadoop
Apache Hadoop is a scalable storage and batch data processing system.
• MapReduce programming model
• Hassle-free parallel job processing
• Reliable: all blocks are replicated 3 times
• Databases: built-in tools to dump or extract data
• Fault tolerance through software, self-healing and auto-retry
• Best for unstructured data (log files, media, documents, graphs)
38
Appendix: Hadoop’s not for
• Not for small or real-time data; >1 TB is the minimum
• Procedure oriented: writing code is painful and error prone. YAGNI
• Potential stability and security issues
• Joins of multiple datasets are tricky and slow
• Cluster management is hard
• Still a single master, which requires care and may limit scaling
• Does not allow for stateful multiple-step processing of records
40
Key points to consider
• Business logic
• Transaction management
• Exception handling
• File processing
• Job control/monitor (retry/restart policies)
• Memory consumed by job
• Number of processes
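As one concrete reading of "retry/restart policies", a bounded retry wrapper might look like the sketch below. `Retry.withRetry` is a hypothetical helper, not part of any batch framework mentioned here; a production policy would usually also distinguish retryable from fatal exceptions and add a delay between attempts.

```java
import java.util.concurrent.Callable;

// Hypothetical helper implementing a simple "retry up to N times" policy.
class Retry {

    // Calls the task up to maxAttempts times and rethrows the last failure.
    static <T> T withRetry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // a real job would log the attempt here
            }
        }
        throw last;
    }
}
```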
41
Best practices
• Always poll in batches
• Processor: thread-safe, stateless
• Throttling policy when using queues
• Storing results in memory is risky
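"Poll in batches" and "throttle when using queues" can both be sketched with the JDK's `BlockingQueue`. The helper name `BatchPoller` and the sizes used are assumptions for illustration; a message-queue client would have its own equivalent calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchPoller {

    // Waits up to timeoutMillis for the first item, then drains up to maxBatch items in total.
    static <T> List<T> pollBatch(BlockingQueue<T> queue, int maxBatch, long timeoutMillis)
            throws InterruptedException {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        List<T> batch = new ArrayList<>(maxBatch);
        T first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            batch.add(first);
            queue.drainTo(batch, maxBatch - 1);
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        // A bounded queue throttles producers: put() blocks once 100 items are waiting.
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);
        for (int i = 0; i < 25; i++) {
            queue.put(i);
        }
        System.out.println(pollBatch(queue, 10, 100).size()); // prints 10
    }
}
```

The returned list is handed to the processor as one unit, so nothing accumulates in memory beyond a single batch.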
44
Conclusion: Script vs Java
Shell Script Based (Bash, PowerShell, etc.)
Pros
• Super quick to write
• Easy testing
Cons
• Narrower scope of implementation
• No transaction management
• Poor error handling
• Poor operation management
Java Based (Java EE, POJO, etc.)
Pros
• Power of Java APIs or Java EE APIs
• Platform independent
• Accuracy of error handling
• Container transaction management (Java EE)
• Operational management (Java EE)
Cons
• Sometimes takes more time to make
• Sometimes difficult to test
45
Conclusion
POJO
Pros: quick to write; Java; easy testing
Cons: no standard; no transaction management; less operation management
Custom Framework
Pros: depends on each product
Cons: no standard; depends on each product
EJB / CDI
Pros: super power of Java EE; standardized
Cons: difficult to test; cannot stop forcefully; no auto chunk or parallel operations
EJB / CDI + Embedded Container
Pros: super power of Java EE; standardized; easy testing; can stop forcefully
Cons: no auto chunk or parallel operations
JSR 352
Pros: super power of Java EE; standardized; easy testing; auto chunk, parallel operations
Cons: new!; cannot stop immediately in case of chunks
Java EE 7
Java EE 6
46
Contact
Arshal (@AforArsh)
Hirofumi Iwasaki (@HirofumiIwasaki)
