The opening slide could show the project title in a large font rather than the
name,
something like: Project presentation - ETL and validations
MANWANT SINGH BALA
Project Presentation
Name could be here
Title could be updated to tell the summary of the slide or
key takeaway in single sentence –
Datastage inbound framework to reduce effort, ensure
validations and manage failures.
DATASTAGE INBOUND FRAMEWORK
• Common Framework to load multiformat files to Vertica tables
• Fully parameterized, needs minimal input like schema file for file layout and
table name
• Perform data validations on the incoming file (value add) to maintain data
quality
• Send out notifications on failures with the exact step of the failure
• Minimize coding effort
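To make the schema-driven idea above concrete, here is a minimal Python sketch, not the actual DataStage implementation. It assumes a hypothetical schema (a list of column names and types standing in for the schema file) and a hypothetical `StepFailure` exception that carries the name of the failing step, so a notification could report exactly where a load stopped. All names here are illustrative only.

```python
import csv
import io

# Hypothetical schema standing in for the schema file: (column name, type).
SCHEMA = [("id", int), ("name", str), ("amount", float)]


class StepFailure(Exception):
    """Hypothetical error type that records which step failed."""

    def __init__(self, step, detail):
        super().__init__(f"{step}: {detail}")
        self.step = step
        self.detail = detail


def validate_rows(text, schema, delimiter=","):
    """Parse delimited text and check field count and types against schema."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        # Step 1: every line must have exactly as many fields as the schema.
        if len(fields) != len(schema):
            raise StepFailure(
                "column_count_check",
                f"line {line_no}: expected {len(schema)} fields, got {len(fields)}",
            )
        # Step 2: every field must convert to the type the schema declares.
        record = {}
        for (name, cast), raw in zip(schema, fields):
            try:
                record[name] = cast(raw)
            except ValueError:
                raise StepFailure(
                    "type_check",
                    f"line {line_no}: column {name!r} has invalid value {raw!r}",
                )
        rows.append(record)
    return rows
```

A caller could catch `StepFailure` and put `err.step` and `err.detail` into the failure notification; supporting a new file would then only require a new schema, not new code.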
Can add a slide to explain the business problem and
solution. Helpful to mention the domain and the system
(CRM, account management, financial reporting, etc.) if
not the client name.
BUSINESS PROBLEM
We can explain why the client needed the project.
It could be in 2-3 subheaders:
Problem: Legacy DBs leading to erroneous data with high lead times, impacting
customer service and turnaround time
Solution: DB modernization, setting up ETL flows and a low-code system
Title could be updated to tell the summary of the slide or
key takeaway in single sentence –
High impact, client facing lead role to drive the project
WHY THIS PROJECT?
• Involved working on an ETL tool that was fairly new to the application and
that I had minimal experience with, which posed unique challenges
• First project here in the US, where I was interacting directly with the client
and leading onshore and offshore teams
• Involved not only coding but also designing a comprehensive, reusable
“framework” that loads data to the table and also performs data validations
on the file
• Had a huge scope and a direct impact on the project’s success
Title could be updated to tell the summary of the slide or
key takeaway in single sentence –
Heterogeneous data, new tool, existing processes and DB
CHALLENGES FACED
• Lack of in-depth knowledge on the tool
• Multiple file formats needed to be processed using the same framework
• Different file schemas needed to pass through the ETL framework, so a way had
to be found to carry the columns through the various ETL stages and land them
in the target table
• Data validations needed to be done on these files from within the framework
• Improving performance of the existing processes while keeping in mind the
limitations of the infrastructure
• Managing impact on existing Vertica DB and processes
• Leading a less experienced team
This slide could be combined with slide #2 on the
framework if the two explain the same thing.
THE FRAMEWORK
[Framework diagram. Phases: File Landing, File Flow, File Load. File types: Delimited, Fixed, Zipped. Diagram labels: File Read, File Split, File Validation, Dataset, Truncate?, Vertica Table.]
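As an illustration of how a single entry point could handle the three file types named in the diagram, here is a minimal Python sketch. It is not the DataStage job design. The function names, the `widths` and `inner_type` options, and the use of gzip for "Zipped" files are all assumptions made for this example.

```python
import gzip


def read_delimited(data, delimiter=","):
    """Split each line of a delimited file into fields."""
    return [line.split(delimiter) for line in data.decode().splitlines()]


def read_fixed(data, widths):
    """Slice each line of a fixed-width file using the given column widths."""
    rows = []
    for line in data.decode().splitlines():
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        rows.append(fields)
    return rows


def read_file(data, file_type, **opts):
    """Dispatch on file type; zipped input is decompressed, then re-dispatched."""
    if file_type == "zipped":
        # Assumption: "zipped" means gzip-compressed; inner_type names the
        # format of the decompressed content.
        inner_type = opts.pop("inner_type")
        return read_file(gzip.decompress(data), inner_type, **opts)
    if file_type == "delimited":
        return read_delimited(data, **opts)
    if file_type == "fixed":
        return read_fixed(data, **opts)
    raise ValueError(f"unsupported file type: {file_type!r}")
```

For example, `read_file(raw_bytes, "fixed", widths=[4, 2])` and `read_file(raw_bytes, "delimited", delimiter="|")` both return the same row-of-fields structure, so downstream validation and load steps would not need to know the original format.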
Title could be updated to tell the summary of the slide or
key takeaway in single sentence –
Framework modernization, low code, accurate
validations, reduced lead times
IMPACT
• The framework was built successfully and was used to convert 600+ files from
the legacy framework to this framework
• It is still the default framework used to validate and load any new file in
the current application with minimal coding
Besides technical successes, it could also help to mention
how it impacted the business and what the client gained:
dollar value impact, accuracy, improved customer
service, etc.
• File validation module has captured and identified various types of issues in
the input files, potentially avoiding data corruption. It was repurposed to
perform file validations for outgoing files as well.
• Reduction in time to load for high volume files