Data engineering and big data
DATA ENGINEERING FOR EVERYONE
Hadrien Lacroix
Content Developer at DataCamp
About the course
Conceptual course
No coding involved
Objectives
Be able to communicate with data engineers
Provide a solid foundation to learn more
Chapter 1
What is data engineering?
1. Data engineering and big data
2. Data engineers vs. data scientists
3. Data pipelines
Chapter 2
How data storage works
1. Structured vs unstructured data
2. SQL
3. Data warehouse and data lakes
Chapter 3
How to move and process data
1. Processing data
2. Scheduling data
3. Parallel computing
4. Cloud computing
Data workflow
Data engineers
Data engineers deliver:
the correct data
in the right form
to the right people
as efficiently as possible
A data engineer's responsibilities
Ingest data from different sources
Optimize databases for analysis
Remove corrupted data
Develop, construct, test and maintain data architectures
Data engineers and big data
As big data becomes the norm, data engineers are needed more and more
Big data:
So large that you have to think about how to deal with its size
So large that traditional methods no longer work
Big data growth
Sensors and devices
Social media
Enterprise data
VoIP (voice communication, multimedia sessions)
Source: Data Age 2025, Seagate, November 2018
The five Vs
Volume (how much?)
Variety (what kind?)
Velocity (how frequent?)
Veracity (how accurate?)
Value (how useful?)
Summary
What's waiting for you
How data flows through an organization
When a data engineer intervenes
What their responsibilities are
How data engineering relates to big data
Let's practice!
Data engineers vs. data scientists
Data workflow
Data engineers
Data scientists
Data engineers enable data scientists
Data engineer          | Data scientist
Ingest and store data  | Exploit data
Set up databases       | Access databases
Build data pipelines   | Use pipeline outputs
Strong software skills | Strong analytical skills
Summary
At which stages data engineers and data scientists intervene
How data engineers enable data scientists
Let's practice!
The data pipeline
If data is the new oil...
Source: The Economist, 2017-05-06, by David Parkins
Back to data engineering
Ingest
Process
Store
Need pipelines
Automate flow from one station to the next
Provide up-to-date, accurate, relevant data
Data pipelines ensure an efficient flow of the data
Automate:
Extracting
Transforming
Combining
Validating
Loading
Reduce:
Human intervention
Errors
Time it takes data to flow
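The course itself involves no coding, but the automated steps listed above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the in-memory source rows, the `countries` lookup, the field names, and the plain list standing in for a destination system. It is one possible shape for a pipeline, not a reference implementation.

```python
def extract():
    # Hypothetical source: in a real pipeline this could be an API or a file.
    return [
        {"user": "ana", "plays": "12"},
        {"user": "bo", "plays": "7"},
        {"user": "cy", "plays": "oops"},  # corrupted value
    ]


def transform(rows):
    # Convert play counts to integers and drop rows that cannot be parsed.
    clean = []
    for row in rows:
        try:
            clean.append({"user": row["user"], "plays": int(row["plays"])})
        except ValueError:
            continue
    return clean


def combine(rows, countries):
    # Enrich each row with data from a second (hypothetical) source.
    return [{**row, "country": countries.get(row["user"], "unknown")} for row in rows]


def validate(rows):
    # Stop the pipeline if a row breaks a basic expectation.
    for row in rows:
        if row["plays"] < 0:
            raise ValueError(f"negative play count for {row['user']}")
    return rows


def load(rows, destination):
    # The destination is a plain list standing in for a database table.
    destination.extend(rows)
    return len(rows)


def run_pipeline(destination):
    # Each step feeds the next one, with no human intervention in between.
    countries = {"ana": "FR", "bo": "US"}
    rows = extract()
    rows = transform(rows)
    rows = combine(rows, countries)
    rows = validate(rows)
    return load(rows, destination)
```

Calling `run_pipeline(warehouse)` with an empty list loads the two valid rows and skips the corrupted one. Because every step is a function call, the whole flow can be rerun on a schedule without manual work.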
ETL and data pipelines
ETL:
Popular framework for designing data pipelines
1) Extract data
2) Transform extracted data
3) Load transformed data to another database
Data pipelines:
Move data from one system to another
May follow ETL
Data may not be transformed
Data may be directly loaded in applications
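To make the distinction concrete, here is a small Python sketch that contrasts a pipeline following ETL with one that moves data without transforming it. The CSV content, the `tracks` table, and the function names are assumptions made for this example. An in-memory SQLite database stands in for "another database", and a plain list stands in for an application.

```python
import csv
import io
import sqlite3

# Hypothetical source data, invented for this example.
RAW_CSV = "title,minutes\nSong A,3.5\nSong B,4.0\n"


def extract(raw):
    # 1) Extract: read rows from the source.
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    # 2) Transform: convert minutes (text) to whole seconds (integer).
    return [(row["title"], int(float(row["minutes"]) * 60)) for row in rows]


def load(rows, conn):
    # 3) Load: write the transformed rows to another database.
    conn.execute("CREATE TABLE IF NOT EXISTS tracks (title TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO tracks VALUES (?, ?)", rows)
    conn.commit()


def etl_pipeline(raw, conn):
    # A pipeline that follows ETL.
    load(transform(extract(raw)), conn)


def passthrough_pipeline(raw, app_inbox):
    # A pipeline that does not follow ETL: rows go to the application as-is.
    app_inbox.extend(extract(raw))


conn = sqlite3.connect(":memory:")
etl_pipeline(RAW_CSV, conn)

inbox = []
passthrough_pipeline(RAW_CSV, inbox)
```

Both functions move data from one system to another, so both are data pipelines. Only `etl_pipeline` performs all three ETL steps.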
Summary
What a data pipeline is
What it does
Why it's important
How data pipelines are implemented at Spotflix
What ETL is and its nuances
Let's practice!