
Chapter 1 What Is Data Engineering PDF

This document provides an overview of key concepts in data engineering, including: - Data pipelines which automate the flow of data from ingestion to storage and ensure efficient movement of up-to-date data. ETL is a common framework for designing data pipelines. - The roles of data engineers in ingesting, optimizing, and maintaining data to deliver the right data to analysts efficiently. They work with large, complex "big data" sources. - How data engineers enable the work of data scientists by setting up databases and data pipelines to provide clean, structured data for analysis.

Uploaded by

Chandra Putra

Data engineering and big data

DATA ENGINEERING FOR EVERYONE

Hadrien Lacroix
Content Developer at DataCamp
About the course
Conceptual course

No coding involved

Objectives
Be able to communicate with data engineers

Provide a solid foundation to learn more

DATA ENGINEERING FOR EVERYONE


Chapter 1
What is data engineering?

1. Data engineering and big data

2. Data engineers vs. data scientists

3. Data pipelines


Chapter 2
How data storage works

1. Structured vs unstructured data

2. SQL

3. Data warehouse and data lakes


Chapter 3
How to move and process data

1. Processing data

2. Scheduling data

3. Parallel computing

4. Cloud computing

Data workflow


Data engineers
Data engineers deliver:

the correct data

in the right form

to the right people

as efficiently as possible


A data engineer's responsibilities
Ingest data from different sources

Optimize databases for analysis

Remove corrupted data

Develop, construct, test and maintain data architectures




Data engineers and big data
As big data becomes the norm, data engineers are more and more needed

Big data:
Have to think about how to deal with its size

So large that traditional methods don't work anymore


Big data growth
Sensors and devices

Social media

Enterprise data

VoIP (voice communication, multimedia sessions)

1 Data Age 2025, Seagate, November 2018


The five Vs
Volume (how much?)

Variety (what kind?)

Velocity (how frequent?)

Veracity (how accurate?)

Value (how useful?)


Summary
What's waiting for you

How data flows through an organization

When a data engineer intervenes

What their responsibilities are

How data engineering relates to big data


Let's practice!

Data engineers vs. data scientists

Data workflow

Data engineers

Data scientists

Data engineers enable data scientists
Data engineer             Data scientist
Ingest and store data     Exploit data
Set up databases          Access databases
Build data pipelines      Use pipeline outputs
Strong software skills    Strong analytical skills


Summary
At which stages data engineers and data scientists intervene

How data engineers enable data scientists


Let's practice!

The data pipeline
If data is the new oil...

1 The Economist, 2017-05-06, by David Parkins

Back to data engineering
Ingest

Process

Store

Need pipelines

Automate flow from one station to the next

Provide up-to-date, accurate, relevant data
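
Although the course itself involves no coding, the idea of stations connected by pipelines can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stage functions `ingest`, `process` and `store`, the sample records, and the in-memory `STORE` list standing in for a real database.

```python
from typing import Callable, Iterable

Record = dict
Stage = Callable[[list], list]

STORE: list = []  # hypothetical stand-in for a database


def ingest(_: list) -> list:
    # Hypothetical source; a real pipeline would read files, APIs or sensors.
    return [{"user": "ana", "plays": "3"}, {"user": "bo", "plays": "5"}]


def process(records: list) -> list:
    # Convert the raw text counts into integers.
    return [{**r, "plays": int(r["plays"])} for r in records]


def store(records: list) -> list:
    # Append the processed records to the stand-in storage.
    STORE.extend(records)
    return records


def run_pipeline(stages: Iterable[Stage]) -> list:
    # Pass the output of each station to the next one, with no manual step.
    data: list = []
    for stage in stages:
        data = stage(data)
    return data


result = run_pipeline([ingest, process, store])
```

Calling `run_pipeline` again on a schedule is what would keep the stored data up to date without human intervention.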

Data pipelines ensure an efficient flow of the data
Automate        Reduce
Extracting      Human intervention
Transforming    Errors
Combining       Time it takes data to flow
Validating
Loading
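
Two of the automated steps, combining and validating, could look roughly like the sketch below. The field names (`user_id`, `seconds`, `country`), the sample rows, and the validation rules are all hypothetical and chosen only to make the idea concrete.

```python
def combine(plays: list, users: list) -> list:
    # Attach each user's country to their play records (a simple join).
    countries = {u["user_id"]: u["country"] for u in users}
    return [{**p, "country": countries.get(p["user_id"])} for p in plays]


def validate(rows: list) -> tuple:
    # Assumed rules: a row needs a known country and a non-negative duration.
    valid, rejected = [], []
    for row in rows:
        ok = row["country"] is not None and row["seconds"] >= 0
        (valid if ok else rejected).append(row)
    return valid, rejected


plays = [
    {"user_id": 1, "seconds": 210},
    {"user_id": 2, "seconds": -5},   # corrupted duration
    {"user_id": 3, "seconds": 95},   # unknown user
]
users = [
    {"user_id": 1, "country": "FR"},
    {"user_id": 2, "country": "BE"},
]

valid, rejected = validate(combine(plays, users))
```

Here only the first play passes validation; the other two rows are set aside automatically instead of being checked by hand, which is how a pipeline reduces both human intervention and errors.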


ETL and data pipelines

ETL
Popular framework for designing data pipelines
1) Extract data
2) Transform extracted data
3) Load transformed data to another database

Data pipelines
Move data from one system to another
May follow ETL
Data may not be transformed
Data may be directly loaded in applications
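
The three ETL steps can be sketched in a few lines of Python using only the standard library. The CSV content, the `tracks` table, and its column names are assumptions made for this illustration, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = "title,duration_ms\nSong A,210000\nSong B,95000\n"


def extract(text: str) -> list:
    # 1) Extract: read the raw rows from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    # 2) Transform: convert milliseconds (text) to seconds (number).
    return [(r["title"], int(r["duration_ms"]) / 1000) for r in rows]


def load(rows: list, conn: sqlite3.Connection) -> None:
    # 3) Load: write the transformed rows to another database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks (title TEXT, duration_s REAL)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT title, duration_s FROM tracks ORDER BY title"
).fetchall()
```

A pipeline that skips `transform` and passes the extracted rows straight to an application would still be a data pipeline, just not an ETL one.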


Summary
What a data pipeline is

What it does

Why it's important

How data pipelines are implemented at Spotflix

What ETL is and its nuances


Let's practice!