Data Engineering Nanodegree Program
Syllabus
Build Production-Ready Data Warehouses at Scale
Before You Start
Prerequisites: Students should have intermediate SQL and Python programming skills.
Educational Objectives: Students will learn to
● Create user-friendly relational and NoSQL data models
● Create scalable and efficient data warehouses
● Identify the appropriate use cases for different big data technologies
● Work efficiently with massive datasets
● Build and interact with a cloud-based data lake
● Automate and monitor data pipelines
● Develop proficiency in Spark, Airflow, and AWS tools
Length of Program: 110 Hours*
Program Structure: Self-paced, 5 months of access
Textbooks Required: None
Student Services:
● Content: Video lectures, interactive quizzes, and other learning material.
● Services: Personalized project feedback, technical forums, and a mentor-led online community.
*This is an estimate of the total hours the average student may take to complete all required coursework, including lesson and project time. Actual hours may vary.
Course 1: Data Modeling
In this course, you’ll learn to create relational and NoSQL data models to fit the diverse needs of data consumers. You’ll understand the differences between data models and how to choose the appropriate one for a given situation. You’ll also build fluency in PostgreSQL and Apache Cassandra.
Lesson: Introduction to Data Modeling
➔ Understand the purpose of data modeling
➔ Identify the strengths and weaknesses of different types of databases and data storage techniques
➔ Create a table in Postgres and Apache Cassandra (see the sketches below)

Lesson: Relational Data Models
➔ Understand when to use a relational database
➔ Understand the difference between OLAP and OLTP databases
➔ Create normalized data tables
➔ Implement denormalized schemas (e.g., star, snowflake)

Lesson: NoSQL Data Models
➔ Understand when to use NoSQL databases and how they differ from relational databases
➔ Select the appropriate primary key and clustering columns for a given use case
➔ Create a NoSQL database in Apache Cassandra
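As a taste of the first lesson’s outcomes, here is a minimal sketch of creating a table in Postgres with psycopg2. The connection settings, database name, and schema are illustrative assumptions, not part of the course materials.

```python
import psycopg2

# Illustrative connection settings -- adjust to your local Postgres instance.
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# A simple normalized table; the schema here is a hypothetical example.
cur.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id    INT PRIMARY KEY,
        first_name VARCHAR,
        last_name  VARCHAR,
        level      VARCHAR
    );
""")
conn.commit()
cur.close()
conn.close()
```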
Project 1: Data Modeling with Postgres and Apache Cassandra
In this project, you’ll model user activity data for a music streaming app called Sparkify.
You’ll create a database and ETL pipeline, in both Postgres and Apache Cassandra,
designed to optimize queries for understanding what songs users are listening to. For PostgreSQL, you will also define fact and dimension tables and insert data into your new tables. For Apache Cassandra, you will model your data so you can run the specific queries provided by the analytics team at Sparkify.
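Cassandra tables in this kind of project are modeled query-first: the table is shaped around the question it answers. A hedged sketch, assuming a hypothetical analytics question ("which songs did a user play in a given session?"); the keyspace and table names are invented for illustration.

```python
from cassandra.cluster import Cluster

# Connect to a local Cassandra instance (illustrative settings).
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('sparkify')

# Table modeled for one query: songs played by a user within a session.
# (user_id, session_id) is the partition key; item_in_session clusters rows.
session.execute("""
    CREATE TABLE IF NOT EXISTS songs_by_user_session (
        user_id         INT,
        session_id      INT,
        item_in_session INT,
        song            TEXT,
        PRIMARY KEY ((user_id, session_id), item_in_session)
    )
""")
```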
Course 2: Cloud Data Warehouses
In this course, you’ll learn to create cloud-based data warehouses. You’ll sharpen your data
warehousing skills, deepen your understanding of data infrastructure, and be introduced
to data engineering on the cloud using Amazon Web Services (AWS).
Lesson: Introduction to Data Warehouses
➔ Understand data warehousing architecture
➔ Run an ETL process to denormalize a database (3NF to star)
➔ Create an OLAP cube from facts and dimensions
➔ Compare columnar vs. row-oriented approaches

Lesson: Introduction to the Cloud with AWS
➔ Understand cloud computing
➔ Create an AWS account and understand its services
➔ Set up Amazon S3, IAM, VPC, EC2, and RDS PostgreSQL

Lesson: Implementing Data Warehouses on AWS
➔ Identify components of the Redshift architecture
➔ Run an ETL process to extract data from S3 into Redshift
➔ Set up AWS infrastructure using Infrastructure as Code (IaC), as sketched below
➔ Design an optimized table by selecting the appropriate distribution style and sort key
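To give a sense of the IaC outcome, the sketch below provisions a Redshift cluster with boto3. Every identifier, credential, and role ARN here is a placeholder assumption, not a course-provided value.

```python
import boto3

# Illustrative sketch: provision a Redshift cluster as code.
redshift = boto3.client('redshift', region_name='us-west-2')

redshift.create_cluster(
    ClusterType='multi-node',
    NodeType='dc2.large',
    NumberOfNodes=4,
    ClusterIdentifier='sparkify-dwh',
    DBName='dwh',
    MasterUsername='dwhuser',
    MasterUserPassword='Passw0rd!',  # placeholder -- never hard-code real secrets
    IamRoles=['arn:aws:iam::123456789012:role/redshift-s3-read'],
)
```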
Project 2: Data Infrastructure on the Cloud
In this project, you are tasked with building an ELT pipeline that extracts Sparkify’s data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for the analytics team to continue finding insights into what songs their users are listening to.
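A minimal sketch of the staging step such a pipeline might use: Redshift’s COPY command bulk-loads JSON from S3 into a staging table. The cluster endpoint, bucket, and IAM role below are hypothetical, and the staging table is assumed to already exist.

```python
import psycopg2

# Illustrative connection string -- the host is your cluster endpoint.
conn = psycopg2.connect(
    "host=sparkify-dwh.example.us-west-2.redshift.amazonaws.com "
    "dbname=dwh user=dwhuser password=Passw0rd! port=5439"
)
cur = conn.cursor()

# Bulk-load raw JSON events from S3 into a staging table.
cur.execute("""
    COPY staging_events
    FROM 's3://sparkify-data/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
    FORMAT AS JSON 'auto';
""")
conn.commit()
cur.close()
conn.close()
```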
Course 3: Data Lakes with Spark
In this course, you will learn more about the big data ecosystem and how to use Spark to work with massive datasets. You’ll also learn how to store big data in a data lake and query it with Spark.
Lesson: The Power of Spark
➔ Understand the big data ecosystem
➔ Understand when to use Spark and when not to use it

Lesson: Data Wrangling with Spark
➔ Manipulate data with SparkSQL and Spark DataFrames (see the sketch after this table)
➔ Use Spark for ETL purposes

Lesson: Debugging and Optimization
➔ Troubleshoot common errors and optimize your code using the Spark Web UI

Lesson: Introduction to Data Lakes
➔ Understand the purpose and evolution of data lakes
➔ Implement data lakes on Amazon S3, EMR, Athena, and AWS Glue
➔ Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages
➔ Understand the components and issues of data lakes
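As a flavor of the wrangling lesson, a minimal PySpark sketch that filters event logs with the DataFrame API and aggregates them with SparkSQL. The S3 path and column names are assumptions loosely based on the Sparkify dataset described in the projects.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-wrangling").getOrCreate()

# Read raw JSON event logs; the S3 path is a placeholder.
logs = spark.read.json("s3a://sparkify-data/log_data/*.json")

# DataFrame API: keep only song-play events (hypothetical column/value).
plays = logs.filter(logs.page == "NextSong")

# The same data is also queryable with SparkSQL via a temp view.
plays.createOrReplaceTempView("plays")
spark.sql("""
    SELECT artist, COUNT(*) AS play_count
    FROM plays
    GROUP BY artist
    ORDER BY play_count DESC
    LIMIT 10
""").show()
```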
Project 3: Big Data with Spark
In this project, you'll build an ETL pipeline for a data lake. The data resides in S3, in a directory of JSON logs of user activity on the app, as well as a directory with JSON metadata
on the songs in the app. You will load data from S3, process the data into analytics tables
using Spark, and load them back into S3. You'll deploy this Spark process on a cluster using
AWS.
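A hedged sketch of the write-back step: reading song metadata from S3, selecting an analytics table, and writing it to the lake as partitioned parquet. Paths and columns are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-lake-etl").getOrCreate()

# Input and output paths are placeholders for the project's buckets.
# (Reading s3a:// paths requires the hadoop-aws package on the cluster.)
songs = spark.read.json("s3a://sparkify-data/song_data/*/*/*/*.json")

# Select a hypothetical analytics table and write it back to S3 as
# parquet, partitioned for efficient downstream queries.
songs_table = songs.select("song_id", "title", "artist_id", "year", "duration")
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://sparkify-lake/songs/")
```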
Course 4: Automate Data Pipelines
In this course, you’ll learn to schedule, automate, and monitor data pipelines using Apache
Airflow. You’ll learn to run data quality checks, track data lineage, and work with data
pipelines in production.
Lesson: Data Pipelines
➔ Create data pipelines with Apache Airflow (see the DAG sketch after this table)
➔ Set up task dependencies
➔ Create data connections using hooks

Lesson: Data Quality
➔ Track data lineage
➔ Set up data pipeline schedules
➔ Partition data to optimize pipelines
➔ Write tests to ensure data quality
➔ Backfill data

Lesson: Production Data Pipelines
➔ Build reusable and maintainable pipelines
➔ Build your own Apache Airflow plugins
➔ Implement subDAGs
➔ Set up task boundaries
➔ Monitor data pipelines
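A minimal Airflow DAG sketch showing task dependencies and a data quality step, roughly as this course frames them. The DAG id, task names, and callables are hypothetical, and the imports assume the Airflow 2.x API.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical callables standing in for real ETL steps.
def load_staging():
    pass  # e.g., copy raw events into a staging table

def check_quality():
    # A quality check should fail loudly, e.g.:
    # if row_count == 0: raise ValueError("quality check failed")
    pass

with DAG(
    dag_id="sparkify_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    staging = PythonOperator(task_id="load_staging", python_callable=load_staging)
    quality = PythonOperator(task_id="check_quality", python_callable=check_quality)

    # Task dependency: staging must succeed before the quality check runs.
    staging >> quality
```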
Project 4: Data Pipelines with Airflow
In this project, you’ll continue your work on the music streaming company’s data infrastructure by creating and automating a set of data pipelines. You’ll configure and schedule data pipelines with Airflow, and monitor and debug them in production.
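Monitoring and debugging production pipelines often starts with retry and alerting defaults on the DAG. A small sketch of such settings; all values here are assumptions, not course-provided configuration.

```python
from datetime import timedelta

# Illustrative defaults a production DAG might pass as default_args.
default_args = {
    "owner": "sparkify",
    "retries": 3,                         # retry transient failures
    "retry_delay": timedelta(minutes=5),  # wait between attempts
    "email_on_failure": True,             # alert when a task finally fails
    "email": ["data-eng@example.com"],    # placeholder address
}
```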
Data Engineering Nanodegree Capstone Project
The purpose of the data engineering capstone project is to give you a chance to combine
what you've learned throughout the program. This project will be an important part of your
portfolio that will help you achieve your data engineering-related career goals.
In this project, you'll define the scope of the project and the data you'll be working with.
We'll provide guidelines, suggestions, tips, and resources to help you be successful, but
your project will be unique to you. You'll gather data from several different data sources;
transform, combine, and summarize it; and create a clean database for others to analyze.