Data Engineering Course Outline
Week 1-2: Introduction to Data Engineering
What is Data Engineering?
Overview of roles and responsibilities
Key components of data infrastructure
Data Engineer vs. Data Scientist
Understanding the collaboration between data engineers and data scientists
Data Pipelines
Overview of data pipeline design and automation
Key concepts: extraction, transformation, and loading (ETL)
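The three ETL stages above can be sketched in a few lines of plain Python, using in-memory lists and dicts in place of real sources and sinks (all names here are illustrative, not from any specific tool):

```python
# Minimal ETL sketch: extract raw records, clean them, load them
# into a destination. Real pipelines swap in APIs, files, or
# databases at each stage.

def extract():
    # Pretend this reads rows from a source system (API, file, DB).
    return [
        {"name": " Ada ", "signups": "3"},
        {"name": "Grace", "signups": "5"},
    ]

def transform(rows):
    # Clean and type-convert each record.
    return [
        {"name": r["name"].strip(), "signups": int(r["signups"])}
        for r in rows
    ]

def load(rows, target):
    # Append the cleaned rows to a destination (here, a list).
    target.extend(rows)
    return target

warehouse = []
load(transform(extract()), warehouse)
```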
Week 3-4: Prerequisites and Core Skills
Educational Background and Career Paths
Degrees in computer science, mathematics, or related fields
Non-traditional paths (self-taught, bootcamps, etc.)
Foundational Skills
Programming Basics (Python):
Syntax, control flow, data structures, and functions
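The basics listed above fit in one small example: a function, a loop with control flow, and two core data structures (list and dict):

```python
# Illustrative beginner exercise combining a function, a for-loop,
# an if/continue, a list, and a dict.

def count_by_length(words):
    """Group words into a dict keyed by their length."""
    counts = {}                 # dict: a key-value data structure
    for w in words:             # for-loop over a list
        if not w:               # control flow: skip empty strings
            continue
        counts.setdefault(len(w), []).append(w)
    return counts

result = count_by_length(["etl", "sql", "spark", ""])
```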
Computer Science Fundamentals:
Algorithms, memory management, and data complexity
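One concrete instance of data complexity worth internalizing early: membership tests are O(n) on a list (linear scan) but O(1) on average for a set (hash lookup):

```python
# Same question, two data structures, very different costs.

items_list = list(range(100_000))
items_set = set(items_list)

# Both checks give the same answer; the set version does not
# scan every element the way the list version does.
in_list = 99_999 in items_list   # linear scan, O(n)
in_set = 99_999 in items_set     # hash lookup, O(1) average
```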
SQL for Data Engineering:
Database querying, filtering, and data manipulation using SQL
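Querying, filtering, and aggregation can be practiced without any server using Python's standard-library sqlite3 module; the table and column names below are made up for illustration:

```python
# SQL practice against an in-memory SQLite database: insert rows,
# then filter and aggregate with GROUP BY and ORDER BY.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("ada", 10.0), ("ada", 5.0), ("grace", 7.5)],
)

# Aggregate: total amount per user, largest first.
rows = conn.execute(
    "SELECT user, SUM(amount) AS total "
    "FROM events GROUP BY user ORDER BY total DESC"
).fetchall()
```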
Week 5-6: Databases and Storage Solutions
Relational Databases (MySQL, PostgreSQL)
Schema design, normalization, and querying techniques
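Normalization in miniature: customer names live in one table and orders reference them by key instead of repeating the name on every order row. A sketch using sqlite3, with illustrative table and column names:

```python
# Normalized two-table schema with a foreign key, plus the JOIN
# that reassembles the denormalized view when needed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 12.5);
""")

joined = conn.execute("""
    SELECT c.name, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
```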
NoSQL Databases (MongoDB, Cassandra)
Types of NoSQL databases and their use cases
Data Warehousing
Introduction to cloud data warehousing solutions (e.g., Amazon Redshift, Google BigQuery)
ETL processes and data modeling
Week 7-8: Data Processing Techniques
ETL Processes
Extraction, transformation, and loading methods
Tools for data integration
Batch and Streaming Processing
Differences between batch and real-time data processing
Use cases for each approach
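The batch/streaming distinction can be shown on the same data: batch processing materializes all records first, while streaming maintains a running result per record (a generator stands in for a message queue here):

```python
# Batch vs. streaming over the same records.

records = [3, 1, 4, 1, 5]

# Batch: collect everything, then compute over the full set.
batch_total = sum(records)

# Streaming: process one record at a time as it arrives.
def stream(source):
    for r in source:          # in real systems: a message queue
        yield r

running_total = 0
for value in stream(iter(records)):
    running_total += value    # state updated incrementally
```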
Hands-on Projects
Building small-scale ETL pipelines using popular tools
Week 9-10: Cloud Computing for Data Engineers
Introduction to Cloud Platforms (AWS, GCP)
Understanding cloud services (compute, storage, databases)
Hands-on with cloud setup for data engineering tasks
Cloud-Based Data Storage Solutions
Comparison of storage options (S3, GCS, etc.)
Implementing cloud-based pipelines
Week 11-12: Big Data Technologies
Hadoop Ecosystem
Distributed data storage and processing (HDFS, MapReduce, YARN)
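The MapReduce idea behind Hadoop can be previewed in plain Python: a map step emits (key, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. This is illustrative only; real Hadoop distributes these phases across machines and HDFS blocks:

```python
# Word count, the classic MapReduce example, single-machine.
from collections import defaultdict

lines = ["big data", "big pipelines"]

# Map: emit (word, 1) for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: sum the values for each key.
word_counts = {key: sum(values) for key, values in groups.items()}
```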
Apache Spark
Introduction to Spark for large-scale data processing
In-memory vs. disk-based processing
Hands-on Projects
Data processing using Hadoop and Spark
Week 13-16: Building Data Pipelines
Data Pipeline Design
Extracting data from various sources (APIs, web scraping, databases)
Transforming data into usable formats
Pipeline Orchestration Tools (Apache Airflow, Luigi, Prefect)
Building and automating workflows
Hands-on Practice
Developing, testing, and deploying data pipelines
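At their core, orchestrators like Airflow, Luigi, and Prefect run tasks in dependency order. This sketch shows that idea on a tiny hand-written DAG; the task names and structure are illustrative, not any tool's API:

```python
# Resolve a dependency order for a tiny extract -> transform ->
# load DAG via depth-first traversal.

tasks = {
    "extract": [],               # task -> list of upstream tasks
    "transform": ["extract"],
    "load": ["transform"],
}

def run_order(dag):
    done, order = set(), []
    def visit(name):
        for upstream in dag[name]:
            if upstream not in done:
                visit(upstream)
        if name not in done:
            done.add(name)
            order.append(name)
    for name in dag:
        visit(name)
    return order

order = run_order(tasks)
```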
Week 17-18: Advanced Data Engineering Skills
Machine Learning Integration
Building data pipelines for ML models (preprocessing, feature engineering)
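A representative preprocessing step in ML-facing pipelines is min-max scaling a numeric feature into [0, 1]; shown in pure Python for clarity, though real pipelines would typically use a library such as scikit-learn:

```python
# Min-max feature scaling: map values into [0, 1].

def min_max_scale(values):
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0       # avoid division by zero
    return [(v - lo) / span for v in values]

scaled = min_max_scale([10.0, 15.0, 20.0])
```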
Distributed Systems and DevOps
Fault tolerance, scalability, and CI/CD for data pipelines
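Fault tolerance in miniature: retry a flaky step with a capped number of attempts, the pattern pipelines use for transient failures. The function names below are illustrative:

```python
# Retry wrapper for transient failures (e.g., network blips).

def with_retries(func, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:   # real code: catch narrower errors
            last_error = exc       # and sleep/back off between tries
    raise last_error

calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_step)
```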
Data Security and Governance
Access control, encryption, and compliance in data engineering
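One governance-relevant habit worth demonstrating: never store raw secrets. This sketch salts and hashes a credential with the standard library's PBKDF2 before storage, then verifies it with a constant-time comparison:

```python
# Salted credential hashing and verification with the stdlib.
import hashlib
import hmac
import os

def hash_secret(secret: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, 100_000)

salt = os.urandom(16)
stored = hash_secret("s3cret", salt)

# Constant-time comparison avoids timing side channels.
ok = hmac.compare_digest(stored, hash_secret("s3cret", salt))
bad = hmac.compare_digest(stored, hash_secret("wrong", salt))
```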
Week 19-20: Final Projects and Real-World Applications
Beginner Projects
Building a simple web scraper and basic data cleaning
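The beginner scraping project can start from the standard library alone: parse link targets out of an HTML snippet with html.parser. Fetching a live page would add urllib.request; the HTML here is inline to keep the sketch self-contained:

```python
# Collect href values from anchor tags using the stdlib parser.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<p><a href="/a">A</a> and <a href="/b">B</a></p>')
```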
Intermediate Projects
Cloud-based data warehouse setup or recommendation engine
Advanced Projects
Machine learning pipelines and real-time analytics dashboards