1-Week Data Engineering Learning Plan (AWS-Focused)
Day 1: Cloud & Data Engineering Basics + AWS Core Services
Goal: Understand cloud fundamentals, AWS core data services, and architecture.
Topics:
- What is Data Engineering (recap)
- AWS overview (IAM, EC2, S3, RDS, VPC)
- Data storage concepts: structured vs unstructured data
- S3 hands-on
Resources:
- AWS Cloud Practitioner Essentials: https://www.aws.training/Details/Curriculum?id=20685
- S3 Tutorial: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
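For the S3 hands-on, a minimal sketch of uploading a file with boto3 might look like the following. The `raw/<dataset>/dt=<date>/` key layout, the helper names, and the bucket name are illustrative assumptions, not part of the plan; the upload itself needs boto3 installed and AWS credentials configured.

```python
from datetime import date


def build_object_key(dataset: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key (layout is an assumption for this sketch)."""
    return f"raw/{dataset}/dt={day.isoformat()}/{filename}"


def upload_file(local_path: str, bucket: str, key: str) -> None:
    """Upload one local file to S3. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the key helper works without boto3 installed

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    key = build_object_key("orders", date(2024, 1, 15), "orders.csv")
    print(key)
    # Hypothetical bucket name; uncomment once you have created your own bucket.
    # upload_file("orders.csv", "my-example-bucket", key)
```

Keeping the key-building logic separate from the upload call makes it easy to try the naming scheme locally before touching a real bucket.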
Day 2: Data Ingestion & ETL Basics
Goal: Learn data ingestion and processing using AWS Glue and Lambda.
Topics:
- Batch vs Stream processing
- AWS Glue overview
- AWS Lambda basics
Resources:
- Glue Getting Started: https://docs.aws.amazon.com/glue/latest/dg/getting-started.html
- Lambda + S3 Trigger: https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
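As a starting point for the Lambda + S3 trigger exercise, here is a minimal handler sketch that only extracts the bucket and object key from each record of an S3 event notification. The sample event is trimmed to the fields the handler reads, and its bucket and key values are made up for illustration.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Return the bucket and key of every object named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}


# Trimmed sample event with hypothetical values, for local testing only.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-example-bucket"},
                "object": {"key": "raw/orders/my+file.csv"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(lambda_handler(sample_event, None))
```

Calling the handler locally with a sample event like this is a quick way to check the parsing logic before deploying and wiring up the real trigger.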
Day 3: Querying Data Using Athena and Redshift
Goal: Query structured data and explore Redshift data warehousing.
Topics:
- Athena usage
- Redshift vs Redshift Spectrum
- Partitioning and optimization
Resources:
- Athena Docs: https://docs.aws.amazon.com/athena/latest/ug/what-is.html
- Redshift Getting Started: https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
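To make the partitioning topic concrete, the sketch below assumes a table stored under Hive-style `dt=YYYY-MM-DD` prefixes, with `dt` as a string partition column. It lists the prefixes a date-filtered query would need to read and builds a matching query string. The table name, location, and function names are hypothetical.

```python
from datetime import date, timedelta


def partition_prefixes(table_location: str, start: date, end: date) -> list[str]:
    """List the dt=YYYY-MM-DD prefixes covering the inclusive range [start, end]."""
    base = table_location.rstrip("/")
    days = (end - start).days
    return [
        f"{base}/dt={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days + 1)
    ]


def build_query(table: str, start: date, end: date) -> str:
    """Build a query that filters on the partition column so other partitions are skipped."""
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE dt BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


if __name__ == "__main__":
    start, end = date(2024, 1, 30), date(2024, 2, 1)
    for prefix in partition_prefixes("s3://my-example-bucket/orders/", start, end):
        print(prefix)
    print(build_query("orders", start, end))
```

The point of the exercise: a filter on the partition column limits the scan to a handful of prefixes, whereas a filter on a regular column would still read every file in the table.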
Day 4: Data Pipelines & Workflow Orchestration
Goal: Create and automate data workflows.
Topics:
- AWS Step Functions
- Intro to Apache Airflow
Resources:
- Step Functions Workshop: https://catalog.us-east-1.prod.workshops.aws/workshops/5d1e0011-b3ae-4605-8160-2d35ce77d4c8/en-US
- Airflow Zoomcamp Module: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/05-terraform-aws-airflow
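For the Step Functions topic, a workflow is described in Amazon States Language (JSON). The sketch below builds a small two-step definition as a Python dict: run a Glue job, then invoke a Lambda function. The state names, job name, and Lambda ARN are placeholders, and the helper that checks transitions is only a local sanity check, not an AWS API.

```python
import json


def build_state_machine(glue_job_name: str, notify_lambda_arn: str) -> dict:
    """Build a two-state definition: run a Glue job and wait, then call a Lambda."""
    return {
        "Comment": "Run a Glue job, then call a Lambda function",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # Service integration that waits for the Glue job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Notify",
            },
            "Notify": {"Type": "Task", "Resource": notify_lambda_arn, "End": True},
        },
    }


def undefined_targets(definition: dict) -> list[str]:
    """Return StartAt/Next targets that do not name a defined state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]


if __name__ == "__main__":
    definition = build_state_machine(
        "my-example-glue-job",  # hypothetical job name
        "arn:aws:lambda:us-east-1:123456789012:function:notify",  # placeholder ARN
    )
    print(json.dumps(definition, indent=2))
    print("undefined targets:", undefined_targets(definition))
```

The printed JSON can be pasted into the Step Functions console as a starting definition once the job name and ARN are replaced with real ones.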
Day 5: Big Data Processing with EMR & PySpark
Goal: Learn big data processing using EMR and PySpark.
Topics:
- EMR basics
- PySpark (RDDs, DataFrames)
Resources:
- EMR Docs: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
- PySpark Intro: https://spark.apache.org/docs/latest/api/python/getting_started/index.html
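To compare RDDs and DataFrames, the sketch below computes the same per-category totals three ways: in plain Python as a reference, with an RDD (`reduceByKey`), and with a DataFrame (`groupBy` + `sum`). The sample rows and function names are invented for illustration, and the Spark part assumes pyspark is installed and runs in local mode rather than on EMR.

```python
from collections import defaultdict


def totals_by_category(rows):
    """Plain-Python reference: sum the amounts per category."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)


def totals_with_spark(rows):
    """Compute the same totals with the RDD API and the DataFrame API (needs pyspark)."""
    from pyspark.sql import SparkSession, functions as F  # lazy import

    spark = SparkSession.builder.master("local[*]").appName("day5-sketch").getOrCreate()
    try:
        # RDD API: (key, value) pairs combined with reduceByKey.
        rdd = spark.sparkContext.parallelize(rows)
        rdd_totals = dict(rdd.reduceByKey(lambda a, b: a + b).collect())

        # DataFrame API: named columns and a declarative aggregation.
        df = spark.createDataFrame(rows, ["category", "amount"])
        agg = df.groupBy("category").agg(F.sum("amount").alias("total"))
        df_totals = {r["category"]: r["total"] for r in agg.collect()}
        return rdd_totals, df_totals
    finally:
        spark.stop()


# Made-up sample data for local experimentation.
sample_rows = [("books", 10.0), ("games", 5.0), ("books", 20.0)]

if __name__ == "__main__":
    print(totals_by_category(sample_rows))
    # Uncomment if pyspark is installed locally:
    # print(totals_with_spark(sample_rows))
```

Both Spark results should match the plain-Python reference; the DataFrame version is usually the one to prefer in practice because Spark can optimize it.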
Day 6: Mini Project - Build a Complete Data Pipeline
Goal: Integrate services into a working pipeline.
Project Flow:
- Ingest data to S3
- Use Glue for ETL
- Store in Redshift
- Query the data in S3 via Athena
- Optional: QuickSight visualization
Resource:
- AWS Data Pipeline Concepts: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
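Before wiring the project flow together on AWS, it can help to prototype the transform step locally. The sketch below is plain Python, not Glue code: it parses raw CSV text, drops rows whose amount is not numeric, and normalizes the remaining fields. The column names `order_id` and `amount` are assumptions chosen for the example.

```python
import csv
import io


def transform(raw_csv: str) -> list[dict]:
    """Clean raw CSV text: keep rows with a numeric amount and tidy the fields."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing or non-numeric amount.
            continue
        cleaned.append({"order_id": row["order_id"].strip(), "amount": round(amount, 2)})
    return cleaned


# Made-up sample input with one invalid row.
sample_csv = "order_id,amount\nA1,10.50\nA2,abc\n A3 ,7\n"

if __name__ == "__main__":
    for record in transform(sample_csv):
        print(record)
```

Once the cleaning rules behave as expected on a small sample, the same logic can be ported to a Glue job that reads from and writes to S3.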
Day 7: Review, Document, and Reflect
Goal: Solidify learning and plan next steps.
Activities:
- Document your pipeline
- Explore AWS data certifications (e.g., AWS Certified Data Engineer - Associate)
- Take quizzes
Resource:
- AWS Analytics Lens: https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/welcome.html