Data Engineering

Brian Njuguna
Agenda

▪ What Is Data Engineering
▪ Data Engineering vs. Data Science
▪ Key Components of Data Engineering
▪ Key Skills for Data Engineers
▪ Case Study
▪ Bonus
▪ Learning Resources
Data Engineering

▪ Data engineering focuses on designing, building, and managing data pipelines.
▪ It is crucial for collecting, transforming, and storing the large datasets used for analysis.
▪ It plays a key role in enabling data-driven decision-making and powering technologies like AI and machine learning.
Data Engineering vs. Data Science

▪ Data Engineers: Build and maintain the infrastructure for data.
▪ Data Scientists: Analyze data to extract insights and build models.
▪ Collaboration between both roles is critical to delivering value from data.
Data Engineering vs. Other Domains
Key Components of Data Engineering

▪ Data Ingestion (Pipelines)
▪ Data Processing (Transforming, cleaning, and aggregating)
▪ Data Modeling (Structuring data for analysis, reporting, and decision-making)
▪ Data Storage (Database, Data Lake, Data Warehouse)
▪ Data Quality (Accurate, consistent, and complete)
▪ Data Catalog (Encyclopedia for your data platform)
▪ Access Management (Protect sensitive information)
▪ Data Observability and Orchestration (Detect and resolve data issues)
Key Skills for Data Engineers

▪ Programming: Python, Java, Scala.
▪ Database Management: SQL, NoSQL (MongoDB, Cassandra).
▪ Big Data Tools: Hadoop, Spark.
▪ Cloud Platforms: AWS, Azure, GCP.
▪ Data Pipeline Tools: Apache Airflow, Kafka, dbt.
ETL Pipeline
ELT Pipeline
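
The ETL and ELT patterns named in the two slide titles above differ mainly in where the transform step runs: before loading (ETL) or inside the target store after loading (ELT). The following is a minimal sketch of that difference, not the presenter's implementation. It assumes an in-memory SQLite database stands in for the warehouse, and the `RAW_ROWS` data, table names, and function names are hypothetical.

```python
import sqlite3

# Hypothetical raw records; the slides do not specify a dataset.
RAW_ROWS = [("alice", " 10 "), ("bob", "25"), ("carol", " 7")]


def run_etl(conn):
    """ETL: transform in application code, then load the cleaned rows."""
    cleaned = [(name.title(), int(amount.strip())) for name, amount in RAW_ROWS]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)
    return conn.execute(
        "SELECT name, amount FROM sales_etl ORDER BY name"
    ).fetchall()


def run_elt(conn):
    """ELT: load raw rows as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name, "
        "CAST(trim(amount) AS INTEGER) AS amount "
        "FROM sales_raw"
    )
    return conn.execute(
        "SELECT name, amount FROM sales_elt ORDER BY name"
    ).fetchall()


etl_result = run_etl(sqlite3.connect(":memory:"))
elt_result = run_elt(sqlite3.connect(":memory:"))
```

Both functions end with the same cleaned table. In the ELT variant the raw table is also kept, so the transform can be rerun or changed later without extracting the data again.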
Data Streaming and Batch Processing
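
As a rough illustration of the contrast in this slide title, the sketch below computes the same total two ways: once over a complete dataset (batch) and once incrementally as each event arrives (streaming). The `EVENTS` values and function names are hypothetical, and a plain Python iterator stands in for a real event source such as a message queue.

```python
from typing import Iterable, Iterator, List

# Hypothetical event values; a real stream would arrive over time.
EVENTS = [3, 5, 2, 8]


def process_batch(events: List[int]) -> int:
    """Batch: wait for the full dataset, then compute the result once."""
    return sum(events)


def process_stream(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update and emit the result as each event arrives."""
    running_total = 0
    for value in events:
        running_total += value
        yield running_total


batch_total = process_batch(EVENTS)
stream_totals = list(process_stream(iter(EVENTS)))
```

The batch version produces one answer at the end, while the streaming version produces an up-to-date answer after every event, ending at the same final value.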
Data Pipeline Architecture Best Practices

▪ Map and understand the dependencies.
▪ Design your data pipeline so it is modular and automated.
▪ Create data pipeline SLAs (service level agreements).
▪ Let the data drive the data pipeline architecture.
▪ Create data products.
▪ Continuously review and optimize costs.
▪ Make pipelines idempotent.
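
The last practice above, idempotency, means that rerunning a pipeline step with the same input leaves the target in the same state as running it once. One common way to get there is to key the load on a unique identifier and overwrite rather than append. The sketch below shows that approach under stated assumptions: SQLite stands in for the target store, and the `orders` table, its columns, and `load_orders` are hypothetical names.

```python
import sqlite3


def load_orders(conn, rows):
    """Load rows keyed by order_id; reruns overwrite instead of duplicating."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    # INSERT OR REPLACE swaps out any existing row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
batch = [(1, 9.5), (2, 20.0)]  # hypothetical input batch

load_orders(conn, batch)
load_orders(conn, batch)  # simulate a retry of the same batch

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

After the retry the table still holds two rows, so a failed run can be restarted without cleaning up partial results first.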
Storytelling with Data

▪ Understand the Context
▪ Choose an Appropriate Visual
▪ Eliminate Clutter
▪ Focus Attention
▪ Tell a Story
▪ Use Accessible and Intuitive Labels
▪ Iterate and Seek Feedback
▪ Balance Data and Design
Learning Resources

▪ “Data Modeling Made Simple” by Steve Hoberman
▪ “Designing Data-Intensive Applications” by Martin Kleppmann
▪ “The Data Warehouse Toolkit” by Ralph Kimball and Margy Ross
▪ “Clean Code” by Robert C. Martin
▪ “Principles of Distributed Database Systems” by M. Tamer Özsu and Patrick Valduriez
▪ “Storytelling with Data” by Cole Nussbaumer Knaflic
Thank You