Data Engineer Interview Q&A Cheat Sheet - Shaik

This cheat sheet covers Data Engineer interview questions and answers on topics such as ETL processes, the differences between ADF and SSIS, debugging in Azure Data Factory, and API creation using Flask. It also includes strategies for optimizing SQL queries, managing version control with GitHub, and handling bad data in ETL. Each question is paired with a concise answer that highlights key concepts and best practices.

Q: What is ETL and how have you implemented it?

A: ETL stands for Extract, Transform, Load. I implemented ETL using SSIS to extract flat files and SQL data, applied transformations (e.g., lookups, derived columns), and loaded the data into SQL Server. For cloud ETL, I used Azure Data Factory to load data from Blob Storage into Azure SQL DB.
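A minimal sketch of that kind of ETL flow in Python (the flat file, connection string, and dbo.Customers target table are all illustrative, not from the original answer):

```python
# Minimal ETL sketch: extract a flat file, apply simple transformations,
# and load into SQL Server. File, connection, and table names are illustrative.
import csv
import pyodbc

def extract(path):
    # Extract: read rows from a flat file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: derived column + basic cleanup (akin to SSIS derived columns)
    for row in rows:
        row["full_name"] = f'{row["first_name"].strip()} {row["last_name"].strip()}'
        row["country"] = row.get("country", "").upper() or "UNKNOWN"
    return rows

def load(rows, conn_str):
    # Load: insert transformed rows into the target table
    with pyodbc.connect(conn_str) as conn:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO dbo.Customers (full_name, country) VALUES (?, ?)",
            [(r["full_name"], r["country"]) for r in rows],
        )
        conn.commit()

if __name__ == "__main__":
    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Staging;Trusted_Connection=yes;"
    )
    load(transform(extract("customers.csv")), conn_str)
```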

Q: What is the difference between ADF and SSIS?

A: SSIS is an on-premises ETL tool, while ADF is a cloud-native data integration service. ADF is scalable and integrates with a wide range of Azure and other cloud services, whereas SSIS works best within SQL Server environments.

Q: How do you debug failed pipelines in Azure Data Factory?

A: Check the Monitor logs, review each activity's output, inspect Linked Services and schema mismatches, and rerun the pipeline in Debug mode to trace errors.
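The same run and activity details can also be pulled programmatically. A rough sketch using the azure-mgmt-datafactory SDK, with placeholder subscription, resource group, and factory names (exact query parameters may differ by SDK version):

```python
# Rough sketch: list recent failed pipeline runs and their failed activities in ADF.
# Subscription, resource group, and factory name are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

filters = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
    last_updated_before=datetime.now(timezone.utc),
)

runs = client.pipeline_runs.query_by_factory("<resource-group>", "<factory-name>", filters)
for run in runs.value:
    if run.status == "Failed":
        print(run.pipeline_name, run.run_id, run.message)
        # Drill into the failed activities of this run
        activities = client.activity_runs.query_by_pipeline_run(
            "<resource-group>", "<factory-name>", run.run_id, filters
        )
        for act in activities.value:
            if act.status == "Failed":
                print("  ", act.activity_name, act.error)
```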

Q: How do you design a pipeline to load JSON from Blob to Azure SQL DB?

A: Use a Copy Activity in ADF. Define the source dataset as JSON (linked to Blob Storage) and the sink as Azure SQL DB. Map the schema and use parameterized file paths if needed.
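Outside ADF, the same Blob-to-SQL flow can be illustrated in plain Python; this is only a sketch of the mapping the Copy Activity performs, and the container, blob path, and dbo.Orders table are hypothetical:

```python
# Sketch of the Blob-to-Azure-SQL flow in plain Python (illustrative names only).
import json
import pyodbc
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = blob_service.get_blob_client(container="landing", blob="orders/orders.json")

# Source: JSON documents stored in Blob Storage
records = json.loads(blob.download_blob().readall())

# Sink: Azure SQL DB, with an explicit column mapping (like the Copy Activity's schema map)
with pyodbc.connect("<azure-sql-connection-string>") as conn:
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO dbo.Orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
        [(r["orderId"], r["customerId"], r["amount"]) for r in records],
    )
    conn.commit()
```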

Q: How do you create APIs using Flask?

A: Use Flask to define routes for CRUD operations (GET, POST, PUT, DELETE). Accept JSON payloads, use SQLAlchemy or pyodbc to interact with the database, and test the endpoints with Postman.
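A minimal Flask sketch of such an API, using an in-memory store for brevity (swap in SQLAlchemy or pyodbc for a real database):

```python
# Minimal Flask CRUD sketch with an in-memory store (illustrative only).
from flask import Flask, jsonify, request

app = Flask(__name__)
items = {}      # id -> record
next_id = 1

@app.route("/items", methods=["GET"])
def list_items():
    return jsonify(list(items.values()))

@app.route("/items", methods=["POST"])
def create_item():
    global next_id
    record = request.get_json()
    record["id"] = next_id
    items[next_id] = record
    next_id += 1
    return jsonify(record), 201

@app.route("/items/<int:item_id>", methods=["PUT"])
def update_item(item_id):
    if item_id not in items:
        return jsonify({"error": "not found"}), 404
    items[item_id].update(request.get_json())
    return jsonify(items[item_id])

@app.route("/items/<int:item_id>", methods=["DELETE"])
def delete_item(item_id):
    items.pop(item_id, None)
    return "", 204

if __name__ == "__main__":
    app.run(debug=True)
```

Each route can then be exercised from Postman, for example a POST to /items with a JSON body.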

Q: How do you optimize SQL queries?

A: Avoid SELECT *, filter early with WHERE clauses, use appropriate indexes, prefer JOINs over correlated subqueries, and write set-based logic instead of row-by-row processing.
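A small illustration of those points, with hypothetical table and column names: an unfiltered SELECT * versus a set-based, parameterized query whose predicate can use an index:

```python
# Illustration: anti-pattern vs. index-friendly, set-based query (names are hypothetical).
import pyodbc

# Anti-pattern: pulls every column and row, then filters in application code.
SLOW_QUERY = "SELECT * FROM dbo.Orders"

# Better: select only needed columns, filter early, and join instead of a correlated subquery.
FAST_QUERY = """
    SELECT o.OrderId, o.Amount, c.CustomerName
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
    WHERE o.OrderDate >= ?   -- sargable predicate; can use an index on OrderDate
"""

with pyodbc.connect("<connection-string>") as conn:
    rows = conn.cursor().execute(FAST_QUERY, "2024-01-01").fetchall()
    print(len(rows), "orders since 2024-01-01")
```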

Q: How do you handle version control in a project?

A: Use GitHub for repository management. Create feature branches, commit with clear messages, open pull requests, and resolve merge conflicts collaboratively.

Q: How do you handle bad data in ETL?

A: Apply validation checks, route invalid rows to error tables, and alert on data quality issues using conditional splits and logging mechanisms.
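A small sketch of that conditional-split idea in Python; the validation rules, field names, and in-memory sample rows are hypothetical, and the error list stands in for an error table:

```python
# Sketch of a conditional split: valid rows go to the target, invalid rows to an error table.
# Validation rules, field names, and destinations are illustrative.
import logging

logging.basicConfig(level=logging.WARNING)

def validate(row):
    """Return an error message for a bad row, or None if the row is valid."""
    if not row.get("customer_id"):
        return "missing customer_id"
    try:
        if float(row.get("amount", "")) < 0:
            return "negative amount"
    except ValueError:
        return "amount is not numeric"
    return None

rows = [
    {"customer_id": "C1", "amount": "120.50"},
    {"customer_id": "",   "amount": "10.00"},
    {"customer_id": "C3", "amount": "abc"},
]

good, bad = [], []
for row in rows:
    error = validate(row)
    if error:
        bad.append({**row, "error": error})   # would be inserted into an error table
        logging.warning("Bad row routed to error handling: %s (%s)", row, error)
    else:
        good.append(row)                      # would be loaded into the target table

print(f"{len(good)} valid rows, {len(bad)} routed to error handling")
```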
