
📘 Microsoft Fabric Interview Guide

✅ 100 Interview Questions with Answers, Examples & Diagrams

1. Basics of Microsoft Fabric


Q1. What is Microsoft Fabric?
A unified analytics platform combining data engineering, data science, real-time analytics, and BI. It brings together Power BI, Synapse, and Data Factory into one SaaS experience.

Q2. What is OneLake?
A single, logical, multi-cloud data lake for storing all organizational data. It acts as the storage backbone for Fabric.

Q3. How is Fabric different from Synapse?
Fabric is SaaS (integrated services in one UI); Synapse is PaaS (separate services).

Q4. Difference between Lakehouse and Warehouse?

• Lakehouse = Open Delta storage (raw, semi-structured, ML-ready).
• Warehouse = Relational SQL tables (structured, BI-ready).

Diagram: Fabric Ecosystem

        ┌───────────────────────────┐
        │      Microsoft Fabric     │
        └─────────────┬─────────────┘
               ┌──────┴──────┐
               │   OneLake   │
               └──────┬──────┘
        ┌─────────────┼─────────────┐
        │             │             │
    Lakehouse     Warehouse     Pipelines
        │             │             │
  Raw + Curated   BI Tables   Orchestration
        │             │             │
        └─────────────┴─────────────┘
                      │
      Semantic Models + Power BI (Reports)

Q5. What workloads exist in Fabric?

• Data Engineering (Notebooks, Lakehouse)
• Data Science (ML)
• Real-Time Analytics
• Data Factory (Pipelines, Dataflows)
• Warehouse (SQL)
• Power BI (Reports)

2. Lakehouse & Delta Tables


Q11. How does Lakehouse store data?
As Delta tables (Parquet + transaction logs).

Q12. Difference between Parquet and Delta?

• Parquet = columnar file format
• Delta = Parquet + transaction log + ACID

Q13. What is schema evolution?
The ability to add/modify columns automatically when ingesting new data.
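A minimal PySpark sketch (new_df and the table path are illustrative; mergeSchema is the standard Delta Lake write option for this):

# new_df carries extra columns not yet present in the target table;
# mergeSchema tells Delta to evolve the table schema instead of failing.
new_df.write.format("delta") \
    .mode("append") \
    .option("mergeSchema", "true") \
    .save("Tables/myTable")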

Q14. Create Delta Table Example (PySpark):

df.write.format("delta").mode("overwrite").save("Tables/myTable")

Q15. Query Delta Table (T-SQL, via the Lakehouse SQL analytics endpoint):

SELECT * FROM myLakehouse.dbo.myTable;

Diagram: Medallion Architecture

┌────────────┐
│   Bronze   │ → Raw ingestion (CSV, JSON, files)
└─────┬──────┘
┌─────▼──────┐
│   Silver   │ → Cleaned & standardized data
└─────┬──────┘
┌─────▼──────┐
│    Gold    │ → Aggregated, business-ready data
└────────────┘

Q19. How does Fabric ensure ACID?
Delta transaction logs provide snapshot isolation + commit consistency.
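The same log enables auditing and time travel; a quick illustrative check (table path assumed from the earlier examples):

# List the commits recorded in the Delta transaction log
spark.sql("DESCRIBE HISTORY delta.`Tables/myTable`").show()

# Time travel: read the table as it was at an earlier version
old_df = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load("Tables/myTable")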

3. Warehouse
Q21. How is Fabric Warehouse different from Azure SQL DB?

• Fabric Warehouse is SaaS, fully managed, and tightly integrated with OneLake.
• Azure SQL DB is a standalone PaaS offering that requires provisioning.

Q23. How to implement star schema?
Create dimension + fact tables in the Warehouse.

Example:

-- Fabric Warehouse supports only NOT ENFORCED primary keys
CREATE TABLE DimCustomer (CustomerID INT NOT NULL, Name VARCHAR(100));
ALTER TABLE DimCustomer ADD CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerID) NOT ENFORCED;

CREATE TABLE FactSales (SaleID INT, CustomerID INT, Amount DECIMAL(10,2));

Diagram: Lakehouse vs Warehouse

Lakehouse → Stores raw + semi-structured + unstructured data
Warehouse → Stores structured data optimized for BI reporting

4. Pipelines & Data Ingestion


Q31. What is a Data Pipeline?
An orchestration tool in Fabric for ingesting & transforming data.

Q33. Ingest on-prem SQL data?

• Use the On-premises Data Gateway
• Configure a Copy activity → Lakehouse/Warehouse

Q37. How to capture pipeline run status?
Store logs in a Metadata.IngestionBatch table.
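One possible shape for that table, sketched as a Delta table created from a notebook (the table and column names are illustrative, not a Fabric-defined schema):

spark.sql("""
CREATE TABLE IF NOT EXISTS metadata_ingestionbatch (
    batch_id      STRING,
    pipeline_name STRING,
    status        STRING,     -- e.g. Succeeded / Failed
    rows_copied   BIGINT,
    start_time    TIMESTAMP,
    end_time      TIMESTAMP
)
""")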

Q39. Incremental ingestion?

• Use a watermark column (e.g., ModifiedDate)
• Store the last loaded value in a metadata table (see the sketch below)
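A minimal notebook sketch of the pattern (all table and column names are illustrative):

# 1. Read the last successfully loaded watermark.
last_wm = spark.sql("SELECT MAX(wm_value) AS wm FROM metadata_watermark").first()["wm"]

# 2. Pull only rows changed since that watermark.
incr_df = spark.read.table("staging_source").filter(f"ModifiedDate > '{last_wm}'")

# 3. Append the new rows to Bronze and record the new watermark.
incr_df.write.format("delta").mode("append").save("Tables/bronze_source")
new_wm = incr_df.agg({"ModifiedDate": "max"}).first()[0]
if new_wm is not None:
    spark.sql(f"INSERT INTO metadata_watermark VALUES (timestamp'{new_wm}')")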

Q40. Copy vs Dataflow Gen2?

• Copy activity → data movement only
• Dataflow Gen2 → transformation + ingestion

Diagram: Metadata-driven Ingestion

Source System → Pipeline (Copy + Metadata Control) → Bronze (Lakehouse)
                              │
                       Logging Tables

5. Dataflow Gen2
Q41. What is Dataflow Gen2?
Low-code ETL in Fabric using the Power Query engine.

Q42. How to implement incremental load?
Filter rows using the last_modified column from metadata.

Q43. Split header & line items?

• Import PO data → Split into two outputs → PO_Header & PO_Lines tables.

Q47. Apply column mapping?
Map source → target columns in the Dataflow transformation UI.

Q49. Schema drift handling?
Enable schema drift to allow dynamic schema adaptation.

Diagram: Dataflow Transformation

Raw Data → Dataflow (Transform, Clean, Split) → Lakehouse/Warehouse Tables

6. Notebooks & Spark


Q51. Languages supported?
Python (PySpark), SQL, R, Scala.

Q53. Load CSV Example:

df = spark.read.format("csv").option("header","true").load("Files/data.csv")

Q55. Spark SQL vs T-SQL?

• Spark SQL = distributed big data processing
• T-SQL = relational queries on structured tables

Q57. Merge Incremental Data Example:

spark.sql("""
MERGE INTO target t
USING staging s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN INSERT *
""")

Q60. Orchestration?
Notebooks can be called inside Pipelines.
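The reverse direction also works from code: a notebook can invoke another notebook via mssparkutils, which ships with the Fabric Spark runtime (the child notebook name and parameters here are illustrative):

from notebookutils import mssparkutils

# Run a child notebook with a 300-second timeout and a parameter map
result = mssparkutils.notebook.run("Transform_Silver", 300, {"run_date": "2025-01-01"})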

Diagram: Notebook Usage in Fabric

Raw Data → Notebook (PySpark/SQL Transformations) → Delta Tables in Lakehouse

7. CI/CD & DevOps


Q61. How to version control Fabric artifacts?
Use Git integration with Azure Repos.

Q63. YAML Deployment Example:

trigger:
- main

jobs:
- job: Deploy
steps:
- task: PowerShell@2
inputs:
targetType: 'inline'
script: |
Write-Output "Deploying Fabric artifacts"

Q66. Environment configs?
Use deployment pipeline rules (parameterize connections).

Q69. Git vs Deployment Pipeline?

• Git → source control + branching
• Deployment Pipeline → promote content to TEST/PROD

Diagram: CI/CD Flow

DEV → Commit to Git → Deployment Pipeline → TEST → PROD

8. Security & Governance
Q71. How to implement RLS?
Define roles in the Warehouse or Semantic Model.

Q74. Example:

CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE dbo.fnRLS(CustomerID) ON dbo.FactSales;

Q75. Purview integration?
Fabric integrates with Purview for data lineage & cataloging.

Q76. Data masking?
Use dynamic data masking in the Warehouse.

Q78. Default OneLake security?
Access is controlled via Fabric workspaces + Microsoft Entra ID.

Diagram: RLS Security Flow

User → Role (defined in Semantic Model/Warehouse) → Filtered Query Result

9. Advanced Scenarios
Q81. How to implement Medallion?

• Bronze: Copy raw files → Lakehouse
• Silver: Transform with Dataflows/Notebooks
• Gold: Load curated tables into Warehouse

Q83. Implement SCD Type 2?
Use MERGE with valid_from and valid_to columns.
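A PySpark sketch of the close-out step (the dimension, staging, and column names are illustrative; changed keys then need a follow-up insert of their new version):

spark.sql("""
MERGE INTO dim_customer t
USING staging_customer s
ON t.customer_id = s.customer_id AND t.is_current = true
WHEN MATCHED AND t.name <> s.name THEN
  UPDATE SET t.is_current = false, t.valid_to = current_timestamp()
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, valid_from, valid_to, is_current)
  VALUES (s.customer_id, s.name, current_timestamp(), null, true)
""")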

Q85. Real-time streaming?
Use Eventstream in Fabric → Lakehouse → Power BI.
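Eventstream itself is configured in the Fabric UI, but once events land in a Delta table a notebook can consume them incrementally with Structured Streaming (paths are illustrative):

# Read the Eventstream landing table as a continuous stream
stream_df = spark.readStream.format("delta").load("Tables/events")

# Write micro-batches to a curated table, tracking progress via a checkpoint
query = (stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "Files/checkpoints/events")
    .outputMode("append")
    .start("Tables/events_curated"))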

Q87. Detect duplicates Example (PySpark):

df.groupBy("id").count().filter("count > 1").show()  # rows whose id appears more than once

Q90. GDPR compliance?
Implement data retention policies + masking + auditing.
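On the Lakehouse side, retention can be enforced with Delta deletes plus VACUUM to physically purge the underlying files (the table, column, and 7-year window are illustrative):

# Logically remove personal records past the retention window
spark.sql("DELETE FROM customers WHERE created_date < date_sub(current_date(), 365 * 7)")

# Physically purge files no longer referenced (168 hours is Delta's default minimum)
spark.sql("VACUUM customers RETAIN 168 HOURS")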

Diagram: Real-time Data Flow

Event Stream → Fabric Lakehouse (Delta Table) → Power BI Dashboard

10. Semantic Models & Power BI


Q91. What is a Semantic Model?
A centralized data model (formerly called a Power BI dataset) in Fabric.

Q94. DAX Example (YTD Sales):

Sales YTD = TOTALYTD(SUM(FactSales[Amount]), DimDate[Date])

Q96. Import vs Direct Lake vs DirectQuery?

• Import → cached data, fast
• Direct Lake → queries the Lakehouse directly (best for Fabric)
• DirectQuery → queries the external DB at report time (slower)

Q98. RLS in Semantic Model?
Define roles in the model view → assign filters.

Q100. Direct Lake advantages?

• Near real-time
• High performance (no duplication)
• Lower cost

Diagram: Power BI Integration

Lakehouse/Warehouse → Semantic Model → Power BI Report

✅ Summary
• Lakehouse = Big data + ML + raw storage
• Warehouse = Structured BI reporting
• Pipelines & Dataflows = ETL orchestration
• Notebooks = Advanced transformations
• Deployment Pipelines + Git = CI/CD
• Semantic Models = Power BI integration

This guide gives you 100 Q&A with examples and diagrams across all Fabric components, structured as a concise 10–12 page PDF for revision before interviews.
