Databricks Certified Data Engineer Associate - Practice Questions
Apache Spark & Notebooks
Q: What is a common use of markdown cells in notebooks?
 A. C++
 B. Returns all elements of the DataFrame as a list
 C. Documentation
 D. To run another notebook
 Answer: C
Q: What is a benefit of using notebooks in Databricks?
 A. Returns all elements of the DataFrame as a list
 B. C++
 C. Supports interactive development
 D. Documentation
 Answer: C
Q: Which language is NOT supported in Databricks notebooks?
 A. To run another notebook
 B. Supports interactive development
 C. df.cache()
 D. C++
 Answer: D
Q: How do you cache a DataFrame in Spark?
 A. Documentation
 B. df.cache()
 C. DataFrame
 D. Supports interactive development
 Answer: B
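Note: `cache()` is lazy, so nothing is stored until an action runs. The sketch below assumes `df` is an existing Spark DataFrame; the helper name `cache_and_materialize` is hypothetical and used only for illustration.

```python
def cache_and_materialize(df):
    """Mark a DataFrame for caching and force it to be computed once.

    `df` is assumed to be a Spark DataFrame; this helper is illustrative only.
    """
    cached = df.cache()  # lazy: only marks the DataFrame for caching
    cached.count()       # an action, which triggers computation and fills the cache
    return cached
```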
Q: How is SparkSession accessed in Databricks?
 A. spark
 B. C++
 C. Documentation
 D. To run another notebook
 Answer: A
Q: How do you write comments in Python notebooks?
 A. To run another notebook
 B. # This is a comment
 C. spark
 D. C++
 Answer: B
Q: What does `display(df)` do?
 A. Supports interactive development
 B. # This is a comment
 C. Renders a DataFrame in a tabular format with visualization options
 D. spark
 Answer: C
Q: What is `%run` used for in notebooks?
 A. Supports interactive development
 B. To run another notebook
 C. spark
 D. DataFrame
 Answer: B
Q: What does the `.collect()` method do?
 A. Renders a DataFrame in a tabular format with visualization options
 B. DataFrame
 C. df.cache()
 D. Returns all elements of the DataFrame as a list
 Answer: D
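Note: because `.collect()` brings every row back to the driver, it can exhaust driver memory on a large DataFrame. One common way to stay safe, sketched here with a hypothetical helper `preview_rows` and an assumed Spark DataFrame `df`, is to limit the rows first.

```python
def preview_rows(df, n=5):
    """Return at most `n` rows to the driver as a Python list.

    `df` is assumed to be a Spark DataFrame; this helper is illustrative only.
    """
    # limit() is applied on the cluster; collect() then moves only those rows
    # to the driver.
    return df.limit(n).collect()
```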
Q: What does `spark.read.csv()` return?
 A. C++
 B. df.cache()
 C. Documentation
 D. DataFrame
 Answer: D
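Note: a minimal sketch of reading a CSV file, assuming `spark` is the SparkSession that Databricks notebooks provide and that the file has a header row. The helper name `read_csv_with_header` and the path are placeholders.

```python
def read_csv_with_header(spark, path):
    """Read a CSV file into a DataFrame.

    `spark` is assumed to be an active SparkSession; `path` is a placeholder.
    """
    # header=True uses the first line as column names;
    # inferSchema=True asks Spark to guess column types by scanning the data.
    return spark.read.csv(path, header=True, inferSchema=True)
```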
Data Governance & Security
Q: Which layer defines table-level access?
 A. Stores metadata about data assets
 B. Catalog permissions
 C. A shared environment for users
 D. Data permissions and lineage
 Answer: B
Q: Who defines data access policies in Unity Catalog?
 A. Data permissions and lineage
 B. Through access control lists (ACLs)
 C. Stores metadata about data assets
 D. Data stewards or admins
 Answer: D
Q: What does Unity Catalog manage?
 A. A shared environment for users
 B. Data permissions and lineage
 C. Through access control lists (ACLs)
 D. Role-Based Access Control
 Answer: B
Q: How are user permissions granted?
 A. Role-Based Access Control
 B. Stores metadata about data assets
 C. Through access control lists (ACLs)
 D. Assign roles to users
 Answer: C
Q: What is a workspace in Databricks?
 A. Tracks access logs and usage history
 B. Role-Based Access Control
 C. Assign roles to users
 D. A shared environment for users
 Answer: D
Q: What is one way to restrict data access?
 A. Data permissions and lineage
 B. Tracking data origin and transformations
 C. Catalog permissions
 D. Assign roles to users
 Answer: D
Q: What is data lineage?
 A. A shared environment for users
 B. Catalog permissions
 C. Data stewards or admins
 D. Tracking data origin and transformations
 Answer: D
Q: What is RBAC?
 A. Assign roles to users
 B. Tracks access logs and usage history
 C. Role-Based Access Control
 D. Data stewards or admins
 Answer: C
Q: What is the role of a metastore?
 A. Role-Based Access Control
 B. Stores metadata about data assets
 C. Tracks access logs and usage history
 D. Data stewards or admins
 Answer: B
Q: How does Unity Catalog improve auditing?
 A. Assign roles to users
 B. Tracks access logs and usage history
 C. Data stewards or admins
 D. Catalog permissions
 Answer: B
Data Ingestion & Transformation
Q: Which tool helps with transformation jobs?
 A. JSON
 B. XLS
 C. Databricks Workflows
 D. df.write.format('delta').save('path')
 Answer: C
Q: What is a common data ingestion format in Databricks?
 A. XLS
 B. Incrementally ingesting data from cloud storage
 C. df.write.format('delta').save('path')
 D. JSON
 Answer: D
Q: Which function applies transformation to each row?
 A. JSON
 B. Structured Streaming
 C. Databricks Workflows
 D. map
 Answer: D
Q: Which format is NOT typically used in Databricks ingestion?
 A. map
 B. XLS
 C. spark.read.csv('file.csv')
 D. dropna
 Answer: B
Q: How do you write a DataFrame as Delta?
 A. Structured Streaming
 B. Incrementally ingesting data from cloud storage
 C. JSON
 D. df.write.format('delta').save('path')
 Answer: D
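Note: the write call in the answer can be combined with a save mode. The sketch below assumes `df` is a Spark DataFrame on a cluster with Delta Lake available; the helper name `write_delta` and the default mode are illustrative choices.

```python
def write_delta(df, path, mode="overwrite"):
    """Write a DataFrame to `path` in Delta format.

    `df` is assumed to be a Spark DataFrame; `path` is a placeholder location.
    """
    # mode controls what happens if data already exists at the path,
    # for example "overwrite" or "append".
    df.write.format("delta").mode(mode).save(path)
```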
Q: How do you read CSV data into a DataFrame?
 A. JSON
 B. df.write.format('delta').save('path')
 C. Structured Streaming
 D. spark.read.csv('file.csv')
 Answer: D
Q: Which method is used for cleaning data?
 A. Structured Streaming
 B. JSON
 C. df.write.format('delta').save('path')
 D. dropna
 Answer: D
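Note: `dropna` can be restricted to the columns that must be present. This is a minimal sketch assuming `df` is a Spark DataFrame; the helper name `drop_incomplete_rows` is hypothetical.

```python
def drop_incomplete_rows(df, required_columns):
    """Drop rows that have a null in any of the required columns.

    `df` is assumed to be a Spark DataFrame; this helper is illustrative only.
    """
    # how="any" drops a row if at least one of the listed columns is null.
    return df.dropna(how="any", subset=list(required_columns))
```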
Q: Which method ingests streaming data?
 A. JSON
 B. Structured Streaming
 C. readStream
 D. dropna
 Answer: C
Q: What is Auto Loader used for?
 A. JSON
 B. spark.read.csv('file.csv')
 C. df.write.format('delta').save('path')
 D. Incrementally ingesting data from cloud storage
 Answer: D
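Note: Auto Loader is exposed through the `cloudFiles` streaming source on Databricks. The sketch below assumes JSON input and a Databricks runtime; the helper name `autoload_json` and both paths are placeholders.

```python
def autoload_json(spark, source_path, schema_path):
    """Build a streaming DataFrame that incrementally picks up new JSON files.

    `spark` is assumed to be a SparkSession on Databricks; paths are placeholders.
    """
    return (
        spark.readStream
        .format("cloudFiles")                               # Auto Loader source
        .option("cloudFiles.format", "json")                # format of incoming files
        .option("cloudFiles.schemaLocation", schema_path)   # where the inferred schema is tracked
        .load(source_path)
    )
```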
Q: Which API supports streaming in Spark?
 A. dropna
 B. JSON
 C. Structured Streaming
 D. map
 Answer: C
Databricks Lakehouse Platform
Q: Which storage format does Lakehouse architecture commonly use?
 A. Unified BI and ML analytics
 B. Lack of schema enforcement and consistency
 C. Open formats and APIs
 D. Delta Lake
 Answer: D
Q: How does Lakehouse support ML workloads?
 A. By enabling data scientists to access the same data used in analytics
 B. ACID transactions
 C. Unified BI and ML analytics
 D. Open formats and APIs
 Answer: A
Q: What is one way Lakehouse reduces data movement?
 A. It combines the benefits of data lakes and data warehouses
 B. Unified data platform
 C. Unified BI and ML analytics
 D. By enabling data scientists to access the same data used in analytics
 Answer: B
Q: Which layer of Lakehouse handles governance and security?
 A. Open formats and APIs
 B. Metadata layer
 C. By enabling data scientists to access the same data used in analytics
 D. ACID transactions
 Answer: B
Q: Which component enables data reliability in a Lakehouse?
 A. Unified data platform
 B. Lack of schema enforcement and consistency
 C. It combines the benefits of data lakes and data warehouses
 D. ACID transactions
 Answer: D
Q: What is a common use case of a Lakehouse?
 A. Unified BI and ML analytics
 B. ACID transactions
 C. Batch and streaming workloads
 D. Unified data platform
 Answer: A
Q: Why are traditional data lakes insufficient for BI workloads?
 A. Batch and streaming workloads
 B. Lack of schema enforcement and consistency
 C. Metadata layer
 D. Open formats and APIs
 Answer: B
Q: Which feature allows multiple tools to access the same data in Lakehouse?
 A. Open formats and APIs
 B. Delta Lake
 C. Metadata layer
 D. It combines the benefits of data lakes and data warehouses
 Answer: A
Q: What is the primary benefit of the Databricks Lakehouse Platform?
 A. Open formats and APIs
 B. By enabling data scientists to access the same data used in analytics
 C. Batch and streaming workloads
 D. It combines the benefits of data lakes and data warehouses
 Answer: D
Q: What type of data workloads can be handled by a Lakehouse?
 A. It combines the benefits of data lakes and data warehouses
 B. Open formats and APIs
 C. Delta Lake
 D. Batch and streaming workloads
 Answer: D
Delta Lake
Q: Which method updates a Delta table conditionally?
 A. Parquet
 B. MERGE INTO
 C. Data reliability with ACID transactions
 D. _delta_log
 Answer: B
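Note: a typical upsert with `MERGE INTO` updates matching rows and inserts the rest. The sketch below only builds the SQL text; the helper name `build_merge_sql` and the table and key names are hypothetical, and the statement would be run with something like `spark.sql(...)`.

```python
def build_merge_sql(target, source, key):
    """Build an upsert statement for a Delta table.

    Table and column names are placeholders supplied by the caller.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"      # update rows whose key already exists
        "WHEN NOT MATCHED THEN INSERT *"        # insert rows with new keys
    )
```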
Q: How can schema evolution be enabled in Delta?
 A. RESTORE
 B. A table stored in Delta format with transaction support
 C. Transaction log
 D. mergeSchema=True
 Answer: D
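Note: in PySpark, `mergeSchema` is usually passed as a write option rather than as a keyword argument. A minimal sketch, assuming `df` is a Spark DataFrame whose schema adds new columns to an existing Delta table; the helper name is hypothetical.

```python
def append_with_schema_evolution(df, path):
    """Append to a Delta table, letting new columns be added to its schema.

    `df` is assumed to be a Spark DataFrame; `path` is a placeholder location.
    """
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")  # allow the table schema to evolve on this write
        .save(path)
    )
```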
Q: What is a Delta table?
 A. Transaction log
 B. Parquet
 C. Data reliability with ACID transactions
 D. A table stored in Delta format with transaction support
 Answer: D
Q: How do you enable change data feed in Delta Lake?
 A. VACUUM
 B. Transaction log
 C. Set 'delta.enableChangeDataFeed = true'
 D. RESTORE
 Answer: C
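Note: the property in the answer is a table property, so on an existing table it is commonly set with `ALTER TABLE`. The sketch below only builds the SQL text; the helper name and table name are hypothetical.

```python
def build_enable_cdf_sql(table):
    """Build the statement that turns on change data feed for an existing table.

    `table` is a placeholder table name supplied by the caller.
    """
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )
```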
Q: Which command is used to remove old files in Delta tables?
 A. Parquet
 B. RESTORE
 C. A table stored in Delta format with transaction support
 D. VACUUM
 Answer: D
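Note: `VACUUM` accepts an optional retention window; the default threshold is 7 days (168 hours). The sketch below only builds the SQL text, with a hypothetical helper name and a placeholder table name. Files removed by `VACUUM` are no longer available for time travel.

```python
def build_vacuum_sql(table, retain_hours=168):
    """Build a VACUUM statement with an explicit retention window.

    168 hours matches the default 7-day retention threshold.
    """
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"
```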
Q: What does Delta Lake use for ACID transactions?
 A. VACUUM
 B. Data reliability with ACID transactions
 C. _delta_log
 D. A table stored in Delta format with transaction support
 Answer: C
Q: What operation allows restoring a table to a previous state?
 A. Transaction log
 B. RESTORE
 C. mergeSchema=True
 D. Set 'delta.enableChangeDataFeed = true'
 Answer: B
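Note: `RESTORE` can target a version number recorded in the table history. The sketch below only builds the SQL text; the helper name and table name are hypothetical.

```python
def build_restore_sql(table, version):
    """Build a statement that rolls a Delta table back to an earlier version.

    `version` is a version number from the table history (for example, from
    DESCRIBE HISTORY).
    """
    return f"RESTORE TABLE {table} TO VERSION AS OF {int(version)}"
```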
Q: What is one benefit of Delta Lake?
 A. Set 'delta.enableChangeDataFeed = true'
 B. VACUUM
 C. A table stored in Delta format with transaction support
 D. Data reliability with ACID transactions
 Answer: D
Q: Which file format is used by Delta Lake?
 A. VACUUM
 B. Set 'delta.enableChangeDataFeed = true'
 C. Transaction log
 D. Parquet
 Answer: D
Q: What enables time travel in Delta Lake?
 A. A table stored in Delta format with transaction support
 B. Transaction log
 C. VACUUM
 D. RESTORE
 Answer: B
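Note: because the transaction log records every table version, an older version can be read back by number. A minimal sketch, assuming `spark` is an active SparkSession with Delta Lake available; the helper name and path are placeholders.

```python
def read_delta_version(spark, path, version):
    """Read a Delta table as it was at a given version (time travel).

    `spark` is assumed to be an active SparkSession; `path` is a placeholder.
    """
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # version number from the table history
        .load(path)
    )
```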
ETL Pipelines & Workflows
Q: What is a task in Databricks Jobs?
 A. Via Widgets or Job Parameters
 B. A unit of work like running a notebook or script
 C. Single Node
 D. Python task
 Answer: B
Q: How are job parameters passed?
 A. Governance on cluster configurations
 B. Jobs UI
 C. Via Widgets or Job Parameters
 D. max_retries
 Answer: C
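Note: inside a notebook, a widget gives a parameter a default value, and a job parameter with the same name overrides it at run time. The sketch below assumes `dbutils` is the utility object available in Databricks notebooks; the helper name `get_job_parameter` is hypothetical.

```python
def get_job_parameter(dbutils, name, default=""):
    """Read a notebook parameter, falling back to a default for interactive runs.

    `dbutils` is assumed to be the Databricks notebook utility object.
    """
    dbutils.widgets.text(name, default)  # declare the widget with a default value
    return dbutils.widgets.get(name)     # returns the job-supplied value if one was passed
```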
Q: What is a multi-task job?
 A. Workflow with multiple dependent tasks
 B. Jobs UI
 C. Single Node
 D. Use the cron expression
 Answer: A
Q: What parameter controls retry attempts?
 A. max_retries
 B. Via Widgets or Job Parameters
 C. Use the cron expression
 D. Job run history page
 Answer: A
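Note: retries are configured per task. The sketch below builds a job definition as a plain dictionary for a chain of notebook tasks; the field names are assumed to follow the Databricks Jobs API task settings, cluster configuration is deliberately omitted, and the helper name and notebook paths are hypothetical.

```python
def build_job_settings(name, notebook_paths, max_retries=2):
    """Build a multi-task job definition where each task depends on the previous one.

    Field names are assumed to match the Jobs API; cluster settings are omitted.
    """
    tasks = []
    previous_key = None
    for index, path in enumerate(notebook_paths):
        task = {
            "task_key": f"task_{index}",
            "notebook_task": {"notebook_path": path},
            "max_retries": max_retries,  # retry attempts for this task after a failure
        }
        if previous_key is not None:
            # Run this task only after the previous task finishes.
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task["task_key"]
    return {"name": name, "tasks": tasks}
```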
Q: How do you schedule a job to run weekly?
 A. Workflow with multiple dependent tasks
 B. Via Widgets or Job Parameters
 C. Python task
 D. Use the cron expression
 Answer: D
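Note: Databricks job schedules use Quartz cron syntax, whose fields are seconds, minutes, hours, day of month, month, and day of week. The sketch below builds a weekly schedule as a dictionary; the helper name and default values are illustrative, and the field names are assumed to follow the Jobs API schedule settings.

```python
def weekly_schedule(day="MON", hour=9, minute=0, timezone_id="UTC"):
    """Build a weekly schedule using a Quartz cron expression.

    "?" means no specific day of month, since the day of week is given instead.
    """
    return {
        "quartz_cron_expression": f"0 {minute} {hour} ? * {day}",
        "timezone_id": timezone_id,
    }
```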
Q: Which task type supports Python scripts?
 A. Python task
 B. A unit of work like running a notebook or script
 C. Governance on cluster configurations
 D. max_retries
 Answer: A
Q: What is the default cluster mode in a job?
 A. Single Node
 B. max_retries
 C. Jobs UI
 D. Job run history page
 Answer: A
Q: Where do you find job run logs?
 A. Jobs UI
 B. max_retries
 C. Governance on cluster configurations
 D. Job run history page
 Answer: D
Q: What is a cluster policy?
 A. Via Widgets or Job Parameters
 B. Governance on cluster configurations
 C. Use the cron expression
 D. Single Node
 Answer: B
Q: What UI is used to create workflows in Databricks?
 A. Via Widgets or Job Parameters
 B. Single Node
 C. Jobs UI
 D. Workflow with multiple dependent tasks
 Answer: C