Deploy
Workloads with
Databricks
Workflows
Module 05
©2023 Databricks Inc. — All rights reserved 1
Module Agenda
Deploy Workloads with Databricks Workflows
Introduction to Workflows
Building and Monitoring Workflow Jobs
DE 5.1 - Scheduling Tasks with the Jobs UI
DE 5.2L - Jobs Lab
Introduction to
Workflows
Course Objectives
1 Describe the main features and use cases of Databricks Workflows
2 Create a task orchestration workflow composed of various task types
3 Utilize monitoring and debugging features of Databricks Workflows
4 Describe workflow best practices
Databricks Workflows
Databricks Workflows
Workflows is a fully managed, cloud-based, general-purpose task orchestration service for the entire Lakehouse.

Workflows is a service for data engineers, data scientists, and analysts to build reliable data, analytics, and AI workflows on any cloud.

[Figure: Lakehouse Platform — Data Warehousing, Data Engineering, Data Streaming, Data Science and ML; Unity Catalog (fine-grained governance for data and AI); Delta Lake (data reliability and performance); Cloud Data Lake (all structured and unstructured data)]
Databricks Workflows
Databricks has two main task orchestration
services:
• Workflow Jobs (Workflows) — workflows for every job
• Delta Live Tables (DLT) — automated data pipelines for Delta Lake

Note: A DLT pipeline can be a task in a workflow.
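The note above can be sketched as a Jobs API 2.1 job spec in which a DLT pipeline runs as one task of a larger workflow. This is a minimal illustration, not a complete spec; the notebook path and pipeline ID are placeholders.

```python
# Sketch: a workflow job whose second task is a DLT pipeline.
# Field names follow the Databricks Jobs API 2.1; the notebook path
# and pipeline ID below are placeholders.

job_spec = {
    "name": "ingest-then-dlt",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/demo/ingest"},  # placeholder
        },
        {
            "task_key": "dlt_pipeline",
            # The DLT task runs only after "ingest" succeeds.
            "depends_on": [{"task_key": "ingest"}],
            "pipeline_task": {"pipeline_id": "<your-pipeline-id>"},  # placeholder
        },
    ],
}

print([t["task_key"] for t in job_spec["tasks"]])
```

Submitting a spec like this (e.g., via `POST /api/2.1/jobs/create`) would create a job where the pipeline is orchestrated alongside ordinary tasks.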
DLT versus Workflow Jobs
Considerations
                      Delta Live Tables          Workflow Jobs
Source                Notebooks only             JARs, notebooks, DLT, applications written in Scala, Java, Python
Dependencies          Automatically determined   Manually set
Cluster               Self-provisioned           Self-provisioned or existing
Timeouts and Retries  Not supported              Supported
Import Libraries      Not supported              Supported
DLT versus Jobs
Use Cases
• Orchestration of Dependent Jobs — jobs running on a schedule, in a job containing dependent tasks/steps → Jobs Workflows
• Machine Learning Tasks — run an MLflow notebook task in a job → Jobs Workflows
• Arbitrary Code, External API Calls, Custom Tasks — run tasks in a job that can contain a JAR file, Spark Submit, Python script, SQL task, or dbt → Jobs Workflows
• Data Ingestion and Transformation — ETL jobs; support for batch and streaming; built-in data quality constraints, monitoring, and logging → Delta Live Tables
Workflows Features
Part 1 of 2
Orchestrate Anything Anywhere
Run diverse workloads for the full data and AI lifecycle, on any cloud. Orchestrate:
• Notebooks
• Delta Live Tables
• Jobs for SQL
• ML models, and more

Fully Managed
Remove operational overhead with a fully managed orchestration service, enabling you to focus on your workflows, not on managing your infrastructure.

Simple Workflow Authoring
An easy point-and-click authoring experience for all your data teams, not just those with specialized skills.
Workflows Features
Part 2 of 2
Deep Platform Integration
Designed and built into your lakehouse platform, giving you deep monitoring capabilities and centralized observability across all your workflows.

Proven Reliability
Have full confidence in your workflows, leveraging our proven experience running tens of millions of production workloads daily across AWS, Azure, and GCP.
How to Leverage Workflows
• Build simple ETL/ML task orchestration
• Reduce infrastructure overhead
• Integrate easily with external tools
• Enable non-engineers to build their own workflows using a simple UI
• Stay cloud-provider independent
• Reuse clusters to reduce cost and startup time
Common Workflow Patterns
Sequence
• Data transformation/processing/cleaning
• Bronze/silver/gold tables

Funnel
• Multiple data sources
• Data collection

Fan-out
• Fan-out, star pattern
• Single data source
• Data ingestion and distribution
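The fan-out pattern above can be expressed directly in a job spec through task dependencies. The helper below is a hypothetical sketch (task keys and notebook paths are illustrative) that builds a Jobs API-style task list where several downstream tasks all depend on one source task.

```python
# Hypothetical helper: build a fan-out task list in Jobs API 2.1 style.
# One ingestion task feeds several independent downstream tasks.

def fan_out(source_key: str, target_keys: list[str]) -> list[dict]:
    """Return a task list where every target depends on the single source task."""
    tasks = [{
        "task_key": source_key,
        "notebook_task": {"notebook_path": f"/Jobs/{source_key}"},  # illustrative path
    }]
    for key in target_keys:
        tasks.append({
            "task_key": key,
            # Each target waits for the shared source task to succeed.
            "depends_on": [{"task_key": source_key}],
            "notebook_task": {"notebook_path": f"/Jobs/{key}"},
        })
    return tasks

tasks = fan_out("ingest", ["marketing", "finance", "ops"])
print(len(tasks))  # 4 tasks: one source plus three fan-out targets
```

A sequence is the same idea with each task depending on the previous one, and a funnel inverts the edges: one task lists several upstream tasks in `depends_on`.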
Example Workflow
1. Data ingestion funnel (e.g., Auto Loader, DLT)
2. Data filtering, quality assurance, and transformation (e.g., DLT, SQL, Python)
3. ML feature extraction (e.g., MLflow)
4. Persisting features and training a prediction model
Building and
Monitoring Workflow
Jobs
Workflows Job Components
• TASKS — What?
• SCHEDULE — When?
• CLUSTER — How?
Creating a Workflow
Task Definition
While creating a task:
• Define the task type
• Choose the cluster type
  • Both job clusters and all-purpose clusters can be used.
  • A cluster can be shared by multiple tasks, which reduces cost and startup time.
  • To create a new cluster, you must have the required permissions.
• Define a task dependency if the task depends on another task
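The cluster-sharing point above can be sketched in Jobs API 2.1 terms: a job-level cluster is declared once under `job_clusters` and referenced by each task via `job_cluster_key`, so it is provisioned once and reused. The cluster settings and notebook paths below are illustrative placeholders.

```python
# Sketch: two dependent tasks sharing one job cluster, so the cluster
# starts once and is reused (lower cost and startup time).
# Runtime version, worker count, and paths are illustrative.

job_spec = {
    "name": "shared-cluster-job",
    "job_clusters": [
        {
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # example runtime
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "clean",
            "job_cluster_key": "shared",  # reuses the cluster above
            "notebook_task": {"notebook_path": "/Jobs/clean"},
        },
        {
            "task_key": "aggregate",
            "job_cluster_key": "shared",  # same cluster, no second startup
            "depends_on": [{"task_key": "clean"}],
            "notebook_task": {"notebook_path": "/Jobs/aggregate"},
        },
    ],
}

print({t["task_key"]: t["job_cluster_key"] for t in job_spec["tasks"]})
```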
Monitoring and Debugging
Scheduling and Alerts
You can run your jobs immediately or periodically through an easy-to-use scheduling system.

You can set alerts to notify you when runs of a job begin, complete, or fail. Notifications can be sent via email, Slack, or AWS SNS.
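A schedule and email alerts can be attached to a job spec as sketched below, using Jobs API 2.1 field names. Note the schedule uses Quartz cron syntax (six/seven fields), not POSIX cron; the addresses are placeholders.

```python
# Sketch: job-spec fragment adding a daily schedule and email alerts.
# Quartz cron "0 0 6 * * ?" means every day at 06:00.

schedule_and_alerts = {
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # Quartz syntax, not POSIX cron
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "email_notifications": {
        "on_start": ["team@example.com"],      # placeholder addresses
        "on_success": ["team@example.com"],
        "on_failure": ["oncall@example.com"],
    },
}

print(sorted(schedule_and_alerts["email_notifications"]))
```

Slack or AWS SNS destinations are configured separately as notification destinations in the workspace rather than in this fragment.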
Monitoring and Debugging
Access Control
Workflows integrates with existing resource access controls, enabling you to easily manage access across different teams.
Monitoring and Debugging
Job Run History

Workflows keeps track of job runs and saves information about the success or failure of each task in the job run. Navigate to the Runs tab to view completed or active runs for a job.

[Screenshot: Runs tab showing run duration, the tasks in each run, and individual job runs]
Monitoring and Debugging
Repair a Failed Job Run
The repair feature allows you to re-run only the failed tasks and sub-tasks, which reduces the time and resources required to recover from unsuccessful job runs.
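Besides the UI, a run can be repaired through the Jobs API (`POST /api/2.1/jobs/runs/repair`). The sketch below only builds the request payload; the run ID and task key are placeholders, and the HTTP call itself (which needs a workspace URL and token) is omitted.

```python
# Sketch: payload for POST /api/2.1/jobs/runs/repair, asking Databricks
# to re-run only the listed failed tasks of an existing job run.
# The run ID and task keys are placeholders.

def build_repair_payload(run_id: int, failed_task_keys: list[str]) -> dict:
    """Return a repair request that re-runs only the given tasks."""
    return {"run_id": run_id, "rerun_tasks": failed_task_keys}

payload = build_repair_payload(123456, ["aggregate"])
print(payload)
```

Successful upstream tasks are left untouched; only the listed tasks (and anything downstream of them) run again.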
Navigating the Jobs UI
Use breadcrumbs to navigate back to your job from a specific run page
Navigating the Jobs UI
Runs vs Tasks tabs on the job page
Use the Runs tab to view completed or active runs for the job. Use the Tasks tab to modify or add tasks to the job.
DE 5.1.1: Task
Orchestration
Demo: Task Orchestration
DE 5.1.1 - Task Orchestration
• Schedule a notebook task in a Databricks Workflow Job
• Describe job scheduling options and differences between cluster types
• Review Job Runs to track progress and see results
• Schedule a DLT pipeline task in a Databricks Workflow Job
• Configure dependency between tasks via Databricks Workflows UI
DE 5.2.1.L: Task
Orchestration Lab
Lab: Task Orchestration
DE 5.2.1.L - Task Orchestration