Data Analysis with Databricks SQL
Databricks Academy, 2023
Meet your classmates
• Where is everyone joining us from today (city, country)?
• How long have you been working with Databricks SQL?
• What data analysis tools have you worked with in the past?
• What are you hoping to get out of this class?
Getting Started with the Course
Course goals
1. Describe how Databricks SQL:
   ○ Works in the lakehouse architecture
   ○ Implements data security
2. Use Databricks SQL to:
   ○ Ingest data into the platform
   ○ Use SQL commands specific to Databricks
   ○ Query data
   ○ Create visualizations and dashboards
   ○ Share queries and dashboards with others
   ○ Use automation and integration capabilities
Agenda
Module Name                                  Duration
Databricks SQL Services and Capabilities     1 hour, 50 min
Data Management in Databricks SQL            1 hour, 40 min
Data Visualization and Dashboarding          3 hours, 30 min
● We will take 10-minute breaks about every hour
Activities used in this course
Lectures, demos, labs, and knowledge checks
Technical Environment Overview
The Databricks SQL workspace
• Everyone:
  • is in the same workspace
  • has their own catalog
  • is using the same SQL warehouse
• Only the instructor has administrator privileges in the workspace:
  • only a select few tasks in this course require admin privileges
  • these tasks appear in the slides to provide context
  • the labs do not require admin privileges
Databricks SQL Services and Capabilities
Databricks Academy 2023
Databricks SQL Services and Capabilities
Lesson Name                                                                   Duration
Lecture: Getting Started with Databricks SQL                                  15 min
Follow Along Demo: Setting Up a Catalog and Schema                            20 min
Follow Along Demo: Data Importing                                             20 min
Follow Along Demo: A Simple (but Quick) Query, Visualization, and Dashboard   15 min
Lecture: Unity Catalog in Databricks SQL                                      20 min
Lecture: Lakehouse Architecture                                               20 min
Lecture: Integrations                                                         10 min
Getting Started with Databricks SQL
Learning Objectives
By the end of this lesson, you should be able to:
1. Describe what Databricks SQL is.
2. Describe the benefits of Databricks SQL.
Databricks SQL
Delivering analytics on the freshest data with data warehouse performance and data lake economics

[Diagram: the Databricks platform — Databricks AI, Delta Live Tables, Workflows, and Databricks SQL serving Data Science & AI, ETL & Orchestration, Real-time Analytics, and Data Warehousing; built on the Data Intelligence Engine (use generative AI to understand the semantics of your data; securely get insights in natural language; data layout is automatically optimized based on usage patterns), Unity Catalog, and Delta Lake, over an open data lake holding all raw data (logs, texts, audio, video, images).]

■ Better price/performance than other cloud data warehouses
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance
Better price/performance
Run SQL queries on your lakehouse and analyze your freshest data with up to 6x better price/performance than traditional cloud data warehouses.
Source: Performance Benchmark with Barcelona Supercomputing Center
Better together | Broad integration with BI tools
Connect your preferred BI tools with optimized connectors that provide fast performance, low latency, and high user concurrency to your data lake.
A new home for data analysts
Enable data analysts to quickly perform ad-hoc and exploratory data analysis with a new SQL query editor, visualizations, and dashboards. Automatic alerts can be triggered for critical changes, allowing teams to respond to business needs faster.
Simple administration and governance
Quickly set up SQL/BI-optimized compute with SQL warehouses. Databricks automatically determines instance types and configuration for the best price/performance. Then, easily manage usage and perform quick auditing and troubleshooting with query history.
Use Cases
• Connect existing BI tools to one source of truth for all your data
• Collaboratively explore the latest and freshest data
• Build data-enhanced applications
Follow Along Demo:
Setting Up a Catalog and Schema
Follow Along Demo
Setting Up a Catalog and Schema
• Get your username for the course
• Create a catalog
• Create a schema
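A minimal SQL sketch of these steps, with hypothetical names (in class, substitute your course username):

-- Create and switch to a personal catalog (hypothetical name).
CREATE CATALOG IF NOT EXISTS user1234_catalog;
USE CATALOG user1234_catalog;

-- Create a schema to hold the course tables.
CREATE SCHEMA IF NOT EXISTS course_schema;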
Follow Along Demo:
Data Importing
Follow Along Demo
Data Importing
• Upload a .csv file
• Use the Catalog Explorer
• Create a table with data from object store
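The last bullet can also be expressed in SQL; a minimal sketch using COPY INTO, with a hypothetical table name and placeholder path. COPY INTO loads only files it has not already ingested, so re-running it is safe:

-- Empty placeholder Delta table; COPY INTO infers the schema on load.
CREATE TABLE IF NOT EXISTS customers;

COPY INTO customers
FROM 's3://<bucket_path>/<source_directory>'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');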
Follow Along Demo:
A Simple (but Quick) Query, Visualization, and Dashboard
Follow Along Demo
A Simple (but Quick) Query, Visualization, and Dashboard
• Query data
• Create a visualization
• Create a dashboard
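As a minimal sketch of the flow (table and column names hypothetical): write a small aggregate query in the SQL editor, configure a visualization — for example a bar chart — on its result set, then add that visualization to a dashboard.

-- A query whose result could back a bar-chart visualization.
SELECT state, COUNT(*) AS order_count
FROM customers
GROUP BY state
ORDER BY order_count DESC;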
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
Which of the following statements about Databricks SQL is true? Select one response.
A. With Databricks SQL, queries deliver up to 2x better price/performance than other cloud data warehouses.
B. Delta Live Tables can be created in Databricks SQL.
C. Databricks SQL automatically configures scaling when creating SQL warehouses.
D. Databricks SQL clusters are powered by Photon.
Knowledge check
Think about this question and volunteer an answer
Which of the following features is used by Databricks SQL to ensure your data is secure? Select one response.
A. Built-in data governance
B. Delta Sharing
C. Integration with 3rd-party tools
D. Automatically scalable cloud infrastructure
Knowledge check
Think about this question and volunteer an answer
Which of the following features of Databricks is used for running queries in Databricks SQL? Select one response.
A. Dashboards
B. Job scheduler
C. SQL Editor
D. SQL warehouses
Knowledge check
Think about this question and volunteer an answer
What is the primary purpose of Databricks SQL?
A. To provide better price/performance and simplify discovery for BI tools.
B. To manage administration and governance of data warehouses.
C. To support a broad set of BI tools, including Tableau and Power BI.
D. All of the above.
Unity Catalog in Databricks SQL
Learning Objectives
By the end of this lesson, you should be able to:
1. Describe the three-level namespacing system provided by Unity Catalog.
2. Describe the persistence and scope of catalogs, schemas (databases), tables, and views on Databricks.
3. Compare and contrast the behavior of managed and unmanaged tables.
Learning Objectives, cont.
By the end of this lesson, you should be able to:
4. Identify that the legacy hive_metastore appears as the default catalog for compatibility with Unity Catalog.
5. Describe how and where Unity Catalog stores the data behind its catalogs, schemas, and granular data objects.
6. Explain the impact of Unity Catalog on existing external storage locations.
Unity Catalog
Architecture
• Implements access control on data
• Access control is always enabled
• Works across multiple workspaces
• Grants permissions to users at the account level

[Diagram: an identity provider feeds user/group management in the Unity Catalog metastore, which serves compute resources across multiple workspaces.]
Unity Catalog
Comparison to workspace security model

Workspace security model                          | Unity Catalog security model
Object/privilege/principal access control model   | Object/privilege/principal access control model
Open by default                                   | Secure by default
Local to workspace                                | Works across multiple workspaces
Grants privileges to workspace-level principals   | Grants privileges to account-level principals
Catalog Explorer UI
Single pane of glass for all of your data
• UI-driven access control to simplify secure data permissioning
• Browse and understand data assets stored in your lakehouse
• Data lineage: end-to-end table & column lineage
Data Lineage
Mapping the flow of data in the lakehouse
• Auto-capture runtime data lineage across all languages
• Track lineage down to the table and column level
• Leverage the common permission model from Unity Catalog
Object Model
Metastore
• Stores data assets and their permissions
• Created with a default storage location (external object store)
• Administered by a metastore admin
Catalog
• First level of organization
• Users can see all catalogs where USAGE is granted
Schema
• aka Database
• Second level of organization
• Users can see all schemas where USAGE is granted on both the schema and the catalog
Managed Table
• Third level of organization
• Supported format: Delta
• Data is written to a new directory in the metastore's default location
• Created using a CREATE TABLE statement with no LOCATION clause
• Example:
  CREATE TABLE table1 …
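A fuller, hedged version of the example above (table and column names hypothetical). The defining trait of a managed table is the absence of a LOCATION clause; dropping it deletes data as well as metadata:

CREATE TABLE main.default.customers (
  customer_id INT,
  customer_name STRING,
  state STRING
);  -- no LOCATION clause: data lands in the metastore's default storage

DROP TABLE main.default.customers;  -- removes the metadata and the data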
External Table
• Third level of organization
• Data stored in a location outside the managed storage location
• DROP TABLE does not delete data
• Can easily clone a table to a new schema or table name without moving data
• Supported formats: Delta, CSV, JSON, Avro, Parquet, ORC, text
Creating External Tables
• Two credential types: Storage Credential or External Location
• Use the LOCATION clause
• Example using an External Location only:
  CREATE TABLE table2
  LOCATION 's3://<bucket_path>/<table_directory>'
  ...
• Example using a Storage Credential:
  CREATE TABLE table2
  LOCATION 's3://<bucket_path>/<table_directory>'
  ...
  WITH CREDENTIAL <credential-name>;
View
• Third level of organization
• Can be composed from tables and views in multiple schemas or catalogs
• Created using CREATE VIEW:
  CREATE VIEW view1 AS
  SELECT column1, column2
  FROM table1 ...
Three-Level Namespace Notation
• Data objects are specified with up to three elements, depending on the granularity required: catalog, schema, and table
• Example:
  CREATE TABLE main.default.department
  (
    deptcode INT,
    deptname STRING,
    location STRING
  );
• Or, with a USE statement:
  USE main.default;
  SELECT * FROM department;
Lakehouse Architecture
Learning Objectives
By the end of this lesson, you should be able to:
1. Describe the benefits of using Databricks SQL for in-platform data processing.
2. Describe the medallion architecture as a sequential data organization and pipeline system of progressively cleaner data.
3. Identify that data in the bronze and silver layers requires additional processing and cleaning.
4. Describe the data in the gold layer of the medallion architecture.
Learning Objectives, cont.
By the end of this lesson, you should be able to:
5. Describe last-mile ETL workflows fully within the gold layer for specific use cases.
6. Identify the gold layer as the most common layer for data analysts using Databricks SQL.
7. Describe the benefits of working with streaming data.
The Lakehouse Architecture
● Full ACID transactions: focus on your data flow instead of worrying about failures.
● Open standards, open source: store petabytes of data without worries of lock-in. Growing community including Presto, Spark, and more.
● Unifies streaming and batch: convert existing jobs with minimal modifications.*

[Diagram: data stream and batch sources (CSV, JSON, TXT, …) land in the data lake, which powers streaming analytics and AI & reporting.]

*Streaming is not available in SQL warehouses.
The Delta Lake Architecture
Data Quality Levels

Bronze: Raw Ingestion  →  Silver: Filtered, Cleaned, Augmented  →  Gold: Business-level Aggregates

[Diagram, shown progressively across four slides: streaming and batch sources (CSV, JSON, TXT, …) flow through the bronze, silver, and gold layers — data quality increasing at each hop — out to streaming analytics and AI & reporting. DML operations such as OVERWRITE, INSERT, UPDATE, DELETE, and MERGE move and refine data between layers.]
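As a concrete illustration of the hops above — a minimal sketch with hypothetical table and column names — each layer is often just a CREATE TABLE ... AS SELECT over the previous one:

-- Silver: filter and clean the raw bronze data.
CREATE OR REPLACE TABLE sales_silver AS
SELECT CAST(order_id AS INT)    AS order_id,
       CAST(order_date AS DATE) AS order_date,
       CAST(amount AS DOUBLE)   AS amount
FROM sales_bronze
WHERE order_id IS NOT NULL;

-- Gold: business-level aggregates, ready for BI.
CREATE OR REPLACE TABLE sales_gold AS
SELECT order_date, SUM(amount) AS daily_revenue
FROM sales_silver
GROUP BY order_date;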
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
Which of the following statements about the lakehouse medallion architecture is true? Select one response.
A. The data in a single upstream table could be used to generate multiple downstream tables.
B. The silver layer is for reporting and uses more de-normalized and read-optimized data models with fewer joins.
C. The gold layer provides a broad view of all key business entities, concepts, and transactions.
D. Only minimal or "just-enough" transformations and data cleansing rules are applied to each layer in the medallion architecture.
Knowledge check
Think about this question and volunteer an answer
Which of the following describes the data quality of the gold layer of data in the lakehouse medallion architecture? Select one response.
A. The gold layer brings the data from different sources into an enterprise view.
B. The gold layer is comprised of clean, aggregated data, ready to use in production for a specific use case.
C. The table structures in the gold layer correspond to the source system table structures "as-is".
D. The focus of the gold layer is quick Change Data Capture and the ability to provide a historical archive if needed without rereading the data from the source system.
Knowledge check
Think about this question and volunteer an answer
What is the primary purpose of the bronze layer in the "bronze-silver-gold medallion" paradigm in Delta Lake?
A. To store data in a format suitable for individual business projects or reports.
B. To perform data cleansing, joining, and enrichment on raw data.
C. To provide a "single source of truth" for the enterprise across various projects.
D. To ingest raw data quickly, keeping it in its original format for both current and future projects.
Knowledge check
Think about this question and volunteer an answer
Which of the following statements describes the relationship between the silver and gold layers of data? Select one response.
A. The gold layer has less clean data than the silver layer.
B. Project-specific business rules are applied from the silver to the gold layer.
C. Self-service analytics are enabled for the gold layer for ad-hoc reporting in the silver layer.
D. The gold layer is where we land all the data from external source systems, which are represented by the silver layer.
Integrations
Learning Objectives
By the end of this lesson, you should be able to:
1. Identify Databricks SQL as a complementary tool for BI partner tool workflows.
2. Identify Databricks SQL as a quick opportunity to create queries, visualizations, and dashboards from within the lakehouse.
3. Identify Partner Connect as a tool for implementing simple integrations with a number of other data products.
Databricks Partner Connect
Databricks Partner Connect is a dedicated ecosystem of integrations that allows users to easily connect with popular data ingestion, transformation, and BI partner products.

This helps data analysts get useful data into their lakehouse faster, without the need to manually configure each product, so they can get data-driven insights sooner.
Databricks Partner Connect
Helps data analysts who:
● Struggle to connect to their choice of BI tools
● Struggle to bring data from SaaS apps (Google Analytics, Facebook, etc.) to run SQL queries
● Have to wait on eng/ops to generate data for analysis
Partner Connect Makes it Easy

[Diagram: typical analyst questions — "How do I get the data from SFDC into Delta Lake?", "What tools can I use to ingest data into Delta?", "I heard Fivetran is great! How do I connect it to Databricks?"]

DATABRICKS PARTNER CONNECT
■ Many partner integrations take as few as 6 clicks
■ No context or page switches required
■ Automatically launches a cluster and calls the partner API to pass on a PAT token and the cluster configuration details
■ Sets up all the necessary configs for an optimized user experience
■ Creates a trial account in the partner product if an account doesn't exist
Databricks Partner Connect
Partner categories: Data Ingestion, BI and Visualization, Data Prep and Transformation, Machine Learning, Data Governance, Data Quality, Reverse ETL, and Semantic Layer
Built on an open foundation
Easily integrate with the entire data and AI ecosystem
Follow Along Demo:
Integrations
Follow Along Demo
Integrations
• Connecting to outside data
• Connecting to BI tools
• Partner Connect
Data Management in Databricks SQL
Databricks Academy 2023
Data Management in Databricks SQL
Lesson Name Duration
Lecture: Databricks SQL Warehouses 15 min
Follow Along Demo: Delta Lake in Databricks SQL 20 min
Lecture: Data Security 20 min
Databricks SQL Warehouses
Learning Objectives
By the end of this lesson, you should be able to:
1. Describe the purpose of Databricks SQL warehouses.
2. Compare and contrast Classic, Pro, and Serverless Databricks SQL warehouses.
3. Identify Serverless Databricks SQL warehouses as a quick-starting option.
Learning Objectives, cont.
By the end of this lesson, you should be able to:
4. Describe basic Databricks SQL warehouse sizing and scaling guidelines in response to slow-running single-user queries and multi-user environments.
5. Describe the impact of Databricks SQL warehouse permissions on query history availability.
Successful data management platforms rely on efficient infrastructure

[Diagram: ML engineers, data scientists, data engineers, and data analysts — all the users — share one infrastructure layer spanning the data lake and the data warehouse — all the data.]
Problems with Managing Infrastructure
• Users — Lost Productivity: waiting for results while clusters start up / scale up.
• Admins — Admin Effort: manually configure versions, cluster sizes, and instance types.
• Finance — Cost: need to reduce costs.
Databricks SQL Serverless
• Managed Servers: an always-running server fleet, patched and upgraded automatically
• Optimized Capacity: idle clusters removed 10 minutes after the last query (configurable)
• Instant Compute: under-10-second allocation to users — no waiting for clusters to start up or scale up
• Secure: 3-layer isolation with data encryption
Industry leading security architecture supporting production workloads
1. Container Isolation
   ● Hardened container images per industry best practice
   ● Privileged access is disabled in the container
2. VM Isolation
   ● Workloads separated by VM boundaries
   ● Reusing VMs among customers is blocked
3. Network Isolation
   ● All node egress is blocked except to nodes in the same cluster
   ● Federated access through temporary security tokens
   ● Ingress traffic from other customers is blocked
Databricks SQL Serverless Benefits
• Higher user productivity
  ● User queries start instantly, with no waiting for cluster start-up
  ● Add more concurrent users with instant cluster scaling
• Zero Management
  ● No configuration, performance tuning, or capacity management
  ● Automatic upgrades and patching
• Lower Cost
  ● Pay for what you consume; eliminate idle cluster time
  ● No over-provisioning of resources
  ● Idle capacity removed 10 minutes after the last query
Serverless Compute Architecture

[Diagram: workspaces in the Databricks account connect to the Databricks control plane; serverless compute runs in a VPC/VNet inside the Databricks account, while customer storage stays in the customer account.]

Benefits:
● Production-ready environment
● Robust security foundation — data isolation and encryption
Warehouse Configuration

[Screenshots: the SQL warehouse configuration UI on AWS, Azure, and GCP.]
Warehouse Configuration
In this course, SQL warehouses have the following settings:
• Cluster size: 2X-Small
• Scaling: Min 1, Max 1
• Auto-stop: after ten minutes
Delta Lake
Follow Along Demo:
Delta Lake in Databricks SQL
Follow Along Demo
Delta Lake in Databricks SQL
• Create a schema
• Create views that withhold data from unauthorized groups
• Optimize Delta tables
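A minimal sketch of the last two bullets, with hypothetical table and group names; is_account_group_member() and OPTIMIZE ... ZORDER BY are standard Databricks SQL:

-- Withhold a sensitive column from everyone outside the 'finance' group.
CREATE OR REPLACE VIEW sales_redacted AS
SELECT order_id,
       order_date,
       CASE WHEN is_account_group_member('finance') THEN payment_details
            ELSE 'REDACTED' END AS payment_details
FROM sales;

-- Compact small files and co-locate rows on a common filter column.
OPTIMIZE sales ZORDER BY (order_date);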
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
Which of the following statements describes the purpose of Databricks SQL warehouses? Select one response.
A. SQL warehouses enable data analysts to find and share dashboards.
B. SQL warehouses are a declarative framework for building data processing pipelines.
C. SQL warehouses provide data discovery capabilities across Databricks workspaces.
D. SQL warehouses allow users to run SQL commands on data objects within Databricks SQL.
Knowledge check
Think about this question and volunteer an answer
What are the benefits of Delta Lake within the Lakehouse Architecture?
A. Real-time data processing with low latency
B. Exclusive support for batch processing
C. ACID transactions, metadata scalability, and storage improvement
D. Data isolation for multiple software development environments
Knowledge check
Think about this question and volunteer an answer
Which of the following statements about SQL warehouse sizing and scaling is true? Select two responses.
A. Increasing maximum scaling allows for multiple users to use the same warehouse at the same time.
B. Scaling is set to a minimum of 1 and a maximum of 1 by default.
C. The higher the cluster size, the higher the latency in your queries.
D. The auto-stop feature will restart the warehouse if it remains idle during the auto-stop period.
Knowledge check
Think about this question and volunteer an answer
Which feature of the platform provides users with the ability to quickly connect to third-party tools with simple-to-implement integrations? Select one response.
A. SQL Editor
B. Partner Connect
C. Workflows
D. Features
Lab:
Lakehouse SQL
Lab
Lakehouse SQL
• Time Travel
• COPY INTO
• MERGE INTO
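The lab topics above map to standard Delta Lake SQL; a minimal sketch with hypothetical names (COPY INTO is sketched in the Data Importing demo earlier):

-- Time Travel: query an earlier version of a Delta table.
SELECT * FROM sales VERSION AS OF 3;
SELECT * FROM sales TIMESTAMP AS OF '2023-06-01';

-- MERGE INTO: upsert staged changes into the target table.
MERGE INTO sales AS t
USING sales_updates AS u
ON t.order_id = u.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;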
Data Security
Learning Objectives
By the end of this lesson, you should be able to:
1. Describe the different levels of data object access available with Unity Catalog.
2. Identify that catalogs, schemas, and tables can all have unique owners.
3. Describe how to organize owned data objects for the purposes of security.
4. Identify that the creator of a data object becomes the owner of that data object.
Learning Objectives, cont.
By the end of this lesson, you should be able to:
5. Identify the responsibilities of data object ownership.
6. Update data object permissions to address user access needs in a variety of common scenarios.
7. Identify PII data objects as needing additional, organization-specific considerations.
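Objective 6 comes down to ANSI SQL grants in Unity Catalog; a minimal sketch with hypothetical object and group names:

-- The USAGE privilege named earlier is spelled USE CATALOG / USE SCHEMA
-- in current Databricks SQL; it makes objects visible to the grantee.
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.default TO `analysts`;
GRANT SELECT ON TABLE main.default.department TO `analysts`;

SHOW GRANTS ON TABLE main.default.department;  -- review access
REVOKE SELECT ON TABLE main.default.department FROM `analysts`;

-- Ownership: the creator is the initial owner and can transfer it.
ALTER TABLE main.default.department OWNER TO `data_stewards`;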
The Life of a Query (Without Unity Catalog)
Per workspace:
1. SELECT * FROM Sales2020; is submitted to a cluster or SQL warehouse.
2. The cluster checks grants against the Table ACL.
3. The table location is looked up in the Hive Metastore.
4. The metastore returns the path to the table, s3://sales/sales2020.
5. Cloud storage is accessed via an instance profile / service principal / service account.
6. The cluster filters unauthorized data before returning results.
Unity Catalog Overview

[Diagram: a cluster or SQL warehouse accesses managed data sources through Unity Catalog (cross-workspace), which enforces SQL access controls and writes an audit log; external tables are reached with defined credentials, and other existing data sources with user identity passthrough.]
Databricks Unity Catalog

[Diagram: users access tables, views, and models through Unity Catalog, which maps them to data files on S3/ADLS/GCS, SQL databases, ML models, and Delta Shares, and writes an audit log.]

• Fine-grained permissions on tables, fields, and views — not files
• Industry-standard interface: ANSI SQL grants
• Uniform permission model for all data assets
• Centrally audited
Attribute-Based Access Control
True policy-driven ABAC that flows from your metadata

[Diagram: a pii policy — Finance: visible, Data Eng: redacted, Analyst: masked — applied to a Sales 2020 table whose columns (userID, purchase, purchaseAmt, paymentDetails) inherit the policy wherever they are tagged pii.]

A record of all sales transactions is ingested from several data sources, and only the finance team should be able to see payment information. The PII policy flows down from the column being tagged as PII.
Attribute-Based Access Control

CREATE ATTRIBUTE pii
ALTER TABLE iot_events ADD ATTRIBUTE pii ON email
ALTER TABLE users ADD ATTRIBUTE pii ON phone
...

Set permission on all columns tagged pii together:

GRANT SELECT ON SCHEMA iot_data
HAVING ATTRIBUTE NOT IN (pii)
TO product_managers
Unity Catalog: External Table with Defined Credentials

CREATE CREDENTIAL iot_role TYPE AWS_ROLE ...
CREATE TABLE iot_data LOCATION s3:/...
WITH CREDENTIAL iot_role

[Diagram: Unity Catalog stores named credentials and uses them to access external tables.]
Unity Catalog: External Files with Passthrough

SELECT * FROM csv.`adls:/.../myfolder`
CREATE VIEW v AS SELECT * FROM csv.`adls:/.../myfolder`

If a direct file path is specified, passthrough is performed with the user's cloud credentials.

[Diagram: the user's ADLS credentials are passed through to access external files.]
Data Visualization and Dashboarding
Databricks Academy 2023
Data Visualization and Dashboarding
Lesson Name Duration
Follow Along Demo: Data Visualizations 40 min
Lab: Data Visualizations 30 min
Follow Along Demo: Dashboarding Basics 40 min
Lab: Dashboards 30 min
Follow Along Demo: Databricks SQL in Production 40 min
Lab: Databricks SQL in Production 30 min
Follow Along Demo:
Data Visualizations
Follow Along Demo
Data Visualizations
• Create common visualizations
• Create map visualizations
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
What types of customized visualizations can be created using Databricks SQL?
A. Pie charts, bar graphs, and line charts
B. Counters and funnels
C. Bar charts, combo charts, and geographical maps
D. All of the above
Knowledge check
Think about this question and volunteer an answer
How can you enable aggregation in a Databricks SQL visualization?
A. Modify the underlying SQL query to add an aggregation column.
B. Select the aggregation type directly in the visualization editor.
C. Use the Aggregation drop-down menu in the Visualization Type options.
D. Aggregation is not supported in Databricks SQL visualizations.
Knowledge check
Think about this question and volunteer an answer
How can you add a query visualization to a Databricks dashboard?
A. Select a query and choose the visualization type.
B. Drag and drop a visualization from the sidebar.
C. Copy and paste the visualization code into the dashboard.
D. Query visualizations cannot be added to a Databricks dashboard.
Knowledge check
Think about this question and volunteer an answer
What is the benefit of setting a refresh schedule for a Databricks dashboard?
A. To change the color palette of visualizations.
B. To organize and label workspace objects.
C. To keep the data underlying visualizations up-to-date.
D. To create query parameters for customization.
Lab:
Data Visualizations
Lab
Data Visualizations
• Create a boxplot
• Create a funnel
Follow Along Demo:
Dashboarding Basics
Follow Along Demo
Dashboarding Basics
• Create a dashboard
• Parameterize queries
• Organize Databricks SQL assets
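Query parameters in the Databricks SQL editor use double-curly-brace tokens; a minimal sketch with hypothetical table and parameter names — the editor renders an input widget for each {{ }} token, and dashboards can map it to a dashboard-level parameter:

SELECT order_date, SUM(amount) AS revenue
FROM sales
WHERE state = {{ state }}   -- becomes an editable parameter widget
GROUP BY order_date;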
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
Which of the following can be added to a query so that the code can be rerun with different variable inputs? Select one response.
A. User-defined functions
B. Parameters
C. Vectors
D. SQL warehouses
Knowledge check
Think about this question and volunteer an answer
A data analyst needs to create a visualization out of the following query:

SELECT order_date
FROM sales
WHERE order_date >= to_date('2020-01-01')
AND order_date <= to_date('2021-01-01');

Which of the following visualization types is best suited to depict the results of this query? Select one response.
A. Funnel
B. Stacked bar chart
C. Bar chart
D. Boxplot
Knowledge check
Think about this question and volunteer an answer
A team of stakeholders needs to be notified of changes in a dashboard's statistics on a daily basis. Which of the following actions can be taken to ensure they always have the newest information? Select one response.
A. A refresh schedule can be configured and stakeholders can be subscribed to the dashboard's output.
B. A trigger alert can be created for the dashboard and stakeholders can be added to the alert notification list.
C. A webhook can be created and shared with stakeholders.
D. None of the above
Knowledge check
Think about this question and volunteer an answer
Which of the following data visualizations displays a single number by default? Select one response.
A. Bar chart
B. Counter
C. Map - markers
D. Funnel
Lab:
Dashboards
Lab
Dashboards
• Create your own dashboard
Follow Along Demo:
Databricks SQL in Production
Follow Along Demo
Databricks SQL in Production
• Automation in Databricks SQL
• Sharing Databricks SQL assets
• Creating gold-level tables
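"Creating gold-level tables" is typically a saved query, run on a schedule, that materializes an aggregate; a minimal sketch with hypothetical names:

-- Gold-level table: clean, aggregated, ready for dashboards.
-- A refresh schedule can re-run this to keep it current.
CREATE OR REPLACE TABLE main.default.top_customers_gold AS
SELECT customer_id,
       COUNT(*)    AS orders,
       SUM(amount) AS total_spend
FROM main.default.sales_silver
GROUP BY customer_id;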
Knowledge Check
Knowledge check
Think about this question and volunteer an answer
Which of the following automations are available in Databricks SQL? Select one response.
A. Query refresh schedules
B. Dashboard refresh schedules
C. Alerts
D. All of the above
Knowledge check
Think about this question and volunteer an answer
What is the purpose of Alerts in Databricks SQL?
A. To automatically execute SQL queries.
B. To organize queries within a folder structure.
C. To trigger notifications based on specific conditions in scheduled queries.
D. To share dashboards with other team members.
Knowledge check
Think about this question and volunteer an answer
What is the purpose of configuring a refresh schedule for a query in Databricks SQL?
A. To automatically pull new data into a table.
B. To create a new table based on specified criteria.
C. To manually execute queries on-demand.
D. To edit existing data in the database.
Knowledge check
Think about this question and volunteer an answer
What level of permissions is the owner of a query granted on their query? Select one response.
A. Can View
B. Can Run
C. Can Edit
D. Can Manage
Lab:
Databricks SQL in Production
Lab
Databricks SQL in Production
• Create automations
• Share a dashboard
Summary and Next Steps
Earn a Databricks certification!
Certification helps you gain industry recognition, competitive differentiation, greater productivity, and results.
• This course helps you prepare for the Databricks Certified Data Analyst Associate exam
• Recommended self-paced courses:
  • Ingesting Data for Databricks SQL
  • Integrating BI Tools with Databricks SQL
• Please see the Databricks Academy for additional prep materials
For more information visit: databricks.com/learn/certification
Thank you!