Transform your data estate
with Big Data, Advanced Analytics
anid@microsoft.com
The world is changing
DATA: Data will grow to 44 ZB in 2020
CLOUD: Today, 80% of organizations adopt cloud-first strategies
AI: AI investment increased by 300% in 2017
Organizations that harness data,
cloud, and AI outperform
Companies surveyed include well-known
enterprises across key industries
Financial services, Retail, Manufacturing, Consumer goods
Source: Keystone Strategy interviews Oct 2015 - Mar 2016
Organizations that harness data,
cloud, and AI outperform
Nearly double operating margin
$100M in additional operating income
They rely on a modern data estate
Rising citizen demands are driving
transformation in government services
25% of government organizations will roll out cognitive solutions by 2020
15% of government services will use bots to personalize workflows and/or channels through which citizens and businesses access services by 2018
10% of government digital transactions (such as tax collection, welfare disbursement and immigration control) will have embedded analytics by 2019
Modern government challenges
Changes in services, policy, and citizen requests require more responsive agencies
Large, expanding volume of data increasing complexity and storage costs
Unstructured data limits ability to analyze and take action
Data security, privacy and regulatory requirements are of paramount importance
Unique government opportunities
IT infrastructure optimization, Public Safety, Social network analysis, Logistics optimization, Web app optimization
Health analysis, Fraud detection, Analytics Data Consolidation, Weather forecasting, Healthcare outcomes
Scientific research, Equipment monitoring, Smart Sensor monitoring
constantly expanding
THE MODERN DATA ESTATE
HYBRID: On-premises, Private cloud, Cloud
Operational databases (on-premises and cloud)
Data warehouses (on-premises and cloud)
Data lakes (on-premises and cloud)
Reason over any data, anywhere
Flexibility of choice
Security and Performance
The Azure Big Data Landscape
Azure Data Factory, Azure Import/Export Service, Azure SQL DB, Azure Cosmos DB, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Azure CLI, Azure SDK
Azure Storage Blobs, Azure Data Lake Store, Azure Data Lake Analytics, Azure HDInsight, Azure Databricks, Azure ML, ML Server
Azure IoT Hub, Azure Event Hubs
Azure Search, Azure Bot Service, Azure Cognitive Services
Azure Stream Analytics, Azure HDInsight, Azure Databricks, Kafka on Azure HDInsight, Azure Data Catalog
Azure ExpressRoute, Azure Active Directory, Azure Network Security Groups, Azure Key Management Service, Operations Management Suite, Azure Functions, Visual Studio
SQL Server 2019
Azure Data Lake
Azure Databricks
Industry-leading performance and security, with intelligence over all your data
Intelligence over any data: AI and Machine Learning over all data with the power of SQL and Apache Spark
Choice of platform and language: T-SQL, PHP, Python, Java, Node.js, Ruby, C/C++, C#/VB.NET
Industry-leading performance: #1 OLTP performance [1]; #1 DW performance on 1TB [2], 10TB [3], and 30TB [4]; Intelligent Query Processing
Most secure over the last 8 years [5] (chart: Vulnerabilities, 2010-2017)
Insights in minutes and rich reports: The best of Power BI and SQL Server Reporting Services with Power BI Report Server
In-memory across all workloads
Most consistent data platform: Private cloud, Public cloud
1/10th the cost of Oracle
All TPC Claims as of 1/19/2018.
[1] http://www.tpc.org/4081
[2] http://www.tpc.org/3331
[3] http://www.tpc.org/3326
[4] http://www.tpc.org/3321
[5] National Institute of Standards and Technology Comprehensive Vulnerability Database
Data virtualization
Combine data from many sources without moving or replicating it
Scale out compute and caching to boost performance
(Diagram: T-SQL; SQL Server External Tables; Compute pools and data pools; Open database connectivity to NoSQL, Relational databases, and HDFS)
Managed SQL Server, Spark, and data lake
Store high volume data in a data lake and access it easily using either SQL or Spark
Management services, admin portal, and integrated security make it all easy to manage
(Diagram: Admin portal and management services; Integrated AD-based security; SQL Server and Spark; Scalable, shared storage (HDFS); External HDFS data sources)
Complete AI platform
Easily feed integrated data from many sources to your model training
Ingest and prep data and then train, store, and operationalize your models all in one system
(Diagram: Analytics; Apps; REST API containers for models; SQL Server ML Services; Spark & Spark ML)
(Architecture diagram)
Clients: Custom apps, BI, Analytics
SQL Server master instance
Compute pools: SQL Compute Nodes
Data mart: SQL Data Nodes
Storage pool: nodes that each combine SQL Server, Spark, and an HDFS Data Node, with Storage
Other labels: Directly read from HDFS; IoT data; Kubernetes pod; Node; Persistent storage
Performance
Intelligent Query Processing
Accelerating I/O performance with Persistent Memory
Gain performance insights anytime and anywhere with Lightweight Query Profiling
Security
Always Encrypted with secure enclaves
Data Classification and auditing built-in
Manage certificates more easily with SQL Server Configuration Manager
Availability
Always On availability group enhancements
Resumable online index creation
Online Clustered Columnstore index creation and rebuild
Availability groups on Kubernetes
SQL Server 2019
Azure Data Lake
Azure Databricks
(Chart: analytics levels plotted by VALUE against DIFFICULTY, lowest to highest)
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
Theory, Hypothesis, Observation, Confirmation
Observation, Pattern, Hypothesis, Theory
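As a rough illustration of the four analytics levels (descriptive, diagnostic, predictive, prescriptive), here is a minimal sketch on a toy dataset. It is not part of the original deck: the spend and sales figures are invented, the function names are hypothetical, and a simple correlation and straight-line fit stand in for what would be far richer methods in practice.

```python
from math import sqrt

# Hypothetical monthly figures, invented for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [12.0, 14.0, 16.0, 18.0]


def mean(values):
    return sum(values) / len(values)


def descriptive(y):
    """What happened? Summarize the observed values."""
    return {"total": sum(y), "average": mean(y)}


def diagnostic(x, y):
    """Why did it happen? Pearson correlation between a driver and the outcome."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fit_line(x, y):
    """Least-squares slope and intercept, shared by the next two levels."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    slope = cov / var_x
    return slope, my - slope * mx


def predictive(x, y, next_x):
    """What will happen? Extrapolate the fitted line to a new input."""
    slope, intercept = fit_line(x, y)
    return intercept + slope * next_x


def prescriptive(x, y, target_y):
    """How can we make it happen? Solve the fitted line for the required input."""
    slope, intercept = fit_line(x, y)
    return (target_y - intercept) / slope


print(descriptive(sales))
print(diagnostic(ad_spend, sales))
print(predictive(ad_spend, sales, 5.0))
print(prescriptive(ad_spend, sales, 30.0))
```

Each step reuses the previous one and needs more modeling assumptions, which mirrors the chart's point that value and difficulty rise together. Note that a correlation alone does not establish why something happened; it is only a first diagnostic signal.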
Understand Corporate Strategy
Gather Requirements: Business Requirements, Technical Requirements
Implement Data Warehouse:
Reporting & Analytics Design, Reporting & Analytics Development
Dimension Modelling, Physical Design
ETL Design, ETL Development
Setup Infrastructure, Install and Tune
Layers: Reporting & BI and analytics; Data warehouse; ETL; Data sources
Data Lake Uses A Bottom-Up Approach
Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
Sources: DEVICES; LOGS, FILES AND MEDIA (UNSTRUCTURED); BUSINESS / CUSTOM APPS (STRUCTURED)
Uses: Batch queries, Interactive queries, Real-time analytics, Machine Learning, Data warehouse
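The bottom-up idea (ingest everything, store it in native format without a schema, apply structure only at analysis time) can be sketched with nothing but the Python standard library. This is an illustrative toy rather than anything from the deck: a temporary directory stands in for the lake, and the file names, records, and the helpers `ingest` and `read_with_schema` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_dir, name, raw_text):
    """Store a record exactly as received; no schema is checked or enforced."""
    path = Path(lake_dir) / name
    path.write_text(raw_text, encoding="utf-8")
    return path


def read_with_schema(lake_dir, fields):
    """Apply a schema only at analysis time (schema-on-read).

    JSON objects are projected onto the requested fields; missing fields
    become None, and files that are not JSON objects are skipped.
    """
    rows = []
    for path in sorted(Path(lake_dir).iterdir()):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # e.g. a free-text log line; it still stays in the lake
        if isinstance(record, dict):
            rows.append({field: record.get(field) for field in fields})
    return rows


with tempfile.TemporaryDirectory() as lake:
    # Hypothetical inputs standing in for devices, business apps, and logs.
    ingest(lake, "a_device.json", '{"device": "d1", "temp": 21.5}')
    ingest(lake, "b_app.json", '{"order": 7, "amount": 99.0, "device": "d2"}')
    ingest(lake, "c_server.log", "2018-01-19 12:00:00 service started")

    rows = read_with_schema(lake, ["device", "temp"])
    print(rows)
```

A different analysis could read the same files with a different field list, which is the flexibility the bottom-up approach trades against the up-front modelling of a data warehouse.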
Blob Storage (WASB): Scale and Availability, Cost Effectiveness
Azure Data Lake Store (ADLS): Speed to Insight, Rich Security
Blob Storage (WASB) + Azure Data Lake Store (ADLS) = Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2: Scalable, secure storage that speeds time to insight (Scale and Availability, Speed to Insight, Cost Effectiveness, Rich Security)
Azure Data Lake Storage Gen2: Single Data Lake Store that combines the performance and
innovation of ADLS with the scale and rich feature set of Blob Storage
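In practice, the difference between Blob Storage (WASB) and Azure Data Lake Storage Gen2 shows up in the URI that Hadoop-compatible engines use to address the data: `wasb`/`wasbs` URIs point at the `blob.core.windows.net` endpoint, while the Gen2 `abfs`/`abfss` URIs point at `dfs.core.windows.net`. The helper below is a small sketch of that convention; the function name, account name, and container name are hypothetical.

```python
def storage_uri(scheme, account, container, path=""):
    """Build a Hadoop-style URI for Blob Storage (WASB) or ADLS Gen2 (ABFS)."""
    endpoints = {
        "wasb": "blob.core.windows.net",
        "wasbs": "blob.core.windows.net",   # TLS variant
        "abfs": "dfs.core.windows.net",
        "abfss": "dfs.core.windows.net",    # TLS variant
    }
    if scheme not in endpoints:
        raise ValueError(f"unsupported scheme: {scheme}")
    host = f"{account}.{endpoints[scheme]}"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


# Hypothetical account and container names, for illustration only.
print(storage_uri("wasbs", "examplelake", "raw", "logs/2018/"))
print(storage_uri("abfss", "examplelake", "raw", "logs/2018/"))
```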
SQL Server 2019
Azure Data Lake
Azure Databricks
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks
Designed in collaboration with the founders of Apache Spark
One-click setup; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
Best of Microsoft
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure Databricks
Inputs: IoT / streaming data, Cloud storage, Data warehouses, Hadoop storage
Collaborative Workspace: DATA ENGINEER, DATA SCIENTIST, BUSINESS ANALYST
Deploy Production Jobs & Workflows: MULTI-STAGE PIPELINES, JOB SCHEDULER, NOTIFICATION & LOGS
Optimized Databricks Runtime Engine: DATABRICKS I/O, APACHE SPARK, SERVERLESS, REST APIs
Outputs: Machine learning models, BI tools, Data exports, Data warehouses
Enhance Productivity
Build on secure & trusted cloud
Scale without limits
DATA MANAGEMENT SOLUTION FOR INTELLIGENCE IN THE CLOUD
Stages: Ingest, Store, Prep & Train, Model & Serve, Intelligence
Sources: Business apps, Custom apps, Sensors and devices
Data Factory (Data movement, pipelines & orchestration)
Ingest: Kafka, Event Hubs, IoT Hub
Store: Blobs, Data Lake
Prep & Train: HDInsight / Spark, Machine Learning
Model & Serve: Cosmos DB, SQL Database, SQL Data Warehouse, Analysis Services
Intelligence: Predictive apps, Operational reports, Analytical dashboards
DATA, INTELLIGENCE, ACTION
Empower today’s innovators to unleash the power of data
and reimagine possibilities that will improve our world