White Paper
Why Semantics Matter in the Modern Data Stack
The term “Modern Data Stack” describes architectures that address the challenge of
connecting cloud-managed data to the business users who derive value from it. This article
explores the role of the semantic layer in the modern data stack.
When applied correctly, a semantic layer forms a new center of knowledge gravity that
maintains the business context and semantic meaning necessary for users to work with and
create value from enterprise data assets. Further, it becomes a hub for leveraging active and
passive metadata to optimize analytics experiences, improve productivity, and manage cloud
costs.
What is the Semantic Layer?
A semantic layer is “a business representation of data” that offers “a unified and
consolidated view of data across an organization.”
The term was originally coined in the age of on-premise data stores — a time when business
analytics infrastructure was costly and highly limited in functionality. While the semantic layer’s
origins lie in the days of OLAP, the concept is even more relevant today.
By formalizing a semantic layer within a modern, cloud-oriented data stack, organizations can
provide business users with more meaningful analytics experiences. This kicks off a virtuous
cycle of data democratization, domain-oriented analytics innovation, and data-driven value
creation.
I like the discussion from the team at Hevo that tracks the evolution of the data stack from the
pre-cloud era dominated by on-premise infrastructure and OLAP-style data management; to
the age of proto-cloud architectures with the launch of Amazon Redshift; and finally into the
modern era dominated by cloud data platforms from the likes of Snowflake, Databricks, and
Google BigQuery.
Shift in Data Gravity
In my opinion, Matt Bornstein, Jennifer Li, and Martin Casado from Andreessen Horowitz offer
the cleanest view of modern data stacks in “Emerging Architectures for Modern Data
Infrastructure.” This representation carries the bias of the A16Z investment thesis, but it is a
good model to work from. I will refer to this simplified diagram (with example companies
removed) based on their work below:
[Diagram: a simplified modern data stack based on the A16Z model. Data flows from sources (OLTP databases, operational apps) through data replication via CDC into data warehouse, data lake, and Spark platforms, then through data modeling out to dashboards and augmented analytics.]
This representation tracks the flow of data from left to right. Raw data from various sources
moves through ingestion and transport services into core data platforms that manage storage,
query and processing, and transformation prior to being consumed by users in a variety of
analysis and output modalities.
We see the differentiation between unstructured storage (data lakes), structured storage (data
warehouses), and real-time data stores. In addition to storage, the data platforms offer SQL
query engines and access to AI/ML utilities. A set of shared services cuts across the entire
data processing flow at the bottom of the diagram.
I’ll try to refrain from incorporating my own bias in analyzing this representation and will
instead focus on placing the semantic layer within this view of the modern data stack.
Historically, semantic layers were implemented within analysis tools (i.e. BI platforms) or within
the data warehouse. With the rise of the modern data stack and the importance of data
engineering, we are now seeing semantic layers form within ELT pipelines. All three of these
approaches have limitations that are exacerbated by modern cloud-scale data.
Where is the Semantic Layer?
Data Pipelines: Hard-coded into ELT transformations, which can be difficult to govern and keep consistent across disparate use cases.
Analytics Tools: Results in siloed semantic layers with inconsistencies across different use cases or work groups using different analytics consumption tools.
This is a challenge as most modern organizations will use more than one analytics tool.
Dashboards are best delivered in a BI tool like Power BI or Tableau. Financial analysis is best
done in a spreadsheet like Excel. Business process support is best done with analytics
embedded in applications. Data science is best done from Jupyter notebooks.
Semantic sprawl happens when different teams manage specialized semantic layers in each
tool. Just as human languages diverge when speakers are geographically isolated, the
definitions and meanings of key business data concepts diverge when teams are isolated.
Challenges with Data Warehouse-Based
Semantic Layers
Data warehouses are designed for architectural integrity with normalized tables that are
difficult for business users to analyze directly. Data needs to be made “business-ready” before
it can be directly analyzed by a business user. Data marts are an attempt to create centralized,
business-oriented views of data.
Centrally controlled definitions that are “hard coded” into table structures become static. It’s
difficult for centralized architecture teams to keep up with domain-specific needs of different
workgroups. Furthermore, user queries against massive cloud-scale tables become slow even
for the most powerful cloud query engines.
This almost always results in users extracting data into analytics platforms for easier
manipulation and faster query performance. That in turn leads to the semantic sprawl of
localized semantic layer formation, as described above.
ELT pipelines, meanwhile, include transforms that create “analytics-ready” forms of data. If there is
no formal semantic layer strategy or proper governance, data engineers will encode semantic
meaning within their pipelines in order to support their analytics customers. This can result in
semantic sprawl and create extreme inefficiency, as data engineers recreate common
business concepts (e.g. month to fiscal quarter mapping) every time they design a new
pipeline.
The Universal Semantic Layer
I use the term “universal semantic layer” to describe a thin, logical layer sitting between the
data platform and analysis and output services. It abstracts the complexity of raw data assets
so that users can work with business-oriented metrics and analysis frameworks within their
preferred analytics tools.
The challenge here is how to assemble the minimum viable set of capabilities that gives data
teams sufficient control and governance while delivering end-users more benefit than they
could get by extracting data into localized tools.
Implementing a Universal Semantic Layer using Transformation Services
The modern semantic layer needs to be implemented by leveraging the services positioned
within the Transformation category of the A16Z data stack — within the Metrics Layer, Data
Modeling, Workflow Management, and Entitlements & Security services. When implemented,
coordinated, and orchestrated properly, these services form a universal semantic layer that
delivers important benefits, including:
- Creating a single source of truth for enterprise metrics and hierarchical dimensions, accessible from any analytics tool
- Providing the agility to easily update or define new metrics, design domain-specific views of data, and incorporate new raw data assets
- Optimizing analytics performance while monitoring and controlling cloud resource consumption
- Enforcing governance policies related to access control, definitions, performance, and resource consumption
[Diagram: the universal semantic layer (metrics layer, data modeling, workflow manager, and entitlements & security services) sits between cloud data platforms and analytics consumers.]
The key to success is providing all of these benefits so individual users and work groups can
be free to innovate and deliver value while using a centrally-governed semantic layer. If
attempts to centrally manage a semantic layer inhibit business users, semantic sprawl will
happen with absolute certainty.
Let’s step through each transformation service with an eye toward how they must interact to
form an effective semantic layer.
Data Modeling
Data modeling is the creation of business-oriented, logical data concepts that are directly
mapped to the physical data structures in the warehouse or lakehouse. Data modeling
services can be based on no-code or low-code visual frameworks oriented toward business
users, or on code-based markup languages oriented toward developers or analytics engineers.
Regardless of their preferred modeling paradigm, data modelers focus on three key activities:
1. Making Data Analytics-Ready: Preparing data for analytics use cases requires de-
normalizing and blending of raw data assets to create views of data appropriate for
analytics interaction. Whether the modeling will result in a physical materialization of a new
data view (i.e. through an ELT process) or a virtualized view (i.e. through a SQL-based
query) may influence the best modeling approach.
2. Dimension Definition: Defining the hierarchical dimensions (e.g. time, geography, product)
that give metrics their analytical context and enable consistent drill-down and roll-up
across business views.
3. Metrics Design: The most tangible output of data modeling is the set of metrics that are
published to metrics layers for discovery and use by data consumers. Metrics may be
simple quantitative measures like revenue or cost. They may be calculations like gross
margin (e.g. (revenue - cost) / revenue). They may be ordinal (e.g. lowest, highest, median).
They may be time-relative calculations (e.g. period-to-period change).
Metrics design is typically the most frequently iterated activity within data modeling and
benefits from an agile, end-user oriented option for designing and maintaining metrics
definitions. That said, it is also critical to ensure governance and prevent “metrics sprawl,”
where multiple definitions of an important metric like revenue cause confusion and erode
trust in analytics. Metrics design may sometimes be thought of as a metrics layer activity
vs. a data modeling activity.
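To make this concrete, here is a minimal sketch of how metrics like these might be declared against physical warehouse columns. All names here (SemanticMetric, fact_sales, dim_date) are hypothetical illustrations, not any particular product's modeling language:

```python
# A minimal sketch of declaring semantic model metrics in code. Names are
# hypothetical; real semantic layers use their own modeling languages.
from dataclasses import dataclass

@dataclass
class SemanticMetric:
    name: str         # business-facing name published to the metrics layer
    expression: str   # logical expression over physical columns
    description: str  # passive metadata that supports discoverability

# Simple quantitative measures map directly to physical columns.
revenue = SemanticMetric("revenue", "SUM(fact_sales.revenue)", "Total booked revenue")
cost = SemanticMetric("cost", "SUM(fact_sales.cost)", "Total cost of goods sold")

# Calculated metrics are defined once, centrally, to avoid metrics sprawl.
gross_margin = SemanticMetric(
    "gross_margin",
    "(SUM(fact_sales.revenue) - SUM(fact_sales.cost)) / SUM(fact_sales.revenue)",
    "Gross margin as a fraction of revenue",
)

# Time-relative metrics reference a dimension rather than a raw column.
revenue_pop = SemanticMetric(
    "revenue_period_over_period",
    "revenue - LAG(revenue) OVER (ORDER BY dim_date.fiscal_quarter)",
    "Change in revenue versus the prior fiscal quarter",
)

for m in (revenue, cost, gross_margin, revenue_pop):
    print(f"{m.name}: {m.expression}")
```

Defining the gross margin calculation once in the model, rather than in every dashboard, is what prevents the metrics sprawl described above.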
Metrics design and dimension definition are where business semantics are embedded into the
naming and descriptions of a data model. But beyond naming, data modeling services in the
modern data stack must actually implement the model elements for use within a metrics layer
— not just define and communicate relationships in an entity relationship diagram.
Within this discussion of data modeling services, it is worth considering the advantages of
taking a composable analytics approach. Ideally, elements of data models can be created and
managed in a way that enables shareability and reuse. For instance, a single, curated product
dimension could be shared across different data models supporting different workgroups. This
approach simplifies new model creation and change management.
Incorporating composability into a semantic layer strategy can be an enabler for supporting
data mesh or hub and spoke analytics management. My AtScale colleague, Elif Tutuk, wrote an
excellent blog series on how a semantic layer can support data mesh. In these analytics
management paradigms, key elements of data models and definitions are centrally managed in
a way that allows decentralized creation of data products by domain-specific teams. This can
be a powerful approach for fostering data product innovation while ensuring consistency and
governance.
I refer to the output of semantic layer data modeling as a semantic model. In this context, a
semantic model is a logical representation of enterprise data with business context embedded
in data views exposed to data consumers. The term semantic model is also sometimes used to
describe knowledge graph representations of enterprise data that draw from research related
to the Semantic Web. While sometimes confusing, these two definitions of semantic model are
related, but distinct.
It could be argued that metrics design and change management are metrics layer services
instead of (or in addition to) data modeling services, as I have treated them above. But since I
am positioning all of these services within a superset of semantic layer services, the distinction
doesn't really matter.
Metrics Layer
Metrics stores are essentially identical to the feature stores, like Feast, used by data science
teams. In simplistic implementations, they are pre-calculated, shared repositories for key
business metrics (e.g. revenue or ship quantity). In richer implementations, metrics are
dynamically calculated and served on demand.
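As an illustration of the two implementation styles just described, here is a minimal sketch of a metrics store that serves pre-calculated values where they exist and falls back to on-demand calculation. The MetricsStore class and its methods are hypothetical:

```python
# A minimal sketch contrasting pre-calculated lookup with on-demand
# calculation in a metrics store. All names are hypothetical.
from typing import Callable, Dict, Tuple

class MetricsStore:
    def __init__(self):
        self._precalculated: Dict[Tuple[str, str], float] = {}
        self._calculators: Dict[str, Callable[[str], float]] = {}

    def publish(self, metric: str, period: str, value: float) -> None:
        """Simplistic implementation: store a pre-calculated value."""
        self._precalculated[(metric, period)] = value

    def register(self, metric: str, calculator: Callable[[str], float]) -> None:
        """Richer implementation: register a function that computes on demand."""
        self._calculators[metric] = calculator

    def get(self, metric: str, period: str) -> float:
        if (metric, period) in self._precalculated:
            return self._precalculated[(metric, period)]
        return self._calculators[metric](period)  # dynamic, served on demand

store = MetricsStore()
store.publish("revenue", "2023-Q1", 1_250_000.0)
store.register("ship_quantity", lambda period: 42_000.0)  # stand-in for a live query
print(store.get("revenue", "2023-Q1"))
print(store.get("ship_quantity", "2023-Q2"))
```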
The term “Headless BI” is sometimes used to describe a metrics layer service that supports
user queries from a variety of BI tools. This is a fundamental capability: as mentioned
earlier, if users are unable to interact with a semantic layer using their analytics tools, they will
end up extracting data directly using SQL and recreating a localized semantic layer.
Key metrics layer activities include:
1. Curation: Metrics stewards will move between data modeling and the metrics layer to
curate the set of metrics provided to data product creators and business users.
2. Change Management: The metrics layer serves as an abstraction layer that shields data
consumers from the complexity of raw data. As a metric's definition changes, existing
reports or dashboards are automatically updated. Metrics lineage may be directly managed
in the metrics layer or integrated with a data catalog service.
3. Discoverability: Data product creators need to easily find and implement the proper
metrics for their purpose. This becomes more important as the list of curated metrics
grows to include a broader set of calculated or time relative metrics.
Metrics layer stewards need to invest time in creating definitional metadata used to
support discoverability. An interesting area of research is in AI-assisted discoverability
using both passive metadata (e.g. descriptions of metrics) and active metadata (e.g. how
often a given metric is used).
4. Serving: Metrics layers are queried directly from analytics and output tools. As end users
request a metrics cut from a dashboard, the metrics layer needs to serve the request fast
enough to support a positive analytics user experience.
As described earlier, poor performance will result in data extracts and semantic sprawl.
Performance management strategies that support metrics serving are discussed in the
next section, but may directly relate to metrics layer implementation as well. Again, this is
an argument for semantic layer thinking rather than a set of independent transformation
services.
Workflow Management
To Materialize or Virtualize: That is the question. As noted in the data modeling discussion,
transformation of raw data into an analytics-ready state can be based on physically
materialized transforms, virtual views based on SQL, or some combination. Workflow
management is the orchestration and automation of the physical and virtual transforms that
support semantic layer function. Decisions on what to materialize and what to virtualize should
be based on a cost-performance optimization.
Performance: Analytics consumers have a very low tolerance for query latency. A universal
semantic layer cannot introduce a query performance penalty; otherwise, clever end users or
work groups will again go down the data-extract-and-semantic-sprawl route. The raw size of
modern cloud scale data means even the most powerful query engines are not able to
consistently deliver “speed of thought” analytics without some level of physical materialization
of aggregates.
Legacy OLAP approaches take this to the extreme by materializing specialized cube data
structures that deliver high performance but do not scale beyond a few terabytes of raw data.
Effective performance management workflows automate and orchestrate materialization as
well as decide what and when to materialize. This functionality needs to be dynamic and
adaptive based on user query behavior, query runtimes, and other active metadata.
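A minimal sketch of this kind of adaptive decision, assuming hypothetical active-metadata inputs (query frequency, observed runtime, aggregate rebuild cost) and an illustrative latency target:

```python
# A minimal sketch of a materialize-vs-virtualize decision driven by active
# metadata. Thresholds and structures are hypothetical, not a production policy.
from dataclasses import dataclass

@dataclass
class QueryStats:
    view_name: str
    queries_per_day: float    # how often users hit this view
    avg_runtime_secs: float   # observed latency against raw tables
    rebuild_cost_secs: float  # compute cost to refresh an aggregate

def should_materialize(stats: QueryStats, latency_slo_secs: float = 2.0) -> bool:
    # Virtualize when raw-table queries already meet the latency target.
    if stats.avg_runtime_secs <= latency_slo_secs:
        return False
    # Materialize when the daily compute saved by serving an aggregate
    # exceeds the daily compute spent rebuilding it.
    compute_saved = stats.queries_per_day * stats.avg_runtime_secs
    return compute_saved > stats.rebuild_cost_secs

hot_view = QueryStats("sales_by_region_quarter", 500, 12.0, 600.0)
cold_view = QueryStats("returns_by_sku_daily", 3, 12.0, 600.0)
print(should_materialize(hot_view))   # True: heavy use justifies an aggregate
print(should_materialize(cold_view))  # False: cheaper to query raw tables
```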
Cost: While there are labor costs related to analytics pipeline management to take into
account, the primary cost tradeoff for performance is related to cloud resource
consumption. Physical transformations executed in the data platform (i.e. ELT transforms)
consume compute cycles and cost money. Query volume (i.e. queried terabytes per month)
consumes compute cycles and costs money. The emerging enterprise discipline of FinOps is
focused on managing cloud costs. Implementing a FinOps program for data and analytics
requires data that is best collected from the semantic layer.
Workflow management within the semantic layer supports FinOps goals while ensuring proper
performance to support user experience. This becomes an interesting optimization problem
that needs to be managed for each data product and use case. The emerging discussion of
data contracts focuses on scaling the interaction between data products and infrastructure
services by automating the definition of this optimization problem.
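As a sketch of the idea, a data contract might declare the latency and cost targets that parameterize this optimization for each data product. The field names below are hypothetical, not a standardized contract format:

```python
# A minimal sketch of a data contract: a data product declares performance
# and cost targets that workflow management uses to tune materialization.
from dataclasses import dataclass

@dataclass
class DataContract:
    data_product: str
    max_query_latency_secs: float   # user-experience target
    max_monthly_compute_usd: float  # FinOps budget for this product

exec_dashboard = DataContract("executive_revenue_dashboard", 2.0, 5_000.0)
batch_report = DataContract("monthly_ops_report", 600.0, 250.0)

for c in (exec_dashboard, batch_report):
    print(f"{c.data_product}: <= {c.max_query_latency_secs}s latency, "
          f"<= ${c.max_monthly_compute_usd}/month compute")
```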
Entitlements and Security
Entitlements and security relate to the active application of data governance policies to
analytics. Beyond cataloging data governance policies, the modern data stack must enforce
policies at query time as metrics are accessed by different users and as models are created
and modified.
Management and enforcement need to be tightly integrated within the semantic layer service,
so that policies can be actively applied at query time as unique users make requests from
different data products. Many different types of entitlements may be managed and enforced
alongside (or embedded in) a semantic layer.
Access Control: Proper access control services ensure users can access all of the data they
are entitled to see, and only that data. Lack of an effective integrated service will result in access
loopholes or overly conservative governance policies (e.g. no updates to certain reports
during revenue reporting periods).
Model and Metrics Consistency: Maintaining semantic layer integrity requires some level of
centralized governance of how metrics are defined, shared, and used. Conflicts arise when
users or work groups share different definitions of a key metric (e.g. revenue) without proper
documentation. Inconsistent results rapidly erode trust in centralized analytics resources,
which leads to semantic layer breakdown.
Performance and Resource Consumption: As discussed above, there are constant tradeoffs
being made on performance and resource consumption. User entitlements and use case
priority may also factor into the optimization that needs to happen when allocating resources
to a given user query. For instance, a real-time interactive dashboard supporting executives
may be entitled to more resources than a monthly batch report update.
While the full set of entitlements and security services, and their integration with enterprise
access control utilities, reaches beyond the scope of an analytics semantic layer, real-time
enforcement of governance policies is critical for maintaining semantic layer integrity.
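To illustrate query-time enforcement, here is a minimal sketch in which the semantic layer injects row-level entitlement filters into the SQL it generates. The entitlement table and policy logic are hypothetical simplifications:

```python
# A minimal sketch of query-time entitlement enforcement: row-level filters
# are injected into generated SQL based on the requesting user. Hypothetical.
USER_REGION_ENTITLEMENTS = {
    "analyst_emea": ["EMEA"],
    "cfo": ["EMEA", "AMER", "APAC"],  # entitled to see all regions
}

def apply_row_policy(sql: str, user: str) -> str:
    regions = USER_REGION_ENTITLEMENTS.get(user, [])
    if not regions:
        raise PermissionError(f"{user} has no entitlements for this model")
    region_list = ", ".join(f"'{r}'" for r in regions)
    # Wrap the model-generated query so only entitled rows are returned.
    return f"SELECT * FROM ({sql}) AS q WHERE q.region IN ({region_list})"

base_sql = "SELECT region, SUM(revenue) AS revenue FROM fact_sales GROUP BY region"
print(apply_row_policy(base_sql, "analyst_emea"))
```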
Integrating the Semantic Layer within the Modern Data Stack
Layers in the modern data stack must seamlessly integrate with other surrounding layers. The
semantic layer requires deep integration with its data fabric neighbors, including the data
platform, analysis and output, and the metadata and services layers.
A functioning semantic layer relies on workflow management services tightly integrated with
the cloud data platforms it orchestrates. It's important to consider how
a general service can support the wide variety of data platform architectures, including data
warehouses (cloud and on-premise), data lakehouse platforms, and data virtualization
platforms. A coordinated set of semantic layer services needs to integrate with the data
platform in a few important ways:
Query Engine Orchestration: The semantic layer dynamically translates incoming queries
from query consumers (which refer to metrics layer logical constructs) to platform-specific
SQL (auto-generated to reflect the logical to physical mapping defined in the semantic model).
The query needs to be optimized for each data platform's idiosyncrasies, including
understanding nested data structures and partitioning schemes (a sketch of this translation
follows these integration points).
Write-back Orchestration: There may be use cases where user or AI/ML interaction with the
semantic layer creates new data (or metadata), in the form of features or predicted metrics,
that is best managed within the data platform. This requires the capability for semantic layers
to orchestrate data write-backs to data platforms.
User Defined Functions (UDF): Modern cloud data platforms offer libraries of functions that
can be leveraged by analysis and output utilities. The semantic layer may also leverage these
functions.
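To make query engine orchestration concrete, here is the sketch referenced above: a logical metrics request is translated into platform-specific SQL using the logical-to-physical mapping in the semantic model. The model structure and dialect rule are hypothetical simplifications:

```python
# A minimal sketch of compiling a logical metrics request into platform-
# specific SQL from a semantic model's logical-to-physical mapping.
SEMANTIC_MODEL = {
    "metrics": {"revenue": "SUM(f.revenue_usd)"},
    "dimensions": {"region": "d.region_name"},
    "from_clause": "fact_sales f JOIN dim_geography d ON f.geo_id = d.geo_id",
}

def compile_query(metric: str, dimension: str, dialect: str) -> str:
    m = SEMANTIC_MODEL["metrics"][metric]
    dim = SEMANTIC_MODEL["dimensions"][dimension]
    sql = (f"SELECT {dim} AS {dimension}, {m} AS {metric} "
           f"FROM {SEMANTIC_MODEL['from_clause']} GROUP BY {dim}")
    # Each target platform gets dialect-specific adjustments.
    if dialect == "bigquery":
        sql = sql.replace("fact_sales", "`analytics.fact_sales`")
    return sql

print(compile_query("revenue", "region", "bigquery"))
```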
On the consumption side, a semantic layer must be capable of the following query consumer integrations:
Inbound Query Protocol Support: A semantic layer must support multiple inbound query
protocols, including (but not exclusively) SQL, MDX, DAX, Python, and RESTful interfaces using
standard protocols such as ODBC, JDBC, HTTP(S), and XMLA (two such inbound paths are
sketched after this list).
Live Query Connection: A semantic layer must support live connections to data and avoid
data extracts or external caching layers while providing “speed of thought” query
performance.
Persona Support: A semantic layer must address the needs of multiple end-user personas
(like business analysts, data scientists, or application developers). Failure to support the
needs of a persona risks creation of localized semantic layers. For instance, if it is difficult to
create time-relative metrics in the metrics layer to support time series analysis, data scientists
will extract data and create a localized semantic layer.
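As a sketch of the protocol flexibility described in the first point above, the same logical question can arrive as a REST call or as SQL over ODBC/JDBC. The endpoint path, payload shape, and virtual table name below are hypothetical:

```python
# A minimal sketch of two inbound query paths to the same logical model.
# Endpoint path, payload shape, and virtual table name are hypothetical.
import json
import urllib.request

def query_via_rest(base_url: str) -> bytes:
    # RESTful access over HTTP(S): the metric is referenced by logical name.
    payload = json.dumps({"metric": "revenue", "group_by": ["region"]}).encode()
    req = urllib.request.Request(
        f"{base_url}/metrics/query", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a live endpoint
        return resp.read()

# SQL access over ODBC/JDBC: the same logical names appear as a virtual
# schema, so any SQL-speaking BI tool can query the model without extracts.
SQL_FORM = "SELECT region, SUM(revenue) AS revenue FROM sales_model GROUP BY region"

print(SQL_FORM)  # the REST path above needs a running semantic layer service
```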
Metadata and Support Services
While a semantic layer is indeed metadata rich, it’s not the exclusive repository of business
metadata. Many tools in a data fabric ecosystem consume and generate metadata, so it’s
critical that a semantic layer supports the following integrations with the metadata and
support services layer(s):
- A semantic layer must be capable of sharing its metadata and lineage with enterprise data cataloging tools to support the search and discovery of metrics and data models by data catalog consumers (a sketch of such an export follows this list).
- A semantic layer must be capable of importing metadata from other tools to automate the creation of semantic data models while driving consistency and conformance with enterprise standards.
- A semantic layer must expose monitoring endpoints to help manage user access, uptime, and system performance.
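As a sketch of the catalog integration in the first point, semantic model metadata and lineage might be serialized into a neutral JSON document for a catalog to ingest. The schema shown is hypothetical:

```python
# A minimal sketch of exporting semantic model metadata and lineage for a
# data catalog. The schema is hypothetical, not any catalog's real format.
import json

catalog_export = {
    "model": "sales_model",
    "metrics": [
        {
            "name": "gross_margin",
            "description": "Gross margin as a fraction of revenue",
            # Lineage back to physical columns supports search and trust.
            "lineage": ["fact_sales.revenue", "fact_sales.cost"],
        }
    ],
}

print(json.dumps(catalog_export, indent=2))
```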
Beyond Descriptive Analytics
Leading data organizations emphasize the role of augmented analytics that go beyond classic
descriptive analytics to include diagnostic, predictive, and prescriptive analytics powered by
artificial intelligence.
A key driver for embracing the modern data stack is access to Data Science/Machine Learning
(DS/ML) platforms included in the Query and Processing service category. It is far more
efficient to leverage powerful AI algorithms if you do not need to move or re-model large
volumes of data.
There are a few important considerations for how a semantic layer strategy can support
accessibility and value creation from the full spectrum of augmented analytics:
- The Metrics Store is a Feature Store: This point was noted in our discussion of metrics layer services. Data scientists should leverage the business-vetted features defined and actively curated in metrics layers.
- Natural Language Query: “Alexa, what was our sales revenue in Massachusetts last quarter?” will only return the right results if Alexa has a clear understanding of the data’s semantic constructs: the right revenue metric, the right geographical dimension, and the right time dimension (see the sketch after this list).
- Publishing Model-Generated Insights: Production AI/ML models generate new data points (e.g. predictions, features) that need to be exposed to users in order to create value. A semantic layer can leverage existing analytics and output infrastructure to more easily disseminate augmented analytics.
- Explainable AI / Trusted AI: The semantic layer can be leveraged to organize and disseminate information related to why an AI model is providing a particular answer. For instance, business users can gain value from knowing not only the prediction for sales next quarter, but also the key drivers for this prediction. Delivering better insight on the reasoning behind AI/ML model suggestions directly supports explainability and enhances the level of trust in model-generated insights.
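The natural language query sketch referenced above: the question only resolves correctly when each phrase maps to the right semantic construct. The phrase-to-construct mapping below is a hypothetical stand-in for real natural language understanding:

```python
# A minimal sketch of why NL query depends on the semantic layer: each phrase
# must resolve to the right metric and dimensions. Mapping is hypothetical.
PHRASE_MAP = {
    "sales revenue": ("metric", "revenue"),
    "massachusetts": ("geography", "state = 'MA'"),
    "last quarter": ("time", "fiscal_quarter = PRIOR"),
}

def resolve(question: str) -> dict:
    constructs = {}
    for phrase, (kind, construct) in PHRASE_MAP.items():
        if phrase in question.lower():
            constructs[kind] = construct
    return constructs

print(resolve("Alexa, what was our sales revenue in Massachusetts last quarter?"))
# -> {'metric': 'revenue', 'geography': "state = 'MA'", 'time': 'fiscal_quarter = PRIOR'}
```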
The Semantic Layer as the Center of Data Knowledge Gravity
While the center of data gravity has clearly shifted to cloud data platforms, business
knowledge about the significance of data is still sprinkled across data fabrics in the form of
metadata. There is a vibrant market for metadata management and support services for
cataloging data assets, monitoring data quality, and tracking data usage.
The semantic layer has an advantageous position of seeing a large portion of active and
passive metadata created for analytics use cases. This creates an opportunity for forward-
thinking organizations to better manage knowledge gravity while using this rich set of
metadata to improve analytics experiences and drive incremental value.
We have already discussed a few examples of how the semantic layer becomes the center of
mass for knowledge gravity. Business context, definitions, and documentation for appropriate
use of metrics and analysis dimensions get encoded within the views of analytics exposed to
business users. FinOps efforts can use data on query patterns and efficiency to better
manage cloud resources and set policy.
One exciting area of research is in active metadata on query patterns, which can suggest the
types of questions the business asks and identify analytics best practices (e.g. what KPIs are
accessed most often by the most successful sales leaders).
[Sidebar panels: Managing Business Context with Passive Metadata; Augmenting Analytics Experience]
A growing set of vendors (including AtScale) is offering semantic layer platforms that
package two or more of these services with supported integrations. Building your own services
or managing your own integrations carries labor and overhead costs. It’s also more prone to
catastrophic disconnects.
The big three cloud service providers (Google, AWS, and Azure) offer a huge array of services
but do not currently offer an integrated set of semantic layer services. While this will likely
change, cloud lock-in concerns will continue to argue for some level of vendor abstraction
from analytics and output tools.
It will be interesting to see how the modern data stack evolves. Will transformation services
(comprising the semantic layer) be provided by the major cloud service providers, or will open,
vendor-neutral platforms fill this role? Regardless of the answer, it's clear that the semantic
layer matters in the modern data stack.

Dave is founder and Chief Technology Officer of AtScale.