How to Build Your Data
Platform from Scratch
Quantify the impact of your financial investments in
data and learn how leading data teams are driving
immediate value from their data platforms.
TABLE OF CONTENTS
Introduction: Meeting the demand for data
Defining the modern data platform
How to measure the ROI of your data platform
The ROI worksheet
ROI in action at Drata
ROI in action at Prefect
ROI in action at Dr. Squatch
Conclusion: Scaling your data stack
INTRODUCTION
These days, every company wants to be data-driven.
Countless CEOs are demanding more — and more
reliable — data, dashboards, and reports.
Data leaders know meeting those demands requires
assembling a talented data team to pipe fresh analytics
to downstream stakeholders, field ad-hoc queries from
the executive team, and troubleshoot the data incident
du jour. And all that hard work doesn’t happen without a
modern data stack: the technologies powering decision-
making, digital services, and company growth.
Investing in these tools is a no-brainer — at least for
data leaders. But how do you justify the cost to your
CTO or CFO?
We’ll walk you through how to get the buy-in you need
to, well, buy what you need. In this short guide, you’ll
learn:
The must-have components of your data stack
How to calculate your data’s ROI
Proven strategies from real-life teams
Ready to measure the value of your data platform? Let’s
start at the beginning: how to build one.
DEFINING THE MODERN DATA PLATFORM
There’s no blueprint for the perfect data stack. The right
technology will look vastly different for a 5,000-person
healthcare enterprise than for a 75-person SaaS startup.
But the modern data stack has a few inherent
characteristics:
Cloud-based
Modular and customizable
Best-of-breed first (choosing the best tool for a
specific job, versus an all-in-one solution)
Metadata-driven
Runs on SQL (for now)
And a few core layers make up the must-have
components that every company needs to ingest, store,
transform, and analyze data at scale.
Data Platform Value Matrix
LAYER | PURPOSE | VALUE
Data warehouse / data lake | Store and compute your data, either structured (in a warehouse), unstructured (in a lake), or both (a lake house) | Several days
Orchestration (ingestion + transformation) | Bring data into your stack, through either batch processing or stream processing, and clean and prepare it for analysis | Hours
Business intelligence | Make your data actionable and available to data consumers and analysts | Several days
Data observability | Monitors your end-to-end data quality and enables faster detection and resolution of issues | 20-minute deployment, value in hours, models train in days
Pro tip: Deciding whether to build or buy your modern data
stack will have a huge impact on its cost, and therefore, its ROI.
Due to the cost and scarcity of data engineering resources, it
usually makes more sense to buy, but there are exceptions.
Check out our Build vs. Buy Guide for more info.
HOW TO MEASURE THE ROI OF YOUR
DATA PLATFORM
Just like building it, measuring the ROI of a data stack isn’t a one-
size-fits-all operation. It all depends on how your company creates
value from data.
Start by identifying your company’s data use cases, then narrow in on
the right metrics to reflect how value is generated. Read on for the
high-level approach, then put your pen to paper with the worksheet on
the next page.
Step 1: Know your data use cases
Ask these critical questions about your data landscape:
Who are your stakeholders? Who accesses and uses data at your
company?
For example: executives, business unit leaders, customer
success managers, product managers, revenue ops, finance,
data analysts
How does your company use data?
Such as demand generation, product roadmapping, forecasting,
evaluating mergers and acquisitions, etc.
How does your company wish it could use data?
Think: powering consumer products, making purchase
recommendations, validating decision-making, automating ad
spend — the list goes on and on (and on and on)
What are your company-wide KPIs?
Likely candidates include recurring revenue, churn, retention,
customer acquisition cost, customer lifetime value, etc.
One way to measure data ROI is to track SLAs for
data uptime and availability.
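To make the SLA idea concrete, here is a minimal sketch of how a team might compute data uptime for a period and compare it against a target. The function names, the 30-day period, the 6 hours of downtime, and the 99.5% target are all illustrative assumptions, not figures from this guide.

```python
from datetime import timedelta


def uptime_percentage(period: timedelta, downtime: timedelta) -> float:
    """Share of the period, in percent, during which data was available."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period length")
    return 100.0 * (1 - downtime / period)


def meets_sla(period: timedelta, downtime: timedelta, target_pct: float) -> bool:
    """True when observed uptime is at or above the SLA target."""
    return uptime_percentage(period, downtime) >= target_pct


# Hypothetical numbers: a 30-day month with 6 hours of data downtime.
month = timedelta(days=30)
downtime = timedelta(hours=6)
print(f"uptime: {uptime_percentage(month, downtime):.2f}%")
print("meets 99.5% SLA:", meets_sla(month, downtime, target_pct=99.5))
```

Tracking this number each month gives executives a single figure to watch, and a missed target points to where the conversation about investment should start.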
Step 2: Identify metrics that reflect data usage
Look for improvements month-over-month, or quarter-over-quarter,
in these areas:
Reduced time to insight
How long does it take for stakeholders to get what they need
from the data team?
Increased stakeholder engagement
How many data consumers do you serve? How many queries do
data products receive per week? How many reports are pulled
per day?
Reduction in data downtime
How often is data partial, erroneous, missing, or otherwise
inaccurate? (Calculate this by adding the time-to-detection and
time-to-resolution, then multiplying by the number of data
incidents over a period of time.)
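The data downtime calculation described above can be sketched in a few lines of Python. This is one possible reading, assuming you log detection and resolution times per incident in hours; the `Incident` class, the function names, and the sample values are hypothetical. When you have per-incident data, summing each incident's detection and resolution time gives the same total as applying the formula to the averages.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One data incident; both durations are in hours."""
    time_to_detection: float
    time_to_resolution: float


def data_downtime(incidents: list[Incident]) -> float:
    """Total downtime in hours: sum of (detection + resolution) per incident."""
    return sum(i.time_to_detection + i.time_to_resolution for i in incidents)


def data_downtime_from_averages(n_incidents: int, avg_ttd: float, avg_ttr: float) -> float:
    """The guide's formula: (time-to-detection + time-to-resolution) x incidents."""
    return (avg_ttd + avg_ttr) * n_incidents


# Hypothetical incident log for one month.
incidents = [Incident(4.0, 2.0), Incident(1.0, 3.0), Incident(7.0, 1.0)]
total = data_downtime(incidents)

avg_ttd = sum(i.time_to_detection for i in incidents) / len(incidents)
avg_ttr = sum(i.time_to_resolution for i in incidents) / len(incidents)
same_total = data_downtime_from_averages(len(incidents), avg_ttd, avg_ttr)
print(total, same_total)
```

The averages version is handy when you only have summary numbers from a quarterly report rather than a full incident log.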
Step 3: Communicate to executives & stakeholders
Tell the ROI story of the data your technology supports:
Hold quarterly business reviews (QBRs) to highlight wins and
learnings
Share growth, progress, and areas of improvement
Identify blockers to execution and rally support to address them
Data is more valuable the fresher it is.
DATA PLATFORM ROI WORKSHEET
USE THIS WORKSHEET TO DEVELOP THE RIGHT METRICS TO REFLECT
YOUR DATA STACK ROI.
STACK OVERVIEW
Highlight the key tools + cost, for reference.
CRITICAL QUESTIONS
Who are your stakeholders?
How does your company use data?
How does your company *wish* it could use data?
What are your company-wide KPIs?
TIME TO INSIGHT
How long (in minutes) does it take for stakeholders to get what they
need from the data?
# OF ACTIVE DATA PRODUCTS
How many dashboards, reports, and data projects do your stakeholders
use on a weekly or monthly basis?
REDUCTION IN DATA DOWNTIME
Data downtime is measured by (Time to Detection + Time to Resolution)
x Number of Data Incidents over a period of time.
GROWTH IN DATA ADOPTION
# of requests or queries per week OR number of data consumers.
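Once the worksheet is filled in for two periods, the comparison itself is simple arithmetic. Here is a rough sketch, assuming you record each worksheet metric as a number per quarter; the metric names and all values are made up for illustration. Note that for time to insight and data downtime a decrease is the improvement, while for active data products and adoption an increase is.

```python
# Metrics where a lower value is the improvement (names are hypothetical).
LOWER_IS_BETTER = {"time_to_insight_minutes", "data_downtime_hours"}


def percent_change(previous: float, current: float) -> float:
    """Percent change from the previous period to the current one."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return 100.0 * (current - previous) / previous


def summarize(previous: dict[str, float], current: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """Map each metric to (percent change, improved?)."""
    summary = {}
    for name, prev_value in previous.items():
        change = percent_change(prev_value, current[name])
        improved = change < 0 if name in LOWER_IS_BETTER else change > 0
        summary[name] = (change, improved)
    return summary


# Hypothetical worksheet values for two consecutive quarters.
q1 = {"time_to_insight_minutes": 120, "active_data_products": 40,
      "data_downtime_hours": 18, "queries_per_week": 500}
q2 = {"time_to_insight_minutes": 90, "active_data_products": 46,
      "data_downtime_hours": 12, "queries_per_week": 650}

for metric, (change, improved) in summarize(q1, q2).items():
    print(f"{metric}: {change:+.1f}% ({'improved' if improved else 'not improved'})")
```

A summary like this drops straight into a quarterly business review slide.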
DATA ROI IN ACTION AT DRATA
Data is kind of a big deal at Drata. The tech startup
provides automated security and compliance
capabilities for thousands of data-driven
companies. So when Lior Solomon joined as VP of
Data, he was ready to build a data organization with
a cutting-edge tech stack that enabled
unimpeachable user data privacy.
Lior needed scalable technologies that would be
easy to adopt and seamlessly integrate with Drata’s
existing systems. He built a powerful yet flexible
tech stack and a series of metrics that would
capture its ROI.
The Drata tech stack
Warehouse: Snowflake
Orchestration: Fivetran (ingestion), dbt
(transformation), Census (reverse ETL)
BI: Sigma Computing
Data Observability: Monte Carlo
Lior Solomon, VP of Engineering, Data
Drata's modern data stack.
How Drata measures data stack ROI
Data accuracy
Completeness, consistency, timeliness, level of data quality
issues, how quickly those get resolved
Data usage
Number of data requests, number of dashboards and reports
created, level of user engagement with those dashboards
and reports
Speed and efficiency
Data processing time, time to deliver data reports, overall
data analytics turnaround time
Team performance
Team morale, retention rates, ability to meet project
deadlines
“Overall, the data architecture we have put in place has been
designed to support the specific needs and goals of our fast-
growing startup. By choosing self-managed tools that don’t
require extensive infrastructure management, we have prioritized
speed and business enablement over hands-on control and
flexibility. This approach has allowed us to scale quickly and
efficiently, while also maintaining a high level of data security and
compliance.” - Lior Solomon, VP of Engineering, Data
DATA ROI IN ACTION AT PREFECT
Free-flowing data is the beating heart of Prefect, which
helps companies automate data workflows and
orchestrate pipelines at scale. Prefect uses their own
tooling, alongside the rest of their data stack, to move
information from their warehouse out across
operational solutions — such as taking user data from
the Prefect cloud platform and pushing it into email
marketing tooling and Salesforce.
There’s just one catch: the data team is small. Only two
dedicated data professionals are responsible for
maintaining the Prefect data ecosystem. That means
prioritizing powerful tech that provides end-to-end
visibility without a lot of maintenance, while ensuring
data reliability — and having the numbers to prove it.
No big deal, right?
The Prefect tech stack
Warehouse: BigQuery
Orchestration: Fivetran, dbt
BI: Looker
Data Observability: Monte Carlo
Dylan Hughes, Engineering Manager
Prefect's modern data stack.
How Prefect measures data stack ROI
The powerful tech stack at Prefect has delivered:
50% engineering time recovered by reducing time
to detection, triage, and remediation of data quality
issues
16x faster time to data quality tooling deployment,
reducing time-to-value by 6-8 months when
compared to building an MVP data quality solution
in-house
Preserved headcounts for implementation and
monitoring, while simultaneously increasing
productivity by 20+ hours per week
Increased capacity for change at warehouse level
with end-to-end visibility of pipelines and lineage
tracking
“It’s really easy for us to propose and make big changes
upstream in the data warehouse and know exactly what is
going to be impacted by the change. We’re able to explore
specific fields and how they move through our data
transformation steps—and to see what dashboards are
going to be impacted and where they’re going to end up.”
- Dylan Hughes, Engineering Manager
DATA ROI IN ACTION AT DR. SQUATCH
The data at personal care brand Dr. Squatch has to be
fresh and flawless. In addition to handmade bar soaps and
high-quality natural products, the data-driven culture at
Dr. Squatch produces a hearty appetite for information.
Most employees have access to domain-specific
dashboards for daily needs, while nine analysts are
aligned with specific business units to provide in-depth
support. Overseeing their tech stack? One lone data
engineer.
With so many data consumers expecting accurate data at their
fingertips and a single engineer supporting them, the Dr. Squatch
data stack has to be powerful yet simple to maintain. And
when data incidents inevitably occur, that data engineer
and his analyst allies need to be the first to know — with
time-to-detection and time-to-resolution numbers that
reflect their squeaky-clean record.
The Dr. Squatch tech stack
Warehouse: Snowflake
Orchestration: Fivetran, Prefect, dbt, custom-built
pipelines
BI: Looker
Data Observability: Monte Carlo
Ken Nguyen, Data Engineer
Dr. Squatch's modern data stack.
How Dr. Squatch measures data stack ROI
Time saved every week: 10+ hours
Capacity increase for data engineer: 15%
Peace of mind around data quality: Priceless
“The fact we can see data outages on the specific
table or schema really helps us identify where the
data anomaly is coming from. Is it coming from
Fivetran directly? Is it coming from some model in
dbt? This is another place where we save time…I
would say peace of mind is the biggest value. You
can trust what the data is doing and focus on
other places instead of dividing your attention.
That kind of focus is extremely valuable.” - Ken
Nguyen, Data Engineer
CONCLUSION: SCALING YOUR DATA
STRATEGY FOR HYPERGROWTH
As the data leaders we just profiled will tell you,
assembling the right data stack for high-growth
companies usually means prioritizing scalability,
flexibility, and ease of use. These modern tools will
give your team the building blocks they need to
generate as much value and innovation from data as
possible while your company grows.
But we get it: data costs are rising and budgets are
tightening. Without the right context, CEOs and CFOs
can view data as a cost center rather than a revenue
driver.
As a data leader, it’s up to you to tell the story of data
ROI. This workbook should help you find the right
framework to justify your data journey.
One step that shouldn’t be missed: data observability.
If your leaders and stakeholders don’t trust that your
data is accurate, they won’t believe a word about ROI.
To learn how Monte Carlo can help you ensure data
reliability, contact our team today.