RESOURCE
COMPETITIVE POSITIONING
The Data Catalog Primer
More than just a data catalog
Learn how to make data discovery and access a breeze for your data team
End-to-end data workspace for modern data teams
Compiled with 💙 by AtlanHQ
Table of Contents
Chapter 1: The evolution of the data management ecosystem
We explore the different data management technologies and how they changed over the years, from data warehouses to the cloud and Hadoop.
Chapter 2: The problem with traditional data catalogs
We discover the challenges presented by traditional data catalogs, featuring some of the biggest pain points from Gartner’s Peer Review.
Chapter 3: The ideal data catalog
We understand what a modern data catalog should look like by exploring the foundations of an ideal data catalog.
Meet Atlan: An end-to-end data workspace for modern data teams
We take a sneak peek at Atlan’s futuristic data catalog platform for modern data teams.
Please note:
This report was first compiled in October 2019 and was updated in April 2020 by Atlan.
The text, images, or a combination of both, as described in this material, cannot be copied, modified, published or distributed without prior written permission
from Atlan (Peeply Technologies Pvt Ltd) and its respective authors.
The names, logos and brand marks of all data software, platform and tools other than Atlan’s which are mentioned in this report are the properties of their
respective owners. No copyright infringement is intended. Should there be any question or concern, you can write to hello@atlan.com
Chapter 01
The evolution of the data management ecosystem
The 1990s: The era of data warehousing, data integration and metadata management
2000-2015: The era of big data, cloud computing, data lakes and (traditional) data catalogs
We’ve come a long way from just wondering whether we have the technology to store large
amounts of data. The challenges we face today are slightly more complex and revolve around
effectively using the data we have.
The questions that keep CDOs (Chief Data Officers) and CDAOs (Chief Data and Analytics
Officers) up all night have more to do with:
1. Access to the right data
2. Understanding the context and meaning of data
3. Using this data for further analysis
4. Keeping all organizational data safe while complying with local, regional and federal
regulations around data privacy and management
However, before we start scouring the (virtual) world for solutions, let’s understand why we face these challenges now, starting with a quick (and nostalgic) stroll through the history of data management. 🚶
So, let’s get started, shall we?

Having trouble visualizing how much a petabyte is? This should help:

“Kilobytes were stored on floppy disks. Megabytes were stored on hard disks, while terabytes were stored in disk arrays. Petabytes are stored on the cloud.”
Chris Anderson, WIRED
The 1990s
The era of data warehousing
The birth of the internet and the rise of data warehouses
The 90s = the birth of the internet… and the search engine Google.
Guess what that meant? More data! Computer scientist Michael Lesk estimated that the
amount of data available on the internet was around 12,000 petabytes (1 PB = 1000 TB), with
the size expected to grow tenfold each year. 😱
With the increase in the amount and sources of data, the demand for data management also spiked, triggering the need for one solution to store, discover, analyze and use data for decision making, i.e., a single source of truth.
And that led to the rise of on-premise data warehouses.
P.S. Bill Inmon coined the term “data warehouse” in the 90s[1], and wrote a book about it in
1992—considered to be a fundamental source on warehousing even today.
What is a data warehouse? A data warehouse is a key component of BI (business intelligence). For the uninitiated, a data warehouse is a central repository for all structured data from multiple sources and/or transactional systems (aka CRMs like Salesforce or ERPs like SAP).

Through the 90s, companies worldwide had started adopting data warehouses. Many in the data universe considered warehouses the one-stop solution to data chaos.

“With data warehousing, data that had previously been spread across numerous sources could now be held together in one place. Moreover, these warehouses were specifically designed to support the analytical functions required for business intelligence.”
Furhaad Shah, DATACONOMY
[1] TDAN (The Data Administration Newsletter) on building data warehouses, published on May 29, 2007
Enter data integration and metadata management tools
Data warehouses ruled the early 90s as they supported complex queries for data analysis.
However, with changing data formats and sources, transforming data (i.e. ETL) became
complicated and time-consuming.
Suddenly, warehouses from the 90s weren’t fast enough to keep up with the changes. This led
to the rise of data integration tools that simplified and fast-tracked the ETL workflows.
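The ETL (extract, transform, load) workflow mentioned above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the data and functions are made up), not how any particular integration tool works:

```python
# Minimal ETL sketch: pull rows from a source, clean them up,
# then load them into a target store (here, just a list).
def extract():
    # Pretend these rows came from a CRM export
    return [{"name": " Ada Lovelace ", "revenue": "1200"},
            {"name": "Grace Hopper", "revenue": "3400"}]

def transform(rows):
    # Normalize formats and types so the warehouse stays consistent
    return [{"name": r["name"].strip(), "revenue": int(r["revenue"])}
            for r in rows]

def load(rows, warehouse):
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0])  # {'name': 'Ada Lovelace', 'revenue': 1200}
```

Data integration tools essentially industrialize this loop: they handle the connectors, scheduling, and error handling so the transform step doesn’t have to be rebuilt for every new source.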
But that wasn’t the only problem. On-premise data warehouses lacked context. With more
data coming in every day, enterprises had tons of data, but no clue how to interpret or use it.
Enter metadata management tools that provided much-needed context and meaning to data.
Lastly, on-premise data warehouses were expensive to build and maintain. Since they were built for peak usage capacity, it was extremely difficult to predict and estimate future capacity needs as the amount of information kept growing exponentially.
And that brings us to the next step of evolution in the data ecosystem.
What is metadata? Metadata is information that describes your data
and provides useful context. Think about your favorite song. That’s
data. The name of the song, genre, and singer is the metadata. In
other words, metadata acts like an explainer for your data.
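The song analogy translates directly into code. A toy Python illustration (the song and fields are made up):

```python
# The audio bytes are the data; the descriptive fields around them
# are the metadata that gives the data context.
song_data = b"ID3..."  # raw audio bytes, truncated for illustration

song_metadata = {
    "title": "Imagine",
    "artist": "John Lennon",
    "genre": "Rock",
    "duration_seconds": 183,
}

def describe(metadata):
    # Metadata alone is enough to explain what the data is
    return f'{metadata["title"]} by {metadata["artist"]} ({metadata["genre"]})'

print(describe(song_metadata))  # Imagine by John Lennon (Rock)
```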
2000-2015
The era of big data and the cloud
Hadoop—the Swiss army knife of the 21st century
Two technological disruptions were happening in the 2000s.
Firstly, with the rise of web 2.0, the volume of data available started increasing exponentially,
ushering in the era of big data.
BTW, Web 2.0 (popularized by O’Reilly Media in 2005[2]) is nothing but user-generated content—the internet as we know it today (in 2020).
Think videos, audio, images, location data, social media interactions. Most of this data is unstructured.
Guess what else is mostly unstructured? Big data.
Processing all that big data to extract meaningful insights was proving to be a major headache,
and that fueled the need for big data technologies like Hadoop.
First released in 2005[3], Hadoop quickly gained popularity and by 2013[4], almost half of Fortune
50 companies had adopted Hadoop for processing big data.
Now let’s look at the other technological disruption.
What is Hadoop? A set of open source programs and processes that act as the backbone of your data operations. It has four major components—HDFS (a distributed file system), MapReduce, Hadoop Common and YARN (Yet Another Resource Negotiator). The Apache Software Foundation is responsible for maintaining Hadoop.

“Originating with technologies developed by Yahoo, Google, and other Web 2.0 pioneers in the mid-2000s, Hadoop is now central to the big data strategies of enterprises, service providers, and other organizations.”
James Kobielus, FORRESTER RESEARCH
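To make the MapReduce component concrete, here is the classic word-count pattern simulated in plain Python. This is a conceptual sketch only; real Hadoop jobs are written against the MapReduce API and run distributed across a cluster:

```python
from collections import defaultdict

# Map phase: each document is split into (word, 1) pairs.
def map_phase(documents):
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

# Shuffle + reduce phase: counts for the same word are summed.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big insights", "big data everywhere"]
print(reduce_phase(map_phase(docs)))
# {'big': 3, 'data': 2, 'insights': 1, 'everywhere': 1}
```

The point of the pattern is that map and reduce are independent per key, so Hadoop can spread the same logic across thousands of machines and petabytes of files.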
[2] O’Reilly Media on Web 2.0, published on September 30, 2005
[3] Bernard Marr & Co. on Hadoop
[4] PR Newswire on Altior’s AltraSTAR, published on December 18, 2012
Cloud computing and warehousing
The 2000s was also the era of the cloud, popularized by the launch of Amazon S3 and EC2 in 2006[5] and Windows Azure in 2010[6].
In 2011, Google[7] and IBM[8] threw their hats into the ring and with that, the cloud had officially arrived.
Remember the on-premise data warehouses being hard to maintain and expensive to scale?
In 2012[9], AWS introduced Redshift—a low-cost cloud data warehouse that’s easy to deploy
and scale.
While this solved some of the problems, it wasn’t enough.
Warehouses like Redshift were limited to their service providers (Redshift was limited to AWS
whereas BigQuery was limited to Google Cloud), making it challenging to find an alternative.
Also, compute and storage were interdependent, making it impossible to shut down compute
without affecting storage.
Lastly, even cloud data warehouses didn’t support unstructured data. So, still no single source of truth. 🤷

And that brings us to data lakes.

“We did the math and found that it costs between $19,000 and $25,000 per TB per year, at list prices, to build and run a good-sized data warehouse on your own. Amazon Redshift will cost you less than $1,000 per TB per year.”
Jeff Barr, AWS EVANGELIST
[5] AWS on offering cloud computing services to businesses
[6] Microsoft on Windows Azure Availability, published on February 1, 2010
[7] Google Code on Google Cloud SQL, published on October 6, 2011
[8] Cloudpro on IBM Cloud, published on July 29, 2011
[9] Information Week on Amazon Redshift, published on November 28, 2012
Making sense of the unstructured with data lakes
Coined by James Dixon in 2010[10], data lakes seemed to be the one-stop solution for all big
data management problems. At least on paper.
In reality? Not so much. What enterprises ended up with was less of a lake and more of a
swamp (a data dump).
Beyond the complexities of big data architecture, these were the top three challenges:
1. Finding actionable data is frustrating
2. Tracing data lineage can be elusive
3. Implementing data governance is painful
Bonus issue: Data lakes still don’t solve the single source of truth conundrum.
What is a data lake? A data lake stores a collection of various raw data sets from multiple internal and external data sources. The data in a data lake can be unstructured, semi-structured or structured. We’re talking messy data from audio files, emails, photos or satellite imagery to more neat and clean data like phone numbers, customer names, addresses and zip codes.

“Originally, most companies I talked to thought that they would have one huge, on-premises data lake that would contain all their data. As their understanding evolved, most enterprises realized that a single go-to point was not ideal. Between data sovereignty regulations and organizational pressures, multiple data lakes typically proved to be a better solution.”
Alex Gorelik, AUTHOR, THE ENTERPRISE BIG DATA LAKE
[10] James Dixon’s blog on data lakes, published on October 14, 2010
Enter the data catalog
Remember the reason why data warehouses came into the picture (way back in the 90s)?
Because enterprises needed one solution to store, discover, analyze and use data for decision
making, i.e., a single source of truth.
With cloud data warehouses, data lakes and big data technologies, data infrastructure had gotten extremely complex. To make things easier, several companies came up with numerous tools and technologies, which only added to the complexity.
And despite all these advances in infrastructure, enterprises still found it difficult to find the
right data. Still no single source of truth containing all enterprise data along with metadata
information and context.
That’s where data catalogs come to the rescue by making data discovery easy across
the data ecosystem.
And with that, we end our history lesson. So far so good, yeah?
What is a data catalog? A data catalog is a library or inventory of all your data assets—a place where all your data is neatly indexed, organized and kept ready for use.

“Fifty-four million data workers worldwide spend 44% of their workday on unsuccessful data activities. Searching for and preparing data are the most common activities of the data worker role at 15% and 33% respectively. On average, they use four to seven different tools to perform data activities, adding to the complexity of the data and analytics process.”
The State of Data Science and Analytics Report[11]

[11] The State of Data Science and Analytics by IDC
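To make the idea concrete, here is a toy, in-memory sketch of a catalog index in Python (the assets and fields are hypothetical, not any vendor’s actual schema):

```python
# A toy "catalog": each data asset is indexed with its metadata so
# anyone can search for data without knowing where it physically lives.
catalog = [
    {"name": "orders", "source": "Redshift", "owner": "analytics",
     "description": "One row per customer order", "tags": ["sales", "core"]},
    {"name": "clickstream_raw", "source": "S3 data lake", "owner": "engineering",
     "description": "Raw web events, semi-structured JSON", "tags": ["events"]},
]

def search(catalog, keyword):
    """Return assets whose name, description, or tags mention the keyword."""
    kw = keyword.lower()
    return [a for a in catalog
            if kw in a["name"].lower()
            or kw in a["description"].lower()
            or any(kw in t for t in a["tags"])]

print([a["name"] for a in search(catalog, "sales")])  # ['orders']
```

A real catalog does the same thing at enterprise scale: it crawls warehouses, lakes, and BI tools, keeps the index fresh automatically, and layers on context like lineage and ownership.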
Chapter 02
The problem with
traditional data catalogs
4 critical shortcomings of traditional data catalogs
Biggest pain points from Gartner’s Peer Review on existing
data catalogs
4 reasons why data teams need a modern data catalog
How do traditional data catalogs fall short?
While data infrastructure has evolved, data management hasn’t.
On paper, traditional data catalogs are supposed to help enterprises make sense of their data—where it came from, what purpose it serves and how it is being used.
In reality, traditional data catalogs fall short as they aren’t built for the new world of data.
Here are the top 4 shortcomings.
1. Not built for the cloud
2. Built for IT, not business
3. Need extensive support and maintenance
4. Opaque pricing, not pay-as-you-go
The biggest pain points from Gartner’s Peer Review on existing catalogs
Don't just take our word for it. We sourced some of the biggest pain points that humans of data experienced while using
traditional data catalogs. It’s important to remember that the world we live in today is vastly different from that of the 90s.
Cloud Native: “Limited support for cloud based storage/warehouse.” “Monolithic architecture is difficult to work with, and does not scale well for enterprises or cloud deployments.”

Ease of Setup: “Difficult to set up. (For data ingestion, need to create Airflow DAGs or run Python scripts. Templates available.) A data engineering team required to set up and maintain.” “Implementation can be a challenge unless you have a good partner.”

Ease of Use: “The catalog platform is built with a technical user in mind. Experience for non-technical folks can be challenging. Need better UI elements for non-technical personas.”

Scalability & Big Data: “Data catalog is built to scale (being cloud native). For AWS, it has documentation for cluster deployment.” “Data catalog works well for static data, but the current data movement and architecture around data in motion (streaming) is not supported, nor are Hive structures. (Coming soon, but behind.)”

Maintenance & Support: “Make sure you have an engineer team (2-3) prepared to support and upgrade the catalog platform.” “I need to hire Java developers to customize anything.”
Why do we need a modern data catalog?
We live in a new world of data with:
1. Cloud proliferation
2. A thriving open-source ecosystem (e.g., deequ)
3. Diverse data consumers: analysts, scientists, engineers, business users, ML researchers
4. A rapidly innovating ecosystem
The need of the hour is a modern data catalog for this new, increasingly cloud-first world.
Let's take a look.
Enter the era of the modern data catalog
What is a modern data catalog?
What makes a data catalog modern?
One fundamental characteristic—empowering non-technical or business users to understand,
interpret and use data for data and analytics initiatives.
Sounds utopian, doesn’t it?
So long as we’re indulging ourselves, let’s take things slightly further and think of the ideal
data catalog.
What would that look like?
A data catalog creates and maintains an inventory of data assets through the
discovery, description and organization of distributed datasets. The data catalog
provides context to enable data stewards, data/business analysts, data engineers,
data scientists and other line of business (LOB) data consumers to find and
understand relevant datasets for the purpose of extracting business value.
Chapter 03
The ideal data catalog
The foundations of the concept of an ideal data catalog
6 factors that make modern data catalogs the way forward
for data teams of 2020 and beyond
Meet Atlan: The first data catalog built for the future
The foundation for an ideal data catalog
As a concept, the ideal data catalog for our modern world should be built on four foundational pillars.

1. Agility (DISCOVER DATA): Everyone should be able to discover the data they need in seconds
2. Knowledge (UNDERSTAND DATA): Everyone should be able to understand data with all its context
3. Trust (TRUST DATA): Everyone should be able to trust that the data is the right data for their use case
4. Collaboration (DRIVE VALUE FROM DATA): Everyone should be able to use the data they need in the environment they are most comfortable in
Ideal modern data catalogs: The way forward
To support our current data ecosystem and empower non-technical data consumers, an ideal data catalog should have six key attributes.

01. Cloud-first: Deploy on your cloud VPC. Completely self-service, no engineering support needed.
02. Built on open-source, open by default
03. Plug-and-play with your favorite data tools: Integrates directly into your existing AD/permissions and data infrastructure
04. Built for business, not just IT
05. No training or support overhead: “...designing the interface and user experience of a data tool should not be an afterthought!”
06. Pay-as-you-go pricing
Meet Atlan: The first data catalog built for the future
1. Cloud-native data catalog
2. 24 hours to get up and running
3. Democratization for business
4. Governance for IT
THANK YOU FOR READING THE
Data Catalog Primer
Check out our other resources for data teams
WEBINAR SERIES: How are top data teams making the move to remote? Sign Up →
EBOOK: The ultimate guide on implementing agile for data teams. Download →
Compiled with ❤ by Atlan
Trusted by data teams around the world
We are proud to be supported by
Rajan Anandan, Former MD, Google India
Manoj Menon, Partner & MD, Frost & Sullivan (APAC)
Ratan Tata, Chairman Emeritus, Tata Sons