Solution Methodology: Slowly Changing Dimensions

How to deal with slowly changing dimensions using Snowflake?

What are Slowly Changing Dimensions?

The terms "facts" and "dimensions" come from data warehousing. A fact is a piece of numerical
data, such as a sale amount or a click count. Facts are stored in fact tables, which are linked via
foreign keys to dimension tables that act as companion tables to the facts. Dimension attributes
are the columns in a dimension table that provide descriptive features of the facts.
A Slowly Changing Dimension (SCD) stores and maintains both current and historical data across
time in a data warehouse. Maintaining SCDs is regarded as one of the essential ETL jobs for
tracking the history of dimension records. Customer, geography, and employee are typical
examples of such dimensions.

SCDs can be handled in a variety of ways. The most popular types are:

Type 0: This is a passive method. When the dimensions change in this approach, no particular
action is taken. Some dimension data can be kept the same as when it was initially entered,
while others may be replaced.

Type 1: The new data overwrites the previous data in a Type 1 SCD. As a result, the existing data
is lost because it is not saved elsewhere. This is the most common sort of dimension one will
encounter. To make a Type 1 SCD, one does not need to provide further information.
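A Type 1 update can be expressed as a single Snowflake MERGE statement. The following is an
illustrative sketch, not the project's actual code; the table and column names (customer,
customer_stage) are assumptions:

MERGE INTO customer AS c
USING customer_stage AS s
    ON c.customer_id = s.customer_id
-- Overwrite the existing attributes in place; the old values are lost.
WHEN MATCHED THEN UPDATE SET
    first_name = s.first_name,
    last_name  = s.last_name,
    email      = s.email
-- Brand-new customers are simply inserted.
WHEN NOT MATCHED THEN INSERT (customer_id, first_name, last_name, email)
    VALUES (s.customer_id, s.first_name, s.last_name, s.email);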

Type 2: The complete history of values is preserved in a Type 2 SCD. When the value of a tracked
attribute changes, the current record is closed, and a new record with the updated values is
generated, which then becomes the current record. Each record's effective time and expiry
time determine the period during which the record was active.
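A minimal Type 2 sketch in Snowflake SQL, assuming a dimension table with start_time,
end_time, and is_current housekeeping columns (all names here are illustrative assumptions):

-- Close the current record for customers whose tracked attribute changed.
UPDATE customer_dim
   SET end_time   = CURRENT_TIMESTAMP(),
       is_current = FALSE
  FROM customer_stage s
 WHERE customer_dim.customer_id = s.customer_id
   AND customer_dim.is_current = TRUE
   AND customer_dim.email <> s.email;

-- Insert a new version for customers with no open current record
-- (i.e., new customers, plus the changed ones just closed above).
INSERT INTO customer_dim (customer_id, email, start_time, end_time, is_current)
SELECT s.customer_id, s.email, CURRENT_TIMESTAMP(), NULL, TRUE
FROM customer_stage s
WHERE NOT EXISTS (
    SELECT 1 FROM customer_dim d
    WHERE d.customer_id = s.customer_id AND d.is_current = TRUE
);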

Type 3: For some chosen dimensions, a Type 3 SCD maintains two copies of values. The previous
and current values of the chosen attribute are saved in each record. When the value of any of
the chosen attributes changes, the latest value is recorded as the current value, and the
previous value is saved as the old value in a new column.
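For a single tracked attribute (say, state), a Type 3 change reduces to a plain UPDATE. The
previous_state and current_state columns below are assumed for illustration:

-- Shift the current value into the "previous" column, then store the new value.
UPDATE customer_dim
   SET previous_state = current_state,
       current_state  = s.state
  FROM customer_stage s
 WHERE customer_dim.customer_id = s.customer_id
   AND customer_dim.current_state <> s.state;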

In this project, we use the Snowflake Data Warehouse to implement the different SCD types.
Snowflake offers all the services needed to build an efficient data warehouse, with ETL
capability and support for various external data partners.
Data Pipeline:

A data pipeline is a system for moving data from one system to another. The data may or may
not be transformed, and it may be processed in real time (streaming) rather than in batches. A
typical data pipeline covers extracting or capturing data with various tools, storing the raw
data, cleaning and validating it, transforming it into a query-worthy format, and visualizing
KPIs, as well as orchestrating the whole process.

Dataset Description:

In this project, we use the faker library from Python to generate user records and store them
as CSV files whose names include the current system time.
The data includes the following parameters (a possible staging-table definition is sketched
after the list):
● Customer_id
● First_name
● Last_name
● Email
● Street
● State
● Country
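
A staging table matching these columns could be defined as follows. The table name and the
choice of VARCHAR types are assumptions; the project's actual DDL may differ:

-- Landing table for the raw CSV records generated by faker.
CREATE TABLE IF NOT EXISTS customer_raw (
    customer_id VARCHAR,
    first_name  VARCHAR,
    last_name   VARCHAR,
    email       VARCHAR,
    street      VARCHAR,
    state       VARCHAR,
    country     VARCHAR
);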

Tech Stack:

➔ Languages: Python3, JavaScript, SQL
➔ Services: NiFi, Amazon S3, Snowflake, Amazon EC2, Docker

NiFi

Apache NiFi is a data logistics platform that automates data movement between systems. It
gives real-time control over the transport of data from any source to any destination, making
data flows simple to manage.
Docker

Docker is a containerization platform that is available as an open-source project. It allows
developers to bundle applications into containers, which are standardized executable
components that combine application source code with the OS libraries and dependencies
needed to run that code in any environment.

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the
Amazon Web Services Cloud. With EC2, users do not have to buy hardware upfront. Amazon EC2
lets developers launch as many virtual servers as needed, configure security and networking,
and manage storage.

Amazon S3

Amazon S3 is an object storage service that offers industry-leading scalability, data
availability, security, and performance. Users can store and retrieve any amount of data using
Amazon S3 at any time and from any location.

Snowflake

Snowflake is a data storage, processing, and analytics platform that blends a unique SQL query
engine with a cloud-native architecture. Snowflake delivers all the features of an enterprise
analytic database to the user. Snowflake components include:

- Warehouse/Virtual Warehouse
- Database and Schema
- Table
- View
- Stored procedure
- Snowpipe
- Stream
- Task
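
As a rough sketch, the first few of these components are created with plain SQL; the names and
the warehouse size below are illustrative assumptions:

-- Virtual warehouse: the compute cluster that runs queries.
CREATE WAREHOUSE IF NOT EXISTS scd_wh
    WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;

-- Database and schema: logical containers for tables, views, and procedures.
CREATE DATABASE IF NOT EXISTS scd_demo;
CREATE SCHEMA IF NOT EXISTS scd_demo.scd;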
Approach:

- Test data is created using the faker library and saved as CSV files.
- The data is ingested using NiFi and pushed to Amazon S3.
- Snowpipe, Snowflake's automated ingestion service, loads new data from S3 into the staging
table.
- Data manipulation language (DML) changes on the staging table are recorded using Snowflake
streams, which determine the operation to be performed.
- Based on the recorded changes, tasks and stored procedures are triggered to implement SCD
Type-1 and Type-2 (a sketch of this wiring follows the list).
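
The wiring between S3, Snowpipe, the stream, and the task might look like the following sketch.
The bucket URL, credentials, and the apply_scd_proc stored procedure are placeholders, not the
project's actual code:

-- External stage pointing at the S3 bucket that NiFi writes to.
CREATE OR REPLACE STAGE customer_s3_stage
    URL = 's3://<your-bucket>/customer_data/'
    CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');

-- Snowpipe: auto-ingests new CSV files from the stage into the staging table.
CREATE OR REPLACE PIPE customer_pipe AUTO_INGEST = TRUE AS
    COPY INTO customer_raw
    FROM @customer_s3_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Stream: records DML changes (inserts/updates/deletes) on the staging table.
CREATE OR REPLACE STREAM customer_raw_stream ON TABLE customer_raw;

-- Task: fires on a schedule, but only when the stream has captured new changes.
CREATE OR REPLACE TASK apply_scd_task
    WAREHOUSE = scd_wh
    SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('customer_raw_stream')
AS
    CALL apply_scd_proc();  -- placeholder: procedure holding the SCD merge logic

ALTER TASK apply_scd_task RESUME;  -- tasks are created suspended by default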

Key Takeaways:

● Understanding the basics of SCDs and their different types.
● Visualizing the complete architecture of the system.
● Understanding the project and how to use an AWS EC2 instance and security groups.
● Introduction to Docker.
● Docker installation and execution.
● Usage of docker-compose and starting all the tools.
● Creation of an access key.
● Creation of an S3 bucket.
● Test data preparation.
● Understanding the basics of NiFi.
● Integrating NiFi with S3.
● Implementing the NiFi flow setup.
● Introduction to different Snowflake components.
● Implementation of different Snowflake components.
● Implementation of SCD Type-1 and Type-2.
Architecture:

The architecture follows the pipeline described above: test data generated with faker is picked
up by NiFi (running in Docker on an Amazon EC2 instance), landed in Amazon S3, auto-ingested by
Snowpipe into a Snowflake staging table, and propagated to the dimension tables via streams,
tasks, and stored procedures.
