Snowflake:
Snowflake is basically a SaaS (Software as a service is a cloud-based method of
providing software to users.) based data warehouse (DWH) platform that is built on
the top of AWS (Amazon Web Services), Microsoft Azure, and Google Cloud
infrastructures to provide companies with flexible, scalable storage solutions while
also hosting BI (Business Intelligence) solutions. It serves as a centralized platform
for data management, data lakes, data engineering, data applications development,
data science, and secure sharing and consumption of real-time and shared data.
Providing a centralized system to consolidate all data, Snowflake revolutionized the
data warehousing industry. With Snowflake, you can simplify data warehouse
management without sacrificing features.
Cloud Data Warehouse:
A cloud-based platform for storing and analyzing data, which offers scalability,
flexibility, and cost-efficiency compared to traditional on-premises data warehouses.
Snowflake provides a fully managed service with separate compute, storage, and
cloud services layers, making it easier to scale and manage data operations.
Snowflake Architecture:
Snowflake’s Architecture is a hybrid of traditional Shared-Disk and Shared-Nothing
database architectures. It consists of three key layers:
● Database Storage
Snowflake uses highly secure cloud storage to maintain all your data.
Snowflake stores all data in databases. A database is a logical
grouping of objects consisting primarily of tables and views organized
into one or more schemas.
● Query Processing
Query execution is performed in the compute layer where queries are
executed using resources provisioned from a cloud provider. Unlike
traditional architectures snowflake allows you to create multiple
independent compute clusters called Virtual Warehouses.
● Cloud Services
The Snowflake Cloud Services layer is the ‘Brain’ of the system which
coordinates and manages the entire system.
The Services layer is responsible for:
○ Authentication of users.
○ Query compilation and optimization.
○ Infrastructure Management
○ Access Control
○ Metadata Store
Snowflake Editions:
Snowflake provides below editions to choose from that fits your organization’s
specific requirements.
● Standard Edition
● Enterprise Edition
● Business Critical Edition
● Virtual Private Snowflake (VPS)
Snowflake Currently Supported Platforms:
● Amazon Web Services (AWS)
● Google Cloud Platform (GCP)
● Microsoft Azure (Azure).
Types of Tables:
1. Permanent Table:
● Default table type.
● Exists until explicitly dropped.
● Time Travel 90 days.
● Fail safe
2. Temporary Table:
● Used for transitory data.
● Persist for the duration of a session.
● Time Travel - 1 day or until the session ends.
● No Fail safe
3. Transient Table:
● Exists until explicitly dropped.
● No fail-safe period.
● Time Travel - 1 day
● No Fail safe
External Tables:
An external table is a Snowflake feature that allows you to query data stored in an
external stage as if the data were inside a table in Snowflake. The external stage is
not part of Snowflake, so Snowflake does not store or manage the stage.
Types of Views:
Non-Materialized views (Regular views):
A Non-Materialized view’s results are created by executing the query at the time that
the view is referenced in a query. The term “Views” generally refers to
Non-Materialized views.
Materialized Views:
A Materialized view is a database object that stores the pre-computed results of a
query definition of a view. While simple views allow us to save complicated queries
for future use, materialized views store a copy of the query results.
Secure Views:
A Secure View limits access to the data definition of the view so that the sensitive
data that should not be exposed to all users of the underlying table(s) stays hidden.
Virtual warehouse:
In Snowflake, a Virtual warehouse is one or more clusters endorsing users to carry
out operations like queries, data loading, and other DML operations. Virtual
warehouses approve users with the necessary resources like temporary storage,
CPU for performing various snowflake operations.
Multi-Cluster Warehouses:
Multi-cluster warehouses enable you to automatically scale out compute resources
to manage your concurrent users/queries needs as they change, such as during
peak and off hours. Multi-cluster warehouses are not beneficial for improving the
performance of slow-running queries or data loading. Instead Multi-cluster
Warehouses can spin-up additional warehouses to handle the workload.
Scaling Up:
Increasing the size of a virtual warehouse to provide more compute resources for a
specific workload.
Scaling Out:
Adding more compute clusters to a virtual warehouse to handle increased
concurrency and workload demands.
Auto-Suspend:
Automatically suspends a virtual warehouse when it is idle for a specified period,
saving costs.
Auto-Resume:
Automatically resumes a suspended virtual warehouse when a query is submitted,
ensuring availability without manual intervention.
Stages:
Staging is the process of uploading data into a stage in Snowflake. Data that needs
to be loaded or stored in Snowflake is stored elsewhere in the cloud, such as AWS
S3, GCP (Google Cloud Platform), Azure, or internally within Snowflake.
An external stage occurs when Data is stored in another cloud region.
An internal stage occurs when Data is stored within a snowflake.
Internal stages are subdivided into the following categories:
● Because each stage is associated with a specific user, they will be assigned
to all users by default for file storage.
● Table stages: Because each stage is associated with a specific database
table, they will be assigned to all tables by default.
● Internally named stages provide more flexibility than user or table stages. All
operations that can be performed on objects can also be performed internally
called stages because these are some of the Snowflake objects. These
stages must be created manually, and file formats can be specified.
Time Travel:
Allows users to query, clone, or restore data to a previous state within a defined
retention period. This feature aids in data recovery and auditing.
Zero-Copy Cloning:
Enables creating a clone of a database, schema, or table without copying the data.
Changes to the clone do not affect the original, and vice versa.
Data Sharing:
Snowflake allows secure sharing of data between different accounts without data
movement. Shared data can be accessed in real-time, ensuring consistency and
reducing latency.
Task:
Tasks are used to automate the execution of SQL statements, including procedural
logic, at specified intervals or upon completion of other tasks.
Stream:
Streams track changes to a table (inserts, updates, deletes) and provide a change
data capture (CDC) mechanism for efficient data processing.
Snowpipe:
Snowpipe is Snowflake's continuous data ingestion service, which allows for the
automated loading of data from external stages into Snowflake tables.
Data Masking:
Data masking provides a way to protect sensitive data by masking it in query results,
based on user roles. This ensures that sensitive information is not exposed to
unauthorized users.
Fail-Safe:
Fail-Safe is a Snowflake feature that provides an additional 7-day period for
recovering data after the Time Travel retention period has expired. This ensures data
recovery in case of failures.
Stored Procedures:
Stored procedures in Snowflake allow for procedural logic and complex operations to
be encapsulated in SQL or JavaScript, enabling automation and reusable code.
User-Defined Functions (UDFs):
UDFs allow users to define their own functions in SQL or JavaScript, extending
Snowflake's built-in functionality with custom logic.
System-defined Roles:
ORGADMIN
• Manages operations at organization level.
• Can create accounts in an organization.
• Can view all accounts in an organization.
• Can view usage information across an organization.
ACCOUNT ADMIN
• Top-level and most powerful role for an account.
• Encapsulates SYSADMIN & SECURITYADMIN.
• Responsible for configuring account-level parameters.
• View and operate on all objects in an account.
• View and manage Snowflake billing and credit data.
• Stop any running SQL statements.
SYSADMIN
• Can create warehouses, databases, schemas and other objects in an account
SECURITYADMIN
• Manage grants globally via the MANAGE GRANTS privilege.
• Create, monitor and manage users and roles.
USERADMIN
• User and Role management via CREATE USER and CREATE ROLE security
privileges.
• Can create users and roles in an account.
PUBLIC
• Automatically granted to every user and every role in an account.
• Can own securable objects, however objects owned by PUBLIC role are available
to every other user and role in an account