Snowflake
Each micro-partition generally contains 50 MB to 500 MB of uncompressed data. However, the stored size is smaller as Snowflake data is always stored with compression. Within each micro-partition, data is stored in a columnar format.
Types of Stages:
So in this lecture, I would like to go through a high-level view of Snowflake stages, what their types are, and how they are used.
Before I go into those details, I would like to familiarize you with this diagram here.
So, in the center of the screen, we have the Snowflake instance.
So this is basically the trial account that you would have created.
On the bottom, I have represented the on-premises systems. Often, when you have a cloud-based data warehouse, there is a lot of data coming from on-premises systems as well, so I have represented those here.
Additionally, there could be data coming from your cloud storage: S3 buckets and Azure Blob Storage, as well as Google Cloud Storage.
What I am going to do now is overlay the types of stages in Snowflake on this diagram, and hopefully you will be able to piece together where each type of stage is used.
Snowflake stages can be classified into two high-level types: external stages and internal stages.
An external stage refers to a storage location that is outside of Snowflake. External stages are commonly used to refer to storage locations on cloud platforms such as an AWS S3 bucket, Azure Blob Storage, or Google Cloud Storage.
Once an external stage has been defined, the COPY command can then be used to load data from that external stage into a Snowflake table.
Now, it is worth noting that an external stage is only a reference to the cloud storage. The stage itself does not store anything; it merely points to the cloud storage location.
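As a rough sketch (the stage name, bucket URL, and credentials below are placeholders, not from the lecture), defining and using an external stage looks roughly like this:
CREATE STAGE my_s3_stage                 -- named external stage
  URL = 's3://my-bucket/data/'           -- hypothetical bucket path
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
COPY INTO my_table                       -- load from the external stage into a table
  FROM @my_s3_stage;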
Next, we have the internal stages. Unlike external stages, internal stages store the data themselves.
There are three types of internal stages. First, there are named internal stages, which are stage objects that can be created, dropped, and modified as required. Once data has been uploaded into an internal stage, it can then be loaded into a table using the COPY command.
Another type of internal stage is the table stage, which is automatically created for each table and can be used to load data into that table.
And finally we have the user stage.
So each user gets a personal stage as well, which is created as soon as the user is created.
User stages are also a subtype of an internal stage.
Now, regardless of which internal stage you use, you must first upload the data into that stage before
running the copy command.
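A minimal sketch of that flow (the stage, table, and file paths are illustrative; PUT is run from a client such as SnowSQL):
CREATE STAGE my_int_stage;                      -- named internal stage
PUT file:///tmp/airports.csv @my_int_stage;     -- upload the local file into the stage
COPY INTO airports FROM @my_int_stage;          -- then load it into the table
-- Table stages and user stages are referenced with special prefixes:
PUT file:///tmp/airports.csv @%airports;        -- @%<table> is the table stage
PUT file:///tmp/airports.csv @~;                -- @~ is the current user's stage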
Now, in the coming lectures, we will look into the details of each type of stage and load data using each type of stage object.
LIST @LU_Airport_CSV_Stage;
Data ingestion is the overall process of bringing data from different sources into Snowflake (or any
system).
It covers the end-to-end pipeline: extracting data, moving it, staging it, and then loading it.
Sources can be files (CSV, JSON, Parquet), databases, APIs, streaming data, etc.
👉 Think of ingestion as “getting data into Snowflake, regardless of how or from where.”
Is data ingestion the same as data loading? Not exactly:
Data Loading → A step inside ingestion, where staged data is actually written into Snowflake tables (using COPY INTO). For example (with an illustrative target table name):
COPY INTO orders FROM @my_stage/orders.csv;
✅ In short:
Data loading = one step in ingestion (moving staged data into a table).
Q1: Is the data in a Snowflake internal stage stored in an encrypted format?
✅ Yes.
Snowflake uses AES-256 strong encryption for files stored in internal stages.
The encryption is fully managed by Snowflake (you don’t need to configure keys manually unless
you want customer-managed keys with Tri-Secret Secure).
Q2: Is the data encrypted prior to being transferred to a Snowflake internal stage?
✅ Yes.
When you run the PUT command from SnowSQL, the data is encrypted in transit using TLS
(Transport Layer Security) before it reaches Snowflake.
Once the file arrives in the stage, it is automatically compressed (gzip) and encrypted at rest.
Yes. All data in Snowflake is stored in an encrypted manner. So, data in tables and internal stages is
encrypted automatically.
Yes. SnowSQL encrypts data before performing the PUT operation, so the data is encrypted well before it is
uploaded.
Snowflake Objects
Account
User
Role
Virtual Warehouse
Resource Monitor
Integration
Database
Schema
Table
View
Stored Procedure
Stage
File Format
Pipe
Sequence
View Databases
To view all databases, a user can use the following SQL. It returns the details of all databases.
SHOW DATABASES
To clone a database, a user can use the following SQL; here TEST_DB_3 is the name of the new database, while DEMO_DB is the database being cloned −
CREATE DATABASE TEST_DB_3 CLONE "DEMO_DB"
Edit/Modify/Alter Warehouse
To alter/modify the warehouse, use the following query and run it −
ALTER WAREHOUSE "TEST_WH"
SET WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 1200
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 1
SCALING_POLICY = 'STANDARD'
COMMENT = ' '
View Warehouses
To view all listed warehouses, a user can use the following SQL. It returns the details of all listed warehouses.
SHOW WAREHOUSES
A schema is a grouping of database objects like tables, views, etc. Each schema belongs to a single database.
Database.Schema is the namespace in Snowflake. While performing any operation, the namespace must be provided, either directly in the query or by setting it up in Snowflake's UI.
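For instance (illustrative names), the namespace can be supplied either way:
USE DATABASE demo_db;                       -- set the database context
USE SCHEMA demo_db.public;                  -- set the schema context
SELECT * FROM customers;                    -- resolved as demo_db.public.customers
SELECT * FROM demo_db.public.customers;     -- or fully qualify the name in the query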
Table Types:-
Snowflake categorizes tables into different types based on their use and nature. There are four types of tables −
Permanent Table
Permanent tables are created in the database.
These tables are designed to store the data that requires highest level of data protection and
recovery.
Time travel is possible in these tables for up to 90 days, i.e., someone can retrieve the data as it was up to 90 days back.
Temporary Table
Temporary tables, as the name indicates, exist for a shorter duration.
If a user needs a temporary table for their subsequent queries and analytics within a session, the table is automatically dropped once the session ends.
Transient Table
These tables persist until the users drop or delete them.
They are used where data persistence is required but long-term data retention is not needed. For example, a website might keep the details of guest visitors and the details of visitors who also registered in two different tables; once a visitor registers, storing the details in two different tables might not be required.
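A quick sketch of creating each of these types (table and column names are illustrative):
CREATE TABLE orders (id NUMBER, amount NUMBER);                 -- permanent (default)
CREATE TEMPORARY TABLE orders_tmp (id NUMBER, amount NUMBER);   -- dropped when the session ends
CREATE TRANSIENT TABLE guest_visits (id NUMBER, page STRING);   -- persists, but without fail-safe and with limited time travel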
External Table
Here, the word "removed" is used because external tables reference data that lives outside of Snowflake, so they cannot be dropped or deleted in the usual sense; they are removed.
It can be visualized as Snowflake sitting over an external data lake, i.e., the data lake remains the main source and Snowflake points to it so the data can be used as per the user's need.
The data cannot be accessed directly. It can be accessed in Snowflake via an external stage.
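A hedged sketch of an external table definition (the stage path, file format, and column expression are illustrative and will vary by setup):
CREATE EXTERNAL TABLE ext_orders (
  order_id NUMBER AS (VALUE:c1::NUMBER)    -- columns are expressions over the VALUE variant
)
WITH LOCATION = @my_ext_stage/orders/      -- external stage pointing at the data lake
FILE_FORMAT = (TYPE = CSV)
AUTO_REFRESH = FALSE;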
View Types
There are three main categorized views in Snowflake −
Standard View
It is the default view type.
The underlying DDL is available to any role that has access to the view.
Secure View
Secure View means it can be accessed only by authorized users.
Authorized users with the proper role can access these views and execute queries against them.
For secure views, the Snowflake query optimizer bypasses optimizations used for regular views.
Materialized View
A materialized view is more like a table.
These views store the result from the main source using filter conditions. For example, a company has records of all employees who are active, inactive, or deceased since the company was founded. Now, if a user needs the details of active employees only, the main table can be queried and the result stored as a materialized view for further analytics.
A materialized view is auto-refreshed, i.e., whenever the main table gets additional/new employee records, the materialized view is refreshed as well.
Total costs for materialized views are based on "data storage + compute + serverless services."
Compute charges per materialized view are calculated based on the volume of data changes.
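Roughly, creating each kind looks like this (names and the filter are illustrative):
CREATE VIEW active_emps_v AS
  SELECT * FROM employees WHERE status = 'ACTIVE';        -- standard view
CREATE SECURE VIEW active_emps_sv AS
  SELECT * FROM employees WHERE status = 'ACTIVE';        -- accessible only to authorized roles; definition hidden
CREATE MATERIALIZED VIEW active_emps_mv AS
  SELECT * FROM employees WHERE status = 'ACTIVE';        -- result is stored and auto-refreshed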
SHOW TABLES
Key Concepts
Storage Layer: This is where all data is stored in a compressed, columnar format, optimized for
cloud storage.
Compute Layer (Virtual Warehouses): These are clusters of computing resources that perform all
data processing tasks. They can be resized based on workload requirements.
Cloud Services Layer: Manages metadata, query optimization, security, collecting and maintaining
statistics, and other services that ensure smooth operation.
Snowgrid: A cross-cloud system that connects regions and platforms (AWS, Azure, GCP) for
seamless collaboration, governance, and data replication globally.
Snowflake Cortex AI: Provides instant access to advanced AI tools, including large language models,
directly within the Snowflake AI Data Cloud.
Explore Snowflake's Storage Layer
The Storage Layer in Snowflake handles all data storage tasks, ensuring that data is efficiently organized,
compressed, and encrypted. This layer supports structured, semi-structured, and unstructured data,
providing flexibility for diverse data types.
🔑 Key Insights:
Data Compression and Storage Formats: Snowflake uses advanced compression algorithms to
reduce storage costs and improve performance. Data is stored in a columnar format, which
optimizes query speed and storage efficiency.
Data Encryption: All data in Snowflake is encrypted both at rest and in transit, ensuring high levels
of security.
A financial services (FinServ) company needs to store large volumes of transactional data securely. By using
Snowflake’s Storage Layer, they can efficiently compress and encrypt the data, ensuring both cost-
effectiveness and security.
Best Practice: Always compress data before storage to optimize costs and enhance performance.
🧠 Questions to Consider:
1. How does Snowflake’s use of data compression benefit data storage and cost management?
2. Why is data encryption important, and how does Snowflake implement it?
The Compute Layer is where all data processing in Snowflake occurs. This layer consists of virtual
warehouses—independent clusters of compute resources that can be scaled up or down based on the
workload.
🔑 Key Insights:
Scaling and Managing Virtual Warehouses: Virtual warehouses can be resized, suspended, or
resumed to optimize cost and performance.
Workload Isolation: Each virtual warehouse operates independently, ensuring that the performance
of one does not affect another. This isolation is crucial for maintaining consistent performance
across different workloads.
Best Practice: Regularly monitor warehouse usage and adjust configurations based on actual
workload requirements.
🧠 Questions to Consider:
1. What are the cost implications of running multiple virtual warehouses simultaneously?
The Cloud Services Layer coordinates all Snowflake operations, handling activities such as query
optimization, metadata management, and security enforcement. This layer ensures that the platform
operates efficiently and securely.
🔑 Key Insights:
Query Optimization: Snowflake’s cloud services optimize queries to ensure fast and efficient data
retrieval.
Security Management: This layer manages all security protocols, including user authentication and
access control, to protect data integrity and privacy.
A data science team needs to run complex queries involving multiple datasets. Snowflake’s query
optimization feature ensures these queries are executed quickly, saving time and computing resources.
🧠 Questions to Consider:
1. How does the Cloud Services Layer contribute to overall performance in Snowflake?
2. What are some of the security management functions performed by the Cloud Services Layer?
Database Storage
Query Processing
Cloud Services
Database Storage
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized,
compressed, columnar format. Snowflake stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored — the organization, file size, structure,
compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data
objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible
through SQL query operations run using Snowflake.
Query Processing
Query execution is performed in the processing layer. Snowflake processes queries using “virtual
warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes
allocated by Snowflake from a cloud provider.
Each virtual warehouse is an independent compute cluster that does not share compute resources with
other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other
virtual warehouses.
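As an illustration of that isolation (warehouse names and sizes are hypothetical), separate workloads are usually given separate warehouses:
CREATE WAREHOUSE etl_wh WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
CREATE WAREHOUSE bi_wh  WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;
-- Heavy ETL running on etl_wh does not slow down dashboards running on bi_wh.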
Cloud Services
The cloud services layer is a collection of services that coordinate activities across Snowflake. These
services tie together all of the different components of Snowflake in order to process user requests, from
login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake
from the cloud provider.
Services managed in this layer include:
Authentication
Infrastructure management
Metadata management
Access control
Connecting to Snowflake
A web-based user interface from which all aspects of managing and using Snowflake can be
accessed.
Command line clients (e.g. SnowSQL) which can also access all aspects of managing and using
Snowflake.
ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect to
Snowflake.
Native connectors (e.g. Python, Spark) that can be used to develop applications for connecting to
Snowflake.
Third-party connectors that can be used to connect applications such as ETL tools (e.g. Informatica)
and BI tools (e.g. ThoughtSpot) to Snowflake.
VALIDATION_MODE in the Snowflake COPY INTO statement is super useful when you want to test your
data load before actually inserting anything into the target table.
It tells Snowflake to validate the input files during a COPY INTO operation instead of (or in addition
to) loading the data.
This helps you catch errors before polluting your table with bad rows.
🔹 Syntax
COPY INTO <target_table>
FROM @<stage>
VALIDATION_MODE = RETURN_<mode>;
🔹 Modes of VALIDATION_MODE
1. RETURN_ERRORS
o Returns the errors for rows that failed during the load (e.g., type mismatch, missing fields) across all files in the COPY.
o Example:
o COPY INTO my_table
o FROM @my_stage
o FILE_FORMAT = (FORMAT_NAME = my_csv_format)
o VALIDATION_MODE = RETURN_ERRORS;
2. RETURN_ALL_ERRORS
o Returns all errors across all files, including errors from files that were partially loaded during an earlier load with ON_ERROR = CONTINUE.
3. RETURN_n_ROWS (e.g., RETURN_5_ROWS)
o Validates the specified number of rows and returns them if no errors are found; otherwise it fails at the first error encountered.
o Example:
o COPY INTO my_table
o FROM @my_stage
o FILE_FORMAT = (FORMAT_NAME = my_csv_format)
o VALIDATION_MODE = RETURN_5_ROWS;
🔹 Example
COPY INTO my_table
FROM @my_stage
VALIDATION_MODE = RETURN_ERRORS;
✅ In short: VALIDATION_MODE checks the staged files and returns errors or sample rows; it does not load any data into the target table.
What happens when you run a COPY statement on a file that has already been loaded into the table?
A. It would skip loading the file. This is the right answer; by default files are not re-loaded. This is done to prevent duplicates: the load metadata is stored with the table, so COPY knows that the file has already been loaded. Truncate the table and you can reload the file again.
How can you load duplicate data into a table with the COPY statement?
A. FORCE = TRUE
What is the default behavior when COPY encounters bad rows in a file?
A. Abort the statement and roll back all the rows. This is the right answer; by default Snowflake aborts loading if any of the files it is loading contains bad rows, and the entire load is rolled back.
You are expecting some oversized text data to appear in your data files and want to truncate the column data and load it. How will you do that?
A. TRUNCATECOLUMNS = TRUE. This is the right answer; when TRUNCATECOLUMNS = TRUE, any data longer than the column length will be truncated and loaded into the target table.
You are loading 500 small files using a single COPY statement from Snowsight and want to see only the names of partially loaded files in the output. How can you do that?
A. RETURN_FAILED_ONLY = TRUE. This is the right answer; when RETURN_FAILED_ONLY is set to TRUE, only files that have rejected rows are listed by name in the output. Files with no rejected rows are not listed; instead, the count of successfully loaded files is shown along with the number of rows loaded.
What happens to the target table when you run a COPY statement with VALIDATION_MODE?
A. No rows are inserted into the target table. This is the right answer; VALIDATION_MODE displays rows or errors, it does not insert any data into the target table.
You have 900 files of the exact same structure, each of 1 GB, that need to be loaded into a single target table. You want to limit the number of files processed by a single COPY statement to 1 every time it is run. How can you do that?
A. SIZE_LIMIT. This is the right answer; SIZE_LIMIT can be used to limit the number of files that are picked up by the COPY statement. Once the number of bytes provided in SIZE_LIMIT is exceeded, COPY stops picking up new files for loading and exits. The COPY can then be re-run to load the next set of files. To ensure that just one file is picked, set SIZE_LIMIT = 0.
You create a COPY statement with COPY option SIZE_LIMIT and are trying to determine what would be
the best value to provide.
In Snowflake, once a column is created with a VARCHAR type, you cannot reduce its maximum length (e.g., from VARCHAR(16777216), which is the STRING data type, to VARCHAR(20)). That’s why you’re seeing this error.
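A small sketch of what works and what fails (table and column names are hypothetical):
ALTER TABLE customers ALTER COLUMN name SET DATA TYPE VARCHAR(500);   -- increasing the length is allowed
ALTER TABLE customers ALTER COLUMN name SET DATA TYPE VARCHAR(20);    -- decreasing it fails with this error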
To check the load history of a table, you can query the COPY_HISTORY table function (the table name and time window below are placeholders):
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => '<table_name>',
    START_TIME => DATEADD(hours, -24, CURRENT_TIMESTAMP())
  )
);
Good question 👍
In Snowflake:
VARCHAR or STRING (without a number) defaults to the maximum allowed length: 16,777,216
characters.
🔑 Differences:
VARCHAR(20) stores up to 20 characters only. If you try to insert a longer value, Snowflake raises an error (it does not silently truncate; truncation during loading only happens when TRUNCATECOLUMNS = TRUE is used with COPY).
⚡ Storage:
It doesn’t allocate the full VARCHAR(20) width for every row — it only stores what’s needed for each
value, plus metadata.
So storage size is the same whether you use VARCHAR(20) or STRING if your data values are the
same.
o STRING → up to 16 MB.
✅ So to answer you:
Yes — the same values will consume the same storage, whether the column is VARCHAR(20) or STRING.
The only difference is the maximum length allowed.
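If you want to check the maximum length actually used in a column (illustrative table and column names):
SELECT MAX(LENGTH(name)) AS max_len_used
FROM customers;    -- if this is well under 20, VARCHAR(20) is comfortably enough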
NUMBER(p,s)
INT, INTEGER, BIGINT, SMALLINT are all just aliases for NUMBER(38,0) (max 38 digits, no decimals).
✅ Example
INT → actually NUMBER(38,0) → allows up to 38 digits (e.g., way bigger than typical SQL int).
⚡ Storage behavior
It stores numbers in a compressed, variable-length format, based on actual size of the value.
So inserting 123 into NUMBER(10) and into INT → will use the same storage.
🔑 Short answer:
No performance or storage benefit when you use NUMBER(10) instead of INT in Snowflake.
Why?
o Storing 123 takes the same space whether the column is NUMBER(10) or INT.
Even though there’s no performance/storage gain, it’s still useful for data integrity:
If a column should only have up to 10 digits (e.g., phone number, employee_id, account_id),
defining NUMBER(10) enforces that.
Makes your schema self-documenting → other developers know the expected range.
⚡ Example (with hypothetical names)
CREATE TABLE t1 (id NUMBER(10));
If someone tries:
INSERT INTO t1 VALUES (12345678901);   -- 11 digits: this fails because it exceeds the 10 digits allowed by NUMBER(10)
✅ So:
No performance/storage benefit.
Which data format is used for storing and transmitting data between applications? JSON.
FORCE=TRUE
PURGE = TRUE
ON_ERROR = SKIP_FILE
ON_ERROR=ABORT_STATEMENT
SIZE_LIMIT = 2000
RETURN_FAILED_ONLY = TRUE
VALIDATION_MODE = 'RETURN_ERRORS'
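As a rough illustration of how these options appear in practice (table, stage, and values are placeholders; VALIDATION_MODE is shown separately because it validates without loading):
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV)
ON_ERROR = SKIP_FILE          -- skip a file when it contains bad rows
FORCE = TRUE                  -- reload files even if they were loaded before
PURGE = TRUE                  -- delete staged files after a successful load
SIZE_LIMIT = 2000             -- stop picking up new files once this many bytes have been loaded
RETURN_FAILED_ONLY = TRUE;    -- list only files that had rejected rows
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV)
VALIDATION_MODE = RETURN_ERRORS;   -- validate only; nothing is inserted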
You can define a warehouse for a task. If you don't mention one, the task runs on Snowflake-managed (serverless) compute, and the default initial warehouse size for such tasks is MEDIUM.
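A hedged sketch of both forms (names, schedule, and the statement are illustrative):
CREATE TASK nightly_load_task
  WAREHOUSE = etl_wh                                   -- user-managed warehouse
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  INSERT INTO orders_summary SELECT * FROM orders_staging;
CREATE TASK serverless_task
  SCHEDULE = '60 MINUTE'                               -- no warehouse: serverless compute
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'  -- override the MEDIUM default
AS
  INSERT INTO orders_summary SELECT * FROM orders_staging;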
A masking policy body is typically a CASE expression like this (the policy name and role here are illustrative):
CREATE MASKING POLICY phone_mask AS (val STRING) RETURNS STRING ->
CASE
  WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN val
  ELSE '#######################'
END;
The data type of the input value (val), the data type of the masked column (the phone number), and the RETURNS data type should all match.
Once you set a masking policy on a column, if you want to drop the masking policy or change its conditions (like adding more roles to the masking policy), you must first unset the masking policy from the column and only then make the change.
For example (illustrative table and column names, continuing the policy above):
ALTER TABLE employees MODIFY COLUMN phone UNSET MASKING POLICY;    -- unset first
CREATE OR REPLACE MASKING POLICY phone_mask AS (val STRING) RETURNS STRING ->
CASE
  WHEN CURRENT_ROLE() IN ('HR_ADMIN', 'PAYROLL') THEN val          -- an extra role added
  ELSE '#######################'
END;
Note: you cannot apply two different masking policies to a single column.
If a masking policy is set on a table and you try to copy or clone the table, the data in the new table will contain the masked values.
The copy fetches the masked data only, not the original data.
Q. What if we create a new table from existing data in another table that has a masking policy? Will the policy also be set on the new table?
A. No, that masking policy won't be set on the new table, but the masked values will be directly inserted into the new table.
Dropping a table doesn't automatically drop the masking policies. Even if you drop the table, the masking policy associated with it will still exist.
The Masking policy existence is completely different from the existence of a table that has a masking policy
on it. A given policy can be used on multiple tables or multiple columns in the same table and hence
dropping the table does not cause the masking policy to be dropped. If you need to drop the masking
policy, all use of the policy must be stopped/removed and then the policy can be dropped.
When data returned from a SELECT is masked, any table created with either CLONE or Create Table As
SELECT will have the data as masked. This data in the newly created table is permanently masked like static
masking and cannot be reversed. Please use a Role that has access to the unmasked data if you want to
create a table that has no masked data
FALSE
Masking policies can be used on multiple columns in the same or different tables. What is important is to make sure that the RETURNS data type and the data type of the column being masked are exactly the same.
Which view will you use to locate the tables and columns that are being masked by a Policy?
INFORMATION_SCHEMA.POLICY_REFERENCES
INFORMATION_SCHEMA.POLICY_REFERENCES can be used to locate the tables and columns that are actively using the masking policy. You can run SHOW MASKING POLICIES to see all the masking policies available and then use INFORMATION_SCHEMA.POLICY_REFERENCES to get the table names and column names that are masked.
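For instance (the policy name is a placeholder), the table function form looks like this:
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.POLICY_REFERENCES(POLICY_NAME => 'phone_mask')
);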
You cannot add multiple databases to a single share object, but you can add multiple schemas, tables, and views from a single database.
Example situation: suppose I have created a share for database HRMS with schema HR and all its tables, and after creating the share I assigned it to a consumer account. If I then create any other tables in the same schema or database, the newly created tables will not show up in the share, for security reasons. If we want to add the newly created table, we must grant the privilege on that table to the share.
If any changes or updates are made to a table that has been added to the share object, the changes will also be reflected in the consumer account.
You cannot add or grant privileges on a normal view (one that is not a secure view) to a share. To do so, you have to alter the share object's attribute SECURE_OBJECTS_ONLY = FALSE; it will then allow non-secure views to be added to the share as well.
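A rough end-to-end sketch using the HRMS example above (the consumer account identifier is a placeholder):
CREATE SHARE hrms_share;
GRANT USAGE ON DATABASE hrms TO SHARE hrms_share;
GRANT USAGE ON SCHEMA hrms.hr TO SHARE hrms_share;
GRANT SELECT ON ALL TABLES IN SCHEMA hrms.hr TO SHARE hrms_share;
GRANT SELECT ON TABLE hrms.hr.new_table TO SHARE hrms_share;    -- a table created after the share must be granted explicitly
ALTER SHARE hrms_share SET SECURE_OBJECTS_ONLY = FALSE;         -- allow non-secure views to be added
ALTER SHARE hrms_share ADD ACCOUNTS = <consumer_account>;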
And also the underlying tables are not from the same database.
I will switch back to the producer account and use the account admin role for the next step.
I will follow this up by creating a table called Daily Sales, which has three columns employee ID,
I will join the two tables and select data from them to make sure that the tables are accessible.
Now, based on the same join condition, I am going to create a secure view.
This view is different from a previous view because it uses two tables and they are both from different
databases.
The grant failed because I am referencing the SALES_DB object in the view being added to the share.
The HRMS share does not have usage privileges on the SALES_DB database and the sales schema.
Now, if I try to grant usage on SALES_DB and the sales schema to the share, that fails too, because objects from only one database can be given to a share, and SALES_DB is not that database.
The way to do that is by granting REFERENCE_USAGE on database SALES_DB to the HRMS share. REFERENCE_USAGE gives the share SELECT access on the tables inside SALES_DB that are referenced by the secure view.
So if you have tables from multiple databases that need to be referenced inside a view, and that view needs to be added to a share, you can grant REFERENCE_USAGE on that database to the share, allowing the share to get SELECT access on those tables without giving it full access to the database.
Now that the grant is done, I will attempt to add the view to the share again.
This is how you can share data from multiple databases with a view over multiple tables and a single share.
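A condensed sketch of the grants from this walkthrough (database, schema, view, and share names mirror the lecture but are illustrative):
CREATE SECURE VIEW hrms.hr.emp_daily_sales_v AS
  SELECT e.employee_id, s.sale_amount
  FROM hrms.hr.employees e
  JOIN sales_db.public.daily_sales s ON e.employee_id = s.employee_id;
GRANT REFERENCE_USAGE ON DATABASE sales_db TO SHARE hrms_share;     -- lets the shared secure view reference SALES_DB tables
GRANT SELECT ON VIEW hrms.hr.emp_daily_sales_v TO SHARE hrms_share;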