How to test an Azure Data Pipeline?
Shivanand Veerbhadrannavar
Innovative Technical Manager | Data Quality & Analysis Expert | ML and
Strategic Data Management Leader
Published Oct 15, 2023
In today’s data-driven world, organisations often need to
extract, transform, and load (ETL) data from various sources
to support their business processes. Azure Data Factory
(ADF) is one of the powerful cloud-based services provided by
Microsoft that enables you to build and manage ETL pipelines
at scale. In this blog, we will walk you through the process of
using Azure Data Factory to validate an ETL pipeline.
Prerequisites
To follow along with this blog, you will need:
1. An Azure subscription: Sign up for a free Azure
account if you don’t have one
2. Azure Data Factory: Create an Azure Data Factory
instance in your Azure portal
3. Knowledge of Azure cloud services and their uses,
such as Storage accounts, SQL Server, SQL Database,
etc.
Steps to create data pipeline in ADF:
Step 1: Set up Azure Data Factory
1. Go to the Azure portal and create a new Azure Data
Factory instance
2. Provide the necessary details such as name,
subscription, resource group, and location
3. Once the Data Factory instance is created, navigate
to it and click on the “Author & Monitor” button to
open the Azure Data Factory user interface
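If you prefer to script this setup, the Data Factory instance can also be provisioned programmatically. Below is a minimal sketch using the azure-identity and azure-mgmt-datafactory Python packages; the subscription ID, resource group, factory name, and region are placeholders you would replace with your own values, and exact model names can vary slightly between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Placeholder values - replace with your own subscription and resource group
subscription_id = "<your-subscription-id>"
resource_group = "etl-test-rg"
factory_name = "etl-test-adf"

# Authenticate with whatever credential is available (CLI login, managed identity, etc.)
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Create (or update) the Data Factory instance in the chosen region
factory = adf_client.factories.create_or_update(
    resource_group, factory_name, Factory(location="eastus")
)
print(f"Provisioned data factory: {factory.name}, state: {factory.provisioning_state}")
```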
Step 2: Create Linked Services
1. Click on the “Author” button in the Azure Data
Factory user interface to start building your ETL
pipeline
2. Begin by creating linked services, which represent
the connection information to your data sources and
destinations
3. Click on the “Manage” tab, select the desired type of
data source or destination, and provide the required
connection details. For instance, you can create
linked services for Azure SQL Database, Azure Blob
Storage, or an on-premises SQL Server
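The same linked services can also be registered from code. The sketch below assumes the adf_client, resource_group, and factory_name from the previous sketch; the connection strings are placeholders (real secrets belong in Azure Key Vault), and the model or parameter names may differ slightly between SDK versions.

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureBlobStorageLinkedService,
    AzureSqlDatabaseLinkedService,
    SecureString,
)

# Placeholder connection strings - store real secrets in Azure Key Vault instead
blob_conn = SecureString(value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")
sql_conn = SecureString(value="Server=tcp:<server>.database.windows.net;Database=<db>;User ID=<user>;Password=<pwd>")

# Linked service for the source Blob Storage account
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string=blob_conn)
)
adf_client.linked_services.create_or_update(resource_group, factory_name, "ls_blob_source", blob_ls)

# Linked service for the destination Azure SQL Database
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=sql_conn)
)
adf_client.linked_services.create_or_update(resource_group, factory_name, "ls_sql_target", sql_ls)
```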
Step 3: Create Datasets
1. After setting up linked services, proceed to create
datasets. Datasets define the structure and location
of your source and destination data
2. Click on the “Author” tab and select the “Datasets”
tab
3. Create a dataset for each data source and
destination, specifying the format, location, and
linked service information. For instance, you can
create a dataset for a CSV file in Azure Blob Storage
or a table in an Azure SQL Database
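As a rough sketch, and again assuming the client and linked services created above, a delimited-text dataset for the source CSV and a table dataset for the SQL destination could be defined as follows; the container, file, schema, and table names are illustrative only.

```python
from azure.mgmt.datafactory.models import (
    DatasetResource,
    DelimitedTextDataset,
    AzureSqlTableDataset,
    AzureBlobStorageLocation,
    LinkedServiceReference,
)

# Reference the linked services registered earlier
blob_ls_ref = LinkedServiceReference(reference_name="ls_blob_source", type="LinkedServiceReference")
sql_ls_ref = LinkedServiceReference(reference_name="ls_sql_target", type="LinkedServiceReference")

# Source dataset: a CSV file in a blob container (names are illustrative)
source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=blob_ls_ref,
        location=AzureBlobStorageLocation(container="<source-container>", file_name="<source-file>.csv"),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(resource_group, factory_name, "ds_source_csv", source_ds)

# Destination dataset: a table in the Azure SQL Database
sink_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=sql_ls_ref,
        schema_type_properties_schema="dbo",
        table="<target-table>",
    )
)
adf_client.datasets.create_or_update(resource_group, factory_name, "ds_sink_table", sink_ds)
```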
Step 4: Build Pipelines
1. With the linked services and datasets in place, it’s
time to create pipelines that define the ETL workflow
2. Click on the “Author” tab, select the “Pipelines” tab,
and click on the “New pipeline” button
3. Drag and drop activities onto the pipeline canvas to define
the ETL steps
4. Configure each activity based on the data movement or
transformation required. For instance, you can use the “Copy
data” activity to move data from a source dataset to a
destination dataset, or the “Data Flow” activity to perform
complex transformations using Azure Data Factory Data
Flows (a sketch follows after this list)
5. For reference, predefined pipeline templates allow you to
get started quickly with Data Factory. Templates are useful
when you are new to Data Factory, and they reduce the
development time for building data integration projects,
thereby improving developer productivity
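To make item 4 concrete, here is a minimal sketch of a pipeline containing a single Copy activity between the datasets defined earlier, followed by a manual trigger of a run. It assumes the same client and resource names as the previous sketches, and the source/sink types shown are one reasonable pairing rather than the only option.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource,
    CopyActivity,
    DatasetReference,
    DelimitedTextSource,
    AzureSqlSink,
)

# A Copy activity that moves data from the CSV dataset to the SQL table dataset
copy_activity = CopyActivity(
    name="CopyCsvToSql",
    inputs=[DatasetReference(reference_name="ds_source_csv", type="DatasetReference")],
    outputs=[DatasetReference(reference_name="ds_sink_table", type="DatasetReference")],
    source=DelimitedTextSource(),
    sink=AzureSqlSink(),
)

# Wrap the activity in a pipeline and publish it
pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(resource_group, factory_name, "pl_copy_csv_to_sql", pipeline)

# Trigger a run on demand; the run id is used later for monitoring
run_response = adf_client.pipelines.create_run(resource_group, factory_name, "pl_copy_csv_to_sql")
print(f"Started pipeline run: {run_response.run_id}")
```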
Step 5: Monitor and Manage ETL Pipeline
1. After building the ETL pipeline, it’s crucial to monitor
its execution and manage its performance
2. In the Azure Data Factory user interface, click on the
“Monitor” button to access the monitoring dashboard
3. Monitor the pipeline runs, track data movement, and
troubleshoot any issues that arise during execution.
4. Utilise the integration with Azure Monitor and Azure
Log Analytics for more advanced monitoring and
analytics capabilities
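Beyond the Monitor dashboard, run status can also be checked programmatically, which is handy for automated test suites. A rough sketch with the same SDK, using the run_id returned when the pipeline was triggered in the previous step:

```python
import time
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import RunFilterParameters

# Poll the pipeline run until it reaches a terminal state
run = adf_client.pipeline_runs.get(resource_group, factory_name, run_response.run_id)
while run.status in ("Queued", "InProgress"):
    time.sleep(15)
    run = adf_client.pipeline_runs.get(resource_group, factory_name, run_response.run_id)
print(f"Pipeline run finished with status: {run.status}")

# List the individual activity runs for troubleshooting
filter_params = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(hours=1),
    last_updated_before=datetime.now(timezone.utc) + timedelta(hours=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, run_response.run_id, filter_params
)
for act in activity_runs.value:
    print(act.activity_name, act.status, act.error)
```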
Step 6: Test and Validate the Pipeline
1. Test the ETL pipeline by running it manually or
scheduling it to run at specific intervals
2. Monitor the execution of the pipeline and verify that
each Activity completes successfully
3. Validate the data movement, transformation, and
loading processes by checking the output in the
destination system or running queries against the
data
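One way to automate the validation in item 3 is a simple row-count comparison between the source file and the destination table. Below is a minimal sketch assuming pyodbc and local access to the source CSV; the connection details, file name, and table name are placeholders.

```python
import csv
import pyodbc

# Placeholder connection string for the destination database
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<pwd>"
)

# Count data rows in the source CSV (skipping the header)
with open("<source-file>.csv", newline="") as f:
    source_rows = sum(1 for _ in csv.reader(f)) - 1

# Count rows loaded into the destination table
with pyodbc.connect(conn_str) as conn:
    target_rows = conn.cursor().execute("SELECT COUNT(*) FROM dbo.TargetTable").fetchone()[0]

# The pipeline should have loaded every source record exactly once
assert source_rows == target_rows, f"Row count mismatch: {source_rows} vs {target_rows}"
print(f"Row counts match: {source_rows}")
```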
By following these steps, you can create an ETL pipeline in
Azure Data Factory for testing practice. Remember to iterate
and refine your pipeline based on feedback and continuously
improve the testing process.
Validation scenarios
As a tester for an Azure Data Factory pipeline, your role is to
ensure that the ETL process runs smoothly and that the data
transformation is accurate and reliable. Before starting, it is
good to have an understanding of the data model, the
source-to-target mapping, the data dictionary, metadata
information, and environment readiness. Here is a list of
validations to consider when testing an Azure Data Factory
pipeline:
Remember to document your test cases, including inputs,
expected outputs, and actual results, to facilitate tracking
and issue resolution. Regular regression testing should also
be conducted when changes or updates are made to the
pipeline to ensure ongoing reliability and accuracy of the ETL
process.
Sample Use cases
Use Case 1: Copying Data from Azure Blob Storage to
Azure SQL Database
Let’s consider an example where we want to copy data from
a CSV file stored in Azure Blob Storage to an Azure SQL
Database table.
1. Create linked services for Azure Blob Storage and
Azure SQL Database, providing the necessary
connection information
2. Create datasets for the source CSV file and the
destination SQL Database table, specifying the
respective linked services and file formats.
3. Build a pipeline
The pipeline above first truncates the table containing the
previously loaded data and then loads the newly received CSV
file as defined by the input dataset. In this run, both activities
complete successfully.
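Because this pipeline truncates the target before loading, a useful automated check is that the table holds exactly one copy of the data and that mandatory columns were populated. The sketch below reuses the pyodbc approach from Step 6; the table and column names are purely illustrative, not taken from the pipeline above.

```python
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<pwd>"
)

with pyodbc.connect(conn_str) as conn:
    cur = conn.cursor()

    # No duplicate business keys should survive a truncate-and-reload
    dupes = cur.execute(
        "SELECT BusinessKey, COUNT(*) AS cnt "
        "FROM dbo.TargetTable GROUP BY BusinessKey HAVING COUNT(*) > 1"
    ).fetchall()
    assert not dupes, f"Duplicate keys found after reload: {dupes}"

    # Spot-check that a mandatory column was populated for every row
    nulls = cur.execute(
        "SELECT COUNT(*) FROM dbo.TargetTable WHERE MandatoryColumn IS NULL"
    ).fetchone()[0]
    assert nulls == 0, f"{nulls} rows loaded with a missing mandatory value"

print("Truncate-and-load validation passed")
```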
You may also consider some negative scenarios, such as
running the pipeline with a missing source file or a missing
target table. Appropriate error messages with meaningful
information should be generated for troubleshooting.
Missing target table:
Missing input file:
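For these negative scenarios, an automated test can assert that the run fails and that the error message carries useful detail. A minimal sketch, reusing the adf_client from the earlier steps and assuming failed_run_id is the ID of a run triggered against the deliberately broken input:

```python
# failed_run_id refers to a run triggered with the missing file or table
run = adf_client.pipeline_runs.get(resource_group, factory_name, failed_run_id)

# The run must fail, and the message should point at the missing object
assert run.status == "Failed", f"Expected a failed run, got {run.status}"
assert run.message, "Failed run should carry a troubleshooting message"
print(run.message)
```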
Suggestions:
All defined columns will have a fixed mapping by default. A
fixed mapping takes a defined, incoming column and maps it
to an exact name.
If necessary, use schema drift throughout your flow to protect
against schema changes from the sources.
When schema drift is enabled, all incoming fields are read
from your source during execution and passed through the
entire flow to the Sink. By default, all newly detected
columns, known as drifted columns, arrive as a string data
type. If you wish for your data flow to automatically infer data
types of drifted columns, check Infer drifted column types in
your source settings.
Use Case 2: Delta data loading from SQL DB
Delta load refers to the process of identifying and loading
only the changed or new data since the last execution of the
pipeline. It involves comparing the source data with the
previously processed data and selecting the records that
meet specific criteria, such as updated timestamps or new
identifiers.
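One common way to implement this is a high-watermark pattern: keep the last processed timestamp, select only rows modified after it, and advance the watermark once the load succeeds. The sketch below shows the core logic in Python with pyodbc; the watermark table, source table, and column names are illustrative and not taken from the pipeline shown here.

```python
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<pwd>"
)

with pyodbc.connect(conn_str) as conn:
    cur = conn.cursor()

    # Read the last successfully processed timestamp (the high watermark)
    old_watermark = cur.execute(
        "SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'SourceOrders'"
    ).fetchone()[0]

    # Only rows changed since the previous run qualify as the delta
    delta_rows = cur.execute(
        "SELECT * FROM dbo.SourceOrders WHERE LastModifiedDate > ?", old_watermark
    ).fetchall()
    print(f"{len(delta_rows)} delta rows identified")

    # After the delta has been loaded downstream, advance the watermark
    cur.execute(
        "UPDATE dbo.WatermarkTable SET WatermarkValue = "
        "(SELECT MAX(LastModifiedDate) FROM dbo.SourceOrders) "
        "WHERE TableName = 'SourceOrders'"
    )
    conn.commit()
```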
The pipeline below is built to load delta records. In this run,
three rows are identified as the delta.
By implementing delta load in your ADF pipeline, you can
efficiently process incremental data updates, reduce
processing time and resources, and keep your destination
system synchronised with the changes happening in the
source system.
Use Case 3: File archive
File archiving involves moving files from the primary storage
location to a separate storage container or archive. This
process is typically performed to free up space in the primary
storage and ensure long-term retention of files while still
allowing access if needed.
Check that the DimAccount.txt file was deleted from the blob
storage container ‘etltestsourcefiles’ and moved to the archive
folder ‘archivedfiles’.
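This check can also be automated. The sketch below uses the azure-storage-blob Python package to confirm that the file is gone from the source container and present in the archive location; the connection string is a placeholder, the container and file names follow the example above, and you may need to adjust the blob path if your archive target is a folder inside another container.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string for the storage account
service = BlobServiceClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
)

source_blob = service.get_blob_client(container="etltestsourcefiles", blob="DimAccount.txt")
archive_blob = service.get_blob_client(container="archivedfiles", blob="DimAccount.txt")

# After archiving, the file should be gone from the source and present in the archive
assert not source_blob.exists(), "DimAccount.txt is still present in the source container"
assert archive_blob.exists(), "DimAccount.txt was not found in the archive location"
print("Archive validation passed")
```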
By incorporating file archiving in Azure Data Pipeline creation,
organisations can effectively manage data retention, improve
system performance, ensure compliance, and enable efficient
data backup and recovery.
Challenges and Solutions
As the tester responsible for testing the Azure data pipeline, I
wanted to communicate the potential challenges that we may
face during the testing process and how we plan to overcome
them.
Conclusion
Azure Data Factory provides a robust platform for building
data pipelines, allowing testers to define validation
checkpoints at different stages of the pipeline. From
connectivity and configuration checks to data completeness,
integrity, and transformation accuracy, testers can design
comprehensive test cases to verify the data movement,
transformations, and loading processes.
Effective data validation in Azure Data Pipelines requires
meticulous planning, well-designed test cases, and
continuous monitoring. Testers should validate the data
against expected results, check for any discrepancies or
errors, and ensure compliance with industry regulations and
security standards.
In conclusion, Azure Data Pipeline provides a powerful
platform for testers to perform data validations, ensuring the
accuracy, integrity, and quality of the data being processed.
With the right approach, thorough testing, and adherence to
best practices, testers can contribute to the success of Azure
Data Factory projects and enable organisations to make
informed decisions based on trustworthy data.