Regression Testing Process
Regression testing in ETL (Extract, Transform, Load) is crucial to ensure that changes or new
deployments to an ETL pipeline do not negatively impact existing functionality or data integrity. Unlike unit
testing, which focuses on individual components, regression testing aims to confirm that the entire data
flow, from source to destination, continues to operate correctly after modifications.
I. Core Concepts of Regression Testing:
● Definition: Re-executing old test cases across multiple releases or builds to ensure that changes
(addition, deletion, modification of code) have not affected the existing functionality.
● Purpose: To confirm that existing functionalities remain stable and unaffected by new changes.
● Release: The final product or software handed to the customer.
● Build: A piece of software installed on the testing server, which needs to be tested for stability.
● Impact Analysis: The process of identifying the areas of the system that are likely to be affected
by new changes.
II. Phases of Regression Testing:
1. Impact Analysis (Identifying Affected Areas):
○ This is a critical first step to determine which parts of the system might be impacted by
recent changes.
○ Stakeholders involved:
■ Customer: Provides input based on business knowledge regarding potential
impacts.
■ Developer: Offers insights based on coding knowledge and understanding of the
changes implemented.
■ Tester (TE): Plays a major role in identifying the impact area based on product
knowledge.
○ Output: A report identifying the impact area of affected functionalities (e.g., login page,
homepage, timeline).
2. Test Case Selection:
○ Based on the Impact Analysis report, select relevant test cases for re-execution.
○ Types of Regression Testing based on scope:
■ Unit Regression Testing: Tests only the changed part of the code/functionality.
■ Regional Regression Testing: Tests the changed part together with the areas it affects.
■ Full Regression Testing: Involves comprehensive testing of the entire system, typically
performed for major releases or significant architectural changes.
○ Prioritization: Prioritize test cases based on criticality, frequency of use, and areas with
high impact.
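The selection and prioritization step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `RegressionCase` fields, the 1-3 criticality scale, the usage score, and the sample test names are all hypothetical, and the impacted areas stand in for the output of an impact analysis report.

```python
from dataclasses import dataclass


@dataclass
class RegressionCase:
    name: str
    area: str         # functional area the test covers
    criticality: int  # hypothetical scale: 1 = low, 3 = high
    usage: int        # hypothetical relative frequency of use


def select_regression_tests(test_cases, impacted_areas):
    """Keep tests whose area is in the impact report, highest priority first."""
    selected = [tc for tc in test_cases if tc.area in impacted_areas]
    # Sort by criticality, then by usage, both descending.
    return sorted(selected, key=lambda tc: (tc.criticality, tc.usage), reverse=True)


suite = [
    RegressionCase("login_valid_user", "login page", 3, 9),
    RegressionCase("homepage_banner", "homepage", 1, 5),
    RegressionCase("timeline_scroll", "timeline", 2, 7),
    RegressionCase("profile_edit", "profile", 2, 3),
]
impacted = {"login page", "timeline"}  # assumed output of impact analysis

ordered = select_regression_tests(suite, impacted)
print([tc.name for tc in ordered])  # ['login_valid_user', 'timeline_scroll']
```

Sorting on a tuple keeps the rule explicit: criticality decides first, and frequency of use only breaks ties.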
3. Test Environment Setup:
○ Prepare the necessary testing environment, ensuring it mirrors the production environment
as closely as possible.
○ Install the latest build/release on the testing server.
4. Test Execution:
○ Execute the selected regression test cases.
○ Example Scenario: If Release 2 includes 100 test cases, and 50 of them are old test cases
from Release 1, those 50 old test cases are re-executed together with the 50 new ones, so all
100 test cases run in Release 2 to ensure no existing functionality is broken.
5. Defect Reporting and Retesting:
○ If any defects are identified during test execution, report them promptly with detailed
information.
○ Once defects are fixed, retest the affected areas and related functionalities to confirm the
fixes and ensure no new regressions are introduced.
6. Reporting:
○ Generate comprehensive regression test reports summarizing the test results, including
passed, failed, and blocked test cases.
○ Provide insights into the overall quality and stability of the release.
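As a rough sketch of the reporting step, the snippet below tallies passed, failed, and blocked test cases and derives a pass rate. The status strings, the sample results, and the choice to exclude blocked tests from the pass rate are assumptions made for illustration.

```python
from collections import Counter


def summarize_results(results):
    """Count outcomes per status and compute the pass rate over executed tests."""
    counts = Counter(results.values())
    # Blocked tests never ran, so they are left out of the pass rate (an assumption).
    executed = counts["passed"] + counts["failed"]
    pass_rate = counts["passed"] / executed if executed else 0.0
    return {
        "passed": counts["passed"],
        "failed": counts["failed"],
        "blocked": counts["blocked"],
        "pass_rate": pass_rate,
    }


# Hypothetical results from one regression run.
results = {
    "login_valid_user": "passed",
    "timeline_scroll": "passed",
    "homepage_banner": "failed",
    "profile_edit": "passed",
    "export_report": "blocked",
}

summary = summarize_results(results)
print(summary)
```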
III. Regression Testing in ETL (Extract, Transform, Load) Context:
For ETL processes, regression testing focuses on ensuring data integrity and correct data flow after
changes.
1. Create a Backup Table:
○ Before making any changes, create a backup of the target table.
○ CREATE TABLE backup AS SELECT * FROM Table_old;
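One way to try the backup step locally is with Python's built-in sqlite3 module. The sketch below assumes an in-memory database and a made-up two-column layout for Table_old. CREATE TABLE ... AS SELECT is used as the copy mechanism. Some databases use a different statement for this (SQL Server, for example, uses SELECT ... INTO).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout and rows, used only for this illustration.
conn.execute("CREATE TABLE Table_old (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Table_old VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# Copy every row into a backup table before the ETL change is applied.
conn.execute("CREATE TABLE backup AS SELECT * FROM Table_old")

backup_rows = conn.execute("SELECT * FROM backup ORDER BY id").fetchall()
print(backup_rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

In SQLite, a table created this way copies the data but not the primary key or other constraints, which is usually acceptable for a comparison baseline.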
2. Count Validation:
○ Verify that the number of records remains consistent before and after the ETL process.
○ SELECT COUNT(*) FROM backup;
○ SELECT COUNT(*) FROM target_table;
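The count check can be automated along these lines. This is a sketch using sqlite3 with hypothetical table contents, and the `row_count` helper is introduced here for illustration only.

```python
import sqlite3


def row_count(conn, table):
    """Return the number of rows in a table."""
    # Table names cannot be bound as parameters, so pass only trusted names.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE backup (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE target_table (id INTEGER, amount REAL)")

# Hypothetical rows standing in for the data before and after the ETL run.
rows = [(1, 10.0), (2, 20.5), (3, 7.25)]
conn.executemany("INSERT INTO backup VALUES (?, ?)", rows)
conn.executemany("INSERT INTO target_table VALUES (?, ?)", rows)

backup_count = row_count(conn, "backup")
target_count = row_count(conn, "target_table")
counts_match = backup_count == target_count
print(backup_count, target_count, counts_match)  # 3 3 True
```

Matching counts show only that no rows were dropped or added in total. They do not show that the row contents are the same, which is why a row-level comparison is still needed.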
3. Data Validation:
○ Perform detailed data comparison between the source, backup, and target to identify any
discrepancies. This is crucial as some data might be lost or transformed incorrectly during
loading.
○ Run MINUS queries in both directions between the source backup and the target, because some
rows may be lost while loading.
○ SELECT * FROM Source_backup MINUS SELECT * FROM Target_table;
○ SELECT * FROM Target_table MINUS SELECT * FROM Source_backup;
○ Note: The MINUS operator (or the equivalent set-difference operator in other SQL dialects, such
as EXCEPT) is essential here to find rows present in one table but not the other, indicating
data loss or unexpected additions.
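The two-directional comparison can be tried with sqlite3 as below. SQLite has no MINUS operator, so this sketch uses EXCEPT, its set-difference equivalent. The table contents are invented to show one changed row, one lost row, and one unexpected row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_backup (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target_table (id INTEGER, name TEXT)")

# Hypothetical data: row 2 was altered, row 3 was lost, row 4 appeared unexpectedly.
conn.executemany(
    "INSERT INTO source_backup VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)
conn.executemany(
    "INSERT INTO target_table VALUES (?, ?)",
    [(1, "a"), (2, "B"), (4, "d")],
)

# Rows in the source backup with no exact match in the target.
missing_in_target = sorted(
    conn.execute(
        "SELECT * FROM source_backup EXCEPT SELECT * FROM target_table"
    ).fetchall()
)

# Rows in the target with no exact match in the source backup.
unexpected_in_target = sorted(
    conn.execute(
        "SELECT * FROM target_table EXCEPT SELECT * FROM source_backup"
    ).fetchall()
)

print(missing_in_target)     # [(2, 'b'), (3, 'c')]
print(unexpected_in_target)  # [(2, 'B'), (4, 'd')]
```

Two empty results mean the tables hold the same set of distinct rows. Set difference ignores duplicates, so a row loaded twice would go unnoticed here, and the count check remains a useful complement.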
By following this structured regression testing process, organizations can minimize risks associated with
software changes and ensure the continued stability and reliability of their products.