What is data load strategy for testing? How to perform it?
and How to test full load
and incremental load?
What is the Data Load Strategy?
Depending on the phase in which you take this up the answer will vary. For example, if you are
looking at the data load strategy based on the task/job for which you perform the testing and that is
driven by the logical requirements driven by the business requirements.
If you are in the initial phases of reviewing the SRS/BRD/FS (Requirements) then the strategy
will be different.
If you have the tech design (from IT team) and the mapping document (Source to target
mapping document) then you have to follow the logical design of the data loads.
If the project is a (data) migration project and based on the way it is being done either manual
or automated the strategy varies.
Initial/Full Load Testing
The first load or the initial load which is done to the data warehouse is called Initial load or full load.
Initial/Full load completely cleans out the existing data which in other terms is nothing but truncating
the data from all the tables and reload with fresh data from the source systems.
In this case, we need to test that the data in loaded in the order specified by the entity dependencies
which is dimensions first and the fact tables later. Not only the load but the data is correct and matches
with what is show in the source system like the total number of records, length of each attributes,
without any data loss and the overall time it has taken for the load. Of course, the time depends on the
volumes of data or the number of years of data.
Incremental Load Testing
Once the initial/full load is complete for the first time, from the next run/load onwards only the
modified data will be loaded in to the data warehouse. This type of only new and updated records after
a set time period is called as incremental load or delta load.
Incremental load will be done based on the predefined schedule to the logical entities as per the
business requirements. For example, if you take a record which is updated in the OLTP system or the
source systems then it will be identified by a timestamp and that timestamp is used as criteria to decide
if the record should be considered for the incremental or delta load. If the timestamp for a record does
not fall after the last successful run of the load/job then it will not be considered for processing as it
already exists in the data warehouse and there are no changes to it.
In this case based on the logic used like the SCDs (Slowly Changing Dimensions) we need to test if
the records which are considered for processing are having the timestamp which are on or after the last
successful run of the job. Along with this we need to test everything related to data like the metadata
testing, data completeness, data accuracy, data quality, insert and update scenarios etc.