DATA VALIDATION, PROCESSING, AND REPORTING
Data Validation
Data validation is defined as the inspection of all the collected
data for completeness and reasonableness, and the elimination of
erroneous values. This step transforms raw data into validated
data. The validated data are then processed to produce the
summary reports you require for analysis.
Data processing is performed by a system, including computer
systems and associated personnel, that performs input, processing,
storage, output, and control functions to accomplish a sequence of
operations on the data.
PURPOSE AND APPLICABILITY
Data processing and data validation are performed in parallel.
Most data validation checks are performed as part of the
data processing procedures.
RESPONSIBILITIES
Project Manager
The project manager shall:
• Oversee all aspects of the program
Quality Assurance Manager
• Oversee all aspects of the program pertaining to quality
assurance
• Supervise the work of the quality assurance specialist
• Review the seasonal quality assurance summary of the
quality assurance specialist to determine if the data are
acceptable for incorporation in the concentrations database
Quality Assurance Specialist
• Review the site configuration database for current flow rate
calibrations
• Review the site problems database for potential problems
• Review the collection parameters in the database
• Review the quality assurance summaries of the lab
manager and the spectroscopist
• Calculate concentrations and store in temporary database
• Examine individual inconsistent samples and determine the
cause
• Update the database with revised information (modify
parameters or invalidate)
• Recalculate concentrations
• Rerun data validation programs
• Present a quality assurance summary for each season with
appropriate plots and tables to the quality assurance
manager and project manager
• Transfer the data to the final concentrations database
Laboratory Manager
• Oversee and maintain records on site and sampler operation
• Review all log sheets for completeness and check the
validity of the samples before the lab technicians download
them
• Resolve any inconsistencies on the log sheet or in the
samples
• Oversee entry of gravimetric analysis parameters in the
database
• Maintain documentation on daily gravimetric controls
Spectroscopist
The spectroscopist shall:
• Maintain records on the operation and performance of the
various analytical systems
• Maintain documentation on standards and reanalysis for
each analytical session
• Oversee all technicians performing analyses and verify the
correctness of the data entry
[Flow diagram: Raw Data Files → Develop Data Validation
Routines (general system and measured parameter checks: range
tests, relational tests, trend tests) → Validate Data (subject all
data to validation; print a validation report of suspect values;
manually reconcile suspect values; insert validation codes) →
Create Valid Data Files → Data Processing and Reporting]
9.1 DATA VALIDATION METHODS
Data can be validated either manually or automatically
(computer-based). The latter is preferred to take advantage of
the power and speed of computers, although some manual
review will always be required.
Validation software may be purchased from some data logger
vendors, created in-house using popular spreadsheet programs
(e.g., Microsoft Excel, Quattro Pro, Lotus 1-2-3), or adapted from
other utilities' environmental monitoring projects.
An advantage of using spreadsheet programs is that they can
also be used to process data and generate reports.
There are essentially two parts to data validation: data screening
and data verification.
Data Screening: The first part uses a series of validation
routines or algorithms to screen all the data for suspect
(questionable and erroneous) values. A suspect value deserves
scrutiny but is not necessarily erroneous.
Data Verification: The second part requires a case-by-case
decision on what to do with the suspect values: retain them as
valid, reject them as invalid, or replace them with redundant,
valid values (if available). This part is where personal judgment
by a qualified person familiar with the monitoring equipment
and local meteorology is needed.
Before proceeding to the following sections, you should first
understand the limitations of data validation. There are many
possible causes of erroneous data: faulty or damaged sensors,
loose wire connections, broken wires, damaged mounting
hardware, data logger malfunctions, static discharges, sensor
calibration drift, and icing conditions, among others.
The goal of data validation is to detect as many significant errors
from as many causes as possible. Catching all the subtle ones is
impossible.
For example, a disconnected wire can be easily detected by a
long string of zero (or random) values, but a loose wire that
becomes disconnected intermittently may only partly reduce the
recorded value yet keep it within reasonable limits. Therefore,
slight deviations in the data can escape detection (although the
use of redundant sensors can reduce this possibility).
Properly exercising the other quality assurance components
of the monitoring program will also reduce the chances of data
problems.
To preserve the original raw data, make a copy of the original
raw data set and apply the validation steps to the copy.
The next two subsections describe two types of validation
routines, recommend specific validation criteria for each
measurement parameter, and discuss the treatment of suspect
and missing data.
A. Validation Routines
Validation routines are designed to screen each measured
parameter for suspect values before they are incorporated into
the archived database and used for site analysis.
They can be grouped into two main categories, general system
checks and measured parameter checks.
1. General System Checks
Two simple tests evaluate the completeness of the collected
data:
Data Records: The number of data fields must equal the
expected number of measured parameters for each record.
Time Sequence: Are there any missing sequential data
values? This test should focus on the time and date stamp of
each data record.
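As an illustration, these two completeness checks might be sketched in Python as follows. The comma-separated record layout, four-field count, and 10-minute logging interval are assumptions for the example, not requirements from this guide:

```python
from datetime import datetime, timedelta

EXPECTED_FIELDS = 4               # assumed layout: timestamp, speed, direction, temperature
INTERVAL = timedelta(minutes=10)  # assumed logging interval

def check_records(lines):
    """Data Records test: flag records whose field count differs from expected."""
    return [i for i, line in enumerate(lines)
            if len(line.split(",")) != EXPECTED_FIELDS]

def check_time_sequence(timestamps):
    """Time Sequence test: return expected timestamps missing from the sequence."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        t = prev + INTERVAL
        while t < curr:           # every skipped step is a missing record
            gaps.append(t)
            t += INTERVAL
    return gaps

lines = ["2024-03-01 00:00,5.2,270,8.1",
         "2024-03-01 00:10,5.5,268",       # short record: a field is missing
         "2024-03-01 00:30,6.0,265,8.0"]   # the 00:20 record is missing
stamps = [datetime.strptime(l.split(",")[0], "%Y-%m-%d %H:%M") for l in lines]
print(check_records(lines))        # indices of malformed records
print(check_time_sequence(stamps)) # timestamps of missing records
```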
2. Measured Parameter Checks: These tests represent the
heart of the data validation process and normally consist of
range tests, relational tests, and trend tests.
Range Tests: These are the simplest and most commonly used
validation tests. The measured data are compared to allowable
upper and lower limits.
Relational Tests: This comparison is based upon expected
physical relationships between various parameters. Relational
checks should ensure that physically improbable situations are
not reported in the data without verification.
Trend Tests: These checks are based on the rate of change in
a value over time.
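A minimal Python sketch of the three measured parameter checks follows. All limits and thresholds here are placeholders; appropriate values depend on the site, sensor, and parameter:

```python
def range_test(value, lo, hi):
    """Flag values outside allowable upper and lower limits."""
    return lo <= value <= hi

def relational_test(speed_30m, speed_10m, max_diff=10.0):
    """Flag physically improbable relationships between parameters,
    e.g. wind speeds at two heights differing by more than a
    plausible amount (threshold is a placeholder)."""
    return abs(speed_30m - speed_10m) <= max_diff

def trend_test(prev, curr, max_change=5.0):
    """Flag an implausibly fast rate of change between consecutive records."""
    return abs(curr - prev) <= max_change

print(range_test(5.2, 0.0, 25.0))   # within limits -> passes
print(relational_test(30.0, 5.0))   # improbable difference -> suspect
print(trend_test(4.0, 12.0))        # jump of 8 m/s in one record -> suspect
```

Each function returns True when the value passes; a False result marks the value as suspect for the manual verification step.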
B. Treatment of Suspect and Missing Data
After the raw data are subjected to all the validation checks,
what should be done with suspect data? Some suspect values
may be real, unusual occurrences while others may be truly bad.
Here are some guidelines for handling suspect data:
1. Generate a validation report (printout or computer-based
visual display) that lists all suspect data. For each data
value, the report should give the reported value, the date
and time of occurrence, and the validation criteria that it
failed.
2. A qualified person should examine the suspect data to
determine their acceptability. Values judged invalid should
be flagged and replaced with a validation code.
A common designation for data rejection is assigning a -900
series validation code, with numbers that represent various
rejection explanations.
3. Maintain a complete record of all data validation actions for
each monitoring site. For each flagged value, the record
should include:
• File name
• Parameter type and monitoring height
• Date and time of flagged data
• Validation code assigned and explanation given for each
rejected datum
• The source of the substituted values
4. If redundant sensors are used, replace a rejected value from
the primary sensor with a substitute one from the redundant
sensor, as long as the redundant sensor’s data passed all the
validation criteria.
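As a hedged sketch of the rejection-coding and redundant-sensor guidelines above, the following Python assigns a -900 series code or substitutes the redundant sensor's value. The specific code numbers and the range-test criterion are placeholders, not values prescribed by this guide:

```python
INVALID_RANGE = -995   # placeholder code number in the -900 series

def apply_validation(primary, redundant, is_valid):
    """Replace rejected primary values with a rejection code, or with the
    redundant sensor's value when that value itself passed validation."""
    out, log = [], []
    for i, (p, r) in enumerate(zip(primary, redundant)):
        if is_valid(p):
            out.append(p)                     # primary value passes: keep it
        elif r is not None and is_valid(r):
            out.append(r)                     # substitute from redundant sensor
            log.append((i, "substituted from redundant sensor"))
        else:
            out.append(INVALID_RANGE)         # no valid substitute: reject
            log.append((i, "rejected: failed range test"))
    return out, log

primary = [5.1, 99.0, 6.2]        # 99.0 fails an assumed 0-25 m/s range test
redundant = [5.0, 6.0, None]
values, log = apply_validation(primary, redundant, lambda v: 0.0 <= v <= 25.0)
print(values)
print(log)
```

The returned log supplies the per-value entries (index, explanation) that the site validation record calls for.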
Important: Maintain raw and validated data files separately.
Differentiate the files by assigning different extensions to the file
names.
For example, the file extension for the raw data file could be
designated as “.raw” and the verified data file as “.ver”. Valid
data can then be compiled into a master data file for further
data reporting and archiving.
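The validation report called for in guideline 1 above can be sketched as a simple formatter; the column layout and example criteria are illustrative only:

```python
def validation_report(flags):
    """Format a printable report of suspect values: reported value,
    date/time of occurrence, and the validation criterion failed."""
    header = f"{'Date/Time':<17}{'Value':>8}  Failed criterion"
    rows = [f"{ts:<17}{val:>8}  {crit}" for ts, val, crit in flags]
    return "\n".join([header] + rows)

flags = [("2024-03-01 00:10", 38.4, "range test: wind speed > 25 m/s"),
         ("2024-03-01 02:30", -7.0, "range test: wind speed < 0 m/s")]
print(validation_report(flags))
```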
C. Data Recovery
The data recovery rate is defined as the number of valid data
records collected versus the number possible over the reporting
period. The method of calculation is as follows:

Data Recovery Rate = (Data Records Collected / Data Records Possible) × 100

where

Data Records Collected = Data Records Possible - Number of
Invalid Records
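This calculation can be sketched in one function; the season length in the usage example is hypothetical:

```python
def data_recovery_rate(records_possible, invalid_records):
    """Data Recovery Rate = (Records Collected / Records Possible) x 100,
    where Records Collected = Records Possible - Invalid Records."""
    collected = records_possible - invalid_records
    return collected / records_possible * 100.0

# e.g. a 90-day season of 10-minute records: 90 * 144 = 12,960 possible
print(round(data_recovery_rate(12960, 324), 1))  # -> 97.5
```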
9.2 DATA PROCESSING AND REPORTING
When the data validation step is complete, the data set must be
subjected to various data processing procedures to evaluate the
wind resource. This typically involves performing calculations
on the data set, as well as binning (sorting) the data values into
useful subsets based on your choice of averaging interval. From
this, informative reports can be produced, such as summary
tables and performance graphs. Data processing and reporting
software are available from several sources, including certain
data logger manufacturers and vendors of spreadsheet, database,
and statistical software.
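As a sketch of the binning step, the following Python groups validated 10-minute wind speed records into hourly means; the record layout ('YYYY-MM-DD HH:MM' timestamp plus a speed in m/s) is an assumption for the example:

```python
from collections import defaultdict
from statistics import mean

def hourly_means(records):
    """Bin validated 10-minute wind speed records by hour and average
    each bin, a building block for summary tables."""
    bins = defaultdict(list)
    for stamp, speed in records:
        bins[stamp[:13]].append(speed)   # key on 'YYYY-MM-DD HH'
    return {hour: mean(vals) for hour, vals in sorted(bins.items())}

records = [("2024-03-01 00:00", 5.0), ("2024-03-01 00:10", 7.0),
           ("2024-03-01 01:00", 6.0)]
print(hourly_means(records))
```

The same grouping pattern extends to daily or monthly averaging intervals by keying on a shorter timestamp prefix.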
9.3 QUALITY ASSURANCE REPORTING
A component of your monitoring program documentation
should be a periodic report on the program’s adherence to the
quality assurance plan (described in Section 2.4). The field
and/or data analysis staff should prepare the report and submit it
to the project manager (or quality assurance coordinator)
monthly or quarterly. The report should address the following
topics:
• Dates of operation and maintenance site visits: activities and
findings
• Description of monitoring problems and corrective actions
taken
• Record of equipment calibration activities (if applicable)
• Data validation findings and actions taken
• Data recovery rate.