GSI User's Guide v3.1
July 2012
Foreword
This User's Guide describes the Gridpoint Statistical Interpolation (GSI) Version 3.1 data
assimilation system, released on July 20, 2012. This version of GSI is based on the GSI
repository from May 2012. As GSI is further developed, this document will be enhanced
and updated to match the released version.
This User's Guide is designed for both beginners and experienced users. It includes eight
chapters and two appendices:
Chapter 1: Overview
Chapter 2: Software Installation
Chapter 3: Running GSI
Chapter 4: GSI Diagnostics and Tuning
Chapter 5: GSI Applications
Chapter 6: GSI Theory and Code Structure
Chapter 7: Observation and Background Error Statistics
Chapter 8: Satellite Radiance Data Assimilation
Appendix A: GSI tools
Appendix B: GSI Namelist
Chapters 2 through 5 introduce basic skills for installing the GSI code, running cases, and
checking analysis results. Chapters 6 through 8 and Appendices A and B present more
details on the use of GSI and other advanced topics.
For the latest version of this document, please visit the GSI users website at
http://www.dtcenter.org/com-GSI/users/index.php.
Acknowledgements:
We thank the U.S. Air Force Weather Agency and the National Oceanic and Atmospheric
Administration for their support of this work. This work is also facilitated by the National
Center for Atmospheric Research (NCAR). NCAR is supported by the National Science
Foundation (NSF).
Chapter 1: Overview
After initial development, the GSI analysis system became operational as the core of the
North American Data Assimilation System (NDAS) in June 2006 and the Global Data
Assimilation System (GDAS) in May 2007 at NOAA. Since then, the GSI system has been
adopted in various operational systems, including the National Aeronautics and Space
Administration (NASA) global atmospheric analysis system, the NCEP Real-Time
Mesoscale Analysis (RTMA) system, the Hurricane WRF (HWRF), and the Rapid Refresh
(RR) system. The number of groups involved in operational GSI development has also
expanded to include the NASA Goddard Global Modeling and Assimilation Office
(GMAO), the NOAA Earth System Research Laboratory (ESRL), and the National Center
for Atmospheric Research (NCAR) Earth System Laboratory's Mesoscale and Microscale
Meteorology Division (MMM).
Starting in 2007, the Developmental Testbed Center (DTC) began collaborating with major
GSI development groups to transform the operational GSI system into a community system
and support distributed development. The DTC has complemented the development groups
by providing GSI documentation, porting GSI to multiple platforms, and testing GSI in an
independent and objective environment, while maintaining functional equivalence with the
operational centers. Working with the NCEP Environmental Modeling Center (EMC), the
DTC maintains a community GSI repository that is equivalent to the operational
developmental repository and facilitates GSI development by community users. Based on
this repository, the DTC releases GSI code annually with intermediate bug fixes. The first
community version of the GSI system was released by the DTC in 2009. Since then, the
DTC has provided community support through the GSI helpdesk (gsi_help@ucar.edu),
community GSI webpage (http://www.dtcenter.org/com-GSI/users/index.php), and annual
community GSI tutorial/workshop.
In late 2010, a GSI review committee was formed to coordinate distributed development of
the GSI system. The committee is composed of primary GSI research and operational
centers, including NCEP/EMC, NASA/GMAO, NOAA/ESRL, NCAR/MMM, AFWA and
DTC. As one of the committee members, the DTC represents the research community with
GSI users from all over the world.
The GSI Review Committee primarily steers distributed GSI development and community
code management and support. The responsibilities of the Review Committee are divided
into two major aspects: coordination and code review. The purpose and guiding principles
of the Review Committee are as follows:
As a critical part of GSI user support, this GSI User's Guide is provided to assist users
in applying GSI to data assimilation and analysis studies. It was composed by the DTC and
reviewed by the GSI Review Committee members. Please note the DTC currently focuses
on testing and evaluation of GSI for limited area Numerical Weather Prediction (NWP)
applications; however, the long-term plan includes transitioning to global forecast
applications. This documentation describes the July 2012 version 3.1 release, which
includes new capabilities and enhancements as well as bug fixes. This version of GSI is
based on a revision of the community GSI repository from May 2012.
The GSI version 3.1 is primarily a three-dimensional variational (3D-Var) system with
modules developed for advanced features, such as the four-dimensional variational (4D-
Var), ensemble hybrid variational analysis, and observation sensitivity capabilities.
Coupled with forecast models and their adjoint models, GSI can be turned into a 4D-Var
system. Combined with an ensemble system, GSI can also be used in an ensemble-
variational hybrid data assimilation system (GSI-hybrid) with the appropriate
configuration. An example of such an application is the current Global Data Assimilation
System (GDAS) at NCEP, which uses the GSI-hybrid capability with NCEP's global
ensemble forecast system.
GSI is used by various applications on multiple scales. Therefore, the types of
observations GSI can assimilate range from conventional data to aerosol observations.
Users should select observations with care to fit their specific applications. The GSI version 3.1
can assimilate, but is not limited to, the following types of observations:
The following lists some of the new functions and changes included in the GSI release
version 3.1 versus version 3.0:
Please note that, due to the version update, some diagnostic files and static information
(fixed) files have changed as well.
background and observation errors. Finally, Chapter 8 introduces several topics on satellite
radiance data assimilation using GSI, including configuration, bias correction, and
monitoring. Appendix A introduces the community tools available for GSI users and
Appendix B is a complete list of the GSI namelist with explanations.
Chapter 2: Software Installation
2.1 Introduction
The DTC community GSI is a community distribution of NOAA's operational GSI. The
original operational code was developed at NOAA/NCEP for use on IBM supercomputers.
The community GSI expands the portability of the operational code by adding a flexible
build system and run script that allows GSI to be compiled and run on a variety of
platforms. The current version of GSI builds and runs on Linux platforms using Intel or
PGI compilers, IBM supercomputers using the xlf compiler, and Intel Macintosh
computers using the PGI compiler.
This chapter provides build instructions applicable only to the DTC community GSI. They
do not pertain to NCEP's operational version of the code because of its different build
environment and operational requirements; however, they can be adapted for similar
environments.
This chapter describes how to build and install the GSI on your computing resources. The
process consists of a few steps:
Obtain the source code (section 2.2).
o Obtain GSI from the community GSI web page.
o Obtain the WRF model from the WRF project page.
Build the WRF model (see the WRF users guide).
Set the appropriate environment variables for the GSI build (section 2.3.1).
Configure and compile the GSI source code (section 2.3.2).
This chapter is organized as follows: Section 2.2 describes how to obtain the source code.
Section 2.3 covers setting up the build environment, configuring the build system, and
compiling the code. Section 2.4 covers the system requirements (tools, libraries, and
environment variable settings) and currently supported platforms in detail. Section 2.5
covers the directory structure and supplemental NCEP libraries included with the
distribution. Section 2.6 discusses what to do if you have problems with the build and
where to get help. And section 2.7 briefly illustrates how to port GSI to another platform.
For beginning users, sections 2.2 and 2.3 provide the necessary overview needed to build
GSI on most systems. The remainder of the chapter is provided for completeness and in
case the user has any difficulty building the code.
The community GSI resources, including source code, build system, utilities, and
documentation, are available from the DTC community GSI users website, located at
http://www.dtcenter.org/com-GSI/users/index.php
The source code is available by first selecting the Download tab on the vertical menu
located on the left of the page, and then selecting the GSI System submenu.
New users must first register before downloading the source code. Returning users only
need their registration email address to log in. After reaching the download page, select
the link to the comGSI v3.1 tarball to download the most recent version (as of July 2012)
of the source code. Selecting the newest release of the community GSI is critical for having
the most recent capabilities, supplemental libraries, and code fixes. To analyze satellite
radiance observations, GSI requires the CRTM coefficients from the most recent version of
the CRTM library. As a convenience to the user, and because of its large size, the latest
CRTM coefficients are provided as a separate tarfile, and can be downloaded by selecting
the link CRTM 2.0.5 coefficients tarball from the web page.
The community GSI version 3.1 comes in a tar file named comGSI_v3.1.tar.gz. The tar
file may be unpacked by using the UNIX commands:
gunzip -d comGSI_v3.1.tar.gz
tar -xvf comGSI_v3.1.tar
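With GNU or BSD tar, the two commands above can also be combined into a single step (a convenience, assuming a tar that supports the -z option):

```shell
# Decompress and unpack in one step:
tar -xzvf comGSI_v3.1.tar.gz
```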
After downloading the source code, and prior to building, the user should check the known
issues link on the DTC website to determine if any bug fixes or platform specific
customizations are needed.
Since the WRF model is needed for a successful GSI build, the WRF code should be
downloaded and built prior to proceeding with the GSI build. The WRF code, and full
WRF documentation, can be obtained from the WRF Users Page,
http://www.mmm.ucar.edu/wrf/users/
This section provides a quick overview of how to build GSI on most systems. Typically
GSI will build straight out of the box on any system that successfully builds the WRF
model. Should the user experience any difficulties with the default build, check the build
environment against the requirements described in section 2.4.
To proceed with the GSI build, it is assumed that the WRF model has already been built on
the current system. GSI reads and writes I/O using the WRF I/O API libraries. These I/O
libraries are created as part of the WRF build, and are linked into GSI during the GSI build
process. In order to successfully link the WRF I/O libraries with the GSI source, it is
crucial that both WRF and GSI are built using the same compilers. This means that if WRF
is built with the Intel Fortran compiler, then GSI must also be built with the same Intel
Fortran compiler.
Before configuring the GSI build, at least one, and up to three, environment variables may
need to be set. The first variable, WRF_DIR, defines the path to the root of the WRF build
directory. This one is mandatory; it tells the GSI build system where to find the WRF I/O
libraries. The second environment variable, NETCDF, specifies the local path to the
NetCDF library. The third variable, LAPACK_PATH, specifies the path to the LAPACK
library on your system. These variables must be
set prior to configuring the build. Be aware that these paths are dependent on your choice
of compiler. In many cases, the latter two variables may already be defined as part of your
login environment. This is typical on systems with only one compiler option such as IBM
supercomputers using the xlf compiler. On systems with multiple compiler options, such as
many Linux systems, these variables may be set as part of the module settings for each
compiler.
The process for setting the environment variables varies according to the login shell in use.
To set the path variable WRF_DIR for csh/tcsh, type:

setenv WRF_DIR /path_to_WRF_root_directory/

For bash/ksh, type:

export WRF_DIR=/path_to_WRF_root_directory/
As one of the additional environment variables, the value of the NETCDF variable may be
checked by echoing the variable using:
echo $NETCDF
If the command returns a response indicating that the variable is undefined, it is then
necessary to manually set this variable. If your system uses modules or a similar
mechanism to set the environment, do this first. If a valid path is returned by the echo
command, no further action is required.
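The same check can be applied to all three variables at once. The sketch below (POSIX shell; variable names as defined in this chapter) prints each variable or a warning if it is unset:

```shell
# Report each build-related environment variable, or warn if it is unset.
for var in WRF_DIR NETCDF LAPACK_PATH; do
  val=$(eval "echo \${$var}")
  if [ -z "$val" ]; then
    echo "WARNING: $var is not set"
  else
    echo "$var = $val"
  fi
done
```

Remember that WRF_DIR is always required, while NETCDF and LAPACK_PATH may already be set by your login environment or module system.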
The last environment variable, LAPACK_PATH, defines the path to the LAPACK library.
Typically, this variable will only need to be set on systems without a vendor-provided
version of LAPACK. IBM systems typically come installed with the LAPACK-equivalent
ESSL library, which links automatically. Likewise, the PGI compiler often comes with a
vendor-provided version of LAPACK that links automatically with the compiler. Over the
last three years, the GSI help staff have encountered only two situations where the
LAPACK_PATH variable needed to be set:
1. On stripped down versions of the IBM operating system that come without the
ESSL libraries.
2. Linux environments using Intel Fortran compiler.
Of the two, the second is the most common. The Intel compiler usually comes with a
vendor-provided mathematics library known as the Math Kernel Library, or MKL for
short. While most installations of the Intel compiler typically include the MKL libraries,
the ifort compiler does not automatically load the library. It is therefore necessary to set
the LAPACK_PATH variable to the location of the MKL libraries when using the Intel
compiler. You may need to ask your system administrator for the correct path to these
libraries.
On the NOAA supercomputers, the Intel build environment can be specified through
setting the modules. When this is done, the MKL library path is available through a local
variable, MKL on Jet, and MKL_ROOT on Zeus.
Supposing that the MKL library path is set to the environment variable MKL, the
LAPACK_PATH variable may then be set for csh/tcsh with:

setenv LAPACK_PATH $MKL

or for bash/ksh with:

export LAPACK_PATH=$MKL
Once the environment variables have been set, the next step in the build process is to first
run the configure script and then the compile script.
Once the environment variables have been set, building the GSI source code requires two
steps:
Change into the comGSI_v3.1 directory and issue the configure command:
./configure
This command uses user input to create a platform specific configuration file called
configure.gsi. The script starts by echoing the NETCDF and WRF_DIR paths set in the
previous section. It then examines the current system and queries the user to select from
multiple build options. On selecting an option, it reports a successful configuration with the
banner:
------------------------------------------------------------------------
Configuration successful. To build the GSI, type: compile
------------------------------------------------------------------------
Failure to get this banner means that the configuration step failed to complete. The most
typical reason for a failure is an error in one of the paths set to the environment variables.
It is important to note that on Linux systems the Fortran compiler and the C compiler do
not have to match. This allows the build to use the GNU C-compiler in place of the Intel or
PGI C-compiler.
On IBM computers with the xlf compiler, there are two build options:
1. AIX 64-bit (dmpar,optimize)
2. AIX 32-bit (dmpar,optimize)
The second step is to run the compile script:

./compile
It is typically useful to capture the build information to a log file by redirecting the output.
To conduct a complete clean, which removes ALL built files in ALL directories, as well as
the configure.gsi, type:
./clean -a
Following a successful compile, the GSI executable gsi.exe can be found in the run/
directory. If the executable is not found, search the compilation log file for the word
"Error" (with a capital E) to determine the cause of the failure.
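The two suggestions above (logging the build output and searching for "Error") can be combined as follows; a sketch assuming a Bourne-style shell, with compile.log as an arbitrary file name:

```shell
# Build GSI, saving all output (stdout and stderr) to a log file:
./compile > compile.log 2>&1

# Search the log for fatal errors ("Error" with a capital E):
grep Error compile.log
```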
The source code for GSI is written in FORTRAN, FORTRAN 90, and C. In addition, the
parallel executables require some flavor of MPI for the distributed-memory parallelism,
along with OpenMP. Lastly, the I/O relies on the NetCDF libraries. Beyond standard shell
scripts, the build system relies on the Perl scripting language and GNU Make.
The basic requirements for building and running the GSI system are the following:
Because all but the last of these tools and libraries are typically the purview of system
administrators to install and maintain, they are lumped together here as part of the basic
system requirements.
2.4.1 Compilers
The DTC community GSI system successfully builds and runs on IBM AIX, Linux, and
Mac Darwin platforms. The following compiler/OS combinations are supported:
Currently, GSI is only tested on the IBM and Linux compiler configurations due to
availability of testing resources. Unforeseen build issues may occur when using older
compiler and library versions. As always, the best results come from using the most recent
version of compilers.
GSI requires a number of support libraries not included with the source code. These
libraries must be built with the compilers being used to build the remainder of the code.
Because of this, they are typically bundled along with the compiler installation, and are
subsequently referred to as system libraries. For our needs, the most important libraries
among these are NetCDF, MPI and OpenMP.
The NetCDF library is an exception to the rule of using the most recent version of libraries.
GSI will work with either V3.6+ or the new V4. Recent updates to NetCDF V4 support
backward compatibility. The V4 series is significantly more difficult to install, so version
3.6+ may be preferred for that reason. The NetCDF libraries can be obtained from the
Unidata website:
http://www.unidata.ucar.edu
Typically, the NetCDF library is installed in a directory that is included in the user's path,
such as /usr/local/lib. When this is not the case, the environment variable NETCDF can
be set to point to the location of the library. For csh/tcsh, the path can be set with the
command:

setenv NETCDF /path_to_netcdf_library/

For bash/ksh, use:

export NETCDF=/path_to_netcdf_library/
It is crucial that system libraries, such as NetCDF, be built with the same FORTRAN
compiler, compiler version, and compatible flags, as used to compile the remainder of the
source code. This is often an issue on systems with multiple FORTRAN compilers, or
when the option to build with multiple word sizes (e.g. 32-bit vs. 64-bit addressing) is
available.
Many default Linux installations include a version of NetCDF. Typically this version is
only compatible with code compiled using gcc. To build GSI, a version of the library must
be built using your preferred Fortran compiler and with both C and FORTRAN bindings. If
you have any doubts about your installation, ask your system administrator.
GSI requires that a version of the MPI library with OpenMP support be installed. GSI is
written as a distributed memory parallel code and these libraries are required to build the
code. Just as with the NetCDF library, the MPI library must be built with the same
FORTRAN compiler, and use the same word size option flags, as the remainder of the
source code.
Installing MPI on a system is typically a job for the system administrator and will not be
addressed here. If you are running GSI on a computer at a large center, check the
machine's documentation before you ask the local system administrator. On Linux or Mac
Darwin systems, you can typically determine whether MPI is installed by running the
following UNIX commands:
which mpif90
which mpicc
If either of these tests returns with "Command Not Found", there may be a problem with
your MPI installation or the configuration of your user account. Contact your system
administrator for help if you have any questions.
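The two `which` checks above can be wrapped in a small portable loop (a sketch using `command -v`, the POSIX equivalent of `which`):

```shell
# Check each MPI wrapper; report its path or "Command Not Found".
for cmd in mpif90 mpicc; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: $(command -v "$cmd")"
  else
    echo "$cmd: Command Not Found"
  fi
done
```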
LAPACK and BLAS are open-source mathematics libraries for the solution of linear
algebra problems. The source code for these libraries is freely available to download from
NETLIB at
http://www.netlib.org/lapack/
Most commercial compilers provide their own optimized versions of these routines. These
optimized versions of BLAS and LAPACK provide superior performance to the open
source versions.
On the IBM machines, the AIX compiler is often, but not always, installed with the
Engineering and Scientific Subroutine Libraries or ESSL. In part, the ESSL libraries are
highly optimized parallel versions of many of the LAPACK and BLAS routines. Because
GSI was developed on an IBM platform, all of the required linear algebra routines are
available from the ESSL libraries.
On Linux or Mac Darwin systems, GSI supports both the Intel and PGI compilers. The
Intel compiler has its own optimized version of the BLAS and LAPACK routines called the
Math Kernel Library or MKL. The MKL libraries provide sufficient routines to satisfy the
requirements of GSI. The PGI compiler typically comes with its own version of the BLAS
and LAPACK libraries. Again, the PGI version of BLAS and LAPACK contain the
necessary routines to build GSI. For PGI these libraries are loaded automatically.
Should your compiler come without any vendor-supplied linear algebra libraries, it will be
necessary to download and build your own local version of the BLAS and LAPACK
libraries. Just as with the NetCDF and MPI libraries, the LAPACK/BLAS libraries must be
built with the same FORTRAN compiler, and use the same word size option flags, as the
remainder of the source code.
The GSI system includes the GSI source code, build system, supplemental libraries, fixed
files, and run scripts. The following table lists the system components found inside of the
main GSI directory.
The src/ directory contains two subdirectories. The main/ directory holds the main GSI
source code along with its makefiles, and the libs/ holds the source code for the
supplemental libraries. The fix/ directory holds the static or fixed input files used by GSI.
These files include, among other things, the background error covariances and the
observation error tables. Starting with version 3 of GSI, the GSI tar file no longer contains
the CRTM coefficients due to their size. The CRTM coefficients are now included as a
separate tar file and may be stored separately. This will be discussed further in the next
chapter. The include/ and lib/ directories are created by the build process to store the
include and library files, respectively. The run/ directory contains the executable and a
sample run script for conducting an analysis run. The arch/ directory contains the machine
specific information used by the build system. The build system is discussed in detail in
section 2.7. Lastly, the util/ directory contains tools for GSI diagnostics.
For the convenience of the user, supplemental NCEP libraries for building GSI are
included in the src/libs/ directory. These libraries are built when GSI is built. These
supplemental libraries are listed in the table below.
The one library in this table that is not included with the source code is WRF. GSI
uses the WRF I/O APIs to read NetCDF files created as WRF output. Therefore a copy of
successfully compiled WRF is required for the GSI build. Please note that only
WRFV3.3.1 and WRFV3.4 have been tested with this version of GSI.
Should the user experience any difficulty building GSI on their system, please first confirm
that all the required software is properly installed (section 2.4). Next check that the external
libraries exist and that their paths are correct. Lastly, check the resource file
configure.gsi for errors in any of the paths or settings. Should all these check out, feel
free to contact the community GSI supporters at gsi_help@ucar.edu for assistance.
At a minimum, when reporting code building problems to the helpdesk, please include with
your email a copy of the build log, and the contents of the configure.gsi file.
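The two requested files can be bundled in one step before emailing (a sketch; compile.log is whatever name you gave your build log, and the archive name is arbitrary):

```shell
# Bundle the build log and the configuration resource file for the helpdesk:
tar -czf gsi_build_report.tar.gz compile.log configure.gsi
```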
The GSI build system comes with build settings that should compile on most standard
systems. Typically if the WRF model builds on a system, GSI will build there as well.
Unfortunately, because standardization is lacking when it comes to setting up Linux HPC
environments, it may be necessary to customize the GSI build settings to account for
eccentric computing environments.
Typically build problems can be traced back to issues with the libraries, the compiler, or
the support utilities such as cpp. These types of issues can usually be solved by
customizing the default configuration file settings. This sets up an iterative process in
which the build parameters are modified, the compile script is run, build errors are
diagnosed, and the process is repeated.
The GSI build system uses a collection of data files and scripts to create a configuration
resource file that defines the local build environment.
At the top most level there are four scripts. The clean script removes everything created by
the build. The configure script takes local system information and queries the user to
select from a collection of build options. The results of this are saved into a resource file
called configure.gsi. Once the configure.gsi file is created, the actual build is initiated
by running the compile script. The compile script then calls the top-level makefile and
builds the source code.
Name       Content
makefile   Top-level makefile
arch/      Build options and machine architecture specifics
clean      Script to clean up the directory structure
configure  Script to configure the build environment for compilation; creates a resource file called configure.gsi
compile    Script for building the GSI system; requires the existence of configure.gsi prior to running
The compile script uses the resource file configure.gsi to set paths and environment
variables required by the compile. The configure script generates the resource file
configure.gsi by calling the Perl script Config.pl, located in the arch/ directory. The
script Config.pl combines the build information from the files in the arch/ directory with
machine specific and user provided build information to construct the configure.gsi
resource file.
A clean script is provided to remove the build objects from the directory structure. Running
./clean scrubs the directory structure of the object and module files. Running a clean-all,
./clean -a, removes everything generated by the build, including the library files,
executables, and the configure resource file. Should the build fail, it is strongly
recommended that the user run ./clean -a prior to rerunning the compile script.
The arch/ directory contains a number of files used to construct the configuration resource
file configure.gsi.
Most users will not need to modify any of these files unless experiencing significant build
issues. Should a user require a significant customization of the build for their local
computing environment, those changes would be saved to the configure.defaults file
only after first testing these new changes in the temporary configure.gsi file.
To illustrate its contents, let's look at the resource file for the Linux Intel build.
The link path points to an Intel version of NetCDF and the OpenMP libraries.
LDFLAGS = -Wl,-rpath,/usr/local/netcdf3-ifort/lib -openmp
Compiler definitions
Intel ifort Fortran compiler
Intel icc C compiler
SFC = ifort
SF90 = ifort -free
SCC = icc
The default Fortran compiler flags for the main source code:
FFLAGS_DEFAULT = -fp-model precise -assume byterecl -fpe0 -ftz
FFLAGS = -O3 $(FFLAGS_DEFAULT) $(INC_FLAGS) $(LDFLAGS) -DLINUX
The default CPP path and flags. If your system has multiple versions of cpp it may be
necessary to specify the correct one here.
CPP = cpp
CPP_FLAGS = -C -P -D$(BYTE_ORDER) -D_REAL8_ -DWRF -DLINUX
CPP_F90FLAGS = -traditional-cpp -lang-fortran
To demonstrate how one would go about modifying the configuration resource file, the
generic Linux/Intel configuration will be ported to build on an SGI Linux cluster called
Zeus. Zeus comes with a vender-supplied version of MPI, which necessitates modification
of the MPI paths.
The first change is that Zeus does not use the traditional MPI wrappers, such as mpif90, to
invoke the compiler. Instead, the Intel compiler is called directly with an additional
-lmpi flag. Therefore the DM compiler definitions become:
DM_FC = ifort
DM_F90 = ifort -free
DM_CC = icc
Next, additional link flags for MPI are needed (-L$MPI_ROOT/lib -lmpi):
LDFLAGS = -Wl,-rpath,/usr/local/netcdf3-ifort/lib -L$MPI_ROOT/lib -lmpi -openmp
Then add the path to the MPI include directory, along with the additional Fortran flag.
Lastly, a difference with the MKL libraries eliminates the need for the -lguide library.
MYLIBsys = -L$(LAPACK_PATH) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
These changes should be saved to the user's configure.gsi resource file and tested. Once
they are confirmed to work, they may be moved into the configure.defaults file located
in the arch/ directory as a new build target.
To save your new build configuration, open the file configure.defaults, located in the
arch/ directory. You will notice that it contains a collection of platform/compiler specific
entries. The first entry is for the IBM platform, using the xlf compiler with 64-bit word
size. This entry is indicated by a label at the top of the block starting with the tag #ARCH.
For the 64-bit IBM build, the tag is:
The block for the 64-bit IBM build is immediately followed by the 32-bit IBM build entry,
which is indicated by the tag:
For our port of the generic Intel build to Zeus, locate the tag for the Linux/Intel build with
64 bit words. Its header looks like this:
Duplicate this entry and give it a unique name by modifying the ARCH entry.
#ARCH Linux x86_64, Intel compiler SGI MPT (ifort & icc) # (dmpar,optimize)
Then update the variables to match the settings in the configure.gsi resource file tested
previously, and save your changes. Now when you run the ./configure script, there will be
a new build option for an SGI MPT build.
Chapter 3: Running GSI
This chapter starts with discussing the input files required to run GSI. It then proceeds with
a detailed explanation of a sample run script and some frequently used options from the
GSI namelist. It concludes with samples of GSI products produced by a successful GSI run.
Please note that WRF is a community model system, including two dynamical cores:
the Advanced Research WRF (ARW) and the Nonhydrostatic Mesoscale Model
(NMM). The GFS (Global Forecast System), NEMS (National Environmental
Modeling System)-NMMB (Nonhydrostatic Mesoscale Model B-Grid), and RTMA
(Real-Time Mesoscale Analysis) are operational systems of NCEP. Currently, the DTC
supports GSI for regional community models, such as WRF NMM and ARW.
Therefore, most of the multiple platform tests were conducted using WRF netcdf
background files (b, d). While the formal regression test suite conducts tests of all 7
background file formats (a-g) on an IBM platform, the NetCDF options are the most
thoroughly tested. For the current release, the following tests have been thoroughly
performed:
Observations
GSI can analyze many types of observational data, including conventional data,
satellite radiance observations, GPS, and radar data. A list of the default observation
file names used in GSI and the corresponding observations included in the files is
provided in the table below:
Remarks:
1. Because the current regional models do not have ozone as a prognostic variable,
ozone data are not assimilated at the regional scale.
2. GSI can analyze all of the data types on the list, but each GSI run (whether
for operations or a case study) only uses a subset of the data based on data
quality, forecast impact, and availability. Some data may be outdated and no
longer available, some are in monitoring mode, and some may have quality
issues during certain periods. Users are encouraged to check for data quality
issues prior to running an analysis. The following NCEP links provide
resources that include data quality history:
http://www.emc.ncep.noaa.gov/mmb/data_processing/Satellite_Historical_Documentation.htm
http://www.emc.ncep.noaa.gov/mmb/data_processing/Non-satellite_Historical_Documentation.htm
3. For use with GSI, the observations are saved in BUFR/PrepBUFR format
(with NCEP-specified features). Details of BUFR/PrepBUFR processing are
introduced on the following website, which is supported and maintained by
the DTC:
http://www.dtcenter.org/com-GSI/BUFR/index.php
GSI can be run without any observations to see how the moisture constraint
modifies the first guess (background) field. GSI can also be run in a pseudo
single observation mode, which does not require any BUFR observation files. In
this mode, users specify the observation information in the namelist section
SINGLEOB_TEST (see Section 4.2 for details). As more data files are used, more
observation information is added through the GSI analysis.
The GSI system has a directory of static or fixed files (called fix/ ), which includes
many files required in a GSI analysis. The following table lists fixed file names
used in the GSI code, the content of the files, and corresponding example files in
the regional test case:
Each operational system, such as GFS, NAM, and RTMA, has its own set of fixed
files. Therefore, for each fixed file used in GSI, there may be several optional
files in the fixed file directory fix/. For example, for the background error
covariance file, both nam_nmmstat_na.gcv (from the NAM system) and
regional_glb_berror.f77.gcv (from the global forecast system) can be used. The
same background error covariance files are also provided with a different byte
order for Linux users, such as nam_nmmstat_na.gcv_Little_Endian and
nam_glb_berror.f77.gcv_Little_Endian.
The GSI run script creates a run time environment necessary for running the GSI
executable. A typical GSI run script includes the following steps:
1. Request computer resources to run GSI.
2. Set environmental variables for the machine architecture.
3. Set experimental variables (such as experiment name, analysis time, background,
and observation).
4. Check the definitions of required variables.
5. Generate a run directory for GSI (sometimes called the working or temporary
directory).
6. Copy the GSI executable to the run directory.
7. Copy the background file to the run directory and, if running the hybrid,
link the ensemble members.
8. Link observations to the run directory.
9. Link fixed files (statistic, control, and coefficient files) to the run directory.
10. Generate namelist for GSI.
11. Run the GSI executable.
12. Post-process: save analysis results, generate diagnostic files, clean run directory.
Typically, users only need to modify specific parts of the run script (steps 1, 2, and 3) to fit
their specific computer environment and point to the correct input/output files and
directories. Section 3.2.2 covers each of these modifications for steps 1 to 3. Section 3.2.3
dissects a sample run script, designed to run on a Linux cluster, an IBM supercomputer,
and a Linux workstation. Users should start with the provided run script and modify it for
their own run environment and case configuration.
This section focuses on modifying the machine specific entries in the run script. This
corresponds to step 1 of the run script. Specifically, this consists of setting Unix/Linux
environment variables and selecting the correct parallel run time environment (batch
system with options).
Let us start with the batch queuing system. On large computing platforms, machine
resources are allocated through a queuing system. A job (along with queuing options) is
submitted to a queue, where the job waits until sufficient machine resources become
available to execute the job. Once sufficient resources become available, the code executes
until it completes or the machine resources have been exhausted. Two queuing systems will
be listed below as examples:
set -x
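As an illustration of such batch headers, the sketch below shows typical LSF and PBS directives; every option and value is a site-specific assumption rather than the released script's exact header.

```shell
#!/bin/sh
# Hedged sketch of batch directives; all options below are
# site-specific assumptions, not the guide's exact examples.
#
# LSF (Load Sharing Facility) directives:
#BSUB -P your_project        # account/project (hypothetical)
#BSUB -n 4                   # number of processors
#BSUB -J gsi_run             # job name
#BSUB -W 00:30               # wall-clock limit (hh:mm)
#BSUB -o gsi.%J.out          # standard output
#BSUB -e gsi.%J.err          # standard error
#
# PBS (Portable Batch System) directives:
#PBS -N gsi_run
#PBS -l procs=4
#PBS -l walltime=00:30:00
#PBS -j oe                   # merge stdout and stderr
#
# The job body (the rest of the run script) follows the directives:
msg="batch job body starts here"
echo "${msg}"
```

Because the directives are shell comments, the same script body can carry both sets; the batch system only reads the directives addressed to it.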
Linux and Mac workstations generally do not run a batch system; in these
cases, step 1 can be skipped.
In both of the examples above, environment variables are set specifying system resource
management, such as the number of processors, the name/type of queue, maximum wall
clock time allocated for the job, options for standard out and standard error, etc. Some
platforms need additional definitions to specify Unix environmental variables that further
define the run environment.
These variable settings can significantly impact the GSI run efficiency and accuracy of the
GSI results. Please check with your system administrator for the optimal settings for your
computer system. Note that while GSI can be run with any number of processors,
using more than (number_of_levels) times (number_of_variables) processors will
not scale well.
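For a concrete sense of that bound, the sketch below uses hypothetical counts (60 levels, 5 analysis variables; neither number comes from a specific GSI case):

```shell
# Hypothetical scaling bound: (number_of_levels) x (number_of_variables).
levs=60      # number of vertical levels (illustrative)
nvars=5      # number of analysis variables (illustrative)
max_useful=$((levs * nvars))
echo "more than ${max_useful} processors will not scale well"
```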
There are only two options to define in this block. The option ARCH selects the machine
architecture. It is a function of platform type, compiler, and batch queuing system. The
option GSIPROC sets the processor count used in the run. This option also decides if the job
is run as a multiple processor job or as a single processor run. Several choices
for the option ARCH are listed in the sample run script; one of these should be
applicable to a user's system. Please check with your system administrator about
running parallel MPI jobs on your system.
This section discusses setting up variables specific to the analysis case, such as analysis
time, working directory, background and observation files, location of fixed files and
CRTM coefficients, and the GSI executable file. A few cautions to be aware of are:
The time in the background file must be consistent with the time in the observation file
used for the GSI run (there is a namelist option to turn this check off).
Even if their contents are identical, PrepBUFR/BUFR files will differ if they were
created on platforms with different byte order (endianness) specifications (e.g.,
Linux Intel vs. IBM). If users obtain PrepBUFR/BUFR files from an IBM system, these files must be
converted before they can be used on a Linux system. Appendix A.1 discusses the
conversion tool ssrc to byte-swap observation files.
#
#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
The next part of this block focuses on additional options that specify important aspects of
the GSI configuration. Option bk_core indicates the specific WRF core used to create the
background files and is used to specify the WRF core when generating the namelist. Option
bkcv_option specifies the background error covariance to be used in the script.
Two background error covariance matrices are provided with the release, one from
the NCEP global data assimilation system (GDAS) and one from the NAM data
assimilation system (NDAS). Please check Section 7.2 for more details about the
GSI background error covariance. Option if_clean sets whether the run script
deletes temporary intermediate files in the working directory after a GSI run is
completed.
#------------------------------------------------
# bk_core= which WRF core is used as background (NMM or ARW)
# bkcv_option= which background error covariance and parameter will
# be used (GLOBAL or NAM)
# if_clean = clean : delete temporary files in working directory (default)
# no : leave running directory as is (this is for debug only)
bk_core=ARW
bkcv_option=NAM
if_clean=clean
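The sample namelist later expands variables named ${bk_core_arw} and ${bk_core_nmm} in &GRIDOPTS. A minimal sketch of how bk_core could be mapped to those Fortran logicals (the released script may implement this differently):

```shell
# Sketch: translate bk_core into the Fortran logicals used in the
# &GRIDOPTS namelist; variable names follow the sample namelist,
# the exact mapping in the released script is assumed.
bk_core=ARW
if [ "${bk_core}" = "ARW" ]; then
  bk_core_arw=.true.
  bk_core_nmm=.false.
else
  bk_core_arw=.false.
  bk_core_nmm=.true.
fi
echo "wrf_mass_regional=${bk_core_arw},wrf_nmm_regional=${bk_core_nmm}"
```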
If using the hybrid function, the following block of parameters related to the
ensemble forecasts also needs to be set:
#------------------------------------------------
# GSI hybrid options
# if_hybrid = .true. then turn on hybrid data analysis
# ntotmem = number of ensemble members used for hybrid
# mempath = path of ensemble members
#
# please note that we assume the ensemble members are located
# under ${mempath} with name wrfout_d01_${iiimem}.
# If this is not the case, please change the following lines:
#
# if [ -r ${mempath}/wrfout_d01_${iiimem} ]; then
# ln -sf ${mempath}/wrfout_d01_${iiimem} ./wrf_en${iiimem}
# else
# echo "member ${mempath}/wrfout_d01_${iiimem} does not exist"
# fi
#
#
if_hybrid=.false.
ntotmem=40
mempath=/ensemble_forecast/2011060212/wrfprd
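The linking loop described by the comments above can be sketched as a self-contained example; the three-digit zero padding of ${iiimem} is an assumption based on the wrf_en${iiimem} naming, and the echo stands in for the actual ln -sf:

```shell
# Self-contained sketch of the ensemble-member loop.
ntotmem=3                                      # small count for illustration
mempath=/ensemble_forecast/2011060212/wrfprd   # hypothetical path
imem=1
names=""
while [ "${imem}" -le "${ntotmem}" ]; do
  iiimem=$(printf "%03d" "${imem}")
  # the real script would do:
  # ln -sf ${mempath}/wrfout_d01_${iiimem} ./wrf_en${iiimem}
  names="${names} wrf_en${iiimem}"
  imem=$((imem + 1))
done
echo "links to create:${names}"
```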
Listed below is an annotated run script (in Courier New) with explanations of
each function block.
For further details on the first three blocks of the script that users need to
change, check Sections 3.2.2.1, 3.2.2.2, and 3.2.2.3:
#!/bin/ksh
#####################################################
# machine set up (users should change this part)
#####################################################
#
#
GSIPROC=1
ARCH='LINUX'
# Supported configurations:
# IBM_LSF,IBM_LoadLevel
# LINUX, LINUX_LSF, LINUX_PBS,
# DARWIN_PGI
#
#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
# PREPBUFR = path of PrepBUFR conventional obs
# BK_FILE = path and name of background file
# OBS_ROOT = path of observations files
# CRTM_ROOT= path of crtm coefficients files
# FIX_ROOT = path of fix files
# GSI_EXE = path and name of the gsi executable
ANAL_TIME=2008051112
WORK_ROOT=./run_${ANAL_TIME}_arw_single
BK_FILE=./bkARW/wrfout_d01_2008-05-11_12:00:00
OBS_ROOT=./GSI_DATA/obs
PREPBUFR=${OBS_ROOT}/newgblav.gdas1.t12z.prepbufr.nr
CRTM_ROOT=./comGSI/crtm/CRTM_Coefficients
FIX_ROOT=./comGSI_v3.1/fix
GSI_EXE=./comGSI_v3.1/run/gsi.exe
#------------------------------------------------
# bk_core= which WRF core is used as background (NMM or ARW)
# bkcv_option= which background error covariance and parameter will
# be used (GLOBAL or NAM)
# if_clean = clean:delete temporal files in working directory (default)
At this point, users should be able to run the GSI for simple cases without changing the
scripts. However, some advanced users may need to change some of the following blocks
for special applications, such as use of radiance data, cycled runs, or running GSI on a
platform not tested by the DTC.
#####################################################
# Users should NOT change script after this point
#####################################################
#
The next block sets the byte order and run command used to run GSI on multiple
platforms. ARCH is set at the beginning of the script. Option BYTE_ORDER tells the
script the byte order of the machine, which is used to select the correct
byte-order files for the CRTM coefficients and background error covariance. The
choice depends on the platform (e.g., IBM vs. Linux).
case $ARCH in
'IBM_LSF')
###### IBM LSF (Load Sharing Facility)
BYTE_ORDER=Big_Endian
RUN_COMMAND="mpirun.lsf " ;;
'IBM_LoadLevel')
###### IBM LoadLeveler
BYTE_ORDER=Big_Endian
RUN_COMMAND="poe " ;;
'LINUX')
BYTE_ORDER=Little_Endian
if [ $GSIPROC = 1 ]; then
#### Linux workstation - single processor
RUN_COMMAND=""
else
###### Linux workstation - mpi run
RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
fi ;;
'LINUX_LSF')
###### LINUX LSF (Load Sharing Facility)
BYTE_ORDER=Little_Endian
RUN_COMMAND="mpirun.lsf " ;;
'LINUX_PBS')
BYTE_ORDER=Little_Endian
#### Linux cluster PBS (Portable Batch System)
RUN_COMMAND="mpirun -np ${GSIPROC} " ;;
'DARWIN_PGI')
### Mac - mpi run
BYTE_ORDER=Little_Endian
if [ $GSIPROC = 1 ]; then
#### Mac workstation - single processor
RUN_COMMAND=""
else
###### Mac workstation - mpi run
RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
fi ;;
* )
print "error: $ARCH is not a supported platform configuration."
exit 1 ;;
esac
#
The next block checks if all the variables needed for a GSI run are properly defined. These
variables should have been defined in the first 3 parts of this script.
#
################################################################
# Check GSI needed environment variables are defined and exist
#
if [ ! -r "${BK_FILE}" ]; then
echo "ERROR: ${BK_FILE} does not exist!"
exit 1
fi
The next block creates a working directory (workdir) in which GSI will run. The
directory should have enough disk space to hold all the files needed for the run.
This directory is cleaned before each run; therefore, save all the files needed
from the previous run before rerunning GSI.
#
###################################################################
# Create the ram work directory and cd into it
workdir=${WORK_ROOT}
echo " Create working directory:" ${workdir}
if [ -d "${workdir}" ]; then
rm -rf ${workdir}
fi
mkdir -p ${workdir}
cd ${workdir}
After creating the working directory, copy the GSI executable, background,
observation, and fixed files into it. If the hybrid is chosen, link the ensemble
members to the working directory.
#
###################################################################
echo " Copy GSI executable, background file, and link observation bufr to working directory"
Note: Copy the background file to the working directory as wrf_inout. The file
wrf_inout will be overwritten by GSI to save the analysis result.
Note: link ensemble members and rename them with the GSI recognized name:
wrf_en${iiimem}.
# Link ensemble members if using hybrid
if [ ${if_hybrid} = .true. ] ; then
echo " link ensemble members to working directory"
let imem=1
if [ -r ${mempath}/wrfout_d01_${iiimem} ]; then
ln -sf ${mempath}/wrfout_d01_${iiimem} ./wrf_en${iiimem}
else
echo "member ${mempath}/wrfout_d01_${iiimem} does not exist"
exit 1
fi
done
fi
Note: You can link the observation files to the working directory because GSI
will not overwrite them. The observations that can be analyzed by GSI are
listed in dfile of the GSI namelist section OBS_INPUT in this example script.
Most of the conventional observations are in a single file named prepbufr,
while radiance data are in separate files based on satellite instrument, such
as AMSU-A or HIRS. All these observation files must be linked to the
GSI-recognized file names in dfile. Please check the table in Section 3.1
under the observation section for a detailed explanation of the links and the
meaning of each file name listed below.
# Link to the prepbufr data
ln -s ${PREPBUFR} ./prepbufr
The following block copies the constant fixed files from the fix/ directory and
links the CRTM coefficients. Please check Section 3.1 for the meaning of each
fixed file. For the background error covariance, observation error, and data
information files, two sets of fixed files are provided here: one based on GFS
statistics and another based on NAM statistics.
#
###################################################################
echo " Copy fixed files and link CRTM coefficient files to working directory"
SATANGL=${FIX_ROOT}/global_satangbias.txt
SATINFO=${FIX_ROOT}/global_satinfo.txt
CONVINFO=${FIX_ROOT}/global_convinfo.txt
OZINFO=${FIX_ROOT}/global_ozinfo.txt
PCPINFO=${FIX_ROOT}/global_pcpinfo.txt
RTMFIX=${CRTM_ROOT}
RTMEMIS=${RTMFIX}/EmisCoeff/${BYTE_ORDER}/EmisCoeff.bin
RTMAERO=${RTMFIX}/AerosolCoeff/${BYTE_ORDER}/AerosolCoeff.bin
RTMCLDS=${RTMFIX}/CloudCoeff/${BYTE_ORDER}/CloudCoeff.bin
Set up some constants used in the GSI namelist. Please note that bkcv_option is
set for background error tuning; these constants should be set based on the
specific application. Here we provide two sample sets of constants for the
different background error covariance options, one used in GFS operations and
one used in NAM operations.
#
###################################################################
# Set some parameters for use by the GSI executable and to build
# the namelist
echo " Build the namelist "
export JCAP=62
export LEVS=60
export JCAP_B=62
export DELTIM=${DELTIM:-$((3600/($JCAP/20)))}
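The DELTIM default is worth unpacking: with integer shell arithmetic, JCAP=62 gives 62/20 = 3, so DELTIM = 3600/3 = 1200 seconds. A minimal check:

```shell
# Worked example of the DELTIM default above; the script wraps this
# in ${DELTIM:-...} so a pre-set value takes precedence.
JCAP=62
DELTIM=$((3600 / (JCAP / 20)))   # 62/20 -> 3 (integer), 3600/3 -> 1200
echo "DELTIM=${DELTIM}"
```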
The following large chunk of the script generates the GSI namelist, gsiparm.anl,
in the working directory. A detailed explanation of each variable can be found in
Section 3.3 and Appendix B.
&SETUP
miter=2,niter(1)=10,niter(2)=10,
write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
gencode=78,qoption=2,
factqmin=0.0,factqmax=0.0,deltim=$DELTIM,
ndat=77,iguess=-1,
oneobtest=.false.,retrieval=.false.,
nhr_assimilation=3,l_foto=.false.,
use_pbl=.false.,
/
&GRIDOPTS
JCAP=$JCAP,JCAP_B=$JCAP_B,NLAT=$NLAT,NLON=$LONA,nsig=$LEVS,hybrid=.true.,
wrf_nmm_regional=${bk_core_nmm},wrf_mass_regional=${bk_core_arw},
diagnostic_reg=.false.,
filled_grid=.false.,half_grid=.true.,netcdf=.true.,
/
&BKGERR
vs=${vs_op}
hzscl=${hzscl_op}
bw=0.,fstat=.true.,
/
&ANBKGERR
anisotropic=.false.,an_vs=1.0,ngauss=1,
an_flen_u=-5.,an_flen_t=3.,an_flen_z=-200.,
ifilt_ord=2,npass=3,normal=-200,grid_ratio=4.,nord_f2a=4,
/
&JCOPTS
/
&STRONGOPTS
jcstrong=.false.,jcstrong_option=3,nstrong=0,nvmodes_keep=20,period_max=3.,
baldiag_full=.true.,baldiag_inc=.true.,
/
&OBSQC
dfact=0.75,dfact1=3.0,noiqc=.false.,c_varqc=0.02,vadfile='prepbufr',
/
&OBS_INPUT
dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120,time_window_max=1.5,
dfile(01)='prepbufr', dtype(01)='ps', dplat(01)=' ', dsis(01)='ps', dval(01)=1.0, dthin(01)=0,
dfile(02)='prepbufr', dtype(02)='t', dplat(02)=' ', dsis(02)='t', dval(02)=1.0, dthin(02)=0,
dfile(03)='prepbufr', dtype(03)='q', dplat(03)=' ', dsis(03)='q', dval(03)=1.0, dthin(03)=0,
dfile(04)='prepbufr', dtype(04)='pw', dplat(04)=' ', dsis(04)='pw', dval(04)=1.0, dthin(04)=0,
dfile(05)='satwnd', dtype(05)='uv', dplat(05)=' ', dsis(05)='uv', dval(05)=1.0, dthin(05)=0,
dfile(06)='prepbufr', dtype(06)='uv', dplat(06)=' ', dsis(06)='uv', dval(06)=1.0, dthin(06)=0,
dfile(07)='prepbufr', dtype(07)='spd', dplat(07)=' ', dsis(07)='spd', dval(07)=1.0, dthin(07)=0,
dfile(08)='prepbufr', dtype(08)='dw', dplat(08)=' ', dsis(08)='dw', dval(08)=1.0, dthin(08)=0,
dfile(09)='radarbufr', dtype(09)='rw', dplat(09)=' ', dsis(09)='rw', dval(09)=1.0, dthin(09)=0,
dfile(10)='prepbufr', dtype(10)='sst', dplat(10)=' ', dsis(10)='sst', dval(10)=1.0, dthin(10)=0,
dfile(11)='gpsrobufr', dtype(11)='gps_ref', dplat(11)=' ', dsis(11)='gps', dval(11)=1.0, dthin(11)=0,
dfile(12)='ssmirrbufr',dtype(12)='pcp_ssmi', dplat(12)='dmsp', dsis(12)='pcp_ssmi', dval(12)=1.0, dthin(12)=-1,
dfile(13)='tmirrbufr', dtype(13)='pcp_tmi', dplat(13)='trmm', dsis(13)='pcp_tmi', dval(13)=1.0, dthin(13)=-1,
dfile(14)='sbuvbufr', dtype(14)='sbuv2', dplat(14)='n16', dsis(14)='sbuv8_n16', dval(14)=1.0, dthin(14)=0,
dfile(15)='sbuvbufr', dtype(15)='sbuv2', dplat(15)='n17', dsis(15)='sbuv8_n17', dval(15)=1.0, dthin(15)=0,
dfile(16)='sbuvbufr', dtype(16)='sbuv2', dplat(16)='n18', dsis(16)='sbuv8_n18', dval(16)=1.0, dthin(16)=0,
dfile(17)='omibufr', dtype(17)='omi', dplat(17)='aura', dsis(17)='omi_aura', dval(17)=1.0, dthin(17)=6,
dfile(18)='hirs3bufr', dtype(18)='hirs3', dplat(18)='n16', dsis(18)='hirs3_n16', dval(18)=0.0, dthin(18)=1,
dfile(19)='hirs3bufr', dtype(19)='hirs3', dplat(19)='n17', dsis(19)='hirs3_n17', dval(19)=6.0, dthin(19)=1,
dfile(20)='hirs4bufr', dtype(20)='hirs4', dplat(20)='n18', dsis(20)='hirs4_n18', dval(20)=0.0, dthin(20)=1,
dfile(21)='hirs4bufr', dtype(21)='hirs4', dplat(21)='metop-a', dsis(21)='hirs4_metop-a', dval(21)=6.0, dthin(21)=1,
dfile(22)='gsndrbufr', dtype(22)='sndr', dplat(22)='g11', dsis(22)='sndr_g11', dval(22)=0.0, dthin(22)=1,
dfile(23)='gsndrbufr', dtype(23)='sndr', dplat(23)='g12', dsis(23)='sndr_g12', dval(23)=0.0, dthin(23)=1,
/
&SUPEROB_RADAR
del_azimuth=5.,del_elev=.25,del_range=5000.,del_time=.5,elev_angle_max=5.,
minnum=50,range_max=100000., l2superob_only=.false.,
/
&LAG_DATA
/
&HYBRID_ENSEMBLE
l_hyb_ens=${if_hybrid},uv_hyb_ens=.true.,
aniso_a_en=.false.,generate_ens=.false.,
n_ens=${ntotmem},
beta1_inv=0.5,s_ens_h=110,s_ens_v=-0.3,
regional_ensemble_option=${ensbk_option},
/
&RAPIDREFRESH_CLDSURF
/
&CHEM
/
&SINGLEOB_TEST
maginnov=1.0,magoberr=0.8,oneob_type='t',
oblat=38.,oblon=279.,obpres=500.,obdattim=${ANAL_TIME},
obhourset=0.,
/
EOF
The following block runs GSI and checks if GSI has successfully completed.
#
#
###################################################
# run GSI
###################################################
echo ' Run GSI with' ${bk_core} 'background'
case $ARCH in
'IBM_LSF'|'IBM_LoadLevel')
${RUN_COMMAND} ./gsi.exe < gsiparm.anl > stdout 2>&1 ;;
* )
${RUN_COMMAND} ./gsi.exe > stdout 2>&1 ;;
esac
##################################################################
# run time error check
##################################################################
error=$?
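A minimal sketch of the error check that typically follows error=$? (the exact messages in the released script are not reproduced here):

```shell
# Minimal sketch of a run-time error check built on error=$?.
run_gsi_stub() { return 3; }   # stand-in for a failed gsi.exe run
run_gsi_stub
error=$?
if [ "${error}" -ne 0 ]; then
  echo "ERROR: GSI exited with status ${error}; check stdout for details"
else
  echo "GSI completed successfully"
fi
```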
The following block saves the analysis results under understandable names by
appending the analysis time to some output file names. Among them, stdout
contains the runtime output of GSI and wrf_inout contains the analysis results.
#
##################################################################
#
# GSI updating satbias_in (only for cycling assimilation)
#
ln fort.207 fit_rad1.${ANAL_TIME}
The following block collects the diagnostic files, which are merged and
categorized by outer loop and data type. Setting write_diag to true directs GSI
to write out diagnostic information for each observation station. This
information is very useful for checking analysis details. Please check Appendix
A.2 for the tool to read and analyze these diagnostic files.
# Loop over first and last outer loops to generate innovation
# diagnostic files for indicated observation types (groups)
#
# NOTE: Since we set miter=2 in GSI namelist SETUP, outer
# loop 03 will contain innovations with respect to
# the analysis. Creation of o-a innovation files
# is triggered by write_diag(3)=.true. The setting
# write_diag(1)=.true. turns on creation of o-g
# innovation files.
#
case $loop in
01) string=ges;;
03) string=anl;;
*) string=$loop;;
esac
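The merge step around this case statement can be sketched with stand-in piece files; the pe&lt;rank&gt;.&lt;type&gt;_&lt;loop&gt; naming is assumed from the per-task output pattern, and the file contents below are stand-ins created purely for illustration:

```shell
# Hedged sketch: concatenate per-processor diagnostic pieces into one
# diagnostic file per data type and outer loop.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
printf 'record1\n' > pe0000.conv_01   # stand-in piece from MPI task 0
printf 'record2\n' > pe0001.conv_01   # stand-in piece from MPI task 1
ANAL_TIME=2011032212
loop=01
case $loop in
  01) string=ges;;
  03) string=anl;;
  *) string=$loop;;
esac
cat pe*.conv_${loop}* > diag_conv_${string}.${ANAL_TIME}
nrec=$(wc -l < diag_conv_ges.${ANAL_TIME} | tr -d ' ')
echo "merged ${nrec} records into diag_conv_${string}.${ANAL_TIME}"
```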
The complete namelist options and their explanations are listed in Appendix B. For most
GSI analysis applications, only a few namelist variables need to be changed. Here we
introduce frequently used variables for regional analyses:
To change the number of outer loops and the number of inner iterations in each outer loop,
you need to modify the following three variables in the namelist:
miter: number of outer loops of the analysis.
niter(1): maximum number of inner loop iterations for the 1st outer loop.
The inner loop stops when it reaches this maximum number, reaches the
convergence condition, or fails to converge.
niter(2): maximum number of inner loop iterations for the 2nd outer loop.
If miter is larger than 2, repeat niter with a larger index.
If you want to change the moisture variable used in the analysis, the namelist
variable is:
qoption = 1 or 2:
If qoption=1, the moisture analysis variable is pseudo-relative humidity.
The saturation specific humidity, qsatg, is computed from the guess and
held constant during the inner loop. Thus, the RH control variable can only
change via changes in specific humidity, q.
If qoption=2, the moisture analysis variable is normalized RH. This
formulation allows RH to change in the inner loop via changes to surface
pressure (pressure), temperature, or specific humidity.
The following four variables define which background field will be used in the GSI
analyses:
regional: if true, perform a regional GSI run using either ARW or NMM inputs
as the background. If false, perform a global GSI analysis. If either
wrf_nmm_regional or wrf_mass_regional is true, regional will be set to
true.
wrf_nmm_regional: if true, background comes from WRF NMM. When using
other background fields, set it to false.
wrf_mass_regional: if true, background comes from WRF ARW. When using
other background fields, set it to false.
netcdf: if true, WRF files are in NetCDF format; otherwise WRF files are in
binary format. This option only applies to regional GSI analyses.
The following variables decide if the GSI will write out diagnostic results in certain loops:
write_diag(1): if true, write out diagnostic data at the beginning of the
analysis, providing Observation minus Background (O-B) information.
write_diag(2): if true, write out diagnostic data between the 1st and 2nd
outer loops.
write_diag(3): if true, write out diagnostic data at the end of the 2nd outer
loop (after the analysis finishes if the outer loop number is 2),
providing Observation minus Analysis (O-A) information.
Please check Appendix A.2 for the tools to read the diagnostic files.
The following variables set up the GSI-recognized observation files for GSI
observation ingest:
ndat: number of observation variables (not observation types). This number
should be consistent with the number of observation variable lines in
section OBS_INPUT. If adding a new observation variable, ndat must be
incremented by one and a new line added in OBS_INPUT. Based on the
dimensions of the variables in OBS_INPUT (e.g., dfile), the maximum
value of ndat is 200 in this version.
dfile(01)='prepbufr': GSI-recognized observation file name. The
observation file contains observations used for a GSI analysis and can
include several observation variables from different observation types.
The file name in this parameter will be read in by GSI. This name can be
changed as long as the name in the link from the BUFR/PrepBUFR file is
changed correspondingly.
dtype(01)='ps': analysis variable name that GSI can read in and handle. As an
example here, GSI will read all ps observations from the file prepbufr.
Please note this name should be consistent with that used in the GSI code.
dplat(01): sets up the observation platform for a certain observation, which will
be read in from the file dfile.
dsis(01): sets up data name (including both data type and platform name) used
inside GSI.
Please see Section 4.3 and Section 8.1 for examples and explanations of these
variables.
In the namelist section OBS_INPUT, use time_window_max to set the maximum half
time window (hours) for all data types. In the convinfo file, the column twindow
sets the half time window (hours) for a specific data type. For conventional
observations, only observations within the smaller of these two windows are kept
for further processing. For other observations, those within time_window_max are
kept for further processing.
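The "smaller of the two windows" rule for conventional data can be illustrated with made-up values:

```shell
# Illustration of the effective half window for a conventional data
# type: the smaller of time_window_max (namelist) and twindow
# (convinfo). The values below are made up.
time_window_max=1.5
twindow=3.0
effective=$(awk -v a="${time_window_max}" -v b="${twindow}" \
  'BEGIN { print (a < b) ? a : b }')
echo "effective half window: ${effective} hours"
```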
Radiance data thinning is controlled through two GSI namelist variables in the section
&OBS_INPUT. Below is an example of the section:
&OBS_INPUT
dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120,time_window_max=1.5,
dfile(01)='prepbufr', dtype(01)='ps', dplat(01)=' ', dsis(01)='ps', dval(01)=1.0, dthin(01)=0,
dfile(11)='ssmirrbufr',dtype(11)='pcp_ssmi', dplat(11)='dmsp', dsis(11)='pcp_ssmi', dval(11)=1.0, dthin(11)=-1,
dfile(12)='tmirrbufr', dtype(12)='pcp_tmi', dplat(12)='trmm', dsis(12)='pcp_tmi', dval(12)=1.0, dthin(12)=-1,
dfile(13)='sbuvbufr', dtype(13)='sbuv2', dplat(13)='n16', dsis(13)='sbuv8_n16', dval(13)=1.0, dthin(13)=0,
dfile(14)='sbuvbufr', dtype(14)='sbuv2', dplat(14)='n17', dsis(14)='sbuv8_n17', dval(14)=1.0, dthin(14)=0,
dfile(15)='sbuvbufr', dtype(15)='sbuv2', dplat(15)='n18', dsis(15)='sbuv8_n18', dval(15)=1.0, dthin(15)=0,
dfile(17)='hirs2bufr', dtype(17)='hirs2', dplat(17)='n14', dsis(17)='hirs2_n14', dval(17)=6.0, dthin(17)=1,
dfile(18)='hirs3bufr', dtype(18)='hirs3', dplat(18)='n16', dsis(18)='hirs3_n16', dval(18)=0.0, dthin(18)=1,
dfile(19)='hirs3bufr', dtype(19)='hirs3', dplat(19)='n17', dsis(19)='hirs3_n17', dval(19)=6.0, dthin(19)=1,
dfile(20)='hirs4bufr', dtype(20)='hirs4', dplat(20)='n18', dsis(20)='hirs4_n18', dval(20)=0.0, dthin(20)=1,
dfile(27)='msubufr', dtype(27)='msu', dplat(27)='n14', dsis(27)='msu_n14', dval(27)=2.0, dthin(27)=2,
dfile(28)='amsuabufr', dtype(28)='amsua', dplat(28)='n15', dsis(28)='amsua_n15', dval(28)=10.0,dthin(28)=2,
dfile(34)='amsubbufr', dtype(34)='amsub', dplat(34)='n15', dsis(34)='amsub_n15', dval(34)=3.0, dthin(34)=3,
dfile(35)='amsubbufr', dtype(35)='amsub', dplat(35)='n16', dsis(35)='amsub_n16', dval(35)=3.0, dthin(35)=3,
dfile(41)='ssmitbufr', dtype(41)='ssmi', dplat(41)='f15', dsis(41)='ssmi_f15', dval(41)=0.0, dthin(41)=4,
The two namelist variables that control radiance data thinning are the real
array dmesh in the first line and the integer array dthin in the last column.
The dmesh array gives a set of mesh sizes (in km) for the radiance thinning
grids, while dthin defines whether the corresponding data type is thinned and
which thinning grid (mesh size) to use: if the value of dthin is zero or
negative, the data are not thinned; if it is a positive integer n, the data are
thinned on the grid with mesh size dmesh(n).
The following gives several thinning examples defined by the above sample
&OBS_INPUT section:
Data type 1 (ps in prepbufr): no thinning because dthin(01)=0
Data type 11 (pcp_ssmi from dmsp): no thinning because dthin(11)=-1
Data type 17 (hirs2 from NOAA-14): thinning in a 120 km grid because dthin(17)=1
and dmesh(1)=120
Data type 41 (ssmi from f15): thinning in a 60 km grid because dthin(41)=4 and
dmesh(4)=60
The conventional data can also be thinned. However, the thinning setup is not in
the namelist. To give users a complete picture of data thinning, conventional
data thinning is briefly introduced here. Three columns in the convinfo file,
ithin, rmesh, and pmesh (more details on this file are in Section 4.3),
configure conventional data thinning:
In the namelist section BKGERR, use vs to set the scale factor for the vertical
correlation length and hzscl to set the scale factors for horizontal smoothing.
The scale factors for the variance of each analysis variable are set in the
anavinfo file.
To perform a single observation test, the following namelist option must first
be set to true:
oneobtest=.true.
Then go to the namelist section SINGLEOB_TEST to set the single observation
location and the variable to be tested. Please see Section 4.2 for an example
and details on the single observation test.
When completed, GSI will have created a number of files in the run directory.
Below is an example of the files generated in the run directory by one of the
GSI test case runs. This case performed a regional GSI analysis with a WRF ARW
NetCDF background using conventional (prepbufr), radiance (AMSU-A, AMSU-B,
HIRS3, HIRS4, and MHS), and GPS RO data. The analysis time is 12Z 22 March
2011. Four processors were used. To make the run directory more readable, the
clean option was turned on in the run script, which deleted all temporary
intermediate files:
amsuabufr fit_p1.2011032212 fort.221
amsubbufr fit_q1.2011032212 gpsrobufr
anavinfo fit_rad1.2011032212 gsi.exe
berror_stats fit_t1.2011032212 gsiparm.anl
convinfo fit_w1.2011032212 hirs3bufr
diag_amsua_metop-a_anl.2011032212 fort.201 hirs4bufr
diag_amsua_metop-a_ges.2011032212 fort.202 l2rwbufr
diag_amsua_n15_anl.2011032212 fort.203 list_run_directory
diag_amsua_n15_ges.2011032212 fort.204 mhsbufr
diag_amsua_n18_anl.2011032212 fort.205 ozinfo
diag_amsua_n18_ges.2011032212 fort.206 pcpbias_out
diag_amsub_n17_anl.2011032212 fort.207 pcpinfo
diag_amsub_n17_ges.2011032212 fort.208 prepbufr
diag_conv_anl.2011032212 fort.209 prepobs_prep.bufrtable
diag_conv_ges.2011032212 fort.210 satbias_angle
diag_hirs3_n17_anl.2011032212 fort.211 satbias_in
diag_hirs3_n17_ges.2011032212 fort.212 satbias_out
diag_hirs4_metop-a_anl.2011032212 fort.213 satinfo
diag_hirs4_metop-a_ges.2011032212 fort.214 stdout
diag_mhs_metop-a_anl.2011032212 fort.215 stdout.anl.2011032212
diag_mhs_metop-a_ges.2011032212 fort.217 wrfanl.2011032212
diag_mhs_n18_anl.2011032212 fort.218 wrf_inout
diag_mhs_n18_ges.2011032212 fort.219
errtable fort.220
There are several important files that hold GSI analysis results and diagnostic information.
We will introduce these files and their contents in detail in the following chapter. The
following is a brief list of what these files contain:
stdout.anl.2011032212/stdout: standard text output files;
stdout.anl.2011032212 is a copy of stdout with the analysis time
appended. This is the most commonly used file for checking the GSI
analysis process as well as basic and important information about the
analysis. We will explain the contents of stdout in Section 4.1, and users
are encouraged to read this file in detail to become familiar with the order
of GSI analysis processing.
wrfanl.2011032212/wrf_inout: analysis results, written only if GSI
completes successfully and a WRF file is used as the background.
wrfanl.2011032212 is a copy of wrf_inout with the analysis time
appended. The format is the same as that of the background file.
diag_conv_anl.(time): binary diagnostic files for conventional and GPS RO
observations at the final analysis step (analysis departure for each
observation, please see Appendix A.2 for more information).
diag_conv_ges.(time): binary diagnostic files for conventional and GPS RO
observations at the initial analysis step (background departure for each
observation; please see Appendix A.2 for more information).
Many intermediate files are present in this directory during the running stage; the complete
list of files before cleaning is saved in the file list_run_directory. Some knowledge about the
content of these files is very helpful for debugging if the GSI run crashes. Please check the
following list for the meaning of these files. (Note: you may not see all the files in the list
because different observation data may be used. Also, the fixed files prepared for a GSI run,
such as CRTM coefficient files, are not included.)
Chapter 4: GSI Diagnostics and Tuning
The purpose of this chapter is to help users understand and diagnose the GSI analysis. The
chapter starts with an introduction to the content and structure of the GSI standard output
(stdout). It continues with the use of a single observation to check the features of the GSI
analysis. An introduction to observation usage control, analysis domain partition, fit files,
and the optimization process will all be presented from information within the GSI output
files (including stdout).
In Section 3.4, we listed the files present in the GSI run directory following a successful
GSI analysis and briefly introduced the contents of several important files. Of these, stdout
is the most useful because the most critical information about the GSI analysis can be
obtained from this file. From stdout, users can check whether the GSI has completed
successfully, whether the minimization iterations look correct, and whether the background
and analysis fields are reasonable. The stdout file also contains the GSI standard output
from all the processors. Understanding the content of this file can also be very helpful for
finding where and why the GSI failed if it crashes.
The structure of stdout follows the typical steps in a meteorological data analysis system:
In this section, the detailed structure and content of stdout are explained using an online
example case. This case uses a WRF-ARW NetCDF file as the background and analyzes
several observation types typical of operations, including most conventional observation
data, several radiance data types (AMSU-A, AMSU-B, HIRS3, HIRS4, and MHS), and
GPSRO data. The case was run on a Linux cluster supercomputer using 4 processors. To
keep the output concise and make it more readable, most repeated content was deleted
(shown by the blue dotted line). For the same reason, the precision of some numbers has
been reduced to avoid line breaks in stdout.
The following indicates the start of the GSI analysis. It includes some run time information
of the analysis and control variables.
* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
PROGRAM GSI_ANL HAS BEGUN. COMPILED 1999232.55 ORG: NP23
STARTING DATE-TIME JUL 13,2012 21:55:14.798 195 FRI 2456122
Next is the content of all namelist variables used in this analysis. The first part shows the
4DVAR setup. Please note that while this version of the GSI includes some 4DVAR code,
it is untested in this release. The general setup for the GSI analysis (3DVAR) is located in
the &SETUP section of the GSI namelist. Please check Appendix B for the definition and
default value of each namelist variable.
GSI_4DVAR: nobs_bins = 1
SETUP_4DVAR: l4dvar= F
SETUP_4DVAR: l4densvar= F
SETUP_4DVAR: winlen= 3.00000000000000
SETUP_4DVAR: winoff= 3.00000000000000
SETUP_4DVAR: hr_obsbin= 3.00000000000000
SETUP_4DVAR: nobs_bins= 1
SETUP_4DVAR: ntlevs_ens= 1
SETUP_4DVAR: nsubwin,nhr_subwin= 1 3
SETUP_4DVAR: lsqrtb= F
SETUP_4DVAR: lbicg= F
SETUP_4DVAR: lcongrad= F
SETUP_4DVAR: lbfgsmin= F
&SETUP
GENCODE = 78.0000000000000 ,
FACTQMIN = 0.000000000000000E+000,
FACTQMAX = 0.000000000000000E+000,
DELTIM = 1200.00000000000 ,
DTPHYS = 3600.00000000000 ,
BIASCOR = -1.00000000000000 ,
BCOPTION = 1,
DIURNALBC = 0.000000000000000E+000,
NDAT = 75,
NITER = 0, 2*10, 48*0,
NITER_NO_QC = 51*1000000,
MITER = 2,
QOPTION = 2,
NHR_ASSIMILATION = 3,
MIN_OFFSET = 180,
IOUT_ITER = 220,
NPREDP = 6,
RETRIEVAL = F,
0: /
0: &GRIDOPTS
0: &BKGERR
0: &ANBKGERR
0: &JCOPTS
0: &STRONGOPTS
0: &OBSQC
0: &SUPEROB_RADAR
0: &LAG_DATA
0: &HYBRID_ENSEMBLE
0: &RAPIDREFRESH_CLDSURF
0: &CHEM
This version of GSI attempts to read multiple time levels of the background; however, we
have only provided one. Therefore, there is some error information at the beginning of the
background reading portion:
CONVERT_NETCDF_MASS: problem with flnm1 = wrf_inou1, Status = -1021
We can ignore these errors for the missing files wrf_inou1 through wrf_inou9 because
we only ran 3DVAR with one background.
Next, the background fields for the analysis are read in, and the maximum and minimum
values of the fields at each vertical level are displayed. Here, only parts of the variables znu
and T are shown; all other variables read by the GSI are listed only by their variable
name in the NetCDF file (rmse_var = ). The maximum and minimum values are useful
for a quick verification that the background fields have been read successfully.
dh1 = 3
iy,m,d,h,m,s= 2011 3 22 12 0
0
dh1 = 3
before rmse var T
after rmse var T
dh1 = 3
rmse_var = T
ndim1 = 3
ordering = XYZ
k,znu(k)= 1 0.9990000
k,znu(k)= 2 0.9960001
k,znu(k)= 49 7.1999999E-03
k,znu(k)= 50 2.3500000E-03
rmse_var=ZNW
rmse_var=RDX
rmse_var=RDY
rmse_var=MAPFAC_M
rmse_var=XLAT
rmse_var=XLONG
rmse_var=MUB
rmse_var=MU
rmse_var=PHB
rmse_var=T
ordering=XYZ
WrfType,WRF_REAL= 104 104
ndim1= 3
staggering= N/A
start_index= 1 1 1 0
end_index= 348 247 50 0
k,max,min,mid T= 1 310.8738 234.0430 280.8286
k,max,min,mid T= 2 311.2928 235.4056 280.9952
k,max,min,mid T= 3 311.4995 237.6656 281.3005
rmse_var=U
rmse_var=V
rmse_var=LANDMASK
rmse_var=SEAICE
rmse_var=SST
rmse_var=IVGTYP
rmse_var=ISLTYP
rmse_var=VEGFRA
rmse_var=SNOW
rmse_var=U10
rmse_var=V10
rmse_var=SMOIS
rmse_var=TSLB
rmse_var=TSK
rmse_var=QCLOUD
rmse_var=QRAIN
rmse_var=QSNOW
rmse_var=QICE
rmse_var=QGRAUP
rmse_var=RAD_TTEN_DFI
The following example demonstrates how the analysis domain is partitioned into
subdomains, one for each processor. Four processors were used in this example (see
Section 4.4 for more information):
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 0 1 1 124 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 1 125 1 123 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 2 1 175 124 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 3 125 175 123 174
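The decomposition above is consistent with a near-even split of each horizontal dimension: 247 latitude rows become 124 + 123, and 348 longitude columns become 174 + 174. A minimal sketch of such a split (not the actual general_DETER_SUBDOMAIN code):

```python
def split_dim(n, nparts):
    """Split n grid points into nparts nearly equal contiguous pieces.

    Returns (start, count) pairs with 1-based starts; any remainder
    goes to the earlier pieces, so larger pieces come first.
    """
    base, extra = divmod(n, nparts)
    counts = [base + 1 if i < extra else base for i in range(nparts)]
    starts, s = [], 1
    for c in counts:
        starts.append(s)
        s += c
    return list(zip(starts, counts))

# 247 latitudes x 348 longitudes over a 2 x 2 processor layout,
# matching the istart/ilat1 and jstart/jlon1 values printed above.
print(split_dim(247, 2))  # -> [(1, 124), (125, 123)]
print(split_dim(348, 2))  # -> [(1, 174), (175, 174)]
```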
Information on the horizontal dimensions of the subdomains and grid-related variables:
in general_sub2grid_create_info, kbegin= 1 77 153
228 303
in general_sub2grid_create_info, kend= 76 152 227
302
INIT_GRID_VARS: number of threads 1
INIT_GRID_VARS: for thread 1 jtstart,jtstop = 1
176
Information on using the number of vertical levels from the background fields to replace the
number of vertical levels set in anavinfo, and the start index of each state variable in the
bundle array:
INIT_RAD_VARS: ***WARNING*** mxlvs from the anavinfo file 30
is different from that of the guess 50
. Resetting maxlvs to match NSIG from guess.
Vars in Rad-Jacobian (dims)
--------------------------
tv 0
q 50
oz 100
u 150
v 151
sst 152
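The start indices printed above are consistent with simple cumulative offsets, assuming tv, q, and oz each span the 50 guess levels while u, v, and sst contribute one surface value each; this is an inference from the printed numbers, not a description of the code's internals:

```python
from itertools import accumulate

# Assumed per-variable lengths in the radiance-Jacobian bundle:
# tv, q, oz span all 50 model levels; u, v, sst contribute a single
# (surface) value each. Start indices are then cumulative offsets.
lengths = {"tv": 50, "q": 50, "oz": 50, "u": 1, "v": 1, "sst": 1}
starts = dict(zip(lengths, accumulate([0] + list(lengths.values())[:-1])))
print(starts)
# -> {'tv': 0, 'q': 50, 'oz': 100, 'u': 150, 'v': 151, 'sst': 152}
```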
Display the analysis and background file time (they should be the same):
READ_wrf_mass_FILES: analysis date,minutes 2011 3
22 12 0 17472240
READ_wrf_mass_FILES: sigma guess file, nming2 0.000000000000000E+000
12 3 22 2011 17472240
READ_wrf_mass_FILES: sigma fcst files used in analysis : 3
3.00000000000000 1
READ_wrf_mass_FILES: surface fcst files used in analysis: 3
3.00000000000000 1
GESINFO: Guess date is 12 3 22 2011
0.000000000000000E+000
GESINFO: Analysis date is 2011 3 22 12
0 2011032212 3.00000000000000
Read in radar station information and conduct superobbing for radar Level-II radial
velocity. This case did not have radar Level-II velocity data linked; there is a warning about
opening the file, but this will not impact the rest of the GSI analysis.
RADAR_BUFR_READ_ALL: problem opening level 2 bufr file "l2rwbufr"
Read in and show the content of the observation info files (see Section 4.3 for details).
Here is the part of stdout showing convinfo:
READ_CONVINFO: tcp 112 0 1 3.00000 0 0 0 75.0000 5.00000 1.00000 75.0000 0.00000 0 0.00000 0.00000 0
READ_CONVINFO: ps 120 0 1 3.00000 0 0 0 4.00000 3.00000 1.00000 4.00000 0.30E-03 0 0.00000 0.00000 0
READ_CONVINFO: ps 180 0 1 3.00000 0 0 0 4.00000 3.00000 1.00000 4.00000 0.30E-03 0 0.00000 0.00000 0
READ_CONVINFO: t 120 0 1 3.00000 0 0 0 8.00000 5.60000 1.30000 8.00000 0.10E-05 0 0.00000 0.00000 0
READ_CONVINFO: t 126 0 -1 3.00000 0 0 0 8.00000 5.60000 1.30000 8.00000 0.10E-02 0 0.00000 0.00000 0
READ_CONVINFO: gps 440 0 -1 3.00000 0 0 0 10.0000 10.0000 1.00000 10.0000 0.00000 0 0.00000 0.0000 0
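A READ_CONVINFO echo line can be pulled apart with a simple whitespace split. The column names used below follow the usual convinfo layout (type, subtype, usage flag, time window, gross-check limit, thinning parameters, and so on); treat the exact names as an assumption when working with a different GSI version:

```python
# Assumed convinfo column names, in order.
COLS = ["otype", "type", "sub", "iuse", "twindow", "numgrp", "ngroup",
        "nmiter", "gross", "ermax", "ermin", "var_b", "var_pb",
        "ithin", "rmesh", "pmesh", "npred"]

def parse_convinfo(line):
    """Split one READ_CONVINFO echo line into a named record."""
    tokens = line.replace("READ_CONVINFO:", "").split()
    rec = dict(zip(COLS, tokens))
    for key in COLS[1:]:          # everything after otype is numeric
        rec[key] = float(rec[key])
    return rec

rec = parse_convinfo("READ_CONVINFO: t 120 0 1 3.00000 0 0 0 "
                     "8.00000 5.60000 1.30000 8.00000 0.10E-05 0 "
                     "0.00000 0.00000 0")
print(rec["iuse"], rec["gross"])  # -> 1.0 8.0
```

For example, iuse=1 says this type is assimilated, and gross=8.0 is its gross-check limit.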
Read in background fields from intermediate binary file sigf03 and partition the whole 2D
field into subdomains for parallel analysis:
READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
ifld, temp1(im/2,jm/2)= 4 0.28100E+03
READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
ifld, temp1(im/2,jm/2)= 2 0.52546E+04
ifld, temp1(im/2,jm/2)= 3 0.28083E+03
gsi_metguess_mod*create_: alloc() for met-guess done
at 0 in read_wrf_mass_guess
at 0.1 in read_wrf_mass_guess
at 1 in read_wrf_mass_guess, lm = 50
at 1 in read_wrf_mass_guess, num_mass_fields= 214
at 1 in read_wrf_mass_guess, nfldsig = 1
at 1 in read_wrf_mass_guess, num_all_fields= 214
at 1 in read_wrf_mass_guess, npe = 4
at 1 in read_wrf_mass_guess, num_loc_groups= 53
at 1 in read_wrf_mass_guess, num_all_pad = 216
at 1 in read_wrf_mass_guess, num_loc_groups= 54
READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
ifld, temp1(im/2,jm/2)= 1 0.93598E+05
ifld, temp1(im/2,jm/2)= 5 0.28130E+03
ifld, temp1(im/2,jm/2)= 6 0.28213E+03
ifld, temp1(im/2,jm/2)= 7 0.28372E+03
Show the source of the observation errors used in the analysis (see Section 7.1.1 for details):
CONVERR: using observation errors from user provided table
Read in all the observations. The observation ingest is distributed over all the processors,
with each processor reading in at least one observation type. To speed up the reading of
some large datasets, one type can be read in by more than one processor (ntasks).
Before reading in the data from BUFR files, GSI resets the file status depending on
whether the observation time matches the analysis time and how offtime_date is set. This
step also checks for consistency between the satellite radiance data types in the BUFR files
and the usage setups in the satinfo files. The following shows stdout information from this
step:
read_obs_check: bufr file date is 2011032212 prepbufr t
read_obs_check: bufr file date is 2011032212 prepbufr q
read_obs_check: bufr file date is 2011032212 prepbufr ps
read_obs_check: bufr file date is 0 satwnd uv not used
read_obs_check: bufr file date is 0 radarbufr rw not used
read_obs_check: bufr file date is 0 tmirrbufr pcp_tmi not used
read_obs_check: bufr file date is 0 omibufr omi not used
read_obs_check: bufr file date is 0 gimgrbufr goes_img not used
read_obs_check: bufr file date is 2011032212 prepbufr uv
read_obs_check: bufr file date is 2011032212 amsuabufr amsua
Using the above output information, many details on the observations can be obtained. For
example, one line indicates that subroutine READ_BUFRTOVS was called to read
NOAA-18 AMSU-A data (sis=amsua_n18) from the BUFR file amsuabufr
(file=amsuabufr). Furthermore, this data type has 63825 observations in the file
(nread=63825) and 47704 within the analysis domain and time window (ndata=47704).
The data were thinned on a 60 km coarse grid (rmesh=60.000000).
The next step partitions observations into subdomains. The observation distribution is
summarized below by listing the number of observations for each observation variable in
each subdomain (see Section 4.4 for more information):
OBS_PARA: ps 2607 2878 9565 3019
OBS_PARA: t 5172 4743 13902 5590
OBS_PARA: q 4107 4197 11998 4090
OBS_PARA: pw 296 92 475 83
OBS_PARA: uv 6640 5439 18109 5725
OBS_PARA: sst 0 0 6 3
OBS_PARA: gps_ref 3538 5580 2277 6768
OBS_PARA: hirs3 n17 0 0 30 35
OBS_PARA: hirs4 metop-a 0 0 28 35
OBS_PARA: amsua n15 2564 1333 133 195
OBS_PARA: amsua n18 1000 2117 0 90
OBS_PARA: amsua metop-a 0 0 58 67
OBS_PARA: amsub n17 0 0 57 70
OBS_PARA: mhs n18 1438 2942 0 119
OBS_PARA: mhs metop-a 0 0 54 71
OBS_PARA: hirs4 n19 246 1097 0 37
OBS_PARA: amsua n19 656 3503 0 93
OBS_PARA: mhs n19 936 4277 0 116
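Each OBS_PARA row lists, for one observation type, the count in each of the four subdomains; summing across a row recovers the total number of observations of that type distributed to the analysis:

```python
# Subdomain counts copied from the OBS_PARA table above.
obs_para = {
    "ps": [2607, 2878, 9565, 3019],
    "t":  [5172, 4743, 13902, 5590],
    "uv": [6640, 5439, 18109, 5725],
}
# Total observations of each type across all four subdomains.
totals = {name: sum(counts) for name, counts in obs_para.items()}
print(totals)  # -> {'ps': 18069, 't': 29407, 'uv': 35913}
```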
From this point forward in stdout, the output shows many repeated entries because the
information is written from inside the outer loop, which is typically iterated twice.
For each outer loop, the work begins with the calculation of the observation innovation.
This calculation is done by the subroutine setuprhsall, which sets up the right-hand side
(rhs) of the analysis equation. This information, contained within the stdout file, is
discussed in the following sections:
Calculate observation innovation for each data type in the first outer loop:
SETUPALL:,obstype,isis,nreal,nchanl=ps ps 22
0 2607
SETUPALL:,obstype,isis,nreal,nchanl=t t 24
0 5172
Read in CRTM coefficients for the first outer loop, compute the radiance observation
innovations, and save diagnostic information in the first outer loop. In stdout, only
information related to the available radiance data is printed. The complete information can
be found in the diagnostic files for each observation (see Appendix A.2 for details):
SETUPALL:,obstype,isis,nreal,nchanl=amsua amsua_n15 33
15 2564
INIT_CRTM: crtm_init() on path "./"
Read_SpcCoeff_Binary(INFORMATION) : FILE: ./amsua_n15.SpcCoeff.bin; ^M
SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15^M
amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
Read_ODPS_Binary(INFORMATION) : FILE: ./amsua_n15.TauCoeff.bin; ^M
ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15
N_COEFFS=21600
Read_EmisCoeff_Binary(INFORMATION) : FILE: ./EmisCoeff.bin; ^M
EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
SETUPRAD: write header record for amsua_n15 5 30
8 0 0 15 0 13784
to file pe0000.amsua_n15_01 2011032212
INIT_CRTM: crtm_init() on path "./"
Read_SpcCoeff_Binary(INFORMATION) : FILE: ./hirs3_n17.SpcCoeff.bin; ^M
SpcCoeff RELEASE.VERSION: 7.02 N_CHANNELS=19
Read_ODPS_Binary(INFORMATION) : FILE: ./hirs3_n17.TauCoeff.bin; ^M
ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=5 N_ABSORBERS=3 N_CHANNELS=19
N_COEFFS=84300
Read_EmisCoeff_Binary(INFORMATION) : FILE: ./EmisCoeff.bin; ^M
EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
SETUPRAD: write header record for hirs3_n17 5 30
8 0 0 15 0 13784
to file pe0002.hirs3_n17_01 2011032212
The inner iterations of the first outer loop are shown in the example below. In this
example, the maximum number of iterations is 10:
Print the Jo components (the observation term for each observation type in the cost
function) at the beginning of the inner loop:
Begin Jo table outer loop
Observation Type Nobs Jo Jo/n
surface pressure 15600 1.1832900994269770E+04 0.759
temperature 11913 1.9028504337386752E+04 1.597
wind 36726 4.0133809857345797E+04 1.093
moisture 4416 4.6817824733959324E+03 1.060
gps 8709 1.8551360623628108E+04 2.130
radiance 53749 4.8794660264558479E+04 0.908
Nobs Jo Jo/n
Jo Global 131113 1.4302301855058485E+05 1.091
End Jo table outer loop
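Each Jo/n entry is simply Jo divided by Nobs, giving a per-observation measure of fit; values near 1 mean the departures are comparable to the assumed observation errors. The table values can be reproduced directly:

```python
# (Nobs, Jo) pairs copied from the Jo table above.
rows = {
    "surface pressure": (15600, 1.1832900994269770e4),
    "temperature":      (11913, 1.9028504337386752e4),
    "Jo Global":        (131113, 1.4302301855058485e5),
}
# Jo/n = Jo / Nobs, rounded to 3 decimals as in the table.
jo_per_n = {name: round(jo / nobs, 3) for name, (nobs, jo) in rows.items()}
print(jo_per_n)
# -> {'surface pressure': 0.759, 'temperature': 1.597, 'Jo Global': 1.091}
```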
Print cost function values for each inner iteration (see Section 4.6 for more details):
GLBSOI: START pcgsoi jiter= 1
pcgsoi: gnorm(1:2),b= 1.156412696509E+07 1.156412696509277448E+07 0.0000000000E+00
stprat 0.959014637164523
stprat 5.926551062234699E-014
Initial cost function = 1.430230185505848494E+05
Initial gradient norm = 3.400606852473948493E+03
Minimization iteration 0
grepcost J,Jb,Jo,Jc,Jl = 1 0 1.430230185505848494E+05 0.000000000000000000E+00
1.430230185505848494E+05 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 0 3.400606852473948493E+03 1.000000000000000000E+00
pcgsoi: cost,grad,step = 1 0 1.430230185505E+05 3.4006068524E+03 2.4398954427E-03
pcgsoi: gnorm(1:2),b= 1.870988434476E+06 1.8709884344764E+06 1.617924500590633419E-01
stprat 0.680228405155974
stprat 4.467539812170933E-015
Minimization iteration 1
grepcost J,Jb,Jo,Jc,Jl = 1 1 1.148077578695608245E+05 6.884228595044261567E+01
1.147389155836103746E+05 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 1 1.367840792810494804E+03 4.022343223284949310E-01
pcgsoi: cost,grad,step = 1 1 1.14807757869E+05 1.3678407928E+03 7.63011937910E-03
pcgsoi: gnorm(1:2),b= 1.701652049920E+06 1.7016520499209E+06 9.094936230309182967E-01
stprat 0.375750738871811
stprat 1.030833603466626E-016
Minimization iteration 10
grepcost J,Jb,Jo,Jc,Jl = 1 10 6.518198181954051688E+04 4.712017911695538714E+03
6.046996390784497635E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 10 3.526173233561088409E+02 1.036924698012588603E-01
pcgsoi: cost,grad,step = 1 10 6.5181981819E+04 3.526173233E+02 9.3430503337038E-03
Minimization final diagnostics
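The "reduction" printed with each gradient norm is the ratio of the current gradient norm to the initial gradient norm of that outer loop, which can be verified from the numbers above:

```python
# Gradient norms copied from the grepgrad lines above (outer loop 1).
g0 = 3.400606852473948493e3   # iteration 0 (initial gradient norm)
g1 = 1.367840792810494804e3   # iteration 1
g10 = 3.526173233561088409e2  # iteration 10

# Each "reduction" value is grad_k / grad_0.
print(round(g1 / g0, 6))   # -> 0.402234
print(round(g10 / g0, 6))  # -> 0.103692
```

These match the printed reductions 4.022343...E-01 and 1.036924...E-01.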
At the end of the first outer loop, print diagnostics about the guess fields after adding the
analysis increment to the guess, as well as diagnostics about the analysis increment:
pcgsoi: Updating guess
================================================================================
Status Var Mean Min Max
analysis U 8.687820027834E+00 -5.993215723446E+01 7.019321109527E+01
analysis V -8.389957616632E-01 -7.818069678361E+01 9.543296554002E+01
analysis TV 2.402283737571E+02 1.913718225601E+02 3.039049699223E+02
analysis Q 1.527484398981E-03 1.000000000000E-07 1.836425129904E-02
analysis TSEN 2.399649493443E+02 1.913715400572E+02 3.008258699568E+02
analysis OZ 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis DUMY 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis DIV 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis VOR 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
Information on the state variable array allocation after cleaning up major fields at the
end of the inner loop:
state_vectors: latlon11,latlon1n,latlon1n1,lat2,lon2,nsig= 22176
1108800 1130976 126 176 50
state_vectors: length= 8936928
state_vectors: currently allocated= 0
state_vectors: maximum allocated= 4
state_vectors: number of allocates= 4
state_vectors: number of deallocates= 4
state_vectors: Estimated max memory used= 272.9 Mb
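The dimensions printed here are internally consistent: latlon11 is the number of points on one horizontal level (lat2 x lon2), latlon1n is a full 3D field (latlon11 x nsig), and latlon1n1 adds one extra level (nsig + 1; interpreting the extra level as interface quantities is our assumption):

```python
# Grid dimensions from the state_vectors line above.
lat2, lon2, nsig = 126, 176, 50

latlon11 = lat2 * lon2               # points on one horizontal level
latlon1n = latlon11 * nsig           # one full 3D field
latlon1n1 = latlon11 * (nsig + 1)    # 3D field with one extra level

print(latlon11, latlon1n, latlon1n1)  # -> 22176 1108800 1130976
```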
Calculate observation innovations for each data type in the second outer loop:
SETUPALL:,obstype,isis,nreal,nchanl=ps ps 22
0 2607
SETUPALL:,obstype,isis,nreal,nchanl=t t 24
0 5172
Read in CRTM coefficients from the fix file for the second outer loop and save diagnostic
information at the beginning of the second outer loop:
SETUPALL:,obstype,isis,nreal,nchanl=amsua amsua_n15 33
15 2564
INIT_CRTM: crtm_init() on path "./"
Read_SpcCoeff_Binary(INFORMATION) : FILE: ./amsua_n15.SpcCoeff.bin; ^M
SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15^M
amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
Read_ODPS_Binary(INFORMATION) : FILE: ./amsua_n15.TauCoeff.bin; ^M
ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15
N_COEFFS=21600
Read_EmisCoeff_Binary(INFORMATION) : FILE: ./EmisCoeff.bin; ^M
EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
INIT_CRTM: crtm_init() on path "./"
Read_SpcCoeff_Binary(INFORMATION) : FILE: ./hirs4_metop-a.SpcCoeff.bin; ^M
SpcCoeff RELEASE.VERSION: 7.02 N_CHANNELS=19
Read_ODPS_Binary(INFORMATION) : FILE: ./hirs4_metop-a.TauCoeff.bin; ^M
ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=5 N_ABSORBERS=3 N_CHANNELS=19
N_COEFFS=82700
Read_EmisCoeff_Binary(INFORMATION) : FILE: ./EmisCoeff.bin; ^M
EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS
The output from the inner iterations in the second outer loop is shown below. In this
example, the maximum number of iterations is 10:
Print the Jo components (the observation term for each observation type in the cost
function) at the beginning of the inner loop:
Begin Jo table outer loop
Observation Type Nobs Jo Jo/n
surface pressure 15620 6.0353797873191743E+03 0.386
temperature 11913 1.0272593707203929E+04 0.862
wind 36796 2.2776352264954159E+04 0.619
moisture 4416 2.2085428390180150E+03 0.500
gps 8860 8.9698796153000221E+03 1.012
radiance 84607 4.1586556746867529E+04 0.492
Nobs Jo Jo/n
Jo Global 162212 9.1849304960662834E+04 0.566
End Jo table outer loop
Print cost function values for each inner iteration (see Section 4.6 for more details):
GLBSOI: START pcgsoi jiter= 2
pcgsoi: gnorm(1:2),b= 6.04436368069045E+06 6.04436368069045E+06 0.000000000000000E+00
stprat 0.952177909610047
stprat 3.022462323621298E-014
Initial cost function = 9.709172928659517493E+04
Initial gradient norm = 2.458528763445824097E+03
Minimization iteration 0
grepcost J,Jb,Jo,Jc,Jl = 2 0 9.709172928659517493E+04 5.242424325932335705E+03
9.184930496066283376E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 0 2.458528763445824097E+03 1.000000000000000000E+00
pcgsoi: cost,grad,step = 2 0 9.70917292865E+04 2.45852876344E+03 2.0910838314E-03
pcgsoi: gnorm(1:2),b= 1.39687516886273E+06 1.39687516886273E+06 2.311037592468580E-01
stprat 0.326022963208770
stprat 3.222291707173269E-015
Minimization iteration 1
grepcost J,Jb,Jo,Jc,Jl = 2 1 8.445245812257242505E+04 5.411029823593031324E+03
7.904142829897940101E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 1 1.181894736794584333E+03 4.807325236000348778E-01
pcgsoi: cost,grad,step = 2 1 8.44524581225E+04 1.18189473679E+03 3.1026039720E-03
pcgsoi: gnorm(1:2),b= 6.2897182736819E+05 6.2897182736819E+05 4.5027060498202475E-01
stprat 1.600925505225033E-002
stprat 1.290408460808865E-015
Minimization iteration 10
grepcost J,Jb,Jo,Jc,Jl = 2 10 7.074054129933725926E+04 6.952319149296819887E+03
6.378822215004044119E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 10 3.275896517354244679E+02 1.332462148119354928E-01
pcgsoi: cost,grad,step = 2 10 7.07405412993E+04 3.27589651735E+02 4.3417883383E-03
Minimization final diagnostics
Because the number of outer loops is set to 2, the completion of the second outer loop
marks the end of the analysis. The next step is to save the analysis results. Again, only a
portion of variable T is shown, and all other variables are listed by their variable name in
the NetCDF file (rmse_var = ). The maximum and minimum values are useful for a quick
check of the reasonableness of the analysis:
pcgsoi: Updating guess
at 2 in wrwrfmassa
at 3 in wrwrfmassa
at 6 in wrwrfmassa
iy,m,d,h,m,s= 2011 3 22 12 0
0
nlon,lat,sig_regional= 348 247 50
rmse_var=P_TOP
ordering=0
WrfType,WRF_REAL= 104 104
ndim1= 0
staggering= N/A
start_index= 1 1 1 0
end_index1= 348 247 50 0
p_top= 1000.000
rmse_var=MUB
ordering=XY
WrfType,WRF_REAL= 104 104
ndim1= 2
staggering= N/A
start_index= 1 1 1 0
end_index1= 348 247 50 0
max,min MUB= 99645.99 62950.09
max,min psfc= 103974.1 64035.82
max,min MU= 3974.133 -2036.453
rmse_var=MU
ordering=XY
WrfType,WRF_REAL= 104 104
ndim1= 2
staggering= N/A
start_index= 1 1 1 0
end_index1= 348 247 50 0
k,max,min,mid T= 1 311.0948 234.0167 280.5982
k,max,min,mid T= 2 311.5334 235.6663 280.7569
rmse_var=QVAPOR
rmse_var=U
rmse_var=V
rmse_var=SEAICE
rmse_var=SST
rmse_var=TSK
Diagnostics of the analysis results after adding the analysis increment to the guess and
diagnostics about the analysis increment:
================================================================================
Status Var Mean Min Max
analysis U 8.684494834819E+00 -5.994932102119E+01 7.041893161310E+01
analysis V -8.368213328561E-01 -7.803787125994E+01 9.484754997846E+01
analysis TV 2.401499796179E+02 1.913068477406E+02 3.039191344395E+02
analysis Q 1.535424180027E-03 1.000000000000E-07 1.869392878573E-02
analysis TSEN 2.398851864729E+02 1.913065654009E+02 3.008630161153E+02
analysis OZ 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis DUMY 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis DIV 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis VOR 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
analysis PRSL 4.098357903854E+01 1.148194759980E+00 1.038787838788E+02
analysis PS 9.917332267920E+01 6.406096161559E+01 1.039929254169E+02
analysis SST 2.807239944456E+02 2.168921356201E+02 3.027479553223E+02
analysis radb -7.769461831636E-03 -4.254112280406E+01 4.610384289368E+01
analysis pcpb 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
================================================================================
wwww u -3.325193015686E-03 -2.341066364048E+00 2.228906207570E+00
wwww v 2.174428807068E-03 -2.228629595272E+00 3.099467379467E+00
wwww tv -7.839413921041E-02 -1.677489609269E+00 1.525438097693E+00
Information on control variable allocation after cleaning up major fields at the end of the
second outer loop:
state_vectors: latlon11,latlon1n,latlon1n1,lat2,lon2,nsig= 22176
1108800 1130976 126 176 50
state_vectors: length= 8936928
state_vectors: currently allocated= 0
state_vectors: maximum allocated= 4
state_vectors: number of allocates= 8
state_vectors: number of deallocates= 8
state_vectors: Estimated max memory used= 272.9 Mb
SETUPALL:,obstype,isis,nreal,nchanl=amsua amsua_n15 33
15 2564
INIT_CRTM: crtm_init() on path "./"
Read_SpcCoeff_Binary(INFORMATION) : FILE: ./amsua_n15.SpcCoeff.bin; ^M
SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15^M
amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
Read_ODPS_Binary(INFORMATION) : FILE: ./amsua_n15.TauCoeff.bin; ^M
ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15
N_COEFFS=21600
Read_EmisCoeff_Binary(INFORMATION) : FILE: ./EmisCoeff.bin; ^M
EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
SETUPRAD: write header record for amsua_n15 5 30
8 0 0 15 0 13784
to file pe0000.amsua_n15_03 2011032212
Print the Jo components (observation term for each observation type) after the analysis;
comparing these with the same table before the first outer loop shows how well the analysis
results fit the data:
Begin Jo table outer loop
Observation Type Nobs Jo Jo/n
surface pressure 15620 5.6788282318981810E+03 0.364
temperature 11913 9.6271243544834779E+03 0.808
wind 36794 2.1433478710069721E+04 0.583
moisture 4417 2.0790935066632146E+03 0.471
gps 8868 8.1188601604511678E+03 0.916
radiance 100900 3.2527599143479922E+04 0.322
Nobs Jo Jo/n
Jo Global 178512 7.9464984107045690E+04 0.445
End Jo table outer loop
The end of the GSI analysis (a successful analysis must reach this point, but reaching this
point does not necessarily mean the analysis was successful):
ENDING DATE-TIME JUL 13,2012 21:58:55.998 195 FRI 2456122
PROGRAM GSI_ANL HAS ENDED. IBM RS/6000 SP
* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
For runs on IBM computers only, additional machine resource statistics are provided. This
section gives very useful information about the computer resources used in the analysis.
Linux platforms do not provide this information, so the user must find other ways to obtain
the time and memory usage of the analysis run.
0:*****************RESOURCE STATISTICS*******************************
0:
0:The total amount of wall time = 551.066561
0:The total amount of time in user mode = 544.004598
0:The total amount of time in sys mode = 0.718078
0:The maximum resident set size (KB) = 2087516
0:Average shared memory use in text segment (KB*sec) = 3096668
0:Average unshared memory use in data segment (KB*sec) = 1082940860
0:Average unshared memory use in stack segment(KB*sec) = 0
0:Number of page faults without I/O activity = 34260
0:Number of page faults with I/O activity = 697
0:Number of times process was swapped out = 0
0:Number of times filesystem performed INPUT = 0
0:Number of times filesystem performed OUTPUT = 0
0:Number of IPC messages sent = 0
0:Number of IPC messages received = 0
0:Number of Signals delivered = 0
0:Number of Voluntary Context Switches = 1098
0:Number of InVoluntary Context Switches = 596
0:*****************END OF RESOURCE STATISTICS*************************
0:
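On Linux, comparable wall-clock and peak-memory numbers can be recovered in other ways, for example by wrapping the run with GNU time (`/usr/bin/time -v ./gsi.exe`, where available) or, from a script, via the standard Python resource module. A minimal sketch:

```python
import resource
import time

# Time a piece of work and query peak memory of the current process.
# On Linux, ru_maxrss is the peak resident set size in kilobytes.
start = time.monotonic()
# ... the work you want to measure would go here ...
wall = time.monotonic() - start
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(wall >= 0.0, peak_kb > 0)  # -> True True
```

Note that the resource module is Unix-only, and the units of ru_maxrss differ between platforms (kilobytes on Linux, bytes on macOS).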
A single observation test is a GSI run with only one (pseudo) observation at a specific
location in the analysis domain. By examining the analysis increments, one can visualize
important features of the analysis, such as the ratio of the background and observation
variances and the pattern of the background error covariance. Therefore, the single
observation test is the first check that users should perform after successfully installing the
GSI.
To perform the single observation test with the GSI, the following GSI namelist variables
need to be set, which should be done by editing the run script (run_gsi.ksh):
oneobtest=.true.,
maginnov=1.0,
magoberr=1.0,
oneob_type='t',
oblat=20.,
oblon=285.,
obpres=850.,
obdattim= 2007081512,
obhourset=0.,
Note:
Please check Section 3.4 for the explanation of each parameter. From these
parameters, we can see that a useful observation in the analysis should include
information such as the observation type (oneob_type), value (maginnov), error
(magoberr), location (oblat, oblon, obpres), and time (obdattim,
obhourset). Users can dump out (ncdump) the global attributes of the NetCDF
background file and set oblat=CEN_LAT, oblon=360-CEN_LON to place the
observation at the center of the domain.
In the analysis, the GSI first generates a prepbufr file containing only one
observation, based on the information given in the namelist section
&SINGLEOB_TEST. To generate this prepbufr file, the GSI needs to read in a
PrepBUFR table, which is not needed when generating a GSI analysis with real
observations. The table can be found in the fix/ directory and needs to be copied to
the run directory. Please check that the following lines are in the GSI run script
before running the single observation test:
bufrtable=${FIX_ROOT}/prepobs_prep.bufrtable
cp $bufrtable ./prepobs_prep.bufrtable
A horizontal cross section (left column) and a vertical cross section (right column) of
analysis increment of T, U, V, and Q from a single T observation
This single observation was located at the center of the domain. The results are shown as
horizontal and vertical cross sections through the point of maximum analysis increment.
The above figure was generated using NCL scripts, which can be found in the
util/Analysis_Utilities/plots_ncl directory, introduced in Section A.4.
Observation data used in the GSI analysis are controlled by three parts of the GSI system:
All BUFR/PrepBUFR observation files need to be linked to the working directory using
GSI recognized names. The run script (run_gsi.ksh) makes these links after locating the
working directory. Turning on or off these links can control the use of all the data types
contained in the BUFR files. In Section 3.1, we provide a table listing all default GSI
recognized observation file names and the corresponding examples of the observation
BUFR files from NCEP. The following are the first three rows of that table as an example:
The left column lists the GSI recognized names (bold) and the right column the names of
the corresponding BUFR files from NCEP. In the run script, the following lines are used to
link the BUFR files in the right column to the working directory under the GSI recognized
names shown in the left column:
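The linking step follows this pattern; the sketch below is illustrative, not the exact lines of the released script, and the OBS_ROOT path and NCEP file names are placeholders to be matched to your own data:

```shell
# Sketch of the BUFR-linking step in run_gsi.ksh; paths and NCEP file
# names are placeholders, not the exact names used by the released script.
WORK_DIR=$(mktemp -d)    # stand-in for the GSI working directory
cd "${WORK_DIR}"
OBS_ROOT=/path/to/obs    # assumed location of the downloaded BUFR files

# NCEP file name (right column)  ->  GSI recognized name (left column)
ln -sf ${OBS_ROOT}/gdas1.t12z.prepbufr            ./prepbufr
ln -sf ${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d  ./amsuabufr
ln -sf ${OBS_ROOT}/gdas1.t12z.1bhrs4.tm00.bufr_d  ./hirs4bufr
ls -l prepbufr amsuabufr hirs4bufr
```

Commenting out one of the `ln` lines is all it takes to withhold that BUFR file (and every data type it contains) from the analysis.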
The GSI recognized default observation filenames are set up in the namelist section
&OBS_INPUT and can be changed based on application needs (see details below).
In this namelist section, observation files (dfile) are linked to the observed variables used
in the GSI analysis (dsis), for example:
dfile(01)='prepbufr', dtype(01)='ps', dplat(01)=' ', dsis(01)='ps', dval(01)=1.0, dthin(01)=0,
dfile(02)='prepbufr', dtype(02)='t', dplat(02)=' ', dsis(02)='t', dval(02)=1.0, dthin(02)=0,
dfile(03)='prepbufr', dtype(03)='q', dplat(03)=' ', dsis(03)='q', dval(03)=1.0, dthin(03)=0,
Here, conventional observations ps, t, and q will be read in from file prepbufr and
AMSU-A radiances from NOAA-15 and -16 satellites will be read in from amsuabufr.
Deleting a particular line in &OBS_INPUT will turn off the use of that observation variable in
the GSI analysis, but other variables read from the same file can still be used. For example, if
we delete:
dfile(28)='amsuabufr', dtype(28)='amsua', dplat(28)='n15', dsis(28)='amsua_n15', dval(28)=10.0, dthin(28)=2,
Then, the AMSU-A observations from NOAA-15 will not be used in the analysis, but the
AMSU-A observations from NOAA-16 will still be used.
The observation filename in dfile can be different from that in the sample script (run_gsi.ksh). If
the filename in dfile has been changed, the link from the BUFR file to the GSI recognized
name in the run script also needs to be changed correspondingly. For example, if we change
dfile(28):
dfile(28)='amsuabufr_n15', dtype(28)='amsua', dplat(28)='n15', dsis(28)='amsua_n15', dval(28)=10.0, dthin(28)=2,
dfile(29)='amsuabufr', dtype(29)='amsua', dplat(29)='n16', dsis(29)='amsua_n16', dval(29)=0.0, dthin(29)=2,
The GSI will read NOAA-16 AMSU-A observations from the file amsuabufr and NOAA-15
AMSU-A observations from the file amsuabufr_n15, based on the above changes to the run
script and namelist. In this example, both amsuabufr and amsuabufr_n15 are linked to
the same BUFR file, so the NOAA-15 and NOAA-16 AMSU-A observations are
still read in from the same BUFR file. If amsuabufr and amsuabufr_n15 link to different
BUFR files, then the NOAA-15 and NOAA-16 AMSU-A observations will be read in from
different BUFR files. Clearly, the changeable filename in dfile gives GSI more flexibility
to handle multiple data sources.
For each variable, observations can come from multiple platforms (data types or
observing instruments). For example, surface pressure (ps) can come from METAR
observation stations (data type 187) and rawinsonde data (data type 120). There are
several files named *info in the GSI system (located in the fix/ directory) that control the
usage of observations based on the observation platform. Below is a list of the info files
and their functions:
The header of each info file includes an explanation of the content of the file. Here we
discuss the two most commonly used info files:
convinfo
The convinfo file controls the usage of conventional data. The following is part of the
content of convinfo:
!otype type sub iuse twindow numgrp ngroup nmiter gross ermax ermin var_b var_pg ithin rmesh pmesh npred
tcp 112 0 1 3.0 0 0 0 75.0 5.0 1.0 75.0 0.000000 0 0. 0. 0
ps 120 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
ps 132 0 -1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
ps 180 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
ps 181 0 1 3.0 0 0 0 3.6 3.0 1.0 3.6 0.000300 0 0. 0. 0
ps 182 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
ps 183 0 -1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
ps 187 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0 0. 0. 0
t 120 0 1 3.0 0 0 0 8.0 5.6 1.3 8.0 0.000001 0 0. 0. 0
t 126 0 -1 3.0 0 0 0 8.0 5.6 1.3 8.0 0.001000 0 0. 0. 0
t 130 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.001000 0 0. 0. 0
t 131 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.001000 0 0. 0. 0
t 132 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.001000 0 0. 0. 0
t 133 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
t 134 0 -1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
t 135 0 -1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
t 180 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
t 181 0 -1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
t 182 0 1 3.0 0 0 0 7.0 5.6 1.3 7.0 0.004000 0 0. 0. 0
The meaning of each column is explained in the header of the file and is also listed in the
following table:
From this table, we can see that the parameter iuse controls the usage of the data and the
parameter twindow controls the time window for data usage. The parameters gross,
ermax, and ermin are used for gross quality control.
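Because iuse is simply a column in a text file, switching a platform off can be scripted. The sketch below flips iuse to -1 for temperature type 181 in a toy two-line convinfo built from the rows shown above; on the real fix file, verify afterwards that the resulting layout still matches what your GSI version expects:

```shell
# Toy convinfo fragment using the column order shown above
cat > convinfo.sample <<'EOF'
 ps  120  0  1  3.0  0  0  0  4.0  3.0  1.0  4.0  0.000300  0  0.  0.  0
 t   181  0  1  3.0  0  0  0  7.0  5.6  1.3  7.0  0.004000  0  0.  0.  0
EOF

# Set column 4 (iuse) to -1 for otype "t", type 181; other lines pass through
awk '$1 == "t" && $2 == 181 { $4 = -1 } { print }' convinfo.sample > convinfo.new
cat convinfo.new
```

Note that awk rebuilds the modified record with single spaces, so the edited line loses its original column alignment.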
satinfo
The satinfo file contains information about the channels, sensors, and satellites. It
specifies the observation error (clear or cloudy) for each channel, how each channel is used
(assimilated, monitored, etc.), and other useful information. The following is part of the
content of satinfo:
!sensor/instr/sat chan iuse error error_cld ermax var_b var_pg
amsua_n15 1 1 3.000 9.100 4.500 10.000 0.000
amsua_n15 2 1 2.000 13.500 4.500 10.000 0.000
amsua_n15 3 1 2.000 7.100 4.500 10.000 0.000
amsua_n15 4 1 0.600 1.300 2.500 10.000 0.000
In the standard output file (stdout), there is a block that lists information
regarding each sub-domain partition, including the domain number (task), starting point
(istart, jstart), and dimensions of the sub-domain (ilat1, jlon1). Here is an
example from the case shown in Section 4.1; note that 4 processors were used
in the analysis:
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 0 1 1 124 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 1 125 1 123 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 2 1 175 124 174
general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= 3 125 175 123 174
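Each task's (istart, jstart) offset plus its (ilat1, jlon1) extent should tile the full analysis grid. For the 4-task layout above, the two latitude strips and two longitude strips recombine as follows (the full grid dimensions are implied by the numbers, not stated explicitly in the output):

```shell
# Tasks 0 and 1 split the latitude direction; tasks 0 and 2 split longitude.
echo "lat points: $((124 + 123))"   # tasks 0 + 1
echo "lon points: $((174 + 174))"   # tasks 0 + 2
```

If the sums do not match the grid dimensions of the background file, the decomposition (or the background) is not what GSI expected.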
The standard output file (stdout) also has a block that shows the distribution
of the different kinds of observations in each sub-domain. This block follows the observation
input section. The following is the observation distribution for the case shown in Section
4.1. From the case introduction, we know the prepbufr (conventional data), radiance BUFR,
and GPS BUFR files were used. In this list, the conventional observations (ps, t, q,
pw, uv, and sst), GPS (gps_ref), and radiance data (amsua, amsub, hirs3/4, and mhs from
Metop-A and NOAA-15, -17, -18, and -19) were distributed among the 4 sub-domains:
This list is a good way to quickly check which kinds of data are used in the analysis and
how they are distributed across the analysis domain.
The GSI analysis produces a group of files named fort.2* (other than fort.220; see the
explanation of fort.220 in the next section) that summarize how the observations fit the
current solution in each outer loop. The content of each of these files is listed in the
following table:
To help users understand the information inside these files, the following are some
examples of their contents with corresponding explanations.
o-g 01 all count 1962 1400 1488 2688 1014 1071 18363
o-g 01 all bias -0.31 0.92 0.56 0.00 0.21 0.29 0.27
o-g 01 all rms 3.02 3.88 4.68 4.65 5.59 5.36 4.74
o-g 01 all cpen 0.45 0.53 0.85 0.94 1.62 1.49 1.09
o-g 01 all qcpen 0.45 0.53 0.84 0.92 1.60 1.48 1.08
o-g 01 uv rej 220 0000 count 0 0 0 0 0 0 292
o-g 01 uv rej 220 0000 bias 0.00 0.00 0.00 0.00 0.00 0.00 2.81
o-g 01 uv rej 220 0000 rms 0.00 0.00 0.00 0.00 0.00 0.00 8.69
o-g 01 uv rej 220 0000 cpen 0.00 0.00 0.00 0.00 0.00 0.00 0.00
o-g 01 uv rej 220 0000 qcpen 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Please note that 4 layers from 600 to 200 hPa have been deleted to make each row fit into one
line. Only observation types 220 and 223 are shown as an example.
The following table lists the meaning of each item in the files fort.201-213, except the file fort.207:
Name    explanation
it      outer loop
        = 01: observation - background
        = 02: observation - analysis (after 1st outer loop)
        = 03: observation - analysis (after 2nd outer loop)
obs     observation variable (such as uv, ps) and usage of the type, which
        includes:
        blank: used in the GSI analysis
        mon: monitored (read in but not assimilated by GSI)
        rej: rejected because of quality control in GSI
type    prepbufr observation type (see the BUFR Users Guide for details)
styp    prepbufr observation subtype (not used now)
ptop    for multiple-level data: pressure at the top of the layer
pbot    for multiple-level data: pressure at the bottom of the layer
count   number of observations summarized under observation types and
        vertical layers
bias    bias of the observation departure for each outer loop (it)
rms     root mean square of the observation departure for each outer loop (it)
cpen    observation part of the penalty (cost function)
qcpen   nonlinear qc penalty
The contents of the fort.2* files are calculated from O-B or O-A for each observation.
Detailed information about each observation is saved in the diagnostic files. For the
content of the diagnostic files, please check the array rdiagbuf in one of the
setup subroutines for conventional data, for example setupt.f90. We provide a tool in
Appendix A.2 to help users read the information from the diagnostic files.
These fit files provide a great deal of useful information on how data are analyzed by the GSI,
such as how many observations are used and rejected, what the bias and rms are for certain
data types or for all observations, and how the analysis results fit the observations before and
after the analysis. Again, we use observation type 220 in fort.202 (fit_w1.2011032212) as an
example to illustrate how to read this information. The fit information for observation type
220 (sounding observations) is listed on the next page. As in the previous example, 4 layers
from 600 to 200 hPa were deleted to make each row fit into one line. Also, the qcpen
lines have been deleted because they have the same values as the cpen lines. Otherwise, all
fit information for observation type 220 is shown.
In the loop section o-g 01, from the count line, we can see there are 8061 sounding
observations used in the analysis. Among them, 135 are within the 1000-1200 hPa layer.
Also from the count lines, in the rejection and monitoring sections, 292
observations were rejected and 145 observations were monitored. In the same loop section,
from the bias and rms lines, we can see the total bias and rms of O-B for soundings are
0.64 and 4.84. The bias and rms of each layer for the sounding observations can also be found
in the file.
The bias and rms values can also be compared across loops, as shown in the following
three lines:
o-g 01 uv 220 0000 rms 3.62 3.68 4.21 4.13 5.61 5.33 4.84
o-g 02 uv 220 0000 rms 3.14 2.93 2.99 2.76 4.75 4.90 3.97
o-g 03 uv 220 0000 rms 3.01 2.77 2.82 2.57 4.62 4.82 3.86
we can see the rms was reduced from 4.84 (o-g 01, which is O-B) to 3.97 (o-g 02, which is
O-A after the 1st outer loop) and then to 3.86 (o-g 03, which is O-A after the 2nd outer loop,
the final analysis result). The reduction in the rms shows that observation type 220 (sounding)
was used in the GSI analysis to pull the background fields toward the observations. Please
note this example only used 10 iterations for each outer loop.
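The same comparison can be pulled out of a fit file with grep and awk; here the three rms lines above stand in for the real fit_w1 file, so adapt the file name and match pattern to your own case:

```shell
# Stand-in for fit_w1.2011032212, holding the three rms lines shown above
cat > fit_w1.sample <<'EOF'
o-g 01 uv 220 0000 rms 3.62 3.68 4.21 4.13 5.61 5.33 4.84
o-g 02 uv 220 0000 rms 3.14 2.93 2.99 2.76 4.75 4.90 3.97
o-g 03 uv 220 0000 rms 3.01 2.77 2.82 2.57 4.62 4.82 3.86
EOF

# Print the outer-loop stage ($2) and the total-column rms ($NF)
grep ' rms ' fit_w1.sample | awk '{ print $2, $NF }'
```

This prints one "stage total-rms" pair per outer loop, making the 4.84 → 3.97 → 3.86 reduction easy to tabulate or plot.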
The file fort.207 is the statistics file for radiance data. Its contents include important
information about the radiance data analysis.
The first part of the file fort.207 lists the content of the satinfo file, which is the info file that
controls the data usage for radiance data. This portion begins with the line:
RADINFO_READ: jpch_rad= 2680
This shows that there are 2680 channels listed in the satinfo file; the 2680 lines following
this line contain the detailed setup in the satinfo file for each channel.
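As a quick consistency check, the jpch_rad value should equal the number of non-comment lines in the satinfo file. A sketch with a two-channel toy file built from the rows shown earlier (a real satinfo header may contain several comment lines, all starting with "!"):

```shell
# Toy satinfo: one comment header plus two channel lines
cat > satinfo.sample <<'EOF'
!sensor/instr/sat chan iuse error error_cld ermax var_b var_pg
 amsua_n15   1   1   3.000   9.100   4.500  10.000   0.000
 amsua_n15   2   1   2.000  13.500   4.500  10.000   0.000
EOF

# Count channel lines (everything that is not a '!' comment)
grep -cv '^!' satinfo.sample
```

Run against the real fix/satinfo, the count should come out as 2680 for this release; a mismatch with jpch_rad suggests the file was edited incorrectly.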
The second part of the file is a list of the coefficients for mass bias correction, which begins
with the line:
RADINFO_READ: guess air mass bias correction coefficients below
Each channel has 5 coefficients listed in one line. Therefore, there are 2680 lines of mass bias
correction coefficients covering all channels, although some of the coefficients are 0.
The 3rd part of the fort.207 file is similar to the other fit files, with content repeated in 3
sections giving detailed statistical information about the data before the 1st outer
loop, between the 1st and 2nd outer loops, and after the 2nd outer loop. The results before
the 1st outer loop are used here as an example to explain the content of the statistics:
sat type penalty nobs iland isnoice icoast ireduce ivarl nlgross
n15 amsua 24874.97687020 4149 673 1475 268 1311 29170 0
qcpenalty qc1 qc2 qc3 qc4 qc5 qc6 qc7
24874.97687020 947 94 1277 183 0 20 46
The following table lists the meaning of each item in the above statistics:
Name explanation
sat satellite name
type instrument type
penalty contribution to cost function from this observation type
nobs number of good observations used in the assimilation
iland number of observations over land
isnoice number of observations over sea ice and snow
icoast number of observations over coast
ireduce number of observations that reduce qc bounds in tropics
ivarl number of observations tossed by gross check
nlgross number of observations tossed by nonlinear qc
qcpenalty nonlinear qc penalty from this data type
qc1-7 number of observations whose quality control criteria have been
adjusted by each qc method (1-7); see Section 8.3 for details
rad penalty_all total summary of penalty for all radiance observation types
rad qcpenalty_all total summary of qcpenalty for all radiance observation types
rad failed nonlinqc total summary of observations tossed by nonlinear qc for all
radiance observation types
Note: one observation may include multiple channels, and not all channels are used in the
analysis.
The following table lists the meaning of each column in the above statistics:
Column content
#
1 series number of the channel in satinfo file
2 channel number for certain radiance observation type
3 radiance observation type (for example: amsua_n15)
4 number of observations (nobs) used in GSI analysis within this channel
5 number of observations (nobs) tossed by gross check within this channel
6 variance for each satellite channel
7 bias (observation-guess before bias correction)
8 bias (observation-guess after bias correction)
9 penalty contribution from this channel
10 (observation-guess with bias correction)**2
11 standard deviation
The following table lists the meaning of each column in the above statistics:
Name Explanation
it stage (o-g 01 rad = before 1st outer loop for radiance data)
satellite satellite name (n15=NOAA-15)
instrument instrument name (AMSU-A)
# read number of data (channel value) read in within analysis time window
and domain
# keep number of data (channel value) after data thinning
# assim number of data (channel value) used in analysis (passed all qc process)
penalty contribution from this observation type to cost function
qcpnlty nonlinear qc penalty from this data type
cpen penalty divided by (# assim)
qccpen qcpnlty divided by (# assim)
As with the other fit files, a comparison between the results from different outer loops can
give very useful information on how each channel and data type are analyzed in the GSI.
There are two ways to check the convergence information for each iteration of the GSI.
First, the value of the cost function and the norm of the gradient for each iteration are listed
in the stdout file.
The following is an example showing the first two iterations of the first outer loop:
Minimization iteration 0
grepcost J,Jb,Jo,Jc,Jl = 1 0 1.430230185505848494E+05 0.000000000000000000E+00
1.430230185505848494E+05 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 0 3.400606852473948493E+03 1.000000000000000000E+00
pcgsoi: cost,grad,step = 1 0 1.43023018550E+05 3.40060685247E+03 2.43989544270E-03
pcgsoi: gnorm(1:2),b= 1.87098843447644E+06 1.87098843447644E+06 1.61792450059063E-01
stprat 0.680228405155974
stprat 4.467539812170933E-015
Minimization iteration 1
grepcost J,Jb,Jo,Jc,Jl = 1 1 1.148077578695608245E+05 6.884228595044261567E+01
1.147389155836103746E+05 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 1 1.367840792810494804E+03 4.022343223284949310E-01
pcgsoi: cost,grad,step = 1 1 1.148077578695E+05 1.36784079281E+03 7.6301193791E-03
pcgsoi: gnorm(1:2),b= 1.7016520499209E+06 1.7016520499209E+06 9.094936230309182967E-01
stprat 0.375750738871811
stprat 1.030833603466626E-016
The following are the first two iterations from the second outer loop:
Minimization iteration 0
grepcost J,Jb,Jo,Jc,Jl = 2 0 9.709172928659517493E+04 5.242424325932335705E+03
9.184930496066283376E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 0 2.458528763445824097E+03 1.000000000000000000E+00
pcgsoi: cost,grad,step = 2 0 9.70917292865E+04 2.458528763445E+03 2.0910838314E-03
pcgsoi: gnorm(1:2),b= 1.39687516886273E+06 1.39687516886273E+06 2.311037592468580815E-01
stprat 0.326022963208770
stprat 3.222291707173269E-015
Minimization iteration 1
grepcost J,Jb,Jo,Jc,Jl = 2 1 8.445245812257242505E+04 5.411029823593031324E+03
7.904142829897940101E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 1 1.181894736794584333E+03 4.807325236000348778E-01
pcgsoi: cost,grad,step = 2 1 8.445245812257E+04 1.18189473679E+03 3.102603972078E-03
pcgsoi: gnorm(1:2),b= 6.28971827368193E+05 6.28971827368193E+05 4.502706049820247580E-01
stprat 1.600925505225033E-002
stprat 1.290408460808865E-015
We can clearly see the number of the outer loop and of the inner loops (Minimization iteration).
The meanings of the names (bold) used in stdout are explained in the following:
J,Jb,Jo,Jc,Jl: the values of the cost function (J, or penalty), the background term (Jb),
the observation term (Jo), the dry pressure constraint term (Jc), and the negative and excess
moisture term (Jl)
grad: inner product of the gradients (norm of the gradient, Y*X)
reduction: grad divided by the grad of the first step of the loop
cost: the value of the cost function (=J)
step: stepsize
gnorm(1:2): 1 = (norm of the gradient)^2, 2 = (norm of the gradient)^2 computed a
second way (the two values should agree, as in the example above)
b: parameter used to estimate the new search direction
stprat: convergence of the stepsize estimation
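The grepcost and grepgrad tags exist precisely so that the convergence history can be extracted with standard tools. A sketch using two of the grepgrad lines shown above as a stand-in; on a real run, point the commands at the actual stdout file:

```shell
# Stand-in stdout holding two grepgrad lines from the first outer loop
cat > stdout.sample <<'EOF'
grepgrad grad,reduction= 1 0 3.400606852473948493E+03 1.000000000000000000E+00
grepgrad grad,reduction= 1 1 1.367840792810494804E+03 4.022343223284949310E-01
EOF

# Columns printed: outer loop, inner iteration, gradient norm
grep '^grepgrad' stdout.sample | awk '{ print $3, $4, $5 }'
```

Note that the reduction value is simply the ratio of the current gradient norm to that of iteration 0: 1.3678E+03 / 3.4006E+03 ≈ 0.4022.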
In the file fort.220, users can find more detailed information about each iteration. The
following example uses the output of one iteration, labeled 1) to 9), to explain the meaning
of each value:
1)
J= 0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.118329009942697698E+05
0.190285043373867524E+05 0.401338098573457984E+05 0.468178247339593267E+04
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.185513606236281089E+05 0.487946602645584815E+05 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00
2)
S=-0.100000000000000018E-03 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.931339146384645329E-02
0.112072975062044600E-01 0.106814995402123112E-01 0.296198179729126945E-01
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.325642588289663153E-02 0.215772188906240879E-02 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00
3)
b=-0.115641269650927740E+04 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.281704458045555561E+06
0.402962980357327977E+06 0.292918321075311759E+06 0.274591776085362393E+05
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.262337918646139989E+06 0.982394058251658943E+07 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00
4)
c= 0.115641269650927719E+08 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.302472476475514677E+08
0.359554103149527393E+08 0.274229587308945897E+08 0.927054232191691408E+06
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.805600766238803874E+08 0.455292252088399070E+10 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00
5)
stepsize estimates = 0.100000000000000005E-03 0.243989544270763757E-02
0.243989544270778218E-02
6)
0.100000E-03 0.900000E-04 0.110000E-03 0.000000E+00 0.243990E-02 0.241550E-02
0.246429E-02
7)
penalties = 0.000000E+00 0.222277E+03 -0.221329E+03 0.226543E+04 -0.259498E+05 -
0.259470E+05 -0.259470E+05
8)
penalty,grad ,a,b= 1 0 0.143023018550584849E+06 0.115641269650927745E+08
0.243989544270778198E-02 0.000000000000000000E+00
9)
pnorm,gnorm, step? 1 0 0.100000000000000000E+01 0.100000000000000000E+01 good
For each inner iteration, there are 9 different outputs; the iteration above is labeled with the
numbers 1) to 9). It is suggested that users check stpcalc.f90 for the code that produces this
information.
To evaluate the convergence of the iterations, we usually make plots based on the above
information, such as the value of the cost function and the norm of the gradient. The
following example plots show the evolution of the cost function and the norm of the
gradient in different outer loops:
Evolution of the cost function (left column) and the norm of the gradient (right column) in
the first outer loop (top row) and the second outer loop (bottom row)
Please see Section A.3 for the information on how to read convergence information from
fort.220 and how to make the above plots.
4.7 Use bundle to configure control, state variables and background fields
Since GSI release version 3.0, the control variables, state variables, and background
fields can be configured through a new info file named anavinfo. Different GSI
applications need different anavinfo files to set up the control variables, state variables, and
background fields. In the ./fix directory of the release package, there are many sample
anavinfo files for different operational applications. Because this is a work in progress,
users should use one of the sample anavinfo files instead of making a new one. The
released GSI run script has added the link for this new info file, so users should be able to
use it without any problem.
state_vector::
!var level itracer amedge source funcof
u 30 0 no met_guess u
v 30 0 no met_guess v
tv 30 0 no met_guess tv
tsen 30 0 no met_guess tv,q
q 30 1 no met_guess q
oz 30 1 no met_guess oz
cw 30 1 no met_guess cw
p3d 31 0 yes met_guess p3d
ps 1 0 no met_guess p3d
sst 1 0 no met_guess sst
::
control_vector::
!var level itracer as/tsfc_sdv an_amp0 source funcof
sf 30 0 1.00 -1.0 state u,v
vp 30 0 1.00 -1.0 state u,v
ps 1 0 0.50 -1.0 state p3d
t 30 0 0.70 -1.0 state tv
q 30 1 0.70 -1.0 state q
oz 30 1 0.50 -1.0 state oz
sst 1 0 1.00 -1.0 state sst
cw 30 1 1.00 -1.0 state cw
stl 1 0 1.00 -1.0 motley sst
sti 1 0 1.00 -1.0 motley sst
::
In each section, the 1st column sets the variable name and the 2nd column the number of
vertical levels. The 4th column in the control_vector section is the normalized scale factor
applied to the background error variance.
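Because that 4th column scales the background error variance, a quick sensitivity experiment can be run by editing it. The sketch below doubles the factor for q in a hypothetical two-line anavinfo fragment; this is a tuning illustration only, and any edited file should be compared against a released sample before use in a real run:

```shell
# Hypothetical control_vector fragment (same column order as above)
cat > anavinfo.fragment <<'EOF'
 q    30  1  0.70  -1.0  state  q
 oz   30  1  0.50  -1.0  state  oz
EOF

# Double the variance scale factor (column 4) for the q control variable
awk '$1 == "q" { $4 = $4 * 2.0 } { print }' anavinfo.fragment
```

Increasing the factor lets the analysis draw more strongly to moisture observations; decreasing it keeps the analysis closer to the background.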
Analysis increments are defined as the analysis fields minus the background. A plot of the
analysis increments can help users understand how the analysis procedure modifies the
background fields according to the observations, the errors (background and observation), and
other constraints. You can either calculate analysis minus guess and plot the difference fields,
or use the tools introduced in Appendix A.4 to make analysis increment figures for different
analysis fields.
In addition to analysis increments, run time and memory usage are other important features
of an analysis system, especially for operational code like the GSI.
Some computer operating systems provide the CPU time, wall time, and memory usage after a
job has completed, but an easy way to determine how much wall time GSI used is
to check the time stamps of the files generated at the beginning and at the end of the run,
for example:
Also, the GSI standard output file (stdout) gives the GSI start and end times at the top
and bottom of the file. For example:
* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
PROGRAM GSI_ANL HAS BEGUN. COMPILED 1999232.55 ORG: NP23
STARTING DATE-TIME JUL 13,2012 21:55:14.798 195 FRI 2456122
This tells us that this case started at 21:55:14.798 and ended at 21:58:55.998, meaning GSI
took 3 minutes and 41.2 seconds to finish.
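The 3 minutes 41.2 seconds can be reproduced from the two stdout timestamps with a small awk sketch (which assumes both timestamps fall on the same day):

```shell
START="21:55:14.798"   # STARTING DATE-TIME from the top of stdout
END="21:58:55.998"     # ENDING DATE-TIME from the bottom of stdout

# Convert hh:mm:ss.sss to seconds, difference the two, and split the
# result into minutes and seconds
awk -v s="$START" -v e="$END" 'BEGIN {
  split(s, a, ":"); split(e, b, ":")
  d = (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
  printf "%d min %.1f sec\n", int(d/60), d - 60*int(d/60)
}'
```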
For a GSI run using an IBM computer, there is a resource statistics section at the end of the
stdout file, which gives information about run time and memory usage for the analysis (see
example at the end of Section 4.1).
GSI Applications
In this chapter, the knowledge from the previous chapters is applied to several GSI
cases to show how to set up GSI with various data sources, and how to properly
check the run status and analysis results in order to determine whether a particular GSI
application was successful. Note the examples here only use the WRF ARW system;
global experiments or WRF NMM runs are similar, but require different background and
namelist files. It is assumed that the reader has successfully compiled GSI on a local
machine and has the following data available:
1. Background file
When using WRF, WPS and real.exe will be run to create a WRF input file:
wrfinput_<domain>_<yyyy-mm-dd_hh:mm:ss>
2. Conventional data
Real time NAM PREPBUFR data can be obtained from the server:
ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nam/prod
Note: NDAS prepbufr data was chosen to increase the amount of data
3. Radiance data and GPS RO data
Real time GDAS BUFR files can be obtained from the following server:
ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod
Note: GDAS data was chosen to get better coverage for radiance and GPS RO
The following case study will give users an example of a successful GSI run with various
data sources. Users are welcome to download these example data from the GSI users
webpage (online case for release version 3.1) or create a new background and get real-time
observation data from the above server. The background and observations used in this case
study are as follows:
Figure 5.1: The terrain (left) and land mask (right) of the background used in this case study
3. Radiance and GPS RO data: GDAS BUFR data from 12 UTC 22 March 2011
Files: gdas.t12z.1bamua.tm00.bufr_d
gdas.t12z.1bamub.tm00.bufr_d
gdas.t12z.1bhrs4.tm00.bufr_d
gdas.t12z.gpsro.tm00.bufr_d
This case study was run on a Linux cluster. The BUFR/PrepBUFR files have been
byte-swapped to little-endian format using the tool introduced in Section A.1:
le_nam.t12z.prepbufr.tm00.nr
le_gdas.t12z.1bamua.tm00.bufr_d
le_gdas.t12z.1bamub.tm00.bufr_d
le_gdas.t12z.1bhrs4.tm00.bufr_d
le_gdas.t12z.gpsro.tm00.bufr_d
Background file directory: /scratch1/NA30km/bk
Observation directory: /scratch1/NA30km/obs20110322
GSI system directory: /scratch1/comGSI_v3.1
With GSI successfully compiled and the background and observational data acquired, move
to the ./run directory under ./comGSI_v3.1 to set up the GSI run following the sample script
run_gsi.ksh. To run the run_gsi.ksh script successfully, it must be modified in several
places:
Set up the job head for the job control system. More examples of the setup are described
in Section 3.2.2.1. The following example is set up to run on a Linux cluster
supercomputer with PBS. A good way to find out how to set up the job head is to copy
it from an existing MPI job script.
Set up the number of processors and the job queue system used. For this example,
LINUX_PBS and 4 processors were used:
GSIPROC=4
ARCH='LINUX_PBS'
Set up the case data, the analysis time, and the GSI fix, exe, and CRTM coefficient directories:
ANAL_TIME=2011032212
Set up a working directory that will hold all the analysis results. This directory must
have the correct write permissions, as well as enough space to hold the output. Also, for the
following example, the directory /scratch1 must exist.
WORK_ROOT=/scratch1/gsiprd_${ANAL_TIME}_prepbufr
Set the path to the observation directory and to the PrepBUFR file within the observation
directory. All observations to be assimilated should be placed in the observation directory.
OBS_ROOT=/scratch1/NA30km/obs20110322
PREPBUFR=${OBS_ROOT}/le_nam.t12z.prepbufr.tm00.nr
Set up the GSI system used for this case: the fix and CRTM coefficient directories, as well
as the location of the GSI executable:
CRTM_ROOT=/scratch1/crtm/CRTM_Coefficients
FIX_ROOT=/scratch1/comGSI_v3.1/fix
GSI_EXE=/scratch1/comGSI_v3.1/run/gsi.exe
This example uses an ARW NetCDF background; therefore bk_core is set to ARW.
The regional background error covariance files are also used in this case. Finally, the
run script is set to clean the run directory, deleting all temporary intermediate files.
Once the run script is set up properly for the case and machine, GSI can be run through
the run script. On our test machine, the GSI run is submitted as follows:
While the job is running, move to the working directory and check the details. Given the
following working directory setup:
WORK_ROOT=/scratch1/gsiprd_${ANAL_TIME}_prepbufr
imgr_g12.TauCoeff.bin ssmi_f15.SpcCoeff.bin
imgr_g13.SpcCoeff.bin ssmi_f15.TauCoeff.bin
imgr_g13.TauCoeff.bin ssmis_f16.SpcCoeff.bin
These files are CRTM coefficients that have been linked into the run directory by the GSI run scripts. Additionally, many other files are linked or copied to this run directory, or generated during the run, such as:
The presence of these files indicates that the GSI run scripts have successfully set up a run environment for GSI and that the GSI executable is running. While GSI is still running, the content of the standard output file (stdout) can be checked to monitor the progress of the analysis:
$ tail -f stdout
The above output lines show that GSI is in the inner iteration stage. It may take several minutes for the GSI run to finish. Once GSI has finished running, the number of files in the directory will be greatly reduced compared to the run stage, because the run script was set to clean the run directory after a successful run. The important analysis result files and configuration files will remain in the run directory. Please check Section 3.4 for more details on GSI run results. Upon successful completion of GSI, the run directory looks as follows:
It is important to always check for successful completion of the GSI analysis. Completion
of the GSI run without crashing does not guarantee a successful analysis. First, check the
stdout file in the run directory to be sure GSI completed each step without any obvious
problems. The following are several important steps to check:
GSI_4DVAR: nobs_bins = 1
SETUP_4DVAR: l4dvar= F
SETUP_4DVAR: l4densvar= F
SETUP_4DVAR: winlen= 3.00000000000000
SETUP_4DVAR: winoff= 3.00000000000000
SETUP_4DVAR: hr_obsbin= 3.00000000000000
This table is important for checking whether the observations have been read in, which types of observations have been read in, and the distribution of observations in each sub-domain. At this point, GSI has read in all the data needed for the analysis. Following this table is the inner iteration information.
4. Inner iteration
The inner iteration step in the stdout file will look as follows:
Minimization iteration 0
grepcost J,Jb,Jo,Jc,Jl = 1 0 7.024833842077980808E+04 0.000000000000000000E+00
7.024833842077980808E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 1 0 8.188107512799259666E+02 1.000000000000000000E+00
pcgsoi: cost,grad,step = 1 0 7.024833842077980808E+04 8.188107512799259666E+02
1.788853101746809213E-02
pcgsoi: gnorm(1:2),b= 3.574858851399047417E+05 3.574858851399045670E+05
5.332020690447850653E-01
stprat 0.114387353859176
stprat 4.672705475826555E-016
Following the namelist setup, similar information will be repeated for each inner loop. In this case, 2 outer loops with 50 inner loops each have been set. The last iteration looks like:
Minimization iteration 40
grepcost J,Jb,Jo,Jc,Jl = 2 40 3.664942287987162126E+04 6.573582409713609195E+03
3.007584047015801480E+04 0.000000000000000000E+00 0.000000000000000000E+00
grepgrad grad,reduction= 2 40 7.094588989162899789E-03 2.206814011091630791E-04
pcgsoi: cost,grad,step = 2 40 3.664942287987162126E+04 7.094588989162899789E-03
4.844607290004876443E-02
PCGSOI: WARNING **** Stopping inner iteration ***
gnorm 0.750736287079361097E-10 less than 0.100000000000000004E-09
Minimization final diagnostics
Clearly, the iteration met the stopping threshold before reaching the maximum iteration number (50). As a quick check of the iteration, the value of J should decrease with each iteration. Here, J has a value of 7.024833842077980808E+04 at the beginning and 3.664942287987162126E+04 at the final iteration. The value has thus been reduced by almost half, which is an expected reduction.
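This descent can also be checked mechanically. The following is a minimal sketch (the field positions are assumed from the grepcost samples above; in the actual stdout, each grepcost record sits on a single line):

```shell
# check_cost: scan a GSI stdout file, pull the total cost J from each
# "grepcost" line (assumed layout: grepcost J,Jb,Jo,Jc,Jl = outer inner J ...),
# and report whether J ever increases between successive iterations.
check_cost () {
    awk '/grepcost/ { print $6 }' "$1" |
    awk 'NR > 1 && $1 + 0 > prev { bad = 1 } { prev = $1 + 0 } END { exit bad }' &&
        echo "cost function is non-increasing" ||
        echo "cost function increased at some iteration"
}
# usage: check_cost stdout
```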
6. As an indication that GSI has successfully run, several lines will appear at the
bottom of the file:
GSI was mainly developed on IBM machines, so the IBM RS/6000 SP markers will still appear even when running on a Linux machine.
After carefully investigating each portion of the stdout file, it can be concluded that GSI ran through every step successfully and there were no run issues. A more complete description of the stdout file can be found in Section 4.1. However, it cannot be concluded that GSI produced a successful analysis until further diagnosis has been completed.
The analysis uses observations to correct the background fields to fit the observations
under certain constraints. The easiest way to confirm the GSI analysis fits the observations
better than the background is to check a set of files with names fort.2??, where ?? is a
number from 01 to 19. In the run scripts, several fort files have also been renamed as fit_t1 (q1, p1, rad1, w1).YYYYMMDDHH. Please check Section 4.5.1 for a detailed explanation of the fit files. The following are several examples of these fit files.
fit_t1.2011032212 (fort.203)
This file shows how the background and analysis fields fit the temperature observations. The contents of this file show that three data types were used in the analysis: 120, 130, and 180. Also included are the number of observations and the bias and rms of observation minus background (O-B) on each level for the three data types. The following is part of the file:
ptop 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 50.0 0.0
it obs type styp pbot 1200.0 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 2000.0
------------------------------------------------------------------------------------------------------------------------
o-g 01 t 120 0000 count 185 527 573 995 1117 637 226 443 642 979 855 8692
o-g 01 t 120 0000 bias 0.62 0.88 -0.22 -0.20 -0.19 -0.38 -1.07 -0.89 -0.95 -0.99 -1.63 -0.92
o-g 01 t 120 0000 rms 2.62 2.57 2.01 1.31 0.90 1.02 1.66 1.71 1.94 1.86 2.67 2.27
o-g 01 t 130 0000 count 0 0 0 0 0 11 177 626 67 0 0 881
o-g 01 t 130 0000 bias 0.00 0.00 0.00 0.00 0.00 0.18 0.06 -0.02 -1.23 0.00 0.00 -0.09
o-g 01 t 130 0000 rms 0.00 0.00 0.00 0.00 0.00 0.96 0.97 1.48 2.28 0.00 0.00 1.46
o-g 01 t 180 0000 count 1260 28 0 0 0 0 0 0 0 0 0 1288
o-g 01 t 180 0000 bias 0.67 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.68
o-g 01 t 180 0000 rms 1.76 1.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.75
o-g 01 all count 1445 555 573 995 1117 648 403 1069 709 979 855 10861
o-g 01 all bias 0.67 0.88 -0.22 -0.20 -0.19 -0.37 -0.57 -0.38 -0.97 -0.99 -1.63 -0.66
o-g 01 all rms 1.89 2.52 2.01 1.31 0.90 1.02 1.40 1.58 1.97 1.86 2.67 2.16
---------------------------------------------------------------------------------------------------------------------------
o-g 03 t 120 0000 count 185 527 573 995 1117 637 226 443 642 979 855 8692
o-g 03 t 120 0000 bias 0.37 0.51 -0.15 -0.04 -0.02 0.02 -0.24 -0.17 -0.03 -0.17 -0.31 -0.10
o-g 03 t 120 0000 rms 2.12 2.05 1.60 1.00 0.61 0.59 0.90 1.01 1.28 1.36 1.93 1.46
o-g 03 t 130 0000 count 0 0 0 0 0 11 177 626 67 0 0 881
o-g 03 t 130 0000 bias 0.00 0.00 0.00 0.00 0.00 0.19 -0.04 0.07 -0.33 0.00 0.00 0.02
o-g 03 t 130 0000 rms 0.00 0.00 0.00 0.00 0.00 0.66 0.76 1.10 1.47 0.00 0.00 1.07
o-g 03 t 180 0000 count 1260 28 0 0 0 0 0 0 0 0 0 1288
o-g 03 t 180 0000 bias 0.38 0.30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.38
o-g 03 t 180 0000 rms 1.46 0.76 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.45
o-g 03 all count 1445 555 573 995 1117 648 403 1069 709 979 855 10861
o-g 03 all bias 0.38 0.50 -0.15 -0.04 -0.02 0.03 -0.15 -0.03 -0.06 -0.17 -0.31 -0.03
o-g 03 all rms 1.56 2.01 1.60 1.00 0.61 0.59 0.84 1.06 1.29 1.36 1.93 1.43
For example, data type 120 has 1117 observations in the layer 400.0-600.0 hPa, a bias of
-0.19, and a rms of 0.90. The last column shows the statistics for the whole atmosphere.
There are several summary lines for all data types, which is indicated by all in the
data types column. For summary O-B (which is o-g 01 in the file), we have 10861
observations total, a bias of -0.66, and a rms of 2.16.
Skipping ahead in the fort file, o-g 03 columns (under it) show the observation
minus analysis (O-A) information. Under the summary (all) lines, it can be seen that
there were 10861 total observations, a bias of -0.03, and a rms of 1.43. This shows that
from the background to the analysis the bias was reduced from -0.66 to -0.03, and the rms was reduced from 2.16 to 1.43. This is about a 34% reduction in rms, which is a reasonable value for a large-scale analysis.
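This O-B versus O-A comparison can be extracted automatically. Below is a hedged sketch (the column positions, and the use of the last column as the whole-atmosphere value, are assumptions based on the fit_t1 excerpt above):

```shell
# fit_rms_reduction: pull the whole-atmosphere (last-column) rms from the
# O-B ("o-g 01") and O-A ("o-g 03") summary ("all") rows of a fit file
# and print the percentage reduction.
fit_rms_reduction () {
    awk '$3 == "all" && $4 == "rms" { r[$2] = $NF }
         END { printf "O-B rms %s  O-A rms %s  reduction %.0f%%\n",
                      r["01"], r["03"], 100 * (r["01"] - r["03"]) / r["01"] }' "$1"
}
# usage: fit_rms_reduction fit_t1.2011032212
```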
fit_w1.2011032212 (fort.202)
This file demonstrates how the background and analysis fields fit the wind observations. This file (as well as fit_q1) is formatted the same as the example above. Therefore, only the summary lines for O-B and O-A are shown, to give a quick view of the fit:
ptop 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 50.0 0.0
it obs type styp pbot 1200.0 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 2000.0
------------------------------------------------------------------------------------------------------------------------
o-g 01 all count 1161 1175 1218 2195 1952 1080 557 1131 653 876 915 14812
o-g 01 all bias -0.31 1.01 0.58 0.08 -0.10 -0.05 -0.09 0.81 1.05 0.20 0.28 0.33
o-g 01 all rms 3.43 3.93 4.60 4.59 4.93 4.88 5.04 5.31 5.74 5.59 5.59 4.82
o-g 03 all count 1163 1173 1223 2197 1955 1096 558 1132 651 881 915 14843
o-g 03 all bias 0.28 0.69 0.32 0.19 0.15 0.17 0.09 0.32 0.47 0.22 0.55 0.37
o-g 03 all rms 2.69 2.92 2.85 2.72 2.87 2.50 2.62 3.02 3.65 4.24 4.90 3.35
fit_q1.2011032212 (fort.204)
This file demonstrates how the background and analysis fields fit to moisture
observations (relative humidity). The summary lines for O-B and O-A are as follows:
ptop 1000.0 950.0 900.0 850.0 800.0 700.0 600.0 500.0 400.0 300.0 0.0 0.0
it obs type styp pbot 1200.0 1000.0 950.0 900.0 850.0 800.0 700.0 600.0 500.0 400.0 300.0 2000.0
------------------------------------------------------------------------------------------------------------------------
o-g 01 all count 686 228 312 343 228 561 423 473 571 415 0 4240
o-g 01 all bias 1.37 -0.29 1.15 0.92 0.93 0.23 -4.64 -7.81 -8.29 -8.55 0.00 -2.84
o-g 01 all rms 12.92 17.73 16.92 15.55 15.36 15.72 19.37 21.03 21.42 17.97 0.00 17.61
o-g 03 all count 686 229 312 343 228 561 423 473 571 415 0 4241
o-g 03 all bias 0.20 -1.54 0.28 0.43 -0.50 0.66 -0.98 -0.71 -1.09 -3.94 0.00 -0.64
o-g 03 all rms 7.31 10.29 10.66 9.83 10.64 11.57 14.67 14.34 14.23 13.01 0.00 11.94
O-B: 4240 observations in total and bias is -2.84 and rms is 17.61
O-A: 4241 observations in total and bias is -0.64 and rms is 11.94
The total bias and rms were reduced.
fit_p1.2011032212 (fort.201)
This file demonstrates how the background and analysis fields fit the surface pressure observations. Because surface pressure is two-dimensional, the table is formatted differently from the three-dimensional fields shown above. Once again, the summary lines for O-B and O-A are shown to give a quick view of the fit:
it obs type stype count bias rms cpen qcpen
o-g 01 all 13986 0.4223 1.0427 0.8230 0.7487
O-B: 13986 observations in total and bias is 0.4223 and rms is 1.0427
O-A: 14003 observations in total and bias is 0.0172 and rms is 0.6744
Both the total bias and rms were reduced.
These statistics show that the analysis results fit the observations more closely than the background does, which is what the analysis is supposed to do. How closely the analysis fits the observations is determined by the ratio of the background error variance to the observation error variance.
In addition to the minimization information in the stdout file, GSI writes more detailed
information into a file called fort.220. The content of fort.220 is explained in section 4.6.
Below is an example of a quick check of the trend of the cost function and the norm of the gradient; both values should decrease with each iteration step. In the run directory, the cost function and norm-of-gradient information can be dumped into an output file by using the command:
The file cost_gradient.txt includes 6 columns; only the first 4 columns are shown below. The first 6 and last 6 lines are:
1 0 0.702483384207798081E+05 0.670451046411596704E+06
1 1 0.582549540813520216E+05 0.357485885139904742E+06
1 2 0.510340800654954874E+05 0.223748229447264661E+06
1 3 0.451910522149672179E+05 0.100693261241731772E+06
1 4 0.421994383952682037E+05 0.726689181584937469E+05
1 5 0.403738674444606295E+05 0.565482157493619452E+05
2 35 0.366494229475844040E+05 0.654227325952160183E-03
2 36 0.366494229180964758E+05 0.328958749059858212E-03
2 37 0.366494229007449394E+05 0.215132065808595375E-03
2 38 0.366494228897850917E+05 0.140109917756355446E-03
2 39 0.366494228834806418E+05 0.873987085206934423E-04
2 40 0.366494228798716213E+05 0.503331929251514545E-04
The first column is the outer loop number and the second column is the inner iteration number. The third column is the cost function, and the fourth column is the norm of the gradient. It can be seen that both the cost function and the norm of the gradient descend with each iteration.
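This monotone descent can be checked directly from cost_gradient.txt, whose four leading columns are described above. A minimal sketch follows (the per-outer-loop reset is an assumption, since the gradient typically jumps when a new outer loop begins):

```shell
# check_descent: verify that the cost (column 3) and the norm of the
# gradient (column 4) decrease with every inner iteration, restarting
# the comparison whenever the outer-loop index (column 1) changes.
check_descent () {
    awk '$1 != outer { outer = $1; c = ""; g = "" }
         c != "" && ($3 + 0 >= c || $4 + 0 >= g) { bad = 1 }
         { c = $3 + 0; g = $4 + 0 }
         END { exit bad }' "$1" &&
        echo "cost and gradient descend within each outer loop" ||
        echo "non-descending step found"
}
# usage: check_descent cost_gradient.txt
```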
To get a complete picture of the minimization process, the cost function and norm of
gradient can be plotted using a provided NCL script located under:
./util/Analysis_Utilities/plot_ncl/GSI_cost_gradient.ncl.
Figure 5.2: Cost function and norm of gradient change with iteration steps
The above plots demonstrate that both the cost function and the norm of the gradient descend very quickly during the first 10 iterations of each outer loop, and then drop very slowly after the 10th iteration. For more details on how to use the tool to make this plot, please see section A.3.
The analysis increment gives us an idea where and how much the background fields have
been changed by the observations. Another useful graphics tool that can be used to look at
the analysis increment is located under:
./util/Analysis_Utilities/plot_ncl/Analysis_increment.ncl.
The graphic below shows the analysis increment at the 15th level. Notice that the scales are
different for each of the plots.
It can be clearly seen that the U.S. CONUS domain has many upper-level observations, while data availability over the ocean is very sparse.
For more details on how to use the tool to make this plot, please see section A.4.
Adding radiance data to the GSI analysis is straightforward after a successful run of GSI with conventional data. The same run scripts from the section above can be used to run GSI with radiance data, with or without PrepBUFR data. The key step in adding radiance data is linking the radiance BUFR data files into the GSI run directory with the names listed in the &OBS_INPUT section of the GSI namelist. The following example adds these three radiance BUFR files:
AMSU-A: le_gdas1.t12z.1bamua.tm00.bufr_d
AMSU-B: le_gdas1.t12z.1bamub.tm00.bufr_d
HIRS4: le_gdas1.t12z.1bhrs4.tm00.bufr_d
The location of these radiance BUFR files has been previously saved to the environment
variable OBS_ROOT, therefore the following three lines can be inserted below the link to the
PrepBUFR data in the script run_gsi.ksh:
ln -s ${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d amsuabufr
ln -s ${OBS_ROOT}/gdas1.t12z.1bamub.tm00.bufr_d amsubbufr
ln -s ${OBS_ROOT}/gdas1.t12z.1bhrs4.tm00.bufr_d hirs4bufr
If it is desired to run radiance data in addition to the conventional PrepBUFR data, the
following link to the PrepBUFR should be left as is:
ln -s ${PREPBUFR} ./prepbufr
Alternatively to analyze radiance data without conventional PrepBUFR data, this line can
be commented out in the script run_gsi.ksh:
## ln -s ${PREPBUFR} ./prepbufr
In the following example, the case study will include both radiance and conventional
observations.
In order to link the radiance BUFR files to the correct names, the namelist section &OBS_INPUT should be referenced. This section lists the data types and BUFR file names that can be used in GSI. The 1st column, dfile, is the file name recognized by GSI. The 2nd column, dtype, and the 3rd column, dplat, are the data type and data platform included in the file listed in dfile, respectively. More detailed information on data usage can be found in Section 4.3, and in Section 8.1 for radiance data specifically. For example, the following line tells us that AMSU-A observations from NOAA-17 should be in a BUFR file named amsuabufr:
dfile(30)='amsuabufr',dtype(30)='amsua',dplat(30)='n17',dsis(30)='amsua_n17',dval(30)=0.0,dthin(30)=2,
With radiance data assimilation, two important setups, data thinning and bias correction, need to be checked carefully. The following is a brief description of these two setups:
The radiance data thinning is set up in the namelist section &OBS_INPUT. The following is part of the namelist in that section:
dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120
dfile(30)='amsuabufr',dtype(30)='amsua',dplat(30)='n17',dsis(30)='amsua_n17',dval(30)=0.0,dthin(30)=2,
The first line of &OBS_INPUT lists multiple thinning grids as elements of the array dmesh. For each line specifying a data type, the last element of that line, such as dthin(30)=2, selects which mesh grid is used for thinning. Here, the thinning mesh for NOAA-17 AMSU-A observations is 60 km because dmesh(2) is set to 60 km. For more information about data thinning, please refer to Section 3.3 and Section 8.1.
The radiance data bias correction is very important for a successful radiance data analysis. In the run scripts, there are two files related to bias correction:
SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/sample.satbias ./satbias_in
The first file (global_satangbias.txt) provides GSI with the coefficients for the angle-dependent bias correction. The angle bias coefficients are calculated offline, outside of GSI. The second file (sample.satbias) provides GSI with the coefficients for the mass bias correction; these are calculated within GSI during the previous cycle. In the released version 3.1, most of the mass bias correction values in sample.satbias are 0, which means there was no good estimate of the mass bias correction for this case. The angle bias file global_satangbias.txt is also out of date. These two files are provided in ./fix as examples of the bias correction coefficients. For best results, users will need to generate their own bias files. The details of radiance data bias correction are discussed in Section 8.4 of this document.
For this case, the GDAS bias correction files were downloaded and saved in the observation directory. The run script should contain two lines linking to the bias correction coefficient files; to do this, change the following lines in the run script:
SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/sample.satbias ./satbias_in
The first line sets the path to the angle bias coefficient file, and the second copies the mass
bias coefficient file into the working directory. Once these links are set, we are ready to run
the case.
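As a hedged sketch of what those changed lines might become (the file names gdas1.t12z.satang and gdas1.t12z.abias follow the operational GDAS naming convention and are assumptions here; use whatever names your downloaded files actually carry):

```shell
# Point GSI at the downloaded GDAS coefficients instead of the ./fix
# examples.  Both file names below are assumed, not taken from this guide.
SATANGL=${OBS_ROOT}/gdas1.t12z.satang          # angle-dependent bias coefficients
cp ${OBS_ROOT}/gdas1.t12z.abias ./satbias_in   # mass bias coefficients
```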
The process for running GSI is the same as described in section 5.1.2. Once run_gsi.ksh has been submitted, move into the run directory to check the GSI analysis results. For the current case, the run directory will look almost as it did for the conventional data case, with the exception of the three links to the radiance BUFR files and the new diag files for the radiance data types used. Following the same steps as in section 5.1.3, check the stdout file to see whether GSI ran through each part of the analysis process successfully. In addition to the information outlined for the conventional run, the radiance BUFR files should have been read in and distributed to each sub-domain:
OBS_PARA: ps 2352 2572 8367 2673
OBS_PARA: t 4617 4331 12418 4852
OBS_PARA: q 3828 3908 11096 3632
OBS_PARA: pw 89 31 141 23
OBS_PARA: uv 5704 4835 15025 4900
OBS_PARA: sst 0 0 2 0
OBS_PARA: hirs4 metop-a 0 0 28 35
OBS_PARA: amsua n15 2564 1333 133 195
OBS_PARA: amsua n18 1000 2117 0 90
OBS_PARA: amsua metop-a 0 0 58 67
OBS_PARA: amsub n17 0 0 57 70
OBS_PARA: hirs4 n19 246 1097 0 37
OBS_PARA: amsua n19 656 3503 0 93
When comparing this output to the content in step 3 of section 5.1.3, it can be seen that 7 new radiance data types have been read in: HIRS4 from METOP-A and NOAA-19; AMSU-A from NOAA-15, NOAA-18, NOAA-19, and METOP-A; and AMSU-B from NOAA-17. The table above shows that most of the radiance data read in for this case are AMSU-A from NOAA-15.
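As a quick cross-check of the OBS_PARA table above, the per-type counts can be totaled across sub-domains. This sketch assumes four sub-domains (one per processor in this 4-core example), so the last four fields of each OBS_PARA line are counts:

```shell
# obs_totals: sum each observation type's counts across the four
# sub-domains listed in the OBS_PARA block of a GSI stdout file.
obs_totals () {
    awk '/OBS_PARA:/ { name = $2
                       if (NF == 7) name = $2 " " $3   # type + platform
                       printf "%-16s %d\n", name,
                              $(NF-3) + $(NF-2) + $(NF-1) + $NF }' "$1"
}
# usage: obs_totals stdout
```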
The file fort.207 contains the statistics for the radiance data, similar to file fort.203 for
temperature. This file contains important details about the radiance data analysis. Section
4.5.2 explains this file in detail. Below are some values from the file fort.207 to give a
quick look at the radiance assimilation for this case study.
From the above information, it can be seen that the AMSU-A data from NOAA-15 have 128055 observations within the analysis time window and domain. After thinning, 53932 observations of this type remained, and only 14552 were used in the analysis. The penalty for these data decreased from 24875 to 9993.3 after 2 outer loops. It is also very interesting that the number of AMSU-A observations assimilated in the O-A calculation increased from 14552 to 37345, nearly triple the number used in the O-B calculation. In this analysis, the AMSU-B data from NOAA-17 have 213920 observations within the analysis time window and domain. After thinning, only 254 were left, and none of them were used in the analysis.
The statistics for each channel can be viewed in the fort.207 file as well. Below, the channels from AMSU-A NOAA-15 are listed as an example:
The second column is the channel number for AMSU-A and the last column is the standard deviation for each channel. It can be seen that most of the channels fit the observations more closely after the analysis, but the standard deviation increased for channels 9 and 10. This increase in the standard deviation appears to occur because a much greater number of observations are used in the O-A calculation (column 3), not because the fit has gotten worse.
The same methods for checking the optimal minimization as demonstrated in section 5.1.4.2 can be used for radiance assimilation. Features similar to the conventional assimilation should be seen in the minimization. The figures below show in detail how the radiance data impact the analysis results. Using the same NCL script as in section 5.1.4.3, analysis increment fields are plotted comparing the analysis with radiance and conventional data to the analysis with conventional data only. The difference is shown only at levels 49 and 6, which represent the maximum temperature increment level (49) and the maximum moisture increment level (6).
Figure 5.4: Analysis increment fields comparing to the analysis with PREPBUFR only at level 6
Figure 5.5: Analysis increment fields comparing to the analysis with PREPBUFR only at level 49
In order to fully understand the analysis results, the following needs to be understood:
1. The weighting functions of each channel and the data coverage at this analysis time.
There are several sources on the Internet that show the weighting functions for the AMSU-A channels. Channel 1 is a moisture channel, while the others are mainly temperature channels (channels 2, 3, and 15 also have large moisture signals). Because a model top of 10 mb was specified for this case study, the actual impact should come from channels below channel 12.
2. The usage of each channel is given in the file named satinfo in the run directory. The first two columns show the observation type and platform of each channel, and the third column tells us whether the channel is used in the analysis. Because a lot of amsua_n15 and amsua_n18 data were used, these should be checked in detail. In this case, channels 11 and 14 from amsua_n15 and channels 9 and 14 from amsua_n18 were turned off.
3. Thinning information: a quick look at the namelist gsiparm.anl in the run directory shows that both amsua_n15 and amsua_n18 use thinning grid 2, which is 60 km. In this case, the grid spacing is 30 km, so the satellite observations are used roughly every two grid points, which may be a little dense.
4. Bias correction: radiance bias correction was previously discussed; it is very important for a successful radiance data analysis. By default, the run scripts link to the old bias correction coefficients that are provided as an example in ./fix:
SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/sample.satbias ./satbias_in
Users can also download the operational bias correction coefficients for the experiment period as a starting point for calculating coefficients suitable for their experiments. Radiance bias correction for a regional analysis is a difficult issue because of the limited coverage of radiance data. This topic is beyond the scope of this document, but the issue should be considered and understood when using GSI for radiance applications.
The addition of GPS radio occultation (RO) data to the GSI analysis is similar to that of radiance data. In the example below, the RO data are used as refractivities; there is also an option to use the data as bending angles. The same run scripts used in sections 5.1.1 and 5.2.1 can be used, with the addition of the following link to the observations:
ln -s ${OBS_ROOT}/le_gdas1.t12z.gpsro.tm00.bufr_d gpsrobufr
For this case study, the GPS RO BUFR file was downloaded and saved in the OBS_ROOT
directory. The file is linked to the name gpsrobufr, following the namelist section
&OBS_INPUT:
dfile(10)='gpsrobufr',dtype(10)='gps_ref',dplat(10)='',dsis(10)='gps',dval(10)=1.0,dthin(10)=0,
This indicates that GSI is expecting a GPS refractivity BUFR file named gpsrobufr. In
the following example, GPS RO and conventional observations are both assimilated.
Change the run directory name in the run scripts to reflect this test:
WORK_ROOT=/scratch1/gsiprd_${ANAL_TIME}_gps_prepbufr
The process of running GSI is the same as described in section 5.1.2. Once run_gsi.ksh has been submitted, move into the working directory, gsiprd_2011032212_gps_prepbufr, to check the GSI analysis results. The run directory will look exactly the same as with the conventional data, with the exception of the link to the GPS RO BUFR file used in this case. Following the same steps as in section 5.1.3, check the stdout file to see whether GSI ran through each part of the analysis process successfully. In addition to the information outlined for the conventional run, the GPS RO BUFR file should have been read in and distributed to each sub-domain:
Comparing the output to the content in section 5.1.3, it can be seen that the GPS RO
refractivity data have been read in and distributed to four sub-domains successfully.
The file fort.212 contains the fit of the GPS RO data, expressed as fractional differences. It has the same structure as the fit files for conventional data. Below is a quick look to confirm the GPS RO data were used:
ptop 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 50.0 0.0
it obs type styp pbot 1200.0 1000.0 900.0 800.0 600.0 400.0 300.0 250.0 200.0 150.0 100.0 2000.0
--------------------------------------------------------------------------------------------------------------------------
o-g 03 all count 5 108 192 615 950 673 413 500 631 868 1417 8891
o-g 03 all bias -0.31 -0.11 -0.06 -0.01 -0.01 -0.04 0.00 0.01 0.01 -0.01 -0.01 0.01
o-g 03 all rms 0.43 0.62 0.79 0.77 0.54 0.29 0.20 0.21 0.24 0.28 0.40 0.48
It can be seen that most of the GPS RO data are located in the upper levels, with a total of 8709 observations used in the analysis during the 1st outer loop and 8891 used to calculate O-A. After the analysis, the bias was reduced from -0.07 to 0.01, and the rms was reduced from 0.66 to 0.48. From these statistics, it can be concluded that the analysis with GPS RO data looks reasonable.
The same methods for checking the minimization in section 5.1.4.2 can be used for the GPS RO assimilation. The following figure gives detailed information on how the new data impact the analysis result. Using the NCL script from section 5.1.4.3, analysis increment fields are plotted comparing the analysis with GPS RO and conventional data to the analysis with conventional data only, for level 48, which represents the maximum temperature increment.
Figure 5.6: Analysis increment fields comparing to the analysis with PrepBUFR only at level 48
The 3DVAR hybrid analysis is a new analysis option in the GSI system. It provides the ability to bring a flow-dependent background error covariance, derived from ensemble forecasts, into the analysis. If ensemble forecasts are available, setting up GSI for a hybrid analysis is straightforward and requires only two changes to the current 3DVAR run script:
The first change links the ensemble members to the GSI run directory and assigns each ensemble member a name that GSI recognizes. For ensemble forecasts using WRF, GSI assumes each ensemble member has been named as follows:
This naming convention is used for forecast files in WRF NMM binary and netCDF format, as well as WRF ARW netCDF format. This version of GSI does not support WRF ARW binary as an ensemble member format. Another ensemble forecast format GSI supports is GEFS.
Please note: the parameters s_ens_h, s_ens_v, and beta1_inv are tunable
parameters. They should be tuned for best performance.
In this release version, a new option for the GSI hybrid application is added to the run script. The run script assumes all ensemble members are located under the directory defined by the parameter ${mempath} and are named wrfout_d01_${iiimem}, where ${iiimem} is an integer indicating the ensemble member ID. If users have ensemble forecasts in a different file structure with different names, they should change the following lines in the run script:
if [ -r ${mempath}/wrfout_d01_${iiimem} ]; then
  ln -sf ${mempath}/wrfout_d01_${iiimem} ./wrf_en${iiimem}
else
  echo "member ${mempath}/wrfout_d01_${iiimem} does not exist"
fi
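For context, a hedged sketch of the loop that might surround those lines is shown below (the three-digit, zero-padded member ID and the loop bounds are assumptions; adjust them to match the actual ensemble naming):

```shell
# link_members: link ensemble members wrfout_d01_001 ... wrfout_d01_NNN
# from a source directory into the current (run) directory under the
# names wrf_enNNN that GSI expects.  The %03d member-ID format is an
# assumption about the "iii" in ${iiimem}.
link_members () {   # usage: link_members <mempath> <ensemble_size>
    mempath=$1
    imem=1
    while [ "$imem" -le "$2" ]; do
        iiimem=$(printf %03d "$imem")
        if [ -r "${mempath}/wrfout_d01_${iiimem}" ]; then
            ln -sf "${mempath}/wrfout_d01_${iiimem}" "./wrf_en${iiimem}"
        else
            echo "member ${mempath}/wrfout_d01_${iiimem} does not exist"
        fi
        imem=$((imem + 1))
    done
}
```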
After setting up the namelist parameters and the path and names of the ensemble members, GSI can be run following the other 3DVAR cases introduced in this chapter.
Summary
This chapter applied the knowledge from the previous chapters to demonstrate how to set up, run, and diagnose GSI for various applications. It is important to always check for successful completion of the GSI analysis, as running to completion does not always indicate a successful analysis. Using the tools and methods described in this chapter, a complete picture of the GSI analysis can be obtained.
GSI Theory and Code Structure
$$ J = x^{T} B^{-1} x + \left( H(x_b + x) - o \right)^{T} O^{-1} \left( H(x_b + x) - o \right) + J_c \qquad (2) $$
By assuming the linearity of the observation operator H, equation (2) can be written as:
$$ J = x^{T} B^{-1} x + (Hx - o)^{T} O^{-1} (Hx - o) + J_c \qquad (4) $$
To improve convergence, GSI preconditions its cost function by defining a new variable $y = B^{-1} x$. Equation (4), in terms of the new variable y, becomes:
Using the chain rule, the gradients of the background and observation parts of cost function (4) with respect to x, and of cost function (5) with respect to y, have the form:
$$ \nabla_{x} J = B^{-1} x + H^{T} O^{-1} (Hx - o) \qquad (6) $$
$$ \nabla_{y} J = B^{T} y + B^{T} H^{T} O^{-1} (HBy - o) = B \, \nabla_{x} J \qquad (7) $$
The cost function is minimized using gradients (6) and (7) in an iterative conjugate
gradient process. Start by assuming x^0 = y^0 = 0, then iterate:

∇_x J^n = B^{-1} x^{n-1} + H^T O^{-1} (Hx^{n-1} - o) = y^{n-1} + H^T O^{-1} (Hx^{n-1} - o)
∇_y J^n = B ∇_x J^n
Dir_x^n = -∇_y J^n + β^n Dir_x^{n-1}
Dir_y^n = -∇_x J^n + β^n Dir_y^{n-1}
x^n = x^{n-1} + α^n Dir_x^n
y^n = y^{n-1} + α^n Dir_y^n

until either the maximum number of iterations has been reached or the gradient is
sufficiently minimized.
During the above iteration, the step size α^n is calculated in subroutine dprodx based on
the equation:

α^n = [(∇_x J^n - ∇_x J^{n-1})^T ∇_y J^n] / [(∇_x J^n - ∇_x J^{n-1})^T Dir_x^n]
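The preconditioned conjugate gradient scheme above can be sketched with small dense matrices. This is an illustrative sketch only: GSI applies B, H, and O^{-1} as operators rather than stored matrices, and for clarity the sketch uses the exact step size for a quadratic cost instead of the dprodx estimate; all names and inputs below are invented for the example.

```python
import numpy as np

def pcg_minimize(B, H, O_inv, o, niter=100, tol=1e-12):
    """Preconditioned conjugate gradient sketch of equations (6)-(7).

    B     : background error covariance (n x n, symmetric positive definite)
    H     : linearized observation operator (m x n)
    O_inv : inverse observation error covariance (m x m)
    o     : innovation vector (m,)
    Returns the analysis increment x minimizing the quadratic cost.
    """
    n = B.shape[0]
    x = np.zeros(n)                 # x^0 = 0
    y = np.zeros(n)                 # y^0 = 0, with y = B^{-1} x maintained
    dir_x = np.zeros(n)
    dir_y = np.zeros(n)
    rz_old = None
    for _ in range(niter):
        grad_x = y + H.T @ (O_inv @ (H @ x - o))   # equation (6)
        grad_y = B @ grad_x                        # equation (7)
        rz = grad_x @ grad_y                       # preconditioned gradient norm
        if rz < tol:                               # gradient sufficiently minimized
            break
        beta = 0.0 if rz_old is None else rz / rz_old
        dir_x = -grad_y + beta * dir_x             # descent directions
        dir_y = -grad_x + beta * dir_y
        Hd = H @ dir_x
        # exact line search for the quadratic cost (stands in for dprodx)
        alpha = rz / (dir_x @ dir_y + Hd @ (O_inv @ Hd))
        x = x + alpha * dir_x
        y = y + alpha * dir_y
        rz_old = rz
    return x
```

For a quadratic cost this converges to the solution of (B^{-1} + H^T O^{-1} H) x = H^T O^{-1} o, so the result can be checked against a direct solve for small problems.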
Stream function (ψ)
Unbalanced velocity potential (χ)
Unbalanced virtual temperature (T)
Unbalanced surface pressure (Ps)
Pseudo relative humidity [qoption=1] or normalized relative humidity [qoption=2]
Ozone mixing ratio (only for global GSI)
Cloud condensate mixing ratio (only for global GSI)
With the broader application of GSI for chemical data assimilation, new variables such as
trace gases, aerosols, and other chemistry species have been added as analysis variables.
Gust and visibility were also added as analysis variables for the RTMA application.
This section introduces the basic code structure of the GSI. Section 6.2.1 describes the
main processes of the GSI using the three main routines. Sections 6.2.2 to 6.2.5 introduce
the code related to four important parts of GSI: background IO, observation ingestion,
observation innovation calculation, and minimization iteration.
At the topmost level of abstraction, the GSI code is divided into three phases: the
initialization, the run, and the finalize phase. The philosophy behind this division is to
create a modular program structure with tasks that are independent of one another.
The main top-level driver routine is called gsimain and is located in the file gsimain.f90.
Roughly ninety percent of gsimain.f90 is a variety of useful metadata.
Possibly the most important of these is the list of exit codes. Should the GSI run fail from
an internal error, the exit code may provide sufficient insight to resolve the issue. The final
lines of gsimain.f90 consist of the three main calls to initialize, run and finalize. The
table below summarizes each of these phases.
call gsimain_initialize (gsimod.F90):
- gsi_4dcoupler_parallel_init
- MPI initialize
- Initialize defaults of variables in modules
- Read in user input from namelist
- 4DVAR setup if it is true (not supported)
- Check user input for consistency among parameters for given setups
- Optionally read in the namelist for a single observation run
- Write namelist to standard out
- If this is a WRF regional run, run the interface with WRF:
  call convert_regional_guess (details in section 6.2.2)
- Initialize variables, create/initialize arrays
- Initialize values in radinfo and aeroinfo
- If 4DVAR, then: call gsi_4dcoupler_final_traj
The inner iteration loop of GSI is where the cost function minimization is computed. GSI
provides several minimization options, but here we will focus on the preconditioned
conjugate gradient method. The inner iteration of the GSI variational analysis is performed
in subroutine pcgsoi (pcgsoi.f90), inside the following loop:
inner_iteration: do iter=0,niter(jiter)
...
end do inner_iteration
The main steps inside the loop are listed in a table below with the corresponding code and
the terms of the equations in Section 6.1.
For detailed steps, advanced developers are encouraged to read through the code and send
questions to gsi_help@ucar.edu.
Observation and Background Error Statistics
Each observation type has its own observation errors. In this section, we introduce several
topics related to the conventional observation error processing in GSI. The observation
error for satellite radiance and its adjustment will be discussed in Chapter 8.
For the global GSI analysis, when oberrflg (a namelist option in section &obsqc) is true,
observation errors are generated based on an external observation error table according to
the types of observations. Otherwise, observation errors are read in from the PrepBUFR
file.
For regional GSI runs, GSI forces the use of an external observation error table to get
observation errors no matter what the oberrflg is set to (oberrflg is set to true for
regional runs in gsimod.F90).
The external observation error table file, errtable, includes observation errors for all types
of observations. It is copied from the ~/comGSI_v3.1/fix directory by the run script. This
release package provides two sample observation error table files, nam_errtable.r3dv and
prepobs_errtable.global, in the ./fix directory. The nam_errtable.r3dv is used in the sample
run script as a default observation error table. The observation error file is a text file and
therefore the error values can be tuned easily. The following shows a portion of
nam_errtable.r3dv for rawinsondes; its columns are described in the table following the
excerpt:
Column# 1 2 3 4 5 6
120 OBSERVATION TYPE
0.11000E+04 0.12671E+01 0.56103E+00 0.10000E+10 0.68115E+00 0.10000E+10
0.10500E+04 0.13302E+01 0.63026E+00 0.10000E+10 0.68115E+00 0.10000E+10
0.10000E+04 0.14017E+01 0.73388E+00 0.10000E+10 0.68115E+00 0.10000E+10
0.95000E+03 0.14543E+01 0.86305E+00 0.10000E+10 0.71307E+00 0.10000E+10
0.90000E+03 0.14553E+01 0.99672E+00 0.10000E+10 0.74576E+00 0.10000E+10
0.85000E+03 0.13865E+01 0.11210E+01 0.10000E+10 0.77845E+00 0.10000E+10
Column #   1          2          3            4      5     6
Content    Pressure   T          q            UV     Ps    Pw
Unit       hPa        degree C   percent/10   m/s    mb    kg/m2 (or mm)
For each type of observation, the error table has 6 columns and 33 rows (levels). The 1st
column prescribes 33 pressure levels, which cover from 1100 hPa to 0 hPa. Columns 2-6
prescribe the observation errors for temperature (T), moisture (q), horizontal wind
component (UV), surface pressure (Ps), and the total column precipitable water (Pw). The
missing value is 0.10000E+10.
The observation error table for each observation type starts with the PrepBUFR type
number, such as:
120 OBSERVATION TYPE
The PrepBUFR data type numbers 100-199 are for temperature (T), moisture (q), and
surface pressure (Ps) observations, while numbers 200-299 are for horizontal wind
component (UV) observations. Please check the BUFR/PrepBUFR User's Guide for more details
on PrepBUFR; it can be downloaded at the DTC BUFR/PrepBUFR website:
http://www.dtcenter.org/com-GSI/BUFR/index.php
During the GSI analysis, the entire contents of this file are read in by the subroutine
converr_read and stored in a three-dimensional array etabl. When a conventional
observation of a certain type is read in from the prepbufr file (read_prepbufr.f90), the
corresponding observation error profile of this type in array etabl will be vertically
interpolated to the specific observation level and used as the error of this observation.
Consequently, for each conventional observation type, a matrix (obserr) is generated
holding the corresponding observation errors for the basic five variables at the
observation levels.
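The vertical interpolation of the error table to an observation pressure can be sketched as follows. This is a simplified illustration assuming linear interpolation in pressure between adjacent table levels (the actual converr/read_prepbufr code differs in detail); the sample values are the type-120 temperature errors from the excerpt above.

```python
def interp_obs_error(levels, errors, p_ob):
    """Interpolate one errtable column to an observation pressure p_ob (hPa).

    levels : table pressures in decreasing order, e.g. [1100., 1050., ...]
    errors : the corresponding error values (0.10000E+10 marks missing)
    """
    # clamp observations outside the table's pressure range
    if p_ob >= levels[0]:
        return errors[0]
    if p_ob <= levels[-1]:
        return errors[-1]
    # find the bracketing pair of levels and interpolate linearly
    for k in range(len(levels) - 1):
        if levels[k] >= p_ob >= levels[k + 1]:
            w = (levels[k] - p_ob) / (levels[k] - levels[k + 1])
            return (1.0 - w) * errors[k] + w * errors[k + 1]

# temperature (column 2) errors for type 120 from the excerpt above
levels = [1100.0, 1050.0, 1000.0, 950.0, 900.0, 850.0]
t_err = [1.2671, 1.3302, 1.4017, 1.4543, 1.4553, 1.3865]
```

For example, an observation at 1075 hPa receives the average of the 1100 hPa and 1050 hPa errors.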
7.1.2 Observation error adjustment and gross error check within GSI
The actual observation errors used in GSI start with the external observation errors
(either from PrepBUFR files or an error table file) in the obserr array and then go
through multiple adjustment steps, which are based on observation quality and vertical
sigma location:
1. Observation errors for each variable are bounded below by their corresponding lower
limits. Currently, these lower limits are hard-coded and prescribed in read_prepbufr.
The observation error limits for temperature, moisture, wind, surface pressure, and total
precipitable water are terrmin=0.5, qerrmin=0.1, werrmin=1.0, perrmin=0.5, and
pwerrmin=1.0, respectively.
2. Observation errors are adjusted based on the quality markers from the PrepBUFR data
files. If a quality marker from PrepBUFR is larger than a threshold value (lim_qm in
read_prepbufr.f90), the corresponding observation error is set to a very large number
(1.0×10^6), which marks a bad observation that will not make any impact on the analysis
results. If the quality marker is smaller than lim_qm, the observation error is adjusted
based on the vertical location and vertical distribution of the observations. Please
refer to the BUFR/PrepBUFR User's Guide for more details on the quality markers and the
values of lim_qm.
3. If an observation quality marker is either 3 or 7, the observation error can be
inflated by setting inflate_error to true. The value of the inflation factor may be set
based on observation type; however, it is currently fixed at 1.2.
4. For certain observation types (e.g., T), their observation errors are amplified by a
factor of 1.2 if the observation locations are above 100 hPa.
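The four adjustment steps above can be sketched as follows. This is a simplified illustration using only the limits and factors quoted in the text; the function adjust_obs_error is hypothetical and the real read_prepbufr logic has many more cases.

```python
def adjust_obs_error(obserr, qm, var="t", p_hpa=500.0,
                     inflate_error=True, lim_qm=4):
    """Illustrative sketch of the observation error adjustment steps.

    obserr : raw error from the PrepBUFR file or external error table
    qm     : PrepBUFR quality marker
    lim_qm=4 is an assumed threshold value for the example.
    """
    # 1. bound the error below by the hard-coded lower limits
    errmin = {"t": 0.5, "q": 0.1, "uv": 1.0, "ps": 0.5, "pw": 1.0}
    err = max(obserr, errmin[var])
    # 2. quality markers above the threshold effectively reject the observation
    if qm > lim_qm:
        return 1.0e6          # no impact on the analysis
    # 3. optional inflation for suspicious quality markers 3 and 7
    if inflate_error and qm in (3, 7):
        err *= 1.2
    # 4. some types (e.g., T) are inflated above 100 hPa
    if var == "t" and p_hpa < 100.0:
        err *= 1.2
    return err
```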
Besides the above-mentioned steps, observation errors are further inflated during the
observation innovation calculation (e.g., in the subroutine setupt, see section 6.2.4 for
detail) when the observation is located either lower than the lowest analysis level or higher
than the highest analysis level. In the same routine, GSI performs gross error checks and, if
oberror_tune is set to true, observation error tuning (this function is not discussed in this
document).
The gross error check is an important quality check step to exclude questionable
observations that degrade the analysis. Users can adjust the threshold of the gross error
check for each data type within the convinfo file. For example, the following is a part of
convinfo without last three columns:
!otype type sub iuse twindow numgrp ngroup nmiter gross ermax ermin var_b var_pg ithin
ps 120 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0
ps 180 0 1 3.0 0 0 0 4.0 3.0 1.0 4.0 0.000300 0
t 120 0 1 3.0 0 0 0 8.0 5.6 1.3 8.0 0.000001 0
t 126 0 -1 3.0 0 0 0 8.0 5.6 1.3 8.0 0.001000 0
The gross check for each data type is controlled by gross, ermax, and ermin. If an
observation has observation error obserror, a gross check ratio is calculated:
ratio = |Observation - Background| / max(ermin, min(ermax, obserror))
If ratio > gross, the observation fails the gross check and will not be used in the
analysis. The unused observation is marked as a rejection in the fit files.
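The gross check can be sketched directly from the ratio definition, using the gross, ermax, and ermin values of the t 120 row above (8.0, 5.6, 1.3):

```python
def passes_gross_check(obs, background, obserror, gross, ermax, ermin):
    """Gross error check sketch: compare the magnitude of the innovation
    against the bounded observation error."""
    ratio = abs(obs - background) / max(ermin, min(ermax, obserror))
    return ratio <= gross
```

With obserror = 1.0 the denominator is bounded up to ermin = 1.3, so a 2 K innovation passes (ratio about 1.5) while a 12 K innovation fails (ratio above 9).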
Background error covariance plays a very important role in determining the quality of a
variational analysis for NWP models. It controls what percentage of the innovation
becomes the analysis increment, how broadly each observation impacts the surrounding
area, and the balance among different analysis variables.
Since most data assimilation backgrounds are model forecasts from a prior time, the
background error covariance matrix (B) can be defined as the error covariance of model
forecasts:

B = < (x - x_truth)(x - x_truth)^T >,  where x is the forecast and x_truth is the truth.
Since the actual state of the atmosphere (the truth) is not known, the forecast errors
need to be estimated. The most common estimation methods are the NMC method and the
ensemble method. In the NMC method, forecast errors are estimated from the difference of
two forecasts (typically 12- and 24-hour) valid at the same time. In the ensemble method,
forecast errors are estimated from ensemble perturbations (ensemble members minus the
ensemble mean).
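Both estimation methods can be sketched with small arrays. This is illustrative only: real B estimation works with huge state vectors and large forecast samples, and the function names below are invented for the example.

```python
import numpy as np

def nmc_background_error(f24, f12):
    """NMC method sketch: estimate B from differences of 24-h and 12-h
    forecasts valid at the same times. f24, f12: shape (ncases, nstate)."""
    d = f24 - f12
    d = d - d.mean(axis=0)              # remove the mean difference
    return d.T @ d / (d.shape[0] - 1)   # sample covariance of the differences

def ensemble_background_error(members):
    """Ensemble method sketch: estimate B from perturbations about the
    ensemble mean. members: shape (nens, nstate)."""
    p = members - members.mean(axis=0)
    return p.T @ p / (p.shape[0] - 1)
```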
Because of the size of the model state, the full B matrix is extremely large. It is
typically on the order of 10^6 × 10^6, which in its present form cannot be stored on any
computer. This problem is simplified by using an ideal set of analysis variables for which
the analysis is performed. These are generally referred to as analysis control variables.
The analysis control variables are selected such that the cross-correlations between these
variables are minimum, which means less off-diagonal terms in B. The cross dependency is
removed with pre-computed regression coefficients. Further, the forecast errors are
modeled as a Gaussian distribution with pre-computed variances and lengthscale
parameters for each of the analysis control variables. We will use the following sub-
sections to briefly introduce how GSI processes these pre-computed background error
statistics and applies them in a GSI analysis.
The GSI package has several files in ~/comGSI_v3.1/fix/ that hold the pre-computed
background error statistics for regional and global domains with different grid
configurations. Since the GSI code has a built-in mechanism to interpolate input
background error statistics to any desired analysis grid, the following two files can be
used to specify the B matrix for any GSI regional application.
All the parameters for the global background error statistics are latitude dependent. In the
case of the regional background error statistics, regression coefficients of velocity potential
as well as variances and horizontal lengthscales for all the control variables are latitude
dependent. The remaining parameters, such as the regression coefficients for unbalanced
surface pressure and temperature and the vertical lengthscales for all fields, do not
vary with latitude. In the GSI code, the background error statistics are initially read in at their
original sigma levels and interpolated vertically in log(sigma) coordinates on the desired
vertical sigma levels. All the subroutines performing such interpolations for global and
regional applications are available in m_berror_stats.F90 and m_berror_stats_reg.f90
modules, respectively.
In subroutines prewgt and prewgt_reg, the lengthscale (both horizontal and vertical) and
variance information is read in and vertically interpolated to the analysis grid by
calling berror_read_wgt and berror_read_wgt_reg, while the balance information is read in
and vertically interpolated to the analysis grid by calling berror_read_bal and
berror_read_bal_reg, for global and regional applications, respectively.
The following shows the list of arrays in which the original background error statistics are
read by the various subroutines discussed above:
Horizontal interpolation of regression coefficients to the desired grid is done for
global and regional applications in subroutines prebal and prebal_reg, respectively,
residing in the balmod.f90 module. Horizontally interpolated regression coefficients on
the desired grid are stored in the bvz, agvz, wgvz and bvk, agvk, wgvk arrays for global
and regional applications, respectively. These regression coefficients are used in
subroutine balance to build the respective balanced parts of the velocity potential,
temperature, and surface pressure fields.
According to the variational equations used in the GSI, the background error covariance
is used to calculate the gradient of the cost function with respect to y from the
gradient of the cost function with respect to x, which, following Section 6.1.2, can be
represented as:

∇_y J = B ∇_x J    (subroutine bkerror(gradx,grady))

Because B is very complex and has a very large dimension in most data analysis domains,
in practice it must be decomposed into several sub-matrices that fulfill its function
step by step. In GSI, the B matrix is decomposed into the following form:

B = B_balance V B_Z (B_x B_y B_y B_x) B_Z V B_balance^T
The function of each sub-matrix is explained in the following table:
Step 1. The adjoint of the balance equation (B_balance^T) is applied by calling tbalance.
Step 2. Apply the square root of the variances and the vertical and horizontal parts of
the background error correlation by calling subroutine bkgcov:
- Multiply by the square root of the background error variances (V) by calling bkgvar;
- Apply the vertical smoother (B_Z) by calling frfhvo;
- Convert from subdomains to a full horizontal field distributed among processors by
  calling sub2grid;
- Apply self-adjoint smoothers in the West-East (B_x) and South-North (B_y) directions by
  calling smoothrf. Horizontal smoothing is achieved by calling ryxyyx at each vertical
  sigma level in a loop over the number of vertical sigma levels (nlevs). Smoothing for
  three horizontal scales is done with the corresponding weighting factors (hswgt) and
  horizontal lengthscale tuning factors (hzscl);
- Transform the horizontal field back to the respective subdomains by calling grid2sub;
- Apply the vertical smoother (B_Z) by calling frfhvo;
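The factored application of B can be sketched with explicit matrices. This is illustrative only: in GSI the factors are recursive filters and subroutine calls (tbalance, bkgvar, frfhvo, smoothrf), not stored matrices, and apply_B is an invented name for the sketch.

```python
import numpy as np

def apply_B(grad_x, Tbal, V, Bz, Bx, By):
    """Compute grad_y = B grad_x as the factored sequence
    B = Tbal V Bz (Bx By By Bx) Bz V Tbal^T, applying the rightmost
    factor first."""
    z = Tbal.T @ grad_x                 # adjoint of balance (tbalance)
    z = V @ z                           # sqrt of variances (bkgvar)
    z = Bz @ z                          # vertical smoother (frfhvo)
    z = Bx @ (By @ (By @ (Bx @ z)))     # self-adjoint horizontal smoothers (smoothrf)
    z = Bz @ z                          # vertical smoother again (frfhvo)
    z = V @ z                           # variances again
    return Tbal @ z                     # balance transform
```

Because the sequence is symmetric about the horizontal smoothers, the composite operator stays symmetric whenever each factor is applied with its adjoint, which is what makes B a valid covariance operator.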
Satellite Radiance Data Assimilation
Satellite radiance data analysis is one of the most advanced and important features in
the GSI system. GSI has developed complex functions and code components to ingest,
analyze, bias correct, and monitor radiance observations from various satellite
instruments. In this chapter, we discuss these satellite radiance analysis aspects from
the user's point of view, including how to correctly set up and run GSI with radiance
observations, how to check and understand the radiance analysis results, bias correction,
and monitoring of radiance observations. The related code structure is also described to
help advanced users further investigate and apply radiance data analysis with GSI.
All radiance observations used by the GSI are stored in the BUFR format. For detailed
information on the BUFR format and its processing techniques, please see the
BUFR/PrepBUFR User's Guide, which is available online:
http://www.dtcenter.org/com-GSI/BUFR/docs/index.php
In Section 3.1 of this user's guide, we introduced all GSI BUFR/PrepBUFR observation
files and the GSI-recognized observation file names in a table format. From this table,
we can see that most of the BUFR files are used for satellite radiance data. Here, we
will use a small part of the table to explain the link between the GSI name and the file
name:
The right column of the table gives example radiance BUFR files that can be downloaded
from the NCEP data servers (please see BUFR/PrepBUFR users guide for the naming
convention for these files), while the left column is the data file name that GSI expects
during observation data ingestion. The middle column is a brief explanation of the data
content in each file.
As explained in section 5.2.1, running a radiance data analysis with GSI can be as simple
as linking the radiance BUFR files to the GSI run directory under the GSI-recognized
names in the run script. For example, if we add the following two lines to the GSI run
script, we should see AMSU-A and AMSU-B observations analyzed in the GSI analysis, as
illustrated in the rest of Section 5.2. Here, we will give more detail on the setup and
usage of the GSI-recognized observation file names (GSI names) in the left column of the
Section 3.1 table.
The GSI names, amsuabufr and amsubbufr, are actually decided by the parameters in the GSI
namelist section OBS_INPUT. As an example, the part of OBS_INPUT is:
dfile(28)='amsuabufr', dtype(28)='amsua', dplat(28)='n15', dsis(28)='amsua_n15',
dfile(29)='amsuabufr', dtype(29)='amsua', dplat(29)='n16', dsis(29)='amsua_n16',
dfile(30)='amsuabufr', dtype(30)='amsua', dplat(30)='n17', dsis(30)='amsua_n17',
dfile(31)='amsuabufr', dtype(31)='amsua', dplat(31)='n18', dsis(31)='amsua_n18',
dfile(32)='amsuabufr', dtype(32)='amsua', dplat(32)='metop-a', dsis(32)='amsua_metop-a',
dfile(33)='airsbufr', dtype(33)='amsua', dplat(33)='aqua', dsis(33)='amsua_aqua',
dfile(34)='amsubbufr', dtype(34)='amsub', dplat(34)='n15', dsis(34)='amsub_n15',
dfile(35)='amsubbufr', dtype(35)='amsub', dplat(35)='n16', dsis(35)='amsub_n16',
dfile(36)='amsubbufr', dtype(36)='amsub', dplat(36)='n17', dsis(36)='amsub_n17',
Please note that the last two columns of the OBS_INPUT have been excluded for
conciseness. From this list, we can see the content of dfile is the GSI name, which is the
observation file name recognized by GSI, while dtype and dplat indicate the radiance
instruments and the satellite name associated with the GSI name in dfile. The dsis is the
radiance observation type that is the combination of the instruments and satellite names.
This list tells us that the GSI expects NOAA-15 AMSU-A radiance observations from a
BUFR file with name amsuabufr. It also reads in the NOAA-18 AMSU-A observations
from the same file. For NOAA-17 AMSU-B observations, GSI will read them in from a
file named amsubbufr.
It is possible to change the GSI name in dfile to a user-specified name (for example,
amsuagsi rather than amsuabufr) as long as the GSI name (amsuabufr) in the link from the
BUFR file (gdas1.t12z.1bamua.tm00.bufr_d) to the GSI name has also been changed. The
following demonstrates the process required to change the name in dfile.
It is advised to use the GSI names provided in the released run script because they
describe the file contents well and are used by many users. However, the flexibility to
set up a different GSI name gives GSI the ability to analyze radiance observations from
different sources. For example, if we want GSI to assimilate NOAA-15 AMSU-A
observations from a BUFR file named gdas1.t12z.1bamua.tm00.bufr_d and
NOAA-16 AMSU-A observations from another BUFR file named
gdas2.t12z.1bamua.tm00.bufr_d, we can setup the run script and namelist as
follows:
Now, GSI will read in NOAA-15 AMSU-A observations from the GSI file amsuabufr, which is
linked to the BUFR file gdas1.t12z.1bamua.tm00.bufr_d, and read in NOAA-16 AMSU-A
observations from another GSI file, amsuagsi, which is linked to the BUFR file
gdas2.t12z.1bamua.tm00.bufr_d.
A common user mistake in the setup of the radiance data analysis is forgetting to add the
radiance observation type to be used into OBS_INPUT. Some users may notice that NOAA-19
AMSU-A is not in the list of OBS_INPUT setups in release version 3. To use GSI to analyze
NOAA-19 AMSU-A observations with the run script and namelist from release version 3,
users need to add one more line in OBS_INPUT, for example:
dfile(79)='amsuabufr', dtype(79)='amsua', dplat(79)='n19', dsis(79)='amsua_n19',
The index 79 for this new line is the existing value of the parameter ndat in namelist
section SETUP plus 1; ndat should also be updated to 79.
In this case, NOAA-19 AMSU-A observations should be included in the BUFR file that
amsuabufr is linked to. The released run script will be continually updated to include
new satellite platforms; however, users are encouraged to double-check the content of the
BUFR file and the setup of the namelist if desired data types are missing from the
analysis.
Radiance data normally need to be thinned in the analysis; the last column in the
namelist section OBS_INPUT (e.g., dthin(26)=1,) is used to set up radiance data thinning.
The details of radiance data thinning are described in section 3.3 under item 7.
More detailed control on how to use each channel of certain radiance observation types in
the GSI analysis can be achieved by setting up the satinfo file. The use of the satinfo file
was previously introduced in section 4.3. Please note the satinfo file is structured
differently in release version 3.1 from previous released versions.
GSI has a set of files (subroutines) named read_*.f90 to read in different types of
observations, including satellite radiance. The table in Section 6.2.3 gives a complete list of
such subroutines. Below is an excerpt of the table that applies to radiance data:
ssmi read_ssmi
From this table, we can see that TOVS 1b observations from the NOAA and METOP satellites
are read in by subroutine read_bufrtovs, while the GOES sounder and SSMI observations are
read in by subroutines read_goesndr and read_ssmi, respectively.
In general, GSI reads in radiance observations from external BUFR files, picks the
observations within the analysis domain and time window, performs thinning based on the
coarse grid setup in OBS_INPUT, and saves them into an intermediate binary file using a
general data format across all observation types.
The basic structure of the BUFR file ingest is two nested loops: one reads in every
message (read_subset) from the BUFR file, and the other reads in all observations
(read_loop) from each message. In the subroutine read_bufrtovs, the two loops are marked
by the following code:
...
The content of each observation needed by GSI can be found by searching for the BUFR
mnemonics in the code. For example, the following lines give a list of mnemonics used in
the subroutine:
hdr1b ='SAID FOVN YEAR MNTH DAYS HOUR MINU SECO CLAT CLON CLATH CLONH HOLS'
if (atms) hdr1b ='SAID FOVN YEAR MNTH DAYS HOUR MINU SECO CLAT CLON CLATH CLONH HMSL'
call ufbrep(lnbufr,data1b8,1,nchanl,iret,'TMBR')
call ufbrep(lnbufr,data1b8,1,nchanl,iret,'TMBRST')
An explanation of each mnemonic can easily be found in the BUFR table used to generate
the BUFR file. Users can get this BUFR table online, by decoding the BUFR file, or by
checking the file bufrtab.012 in the fix directory of the release package. For example,
searching for SAZA and SOZA in bufrtab.012 yields the following two lines:
| SAZA | 007024 | SATELLITE ZENITH ANGLE |
| SOZA | 007025 | SOLAR ZENITH ANGLE |
These lines tell us that GSI needs to read in satellite zenith angle and solar zenith angle for
each observation profile.
In the data ingest subroutine, only observations within the analysis domain (for regional
applications) and time window are processed for thinning. After establishing a coarse
grid based on the parameters from OBS_INPUT, GSI starts a smart selection of radiance
profiles for the coarse grid. This radiance data thinning not only selects the nearest
radiance profile in each coarse-grid cell, but also considers the quality of the radiance
profiles. Each observation goes through the following steps in the data thinning so that
only the radiance profile with the best quality is kept for analysis:
3. Skip profiles whose Field of View (FOV) is out of range
4. Select profiles over better surface fields, picking observations in the following
order of surface type: sea, sea ice, snow and land, mixed surface
5. Select the profile based on the data quality predictor
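The selection of the best profile per coarse-grid cell can be sketched as follows. The 'cell' and 'score' fields are hypothetical stand-ins for the thinning-grid index and the combined distance/surface/quality-predictor score used in the real code (lower score = better).

```python
def thin_radiance_profiles(profiles):
    """Keep, for each coarse thinning-grid cell, the single profile with the
    best (lowest) quality score.

    profiles: list of dicts with 'cell' (coarse-grid cell index) and
    'score' (combined quality score, lower is better).
    """
    best = {}
    for p in profiles:
        c = p["cell"]
        # replace the current candidate only if this profile scores better
        if c not in best or p["score"] < best[c]["score"]:
            best[c] = p
    return [best[c] for c in sorted(best)]
```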
After data thinning, the best quality radiance observation for each coarse grid is then saved
with surface status (calculated from the background) in a two-dimensional array called
data_all. The 1st dimension of the array saves all information about one observation and
the 2nd one loops through the observations. The code that assigns the content of the array
starts like:
data_all(1 ,itx)= rsat ! satellite ID
data_all(2 ,itx)= t4dv ! time
data_all(3 ,itx)= dlon ! grid relative longitude
data_all(4 ,itx)= dlat ! grid relative latitude
The code itself gives clear notation of the content of the 1st dimension of the array,
except for the last three lines. For example, it clearly tells us that the first four
items in the array are the satellite ID (rsat), observation time (t4dv), and
grid-relative longitude (dlon) and latitude (dlat). However, there is no clear notation
for data_all(i+nreal,itx); a quick search for the array data1b8 indicates that it
contains the brightness temperatures from all channels of an observation profile.
After reading and processing all observations in the BUFR file and saving them in the data
array data_all, this array is written to an intermediate binary file at the end of the
subroutine read_bufrtovs.
From the stdout file, we can see the following information counting the data during the
data ingest stage (an example from the case in Chapter 5):
READ_BUFRTOVS: file=amsuabufr type=amsua sis=amsua_n15 nread= 128055
ithin= 2 rmesh= 60.000000 isfcalc= 0 ndata= 53932 ntask= 1
This tells us that the subroutine read_bufrtovs is reading NOAA-15 AMSU-A observations
from file amsuabufr. There are 128055 observations (profile number * channels number)
read in from the BUFR file and 53932 observations kept for further processing after data
selection and thinning.
The analysis in GSI is done in each subdomain for MPI runs. The observation count in each
subdomain can be found in the stdout file. All data types are listed in the stdout file,
as shown in the following example (the same case as section 5.2.2):
3:OBS_PARA: ps 2352 2572 8367 2673
3:OBS_PARA: t 4617 4331 12418 4852
3:OBS_PARA: q 3828 3908 11096 3632
3:OBS_PARA: uv 7859 6216 17360 6126
3:OBS_PARA: sst 96 239 564 150
3:OBS_PARA: pw 89 31 141 23
3:OBS_PARA: hirs4 metop-a 0 0 28 35
3:OBS_PARA: amsua n15 2564 1333 133 195
3:OBS_PARA: amsua n18 1000 2117 0 90
3:OBS_PARA: amsua metop-a 0 0 58 67
3:OBS_PARA: amsub n17 0 0 58 70
3:OBS_PARA: hirs4 n19 245 1097 0 37
3:OBS_PARA: amsua n19 656 3503 0 93
Please note the number in each subdomain is the number of radiance profiles, not the
number of observed channels. Each profile includes many channels: for example, each HIRS
observation has 19 channels, each MSU has 4, each AMSU-A has 15, each AMSU-B has 5, each
MHS has 5, and each SSU has 3.
The observation operator for radiance observations is very complex and outside the scope
of this user's guide; here, we only briefly introduce some of its features. The Community
Radiative Transfer Model (CRTM), developed by the JCSDA, is employed by the GSI system to
transform control variables into simulated radiances or brightness temperatures. This
operator can be illustrated by the following equation:
y = K(x, z)

where:
  y is the simulated radiance observation;
  x contains the profiles of temperature, moisture, and ozone;
  K is the radiative transfer equation (CRTM);
  z contains unknown parameters such as the surface emissivity, CO2 profile, methane
  profile, etc.
In GSI, x (including the surface condition) is calculated from the background fields and
then passed into the CRTM functions (K) to calculate the simulated radiance observations
y. When the unknowns in K(x, z) are too large, whether from the formulation of K or from
the unknown variables (z), the observed radiance data cannot be reliably used and must be
removed during quality control. Examples include cases when clouds, trace gases, or
aerosols exist in the observed column. The description of radiance data quality control
can be found in the next section.
be found in the next section. For advanced users needing to learn the details of the radiance
observation operator in GSI, please check the corresponding subroutine listed in the right
column of the section 6.2.4 table.
Because GSI uses the CRTM functions as part of the radiance observation operator, the
CRTM coefficients have to be available during the radiance data analysis. In the GSI
release package, these CRTM coefficients are linked to the running directory by the run
script before the GSI starts to run. The details of linking CRTM coefficients can be found
in Chapter 3 in the introduction of the GSI run scripts. Please note that the GSI run
script does not know which kinds of radiance observations will be used in the analysis;
it links all the CRTM coefficients for the radiance observation types listed in the
satinfo file. After reading in radiance observations from the BUFR files, GSI determines
which kinds of radiance observations are to be used and reads in only the coefficients
needed. Therefore, users only need to check whether the CRTM coefficients for the
radiance data types of interest are linked correctly. Warnings about missing CRTM
coefficients can be ignored if those coefficients are for radiance data types not used in
the application.
The quality control (QC) may be the most important aspect of satellite data assimilation.
Unlike conventional observations from a prepbufr file, which includes the quality markers
from the NCEP quality control process, the satellite radiance BUFR file does not include
observation quality information. Instead, the quality control for radiance observations is
performed inside GSI itself.
The GSI radiance data quality control starts right after the radiance observations are read in
(such as in read_bufrtovs.f90). Radiance data thinning can be considered part of the
quality control because the thinning process selects the observations of the best quality.
The major quality control step occurs after the calculation of the radiance observation
departures in the file setuprad.f90. Many QC steps are employed to capture problematic
satellite data, which mainly come from the following four sources:
Instrument problems
Clouds and precipitation simulation errors
Surface emissivity simulation errors
Processing errors (e.g., wrong height assignment, incorrect tracking, etc.)
In GSI, each instrument has its own quality control subroutine. All these subroutines are in
the file qcmod.f90 and are listed as follows for reference:
After calculating the radiance observation departure from the background and bias
correction, these QC functions are called for each instrument to either reject the bad
(questionable) observations or inflate the observation error of low-confidence observations.
The number of observations filtered by these QC functions is summarized in the radiance
fit file (fort.207) as 7 QC categories (steps). To help users understand the meanings of
these numbers in the radiance fit file, we briefly introduce the QC steps in subroutine
qc_amsua in the following table. Please note these QC categories may have different
meanings for different instruments:
Using the above table, we can understand the numbers listed under qc1 to qc7. The
explanation below also covers the numbers that are not in the above table, to give a
complete understanding of this part of the radiance fit file. For other portions of the fit file,
please see section 4.5.2.
From the above example, we see there are 4149 NOAA-15 AMSU-A profiles after
thinning, among which there are:
Using bias correction to remove the systematic bias in the satellite radiance observations is
one of the key steps to achieving successful satellite radiance data assimilation. This section will
introduce the basic theory of the GSI bias correction, the procedures and configurations of
the bias correction in the GSI system, an explanation of the namelist, satinfo, and
coefficients for bias correction, the use of the angle bias correction utility, and discussions
of some common issues users encounter in the application of the GSI bias correction.
Observation bias can systematically damage the data assimilation results and,
consequently, the quality of the forecasting system. Biases in satellite observations are of
particular concern because they may be larger than the signal and can damage the numerical
weather prediction system in a very short period of time.
Biases between the satellite observations and the model may come from the following
sources:
the satellite instrument itself (e.g., poor calibration or characterization, or adverse
environmental effects);
the radiative transfer model (RTM) linking the atmospheric state to the radiation
measured by the satellite (e.g., errors in the physics or spectroscopy, or non-modeled
atmospheric processes);
systematic errors in the background atmospheric state provided by the NWP model
used for monitoring.
Since the bias correction is applied to the radiance departures, this is equivalent to using the
modified definition of the observation operator:
The training of the bias correction consists of finding the coefficient vector that gives the
best fit between the NWP fields x and the observations. This is obtained by minimizing the
following cost function:
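In the standard variational bias correction formulation of the references cited below, the modified observation operator and the cost function take the following form. This is a sketch in that convention: the symbols β (coefficient vector), p_i (predictors), B_β and R (coefficient background and observation error covariances) follow those references and are not defined elsewhere in this guide.

```latex
% Modified observation operator: a bias model, linear in the coefficients
% beta_i with predictors p_i(x), is added to the original operator H(x)
\tilde{H}(x,\beta) = H(x) + \sum_{i=0}^{N} \beta_i \, p_i(x)

% Training: find the coefficient vector beta that minimizes
J(\beta) = \tfrac{1}{2}\,(\beta-\beta_b)^{\mathrm T}\,\mathbf{B}_\beta^{-1}\,(\beta-\beta_b)
         + \tfrac{1}{2}\,\big[y-\tilde{H}(x_b,\beta)\big]^{\mathrm T}\,\mathbf{R}^{-1}\,\big[y-\tilde{H}(x_b,\beta)\big]
```

Here x_b is the background state, β_b the background estimate of the coefficients, and y the radiance observations; in a purely offline training the first (background) term is dropped.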
For more details on the bias correction, please see the references listed below:
1. Auligne, T., A. P. McNally and D. P. Dee, 2007: Adaptive bias correction for satellite data in a numerical
weather prediction system. Q. J. R. Meteorol. Soc., 133, 631-642.
2. Derber, J. C. and W.-S. Wu, 1998: The use of TOVS cloud-cleared radiances in the NCEP SSI analysis
system. Mon. Weather Rev., 126, 2287-2299.
3. Harris, B. A. and G. Kelly, 2001: A satellite radiance-bias correction scheme for data assimilation. Q. J. R.
Meteorol. Soc., 127, 1453-1468.
4. Dee, D. P., 2004: Variational bias correction of radiance data in the ECMWF system. Pp. 97-112 in
Proceedings of the workshop on assimilation of high-spectral-resolution sounders in NWP, 28 June-1
July 2004, ECMWF, Reading, UK.
5. Dee, D. P. and S. M. Uppala, 2009: Variational bias correction of satellite radiance data in the ERA-
Interim reanalysis. Q. J. R. Meteorol. Soc., 135, 1830-1841.
In GSI, the bias correction for satellite radiance has two parts: one is the air mass bias
correction, also called the predictive part of the bias correction; the other is the angle
dependent bias correction. Each part has its own bias correction coefficient file:
The satbias_angle file contains the angle dependent part of the brightness
temperature bias for each channel/instrument/satellite. Also included in this file is
the mean temperature lapse rate for each channel weighted by the weighting
function for the given channel/instrument.
The satbias_in file contains the coefficients for the predictive part of the bias
correction.
GSI will read in the coefficients from both satbias_angle and satbias_in files, combine
them together with predictors to generate a system bias value for each channel, and then
subtract this system bias from the observation innovation during the radiance observation
operator calculation. During the minimization process, GSI will calculate the updated
coefficients for the predictive part of the bias correction and save the updated coefficients
in another file called satbias_out. The angle dependent bias coefficients are updated
outside of GSI using a utility named gsi_angleupdate in the release package. These new
mass and angle dependent bias coefficients should be used for the bias correction in the
next cycle of the GSI analysis.
To set up the bias correction for satellite radiance in the GSI system, users need to link the
right coefficient files in the run directory and keep the coefficient files updated in cycles:
Step 1, Link coefficient files for both mass bias correction and angle dependent bias into
the GSI run directory before running the GSI executable.
The coefficient files should come from the previous data assimilation cycle. However, if
there is no previous data analysis cycle, the sample coefficient files can be copied from the
directory ./fix within the community release version as a cold start. When using the run
script with the released version, the following lines in the run script copy the coefficient
files:
SATANGL=${FIX_ROOT}/global_satangbias.txt
SATINFO=${FIX_ROOT}/global_satinfo.txt
...
cp $SATANGL satbias_angle
cp $SATINFO satinfo
Within the directory ./fix, the sample angle dependent bias correction coefficients file is
called global_satangbias.txt, and the file for mass bias correction coefficients is
sample.satbias. The satinfo file is also copied here because the bias correction needs
information from it.
Step 2, Run GSI and save the output from the mass bias correction for next cycle
After running the GSI, an updated coefficient file for the mass bias correction is generated
in the run directory. This file is called satbias_out, which should be saved for the next
cycle of the GSI analysis. There is a line commented out in the released GSI run script
reserved for this purpose. The user should choose how to save the file for the next cycle:
# GSI updating satbias_in
#
# cp ./satbias_out ${FIX_ROOT}/sample.satbias
Step 3, Run the angle dependent bias correction utility after GSI runs and save updated
coefficients of angle dependent bias correction for use in the next cycle
The update of the coefficients for angle dependent bias correction is done by a stand-alone
application named gsi_angleupdate located under the directory ./util, outside the GSI
itself. This application reads in the diag files from the GSI analysis results and the old
angle dependent bias coefficients, updates the coefficients and saves them as a new file
called satbias_ang.out. We will introduce how to apply this utility in the next section.
Please note that cycling the coefficients, so that the bias information accumulates over the
data assimilation cycles, is the key to getting the right bias correction.
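The three steps above can be sketched as one cycling function. This is a hedged sketch assuming the file names described in this section; run_bias_cycle, GSI_EXE, and ANGUPDATE_EXE are illustrative names, not part of the release scripts.

```shell
# Hedged sketch of one bias-correction cycle; variable names are illustrative.
run_bias_cycle () {
  prev=$1   # directory holding the previous cycle's output coefficients
  # Step 1: the previous cycle's outputs become this cycle's inputs
  cp "${prev}/satbias_out" ./satbias_in
  cp "${prev}/satbias_ang.out" ./satbias_angle
  # Step 2: run GSI, which writes an updated ./satbias_out
  ${GSI_EXE:-./gsi.exe}
  # Step 3: update the angle dependent coefficients, writing ./satbias_ang.out
  ${ANGUPDATE_EXE:-./gsi_angleupdate.exe}
}
```

The two output files are then carried forward as the inputs of the next analysis cycle.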
To conduct the bias correction, GSI needs several pieces of information from different
files:
The following is a brief introduction to these files to help users understand the contents of
each file and check whether the satellite channels of interest are correctly configured in
these files.
The complete explanation of the GSI run script and most often used namelist options can
be found in Chapter 3 of this guide. More details of setting up radiance data analysis in the
run script are described in section 1 of this chapter. Users should make sure that the
satellite instruments and platforms of interest are listed in &OBS_INPUT and have been
correctly linked to the BUFR files.
Also, the following is a list of GSI namelist options related to the bias correction:
The usage information for each channel from the satinfo file
The GSI uses an information file called satinfo to control how to use each radiance
channel. Detailed information about satinfo can be found in the GSI Users Guide Section
4.3. The following is an example:
!sensor/instr/sat chan iuse error error_cld ermax var_b var_pg
amsua_n15 1 1 3.000 9.100 4.500 10.000 0.000
amsua_n15 2 1 2.000 13.500 4.500 10.000 0.000
amsua_n15 3 1 2.000 7.100 4.500 10.000 0.000
amsua_n15 4 1 0.600 1.300 2.500 10.000 0.000
amsua_n15 5 1 0.300 0.550 2.000 10.000 0.000
amsua_n15 6 1 0.230 0.230 2.000 10.000 0.000
amsua_n15 7 1 0.250 0.195 2.000 10.000 0.000
amsua_n15 8 1 0.275 0.232 2.000 10.000 0.000
amsua_n15 9 1 0.340 0.235 2.000 10.000 0.000
amsua_n15 10 1 0.400 0.237 2.000 10.000 0.000
amsua_n15 11 -1 0.600 0.270 2.500 10.000 0.000
amsua_n15 12 1 1.000 0.385 3.500 10.000 0.000
amsua_n15 13 1 1.500 0.520 4.500 10.000 0.000
amsua_n15 14 -1 2.000 1.400 4.500 10.000 0.000
amsua_n15 15 1 3.000 10.000 4.500 10.000 0.000
hirs3_n17 1 -1 2.000 0.000 4.500 10.000 0.000
hirs3_n17 2 -1 0.600 0.000 2.500 10.000 0.000
The first two columns give the sensor/instrument/satellite and the channel number. The
third column is the usage flag, which has the following meanings:
For bias correction purposes, please make sure the channels of interest are listed in the
satinfo file and set to the correct usage flag.
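As a quick check, the usage flags can be queried directly from the satinfo file shown above. The helper name list_on_channels is hypothetical; it relies only on the column layout shown in the example.

```shell
# Hedged helper: print the channel numbers of a given sensor whose usage
# flag (3rd column of satinfo) is 1; '!' comment lines never match the name.
list_on_channels () {
  awk -v s="$1" '$1 == s && $3 == 1 { print $2 }' "${2:-satinfo}"
}
```

For the sample above, list_on_channels amsua_n15 would print every AMSU-A channel except 11 and 14.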
The coefficients from both mass and angle dependent bias correction coefficient
files
As previously introduced in this section, there are two bias correction coefficient files.
These files include the bias correction coefficients for each channel:
1) satbias_in
This file contains the coefficients for the predictive part of the bias correction (air mass
bias correction coefficients). There is a sample for this file named sample.satbias in the
GSI release package under the directory ./fix. All coefficients in this sample file are 0.
Here, we use the NOAA-15 AMSU-A entries from the satbias_out file of the radiance
application case in Chapter 5 as an example:
1 amsua_n15 1 0.472353 -0.231512 0.291223 0.000634 -0.148959
2 amsua_n15 2 -0.677697 0.382025 1.424922 -0.000061 0.016514
3 amsua_n15 3 -2.631062 0.134578 2.968469 -0.004946 1.213581
4 amsua_n15 4 -0.470401 2.121855 5.764014 0.006496 1.333609
5 amsua_n15 5 10.996354 -0.762965 1.787372 0.082404 -1.661531
6 amsua_n15 6 -22.026905 -1.543174 -1.397403 0.175626 -11.384948
7 amsua_n15 7 -1.954080 -0.293421 0.029899 -0.064129 3.039958
8 amsua_n15 8 -9.468913 -1.490995 -0.856006 -0.013090 -0.945916
9 amsua_n15 9 -22.737061 -2.195735 0.247890 -0.357354 -15.298422
10 amsua_n15 10 -0.875332 2.212551 -0.392323 -0.337414 -8.785395
11 amsua_n15 11 0.000000 0.000000 0.000000 0.000000 0.000000
12 amsua_n15 12 2.800800 -4.042608 0.060067 0.913834 15.980004
13 amsua_n15 13 0.000000 0.000000 0.000000 0.000000 0.000000
14 amsua_n15 14 0.000000 0.000000 0.000000 0.000000 0.000000
15 amsua_n15 15 -0.439501 0.539856 0.412582 -0.001741 0.158646
The first three columns are the sequence number, the sensor/instrument/satellite, and the
channel number of each instrument. Columns 4 through 8 are the five coefficients for the
predictive (air mass) part of the bias correction, which uses five predictors.
2) satbias_angle
The satbias_angle file contains the angle dependent part of the brightness temperature bias.
There are two sample files for this in the GSI release package under the directory ./fix:
global_satangbias.txt and nam_global_satangbias.txt. Here we only give two channels as
examples from the file global_satangbias.txt:
1 amsua_n15 1 0.528768E-02
0.063 -0.200 -0.411 -0.588 -0.638 -0.523 -0.493 -0.466 -0.482 -0.475
-0.666 -0.587 -0.593 -0.602 -0.766 -0.955 -1.080 -1.218 -1.149 -1.374
-1.553 -1.635 -1.715 -1.783 -1.689 -1.507 -1.473 -1.244 -1.233 -1.259
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
2 amsua_n15 2 0.290661E-02
-2.769 -2.880 -2.583 -2.449 -2.218 -1.810 -1.536 -1.242 -0.882 -0.788
-0.676 -0.697 -0.508 -0.464 -0.544 -0.790 -0.945 -1.108 -1.002 -1.364
-1.404 -1.315 -1.318 -1.151 -0.826 -0.219 0.086 0.631 1.121 1.807
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
The leading columns are the sequence number, the sensor/instrument/satellite, the channel
number of each instrument, and the mean temperature lapse rate (T lap mean), respectively.
The next 90 numbers are the coefficients for the angle dependent bias correction, one per
field of view (FOV) position in a scan. For example, AMSU-A has 30 FOVs per scan, so
only the first 30 numbers are used to represent the bias correction coefficients for each
FOV position, while AMSU-B has 90 FOVs per scan, so all 90 numbers are used.
If some instruments are listed in satinfo but not in satbias_in and satbias_angle, GSI will
use 0 as the initial value for these instruments and write out updated coefficients for these
new instruments in the resulting coefficient files.
The coefficients for correcting the angle dependent part of the brightness temperature bias
in the GSI are calculated after each GSI run. NCEP has developed a tool to calculate these
coefficients, and the community GSI release v3.1 is the first to include this tool as part of
the release package. The tool is released as a directory named ./gsi_angupdate within
./comGSIv3.1/util.
Please note the community GSI release package version 3.0 does not have this tool. If users
are using the GSI release 3.0 and need this tool, please download the tar file
gsi_angupdate.tar from the GSI download page on-line. Once untarred, under the GSI
directory ./comGSIv3/util, you will see a new directory: ./gsi_angupdate, which includes
the tool to update coefficients for radiance angle dependent bias correction.
Compile
./make
Please note that before compiling this utility, the community GSI should already be
compiled successfully. Please refer to Chapter 2 of this Users Guide on how to configure
and compile the community GSI.
Run
Before running gsi_angupdate.exe, make sure that a GSI run with radiance data has
finished successfully and that the diagnostic files holding the O-B information have been generated in the
GSI run directory. The executable gsi_angupdate.exe will read in O-B information from
the diagnostic files for each sensor to update coefficients for angle dependent bias
correction of the sensor.
To help users easily run this tool, a sample run script named run_gsi_angupdate.ksh is
provided within the ./comGSIv3.1/run directory. If a user uses the tar file downloaded
separately on-line, a similar run script can be found with the code in the directory
./comGSIv3/util.
This script is based on the GSI run script and has a similar structure.
Please check section 3.2.2.1 for instructions on setting up the machine environment, and
section 3.2.2.2 for setting up the run environment. The run environment portion is
illustrated below:
#####################################################
# machine set up (users should change this part)
#####################################################
#
# GSIPROC = processor number used for GSI analysis
#------------------------------------------------
GSIPROC=1
ARCH='LINUX_PBS'
# Supported configurations:
# IBM_LSF,
# LINUX, LINUX_LSF, LINUX_PBS,
# DARWIN_PGI
In this script, only four parameters need to be set for a case study. These parameters have
been explained clearly in the run script and illustrated below:
#
#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time (YYYYMMDDHH)
# WORK_ROOT= working directory, where angupdate executable runs
# GSI_WORK_ROOT= GSI working directory, where GSI runs
# GSI_ANGUPDATE_EXE = path and name of the gsi angupdate executable
ANAL_TIME=2011032212
WORK_ROOT=./comGSIv3.1/run/angupdate_${ANAL_TIME}
GSI_WORK_ROOT=./comGSIv3.1/run/arw_2011032212
GSI_ANGUPDATE_EXE=./comGSIv3.1/util/gsi_angupdate/gsi_angupdate.exe
#
These parameters specify the analysis time, where to find the GSI run directory and
gsi_angupdate.exe, and where to run gsi_angupdate.exe.
The run time information can be found in the stdout file. A successful run should end with
the following information:
PROGRAM GLOBAL_ANGUPDATE HAS ENDED. IBM RS/6000 SP
Namelist
The namelist for gsi_angupdate.exe has two sections: setup and obs_input. Here, we
only show and illustrate part of the namelist as an example.
&setup
jpch=2680,nstep=90,nsize=20,wgtang=0.008333333,wgtlap=0.0,
iuseqc=1,dtmax=1.0,
iyy1=${iy},imm1=${im},idd1=${id},ihh1=${ih},
iyy2=${iy},imm2=${im},idd2=${id},ihh2=${ih},
dth=01,ndat=50
/
&obs_input
dtype(01)='hirs3', dplat(01)='n17', dsis(01)='hirs3_n17',
dtype(02)='hirs4', dplat(02)='metop-a', dsis(02)='hirs4_metop-a',
dtype(03)='goes_img', dplat(03)='g11', dsis(03)='imgr_g11',
dtype(04)='goes_img', dplat(04)='g12', dsis(04)='imgr_g12',
dtype(05)='airs', dplat(05)='aqua', dsis(05)='airs281SUBSET_aqua',
dtype(06)='amsua', dplat(06)='n15', dsis(06)='amsua_n15',
The section obs_input only has three columns, which have the same meaning as their
counterparts in the GSI namelist, i.e., dtype and dplat specify the radiance instrument and
the satellite name, respectively, and dsis indicates the radiance observation type with a
name combining both the instrument and the satellite names.
Most of the parameters in the setup section differ from those in the SETUP section of the
GSI namelist. These parameters are explained below:
In this section, we will discuss some frequently asked questions on satellite radiance bias
correction.
Where to get bias correction coefficient files for the NCEP operational system.
The real-time satellite bias correction coefficients used for the NCEP operational
system are available on-line from the same website that holds the observation
BUFR/PrepBUFR files:
At present, the coefficient files for the angle dependent bias correction are not available
on these websites.
As mentioned in this section, the released version provides sample files for these
coefficients under the directory ./fix:
satbias_in: sample.satbias
satbias_angle: global_satangbias.txt and nam_global_satangbias.txt
These files are provided as a sample only. Users need to generate their own coefficients
based on their own experiments. Usually, these coefficients need to be cycled for a period
(weeks or months) before they provide the right bias correction.
What if the user has no bias correction coefficients and only runs short experiments
(e.g., a week) for radiance data assimilation?
Following the suggestion from NCEP experts, the following may help some users
to improve their radiance data assimilation experiments:
1) Start with coefficient files as close as possible to your cases.
2) Run a single GSI analysis with mass bias and angle dependent bias correction.
You can get updated mass bias and angle dependent bias correction coefficient
files.
3) Run the same GSI analysis as step 2 using the same background and
observations but supply GSI with updated mass and angle dependent bias
correction coefficient files.
4) Repeat step 3 about 10 times to spin up the mass and angle dependent
coefficients.
5) Move on to the next cycle or analysis time and repeat steps 2 to 4.
6) After one or two days, the mass coefficient should be ready for the real case
test. Angle dependent bias correction will spin up slowly.
By starting two days prior to your real case period to spin up the coefficients, you
should be able to get better bias correction results.
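Steps 2 to 4 of the spin-up procedure above can be sketched as a loop. This is a hedged illustration: spin_up_bias, GSI_EXE, and ANGUPDATE_EXE are illustrative names, and the coefficient file names follow this section.

```shell
# Hedged sketch of spin-up steps 2-4: rerun the same analysis, feeding each
# run's updated coefficients back in as the next run's input.
spin_up_bias () {
  n=${1:-10}                  # number of repetitions (about 10, per the text)
  i=0
  while [ "$i" -lt "$n" ]; do
    ${GSI_EXE:-./gsi.exe}                    # writes updated satbias_out
    ${ANGUPDATE_EXE:-./gsi_angleupdate.exe}  # writes updated satbias_ang.out
    cp satbias_out satbias_in                # feed the updates back in
    cp satbias_ang.out satbias_angle
    i=$((i + 1))
  done
}
```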
The radiance channels in satinfo should match the channels in satbias_in and
satbias_angle. If they do not match, GSI will match satbias_in based on channels in
satinfo:
If radiance channels only exist in satinfo but not in satbias_in, these channels will
be added to the updated coefficient files with 0 as the initial values.
If radiance channels are not in satinfo but are in satbias_in, the extra channels in
satbias_in will be removed from the updated files.
If channels in satinfo and satbias_angle do not match, GSI will use the channels in both
files, but the angle dependent update tool will crash due to the mismatch. Therefore,
users need to make sure the channels in satinfo and satbias_angle match.
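Given the file layouts shown earlier in this section, the satinfo/satbias_in channel match can be verified before running GSI. This is a hedged sketch: check_channel_match is a hypothetical helper, and the column positions are taken from the examples above.

```shell
# Hedged check: compare the sensor/channel pairs in satinfo (columns 1-2,
# skipping '!' comment lines) against satbias_in (columns 2-3, which follow
# the leading sequence number).
check_channel_match () {
  awk '$1 !~ /^!/ { print $1, $2 }' "${1:-satinfo}"    | sort > chans_satinfo.txt
  awk '{ print $2, $3 }' "${2:-satbias_in}"            | sort > chans_satbias.txt
  if diff chans_satinfo.txt chans_satbias.txt > /dev/null; then
    echo "channel lists match"
  else
    echo "channel lists differ"
  fi
}
```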
How to select suitable satellite radiance channels when assimilating radiance data with
GSI:
This question applies beyond bias correction.
1) Model top and instrument weighting functions:
Each channel has its own weighting function. If part of the weighting function is
above the model top, you may need to exclude the channel because the model
cannot obtain the correct simulated radiance from the background.
2) Bias correction:
If a particular channel cannot be bias-corrected, for reasons such as incorrect
calibration or instrument failure, you need to turn that channel off. Checking
the time series of the bias for a channel can give an idea of the status of its
bias correction.
3) Test:
Try to view the data impact of each channel on the forecast to decide which
channel(s) are best for your application. You can monitor and perform bias
correction on each channel for a certain period and then turn that particular
channel from monitoring to usage in order to check the impact of the channel.
The NCEP operational GSI system includes a Radiance Monitoring Package to extract
certain radiance data from the GSI radiance diagnostic files and produce images as an aid
to monitor GSI radiance data assimilation performance and diagnose assimilation
problems. This package has been used at NCEP to support the following Radiance
Assimilation Monitoring web site:
http://www.emc.ncep.noaa.gov/gmb/gdas/radiance/index.html
This Radiance Monitoring Package includes a set of shell and Perl scripts, and small
programs. The package has two major components: data extraction and image generation.
Both components include scripts and programs. The package is also divided according to
data sources; it supports both global (glb) and regional (rgn) sources. During porting and
testing, the data sources are always set as regional because the DTC supports the GSI with
the regional WRF model.
To install the package, users need to download the community GSI version 3.1 tar file and
untar the file. The Radiance Monitoring Package is located in a directory named
./Radiance_Monitor under the directory comGSIv3.1/util. There are two subdirectories and
one shell script in the directory ./Radiance_Monitor:
Before using makeall.sh to compile all of the programs for data extraction, GSI itself needs
to be compiled first. After a successful GSI compilation, users can simply go to the
directory:
comGSIv3.1/util/Radiance_Monitor
and type:
./makeall.sh
This shell script sets up the compile environment based on the parameters in configure.gsi
and then moves into the source code subdirectories under ./data_extract/sorc and
./image_gen/src to compile each of them. Finally, all of the executables are copied to the
directory ./data_extract/exec. The following is a list of the executables generated after
running makeall.sh:
The executables whose names include the tag glb are for global data sources and those
with rgn are for regional data sources. The next section focuses on using the regional
executables.
The first step to generate radiance monitoring images is to extract the data from the GSI
radiance diagnostic files and save the data into small, binary files that are ordered for use
with GrADS plotting scripts.
The DTC has provided two sample run scripts, vrfyrad.sh and run.ksh, to help users to run
these data extraction executables. Both scripts are in the directory
./Radiance_Monitor/data_extract/scripts.
Before using these scripts, users need to configure several parameters located in the top
part of the script vrfyrad.sh:
COMOUT=.../comGSIv3.1/run/arw_2011032212
HOMEgfs=.../comGSIv3.1/util/Radiance_Monitor/data_extract
TANKverf=.../comGSIv3.1/run/verf
Here, COMOUT is the GSI run directory that holds the GSI radiance data analysis results;
HOMEgfs is the full path of the subdirectory ./data_extract in the radiance monitoring
package; and TANKverf is the location where users want to save the data extraction results.
The above example configuration tells the script that there is a GSI case under the
community GSI run directory in a subdirectory named ./arw_2011032212; that the
radiance monitoring package is still located under the community GSI ./util directory; and
that results will be saved under the directory ./run/verf.
Also, the default data resource is regional, which is set up by the following parameter in
script vrfyrad.sh:
RAD_AREA=${RAD_AREA:-rgn}
If users need to apply this package to global data, please change rgn to glb.
Please note the global sources have not been tested due to lack of data.
After setting the above environment variables, users can run vrfyrad.sh through the script
run.ksh, which invokes it in the form:
vrfyrad.sh YYYYMMDD HH
where the two input parameters give the year, month, day, and hour of the GSI analysis
results that will be used as the data source for the extraction. For our sample case, we
use:
vrfyrad.sh 20110322 12
The script vrfyrad.sh will create a working subdirectory called ./run_radmon under the
directory that runs vrfyrad.sh, copy radiance diagnostic files and bias correction coefficient
files into the working directory ./run_radmon, and call five child scripts to run executables
with names including angle, bcor, bcoef, time, and horiz. These names correspond to the
type of images that can be generated from the files. Examples of all types of plots are
available on the NCEP radiance assimilation monitoring web site given at the beginning of
this section. Here we briefly list the result data from each executable in the following table:
angle (radmon_angle.glb, radmon_angle.rgn), for each FOV and each channel:
    angle bias correction coefficient
    number of observations
    contribution to penalty
    summary of o-g without bias correction
    summary of total bias correction
    summary of o-g with bias correction
    summary of the fixed angle dependent term
    summary of the lapse rate term
    summary of the (lapse rate)**2 term
    summary of the mean term
    summary of the scan angle term
    summary of the cloud liquid water term

bcor (radmon_bcor.glb, radmon_bcor.rgn), bias correction terms for each channel:
    number of observations
    contribution to penalty
    bias and standard deviation of the total bias correction
    bias and standard deviation of the fixed angle dependent term
    bias and standard deviation of the lapse rate term
    bias and standard deviation of the (lapse rate)**2 term
    bias and standard deviation of the mean term
    bias and standard deviation of the scan angle term
    bias and standard deviation of the cloud liquid water term

bcoef (radmon_bcoef.glb, radmon_bcoef.rgn), bias correction coefficient related data:
    contribution to penalty from each channel
    mass bias correction coefficients for each channel and predictor

time (radmon_time.glb, radmon_time.rgn), radiance observations for each channel:
    number of observations
    contribution to penalty
    summary of o-g without bias correction
    summary of total bias correction
    summary of o-g with bias correction

horiz (horiz.glb.x, horiz.rgn.x), for each radiance observation profile:
    observations
    total bias correction
    simulated observations with bias correction
    simulated observations without bias correction
After each child script runs successfully in the working directory, the result data are
copied and stored in subdirectories named ./radmon.[yyyymmdd] under the directory set by
the parameter TANKverf.
All the result data files from the same day are saved under the same directory
./radmon.[yyyymmdd]. Each file name starts with the image type: angle, bcor, bcoef, time,
or horiz. For each radiance data type, each image type has three files:
the binary data file: with a file extension of .ieee_d. The full name of the file
looks like: image-type.radiance_data_type.yyyymmddhh.ieee_d. For
example: angle.mhs_n18.2011032212.ieee_d
the GrADS data descriptor file: with a file extension .ctl. The full name of the
file looks like: image-type.radiance_data_type.ctl. For example:
angle.mhs_n18.ctl
the standard output file from the executable: with stdout in its name. The full
name of the file looks like: image-type.stdout.radiance_data_type.
For example: angle.stdout.mhs_n18
Both the GrADS data descriptor file and standard out file are text files that can be used to
check if the data extraction step completed successfully.
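The naming convention above can be captured in a small helper for locating the three files of a given image type and radiance data type (radmon_names is a hypothetical name, not part of the package):

```shell
# Hedged helper: build the three expected file names for a given image type,
# radiance data type, and analysis date, per the naming convention above.
radmon_names () {
  img=$1; dtype=$2; cdate=$3
  echo "${img}.${dtype}.${cdate}.ieee_d"   # binary data file
  echo "${img}.${dtype}.ctl"               # GrADS data descriptor file
  echo "${img}.stdout.${dtype}"            # standard output file
}
```

For example, radmon_names angle mhs_n18 2011032212 lists the three files of the angle image type for MHS on NOAA-18.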
The original image generation portion of the package generates images and transfers them
to a web server. The scripts were only ported and tested to generate the image files,
whereas the image display portion was left up to the users. To generate images from the
files under ./radmon.[yyyymmdd], users need to run scripts in the directory:
./Radiance_Monitor/image_gen/ush. There are many scripts in this directory to generate
different kinds of images. We provide a single control script, run_rgnl.sh, as a starting
point to run most of these scripts.
The script run_rgnl.sh is very simple and only does two things: set up environment
variables and run child scripts to generate the images.
The environmental variables needed by image generation are similar to those for data
extraction. Here is a list of variables at the top of the script run_rgnl.sh and their
explanations:
PDATE=2011032212
export RAD_AREA=rgn
MY_TANKDIR=.../comGSIv3.1/run
MY_RADMON=.../comGSIv3.1/util/Radiance_Monitor
export IMGNDIR=${MY_TANKDIR}/imgn
149
Satellite Radiance Data Assimilation
export TANKDIR=${MY_TANKDIR}/verf
export PLOT_WORK_DIR=${MY_TANKDIR}/plot_${RAD_AREA}_rad
Here PDATE is the date and time of the GSI analysis in YYYYMMDDHH format; RAD_AREA
indicates the data source (rgn for regional, glb for global); MY_TANKDIR is the working
directory for both data extraction (default subdirectory ./verf) and image generation
(default subdirectory ./imgn); MY_RADMON is the path to the radiance monitoring package;
and PLOT_WORK_DIR is the working directory for plotting. Please note that these scripts use
the result data files and data descriptor files from the data extraction step, so RAD_AREA
must be set to the same data source as in the data extraction step, and TANKDIR must point
to the same path as the variable TANKverf in the script
./Radiance_Monitor/data_extract/scripts/vrfyrad.sh.
Because the plotting script stops right after image generation, the generated image files
remain in the working directory PLOT_WORK_DIR. In the above example, the image files are in
the directory .../comGSIv3.1/run/plot_rgn_rad.
After setting up these parameters, users can start image generation by running:
./run_rgnl.sh
Similar to the data extraction step, the script run_rgnl.sh calls five child scripts to
plot images for each radiance observation type in the data. Please see the Radiance
Assimilation Monitoring web site for detailed examples.
As discussed above, the generated image files are saved under the plot work directory
.../comGSIv3.1/run/plot_rgn_rad. Under this directory are many subdirectories
corresponding to the image types. Each image file name describes the content of the image;
they all have the format:
radiance-data-type.image-content_frequency-category.png
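As a sketch, such a name can be split back into its parts. The parsing helper and example file name below are hypothetical; only the three-part pattern comes from the text.

```python
import re

# Split an image file name of the form
#   radiance-data-type.image-content_frequency-category.png
# into its components. The example name below is hypothetical.

def parse_image_name(name):
    m = re.match(r"(?P<data_type>[^.]+)\.(?P<content>[^_]+)_(?P<freq>.+)\.png$", name)
    return m.groupdict() if m else None

print(parse_image_name("mhs_n18.angle_region1.png"))
```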
Please note that we have tested the scripts using only a single case; however, the scripts
themselves were built for assimilation cycles with 30 days' worth of data files (4 cycles
per day). Therefore, the images generated here are mainly for testing on the Linux
platform and can only be used after further modification for the user's application. For
example, users may notice that the time series plots are empty, because this type of plot
requires files from at least two GSI analysis cycles.
Adapting the package for the user's application could involve both the run scripts and the
code. Below are suggested points that may be adapted by users:
- Set idhh to the accumulation period used in the calculation (for example, 720
  corresponds to the GDAS setup of 30 days of cycles at 4 cycles per day) and incr to the
  cycling interval in hours (for example, 6 for GFS).
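The example numbers are consistent if idhh is read as an accumulation period in hours; this interpretation is our assumption, not something the scripts state explicitly:

```python
# Reconciling the example values above, assuming idhh counts hours
# (an assumption; the text calls 720 a cycle-related number).
days = 30
cycles_per_day = 4
incr = 6                 # cycling interval in hours (GFS example)

idhh = days * 24         # 720, the value cited for the GDAS setup
n_cycles = idhh // incr  # 120 cycles = 30 days * 4 cycles per day
print(idhh, n_cycles)
```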
GSI tools
Under ./util/bufr_tools, there are many Fortran examples illustrating basic BUFR/PrepBUFR
file processing skills such as encoding, decoding, and appending, as well as a small C
program called ssrc.c to convert a Big Endian BUFR file to a Little Endian file. For
details on these examples and the BUFR format, please see the BUFR/PrepBUFR User's Guide,
which is freely available on-line:
http://www.dtcenter.org/com-GSI/BUFR/docs/index.php
The observation BUFR files generated by NCEP (for example, PrepBUFR and BUFR files from
the NCEP ftp server, or gdas1.t12z.prepbufr.nr in the tutorial case) are in Big Endian
binary format. When running GSI on a Linux or Darwin system, these files must be converted
to Little Endian binary format. The C program ssrc.c under ./util/bufr_tools performs this
conversion.
The program ssrc.c may be compiled with any standard C compiler. After compiling ssrc.c to
generate the executable ssrc.exe, you can byte-swap a Big Endian BUFR/PrepBUFR file to a
Little Endian file by running:
ssrc.exe < name_of_big_endian_bufr_file > name_of_little_endian_bufr_file
In the on-line tutorial, the file little_endian.gdas1.t12z.prepbufr.nr has the same
content as the PrepBUFR file gdas1.t12z.prepbufr.nr, but has been byte-swapped to Little
Endian for use on Linux systems.
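The idea behind the byte swap can be illustrated in a few lines. This is a simplified sketch of word-wise byte swapping only; it is NOT the ssrc.c algorithm, which operates on the actual file structure:

```python
import struct

# Simplified illustration of byte-swapping 4-byte words between big- and
# little-endian order. Not the ssrc.c algorithm; it only shows the idea,
# for a buffer whose length is a multiple of 4 bytes.

def swap_words(buf):
    n = len(buf) // 4
    words = struct.unpack(f">{n}I", buf)   # interpret as big-endian words
    return struct.pack(f"<{n}I", *words)   # re-emit as little-endian

big = struct.pack(">2I", 1, 1024)
print(struct.unpack("<2I", swap_words(big)))   # (1, 1024)
```

Note that the swap is its own inverse: applying it twice returns the original bytes.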
A lot of useful information about each observation used in the analysis, such as the
innovation, observation value, observation error and adjusted observation error, and
quality control information, may be output in diagnostic files. To generate the diagnostic
files, users need to set the namelist variable write_diag to true in the namelist section
&SETUP. The write_diag variable has a dimension equal to the number of outer loops in the
minimization plus 1. The first element of write_diag controls the write of a diagnostic
file before the first outer loop (O-B), while the last element controls the write after
the last outer loop (O-A). Each remaining element n controls the write after outer loop
n-1.
For example, if we set the number of outer loops to 2, and set the write_diag namelist
variable to the following:
write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
the GSI will write out diagnostic files before the start of the 1st outer loop (O-B) and
after the completion of the 2nd outer loop (O-A). We do not want GSI to write out
diagnostic files after the 1st outer loop, so we set write_diag(2)=.false.
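The mapping from write_diag elements to write stages can be sketched as follows; this is a plain illustration of the rule just described, not GSI code:

```python
# Illustration of the write_diag rule: with miter outer loops, write_diag
# has miter + 1 elements; element 1 writes before outer loop 1 (O-B, the
# "ges" files) and the last element writes after the final loop (O-A, the
# "anl" files).

miter = 2
write_diag = [True, False, True]    # write_diag(1..3) from the example

stages = ["before outer loop 1 (O-B)",
          "after outer loop 1",
          "after outer loop 2 (O-A)"]
written = [s for s, w in zip(stages, write_diag) if w]
print(written)
```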
This is what we set in our example case described in section 5.2. From this case, we can
see the following diagnostic files generated by the GSI analysis:
diag_amsua_metop-a_anl.2011032212 diag_amsub_n17_anl.2011032212
diag_amsua_metop-a_ges.2011032212 diag_amsub_n17_ges.2011032212
diag_amsua_n15_anl.2011032212 diag_conv_anl.2011032212
diag_amsua_n15_ges.2011032212 diag_conv_ges.2011032212
diag_amsua_n18_anl.2011032212 diag_hirs4_metop-a_anl.2011032212
diag_amsua_n18_ges.2011032212 diag_hirs4_metop-a_ges.2011032212
All files are identified by a filename containing three elements. The first element, diag,
indicates that these are combined diagnostic files. The second element identifies the
observation type (here, conv means conventional observations from prepbufr, and amsua_n15
means radiance observations from AMSU-A on NOAA-15). The last element identifies the
outer-loop step at which the file was generated: anl means the contents were written after
the last outer loop (from write_diag(3)=.true.), and ges means the contents were written
before the first outer loop (from write_diag(1)=.true.).
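This three-element naming can be split mechanically; the helper below is hypothetical, written only to restate the convention just described:

```python
# Split a combined diagnostic file name such as diag_amsua_n15_anl.2011032212
# into its three elements plus the analysis time.

def parse_diag_name(name):
    base, time = name.rsplit(".", 1)       # strip the YYYYMMDDHH suffix
    prefix, rest = base.split("_", 1)      # "diag"
    obs_type, stage = rest.rsplit("_", 1)  # e.g. "amsua_n15", "anl"
    return {"prefix": prefix, "obs_type": obs_type, "stage": stage, "time": time}

print(parse_diag_name("diag_amsua_n15_anl.2011032212"))
```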
To help users read the information in these diagnostic files, we have provided two Fortran
programs in the ./util/Analysis_Utilities/read_diag/ directory. They can be compiled by
running:
./make
Note: since information from the GSI include directory is required, the GSI must have been
compiled first.
To run read_diag_conv.exe, two things need to be in the directory along with the
executable: the namelist file namelist.conv, shown below, and the diagnostic file to be
processed.
&iosetup
  infilename='./diag_conv_anl',      ! GSI diagnostic file
  outfilename='./results_conv_anl',  ! text file used to save the content of the diagnostic file
/
The user should copy or link the diagnostic file (for example, diag_conv_anl.2011032212
from section 5.2) from the test case directory to the directory with the executable,
naming it as specified by the entry infilename in the namelist; in this case that would be
diag_conv_anl. With everything in place, run the executable:
./read_diag_conv.exe
The results are placed in the file specified by the outfilename entry in the namelist; in
this case, a file results_conv_anl located in the directory where the executable was run.
For radiance diagnostic files, read_diag_rad.exe uses a similar namelist:
&iosetup
infilename='./diag_amsua_n18_ges.2011032212',
outfilename='./results_amsua_n18_ges',
/
Running the executable creates the text file results_amsua_n18_ges, specified by the
namelist, in the directory where read_diag_rad.exe runs.
For the conventional observations, the data are stored in two arrays, rdiagbuf and
cdiagbuf, whose contents are as follows:
cdiagbuf = station id
rdiagbuf(1) = observation type
rdiagbuf(2) = observation subtype
rdiagbuf(3) = observation latitude (degrees)
rdiagbuf(4) = observation longitude (degrees)
rdiagbuf(5) = station elevation (meters)
rdiagbuf(6) = observation pressure (hPa)
rdiagbuf(7) = observation height (meters)
rdiagbuf(8) = observation time (hours relative to analysis time)
rdiagbuf(9) = input prepbufr qc or event mark
rdiagbuf(10) = setup qc or event mark (currently qtflg only)
rdiagbuf(11) = read_prepbufr data usage flag
rdiagbuf(12) = analysis usage flag (1=use, -1=not used)
rdiagbuf(13) = nonlinear qc relative weight
rdiagbuf(14) = prepbufr inverse obs error (K**-1)
rdiagbuf(15) = read_prepbufr inverse obs error (K**-1)
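For post-processing the text output, the rdiagbuf layout can be kept as a lookup table. The dictionary below simply restates the list above, keyed by the 1-based Fortran indices:

```python
# The rdiagbuf layout restated as a lookup table (1-based Fortran indices).

RDIAGBUF_FIELDS = {
    1:  "observation type",
    2:  "observation subtype",
    3:  "observation latitude (degrees)",
    4:  "observation longitude (degrees)",
    5:  "station elevation (meters)",
    6:  "observation pressure (hPa)",
    7:  "observation height (meters)",
    8:  "observation time (hours relative to analysis time)",
    9:  "input prepbufr qc or event mark",
    10: "setup qc or event mark (currently qtflg only)",
    11: "read_prepbufr data usage flag",
    12: "analysis usage flag (1=use, -1=not used)",
    13: "nonlinear qc relative weight",
    14: "prepbufr inverse obs error",
    15: "read_prepbufr inverse obs error",
}

print(RDIAGBUF_FIELDS[12])   # analysis usage flag (1=use, -1=not used)
```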
read_diag_conv.exe reads these arrays and outputs the important information to the text
file (results_conv_anl) specified by the user in the &iosetup namelist. For wind, the last
4 columns are the wind components, in the order: U observation, O-B for U, V observation,
O-B for V.
For radiance observations, the data are stored in two arrays, diagbuf and diagbufchan,
whose contents include the following:
if (.not.microwave) then
diagbuf(25) = cloud fraction (%)
diagbuf(26) = cloud top pressure (hPa)
else
diagbuf(25) = cloud liquid water (kg/m**2)
diagbuf(26) = total column precip. water (kg/m**2)
endif
In the sample output file results_amsua_n18_ges, only the observation location and time
are written to the file. Users can write out other information based on the list above.
In section 4.6, we introduced how to check the convergence information in the fort.220
file. Here, we provide tools to filter this file and to plot the values of the cost
function and the norm of the gradient at each iteration. These tools, one ksh script and
one ncl script, are in the ./util/plot_cost_grad directory. To run filter_fort220.ksh, the
fort.220 file needs to be in the same directory. The script filters the values of the cost
function and the norm of the gradient at each iteration out of fort.220 and into a text
file called cost_gradient.txt.
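In spirit, the filtering step looks like the sketch below. The exact line format of fort.220 is not reproduced here; the regular expression and sample lines are hypothetical placeholders to be adapted to the fort.220 produced by your GSI build:

```python
import re

# Sketch of what filter_fort220.ksh does: pull the cost function value and
# gradient norm for each iteration out of fort.220 and collect them for
# cost_gradient.txt. The line format matched below is HYPOTHETICAL.

def filter_fort220(lines):
    rows = []
    for line in lines:
        m = re.search(r"cost\s*=\s*([\d.Ee+-]+)\s+grad\s*=\s*([\d.Ee+-]+)", line)
        if m:
            rows.append((float(m.group(1)), float(m.group(2))))
    return rows

sample = ["iter 1  cost = 1.234E+05  grad = 5.678E+02",   # hypothetical lines
          "iter 2  cost = 1.100E+05  grad = 3.400E+02"]
print(filter_fort220(sample))
```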
Before making plots, two lines in the ncl script GSI_cost_gradient.ncl need to be changed
based on the content of cost_gradient.txt.
Then, in the same directory as the file cost_gradient.txt, run the ncl script:
ncl GSI_cost_gradient.ncl
The pdf file GSI_cost_gradient.pdf is created, containing plots of the convergence of the
GSI analysis like those in section 4.6.
In Section 4.2, we introduced how to do a single observation test for GSI. Here we provide
ncl scripts to plot the single observation test results. The main difference between the
ARW and NMM cores as used in GSI is that ARW is on a C grid, while NMM is on an E grid.
GSI_singleobs_nmm.ncl calls fill_nmm_grid2.ncl to convert the E grid to an A grid for
plotting, while GSI_singleobs_arw.ncl itself includes a C-grid to A-grid converter.
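For the C grid, unstaggering U amounts to averaging horizontally adjacent staggered points onto the mass (A-grid) points. The plain-Python sketch below shows this standard step, which we assume matches what the ncl scripts do array-wise:

```python
# Unstaggering a C-grid U field onto A-grid (mass) points by averaging
# adjacent staggered values in x. Plain-Python sketch of the standard
# C-grid to A-grid conversion.

def destagger_u(u):
    """u is a list of rows with one extra value in x; average neighbors."""
    return [[0.5 * (row[i] + row[i + 1]) for i in range(len(row) - 1)]
            for row in u]

u_stag = [[1.0, 3.0, 5.0],
          [2.0, 4.0, 6.0]]
print(destagger_u(u_stag))   # [[2.0, 4.0], [3.0, 5.0]]
```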
These scripts read in the analysis and background fields of temperature (T), the U and V
components of wind, and moisture (Q), and calculate the difference of the analysis field
minus the background field. XY sections (left column) and XZ sections (right column) are
then plotted for T, U, V, and Q through the point with the maximum analysis increment from
the single observation. The default single observation test is T for NMM and U for ARW. If
the user conducts other single observation tests, corresponding changes should be made to
the scripts.
In the same directory are two further ncl scripts, Analysis_increment.ncl and
Analysis_increment_nmm.ncl, provided to plot the analysis increment for the ARW NetCDF and
NMM NetCDF cases. These scripts are very similar to those for the single observation test.
For more information on how to use ncl, please check the NCL website at:
http://www.ncl.ucar.edu/
GSI Namelist
The following are lists and explanations of the GSI namelist variables. You can also find
them in the source code gsimod.F90.