Table of Contents
Introduction 1.1
Automated Testing 1.2
Introduction 1.2.1
Test::Nginx 1.2.2
Test Suite Layout 1.2.3
Test File Layout 1.2.4
Running Tests 1.2.5
Preparing Tests 1.2.6
Testing Erroneous Cases 1.2.7
Test Modes 1.2.8
Advanced Topics 1.2.9

Introduction

Programming OpenResty
This is an official guide on OpenResty programming written by the OpenResty creator. This
book is still in preparation. Please check back often for updates.

The entire Programming OpenResty book, written by Yichun Zhang, is available here. All
content is licensed under the Creative Commons Attribution Non Commercial Share Alike
3.0 license. You can download or browse the rendered book in various different formats on
the GitBook website below.

https://www.gitbook.com/book/openresty/programming-openresty/

The latest source of the book can be found in the following GitHub repository:

https://github.com/openresty/programming-openresty

Pull requests are always welcome.


Automated Testing
Automated testing plays a critical role in software development and maintenance.
OpenResty provides a data-driven test scaffold for writing declarative test cases for NGINX
C modules, Lua libraries, and even OpenResty applications. The test cases are written in a
specification-like format, which is both intuitive to read and write for humans and also easy
to handle for machines. The data-driven approach makes it easy to run the same tests in
wildly different ways that can help expose issues in different scenarios or with different kinds
of external tools.

This chapter introduces the Test::Nginx test scaffold that has been widely used to organize
test suites for almost all the OpenResty components, including the ngx_http_lua module,
most of the lua-resty-* Lua libraries, as well as full-blown business applications like
CloudFlare’s Lua CDN and Lua SSL.

Keywords: Testing, Mocking


Introduction
OpenResty itself has been relying on automated testing to remain high quality over the
years. As OpenResty core developers, we embrace the test driven development (TDD)
process all the time. An excellent result of our TDD practices over the years is a huge set of
test suites for all the OpenResty components. These test suites are so large as a whole that
it is impractical to run all the tests thoroughly on a single machine. A relatively large test
cluster is often run on Amazon EC2 to run all these tests in all existing test modes. Lying at
the heart of these test suites is usually the Test::Nginx test scaffold module developed by
the OpenResty team.

The Test::Nginx scaffold provides a generic, simple specification language for expressing
and organizing test cases in an intuitive way. It also provides various powerful testing modes
or "engines" to run the tests in different ways in the hope of exposing bugs in
different settings. The test specification language can also be extended with custom
abstractions for advanced testing needs, which are usually found in application-level
regression testing.

Conceptual Roadmap

Overview


Test::Nginx
Test::Nginx is a test framework that drives test cases written for any code running atop
NGINX, and also, naturally, the NGINX core itself. It is written in Perl because of the rich
testing facilities and toolchain already accumulated in the Perl world for years. Fortunately,
the user does not really need to know Perl for writing test cases atop this scaffold since
Test::Nginx provides a very simple notation to present the test cases in a specification-like
format.

The simple test specification format, or language, used in Test::Nginx is just a dialect of
the more general testing language provided by the Test::Base testing module in the Perl
world. In fact, Test::Nginx is just a subclass of Test::Base in the sense of object-oriented
programming. This means that all the features offered by Test::Base are available in
Test::Nginx, and Test::Nginx just provides handy primitives and notations that simplify
testing in the NGINX and OpenResty context. The core idea of Test::Base is so useful that
we have been using testing scaffolds based on Test::Base in many different projects, even
including Haskell programs and Linux kernel modules. Test::Nginx is one such example that we
created for the NGINX and OpenResty world. Detailed discussion of the Test::Base
framework itself is beyond the scope of this book, but we will introduce the important
features of Test::Base that are inherited by Test::Nginx in the later sections.

Test::Nginx is distributed via CPAN, the Comprehensive Perl Archive Network, just like
most of the other Perl libraries. If you already have perl installed in your system (many
Linux distributions ship with perl by default), then you can install Test::Nginx with the
following simple command:

cpan Test::Nginx

The first time the cpan utility is run, you may be prompted to configure it
to fit your requirements. If you are unsure about those options, just choose the
automatic configuration option (if available) or accept all the default settings.
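
After the cpan command finishes, it can be reassuring to confirm that perl is really able to
load the module. The following is only a minimal sketch and not something shipped with
Test::Nginx: the function name perl_module_installed is made up for illustration, and it
relies solely on perl's standard -M and -e command-line switches.

```shell
# Hypothetical helper (not part of Test::Nginx): succeeds when the
# given Perl module can be loaded by the perl found in PATH.
perl_module_installed() {
    # -M loads the module; -e 1 then runs a trivial program.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_module_installed Test::Nginx::Socket; then
    echo "Test::Nginx::Socket is installed"
else
    echo "Test::Nginx::Socket is missing; try: cpan Test::Nginx"
fi
```

The same helper can be reused for any other Perl module a test suite happens to depend on.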

Test::Nginx provides several different testing classes for different user requirements. The
most frequently used one is Test::Nginx::Socket. The rest of this chapter will focus on this
testing class and its subclasses. We will use the names Test::Nginx and
Test::Nginx::Socket interchangeably from now on to mean the Test::Nginx::Socket test
module and its subclasses, unless otherwise specified.


Note: There is actually another, different testing scaffold called Test::Nginx, created
by Maxim Dounin and maintained by the official NGINX team. That testing
module is shipped with the official NGINX test suite and has no relationship with
our Test::Nginx except that both of these are meant to test NGINX related
code. The NGINX team's Test::Nginx requires the user to directly code in Perl
to convey all the test cases, which means that tests written for their
Test::Nginx are not data driven and require decent knowledge of Perl
programming.


Test Suite Layout


Projects using Test::Nginx to drive their test suites usually have a common directory layout
and common test file name patterns to organize their tests. This makes it easy for the user
to reason about the location of the test suite in a project source tree and the usage of the
tests. It is not really required, however, to use this common convention; it is just highly
recommended.

By convention, such projects have a t/ directory at the root of their source tree where test
files reside. Each test file contains test cases that are closely related in some way and has
the file extension .t so that it is easily identified as a "test file". Below is the directory tree
structure of a real-world test suite inside the headers-more-nginx-module project:

└── t
    ├── bug.t
    ├── builtin.t
    ├── eval.t
    ├── input-conn.t
    ├── input-cookie.t
    ├── input-ua.t
    ├── input.t
    ├── phase.t
    ├── sanity.t
    ├── subrequest.t
    ├── unused.t
    └── vars.t

When you have many test files, you can also group them further with sub-directories under
t/. For example, in the lua-nginx-module project, we have sub-directories like
023-rewrite/ and 024-access/ under its t/ directory.

In essence, each .t file is a Perl script file runnable by either perl or Perl’s universal test
harness tool named prove. We usually use the prove command-line utility to run such .t
files to obtain test results. Although .t files are Perl scripts per se, they usually do not
have much Perl code at all. Instead, all of the test cases are declared as cleanly formatted
"data" in these .t files.


Note: The test suite layout convention we use here has also been used by the Perl
community for many years. Because Test::Nginx is written in Perl and reuses
Perl's testing toolchain, it makes sense for us to simply follow that convention in
the NGINX and OpenResty world as well.


Test File Layout


Test files usually have a common file extension, .t , to distinguish themselves from other
types of files in the source tree. Each test file is a Perl script per se. Test::Nginx follows a
special design that decomposes each test file into two main parts: the first part is a very
short prologue that consists of a few lines of Perl code while the second part is a listing of
the test cases in a special data format. These two parts are separated by the following
special line

__DATA__

The perl interpreter (whether invoked directly or through the prove utility) stops
interpreting the file content as Perl source code as soon as it sees this special line.
Everything after this line is treated as plain-text data
that is reachable by the Perl code above this line. The most interesting part of each .t test
file is the stuff after this line, i.e., the data part.
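
To make the two-part split concrete, the two halves of a test file can be pulled apart with
standard text tools. The helper names prologue_part and data_part below are hypothetical, and
the sketch assumes that the __DATA__ marker sits alone on its own line, as in the examples of
this chapter.

```shell
# Hypothetical helpers for inspecting a .t file from the shell.

# Print the Perl prologue: every line before the __DATA__ marker.
prologue_part() {
    awk '/^__DATA__$/ { exit } { print }' "$1"
}

# Print the data part: every line after the __DATA__ marker.
data_part() {
    awk 'seen { print } /^__DATA__$/ { seen = 1 }' "$1"
}
```

For example, running data_part t/foo.t prints only the test blocks of that file, which is
convenient when piping them into other text-processing commands.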

Note: The special __DATA__ notation is a powerful feature of the Perl programming
language that allows embedding arbitrary free-text data in any Perl script file,
where it can be manipulated by the containing Perl script itself.
Test::Nginx takes advantage of this feature to allow data-driven test case
specifications in a simple format or language that is easily understandable by
everyone, even those without any prior experience in Perl programming.

The Prologue Part


The first part, i.e., the "prologue" above the __DATA__ line is usually just a few lines of Perl
code. You do not have to know Perl programming to write them down because they are so
simple and seldom or never change. The simplest Perl code prologue is as follows:

use Test::Nginx::Socket 'no_plan';


run_tests();

The first line just loads the Perl module (or class) Test::Nginx::Socket and passes the
option 'no_plan' to it to disable test plans (we will talk more about test plans in later
chapters, so there is no need to worry about them here). Test::Nginx::Socket is one of the
most popular classes in the Test::Nginx test framework. The second line just calls the
run_tests Perl function, imported automatically from the Test::Nginx::Socket module, to
run all the test cases defined in the data part of the test file (i.e., the things coming after the
__DATA__ line).


There are, however, more complicated prologue parts in many real-world test suites. Such
prologues usually define some special environment variables or Perl variables that can be
shared and referenced in the test cases defined in the "data part", or just call some other
Perl functions imported by the Test::Nginx::Socket module to customize the testing
configurations and behaviors for the current test file. We will return to such fancier prologues
in later sections. They can be very helpful in some cases.

Note: Perl allows function calls to omit the parentheses if the context is unambiguous.
So we may see Perl function calls without parentheses in the prologue part of real-world
test files, like run_tests; . We may use such forms in examples presented
in later sections because they are more compact.

The Data Part


The data part is the most important part of any test file powered by Test::Nginx. This is
where test cases reside. It uses a simple specification format to express test cases so that
the user does not need to use Perl or any other general-purpose language to present the tests
themselves. This special specification format is an instance of Domain-Specific Languages
(DSL) where the "domain" is defined as testing code running upon or inside NGINX. Use of
a DSL to present test cases opens the door to presenting the test cases as data instead of
code. This is also why Test::Nginx is a data-driven testing framework.

The test case specification in the data part is composed of a series of test blocks. Each test
block usually corresponds to a single test case, which has a title, an optional description,
and a series of data sections. The structure of a test block is described by the following
template.

=== title
optional description
goes here...
--- section1
value1 goes
here
--- section2
value2 is
here
--- section3
value3

Block Titles
As we can see, each test block starts with a title line prefixed by three equals signs ( === ). It is
important to avoid any leading spaces at the beginning of the line. The title is mandatory and
is important to describe the intention of the current test case in the most concise form, and
also to identify the test block in the test report when test failures happen. By convention we
put a TEST N: prefix in this title, for instance, TEST 3: test the simplest form . Don't worry
about maintaining the test ordinal numbers in these titles yourself; we will introduce a
command-line utility called reindex in a later section that can automatically update the
ordinal numbers in the block titles for you.
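
Because every title line starts with === at the very beginning of the line, block titles are
easy to pick out with ordinary text tools. The following sketch prints the titles found in a
test file; the name list_block_titles is hypothetical and the helper is unrelated to the
reindex utility mentioned above.

```shell
# Hypothetical helper: print the title of every test block in a .t
# file, one per line, by matching lines that start with "=== ".
list_block_titles() {
    sed -n 's/^=== //p' "$1"
}
```

Running list_block_titles t/foo.t gives a quick overview of a large test file. Lines with
leading spaces before === are deliberately not matched, in line with the rule stated above.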

Block Descriptions
Each test block can carry an optional description right after the block title line. This
description can span multiple lines if needed. It is a more detailed description of the intention
of the test block than the block title and may also give some background information about
the current test. Many test cases just omit this part for convenience.

Data Sections
Every test block carries one or more data sections right after the block description (if any).
Data sections always have a name and a value, which specify any input data fields and the
expected output data fields.

The name of a data section is the word after the line prefix --- . Spaces are allowed though
not syntactically required after --- . We usually use a single space between the prefix and
the section name for aesthetic considerations and we hope that you follow this convention
as well. The section names usually contain just alphanumeric letters and underscore
characters.

Section values are specified in two forms. One is all the lines after the section name line,
before the next section or the next block. The other form is more concise and specifies the
value directly on the same line as the section name, but right after the first colon character
( : ). The latter form requires that the value contains no line-breaks. Any spaces around the
colon are always discarded and never count as a part of the section value; furthermore, the
trailing line-break character in the one-line form does not count either.

If no visible values come after the section name in either form, then the section takes an
empty string value, which is still a defined value, however. On the other hand, omitting the
section name (and value) altogether makes that section undefined.

Test::Nginx offers various pre-defined data section names that can be used in the test
blocks for different purposes. Some data sections are for specifying input data, some are for
expected output, and some for controlling whether the current test block should be run at all.

It is best to explain data sections in a concrete test block example.


=== TEST 1: hello, world
This is just a simple demonstration of the
echo directive provided by ngx_http_echo_module.
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- error_code: 200

Here we have two input data sections, config and request, for specifying a custom
NGINX configuration snippet in the default server {} and the HTTP request sent by the
test scaffold to the test NGINX server, respectively. In addition, we have one output data
section, response_body, for specifying the expected response body output by the test
NGINX server. If the actual response body data is different from what we specify under the
response_body section, this test case fails. We have another output data section,
error_code, which specifies its value on the same line as the section name. We see that a
colon character is used to separate the section name and value. Obviously, the
error_code section specifies the expected HTTP response status code, which is 200.

Empty lines around data sections are always discarded by Test::Nginx::Socket . Thus the
test block above can be rewritten as below without changing its meaning.

=== TEST 1: hello, world

This is just a simple demonstration of the
echo directive provided by ngx_http_echo_module.

--- config
location = /t {
    echo "hello, world!";
}

--- request
GET /t

--- response_body
hello, world!

--- error_code: 200

Some users prefer this style for aesthetic reasons. You are free to choose whatever form you
like.


There are also some special data sections that specify neither input nor output. They are just
used to control how test blocks are run. For example, the ONLY section makes only the
current test block in the current test file run and all the other test blocks are skipped. This is
extremely useful for running an individual test block in any given file, which is a common
requirement while debugging a particular test failure. Also, the special SKIP section can
skip running the containing test block unconditionally, handy for preparing test cases for
future features without introducing any expected test failures. We will visit more such "control
sections" in later sections.

We shall see, in a later section, that the user can define her own data sections or extend
existing ones by writing a little bit of custom Perl code to satisfy her more complicated
testing requirements.

Section Filters
Data sections can take one or more filters. Filters are handy when you want to adjust or
convert the section values in certain ways.

Syntactically, filters are specified right after the section name with at least one space
character as the separator. Multiple filters are also separated by spaces and are applied in
the order they are written.

Test::Nginx::Socket provides many filters for your convenience. Consider the following
data section from the aforementioned test block.

--- error_code: 200

If we want to place the section value, 200, in a separate line, like below,

--- error_code
200

then the section value would contain a trailing new line, which leads to a test failure. This is
because the one-line form always excludes the trailing new-line character while the multi-line
form always includes one. To explicitly exclude the trailing new-line in the multi-line form, we
can employ the chomp filter, as in

--- error_code chomp
200

Now it has exactly the same semantics as the previous one-line form.
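
The difference between the two forms is a single byte, which is easy to see from the shell.
The snippet below is only an illustration of the trailing line break itself; it does not
involve Test::Nginx, and the helper name byte_count is made up for this sketch.

```shell
# Count the bytes arriving on standard input.
byte_count() {
    wc -c | tr -d '[:space:]'
}

# Multi-line form: the value is "200" plus its trailing line break.
multi_line_len=$(printf '200\n' | byte_count)

# One-line form, or multi-line form with chomp: no trailing line break.
one_line_len=$(printf '200' | byte_count)

echo "multi-line: $multi_line_len bytes, one-line: $one_line_len bytes"
# prints "multi-line: 4 bytes, one-line: 3 bytes"
```

That extra byte is exactly what the chomp filter removes from the section value.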


Some filters have a more dramatic effect on the section values. For instance, the eval filter
evaluates the section value as arbitrary Perl code, and the Perl value resulting from the
execution will be used as the final section value. The following section demonstrates using
the eval filter to produce 4096 a's:

--- response_body eval
"a" x 4096

The original value of the response_body section above is a Perl expression in which x
is the Perl repetition operator: it constructs a string that repeats the string on its
left-hand side N times, where N is given by the right-hand side. The resulting 4096-byte
Perl string, produced by evaluating this expression as dictated by the eval filter, is used as the
final section value for comparison with the actual response body data. Using
the eval filter with a Perl expression here is clearly much more readable and manageable than
pasting that 4096-byte string directly into the test block.
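
If you are curious about what such an expression yields, you can evaluate it with perl
directly from the command line before putting it into a test block. The wrapper name
perl_repeat below is hypothetical; the only Perl features it assumes are the -e switch,
the @ARGV array, and the x operator described above.

```shell
# Hypothetical helper: repeat a string N times with Perl's x operator
# and print the result without a trailing line break.
perl_repeat() {
    perl -e 'print $ARGV[0] x $ARGV[1]' "$1" "$2"
}

# Usage: count the bytes produced by the equivalent of "a" x 4096.
#   perl_repeat a 4096 | wc -c
```

This is merely a way to experiment with the expression; inside a test file the eval filter
does the evaluation for you.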

As with data sections, the user can also define her own filters, as we shall see in a later
section.

A Complete Example
We can conclude this section with the complete test file example given below, with both the
prologue part and the data part.

use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: hello, world
This is just a simple demonstration of the
echo directive provided by ngx_http_echo_module.
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- error_code: 200

We will see how to actually run such test files in the next section.


Note: The test file layout described in this section is exactly the same as the test files
based on other test frameworks derived from Test::Base, the superclass of
Test::Nginx::Socket, except those specialized test sections and specialized
Perl functions defined only in Test::Nginx::Socket. All the Test::Base
derivatives share the same basic layout and syntax. They proudly inherit the
same veins of blood.


Running Tests
Like most Perl-based testing frameworks, Test::Nginx relies on Perl's prove command-line
utility to run the test files. The prove utility is usually shipped with the standard perl
distribution so we should already have it when we have perl installed.

Test::Nginx always invokes a real NGINX server and a real socket client to run the tests. It
automatically uses the nginx program found in the system environment PATH. It is your
responsibility to specify the right nginx in your PATH environment for the test suite. Usually
we just specify the path of the nginx program inside the OpenResty installation tree. For
example,

export PATH=/usr/local/openresty/nginx/sbin:$PATH

Here we assume that OpenResty is installed to the default prefix, i.e.,
/usr/local/openresty/ .

You can always use the which command to verify if the PATH environment is indeed set
properly:

$ which nginx
/usr/local/openresty/nginx/sbin/nginx

For convenience, we usually wrap such environment settings in a custom shell script so that
we neither risk polluting the system-wide or account-wide environment settings nor take on
the burden of setting the environment manually for every shell session. For
example, I usually have a local bash script named go in each project I work on. A typical
go script might look like this:

#!/usr/bin/env bash

export PATH=/usr/local/openresty/nginx/sbin:$PATH

exec prove "$@"

Then we can use this ./go script to substitute the prove utility in any of the subsequent
commands involving prove .

Because Test::Nginx makes heavy use of environment variables for the callers to fine tune
the testing behaviors (as we shall see in later sections), such shell wrapper scripts also
make it easy to manage all these environment variable settings and hard to get things
wrong.

Note: Please do not confuse the name of this bash script with Google's Go
programming language. It has nothing to do with the Go language in any way.
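
A wrapper of this kind can also check that an nginx executable is actually reachable before
handing control over to prove. The following is a sketch under stated assumptions rather than
a recommended implementation: the function names find_nginx and run_prove and the
OPENRESTY_PREFIX variable are invented here, and the default prefix is the one assumed
earlier in this section.

```shell
# Print the nginx binary that would be used once the OpenResty sbin
# directory is put in front of PATH; fail if none can be found.
# The function body runs in a subshell, so PATH is not changed for
# the caller.
find_nginx() (
    prefix=${1:-/usr/local/openresty}
    PATH="$prefix/nginx/sbin:$PATH"
    command -v nginx
)

# Refuse to start prove when no nginx can be found at all.
# OPENRESTY_PREFIX is a hypothetical variable used only by this sketch.
run_prove() (
    prefix=${OPENRESTY_PREFIX:-/usr/local/openresty}
    if ! find_nginx "$prefix" >/dev/null; then
        echo "nginx not found under $prefix or in PATH" >&2
        exit 1
    fi
    PATH="$prefix/nginx/sbin:$PATH"
    export PATH
    exec prove "$@"
)

# In a real wrapper script, the last line would be: run_prove "$@"
```

Failing early with a clear message is usually friendlier than letting every test block fail
because the server could not be started.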

Running A Single File


If you want to run a single test file, say, t/foo.t , then all you need to do is just to type the
following command in your terminal.

prove t/foo.t

Here inside t/foo.t we employ the simple test file example presented in the previous
section. We repeat the content below for the reader’s convenience.

t/foo.t

use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: hello, world
This is just a simple demonstration of the
echo directive provided by ngx_http_echo_module.
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- error_code: 200

It is worth mentioning that we could run the following command instead if we have a custom
wrapper script called ./go for prove (as mentioned earlier in this section):

./go t/foo.t

When everything goes well, it generates an output like this:


t/foo.t .. ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.02 usr 0.01 sys + 0.08 cusr 0.03 csys = 0.14 CPU)
Result: PASS

This is a very concise summary. The first two lines tell you that all tests passed, while the
third line gives you a summary of the number of test files (1 in this case), the number of
tests (2 in this case), and the wallclock and CPU times used to run all the tests.

It is interesting to see that we have only one test block in the sample test file but in the test
summary output by prove we see that the number of tests is 2. Why the difference? We
can easily find out by asking prove to generate a detailed test report for all the individual
tests. This is achieved by passing the -v option (meaning "verbose") to the prove
command we used earlier:

prove -v t/foo.t

Now the output shows all the individual tests performed in that test file:

t/foo.t ..
ok 1 - TEST 1: hello, world - status code ok
ok 2 - TEST 1: hello, world - response_body - response is expected (req 0)
1..2
ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.01 usr 0.01 sys + 0.07 cusr 0.03 csys = 0.12 CPU)
Result: PASS

Obviously, the first test is doing the status code check, which is dictated by the error_code
data section in the test block, and the second test is doing the response body check,
required by the response_body section. Now the mystery is solved.
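
The individual result lines in the verbose report follow a regular shape ("ok N - ..." for a
pass and "not ok N - ..." for a failure), so they can be counted mechanically. The helper
below is a hypothetical sketch that reads such a report from standard input; it is not part
of prove or Test::Nginx.

```shell
# Hypothetical helper: count the individual test results in the
# verbose output of "prove -v", read from standard input. The bare
# "ok" line that closes a file's report is not counted because it
# carries no test number.
count_tap_results() {
    grep -cE '^(not )?ok [0-9]+'
}

# Usage: prove -v t/foo.t | count_tap_results
```

For the report shown above, this yields 2, matching the Tests=2 figure in the summary line.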

It is worth mentioning that the --- error_code: 200 section is automatically assumed when
no error_code section is explicitly provided in the test block. So our test block above can be
simplified by removing the --- error_code: 200 line without affecting the number of tests.
This is because checking for the 200 response status code is so common that Test::Nginx
makes it the default. If you expect a different status code, like 500, then just add an explicit
error_code section.

From this example, we can see that one test block can contain multiple tests and the
number of tests for any given test block can be determined or predicted by looking at the
data sections performing output checks. This is important when we provide a "test plan"
ourselves to the test file, where a "test plan" is the exact number of tests we expect the
current test file to run. If a different number of tests than the plan were actually run, then the
test result would be considered a failure even when all the tests that ran passed successfully.
Thus, a test plan adds a strong constraint on the total number of tests expected to be run.
For our t/foo.t file here, however, we intentionally avoid providing any test plans by
passing the 'no_plan' argument to the use statement that loads the Test::Nginx::Socket
module. We will revisit the "test plan" feature and explain how to provide one in a later
section.

Running Multiple Files


Running multiple test files is straightforward; just specify the file names on the prove
command line, as in

prove -v t/foo.t t/bar.t t/baz.t

If you want to run all the test files directly under the t/ directory, then using a shell wildcard
can be handy:

prove -v t/*.t

In case you have sub-directories under t/, you can specify the -r option to ask
prove to recursively traverse the whole directory tree rooted at t/ to find test files:

prove -r t/

This command is also the standard way to run the whole test suite of a project.
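
Before launching a long recursive run, it can be useful to preview which files sit under t/.
The sketch below lists every file with the .t extension, which is the extension prove looks
for by default when recursing; the name list_test_files is hypothetical, and the sorted
output is a convenience of this helper rather than a statement about the order prove uses.

```shell
# Hypothetical helper: list the .t files found under the given
# directory (default: t), sorted by byte order.
list_test_files() {
    find "${1:-t}" -type f -name '*.t' | LC_ALL=C sort
}
```

Files without the .t extension, such as helper scripts or README files kept under t/, do not
show up in the listing.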

Running Individual Test Blocks


Test::Nginx makes it easy to run an individual test block in a given file. Just add the special
data section ONLY to that test block you want to run individually and prove will skip all the
other test blocks while running that test file. For example,


=== TEST 1: hello, world
This is just a simple demonstration of the
echo directive provided by ngx_http_echo_module.
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- ONLY

Now prove won’t run any other test blocks (if any) in the same test file.

This is very handy while debugging a particular test block. You can focus on one test case at
a time without worrying about other unrelated test cases stepping in your way.

When using the Vim editor, we can quickly insert a --- ONLY line to the test block we are
viewing in the vim file buffer, and then type :!prove % in the command mode of vim without
leaving the editor window. This works because vim automatically expands the special %
placeholder with the path of the current active file being edited. This workflow is great since
you never leave your editor window and you never have to type the title (or other IDs) of
your test block nor the path of the containing test file. You can quickly jump between test
blocks even across different files. Test-driven development usually demands very frequent
interactions and iterations, and Test::Nginx is particularly optimized to speed up this
process.

Sometimes you may forget to remove the --- ONLY line from some test files even after
debugging. This will incorrectly skip all the other tests in those files. To catch such mistakes,
Test::Nginx always reports a warning for files using the ONLY special section, as in

$ prove t/foo.t
t/foo.t .. # I found ONLY: maybe you're debugging?
t/foo.t .. ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.01 usr 0.00 sys + 0.09 cusr 0.03 csys = 0.13 CPU)
Result: PASS

This way it is much easier to identify any leftover --- ONLY lines.
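
Leftover ONLY lines can also be caught without running any tests at all, for example in a
pre-commit check. The following sketch searches the test files for a section line named
ONLY; the helper name find_only_markers is hypothetical, and the pattern assumes the section
line carries nothing but the section name.

```shell
# Hypothetical helper: print the names of .t files under a directory
# (default: t) that still contain an ONLY section line. Spaces after
# the --- prefix are optional, so the pattern allows zero or more.
find_only_markers() {
    find "${1:-t}" -type f -name '*.t' \
        -exec grep -lE '^---[[:space:]]*ONLY[[:space:]]*$' {} +
}
```

An empty output means that no test file under the directory restricts itself to a single
block.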

Similar to ONLY , Test::Nginx also provides the LAST data section to make the containing
test block become the last test block being run in that test file.


Note: The special data sections ONLY and LAST are actually features inherited from
the Test::Base module.

Skipping Tests
We can specify the special SKIP data section to skip running the containing test block
unconditionally. This is handy when we write a test case that is for a future feature or a test
case for a known bug that we haven’t had the time to fix right now. For example,

=== TEST 1: test for the future
--- config
location /t {
    some_fancy_directive;
}
--- request
GET /t
--- response_body
blah blah blah
--- SKIP

It is also possible to skip a whole test file in the prologue part. Just replace the use
statement with the following form.

use Test::Nginx::Socket skip_all => "some reasons";

Then running the test file gives output like the following.

t/foo.t .. skipped: some reasons

Note: It is also possible to conditionally skip a whole test file but it requires a little bit of
Perl programming. Interested readers can try using a BEGIN {} block before the
use statement to calculate the value of the skip_all option on the fly.

Test Running Order

Test File Running Order


Test files are usually run in the alphabetical order of their file names. Some people prefer
explicitly controlling the running order of their test files by prefixing the test file names with
number sequences like 001- , 002- , etc.


The test suite of the ngx_http_lua module follows this practice, for example, which has test
file names like below

t/000-sanity.t
t/001-set.t
t/002-content.t
t/003-errors.t
...
t/139-ssl-cert-by.t

Although the prove utility supports running test files in multiple parallel jobs via the -jN
option, Test::Nginx does not really support this mode since all the test cases share exactly
the same test server directory, t/servroot/ , and the same listening ports, as we have
already seen, while parallel running requires strictly isolated running environments for each
individual thread of execution. One can still manually split the test files into different groups
and run each group on a different (virtual) machine or an isolated environment like a Linux
container.
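
Splitting the test files into groups by hand can be scripted. The sketch below assigns file
names to groups in a round-robin fashion; the name split_into_groups and the output format
are assumptions made for this illustration, and file names are assumed to contain no line
breaks.

```shell
# Hypothetical helper: read test file names from standard input and
# assign them round-robin to N groups (default: 2). Each output line
# has the form "<group number> <file name>".
split_into_groups() {
    awk -v n="${1:-2}" '{ print (NR - 1) % n + 1, $0 }'
}

# Usage on the machine responsible for group 1 out of 4:
#   ls t/*.t | split_into_groups 4 | awk '$1 == 1 { print $2 }' | xargs prove
```

Each machine or container then runs only its own group, so the test server directory and the
listening ports are never shared between concurrent runs.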

Test Block Running Order


By default, the Test::Nginx scaffold shuffles the test blocks in each file and runs them in a
random order. This behavior encourages writing self-contained and independent test cases
and also increases the chance of hitting a bug by actively mutating the relative running order
of the test cases. This may, indeed, confuse newcomers coming from a more traditional
testing platform.

We can always disable this test block shuffling behavior by calling the Perl function
no_shuffle(), imported by the Test::Nginx::Socket module, before the run_tests() call
in the test file prologue. For example,

use Test::Nginx::Socket 'no_plan';

no_shuffle();
run_tests();

__DATA__
...

With the no_shuffle() call in place, the test blocks are run in the exact same order as their
appearance in the test file.

Preparing Tests
As we have seen in the previous sections, Test::Nginx provides a simple declarative
format to express test cases. Each test case is represented by a test block. A test block
consists of a title, an optional description, and several data sections for specifying inputs and
expected outputs. In this section, we will have a close look at how to prepare such test
cases for different test requirements.

Designing test cases is an art, in many ways. It may, sometimes, take even more time and
effort than implementing the feature to be tested, according to our own experience.
Test::Nginx tries hard to make writing tests as simple as possible but it still cannot
automate the whole test case design process. Only you know exactly what to test and how it
can be tested anyway. This section will focus on the basic primitives provided by
Test::Nginx that you can take advantage of to devise clever and effective test cases.

Preparing NGINX Configuration


In a test block, we can use different data sections to specify our custom snippets in different
positions of the final nginx.conf configuration file generated by Test::Nginx .

The most common one is the config section, which is used to insert custom snippets inside
the server {} configuration block for the default test server. We can also use the
http_config section to insert our custom content into the http {} configuration block of
nginx.conf. The main_config section can be used to insert content into the top-level scope
of the NGINX configuration. Let’s consider the following example.

=== TEST 1:
--- main_config
env MY_ENVIRONMENT;

--- http_config
init_worker_by_lua_block {
print("init")
}

--- config
location = /t {
echo ok;
}

--- request
GET /t
--- response_body
ok

This test block will generate an nginx.conf file with the following basic structure:

...
env MY_ENVIRONMENT;

http {
    ...

    init_worker_by_lua_block {
        print("init")
    }

    server {
        ...

        location = /t {
            echo ok;
        }
    }
}

Please pay attention to how the main_config , http_config , and config data sections'
values are mapped into different locations in the NGINX configuration file.

When in doubt, we can always check out the actual nginx.conf file generated by the test
scaffold at the location t/servroot/conf/nginx.conf in the current working directory (usually
just being the root directory of the current project).

Test::Nginx generates a new nginx.conf file for each test block, which makes it possible
for each test block to become self-contained. By default, the test scaffold automatically starts
a new NGINX server before running each test block and shuts down the server immediately
after running the block. Fortunately, NGINX is a lightweight server and it is usually very fast
to start and stop. Thus, the test blocks are not as slow to run as they might look.

Preparing Requests
The simplest way to prepare a request is to use the request data section, as in

--- request
GET /t?a=1&b=2

The HTTP/1.1 protocol is used by default. You can explicitly make it use the HTTP/1.0
protocol if desired:

--- request
GET /t?a=1&b=2 HTTP/1.0

Leading spaces or empty lines in the value of the request section are automatically
discarded. You can even add comments by leading them with a # character, as in

--- request

# this is a simple test:


GET /t

You can add some additional request headers at the same time through the more_headers
section as below.

--- request
GET /t
--- more_headers
Foo: bar
Bar: baz
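
To confirm that such extra headers really reach the server, a complete test file could echo them back. This is a sketch that assumes the ngx_echo module is available (as in the other examples); the test title and the output format are made up for illustration.

```perl
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: extra request headers reach the server
--- config
    location = /t {
        # $http_foo and $http_bar are the standard NGINX variables
        # holding the values of the Foo and Bar request headers.
        echo "foo: $http_foo";
        echo "bar: $http_bar";
    }
--- request
GET /t
--- more_headers
Foo: bar
Bar: baz
--- response_body
foo: bar
bar: baz
```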

Pipelined Requests
Preparing pipelined HTTP requests is also possible. But you need to use the
pipelined_requests section instead of request. For instance,

=== TEST 1: pipelined requests
--- config
    location = /t {
        echo ok;
    }

--- pipelined_requests eval
["GET /t", "GET /t"]

--- response_body eval
["ok\n", "ok\n"]

It is worth noting that we use the eval filter with the pipelined_requests section to treat
the literal value of that section as Perl code. This way we can construct a Perl array of the
request strings, which is the expected data format for the pipelined_requests section.
We need a similar trick for the response_body section when checking outputs. With
an array of expected response body data, we can expect and check a different value for
each individual request in the pipeline. Note, however, that not every data section supports
the same array-typed value semantics as response_body.

Checking Responses
We have already visited the response_body and error_code data sections for checking the
response body data and response status code, respectively.

The response_body data section always performs an exact whole-string comparison
between the section value and the actual response body. It tries to be clever when a long
string value comparison fails. Consider the following sample output from prove.

t/foo.t .. 1/?
# Failed test 'TEST 1: long string test - response_body -
response is expected (req 0)'
# at .../test-nginx/lib/Test/Nginx/Socket.pm line 1282.
# got: ..."IT 2.x is enabled.\x{0a}\x{0a}"...
# length: 409
# expected: ..."IT 2.x is not enabled.\x{0a}"...
# length: 412
# strings begin to differ at char 400 (line 1 column 400)
# Looks like you failed 1 test of 2.
/tmp/foo.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests

Test Summary Report
-------------------
/tmp/foo.t (Wstat: 256 Tests: 2 Failed: 1)
Failed test: 2
Non-zero exit status: 1
Files=1, Tests=2, 0 wallclock secs (0.01 usr 0.00 sys + 0.09
cusr 0.03 csys = 0.13 CPU)
Result: FAIL

From this test report, we can clearly see that

1. it is the test block with the title TEST 1: long string test that is failing,

2. it is the response_body data section check that fails,

3. the actual response body data is 409 bytes long while the expected value is 412 bytes,
and

4. the expected value has an additional word, not, in the string fragment IT 2.x is
enabled, and the difference starts at offset 400 in the long string.

Behind the scenes, Test::Nginx uses the Perl module Test::LongString to do the long string
comparisons. It is also particularly useful when checking response body data in binary
formats.

If your response body data is in a multi-line textual format, then you may also want to use a
diff-style output when the data does not match. To achieve this, we can call the
no_long_string() Perl function before the run_tests() function call in the prologue part of
the test file. Below is such an example.

use Test::Nginx::Socket 'no_plan';

no_long_string();

run_tests();

__DATA__

=== TEST 1:
--- config
location = /t {
echo "Life is short.";
echo "Moon is bright.";
echo "Sun is shining.";
}
--- request
GET /t
--- response_body
Life is short.
Moon is deem.
Sun is shining.

Note the no_long_string() call in the prologue part. It is important to place it before the
run_tests() call otherwise it would be too late for it to take effect, obviously.

Invoking the prove utility (or any shell wrappers for it) to run this test file gives the following
details about the test failure:

# Failed test 'TEST 1: - response_body - response is expected
(req 0)'
# at .../test-nginx/lib/Test/Nginx/Socket.pm line 1277.
# @@ -1,3 +1,3 @@
# Life is short.
# -Moon is deem.
# +Moon is bright.
# Sun is shining.
# Looks like you failed 1 test of 2.

It is obvious that the second line of the response body output is different.

You can even further disable the diff -style comparison mode by adding a no_diff() Perl
function call in the prologue part. Then the failure report will look like this:

# Failed test 'TEST 1: - response_body - response is expected
(req 0)'
# at .../test-nginx/lib/Test/Nginx/Socket.pm line 1277.
# got: 'Life is short.
# Moon is bright.
# Sun is shining.
# '
# expected: 'Life is short.
# Moon is deem.
# Sun is shining.
# '
# Looks like you failed 1 test of 2.

That is, Test::Nginx just gives a full listing of the actual response body data and the expected
one without any abbreviations or hand-holding.
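
For reference, a test file producing this kind of plain report could look like the sketch below. It simply combines the no_long_string() and no_diff() calls described above with the same deliberately failing test block, so the file is expected to fail when run.

```perl
use Test::Nginx::Socket 'no_plan';

no_long_string();  # turn off the Test::LongString style report
no_diff();         # additionally turn off the diff-style report

run_tests();

__DATA__

=== TEST 1:
--- config
    location = /t {
        echo "Life is short.";
        echo "Moon is bright.";
        echo "Sun is shining.";
    }
--- request
GET /t
--- response_body
Life is short.
Moon is deem.
Sun is shining.
```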

Pattern Matching on Response Bodies


When the response body may change in some ways or you just care about certain key words
in a long data string, you can specify a Perl regular expression to do a pattern match against
the actual response body data. This is achieved by the response_body_like data section. For
example,

--- response_body_like: age: \d+

Be careful when you are using the multi-line data section value form. A trailing newline
character appended to your section value may make your pattern never match. In this case
the chomp filter we introduced in an earlier section can be very helpful. For example,

--- response_body_like chomp
age: \d+

You can also use the eval filter to construct a Perl regular expression object with a Perl
expression, as in

--- response_body_like eval
qr/age: \d+/

This is the most flexible form to specify a pattern.

Note: Perl uses the qr quoting structure to explicitly construct regular expression
objects. You can use various different quoting forms like qr/…/, qr!…!, qr#…#, and qr{…}.
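
One benefit of the eval form is that regex modifiers become available. The sketch below is an illustration only: the location, its output, and the choice of the i and m modifiers are assumptions, not something prescribed by Test::Nginx.

```perl
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: regex object with modifiers
--- config
    location = /t {
        echo "name: Tom";
        echo "Age: 32";
    }
--- request
GET /t
--- response_body_like eval
qr{^age: \d+$}im
```

Here the i modifier makes the match case-insensitive, and the m modifier lets ^ and $ match at line boundaries inside the multi-line response body, so the second output line alone satisfies the pattern.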

Checking Response Headers


The response_headers data section can be used to validate response header entries. For
example,

--- response_headers
Foo: bar
Bar: baz
!Blah

This section actually specifies three checks:

1. The response header Foo must appear and must take the value bar;

2. The response header Bar must appear and must take the value baz; and

3. The response header Blah must either not appear or take an empty value.
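
A complete test file exercising these three checks might look as follows. This is a sketch: it assumes the standard NGINX add_header directive is used to produce the headers, and the test title is made up.

```perl
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: checking response headers
--- config
    location = /t {
        # add_header is the standard NGINX directive for adding
        # response header fields.
        add_header Foo bar;
        add_header Bar baz;
        echo ok;
    }
--- request
GET /t
--- response_headers
Foo: bar
Bar: baz
!Blah
--- response_body
ok
```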

Checking NGINX Error Logs


In addition to responses, the NGINX error log file is also an important output channel for an
NGINX server setup.

True-False Tests
One immediate testing requirement is to check whether or not a piece of text appears in any
error log messages. Such checks can be done via the data sections error_log and
no_error_log, respectively. The former ensures that some lines in the error log file contain
the string specified as the section value while the latter tests the opposite: ensuring that no
line contains the pattern.

For example,

--- error_log
Hello world from my server

Then the string Hello world from my server (without the trailing new-line) must appear in at
least one line of the NGINX error log. You can specify multiple strings in separate lines of the
section value to perform different checks, for instance,

--- error_log
This is a dog!
Is it a cat?

Then it performs two error log checks: one ensures that the string This is a dog!
appears in some error log line, and the other does the same for the string Is it a cat?.
The order of these two string patterns does not matter at all.

If one of the string patterns fails to match any line in the error log file, then we get a
test failure report from prove like below.

# Failed test 'TEST 1: simple test - pattern "This is a dog!"
matches a line in error.log (req 0)'

If you want to specify a Perl regular expression (regex) as one of the patterns, then you
should use the eval section filter to construct a Perl-array as the section value, as in

--- error_log eval
[
    "This is a dog!",
    qr/\w+ is a cat\?/,
]

As we have seen earlier, Perl regexes can be constructed via the qr/…​/ quoting syntax.
Perl string patterns in the Perl array specified by double quotes or single quotes are still
treated as plain string patterns, as usual. If the array contains only one regex pattern, then
you can omit the array itself, as in

--- error_log eval
qr/\w+ is a cat\?/

Test::Nginx puts the error log file of the test NGINX server in the file path
t/servroot/logs/error.log. As test writers, we frequently check out this file directly when
things go wrong. For example, it is common to make mistakes or typos in the patterns we
specify for the error_log section. Also, scanning the raw log file can give us insight into
the details of the NGINX internal workings when the NGINX debugging logs are enabled in
the NGINX build.

The no_error_log section is very similar to error_log but it checks the nonexistence of the
string patterns in the NGINX error log file. One of the most frequent uses of the
no_error_log section is to ensure that there are no error level messages in the log file.

--- no_error_log
[error]

If, however, there is a line in the NGINX error log file that contains the string [error], then
the test fails. Below is such an example.

# Failed test 'TEST 1: simple test - pattern "[error]" should
not match any line in error.log but matches line "2016/02/01
11:59:50 [error] 1788\#0: *1 lua entry thread aborted: runtime
error: content_by_lua(nginx.conf:42):2: bad"'

This is a great way to find the details of the error quickly by just looking at the test report.

Like error_log , this section also supports Perl array values and Perl regex values through
the eval filter.
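
As a sketch of the array form, the test file below forbids [error] lines as a plain string and the more severe log levels through one regex. The location and the warning message are made up for illustration.

```perl
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: no serious messages in the error log
--- config
    location = /t {
        content_by_lua_block {
            -- a warning is logged on purpose; it matches none of
            -- the forbidden patterns below
            ngx.log(ngx.WARN, "just a warning")
            ngx.say("ok")
        }
    }
--- request
GET /t
--- response_body
ok
--- no_error_log eval
[
    "[error]",
    qr/\[(?:crit|alert|emerg)\]/,
]
```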

Grep Tests
The error_log and no_error_log sections are very handy for quickly checking the
appearance of certain patterns in the NGINX error log file. But they have serious limitations
in that it is impossible to impose stronger constraints on the relative order of the messages
containing the patterns or on the number of their occurrences.

To address such limitations, Test::Nginx::Socket provides an alternative way to check
NGINX error logs in a way similar to the famous UNIX tool, grep. The sections
grep_error_log and grep_error_log_out are used for this purpose. The test writer uses the
grep_error_log section to specify a pattern, with which the test framework scans through
the NGINX error log file and collects all the matched parts of the log file lines along the way,
forming a final result. This aggregated log data result is then matched against the expected
value specified as the value of the grep_error_log_out section, in a similar way as with the
response_body section discussed above.

It is easiest to explain with a simple example.

=== TEST 1: simple grep test for error logs


--- config
location = /t {
content_by_lua_block {
print("it is matched!")
print("it is matched!")
print("it is matched!")
}
}
--- request
GET /t
--- grep_error_log: it is matched!
--- grep_error_log_out
it is matched!
it is matched!
it is matched!

Here we use the Lua function print() provided by the ngx_http_lua module to generate
NGINX error log messages at the notice level. This test case checks the number of log
messages containing the string it is matched!. It is important to note that only the
matched parts of the log file lines are collected in the final result instead of the whole log
lines. This simplifies the comparison a lot since NGINX error log messages can contain
varying details like timestamps and connection numbers.
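
The collecting step can be modeled in a few lines of plain Perl. This is a rough sketch of the idea only, not the actual code in Test::Nginx::Socket; the function name grep_log and the sample log lines are made up for illustration.

```perl
use strict;
use warnings;

# Collect only the matched part of every match, one per output line.
# A plain string pattern is matched literally; a qr// object is used as is.
sub grep_log {
    my ($pattern, @lines) = @_;
    my $re = ref($pattern) eq 'Regexp' ? $pattern : qr/\Q$pattern\E/;
    my $out = '';
    for my $line (@lines) {
        while ($line =~ /($re)/g) {
            $out .= "$1\n";
        }
    }
    return $out;
}

# Made-up log lines: timestamps and connection numbers differ, but
# they never show up in the collected result.
my @log = (
    '2016/02/01 11:59:50 [notice] 1788#0: *1 [lua] it is matched!, client: 127.0.0.1',
    '2016/02/01 11:59:51 [notice] 1788#0: *2 [lua] something else, client: 127.0.0.1',
    '2016/02/01 11:59:52 [notice] 1788#0: *3 [lua] it is matched!, client: 127.0.0.1',
);

print grep_log('it is matched!', @log);
```

The returned string plays the role of the value that is compared against grep_error_log_out, which is why both the number and the order of the matches matter.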

A more useful form of this test is to specify a Perl regex pattern in the grep_error_log
section. Consider the following example.

=== TEST 1: simple grep test for error logs


--- config
location = /t {
content_by_lua_block {
print("test: before sleeping...")
ngx.sleep(0.001) -- sleeping for 1ms
print("test: after sleeping...")
}
}
--- request
GET /t
--- grep_error_log eval: qr/test: .*?\.\.\./
--- grep_error_log_out
test: before sleeping...
test: after sleeping...

We specify a Perl regex pattern, test: .*?\.\.\. , here to pick out the parts of the error log
messages that start with test: and end with three literal dots (...). And naturally in this test
we also check the relative order of these two messages, that is, before sleeping must appear
before after sleeping. Otherwise, we shall see failure reports like below:

# Failed test 'TEST 1: simple grep test for error logs -
grep_error_log_out (req 0)'
# at ..../lib/Test/Nginx/Socket.pm line 1048.
# got: "test: after sleeping...\x{0a}test: before
sleeping...\x{0a}"
# length: 49
# expected: "test: before sleeping...\x{0a}test: after
sleeping...\x{0a}"
# length: 49
# strings begin to differ at char 7 (line 1 column 7)

As with the response_body section, we can also call the no_long_string() Perl function
before run_tests() in the test file prologue, so as to disable the long string output mode
and enable the diff mode. Then the test failure would look like this:

# Failed test 'TEST 1: simple grep test for error logs -
grep_error_log_out (req 0)'
# at .../lib/Test/Nginx/Socket.pm line 1044.
# @@ -1,2 +1,2 @@
# -test: before sleeping...
# test: after sleeping...
# +test: before sleeping...

Obviously, for this test case, the diff format looks better.

Extra Delay Before Log Checks


By default, Test::Nginx::Socket performs the NGINX error log checks not long after it
receives the complete HTTP response for the test request. Sometimes, when the log
messages are generated by the server after sending out the response, the error log checks
may be carried out so early that the messages are not yet written into the log file. In this
case, we can specify an extra delay via the wait data section for the test scaffold to wait for
the error log messages. Here is an example:

=== TEST 1: wait for the timer


--- config
location = /t {
content_by_lua_block {
local function f(premature)
print("HERE!")
end
assert(ngx.timer.at(0.1, f))
}
}
--- request
GET /t
--- error_log
HERE!
--- no_error_log
[error]
--- wait: 0.12

Here we create a timer via the ngx.timer.at Lua function, which expires after 0.1 seconds.
Due to the asynchronous nature of timers, the request handler does not wait for the timer to
expire and immediately finishes processing the current request and sends out a response
with an empty body. To check for the log message HERE! generated by the timer handler
f, we have to specify an extra delay for the test scaffold to wait. A delay of 0.12 seconds is
specified in this example but any value larger than 0.1 would suffice. Without the wait
section, this test case would fail with the following output:

# Failed test 'TEST 1: wait for the timer - pattern "HERE!"
matches a line in error.log (req 0)'

Obviously the test scaffold checks the error log too soon, even before the timer handler runs.

Section Review
Test::Nginx::Socket offers a rich set of data sections for specifying various different input
data and expected output data, ranging from NGINX configuration file snippets, test
requests, to expected responses and error log messages. We have already demonstrated
the power of data driven testing and declarative test case crafting. We want to achieve
multiple goals at the same time, that is, not only to make the tests self-contained and highly
readable, but also to make the test report easy to interpret and analyze when some of the
tests fail. Raw files automatically generated by the test scaffold, like
t/servroot/conf/nginx.conf and t/servroot/logs/error.log, should be checked frequently
when manually debugging the test cases. The next section extends the discussion of this
section with a focus on testing erroneous cases.

Testing Erroneous Cases


Most robust software invests heavily in error handling, and naturally test designers focus on
corner cases and erroneous scenarios to maximize code coverage of the tests.

The previous section introduces data sections provided by Test::Nginx::Socket for
examining messages in the NGINX error log file, which is a powerful tool to check for errors
in the tests. Sometimes we want to test more extreme cases like server startup failures,
malformed responses, bad requests, and various kinds of timeout errors.

Expected Server Startup Failures


Sometimes the NGINX server is expected to fail to start, for example, when an NGINX
configuration directive is used in the wrong way or some hard prerequisites are not met in
early initialization. If we want to test such cases, especially the error log messages generated
for such failures, we can use the must_die data section in our test block to signal the test
scaffold that the NGINX server is expected to die upon startup in this very block.

The following example tests the case of throwing a Lua exception in the context of
init_by_lua_block of the ngx_http_lua module.

=== TEST 1: dying in init_by_lua_block


--- http_config
init_by_lua_block {
error("I am dying!")
}
--- config
--- must_die
--- error_log
I am dying!

The Lua code in init_by_lua_block runs in the NGINX master process during the NGINX
configuration file loading process. Throwing out a Lua exception there aborts the NGINX
startup process immediately. The occurrence of the must_die section tells the test scaffold
to treat NGINX server startup failures as a test pass while a successful startup as a test
failure. The error_log section there ensures that the server fails in the expected way, that
is, due to the "I am dying!" exception.

If we remove the --- must_die line from the test block above, then the test file won’t even
run to completion:

t/a.t .. nginx: [error] init_by_lua error: init_by_lua:2: I am
dying!
stack traceback:
[C]: in function 'error'
init_by_lua:2: in main chunk
Bailout called. Further testing stopped: TEST 1: dying in
init_by_lua_block
- Cannot start nginx using command
"nginx -p .../t/servroot/ -c .../t/servroot/conf/nginx.conf >
/dev/null".

By default the test scaffold treats NGINX server startup failures as fatal errors in running the
tests. The must_die section, however, turns such a failure into a normal test checkup.
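
The other kind of startup failure mentioned earlier, a configuration directive used in the wrong way, can be tested in the same manner. The sketch below assumes that omitting the required argument of the standard keepalive_timeout directive makes NGINX reject the configuration and that the resulting message also lands in the error log file; the directive choice and the test title are picked for illustration only.

```perl
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: directive used without its required argument
--- config
    # keepalive_timeout needs at least one argument, so NGINX
    # should reject this configuration while loading it.
    keepalive_timeout;

    location = /t {
        echo ok;
    }
--- must_die
--- error_log
invalid number of arguments in "keepalive_timeout" directive
```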

Expected Malformed Responses


HTTP responses should always be well-formed, but unfortunately the real world is
complicated and there indeed exist cases where the responses can be malformed, like
being truncated due to some unexpected causes. As test designers, we always want to test
such strange abnormal cases, among other things.

Naturally, Test::Nginx::Socket treats malformed responses from the NGINX server as an
error since it always does sanity checks on the responses it receives from the test server by
default. But for test cases where we expect a malformed or truncated response sent from
the server, we should explicitly tell the test scaffold to disable the response sanity check via
the ignore_response data section.

Consider the following example that closes the downstream connection immediately after
sending out the first part of the response body.

=== TEST 1: aborting response body stream


--- config
location = /t {
content_by_lua_block {
ngx.print("hello")
ngx.flush(true)
ngx.exit(444)
}
}
--- request
GET /t
--- ignore_response
--- no_error_log
[error]

The ngx.flush(true) call in the content_by_lua_block handler is to ensure that any
response body data buffered by NGINX is indeed flushed out to the system socket send
buffers, which also usually means flushing the output data to the client side for local sockets.
Also, the ngx.exit(444) call is used to immediately close the current downstream
connection so it just interrupts the response body stream in the HTTP 1.1 chunked
encoding. The important part is the --- ignore_response line which tells the test scaffold not
to complain about the interrupted response data stream. If the test block above goes without
this line, we will see the following test failure while running prove:

# Failed test 'TEST 1: aborting response body stream - no last
chunk found - 5
# hello
# '

Obviously, the test scaffold complains about the lack of the "last chunk" used to indicate the
end of the chunked encoded data stream. Because the server aborts the connection in the
middle of response body data sending, there is no chance for the server to properly send
well-formed response bodies in the chunked encoding.
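
To see what is missing, it helps to look at the raw bytes of a chunked body. The following plain Perl sketch is a simplified model and not the actual check performed by Test::Nginx::Socket: the function name has_last_chunk is made up, and trailer fields are ignored.

```perl
use strict;
use warnings;

# Return 1 if a chunked-encoded body ends with the last chunk, that is,
# a zero-sized chunk followed by an empty line; return 0 otherwise.
# Trailer fields are not handled in this simplified sketch.
sub has_last_chunk {
    my ($raw) = @_;
    return $raw =~ /(?:\A|\r\n)0\r\n\r\n\z/ ? 1 : 0;
}

# A complete body: one 5-byte chunk, then the terminating last chunk.
my $complete = "5\r\nhello\r\n0\r\n\r\n";

# What the client sees when the connection is closed right after the
# first chunk, as in the test block above.
my $truncated = "5\r\nhello\r\n";

print "complete:  ", has_last_chunk($complete), "\n";
print "truncated: ", has_last_chunk($truncated), "\n";
```

The truncated form corresponds to the 5 and hello lines quoted in the failure report: the first chunk arrives, but the zero-sized chunk never does.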

Testing Timeout Errors


Timeout errors are one of the most common network issues in the real world. Timeouts may
happen for many reasons, like packet dropping on the wire or on the other end,
connectivity problems, and other expensive operations blocking the event loop. Most
applications want to ensure they have timeout protection that prevents them from waiting
for too long.

Testing and emulating timeout errors is often tricky in a self-contained unit test framework
since most of the network traffic initiated by the test cases is local only, that is, going
through the local "loopback" device that has perfect latency and throughput. We will examine
some of the tricks that can be used to reliably emulate various different kinds of timeout
errors in the test suite.

Connecting Timeouts
Connecting timeouts in the context of the TCP protocol are easiest to emulate. Just point the
connecting target to a remote address that always drops any incoming ( SYN ) packets via a
firewall rule or something similar. We provide such a "black-hole service" at the port 12345
of the agentzh.org host. You can make use of it if your test running environment allows
public network access. Consider the following test case.

=== TEST 1: connect timeout


--- config
resolver 8.8.8.8;
resolver_timeout 1s;

location = /t {
content_by_lua_block {
local sock = ngx.socket.tcp()
sock:settimeout(100) -- ms
local ok, err = sock:connect("agentzh.org", 12345)
if not ok then
ngx.log(ngx.ERR, "failed to connect: ", err)
return ngx.exit(500)
end
ngx.say("ok")
}
}
--- request
GET /t
--- response_body_like: 500 Internal Server Error
--- error_code: 500
--- error_log
failed to connect: timeout

We have to configure the resolver directive here because we need to resolve the domain
name agentzh.org at request time (in Lua). We check the NGINX error log via the
error_log section for the error string returned by the cosocket object’s connect() method.

It is important to use a relatively small timeout threshold in the test cases so that we do not
have to wait for too long to complete the test run. Tests are meant to be run very often. The
more frequently we run the tests, the more value we may gain from automating the tests.

It is worth mentioning that the test scaffold’s HTTP client does have a timeout threshold as
well, which is 3 seconds by default. If your test request takes more than 3 seconds, you get
an error message in the test report:

ERROR: client socket timed out - TEST 1: connect timeout

This message is what we would get if we commented out the settimeout call and relied on
the default 60 second timeout threshold in cosockets.

We could change this default timeout threshold used by the test scaffold client by setting a
value to the timeout data section, as in

--- timeout: 10

Now we have 10 seconds of timeout protection instead of 3.

Reading Timeouts
Emulating reading timeouts is also easy. Just try reading from a wire where the other end
never writes anything but still keeps the connection alive. Consider the following example:

=== TEST 1: read timeout


--- main_config
stream {
server {
listen 5678;
content_by_lua_block {
ngx.sleep(10) -- 10 sec
}
}
}
--- config
lua_socket_log_errors off;
location = /t {
content_by_lua_block {
local sock = ngx.socket.tcp()
sock:settimeout(100) -- ms
assert(sock:connect("127.0.0.1", 5678))
ngx.say("connected.")
local data, err = sock:receive() -- try to read a line
if not data then
ngx.say("failed to receive: ", err)
else
ngx.say("received: ", data)
end
}
}
--- request
GET /t
--- response_body
connected.
failed to receive: timeout
--- no_error_log
[error]

Here we use the main_config data section to define a TCP server of our own, listening at
the port 5678 on the local host. This is a mocked-up server that can establish new TCP
connections but never writes out anything and just sleeps for 10 seconds before closing the
session. Note that we are using the ngx_stream_lua module in the stream {} configuration
block. Our location = /t, which is the main target of this test case, connects to our mock
server and tries to read a line from the wire. Apparently the 100ms timeout threshold on the
client side is reached first and we can successfully exercise the error handling code for the
reading timeout error.

Sending Timeouts
Triggering sending timeouts is much harder than connecting and reading timeouts. This is
due to the asynchronous nature of writing.

For performance reasons, there exist at least two layers of buffers for writes:

1. the userland send buffers inside the NGINX core, and

2. the socket send buffers in the operating system kernel’s TCP/IP stack implementation

To make the situation even worse, there also exists at least one system-level receive buffer
layer on the other end of the connection.

To make a send timeout error happen, the most naive way is to fill up all these buffers along
the data sending chain while ensuring that the other end never actually reads anything on
the application level. Thus, buffering makes a sending timeout particularly hard to reproduce
and emulate in a typical testing and development environment with a small amount of (test)
payload.

Fortunately there is a userland trick that can intercept the libc wrappers for the actual system
calls for socket I/O and do funny things that could otherwise be very difficult to achieve. Our
mockeagain library implements such a trick and supports emulating timeout errors at user-
specified precise positions in the output data stream.

The following example triggers a sending timeout right after sending out the "hello, world"
string as the response body.

=== TEST 1: send timeout


--- config
send_timeout 100ms;
postpone_output 1;

location = /t {
content_by_lua_block {
ngx.say("hi bob!")
local ok, err = ngx.flush(true)
if not ok then
ngx.log(ngx.ERR, "flush #1 failed: ", err)
return
end

ngx.say("hello, world!")
local ok, err = ngx.flush(true)
if not ok then
ngx.log(ngx.ERR, "flush #2 failed: ", err)
return
end
}
}
--- request
GET /t
--- ignore_response
--- error_log
flush #2 failed: timeout
--- no_error_log
flush #1 failed

Note the send_timeout directive that is used to configure the sending timeout for NGINX
downstream writing operations. Here we use a small threshold, 100ms, to ensure our test
case runs fast and never hits the default 3 second timeout threshold of the test scaffold
client. The postpone_output 1 directive effectively turns off the "postpone output buffer" of
NGINX, which may hold our output data before even reaching the libc system call wrappers.
Finally, the ngx.flush() call in Lua ensures that no buffers along the NGINX output filter
chain hold our data without sending it downstream.

Before running this test case, we have to set the following system environment variables (in
the bash syntax):

export LD_PRELOAD="mockeagain.so"
export MOCKEAGAIN="w"
export MOCKEAGAIN_WRITE_TIMEOUT_PATTERN='hello, world'
export TEST_NGINX_EVENT_TYPE='poll'

Let’s go through them one by one:

1. The LD_PRELOAD="mockeagain.so" assignment pre-loads the mockeagain library into the
running processes, including the NGINX server process started by the test scaffold, of
course. You may also need to set the LD_LIBRARY_PATH environment variable to include the
directory path of the mockeagain.so file if the file is not in the default system library
search paths.

2. The MOCKEAGAIN="w" assignment enables the mockeagain library to intercept and do
funny things to the write operations on nonblocking sockets.

3. The MOCKEAGAIN_WRITE_TIMEOUT_PATTERN='hello, world' assignment makes mockeagain
refuse to send more data after seeing the specified string pattern, hello, world, in the
output data stream.

4. The TEST_NGINX_EVENT_TYPE='poll' setting makes the NGINX server use the poll event
API instead of the system default (epoll on Linux, for example). This is because
mockeagain only supports poll events for now. Behind the scenes, this environment
variable just makes the test scaffold generate the following nginx.conf snippet.

events {
use poll;
}

You need to ensure, however, that your NGINX or OpenResty build has the poll
support compiled in. Basically, the build should have the ./configure option
--with-poll_module.

We have plans to add epoll edge-triggering support to mockeagain in the future.
Hopefully by that time we will not have to use poll, at least on Linux.

Now the test block above should pass!

Ideally, we could set these environment variables directly inside the test file because this
test case will never pass without them anyway. We could add the following Perl code
snippet to the very beginning of the test file prologue (yes, even before the use statement):

BEGIN {
$ENV{LD_PRELOAD} = "mockeagain.so";
$ENV{MOCKEAGAIN} = "w";
$ENV{MOCKEAGAIN_WRITE_TIMEOUT_PATTERN} = 'hello, world';
$ENV{TEST_NGINX_EVENT_TYPE} = 'poll';
}

The BEGIN {} block is required here because it runs before Perl loads any modules,
in particular Test::Nginx::Socket, for which we want these environment variables to take
effect.


It is a bad idea, however, to hard-code the path of the mockeagain.so file in the test file
itself since different test runners might put mockeagain in different places in the file
system. It is better to let the test runner configure the LD_LIBRARY_PATH environment
variable with the actual library path from outside.
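
As a rough sketch of what such an outside configuration could look like, the small shell
wrapper below sets LD_LIBRARY_PATH only for the command it runs. The function name
with_mockeagain_path and the MOCKEAGAIN_DIR variable are made up for this illustration and
are not part of Test::Nginx or mockeagain; the default directory is merely a placeholder.

```shell
# Hypothetical helper, not part of Test::Nginx or mockeagain.
# MOCKEAGAIN_DIR is an assumed variable naming the directory of mockeagain.so.
with_mockeagain_path() (
    dir="${MOCKEAGAIN_DIR:-$HOME/git/mockeagain}"

    # Fail early with a readable message if the library is missing.
    if [ ! -f "$dir/mockeagain.so" ]; then
        echo "mockeagain.so not found in $dir" >&2
        exit 1
    fi

    # Prepend the directory, keeping any existing search path intact.
    LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH

    # Run the given command. The parentheses around the function body make
    # it run in a subshell, so the caller's environment stays untouched.
    "$@"
)
```

A test runner could then invoke something like with_mockeagain_path prove t/foo.t while the
test file itself only refers to the bare mockeagain.so file name.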

Mockeagain Troubleshooting

If you are seeing the following error while running the test case above,

ERROR: ld.so: object 'mockeagain.so' from LD_PRELOAD cannot be
preloaded (cannot open shared object file): ignored.

then you should check whether you have added the directory path of your mockeagain.so
library to the LD_LIBRARY_PATH environment variable. On my system, for example, I have

export LD_LIBRARY_PATH=$HOME/git/mockeagain:$LD_LIBRARY_PATH

If you are seeing an error similar to the following,

nginx: [emerg] invalid event type "poll" in
.../t/servroot/conf/nginx.conf:76

then your NGINX or OpenResty build does not have the poll module compiled in. You
should rebuild your NGINX or OpenResty, passing the --with-poll_module option to the
./configure command line.
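
One way to check this before running the tests is to inspect the configure arguments that
nginx -V prints (on stderr). The helper below is only a sketch and its name is made up.
Also keep in mind that NGINX builds the poll module automatically on platforms lacking
better event methods, so a missing flag is not always conclusive.

```shell
# Hypothetical helper: succeed if the `nginx -V` output fed on stdin
# mentions the --with-poll_module configure option.
has_poll_module() {
    grep -q -e '--with-poll_module'
}

# Possible use (assumes the nginx binary under test is first in PATH):
#   nginx -V 2>&1 | has_poll_module || echo "rebuild with --with-poll_module"
```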

We will revisit the mockeagain library in the Test Modes section soon.

Mocking Bad Backend Responses


Earlier in this section we have already seen examples that use the ngx_stream_lua module
to mock a backend TCP server that accepts new incoming connections but never writes
anything back. We could of course do fancier things in such a mocked server, like emulating
a buggy or malicious backend server that returns bad response data.

For example, while testing a Memcached client, it would be pretty hard to emulate
error responses or ill-formed responses with a real Memcached server. Now it is trivial with
mocking:


=== TEST 1: get() results in an error response


--- main_config
stream {
server {
listen 1921;
content_by_lua_block {
ngx.print("SERVER_ERROR\r\n")
}
}
}
--- config
location /t {
content_by_lua_block {
local memcached = require "resty.memcached"
local memc = memcached:new()

assert(memc:connect("127.0.0.1", 1921))

local res, flags, err = memc:get("dog")


if not res then
ngx.say("failed to get: ", err)
return
end

ngx.say("get: ", res)


memc:close()
}
}
--- request
GET /t
--- response_body
failed to get: SERVER_ERROR
--- no_error_log
[error]

Our mocked-up Memcached server can behave in any way that we like. Hooray!

Note: Test::Nginx::Socket provides the data sections tcp_listen, tcp_query, tcp_reply,
etc. to enable the builtin mocked TCP server of the test scaffold. You can use this facility
when you do not want to depend on the ngx_stream_lua module or the NGINX stream
subsystem for your test suite. Indeed, we were solely relying on the builtin TCP server of
Test::Nginx::Socket before the ngx_stream_lua module was born. Similarly,
Test::Nginx::Socket offers a builtin UDP server via the data sections udp_listen,
udp_query, udp_reply, etc. You can refer to the official documentation of
Test::Nginx::Socket for more details.

Emulating Bad Clients


The Test::Nginx::Socket test framework provides special data sections to help emulate
ill-behaved HTTP clients.

Crafting Bad Requests


The raw_request data section can be used to specify arbitrary data for the test request. It
is often used with the eval section filter so that we can easily encode special characters
like \r. Let’s look at the following example.

=== TEST 1: missing the Host request header


--- config
location = /t {
return 200;
}
--- raw_request eval
"GET /t HTTP/1.1\r
Connection: close\r
\r
"
--- response_body_like: 400 Bad Request
--- error_code: 400

This way we can easily construct a malformed request that does not have a Host header,
which results in a 400 response from the NGINX server, as expected.

The request data section we have been using so far, on the other hand, always ensures
that a well-formed HTTP request is sent to the test server.
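
To see what the evaluated raw_request value above amounts to on the wire, the same bytes
can be produced from the shell with printf. This is only an illustration; the function name
is made up, and piping the output to a tool such as nc assumes that such a tool is installed
and that a server is listening on the given address.

```shell
# Print the malformed request from the example: an HTTP/1.1 request line
# and a Connection header, each terminated by CRLF, but no Host header.
raw_request_without_host() {
    printf 'GET /t HTTP/1.1\r\n'
    printf 'Connection: close\r\n'
    printf '\r\n'
}

# Possible manual use (assumes nc and a server on 127.0.0.1:1984):
#   raw_request_without_host | nc 127.0.0.1 1984
```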

Emulating Client Aborts


Client aborts are a very intriguing phenomenon in the web world. Sometimes we want the
server to continue processing even after the client aborts the connection; on other occasions
we just want to abort the whole request handler immediately in such cases. Either way, we
need a robust way to emulate client aborts in our unit test cases.

We have already discussed the timeout data section that can be used to adjust the default
timeout protection threshold used by the test scaffold client. We could also use it to abort the
connection prematurely. A small timeout threshold is often desired for this purpose. To
keep the test scaffold from printing out an error on client timeout, we can specify the
abort data section to signal the test scaffold. Let’s put these together in a simple test case.


=== TEST 1: abort processing in the Lua callback on client aborts


--- config
location = /t {
lua_check_client_abort on;

content_by_lua_block {
local ok, err = ngx.on_abort(function ()
ngx.log(ngx.NOTICE, "on abort handler called!")
ngx.exit(444)
end)

if not ok then
error("cannot set on_abort: " .. err)
end

ngx.sleep(0.7) -- sec
ngx.log(ngx.NOTICE, "main handler done")
}
}
--- request
GET /t
--- timeout: 0.2
--- abort
--- ignore_response
--- no_error_log
[error]
main handler done
--- error_log
client prematurely closed connection
on abort handler called!

In this example, we make the test scaffold client abort the connection after 0.2 seconds via
the timeout section. We also prevent the test scaffold from printing out the client timeout
error by specifying the abort section. Finally, in the Lua application code, we check for
client abort events by turning on the lua_check_client_abort directive and abort the server
processing by calling ngx.exit(444) in our Lua callback function registered by the
ngx.on_abort API.

Clients Never Closing Connections


Unlike most well-behaved HTTP clients in the wild, the HTTP client used by
Test::Nginx::Socket never actively closes the connection unless a timeout error happens
(exceeding the timeout threshold as specified by the --- timeout section). This makes it
possible to verify that the NGINX server actually closes the connection itself when the
request specifies the "Connection: close" request header.


When the server does not close the connection, there is a "connection leak" bug on the
server side. For example, NGINX uses reference counting (in r->main->count) in its HTTP
subsystem to determine whether a request can be closed and freed. When there is an error
in this reference counting, NGINX may never close the request, leading to resource leaks. In
such cases, the corresponding test cases always fail with a client-side timeout error, for
instance,

# Failed test 'ERROR: client socket timed out - TEST 1: foo
# '

Obviously, Test::Nginx::Socket is a malicious HTTP client by default in this respect. This is
also why our test scaffold avoids using a well-behaved HTTP client library itself. Most test
suites focus on extreme and erroneous cases anyway, and well-behaved HTTP clients
help hide problems instead of exposing them.


Test Modes
One unique feature of Test::Nginx is that it allows running the same test suite in wildly
different ways, or test modes, by just configuring some system environment variables.
Different test modes have different focuses and may find different categories of bugs or
performance issues in the applications being tested. The data-driven nature of the test
framework makes it easy to add new test modes without changing the user test files at all.
It is also possible to combine different test modes to form new (hybrid) test modes. The
capability of running the same test suite in many different ways helps squeeze more value
out of the tests we already have.

This section walks through the various test modes supported by Test::Nginx::Socket and
the corresponding system environment variables used to enable or control them.

Benchmark Mode
Test::Nginx has built-in support for performance testing or benchmarking. It can invoke
external load testing tools like ab and weighttp to load each test case as hard as
possible.

To enable this benchmark testing mode, you can specify the TEST_NGINX_BENCHMARK system
environment variable before running the prove command. For example,

export TEST_NGINX_BENCHMARK='2000 2'


prove t/foo.t

This will run all the test cases in t/foo.t in benchmark mode. In particular, the first number,
2000, in the environment variable value indicates the total number of requests used to flood
the server, while the second number, 2, is the number of concurrent connections the client
will use.

If the test case uses an HTTP 1.1 request (which is the default), then the test scaffold will
invoke the weighttp tool. If it is an HTTP 1.0 request, then the test scaffold invokes the ab
tool.

This test mode requires the unbuffer command-line utility from the expect package, as
well as the ab and weighttp load testing tools. On Ubuntu/Debian systems, we can install
most of the dependencies with the command


sudo apt-get install expect apache2-utils

You may need to build and install weighttp from source on Ubuntu/Debian yourself due to
the lack of a Debian package.

For the Mac OS X system, on the other hand, we can use homebrew to install them like this:

brew install expect weighttp

Now let’s consider the following example.

t/hello.t

use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__

=== TEST 1: hello world


--- config
location = /hello {
return 200 "hello world\n";
}
--- request
GET /hello
--- response_body
hello world

Then we run this test file in the benchmark mode, like this:

export TEST_NGINX_BENCHMARK='200000 2'


prove t/hello.t

The output should look like this:


t/hello.t .. TEST 1: hello world


weighttp -c2 -k -n200000 http://127.0.0.1:1984/hello
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 2 concurrent requests, 200000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 2 sec, 652 millisec and 752 microsec, 75393 req/s,
12218 kbyte/s
requests: 200000 total, 200000 started, 200000 done, 200000
succeeded, 0 failed, 0 errored
status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 33190005 bytes total, 30790005 bytes http, 2400000
bytes data
t/hello.t .. ok
All tests successful.
Files=1, Tests=2, 3 wallclock secs ( 0.01 usr 0.00 sys + 0.33
cusr 1.47 csys = 1.81 CPU)
Result: PASS

The most important line is this:

finished in 2 sec, 652 millisec and 752 microsec, 75393 req/s,
12218 kbyte/s

We can see that this test case can achieve 75393 requests per second and 12218 KB per
second. Not bad for a single NGINX worker process!


It is also important to keep an eye on failed requests. We surely do not care about the
performance of error pages. We can get the number of error responses by checking the
following output lines:

requests: 200000 total, 200000 started, 200000 done, 200000
succeeded, 0 failed, 0 errored
status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx

We are glad to see that all our requests succeeded in this run.

If we want to benchmark the performance of multiple NGINX worker processes so as to
utilize multiple CPU cores, then we can add the following lines to the test file prologue,
before the line run_tests():

master_on();
workers(4);

This way we can have 4 NGINX worker processes sharing the load.

Behind the scenes, the test scaffold assembles the command line involving weighttp from
the test block specification. In this case, the command line looks like this:

weighttp -c2 -k -n200000 http://127.0.0.1:1984/hello

There exist complicated cases, however, where the test scaffold fails to derive the exact
command line equivalent.
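
For the simple cases, the mapping from the environment variable to the command line can be
pictured with the shell sketch below. It merely mirrors the weighttp and ab command lines
that appear in the sample outputs of this section; the function bench_cmd is hypothetical
and is not how the test scaffold is actually implemented.

```shell
# Hypothetical illustration: print the load-tool command line that
# corresponds to a TEST_NGINX_BENCHMARK value and a request URL.
bench_cmd() (
    spec=$1                  # value of TEST_NGINX_BENCHMARK, e.g. '200000 2'
    url=$2                   # URL derived from the test block
    version=${3:-1.1}        # HTTP version of the test request

    requests=${spec%% *}     # first number: total number of requests
    concurrency=${spec##* }  # second number: concurrent connections

    if [ "$version" = "1.0" ]; then
        # HTTP 1.0 requests are handed to ab
        echo "ab -r -d -S -c$concurrency -k -n$requests $url"
    else
        # HTTP 1.1 requests (the default) are handed to weighttp
        echo "weighttp -c$concurrency -k -n$requests $url"
    fi
)
```

For instance, bench_cmd '200000 2' http://127.0.0.1:1984/hello prints the weighttp command
line shown above.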

We can also enforce HTTP 1.0 requests in our test block by appending the "HTTP/1.0" string
to the value of the --- request section:

--- request
GET /hello HTTP/1.0

In this case, the test scaffold will invoke the ab tool to flood the matching HTTP 1.0
request. The output might look like this:

t/hello.t .. TEST 1: hello world


ab -r -d -S -c2 -k -n200000 http://127.0.0.1:1984/hello
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
http://www.zeustech.net/


Licensed to The Apache Software Foundation,


http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)


Completed 20000 requests
Completed 40000 requests
Completed 60000 requests
Completed 80000 requests
Completed 100000 requests
Completed 120000 requests
Completed 140000 requests
Completed 160000 requests
Completed 180000 requests
Completed 200000 requests
Finished 200000 requests

Server Software: openresty/1.9.15.1


Server Hostname: 127.0.0.1
Server Port: 1984

Document Path: /hello


Document Length: 12 bytes

Concurrency Level: 2
Time taken for tests: 3.001 seconds
Complete requests: 200000
Failed requests: 0
Keep-Alive requests: 198000
Total transferred: 33190000 bytes
HTML transferred: 2400000 bytes
Requests per second: 66633.75 [#/sec] (mean)
Time per request: 0.030 [ms] (mean)
Time per request: 0.015 [ms] (mean, across all concurrent
requests)
Transfer rate: 10798.70 [Kbytes/sec] received

Connection Times (ms)


min avg max
Connect: 0 0 1


Processing: 0 0 132
Waiting: 0 0 132
Total: 0 0 132
t/hello.t .. ok
All tests successful.
Files=1, Tests=2, 4 wallclock secs ( 0.02 usr 0.00 sys + 0.51
cusr 1.39 csys = 1.92 CPU)
Result: PASS

The most important output lines, in this case, are

Failed requests: 0
Requests per second: 66633.75 [#/sec] (mean)
Transfer rate: 10798.70 [Kbytes/sec] received

Different hardware and operating systems may lead to very different results. Therefore, it
generally does not make sense at all to directly compare numbers obtained from different
machines and systems.

Clever users can write some external scripts to record and compare these numbers across
different runs, so as to keep track of performance changes in the web server or application.
Such comparison scripts must take into account any measurement errors and any
disturbances from other processes running in the same system.
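
A minimal sketch of such a script is given below. All three function names are made up, the
parsing only relies on the weighttp and ab summary lines shown earlier in this section, and
the fixed percentage tolerance is just a crude stand-in for a proper treatment of
measurement errors.

```shell
# Hypothetical helpers for tracking throughput across benchmark runs.

# Read weighttp output on stdin and print the integer req/s figure.
weighttp_rps() {
    sed -n 's/^finished in .*, \([0-9][0-9]*\) req\/s,.*$/\1/p'
}

# Read ab output on stdin and print the req/s figure, truncated to an integer.
ab_rps() {
    awk '/^Requests per second:/ { print int($4) }'
}

# Succeed if the current figure dropped more than <tolerance> percent below
# the baseline. Usage: rps_regressed <baseline> <current> <tolerance>
rps_regressed() {
    [ $(( $2 * 100 )) -lt $(( $1 * (100 - $3) )) ]
}
```

The figures fed to rps_regressed should come from the same load tool, the same machine, and
the same test block; the tolerance is best chosen from the run-to-run variation actually
observed on that machine.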

Performance benchmarking is a large topic and we give it a more detailed treatment in a
dedicated chapter.

HUP Reload Mode


By default, the test scaffold always starts a fresh instance of the NGINX server right before
running each individual test block and stops the server right after the checks of the current
test block are all done. This ensures that there are no side effects among test blocks,
especially those running successively. But it can also be desirable to ensure that everything
works fine when the NGINX server just reloads its configuration without a full server
restart. Such configuration reloading is usually done by sending the HUP signal to the
master process of NGINX. So we usually call it "HUP reload".


Note: On some non-UNIX-style operating systems like Microsoft Windows, there are no
such things as signals. On such platforms, NGINX users usually use the -s reload
command-line option of the nginx executable to do the same thing. It should be noted,
however, that the -s reload option has one side effect that can be annoying: it loads the
nginx configuration twice instead of just once, which may incur unnecessary initialization
overhead. Therefore, we should always use the HUP signal instead of -s reload whenever
possible.
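
From the command line, sending the HUP signal boils down to reading the master process ID
from the pid file and calling kill. The sketch below assumes the pid file location used by
the test scaffold in this chapter as its default; the function name hup_reload is made up,
and other installations keep their pid files elsewhere.

```shell
# Hypothetical helper: send HUP to the NGINX master process whose PID is
# stored in the given pid file (default: the test scaffold's pid file).
hup_reload() (
    pidfile=${1:-t/servroot/logs/nginx.pid}

    if [ ! -r "$pidfile" ]; then
        echo "cannot read pid file $pidfile" >&2
        exit 1
    fi

    # The pid file holds the master PID on its first line.
    master_pid=$(head -n 1 "$pidfile")

    # Ask the master process to reload its configuration.
    kill -HUP "$master_pid"
)
```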

One example of an OpenResty feature that behaves differently upon HUP reload than upon
server restart is the shared dictionary mechanism (lua_shared_dict), which does not wipe out
any existing data in the shared memory storage during HUP reload. When testing this feature
or application code relying on this feature, it is wise to test how it behaves upon HUP
reload. We saw in the past that some 3rd-party NGINX C modules dealing with shared memory,
for example, had bugs across HUP reloads, like nasty memory leaks.

Test::Nginx has built-in support for the HUP reload test mode, which can be enabled by
setting the TEST_NGINX_USE_HUP=1 environment variable:

export TEST_NGINX_USE_HUP=1

Then we can run our existing test suite as usual, but now the HUP signal is used by the test
scaffold to reload the NGINX configuration specified by different test blocks. The NGINX
server will only be automatically shut down when the test harness finishes running each test
file.

Note: We can even avoid the automatic server shutdown behavior upon test file
completion by setting the TEST_NGINX_NO_CLEAN=1 environment variable. See the later
section Manual Debugging Mode for more details.

UNIX signals like HUP usually work asynchronously. Thus, there is a delay between the
moment the test scaffold finishes sending the HUP signal to the NGINX server and the moment
the NGINX server forks off a new worker process using the newly loaded configuration and
starts accepting new connections with the new worker. For this reason, there is a (small)
chance that the request of a test block is served by an NGINX worker process still using the
configuration specified by the previous test block. Although Test::Nginx tries hard to wait
as long as it can with some simple heuristics, some test blocks may still experience
intermittent test failures due to the mismatch of the NGINX configuration. Be prepared for
such false positives when using the HUP reload testing mode. This is also one of the reasons
why the HUP reload mode is not the default. We hope this issue can be further improved in
the future.

Another limitation of the HUP reload mode is that HUP reloads only happen at test
block boundaries. There are cases where it is desirable to issue a HUP reload in the middle
of a test block. We can achieve that by using some custom Lua code in the test block to send
the HUP signal ourselves, as in

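-- read the PID of the NGINX master process from the pid file
local f = assert(io.open("t/servroot/logs/nginx.pid", "r"))
local master_pid = assert(f:read())
assert(f:close())
-- ask the master process to reload its configuration
assert(os.execute("kill -HUP " .. master_pid) == 0)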

Valgrind Mode
Memory issues are among the biggest enemies of web servers and web applications that are
supposed to run in a 24x7 manner. Memory issues include memory leaks, invalid memory
reads (like reading beyond the buffer boundary), and invalid memory writes (like buffer
overflows). In the case of memory leaks, the processes can take up more and more memory in
the system and eventually exhaust all the physical memory available, leading to
unresponsive systems or triggering the system to start killing processes by force. Invalid
memory accesses, on the other hand, can lead to process crashes (like segmentation faults),
or worse, to nondeterminism in the process's behavior (like giving out wrong results).

Valgrind is a powerful tool for programmers to detect a wide range of memory issues,
including many memory leaks and many invalid memory accesses. It is usually used for
debugging lower-level code like the OpenResty core (including the NGINX core), the Lua or
LuaJIT VM, as well as those Lua libraries involving C and/or FFI. Plain Lua code that does
not use FFI is considered "safe" and is not subject to most of these memory issues.

Note: Plain Lua code without using FFI can still contain bugs that result in memory
leaks, like inserting new keys into a globally shared Lua table without control or
appending to a global Lua string infinitely. Such memory leaks, however, cannot be
detected by Valgrind since the memory is managed by Lua or LuaJIT’s garbage collector.

Test::Nginx provides a testing mode that can automatically use Valgrind to run the existing
tests and check if there are any memory issues that can be caught by Valgrind. This test
mode is called "Valgrind mode". To enable this mode, just set the environment variable
TEST_NGINX_USE_VALGRIND, as in

export TEST_NGINX_USE_VALGRIND=1

Then just run the test files as usual.

Let’s consider the following example.


=== TEST 1: C strlen()


--- config
location = /t {
content_by_lua_block {
local ffi = require "ffi"
local C = ffi.C

if not pcall(function () return C.strlen end) then


ffi.cdef[[
size_t strlen(const char *s);
]]
end

local buf = ffi.new("char[3]", {48, 49, 0})


local len = tonumber(C.strlen(buf))
ngx.say("strlen: ", len)
}
}
--- request
GET /t
--- response_body
strlen: 2
--- no_error_log
[error]

Here we use the ffi.new API to allocate a C string buffer that is 3 bytes long and
initialize the buffer with the bytes 48, 49, and 0 (decimal ASCII codes). Then we call the
standard C function strlen via the ffi.C API with our C string buffer.

It is worth noting that we need to first declare the strlen function prototype via the
ffi.cdef API. Since we declare the C function in the request handler
(content_by_lua_block), we should only declare it once instead of upon every request. To
achieve that, we use a Lua if statement to check if the symbol strlen is already
declared (when strlen is not declared or defined, the Lua expression C.strlen throws
a Lua exception, which makes the pcall call fail).

This example contains no memory issues since we properly initialize our C string buffer by
setting the null terminator character (\0) at the end of our C string. The C function strlen
should correctly report back the length of the string, which is 2, without reading beyond our
buffer boundary. Now we run this test file with the Valgrind mode enabled using the default
OpenResty installation’s nginx:

export TEST_NGINX_USE_VALGRIND=1
export PATH=/usr/local/openresty/nginx/sbin:$PATH

prove t/a.t


There should be a lot of output. The first few lines should look like this:

t/a.t .. TEST 1: C strlen()


==7366== Invalid read of size 4
==7366== at 0x546AE31: str_fastcmp (lj_str.c:57)
==7366== by 0x546AE31: lj_str_new (lj_str.c:166)
==7366== by 0x547903C: lua_setfield (lj_api.c:903)
==7366== by 0x4CAD18: ngx_http_lua_cache_store_code
(ngx_http_lua_cache.c:119)
==7366== by 0x4CAB25: ngx_http_lua_cache_loadbuffer
(ngx_http_lua_cache.c:187)
==7366== by 0x4CB61A: ngx_http_lua_content_handler_inline
(ngx_http_lua_contentby.c:300)

Ouch! Valgrind reports an invalid memory read error. Fortunately it is just a false positive
due to an optimization inside the LuaJIT VM when it is trying to create a new Lua string. The
LuaJIT code repository maintains a file named lj.supp that lists all the known Valgrind false
positives and can be used to suppress these messages. We can simply copy that file over
and rename it to valgrind.suppress in the current working directory. Then Test::Nginx will
automatically feed this valgrind.suppress file into Valgrind while running the tests in
Valgrind mode. Let’s try that:

cp -i /path/to/luajit-2.0/src/lj.supp ./valgrind.suppress
prove t/a.t

This time, the test scaffold stays calm:

t/a.t .. TEST 1: C strlen()


t/a.t .. ok
All tests successful.
Files=1, Tests=3, 2 wallclock secs ( 0.01 usr 0.00 sys + 1.51
cusr 0.06 csys = 1.58 CPU)
Result: PASS

We might encounter other Valgrind false positives like some of those in the NGINX core or
the OpenSSL library. We can add those to the valgrind.suppress file as needed. The
Test::Nginx test scaffold always outputs suppression rules that can be added directly to the
suppression file. For the example above, the last few lines of the output look like this.


{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:str_fastcmp
fun:lj_str_new
fun:lua_setfield
fun:ngx_http_lua_cache_store_code
fun:ngx_http_lua_cache_loadbuffer
fun:ngx_http_lua_content_handler_inline
fun:ngx_http_core_content_phase
fun:ngx_http_core_run_phases
fun:ngx_http_process_request
fun:ngx_http_process_request_line
fun:ngx_epoll_process_events
fun:ngx_process_events_and_timers
fun:ngx_single_process_cycle
fun:main
}
t/a.t .. ok
All tests successful.
Files=1, Tests=3, 2 wallclock secs ( 0.01 usr 0.00 sys + 1.47
cusr 0.07 csys = 1.55 CPU)
Result: PASS

The suppression rule generated is the stuff between the curly braces (including the curly
braces themselves):


{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:str_fastcmp
fun:lj_str_new
fun:lua_setfield
fun:ngx_http_lua_cache_store_code
fun:ngx_http_lua_cache_loadbuffer
fun:ngx_http_lua_content_handler_inline
fun:ngx_http_core_content_phase
fun:ngx_http_core_run_phases
fun:ngx_http_process_request
fun:ngx_http_process_request_line
fun:ngx_epoll_process_events
fun:ngx_process_events_and_timers
fun:ngx_single_process_cycle
fun:main
}

We could have simply copied and pasted this rule into the valgrind.suppress file. It is worth
mentioning, however, that we can make this rule more general by dropping the C function
frames belonging to the NGINX core and the ngx_lua module (near the bottom of the rule),
since this false positive is related to LuaJIT only.
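
As an illustration, a trimmed version of the generated rule could be appended to the
suppression file as follows. The rule name is made up; the three remaining frames are the
LuaJIT ones from the generated rule. This relies on Valgrind matching a suppression against
the innermost frames of the call stack, so the callers left out of the rule may be anything.

```shell
# Append a generalized suppression rule to ./valgrind.suppress.
# The rule name "luajit_str_new_fastcmp" is an arbitrary label.
cat >> valgrind.suppress <<'EOF'
{
   luajit_str_new_fastcmp
   Memcheck:Addr4
   fun:str_fastcmp
   fun:lj_str_new
   fun:lua_setfield
}
EOF
```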

Let’s continue our experiment with our current example. Now we edit our test case and
change the following line

local buf = ffi.new("char[3]", {48, 49, 0})

to

local buf = ffi.new("char[3]", {48, 49, 50})

That is, we replace the null character (with ASCII code 0) with a non-null character whose
ASCII code is 50. This change makes our C string buffer lack a null terminator, and thus
calling strlen on it will result in memory reads beyond our buffer boundary.

Unfortunately, running this edited test file fails to yield any Valgrind error reports
regarding this memory issue:


t/a.t .. TEST 1: C strlen()


t/a.t .. 1/?
# Failed test 'TEST 1: C strlen() - response_body - response
is expected (repeated req 0, req 0)'
# at /home/agentzh/git/lua-nginx-module/../test-
nginx/lib/Test/Nginx/Socket.pm line 1346.
# got: "strlen: 4\x{0a}"
# length: 10
# expected: "strlen: 2\x{0a}"
# length: 10
# strings begin to differ at char 9 (line 1 column 9)
# Looks like you failed 1 test of 3.

The response body check fails as expected. This time strlen returns 4, which is larger
than our buffer size, 3. This is a clear indication of a memory buffer over-read. So why
does Valgrind fail to catch this?

To answer this question, we need some knowledge about how LuaJIT allocates memory. By
default, LuaJIT uses its own memory allocator atop the system allocator (usually provided by
the standard C library). For performance reasons, LuaJIT pre-allocates larger memory blocks
than requested. Because Valgrind has no knowledge about LuaJIT’s own allocator and Lua
user-level buffer boundary definitions, it can be fooled and get confused.

To remove this limitation, we can force LuaJIT to use the system allocator instead of its
own. To achieve this, we need to build LuaJIT with special compilation options like below.

make CCDEBUG=-g XCFLAGS='-DLUAJIT_USE_VALGRIND -DLUAJIT_USE_SYSMALLOC'

The most important option is -DLUAJIT_USE_SYSMALLOC, which forces LuaJIT to use the system
allocator. The other options are important for our debugging as well. For example, the
CCDEBUG=-g option enables debug symbols in the LuaJIT binary while
-DLUAJIT_USE_VALGRIND enables some other special collaborations with Valgrind inside the
LuaJIT VM.

If we are using the OpenResty bundle, we can simply build another special version of
OpenResty like below:


./configure \
--prefix=/opt/openresty-valgrind \
--with-luajit-xcflags='-DLUAJIT_USE_VALGRIND -DLUAJIT_USE_SYSMALLOC' \
--with-debug \
-j4
make -j4
sudo make install

This will build and install a special debug version of OpenResty for Valgrind checks to the file
system location /opt/openresty-valgrind .

Note: There are some other special LuaJIT build options that can further help us, like
-DLUA_USE_APICHECK and -DLUA_USE_ASSERT. But they are beyond the scope of our
current example.

Now let’s try running our previous buggy example with this special OpenResty and Valgrind:

export TEST_NGINX_USE_VALGRIND=1
export PATH=/opt/openresty-valgrind/nginx/sbin:$PATH

prove t/a.t

This time Valgrind succeeds in catching the memory bug!

t/a.t .. TEST 1: C strlen()


==8128== Invalid read of size 1
==8128== at 0x4C2BC34: strlen (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==8128== by 0x5467217: lj_vm_ffi_call (in
/opt/luajit21sysm/lib/libluajit-5.1.so.2.1.0)
==8128== by 0x54B5DE7: lj_ccall_func (lj_ccall.c:1136)
==8128== by 0x54CAD45: lj_cf_ffi_meta___call (lib_ffi.c:230)
==8128== by 0x5465147: lj_BC_FUNCC (in
/opt/luajit21sysm/lib/libluajit-5.1.so.2.1.0)
==8128== by 0x4C72BC: ngx_http_lua_run_thread
(ngx_http_lua_util.c:1015)
==8128== by 0x4CB039: ngx_http_lua_content_by_chunk
(ngx_http_lua_contentby.c:120)
...


We omit the rest of the output for brevity. Here Valgrind reports an invalid read of one byte of
memory in the C function strlen , which is exactly what we’d expect. Mission
accomplished!

Note: LuaJIT built with the system allocator should be used with Valgrind only. On
computer architectures like x86_64, such a LuaJIT build may not even start up.

From this example, we can see how application-level memory allocation optimizations and
management can compromise the effectiveness of Valgrind’s memory issue detection.
Similarly, the NGINX core also comes with its own memory allocator via "memory pools".
Such memory pools tend to allocate page-sized memory blocks for small allocations and
thus can also adversely affect Valgrind's detection. OpenResty provides a patch for the
NGINX core to disable the memory pool optimizations altogether. The easiest way to use the
patch is to specify the --with-no-pool-patch option when running the ./configure script
while building OpenResty.

Note: Since NGINX 1.9.13, NGINX provides a C macro NGX_DEBUG_PALLOC which,
when set, can be used to achieve a similar effect to OpenResty’s "no-pool patch".
But still the "no-pool patch" is much more aggressive and thorough and can
help find more potential memory problems in NGINX-related C code.

This Valgrind mode is used by OpenResty developers on a daily basis and has helped
locate countless memory management bugs in the OpenResty C and Lua/FFI code base.
Interestingly, this test mode also located memory issues in the official NGINX core and the
official LuaJIT core. Unlike analyzing core dumps, Valgrind can almost always find the first
scene of a memory offense, so studying the memory error reports can usually lead to
immediate code fixes.

As with all other tools, Valgrind has its own limitations and cannot find all the memory
issues even when we carefully disable application-level memory allocators as demonstrated
above. For example,

1. Memory issues on the C runtime stack cannot be caught by Valgrind (at least not by
Valgrind's default memcheck tool).

2. Memory leaks in application-level resource managers cannot be detected. For
example, memory leaks in NGINX's global memory pool won't get detected since
NGINX always destroys all the memory pools upon process termination. Similarly, an
ever-growing Lua object managed by the Lua garbage collector (GC) won't get caught
either, since the Lua VM always frees all its GC-managed objects.

Understanding the weaknesses of the tool is as important as understanding its strengths.
We shall see an alternative approach in the next section for detecting leaks in the
application-level memory managers.


Note: Google’s AddressSanitizer tool can also be used to detect memory issues.
Compared to Valgrind, it has the advantages of running much faster and being able to
detect memory issues on the C runtime stack as well. Unfortunately, it has its
own limitations too. For example, it requires special C/C++ compiler options to
rebuild all the related C code and C libraries for the best result. Also, it cannot
find problems in dynamically generated machine code (like from a Just-in-Time
compiler) or hand-written assembly code (like LuaJIT’s Lua interpreter).
Therefore, OpenResty developers use Valgrind much more often.

Naive Memory Leak Check Mode


As we have seen in the previous section, Valgrind is great at detecting a wide range of
memory leaks and invalid memory accesses. But Valgrind also suffers from limitations in
detecting leaks in application-level memory managers such as garbage collectors (GC) and
memory pools, and such leaks are quite common in reality. To see this, let’s consider the
following simple example that leaks in LuaJIT’s GC-managed memory.

=== TEST 1:
--- config
    location = /t {
        content_by_lua_block {
            package.path = "/path/to/some/lib/?.lua;" .. package.path
            ngx.say("ok")
        }
    }
--- request
GET /t
--- response_body
ok
--- no_error_log
[error]

This example demonstrates a common mistake made by many OpenResty beginners. The
package.path field specifies the search paths used by the require builtin function for
loading pure Lua modules. This string value is stored in the global Lua table package ,
which has the same lifetime as the current Lua virtual machine (VM) instance. Since Lua VM
instances usually live as long as NGINX worker processes (unless the
lua_code_cache directive is turned off in nginx.conf ), prepending a new string to the value
of package.path in a request handler like content_by_lua_block makes the string grow on
every request, which clearly results in a memory leak.
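The growth described above can be modeled outside of OpenResty. The following is a rough Python sketch, not Lua or OpenResty code; the names FakeVM, leaky_handler, guarded_handler, and path_length_after are hypothetical and exist only for illustration. It contrasts an unconditional prepend on every request with a guarded prepend that runs at most once per VM.

```python
# Hypothetical Python model of the Lua handler above; not OpenResty code.
PREFIX = "/path/to/some/lib/?.lua;"
INITIAL_PATH = "./?.lua"  # assumed starting value, for illustration only


class FakeVM:
    """Stands in for a Lua VM that lives as long as an NGINX worker."""

    def __init__(self):
        self.package_path = INITIAL_PATH


def leaky_handler(vm):
    # Mirrors: package.path = PREFIX .. package.path
    vm.package_path = PREFIX + vm.package_path


def guarded_handler(vm):
    # Prepend only when the prefix is not already at the front.
    if not vm.package_path.startswith(PREFIX):
        vm.package_path = PREFIX + vm.package_path


def path_length_after(handler, requests):
    """Serve `requests` requests on one VM and return the final path length."""
    vm = FakeVM()
    for _ in range(requests):
        handler(vm)
    return len(vm.package_path)
```

With the leaky handler the path length grows linearly with the number of requests, while the guarded variant stays constant after the first request. In a real OpenResty setup, the search path is usually configured once, for instance with the lua_package_path directive in nginx.conf, rather than inside a request handler.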

Unfortunately, Valgrind cannot find this leak at all, since the leak happens in the
GC-managed memory inside the Lua VM. All such leaked memory always gets released upon
GC destruction (or VM destruction) before the current process exits, which fools Valgrind
into thinking that there are no leaks at all. Interested readers can try running this
example with the "Valgrind test mode" as explained in the previous section.

To address this limitation of Valgrind, Test::Nginx::Socket introduces a new test mode
called "naive memory leak check mode", or just "check leak mode" for short. In this mode,
the test scaffold does the following:

1. loads the NGINX server with many instances of the test request specified in the test
block, in a way similar to the "benchmark test mode" we discussed earlier,

2. and at the same time, periodically polls and records the memory footprint of the NGINX
worker process with the system command ps ,

3. and finally analyzes the memory usage data points collected in 2) by finding the slope
( k ) of a line that best fits those data points.
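Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Test::Nginx::Socket implementation: the function names sample_rss_kb and fit_slope are hypothetical, the resident set size (RSS) reported by ps is assumed to be the measured footprint, and the slope is fitted against the sample index, so the resulting value need not match the k printed by the scaffold.

```python
import subprocess
import time


def sample_rss_kb(pid, count, interval=0.02):
    """Poll the resident set size (in KB) of a process with `ps`.

    Assumption: RSS is the footprint of interest; the real scaffold
    may measure something else.
    """
    samples = []
    for _ in range(count):
        out = subprocess.run(
            ["ps", "-o", "rss=", "-p", str(pid)],
            capture_output=True,
            text=True,
        ).stdout.strip()
        if not out:  # ps printed nothing: the process has exited
            break
        samples.append(int(out))
        time.sleep(interval)
    return samples


def fit_slope(samples):
    """Least-squares slope of the samples against their indices 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

A call such as fit_slope(sample_rss_kb(worker_pid, 100)) would then yield a rough growth rate in KB per sample, where worker_pid is the process ID of the NGINX worker under load.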

To make use of this mode, just set the TEST_NGINX_CHECK_LEAK=1 environment variable before
running existing test files, as in

export TEST_NGINX_CHECK_LEAK=1
prove t/a.t

Assuming the t/a.t test file contains the test block example given above, we should get an
output similar to the following.


t/a.t .. TEST 1:
LeakTest: [3740 3756 3620 3624 4180 3808 4044 4240 4272 4888
3876 3520 4516
4368 5216 4796 4420 4508 4068 5336 5220 3888 4196 4544 4100
3696 5028 5080
4580 3936 5236 4308 5320 4748 5464 4032 5492 4996 4588 4932
4632 6388 5228
5516 4680 5348 5420 5964 5436 5128 5720 6324 5700 4948 4312
6208 5192 5268
5600 4144 6556 4248 5648 6612 4044 5408 5120 5120 5740 6048
6412 5636 6488
5184 6036 5436 5808 4904 4980 6772 5148 7160 6576 6724 5024
6768 7264 5540
5700 5284 5244 4512 5752 6752 6868 6064 4940 5636 6388 7468]
LeakTest: k=22.6
t/a.t .. ok
All tests successful.
Files=1, Tests=3, 6 wallclock secs ( 0.01 usr 0.01 sys + 0.61
cusr 1.68 csys = 2.31 CPU)
Result: PASS

The special output lines from this test mode have the prefix LeakTest: . The first such line
lists all the data points for the memory footprint size in kilobytes (KB), collected
every 0.02 seconds. The second line is the slope ( k ) of the line that best fits these
data points. In this case, k equals 22.6 .
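When many test files are run in this mode, it can be handy to pull the LeakTest: lines out of the captured output programmatically. The following Python sketch is one possible way to do so; the function name parse_leak_test is hypothetical, and the regular expression assumes the output has exactly the shape shown above, with the bracketed list possibly wrapped across several lines.

```python
import re

# The character class [\d\s] also matches newlines, so a data list that
# wraps across several lines is captured as a whole.
_LEAK_TEST = re.compile(r"LeakTest: \[([\d\s]+)\]\s*LeakTest: k=(-?[\d.]+)")


def parse_leak_test(output):
    """Return a list of (data_points, slope) pairs found in the output."""
    results = []
    for points, k in _LEAK_TEST.findall(output):
        results.append(([int(p) for p in points.split()], float(k)))
    return results
```

Each returned pair holds the memory samples in KB and the reported slope, which can then be logged or compared against a threshold of one's own choosing.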

The slope of the line can usually serve as an indication of the rate of memory leaking.
The larger the slope is, the faster the leak is. A two-digit slope here is very likely an
indication of a memory leak. To be sure, we plot these data points in a graph using the
gnuplot tool.


There are quite a few fluctuations in the graph. This is due to how the garbage collector
normally behaves. For performance reasons, it usually allocates page-sized or even larger
memory blocks than actually requested, and it delays the release of unused memory
blocks (for example, until its sweep phase runs). Still, it is clear that the memory
usage is going up overall.

We can try enforcing a full garbage collection cycle upon the entry of our request handler,
like this:

content_by_lua_block {
    collectgarbage()
    package.path = "/path/to/some/lib/?.lua;" .. package.path
    ngx.say("ok")
}

This way we can ensure that there is no memory garbage hanging around after the point we
call the Lua builtin function collectgarbage() .

Now the output looks like this:


t/e.t .. TEST 1:
LeakTest: [2464 2464 2360 2464 2232 2520 2380 2536 2440 2320
2300 2464
2576 2584 2540 2408 2608 2420 2596 2596 2332 2648 2660 2460
2680 2320
2688 2616 2332 2628 2408 2728 2716 2380 2752 2360 2768 2376
2372 2376
2732 2800 2808 2816 2464 2396 2668 2688 2848 2672 2412 2416
2536 2420
2424 2632 2904 2668 2912 2564 2724 2448 2932 2944 2856 2960
2616 2672
2976 2620 2984 2600 2808 2980 3004 2996 3236 3012 2724 3168
3072 3536
3260 3412 3028 2700 2480 3188 2808 3536 2640 3056 2764 3052
3440 3308
3064 2680 2828 3372]
LeakTest: k=7.4
t/e.t .. ok
All tests successful.
Files=1, Tests=3, 6 wallclock secs ( 0.02 usr 0.00 sys + 0.62
cusr 1.75 csys = 2.39 CPU)
Result: PASS

We can see that this time the slope of the best-fitting line is much smaller, but still
much larger than 0.

The line graph is now much smoother, as expected:


And we can see that the line is still going upward relatively steadily over time.

Large fluctuations and variations in the memory footprint may create noise in our data
samples and even result in false positives. We already saw how big fluctuations may result
in large data-fitting line slopes. It is usually a good idea to enforce full garbage collection
cycles frequently to reduce such noise, at least in GC-managed memory. The
collectgarbage() function, however, is quite expensive in terms of CPU resources and may
hurt the overall performance very badly. Ensure you do not call it often (like in every
request) in the "benchmark test mode" introduced above or even in production applications.

In reality, this brute-force "check leak" test mode has helped catch quite a lot of real
memory leaks in OpenResty’s test suites over the years. Most of those leaks slipped past
the Valgrind test mode since they happened in GC-managed memory or NGINX’s
memory pools.

Note: The NGINX no-pool patch mentioned in the previous section does not help here
since all the leaked memory blocks in the pool still get released before the
process exits.

Nevertheless, this test mode has one big drawback. Unlike Valgrind, it cannot give
any detailed information about the locations where leaks (may) happen. All it reports are
data samples and other metrics that verify just the existence of a leak (at least to some
extent). We shall see in a later chapter how we can use the "memory leak flame graphs" to
overcome this limitation even for leaks and big swings in GC-managed or pool-managed
memory.


Mockeagain Mode

Manual Debugging Mode

SystemTap Mode
