Introduction
We have written this small book for those who only start
getting acquainted with the world of PostgreSQL. From this
book, you will learn:
I    PostgreSQL — what is it all about? . . . . . . . . . . . . . . . . . . . . 3
II   What’s new in PostgreSQL 15 . . . . . . . . . . . . . . . . . . . . . . . . . 15
III  Installation on Linux and Windows . . . . . . . . . . . . . . . . . . 23
IV   Connecting to a server, writing SQL queries,
     and using transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
V    Learning the SQL language on a demo database . . . . 59
VI   Using PostgreSQL with your application . . . . . . . . . . . . . . 87
VII Minimal server setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
VIII About a useful pgAdmin application . . . . . . . . . . . . . . . . 109
IX   Advanced features:
     full-text search, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
     JSON format, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
     foreign data wrappers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
X    Education and certification opportunities . . . . . . . . . . 145
XI   Keeping up with all updates . . . . . . . . . . . . . . . . . . . . . . . . . 167
XII About the Postgres Professional company . . . . . . . . . . 171
We hope that our book will make your first experience with
PostgreSQL more pleasant and help you blend into the Post-
greSQL community. Good luck!
I About PostgreSQL
PostgreSQL is the most feature-rich free open-source data-
base system. Originally developed in the academic environ-
ment, it has managed to bring together a wide developer
community through its long history. Nowadays, PostgreSQL
offers everything that most customers need, and it is actively
used all over the world to create high-load business-critical
systems.
Some History
Modern PostgreSQL originates from the POSTGRES project,
which was led by Michael Stonebraker, professor of the
University of California, Berkeley. Before this work, Michael
Stonebraker had been managing INGRES development; it was
one of the first relational database systems, and POSTGRES
appeared as the result of rethinking all the previous work
and the desire to overcome the limitations of its rigid type
system.
The project was started in 1985, and by 1988 a number of
scientific articles had been published that described the data
model, POSTQUEL query language (SQL was not an accepted
standard at the time), and data storage structure.
4   POSTGRES is sometimes considered to be a so-called post-
i   relational database system. The relational model had always
    been criticized for its restrictions, even though they were
    the flip side of its strictness and simplicity. As computer
    technologies were spreading in all spheres of life, new types
    of applications started to appear, and databases had to
    support custom data types and such features as inheritance
    or creating and managing complex objects.
    The first version of this database system appeared in 1989.
    It was being improved and enhanced for several years, but
    in 1993, when version 4.2 was released, the project was shut
    down. However, despite its official cancellation, UC Berkeley
    alumni Andrew Yu and Jolly Chen revived the project and
    resumed its development in 1994, taking advantage of its
    liberal BSD license and open source. They replaced POSTQUEL
    query language with SQL, which had become a generally
    accepted standard by that time. The project was renamed
    to Postgres95.
    In 1996, it became obvious that the Postgres95 name would
    not stand the test of time, and a new name was selected:
    PostgreSQL. This name reflects the connection both with the
    original POSTGRES project and the SQL adoption. This name
    may be quite hard to articulate, but nevertheless, we should
    pronounce it as “Post-Gres-Q-L” or simply “postgres,” but not
    as “postgre.”
    The first PostgreSQL release had version 6.0, keeping the
    original numbering scheme. The project grew, and its man-
    agement was taken over by at first a small group of active
    users and developers, which was named PostgreSQL Global
    Development Group.
Development                                                     5
                                                                i
All the main decisions about developing and releasing new
PostgreSQL versions are taken by the Core team, which
consists of seven people at the moment.
Apart from developers who contribute to the project from
time to time, there is a group of main developers who have
made significant contributions to PostgreSQL. They are called
major contributors. There is also a group of committers,
who have the write access to the source code repository.
Group members change over time, new developers join the
community, others leave the project. The current list of
developers is published on the PostgreSQL’s official website:
postgresql.org/community/contributors.
The contribution of Russian developers into PostgreSQL is
compelling. This is arguably the largest global open-source
project with such a vast Russian representation.
Vadim Mikheev, a software programmer from Krasnoyarsk
who used to be a member of the Core team, played an
important role in PostgreSQL evolution and development. He
created such key core features as multi-version concurrency
control (MVCC), vacuum, write-ahead log (WAL), subqueries,
triggers. Vadim is not involved with the project anymore.
In 2015, Oleg Bartunov, a professional astronomer and
research scientist at Sternberg Astronomical Institute of
Lomonosov Moscow State University, teamed up with Teodor
Sigaev and Alexander Korotkov to start the Postgres Profes-
sional company, which is now the main talent foundry in
Russia when it comes to database system development.
The main areas of their contribution are PostgreSQL local-
ization (national encodings and Unicode support), full-text
6   search, working with arrays and semi-structured data (hstore,
i   json, jsonb), new index methods (GiST, SP-GiST, GIN and RUM,
    Bloom). They have also created a lot of popular extensions.
    PostgreSQL release cycle usually takes about a year. In
    this timeframe, the community receives patches with bug
    fixes, updates, and new features from everyone willing to
    contribute. Traditionally, all patches are discussed in the
    pgsql-hackers mailing list. If the community finds the idea
    useful, its implementation is correct, and the code passes
    a mandatory code review by other developers, the patch is
    included into the next release.
    At some point (usually in spring, about half a year before the
    release), code stabilization is announced: all new features
    get postponed till the next version, only bug fixes and
    improvements for the already included patches are accepted.
    Within the release cycle, beta versions appear. Closer to the
    end of the release cycle a release candidate is built, and soon
    a new major version of PostgreSQL is released.
    Major versions used to be defined by two numbers, but in
    2017 it was decided to start using a single number. Thus,
    version 9.6 was followed by PostgreSQL 10, while the latest
    available version is PostgreSQL 15, which was released in
    October 2022.
    As the new version is being prepared, developers can find and
    fix bugs in it. The most critical fixes are backported to the
    previous versions. The community usually releases updates
    quarterly; these minor versions accumulate such fixes. For
    example, version 14.5 contains bug fixes for the 14.4 release,
    while version 15.2 provides fixes for PostgreSQL 15.1.
Support                                                          7
                                                                 i
PostgreSQL Global Development Group supports major re-
leases for five years. Both support and development are
managed through mailing lists. A correctly filed bug report
has all the chances to be addressed very fast: bug fixes can
be released as fast as 24 hours.
Apart from the community support, 24x7 commercial sup-
port for PostgreSQL is also provided by a number of compa-
nies in different countries, including Postgres Professional
(www.postgrespro.com).
Current State
PostgreSQL is one of the most popular databases. Based
on the solid foundation of academic development, over
several decades PostgreSQL has evolved into an enterprise-
level product that is now a real alternative to commercial
databases. You can see it for yourself by looking at the
key features of PostgreSQL 15, which is the latest released
version right now.
Reliability and Stability
Reliability is especially important in enterprise-level appli-
cations that handle business-critical data. For this purpose,
PostgreSQL provides support for hot standby servers, point-
in-time recovery, different types of replication (synchronous,
asynchronous, cascade).
8   Security
i
    PostgreSQL supports secure SSL connections and provides
    various authentication methods, such as password authen-
    tication (including SCRAM), client certificates, and external
    authentication services (LDAP, RADIUS, PAM, Kerberos).
    For user management and database access control, the
    following features are provided:
    •   creating and managing new users and group roles
    •   role- and group-based access control to database objects
    •   row-level and column-level security
    •   SELinux support via a built-in SE-PostgreSQL functionality
        (Mandatory Access Control)
    Russian Federal Service for Technical and Export Control has
    certified a custom PostgreSQL version released by Postgres
    Professional for use in data processing systems for personal
    data and classified information.
    Conformance to the SQL Standard
    As the ANSI SQL standard is evolving, its support is constantly
    being added to PostgreSQL. This is true for all versions of the
    standard, from SQL-92 to the most recent SQL:2016, which
    standardizes JSON support. Much of this functionality is
    already implemented in PostgreSQL 15.
    In general, PostgreSQL provides a high rate of conformance
    to the SQL standard, supporting 170 out of 177 mandatory
    features and many optional ones.
Transaction Support                                                 9
                                                                    i
PostgreSQL provides full support for ACID properties and
efficient transaction isolation based on the multi-version
concurrency control (MVCC). This method allows us to avoid
locking in all cases except for concurrent updates of the same
row by different processes. Reading transactions never block
writing ones, and writing never blocks reading.
This is true even for the serializable isolation level, which is
the strictest one. Using an innovative Serializable Snapshot
Isolation system, this level ensures that there are no serializa-
tion anomalies and guarantees that concurrent transaction
execution produces the same result as sequential one.
For Application Developers
Application developers get a rich toolset for creating appli-
cations of any type:
•   Support for various server programming languages: built-
    in PL/pgSQL (which is closely integrated with SQL), C for
    performance-critical tasks, Perl, Python, Tcl, as well as
    JavaScript, Java, etc.
•   APIs to access the database from applications written in
    virtually any language, including the standard ODBC and
    JDBC APIs.
•   A rich set of database objects that allow you to efficiently
    implement the logic of any complexity on the server
    side: tables and indexes, sequences, integrity constraints,
    views and materialized views, partitioning, subqueries
    and WITH-queries (including recursive ones), aggregate
    and window functions, stored functions, triggers, etc.
10   •   A flexible full-text search system that supports a variety of
 i       languages, extended with efficient index access methods.
     •   Semi-structured data typical of NoSQL: hstore (storage of
         key–value pairs), xml, json (represented as text or in a
         more robust jsonb binary format).
     •   Foreign Data Wrappers. This feature allows adding new
         data sources as external tables by the SQL/MED standard.
         You can use any major database as an external data
         source. PostgreSQL provides full support for foreign data,
         including write access and distributed query execution.
     Scalability and Performance
     PostgreSQL takes advantage of the modern multi-core CPU
     architecture. Its performance grows almost linearly as the
     number of cores increases.
     PostgreSQL can parallelize queries and some commands
     (such as index creation and vacuuming). In this mode, reads
     and joins are performed by several concurrent processes. JIT-
     compilation of queries can speed up operations thanks to
     better use of hardware resources. Each PostgreSQL version
     adds new parallelization features.
     Horizontal scaling can rely on both physical and logical
     replication. It allows you to build PostgreSQL-based clus-
     ters to achieve high performance, fault tolerance, and geo-
     distribution. Some examples of such systems are Citus (Citus-
     data), Postgres-BDR (2ndQuadrant), Multimaster (Postgres
     Professional), Patroni (Zalando).
Query Planner                                                     11
                                                                  i
PostgreSQL relies on a cost-based query planner. Using
the collected statistics and taking into account both disk
operations and CPU time in its mathematical models, the
planner can optimize even the most complex queries. It can
use all access paths and join methods available in state-of-
the-art commercial database systems.
Indexing
PostgreSQL provides various types of indexes. Apart from tra-
ditional B-trees, you can use many other access methods.
•   Hash,
    a hash-based index. Unlike B-trees, such indexes work
    only for equality checks, but in some cases they can prove
    to be more efficient and compact.
•   GiST,
    a generalized balanced search tree. This access method
    is used for the data that cannot be ordered. For example,
    R-trees that are used to index points on a plane and can
    serve to implement fast k-nearest neighbor (k-NN) search,
    or indexing overlapping intervals.
•   SP-GiST,
    a generalized non-balanced tree based on dividing the
    search space into non-intersecting nested partitions. For
    example, quad-trees for spatial data and radix trees for
    text strings.
•   GIN,
    a generalized inverted index used for compound multi-
    element values. It is mainly applied in full-text search to
12       find documents that contain the words used in the search
 i       query. Another example is search for elements in data
         arrays.
     •   RUM,
         an enhancement of the GIN method for full-text search.
         Available as an extension, this index type can speed
         up phrase search and return the results in the order of
         relevance without any additional computations.
     •   BRIN,
         a compact structure that provides a trade-off between the
         index size and search efficiency. Such index is useful for
         huge clustered tables.
     •   Bloom,
         an index based on the Bloom filter. Having a compact rep-
         resentation, this index can quickly filter out non-matching
         tuples, but the remaining ones have to be re-checked.
     Many index types can be built upon both a single column
     and multiple columns. Regardless of the type, you can
     build indexes not only on columns, but also on arbitrary
     expressions. It is also possible to create partial indexes for
     specific sets of rows. Covering indexes can speed up queries,
     since all the required data is retrieved from the index itself,
     avoiding heap access.
     The planner can use a bitmap scan, which allows combining
     several indexes together for faster access.
     Cross-Platform Support
     PostgreSQL runs both on Unix operating systems (including
     server and client Linux distributions, FreeBSD, Solaris, and
     macOS) and on Windows systems.
Its portable open-source C code allows building PostgreSQL     13
on a variety of platforms, even if there is no package sup-    i
ported by the community.
Extensibility
One of the main advantages of PostgreSQL architecture is
extensibility. Without changing the core system code, users
can add various features, such as:
•   data types
•   functions and operators to support new data types
•   index and table access methods
•   server programming languages
•   foreign data wrappers
•   loadable extensions
Full-fledged support of extensions enables you to develop
new features of any complexity that can be installed on
demand without changing the PostgreSQL core. For example,
the following complex systems are built as extensions:
•   CitusDB,
    which implements massively parallel query execution and
    data distribution between different PostgreSQL instances
    (sharding).
•   PostGIS,
    one of the most popular and powerful geoinformation
    data processing systems.
14   •   TimescaleDB,
 i       which provides support for time-series data, including
         special partitioning and sharding.
     The standard PostgreSQL 15 distribution alone includes
     about fifty extensions that have proved to be useful and
     reliable.
     Availability
     A liberal PostgreSQL license, which is similar to BSD and
     MIT licenses, allows unrestricted use of PostgreSQL; you may
     also modify PostgreSQL code without any limitations and
     integrate it into other products, including commercial and
     closed-source software.
     Independence
     PostgreSQL does not belong to any company; it is developed
     by the international community, which includes developers
     from all over the world. It means that systems using Post-
     greSQL do not depend on a particular vendor, thus keeping
     the investment safe in any circumstances.
II What’s New in
   PostgreSQL 15
If you are familiar with the previous versions of PostgreSQL,
this chapter can give you a sense of what has changed over
the past year. It mentions only some of the updates; for the
full list of changes, see the Release Notes: postgrespro.com/
docs/postgresql/15/release-15.
SQL Commands
After an unsuccessful attempt made as early as PostgreSQL
11, the MERGE command is finally implemented. This com-
mand is defined by the SQL standard. It is more flexible and
sometimes also more efficient than the previously available
INSERT ... ON CONFLICT command.
Should NULL values be considered distinct in integrity con-
straints? There used to be only one correct answer to this
question (negative), but now the NULLS [NOT] DISTINCT
clause allows choosing the desired behavior.
You can now provide a list of columns in the ON DELETE SET
NULL clause for composite foreign keys: instead of resetting
values in all the columns, simply specify the exact fields to
be affected when the parent entry is deleted.
16    The COPY command can now take the table header in the first
 ii   data row as input, as well as include it in the output.
      Functions
      The unnest and range_agg functions have been added for
      multidimensional data types introduced in version 14.
      New regular expression functions have been added to facili-
      tate migration: regexp_like, regexp_count, regexp_instr,
      and regexp_substr.
      Functions pg_size_pretty and pg_size_bytes have been
      improved to support petabyte units. Prefixes for larger units
      have been reserved for future enhancements.
      Partitioning
      If a trigger on a partitioned table gets renamed, triggers on
      table partitions are renamed automatically.
      The CLUSTER command is now supported for partitioned
      tables.
      Planning is now performed faster if only a few partitions are
      relevant to the query.
Write-Ahead Log                                                17
                                                               ii
LZ4 and ZStd compression algorithms for full page images
have been added. As compared to the standard PGLZ algo-
rithm, LZ4 usually consumes less resources but is just as
efficient, while ZStd is more CPU-intensive but shows better
compression results. The LZ4 algorithm is also supported by
the pg_receivewal utility.
WAL records for recovery can now be prefetched, which may
speed up server restart after a failure, backup restore, and
application of WAL records during replication.
Continuous archiving now allows using custom modules
instead of an OS command. Developers of backup solutions
can now rely on this feature.
A new feature that is not directly available to end users
is the ability to create custom resource managers. This
functionality is important for developers of both index and
table access methods. Incidentally, table access methods
can now be altered using the ALTER TABLE ... SET METHOD
command (but sadly, there is no alternative available yet).
Added the pg_walinspect extension, which enables viewing
WAL records via an SQL query (in addition to the already
available pg_waldump utility).
Logical Replication
Logical replication has been significantly improved. There
is hope that such long-awaited features as replication of
sequences and DDL commands will be soon implemented
as well.
18    When creating a publication, you can now filter rows and
 ii   columns to be replicated, as well as replicate all tables that
      belong to a particular schema (FOR ALL TABLES IN SCHEMA).
      Added support for prepared transactions to logical repli-
      cation.
      Subscriber processes are now executed with the privileges
      of the subscription owner; superuser rights are not required
      anymore.
      Subscriptions can now be stopped on conflict. Previously, the
      WAL receiver process was restarted every second; now the
      conflicting transaction can be skipped (ALTER SUBSCRIPTION
      ... SKIP).
      Backup
      The good-old pg_basebackup utility was refactored.
      You can now choose the backup location: it can be created
      either on the client or on the server. Following the principle
      of extensibility, you can create custom backup targets; for
      example, the new basebackup_to_shell extension passes
      the backup to an OS command.
      Data compression on the server side can now be performed
      by gzip, LZ4, and ZStd algorithms, which can be useful if the
      network throughput is low.
Security                                                           19
                                                                   ii
The CREATE privilege has been revoked from the public
schema; this privilege used to be inherited by all users from
the public pseudorole. The public schema is now owned
by a new pseudorole called pg_database_owner, which im-
plicitly includes the owner of the current database.
PostgreSQL now provides more opportunities for delegating
system administration tasks to non-privileged users. The
right to perform the CHECKPOINT command is granted to the
new role called pg_checkpointer. The access to statistics on
backend memory contexts (pg_backend_memory_contexts
and pg_shmem_allocations catalog views) is granted to
the pg_read_all_stats role. You can also control access
to configuration parameters (GRANT SET and GRANT ALTER
SYSTEM).
It is now possible to create views with the caller’s privileges:
the user must have the right to access these views’ objects
in this case.
Regular users can no longer manage their own role mem-
bership (that is, use the ADMIN OPTION) by default to add or
remove members of its own role).
Monitoring
Cumulative statistics is now stored in shared memory; a
separate statistics collector process is not used anymore, and
there is no need to mount tmpfs for the files that accumulate
statistics.
20    New wait events for archive commands have appeared:
 ii   ArchiveCommand, ArchiveCleanupCommand, RestoreCom-
      mand, RecoveryEndCommand.
      Server log can now be written in the JSON format; it is much
      better suited for programmatic parsing than a plain-text
      output.
      The pg_log_backend_memory_contexts function now logs
      memory contexts of auxiliary processes in addition to back-
      end memory contexts.
      Postgres_fdw
      The postgres_fdw foreign data wrapper, which is the corner-
      stone of the future built-in sharding, can now pass CASE ex-
      pressions to the foreign server and set the application_name
      parameter to an arbitrary value.
      Transactions started on foreign servers can now be commit-
      ted in parallel, which may boost performance. Unfortunately,
      we are still far from having genuine distributed transac-
      tions.
      Vacuuming and Freezing
      Vacuuming and freezing have been significantly refactored.
      Now VACUUM can partially process the pages that cannot be
      locked instead of skipping them altogether. As a result, the
      relfrozenxid threshold can also be advanced by routine
      vacuuming, so aggressive vacuuming can be performed less
      frequently.
The VACUUM VERBOSE command now displays more useful             21
information, which was previously included into the server      ii
log only.
Optimizations
Sorting now utilizes memory more efficiently, which in-
creases the chances to avoid disk access; sorting of a single
column is now performed faster.
Encoding validation of UTF-8 text strings has been sped up by
processing several bytes at a time.
Data transfer from workers to the leader process in paral-
lelized queries has been accelerated. The parallel_tuple_cost
value has not been changed yet.
Parallel aggregation has been already available for a long
time, and now the SELECT DISTINCT command can also be
executed in parallel.
New support functions added for the planner improve cardi-
nality estimation for conditions with the starts_with func-
tion or the ˆ@ operator.
The default hash_mem_multiplier value has been increased
to 2.0 (it used to be 1.0). It means that hash tables can now
use twice as much memory as specified in the work_mem
parameter: it’s good for queries, but you have to keep an eye
on memory consumption.
You can manage the size of the working table of a recursive
query using the recursive_worktable_factor parameter.
22    Miscellaneous
 ii
      The ICU locale provider, which was added as early as version
      10, can now be used as the default one for databases
      and clusters; collation versions are recorded in the system
      catalog.
      Python 2 is not supported anymore, both plpython2u and
      plpythonu languages were removed.
      Documentation
      The documentation was extended with a new chapter on
      hash indexes: postgrespro.com/docs/postgresql/15/hash-
      index.
III Installation
    and Quick Start
What is required to get started with PostgreSQL? In this
chapter, we’ll learn how to install PostgreSQL and manage
the corresponding service, and in the next one we’ll continue
our ramp-up by creating a simple database and trying out
some basic SQL queries.
We are going to use a regular (often called “vanilla”) distribu-
tion of PostgreSQL 15. Depending on your operating system,
the procedure of installing and setting up PostgreSQL will
differ:
•   If you are using Windows, read on.
•   To set up PostgreSQL on Linux-based Debian or Ubuntu
    systems, go to p. 28.
For other operating systems, you can view installation in-
structions online: www.postgresql.org/download.
You can also use Postgres Pro Standard 15: it is fully com-
patible with vanilla PostgreSQL, includes some additional
features developed by Postgres Professional, and is free when
used for trial or educational purposes. Check out installation
instructions at postgrespro.com/products/download in this
case.
24    Windows
iii
      Installation
      Download the PostgreSQL installer, launch it, and select the
      installation language: postgrespro.com/windows.
      The installer provides a conventional wizard interface: you
      can simply keep clicking the “Next” button if you are fine with
      the default options. Let’s go over the main steps.
      Choose components (keep the current selection if you are
      uncertain what to choose):
      Then you have to specify PostgreSQL installation directory.
      By default, PostgreSQL server is installed into C:\Program
      Files\PostgreSQL\15.
You can also specify the location of the data directory.       25
                                                               iii
This directory will hold all the information stored in your
database system, so make sure you have enough disk space
if you plan to keep a lot of data.
If you are going to store your data in a language other than
English, make sure to choose the corresponding locale (or
leave the “Default” option if your Windows locale settings
are configured appropriately).
Enter and confirm the password for the postgres database
user. You should also select the “Set up environment vari-
ables” checkbox to connect to the PostgreSQL server on
behalf of the current OS user.
You can leave the default settings in all the other fields.
26
iii
      If you are planning to install PostgreSQL for training pur-
      poses only, you can select the “Use the default settings”
      option for the database system to take up less RAM.
      Managing the Service and the Main Files
      When PostgreSQL is installed, the “postgresql-15” service is
      registered in your system. This service is launched auto-
      matically at the system startup under the Network Service
      account. If required, you can change the service settings
      using the standard Windows options.
      To temporarily stop the service, run the “Stop Server” program
      from the Start menu subfolder that you have selected at
      installation time.
                                                                27
                                                                iii
To start the service, run the “Start Server” program from the
same folder.
If an error occurs at the service startup, you can view
the server log to find out its cause. The log file is lo-
cated in the log subdirectory of the database directory
chosen at installation time (you can typically find it at
C:\Program Files\PostgreSQL\15\data\log). Logging is
regularly switched to a new file. You can find the required
file either by the last modified date or by the filename that
includes the date and time of the switchover to this file.
There are several important configuration files that define
server settings. They are located in the database directory.
You do not have to modify them to get started with Post-
greSQL, but you’ll definitely need them in real work. Take a
look into these files, they are fully documented:
28    •   postgresql.conf is the main configuration file that con-
iii       tains server parameters.
      •   pg_hba.conf defines access rules. For security reasons,
          the default configuration only allows access from the
          local system, and it must be confirmed by a password.
      Now we are ready to connect to the database and try out
      some commands and SQL queries. Go to the “Trying SQL”
      chapter on p. 33.
      Debian and Ubuntu
      Installation
      If you are using Linux, you need to add PGDG (PostgreSQL
      Global Development Group) package repository. At the mo-
      ment, the supported Debian versions are 10 “Buster,” 11 “Bulls-
      eye,” and 12 “Bookworm.” The currently supported Ubuntu
      versions are 18.04 “Bionic,” 20.04 “Focal,” 22.04 “Jammy,” and
      22.10 “Kinetic”.
      Run the following commands in the console window:
      $ sudo apt-get install lsb-release
      $ sudo sh -c 'echo "deb \
      http://apt.postgresql.org/pub/repos/apt/ \
      $(lsb_release-cs)-pgdg main" \
      > /etc/apt/sources.list.d/pgdg.list'
      $ wget --quiet -O - \
      https://postgresql.org/media/keys/ACCC4CF8.asc \
      | sudo apt-key add -
      Once the repository is added, let’s update the list of pack-
      ages:
$ sudo apt-get update                                            29
                                                                 iii
Before starting the installation, check localization settings:
$ locale
If you plan to process non-English data, the LC_CTYPE and
LC_COLLATE variables may have to be configured. For exam-
ple, it makes sense to set these variables to “fr_FR.UTF8” for
the French language, even though the “en_US.UTF8” may do
too:
$ export LC_CTYPE=fr_FR.UTF8
$ export LC_COLLATE=fr_FR.UTF8
You should also make sure that the operating system has the
required locale installed:
$ locale -a | grep fr_FR
fr_FR.utf8
If it’s not the case, generate the locale, as follows:
$ sudo locale-gen fr_FR.utf8
Now we can start the installation:
$ sudo apt-get install postgresql-15
It was the final step; once the installation command com-
pletes, PostgreSQL will be installed and launched. To check
that the server is ready to use, run:
$ sudo -u postgres psql -c 'select now()'
If all went well, the current time is returned.
30    Managing the Service and the Main Files
iii
      When PostgreSQL is installed, a special postgres user is
      created on your system. All the server processes work on
      behalf of this user, and all the database files belong to this
      user as well. PostgreSQL will be started automatically at the
      operating system boot. It’s not a problem with the default
      settings: if you are not working with the database server, it
      consumes very little of system resources. If you still decide
      to turn off the autostart, run:
      $ sudo systemctl disable postgresql
      To temporarily stop the database server service, enter:
      $ sudo systemctl stop postgresql
      You can launch the server service as follows:
      $ sudo systemctl start postgresql
      You can also check the current state of the service:
      $ sudo systemctl status postgresql
      If the service cannot start, use the server log to troubleshoot
      this issue. Take a closer look at the latest log entries in
      /var/log/postgresql/postgresql-15-main.log.
      All information stored in the database is located in the
      /var/lib/postgresql/15/main/ directory. If you are going
      to keep a lot of data, make sure that you have enough disk
      space.
Server settings are defined by several configuration files.        31
There’s no need to edit all these files to get started, but it’s   iii
worth checking them out since you’ll definitely need them in
the future:
•   /etc/postgresql/15/main/postgresql.conf is the
    main configuration file that contains server parameters.
•   /etc/postgresql/15/main/pg_hba.conf file defines ac-
    cess settings. For security reasons, the default configura-
    tion only allows access from the local system on behalf of
    the database user that has the same name as the current
    OS user.
Now it’s time to connect to the database and try out SQL.
IV Trying SQL
Connecting via psql
To connect to the database server and start executing com-
mands, you need to have a client application. In the “Post-
greSQL for Applications” chapter, we will talk about sending
queries from applications written in different programming
languages. And here we’ll explain how to work with the psql
client from the command line in the interactive mode.
Unfortunately, many people are not very fond of the com-
mand line nowadays. Yet it is really worth mastering.
First of all, psql is a standard client application included into
all PostgreSQL packages, so it’s always available. No doubt,
it’s good to have a customized environment, but there is no
point in getting lost on an unknown system.
Secondly, psql is really convenient for everyday DBA tasks,
writing small queries, and automating processes. For exam-
ple, you can use it to periodically deploy application code
updates on your database server. The psql client provides
its own commands that can help you find your way around
database objects and display the data stored in tables in a
convenient format.
However, if you are used to working in graphical user inter-
faces, try pgAdmin (we’ll get back to it later) or other simi-
34    lar products: wiki.postgresql.org/wiki/Community_Guide_to_
 iv   PostgreSQL_GUI_Tools.
      To start psql on a Linux system, run this command:
      $ sudo -u postgres psql
                                        On Windows, open the
                                        Start menu and launch the
                                        “SQL Shell (psql)” program.
                                        When prompted, enter the
                                        password for the postgres
                                        user that you set when in-
                                        stalling PostgreSQL.
                                        Windows users may run
                                        into encoding issues when
                                        viewing non-Latin charac-
                                        ters in the terminal. If that
                                        is the case, make sure that
                                        a TrueType font is selected
                                        in the properties of the ter-
      minal window (typically, “Lucida Console” or “Consolas”).
      As a result, you should see the same prompt on both oper-
      ating systems: postgres=#. Here “postgres” is the name of
      the database to which you are connected right now. A single
      PostgreSQL server can serve several databases, but you can
      work only with one of them at a time.
      Now let’s try out some commands. Enter only the part printed
      in bold; the prompt and the system response are provided
      here solely for your convenience.
Databases                                                         35
                                                                  iv
Let’s create a new database called test:
postgres=# CREATE DATABASE test;
CREATE DATABASE
Don’t forget to finish each command with a semicolon:
PostgreSQL expects you to continue typing until you enter
this symbol (so you can split the command between multiple
lines).
Now let’s connect to the created database:
postgres=# \c test
You are now connected to database "test" as user
"postgres".
test=#
As you can see, the command prompt has changed to
test=#.
The command that we’ve just entered does not look like SQL,
as it starts with a backslash. This is a convention for special
commands that can only be run in psql (so if you are using
pgAdmin or some other GUI tool, skip all commands starting
with a backslash or try to find an equivalent).
There are quite a few psql commands, and we’ll use some of
them a bit later. To get the full list of psql commands right
now, you can run:
test=# \?
Since the reference information is quite bulky, it will be
displayed in a pager program of your operating system, which
is usually more or less.
36    Tables
 iv
      Relational database management systems present data as
      tables. The structure of the table is defined by its columns;
      the data itself is stored in table rows. The data is not ordered,
      so rows are not necessarily stored in the same order they
      were added into the table.
      For each column, a data type is defined. All the values in
      the corresponding row fields must belong to this type. You
      can use multiple built-in data types provided by PostgreSQL
      (postgrespro.com/doc/datatype) or add your own custom
      types, but here we’ll cover just a few main ones:
      •   integer
      •   text
      •   boolean, which is a logical data type taking true or false
          values
      Apart from regular values defined by the data type, a field
      can be NULL. It can be interpreted as “the value is unknown”
      or “the value is not set.”
      Let’s create a table of university courses:
      test=# CREATE TABLE courses(
      test(#   c_no text PRIMARY KEY,
      test(#   title text,
      test(#   hours integer
      test(# );
      CREATE TABLE
      Note that the psql command prompt has changed: it is
      a hint that the command continues on the new line. For
convenience, we will not repeat the prompt on each line in      37
the examples that follow.                                       iv
The above command creates the courses table with three
columns: c_no specifies the course number represented as a
text string, title provides the course title, and hours lists
an integer number of lecture hours.
Apart from columns and data types, we can define integrity
constraints that will be checked automatically: PostgreSQL
will not allow invalid data in the database. In this example,
we have added the PRIMARY KEY constraint for the c_no
column. It means that all the values in this column must
be unique, and NULLs are not allowed. Such a column
can be used to distinguish one table row from another.
The postgrespro.com/doc/ddl-constraints page lists all the
available constraints.
You can find the exact syntax of the CREATE TABLE command
in the documentation, or view command-line help right in
psql:
test=# \help CREATE TABLE
Such reference information is available for each SQL com-
mand. To get the full list of SQL commands, run \help
without arguments.
Filling Tables with Data
Let’s insert some rows into the created table:
test=# INSERT INTO courses(c_no, title, hours)
VALUES ('CS301', 'Databases', 30),
       ('CS305', 'Networks', 60);
INSERT 0 2
38    If you need to perform a bulk data upload from an external
 iv   source, the INSERT command is not the best choice. Instead,
      you can use the COPY command specifically designed for this
      purpose: postgrespro.com/doc/sql-copy.
      We’ll need two more tables for further examples: students
      and exams. For each student, we are going to store their
      name and the year of admission (start year). The student ID
      card number will serve as the student’s identifier.
      test=# CREATE TABLE students(
         s_id integer PRIMARY KEY,
         name text,
         start_year integer
      );
      CREATE TABLE
      test=# INSERT INTO students(s_id, name, start_year)
      VALUES (1451, 'Anna', 2014),
              (1432, 'Victor', 2014),
              (1556, 'Nina', 2015);
      INSERT 0 3
      The exams table contains all the scores received by students
      in the corresponding course. Thus, students and courses are
      connected by the many-to-many relationship: each student
      can take exams in multiple courses, and each exam can be
      taken by multiple students.
      Each table row is uniquely identified by the combination
      of a student ID and a course number. Such an integrity
      constraint pertaining to several columns at once is defined
      by the CONSTRAINT clause:
      test=# CREATE TABLE exams(
         s_id integer REFERENCES students(s_id),
         c_no text REFERENCES courses(c_no),
         score integer,
         CONSTRAINT pk PRIMARY KEY(s_id, c_no)
      );
CREATE TABLE                                                       39
                                                                   iv
Besides, the REFERENCES clause here defines two referential
integrity checks called foreign keys. Such keys show that the
values of one table reference rows of another table.
After any action performed on the database, PostgreSQL
checks that all the s_id identifiers in the exams table corre-
spond to real students (that is, entries in the students table),
while c_no course numbers correspond to real courses. Thus,
it is impossible to assign a score on a non-existing subject
or to a non-existent student, regardless of the user actions
or possible application errors.
Let’s assign several scores to our students:
test=# INSERT INTO exams(s_id, c_no, score)
VALUES (1451, 'CS301', 5),
       (1556, 'CS301', 5),
       (1451, 'CS305', 5),
       (1432, 'CS305', 4);
INSERT 0 4
Data Retrieval
Simple Queries
To read data from tables, use the SQL operator SELECT. For
example, let’s display two columns of the courses table. The
AS clause allows you to rename the column if required:
test=# SELECT title AS course_title, hours
FROM courses;
40     course_title | hours
 iv   --------------+-------
       Databases    |    30
       Networks     |    60
      (2 rows)
      The asterisk * displays all the columns:
      test=# SELECT * FROM courses;
       c_no |     title    | hours
      -------+-------------+-------
       CS301 | Databases   |    30
       CS305 | Networks    |    60
      (2 rows)
      In production applications, it is recommended to explicitly
      specify only those columns that are really needed: then the
      query is performed more efficiently, and the result does not
      depend on new columns that may appear. But in interactive
      queries you can simply use an asterisk.
      The result can contain several rows with the same data. Even
      if all rows in the original table are different, the data can
      appear duplicated if not all the columns are displayed:
      test=# SELECT start_year FROM students;
       start_year
      ------------
             2014
             2014
             2015
      (3 rows)
      To select all different start years, specify the DISTINCT
      keyword after SELECT:
      test=# SELECT DISTINCT start_year FROM students;
 start_year                                                         41
------------                                                        iv
       2014
       2015
(2 rows)
For details, see the documentation: postgrespro.com/doc/
sql-select#SQL-DISTINCT.
In general, you can use any expressions after the SELECT
operator. And if you omit the FROM clause, the query will
return a single row. For example:
test=# SELECT 2+2 AS result;
 result
--------
      4
(1 row)
When you select some data from a table, it is usually required
to return only those rows that satisfy a certain condition. This
filtering condition is specified in the WHERE clause:
test=# SELECT * FROM courses WHERE hours > 45;
 c_no | title     | hours
-------+----------+-------
 CS305 | Networks |    60
(1 row)
The condition must be of a logical type. For example, it can
contain operators =, <> (or !=), >, >=, <, <=, as well as combine
simple conditions using logical operations AND, OR, NOT, and
parenthesis (like in regular programming languages).
Handling NULLs is a bit more subtle. The result will contain
only those rows for which the filtering condition is true; if
the condition is false or undefined, the row is excluded.
42    Remember:
 iv
      •   The result of comparing something to NULL is undefined.
      •   The result of logical operations on NULLs is usually
          undefined (exceptions: true OR NULL = true, false AND
          NULL = false).
      •   To check whether the value is undefined, the following op-
          erators are used: IS NULL (IS NOT NULL) and IS DISTINCT
          FROM (IS NOT DISTINCT FROM).
      The coalesce expression is often used to replace NULL values
      with something else, such as an empty string for text types
      or zero for numeric types.
      For more details, see the documentation: postgrespro.com/
      doc/functions-comparison.
      Joins
      A well-designed database should not contain redundant data.
      For example, the exams table must not contain student
      names, as this information can be found in another table
      by the number of the student ID card.
      For this reason, to get all the required values in a query, it is
      often necessary to join the data of several tables, specifying
      their names in the FROM clause:
      test=# SELECT * FROM courses, exams;
       c_no |     title    | hours | s_id | c_no | score
      -------+-------------+-------+------+-------+-------
       CS301 | Databases   |    30 | 1451 | CS301 |     5
       CS305 | Networks    |    60 | 1451 | CS301 |     5
       CS301 | Databases   |    30 | 1556 | CS301 |     5
 CS305 |   Networks     |     60   |   1556   |   CS301   |   5    43
 CS301 |   Databases    |     30   |   1451   |   CS305   |   5    iv
 CS305 |   Networks     |     60   |   1451   |   CS305   |   5
 CS301 |   Databases    |     30   |   1432   |   CS305   |   4
 CS305 |   Networks     |     60   |   1432   |   CS305   |   4
(8 rows)
This result is called the direct or Cartesian product of tables:
each row of one table is appended to each row of the other
table.
As a rule, you can get a more useful and informative result
if you specify the join condition in the WHERE clause. Let’s
match the courses to the corresponding exams to get all the
scores for all the courses:
test=# SELECT courses.title, exams.s_id, exams.score
FROM courses, exams
WHERE courses.c_no = exams.c_no;
    title    | s_id | score
-------------+------+-------
 Databases   | 1451 |     5
 Databases   | 1556 |     5
 Networks    | 1451 |     5
 Networks    | 1432 |     4
(4 rows)
Another way to join tables is to explicitly use the JOIN
keyword. Let’s display all the students and their scores for
the “Networks” course:
test=# SELECT students.name, exams.score
FROM students
JOIN exams
  ON students.s_id = exams.s_id
  AND exams.c_no = 'CS305';
44      name | score
 iv   --------+-------
       Anna    |    5
       Victor |     4
      (2 rows)
      From the database point of view, these queries are com-
      pletely equivalent, so you can use any approach that seems
      more natural.
      In this example, the result does not include any rows of the
      table specified on the left side of the join clause if they have
      no pair in the right table: although the condition is applied
      to the subjects, the students that did not take the exam in
      this subject are also excluded. To include all the students
      into the result, we have to use the outer join:
      test=# SELECT students.name, exams.score
      FROM students
      LEFT JOIN exams
        ON students.s_id = exams.s_id
        AND exams.c_no = 'CS305';
        name | score
      --------+-------
       Anna    |    5
       Victor |     4
       Nina    |
      (3 rows)
      Note that the rows of the left table that don’t have a counter-
      part in the right table are added to the result (that’s why the
      operation is called LEFT JOIN). The corresponding values of
      the right table are NULL in this case.
      The WHERE conditions are applied to the result of the join
      operation. Thus, if you move the subject restriction from the
      join condition to the WHERE clause, Nina will be excluded
      from the result because the corresponding exams.c_no is
      NULL:
test=# SELECT students.name, exams.score                           45
FROM students                                                      iv
LEFT JOIN exams ON students.s_id = exams.s_id
WHERE exams.c_no = 'CS305';
  name | score
--------+-------
 Anna    |    5
 Victor |     4
(2 rows)
Don’t be afraid of joins. It is a common operation for database
management systems, and PostgreSQL has a whole range of
efficient mechanisms to perform it. Do not join data at the
application level, let the database server do the job it is sure
to do better.
For more details, see the documentation: postgrespro.com/
doc/sql-select#SQL-FROM.
Subqueries
The SELECT operation returns a table, which can be displayed
as the query result (as we have already seen) or used in
another SQL query. Such a nested SELECT command in
parentheses is called a subquery.
If a subquery returns exactly one row and one column, you
can use it as a regular scalar expression:
test=# SELECT name,
  (SELECT score
   FROM exams
   WHERE exams.s_id = students.s_id
   AND exams.c_no = 'CS305')
FROM students;
46      name | score
 iv   --------+-------
       Anna    |    5
       Victor |     4
       Nina    |
      (3 rows)
      If a scalar subquery used in the list of SELECT expressions
      does not contain any rows, NULL is returned (as in the last row
      of the result in the example above). Thus, you can expand
      scalar subqueries by replacing them with a join, but it must
      be an outer join.
      Scalar subqueries can also be used in filtering conditions.
      Let’s display all the exams taken by the students enrolled
      after 2014:
      test=# SELECT *
      FROM exams
      WHERE (SELECT start_year FROM students
             WHERE students.s_id = exams.s_id) > 2014;
       s_id | c_no | score
      ------+-------+-------
       1556 | CS301 |     5
      (1 row)
      You can also add filtering conditions to subqueries returning
      an arbitrary number of rows. SQL offers several predicates
      for this purpose. For example, IN checks whether the table
      returned by the subquery contains the specified value.
      Let’s display all the students who have any scores in the
      specified course:
      test=# SELECT name, start_year
      FROM students
      WHERE s_id IN (SELECT s_id FROM exams
                     WHERE c_no = 'CS305');
  name | start_year                                              47
--------+------------                                            iv
 Anna    |     2014
 Victor |      2014
(2 rows)
There is also the NOT IN form of this predicate, which returns
the opposite result. For example, the following query returns
the list of students who did not get any excellent scores:
test=# SELECT name, start_year
FROM students
WHERE s_id NOT IN (SELECT s_id
                   FROM exams
                   WHERE score = 5);
  name | start_year
--------+------------
 Victor |      2014
(1 rows)
Note that this query result can also include those students
who have not received any scores at all.
Another option is to use the EXISTS predicate, which checks
whether the subquery returns at least one row. With this
predicate, you can rewrite the previous query as follows:
test=# SELECT name, start_year
FROM students
WHERE NOT EXISTS (SELECT s_id
                  FROM exams
                  WHERE exams.s_id = students.s_id
                  AND score = 5);
  name | start_year
--------+------------
 Victor |      2014
(1 rows)
48    For more details, see the documentation: postgrespro.com/
 iv   doc/functions-subquery.
      In the examples above, we appended table names to column
      names to avoid ambiguity, but it is not always sufficient. For
      example, the same table can be used in the query twice, or
      the FROM clause can contain a nameless subquery instead
      of a table name. In such cases, you can specify an arbitrary
      name after the query, which is called an alias. Regular tables
      can also be assigned an alias.
      Let’s display student names and their scores for the
      “Databases” course:
      test=# SELECT s.name, ce.score
      FROM students s
      JOIN (SELECT exams.*
            FROM courses, exams
            WHERE courses.c_no = exams.c_no
            AND courses.title = 'Databases') ce
        ON s.s_id = ce.s_id;
       name | score
      ------+-------
       Anna |     5
       Nina |     5
      (2 rows)
      Here “s” is a table alias, while “ce” is a subquery alias. You
      should choose an alias that is short but comprehensive.
      The same query can also be written without subqueries:
      test=# SELECT s.name, e.score
      FROM students s, courses c, exams e
      WHERE c.c_no = e.c_no
      AND c.title = 'Databases'
      AND s.s_id = e.s_id;
Sorting                                                          49
                                                                 iv
As we already know, table data is not sorted. To return the
rows in a particular order, we can use the ORDER BY clause
with the list of sorting expressions. After each expression
(sorting key), you can specify the sort direction: ASC for
ascending (used by default), DESC for descending.
test=# SELECT * FROM exams
ORDER BY score, s_id, c_no DESC;
 s_id | c_no | score
------+-------+-------
 1432 | CS305 |     4
 1451 | CS305 |     5
 1451 | CS301 |     5
 1556 | CS301 |     5
(4 rows)
Here the rows are first sorted by the score, in ascending
order. For the same scores, the rows are further sorted by
the student ID card number, also in ascending order. If the
first two keys are the same, rows are additionally sorted by
the course number, in descending order.
It makes sense to do sorting at the end of the query, right
before returning the result; this operation is usually useless
in subqueries.
For more details, see the documentation: postgrespro.com/
doc/sql-select#SQL-ORDERBY.
Grouping
When grouping is used, the query returns a single row with
the value calculated based on several rows of data stored
50    in the original tables. Grouping typically involves aggregate
 iv   functions. For example, this is how we can display the total
      number of exams taken, the number of students who passed
      the exams, and the average score:
      test=# SELECT count(*), count(DISTINCT s_id),
      avg(score)
      FROM exams;
       count | count |        avg
      -------+-------+--------------------
           4 |     3 | 4.7500000000000000
      (1 row)
      You can get similar information by the course number if you
      provide the GROUP BY clause with grouping keys:
      test=# SELECT c_no, count(*),
      count(DISTINCT s_id), avg(score)
      FROM exams
      GROUP BY c_no;
       c_no | count | count |          avg
      -------+-------+-------+--------------------
       CS301 |     2 |     2 | 5.0000000000000000
       CS305 |     2 |     2 | 4.5000000000000000
      (2 rows)
      For the full list of aggregate functions, see postgrespro.com/
      doc/functions-aggregate.
      In queries that use grouping, you may need to filter the
      rows based on the aggregation results. You can define such
      conditions in the HAVING clause. While the WHERE conditions
      are applied before grouping (and can use the columns of
      the original tables), the HAVING conditions take effect after
      grouping (so they can also use the columns of the resulting
      table).
      Let’s select the names of the students who got more than
      one excellent score (5), in any course:
test=# SELECT students.name                                      51
FROM students, exams                                             iv
WHERE students.s_id = exams.s_id AND exams.score = 5
GROUP BY students.name
HAVING count(*) > 1;
 name
------
 Anna
(1 row)
For more details, see the documentation: postgrespro.ru/
doc/sql-select#SQL-GROUPBY.
Changing and Deleting Data
To modify data in a table, you should use the UPDATE operator,
which provides new field values for rows defined by the WHERE
clause (like for the SELECT operator).
For example, let’s double the number of lecture hours for the
“Databases” course:
test=# UPDATE courses
SET hours = hours * 2
WHERE c_no = 'CS301';
UPDATE 1
For more details, see the documentation: postgrespro.com/
doc/sql-update.
Similarly, the DELETE operator deletes the rows defined by
the WHERE clause:
test=# DELETE FROM exams WHERE score < 5;
DELETE 1
52    Transactions
 iv
      Let’s extend our database schema a little bit and distribute
      our students between groups. Each group must have a
      monitor (a student of the same group responsible for the
      students’ activities). For this purpose, let’s create the groups
      table:
      test=# CREATE TABLE groups(
         g_no text PRIMARY KEY,
         monitor integer NOT NULL REFERENCES students(s_id)
      );
      CREATE TABLE
      Here we have applied the NOT NULL constraint, which forbids
      using undefined values.
      Now we need another column in the students table: the
      group number. Luckily, we can add a new column into the
      already existing table:
      test=# ALTER TABLE students
      ADD g_no text REFERENCES groups(g_no);
      ALTER TABLE
      Using the psql command, you can always view which
      columns are defined in the table:
      test=# \d students
            Table "public.students"
         Column   | Type    | Modifiers
      ------------+---------+----------
       s_id       | integer | not null
       name       | text    |
       start_year | integer |
       g_no       | text    |
       ...
You can also get the list of all the tables available in the     53
database:                                                        iv
test=# \d
          List of relations
 Schema |   Name   | Type | Owner
--------+----------+-------+----------
 public | courses | table | postgres
 public | exams    | table | postgres
 public | groups   | table | postgres
 public | students | table | postgres
(4 rows)
Now let’s create a new group called “A-101,” move all the
students into this group, and make Anna its monitor.
Note the following subtle point. We cannot create a group
without a monitor, but neither can a student become the
monitor of the group unless they are already a member of
this group—it would make our data logically incorrect and
inconsistent. Taken separately, these two operations make
no sense: they must be performed simultaneously. A group
of operations constituting an indivisible logical unit of work
is called a transaction.
So let’s start our transaction:
test=# BEGIN;
BEGIN
Next, we need to add a new group, together with its monitor.
Naturally, we cannot remember all the students’ ID, so we’ll
use the following query right inside the command that adds
new rows:
54    test=*# INSERT INTO groups(g_no, monitor)
 iv   SELECT 'A-101', s_id
      FROM students
      WHERE name = 'Anna';
      INSERT 0 1
      The asterisk in the prompt reminds us that the transaction is
      not yet completed.
      Now let’s open a new terminal window and launch another
      psql process: this session will be running in parallel with the
      first one. To avoid confusion, we will indent the commands
      of the second session.
      Will the second session see the changes made in the first
      session?
           postgres=# \c test
           You are now connected to database "test" as user
           "postgres".
           test=# SELECT * FROM groups;
            g_no | monitor
           ------+---------
           (0 rows)
      No, since the transaction is not yet completed.
      To continue with our transaction, let’s move all students to
      the newly created group:
      test=*# UPDATE students SET g_no = 'A-101';
      UPDATE 3
      The second session still gets consistent data, which was
      already present in the database when the uncommitted
      transaction was started.
     test=# SELECT * FROM students;                            55
      s_id | name | start_year | g_no                          iv
     ------+--------+------------+------
      1451 | Anna   |       2014 |
      1432 | Victor |       2014 |
      1556 | Nina   |       2015 |
     (3 rows)
Let’s commit all our changes to complete the transaction:
test=*# COMMIT;
COMMIT
Finally, the second session receives all the changes made by
this transaction, as if they appeared all at once:
     test=# SELECT * FROM groups;
       g_no | monitor
     -------+---------
      A-101 |    1451
     (1 row)
     test=# SELECT * FROM students;
      s_id | name | start_year | g_no
     ------+--------+------------+-------
      1451 | Anna   |       2014 | A-101
      1432 | Victor |       2014 | A-101
      1556 | Nina   |       2015 | A-101
     (3 rows)
It is guaranteed that several important properties of the
database system are always observed.
First of all, any transaction is executed either completely
(like in the example above), or not at all. If at least one
56    of the commands results in an error, or we have aborted
 iv   the transaction with the ROLLBACK command, the database
      stays in the same state as before the BEGIN command. This
      property is called atomicity.
      Second, when a transaction is committed, all integrity con-
      straints must hold true, otherwise the transaction has to be
      aborted. The data is consistent when the transaction starts,
      and it remains consistent at the end of the transaction, which
      gives this property its name: consistency.
      Third, as the example has shown, other users will never
      see inconsistent data not yet committed by the transaction.
      This property is called isolation. Thanks to this property,
      the database system can serve multiple sessions in parallel,
      without sacrificing data consistency.
      PostgreSQL is known for a very efficient implementation of
      isolation: several sessions can perform reads and writes
      in parallel, without blocking each other. Blocking occurs
      only if two different processes try changing the same row
      simultaneously.
      And finally, durability is guaranteed: the committed data is
      never lost even in case of a failure (if the database is set up
      correctly and is regularly backed up, of course).
      These properties are extremely important; it is impossible to
      imagine a relational database management system without
      them.
      To learn more about transactions, see postgrespro.com/doc/
      tutorial-transactions (Even more details are available here:
      postgrespro.com/doc/mvcc).
Useful psql Commands                                           57
                                                               iv
\?           Command-line reference for psql.
\h           SQL Reference: the list of available commands
             or the syntax of a particular command.
\x           A switch that toggles between the regular table
             display (rows and columns) and an extended
             display (with each column printed on a separate
             line). This is useful for viewing several wide
             rows.
\l           List of databases.
\du          List of users.
\dt          List of tables.
\di          List of indexes.
\dv          List of views.
\df          List of functions.
\dn          List of schemas.
\dx          List of installed extensions.
\dp          List of privileges.
\d name      Detailed information about the specified ob-
             ject.
\d+ name     Extended detailed information about the speci-
             fied object.
\timing on   Displays operator execution time.
58    Conclusion
 iv
      We have managed to cover only a tiny bit of what you need to
      know about PostgreSQL, but we hope that you have seen it
      for yourself that it’s not at all hard to start using this database
      system. The SQL language enables you to construct queries
      of various complexity, while PostgreSQL provides an effective
      implementation and high-quality support of the standard.
      Try it yourself and experiment!
      And one more important psql command. To end the session,
      enter:
      test=# \q
V Demo Database
About the Demo Database
Overview
To get to grips with more complex queries, we need to create
a more substantial database (with not just three but eight
tables) and fill it up with some reasonable data.
We have selected airline flights as the subject area. Our
database contains statistics on all the flights of our not-
yet-existing airline company that were performed within a
particular timeframe. These scenarios must be familiar to
anyone who has ever traveled by plane, but we’ll explain
everything anyway.
The database schema is shown on p. 61. We tried to make the
database schema as simple as possible, without overloading
it with unnecessary details, but not too simple to allow
writing interesting and meaningful queries.
The main entity in our schema is a booking (mapped to
the bookings table). Each booking can include several
passengers, with a separate ticket issued for each passenger
(tickets). We do not have any reliable unique ID for a
passenger as a person (who might have flown with our airline
multiple times), so the passenger does not constitute a
60   separate entity. We will assume that all the passengers are
 v   unique.
     Each ticket always contains one or more flight segments
     (ticket_flights). Several flight segments can be included
     into a single ticket either if there are no direct flights
     between the points of departure and destination (so a multi-
     leg flight is required), or if it is a round-trip ticket.
     It is assumed that all tickets in a single booking have
     the same flight segments, even though there is no such
     constraint in the schema.
     Each flight (flights) goes from one airport (airports) to
     another. Flights with the same flight number have the same
     points of departure and destination but different departure
     dates.
     At flight check-in, each passenger is issued a boarding pass
     (boarding_passes), where the seat number is specified. Pas-
     sengers can check in for the flight only if they have a ticket
     for this flight. The flight/seat combination must be unique
     to avoid issuing several boarding passes for the same seat.
     The number of seats in the aircraft and their distribution
     between different travel classes depend on the specific
     model of the aircraft performing the flight. It is assumed that
     each aircraft model has only one cabin configuration. The
     database schema does not include any checks on whether
     the seat specified in the boarding pass is actually available
     in the particular aircraft.
     In the sections that follow, we’ll describe each of the tables,
     as well as additional views and functions. You can always
     use the \d+ command to get the exact definition of any table,
     including data types and column descriptions.
                                                    Airports
      Bookings
                                                #   airport_code
    # book_ref                                  ∗   airport_name
    ∗ book_date                                 ∗   city
    ∗ total_amount                              ∗   coordinates
                                                ∗   timezone
                                                     Flights
                                           #   flight_id
       Tickets                                 flight_no
                         Ticket_flights
                                           ∗
                                               scheduled_departure       Aircrafts
#   ticket_no
                                           ∗
                     #   ticket_no             scheduled_arrival
    book_ref                                                          # aircraft_code
                                           ∗
                     #   flight_id             departure_airport
∗
    passenger_id
                                           ∗
                         fare_conditions       arrival_airport
∗                                                                     ∗ model
    passenger_name
                     ∗                     ∗
                         amount                status
∗                                                                     ∗ range
    contact_data
                     ∗                     ∗
                                               aircraft_code
∗
                                           ∗
                                           ◦   actual_departure
                                           ◦   actual_arrival
                     Boarding_passes
                                                                          Seats
                         #   ticket_no
                                                                     # aircraft_code
                         #   flight_id
                                                                     # seat_no
                         ∗   boarding_no
                             seat_no
                                                                     ∗ fare_conditions
                         ∗
                                                                                     v
                                                                                     61
                                                                                         1
62
 v
Bookings                                                           63
                                                                   v
To fly with our airline, passengers book the required tickets in
advance (book_date, which must be within one month before
the flight). The booking is identified by its number (book_ref,
a six-position combination of letters and digits).
The total_amount field stores the total price of all tickets
included into the booking, for all passengers.
Tickets
A ticket has a unique number (ticket_no), which consists of
13 digits.
The ticket contains the number of the passenger’s identity
document (passenger_id), as well as the passenger’s first
and last names (passenger_name) and contact information
(contact_data).
Note that neither the ID nor the name of the passenger is
permanent (for example, one can change the last name or
passport), so it is impossible to uniquely identify all tickets
of a particular passenger. For simplicity, let’s assume that all
the passengers are unique.
Flight Segments
A flight segment connects a ticket with a flight and is
identified by their numbers.
Each flight segment has its price (amount) and travel class
(fare_conditions).
64   Flights
 v
     A unique ID can be either natural (if it is related to real-
     life objects) or surrogate (if it is generated by the system,
     typically as an increasing sequence of numbers).
     The natural composite key of the flights table consists of
     the flight number (flight_no) and the date of the departure
     (scheduled_departure). To make foreign keys that refer to
     this table a bit shorter, a surrogate key flight_id is used as
     the primary key.
     A flight always connects two points: departure_airport
     and arrival_airport.
     There is no such entity as a “connecting flight”: if there are no
     direct flights from one airport to another, the ticket simply
     includes all the required flight segments.
     Each flight has a scheduled date and time of departure and ar-
     rival (scheduled_departure and scheduled_arrival). The
     actual_departure and actual_arrival times may differ
     from the scheduled ones: the difference is usually insignifi-
     cant, but sometimes can be up to several hours if the flight
     is delayed.
     Flight status can take one of the following values:
     •   Scheduled
         The flight can be booked. This value is set one month
         before the planned departure date; at this point, the
         information about the flight is entered into the database.
     •   On Time
         The flight is open for check-in (twenty-four hours before
         the scheduled departure) and is not delayed.
•   Delayed                                                        65
    The flight is open for check-in (twenty-four hours before      v
    the scheduled departure), but is delayed.
•   Departed
    The aircraft has already departed and is airborne.
•   Arrived
    The aircraft has reached the point of destination.
•   Cancelled
    The flight is cancelled.
Airports
An airport is identified by a three-letter airport_code and
has an airport_name.
There is no separate entity for the city; a city name is
simply an airport attribute, which is required to identify
all the airports of the same city. The table also includes
coordinates (longitude and latitude) and the timezone.
Boarding Passes
At the time of check-in, which opens twenty-four hours before
the scheduled departure, the passenger is issued a boarding
pass. Just like the flight segment, the boarding pass is
identified by the combination of ticket and flight numbers.
Boarding pass numbers (boarding_no) are assigned sequen-
tially, in the order of check-ins for the flight (this number is
unique only within the context of a particular flight). The
boarding pass specifies the seat number (seat_no).
66   Aircraft
 v
     To identify an aircraft model, a three-digit aircraft_code
     is employed. The table also includes the name of the air-
     craft model and the maximum flying distance, in kilometers
     (range).
     Seats
     Seats define the cabin configuration of each aircraft model.
     Each seat has a number (seat_no) and an assigned travel
     class (fare_conditions): Economy, Comfort, or Business.
     Flights View
     There is a flights_v view built over the flights table.
     Views can be queried in the same way as tables, but they
     do not store any data: they simply perform a particular query.
     The following psql command displays the definition of the
     view and its query:
     postgres=# \d+ flights_v
     This view adds the following information:
     •   details about the airport of departure
         departure_airport, departure_airport_name,
         departure_city
     •   details about the airport of arrival
         arrival_airport, arrival_airport_name,
         arrival_city
     •   local departure time
         scheduled_departure_local, actual_departure_local
•   local arrival time                                         67
    scheduled_arrival_local, actual_arrival_local              v
•   flight duration
    scheduled_duration, actual_duration
Routes View
The flights table contains some redundancies, which you
can use to get route information that does not depend on
the exact flight dates (flight number, airports of departure
and destination, aircraft model).
This information constitutes the routes view. Besides, this
view shows the days_of_week array representing days of the
week on which flights are performed, and the planned flight
duration.
The “now” Function
The demo database contains a snapshot of data, similar to a
backup of a real system captured at some point in time. For
example, if a flight has the Departed status, it means that
the aircraft was airborne at the time the backup was taken.
The snapshot time is saved in the bookings.now function.
You can use this function in demo queries for cases that
would typically require calling the now function.
Besides, the return value of this function determines the
version of the demo database. The latest version available
at the time of this publication is of August 15, 2017.
68   Installation
 v
     Installation from the Website
     The demo database is available in three flavors, which differ
     only in the data size:
     •   edu.postgrespro.com/demo-small-en.zip
         A small database with flight data for one month (21 MB,
         DB size is 280 MB).
     •   edu.postgrespro.com/demo-medium-en.zip
         A medium database with flight data for three months
         (62 MB, DB size is 702 MB).
     •   edu.postgrespro.com/demo-big-en.zip
         A large database with flight data for one year
         (232 MB, DB size is 2638 MB).
     A small database is quite suitable for learning how to write
     queries, but if you would like to deal with query optimization
     specifics, choose the large database: then you’ll be able to
     see how queries work on large volumes of data.
     The files contain a logical backup of the demo database
     created with the pg_dump utility. Note that if you already
     have some database named demo, it will be dropped and
     restored from this backup. The user who runs the script
     becomes the owner of this database.
     To install the demo database on a Linux system, switch to the
     postgres user and download the corresponding file. For ex-
     ample, to download the small database, do the following:
     $ sudo su - postgres
     $ wget https://edu.postgrespro.com/demo-small-en.zip
Then run the following command:                                    69
                                                                   v
$ zcat demo-small-en.zip | psql
On Windows, download the edu.postgrespro.com/demo-
small-en.zip file, double-click it to open the archive, and copy
the demo-small-en-20170815.sql file into the C:\Program
Files\PostgreSQL\15 directory.
The pgAdmin application (described on p. 109) does not allow
us to restore the database from such a backup. So you should
start psql (by clicking the “SQL Shell (psql)” shortcut) and run
the following command:
postgres# \i demo-small-en-20170815.sql
If the file is not found, check the “Start in” property of the
shortcut; the file must be located in this directory.
Sample Queries
A Couple of Words about the Schema
The installation is complete. Now launch psql and connect
to the demo database:
postgres=# \c demo
You are now connected to database "demo" as user
"postgres".
The bookings schema stores all the entities that we need.
When you are connected to the database, this schema is used
automatically, so there is no need to specify it explicitly:
70   demo=# SELECT * FROM aircrafts;
 v
      aircraft_code |        model        | range
     ---------------+---------------------+-------
      773           | Boeing 777-300      | 11100
      763           | Boeing 767-300      | 7900
      SU9           | Sukhoi Superjet-100 | 3000
      320           | Airbus A320-200     | 5700
      321           | Airbus A321-200     | 5600
      319           | Airbus A319-100     | 6700
      733           | Boeing 737-300      | 4200
      CN1           | Cessna 208 Caravan | 1200
      CR2           | Bombardier CRJ-200 | 2700
     (9 rows)
     However, we must specify the schema for the bookings.now
     function, as we have to differentiate it from the standard now
     function:
     demo=# SELECT bookings.now();
               now
     ------------------------
     2017-08-15 18:00:00+03
     (1 row)
     The following query returns cities and airports:
     demo=# SELECT airport_code, city
     FROM airports LIMIT 4;
      airport_code |      city
     --------------+---------------
      YKS          | Yakutsk
      MJZ          | Mirnyj
      KHV          | Khabarovsk
      PKC          | Petropavlovsk
     (4 rows)
The contents of the database is provided in English and in    71
Russian. You can switch between these languages by setting    v
the bookings.lang parameter to en or ru, respectively. By
default, the English language is selected. You can change
this setting as follows:
demo=# ALTER DATABASE demo SET bookings.lang = ru;
ALTER DATABASE
The language has been changed at the database level; now
we have to reconnect to the database.
demo=# \c
You are now connected to database "demo" as user
"postgres".
demo=# SELECT airport_code, city
FROM airports LIMIT 4;
 airport_code |           city
--------------+--------------------------
 YKS          | Якутск
 MJZ          | Мирный
 KHV          | Хабаровск
 PKC          | Петропавловск-Камчатский
(4 rows)
To understand how it works, take a look at the aircrafts or
airports definition using the \d+ psql command.
If you want to learn more about managing schemas, see the
documentation: postgrespro.com/doc/ddl-schemas.
For more details on setting configuration parameters, see
postgrespro.com/doc/config-setting.
72   Simple Queries
 v
     Let’s use this schema to discuss several problems, starting
     from the simplest questions and getting to more complex
     ones. Most of the problems are followed by a solution, but it
     is better to first try to come up with your own query without
     looking at the provided key if you want to learn SQL.
     Problem. Who traveled from Moscow (SVO) to Novosibirsk
     (OVB) on seat 1A the day before yesterday, and when was the
     ticket booked?
     Solution. “The day before yesterday” is counted from the
     booking.now value, not from the current date.
     SELECT t.passenger_name,
            b.book_date
     FROM   bookings b
            JOIN tickets t
              ON t.book_ref = b.book_ref
            JOIN boarding_passes bp
              ON bp.ticket_no = t.ticket_no
            JOIN flights f
              ON f.flight_id = bp.flight_id
     WHERE f.departure_airport = 'SVO'
     AND    f.arrival_airport = 'OVB'
     AND    f.scheduled_departure::date =
            bookings.now()::date - INTERVAL '2 day'
     AND    bp.seat_no = '1A';
     Problem. How many seats remained free on flight PG0404
     yesterday?
     Solution. There are several approaches to solving this prob-
     lem. One of the options is to use the NOT EXISTS expression
     to find the seats without boarding passes:
SELECT count(*)                                                    73
FROM   flights f                                                   v
       JOIN seats s
         ON s.aircraft_code = f.aircraft_code
WHERE f.flight_no = 'PG0404'
AND    f.scheduled_departure::date =
       bookings.now()::date - INTERVAL '1 day'
AND    NOT EXISTS (
         SELECT NULL
         FROM    boarding_passes bp
         WHERE bp.flight_id = f.flight_id
         AND     bp.seat_no = s.seat_no
       );
Another approach relies on the operation of set subtraction.
Different solutions give the same result but may sometimes
differ in performance, so you have to take it into account if it
matters.
SELECT count(*)
FROM
(
  SELECT s.seat_no
  FROM    seats s
  WHERE s.aircraft_code = (
     SELECT aircraft_code
     FROM   flights
     WHERE flight_no = 'PG0404'
     AND    scheduled_departure::date =
            bookings.now()::date - INTERVAL '1 day'
  )
  EXCEPT
  SELECT bp.seat_no
  FROM    boarding_passes bp
  WHERE bp.flight_id = (
     SELECT flight_id
     FROM   flights
     WHERE flight_no = 'PG0404'
     AND    scheduled_departure::date =
            bookings.now()::date - INTERVAL '1 day'
  )
) t;
74   Problem. Which flights had the longest delays? Print the list
 v   of ten “leaders.”
     Solution. The query needs to include only those flights that
     have already departed.
     SELECT    f.flight_no,
               f.scheduled_departure,
               f.actual_departure,
               f.actual_departure - f.scheduled_departure
               AS delay
     FROM      flights f
     WHERE     f.actual_departure IS NOT NULL
     ORDER BY f.actual_departure - f.scheduled_departure
               DESC
     LIMIT 10;
     You can define the same condition using the status column
     by listing all the applicable statuses. Or you can skip the
     WHERE condition altogether by specifying the DESC NULLS
     LAST sorting order, so that undefined values are returned at
     the end of the selection.
     Aggregate Functions
     Problem. What is the shortest flight duration for each
     possible flight from Moscow to St. Petersburg, and how many
     times was the flight delayed for more than an hour?
     Solution. To solve this problem, it is convenient to use the
     available flights_v view instead of dealing with table joins.
     You need to take into account only those flights that have
     already arrived.
SELECT   f.flight_no,                                          75
         f.scheduled_duration,                                 v
         min(f.actual_duration),
         max(f.actual_duration),
         sum(CASE WHEN f.actual_departure >
                       f.scheduled_departure +
                       INTERVAL '1 hour'
                  THEN 1 ELSE 0
             END) delays
FROM     flights_v f
WHERE    f.departure_city = 'Moscow'
AND      f.arrival_city = 'St. Petersburg'
AND      f.status = 'Arrived'
GROUP BY f.flight_no,
         f.scheduled_duration;
Problem. Find the most disciplined passengers who checked
in first for all their flights. Take into account only those
passengers who took at least two flights.
Solution. Use the fact that boarding pass numbers are issued
in the check-in order.
SELECT   t.passenger_name,
         t.ticket_no
FROM     tickets t
         JOIN boarding_passes bp
           ON bp.ticket_no = t.ticket_no
GROUP BY t.passenger_name,
         t.ticket_no
HAVING   max(bp.boarding_no) = 1
AND      count(*) > 1;
Problem. How many passengers can be included into a single
booking?
Solution. Let’s count the number of passengers in each
booking and then find the number of bookings for each
number of passengers.
76   SELECT   tt.cnt,
 v            count(*)
     FROM     (
                SELECT   t.book_ref,
                         count(*) cnt
                FROM     tickets t
                GROUP BY t.book_ref
              ) tt
     GROUP BY tt.cnt
     ORDER BY tt.cnt;
     Window Functions
     Problem. For each ticket, display all the included flight
     segments, together with connection time. Limit the result
     to the tickets booked a week ago.
     Solution. Use window functions to avoid accessing the same
     data twice.
     SELECT tf.ticket_no,
            f.departure_airport,
            f.arrival_airport,
            f.scheduled_arrival,
            lead(f.scheduled_departure) OVER w
              AS next_departure,
            lead(f.scheduled_departure) OVER w -
              f.scheduled_arrival AS gap
     FROM   bookings b
            JOIN tickets t
              ON t.book_ref = b.book_ref
            JOIN ticket_flights tf
              ON tf.ticket_no = t.ticket_no
            JOIN flights f
              ON tf.flight_id = f.flight_id
     WHERE b.book_date =
            bookings.now()::date - INTERVAL '7 day'
     WINDOW w AS (PARTITION BY tf.ticket_no
                  ORDER BY f.scheduled_departure);
As you can see, the time cushion between flights can reach        77
up to several days: round-trip tickets and one-way tickets        v
are treated in the same way, and the time of the stay in the
point of destination is treated just like the time between con-
necting flights. Using the solution for one of the problems in
the “Arrays” section, you can take this fact into account when
building the query.
Problem. Which are the most frequent combinations of first
and last names? What is the ratio of the passengers with
such names to the total number of passengers?
Solution. The total number of passengers is calculated using
a window function.
SELECT passenger_name,
       round( 100.0 * cnt / sum(cnt) OVER (), 2)
       AS percent
FROM   (
         SELECT   passenger_name,
                  count(*) cnt
         FROM     tickets
         GROUP BY passenger_name
       ) t
ORDER BY percent DESC;
Problem. Solve the previous problem for first names and last
names separately.
Solution. Let’s take a look at how to count first names.
The query for counting last names will differ only by the p
subquery.
As this complex query shows, you should avoid using a single
text field for different values if you are going to use them
separately; in scientific terms, it is called the first normal
form.
78   WITH p AS (
 v     SELECT left(passenger_name,
                   position(' ' IN passenger_name))
              AS passenger_name
       FROM   tickets
     )
     SELECT passenger_name,
            round( 100.0 * cnt / sum(cnt) OVER (), 2)
            AS percent
     FROM   (
              SELECT   passenger_name,
                       count(*) cnt
              FROM     p
              GROUP BY passenger_name
            ) t
     ORDER BY percent DESC;
     Arrays
     Problem. There is no indication whether the ticket is one-way
     or round-trip. However, you can figure it out by comparing
     the first point of departure to the last point of destination.
     Display airports of departure and destination for each ticket,
     ignoring connections, and specify whether it’s a round-trip
     ticket or not.
     Solution. One of the easiest solutions is to work with an array
     of airports converted from the list of airports in the itinerary
     using the array_agg aggregate function.
     We select the middle element of the array as the airport of
     destination, assuming that the outbound and inbound ways
     have the same number of stops.
     In this example, the tickets table is scanned only once. The
     array of airports is displayed for clarity; for large volumes of
     data, it makes sense to remove it from the query since extra
     data can hamper performance.
WITH t AS (                                                    79
  SELECT ticket_no,                                            v
           a,
           a[1]                   departure,
           a[cardinality(a)]      last_arrival,
           a[cardinality(a)/2+1] middle
  FROM (
     SELECT     t.ticket_no,
                array_agg( f.departure_airport
                  ORDER BY f.scheduled_departure) ||
               (array_agg( f.arrival_airport
                  ORDER BY f.scheduled_departure DESC)
               )[1] AS a
     FROM       tickets t
                JOIN ticket_flights tf
                  ON tf.ticket_no = t.ticket_no
                JOIN flights f
                  ON f.flight_id = tf.flight_id
     GROUP BY t.ticket_no
  ) t
)
SELECT t.ticket_no,
        t.a,
        t.departure,
        CASE
           WHEN t.departure = t.last_arrival
              THEN t.middle
           ELSE t.last_arrival
        END arrival,
        (t.departure = t.last_arrival) return_ticket
FROM    t;
Problem. Find the round-trip tickets in which the outbound
route differs from the inbound one.
Problem. Find pairs of airports with inbound and outbound
flights departing on different days of the week.
Solution. The part of the problem that involves building an
array of days of the week is virtually solved in the routes
view. You only have to find the intersection of arrays using
the && operator and make sure it’s empty.
80   SELECT r1.departure_airport,
 v          r1.arrival_airport,
            r1.days_of_week dow,
            r2.days_of_week dow_back
     FROM   routes r1
            JOIN routes r2
              ON r1.arrival_airport = r2.departure_airport
             AND r1.departure_airport = r2.arrival_airport
     WHERE NOT (r1.days_of_week && r2.days_of_week);
     Recursive Queries
     Problem. How can you get from Ust-Kut (UKX) to Neryungri
     (CNN) with the minimal number of connections? What will
     the flight time be?
     Solution. Here you virtually have to find the shortest path
     in the graph. We will use a recursive query to complete this
     task.
     A detailed step-by-step explanation of this query is published
     at habr.com/en/company/postgrespro/blog/490228/, so we’ll
     only provide some brief comments here.
     Infinite looping is prevented by checking the hops array,
     which is built while the query is being executed.
     Note that the breadth-first search is performed, so the first
     path that is found will be the shortest one connection-wise.
     To avoid looping through other paths (that can be numerous
     and are definitely longer than the already found one), the
     found attribute is used. It is calculated using the bool_or
     window function.
WITH RECURSIVE p(                                             81
  last_arrival,                                               v
  destination,
  hops,
  flights,
  flight_time,
  found
) AS (
  SELECT a_from.airport_code,
         a_to.airport_code,
         array[a_from.airport_code],
         array[]::char(6)[],
         interval '0',
         a_from.airport_code = a_to.airport_code
  FROM   airports a_from,
         airports a_to
  WHERE a_from.airport_code = 'UKX'
  AND    a_to.airport_code = 'CNN'
  UNION ALL
  SELECT r.arrival_airport,
         p.destination,
         (p.hops || r.arrival_airport)::char(3)[],
         (p.flights || r.flight_no)::char(6)[],
         p.flight_time + r.duration,
         bool_or(r.arrival_airport = p.destination)
            OVER ()
  FROM   p
         JOIN routes r
            ON r.departure_airport = p.last_arrival
  WHERE NOT r.arrival_airport = ANY(p.hops)
  AND    NOT p.found
)
SELECT hops,
       flights,
       flight_time
FROM   p
WHERE p.last_arrival = p.destination;
It is useful to compare this query with its simpler variant
without the found trick.
To learn more about recursive queries, see the documenta-
tion: postgrespro.com/doc/queries-with.
82   Problem. What is the maximum number of connections that
 v   can be required to get from any airport to any other airport?
     Solution. We can take the previous query as the basis for the
     solution. However, the first iteration must now contain all
     the possible airport pairs, not just a single pair: each airport
     must be connected to all the other airports. For all these pairs
     of airports we first find the shortest path, and then select the
     longest of them.
     Clearly, it is only possible if the routes graph is connected,
     but our demo database satisfies this condition.
     WITH RECURSIVE p(
       departure,
       last_arrival,
       destination,
       hops,
       found
     ) AS (
       SELECT a_from.airport_code,
              a_from.airport_code,
              a_to.airport_code,
              array[a_from.airport_code],
              a_from.airport_code = a_to.airport_code
       FROM   airports a_from,
              airports a_to
       UNION ALL
       SELECT p.departure,
              r.arrival_airport,
              p.destination,
              (p.hops || r.arrival_airport)::char(3)[],
              bool_or(r.arrival_airport = p.destination)
                 OVER (PARTITION BY p.departure, p.destination)
       FROM   p JOIN routes r
                 ON r.departure_airport = p.last_arrival
       WHERE NOT r.arrival_airport = ANY(p.hops)
       AND    NOT p.found
     )
     SELECT max(cardinality(hops)-1)
     FROM   p
     WHERE p.last_arrival = p.destination;
This query also uses the found attribute, but here it should   83
be calculated separately for each pair of airports.            v
Problem. Find the shortest route from Ust-Kut (UKX) to
Negungri (CNN) from the flight time perspective (ignoring
connection time).
Solution. To avoid infinite looping, we use the CYCLE clause
introduced in PostgreSQL 14.
WITH RECURSIVE p(
  last_arrival,
  destination,
  flights,
  flight_time,
  min_time
) AS (
  SELECT a_from.airport_code,
         a_to.airport_code,
         array[]::char(6)[],
         interval '0',
         NULL::interval
  FROM   airports a_from,
         airports a_to
  WHERE a_from.airport_code = 'UKX'
  AND    a_to.airport_code = 'CNN'
  UNION ALL
  SELECT r.arrival_airport,
         p.destination,
         (p.flights || r.flight_no)::char(6)[],
         p.flight_time + r.duration,
         least(
            p.min_time, min(p.flight_time+r.duration)
            FILTER (
              WHERE r.arrival_airport = p.destination
            ) OVER ()
         )
  FROM   p
         JOIN routes r
            ON r.departure_airport = p.last_arrival
  WHERE p.flight_time + r.duration <
         coalesce(p.min_time, INTERVAL '1 year')
) CYCLE last_arrival SET is_cycle USING hops
84   SELECT hops,
 v          flights,
            flight_time
     FROM   (
              SELECT hops,
                     flights,
                     flight_time,
                     min(min_time) OVER () min_time
              FROM   p
              WHERE p.last_arrival = p.destination
            ) t
     WHERE flight_time = min_time;
     Note that the found route may be suboptimal with regards
     to the number of connections.
     Functions and Extensions
     Problem. Find the distance between Kaliningrad (KGD) and
     Petropavlovsk-Kamchatsky (PKC).
     Solution. The airports table contains airport coordinates.
     To precisely calculate the distance between remote points,
     you must take into account the non-trivial shape of the Earth.
     This task is best performed by the PostGIS extension, which
     can approximate the Earth surface as a geoid.
     But a simple spherical model will also do for our purpose.
     Let’s use the earthdistance and then convert the result
     from miles to kilometers.
     CREATE EXTENSION IF NOT EXISTS cube;
     CREATE EXTENSION IF NOT EXISTS earthdistance;
SELECT round(                                                  85
         (a_from.coordinates <@> a_to.coordinates) *           v
         1.609344
       )
FROM   airports a_from,
       airports a_to
WHERE a_from.airport_code = 'KGD'
AND    a_to.airport_code = 'PKC';
Problem. Draw the graph of flights between all the airports.
VI PostgreSQL
   for Applications
A Separate User
In the previous chapter, we showed how to connect to the
database server on behalf of the postgres user. This is
the only database user available right after the PostgreSQL
installation. But since the postgres user is a superuser,
it should not be used to connect to the database from an
application. It is better to create a new user and make it the
owner of a separate database; then its rights will be limited
to this database.
postgres=# CREATE USER app PASSWORD 'p@ssw0rd';
CREATE ROLE
postgres=# CREATE DATABASE appdb OWNER app;
CREATE DATABASE
To learn about users and privileges, see: postgrespro.com/
doc/user-manag and postgrespro.com/doc/ddl-priv.
To connect to a new database and start working with it on
behalf of the newly created user, run:
postgres=# \c appdb app localhost 5432
88    Password for user app: ***
 vi   You are now connected to database "appdb" as user
      "app" on host "127.0.0.1" at port "5432".
      appdb=>
      This command takes four parameters, in the following order:
      database name (appdb), username (app), node (localhost or
      127.0.0.1), and port number (5432).
      Note that the database name is not the only thing that has
      changed in the prompt: instead of the hash symbol (#), the
      greater than sign is displayed (>). The hash symbol indicates
      the superuser rights, similar to the root user in Unix.
      The app user can work with the database without any restric-
      tions. For example, this user can create a table:
      appdb=> CREATE TABLE greeting(s text);
      CREATE TABLE
      appdb=> INSERT INTO greeting VALUES ('Hello, world!');
      INSERT 0 1
      Remote Connections
      In our example, both the client and the database are located
      on the same system. Clearly, you can install PostgreSQL
      onto a separate server and connect to it from a different
      system (for example, from an application server). In this
      case, you must specify your database server address instead
      of localhost. But it is not enough: for security reasons,
      PostgreSQL only allows local connections by default.
To connect to the database from the outside, you must edit       89
two files.                                                       vi
First of all, modify the postgresql.conf file, which contains
the main configuration settings. It is usually located in the
data directory.
Find the line defining network interfaces for PostgreSQL to
listen on:
    #listen_addresses = 'localhost'
We have to replace it with:
    listen_addresses = '*'
Next, edit the pg_hba.conf file with authentication set-
tings.
When a client tries to connect to the server, PostgreSQL
searches this file for the first line that matches the connec-
tion by four parameters: connection type, database name,
username, and client IP address. This line also specifies how
the user must confirm their identity.
For example, on Debian and Ubuntu, this file includes the
following setting among others (the top line starting with
the hash symbol is a comment):
#    TYPE   DATABASE     USER    ADDRESS    METHOD
    local        all      all                 peer
It means that local connections (local) to any database (all)
on behalf of any user (all) must be validated by the peer
authentication method (clearly, an IP address is not required
for local connections).
90    The peer method means that PostgreSQL requests the cur-
 vi   rent username from the operating system and assumes that
      the OS has already performed the required authentication
      check (prompted for the password). This is why on Linux-
      like operating systems users usually don’t have to enter the
      password when connecting to a local server.
      But Windows does not support local connections, so this line
      looks as follows:
      #   TYPE    DATABASE    USER         ADDRESS    METHOD
          host         all     all    127.0.0.1/32       md5
      It means that network connections (host) to any database
      (all) on behalf of any user (all) from the local address
      (127.0.0.1) must be checked by the md5 method. This
      method requires the user to enter the password.
      To allow the app user to access the appdb database from any
      address upon providing a valid password, add the following
      line to the end of the pg_hba.conf file:
          host       appdb     app              all       md5
      After changing the configuration files, don’t forget to make
      the server re-read the settings:
      postgres=# SELECT pg_reload_conf();
      To learn more details about authentication settings, see
      postgrespro.com/doc/client-authentication.html
Pinging the Server                                               91
                                                                 vi
To access PostgreSQL from an application, you have to find an
appropriate library and install the corresponding database
driver. A driver is usually a wrapper for libpq (a standard
library that implements the client-server protocol for Post-
greSQL), but other implementations are also possible. The
library provides application developers with a convenient
way to access low-level features of the protocol.
Below we provide simple code snippets in several popular
languages. These examples can help you quickly check the
connection with the database system that you have installed
and set up.
The provided programs contain only the minimal viable code
to run a simple database query and display the result; there
is nothing else, not even error handling functionality. Don’t
take these code snippets as verbatim examples to follow.
If you are using a Windows system, to ensure the correct
display of extended character sets, you may need to switch
to a TrueType font (such as “Lucida Console” or “Consolas”) in
the Command Prompt window and change the code page.
For example, for the Russian language, run the following
commands:
C:\> chcp 1251
Active code page: 1251
C:\> set PGCLIENTENCODING=WIN1251
92    PHP
 vi
      PHP interacts with PostgreSQL via a special extension. On
      Linux, apart from the PHP itself, you also have to install the
      package with this extension:
      $ sudo apt-get install php-cli php-pgsql
      You can install PHP for Windows from the PHP website:
      windows.php.net/download. The extension for PostgreSQL
      is already included into the binary distribution, but you
      must find and uncomment (by removing the semicolon) the
      following line in the php.ini file:
          ;extension=php_pgsql.dll
      A sample program (test.php):
      <?php
         $conn = pg_connect('host=localhost port=5432 ' .
                            'dbname=appdb user=app ' .
                            'password=p@ssw0rd') or die;
         $query = pg_query('SELECT * FROM greeting') or die;
         while ($row = pg_fetch_array($query)) {
           echo $row[0].PHP_EOL;
         }
         pg_free_result($query);
         pg_close($conn);
      ?>
      Let’s execute this command:
      $ php test.php
      Hello, world!
      You can read about this PostgreSQL extension in PHP docu-
      mentation: php.net/manual/en/book.pgsql.php.
Perl                                                            93
                                                                vi
In the Perl language, database operations are implemented
via the DBI interface. On Debian and Ubuntu, Perl itself is
pre-installed, so you only need to install the driver:
$ sudo apt-get install libdbd-pg-perl
There are several Perl builds for Windows, which are listed
at perl.org/get.html. Popular builds of ActiveState Perl and
Strawberry Perl already include the driver required for Post-
greSQL.
A sample program (test.pl):
use DBI;
use open ':std', ':utf8';
my $conn = DBI->connect(
  'dbi:Pg:dbname=appdb;host=localhost;port=5432',
  'app','p@ssw0rd') or die;
my $query = $conn->prepare('SELECT * FROM greeting');
$query->execute() or die;
while (my @row = $query->fetchrow_array()) {
  print @row[0]."\n";
}
$query->finish();
$conn->disconnect();
Let’s execute this command:
$ perl test.pl
Hello, world!
The interface is described in the documentation:
metacpan.org/pod/DBD::Pg.
94    Python
 vi
      The Python language usually uses the psycopg library (pro-
      nounced as “psycho-pee-gee”) to work with PostgreSQL.
      On modern versions of Debian and Ubuntu, Python 3 is pre-
      installed, so you only need to add the corresponding driver:
      $ sudo apt-get install python3-psycopg2
      You can download Python for Windows from the python.org
      website. The psycopg library is available at initd.org/psycopg
      (choose the version that corresponds to the version of Python
      installed). You can also find all the required documentation
      there.
      A sample program (test.py):
      import psycopg2
      conn = psycopg2.connect(
        host='localhost', port='5432', database='appdb',
        user='app', password='p@ssw0rd')
      cur = conn.cursor()
      cur.execute('SELECT * FROM greeting')
      rows = cur.fetchall()
      for row in rows:
        print row[0]
      conn.close()
      Let’s execute this command:
      $ python3 test.py
      Hello, world!
Java                                                            95
                                                                vi
In Java, databases are accessed via the JDBC interface. We
are going to install Java SE 11; additionally, we will need a
package with the JDBC driver:
$ sudo apt-get install openjdk-11-jdk
$ sudo apt-get install libpostgresql-jdbc-java
You can download JDK for Windows from oracle.com/
technetwork/java/javase/downloads. The JDBC driver is avail-
able at jdbc.postgresql.org (choose the version that corre-
sponds to the JDK installed on your system). You can also
find all the required documentation there.
Let’s consider a sample program (Test.java):
import java.sql.*;
public class Test {
  public static void main(String[] args)
  throws SQLException {
    Connection conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/appdb",
      "app", "p@ssw0rd");
    Statement st = conn.createStatement();
    ResultSet rs = st.executeQuery(
      "SELECT * FROM greeting");
    while (rs.next()) {
      System.out.println(rs.getString(1));
    }
    rs.close();
    st.close();
    conn.close();
  }
}
Compile and execute the program specifying the path to
the JDBC class driver (on Windows, paths are separated by
semicolons, not colons):
96    $ javac Test.java
 vi
      $ java -cp .:/usr/share/java/postgresql-jdbc4.jar \
      Test
      Hello, world!
      Backup
      Although our database contains only one table, it’s still worth
      taking care of data durability. While your application has
      little data, the easiest way to create a backup is to use the
      pg_dump utility:
      $ pg_dump appdb > appdb.dump
      If you open the resulting appdb.dump file in a text editor,
      you will see regular SQL commands that create all the appdb
      objects and fill them with data. You can pass this file to psql
      to restore the contents of the database. For example, you
      can create a new database and import all the data into it:
      $ createdb appdb2
      $ psql -d appdb2 -f appdb.dump
      This is the format in which the demo database described in
      the previous chapter is distributed.
      The pg_dump utility offers many features worth checking
      out: postgrespro.com/doc/app-pgdump. Some of them are
      available only if the data is dumped in a custom format. In
      this case, you have to use the pg_restore utility instead of
      psql to restore the data.
In any case, pg_dump can back up the contents of a single         97
database only. To make a backup of the whole cluster,             vi
including all the databases, users, and tablespaces, you
should use a different command: pg_dumpall.
Big serious projects require an elaborate and comprehensive
backup strategy. A better option here is a physical binary
copy of the cluster, which can be taken by the pg_basebackup
utility:
$ pg_basebackup -D backup
This command will create a backup of the whole database
cluster in the backup directory. To restore the cluster from
this copy, it is enough to move it to a data catalog and start
the server.
To learn more about the available backup and recovery tools,
see the documentation: postgrespro.com/doc/backup.
Built-in PostgreSQL features enable you to implement almost
everything you need, but you have to complete multi-step
workflows that lack automation. That’s why many companies
create their own backup tools. Postgres Professional also has
such a tool called pg_probackup, which is distributed for free.
This tool enables you to perform incremental backups at the
page level, ensure data integrity, use parallel execution and
compression when working with big volumes of information,
and implement various backup strategies.
Its full documentation is available at postgrespro.com/doc/
app-pgprobackup.
98    What’s next?
 vi
      Now you are ready to develop your application. With regards
      to the database, the application will always consist of two
      parts: server and client. The server part comprises everything
      that relates to the database system: tables, indexes, views,
      triggers, stored functions and procedures. The client part
      holds everything that works outside of the database and
      connects to it; from the database point of view, it doesn’t
      matter whether it’s a thick client or an application server.
      An important question that has no clear-cut answer: where
      should we place business logic?
      One of the popular approaches is to move the logic out of the
      database and implement it all on the client. It often happens
      when developers are unfamiliar with all the capabilities
      provided by a relational database system and prefer to rely
      on what they know well, that is, on the application code.
      In this case, the database becomes somewhat secondary
      to the application and only ensures data persistence, its
      reliable storage. Database systems can be often isolated by
      an additional abstraction level, such as an ORM tool that au-
      tomatically generates database queries from the constructs
      of the programming language familiar to developers. Such
      solutions are sometimes justified by the intent to develop an
      application that is portable to any database system.
      This approach has the right to exist: if such a system works
      and addresses all business objectives, why not?
      However, this solution also has some obvious drawbacks:
      •   Data consistency is ensured by the application.
          Instead of relying on the database system to ensure data
    consistency (and this is exactly what relational database        99
    systems are especially good at), all the required checks         vi
    are performed by the application. Rest assured that
    sooner or later your database will contain inconsistent
    data. You have to either fix these errors, or teach the appli-
    cation how to handle them. If the same database is used
    by several different applications, it’s simply impossible to
    do without the help of the database system.
•   Performance leaves much to be desired.
    ORM systems allow you to create an abstraction level over
    the database, but the quality of SQL queries they generate
    is rather questionable. As a rule, multiple small queries
    are executed, and each of them is quite fast on its own.
    Such a model can cope only with low load on small data
    volumes and is virtually impossible to optimize on the
    database side.
•   Application code gets more complicated.
    Using application-oriented programming languages, it’s
    impossible to write a really complex query that could be
    properly translated to SQL in an automated way. Thus,
    complex data processing (if it is needed, of course) has
    to be implemented at the application level, with all the
    required data retrieved from the database in advance,
    but it requires an extra data transfer over the network.
    Furthermore, such algorithms as scans, joins, sorting, and
    aggregation provided by database systems are guaran-
    teed to perform better than the application code, as they
    have been improved and optimized for years.
Obviously, to use all the database features, including in-
tegrity constraints and data handling logic in stored func-
tions, a careful analysis of its specifics and capabilities is
required. You have to master the SQL language to write
100    queries and learn one of the server programming languages
  vi   (typically, PL/pgSQL) to create functions and triggers. In
       return, you will get a reliable tool, one of the most important
       building blocks for any information system architecture.
       In any case, you have to decide for yourself where to imple-
       ment business logic: on the server side or on the client side.
       We’ll just note that there’s no need to go to extremes, as the
       truth often lies somewhere in the middle.
VII Configuring PostgreSQL
Basic Settings
The default settings allow us to start PostgreSQL on virtu-
ally any hardware, even on the weakest one. But for best
performance, the database configuration has to take into
account both physical characteristics of the server and a
typical application workload.
Here we’ll cover only some of the basic settings that must
be considered for a production-level database system. Fine-
tuning for a particular application requires additional knowl-
edge, which you can get, for example, in PostgreSQL database
administration courses (see p. 145).
Changing Configuration Parameters
To change a configuration parameter, you have to open the
postgresql.conf file and either find the required parameter
and modify its value, or add a new line at the end of the file:
it will have priority over the setting specified above in the
same file.
After changing the settings, you have to reload the server
configuration:
postgres=# SELECT pg_reload_conf();
102    Now check the current setting using the SHOW command. If
 vii   the parameter value has not changed, take a look into the
       server log: you might have made a mistake when editing the
       file.
       Instead of changing the file in a text editor, you can set the
       parameter value using an SQL command (it also requires the
       server configuration to be reloaded):
       postgres=# ALTER SYSTEM SET work_mem='128MB';
       Such settings get into the postgresql.auto.conf file and
       take priority over the values specified in the main file. The
       advantage of such method is that the new parameter values
       get validated at once.
       The Most Important Parameters
       It is highly important to pay attention to parameters that
       define how PostgreSQL uses RAM.
       The shared_buffers parameter defines the size of shared
       buffers, which are used to keep frequently used data in RAM
       to avoid extra disk access. A reasonable starting value is 25 %
       of all the RAM used by the server. Changing this parameter
       requires a server restart!
       The effective_cache_size value has no effect on memory
       allocation; it merely prompts the size of cache PostgreSQL
       can count on, including the operating system cache. The
       larger the value, the higher priority is given to indexes. You
       can start with 50–75 % of RAM.
       The work_mem parameter defines the amount of memory
       allocated for sorting, building hash tables when performing
joins, and other operations. The active use of temporary          103
files indicates that the allocated memory size is insufficient,   vii
which leads to performance degradation. In most cases, the
default value of 4 MB should be increased by at least several
times, but be cautious not to exceed the overall RAM size of
the server.
The maintenance_work_mem parameter defines the amount
of memory allocated for service processes. Higher values can
speed up indexing and vacuuming. This parameter is usually
set to a value that is several times higher than work_mem.
For example, for 32 GB of RAM, you can start with the
following settings:
shared_buffers = '8GB'
effective_cache_size = '24GB'
work_mem = '128MB'
maintenance_work_mem = '512MB'
The ratio of random_page_cost to seq_page_cost must match
the ratio of random disk access speed to sequential access
speed. By default, random access is assumed to be four times
slower than sequential one (which works well for regular
HDDs). For disk arrays and SSDs you should lower the value
of the random_page_cost parameter (but never change the
seq_page_cost value, which is set to 1).
For example, the following setting is appropriate for SSD
drives:
random_page_cost = 1.2
It’s very important to configure autovacuum. This process per-
forms “garbage collection” and several other critical system
104    tasks. This setting highly depends on a particular application
 vii   and its workload.
       In most cases, you can start with the following configura-
       tion:
       •   Reduce the autovacuum_vacuum_scale_factor value to
           0.01 to perform autovacuum more often and in smaller
           batches.
       •   Increase the autovacuum_vacuum_cost_limit value (or
           reduce autovacuum_vacuum_cost_delay) by 10 times to
           speed up autovacuum (for version 11 or lower).
       It’s equally important to properly configure the processes
       related to buffer cache and WAL maintenance, but the exact
       settings also depend on a particular application. For a start,
       you can set the checkpoint_completion_target parameter to
       0.9 (to spread out the load), increase checkpoint_timeout
       from 5 to 30 minutes (to reduce the overhead caused by
       checkpoints), and proportionally increase the max_wal_size
       value (for the same purpose).
       To learn tips and tricks for configuring various parameters,
       you can take the DBA2 course (p. 149).
       Connection Settings
       We have already covered this topic in the “PostgreSQL for
       Applications” chapter on p. 87, so here we can simply recall
       that you usually have to set the listen_addresses parameter
       to ’*’ and modify the pg_hba.conf configuration file to
       allow connections.
Bad Advice                                                       105
                                                                 vii
You can sometimes find advice about improving performance
that should never be followed:
•   Turning off autovacuum.
    Such “resource saving” will give some minor short-term
    performance benefits, but it will also lead to garbage
    accumulation in data and bloating of tables and indexes.
    Sooner or later your database system is sure to stop func-
    tioning normally. Autovacuum should never be turned off,
    it should be properly configured.
•   Turning off disk synchronization (fsync = off).
    Disabling fsync will indeed bring a tangible performance
    improvement, but any server crash (caused by either
    software or hardware failure) will lead to a complete loss
    of all databases. In this case, you can only restore the
    system from a backup (if you happen to have one).
PostgreSQL and 1C Solutions
PostgreSQL is officially supported by 1C, a popular Russian
ERP system. It’s a great opportunity to save a bunch of money
on expensive commercial database licenses.
As any other applications, 1C products will work much faster
if PostgreSQL is configured appropriately. Besides, there are
specific server parameters that are indispensable for working
with 1C.
Here we’ll provide some installation and setup instructions
that can help you get started.
106    Choosing PostgreSQL Version
 vii
       1C requires a custom patched version of PostgreSQL. You
       can download one from releases.1c.ru, or use Postgres Pro
       Standard or Postgres Pro Enterprise, which also include all
       the required patches.
       PostgreSQL can work on Windows as well, but if you have a
       choice, it’s better to opt for a Linux distribution.
       Before you start the installation, you have to decide whether
       a dedicated database server is required. A dedicated server
       offers higher performance because of better load balancing
       between the application server and the database server.
       Configuration Parameters
       Physical specifications of the server must match the expected
       load. You can use the following data as a baseline: a dedi-
       cated 8-core server with 8 GB of RAM and a disk subsystem
       with RAID1 SSD should be enough for a database of 100 GB,
       the total number of 50 users, and up to 2 000 documents
       per day. If the server is not dedicated, PostgreSQL must get
       the corresponding amount of resources from the common
       server.
       Based on the general recommendations listed above and 1C
       application specifics, we can suggest the following initial
       settings for such a server:
       # Mandatory settings for 1C
       standard_conforming_strings = off
       escape_string_warning = off
       shared_preload_libraries = 'online_analyze, plantuner'
       plantuner.fix_empty_table = on
online_analyze.enable = on                                       107
online_analyze.table_type = 'temporary'                          vii
online_analyze.local_tracking = on
online_analyze.verbose = off
# The following settings depend on the available RAM
shared_buffers = '2GB'         # 25% of RAM
effective_cache_size = '6GB'   # 75% of RAM
work_mem = '64MB'              # 64-128MB
maintenance_work_mem = '256MB' # 4*work_mem
# Active use of temporary tables
temp_buffers = '32MB'          # 32-128MB
# The default value of 64 is not enough
max_locks_per_transaction = 256
Connection Settings
Make sure that the listen_addresses parameter in the post-
gresql.conf file is set to ’*’.
Add the following line at the start of the pg_hba.conf config-
uration file, specifying the actual address and subnet mask
instead of the “IP-address-of-the-1C-server” placeholder:
host all all IP-address-of-1C-server md5
Once you restart PostgreSQL, all the changes in pg_hba.conf
and postgresql.conf files take effect, and the server is
ready to accept 1C connections.
1C establishes a connection as a superuser, usually postgres.
Set a password for this role:
postgres=# ALTER ROLE postgres PASSWORD 'p@ssw0rd';
ALTER ROLE
108    In configuration settings
 vii   of the 1C information
       database, specify the
       IP-address and port of
       the PostgreSQL server
       as your database server
       and choose “PostgreSQL”
       as the required DBMS
       type. Specify the name of
       the database that will be
       used for 1C and select the
       “Create database if none
       present” check box (do
       not create this database
       using PostgreSQL means).
       Provide the name and
       password of a superuser
       role that will be used to establish connections.
       These recommendations should help you to quickly get
       started, even though they cannot guarantee optimal perfor-
       mance.
       We thank Anton Doroshkevich from the Infosoft company for
       his assistance in preparing these recommendations.
VIII pgAdmin
pgAdmin is a popular GUI tool for PostgreSQL administra-
tion. This application facilitates the main DBA tasks, shows
database objects, and enables you to run SQL queries.
For a long time, pgAdmin 3 used to be a de-facto standard,
but EnterpriseDB developers ended its support and released
a new version in 2016, having fully rewritten the product
using Python and web development technologies instead of
C++. Because of its redesigned interface, pgAdmin 4 got a
cool reception at first, but its development continues, and
the product is constantly getting improved.
Installation
To launch pgAdmin 4 on Windows, use the installer available
at pgadmin.org/download. The installation procedure is
simple and straightforward, there is no need to change the
default options.
For Debian and Ubuntu, add the PostgreSQL repository (as
explained on p. 28) and run the following command:
$ sudo apt-get install pgadmin4
“pgAdmin4” appears in the list of available programs.
110    The user interface of this program is fully localized into a
viii   dozen of languages. To switch to another language, click
       Configure pgAdmin, select Miscellaneous → User language
       in the settings window, and then reload the page in your
       web browser.
       Connecting to a Server
       First of all, let’s set up a connection to the server. Click the
       Add New Server button and enter an arbitrary connection
       name in the General tab of the opened window.
       In the Connection tab, enter host name or address, port
       number, username, and password.
       If you don’t want to enter the password every time, select the
       Save password check box. Passwords are encrypted using a
       master password, which you are prompted to enter when you
       start pgAdmin for the first time.
       Note that this user must already have a password. For
       example, for the postgres user, you can do it with the
       following command:
       postgres=# ALTER ROLE postgres PASSWORD 'p@ssw0rd';
       When you click the Save button, the application checks that
       the server with the specified parameters is available, and
       registers a new connection.
                                                             111
                                                             viii
Browser
The left pane displays the Browser tree. As you expand its
objects, you can get to the server, which we have called
LOCAL. At the next level, you can see all its databases:
   • appdb has been created to check connection to Post-
     greSQL using different programming languages.
   • demo is our demo database.
   • postgres is always created when PostgreSQL gets
     installed.
   • test was used in the “Trying SQL” chapter.
112
viii
       If you expand the Schemas item for the appdb database, you
       can find the greetings table that we have created, view its
       columns, integrity constraints, indexes, triggers, etc.
       For each object type, the context (right-click) menu lists all
       the possible actions, for example: export to a file, load from
       a file, assign privileges, delete.
       The right pane includes several tabs that display reference
       information:
       •   Dashboard provides system activity charts.
       •   Properties displays the properties of the object selected
           in the Browser (data types for columns, etc.)
       •   SQL shows the SQL command used to create the selected
           object.
•   Statistics lists information used by the query optimizer to   113
    build query plans; it can be used by a database adminis-      viii
    trator for case analysis.
•   Dependencies, Dependents illustrates dependencies be-
    tween the selected object and other objects in the
    database.
Running Queries
To execute a query, open a new tab with the SQL window by
choosing Tools → Query tool from the menu.
Enter your query in the upper part of the window and press
F5. The Data Output tab in the lower part of the window will
display the result of the query.
114    To type in the next query, you do not have to delete the
viii   previous one: just select the required code fragment before
       pressing F5. Thus, the whole history of your actions will be
       always in front of you. It is usually more convenient than
       searching for the required query in the log on the Query
       History tab.
       Other Features
       pgAdmin provides a graphical user interface for standard
       PostgreSQL utilities, system catalog information, administra-
       tion functions, and SQL commands. The built-in PL/pgSQL
       debugger is worth a separate mention. You can learn about
       pgAdmin features either on the product website pgadmin.org
       or in the built-in pgAdmin help system.
IX Additional Features
Full-Text Search
Searching for documents written in natural languages and
sorting the results by relevance to the search query is called
full-text search. In the simplest and most typical case, the
query consists of one or more words, and the relevance is
defined by the frequency of these words in the document.
This is more or less what happens when we type a phrase in
Google or Yandex search engines. However, despite all the
strength of the SQL language, its capabilities are not always
enough for effective data handling. It has become especially
obvious recently, when avalanches of Big Data, usually poorly
structured and hard to parse, started filling data storages.
There is a large number of search engines, free and paid, that
enable you to index the whole collection of your documents
and set up search of quite decent quality. In this case, an
index, which is the most important search tool that can
accelerate it, is not a part of the database. It means that many
valuable database features become unavailable: database
synchronization, transaction isolation, accessing and using
metadata to limit the search range, setting up access policies,
and many more.
Shortcomings of document-oriented database management
systems usually have a similar nature: they have rich full-text
search functionality, but data security and synchronization
116    features are of low priority. Besides, such databases (for ex-
  ix   ample, MongoDB) are usually NoSQL ones, so by design they
       lack all the power of SQL accumulated over years.
       However, traditional SQL database systems do have built-in
       full-text search capabilities. The LIKE operator is included
       into the standard SQL syntax, but its flexibility is obviously
       insufficient. Therefore, developers had to implement their
       own extensions of the SQL standard. In PostgreSQL, these
       are comparison operators ILIKE, ~, ~*, but they don’t solve
       all the problems either, as they don’t take into account
       grammatical forms, are not suitable for ranking, and work
       rather slowly.
       When talking about the tools of the full-text search itself,
       it’s important to understand that they are far from being
       standardized: each database system uses its own approach
       and syntax. Russian users of PostgreSQL have some advan-
       tage here: its full-text search extensions were created by
       Russian developers, so there is a possibility to contact the
       experts directly or even attend their lectures to go into low-
       level details, if required. Here we’ll only provide some simple
       examples.
       To learn about the full-text search capabilities, we are going
       to create one more table in the demo database. Let it be a
       lecturer’s draft notes split into chapters by lecture topics:
       test=# CREATE TABLE course_chapters(
          c_no text REFERENCES courses(c_no),
          ch_no text,
          ch_title text,
          txt text,
          CONSTRAINT pkt_ch PRIMARY KEY(ch_no, c_no)
       );
       CREATE TABLE
Now let’s enter the text of the first lectures for our courses   117
CS301 and CS305:                                                 ix
test=# INSERT INTO course_chapters(
 c_no, ch_no, ch_title, txt)
VALUES
('CS301', 'I', 'Databases',
 'We start our acquaintance with ' ||
 'the fascinating world of databases'),
('CS301', 'II', 'First Steps',
 'Getting more fascinated with ' ||
 'the world of databases'),
('CS305', 'I', 'Local Networks',
 'Here we start our adventurous journey ' ||
 'through the intriguing world of networks');
INSERT 0 3
Check the result:
test=# SELECT ch_no AS no, ch_title, txt
FROM course_chapters \gx
-[ RECORD 1 ]-----------------------------------------
no       | I
ch_title | Databases
txt      | We start our acquaintance with the
           fascinating world of databases
-[ RECORD 2 ]-----------------------------------------
no       | II
ch_title | First Steps
txt      | Getting more fascinated with the world of
           databases
-[ RECORD 3 ]-----------------------------------------
no       | I
ch_title | Local Networks
txt      | Here we start our adventurous journey
           through the intriguing world of networks
Now let’s find some information in our database with the help
of traditional SQL means (using the LIKE operator):
118   test=# SELECT ch_no AS no, ch_title, txt
 ix   FROM course_chapters
      WHERE txt LIKE '%fascination%' \gx
      It’s easy to guess the result: 0 rows. The LIKE operator
      sees no connection between “fascination” and the words
      “fascinating” and “fascinated” present in the text.
      The query
      test=# SELECT ch_no AS no, ch_title, txt
      FROM course_chapters
      WHERE txt LIKE '%fascinated%' \gx
      will return the row from chapter II (but not from chapter I,
      where the adjective “fascinating” is used):
      -[ RECORD 1 ]-----------------------------------------
      no       | II
      ch_title | First Steps
      txt      | Getting more fascinated with the world of
                 databases
      PostgreSQL provides the ILIKE operator, which allows us
      not to worry about letter cases; otherwise, you would also
      have to take uppercase and lowercase letters into account.
      True, there are also regular expressions (search patterns), and
      setting them up is a truly engaging task, little short of art,
      but sometimes you just want a tool that can simply do the
      job. So let’s add one more column to the course_chapters
      table; it will have a special type called tsvector:
      test=# ALTER TABLE course_chapters
          ADD txtvector tsvector;
      test=# UPDATE course_chapters
          SET txtvector = to_tsvector('english',txt);
      test=# SELECT txtvector
      FROM course_chapters \gx
-[ RECORD 1 ]-----------------------------------------            119
txtvector | 'acquaint':4 'databas':10 'fascin':7                  ix
            'start':2 'world':9
-[ RECORD 2 ]-----------------------------------------
txtvector | 'databas':8 'fascin':3 'get':1 'world':6
-[ RECORD 3 ]-----------------------------------------
txtvector | 'adventur':5 'intrigu':9 'journey':6
            'network':12 'start':3 'world':10
As we can see, the rows have changed:
•   Words are reduced to their unchangeable parts (lexemes).
•   Numbers have appeared. They indicate the word position
    in the text.
•   There are no prepositions included (and neither there
    would be any conjunctions or other parts of the sentence
    that are unimportant for search; they are the so-called
    stop words).
The search will be even more efficient if it includes chapter
titles, which are also given more weight in respect to the rest
of the text (it can be done using the setweight function).
Let’s modify the table:
test=# UPDATE course_chapters
  SET txtvector =
      setweight(to_tsvector('english',ch_title),'B')
      || ' ' ||
      setweight(to_tsvector('english',txt),'D');
UPDATE 3
test=# SELECT txtvector FROM course_chapters \gx
-[ RECORD 1 ]-----------------------------------------
txtvector | 'acquaint':5 'databas':1B,11 'fascin':8
            'start':3 'world':10
-[ RECORD 2 ]-----------------------------------------
txtvector | 'databas':10 'fascin':5 'first':1B 'get':3
            'step':2B 'world':8
120    -[ RECORD 3 ]-----------------------------------------
  ix   txtvector | 'intrigu':10 'journey':7 'local':1B
                   'network':2B,13 'start':5 'world':11
       Lexemes have received relative weight markers: B and D
       (possible options are A, B, C, D). We’ll assign real weights when
       writing queries, which will give us more flexibility.
       Fully armed, let’s return to search. The to_tsquery function
       resembles the to_tsvector function we have seen above: it
       converts a string to the tsquery data type used in queries.
       test=# SELECT ch_title
       FROM course_chapters
       WHERE txtvector @@
             to_tsquery('english','fascination & database');
         ch_title
       -------------
        Databases
        First Steps
       ( rows)
       You can check that the query 'fascinated & database' and
       its other grammatical variants will return the same result.
       Here we have used the comparison operator @@, which plays
       the same role in full-text search as the LIKE operator does in
       regular search. The syntax of the @@ operator does not allow
       natural language expressions with spaces, so words in the
       query are connected by the “and” logical operator.
       The english argument indicates the configuration used by
       PostgreSQL. The configuration defines pluggable dictionaries
       and the parser, which splits the phrase into separate lex-
       emes.
       Despite their name, dictionaries enable all kinds of lexeme
       transformations. For example, a simple stemmer dictionary
like snowball, which is used by default, reduces the word             121
to its unchangeable part; it allows search to ignore word             ix
endings in queries. You can also plug in other dictionaries,
for example:
•   regular dictionaries like ispell, myspell, or hunspell,
    which can better handle word morphology
•   dictionaries of synonyms
•   thesaurus
•   unaccent, which can remove diacritics from letters
Thanks to assigned weights, the displayed search results are
ranked:
test=# SELECT ch_title,
     ts_rank_cd('{0.1, 0.0, 1.0, 0.0}', txtvector, q)
FROM course_chapters,
     to_tsquery('english','Databases') q
WHERE txtvector @@ q
ORDER BY ts_rank_cd DESC;
  ch_title   | ts_rank_cd
-------------+------------
 Databases   |        1.1
 First Steps |        0.1
( rows)
The {0.1, 0.0, 1.0, 0.0} array sets the weights. It is an optional
argument of the ts_rank_cd function. By default, array {0.1,
0.2, 0.4, 1.0} corresponds to D, C, B, A. The word’s weight affects
ranking of the returned row.
In the final experiment, let’s modify the display format.
Suppose we would like to highlight the found words in the
html page using the bold type. The ts_headline function
defines the symbols to frame the word, as well as the
minimum/maximum number of words to display in a single
line:
122    test=# SELECT ts_headline(
  ix     'english',
         txt,
         to_tsquery('english', 'world'),
         'StartSel=<b>, StopSel=</b>,
         MaxWords=50, MinWords=5'
       )
       FROM course_chapters
       WHERE to_tsvector('english', txt) @@
              to_tsquery('english', 'world');
       -[ RECORD 1 ]-----------------------------------------
       ts_headline | with the fascinating <b>world</b> of
                     databases
       -[ RECORD 2 ]-----------------------------------------
       ts_headline | with the <b>world</b> of databases
       -[ RECORD 3 ]-----------------------------------------
       ts_headline | through the intriguing <b>world</b> of
                     networks
       To speed up full-text search, special indexes are used: GiST,
       GIN, and RUM, which are different from regular database
       indexes. But like many other useful full-text search features,
       they are out of scope of this short guide.
       To learn more about full-text search, see the documentation:
       www.postgrespro.com/doc/textsearch.
       Using JSON and JSONB
       From the very beginning, SQL-based relational databases
       were created with a considerable safety margin: their top
       priority was data consistency and security, while volumes of
       information were incomparable to the modern ones. When
       NoSQL databases appeared, it raised a flag in the community:
       lack of strict consistency support and a much simpler data
       structure (at first, it was simply a storage of key–value
pairs) provided a remarkable search speedup. Actively using      123
parallel computations, they could process unprecedented          ix
volumes of information and were easy to scale.
Once the initial shock had passed, it became clear that for
most real tasks such a simple structure was insufficient.
Composite keys were introduced, and then groups of keys
appeared. Relational database systems didn’t want to fall
behind and started adding new features that were typical of
NoSQL.
Since changing the database schema in a relational database
incurs high costs, a new JSON data type came in handy.
Having a hierarchical structure similar to XML, it was first
targeted at JavaScript development (hence JS in the title),
including AJAX application development. JSON flexibility al-
lowed application developers to add diverse data with unpre-
dictable structure without having to redesign the database
schema.
Suppose we need to enter personal data into the students
demo database: we have run a survey and collected the
information from professors. Some questions in the ques-
tionnaire are optional, while other questions include the “add
more information at your discretion” and “other” fields. If we
followed the traditional approach, the information that does
not fit the current structure would require adding multiple
new tables and columns with lots of empty fields. As more
and more data is being added, the whole database may have
to be refactored.
We can solve this problem using json or jsonb types. The
jsonb type, which appeared after json, stores data in a com-
pact binary form and, unlike json, supports indexes, which
can sometimes speed up search by an order of magnitude.
124   Let’s create a table with JSON objects:
 ix
      test=# CREATE TABLE student_details(
         de_id int,
         s_id int REFERENCES students(s_id),
         details json,
         CONSTRAINT pk_d PRIMARY KEY(s_id, de_id)
      );
      test=# INSERT INTO student_details(de_id,s_id,details)
      VALUES
       (1, 1451,
      '{ "merits": "none",
          "flaws":
          "immoderate ice cream consumption"
       }'),
       (2, 1432,
      '{ "hobbies":
            { "guitarist":
                 { "band": "Postgressors",
                    "guitars":["Strat","Telec"]
                 }
            }
       }'),
      (3, 1556,
      '{ "hobbies": "cosplay",
            "merits":
              { "mother-of-five":
                  { "Basil": "m", "Simon": "m", "Lucie": "f",
                      "Mark": "m", "Alex": "unknown"
                  }
              }
       }'),
      (4, 1451,
      '{ "status": "expelled"
       }');
      Let’s check that all the data is in place. For convenience, we
      will join the student_details and students tables using
      the WHERE clause, as the new table does not contain students’
      names:
      test=# SELECT s.name, sd.details
      FROM student_details sd, students s
      WHERE s.s_id = sd.s_id \gx
-[ RECORD   1 ]--------------------------------------        125
name    |   Anna                                             ix
details |   { "merits": "none",                       +
        |     "flaws":                                +
        |     "immoderate ice cream consumption"      +
        |   }
-[ RECORD   2 ]--------------------------------------
name    |   Victor
details |   { "hobbies":                              +
        |        { "guitarist":                       +
        |            { "band": "Postgressors",        +
        |               "guitars":["Strat","Telec"]   +
        |            }                                +
        |        }                                    +
        |   }
-[ RECORD   3 ]--------------------------------------
name    |   Nina
details |   { "hobbies": "cosplay",                   +
        |     "merits":                               +
        |        { "mother-of-five":                  +
        |            { "Basil": "m",                  +
        |               "Simon": "m",                 +
        |               "Lucie": "f",                 +
        |               "Mark": "m",                  +
        |               "Alex": "unknown"             +
        |            }                                +
        |        }                                    +
        |   }
-[ RECORD   4 ]--------------------------------------
name    |   Anna
details |   { "status": "expelled"                    +
        |   }
Suppose we are interested in entries that hold information
about the students’ merits. Let’s access the values of the
“merits” key using a special operator ->>:
test=# SELECT s.name, sd.details
FROM student_details sd, students s
WHERE s.s_id = sd.s_id
AND   sd.details ->> 'merits' IS NOT NULL
\gx
126    -[ RECORD   1 ]--------------------------------------
  ix   name    |   Anna
       details |   { "merits": "none",                     +
               |     "flaws":                              +
               |     "immoderate ice cream consumption"    +
               |   }
       -[ RECORD   2 ]--------------------------------------
       name    |   Nina
       details |   { "hobbies": "cosplay",                 +
               |     "merits":                             +
               |        { "mother-of-five":                +
               |            { "Basil": "m",                +
               |              "Simon": "m",                +
               |              "Lucie": "f",                +
               |              "Mark": "m",                 +
               |              "Alex": "unknown"            +
               |            }                              +
               |        }                                  +
               |   }
       We have seen that the two entries are related to merits of
       Anna and Nina, but such a result is unlikely to satisfy us, as
       Anna’s merits are actually “none.” Let’s fix the query:
       test=# SELECT s.name, sd.details
       FROM student_details sd, students s
       WHERE s.s_id = sd.s_id
       AND   sd.details ->> 'merits' IS NOT NULL
       AND   sd.details ->> 'merits' != 'none';
       The new query only returns Nina, whose merits are real.
       However, this method does not always work. Let’s try to find
       out which guitars our musician Victor plays:
       test=# SELECT sd.de_id, s.name, sd.details
       FROM student_details sd, students s
       WHERE s.s_id = sd.s_id
       AND   sd.details ->> 'guitars' IS NOT NULL \gx
This query won’t return anything. It’s because the corre-        127
sponding key–value pair is located inside the JSON hierarchy,    ix
nested into the pairs of a higher level:
name    | Victor
details | { "hobbies":                                   +
        |      { "guitarist":                            +
        |          { "band": "Postgressors",             +
        |            "guitars":["Strat","Telec"]         +
        |          }                                     +
        |      }                                         +
        | }
To get to guitars, let’s use the #> operator and go down the
hierarchy, starting with “hobbies”:
test=# SELECT sd.de_id, s.name,
      sd.details #> '{hobbies,guitarist,guitars}'
FROM student_details sd, students s
WHERE s.s_id = sd.s_id
AND   sd.details #> '{hobbies,guitarist,guitars}'
      IS NOT NULL;
We can see that Victor is a fan of Fender:
 de_id | name |       ?column?
-------+--------+-------------------
       | Victor | ["Strat","Telec"]
The json type has a younger sibling: jsonb. The letter “b” im-
plies the binary (rather than text) format of data storage and
structure, which can often result in faster search. Nowadays,
jsonb is used much more frequently than json.
test=# ALTER TABLE student_details
ADD details_b jsonb;
test=# UPDATE student_details
SET details_b = to_jsonb(details);
128   test=# SELECT de_id, details_b
 ix   FROM student_details \gx
      -[ RECORD 1 ]--------------------------------------
      de_id     | 1
      details_b | {"flaws": "immoderate ice cream
                  consumption", "merits": "none"}
      -[ RECORD 2 ]--------------------------------------
      de_id     | 2
      details_b | {"hobbies": {"guitarist": {"guitars":
                  ["Strat", "Telec"], "band":
                  "Postgressors"}}}
      -[ RECORD 3 ]--------------------------------------
      de_id     | 3
      details_b | {"hobbies": "cosplay", "merits":
                  {"mother-of-five": {"Basil": "m",
                  "Lucie": "f", "Alex": "unknown",
                  "Mark": "m", "Simon": "m"}}}
      -[ RECORD 4 ]--------------------------------------
      de_id     | 4
      details_b | {"status": "expelled"}
      Apart from a different notation, we can see that the order of
      values in the pairs has changed: Alex, on whom there is no
      information, as we remember, is now displayed before Mark.
      It’s not a disadvantage of jsonb as compared to json, it’s
      simply its data storage specifics.
      The jsonb type is supported by a larger number of operators
      than json. A most useful one is the “contains” operator @>. It
      works similar to the #> operator for json.
      For example, let’s find the entry that mentions Lucie, a moth-
      er-of-five’s daughter:
      test=# SELECT s.name,
            jsonb_pretty(sd.details_b) json
      FROM student_details sd, students s
      WHERE s.s_id = sd.s_id
      AND   sd.details_b @>
            '{"merits":{"mother-of-five":{}}}' \gx
-[ RECORD 1 ]-------------------------------------            129
name | Nina                                                   ix
json | {                                           +
     |     "merits": {                            +
     |         "mother-of-five": {                +
     |             "Alex": "unknown",             +
     |             "Mark": "m",                   +
     |             "Basil": "m",                  +
     |             "Lucie": "f",                  +
     |             "Simon": "m"                   +
     |         }                                  +
     |     },                                     +
     |     "hobbies": "cosplay"                   +
     | }
We have used the jsonb_pretty() function, which formats
the output of the jsonb type.
Alternatively, you can use the jsonb_each() function, which
expands key–value pairs:
test=# SELECT s.name,
      jsonb_each(sd.details_b)
FROM student_details sd, students s
WHERE s.s_id = sd.s_id
AND   sd.details_b @>
      '{"merits":{"mother-of-five":{}}}' \gx
-[ RECORD 1 ]-------------------------------------
name       | Nina
jsonb_each | (hobbies,"""cosplay""")
-[ RECORD 2 ]-------------------------------------
name       | Nina
jsonb_each | (merits,"{""mother-of-five"":
             {""Alex"": ""unknown"", ""Mark"":
             ""m"", ""Basil"": ""m"", ""Lucie"":
             ""f"", ""Simon"": ""m""}}")
Note that the name of Nina’s child is replaced by an empty
space {} in the query. This syntax adds flexibility to the
process of application development.
130    But what’s more important, jsonb allows you to create
  ix   indexes that support the @> operator, its inverse <@, and many
       other ones (the GIN index is typically the most efficient).
       The json type does not support indexes, so for high-load
       applications it is usually recommended to use jsonb.
       To learn more about json and jsonb types and the functions
       to be used with them, see the PostgreSQL documentation
       at postgrespro.com/doc/datatype-json and postgrespro.com/
       doc/functions-json.
       However, the jsonb functionality was insufficient, so in
       2014, Teodor Sigaev, Alexander Korotkov, and Oleg Bartunov
       developed the jsquery extension for PostgreSQL 9.4. This
       extension defined a new query language for extracting data
       from jsonb and implemented indexes to speed up such
       queries. It required a new data type: jsquery.
       Using this query language, you can, for example, search for
       data by its path. The dot notation represents the hierarchy
       inside jsonb:
       test=# SELECT *
       FROM student_details
       WHERE details::jsonb @@
            'hobbies.guitarist.band=Postgressors'::jsquery;
       If we don’t know the exact path, we can replace its branches
       with an asterisk:
       test=# SELECT s_id, details
       FROM student_details
       WHERE details::jsonb @@
             'hobbies.*.band=Postgressors'::jsquery;
       But it’s still very hard to work with the required value without
       knowing the hierarchy.
When the SQL:2016 standard was published, which included          131
the SQL/JSON Path language, Postgres Professional devel-          ix
oped its implementation, providing the jsonpath type and
several functions for working with JSON using this language.
These features were committed to PostgreSQL 12.
The SQL/JSON Path notation differs from regular PostgreSQL
operators for JSON. It is closer to the jsquery notation: the
hierarchy is also represented by dots. But the SQL/JSON Path
grammar is more advanced.
•   $.a.b.c replaces the 'a'->'b'->'c' syntax that had to
    be used in PostgreSQL 11 or lower.
•   The $ symbol represents the current context element, that
    is, the JSON fragment that has to be parsed.
•   @ represents the current context element in filter expres-
    sions. All the paths available in the $ expression are
    searched.
•   * is a wildcard symbol. In expressions with $ and @ it
    denotes any value of the path taking the hierarchy into
    account.
•   ** is a wildcard that can denote any part of the path in
    expressions with $ or @, without taking the hierarchy into
    account. It comes in handy if you don’t know the exact
    nesting level of the elements.
•   The ? operator is used to create a filter similar to WHERE.
    For example: $.a.b.c ? (@.x > 10).
To find cosplay enthusiasts using the jsonb_path_query()
function, you can write the following query:
test=# SELECT s_id, jsonb_path_query(
  details::jsonb, '$.hobbies ? (@ == "cosplay")'
)
FROM student_details;
132     s_id | jsonb_path_query
  ix   ------+------------------
        1556 | "cosplay"
       (1 row)
       This query searches only through the JSON branch that begins
       with the “hobbies” key, checking whether the corresponding
       value equals “cosplay.” But if we replace “cosplay” with
       “guitarist,” the query won’t return anything because “guitarist”
       is used in our table as a key, not as a value of the nested
       element.
       Queries can use two hierarchies in search: one inside the
       path expression, which defines the search area, and the other
       inside the filter expression, which matches the results with
       the specified condition. It means there are different ways to
       reach the same goal.
       For example, the query
       test=# SELECT s_id, jsonb_path_query(
         details::jsonb,
         '$.hobbies.guitarist.band?(@=="Postgressors")'
       )
       FROM student_details;
       and the query
       test=# SELECT s_id, jsonb_path_query(
         details::jsonb,
         '$.hobbies.guitarist?(@.band=="Postgressors").band'
       )
       FROM student_details;
       return the same result:
        s_id | jsonb_path_query
       ------+------------------
        1432 | "Postgressors"
       (1 row)
In the first example, we defined a filter expression for each     133
entry within the “hobbies.guitarist.band” branch. If we take      ix
a look at the JSON itself, we can see that this branch has only
one value: “Postgressors”. So there was actually nothing to
filter out. In the second example, the filter is applied one
step higher, so we have to specify the path to the “group”
within the filter expression; otherwise, the filter won’t find
any values. If we use such syntax, we have to know the JSON
hierarchy in advance. But what if we don’t know the hier-
archy?
In this case, we can use the ** wildcard. It is an extremely
useful feature! Suppose we are not sure what a “Strat” is:
whether it’s a high-altitude balloon, a guitar, or a member of
the highest social stratum. But we have to find out whether
we have this word in our table at all. Previously, it would
have required a complex search through the JSON document
(unless we converted jsonb to text). Now you can simply run
the following query:
test=# SELECT s_id, jsonb_path_exists(
  details::jsonb, '$.** ? (@ == "Strat")'
)
FROM student_details;
 s_id | jsonb_path_exists
------+-------------------
 1451 | f
 1432 | t
 1556 | f
 1451 | f
(4 rows)
You can learn more about SQL/JSON Path capabilities
in the documentation (postgrespro.com/docs/postgresql/
12/datatype-json#DATATYPE-JSONPATH) and in the article
134   “JSONPath in PostgreSQL: committing patches and select-
 ix   ing apartments” (habr.com/en/company/postgrespro/blog/
      500440/).
      Integration with External Systems
      Real-life applications are not isolated, and they often have
      to send data to each other. Such interactions can be im-
      plemented at the application level, for example, using web
      services or file exchange, or you can rely on the database
      functionality for this purpose.
      PostgreSQL supports the ISO/IEC 9075-9 standard (SQL/MED,
      Management of External Data), which defines access to
      external data sources from SQL via a special mechanism of
      foreign data wrappers.
      The idea is to access external (foreign) data as if it were
      located in regular PostgreSQL tables. It requires creating
      foreign tables, which do not contain any data themselves
      and only redirect all queries to an external data source. This
      approach facilitates application development, as it allows to
      abstract from specifics of a particular external source.
      Creating a foreign table involves several sequential steps.
         1. The CREATE FOREIGN DATA WRAPPER command plugs in
            a library for working with a particular data source.
         2. The CREATE SERVER command defines a foreign server.
            You should usually specify such connection parameters
            as host name, port number, and database name.
    3. The CREATE USER MAPPING command provides user-                135
       name mapping since different PostgreSQL users can             ix
       connect to one and the same foreign source on behalf
       of different external users.
    4. The CREATE FOREIGN TABLE command creates foreign
       tables for the specified external tables and views, while
       IMPORT FOREIGN SCHEMA allows to import descriptions
       of some or all tables from the external schema.
Below we’ll discuss PostgreSQL integration with the most
popular databases: Oracle, MySQL, SQL Server, and Post-
greSQL itself. But first we need to install the libraries required
for working with these databases.
Installing Extensions
The PostgreSQL distribution includes two foreign data wrap-
pers: postgres_fdw and file_fdw. The first one is designed
for working with external PostgreSQL databases, while the
second one works with files on a server. Besides, the com-
munity develops and supports various libraries that provide
access to many popular databases. To get the full list, take a
look at pgxn.org/tag/fdw.
Foreign data wrappers for Oracle, MySQL, and SQL Server are
available as extensions:
•   Oracle — github.com/laurenz/oracle_fdw;
•   MySQL — github.com/EnterpriseDB/mysql_fdw;
•   SQL Server — github.com/tds-fdw/tds_fdw.
Follow the instructions on these web pages to build and in-
stall these extensions, and this process should run smoothly.
136    If all goes well, you will see the corresponding foreign data
  ix   wrappers in the list of available extensions. For example, for
       oracle_fdw:
       test=# SELECT name, default_version
       FROM pg_available_extensions
       WHERE name = 'oracle_fdw' \gx
       -[ RECORD 1 ]---+-----------
       name            | oracle_fdw
       default_version | 1.2
       Oracle
       First, let’s create an extension, which in its turn will add a
       foreign data wrapper:
       test=# CREATE EXTENSION oracle_fdw;
       CREATE EXTENSION
       Check that the corresponding wrapper has been added:
       test=# \dew
       List of foreign-data wrappers
       -[ RECORD 1 ]-------------------
       Name      | oracle_fdw
       Owner     | postgres
       Handler   | oracle_fdw_handler
       Validator | oracle_fdw_validator
       The next step is setting up a foreign server. In the OPTIONS
       clause, you have to specify the dbserver option, which
       defines connection parameters for the Oracle instance: server
       name, port number, and instance name.
test=# CREATE SERVER oracle_srv                                  137
  FOREIGN DATA WRAPPER oracle_fdw                                ix
  OPTIONS (dbserver '//localhost:1521/orcl');
CREATE SERVER
The PostgreSQL user postgres will be connecting to the
Oracle instance as scott.
test=# CREATE USER MAPPING FOR postgres
  SERVER oracle_srv
  OPTIONS (user 'scott', password 'tiger');
CREATE USER MAPPING
We’ll import foreign tables into a separate schema. Let’s
create it:
test=# CREATE SCHEMA oracle_hr;
CREATE SCHEMA
Now let’s import some foreign tables. We’ll do it for just two
popular tables, dept and emp:
test=# IMPORT FOREIGN SCHEMA "SCOTT"
  LIMIT TO (dept, emp)
  FROM SERVER oracle_srv
  INTO oracle_hr;
IMPORT FOREIGN SCHEMA
Note that Oracle data dictionary stores object names in
uppercase, while PostgreSQL system catalog saves them in
lowercase. When working with external data in PostgreSQL,
you have to double-quote uppercase Oracle schema names
to avoid their conversion to lowercase.
138   Let’s view the list of foreign tables:
 ix
      test=# \det oracle_hr.*
           List of foreign tables
        Schema   | Table |   Server
      -----------+-------+------------
      oracle_hr | dept | oracle_srv
      oracle_hr | emp    | oracle_srv
      (2 rows)
      Now run the following queries on the foreign tables to access
      the external data:
      test=# SELECT * FROM oracle_hr.emp LIMIT 1 \gx
      -[ RECORD 1 ]-------------------
      empno    | 7369
      ename    | SMITH
      job      | CLERK
      mgr      | 7902
      hiredate | 1980-12-17
      sal      | 800.00
      comm     |
      deptno   | 20
      Write operations on external data are also allowed:
      test=# INSERT INTO oracle_hr.dept(deptno, dname, loc)
        VALUES (50, 'EDUCATION', 'MOSCOW');
      INSERT 0 1
      test=# SELECT * FROM oracle_hr.dept;
       deptno |   dname    |   loc
      --------+------------+----------
           10 | ACCOUNTING | NEW YORK
           20 | RESEARCH   | DALLAS
           30 | SALES      | CHICAGO
           40 | OPERATIONS | BOSTON
           50 | EDUCATION | MOSCOW
      (5 rows)
MySQL                                                            139
                                                                 ix
Create an extension for the required foreign data wrapper:
test=# CREATE EXTENSION mysql_fdw;
CREATE EXTENSION
The foreign server for the external instance is defined by the
host and port parameters:
test=# CREATE SERVER mysql_srv
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host 'localhost', port '3306');
CREATE SERVER
We are going to establish connections on behalf of a MySQL
superuser:
test=# CREATE USER MAPPING FOR postgres
  SERVER mysql_srv
  OPTIONS (username 'root', password 'p@ssw0rd');
CREATE USER MAPPING
The wrapper supports the IMPORT FOREIGN SCHEMA command,
but we can also create a foreign table manually:
test=# CREATE FOREIGN TABLE employees (
  emp_no       int,
  birth_date date,
  first_name varchar(14),
  last_name    varchar(16),
  gender       varchar(1),
  hire_date    date)
SERVER mysql_srv
  OPTIONS (dbname 'employees',
            table_name 'employees');
CREATE FOREIGN TABLE
140   Check the result:
 ix
      test=# SELECT * FROM employees LIMIT 1 \gx
      -[ RECORD 1 ]----------
      emp_no     | 10001
      birth_date | 1953-09-02
      first_name | Georgi
      last_name | Facello
      gender     | M
      hire_date | 1986-06-26
      Just like the Oracle wrapper, mysql_fdw allows both read and
      write operations.
      SQL Server
      Create an extension for the required foreign data wrapper:
      test=# CREATE EXTENSION tds_fdw;
      CREATE EXTENSION
      Create a foreign server:
      test=# CREATE SERVER sqlserver_srv
        FOREIGN DATA WRAPPER tds_fdw
        OPTIONS (servername 'localhost', port '1433',
                 database 'AdventureWorks');
      CREATE SERVER
      The required connection information is the same: you have
      to provide the host name, the port number, and the database
      name. But the OPTIONS clause takes different parameters as
      compared to oracle_fdw and mysql_fdw.
We are going to establish connections on behalf of an SQL    141
Server superuser:                                            ix
test=# CREATE USER MAPPING FOR postgres
  SERVER sqlserver_srv
  OPTIONS (username 'sa', password 'p@ssw0rd');
CREATE USER MAPPING
Let’s create a separate schema for foreign tables:
test=# CREATE SCHEMA sqlserver_hr;
CREATE SCHEMA
Import the whole HumanResources schema into the created
PostgreSQL schema:
test=# IMPORT FOREIGN SCHEMA HumanResources
  FROM SERVER sqlserver_srv
  INTO sqlserver_hr;
IMPORT FOREIGN SCHEMA
You can display the list of imported tables using the \det
command, or find them in the system catalog by running the
following query:
test=# SELECT ft.ftrelid::regclass AS "Table"
FROM pg_foreign_table ft;
                 Table
----------------------------------------
sqlserver_hr.Department
sqlserver_hr.Employee
sqlserver_hr.EmployeeDepartmentHistory
sqlserver_hr.EmployeePayHistory
sqlserver_hr.JobCandidate
sqlserver_hr.Shift
(6 rows)
142   Object names are case-sensitive, so they should be enclosed
 ix   in double quotes in PostgreSQL queries:
      test=# SELECT "DepartmentID", "Name", "GroupName"
      FROM sqlserver_hr."Department"
      LIMIT 4;
       DepartmentID |    Name     |        GroupName
      --------------+-------------+-------------------------
                  1 | Engineering | Research and Development
                  2 | Tool Design | Research and Development
                  3 | Sales       | Sales and Marketing
                  4 | Marketing   | Sales and Marketing
      (4 rows)
      Currently tds_fdw supports only reading; write operations
      are not allowed.
      PostgreSQL
      Create an extension and a wrapper:
      test=# CREATE EXTENSION postgres_fdw;
      CREATE EXTENSION
      We are going to connect to another database of the same
      server instance, so we only have to provide the dbname
      parameter when creating a foreign server. Other parameters
      (such as host, port, etc.) can be omitted.
      test=# CREATE SERVER postgres_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (dbname 'demo');
      CREATE SERVER
There is no need to specify the password if you create a user   143
mapping within a single cluster:                                ix
test=# CREATE USER MAPPING FOR postgres
  SERVER postgres_srv
  OPTIONS (user 'postgres');
CREATE USER MAPPING
Import all tables and views of the bookings schema:
test=# IMPORT FOREIGN SCHEMA bookings
  FROM SERVER postgres_srv
  INTO public;
IMPORT FOREIGN SCHEMA
Check the result:
test=# SELECT * FROM bookings LIMIT 3;
 book_ref |       book_date        | total_amount
----------+------------------------+--------------
 000004   | 2015-10-12 14:40:00+03 | 55800.00
 00000F   | 2016-09-02 02:12:00+03 | 265700.00
 000010   | 2016-03-08 18:45:00+03 | 50900.00
 000012   | 2017-07-14 09:02:00+03 | 37900.00
 000026   | 2016-08-30 11:08:00+03 | 95600.00
(5 rows)
To learn more about postgres_fdw, see the documentation:
postgrespro.com/doc/postgres-fdw.
Foreign data wrappers are also worth mentioning as the
community considers them to be the basis for built-in shard-
ing in PostgreSQL. Sharding is similar to partitioning: they
both use a particular criterion to split a table into several
parts that are stored independently. The difference is that
partitions are stored on the same server, while shards are
144   located on different ones. Partitioning has been available
 ix   in PostgreSQL for quite a long time. Starting from version
      10, this mechanism is being actively developed, and many
      useful features have already been added: declarative syntax,
      dynamic partition pruning, support for parallel operations,
      and other miscellaneous enhancements. You can also use
      foreign tables as partitions, which virtually turns partitioning
      into sharding.
      But much is yet to be done before sharding becomes really
      usable:
      •   Consistency is not guaranteed: external data is accessed
          in separate local transactions rather than in a single
          distributed one.
      •   You can’t duplicate the same data on different servers to
          enhance fault tolerance.
      •   All actions required to create tables on shards and the
          corresponding foreign tables have to be done manually.
      Some of the discussed challenges are already addressed
      in pg_shardman, an experimental extension developed by
      Postgres Professional; you can download it at github.com/
      postgrespro/shardman.
      Another extension included into the distribution for work-
      ing with PostgreSQL databases is dblink. It allows you
      to explicitly manage connections (to connect and discon-
      nect), execute queries, and get the results asynchronously:
      postgrespro.com/doc/dblink.
X Education
  and Certification
Documentation
Reading the documentation is indispensable for professional
use of PostgreSQL. It describes all the database features
and provides an exhaustive reference that should always
be readily available. Here you can get full and precise
information first hand: it is written by developers themselves
and is carefully kept up-to-date at all times. The PostgreSQL
documentation is available at www.postgresql.org/docs or
postgrespro.com/docs.
We at Postgres Professional have translated into Russian the
whole PostgreSQL documentation set, including the latest
version. It is available on our website: postgrespro.ru/docs.
While working on this translation, we also compiled an
English-Russian glossary, published at postgrespro.com/
education/glossary. We recommend consulting this glossary
when translating English articles into Russian to use consis-
tent terminology familiar to a wide audience.
There are also French (docs.postgresql.fr), Japanese (www.
postgresql.jp/document), and Chinese (postgres.cn/docs)
translations provided by national communities.
146   Training Courses
  x
      We develop training courses for those who start using Post-
      greSQL or would like to improve their professional skills.
      Courses for database administrators:
                                  DBA1
                            Basic PostgreSQL
                             administration
             DBA2                DBA3                QPT
         Configuration         Replication       Query perfor-
        and monitoring        and backups        mance tuning
      Courses for application developers:
                                   DEV1
                             Basic server-side
                         application development
                      DEV2                       QPT
              Advanced server-side           Query perfor-
            application development          mance tuning
      The documentation contains all the conceivable details
      about PostgreSQL, but the information is often scattered
across different chapters, so you may have to carefully read     147
it several times before you gain full understanding.             x
Training courses are intended to complement the documen-
tation rather than replace it. They consist of separate mod-
ules that gradually explain a particular topic, focusing on
important practical information. Courses can broaden your
outlook, structure the previously gained bits of knowledge,
and help you find your way around the documentation,
should you need to quickly get some particular details.
Each course topic includes theory and practice. In most cases,
theory includes both slides and a live demo on a real system.
Students get all the slides with extensive comments, outputs
of demo scripts, keys to practical assignments, and additional
reference material on some topics.
Where and How to Take a Training
For non-commercial use and self-study, all course materials,
including videos, are available on our website for free. You
can find their Russian version at postgrespro.ru/education/
courses.
The courses currently translated into English are published
at postgrespro.com/community/courses.
You can also take these courses in a specialized training
center under the supervision of an experienced lecturer.
At the end of the course you will receive a certificate of
completion. Authorized training centers are listed here:
postgrespro.ru/education/where.
148   DBA1. Basic PostgreSQL administration
  x
      Duration: 3 days
      Background knowledge required:
            Basic knowledge of databases and SQL.
            Familiarity with Unix.
      Knowledge and skills gained:
            General understanding of PostgreSQL architecture.
            Installation, initial setup, server management.
            Logical structure and physical data layout.
            Basic administration tasks.
            User and access management.
            Backup, recovery, and replication.
      Topics:
            Basic toolkit
       1. Installation and server management
       2. Using psql
       3. Configuration
            Architecture
       4.   PostgreSQL overview
       5.   Isolation and multi-version concurrency control
       6.   Vacuum
       7.   Buffer cache and write-ahead log
            Data management
       8.   Databases and schemas
       9.   System catalog
      10.   Tablespaces
      11.   Low-level details
      Administration tasks                                       149
12. Monitoring                                                   x
      Access control
13.   Roles and attributes
14.   Privileges
15.   Row-level security
16.   Connection and authentication
      Backups
17. Overview
      Replication
18. Overview
Course materials in Russian are available for self-study at
postgrespro.ru/education/courses/DBA1. Their English ver-
sion for a shortened two-day introductory course is available
at postgrespro.com/community/courses/2dINTRO.
DBA2. Configuring and monitoring PostgreSQL
Duration: 4 days
Background knowledge required:
      SQL fundamentals.
      Good command of Unix OS.
      Familiarity with PostgreSQL within the scope of the DBA1
      course.
Knowledge and skills gained:
      Setting up various configuration parameters based on the
      understanding of server internals.
150         Monitoring server activity and using the collected data for
  x         iterative tuning of PostgreSQL configuration.
            Configuring localization settings.
            Managing extensions and getting started with server
            upgrades.
      Topics:
            Multi-version concurrency control
       1.   Transaction isolation
       2.   Pages and row versions
       3.   Data snapshots
       4.   HOT upgrades
       5.   Vacuum
       6.   Autovacuum
       7.   Freezing
            Logging
       8.   Buffer cache
       9.   Write-ahead log
      10.   Checkpoints
      11.   WAL configuration
            Locking
      12. Object locks
      13. Row-level locks
      14. Memory locks
            Administration tasks
      15. Managing extensions
      16. Localization
      17. Server upgrades
      Course materials in Russian are available for self-study at
      postgrespro.ru/education/courses/DBA2.
DBA3. Replication and backups                                   151
                                                                x
Duration: 2 days
Background knowledge required:
     SQL fundamentals.
     Good command of Unix OS.
     Familiarity with PostgreSQL within the scope of the DBA1
     course.
Knowledge and skills gained:
     Taking backups.
     Setting up physical and logical replication.
     Recognizing replication use cases.
     Understanding cluster technologies.
Topics:
     Backups
1. Logical backup
2. Base backup
3. WAL archive
     Replication
4.   Physical replication
5.   Switchover to a replica
6.   Logical replication
7.   Usage scenarios
     Cluster Technologies
8. Overview
Course materials in Russian are available for self-study at
postgrespro.ru/education/courses/DBA3.
152   DEV1. Basic server-side application development
  x
      Duration: 4 days
      Background knowledge required:
         SQL fundamentals.
         Experience with any procedural programming language.
         Basic knowledge of Unix OS.
      Knowledge and skills gained:
         General information about PostgreSQL architecture.
         Using the main database objects.
         Programming in SQL and PL/pgSQL on the server side.
         Using the main data types, including records and arrays.
         Setting up client-server communication channels.
      Topics:
         Basic toolkit
      1. Installation and server management, psql
         Architecture
      2. A general overview of PostgreSQL
      3. Isolation and MVCC
      4. Buffer cache and WAL
         Data organization
      5. Logical structure
      6. Physical structure
         Bookstore application
      7. Application schema and interface
         SQL
      8. Functions
 9. Procedures                                                153
10. Composite types                                           x
      PL/pgSQL
11.   Overview and programming structures
12.   Executing queries
13.   Cursors
14.   Dynamic commands
15.   Arrays
16.   Error handling
17.   Triggers
18.   Debugging
      Access control
19. Access control overview
      Backup
20. Logical backup
Course materials in English are available for self-study at
postgrespro.com/community/courses/DEV1.
DEV2. Advanced server-side application
development
Duration: 4 days
Background knowledge required:
      General understanding of PostgreSQL architecture.
      Strong SQL and PL/pgSQL skills.
      Basic knowledge of Unix OS.
Knowledge and skills gained:
      Understanding server internals.
154         Using all PostgreSQL capabilities in application logic
  x         implementations.
            Extending database functionality to address specific
            tasks.
      Topics:
            Architecture
       1.   Isolation
       2.   Server internals
       3.   Vacuum
       4.   Write-ahead logging
       5.   Locks
            Bookstore
       6. Bookstore application 2.0
            Extensibility
       7.   Connection pooling
       8.   Data types for large values
       9.   User-defined data types
      10.   Operator classes
      11.   Semi-structured data
      12.   Background processes
      13.   Asynchronous processing
      14.   Creating extensions
      15.   Programming languages
      16.   Aggregate and window functions
      17.   Full-text search
      18.   Physical replication
      19.   Logical replication
      20.   Foreign data
      Course materials in Russian are available for self-study at
      postgrespro.ru/education/courses/DEV2.
QPT. Query Performance Tuning                                 155
                                                              x
Duration: 2 days
Background knowledge required:
   Familiarity with Unix OS.
   Good command of SQL.
   Some knowledge of PL/pgSQL will be useful, but is not
   mandatory.
   Familiarity with PostgreSQL within the scope of the DBA1
   course (for DBAs) or DEV1 (for developers).
Knowledge and skills gained:
   In-depth understanding of query planning and execution.
   Performance tuning of the server instance.
   Troubleshooting query issues and optimizing queries.
Topics:
 1. Airlines database
 2. Query execution
 3. Sequential scans
 4. Index scans
 5. Bitmap scans
 6. Nested loop joins
 7. Hash joins
 8. Merge joins
 9. Statistics
10. Query profiling
11. Optimization methods
Course materials in Russian are available for self-study at
postgrespro.ru/education/courses/QPT.
156   Professional Certification
  x
      The certification program, which was launched in 2019, is
      useful for both database professionals and their employers.
      If you have a certificate, you are likely to get additional points
      when hunting for a job or negotiating your salary. Besides,
      it’s a good opportunity to get an impartial evaluation of your
      knowledge.
      For employers, certification program facilitates recruiting,
      enables verification of PostgreSQL expertise of the current
      employees, and provides a means to control the quality of
      knowledge received in external employee trainings or check
      the competence of third-party vendors and partners.
      PostgreSQL certification is currently available only for
      database administrators, but in the future we plan to launch
      certification programs for PostgreSQL application developers
      too.
      We offer three levels of certification, all of which require you
      to pass several tests.
      Professional level confirms the knowledge in the following
      fields:
      •   General understanding of PostgreSQL architecture.
      •   Server installation, working in psql, tuning configuration
          settings.
      •   Logical and physical data structure.
      •   User and access management.
      •   General understanding of backup and replication con-
          cepts.
      To get a certificate, it is required to successfully pass a test
      on the DBA1 course.
Expert level additionally confirms the knowledge in the             157
following fields:                                                   x
•   PostgreSQL internals.
•   Server setup and monitoring, database maintenance
    tasks.
•   Performance optimization tasks, query tuning.
•   Taking backups.
•   Physical and logical replication setup for various usage
    scenarios.
To get a certificate, it is required to have a certificate of the
Professional level and successfully pass tests on DBA2, DBA3,
and QPT courses.
Master level additionally confirms practical skills required for
PostgreSQL database administration.
To get a certificate, it is required to have a certificate of
the Expert level and successfully pass a hands-on test. This
certification is currently under development.
Create an account under postgrespro.ru/user and sign up for
a certification test in your profile.
To pass a test, you should have:
•   a good command of the corresponding courses and the
    documentation sections they refer to
•   hands-on experience in working with PostgreSQL via psql
While taking the test, you can refer to our course materials
and the PostgreSQL documentation, but usage of any other
sources of information is prohibited.
Achieving a particular level is acknowledged by a certificate.
Certificates have no expiration date, but since they apply to
158   a particular server version, they will get deprecated together
  x   with this version. So in several years you may want to take a
      test for a more recent PostgreSQL version.
      To learn more about certification, visit postgrespro.ru/
      education/cert.
      Academic Courses
      One of the main focus areas of our company is training
      database specialists. This work has to be started while future
      professionals are still getting their degree, so it requires
      close collaboration with universities.
      We offer several academic courses produced in cooperation
      with professors from leading universities. These courses
      are targeted at bachelor students who already have some
      basic programming skills. All the courses can be used in
      educational institutions for free. Lecturers can use text-
      books, slides, lecture videos, and other educational mate-
      rials published on our website: postgrespro.ru/education/
      university.
      Postgres Professional has contributed to several courses read
      in such universities as Lomonosov Moscow State Univer-
      sity, Higher School of Economics, Moscow Aviation Institute,
      Reshetnev Siberian State University of Science and Technol-
      ogy, and Siberian Federal University. Contact us if you are
      a university representative and would like to add database
      courses to the curriculum.
      We also seek partnership with teachers and instructors who
      are ready to develop new original PostgreSQL courses. On our
      part, we provide all the required support and advice, edit the
manuscripts and drive them to publication, as well as make     159
arrangements for open lectures of the course authors in top    x
Russian universities.
SQL Basics
Course participants will learn about PostgreSQL and will be
able to start working with it right away; no prior training
is required. Starting with simple SQL queries, students
will gradually get to more complex constructs, learn about
transactions and query optimization.
This course is based on the following textbook (published in
Russian):
Morgunov, E. PostgreSQL. SQL Basics. St. Petersburg : BHV-
Petersburg, 2018.
ISBN 978-5-9775-4022-3
                                  Contents:
                                  Introduction
                                  Configuring the environ-
                                  ment
                                  Basic operations
                                  Data types
                                  DDL fundamentals
                                  Queries
                                  Data manipulation
                                  Indexes
                                  Transactions
                                  Performance tuning
160   A soft copy of this book in Russian is available on our website:
  x   postgrespro.ru/education/books/sqlprimer.
      This course consists of 36 hours of lectures and hands-on
      training. The course author has been delivering it in top
      universities of Moscow and Krasnoyarsk for several years
      now. You can download the course materials in Russian at
      postgrespro.ru/education/university/sqlprimer.
      Evgeny Morgunov, Ph.D in Technical
      Sciences, associate professor at the
      Informatics and Computer Science
      Department of Reshetnev Siberian
      State University of Science and Tech-
      nology.
      Evgeny lives in Krasnoyarsk. Before
      joining the University in 2000, he
      had been working as a programmer
      for more than 10 years; among other
      things, he had been developing a banking application system.
      He got to learn PostgreSQL in 1998. Being an advocate of
      using free open-source software in academic activities, he
      has initiated the use of PostgreSQL and FreeBSD operating
      system as part of the “Programming Technology” course.
      Evgeny is a member of the International Society for Engi-
      neering Pedagogy (IGIP). He has been using PostgreSQL in
      teaching for more than 20 years.
      Database Technology Fundamentals
      A modern academic course that combines in-depth theory
      with relevant practical skills of database design and deploy-
      ment.
Novikov, B., Gorshkova, E., and Grafeeva N. Database Technol-    161
ogy Fundamentals. 2nd ed. Moscow : DMK Press, 2020.              x
ISBN 978-5-97060-841-8
                                  The first part contains the
                                  key information about
                                  database management
                                  systems: relational data
                                  model, the SQL language,
                                  transaction processing.
                                  The second part dives
                                  into the underlying
                                  database technology and
                                  its development trends.
                                  Some topics covered in
                                  the first part are discussed
                                  again at a deeper level.
Contents:
Part I. From Theory to Practice
Introduction
Some database theory
Getting started with databases
Introduction to SQL
Database access management
Transactions and data consistency
Database application development
Relational model extensions
Various types of database systems
162   Part II. From Practice to Proficiency
  x
      Database system architecture
      Storage structures and the main algorithms
      Query execution and optimization
      Transaction management
      Database reliability
      Advanced SQL features
      Database functions and procedures
      PostgreSQL extensibility
      Full-text search
      Data security
      Database administration
      Replication
      Parallel and distributed database systems
      A soft copy of this book in Russian is available on our website:
      postgrespro.ru/education/books/dbtech.
      This course offers 24 hours of lectures and 8 hours of hands-
      on training. It was delivered by Boris Novikov at the faculty of
      Computational Mathematics and Cybernetics of Lomonosov
      Moscow State University. You can download the course
      materials in Russian at postgrespro.ru/education/university/
      dbtech.
      Boris Novikov, Dr. Sci. in Physics
      and Mathematics, professor at
      the Informatics Department of
      Higher School of Economics in St.
      Petersburg.
      His academic interests mainly
      concern various aspects of de-
      signing, developing, and deploy-
ing database systems and applications, as well as scalable       163
distributed systems for Big Data processing and analytics.       x
Ekaterina Gorshkova, Ph.D. in Physics and Mathematics.
An expert in designing high-load data-intensive applications.
Her academic interests include machine learning, data-flow
analysis, and data retrieval.
Natalia Grafeeva, Ph.D. in Physics and Mathematics, associate
professor at the Informatics and Data Analysis Department
of St. Petersburg State University.
Her academic interests include databases, data retrieval, Big
Data, and smart data analysis. She is an expert in information
system design, development, and maintenance, as well as in
course design and teaching.
Books
PostgreSQL Internals
This book is for those who will not settle for a black-box
approach when working with a database. Targeted at readers
who have some experience with PostgreSQL, this book will
also be useful for those who are familiar with another
database system but switch over to PostgreSQL and would
like to understand how they differ.
Rogov E. PostgreSQL 14 Internals. Moscow : DMK Press, 2022
ISBN 978-5-6045970-4-0 (in English)
ISBN 978-5-93700-122-1 (in Russian)
164                                    You will not find any ready-
  x                                    made recipes in this book.
                                       But the provided explana-
                                       tions of the inner mechan-
                                       ics will enable you to crit-
                                       ically evaluate other peo-
                                       ple’s experience and come
                                       to your own conclusions.
                                       The author goes into de-
                                       tails of PostgreSQL inter-
                                       nals and shows how to run
                                       experiments and verify in-
                                       formation that inevitably
                                       gets deprecated.
      Egor Rogov has been working in the education department
      in Postgres Professional since 2015; he develops and teaches
      training courses, publishes blog posts, writes and edits
      books.
      Contents:
      Introduction
      Part I. Isolation and MVCC
      Isolation • Pages and Tuples • Snapshots • Page
      pruning and HOT updates • Vacuum and autovacuum •
      Freezing • Rebuilding tables and indexes
      Part II. Buffer cache and WAL
      Buffer cache   •   Write-ahead log   •   WAL modes
Part III. Locks                                                  165
                                                                 x
Relation-level locks • Row-level locks       •   Miscellaneous
locks • Locks on memory structures
Part IV. Query execution
Query execution stages • Statistics • Table access
methods • Index access methods • Index scans •
Nested loop • Hashing • Sorting and Merging
Part V. Index types
Hash   •   B-tree •   GiST   •   SP-GiST •   GIN   •   BRIN
A soft copy of this book is available on our website:
postgrespro.com/community/books/internals.
XI The Hacker’s
   Guide to the Galaxy
News and Discussions
Anyone can follow PostgreSQL news, learn about the features
planned for the next release, and stay up-to-date with the
current events.
Plenty of interesting and useful content is published in
various related blogs. For example, the planet.postgresql.org
website aggregates all the English-language articles in one
place. Many articles in Russian can be found at habr.com/
hub/postgresql, including those published by Postgres Pro-
fessional. For some of our articles, an English translation is
available at habr.com/en/company/postgrespro/blog/. There
are also dedicated YouTube channels, such as youtube.com/
RuPostgres and youtube.com/PostgresTV.
There is also a Wiki project (wiki.postgresql.org), where you
can find FAQ, training materials, articles about system setup
and optimization, migration specifics from different database
systems, and much more.
More than 9 000 Russian-speaking PostgreSQL users are sub-
scribed to the “pgsql—PostgreSQL” channel in Telegram (t.me/
168   pgsql); more than 4 000 people are members of the Face-
 xi   book group “PostgreSQL in Russia” (facebook.com/groups/
      postgresql).
      You can also ask your questions on stackoverflow.com. Do
      not forget to add the “postgresql” tag.
      As for Postgres Professional news, they are published in its
      corporate blog at postgrespro.com/blog.
      Mailing Lists
      To get all the news firsthand, without waiting for someone
      to write a blog post, you can subscribe to mailing lists.
      Respecting the tradition, PostgreSQL developers discuss all
      questions exclusively by email.
      You can find all the mailing lists at postgresql.org/list. Some
      of them are:
      •   pgsql-hackers (typically called simply “hackers”), the main
          list for everything related to development
      •   pgsql-general used to discuss general questions
      •   pgsql-bugs for bug reports
      •   pgsql-docs for documentation
      •   pgsql-translators for translation-related discussions
      •   pgsql-announce to get new release announcements
      and many more.
      Having signed up for any of these lists, you will start receiving
      regular emails and will be able to participate in discussions
      if you like. Another option is to browse through the email
archive at postgresql.org/list or on our company’s website       169
(postgrespro.com/list).                                          xi
Commitfest
Another way to keep up with the news without spending too
much time is to check the commitfest.postgresql.org page.
Here the community opens the so-called commitfests for
developers to submit their patches. For example, commit-
fest 01.03.2022–31.03.2022 was open for version 15, while
commitfest 01.07.2022–31.07.2022 was related to the next
version already. It allows the community to stop accepting
new features at least about half a year before the release and
have the time to stabilize the code.
Patches undergo several stages: first they are reviewed and
fixed, and then they are either accepted, or moved to the next
commitfest, or rejected (if you are completely out of luck).
This way, you can stay informed about new features already
included into PostgreSQL or planned for the next release.
Conferences
Russia hosts two annual international conferences, which are
attended by hundreds of PostgreSQL users and developers.
   PGConf in Moscow (pgconf.ru)
   PGDay in Saint-Petersburg (pgday.ru)
170   Regional conferences are also held from time to time; for
 xi   example, PGConf.Siberia in Novosibirsk and Krasnoyarsk.
      Besides, several Russian cities host conferences on broader
      topics, including databases in general and PostgreSQL in
      particular. We will name only a few:
         CodeFest in Novosibirsk (codefest.ru)
         HighLoad++ in Moscow and other cities (highload.ru)
      Naturally, PostgreSQL conferences are held all over the world.
      The major ones are:
         PGCon in Ottawa, Canada (pgcon.org)
         PGConf Europe (pgconf.eu)
      The list of upcoming events can be found at postgresql.org/
      about/events.
      In addition to conferences, there are less official regular
      meetups, including online ones.
XII Postgres Professional
The Postgres Professional company was founded in 2015; it
unites key Russian developers whose contribution to Post-
greSQL is recognized in the global community. Building
database development expertise in Russia, the company
currently employs about 150 developers, architects, and
engineers.
The Postgres Professional company delivers several versions
of Postgres Pro database system based on PostgreSQL, as well
as develops new core features and extensions and provides
support for application system design, maintenance, and
migration to PostgreSQL.
The company pays much attention to education. It hosts
PgConf.Russia, the largest international annual PostgreSQL
conference in Moscow, and participates in other conferences
all over the world.
Contact information:
   7A Dmitry Ulyanov str., Moscow, Russia, 117036
   +7 495 150-06-91
   info@postgrespro.ru
172    Postgres Pro Database System
 xii
       Postgres Pro is a Russian commercial database system de-
       veloped by the Postgres Professional company. Based on the
       open-source PostgreSQL database system, Postgres Pro offers
       many additional features to satisfy the needs of enterprise
       customers. It is included into the unified register of Russian
       software.
       Postgres Pro Standard contains all the PostgreSQL features
       and additional extensions and core patches, including those
       that are not yet accepted by the community. As a result,
       its users can get access to useful functionality and improve
       performance without having to wait for the next PostgreSQL
       version to be released.
       Postgres Pro Enterprise is a considerably reworked version of
       the database system; offering better stability and increased
       performance, it can address challenging production-level
       tasks.
       Both Postgres Pro versions have been extended with the
       required information security functionality and are certified
       by FSTEC (Federal Service for Technical and Export Control).
       To use any Postgres Pro version, you have to buy a license.
       A trial version is available for free; you can also get Postgres
       Pro at no cost for educational purposes or application devel-
       opment.
       To learn more about the features specific to different Postgres
       Pro versions, go to postgrespro.com/products.
Services                                                         173
                                                                 xii
Fault-Tolerant Solutions for Postgres
Designing and implementing high-load, high-performance,
and fault-tolerant production systems; providing consulting
services. Deploying Postgres and optimizing system configu-
ration.
Vendor Technical Support
24 x7 support for Postgres Pro and PostgreSQL: system moni-
toring, disaster recovery, incident analysis, performance man-
agement, debugging both core features and extensions.
Migration of Application Systems
Estimating complexity of migration to Postgres from other
database systems. Defining the architecture and the required
changes for new solutions. Migrating application systems to
Postgres and providing support during migration.
Postgres Training
Courses for database administrators, system architects, and
application developers covering Postgres specifics and effi-
cient use of its advantages.
Database System Audit
Database system evaluation by Postgres Professional experts.
Information security audit for Postgres-based systems.
A complete list of services is available at postgrespro.com/
services.
               Pavel Luzanov
                 Egor Rogov
                Igor Levshin
    Postgres. The First Experience
      Translated by Liudmila Mantrova
          Edited by Peter Lagutkin
     Cover design by Alexander Gruzdev
      9th edition, revised and updated
postgrespro.com/community/books/introbook
    © Postgres Professional, 2016–2023
          ISBN 978-5-6045970-3-3