#review
History
 1. The POSTGRES project, led by Professor Michael Stonebraker, was
   sponsored by the Defense Advanced Research Projects Agency
   (DARPA), the Army Research Office (ARO), the National Science
   Foundation (NSF), and ESL, Inc.
 2. The implementation of POSTGRES began in 1986. The initial concepts
   for the system were presented in The design of POSTGRES , and the
   definition of the initial data model appeared in The POSTGRES data
   model . The design of the rule system at that time was described in The
   design of the POSTGRES rules system. The rationale and architecture of
   the storage manager were detailed in The design of the POSTGRES
   storage system.
   https://web.mit.edu/cygwin/cygwin_v1.3.2/usr/doc/postgresql-
   7.1.2/html/rules.html
   https://dsf.berkeley.edu/papers/ERL-M87-06.pdf
 3. In 1994, Andrew Yu and Jolly Chen added an SQL language interpreter
    to POSTGRES. Under a new name, Postgres95 was subsequently
   released to the web to find its own way in the world as an open-source
   descendant of the original POSTGRES Berkeley code.
 4. By 1996, it became clear that the name "Postgres95" would not stand the
    test of time. We chose a new name, PostgreSQL, to reflect the
   relationship between the original POSTGRES and the more recent
   versions with SQL capability. At the same time, we set the version
     numbering to start at 6.0, putting the numbers back into the sequence
     originally begun by the Berkeley POSTGRES project.
PostgreSQL
     PostgreSQL is cross platform and runs on many operating systems such
     as Linux, FreeBSD, OS X, Solaris and Windows
     PostgreSQL features transactions with ACID properties
     PostgreSQL manages concurrency through multiversion concurrency
     control (MVCC)
MVCC
MVCC is a database optimization method that allows multiple transactions to
access the same data concurrently without locking, by keeping multiple
versions of data objects.
This approach improves performance and scalability by enabling reads to
occur without waiting for writes to complete, and vice versa, thereby reducing
conflicts and delays in a multi-user environment.
PostgreSQL - tools
     Installation resources - https://www.postgresql.org/download/
     Documentation - https://www.postgresql.org/docs/
We will be working on Windows. Remember however that PostgreSQL for
production purposes will be usually hosted in Linux environment.
PostgreSQL on Windows
Performance: PostgreSQL is known to perform better on UNIX-like systems.
This is due to differences in how Windows and UNIX-like operating systems
handle file system operations and process management. PostgreSQL's
architecture and optimization are more naturally suited to the UNIX-like
environments.
File System: The file systems commonly used on UNIX-like systems (e.g.,
EXT4, XFS on Linux) are often more efficient at handling the types of
read/write operations databases perform, compared to the NTFS file system
on Windows.
Process Forking: PostgreSQL relies on process forking for its operation, a
task that is more efficiently performed in UNIX-like systems. Windows does
not natively support fork(), and though PostgreSQL works around this, it incurs
additional overhead.
Support and Community: The PostgreSQL community and the majority of its
development effort are focused on UNIX-like systems. This means better
support, more frequent updates, and a wider array of tools and extensions for
these systems.
Operational Tools and Scripts: Many of the operational tools, scripts, and
best practices are designed with UNIX-like systems in mind. This includes
backup tools, monitoring solutions, and performance tuning utilities.
Deployment Standards: UNIX-like systems are a common choice for server
environments, making them a standard choice for deploying production
databases. This standardization helps in leveraging community knowledge,
documentation, and experienced personnel.
Security: UNIX-like systems have a long history of being used in multi-user,
networked environments, leading to a mature and robust security model that's
well-suited for protecting a database system.
While it's technically possible to run PostgreSQL on Windows for development
or testing, for production environments, a UNIX-like system is generally
preferred for its performance, reliability, and ecosystem support.
General architecture overview
  1. Postmaster Process: When PostgreSQL starts, it launches the
     postmaster process. This process listens on the specified port for
    incoming connection requests from client applications.
  2. Forking New Processes: Upon receiving a connection request, the
    postmaster forks a new backend process for each client connection. This
    means that for each client connected to the database, there is a separate
    server process handling that connection.
  3. Dedicated Backend Process: Each backend process is dedicated to
    serving its client: executing queries, accessing the database files, and
    performing transactions. This process-based approach isolates the
    activities of different clients, enhancing security and stability.
  4. No Threading: Unlike some other database management systems that
    use threading, PostgreSQL's use of separate processes means that it
    does not rely on threading to handle concurrent connections. This design
    choice is partly why PostgreSQL is traditionally favored on UNIX-like
    systems, which efficiently manage processes.
  5. Shared Memory and IPC: PostgreSQL backend processes
    communicate with each other and manage concurrency control (locks,
    shared buffers, etc.) using shared memory and inter-process
    communication (IPC). This allows for efficient data sharing and
    coordination among the processes.
  6. Connection Pooling (External): Because each connection is handled
     by a separate process, the overhead can become significant with a high
    number of concurrent connections. To mitigate this, external connection
    poolers (like PgBouncer) are often used to manage a pool of connections
    that client applications can share, reducing the overhead on the
    PostgreSQL server.
    https://www.pgbouncer.org/
Processes vs threads
Connection pooling and the use of threads versus processes are two
important architectural choices that database management systems (DBMSs)
make to handle concurrent connections and operations efficiently.
These choices significantly affect the DBMS's performance, scalability, and
how it manages resources.
PostgreSQL: PostgreSQL uses a process-based model for handling client
connections. When a client connects to PostgreSQL, the main server process
( postmaster ) forks a new backend process dedicated to handling that
connection. Each connection has its own process, and these processes are
isolated from each other, which improves stability and security but can
increase overhead, especially with a large number of connections.
PostgreSQL does not include built-in connection pooling.
MySQL: MySQL traditionally uses a thread-based model for handling client
connections, especially in its default storage engine, InnoDB. Each client
connection is handled by a separate thread in the server process.
This model can be more memory-efficient than a process-based model
because threads share the same address space and resources, reducing the
per-connection overhead.
MySQL provides built-in connection pooling in its client libraries, like
Connector/J for Java.
Oracle: Oracle can use both processes and threads, depending on the
configuration and the operating system. On UNIX and Linux, it uses a
process-based model similar to PostgreSQL but also supports threading for
certain operations and components, combining the advantages of both
approaches.
Oracle includes built-in connection pooling through features like Oracle
Connection Pooling and Database Resident Connection Pooling (DRCP).
Microsoft SQL Server: SQL Server uses a thread-based model to handle
client connections. It employs a thread pool to manage the execution of these
connections, optimizing resource use and allowing for a high degree of
concurrency with lower overhead compared to a process-based model.
SQL Server supports connection pooling through its client libraries, such as
ADO.NET for .NET applications, which manage pools of connections to
reduce the overhead of opening and closing connections frequently.
PostgreSQL - choice
PostgreSQL opts for a process-based approach for handling connections and
relies on external tools for connection pooling. This choice emphasizes
stability, security, and isolation at the expense of higher resource usage with
many connections.
In contrast, databases like MySQL and SQL Server use thread-based models
and provide built-in support for connection pooling, aiming for efficiency and
scalability, particularly in environments with a high number of concurrent
connections.
Oracle offers a more flexible approach, capable of utilizing both processes
and threads, and provides comprehensive connection pooling mechanisms.
PostgreSQL - high level architecture
Shared memory
In PostgreSQL, shared memory is a critical component that allows different
server processes to access common data structures and buffers, enhancing
performance and efficiency.
Shared buffers
These are a part of the shared memory where PostgreSQL caches table data
and indexes. By keeping frequently accessed data in memory, PostgreSQL
can significantly reduce disk I/O, speeding up data retrieval and query
execution. The size of the shared buffers is configurable, and tuning it
correctly is vital for database performance.
WAL buffers
WAL stands for Write-Ahead Logging. WAL buffers are used to temporarily
hold changes to the database before they are written to the disk in the WAL
files. This mechanism ensures data integrity and allows the database to
recover from crashes by replaying these logs. The WAL buffers are an
essential part of PostgreSQL's durability and data integrity guarantees.
CLOG buffers
CLOG stands for Commit Log. These buffers are used to track the commit
status of transactions. PostgreSQL uses this information to determine whether
a transaction has been committed or rolled back, which is crucial for
maintaining transactional integrity and supporting features like point-in-time
recovery.
Temp buffers
Temp buffers are used for temporary tables. When queries involve operations
that need intermediary storage, like complex sorts or hash operations, and the
data does not fit into the work memory, PostgreSQL uses temporary tables.
Temp buffers are allocated separately from the main shared buffers to hold
data for these temporary tables, preventing the main cache from being
polluted by temporary data that is not needed after the transaction ends.
File types
In PostgreSQL, several key file types play vital roles in database storage,
transaction logging, error logging, and data recovery. Here's a brief overview
of each:
Data files
These files store the actual data of the database, including tables, indexes,
and other database objects. Data files are organized into a directory structure
under the PostgreSQL data directory ( PGDATA ). Each table and index is
stored in one or more files on the disk. The organization and naming of these
files are managed by PostgreSQL and are transparent to users.
Page files
  1. Page is the smallest unit of data storage
  2. Every table and index is stored as an array of pages of fixed size
  3. By default, in PostgreSQL the page size is 8kb
  4. We can configure different page size during compiling the server
  5. All pages are logically equivalent, and any row can be stored on any
     page
Page Header (Header Info):
     *pd_lsn: Log sequence number indicating the WAL location for the latest
     change to the page.
     pd_checksum: Optional checksum value for this page.
     pd_flags: Flags to provide additional information about the page, such
     as whether it's been pruned.
     pd_lower: The offset to the location in the page where the list of item
     pointers (line pointers) ends.
     pd_upper: The offset to the free space on the page (the start of the
     hole).
     pd_special: The offset to the location of special data at the end of the
     page, such as index-specific data.
     pd_pagesize_version: Combined field for the size of the page and
     version code.
     pd_prune_xid: Transaction ID of the last prune operation on the page.
Line pointers:
These are item pointers that act as an array index; each one points to a tuple
(row) within the page. They show the actual position of the tuple on the page
and allow tuples to be moved around within a page to keep them contiguous
without changing the line pointer array.
Heap Tuples (Record Data):
These are the actual data records (tuples) stored on the page. A tuple consists
of one or more fields, corresponding to table columns.
Free Space (Hole):
The area between pd_upper and pd_special is considered free space, which
can be used to store new tuples or to accommodate the expansion of existing
tuples.
This structure allows PostgreSQL to efficiently manage the tuples within a
page, providing rapid access to records, as well as space and transaction
management. The page header contains meta-information about the page,
which the database system uses to manage the page's contents and maintain
consistency and integrity.
WAL Files (Write-Ahead Logging Files)
WAL files are crucial for data integrity and recovery. PostgreSQL uses write-
ahead logging to record every change made to the database's data in these
files before the changes are written to the actual data files. This approach
ensures that, in the event of a crash, the database can be recovered to a
consistent state by replaying these logs. WAL files are stored in the pg_wal
directory (formerly known as pg_xlog in versions before PostgreSQL 10).
Using WAL results in a significantly reduced number of disk writes, because
only the WAL file needs to be flushed to disk to guarantee that a transaction is
committed, rather than every data file changed by the transaction. The WAL
file is written sequentially, and so the cost of syncing the WAL is much
less than the cost of flushing the data pages. This is especially true for
servers handling many small transactions touching different parts of the data
store. Furthermore, when the server is processing many small concurrent
transactions, one fsync of the WAL file may suffice to commit many
transactions.
Log files
PostgreSQL generates log files to record various operational events, including
errors, warnings, and informational messages about database operations. The
configuration of logging is highly customizable, allowing administrators to
specify what types of events are logged and where the logs are stored. Log
files are essential for diagnosing problems, monitoring database activity, and
auditing.
Archive files
Archive files are essentially copies of WAL files that are stored outside the
main PostgreSQL data directory for backup and recovery purposes. Archiving
WAL files is part of a disaster recovery strategy, enabling point-in-time
recovery (PITR) or replication. The process of archiving is controlled by the
archive_command setting in PostgreSQL's configuration, which specifies how
to copy each WAL segment to a designated archive location.
Utility processes
In PostgreSQL, several background utility processes work together to ensure
the database's smooth operation, data integrity, and performance optimization.
Writer (Background writer)
The Background Writer process is responsible for writing modified pages from
shared buffers (in-memory cache of database blocks) to the disk. This process
helps in reducing disk I/O and managing memory efficiently by writing out
"dirty" pages in the background, allowing the database to handle more
transactions.
WAL Writer
The Write-Ahead Logging (WAL) Writer process is tasked with writing WAL
records from the in-memory WAL buffers to the disk. This is crucial for
ensuring that all changes to the database are recorded promptly in the WAL
files, which are essential for database recovery and durability.
Checkpointer
The Checkpointer process periodically writes a consistent state of the
database to disk, creating checkpoints. Checkpoints are important for
recovery processes because they mark a point in time from which the
database can be restored using the WAL files, reducing recovery time.
Archiver process
The Archiver process copies completed WAL files to a designated archive
location for safekeeping. This archiving is part of PostgreSQL's continuous
archiving feature, enabling point-in-time recovery (PITR) and supporting
replication setups.
Logging collector
This process is responsible for capturing log messages and redirecting them
to the appropriate log files or other destinations as configured. The Logging
Collector consolidates logging from various server processes, helping in
centralized log management and analysis.
Stats collector
The Statistics Collector gathers information about database activity, such as
the number of operations performed on tables and indexes, and system usage
statistics. This data is essential for monitoring, performance tuning, and query
optimization, as it provides insights into how the database is being used.
Autovacuum launcher
The Autovacuum Launcher process oversees the autovacuum workers, which
are responsible for automatic vacuuming (cleanup of dead tuples) and
analyzing (updating statistics for the optimizer) of the database tables. This
process helps in reclaiming storage space, preventing table bloat, and
keeping the statistics used by the query planner up-to-date, which is crucial for
maintaining good database performance.
Processing the SQL command - high level
Parsing
   Check Query Syntax: The parser verifies that the SQL query is written
   correctly according to the SQL grammar rules. This includes checking for
   proper use of keywords, commas, parentheses, and other syntax
   elements.
   Check Query Type: The parser determines the type of the SQL query,
   such as SELECT, INSERT, UPDATE, DELETE, etc. This helps the
   database understand what action is being requested.
   Split the Query into Separate Elements: The parser breaks down the
   SQL query into its constituent elements, such as the SELECT clause,
   WHERE clause, JOIN conditions, etc. This dissection helps in creating a
   parse tree or an abstract syntax tree that represents the query's logical
   structure.
Optimization
   Generate Query Execution Plan: The optimizer evaluates various ways
   to execute the query and generates one or more possible execution
   plans. These plans outline different strategies for how the database can
   retrieve or modify the data.
   Choose Statistics: The optimizer selects relevant statistical information
   about the database tables and indexes involved in the query. This
   information can include the size of tables, cardinality of fields, and
   distribution of values, which are crucial for making informed decisions
   about the most efficient way to execute the query.
   Calculate Cost: For each potential execution plan, the optimizer
   calculates an estimated cost, which represents the resources needed to
     execute the plan, such as CPU, memory, and I/O. This estimation is
     based on the query's complexity, the size of the data, and the statistics
     chosen.
     Choose Execution Plan: Based on the cost estimations, the optimizer
     selects the most efficient execution plan for the query. This is the plan
     that the database expects to consume the least amount of resources and
     complete the operation in the shortest time.
Execution
Query Execution: This is the phase where the database engine carries out
the steps outlined in the chosen execution plan. It involves accessing the
physical tables and indexes, performing joins, applying filters, aggregating
data, and finally, presenting the results to the user or application. The
execution phase interacts with the storage engine to read or modify the actual
data stored on disk.
Step-by-step - select operation
  1. Query Reception: The client application sends a SELECT query to the
     PostgreSQL server, which is picked up by one of the backend processes
     dedicated to that client connection.
2. Parsing and planning: explained the the text above.
3. Shared buffers: Before actual execution, the database checks its shared
  buffers, which is the in-memory cache where PostgreSQL stores pages
  read from the disk. If the data needed for the query is already in the
  shared buffers, it can be accessed directly, saving time by avoiding disk
  I/O.
     1. If the required data is not in the shared buffers, PostgreSQL loads it
       from the data files on disk into the shared buffers.
     2. PostgreSQL implements a Least Recently Used (LRU) strategy to
       manage the shared buffers. If the buffers are full, the least recently
       used data is evicted to make room for new data.
4. Execution: With the execution plan chosen and data in shared buffers,
   PostgreSQL proceeds with the execution. This involves:
     1. Accessing the required rows in the shared buffers.
     2. Performing any necessary joins, sorting, filtering, and aggregation
        as dictated by the execution plan.
     3. Generating the result set.
5. WAL (Write-Ahead Logging): For a SELECT statement, the WAL is
  generally not involved because there is no data modification. However,
  WAL comes into play if the query triggers any data changes, such as
  updates caused by a function or a trigger.
6. Background Writer and WAL Writer: Although they are not directly
  involved in reading operations, these processes ensure that the shared
  buffers are efficiently written back to disk and that the WAL records are
  flushed to the storage. This ensures that the shared buffers have enough
  space for new data and that the system can recover in case of a crash.
7. Autovacuum: While not directly involved in the execution of a specific
   SELECT query, the autovacuum process is indirectly important. It
  maintains table and index health by cleaning up dead tuples and
  updating statistics, which helps the planner generate efficient execution
  plans.
8. Result Return: Finally, the backend process returns the result set of the
  SELECT query to the client application.
Step-by-step - update
 1. Query Reception: The UPDATE query arrives from the client and is
    received by the PostgreSQL server process dedicated to the client's
   connection.
 2. Parsing and Planning: The query is parsed to validate its syntax and
    then a parse tree is created. The planner/optimizer takes this parse tree
   and generates a query plan, using statistics to determine the most
   efficient way to locate and update the targeted rows.
 3. Shared Buffers: PostgreSQL checks if the data pages to be updated are
    in the shared buffers.
      1. If the data is in the shared buffers, it can be directly accessed.
      2. If not, the relevant pages are loaded from the disk into the shared
         buffers.
 4. WAL Write: Before the data pages are modified in the shared buffers, the
    changes are recorded in the WAL. This ensures that, in case of a crash,
   the database can recover by replaying the WAL entries.
 5. Execute Update:
      1. The database engine locks the rows to be updated to maintain
        isolation.
      2. The actual update is performed on the in-memory data pages within
        the shared buffers.
      3. If row versions are moved because of the update (due to row size
        change), index entries pointing to the row are updated, and the new
        row version is marked as the latest.
      4. Visibility information is updated to reflect that these rows have a
        new version.
 6. Background Writer and WAL Writer
      1. The background writer may write dirty pages from shared buffers to
         the disk, but it usually does so based on its regular scheduling.
      2. The WAL writer flushes WAL records to the disk at intervals to
         ensure that the log is up-to-date on disk.
 7. CLOG Update: The commit log (CLOG) is updated to reflect that the
    transactions have been committed, ensuring the changes are permanent
   and visible to other transactions according to the isolation rules.
 8. Autovacuum and Freezing
      1. The autovacuum process eventually cleans up the old row versions
         that are no longer visible to any transactions to reclaim space and
        maintain performance.
      2. During this, it might also "freeze" old rows to avoid transaction ID
        wraparound issues by marking them as permanently visible to all
        transactions.
 9. Checkpointer
      1. The checkpointer process periodically forces both dirty shared
         buffers and WAL records to the disk, creating consistent
        checkpoints.
      2. This process ensures data is not lost and aids in faster recovery if
        the system crashes.
10. Archiver
      1. If archiving is enabled, completed WAL segments are copied to the
         archive location after they are full or a switch has been requested.
        This is essential for point-in-time recovery and replication.
11. Result Return: After the transaction is committed, PostgreSQL returns a
   command tag to the client application, indicating the number of rows
   updated.
Step-by-step - INSERT
 1. Query Reception: When a client sends an INSERT query to
   PostgreSQL, it is received by the PostgreSQL server process (backend)
   that is handling the client connection.
 2. Parsing and Planning: The query is parsed to check the syntax, and the
    planner/optimizer generates a plan for the query. The plan includes
   where and how to insert the rows.
 3. Shared Buffers: PostgreSQL checks its shared buffers to see if the pages
   into which the new rows will be inserted are already in memory.
      1. If the pages are in the shared buffers, PostgreSQL can directly
         insert the new rows into these pages.
      2. If the pages are not in memory, PostgreSQL reads them from the
         disk into the shared buffers.
 4. WAL Logging: Before making any changes, PostgreSQL writes the insert
   action to the WAL. This ensures that the database can recover the insert
   if there is a crash before the changes are written to disk.
 5. Execute Insert: PostgreSQL then:
      1. Inserts the new row(s) into the appropriate page in the shared
         buffers.
      2. If indexes are affected by the insert, it updates the indexes
        accordingly.
      3. Updates visibility information to indicate the presence of the new
        row(s).
 6. Background Writer and WAL Writer: While these processes are not
    directly involved in the INSERT command execution, they are essential
   for durability.
      1. The Background Writer may write dirty pages from the shared
         buffers to the disk.
      2. The WAL Writer ensures the WAL is written to disk at regular
        intervals.
 7. Commit: The transaction is committed, making the insert permanent. The
    commit log (CLOG) is updated to reflect this.
 8. Autovacuum: This background process will later clean up dead tuples left
    by superseded versions of rows, which is not usually an immediate part
   of an INSERT operation but is critical for long-term maintenance.
 9. Return Result: PostgreSQL returns a result to the client, typically the
    number of rows inserted or a confirmation that the insertion was
   successful.
OS cache
PostgreSQL configuration
https://pgtune.leopard.in.ua/
The "out-of-the-box" configuration of PostgreSQL is deliberately conservative
to ensure it will start up on a wide variety of hardware and software
environments.
This conservative setup is often not optimized for specific workloads or
hardware configurations, leading to the perception that additional configuration
is necessary for optimal performance.
Editing configuration
  1. Edit in postgresql.conf
  2. Restart server or run SELECT pg_reload_conf();
  3. SHOW work_mem;
shared_buffers
It is always faster to read or write data in memory than on any other media. A
database server also needs memory for quick access to data, whether it is
READ or WRITE access. In PostgreSQL, this is referred to as "shared buffers"
and is controlled by the parameter shared_buffers.
As a starting point, a common recommendation is to set shared_buffers to
about 25% to 40% of the available system RAM on a dedicated database
server.
For systems with a lot of RAM, setting shared_buffers to a value larger than
32GB might not yield better performance due to the way operating systems
manage memory. It's often more effective to rely on the operating system's
cache beyond a certain point.
work_mem
work_mem is another crucial memory-related configuration parameter in
PostgreSQL, determining the amount of memory to be used by internal sort
operations and hash tables before writing to temporary disk files.
This setting plays a significant role in the performance of queries that involve
sorting, aggregations, or complex joins.
     Usage per Query: Unlike shared_buffers , which is shared across all
     connections, work_mem is allocated for each sort, hash, or
     materialization operation within a query.
     Risk of Oversizing: Setting work_mem too high can lead to excessive
     memory usage because each session or query operation that needs
     memory can allocate up to the specified amount. On systems with many
     concurrent queries, this can quickly add up and potentially exhaust
     available memory.
     Tuning for Workload: There's no one-size-fits-all value for work_mem; it
     depends on your workload, the query patterns, and the system's RAM. A
     common approach is to start with a conservative value and increase it if
     you notice queries are frequently spilling to disk (observable through the
     EXPLAIN ANALYZE command or monitoring tools).
A typical starting point might be a few megabytes, such as 4MB or 8MB. You
might adjust this based on the available system memory and the nature of
your queries.
maintenance_work_mem
maintenance_work_mem is a configuration parameter in PostgreSQL that
determines the maximum amount of memory used for maintenance
operations, such as VACUUM , CREATE INDEX , and ALTER TABLE ADD FOREIGN
KEY . It's specifically designed to improve the performance of these operations
by allowing them to use more memory, which can significantly reduce their
execution time, especially on large databases.
A common recommendation is to start with a value between 128MB to 1GB for
systems with ample memory, adjusting based on your specific needs and
observations. Another recommendation is to use 5% of the available RAM.
temp_buffers
temp_buffers is a PostgreSQL configuration parameter that determines the
amount of memory allocated for temporary tables within each database
session.
Temporary tables are those explicitly created with the TEMPORARY keyword
in SQL commands. This setting is important for workloads that heavily rely on
temporary tables, as it can significantly affect performance by reducing disk
I/O for operations involving these tables.
The default value is typically relatively low (e.g., 8MB). If your application
makes extensive use of temporary tables and you have sufficient memory,
increasing this value can improve performance. However, since this memory
is not shared and each session gets its own allocation, setting this value too
high on systems with many concurrent connections could lead to excessive
overall memory consumption.
effective_cache_size
effective_cache_size is a configuration parameter in PostgreSQL that provides
the database optimizer with information about the expected size of the
system's cache (including both PostgreSQL's own cache and the operating
system's disk cache).
This parameter helps the query planner make more informed decisions,
especially when choosing between different query plans that involve disk I/O
operations, such as sequential scans versus index scans.
Usually we set it for around 50%-75% of the available RAM remory.
max_wal_size
max_wal_size is a PostgreSQL configuration parameter that specifies the
maximum size of WAL (Write-Ahead Logging) files that can accumulate on
your disk before a checkpoint is forced. This setting helps manage the amount
of WAL data that must be written to disk and retained, which can affect your
database's performance, especially for write-heavy workloads, and influence
recovery time after a crash.
Purpose: Increasing max_wal_size allows PostgreSQL to delay checkpoints,
potentially improving performance by reducing disk I/O. However, a larger
WAL can also mean more data to process during recovery, so it's about
finding a balance.
Default Value: PostgreSQL has a default setting (e.g., 1GB in some versions),
but the optimal size depends on your workload and disk capacity Companion
Setting: min_wal_size is another parameter that works alongside
max_wal_size to define a range within which the WAL size can fluctuate.
PostgreSQL will try to recycle WAL files up to min_wal_size without deleting
them, providing a buffer for bursts of write activity.
wal_buffers
Setting wal_buffers in PostgreSQL configures the amount of memory allocated
for storing WAL (Write-Ahead Logging) records before they are written to disk.
This parameter is crucial for the performance of write operations and overall
database throughput, especially in systems with high transaction rates.
Default Setting: PostgreSQL automatically sets wal_buffers to a default value
based on the setting of shared_buffers. The default is typically set to 1/32 of
the shared_buffers value, but not more than 64MB.
Cluster
In PostgreSQL, the term "cluster" refers to a collection of databases managed
by a single PostgreSQL server instance. This concept is slightly different from
what some might expect, especially if they're familiar with clustering in the
context of high availability or distributed databases. Here's a breakdown of
what a cluster means in PostgreSQL:
     Server Instance: A single running process of PostgreSQL, which can
     manage multiple databases.
     Databases: Within a cluster, PostgreSQL can host multiple isolated
     databases. These databases are separate from each other, meaning
     that, by default, data and user roles are not shared across databases.
     Data Directory: Every cluster has its own data directory, where all the
     data files, including tables, indexes, and configuration files, are stored.
     This directory is specified when the PostgreSQL server is initialized
     (using initdb ) and is unique to that cluster.
Preparing the new cluster
  1. First before creating a cluster its a good practice to define the PG_DATA
  2. Add the path to binaries to PATH env variable
  3. Create the directories for PG_DATA
       1. One for PG_DATA
       2. One for wal files
  4. Add the permissions for user postgres to created directories
  5. Use the initdb to create new cluster
  6. Create the service in systemd
Check the initdb -help
Initdb –D /postgres/data –X /postgres/wal –E=UTF8 -W
systemd - service
Reloading configuration
  Systemctl deamon-reload
  Systemctl start postgresql
  Systemctl status
Configuration of remote access
The pg_hba.conf file is a crucial configuration file in PostgreSQL, responsible
for controlling client authentication. "HBA" stands for Host-Based
Authentication.
This file specifies which users can connect to which databases, from which
hosts, and using which authentication methods. By editing pg_hba.conf,
database administrators can enforce security policies and manage access
controls in a fine-grained manner.
psql
psql is the interactive terminal client for PostgreSQL, providing a command-
line interface for interacting with the PostgreSQL database server. It allows
users to execute SQL queries, run scripts, and manage both the data and the
schema of a database directly from the terminal. psql is included with
PostgreSQL installation packages and is widely used by database
administrators and developers for its flexibility and full-featured support for
PostgreSQL's capabilities.
Key Features of psql
     Interactive Commands: psql supports a wide range of interactive
     commands that allow users to navigate databases, view table data, and
     manage database objects.
     Script Execution: Users can execute SQL script files from within psql
     using commands like \i filename.sql , facilitating automation and
     batch processing.
     Metadata Queries: It provides commands for listing database objects
     (databases, tables, indices, etc.) and viewing object information, which
     are helpful for database exploration and administration.
     Transaction Control: psql allows users to manually manage
     transactions with SQL commands ( BEGIN , COMMIT , ROLLBACK ), giving
     precise control over database operations.
     Customizable Environment: Users can customize the behavior of psql
     through various settings, such as output formatting options, and can use
      .psqlrc file to apply their customizations automatically upon startup.
     Extensive Help System: psql includes an integrated help system,
     accessible via the \h (for SQL command help) and \? (for psql
     command help) commands, making it easier to use even for newcomers.
Connecting to cluster
 psql -h <hostname> -p <port> -U <user> -d <database>
We can also use environment variables.
PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD
 psql -U postgres
 create database test1;
 create database test2;
 create user kuba with password 'kuba';
 \du
 psql -U kuba -d test1;
Help
Commands in psql can be divided into SQL commands and meta-commands.
 \? - display help
Display list of databases
 \l
 or
 \l+
Display list of tables
 \dt
Running SQL command
 psql -U postgres -c "select current_time"
 psql -U postgres -c "select current_time;" -f
 C:\Users\Piotr\Desktop\skrypt.sql -c "select current_time;„
 psql -U postgres -c "select current_time;" -f
 C:\Users\Piotr\Desktop\skrypt.sql -c "select current_time;„ -L
 load_table.log
 psql -U postgres -c "select current_time;" -f
 C:\Users\Piotr\Desktop\skrypt.sql -c "select current_time;„ -o
 load_table.log
Switching databases
 \c nazwaBazy
 \conninfo
 \q
Watch
 select * from pg_stat_activity;
 \watch
Extended display
 select * from pg_stat_activity;
 \x
 select * from pg_stat_activity;
Display help about command
 \h UPDATE
Run command from the file
 \i path_to_file
Execution time
Displays execution time for a command
  \timing
Service file
Service file pozwala zautomatyzować na poziomie OS lub usera proces
łączenia się do psql
Na poziomie OS”:
/etc/pg_service.conf
Na poziomie konta użytkownika OS (można zmienić z pomocą
PGSERVICEFILE)
~/.pg_service.conf:
Wpis na poziomie użytkownika jest ważniejszy!
Logical structure of the database
In PostgreSQL, a template database serves as a blueprint for creating new
databases.
PostgreSQL comes with two default template databases: template0 and
template1. When you create a new database, PostgreSQL clones the
template database you specify (or template1 by default) to create the new
database. This mechanism allows for the new database to inherit not just the
schema (tables, data types, functions, etc.) but also the data from the
template database if it contains any.
template1
     Default Template: template1 is the default template database.
     Whenever you create a new database without specifying a template,
     PostgreSQL uses template1 .
     Customizable: Users can modify template1 , such as by adding
     common extensions or user-defined data types, so that every new
     database created from it will inherit those modifications. However,
     because it is the default template, it's important to be cautious about the
     changes made to it.
template0
     Fallback Template: template0 is a clean, unmodified template. It exists
     as a fallback solution so you can create a new database even if
     template1 has been changed in such a way that it's no longer usable
     for creating new databases.
     Immutable: It is recommended to keep template0 unchanged to
     ensure that there is always a stable template from which to create new
     databases.
Working with template database - superuser
Only superuser can use other (non-template) database as template.
  create database test1;
  \c test1 postgres;
  create table student(id int);
  \c postgres postgres
  create database clone_test1 template test1;
Working with template database - user
 drop database clone_test1;
 alter user jacek createdb;
 \c postgres jacek
 create database clone_test1 template test1;
 create database clone_test1 template template0;
 \c postgres postgres
 alter user Jacek nocreatedb;
Showing list of schemas
 \dn
Creating a new schema
Creating users/roles
 Display users
 \du
 \h create user
 \h create role
 Create user michal with password ‚michal’ inherit;
 Create role devteam;
 \du
   Alter role devteam createdb;
   Grant devteam to michal;
   Create database baza_michala;
   Set role devteam;
   Create database baza_michala;
   \c postgres postgres
We will talk about permissions in details.
Bibliography
 - Official documentation (https://www.postgresql.org/docs/)
- PostgreSQL 11 Administration Cookbook, Simon Riggs, Gianni Ciolli
- PostgreSQL Administration Essentials
- Additional materials - Łukasz Bartnicki, Lucian Oprea
- The internals of PostgreSQL - http://www.interdb.jp/pg/pgsql01.html
- https://www.postgresql.fastware.com/blog/back-to-basics-with-postgresql-
memory-components
https://archive.org/details/pgza18-Wait_Events_in_PostgreSQL