KEMBAR78
Postgresql DBA Architecture | PDF | Postgre Sql | Database Transaction
100% found this document useful (2 votes)
2K views60 pages

Postgresql DBA Architecture

Complete postgresqlDBA architecture, it will provides decent overview about postgresql memory components, background processes and other control and access control files.

Uploaded by

Rakesh DBA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
100% found this document useful (2 votes)
2K views60 pages

Postgresql DBA Architecture

Complete postgresqlDBA architecture, it will provides decent overview about postgresql memory components, background processes and other control and access control files.

Uploaded by

Rakesh DBA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 60

Postgresql Database Architecture

Shared Memory Process Memory


Shared_Buffers Maintenance_Work_Memory

Wal_Buffers Temp_Buffers
Clog_Buffers
Work_Mem
Memory_Locks
Vacuum_Buffers

Process
Postgres DB writer Stats collector Check pointer
Wal Writer Archive Logger Auto Vacuum
Physical Data Files

Data files WAL Files Temp Files Clog files Log files

WAL archives Other files


Shared Buffers

8k
LRU
i=20 Dirty buffers

------
t3
t1 t2 t4
Pinned buffers MRU
t5 t6 t7 t8

By default size of the shared_buffers is 128 MB, We can assign 25% of the
physical memory to the Shared_Buffers.
Shared buffers allocating during the startup process using Shared_buffers parameter values.

Dirty Buffers: These buffers contains modified data, those modified buffers called as dirty shared
buffers.

Pinned Buffers: The buffers which are access by the transactions are called as pinned buffers

Algorithms: Entire Buffer management is working internally using LRU & MRU algorithms

If you want to increase the Shared_buffers size, we need to restart the Database to effect the new
values.

Dirty Read: A transaction reads data written by a concurrent uncommitted transaction.

Non-repeatable read: A transaction re reads the data previously read and finds the data changed by
another transaction which has committed since the initial read.

Phantom read: A transaction re executes a query returning a set of rows satisfying a search condition
and finds that the set of rows satisfying the condition has changed because another recently committed
transaction.
WAL Buffers
To ensure that no data has been lost by server failures, PostgreSQL supports the WAL
mechanism. WAL data (also referred to as XLOG records) are transaction log in
PostgreSQL; and WAL buffer is a buffering area of the WAL data before writing to a
persistent storage.

Default value of the WAL_Buffers is 4 MB

WAL_BUFFERS allocate during the startup of the cluster used by the values of
WAL_Buffers in configuration file.

Changes of this parameter needs restart the cluster to effect the new values.
CLOG Buffers

• CLOG stands for "commit log", and the CLOG buffers is an area
in operating system RAM dedicated to hold commit log pages.
The commit log pages contain log of transaction metadata and
differ from the WAL data. The commit logs have commit status
of all transactions and indicate whether or not a transaction has
been completed (committed).

• There is no specific parameter to control this area of memory.


This is automatically managed by the database engine in tiny
amounts. This is a shared memory component, which is
accessible to all the background server and user processes of a
PostgreSQL database.
Memory_Locks
• This memory component is to store all heavyweight
locks used by the PostgreSQL instance. These locks
are shared across all the background server and
user processes connecting to the database.
• A non-default larger setting of two database
parameters
namely max_locks_per_transaction and max_pred
_locks_per_transaction in a way influences the size
of this memory component.
• max_locks_per_transaction (integer) The shared lock
table tracks locks on max_locks_per_transaction * (
max_connections + max_prepared_transactions) objects
(e.g., tables); hence, no more than this many distinct
objects can be locked at any one time. This parameter
controls the average number of object locks allocated for
each transaction; individual transactions can lock more
objects as long as the locks of all transactions fit in the
lock table. This is not the number of rows that can be
locked; that value is unlimited. The default, 64, has
historically proven sufficient, but you might need to raise
this value if you have clients that touch many different
tables in a single transaction. This parameter can only be
set at server start.
• PREPARE TRANSACTION prepares the current transaction for
two-phase commit. After this command, the transaction is no
longer associated with the current session; instead, its state is
fully stored on disk, and there is a very high probability that it
can be committed successfully, even if a database crash occurs
before the commit is requested.
• Once prepared, a transaction can later be committed or rolled
back with COMMIT PREPARED or 
ROLLBACK PREPARED, respectively. Those commands can
be issued from any session, not only the one that executed the
original transaction.
• From the point of view of the issuing session, PREPARE
TRANSACTION is not unlike a ROLLBACK command: after
executing it, there is no active current transaction, and the effects
of the prepared transaction are no longer visible. 
Maintenance_work_mem
• This memory to be used by maintenance operations, such
as VACUUM, CREATE INDEX, and ALTER TABLE for ADDING FOREIGN KEYS.
• This memory segment is allocated for user and is used for the maintenance
operations
• Parameter to maintain this memory is maintenance_work_mem, default
value is 64MB

• Ex:
postgres=# SET maintenance_work_mem TO '384MB ';
SET
postgres=# CREATE INDEX ON personen_vorname_nachname_idx persons
( "last_name”);
CREATE INDEX
WORK_MEM
Work_mem is mainly used for expensive operations like sort
or hash operations like Joining, and filtering of certain
Data used
• ORDER BY, DISTINCT, and merge joins require memory for
sort operations.
hash joins, hash and hash-based processing aggregations
of IN operations require memory for hash tables.
Bitmap index scans require memory for the internal
bitmap.
• Parameter is work_mem, defaul value is 4MB
postgres=# SET work_mem TO '8 GB ';
SET
postgres=# SET trace_sort TO on;
SET
postgres=#SET client_min_messages TO DEBUG;
SET
postgres=# SELECT relname, relkind, relpages FROM pg_class c WHERE
relkind = 'r'
postgres=# relpages ORDER BY DESC, ASC relname
postgres=#LIMIT 1;
LOG: begin tuple sort: NKEYS = 2, Workmen = 8192, random access = f
LOG: starting performsort: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: performsort done: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: internal sort ended, 31 KB used: CPU 0.00s/0.00u sec elapsed 0.00 sec
VACUUM_BUFFER
• Starts the daemon which cleans up tables and indexes, preventing bloat
and poor response times.
• This is the maximum amount of memory used by each of the autovacuum
worker processes, and it is controlled by
the autovacuum_work_mem database parameter. The memory is
allocated from the operating system RAM and is also influenced by
the autovacuum_max_workers database parameter. The setting
of autovacuum_work_mem should be configured carefully
as autovacuum_max_workers times this memory will be allocated from
the RAM. All these parameter settings only come into play when the auto
vacuum daemon is enabled, otherwise, these settings have no effect on
the behavior of VACUUM when run in other contexts. This memory
component is not shared by any other background server or user process.
temp_buffers 
• A database may have one or more temporary tables, and
the data blocks (pages) of such temporary tables need a
separate allocation of memory to be processed in. The
temp buffers serve this purpose by utilising a portion of
RAM, defined by the temp_buffers parameter. The temp
buffers are only used for access to temporary tables in a
user session. There is no relation between temp buffers
in memory and the temporary files that are created
under the pgsql_tmp directory during
large sort and hash table operations.
Bg Writer Process: Default
• The background writer continues to flush dirty
pages/buffers to datafiles. This background writer
scans the shared buffer for dirty pages to write down
to the disk level.
• The default run times of BG writer process is 200
mill. Seconds and for every 100 LRU (least recently
used ) pages/buffers reached in shared buffers.
• Parameters to handle the BG Writer:
-bgwriter_delay= 200 (milli seconds)
-bgwriter_lru_maxpages=100 (buffers)
Check Pointer Process: Default
This process is takes care of cluster checkpoint. When a check
point starts all the dirty pages in memory are written to the
datafiles.
A checkpoint is a known safe starting point for recovery, since at
that time we write all currently outstanding database changes to
disk.
Checkpoint process has two aspects: the preparation of database
recovery, and the cleaning of dirty pages on the shared buffer
pool.
Checkpoint also updates PG_CONTROL file. which holds the
metadata of the current checkpoint.
Parameters:
checkpoint_timeout = 5 min
checkpoint_completion_target = 0.5
pg_control File:

• As the pg_control file contains the fundamental information of the checkpoint, it is certainly
essential for database recovery. If it is broken or unreadable, the recovery process cannot start
up in order to not obtained a starting point.

• Even though pg_control file stores over 40 items,

• State – The state of database server at the time of the latest check pointing starts.
• There are seven states in total:
 'start up' is the state that system is starting up;
 'shut down' is the state that system is going down normally by the shutdown command;
 'in production' is the state that system is running;
……………. and so on.

• Latest checkpoint location – LSN Location of the latest checkpoint record.

• Prior checkpoint location – LSN Location of the prior checkpoint record.

• --------------
• ---------------
WAL Writer Process: default
• WAL writer is a background process to check
the WAL buffer periodically and write all
unwritten XLOG records into the WAL
segments.

• Parameter:
• Wal_writer_delay = 200 ms
• Auto vacuum Launcher Process: non- default

• PostgreSQL does not immediately remove the deleted tuples from the data
files. These are marked as deleted. Similarly, when a record is updated, it's
roughly equivalent to one delete and one insert. The previous version of the
record continues to be in the data file. Each update of a database row
generates a new version of the row. The reason is simple: there can be
active transactions, which want to see the data as it was before. As a result
of this activity, there will be a lot of unusable space in the data files. After
some time, these dead records become irrelevant as there are no
transactions still around to see the old data.

• To recover or reuse disk space occupied by updated or deleted rows.

• To update data statistics used by the PostgreSQL query planner.

• To protect against loss of very old data due to transaction ID wraparound.


statistics collector: Default
• statistics collector is a subsystem that supports collection and reporting of information
about server activity. Presently, the collector can count accesses to tables and indexes in
both disk-block and individual-row terms. It also tracks the total number of rows in each
table, and information about vacuum and analyze actions for each table

• The statistics collector transmits the collected information to other PostgreSQL processes
through temporary files. These files are stored in the directory named by the
stats_temp_directory parameter, pg_stat_tmp by default.

• Echo $PG_STAT_TMP -- linux


• Echo %pg_stat_tmp% -- windows

• The parameter track_activities enables monitoring of the current command being


executed by any server process.
• The parameter track_counts controls whether statistics are collected about table and
index accesses.
• The parameter track_functions enables tracking of usage of user-defined functions.
• The parameter track_io_timing enables monitoring of block read and write times.
postgres server process
• A postgres server process is a parent of all
processes related to a database cluster
management.
• This process is the database backend process
for each established connection.
• It handles the values of connected database,
username, host and transaction information
• One to one :
PG_HBA.CONF
• TYPE :
Host- Type matches either an SSL or plain connection.

Hostssl- only SSL connection


Hostnossl- only plain text

• Database & User :


Database and user columns are used to match specific
databases or users from the incoming connection.
Accept everything using * .
• Address : Is used only if the type uses the tcp/ip and can
be an ip address with network mask or a hostname. Both
ipv4 and ipv6 are supported.

• Method : Authentication, Postgresql supports many


methods from the password challenge to sophisticated
radius or Kerberos's.

– Trust: Allow the connection without any request. Is quite useful


if the password lost.

– Peer: Allow the connections if the OS user matches the


database user. Useful to authenticate to the database on the
local boxes. It is default by installation.
• Password: Allow the connection matching the user and
password with shadow system table. PG_SHADOW

• Md5: Same as the password, this method offers a md5


encryption for the passwords. As the md5 is deterministic a
pseudo random subroutine is used during the password
challenge to avoid the same string to be sent over the network.

• Crypt: The crypt method is similar to the password method.


When using crypt, the password is not sent in clear text, but
through a simple form of encryption. The use of this method is
not very secure, but is better than using the clear text password
method.
• krb4, krb5 : The krb4 and krb5 methods are used to specify
Version 4 or 5 of the Kerberos authentication system. The
installation and configuration of Kerberos is beyond the
scope of this book, but if you wish to authenticate via
Kerberos, these methods are available.

• Ident: The ident method specifies that an ident map should


be used when a host is requesting connections from a valid
IP address listed in the pg_hba.conf file. This method
requires one option.

• - When the connection request matches the pg_hba.conf


and the authentication method is cleared.
• Reject: Reject the connections with the
matching parameters. You should typically use
this to restrict access from specific hosts or
insecure connections.

• Vacuum_freeze_table_age
• Autovacuum_freeze_max_age
BASE

Postgres.auto.conf

DATA

Pg_tblspc
BASE
• base directory contains the database files. Each database have a
dedicated sub-directories-named after the internal database's object id.
A freshly initialized data directory shows only three subdirectories in
the base folder.

• Those corresponds to the two template databases, template0 and


template1, plus the postgres database.
• The numerical directories contains various files, also with the
numerical name which are actually the database's relations, tables and
indices.
• The relation's name is set initially from the relation's object id. Any file
altering operation like VACUUM FULL or REINDEX, will generate a
new file with a deferent name. To find out the real relation's file name
the relfilenode inside the pg_class sub-system table must be queried.
Global:

The global directory contains all the cluster


wide relations. In addition there's the very
critical control file
Pg_xlog
• This is the most important and critical directory, for the
performances and for the reliability.
• The directory contains the transaction's logs, named wal file.
Each le is usually 16 Mb and contains all the data blocks
changed during the database activity. The blocks are written
first on this not volatile area to ensure the cluster's recovery in
case of crash. The data blocks are then written later to the
corresponding data files.
• If the cluster's shutdown is not clean then the wal files are
replayed during the startup process from the last known
consistent location read from control file.
• In order to ensure good performance this location should stay
on a dedicated device.
pg clog
• This directory contains the committed transactions in
small 8k les, except for the serializable transactions.
• The the files are managed by the cluster and the
amount is related with the two GUC parameters
autovacuum_freeze_max_age and
vacuum_freeze_table_age.
• Increasing the values for the two parameters the
pg_clog must store the commit status to the “event
horizon" of the oldest frozen transaction id.
Pg_serial:
Same as pg clog this directory stores the
information's about the committed transactions in
serializable transaction isolation level.

Pg_multixact:
Stores the information's about the multi transaction
status, used generally for the row share locks.

Pg_subtrans:
Stores the sub transactions status data.
Pg_notify:
Stores information's about the LISTEN/NOTIFY
operations.

Pg_stat:
This directory contains the permanent les for the statistic
subsystem.

Pg_twophase:
Stores the two phase commit data. The two phase commit
allows the transaction opening independently from the
session. This way even a different session can commit or
rollback the transaction later.
Pg _tblspc:
• The directory contains the symbolic links to
the tablespace locations. A tablespace is a
logical name pointing a physical location.
• As from PostgreSQL 9.2 the location is read
directly from the symbolic link.
• This make possible to change the tablespace's
position simply stopping the cluster, moving
the data file in the new location, creating the
new symbolic link and starting the cluster.
Pg_snapshots:
• This directory is used to store the exported
snapshots. From the version 9.2 PostgreSQL
offers the transaction's snapshot export where
one session can open a transaction and export a
consistent snapshot.
• This way different session can access the
snapshot and read all together the same
consistent data snapshot.
• This feature is used, for example, by pg dump
for the parallel export.
Pg_ident.conf
• The content of the pg_ident.conf associates identifying usernames with PostgreSQL usernames
via definitions called ident maps . This is useful for users whose system usernames do not
match their PostgreSQL usernames. Some rules you should keep in mind when defining and
using an ident map are:

• Each ident map member is defined on a single line, which associates a map name with an
identifying username, and a translated PostgreSQL username.

• The pg_ident.conf file can contain multiple map names. Each group of single lines with the
same associative map name are considered a single map.

• A single line record to define an ident map consist of 3 tokens: the name of the map, the
identifying username, and the translated PostgreSQL username. This syntax is entered as
follows, where each token is separated by spaces, or tabs:

• mapname, identname, postgresqlname


• mapname: The map name used in the pg_hba.conf file to refer to the ident map.
• identname: The identifying username, which is generally the name of the system user
attempting to establish a connection to the database. This is the name provided by the identd
daemon, which must be running on the system attempting to connect.
• postgresqlname:
PG_CTL
• pg_ctl is a utility for initializing a PostgreSQL
database cluster, starting, stopping, or
restarting the PostgreSQL database server
(postgres), or displaying the status of a
running server.
PG_CTL
• pg_ctl can be used to stop a database cluster
• pg_ctl supports three modes of shutdown
− smart quit after all clients have disconnected
−Fast (default) quit directly, with proper shutdown
(default)
− immediate quit without complete shutdown; will lead to
recovery
Syntax:
• − pg_ctl stop [-W] [-t SECS] [-D DATADIR] [-s] [-m
SHUTDOWN-MODE]
• Restart: mode effectively executes a stop followed by a start

• Reload: mode simply sends the postgres server process


a SIGHUP signal, causing it to reread its configuration files
(postgresql.conf, pg_hba.conf, etc.). This allows changing
configuration-file options that do not require a full server
restart to take effect.

• Status: mode checks whether a server is running in the


specified data directory. If it is, the server's PID and the
command line options that were used to invoke it are
displayed. If the server is not running, pg_ctl returns an exit
status of 3. If an accessible data directory is not
specified, pg_ctl returns an exit status of 4.
postgresql.conf
• There are many configuration parameters that effect the
behavior of the database system.
• All parameter names are case-insensitive.
• Every parameter takes a value of one of five types:
− boolean
− integer
− floating point
− string
− enum
• One way to set these parameters is to edit the file
postgresql.conf, which is normally kept in the data directory.
• Holds parameters used by a cluster
• Parameters are case-insensitive
• Normally stored in data directory
• initdb installs default copy
• Some parameters only take effect on server restart
(pg_ctl restart)
• # used for comments
• One parameter per line
• Use include directive to read and process another
file
• Can also be set using command-line option
• Some parameters can be changed per session
using the SET command.
• Some parameters can be changed at the user
level using ALTER USER.
• Some parameters can be changed at the
database level using ALTER DATABASE.
• The SHOW command can be used to see
settings.
• The pg_settings and pg_file_settings catalog
table lists settings information.
• ALTER SYSTEM command:
− Available starting with PostgreSQL 9.4
− Edits cluster settings without editing postgresql.conf
− Writes the setting to a file called postgresql.auto.conf
• Example:
− ALTER SYSTEM SET work_mem=20480;
• postgresql.auto.conf is always read last during
server reload/restarts
• Reset a parameter change -
− ALTER SYSTEM SET work_mem = DEFAULT;
Postgres.conf : Connection settings
listen _addresses (default localhost) - Specifies the addresses on
which the server is to listen for connections. Use * for all.
• port (default 5432) - The port the server listens on.
• max_connections (default 100) - Maximum number of concurrent
connections the server can support.
• superuser_reserved_connections (default 3) - Number of
connection slots reserved for superusers.
• unix_socket_directory (default /tmp) - Directory to be used for
UNIX socket connections to the server.
• unix_socket_permissions (default 0777) - access permissions of
the Unix-domain socket.
Security and Authentication Settings

authentication_timeout (default: 1 minute): Maximum time to


complete client authentication, in seconds.
• ssl (default: off) - Enables SSL connections.
• ssl_ca_file - Specifies the name of the file containing the SSL
server certificate authority (CA).
• ssl_cert_file - Specifies the name of the file containing the SSL
server certificate.
• ssl_key_file - Specifies the name of the file containing the SSL
server private key.
• ssl_ciphers - List of SSL ciphers that may be used for secure
connections.
Memory Settings
• shared_buffers (default: 128MB) - Size of PostgreSQL shared buffer pool
for a cluster.
• temp_buffers (default: 8MB) - Amount of memory used by each backend
for caching temporary table data.
• work_mem (default: 4MB) - Amount of memory used for each sort or hash
operation before switching to temporary disk files.
• maintenance_work_mem (default: 64MB) - Amount of memory used for
each index build or VACUUM.
• temp_file_limit (default: 1) - amount of disk space that a session can use
for temporary files. A transaction attempting to exceed this limit will be
cancelled. Default is unlimited.
Query Planner Settings
random_page_cost (default 4.0) - Estimated cost of a random page fetch, in
abstract cost units. May need to be reduced to account for caching effects.

• seq_page_cost (default 1.0) - Estimated cost of a sequential page fetch, in


abstract cost units. May need to be reduced to account for caching effects.
Must always set random_page_cost >= seq_page_cost.

• effective_cache_size (default 4GB) - Used to estimate the cost of an index


scan. Rule of thumb is 75% of system memory.

There are plenty of enable_* parameters which influence the planner in


choosing an optimal plan.
Write Ahead Log Settings
• wal_level (default: minimal) - Determines how much information is written to the
WAL. Change this to enable replication. Other values are archive, logical and
hot_standby.
• fsync (default on) - Turn this off to make your database much faster – and silently
cause arbitrary corruption in case of a system crash.
• wal_buffers (default: -1, autotune) - The amount of memory used in shared
memory for WAL data. The default setting of -1 selects a size equal to 1/32nd
(about 3%) of shared_buffers.
• min_wal_size (default 80 MB) – The WAL size to start recycling the WAL files.
• max_wal_size (default 1GB) – The WAL size to start checkpoint. Controls the
number of WAL Segments(16MB each) after which checkpoint is forced.
• checkpoint_timeout (default 5 minutes) - Maximum time between checkpoints.
• wal_compression (default off) – The WAL of Full Page write will be compressed
and written.
Where To Log
• log_destination - Valid values are combinations of stderr, csvlog, syslog,
and eventlog, depending on platform.
• logging_collector - Enables advanced logging features. csvlog requires
logging_collector.
‒ log_directory - Directory where log files are written. Requires logging_collector.
‒ log_filename - Format of log file name (e.g. postgresql-%Y-%M-%d.log). Allows
regular log rotation. Requires logging_collector.
‒ log_file_mode (default 0600) - On Unix systems this parameter sets the permissions
for log files when logging_collector is enabled.
‒ log_rotation_age - Automatically rotate logs after this much time. Requires
logging_collector.
‒ log_rotation_size - Automatically rotate logs when they get this big. Requires
logging_collector.
When To Log
• client_min_messages (default NOTICE) - Messages of
this severity level or above are sent to the client.
• log_min_messages (default WARNING) - Messages of
this severity level or above are sent to the server.
• log_min_error_statement (default ERROR) - When a
message of this severity or higher is written to the server
log, the statement that caused it is logged along with it.
• log_min_duration_statement (default -1, disabled) -
When a statement runs for at least this long, it is written
to the server log, with its duration.
What To Log
log_connections (default off) - Log successful connections to the server log.
• log_disconnections (default off) - Log some information each time a session
disconnects, including the duration of the session.
• log_error_verbosity (default “default”) - Can also select “terse” or
“verbose”.
• log_duration (default off) - Log duration of each statement.
• log_line_prefix - Additional details to log with each line.
• log_statement (default none) - Legal values are none, ddl, mod (DDL and all
other data-modifying statements), or all.
• log_temp_files (default -1) - Log temporary files of this size or larger, in
kilobytes.
• log_checkpoints (default off) - Causes checkpoints and restart points to be
logged in the server log.
Background Writer Settings
bgwriter_delay (default 200 ms) - Specifies time
between activity rounds for the background writer.
• bgwriter_lru_maxpages (default 100) - Maximum
number of pages that the background writer may clean
per activity round.
• bgwriter_lru_multiplier (default 2.0) - Multiplier on
buffers scanned per round. By default, if system thinks
10 pages will be needed,
it cleans 10 * bgwriter_lru_multiplier of 2.0 = 20.
• Primary tuning technique is to lower bgwriter_delay.
• search_path - This parameter specifies the order
in which schemas are searched. The default
value for this parameter is "$user", public.
• default_tablespace - Name of the tablespace in
which to objects are created by default.
• temp_tablespaces - Tablespaces name(s) in
which temporary objects are created.
• statement_timeout - Postgres will abort any
statement that takes over the specified number
of milliseconds A value of zero (the default)
turns this off.
Autovacuum
autovacuum (default on) - Controls whether the autovacuum
launcher runs, and starts worker processes to vacuum and analyze
tables.
• log_autovacuum_min_duration (default -1) - Autovacuum tasks
running longer than this duration are logged. Can now be
specified per table.
• autovacuum_max_workers (default 3) - Maximum number of
autovacuum worker processes which may be running at one time.
• autovacuum_work_mem (default -1, to use
maintenance_work_mem) -
Maximum amount of memory used by each autovacuum worker
postmaster.pid

• PostgreSQL puts a file named postmaster.pid in the data directory to


store the process id of the PostgreSQL server process.
If PostgreSQL crashes, this file can contain an old pid that
confuses PostgreSQL. ... make sure that there are no processes named
'postgres' or 'postmaster’.

• FATAL: lock file "postmaster.pid" already exists.

postmaster.opts

• If this file exists in the data directory, pg_ctl (in restart mode) will pass
the contents of the file as options to postgres, unless overridden by the
-o option. The contents of this file are also displayed in status mode.
Vacuum:
• Tuples that are deleted or obsolete by an update are not physically removed
from their table; they remain present until a VACUUM is done.

• PostgreSQL databases require periodic maintenance known as vacuuming.

• PostgreSQL's VACUUM command has to process each table on a regular


basis for several reasons:

 To recover or reuse disk space occupied by updated or deleted rows.


 To update data statistics used by the PostgreSQL query planner.
 To update the visibility map, which speeds up index-only scans.
 To protect against loss of very old data due to transaction ID
wraparound or multixact ID wraparound.
Standard VACUUM and VACUUM FULL

VACUUM FULL: can reclaim more disk space, reclaims


storage occupied by dead tuples but runs much more
slowly.
Full vacuum applies exclusive lock on tables and it
doesn’t allow new connections on table until
completed. This method also requires extra disk space,
since it writes a new copy of the table and doesn't
release the old copy until the operation is complete.
Therefore it's necessary to do VACUUM periodically,
especially on frequently-updated tables.
• vacuum_cost_delay (integer)The length of time, in milliseconds, that the process
will sleep when the cost limit has been exceeded. The default value is zero, which
disables the cost-based vacuum delay feature. Positive values enable cost-based
vacuuming. Note that on many systems, the effective resolution of sleep delays is
10 milliseconds; setting vacuum_cost_delay to a value that is not a multiple of 10
might have the same results as setting it to the next higher multiple of 10.
• When using cost-based vacuuming, appropriate values for vacuum_cost_delay are
usually quite small, perhaps 10 or 20 milliseconds. Adjusting vacuum's resource
consumption is best done by changing the other vacuum cost parameters.
• vacuum_cost_page_hit (integer)The estimated cost for vacuuming a buffer found
in the shared buffer cache. It represents the cost to lock the buffer pool, lookup the
shared hash table and scan the content of the page. The default value is one.
• vacuum_cost_page_miss (integer)The estimated cost for vacuuming a buffer that
has to be read from disk. This represents the effort to lock the buffer pool, lookup
the shared hash table, read the desired block in from the disk and scan its content.
The default value is 10.
• vacuum_cost_page_dirty (integer)The estimated cost charged when vacuum
modifies a block that was previously clean. It represents the extra I/O required to
flush the dirty block out to disk again. The default value is 20.
• vacuum_cost_limit (integer)The accumulated cost that will cause the vacuuming
process to sleep. The default value is 200.
• Free Space Map (FSM)
To keep track of available space in the relation.
It stores all free space related information in an
alongside primary relation and that relation
starts with the file node number plus the suffix
_fsm.
The VACUUM process also updates the Free
Space Map 
• Visibility Map (VM)
To keep track of which pages contain only tuples
that are known to be visible to all active
transactions.
All tuples on the page are known to be visible to
all transactions, this means that the page does
not contain any tuples that need to be
vacuumed

You might also like