Complete PostgreSQL Documentation
Table of Contents
1. Introduction
2. Installation and Setup
3. Basic Concepts
4. Database Operations
5. Data Types
6. Table Operations
7. Data Manipulation
8. Querying Data
9. Advanced Queries
10. Indexes
11. Views
12. Stored Procedures and Functions
13. Triggers
14. Transactions
15. User Management and Security
16. Performance Optimization
17. Backup and Recovery
18. Replication
19. Monitoring and Maintenance
20. Best Practices
Introduction
PostgreSQL is a powerful, open-source object-relational database system that has earned a strong
reputation for reliability, feature robustness, and performance. It supports both SQL (relational) and
JSON (non-relational) querying and is known for its extensibility and standards compliance.
Key Features
ACID Compliance: Ensures data integrity through Atomicity, Consistency, Isolation, and Durability
Extensibility: Support for custom data types, operators, and functions
Concurrency: Multi-version concurrency control (MVCC), so readers do not block writers
Full-text Search: Built-in text search capabilities
JSON Support: Native JSON and JSONB data types
Inheritance: Table inheritance support
Custom Functions: Support for multiple programming languages
Standards Compliance: Extensive SQL standard compliance
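As a quick taste of the JSON support listed above, the following sketch stores and queries JSONB documents (the table and column names here are illustrative, not part of the later examples):
sql
-- Hypothetical table with a JSONB column
CREATE TABLE events (
id SERIAL PRIMARY KEY,
payload JSONB
);
-- Insert a JSON document
INSERT INTO events (payload)
VALUES ('{"type": "login", "user": "alice", "meta": {"ip": "10.0.0.1"}}');
-- Query a nested field: -> returns JSON, ->> returns text
SELECT payload->'meta'->>'ip' AS ip
FROM events
WHERE payload->>'type' = 'login';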
Installation and Setup
Ubuntu/Debian Installation
bash
# Update package list
sudo apt update
# Install PostgreSQL
sudo apt install postgresql postgresql-contrib
# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql
CentOS/RHEL Installation
bash
# Install PostgreSQL
sudo yum install postgresql-server postgresql-contrib
# Initialize database
sudo postgresql-setup initdb
# Start and enable service
sudo systemctl start postgresql
sudo systemctl enable postgresql
macOS Installation (using Homebrew)
bash
# Install PostgreSQL
brew install postgresql
# Start service
brew services start postgresql
Windows Installation
Download the installer from the official PostgreSQL website and follow the installation wizard.
Initial Configuration
bash
# Switch to the postgres user and open psql
sudo -u postgres psql
sql
-- Inside psql: create a new user
CREATE USER myuser WITH PASSWORD 'mypassword';
-- Create a database
CREATE DATABASE mydatabase OWNER myuser;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE mydatabase TO myuser;
Basic Concepts
Database Architecture
PostgreSQL uses a client-server model consisting of:
PostgreSQL Server Process: Manages database files and handles client connections
Client Applications: Programs that connect to the server to perform database operations
Shared Memory: Used for caching and inter-process communication
Key Components
Cluster: A collection of databases managed by a single PostgreSQL server instance
Database: A named collection of SQL objects (tables, views, functions, etc.)
Schema: A namespace within a database that contains named objects
Table: A collection of related data held in a structured format
Row/Tuple: A single record in a table
Column/Attribute: A single field in a table
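These components nest: a cluster contains databases, a database contains schemas, and a schema contains tables. A fully qualified name makes the hierarchy explicit (the identifiers below are illustrative):
sql
-- schema.table qualification; the database is chosen at connection time
SELECT e.first_name
FROM hr.employees AS e; -- table "employees" in schema "hr"
-- Unqualified names are resolved through the search_path
SHOW search_path;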
Database Operations
Connecting to PostgreSQL
sql
-- Command line connection
psql -h hostname -p port -U username -d database_name
-- Connection with specific options
psql "host=localhost port=5432 dbname=mydb user=myuser password=mypass"
Creating Databases
sql
-- Basic database creation
CREATE DATABASE company_db;
-- Database with specific owner and encoding
CREATE DATABASE company_db
OWNER = john_doe
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TEMPLATE = template0;
Managing Databases
sql
-- List all databases
\l
-- Connect to a database
\c database_name
-- Drop a database
DROP DATABASE company_db;
-- Rename a database
ALTER DATABASE old_name RENAME TO new_name;
-- Change database owner
ALTER DATABASE company_db OWNER TO new_owner;
Schema Operations
sql
-- Create schema
CREATE SCHEMA sales;
-- Create schema with authorization
CREATE SCHEMA hr AUTHORIZATION hr_manager;
-- Set search path
SET search_path TO sales, public;
-- Show current search path
SHOW search_path;
-- Drop schema
DROP SCHEMA sales CASCADE;
Data Types
Numeric Types
sql
-- Integer types
SMALLINT -- 2 bytes, -32768 to 32767
INTEGER -- 4 bytes, -2147483648 to 2147483647
BIGINT -- 8 bytes, -9223372036854775808 to 9223372036854775807
-- Decimal types
DECIMAL(precision, scale)
NUMERIC(precision, scale)
REAL -- 4 bytes, single precision
DOUBLE PRECISION -- 8 bytes, double precision
-- Serial types (auto-incrementing)
SMALLSERIAL -- 2 bytes
SERIAL -- 4 bytes
BIGSERIAL -- 8 bytes
Character Types
sql
-- Fixed-length character string
CHAR(n)
-- Variable-length character string
VARCHAR(n)
-- Variable-length character string (unlimited)
TEXT
Date and Time Types
sql
-- Date only
DATE
-- Time only
TIME
TIME WITH TIME ZONE
-- Date and time
TIMESTAMP
TIMESTAMP WITH TIME ZONE
-- Time interval
INTERVAL
Boolean Type
sql
-- Boolean values
BOOLEAN -- TRUE, FALSE, NULL
JSON Types
sql
-- JSON data type
JSON -- Text-based JSON
JSONB -- Binary JSON (recommended for most uses)
Array Types
sql
-- Array of integers
INTEGER[]
-- Multi-dimensional array
INTEGER[][]
-- Array with a declared size (the size is not enforced by PostgreSQL)
INTEGER[3]
Other Types
sql
-- UUID type
UUID
-- Network address types
INET -- IP address
CIDR -- Network address
MACADDR -- MAC address
-- Geometric types
POINT -- Point in 2D space
LINE -- Infinite line
CIRCLE -- Circle
POLYGON -- Polygon
Table Operations
Creating Tables
sql
-- Basic table creation
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
hire_date DATE DEFAULT CURRENT_DATE,
salary DECIMAL(10, 2) CHECK (salary > 0),
department_id INTEGER
);
-- Table with constraints
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
budget DECIMAL(12, 2) DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Add foreign key constraint
ALTER TABLE employees
ADD CONSTRAINT fk_department
FOREIGN KEY (department_id) REFERENCES departments(id);
Modifying Tables
sql
-- Add column
ALTER TABLE employees ADD COLUMN phone VARCHAR(20);
-- Drop column
ALTER TABLE employees DROP COLUMN phone;
-- Modify column
ALTER TABLE employees ALTER COLUMN salary TYPE DECIMAL(12, 2);
-- Rename column
ALTER TABLE employees RENAME COLUMN first_name TO fname;
-- Add constraint
ALTER TABLE employees ADD CONSTRAINT check_salary CHECK (salary > 0);
-- Drop constraint
ALTER TABLE employees DROP CONSTRAINT check_salary;
-- Rename table
ALTER TABLE employees RENAME TO staff;
Table Information
sql
-- Describe table structure
\d table_name
-- List all tables
\dt
-- Show table constraints
\d+ table_name
-- Get table information from system catalogs
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'employees';
Dropping Tables
sql
-- Drop table
DROP TABLE employees;
-- Drop table if exists
DROP TABLE IF EXISTS employees;
-- Drop table with cascade (removes dependent objects)
DROP TABLE employees CASCADE;
Data Manipulation
Inserting Data
sql
-- Single row insert
INSERT INTO employees (first_name, last_name, email, salary)
VALUES ('John', 'Doe', 'john.doe@company.com', 50000.00);
-- Multiple rows insert
INSERT INTO employees (first_name, last_name, email, salary)
VALUES
('Jane', 'Smith', 'jane.smith@company.com', 55000.00),
('Bob', 'Johnson', 'bob.johnson@company.com', 48000.00),
('Alice', 'Brown', 'alice.brown@company.com', 52000.00);
-- Insert with SELECT
INSERT INTO archived_employees
SELECT * FROM employees WHERE hire_date < '2020-01-01';
-- Insert and return values
INSERT INTO employees (first_name, last_name, email, salary)
VALUES ('Mike', 'Wilson', 'mike.wilson@company.com', 51000.00)
RETURNING id, first_name, last_name;
Updating Data
sql
-- Basic update
UPDATE employees
SET salary = 55000.00
WHERE id = 1;
-- Update multiple columns
UPDATE employees
SET salary = salary * 1.1,
last_modified = CURRENT_TIMESTAMP
WHERE department_id = 2;
-- Update with JOIN
UPDATE employees e
SET salary = e.salary * 1.05
FROM departments d
WHERE e.department_id = d.id
AND d.name = 'Engineering';
-- Update and return values
UPDATE employees
SET salary = 60000.00
WHERE id = 1
RETURNING id, first_name, salary;
Deleting Data
sql
-- Basic delete
DELETE FROM employees WHERE id = 1;
-- Delete with condition
DELETE FROM employees
WHERE hire_date < '2019-01-01';
-- Delete with JOIN
DELETE FROM employees e
USING departments d
WHERE e.department_id = d.id
AND d.budget < 100000;
-- Delete and return values
DELETE FROM employees
WHERE salary < 30000
RETURNING id, first_name, last_name;
UPSERT (INSERT ... ON CONFLICT)
sql
-- Insert or update if conflict
INSERT INTO employees (id, first_name, last_name, email, salary)
VALUES (1, 'John', 'Doe', 'john.doe@company.com', 50000.00)
ON CONFLICT (id)
DO UPDATE SET
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
email = EXCLUDED.email,
salary = EXCLUDED.salary;
-- Insert or do nothing on conflict
INSERT INTO employees (email, first_name, last_name)
VALUES ('existing@company.com', 'Test', 'User')
ON CONFLICT (email) DO NOTHING;
Querying Data
Basic SELECT Statements
sql
-- Select all columns
SELECT * FROM employees;
-- Select specific columns
SELECT first_name, last_name, salary FROM employees;
-- Select with alias
SELECT
first_name AS fname,
last_name AS lname,
salary * 12 AS annual_salary
FROM employees;
-- Select distinct values
SELECT DISTINCT department_id FROM employees;
Filtering Data
sql
-- WHERE clause
SELECT * FROM employees WHERE salary > 50000;
-- Multiple conditions
SELECT * FROM employees
WHERE salary > 45000 AND department_id = 2;
-- IN operator
SELECT * FROM employees
WHERE department_id IN (1, 2, 3);
-- BETWEEN operator
SELECT * FROM employees
WHERE salary BETWEEN 40000 AND 60000;
-- LIKE operator for pattern matching
SELECT * FROM employees
WHERE first_name LIKE 'J%';
-- ILIKE for case-insensitive matching
SELECT * FROM employees
WHERE email ILIKE '%gmail.com';
-- IS NULL / IS NOT NULL
SELECT * FROM employees
WHERE phone IS NOT NULL;
Sorting Data
sql
-- ORDER BY ascending
SELECT * FROM employees ORDER BY salary;
-- ORDER BY descending
SELECT * FROM employees ORDER BY salary DESC;
-- Multiple sort columns
SELECT * FROM employees
ORDER BY department_id, salary DESC;
-- Sort by expression
SELECT first_name, last_name, salary
FROM employees
ORDER BY salary * 12 DESC;
Limiting Results
sql
-- LIMIT clause
SELECT * FROM employees LIMIT 10;
-- OFFSET with LIMIT (pagination)
SELECT * FROM employees
ORDER BY id
LIMIT 10 OFFSET 20;
-- Alternative pagination syntax
SELECT * FROM employees
ORDER BY id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
Aggregation Functions
sql
-- COUNT
SELECT COUNT(*) FROM employees;
SELECT COUNT(DISTINCT department_id) FROM employees;
-- SUM
SELECT SUM(salary) FROM employees;
-- AVG
SELECT AVG(salary) FROM employees;
-- MIN/MAX
SELECT MIN(salary), MAX(salary) FROM employees;
-- GROUP BY
SELECT department_id, COUNT(*), AVG(salary)
FROM employees
GROUP BY department_id;
-- HAVING clause
SELECT department_id, COUNT(*) as emp_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
Advanced Queries
JOINs
sql
-- INNER JOIN
SELECT e.first_name, e.last_name, d.name as department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
-- LEFT JOIN
SELECT e.first_name, e.last_name, d.name as department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- RIGHT JOIN
SELECT e.first_name, e.last_name, d.name as department
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
-- FULL OUTER JOIN
SELECT e.first_name, e.last_name, d.name as department
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;
-- CROSS JOIN
SELECT e.first_name, d.name
FROM employees e
CROSS JOIN departments d;
-- Self JOIN
SELECT e1.first_name as employee, e2.first_name as manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.id;
Subqueries
sql
-- Subquery in WHERE clause
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Subquery in SELECT clause
SELECT first_name, last_name,
(SELECT name FROM departments d WHERE d.id = e.department_id) as dept_name
FROM employees e;
-- EXISTS subquery
SELECT * FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id
);
-- NOT EXISTS subquery
SELECT * FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id
);
-- IN subquery
SELECT * FROM employees
WHERE department_id IN (
SELECT id FROM departments
WHERE budget > 500000
);
Window Functions
sql
-- ROW_NUMBER()
SELECT first_name, last_name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank
FROM employees;
-- RANK() and DENSE_RANK()
SELECT first_name, last_name, salary,
RANK() OVER (ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;
-- Partition by department
SELECT first_name, last_name, department_id, salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as dept_rank
FROM employees;
-- Running totals
SELECT first_name, last_name, salary,
SUM(salary) OVER (ORDER BY id) as running_total
FROM employees;
-- LAG and LEAD
SELECT first_name, last_name, salary,
LAG(salary) OVER (ORDER BY salary) as prev_salary,
LEAD(salary) OVER (ORDER BY salary) as next_salary
FROM employees;
Common Table Expressions (CTEs)
sql
-- Basic CTE
WITH high_earners AS (
SELECT * FROM employees WHERE salary > 60000
)
SELECT * FROM high_earners ORDER BY salary DESC;
-- Multiple CTEs
WITH
high_earners AS (
SELECT * FROM employees WHERE salary > 60000
),
dept_summary AS (
SELECT department_id, COUNT(*) as emp_count, AVG(salary) as avg_salary
FROM high_earners
GROUP BY department_id
)
SELECT * FROM dept_summary;
-- Recursive CTE (organizational hierarchy)
WITH RECURSIVE employee_hierarchy AS (
-- Base case: top-level employees
SELECT id, first_name, last_name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case
SELECT e.id, e.first_name, e.last_name, e.manager_id, eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy ORDER BY level, first_name;
CASE Statements
sql
-- Simple CASE
SELECT first_name, last_name, salary,
CASE
WHEN salary < 40000 THEN 'Low'
WHEN salary < 60000 THEN 'Medium'
ELSE 'High'
END as salary_category
FROM employees;
-- CASE in WHERE clause
SELECT * FROM employees
WHERE
CASE
WHEN department_id = 1 THEN salary > 50000
WHEN department_id = 2 THEN salary > 45000
ELSE salary > 40000
END;
Indexes
Creating Indexes
sql
-- Basic index
CREATE INDEX idx_employees_last_name ON employees(last_name);
-- Unique index
CREATE UNIQUE INDEX idx_employees_email ON employees(email);
-- Composite index
CREATE INDEX idx_employees_dept_salary ON employees(department_id, salary);
-- Partial index
CREATE INDEX idx_active_employees ON employees(last_name)
WHERE active = true;
-- Expression index
CREATE INDEX idx_employees_lower_email ON employees(LOWER(email));
-- B-tree index (default)
CREATE INDEX idx_employees_hire_date ON employees USING BTREE(hire_date);
-- Hash index
CREATE INDEX idx_employees_id_hash ON employees USING HASH(id);
-- GIN index (for arrays and full-text search)
CREATE INDEX idx_employees_skills ON employees USING GIN(skills);
-- GiST index (for geometric data)
CREATE INDEX idx_locations_point ON locations USING GIST(coordinates);
Managing Indexes
sql
-- List indexes
\di
-- Get index information
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'employees';
-- Drop index
DROP INDEX idx_employees_last_name;
-- Rebuild index
REINDEX INDEX idx_employees_last_name;
-- Rebuild all indexes on a table
REINDEX TABLE employees;
Index Types and Use Cases
B-tree: Default index type, good for equality and range queries
Hash: Good for equality comparisons only
GIN: Good for composite values (arrays, full-text search)
GiST: Good for geometric data and full-text search
SP-GiST: Space-partitioned GiST, good for non-balanced data structures
BRIN: Block Range Index, good for very large tables with natural ordering
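For example, a BRIN index suits a large append-only table whose rows correlate with physical insertion order; SP-GiST works well for text with shared prefixes. The table below is illustrative:
sql
-- BRIN: compact index for a naturally ordered column
CREATE TABLE sensor_readings (
reading_time TIMESTAMP,
value DOUBLE PRECISION
);
CREATE INDEX idx_readings_brin ON sensor_readings USING BRIN(reading_time);
-- SP-GiST on a text column with common prefixes
CREATE INDEX idx_employees_email_spgist ON employees USING SPGIST(email);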
Views
Creating Views
sql
-- Basic view
CREATE VIEW employee_summary AS
SELECT first_name, last_name, email, salary
FROM employees
WHERE active = true;
-- View with joins
CREATE VIEW employee_department AS
SELECT
e.id,
e.first_name,
e.last_name,
e.salary,
d.name as department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- View with aggregation
CREATE VIEW department_stats AS
SELECT
d.name as department_name,
COUNT(e.id) as employee_count,
AVG(e.salary) as avg_salary,
MAX(e.salary) as max_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;
Materialized Views
sql
-- Create materialized view
CREATE MATERIALIZED VIEW mv_department_stats AS
SELECT
d.name as department_name,
COUNT(e.id) as employee_count,
AVG(e.salary) as avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;
-- Refresh materialized view
REFRESH MATERIALIZED VIEW mv_department_stats;
-- Refresh concurrently (requires unique index)
CREATE UNIQUE INDEX ON mv_department_stats(department_name);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_department_stats;
Managing Views
sql
-- List views
\dv
-- Drop view
DROP VIEW employee_summary;
-- Replace view
CREATE OR REPLACE VIEW employee_summary AS
SELECT first_name, last_name, email, salary, department_id
FROM employees
WHERE active = true;
Stored Procedures and Functions
Functions
sql
-- Simple function
CREATE OR REPLACE FUNCTION get_employee_count()
RETURNS INTEGER AS $$
BEGIN
RETURN (SELECT COUNT(*) FROM employees);
END;
$$ LANGUAGE plpgsql;
-- Function with parameters
CREATE OR REPLACE FUNCTION get_employees_by_dept(dept_id INTEGER)
RETURNS TABLE(id INTEGER, first_name VARCHAR, last_name VARCHAR) AS $$
BEGIN
RETURN QUERY
SELECT e.id, e.first_name, e.last_name
FROM employees e
WHERE e.department_id = dept_id;
END;
$$ LANGUAGE plpgsql;
-- Function with INOUT parameters
CREATE OR REPLACE FUNCTION calculate_bonus(
IN base_salary DECIMAL,
IN performance_rating INTEGER,
OUT bonus_amount DECIMAL
) AS $$
BEGIN
bonus_amount := base_salary * (performance_rating / 100.0);
END;
$$ LANGUAGE plpgsql;
-- SQL function
CREATE OR REPLACE FUNCTION get_full_name(first_name TEXT, last_name TEXT)
RETURNS TEXT AS $$
SELECT first_name || ' ' || last_name;
$$ LANGUAGE sql;
Stored Procedures
sql
-- Basic stored procedure
CREATE OR REPLACE PROCEDURE update_employee_salary(
emp_id INTEGER,
new_salary DECIMAL
) AS $$
BEGIN
UPDATE employees SET salary = new_salary WHERE id = emp_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'Employee with ID % not found', emp_id;
END IF;
COMMIT;
END;
$$ LANGUAGE plpgsql;
-- Call stored procedure
CALL update_employee_salary(1, 55000);
Exception Handling
sql
CREATE OR REPLACE FUNCTION safe_divide(a DECIMAL, b DECIMAL)
RETURNS DECIMAL AS $$
BEGIN
IF b = 0 THEN
RAISE EXCEPTION 'Division by zero is not allowed';
END IF;
RETURN a / b;
EXCEPTION
WHEN division_by_zero THEN
RAISE NOTICE 'Division by zero attempted';
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Managing Functions and Procedures
sql
-- List functions
\df
-- Drop function
DROP FUNCTION get_employee_count();
-- Drop function with specific signature
DROP FUNCTION get_employees_by_dept(INTEGER);
Triggers
Creating Triggers
sql
-- Create trigger function
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.modified_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Create trigger
CREATE TRIGGER trigger_update_modified
BEFORE UPDATE ON employees
FOR EACH ROW
EXECUTE FUNCTION update_modified_timestamp();
-- Audit trigger function
CREATE OR REPLACE FUNCTION audit_employee_changes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO employee_audit (employee_id, action, changed_at)
VALUES (NEW.id, 'INSERT', CURRENT_TIMESTAMP);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO employee_audit (employee_id, action, old_values, new_values, changed_at)
VALUES (NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), CURRENT_TIMESTAMP);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO employee_audit (employee_id, action, old_values, changed_at)
VALUES (OLD.id, 'DELETE', row_to_json(OLD), CURRENT_TIMESTAMP);
RETURN OLD;
END IF;
END;
$$ LANGUAGE plpgsql;
-- Create audit trigger
CREATE TRIGGER trigger_audit_employees
AFTER INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW
EXECUTE FUNCTION audit_employee_changes();
Trigger Types
BEFORE: Executes before the triggering event
AFTER: Executes after the triggering event
INSTEAD OF: Used with views to replace the triggering event
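An INSTEAD OF trigger lets a view accept writes. A minimal sketch against the employee_summary view from the Views section (the function body is illustrative):
sql
-- Trigger function that redirects view inserts to the base table
CREATE OR REPLACE FUNCTION insert_employee_summary()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO employees (first_name, last_name, email, salary)
VALUES (NEW.first_name, NEW.last_name, NEW.email, NEW.salary);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Attach it to the view
CREATE TRIGGER trigger_insert_summary
INSTEAD OF INSERT ON employee_summary
FOR EACH ROW
EXECUTE FUNCTION insert_employee_summary();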
Managing Triggers
sql
-- List triggers
SELECT trigger_name, event_manipulation, event_object_table
FROM information_schema.triggers;
-- Disable trigger
ALTER TABLE employees DISABLE TRIGGER trigger_update_modified;
-- Enable trigger
ALTER TABLE employees ENABLE TRIGGER trigger_update_modified;
-- Drop trigger
DROP TRIGGER trigger_update_modified ON employees;
Transactions
Basic Transaction Control
sql
-- Begin transaction
BEGIN;
-- Perform operations
UPDATE employees SET salary = salary * 1.1 WHERE department_id = 1;
INSERT INTO salary_adjustments (employee_id, adjustment_date, amount)
SELECT id, CURRENT_DATE, salary * 0.1 FROM employees WHERE department_id = 1;
-- Commit transaction
COMMIT;
-- Or rollback if something goes wrong
ROLLBACK;
Savepoints
sql
BEGIN;
-- Some operations
INSERT INTO departments (name) VALUES ('New Department');
-- Create savepoint
SAVEPOINT sp1;
-- More operations
UPDATE employees SET department_id = 5 WHERE id = 1;
-- Rollback to savepoint if needed
ROLLBACK TO SAVEPOINT sp1;
-- Release savepoint
RELEASE SAVEPOINT sp1;
COMMIT;
Transaction Isolation Levels
sql
-- Set transaction isolation level
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Available isolation levels:
-- READ UNCOMMITTED
-- READ COMMITTED (default)
-- REPEATABLE READ
-- SERIALIZABLE
-- Set for current session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Set the default for new sessions on a database
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';
Locking
sql
-- Explicit row locking
SELECT * FROM employees WHERE id = 1 FOR UPDATE;
-- Share lock
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Advisory locks
SELECT pg_advisory_lock(12345);
SELECT pg_advisory_unlock(12345);
User Management and Security
Creating Users and Roles
sql
-- Create user
CREATE USER john_doe WITH PASSWORD 'secure_password';
-- Create role
CREATE ROLE developer;
-- Create user with specific attributes
CREATE USER jane_smith WITH
PASSWORD 'password123'
CREATEDB
VALID UNTIL '2025-12-31';
-- Grant role to user
GRANT developer TO john_doe;
Managing Privileges
sql
-- Grant table privileges
GRANT SELECT, INSERT, UPDATE ON employees TO john_doe;
GRANT ALL PRIVILEGES ON employees TO jane_smith;
-- Grant schema privileges
GRANT USAGE ON SCHEMA sales TO developer;
GRANT CREATE ON SCHEMA sales TO developer;
-- Grant database privileges
GRANT CONNECT ON DATABASE company_db TO john_doe;
-- Grant privileges on all tables in schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO developer;
-- Grant privileges on future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO developer;
-- Revoke privileges
REVOKE INSERT ON employees FROM john_doe;
Row Level Security (RLS)
sql
-- Enable RLS on table
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;
-- Create policy (current_user_id() is a placeholder for your own
-- function mapping the session to an application user id)
CREATE POLICY employee_policy ON employees
FOR ALL TO employee_role
USING (user_id = current_user_id());
-- Policy for specific operations
CREATE POLICY manager_update_policy ON employees
FOR UPDATE TO manager_role
USING (department_id IN (
SELECT department_id FROM managers
WHERE user_name = current_user
));
Authentication and Connection Security
sql
-- View current user
SELECT current_user;
-- View session information
SELECT
usename,
application_name,
client_addr,
backend_start
FROM pg_stat_activity;
-- Password policies (in postgresql.conf)
-- password_encryption = 'scram-sha-256'
-- ssl = on
Performance Optimization
Query Performance Analysis
sql
-- EXPLAIN query execution plan
EXPLAIN SELECT * FROM employees WHERE salary > 50000;
-- EXPLAIN ANALYZE for actual execution statistics
EXPLAIN ANALYZE SELECT * FROM employees
JOIN departments ON employees.department_id = departments.id;
-- EXPLAIN with all options
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM employees WHERE salary > 50000;
Configuration Tuning
Key parameters in postgresql.conf:
# Memory settings
shared_buffers = 256MB # 25% of RAM
effective_cache_size = 1GB # 50-75% of RAM
work_mem = 4MB # Per sort/hash operation
maintenance_work_mem = 64MB # For maintenance operations
# Checkpoint settings
checkpoint_completion_target = 0.9
wal_buffers = 16MB
# Connection settings
max_connections = 100
# Query planner settings
random_page_cost = 1.1 # For SSDs
effective_io_concurrency = 200 # For SSDs
Index Optimization
sql
-- Find unused indexes (never scanned since statistics were last reset)
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- Find tables with heavy sequential scans (candidates for new indexes)
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
seq_tup_read / seq_scan as avg_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;
Query Optimization Techniques
sql
-- Use LIMIT with ORDER BY for large result sets
SELECT * FROM employees ORDER BY salary DESC LIMIT 10;
-- Use EXISTS instead of IN for subqueries
SELECT * FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.id);
-- Use appropriate JOIN types
-- Use indexes for WHERE, ORDER BY, and JOIN conditions
-- Avoid SELECT * in production queries
-- Use prepared statements for repeated queries
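The prepared-statement advice above can be sketched at the SQL level with PREPARE/EXECUTE; driver-level prepared statements behave similarly. The statement name here is illustrative:
sql
-- Parse and plan once, execute many times with different parameters
PREPARE emp_by_dept (INTEGER) AS
SELECT first_name, last_name, salary
FROM employees
WHERE department_id = $1;
EXECUTE emp_by_dept(1);
EXECUTE emp_by_dept(2);
DEALLOCATE emp_by_dept;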
Statistics and Maintenance
sql
-- Update table statistics
ANALYZE employees;
-- Update statistics for entire database
ANALYZE;
-- Vacuum to reclaim space
VACUUM employees;
-- Full vacuum (locks table)
VACUUM FULL employees;
-- Auto-vacuum settings
ALTER TABLE employees SET (
autovacuum_vacuum_threshold = 100,
autovacuum_analyze_threshold = 50
);
Backup and Recovery
Logical Backups with pg_dump
bash
# Backup single database
pg_dump -U username -h hostname database_name > backup.sql
# Backup with compression
pg_dump -U username -h hostname -Fc database_name > backup.dump
# Backup specific tables
pg_dump -U username -h hostname -t employees -t departments database_name > tables_backup.sql
# Backup schema only
pg_dump -U username -h hostname --schema-only database_name > schema.sql
# Backup data only
pg_dump -U username -h hostname --data-only database_name > data.sql
# Backup all databases
pg_dumpall -U username -h hostname > all_databases.sql
# Backup with custom format (recommended)
pg_dump -U username -h hostname -Fc -f backup.dump database_name
Restoring from Logical Backups
bash
# Restore from SQL file
psql -U username -h hostname -d database_name < backup.sql
# Restore from custom format
pg_restore -U username -h hostname -d database_name backup.dump
# Restore specific tables
pg_restore -U username -h hostname -d database_name -t employees backup.dump
# Restore with clean (drop existing objects)
pg_restore -U username -h hostname -d database_name --clean backup.dump
# Restore all databases
psql -U username -h hostname < all_databases.sql
Physical Backups (Point-in-Time Recovery)
bash
# Configure continuous archiving in postgresql.conf
# wal_level = replica
# archive_mode = on
# archive_command = 'cp %p /path/to/archive/%f'
# Take base backup
pg_basebackup -U username -h hostname -D /backup/base -Ft -z -P
# Recovery configuration (recovery.conf or postgresql.conf in v12+)
# restore_command = 'cp /path/to/archive/%f %p'
# recovery_target_time = '2025-01-15 14:30:00'
Backup Strategies
bash
# Backup schedule using cron (crontab entries)
# Daily full backup at 02:00
0 2 * * * pg_dump -U postgres mydatabase > /backups/daily_$(date +\%Y\%m\%d).sql
# Weekly cleanup of backups older than 7 days
0 3 * * 0 find /backups -name "daily_*.sql" -mtime +7 -delete
sql
-- Monitor table sizes to anticipate backup growth
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Recovery Testing
sql
-- Test restore process regularly
-- Create test database
CREATE DATABASE test_restore;
-- Restore backup to test database
-- pg_restore -d test_restore backup.dump
-- Validate data integrity
SELECT COUNT(*) FROM employees;
SELECT MAX(created_at) FROM orders;
-- Drop test database
DROP DATABASE test_restore;
Replication
Streaming Replication Setup
bash
# Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
synchronous_commit = on
# Primary server authentication (pg_hba.conf)
host replication replicator 192.168.1.0/24 md5
# Create replication user on the primary (run via psql)
psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'password';"
bash
# Standby server setup
# Take base backup from primary
pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -W
# Standby server configuration (postgresql.conf)
hot_standby = on
primary_conninfo = 'host=primary_host port=5432 user=replicator password=password'
primary_slot_name = 'standby_slot'
# Create replication slot on the primary (optional but recommended; run via psql)
psql -h primary_host -c "SELECT pg_create_physical_replication_slot('standby_slot');"
Logical Replication
sql
-- Publisher setup (source database)
-- Enable logical replication in postgresql.conf
-- wal_level = logical
-- Create publication
CREATE PUBLICATION my_publication FOR TABLE employees, departments;
-- Or publish all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;
-- Subscriber setup (target database)
-- Create subscription
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_host dbname=source_db user=replicator password=password'
PUBLICATION my_publication;
-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_stat_subscription;
Monitoring Replication
sql
-- Check replication status on primary
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state
FROM pg_stat_replication;
-- Check replication lag
SELECT
client_addr,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as lag
FROM pg_stat_replication;
-- Check standby status
SELECT pg_is_in_recovery();
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
Failover and Switchover
sql
-- Promote standby to primary
SELECT pg_promote();
-- Or using pg_ctl
-- pg_ctl promote -D /var/lib/postgresql/data
Monitoring and Maintenance
System Monitoring Views
sql
-- Active connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE state != 'idle';
-- Database statistics
SELECT
datname,
numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
tup_returned,
tup_fetched,
tup_inserted,
tup_updated,
tup_deleted
FROM pg_stat_database;
-- Table statistics
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del
FROM pg_stat_user_tables;
-- Index usage statistics
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes;
Performance Monitoring
sql
-- Slow queries (requires the pg_stat_statements extension;
-- before PostgreSQL 13 the columns are total_time / mean_time)
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Lock monitoring
SELECT
l.pid,
l.mode,
l.locktype,
l.relation::regclass,
l.granted,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
-- Blocking queries
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;
Maintenance Tasks
sql
-- Vacuum and analyze schedule
-- Daily vacuum analyze for active tables
VACUUM ANALYZE employees;
-- Occasional VACUUM FULL to reclaim space from heavily updated tables
-- (caution: VACUUM FULL rewrites the table and holds an ACCESS EXCLUSIVE lock for the duration)
VACUUM FULL sales_transactions;
-- Reindex to rebuild fragmented indexes
REINDEX INDEX idx_employees_email;
REINDEX TABLE employees;
-- Update statistics
ANALYZE employees;
-- Check table and index sizes (true bloat estimation requires pgstattuple or estimation queries)
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) as total_size,
pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, tablename)::regclass)) as table_size,
pg_size_pretty(pg_indexes_size(format('%I.%I', schemaname, tablename)::regclass)) as index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC;
Log Analysis
sql
-- Configure logging in postgresql.conf
-- log_destination = 'stderr'
-- logging_collector = on
-- log_directory = 'pg_log'
-- log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
-- log_statement = 'all' -- or 'ddl', 'mod', 'none'
-- log_min_duration_statement = 1000 -- Log queries taking > 1 second
-- Log analysis queries
-- Parse logs using external tools like pgBadger, or custom scripts
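As a hedged sketch of external log analysis, pgBadger can be run against the collected log files to produce an HTML report (the log path below is an example and depends on your log_directory setting; pgBadger must be installed separately):
bash
# Generate an HTML report from the collected PostgreSQL logs (paths are examples)
pgbadger /var/lib/postgresql/data/pg_log/postgresql-*.log -o /tmp/pgbadger_report.html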
Health Checks
sql
-- Database size monitoring
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- Connection limits
SELECT
setting::int as max_connections,
(SELECT count(*) FROM pg_stat_activity) as current_connections,
setting::int - (SELECT count(*) FROM pg_stat_activity) as available_connections
FROM pg_settings
WHERE name = 'max_connections';
-- Locate the data and log directories (check free disk space on them at the OS level)
SELECT
name,
setting,
unit
FROM pg_settings
WHERE name IN ('data_directory', 'log_directory');
-- Check for long-running transactions
SELECT
pid,
now() - xact_start as duration,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND xact_start IS NOT NULL
ORDER BY duration DESC;
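If a long-running transaction must be stopped, its backend can be cancelled or terminated with the built-in signal functions (the pid 12345 below is a placeholder for a pid returned by the query above):
sql
-- Cancel the backend's current query (the gentler option)
SELECT pg_cancel_backend(12345);
-- Terminate the backend entirely if cancelling is not enough
SELECT pg_terminate_backend(12345);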
Best Practices
Database Design
1. Normalize appropriately: Use proper normalization but avoid over-normalization
2. Choose appropriate data types: Use the most specific data type possible
3. Use constraints: Implement PRIMARY KEY, FOREIGN KEY, CHECK, and UNIQUE constraints
4. Index strategically: Create indexes on frequently queried columns
5. Use meaningful names: Choose clear, consistent naming conventions
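The points above can be sketched in DDL as follows (an illustrative example reusing the employees/departments schema from earlier sections; column choices are assumptions for the sketch):
sql
CREATE TABLE departments (
id serial PRIMARY KEY,
name text NOT NULL UNIQUE
);
CREATE TABLE employees (
id serial PRIMARY KEY,
email text NOT NULL UNIQUE,
salary numeric(10,2) CHECK (salary > 0),
department_id int REFERENCES departments(id)
);
-- Index the foreign key column, which is frequently used in joins and filters
CREATE INDEX idx_employees_department_id ON employees(department_id);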
Query Best Practices
sql
-- Use specific columns instead of SELECT *
SELECT first_name, last_name, email FROM employees;
-- Use LIMIT for large result sets
SELECT * FROM employees ORDER BY id LIMIT 100;
-- Use EXISTS instead of IN for subqueries when appropriate
SELECT * FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.id);
-- Use prepared statements for repeated queries
PREPARE employee_by_dept (int) AS
SELECT * FROM employees WHERE department_id = $1;
EXECUTE employee_by_dept(1);
-- Use transactions for multiple related operations
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (1, 100.00);
INSERT INTO order_items (order_id, product_id, quantity) VALUES (currval('orders_id_seq'), 1, 2);
COMMIT;
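A variant of the transaction above avoids currval() by capturing the generated key with RETURNING in a data-modifying CTE (a sketch using the same example tables):
sql
WITH new_order AS (
INSERT INTO orders (customer_id, total) VALUES (1, 100.00)
RETURNING id
)
INSERT INTO order_items (order_id, product_id, quantity)
SELECT id, 1, 2 FROM new_order;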
Security Best Practices
1. Use least privilege principle: Grant minimal necessary permissions
2. Regular security updates: Keep PostgreSQL updated
3. Secure connections: Use SSL/TLS for connections
4. Strong passwords: Enforce strong password policies
5. Regular audits: Monitor and audit database access
6. Backup encryption: Encrypt backup files
7. Network security: Restrict network access using pg_hba.conf
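The least-privilege principle (point 1) can be sketched with a read-only role; the role, database, and schema names below are examples:
sql
-- Create a read-only reporting role
CREATE ROLE reporting_user LOGIN PASSWORD 'use-a-strong-password';
GRANT CONNECT ON DATABASE mydatabase TO reporting_user;
GRANT USAGE ON SCHEMA public TO reporting_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_user;
-- Ensure tables created in the future are covered as well
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO reporting_user;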
Performance Best Practices
1. Regular maintenance: Schedule VACUUM and ANALYZE operations
2. Monitor query performance: Use EXPLAIN ANALYZE regularly
3. Optimize queries: Avoid unnecessary JOINs and subqueries
4. Use connection pooling: Implement connection pooling for applications
5. Configure appropriately: Tune PostgreSQL parameters for your workload
6. Monitor resources: Track CPU, memory, and I/O usage
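For point 2, EXPLAIN ANALYZE executes the query and reports the actual plan and timings (a sketch reusing the employees example table):
sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT first_name, last_name
FROM employees
WHERE department_id = 3;
-- Look for sequential scans on large tables, large row-estimate mismatches,
-- and high buffer reads as signs that an index or a query rewrite may help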
Backup and Recovery Best Practices
bash
#!/bin/bash
# Automated backup script example
BACKUP_DIR="/backups"
DB_NAME="mydatabase"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup
pg_dump -U postgres -Fc "$DB_NAME" > "$BACKUP_DIR/backup_${DB_NAME}_${DATE}.dump"
# Verify backup
if [ $? -eq 0 ]; then
echo "Backup successful: backup_${DB_NAME}_${DATE}.dump"
# Remove backups older than 7 days
find "$BACKUP_DIR" -name "backup_${DB_NAME}_*.dump" -mtime +7 -delete
else
echo "Backup failed!" >&2
exit 1
fi
# Test restore (optional)
# createdb test_restore_db
# pg_restore -d test_restore_db $BACKUP_DIR/backup_${DB_NAME}_${DATE}.dump
# dropdb test_restore_db
Development Best Practices
1. Use version control: Track schema changes with migration scripts
2. Test thoroughly: Test all database changes in development environment
3. Document changes: Maintain documentation for schema and procedures
4. Use staging environment: Test changes in production-like environment
5. Plan migrations: Carefully plan and test schema migrations
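One common way to track schema changes (points 1 and 5) is a migrations table that each migration script records itself in; this is a simplified sketch, not a full migration framework, and the version string is a made-up example:
sql
CREATE TABLE IF NOT EXISTS schema_migrations (
version text PRIMARY KEY,
applied_at timestamptz NOT NULL DEFAULT now()
);
-- Each migration runs its DDL and records itself in one transaction
BEGIN;
ALTER TABLE employees ADD COLUMN hire_date date;
INSERT INTO schema_migrations (version) VALUES ('2024_01_15_add_hire_date');
COMMIT;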
Operational Best Practices
sql
-- Regular maintenance checklist
-- 1. Monitor disk space
SELECT pg_size_pretty(pg_database_size(current_database()));
-- 2. Check for long-running queries
SELECT pid, now() - query_start as duration, query
FROM pg_stat_activity
WHERE state != 'idle' AND query_start < now() - interval '5 minutes';
-- 3. Monitor replication lag (if applicable)
SELECT client_addr, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as lag
FROM pg_stat_replication;
-- 4. Check for unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- 5. Monitor connection usage
SELECT count(*) as connections, max_conn, max_conn - count(*) as available
FROM pg_stat_activity, (SELECT setting::int as max_conn FROM pg_settings WHERE name = 'max_connections') mc
GROUP BY max_conn;
Conclusion
PostgreSQL is a feature-rich, enterprise-grade database system that offers excellent performance,
reliability, and extensibility. This documentation covers the fundamental concepts and advanced
features needed to effectively work with PostgreSQL.
Key takeaways:
Always prioritize data integrity through proper constraints and transactions
Design your schema thoughtfully with appropriate normalization and indexing
Monitor performance regularly and optimize queries as needed
Implement robust backup and recovery procedures
Follow security best practices to protect your data
Keep your PostgreSQL installation updated and properly maintained
For the most current information and advanced topics, refer to the official PostgreSQL documentation
at https://www.postgresql.org/docs/
Regular practice and hands-on experience with these concepts will help you become proficient in
PostgreSQL administration and development.