KEMBAR78
PostgreSQL talk, Database 2011 conference | KEY
PostgreSQL: So
 powerful, you won’t
believe it’s open source
   Reuven M. Lerner • reuven@lerner.co.il
               Database 2011
             January 13th, 2011
Who am I?
• Web developer, software architect,
  consultant
• Linux Journal columnist since 1996
• Mostly Ruby on Rails + PostgreSQL, but
  also Python, PHP, jQuery, and lots more...
• PostgreSQL user since (at least) 1997
Lots of options!

• Oracle
• Microsoft SQL Server
• IBM DB2
• MySQL
• PostgreSQL
How do you choose?
•   Integrity (ACID compliance)

•   Data types

•   Functionality

•   Tools

•   Extensibility

•   Documentation

•   Community
PostgreSQL
• Very fast, very scalable. (Just ask Skype.)
• Amazingly flexible, easily extensible.
• Rock-solid — no crashes, corruption,
  security issues for years
• Ridiculously easy administration
• It also happens to be free (MIT/BSD)
Brief history
• Ingres (Stonebreaker, Berkeley)
• Postgres (Stonebreaker, Berkeley)
• PostgreSQL project = Postgres + SQL
• About one major release per year
• Version 8.x — Windows port, recovery
• Version 9.0 — hot replication, upgrades
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
Data types
• Boolean
• Numeric (integer, float, decimal)
• (var)char, text (infinitely large), binary
• sequences (guaranteed unique)
• Date/time and time intervals
• IP addresses, XML, enums, arrays
Or create your own!
Or create your own!

CREATE TYPE Person AS
(first_name TEXT, last_name
TEXT);
Or create your own!

CREATE TYPE Person AS
(first_name TEXT, last_name
TEXT);
Or create your own!

CREATE TYPE Person AS
(first_name TEXT, last_name
TEXT);


CREATE TABLE Members (group_id
INTEGER, member Person);
PostGIS
• Some people took this all the way
• Want to include geographical information?
• No problem — we’ve got PostGIS!
• Complete GIS solution, with data types and
  functions
• Keeps pace with main PostgreSQL revisions
Flexible internal types

• For example, text/binary storage
   • Inline or in a separate table (TOAST)
   • Compressed or not
  ALTER TABLE Foo ALTER COLUMN
  mytext SET STORAGE EXTERNAL;
Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT, store_id
INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id) VALUES ('Attack of
the Killer Tomatoes', 500);
Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT, store_id
INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id) VALUES ('Attack of
the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"
Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT, store_id
INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id) VALUES ('Attack of
the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"

DETAIL: Key (store_id)=(500) is not present in table
"stores".
Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT, store_id
INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id) VALUES ('Attack of
the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"

DETAIL: Key (store_id)=(500) is not present in table
"stores".

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"
Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT, store_id
INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id) VALUES ('Attack of
the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"

DETAIL: Key (store_id)=(500) is not present in table
"stores".

ERROR: insert or update on table "dvds" violates
foreign key constraint "dvds_store_id_fkey"

DETAIL: Key (store_id)=(500) is not present in table
"stores".
Custom validity checks
CREATE TABLE DVDs (id SERIAL, title
TEXT check (length(title) > 3),
store_id INTEGER REFERENCES
Stores);
INSERT INTO DVDs (title, store_id)
VALUES ('AB', 500);
Custom validity checks
CREATE TABLE DVDs (id SERIAL, title
TEXT check (length(title) > 3),
store_id INTEGER REFERENCES
Stores);
INSERT INTO DVDs (title, store_id)
VALUES ('AB', 500);
ERROR: new row for relation "dvds"
violates check constraint
"dvds_title_check"
Timestamp vs. Interval
testdb=# select now();
              now
-------------------------------
 2010-10-31 08:58:23.365792+02
(1 row)
                                  Point in time

testdb=# select now() - interval '3 days';
           ?column?
-------------------------------
 2010-10-28 08:58:28.870011+02
                                  Difference between
(1 row)                              points in time
Built-in functions
• Math
• Text processing (including regexps)
• Date/time calculations
• Conditionals (CASE, COALESCE, NULLIF)
  for use in queries
• Extensive library of geometrical functions
Or write your own!
• PL/pgSQL
• PL/Perl
• PL/Python
• PL/Ruby
• PL/R
• PL/Tcl
Triggers
• Write a function, and use it as a trigger
• The function can run before or after
  inserts, deletes, or updates
• What do they do?
 • Stop certain actions from happening
 • Change the data before it’s really used
Windowing functions

• Calculate aggregate amounts on each row,
  taking other rows into consideration
• For example: Rank each person’s salary
  within their department
Smart indexing
• Functional indexes
  CREATE UNIQUE INDEX on
  People(lower(email_address));
• Conditional indexes
  CREATE INDEX on People WHERE
  is_admin = true;
• Background (concurrent) indexing
Transactions
• In PostgreSQL from the beginning
• Use transactions for just about anything:
  BEGIN
  DROP TABLE DVDs;
  ROLLBACK;
  SELECT * FROM DVDs; -- Works!
Savepoints
(or, sub-transactions)
BEGIN;
INSERT INTO table1 VALUES (1);
SAVEPOINT my_savepoint;
INSERT INTO table1 VALUES (2);
ROLLBACK TO SAVEPOINT my_savepoint;
INSERT INTO table1 VALUES (3);
COMMIT;
MVCC
• Readers and writers don’t block each other
• “Multi-version concurrency control”
• xmin, xmax on each tuple; rows are those
  tuples with txid_current between them
• Old versions stick around until vacuumed
 • Autovacuum removes even this issue
Full-text indexing
• Built-in, Unicode-aware full-text indexing
  has been in PostgreSQL for a few years
• It’s simply a different type of index, as well
  as a different search operator (@@)
• Custom dictionaries for different
  languages, indicating stop words and
  stemming rules
CTEs
• Adds a “WITH” statement, which defines a
  sorta-kinda temp table
• You can then query that same temp table
• Makes many queries easier to read, write,
  without a real temp table
• Better yet: CTEs can be recursive, for
  everything from Fibonacci to org charts
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH regional_sales AS (
     SELECT region, SUM(amount) AS total_sales
     FROM orders GROUP BY region
     ),
     top_regions AS (
          SELECT region FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10
FROM regional_sales)
     )
SELECT region,
         product,
         SUM(quantity) AS product_units,
         SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
PITR

• Store WALs (write-ahead logs) to disk
• Normally, the server will apply all WALs
  when it starts up
• By choosing which WALs to apply, you can
  bring the database up to date until a certain
  point in time
Streaming replication
• In 8.4, you could use “log shipping” to send
  the WALs from one server to another
• When the primary server went down, you
  could bring up this secondary
• In 9.0, this can happen over the network
 • Faster, more reliable, and easier to
    configure than previously
Hot standby

• In 8.4, the secondary server could not
  answer queries when it was receiving WALs
  from the primary
• 9.0 introduced “hot standby”: Secondary
  server can answer read-only queries
• Can help with balancing certain queries
It’s all in the tables
• Databases are good at working with tables
• So PostgreSQL stores info in tables
 • Configuration parameters
 • Objects (tables, indexes, functions, etc.)
 • Statistics for the optimizer
 • Look through pg_catalog ... and learn!
Client libraries
• libpq (in C)    • Java (JDBC)
• Others by 3rd   • .NET (npgsql)
  parties:        • ODBC
• Python          • JavaScript (!)
• Ruby            • Just about any
                    language you can
• Perl              imagine
Want to learn more?
• Mailing lists, wikis, and blogs
 • All at http://postgresql.org/
 • http://planetpostgresql.org
• PostgreSQL training, consulting,
  development, hand-holding, and general
  encouragement
Thanks!
(Any questions?)



     reuven@lerner.co.il
   http://www.lerner.co.il/
        054-496-8405
“reuvenlerner” on Skype/AIM

PostgreSQL talk, Database 2011 conference

  • 1.
    PostgreSQL: So powerful,you won’t believe it’s open source Reuven M. Lerner • reuven@lerner.co.il Database 2011 January 13th, 2011
  • 2.
    Who am I? •Web developer, software architect, consultant • Linux Journal columnist since 1996 • Mostly Ruby on Rails + PostgreSQL, but also Python, PHP, jQuery, and lots more... • PostgreSQL user since (at least) 1997
  • 3.
    Lots of options! •Oracle • Microsoft SQL Server • IBM DB2 • MySQL • PostgreSQL
  • 4.
    How do youchoose? • Integrity (ACID compliance) • Data types • Functionality • Tools • Extensibility • Documentation • Community
  • 5.
    PostgreSQL • Very fast,very scalable. (Just ask Skype.) • Amazingly flexible, easily extensible. • Rock-solid — no crashes, corruption, security issues for years • Ridiculously easy administration • It also happens to be free (MIT/BSD)
  • 6.
    Brief history • Ingres(Stonebreaker, Berkeley) • Postgres (Stonebreaker, Berkeley) • PostgreSQL project = Postgres + SQL • About one major release per year • Version 8.x — Windows port, recovery • Version 9.0 — hot replication, upgrades
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
    Data types • Boolean •Numeric (integer, float, decimal) • (var)char, text (infinitely large), binary • sequences (guaranteed unique) • Date/time and time intervals • IP addresses, XML, enums, arrays
  • 14.
  • 15.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
  • 16.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
  • 17.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT); CREATE TABLE Members (group_id INTEGER, member Person);
  • 18.
    PostGIS • Some peopletook this all the way • Want to include geographical information? • No problem — we’ve got PostGIS! • Complete GIS solution, with data types and functions • Keeps pace with main PostgreSQL revisions
  • 19.
    Flexible internal types •For example, text/binary storage • Inline or in a separate table (TOAST) • Compressed or not ALTER TABLE Foo ALTER COLUMN mytext SET STORAGE EXTERNAL;
  • 20.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500);
  • 21.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
  • 22.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
  • 23.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
  • 24.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
  • 25.
    Custom validity checks CREATETABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500);
  • 26.
    Custom validity checks CREATETABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500); ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"
  • 27.
    Timestamp vs. Interval testdb=#select now(); now ------------------------------- 2010-10-31 08:58:23.365792+02 (1 row) Point in time testdb=# select now() - interval '3 days'; ?column? ------------------------------- 2010-10-28 08:58:28.870011+02 Difference between (1 row) points in time
  • 28.
    Built-in functions • Math •Text processing (including regexps) • Date/time calculations • Conditionals (CASE, COALESCE, NULLIF) for use in queries • Extensive library of geometrical functions
  • 29.
    Or write yourown! • PL/pgSQL • PL/Perl • PL/Python • PL/Ruby • PL/R • PL/Tcl
  • 30.
    Triggers • Write afunction, and use it as a trigger • The function can run before or after inserts, deletes, or updates • What do they do? • Stop certain actions from happening • Change the data before it’s really used
  • 31.
    Windowing functions • Calculateaggregate amounts on each row, taking other rows into consideration • For example: Rank each person’s salary within their department
  • 32.
    Smart indexing • Functionalindexes CREATE UNIQUE INDEX on People(lower(email_address)); • Conditional indexes CREATE INDEX on People WHERE is_admin = true; • Background (concurrent) indexing
  • 33.
    Transactions • In PostgreSQLfrom the beginning • Use transactions for just about anything: BEGIN DROP TABLE DVDs; ROLLBACK; SELECT * FROM DVDs; -- Works!
  • 34.
    Savepoints (or, sub-transactions) BEGIN; INSERT INTOtable1 VALUES (1); SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (2); ROLLBACK TO SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (3); COMMIT;
  • 35.
    MVCC • Readers andwriters don’t block each other • “Multi-version concurrency control” • xmin, xmax on each tuple; rows are those tuples with txid_current between them • Old versions stick around until vacuumed • Autovacuum removes even this issue
  • 36.
    Full-text indexing • Built-in,Unicode-aware full-text indexing has been in PostgreSQL for a few years • It’s simply a different type of index, as well as a different search operator (@@) • Custom dictionaries for different languages, indicating stop words and stemming rules
  • 37.
    CTEs • Adds a“WITH” statement, which defines a sorta-kinda temp table • You can then query that same temp table • Makes many queries easier to read, write, without a real temp table • Better yet: CTEs can be recursive, for everything from Fibonacci to org charts
  • 38.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 39.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 40.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 41.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 42.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 43.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 44.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 45.
    PITR • Store WALs(write-ahead logs) to disk • Normally, the server will apply all WALs when it starts up • By choosing which WALs to apply, you can bring the database up to date until a certain point in time
  • 46.
    Streaming replication • In8.4, you could use “log shipping” to send the WALs from one server to another • When the primary server went down, you could bring up this secondary • In 9.0, this can happen over the network • Faster, more reliable, and easier to configure than previously
  • 47.
    Hot standby • In8.4, the secondary server could not answer queries when it was receiving WALs from the primary • 9.0 introduced “hot standby”: Secondary server can answer read-only queries • Can help with balancing certain queries
  • 48.
    It’s all inthe tables • Databases are good at working with tables • So PostgreSQL stores info in tables • Configuration parameters • Objects (tables, indexes, functions, etc.) • Statistics for the optimizer • Look through pg_catalog ... and learn!
  • 49.
    Client libraries • libpq(in C) • Java (JDBC) • Others by 3rd • .NET (npgsql) parties: • ODBC • Python • JavaScript (!) • Ruby • Just about any language you can • Perl imagine
  • 50.
    Want to learnmore? • Mailing lists, wikis, and blogs • All at http://postgresql.org/ • http://planetpostgresql.org • PostgreSQL training, consulting, development, hand-holding, and general encouragement
  • 51.
    Thanks! (Any questions?) reuven@lerner.co.il http://www.lerner.co.il/ 054-496-8405 “reuvenlerner” on Skype/AIM