Going beyond the Django ORM limitations with Postgres
@craigkerstiens
PSA: Macs

  Postgres.app
Why Postgres
 “It’s the Emacs of databases”
 http://www.craigkerstiens.com/2012/04/30/why-postgres/
https://speakerdeck.com/craigkerstiens/postgres-demystified
TLDR
 Datatypes
 Conditional Indexes
 Transactional DDL
 Foreign Data Wrappers
 Concurrent Index Creation
 Extensions
 Common Table Expressions
 Fast Column Addition
 Listen/Notify
 Table Inheritance
 Per-transaction synchronous replication
 Window functions
 NoSQL inside SQL
 Momentum
Limitations?
Django attempts to support as many features
as possible on all database backends. However,
not all database backends are alike, and we’ve
had to make design decisions on which
features to support and which assumptions
we can make safely.
Datatypes
 __________
< Postgres >
 ----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Datatypes
 UUID, date, boolean, timestamptz, array, interval, integer, bigint,
 smallint, line, XML, enum, char, money, serial, bytea, point, float,
 inet, polygon, numeric, circle, cidr, varchar, tsquery, timetz, path,
 time, text, macaddr, timestamp, box, tsvector
 ________
< SQLite >
 --------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Datatypes
 null, integer, real, text, blob
Indexes
 __________
< Postgres >
 ----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Indexes
Multiple Types
   B-Tree, GIN, GIST, KNN, SP-GIST

CREATE INDEX CONCURRENTLY
 _______
< MySQL >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Indexes
They exist
Digging In
Arrays
CREATE TABLE item (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],
    created_at timestamp
);
Arrays
INSERT INTO item
VALUES (1, 'Django Pony',
'{"Programming","Animal"}', now());

INSERT INTO item
VALUES (2, 'Ruby Gem',
'{"Programming","Jewelry"}', now());
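Django
pip install djorm-ext-pgarray
pip install djorm-ext-expressions

models.py:

from django.db import models
from djorm_pgarray.fields import ArrayField
from djorm_expressions.models import ExpressionManager

class Item(models.Model):
    name = models.CharField(max_length=255)
    tags = ArrayField(dbtype="varchar(255)")
    # auto_now_add stamps the row once, at creation time
    created_at = models.DateTimeField(auto_now_add=True)
    # ExpressionManager provides the .where() method used for array queries
    objects = ExpressionManager()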
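Django
from djorm_expressions.base import SqlExpression

i = Item(
    name='Django Pony',
    tags=['Programming','Animal'])
i.save()

# @> is case-sensitive, so the tag must match the stored value exactly
qs = Item.objects.where(
    SqlExpression("tags", "@>", ['Programming'])
)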
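The `@>` operator in the query above is the Postgres array "contains" test: the row matches when every element on the right also appears in the column. A minimal Python sketch of that rule, for illustration only (the `array_contains` helper is hypothetical, written in Python 3, and is not how Postgres or djorm-ext-pgarray implement it):

```python
def array_contains(column_values, wanted):
    """Mimic Postgres `column @> wanted` for one-dimensional arrays.

    True when every element of `wanted` also appears in `column_values`.
    Order and duplicates are ignored; string comparison is case-sensitive.
    """
    return set(wanted) <= set(column_values)


# The two rows inserted on the Arrays slide.
items = {
    "Django Pony": ["Programming", "Animal"],
    "Ruby Gem": ["Programming", "Jewelry"],
}

# Names of the items whose tags contain "Programming".
matches = sorted(name for name, tags in items.items()
                 if array_contains(tags, ["Programming"]))
print(matches)
```

Because the comparison is case-sensitive, asking for `programming` in lower case would match neither row.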
Extensions
 dblink          hstore                  trigram
                           uuid-ossp               pgstattuple
        citext
                          pgcrypto
  isn             ltree           fuzzystrmatch
          cube                            dict_int
                    earthdistance
unaccent                                             dict_xsyn
                            btree_gist
            tablefunc                      pgrowlocks
NoSQL in your SQL
CREATE EXTENSION hstore;
CREATE TABLE item (
    id integer NOT NULL,
    name varchar(255),
    data hstore
);
NoSQL in your SQL
INSERT INTO item
VALUES (
 1,
 'Pony',
 'rating => "4.0", color => "Pink"'
);
Django
pip install django-hstore

from django.db import models
from django_hstore import hstore

class Item(models.Model):
    name = models.CharField(max_length=250)
    data = hstore.DictionaryField(db_index=True)
    objects = hstore.Manager()

    def __unicode__(self):
        return self.name
Django
Item.objects.create(
    name='Django Pony',
    data={'rating': '5'})

Item.objects.create(
    name='Pony',
    data={'color': 'pink', 'rating': '4'})
Django
colored_ponies = Item.objects.filter(data__contains='color')
print(colored_ponies[0].data['color'])

favorite_ponies = Item.objects.filter(data__contains={'rating': '5'})
print(favorite_ponies[0])
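The two `data__contains` lookups above ask different questions: a plain string checks that a key exists, while a dict checks that every key/value pair is present (in SQL terms, the hstore `?` and `@>` operators). A rough Python 3 sketch of those two cases, assuming that reading of the lookups; `hstore_contains` is a hypothetical helper, not part of django-hstore:

```python
def hstore_contains(data, query):
    """Emulate the two lookups shown on the slide against a plain dict.

    - query is a dict: every key/value pair must be present in `data`.
    - query is a string: the key must exist, whatever its value.
    """
    if isinstance(query, dict):
        return all(key in data and data[key] == value
                   for key, value in query.items())
    return query in data


# The two rows created on the previous slide.
rows = [
    {"name": "Django Pony", "data": {"rating": "5"}},
    {"name": "Pony", "data": {"color": "pink", "rating": "4"}},
]

colored = [r["name"] for r in rows if hstore_contains(r["data"], "color")]
favorites = [r["name"] for r in rows
             if hstore_contains(r["data"], {"rating": "5"})]
print(colored, favorites)
```

Note that hstore stores values as text, so `'5'` and `5` are not interchangeable in these comparisons.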
JSON
w/ PLV8
Queueing
Pub/Sub?
Postgres is a great queue

Not with polling
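The alternative to polling is the Listen/Notify feature from the TLDR slide: a worker issues LISTEN and then sleeps until the server pushes a notification. The sketch below shows that pattern in Python 3 with psycopg2, assuming psycopg2 is installed; the `jobs` channel name and both function names are made up for illustration, and this is not necessarily how Trunk does it.

```python
import select


def drain_notifications(conn):
    """Return the payloads of all notifications queued on a psycopg2 connection."""
    conn.poll()  # read whatever the server has sent on the socket
    payloads = []
    while conn.notifies:
        payloads.append(conn.notifies.pop(0).payload)
    return payloads


def listen(dsn, channel="jobs", timeout=5.0):
    """Yield payloads sent with NOTIFY on `channel`, sleeping in select() between them."""
    import psycopg2  # imported lazily; only needed against a real database
    from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

    conn = psycopg2.connect(dsn)
    # LISTEN takes effect at commit, so autocommit keeps it simple.
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    # `channel` is interpolated as an identifier, so it must come from trusted code.
    cur.execute("LISTEN " + channel + ";")
    while True:
        # Block until the socket is readable instead of re-querying a table.
        if select.select([conn], [], [], timeout) == ([], [], []):
            continue  # timed out, nothing arrived
        for payload in drain_notifications(conn):
            yield payload
```

A producer would run `NOTIFY jobs, 'some payload';` in any other session, and the generator returned by `listen(...)` would yield that payload without the worker having issued a single polling query.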
Trunk
pip install celery trunk

psql < sql/*.sql
celery worker -A tasks --loglevel=info

ipython -i tasks.py
>>> add.delay(2, 2)
Trunk
from celery import Celery

celery = Celery('tasks')
celery.config_from_object({
    'BROKER_URL': 'trunk.transport.Transport://localhost/trunk',
})

@celery.task
def add(x, y):
    return x + y
Text Search
Searching Text
 Lucene
 Elasticsearch
 Sphinx
 Solr
 Postgres
Full Text Search
CREATE TABLE posts (
   id serial,
   title varchar(255),
   content text,
   tags varchar(255)[],
   post_text tsvector
);

CREATE INDEX posttext_gin ON
posts USING GIN(post_text);

-- tsvector_update_trigger takes the tsvector column, a schema-qualified
-- config, and text columns only, so the tags array is left out
CREATE TRIGGER update_posttext
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(
  post_text, 'pg_catalog.english', title, content);
Django
from djorm_pgfulltext.models import SearchManager
from djorm_pgfulltext.fields import VectorField
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()

    search_index = VectorField()

    objects = SearchManager(
        fields = ('title', 'content'),
        config = 'pg_catalog.english',
        search_field = 'search_index',
        auto_update_search_field = True
    )
Django
Post.objects.search("documentation & about")

# OR queries go through to_tsquery, which needs raw=True
Post.objects.search("about | documentation", raw=True)
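Query strings like the ones above are usually assembled from words a user typed, and stray punctuation in those words can be read as tsquery syntax. One possible way to build the string, sketched in Python 3; `build_tsquery` is a hypothetical helper and its stripping rule is an assumption, not something the search library provides:

```python
import re


def build_tsquery(words, operator="&"):
    """Join user-supplied words into a tsquery-style string.

    `operator` is "&" (all words must match) or "|" (any word may match).
    Anything that is not a letter, digit or underscore is stripped from each
    word so it cannot be interpreted as tsquery syntax.
    """
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' or '|'")
    cleaned = [re.sub(r"\W+", "", word) for word in words]
    cleaned = [word for word in cleaned if word]  # drop words that became empty
    return (" %s " % operator).join(cleaned)


print(build_tsquery(["documentation", "about"]))
print(build_tsquery(["about", "documentation"], operator="|"))
```

The results have the same shape as the two query strings passed to `search()` above.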
Django
Post.objects.search("documentation & about")


Post.objects.search("about | documentation")
Indexes
B-Tree
Generalized Inverted Index (GIN)
Generalized Search Tree (GIST)
K Nearest Neighbors (KNN)
Space Partitioned GIST (SP-GIST)
Indexes
Which do I use?
Generalized Inverted Index (GIN)
 Use with multiple values in 1 column
 Array/hStore
Generalized Search Tree (GIST)
 Full text search
 Shapes
 PostGIS
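To make the guidance above concrete, here is a small Python 3 sketch that emits the `CREATE INDEX` statement for a chosen index method, optionally with the `CONCURRENTLY` keyword from the earlier Indexes slide. The helper name and the index naming scheme are assumptions for illustration; table and column names are not quoted, so they must come from trusted code.

```python
def create_index_sql(table, column, method="btree", concurrently=False):
    """Build a Postgres CREATE INDEX statement for one column.

    `method` is one of the access methods named on the slides:
    btree, gin, gist or spgist.
    """
    methods = ("btree", "gin", "gist", "spgist")
    if method not in methods:
        raise ValueError("unknown index method: %r" % (method,))
    name = "%s_%s_%s" % (table, column, method)  # hypothetical naming scheme
    parts = ["CREATE INDEX"]
    if concurrently:
        parts.append("CONCURRENTLY")  # build without blocking writes
    parts.append("%s ON %s USING %s (%s);" % (name, table, method, column))
    return " ".join(parts)


# GIN for the multi-valued tags array from the Arrays slides.
print(create_index_sql("item", "tags", method="gin", concurrently=True))
```

Keep in mind that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so it has to be issued outside any migration that wraps its statements in a transaction.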
GeoSpatial
GeoDjango
 http://geodjango.org/
 https://speakerdeck.com/pyconslides/location-location-location
One More
Connections
django-postgrespool
djorm-ext-pool
django-db-pool
django-postgrespool
import dj_database_url
import django_postgrespool

DATABASES = { 'default': dj_database_url.config() }
DATABASES['default']['ENGINE'] = 'django_postgrespool'

SOUTH_DATABASE_ADAPTERS = {
    'default': 'south.db.postgresql_psycopg2'
}
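In the settings above, `dj_database_url.config()` builds the `default` entry from the `DATABASE_URL` environment variable, and the next line swaps in the pooling engine. The Python 3 sketch below shows roughly what that combination produces; `parse_database_url` and the example URL are hypothetical, and the real library handles more cases than this.

```python
from urllib.parse import urlparse


def parse_database_url(url, engine="django_postgrespool"):
    """Turn a postgres:// URL into a Django DATABASES entry (simplified)."""
    parts = urlparse(url)
    return {
        "ENGINE": engine,
        "NAME": parts.path.lstrip("/"),  # "/mydb" -> "mydb"
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }


# Example URL with made-up credentials.
DATABASES = {
    "default": parse_database_url(
        "postgres://pony:secret@db.example.com:5432/ponies"),
}
print(DATABASES["default"]["HOST"], DATABASES["default"]["NAME"])
```

Keeping the connection details in one URL means only the `ENGINE` value has to change to switch between the stock backend and the pooled one.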
Limitations?
Django attempts to support as many features
as possible on all database backends. However,
not all database backends are alike, and we’ve
had to make design decisions on which
features to support and which assumptions
we can make safely.
Postgres

It’s great
Django ORM

It’s not so bad
Thanks!
