© 2018 All rights reserved.
Distributed Database
Architecture for GDPR
Karthik Ranganathan
PostgresConf Silicon Valley
Oct 15, 2018
About Us
Founders
Kannan Muthukkaruppan, CEO
Nutanix ♦ Facebook ♦ Oracle
IIT-Madras, University of California-Berkeley
Karthik Ranganathan, CTO
Nutanix ♦ Facebook ♦ Microsoft
IIT-Madras, University of Texas-Austin
Mikhail Bautin, Software Architect
ClearStory Data ♦ Facebook ♦ D.E.Shaw
Nizhny Novgorod State University, Stony Brook
• Founded Feb 2016
• Apache HBase committers and early engineers on Apache Cassandra
• Built Facebook’s NoSQL platform powered by Apache HBase
• Scaled the platform to serve many mission-critical use cases
  • Facebook Messages (Messenger)
  • Operational Data Store (time series data)
• Reassembled the same Facebook team at YugaByte along with engineers from Oracle, Google, Nutanix and LinkedIn
WHAT IS YUGABYTE DB?
A transactional, planet-scale database
for building high-performance cloud services.
NoSQL + SQL Cloud Native
Design Principles
TRANSACTIONAL
• Single Shard & Distributed ACID Txns
• Document-Based, Strongly Consistent Storage
HIGH PERFORMANCE
• Low Latency, Tunable Reads
• High Throughput
PLANET-SCALE
• Auto Sharding & Rebalancing
• Global Data Distribution
CLOUD NATIVE
• Built For The Container Era
• Self-Healing, Fault-Tolerant
OPEN SOURCE
• Apache 2.0
• Popular APIs Extended: Apache Cassandra, Redis and PostgreSQL (BETA)
WHAT IS GDPR?
GDPR: General Data Protection Regulation
Citizens of the EU can control how businesses share and protect their personal data.
Personal Data, also called
PII (Personally Identifiable Information)
• User name
• Email address
• Date of birth
• Bank details
• Location details
• Computer IP address
Control over personal data
Database concerns
• Consent & data location
• Data privacy and safety
• Right to be forgotten
• Data access on demand
Application concerns
• Notify on data breach
• Data portability
• Ability to fix errors in data
• Restrict processing
#1 USER CONSENT AND DATA LOCATION
Data must be stored in the EU by default. Businesses need explicit user consent to move it outside the EU.
Why is this hard?
• EU user data must live in the EU region
• Other countries have their own compliance regulations – more geos
• Public clouds may not cover every region – hybrid deployments
• Architecture depends on the data – a service may need more than one
Think global deployments first!
Example – online ecommerce site
• Products table needs global replication – not PII data
Global Replication with YugaByte DB (diagram: non-PII data, global replication, read replicas)
Example – online ecommerce site
• Users, orders and shipments need locality – PII data
• Product locations table needs scale – may be PII
Geo-Partitioning with YugaByte DB (diagram: PII data, primary data in EU, non-EU data in other partitions)
Replicate data on demand to other geos
• User may be OK with replicating data
• Read replicas on demand (for remote, low-latency reads)
• Change data capture (for analytics)
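The locality and consent rules from the last few slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EU_COUNTRIES` set, the partition names `"eu"` and `"non_eu"`, and both helper functions are hypothetical and are not YugaByte DB APIs.

```python
from dataclasses import dataclass

# Hypothetical, incomplete list of EU country codes, for illustration only.
EU_COUNTRIES = {"DE", "FR", "IE", "NL", "ES", "IT"}


@dataclass(frozen=True)
class User:
    user_id: str
    country: str  # ISO 3166-1 alpha-2 code


def home_partition(user):
    """Return the (hypothetical) partition that holds this user's PII rows."""
    return "eu" if user.country in EU_COUNTRIES else "non_eu"


def allowed_regions(user, consented_to_replication):
    """Return the partitions where copies of the user's PII may live."""
    regions = {home_partition(user)}
    # Copies outside the home partition are allowed only with explicit consent.
    if consented_to_replication:
        regions |= {"eu", "non_eu"}
    return regions
```

An EU user who has not consented stays pinned to the `"eu"` partition; once consent is recorded, read replicas in other geos become permissible.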
Read Replicas with YugaByte DB (diagram: PII data, primary data in EU, read replicas)
#2 DATA PRIVACY AND SAFETY
Data must be secured by default using best practices. Users must be notified of a breach.
Implement end-to-end encryption from day 1
Encrypt All Network Communication
• Use TLS Encryption
  • Between client and server for app interaction
  • Between database servers for replication
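On the client side, the first bullet roughly corresponds to building a verified TLS context before connecting. A minimal sketch using Python's standard `ssl` module follows; the helper name `make_client_tls_context` and the optional `ca_file` parameter are assumptions for illustration. Server-to-server encryption is configured in the database itself and is not shown.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Build a TLS context for client-to-database connections."""
    # create_default_context turns on certificate verification and
    # hostname checking by default.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_tls_context()
```

The resulting context can be handed to any driver or socket wrapper that accepts an `ssl.SSLContext`.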
TLS Encryption (diagram: user, database cluster, server-to-server communication)
Encrypt All Storage
• Encryption at rest
• Integrate with external Key Management Systems
• Ability to rotate keys on demand
Have a key-value table mapping id to cipher key. Encrypt PII data with the cipher key for fine-grained control. More in the next section.
Encryption at Rest (diagram: user, database cluster, encryption on disk, Key Management Service)
#3 RIGHT TO BE FORGOTTEN
Data must be erased on explicit request or when it is no longer relevant to its original purpose.
Use Encryption of Data Attributes
• Have a key-value table mapping id to cipher key
• Encrypt PII data with the cipher key on write
• Decrypt PII data on access
• Delete cipher key to forget PII data
Example – Storing User Profile Data
SET email=foo@bar.com FOR USER ID=XXX
• Get encryption key for user
• Encrypt PII data
• Store encrypted data
SET email=ENCRYPTED FOR USER ID=XXX
• Reads require decryption
• Data not accessible without key
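The write, read and forget steps above can be sketched as follows. Everything here is an assumption made for illustration: the two dictionaries stand in for the cipher-key table and the profile table, and the HMAC-SHA256 keystream is a stdlib-only placeholder for a real cipher. A production system would likely use an authenticated cipher such as AES-GCM from a vetted library, with keys held in a key management system.

```python
import hashlib
import hmac
import secrets

# Stand-in for the key-value table of user id -> cipher key.
cipher_keys = {}
# Stand-in for the user profile table; it only ever holds ciphertext.
profiles = {}


def _keystream(key, nonce, length):
    """Derive `length` bytes from HMAC-SHA256 in counter mode (placeholder cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def set_email(user_id, email):
    """Encrypt the email with the user's cipher key on write."""
    key = cipher_keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)  # fresh nonce for every write
    plaintext = email.encode("utf-8")
    stream = _keystream(key, nonce, len(plaintext))
    profiles[user_id] = nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def get_email(user_id):
    """Decrypt on access; return None if the key or the row is missing."""
    key = cipher_keys.get(user_id)
    if key is None or user_id not in profiles:
        return None
    nonce, ciphertext = profiles[user_id][:16], profiles[user_id][16:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream)).decode("utf-8")


def forget_user(user_id):
    """Delete the cipher key; the ciphertext left behind becomes unreadable."""
    cipher_keys.pop(user_id, None)
```

Note that `forget_user` never touches the profile row. Deleting one small key is enough to make every copy of the encrypted value useless, including copies on replicas and in backups.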
Use Anonymization of Data Attributes
• In many cases the actual value is not needed
• Anonymize PII data with one-way hash functions
• Use hashed ids in the data warehouse
• There is no PII data if hashed ids are used!
Example – Website Analytics
USER=foo@bar.com CHECKED OUT PRODUCT=X, CATEGORY=Gadget
• One-way hash of user id
USER=HASHED_VAL CHECKED OUT PRODUCT=X, CATEGORY=Gadget
• Analytics
Example – Website Analytics
• User no longer identifiable
• Hashed data still useful!
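A minimal sketch of the hashing step for the analytics example is shown below. The slides only say "one-way hash"; using a keyed hash (HMAC-SHA256) is an assumption added here, because a plain hash of an email address can be reversed by hashing candidate addresses. `HASH_SECRET`, `hashed_id` and `anonymize_event` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret, kept outside the data warehouse.
HASH_SECRET = b"replace-with-a-secret-from-your-key-manager"


def hashed_id(user_id):
    """Return a keyed one-way hash of a user identifier as a hex string."""
    normalized = user_id.lower().encode("utf-8")
    return hmac.new(HASH_SECRET, normalized, hashlib.sha256).hexdigest()


def anonymize_event(event):
    """Return a copy of the event with the user id replaced by its hash."""
    clean = dict(event)
    clean["user"] = hashed_id(event["user"])
    return clean


event = {
    "user": "foo@bar.com",
    "action": "CHECKED OUT",
    "product": "X",
    "category": "Gadget",
}
safe_event = anonymize_event(event)
```

Because the same input always maps to the same hash, per-user aggregates such as checkouts by category still work on `safe_event` records.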
#4 DATA ACCESS ON DEMAND
Ability to inform a user about what data is being used,
for what purpose and where it is stored.
Tag Tables and Columns with PII
• Store the tags in a separate information architecture table
• Make tagging a part of the process
• Easy to find what PII data is stored on demand
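One way to picture the information architecture table is as a small catalog of tagged columns that can be queried when a user asks what is stored about them. The catalog rows, column names and purposes below are made up for illustration.

```python
# Hypothetical catalog rows: (table, column, is_pii, purpose).
PII_CATALOG = [
    ("users", "email", True, "account login and notifications"),
    ("users", "date_of_birth", True, "age verification"),
    ("orders", "shipping_address", True, "order delivery"),
    ("products", "name", False, "catalog display"),
]


def pii_columns(catalog):
    """Group the tagged PII columns by table, together with their purpose."""
    report = {}
    for table, column, is_pii, purpose in catalog:
        if is_pii:
            report.setdefault(table, []).append((column, purpose))
    return report


report = pii_columns(PII_CATALOG)
```

Answering an access request then starts from `report` rather than from a manual search through every schema.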
Run Continuous Compliance Checks
• Ensure PII columns are encrypted
• Ensure non-PII columns do not have sensitive data
• Use Spark/Presto to perform scans periodically
• Run scans on a read replica so they do not impact production
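The two checks can be sketched on a handful of in-memory rows. The slides suggest Spark or Presto for the real scan; plain Python is used here only to show the logic. The sketch assumes email addresses are the only PII being looked for, and the regular expression, the sample rows and `scan_rows` are all hypothetical.

```python
import re

# Deliberately simple pattern; assumes email addresses are the PII of interest.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_rows(rows, pii_cols):
    """Return (row_index, column, problem) for every violation found."""
    violations = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            has_email = isinstance(value, str) and EMAIL_RE.search(value) is not None
            if not has_email:
                continue
            if column in pii_cols:
                # A tagged column should only ever contain ciphertext.
                violations.append((i, column, "PII column holds plaintext"))
            else:
                violations.append((i, column, "PII found in untagged column"))
    return violations


rows = [
    {"email": "3f9a0c71", "notes": "gift wrap"},
    {"email": "foo@bar.com", "notes": "ok"},
    {"email": "91bc77e2", "notes": "contact foo@bar.com"},
]
violations = scan_rows(rows, pii_cols={"email"})
```

Here the second row fails the first check (a tagged column holding plaintext) and the third row fails the second check (an email address in a free-text column).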
Ensure PII columns are encrypted
Ensure no PII data in other columns
Tag PII Columns
PUTTING IT ALL TOGETHER
GDPR Reference Architecture
• Primary Cluster (in EU): at-rest encryption for all nodes, PII columns encrypted w/ cipher key
• Read Replica Clusters (anywhere in the world): at-rest encryption for all nodes
• Encrypted async replication from the primary cluster to the read replica clusters
• App clients: reads & writes, encrypted
• Analytics clients: read only, encrypted
• Tag PII columns
• Ensure PII columns are encrypted
• Ensure no PII data in other columns
Questions?
Try it at
docs.yugabyte.com/latest/quick-start
