Empowering developers to deploy their
own data stores.
A story of Terraform, Puppet and rage
Tomas Doran
@bobtfish
• Iterate on the things you do often

• Hide complexity

• Empower others
2
Devops = Workflow
• A thing of the past (mostly)
• Need to be able to scale up and down in hours
• If not minutes

• Need to allow people to experiment
• Cloud is expensive, unless you use it!
3
Artisanal hand-crafted servers
• ‘Infra’ layer
• DNS / puppet / apt - basic services
• A(WS)?nycast - failover / HA

• ‘App’ layer
• Smartstack - Service discovery + routing
• Paasta (Mesos + Marathon) - Scheduling + Orchestration
• search24-reviews-uswest1aprod - ugh!
4
2 Layer architecture
• Remembering the . on PTR records

• For some people!
• Why make them do this?
5
The hardest thing
• Datastore PAAS
• Elasticsearch clusters are the ‘easy’ case

• No ‘master’ - all machines are equal
• Automatic sharding/replication

• ASG + ELB
• Zookeeper for discovery
6
Next logical step
• curl http://10.29.0.3:8142 (A(WS)nycast puppetmaster)

{
"habitat": "uswest1aprod"
}

• “habitat”, “region”, “superregion”, “ecosystem”
7
Environment server
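A minimal sketch of how a client might consume the environment server's reply. It assumes the reply is a JSON object keyed by one or more of the four scope names on the slide; the function name `parse_environment` is hypothetical, and the HTTP call itself is left out so the example runs offline.

```python
import json

# Scope names taken from the slide.
SCOPES = ("habitat", "region", "superregion", "ecosystem")


def parse_environment(body: str) -> dict:
    """Parse the JSON reply and reject keys that are not known scopes."""
    data = json.loads(body)
    unknown = set(data) - set(SCOPES)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return data


# Reply body as shown on the slide.
print(parse_environment('{"habitat": "uswest1aprod"}'))
```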
• Hostname: search1-reviews-uswest1aprod
• Parse out cluster name

elasticsearch_cluster { 'reviews': }

puppet/modules/elasticsearch_cluster/data/cluster/reviews.yaml

• Can locate the ‘data’ directory somewhere else!
• Reuse the same YAML for service discovery + provisioning
• Commit hook validation
9
puppet data in modules
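A rough sketch of the "parse out cluster name" step. The hostname layout (role, index, cluster, habitat, separated by hyphens) is inferred from the one example on the slide and is an assumption, as are the function names.

```python
import re
from pathlib import PurePosixPath

# Assumed layout: <role><index>-<cluster>-<habitat>, e.g. search1-reviews-uswest1aprod
HOSTNAME_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<index>\d+)-(?P<cluster>[a-z0-9_]+)-(?P<habitat>[a-z0-9]+)$"
)
# Data directory as given on the slide.
DATA_DIR = PurePosixPath("puppet/modules/elasticsearch_cluster/data/cluster")


def cluster_from_hostname(hostname: str) -> str:
    """Return the cluster component of a hostname, or raise if it does not fit."""
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"cannot parse hostname: {hostname!r}")
    return match.group("cluster")


def cluster_data_path(hostname: str) -> PurePosixPath:
    """Locate the per-cluster YAML file for the host's cluster."""
    return DATA_DIR / f"{cluster_from_hostname(hostname)}.yaml"


print(cluster_data_path("search1-reviews-uswest1aprod"))
```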
• External Node Classifier
• Puppetmaster calls a script, returns node definition
• Create node definition from EC2 tags

puppet::role::elasticsearch_cluster => cluster_name=reviews

• Stop needing individual hostnames!
• Pre-allocate names using GENERATE
10
puppet ENC
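One way the tag-to-node-definition mapping could look. The tag name `puppet_role` and the choice to pass every other tag through as a class parameter are assumptions, not something the slides specify; a real ENC would also need to fetch the tags from EC2. Puppet expects YAML from an ENC, and this sketch prints JSON on the assumption that the master's YAML parser accepts it.

```python
import json

ROLE_TAG = "puppet_role"  # hypothetical tag name


def node_definition(tags: dict) -> dict:
    """Build an ENC-style node definition from a host's EC2 tags."""
    role = tags[ROLE_TAG]
    # Assumption: remaining tags (except the display name) become class parameters.
    params = {k: v for k, v in tags.items() if k not in (ROLE_TAG, "Name")}
    return {"classes": {f"puppet::role::{role}": params}}


tags = {"puppet_role": "elasticsearch_cluster", "cluster_name": "reviews"}
print(json.dumps(node_definition(tags), indent=2))
```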
• Bad abstraction for contextual information
• Which db server is the master? Does it have ‘master’ in its FQDN?
• If it does, what happens when you promote another machine?

• Need key => value for cattle not pets

• Customize your monitoring system to actually tell you what’s wrong!
• ‘The master db has crashed’ vs ‘A db has crashed’
• ‘10-46-11-54 is dead’ vs ‘zookeeper::10-46-11-54 is dead’
11
Hostnames
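A small sketch of the monitoring point: build the alert text from key => value metadata instead of relying on the hostname to carry meaning. The `role` key and both function names are hypothetical.

```python
def describe_host(hostname: str, metadata: dict) -> str:
    """Prefix the hostname with its role when the metadata provides one."""
    role = metadata.get("role")
    return f"{role}::{hostname}" if role else hostname


def alert(hostname: str, metadata: dict) -> str:
    """Format a 'host is dead' alert using whatever context is available."""
    return f"{describe_host(hostname, metadata)} is dead"


print(alert("10-46-11-54", {"role": "zookeeper"}))
```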
• Got most of the pieces
• Machines auto-configure themselves after launch.
• Remaining step is actually launching machines

• Terraform is awesome…
• IF you treat it as a low level abstraction
• IF you keep things in composable units
• IF you add enough workflow to not run with scissors
12
Terraform
• Terraform is the most generic abstraction possible
• Map JSON (HCL) DSL => CRUD APIs
• Cannot do implicit mapping
• But puppet / ansible / whatever can???
• ‘Name’ tag => namevar
• Only works in some cases - not everything has tags!
• Implicit mapping is evil
• Duplicates will screw up your day
16
Low level
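To illustrate why implicit mapping on the ‘Name’ tag is fragile, here is a sketch that indexes resources by that tag and refuses to continue when a tag is missing or duplicated. The resource dictionaries are a made-up shape for illustration, not any provider's real API response.

```python
def index_by_name_tag(resources: list) -> dict:
    """Map each 'Name' tag to a resource id, refusing any ambiguity."""
    index = {}
    for resource in resources:
        name = resource.get("tags", {}).get("Name")
        if name is None:
            # Not everything has tags, so there is nothing to map on.
            raise LookupError(f"{resource['id']} has no Name tag")
        if name in index:
            # Two resources claim the same name: which one is 'the' resource?
            raise LookupError(
                f"duplicate Name {name!r}: {index[name]} and {resource['id']}"
            )
        index[name] = resource["id"]
    return index


print(index_by_name_tag([{"id": "i-1", "tags": {"Name": "search1"}}]))
```

An explicit state file avoids both failure modes, at the cost of having to manage that state.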
• BUG - prefetch method eats exceptions (fixed now)
22
Implicit mapping example - puppet AWS
• Reusable abstraction (in theory)

• Don’t try to use like puppet!
• Flat hierarchy (do not nest modules)
• Use version tags
• Use other git repos

• Or just generate resources as JSON

• KISS
23
Terraform modules
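A sketch of the "just generate resources as JSON" option, using Terraform's JSON configuration layout (resource, then type, then name). The resource naming scheme, the tag key, and the AMI and instance type values are placeholders chosen for illustration.

```python
import json


def cluster_resources(cluster: str, size: int, ami: str, instance_type: str) -> dict:
    """Emit one aws_instance resource per cluster member."""
    instances = {
        f"{cluster}_{i}": {
            "ami": ami,
            "instance_type": instance_type,
            "tags": {"cluster_name": cluster},  # hypothetical tag key
        }
        for i in range(size)
    }
    return {"resource": {"aws_instance": instances}}


# Placeholder AMI id and instance type; write the result to a .tf.json file.
config = cluster_resources("reviews", 3, "ami-PLACEHOLDER", "m3.large")
print(json.dumps(config, indent=2, sort_keys=True))
```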
• Why even is state?
• How to cope with state
• Atlas
• Workflow (locking!) is your problem
• Remote state
• Shard terraform for (team) concurrency
• S3 store
• Many read, few write
• Wrap it yourself (make, Jenkins, don’t install terraform in $PATH)
24
State
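Since locking is left to the operator, a wrapper has to supply it. The sketch below uses an exclusive lockfile, which is only a stand-in: it is atomic on a local filesystem on a single admin machine, and a shared setup would need a real distributed lock. The function name and the lockfile convention are assumptions.

```python
import os
from contextlib import contextmanager


@contextmanager
def state_lock(path: str):
    """Hold an exclusive lockfile for the duration of a terraform run."""
    try:
        # O_EXCL makes creation fail if the lockfile already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"state is locked: {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record who holds the lock
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

A wrapper script would run its refresh, plan and apply steps inside `with state_lock(...)`, with one lock per state shard so teams do not block each other.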
• Provides the workflow

• ‘awsadmin’ machine + IAM role as the Jenkins slave

• Makefile based workflow

• Jenkins job builder to template things
25
Jenkins
• Refresh state (upload refreshed state)
• Plan + save as artifact
• Filter plan!
• Approve plan
• Apply plan, save state
26
Split up the steps
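The split steps could be sketched as a list of commands plus a plan filter. The slides do not say what "Filter plan!" checks; this sketch assumes it means flagging destructive changes, and it assumes plan lines start with `-` (destroy) or `-/+` (replace), which may differ between Terraform versions. The plan file name is a placeholder.

```python
def pipeline_commands(plan_file: str = "cluster.tfplan") -> list:
    """Commands for each stage; approval happens between 'show' and 'apply'."""
    return [
        ["terraform", "refresh"],
        ["terraform", "plan", f"-out={plan_file}"],
        ["terraform", "show", plan_file],
        ["terraform", "apply", plan_file],
    ]


def destructive_lines(plan_text: str) -> list:
    """Return plan lines that destroy or replace a resource (assumed format)."""
    flagged = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-/+ ") or stripped.startswith("- "):
            flagged.append(stripped)
    return flagged


for command in pipeline_commands():
    print(" ".join(command))
```

Applying the saved plan file, rather than re-planning at apply time, means the change that was approved is the change that runs.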
• Commit some files to git.
• Push to a branch
• Jenkins runs
• Gated approval/application process

• Abstract away the scary parts
• Enforce workflow
27
Cluster provisioning workflow
• Self service cluster provisioning
• Developers define their own clusters
• 1 click from Ops to approve

• Costs get accounted to the owning team
• AWS metadata added as needed.
• All metadata validated.

• Clusters built around best practices
• Can abstract further in future
28
Nirvana
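A sketch of what "all metadata validated" might look like for a developer-submitted cluster definition, for example in a commit hook. The field names (`cluster_name`, `owner_team`, `instance_count`) are hypothetical; the slides do not list the real schema.

```python
# Hypothetical required fields and their expected types.
REQUIRED = {"cluster_name": str, "owner_team": str, "instance_count": int}


def validate_cluster(definition: dict) -> list:
    """Return a list of problems; an empty list means the definition passes."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in definition:
            errors.append(f"missing {key}")
        elif not isinstance(definition[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    count = definition.get("instance_count")
    if isinstance(count, int) and count < 1:
        errors.append("instance_count must be at least 1")
    return errors


print(validate_cluster({"cluster_name": "reviews", "instance_count": 3}))
```

Returning every problem at once, rather than failing on the first, gives the developer a single round trip to a valid definition.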
P.S. We’re hiring!
@bobtfish
engineeringblog.yelp.com
github.com/Yelp
github.com/bobtfish
