THE PATTERNS OF DISTRIBUTED LOGGING
AND CONTAINERS
CloudNativeCon Europe 2017
March 30, 2017
Satoshi Tagomori (@tagomoris)
Treasure Data, Inc.
SATOSHI TAGOMORI
(@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
1. Microservices, Containers and Logging
2. Scaling Logging Platform
3. Patterns: Source/Destination-side Aggregation
4. Patterns: Scaling Up/Out Destination
5. Practices
MICROSERVICES,
CONTAINERS AND LOGGING
Logging in Industries
• Service Logs (logs for business growth)
  • Web access logs
  • Ad logs
  • Commercial transaction logs for analytics (EC, Game, ...)
• System Logs (logs for service stability)
  • Syslog and other OS logs
  • Audit logs
  • Performance metrics
Microservices and Logging
[Figure: a monolithic service (LAMP/Rails/MEAN/... apps) serving users and writing one set of logs, versus microservices (Search, Recommendation, Shopping cart, Reviews, Ads, ...) each writing its own logs]
Microservices and Containers
• Microservices
• Isolated dependencies
• Agile deployment
• Containers
• Isolated environments & resources
• Simple pull & restart deployment
• Less overhead, high density
Logging Challenges with Microservices/Containers
• Containerization changes everything:
• No permanent storage
• No fixed physical/network addresses
• No fixed mapping between servers and roles
Transfer Logs to Anywhere ASAP
Push Logs From Containers
Label Logs With Service Names/Tags
Parse Logs & Label Values At Source
Structured Logs
Structured Logs: tag, time, key-value pairs
Original log:
the customer put an item to cart: item_id=101, items=10, client=web
Structured log:
tag: ec_service.shopping_cart
timestamp: 2017-03-30 16:35:37 +0100
record:
{
  "container_id": "bfdd5b9....",
  "container_name": "/infallible_mayer",
  "source": "stdout",
  "event": "put an item to cart",
  "item_id": 101,
  "items": 10,
  "client": "web"
}
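As a minimal sketch of "parse logs & label values at source", the Python function below turns the original plain-text line into the (tag, timestamp, record) triple shown above. The regular expression, the `parse_cart_event` name, and the rule that all-digit values become integers are assumptions for illustration, not Fluentd's parser; the container metadata would be attached by the agent or logging driver, so it is simply passed in as a dict.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for lines such as
# "the customer put an item to cart: item_id=101, items=10, client=web"
LINE_RE = re.compile(r"^the customer (?P<event>[^:]+): (?P<pairs>.*)$")


def parse_cart_event(line, when, metadata, tag="ec_service.shopping_cart"):
    """Turn one plain-text log line into a (tag, time, record) triple."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable log line: {line!r}")
    record = dict(metadata)  # container metadata supplied by the agent/driver
    record["event"] = m.group("event")
    for pair in m.group("pairs").split(", "):
        key, value = pair.split("=", 1)
        # Keep numeric values as numbers so destinations can aggregate them
        record[key] = int(value) if value.isdigit() else value
    return tag, when, record


tag, ts, record = parse_cart_event(
    "the customer put an item to cart: item_id=101, items=10, client=web",
    datetime(2017, 3, 30, 16, 35, 37, tzinfo=timezone(timedelta(hours=1))),
    {"container_name": "/infallible_mayer", "source": "stdout"},
)
print(tag, ts.isoformat(), record)
```

Doing this once at the source means every downstream aggregator and destination sees typed key-value pairs instead of re-parsing free text.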
How to Ship Logs from Docker Containers
• Using mounted volume: nginx, mysql, ... write log files; agents read the files and parse plain text (+ disk I/O penalty, + mount points)
• Using container json logs: apps, middleware write json log files; agents read the files and parse json lines (+ disk I/O penalty)
• Sending logs to agents directly: applications send logs; agents just receive the transferred logs (+ logger code)
• Using logging drivers: apps, middleware; agents just receive the transferred logs (+ agent config 😃)
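For the "container json logs" route, the agent reads the files written by Docker's `json-file` logging driver, one JSON object per line with `log`, `stream` and `time` keys, and parses them line by line. A minimal sketch, assuming the agent already knows the file path and container name; the output record layout and the container id used below are hypothetical.

```python
import json


def parse_docker_json_line(line, container_id, container_name):
    """Convert one line written by Docker's json-file driver into a record."""
    entry = json.loads(line)
    return {
        "container_id": container_id,
        "container_name": container_name,
        "source": entry["stream"],         # "stdout" or "stderr"
        "log": entry["log"].rstrip("\n"),  # the driver keeps the trailing newline
        "time": entry["time"],             # RFC 3339 timestamp, kept as a string here
    }


def read_container_log(path, container_id, container_name):
    """Yield records from a json-file log, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_docker_json_line(line, container_id, container_name)


sample = '{"log":"GET /cart 200\\n","stream":"stdout","time":"2017-03-30T15:35:37.123456789Z"}'
print(parse_docker_json_line(sample, "c0ffee01", "/infallible_mayer"))
```

A production agent (for example Fluentd's `in_tail`) also has to remember its read position and follow log rotation, which this sketch ignores; that file handling is part of the disk I/O cost listed for this route.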
SCALING LOGGING PLATFORM
Core Architecture: Distributed Logging
Source (Container + Agent)
Transferring/Aggregation layer
Destination (Storage, Database, Service)
Distributed Logging Workflow
Collector
• Retrieve raw logs: file system / network
• Parse log content
Aggregator
• Get data from multiple sources
• Split/merge incoming data into streams
Destination
• Retrieve structured logs from Aggregator
• Store formatted logs
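The aggregator's "split/merge incoming data into streams" step can be sketched as tag-based routing: events from many collectors are merged, then split into one stream per tag prefix. The `Aggregator` class and the prefix rule are hypothetical, loosely modeled on routing by tag as Fluentd does; a real aggregator would also buffer each stream and forward it to its destination.

```python
from collections import defaultdict


class Aggregator:
    """Merge events from many collectors, then split them into named streams."""

    def __init__(self, routes):
        # routes: list of (tag_prefix, stream_name); the first matching prefix wins
        self.routes = routes
        self.streams = defaultdict(list)

    def route(self, tag):
        for prefix, stream in self.routes:
            if tag == prefix or tag.startswith(prefix + "."):
                return stream
        return "unmatched"

    def receive(self, events):
        """Accept a batch of (tag, time, record) events from one collector."""
        for tag, time, record in events:
            self.streams[self.route(tag)].append((tag, time, record))


agg = Aggregator([("ec_service", "analytics"), ("system", "monitoring")])
agg.receive([("ec_service.shopping_cart", 1, {"items": 10})])  # from collector A
agg.receive([("system.syslog", 2, {"msg": "disk full"}),        # from collector B
             ("ec_service.reviews", 3, {"stars": 5})])
print({name: len(events) for name, events in agg.streams.items()})
# {'analytics': 2, 'monitoring': 1}
```

Because tags were attached at the source, the aggregator can route without parsing record bodies.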
Scaling Logging
• Network Traffic
• Split heavy log traffic across multiple nodes
• CPU Load
• Distribute log parsing/formatting across nodes
• High Availability
• Switch traffic from a failed node to another node
• Agility
• Reconfigure the whole logging layer to change destinations
PATTERNS:
SOURCE/DESTINATION-SIDE
AGGREGATION
[Figure: 2×2 matrix of patterns, source-side aggregation (yes/no) × destination-side aggregation (yes/no)]
Now I'm Talking About:
[Figure: Source → Transferring/Aggregation → Destination, with the aggregation layer split into a source side and a destination side]
Source-side Aggregation Patterns
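[Figure: without source-side aggregation, the collector in each container sends logs straight to the destination side; with source-side aggregation, containers send logs to an aggregate container on the same host, which forwards them to the destination side]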
Aggregation Pattern without Source-side Aggregation
• Pros:
• Simple configuration
• Cons:
• Fixed aggregator (destination endpoint) address configured in containers
• Many network connections
• High load in aggregator / destination
Aggregation Pattern with Source-side Aggregation
• Pros:
• Fewer connections
• Lower load in aggregator / destination
• Less configuration in containers
• More agility (aggregate containers can be reconfigured)
• Cons:
• Need more resources (+1 container per host)
Destination-side Aggregation Patterns
[Figure: without destination-side aggregation, source-side nodes write directly to the destination; with it, source-side nodes send to aggregator nodes, which write to the destination]
Aggregation Pattern without Destination-side Aggregation
• Pros:
• Fewer nodes
• Simpler configuration
• Cons:
• Destination changes affect all source nodes
• Worse performance: many small write requests on the destination (storage)
Aggregation Pattern with Destination-side Aggregation
• Pros:
• Destination changes do NOT affect source nodes
• Better performance: destination aggregators can merge write operations
• Cons:
• More nodes
• More complex configuration
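To make "merge write operations" concrete, here is a minimal sketch of a destination-side aggregator that buffers incoming records and hands them to storage as one write once a size threshold is reached. The `ChunkBuffer` class, the record-count threshold, and the `write_chunk` callback standing in for one storage write request are assumptions; real buffers (Fluentd's included) also flush on a timer and keep chunks for retries.

```python
class ChunkBuffer:
    """Collect small records and hand them to storage in large chunks."""

    def __init__(self, write_chunk, max_records=1000):
        self.write_chunk = write_chunk  # one call == one write request on storage
        self.max_records = max_records
        self.pending = []

    def emit(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_records:
            self.flush()

    def flush(self):
        if self.pending:
            chunk, self.pending = self.pending, []
            self.write_chunk(chunk)


writes = []
buf = ChunkBuffer(writes.append, max_records=1000)
for i in range(2500):  # 2,500 small records arriving from many source nodes
    buf.emit({"seq": i})
buf.flush()            # flush the remainder, e.g. on shutdown or a timer
print(len(writes))     # 3
```

Three write requests instead of 2,500 is the kind of saving that matters for storage preferring large files or services with per-request quotas.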
PATTERNS:
SCALING UP/OUT DESTINATION
Now I'm Talking About: How to Scale Here
[Figure: Source → Transferring/Aggregation → Destination, with the scaling question placed at the destination end]
Scaling Destination Patterns
Scaling Up Aggregator/Destination Endpoints
• Collector nodes → load balancer or huge queue → backend nodes (destination-side aggregator or destination)
• Using an HTTP load balancer or huge queues
Scaling Out Aggregator/Destination Endpoints
• Collector nodes → backend nodes (destination-side aggregator or destination) directly
• Using round-robin clients
Scaling Up Destination
• Pros:
• Simple configuration: only the load balancer is specified in collector nodes
• Cons:
• Upper limit on scaling up at the load balancer (or queue)
Scaling Out Destination
• Pros:
• Unlimited scaling by adding nodes
• Cons:
• Complex configuration in collector nodes
• Clients must support round-robin
• Not usable for traffic over the Internet
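Scaling out moves load balancing into the collectors. Below is a minimal sketch of such a round-robin client with simple failover; the `RoundRobinSender` class, the `send` callback, and treating a raised `ConnectionError` as a node failure are assumptions for illustration. It also shows the configuration cost listed above: every collector must be given the full list of destination nodes.

```python
import itertools


class RoundRobinSender:
    """Spread chunks across destination nodes in turn, failing over on errors."""

    def __init__(self, nodes, send):
        self.nodes = list(nodes)  # every collector must be configured with all nodes
        self.send = send          # send(node, chunk); raises ConnectionError on failure
        self._cycle = itertools.cycle(self.nodes)

    def deliver(self, chunk):
        # Try each node at most once for this chunk, continuing the rotation
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                self.send(node, chunk)
                return node
            except ConnectionError:
                continue          # node unreachable: fail over to the next one
        raise RuntimeError("all destination nodes failed")


down = {"agg2"}  # pretend this node is unreachable


def fake_send(node, chunk):
    if node in down:
        raise ConnectionError(node)


client = RoundRobinSender(["agg1", "agg2", "agg3"], fake_send)
delivered = [client.deliver(c) for c in ["c1", "c2", "c3", "c4"]]
print(delivered)  # ['agg1', 'agg3', 'agg1', 'agg3']
```

Adding a destination node means updating this list on every collector, which is why the approach fits a data center better than clients spread across the Internet.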
Destination-side Aggregation and Destination Scaling
Scaling up destination endpoints
• Without destination-side aggregation: early-stage systems
• With destination-side aggregation: collect logs over the Internet or using queues
Scaling out destination endpoints
• With destination-side aggregation: collect logs in a data center
• Without destination-side aggregation: all collector nodes must know all destination nodes → uncontrollable
PRACTICES
Practices: Docker + Fluentd
• Docker Fluentd Logging Driver
• Docker containers can send their logs to Fluentd directly, with less overhead
• Fluentd's Pluggable Architecture
• Various destination systems (storage/database/service) are available just by changing the configuration
• Small Memory Footprint
• Source aggregation requires +1 container per host: fine, since the additional resource usage is small!
Practice 1: Source-side Aggregation + Scaling Up
• Kubernetes: Fluentd + Elasticsearch
• a.k.a. EFK stack (inspired by the ELK stack)
• Elasticsearch - Fluentd - Kibana
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/
[Figure: apps, middleware → JSON log files → Fluentd → Elasticsearch]
Practice 2: Source-side Aggregation + Scaling Up
• Containerized Applications
• w/ Google Stackdriver for Monitoring
• w/ Treasure Data for Analytics
[Figure: apps, middleware sending logs to Google Stackdriver Logging and Treasure Data]
Practice 3: Source/Destination-side Aggregation + Scaling Out
• Containerized Application
• w/ Log processing on Hadoop
• writing files on HDFS via WebHDFS
• HDFS prefers large files:
• Destination-side aggregation works well
Practice 4: Source/Destination-side Aggregation + Scaling Out
• Containerized Application
• w/ Log processing on Google BigQuery
• putting logs via HTTPS
• BigQuery has quotas on write requests:
• Destination-side aggregation works well
Best practices?
• Source aggregation: do it
• it frees app containers from logging concerns (buffering, HA, ...)

• Destination aggregation: it depends
• not needed for cloud logging services/storage
• may be needed for self-hosted distributed filesystems/databases
• may be needed for cloud services that charge per request

• Destination scaling: it depends on destinations
Make Logging Scalable,
Service Stable & Business Growing.
Happy Logging!
@tagomoris
