Application Metrics
with Prometheus examples
Rafael Dohms @rdohms Backend Architect @
How do you do
metrics?
“The Prometheus Scientist Method”
I hope not.
jobs.usabilla.com
Rafael Dohms
Staff Engineer
rdohms / doh.ms
We are hiring!

jobs.usabilla.com
Let’s talk about metrics. 



But let’s do it with a
concrete example.
Kafka / DDD / Autonomous Microservices / Monitoring
Metrics are insights into
the current state of your
application.
Metrics tell you if your
service is healthy.
Metrics tell you what
is wrong.
Metrics tell you what
is right.
Metrics tell you what
will soon be wrong.
Metrics tell you where
to start looking.
Site Reliability Engineering
SLIs, SLOs, SLAs
SLIs
Service Level Indicators
“A quantitative measure of some
aspect of your application”
The response time of a request was 150ms
Source: Site Reliability Engineering - O’Reilly
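A response-time SLI like the one above has to be measured somewhere in the code. The following is a minimal sketch, assuming a plain PHP script and a hypothetical helper named `timed()` (not part of any library); the `usleep()` call only stands in for real request handling.

```php
<?php
declare(strict_types=1);

/**
 * Run $handler and return [result, elapsed milliseconds].
 * Hypothetical helper for illustration only.
 */
function timed(callable $handler): array
{
    $start = hrtime(true);                       // monotonic clock, in nanoseconds
    $result = $handler();
    $elapsedMs = (hrtime(true) - $start) / 1e6;  // nanoseconds -> milliseconds

    return [$result, $elapsedMs];
}

[$response, $ms] = timed(function (): string {
    usleep(20000); // stand-in for real work, roughly 20 ms
    return 'ok';
});

printf("response=%s time=%.1fms\n", $response, $ms);
```

`hrtime()` is used instead of `microtime()` because it is monotonic, so the measurement is not affected by wall-clock adjustments.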
SLOs
Service Level Objectives
“A target value or a range of values
for something measured by an SLI”
Request response times should be below 200ms
Source: Site Reliability Engineering - O’Reilly
Help you drive architectural
decisions, like optimisation
SLOs
Response time SLO: 150 ms

95th Percentile of Processing time (PHP time): 5ms



As a result we decided to invest more time in exploring the problem
domain and not optimising our stack.
SLAs
Service Level Agreements
“An explicit or implicit contract with
your customer, that includes
consequences of missing their SLOs”
The 99th percentile of request response times should meet our SLO, or we
will refund users
Source: Site Reliability Engineering - O’Reilly
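Percentile-based objectives such as the one above can be checked with a few lines of code. This is a rough sketch using the nearest-rank method, which is one of several common percentile definitions; the function name `percentile()`, the sample values, and the 200 ms threshold in the usage lines are illustrative assumptions, not data from the talk.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $percentile percent of all samples are less than or equal to it.
 */
function percentile(array $samples, float $percentile): float
{
    if ($samples === [] || $percentile <= 0 || $percentile > 100) {
        throw new InvalidArgumentException('Need samples and 0 < percentile <= 100');
    }

    sort($samples); // sorts a local copy; the caller's array is untouched
    $rank = (int) ceil($percentile / 100 * count($samples));

    return (float) $samples[$rank - 1];
}

// Made-up response times in milliseconds.
$responseTimesMs = [120, 95, 180, 140, 110, 130, 150, 100, 160, 210];
$sloMs = 200.0; // assumed SLO threshold

$p99 = percentile($responseTimesMs, 99);
echo $p99 <= $sloMs ? 'SLO met' : 'SLO missed', PHP_EOL;
```

With only ten samples the 99th percentile is simply the slowest request, which is why a single outlier is enough to miss the objective here.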
Measuring
–Etsy Engineering
“If it moves, we track it.”
https://codeascraft.com/2011/02/15/measure-anything-measure-everything/
Metrics
Telemetry: What is happening right now?
Statistics: How often does this happen?
Telemetry
“the process of recording and transmitting the readings of an instrument”
Statistics / Analytics
“the practice of collecting and analysing numerical data in large quantities”
I really miss Ayrton Senna
Statistics
Incoming feedback items
with origin information
Telemetry
response time of public
endpoints
“If it moves, we track it.”
Request Latency
System Throughput
Error Rate
Availability
Resource Usage
Incoming Data
Peak frequency
CPU
Memory
Disk Space
Bandwidth
node
PHP
nginx
Database
Measure Monitoring
Measure measurements
Metrics, Everywhere.
SLIs
Picking good SLIs
SLIs may change
according to who is
looking at the data.
Understanding the
nature of your system
User-Facing serving system?
availability, throughput, latency
Storage System?
availability, durability, latency
Big Data Systems?
throughput, end-to-end latency
User-Facing and Big Data Systems
๏ SLIs (more relevant to the development team)
- Response time in the “receive” endpoint
- Turnaround time, from “receive” to “show”
- Individual processing time per step
- Data counting: how many, what nature
๏ Other Metrics (more relevant to the infrastructure team)
- node, nginx, php-fpm, java metrics
- Server metrics: CPU, memory, disk space
- Size of cluster
- Kafka health
Picking Targets
Target value
SLI value >= target
Target Range
lower bound <= SLI value <= upper bound
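The two target shapes above translate directly into comparisons. A minimal sketch, assuming hypothetical function names `meetsTarget()` and `withinRange()` and made-up example numbers:

```php
<?php
declare(strict_types=1);

/** Target value: the objective holds when the SLI is at or above the target. */
function meetsTarget(float $sliValue, float $target): bool
{
    return $sliValue >= $target;
}

/** Target range: the objective holds when the SLI lies between both bounds, inclusive. */
function withinRange(float $sliValue, float $lowerBound, float $upperBound): bool
{
    return $lowerBound <= $sliValue && $sliValue <= $upperBound;
}

// Illustrative values only: an availability target and a response-time range.
var_dump(meetsTarget(99.95, 99.9));        // availability in percent
var_dump(withinRange(150.0, 0.0, 200.0));  // response time in milliseconds
```

A range is the natural fit for metrics where lower is better, such as latency, since the upper bound carries the objective.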
Don’t pick a target based
on current performance
What is the business need?
What are users trying to achieve?
How much impact does it have on the user experience?
How long can it take between the user clicking
submit and a confirmation that our servers
received the data?
“Immediate”
“We sell as real time”
“500ms, too much HTML”
“I don’t know”
What is human perception of
immediate? 100ms
Collection API should respond within 150ms
Some, but not too many.
can you settle an argument or priority based on it?
Don’t over achieve.
The Chubby example.
Adapt. Evolve.
redefine SLOs as your product evolves.
Meeting Expectations.
Attach consequences
to your Objectives.
The night is dark and
full of loopholes.
take a friend from legal with you.
Safety Margins.
like setting the alarm 5 minutes before the meeting.
Metrics in Practice.
prometheus.io
Push Model
scale this!
Pull Model
scale this!
Prometheus
Telemetry Statistics
Prometheus
StatsD, InfluxDB, etc…
+
Long Term Storage
Counter: a cumulative metric that represents a single number that only increases
Histogram: samples and a count of observations over time
Gauge: a counter that can go up or down
Summary: same as a histogram, but with a stream of quantiles over a sliding window
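The difference between a counter and a gauge is easiest to see in code. The classes below are a plain PHP sketch of the semantics only; `SketchCounter` and `SketchGauge` are hypothetical names and are not the client library's classes.

```php
<?php
declare(strict_types=1);

/** A counter only ever increases. */
final class SketchCounter
{
    private float $value = 0.0;

    public function incBy(float $amount): void
    {
        if ($amount < 0) {
            throw new InvalidArgumentException('A counter can only increase');
        }
        $this->value += $amount;
    }

    public function value(): float
    {
        return $this->value;
    }
}

/** A gauge can be set, raised, or lowered. */
final class SketchGauge
{
    private float $value = 0.0;

    public function set(float $value): void
    {
        $this->value = $value;
    }

    public function incBy(float $amount): void
    {
        $this->value += $amount;
    }

    public function decBy(float $amount): void
    {
        $this->value -= $amount;
    }

    public function value(): float
    {
        return $this->value;
    }
}

$requests = new SketchCounter();
$requests->incBy(1);
$requests->incBy(5);

$queueDepth = new SketchGauge();
$queueDepth->set(3);
$queueDepth->decBy(1);

printf("requests=%d queue_depth=%d\n", $requests->value(), $queueDepth->value());
```

Counters suit totals such as requests served, where the rate of change is what gets graphed; gauges suit current levels such as queue depth or memory in use.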
jimdo/prometheus_client_php
your code: writes to local storage
/metrics: reads from local storage
Prometheus: reads from /metrics
<?php
use Prometheus\Counter;
use Prometheus\Histogram;
use Prometheus\Storage\APC;

require_once 'vendor/autoload.php';

// Storage adapter: APC / APCu or Redis
$adapter = new APC();

$histogram = new Histogram(
    $adapter,
    'my_app',             // namespace
    'response_time_ms',   // metric name
    'This measures ....', // help
    ['status', 'url'],    // label names
    [0, 10, 50, 100]      // buckets
);
// observe(measurement, label values)
$histogram->observe(15, ['200', '/url']);

$counter = new Counter(
    $adapter,
    'my_app',          // namespace
    'count_total',     // metric name
    'How many...',     // help
    ['status', 'url']  // labels
);
$counter->inc(['200', '/url']);      // increment by 1
$counter->incBy(5, ['200', '/url']); // increment by 5
<?php
use Prometheus\RenderTextFormat;
use Prometheus\Storage\APC;

require_once 'vendor/autoload.php';

$adapter = new APC();
$renderer = new RenderTextFormat();
// Render everything collected in storage in the Prometheus text format
$result = $renderer->render($adapter->collect());
echo $result;

Output:
# HELP my_app_count_total How many...
# TYPE my_app_count_total counter
my_app_count_total{status="200",url="/url"} 6
# HELP my_app_response_time_ms This measures ....
# TYPE my_app_response_time_ms histogram
my_app_response_time_ms_bucket{status="200",url="/url",le="0"} 0
my_app_response_time_ms_bucket{status="200",url="/url",le="10"} 0
my_app_response_time_ms_bucket{status="200",url="/url",le="50"} 1
my_app_response_time_ms_bucket{status="200",url="/url",le="100"} 1
my_app_response_time_ms_bucket{status="200",url="/url",le="+Inf"} 1
my_app_response_time_ms_count{status="200",url="/url"} 1
my_app_response_time_ms_sum{status="200",url="/url"} 16
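The `_bucket` lines in the output above are cumulative: each `le` (less than or equal) bucket counts every observation at or below its upper bound, so a single observation of 15 appears in the 50, 100 and +Inf buckets. The following plain PHP sketch reproduces that counting; `cumulativeBuckets()` is a hypothetical helper written for illustration and is not how the client library stores its data.

```php
<?php
declare(strict_types=1);

/**
 * Count observations into cumulative "le" buckets.
 *
 * @param array $observations observed values
 * @param array $upperBounds  bucket upper bounds, sorted ascending
 * @return array map of "le=<bound>" => cumulative count
 */
function cumulativeBuckets(array $observations, array $upperBounds): array
{
    $buckets = [];
    foreach ($upperBounds as $bound) {
        $buckets['le=' . $bound] = 0;
    }
    $buckets['le=+Inf'] = 0; // catches every observation

    foreach ($observations as $value) {
        foreach ($upperBounds as $bound) {
            if ($value <= $bound) {
                $buckets['le=' . $bound]++;
            }
        }
        $buckets['le=+Inf']++;
    }

    return $buckets;
}

foreach (cumulativeBuckets([15], [0, 10, 50, 100]) as $label => $count) {
    echo $label, ' ', $count, PHP_EOL;
}
```

Because the buckets are cumulative, the +Inf bucket always equals the total number of observations, which matches the `_count` line in the text format.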
–Also Rafael (today)
“I’ll just try this live demo
again.”
http://localhost:9090/graph http://localhost:8180/metrics
–Rafael (yesterday)
“Demos always fail.”
http://localhost:8180/index
https://github.com/rdohms/talk-app-metrics
You can’t act on what
you can’t see.
Metrics without
actionability are just
numbers on a screen.
Act as soon as an SLO is threatened.
Thank you.
Drop me some feedback at Usabilla and make this talk better.
@rdohms

http://slides.doh.ms
https://joind.in/talk/bd0c9
we feedback