Microservices Observability in Java: A Simple Guide
Most software today is built on a microservices architecture, so for us developers, understanding microservices observability with tools like the ELK Stack and Splunk is essential for finding and fixing issues fast. In this guide, I will walk you through how we can use these popular tools to make microservices easier to monitor
and manage.
What is Observability?
Let me first explain briefly what we mean by observability. Observability is the extent to which you can understand a system's internal state just by looking at the data it produces. It relies on three types of data:
1. Logs – Text records of events (e.g., “User login failed”).
2. Metrics – Numbers showing performance (e.g., CPU usage).
3. Traces – Step-by-step journey of a request as it moves through different services.
In simple words:
• Logs tell you what happened.
• Metrics show how the system is performing.
• Traces show how a request moved through the system.
See the diagram below showing logs, metrics, and traces flowing from microservices to
observability tools:
Why Does Observability Matter in Microservices?
Imagine your app has 10 small services. One service is slow, and users are complaining.
With good observability, you can quickly answer:
• Which service is slow?
• Why is it slow?
• Is it affecting all users or just some?
This fast insight lets you fix issues before users notice.
Popular Observability Tools
1. ELK Stack
This is an open-source solution used for collecting and viewing logs.
• Elasticsearch: Stores and searches logs.
• Logstash: Collects and processes logs.
• Kibana: Lets you see the logs in dashboards.
You can use ELK to see error logs from different microservices in one place and figure out
what went wrong.
Here’s a sample Filebeat config to send container logs to Elasticsearch:
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log

output.elasticsearch:
  hosts: ["http://localhost:9200"]   # default Elasticsearch HTTP port
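On the application side, a service only needs to write its logs to stdout so the container runtime, and then Filebeat, can pick them up. Here is a minimal sketch, assuming SLF4J with any binding such as Logback on the classpath; the BillingService class and the order number are made up for illustration:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BillingService {

    private static final Logger log = LoggerFactory.getLogger(BillingService.class);

    public static void main(String[] args) {
        log.info("billing-service started");
        // The kind of line you would later search for in Kibana
        log.error("Payment failed for order {}", "12345");
    }
}
In Kibana you can then filter by container name and log level to see these failures from every instance in one place.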
Pros:
• Free to use and open to all.
• Flexible and powerful.
Cons:
• Hard to manage if you have lots of data.
• Needs time and effort to set up.
2. Splunk
Splunk is a commercial platform used by many large companies. Its main strength is handling logs, metrics, and traces all in one place.
Example: Send logs through an HTTP Event Collector (HEC):
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SplunkLogger {

    public static void main(String[] args) {
        // HEC listens on port 8088 by default; adjust host and port for your deployment
        String urlString = "https://splunk.example.com:8088/services/collector";
        String token = "MY-TOKEN";

        String jsonPayload = """
                {
                  "event": {
                    "service": "billing-service",
                    "message": "Payment failed for order 12345",
                    "level": "error"
                  }
                }
                """;

        try {
            URL url = new URL(urlString);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Splunk " + token);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            // Write the JSON event to the request body
            try (OutputStream os = conn.getOutputStream()) {
                byte[] input = jsonPayload.getBytes("utf-8");
                os.write(input, 0, input.length);
            }

            int responseCode = conn.getResponseCode();
            System.out.println("Response Code: " + responseCode);
            conn.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
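A note on the setup: the HEC token is created in Splunk Web under the HTTP Event Collector data inputs, and the collector typically listens on port 8088 (used in the snippet above). For production code, a dedicated logging library or an HTTP client with retries is usually a better fit than raw HttpURLConnection; the example only shows the shape of the request.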
Pros:
• Easy to search data.
• Great dashboards and alerts.
• Works well with modern tools like Kubernetes.
Cons:
• It’s expensive.
• It may be too complex for small teams.
3. Prometheus with Grafana
Prometheus collects and stores metrics, while Grafana turns them into dashboards. Together they are great for monitoring performance indicators such as memory, CPU usage, or request latency.
Example: Using Micrometer for Prometheus metrics in Spring Boot:
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Configure in application.properties:
management.endpoints.web.exposure.include=health,info,prometheus
management.endpoint.prometheus.enabled=true
Let’s build a simple controller that increments a metric:
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    private final MeterRegistry registry;

    public HelloController(MeterRegistry registry) {
        this.registry = registry;
    }

    @GetMapping("/")
    public String hello() {
        // Count every request to this endpoint, tagged with the endpoint name
        registry.counter("http_requests_total", "endpoint", "/").increment();
        return "Hello!";
    }
}
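Once the application is running and the endpoint has been hit at least once, the counter shows up at http://localhost:8080/actuator/prometheus (assuming the default Spring Boot port), where a Prometheus scrape job can collect it and Grafana can chart it.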
4. Jaeger (Distributed Tracing)
• Focuses on traces.
• Shows the full path a request takes through services.
• Helps find where delays happen.
Example: Using OpenTelemetry:
Add Maven dependencies:
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
    <version>1.31.0</version>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
    <version>1.31.0</version>
</dependency>
<dependency>
    <groupId>io.opentelemetry.exporter</groupId>
    <artifactId>opentelemetry-exporter-jaeger</artifactId>
    <version>1.31.0</version>
</dependency>
Simple tracing code:
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.jaeger.JaegerGrpcSpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class JaegerTracingExample {

    public static void main(String[] args) {
        // Export spans to a local Jaeger collector over gRPC
        JaegerGrpcSpanExporter jaegerExporter = JaegerGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:14250")
                .build();

        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(jaegerExporter).build())
                .build();

        Tracer tracer = tracerProvider.get("checkout-tracer");

        Span span = tracer.spanBuilder("checkout-process").startSpan();
        span.setAttribute("order.id", "12345");
        try {
            System.out.println("Processing checkout...");
        } finally {
            span.end();
        }

        // Flush and export any pending spans before the JVM exits
        tracerProvider.shutdown();
    }
}
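To see the resulting trace, run a local Jaeger instance (for example the jaegertracing/all-in-one Docker image, which accepts gRPC spans on port 14250 and serves its UI at http://localhost:16686) and look for the checkout-process span there.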
5. Paid Tools (Datadog, New Relic, Dynatrace)
• These are paid tools that give full observability: logs, metrics, and traces.
• Easy to use and great for large teams.
Quick Comparison
Tool                   Logs   Metrics              Traces              Easy to Use
ELK Stack              Yes    Yes (with add-ons)   Yes (with APM)      Medium
Splunk                 Yes    Yes                  Yes                 Easy
Prometheus + Grafana   No     Yes                  Yes (with Jaeger)   Medium
Datadog / New Relic    Yes    Yes                  Yes                 Very Easy
Real-Life Example: Troubleshooting a Slow Checkout
1. Logs show errors in the billing service.
2. Metrics show high memory usage.
3. Traces show processing delays in payment.
This direct insight helps you fix the exact service causing trouble.
Best Practices
• Add logs, metrics, and traces to each service.
• Include trace IDs to link data across services (see the sketch after this list).
• Use central tools to collect everything in one place.
• Set alerts for issues like high errors or slow latency.
• Start small and expand your observability over time.
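For the trace-ID practice above, here is a minimal sketch using SLF4J's MDC in a Spring Boot (jakarta.servlet) web application. The X-Trace-Id header name and the TraceIdFilter class are assumptions for illustration; in a real setup a tracing library such as OpenTelemetry usually propagates and injects these IDs for you:
import java.io.IOException;
import java.util.UUID;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;

// Reads an incoming X-Trace-Id header (or generates one) and puts it into the
// logging context so every log line of this request carries the same ID.
@Component
public class TraceIdFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String traceId = ((HttpServletRequest) request).getHeader("X-Trace-Id");
        if (traceId == null || traceId.isBlank()) {
            traceId = UUID.randomUUID().toString();
        }
        MDC.put("traceId", traceId);
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove("traceId"); // clean up, server threads are reused
        }
    }
}
With %X{traceId} added to the log pattern, the log aggregator can correlate entries from different services that belong to the same request.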
Observability Architecture Diagram
This sample architecture shows how services send logs, metrics, and traces to
observability tools:
Conclusion
By applying the approaches discussed above, we can see inside a microservices system, find issues quickly, and fix them before users are impacted. Pick the tools that fit your requirements, and keep in mind that the key is to start small, keep it simple, and build on your success.
References for Microservices Observability & Tools
ELK Stack
• Official ELK Stack docs: https://www.elastic.co/elk-stack
Splunk
• Splunk official site: https://www.splunk.com/
Prometheus and Grafana
• Prometheus official site: https://prometheus.io/
• Grafana official site: https://grafana.com/
Jaeger (Tracing)
• Jaeger official site: https://www.jaegertracing.io/
General Observability Concepts
• What is observability? (New Relic): https://newrelic.com/what-is-observability
• Introduction to Microservices Observability (Honeycomb): https://www.honeycomb.io/resources/what-is-observability/
• Distributed Tracing Explained (Lightstep): https://lightstep.com/learn/distributed-tracing/