Observability User Guide
May 2022
Oracle Financial Services Software Limited
Oracle Park
Off Western Express Highway
Goregaon (East)
Mumbai, Maharashtra 400 063
India
Worldwide Inquiries:
Phone: +91 22 6718 3000
Fax: +91 22 6718 3001
https://www.oracle.com/industries/financial-services/index.html
Copyright © 2021, 2022, Oracle and/or its affiliates. All rights reserved.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks
of their respective owners.
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software,
any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users
are “commercial computer software” pursuant to the applicable Federal Acquisition Regulation and agency-
specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the
programs, including any operating system, integrated software, any programs installed on the hardware,
and/or documentation, shall be subject to license terms and license restrictions applicable to the programs.
No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications.
It is not developed or intended for use in any inherently dangerous applications, including applications that
may create a risk of personal injury. If you use this software or hardware in dangerous applications, then
you shall be responsible to take all appropriate failsafe, backup, redundancy, and other measures to ensure
its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of
this software or hardware in dangerous applications.
This software and related documentation are provided under a license agreement containing restrictions
on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in
your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify,
license, transmit, distribute, exhibit, perform, publish or display any part, in any form, or by any means.
Reverse engineering, disassembly, or decompilation of this software, unless required by law for
interoperability, is prohibited. The information contained herein is subject to change without notice and is
not warranted to be error-free. If you find any errors, please report them to us in writing.
This software or hardware and documentation may provide access to or information on content, products
and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly
disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle
Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your
access to or use of third-party content, products, or services.
Contents
1 Preface
1.1 Purpose
1.2 Audience
1.3 Acronyms, Abbreviations, and Definitions
1.4 Document Accessibility
1.5 List of Topics
1.6 Prerequisites
1.7 General Prevention
1.8 Best Practices
1.9 Related Documentation
2 Observability Improvements using Zipkin Traces
2.1 Setting Zipkin Server
2.2 Login to Zipkin
2.3 Zipkin Issues
2.3.1 Application Service is not Registered
3 Observability Improvements Logs using ELK Stack
3.1 Setting up ELK
3.1.1 Steps to run ELK
3.1.2 Accessing Kibana
3.1.3 Steps to setup dynamic log levels in Oracle Banking Microservices Architecture services without restart
3.1.4 Searching for Logs in Kibana
3.1.5 How to Export Logs for Tickets
4 Health Checks
4.1 Discovery Health Check
4.2 Actuator Health Indicator Endpoint
4.2.1 Generic service
4.2.2 Kafka Consumers and Producers
5 Troubleshooting Kafka Issues
5.1 Kafka Health
5.1.1 Verifying Kafka Health
5.1.2 Verify Zookeeper Health
5.2 Prometheus and Grafana
5.2.1 Prometheus Setup
5.2.2 JMX-Exporter Setup
5.2.3 Grafana Setup
5.2.4 Prometheus Metrics
6 Troubleshooting Flyway issues
6.1 Failed Migrations
6.1.1 Success Column Verification
6.1.2 Migration Checksum Mismatch for a Version
6.1.3 Placeholder errors
1 Preface
1.1 Purpose
This guide describes tools that help users observe the Oracle Banking Microservices Architecture suite of products. The sections describe tools that enable a user to:
1. Observe the spans associated with various API calls and the response of each API.
The guide also describes recommended tools to enhance the monitoring and observability of the Oracle Banking Microservices Architecture products.
1.2 Audience
This guide is intended for the implementation teams.
1.3 Acronyms, Abbreviations, and Definitions
Acronym | Definition
UI | User Interface
1.5 List of Topics
Topics | Description
Observability Improvements using Zipkin Traces | This topic explains how tools such as Zipkin can be used to improve troubleshooting, and the benefits of doing so.
Observability Improvements Logs using ELK Stack | This topic explains the log aggregation and search features available with the ELK stack.
Health Checks | This topic explains approaches to monitoring the health of Oracle Banking Microservices Architecture services.
Troubleshooting Kafka Issues | This topic explains the steps to troubleshoot basic issues in Kafka.
Troubleshooting Flyway Issues | This topic explains the steps to troubleshoot Flyway issues during deployment.
Acronyms, Abbreviations, and Definitions | This topic provides acronyms, abbreviations, and their definitions.
1.6 Prerequisites
The prerequisites are as follows:
• Ideally, install the ELK stack on a separate VM, outside the product VMs, so that logs continue to flow if an application crashes.
• Adjust log levels to INFO and above so that the relevant logs flow in.
• Before performing the health check, verify all Kafka settings as described in the Troubleshooting Guide:
https://docs.oracle.com/cd/FXXXXX_01/PDF/Installation_Guide/ANNEXURE-2.pdf
2. Use the search options to find the traces of the required API calls and services.
NOTE: The search options in the user interface are self-explanatory. Zipkin also offers an alternative interface (Try Lens UI) that provides the same functionality. The list of traces is shown in Figure 2. In this example, some API calls were made to fail deliberately to show how errors can be tracked. Blue entries indicate successful API calls, and red entries indicate errors. Each block in the list represents a single trace.
Figure 2: List of Traces
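The same search can also be run from a script through Zipkin's HTTP API. The following is a minimal Python sketch, assuming Zipkin listens on its default port 9411 and using "refapp" only as a placeholder service name. It builds a GET /api/v2/traces query and can optionally restrict it, through the annotationQuery parameter, to traces whose spans carry an error tag (the red entries in the list).

```python
import json
import urllib.parse
import urllib.request


def build_trace_search_url(base_url, service_name, limit=10, errors_only=False):
    """Build a Zipkin v2 trace-search URL for one service."""
    params = {"serviceName": service_name, "limit": str(limit)}
    if errors_only:
        # A bare word in annotationQuery matches span tag keys, so "error"
        # selects traces containing at least one span tagged as an error.
        params["annotationQuery"] = "error"
    return f"{base_url}/api/v2/traces?{urllib.parse.urlencode(params)}"


def search_traces(base_url, service_name, limit=10, errors_only=False):
    """Return matching traces; each trace is a list of span dictionaries."""
    url = build_trace_search_url(base_url, service_name, limit, errors_only)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, search_traces("http://ZIPKIN_HOST:9411", "refapp", errors_only=True) returns the most recent failing traces of that service, each as a list of span objects.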
NOTE: Figure 3 shows an individual trace when it is opened, including the time taken by each block. Because two custom spans are created inside two service calls, the trace contains four blocks in total.
NOTE: Figure 4 shows the details of a specific span block. Logging events also appear in the Zipkin UI as small circular markers. An example of an error log is shown in Figure 5.
5. Click the error portion to view details about the error and where it occurred. An example is shown in Figure 6.
NOTE: If the Lens UI is used in Zipkin, the figures above do not apply exactly, but the same concepts carry over to the Lens UI. Traces of an application can be found by TraceId. The TraceId appears in the debug logs of the deployment when spring-cloud-sleuth is included in the dependencies (it is included in the spring-cloud-starter-zipkin dependency).
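Once the TraceId is known, the complete trace can be fetched with GET /api/v2/trace/{traceId}. The sketch below is illustrative only and assumes the Zipkin v2 JSON span format. It lists every span with its service, name, duration, and error tag, which is the same information shown in Figures 3 to 5. Zipkin reports durations in microseconds, so they are converted to milliseconds.

```python
import json
import urllib.request


def fetch_trace(base_url, trace_id):
    """Fetch all spans of one trace from Zipkin's v2 API."""
    with urllib.request.urlopen(f"{base_url}/api/v2/trace/{trace_id}", timeout=10) as resp:
        return json.load(resp)


def summarize_trace(spans):
    """Return (service, span name, duration in ms, error) tuples, slowest first."""
    rows = []
    for span in spans:
        service = span.get("localEndpoint", {}).get("serviceName", "?")
        duration_ms = span.get("duration", 0) / 1000  # microseconds -> milliseconds
        error = span.get("tags", {}).get("error")  # None when the span succeeded
        rows.append((service, span.get("name", "?"), duration_ms, error))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```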
6. Click the Dependencies tab to view the dependency graph between microservices. An example dependency graph is shown in Figure 7.
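The dependency graph is built from the data exposed by GET /api/v2/dependencies. As a sketch, assuming the standard endTs and lookback parameters (both in milliseconds), the following Python code fetches the links for a recent time window and formats each caller and callee pair with its call and error counts. Zipkin may omit errorCount when it is zero, so missing counts default to 0.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_dependencies(base_url, lookback_hours=1):
    """Fetch dependency links for the last `lookback_hours` hours."""
    params = urllib.parse.urlencode({
        "endTs": int(time.time() * 1000),  # epoch milliseconds
        "lookback": lookback_hours * 3600 * 1000,
    })
    with urllib.request.urlopen(f"{base_url}/api/v2/dependencies?{params}", timeout=10) as resp:
        return json.load(resp)


def format_dependencies(links):
    """Render each caller -> callee edge with its call and error counts."""
    ordered = sorted(links, key=lambda edge: (edge["parent"], edge["child"]))
    return [
        f"{edge['parent']} -> {edge['child']}: "
        f"{edge.get('callCount', 0)} calls, {edge.get('errorCount', 0)} errors"
        for edge in ordered
    ]
```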
1. In the Service Name drop-down list, check which applications are sending trace reports to the Zipkin server.
2. If the required application is not listed in Zipkin, check the application.yml file for the Zipkin base URL configuration.
NOTE: The shipped application.yml should contain the Zipkin entry. Every service must have the spring-cloud-sleuth-zipkin dependency added to its build.gradle file so that the service generates and sends trace IDs and span IDs. For example:
implementation group: 'org.springframework.cloud', name: 'spring-cloud-sleuth-zipkin', version: '2.1.2.RELEASE'
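To check registration without the UI, GET /api/v2/services returns the service names for which Zipkin has received spans, which is the content of the Service Name drop-down list. The sketch below compares that list with the services you expect; the expected names in the example are placeholders taken from the sample health response in this guide. Zipkin stores service names in lower case, so the comparison ignores case.

```python
import json
import urllib.request


def fetch_service_names(base_url):
    """Return the service names Zipkin has received spans from."""
    with urllib.request.urlopen(f"{base_url}/api/v2/services", timeout=10) as resp:
        return json.load(resp)


def missing_services(expected, registered):
    """List expected services that have not reported any spans yet."""
    known = {name.lower() for name in registered}
    return [name for name in expected if name.lower() not in known]
```

A service that appears in the result of missing_services(["refapp", "plato-rule-service"], fetch_service_names("http://ZIPKIN_HOST:9411")) is a candidate for the application.yml and build.gradle checks above.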
NOTE: The default port for Elasticsearch is 9200, and the default port for Kibana is 5601.
2. In the kibana.yml file, configure Kibana to point to the running Elasticsearch instance.
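Before Kibana can display logs, the Elasticsearch instance it points to must be reachable. A minimal Python sketch, assuming Elasticsearch runs on its default port 9200 without authentication, queries the _cluster/health endpoint and interprets the returned status.

```python
import json
import urllib.request


def cluster_health(es_url="http://localhost:9200"):
    """Return the JSON document from Elasticsearch's _cluster/health API."""
    with urllib.request.urlopen(f"{es_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def is_usable(health):
    """green: all shards allocated; yellow: some replicas unassigned (common on
    a single-node setup); red: at least one primary shard is unassigned."""
    return health.get("status") in ("green", "yellow")
```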
a) Input: This configuration provides the location of the log files that Logstash reads from.
b) Filter: Filters in Logstash are used to control or format the read operation (line-by-line or bulk read).
c) Output: In this section, provide the running Elasticsearch instance to which the data is sent for persistence.
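To illustrate what the three sections do, the sketch below mimics them in plain Python: it reads a log file line by line (input), parses each line into fields (filter), and builds a request body for the Elasticsearch _bulk API (output). It is not a replacement for Logstash. The log line pattern and the index name "obma-logs" are hypothetical and must be adapted to the actual logback layout.

```python
import json
import re

# Hypothetical layout: "2022-05-10 10:15:30.123 ERROR [refapp] message text"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) +(?P<level>[A-Z]+) +\[(?P<service>[^\]]+)\] +(?P<message>.*)$"
)


def read_lines(path):
    """Input stage: read the log file line by line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def parse(lines):
    """Filter stage: turn each matching line into a structured event; skip others."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            yield match.groupdict()


def to_bulk_body(events, index="obma-logs"):
    """Output stage: build an NDJSON body for POST /_bulk (newline-terminated)."""
    parts = []
    for event in events:
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(event))
    return "".join(part + "\n" for part in parts)
```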
• PLATO_DEBUG_USERS: This table specifies whether dynamic logging is enabled for a user for a given service. Each record holds a DEBUG_ENABLED value of Y or N for a user and a service, and plato-logger enables dynamic logging accordingly.
• LOG_PATH: Specifies the dynamic path where the log files are stored. Changing this value at runtime changes the location of the log files at runtime. If this value is not passed, LOG_PATH defaults to the value of the -D parameter plato.service.logging.path.
• LOG_LEVEL: Specifies the logging level at runtime, such as INFO or ERROR. The default value can be set in logback.xml.
• LOG_MSG_WITH_TIME: When set to Y, the current date is appended to the log file name. When set to N, the date is not appended.
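The rules above can be summarized as a small resolution function. This is a hypothetical Python sketch, not the plato-logger implementation: it takes one record as a dictionary keyed by the parameter names above, falls back to the plato.service.logging.path value when LOG_PATH is not set, and appends the date to the file name when LOG_MSG_WITH_TIME is Y. The SERVICE key and the file naming format are assumptions made for illustration.

```python
from datetime import date


def resolve_log_settings(record, default_log_path, default_level="INFO", today=None):
    """Return (enabled, log_path, log_level, file_name) for one record.

    Hypothetical model of the documented rules; plato-logger may differ.
    """
    enabled = record.get("DEBUG_ENABLED") == "Y"
    # LOG_PATH falls back to the -Dplato.service.logging.path value.
    log_path = record.get("LOG_PATH") or default_log_path
    # LOG_LEVEL falls back to the default configured in logback.xml.
    level = record.get("LOG_LEVEL") or default_level
    base_name = record.get("SERVICE", "service")  # hypothetical key
    if record.get("LOG_MSG_WITH_TIME") == "Y":
        stamp = (today or date.today()).isoformat()
        file_name = f"{base_name}-{stamp}.log"  # hypothetical naming format
    else:
        file_name = f"{base_name}.log"
    return enabled, log_path, level, file_name
```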
4 Health Checks
This section describes approaches to monitoring the health of Oracle Banking Microservices Architecture services.
userId: XYZ
appId: PLATOREFAPP
entityId: DEFAULTENTITY
branchCode: 000
Sample Response:
{
  "status": "UP"
}
Sample Response:
{
  "status": "UP",
  "components": {
    "binders": {
      "status": "UP",
      "components": {
        "kafka": {
          "status": "UP"
        }
      }
    },
    "clientConfigServer": {
      "status": "UP",
      "details": {
        "propertySources": [
          "refapp-jdbc"
        ]
      }
    },
    "db": {
      "status": "UP",
      "components": {
        "PLATO_LOGGER_DS": {
          "status": "UP",
          "details": {
            "database": "Oracle",
            "validationQuery": "isValid()"
          }
        },
        "dataSource": {
          "status": "UP",
          "details": {
            "database": "Oracle",
            "validationQuery": "isValid()"
          }
        }
      }
    },
    "discoveryComposite": {
      "status": "UP",
      "components": {
        "discoveryClient": {
          "status": "UP",
          "details": {
            "services": [
              "plato-feed-services",
              "plato-api-gateway",
              "plato-rule-service",
              "refapp"
            ]
          }
        },
        "eureka": {
          "description": "Remote status from Eureka server",
          "status": "UP",
          "details": {
            "applications": {
              "PLATO-API-GATEWAY": 1,
              "PLATO-RULE-SERVICE": 1,
              "REFAPP": 1,
              "PLATO-FEED-SERVICES": 4
            }
          }
        }
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 248031522816,
        "free": 81710915584,
        "threshold": 10485760,
        "exists": true
      }
    },
    "hystrix": {
      "status": "UP"
    },
    "ping": {
      "status": "UP"
    },
    "refreshScope": {
      "status": "UP"
    }
  }
}
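When the response is large, it can be easier to list only the components that are not UP. The following Python sketch, assuming the response shape shown above, walks the nested components tree recursively. The fetch function accepts optional headers; whether the request headers shown earlier in this section (userId, appId, entityId, branchCode) are needed for the actuator endpoint depends on the deployment and is an assumption here. Spring Boot returns HTTP 503 when the overall status is DOWN, so the error body is still parsed.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url, headers=None):
    """GET an actuator endpoint such as http://HOST:PORT/context_path/actuator/health."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # A DOWN status comes back as HTTP 503; the body still holds the details.
        return json.load(err)


def unhealthy_components(node, path="root"):
    """Return (path, status) for every node in the tree whose status is not UP."""
    problems = []
    if node.get("status") != "UP":
        problems.append((path, node.get("status")))
    for name, child in node.get("components", {}).items():
        problems.extend(unhealthy_components(child, f"{path}.{name}"))
    return problems
```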
http://HOST:PORT/context_path/actuator/health
To stop the discovery service from routing requests to Kafka consumers or producers when the connection to Kafka fails, set the following flag:
eureka.client.healthcheck.enabled=true
To debug, check that the permissions of the Kafka log folder are correct. The log folder path is the value of the “log.dirs” property in the server.properties file of the Kafka installation.
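The check can be scripted as follows. This Python sketch assumes a simple key=value layout in server.properties (it does not handle every Java properties syntax, such as ':' separators or line continuations), falls back to log.dir if log.dirs is absent, and reports whether each directory exists and is accessible. Run it as the same OS user that runs the Kafka broker; otherwise the permission result does not reflect the broker's access.

```python
import os


def parse_log_dirs(properties_text):
    """Return the directories listed in log.dirs (or log.dir as a fallback)."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments, and lines without key=value
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    raw = props.get("log.dirs") or props.get("log.dir", "")
    return [path.strip() for path in raw.split(",") if path.strip()]


def read_log_dirs(server_properties_path):
    with open(server_properties_path, encoding="utf-8") as handle:
        return parse_log_dirs(handle.read())


def check_permissions(directories):
    """Map each directory to 'ok', 'missing', or a permission problem."""
    report = {}
    for directory in directories:
        if not os.path.isdir(directory):
            report[directory] = "missing"
        elif not os.access(directory, os.R_OK | os.W_OK | os.X_OK):
            report[directory] = "not readable/writable by this user"
        else:
            report[directory] = "ok"
    return report
```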
A JMX-Exporter application is attached to the Kafka broker as a Java agent and exposes the values of JMX MBeans over HTTP. Prometheus fetches the values of the JMX metrics from the JMX-Exporter. Perform the following steps:
1. Download the latest jmx_prometheus_javaagent JAR file from the Maven repository and place it in the Kafka directory, alongside the bin and config directories.
2. Set the KAFKA_OPTS variable so that the JAR file is run as a Java agent.
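The jmx_prometheus_javaagent is attached with the -javaagent:<jar>=<port>:<config> option. The Python sketch below composes that value and starts the broker with it. The JAR file name, port 7071, and exporter configuration file path used in the example are assumptions to replace with your own values. Once the broker is running, the metrics are served at http://HOST:PORT/metrics on the chosen port.

```python
import os
import subprocess


def build_kafka_opts(agent_jar, port, config_file, existing=""):
    """Compose KAFKA_OPTS: -javaagent:<jar>=<port>:<exporter config YAML>."""
    agent = f"-javaagent:{agent_jar}={port}:{config_file}"
    return f"{existing} {agent}".strip()  # keep any options already set


def start_broker(kafka_home, kafka_opts):
    """Start the broker with the agent attached (blocks until it exits)."""
    env = dict(os.environ, KAFKA_OPTS=kafka_opts)
    subprocess.run(
        [os.path.join(kafka_home, "bin", "kafka-server-start.sh"),
         os.path.join(kafka_home, "config", "server.properties")],
        env=env,
        check=True,
    )
```

For example, build_kafka_opts("/opt/kafka/jmx_prometheus_javaagent.jar", 7071, "/opt/kafka/config/kafka-jmx.yml") yields the value to export as KAFKA_OPTS.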
2. Go to the bin folder in the extracted contents and start the Grafana server.
NOTE: Grafana starts on the default port 3000 (HOST:3000). The default user name and password for Grafana are admin/admin.
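After startup, Grafana's /api/health endpoint can be used to confirm that the server is up before logging in; it does not require authentication. A minimal Python sketch, assuming the default port 3000:

```python
import json
import urllib.request


def grafana_health(base_url="http://localhost:3000"):
    """Query Grafana's /api/health endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=10) as resp:
        return json.load(resp)


def grafana_ready(health):
    # "database": "ok" means Grafana can reach its own backing database.
    return health.get("database") == "ok"
```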
Perform the following steps to integrate Grafana with the installed Prometheus instance:
5. Click Add to test the connection and to save the new data source.
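The data source can also be created through Grafana's HTTP API instead of the UI. The Python sketch below assumes Prometheus runs on its default port 9090 and authenticates to Grafana with basic authentication. The default admin/admin credentials should be changed after the first login, so pass your own values.

```python
import base64
import json
import urllib.request


def datasource_payload(prometheus_url, name="Prometheus"):
    """Data source definition accepted by POST /api/datasources."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server queries Prometheus, not the browser
        "isDefault": True,
    }


def create_datasource(grafana_url, user, password, payload):
    """POST the data source definition using basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```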
• process_cpu_seconds_total
• http_request_duration_seconds
• node_memory_usage_bytes
• http_requests_total
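These metrics can be queried outside Grafana through the Prometheus HTTP API. The following Python sketch runs an instant query through /api/v1/query and flattens the result into label and value pairs. It assumes Prometheus on its default port 9090, and the PromQL expression in the usage note is only an example.

```python
import json
import urllib.parse
import urllib.request


def instant_query(prometheus_url, promql):
    """Run a PromQL instant query through /api/v1/query."""
    query = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/query?{query}", timeout=10) as resp:
        return json.load(resp)


def vector_values(response):
    """Flatten an instant-vector result into (labels, float value) pairs."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    # Each sample's "value" is [unix timestamp, value as a string].
    return [(sample["metric"], float(sample["value"][1]))
            for sample in response["data"]["result"]]
```

For example, vector_values(instant_query("http://PROMETHEUS_HOST:9090", "rate(http_requests_total[5m])")) returns the per-second request rate of each labelled series.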
1. Check the flyway_schema_history table to identify migration records whose success column is ‘0’.
3. Restart the deployment.
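A minimal Python sketch of step 1 follows. It works with any DB-API connection (for example, an Oracle driver connection); an in-memory SQLite database is enough to try it. The column identifiers are quoted because Flyway typically creates the history table with quoted lower-case names on Oracle, where unquoted names are converted to upper case; adjust the quoting if your table was created differently.

```python
import sqlite3

FAILED_MIGRATIONS_SQL = """
    SELECT "installed_rank", "version", "description", "script"
    FROM "flyway_schema_history"
    WHERE "success" = 0
    ORDER BY "installed_rank"
"""


def failed_migrations(connection):
    """Return the history rows whose success column is 0 (failed migrations)."""
    cursor = connection.cursor()
    cursor.execute(FAILED_MIGRATIONS_SQL)
    return cursor.fetchall()
```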
1. Make sure that the Flyway script has not been manually modified before deployment.
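A mismatch occurs when the checksum of the script no longer matches the value recorded in the checksum column of flyway_schema_history. The sketch below approximates how recent Flyway versions compute the checksum of a SQL migration (CRC32 over the UTF-8 bytes of each line, without line terminators and with a leading BOM removed, stored as a signed 32-bit integer), so a locally edited script can be spotted before deployment. Verify the result against your Flyway version before relying on it.

```python
import re
import zlib


def flyway_style_checksum(script_text):
    """Approximate Flyway's SQL migration checksum as a signed 32-bit integer."""
    lines = re.split(r"\r\n|\r|\n", script_text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line break does not start another line
    crc = 0
    for index, line in enumerate(lines):
        if index == 0 and line.startswith("\ufeff"):
            line = line[1:]  # ignore a UTF-8 byte order mark
        crc = zlib.crc32(line.encode("utf-8"), crc)  # line terminators excluded
    return crc - (1 << 32) if crc >= (1 << 31) else crc


def script_was_modified(script_path, recorded_checksum):
    """Compare a local script with the checksum stored in flyway_schema_history."""
    with open(script_path, encoding="utf-8", newline="") as handle:
        return flyway_style_checksum(handle.read()) != recorded_checksum
```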