TN4121 Notebookv2
IBM Training
March 2023 edition
Notices
© Copyright International Business Machines Corporation 2023.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM
representative for information on the products and services currently available in your area. Any reference to an IBM product, program,
or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent
product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this
document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied
warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein;
these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an
endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those
websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any
other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible,
the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to
actual people or business enterprises is entirely coincidental.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many
jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
V12.0
Contents
Trademarks
Agenda
Trademarks
The reader should recognize that the following terms, which appear in the content of this training
document, are official trademarks of IBM or other companies:
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX® IBM Cloud® IBM Cloud Pak®
Insight® Jazz® Netcool®
Resource® Tivoli® WebSphere®
Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle
and/or its affiliates.
Ansible®, OpenShift® and Red Hat® are trademarks or registered trademarks of Red Hat, Inc. or its
subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware is a registered trademark or trademark of VMware, Inc. or its subsidiaries in the United
States and/or other jurisdictions.
Other product and service names might be trademarks of IBM or other companies.
Course description
Configuring IBM Cloud Pak for AIOps Event Manager
Duration: 2 days
Purpose
The Event Manager component of IBM Cloud Pak for AIOps is a carrier-class service assurance
system. It collects and consolidates events and alarms from a wide variety of IT environments in
real time. These include servers, mainframes, Windows systems, applications, circuit switches,
voice switches, IP routers, SNMP devices, network management applications, existing
management systems and frameworks, among many others.
IBM Cloud Pak for AIOps Event Manager also adds intelligence to your events, allowing you to cast
a wide net to ingest relevant data from any source, process it in an intelligent, automated way,
analyze the data, see which applications or parts of the infrastructure are impacted, share it and
even suggest guided steps to mitigate or resolve issues automatically.
One key benefit of Event Manager's machine learning features is a reduction in the number of
events. By detecting, correlating, grouping, and suppressing the "noise" that IT systems generate,
your operators can focus their attention on key events that represent actual problems.
This 2-day course teaches you how to configure IBM Cloud Pak for AIOps Event Manager for
productive use. Through hands-on lab activities, you learn how to configure a new installation of
Event Manager to:
• Connect to event sources and enrich incoming events
• Apply machine learning to find relationships among events
• Add topology data to identify groups of connected resources and calculate root cause
• Create automated fixes for known problems and match them to incoming problem events
You also get hands-on practice with other common configuration tasks, such as user management
and database customization. This course focuses on IBM Cloud Pak for AIOps Event Manager
running on the Red Hat OpenShift Container Platform.
Audience
This course is intended for administrators of IBM Cloud Pak for AIOps Event Manager.
Prerequisites
• Experience with Linux
• Basic SQL knowledge
• Working knowledge of Kubernetes
Objectives
• Describe the event management capabilities of the IBM Cloud Pak for AIOps
• Connect Event Manager to incoming data sources
• Work with temporal and seasonal event analytics
• Configure the Event Manager topology service
• Create scope-based groups
• Create runbooks and map them to incoming events
• Work with triggers
• Manage users
Contents
• Overview
• Incoming Integrations
• Temporal and seasonal event analytics
• Topology
• Scope-based groups
• Runbooks
• Triggers
• User management
Agenda
Note
The following unit and exercise durations are estimates, and might not reflect every class
experience.
Day 1
(00:15) Course introduction
(00:45) Unit 1. Overview
(00:45) Exercise 1. Overview
(00:45) Unit 2. Incoming integrations
(01:00) Exercise 2. Incoming integrations
(00:45) Unit 3. Temporal and seasonal event analytics
(01:00) Exercise 3. Temporal and seasonal event analytics
(00:45) Unit 4. Topology
(01:00) Exercise 4. Topology
Day 2
(00:45) Unit 5. Scope-based groups
(01:00) Exercise 5. Scope-based groups
(00:45) Unit 6. Runbooks
(01:00) Exercise 6. Runbooks
(01:00) Unit 7. Triggers
(01:00) Exercise 7. Triggers
(00:45) Unit 8. User management
(00:45) Exercise 8. User management
(00:10) Unit 9. Summary
Unit 1. Overview
Estimated time
00:45
Overview
This unit introduces you to key concepts and terminology about IBM Cloud Pak for AIOps Event
Manager. You also learn about your lab environment.
1.1. Event Manager overview
What is an event?
Many, if not most, IT resources are instrumented to emit or expose events related to their
operation. An event is a set of data that represents a status, activity, or condition on an IT
resource. The structure and content of an event vary depending on the device, system, or
application that generated the event.
There are four examples of an event on this slide.
• At the top left, a network device sent an SNMP trap indicating that a line card failed. This trap
is an event sent by the operating system of the network device.
• At the middle left, you see a log message from the same network device written to a SYSLOG
server. This log message indicates that a network interface is down. This log message is an
event.
• At the bottom left, an ICMP ping test failed. The output of the ping utility is an event, indicating
that there is no connectivity to that IP address.
• At the right, you see the response to an HTTP request related to an Amazon Web Services EC2
instance. The response, which indicates that an instance refresh has started on an auto-scaling
group, is also an event.
In case of a problem, how do IT operators and troubleshooters find events like these? The
answer is through toil. In these four examples alone, an operator must:
• Use an SNMP management tool to view and read the SNMP trap from the network device
• Connect to the SYSLOG server and search through the log file
• Use a ping utility to test connectivity
• Use an HTTP client to request and read the HTTP response from the Amazon Web Services
resource
Finding these events is a challenge. After they have been found, they are in dissimilar formats and
require expertise to interpret. Add to this complexity that IT operators are responsible for
thousands of IT resources, all of which are constantly emitting or exposing events. These events
can become too numerous for any person or team to possibly manage.
Event management software collects events and presents them in a normalized way. Event
management software is often tightly integrated with other IT service management software such
as incident management, performance management, and change management solutions.
The example on this slide shows a sketch of what the four events in the previous slide would look
like in a normalized, structured format. Any event management software should have this
capability.
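As a rough illustration of normalization, the following Python sketch maps the four raw events from the previous slide into one common structure. The field names (node, summary, severity, source), the resource names, and the severity values are assumptions chosen for this example. They are not the actual Event Manager event schema.

```python
from dataclasses import dataclass


@dataclass
class NormalizedEvent:
    node: str      # resource the event is about
    summary: str   # human-readable description
    severity: str  # illustrative severity label (assumed values)
    source: str    # how the event was collected


def normalize(source, node, message, severity="Minor"):
    """Return one event in the common format, whatever its origin."""
    return NormalizedEvent(node=node, summary=message.strip(), severity=severity, source=source)


# The four raw events, reduced to the same four fields.
events = [
    normalize("snmp-trap", "router1", "Line card failed", "Critical"),
    normalize("syslog", "router1", "Interface Ethernet0/1 is down", "Major"),
    normalize("ping", "10.191.10.31", "100% packet loss", "Major"),
    normalize("http", "trn-scaling-group", "Instance refresh started"),
]

for event in events:
    print(f"{event.severity:<8} {event.node:<18} {event.summary} ({event.source})")
```

Once every event has the same fields, operators can sort, filter, and compare events from unrelated sources in a single list.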
In practice, however, very few event management solutions are complete. They might be
vendor-specific, or they might only collect one type of data, such as only log messages or only
SNMP traps. Other event management systems do not have a collection system and are simple
event aggregators.
Many IT operations teams are successfully monitoring and gathering relevant data to better
understand how well their operations are performing, but establishing and maintaining a single
source of truth has often proven to be elusive.
IT operations teams work with many events in their day-to-day routine. Some of these events are
relevant, some of these are not, and some events represent serious problems. It is difficult for
operators to know which events to act on first, and which events are just “noise” and do not need
attention. It is also difficult to discern if any of the thousands of events received are related to
each other, and were perhaps caused by the same problem.
In the case of an outage, these teams are often overwhelmed by a flood of events. Researching
the root cause of these problems is time consuming. Finding and implementing the fix adds more
time to the overall resolution.
Some event management solutions have flood control and event reduction mechanisms, but
these are difficult to configure, and the results are not well explained.
Often teams must switch between tools to understand their events. A complete event
management solution should gather as much information as possible about incoming events and
their business impact and technical blast radius, so IT operators don’t waste time gathering data
manually or from separate tools.
Another challenge is expertise within IT operations teams themselves. Much of the experience
and knowledge needed to react to events and troubleshoot problems is limited to a few team
members.
Using Cloud Pak for AIOps, you can consolidate and automatically track multiple sets of data, from
different tools and resources, to empower your teams to react faster with more knowledge and
deeper understanding.
The Event Manager component within the Cloud Pak is a carrier-class event management system.
Many, if not most, IT resources are instrumented to emit or expose events related to their
operation. In addition to collecting, enriching, deduplicating, and displaying events, Event
Manager can use machine learning to analyze these oceans of events and present only the most
relevant to your staff. You might have heard Event Manager referred to as Netcool Operations
Insight.
IBM Cloud Pak for AIOps Event Manager also adds intelligence to your events, allowing you to cast
a wide net to ingest relevant data from any source, process it in an intelligent, automated way,
analyze the data, see which applications or parts of the infrastructure are impacted, share it and
even suggest guided steps to mitigate or resolve issues automatically.
One key benefit of Event Manager's machine learning features is a reduction in the number of
events. By detecting, correlating, grouping, and suppressing the "noise" that IT systems generate,
your operators can focus their attention on key events that represent actual problems.
This course focuses on IBM Cloud Pak for AIOps Event Manager running on the Red Hat OpenShift
Container Platform.
European financial services group:
• Overall incident reduction of 70%
European transportation company:
• Reduced the number of critical service events by 90%
North American communications provider:
• Reduced the number of alarms by 20%
• Responds 40% faster and boosts customer satisfaction
European transportation company:
• Event Manager grouped over 27,000 events into about 500 groups
European energy and utility provider:
• 30% reduction in number of open tickets
• 15% faster repair times
• 10% reduction in effort needed to make repairs
North American communications provider:
• 38% Event Reduction
• 12% Ticket Reduction
• 5% Seasonal Tickets
Figure 1-8. IBM Cloud Pak for AIOps Event Manager: actual outcomes
This slide shows actual results from clients using IBM Cloud Pak for AIOps Event Manager.
AIOps event grouping: industry-leading advanced event correlation
Event Manager uses intelligent event correlation to find relationships among events. Events that
have been correlated are grouped together, so that your operators see fewer events overall. The
user interface makes it easy to see why events were grouped together so your operators
understand the relationship between events.
Groups that share a common event or events are combined, further reducing the total number of
actionable events.
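The idea of combining groups that share an event can be sketched in a few lines of Python. This is only an illustration of the concept using sets of hypothetical event IDs. It is not how Event Manager implements grouping internally.

```python
def merge_groups(groups):
    """Combine any groups that have at least one event in common."""
    merged = []
    for group in map(set, groups):
        # Absorb every existing group that overlaps with this one.
        for existing in [m for m in merged if m & group]:
            group |= existing
            merged.remove(existing)
        merged.append(group)
    return merged


# Hypothetical event IDs: the first two groups share event "e2".
groups = [{"e1", "e2"}, {"e2", "e3"}, {"e4"}]
print(merge_groups(groups))
```

Three groups become two, because the first two groups both contain the event e2.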
Events that always seem to arrive together are grouped together to reduce
the total number of actionable events
AWS EC2 Auto Scaling Instance Refresh Started event:
{
  "version": "0",
  "id": "1299-1206-1514",
  "detail-type": "EC2 Auto Scaling Instance Refresh Started",
  "source": "aws.autoscaling",
  "account": "422",
  "time": "2023-01-11T21:28:50Z",
  "region": "us-east-1",
  "resources": [
    "trn-scaling-group"
  ],
  "detail": {
    "InstanceRefreshId": "c2299-1677-2253",
    "AutoScalingGroupName": "trn-scaling-group"
  }
}
AWS EC2 instance terminated event:
{
  "id": "2xToc4f-DLl16R",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "1012",
  "time": "2023-01-11T21:29:54Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:ec2:us-east-1:1012:instance/i-juyn1212"
  ],
  "detail": {
    "instance-id": "i-juyn1212",
    "state": "terminated"
  }
}
This slide shows an example of two raw HTTP response events from resources that are running in
Amazon Web Services. Imagine that these two events always arrive together.
The event on the left indicates that an Amazon Web Services EC2 instance refresh has started on
an auto-scaling group. The event on the right indicates that an Amazon Web Services EC2 instance
has terminated.
Looking at the two events, they have almost no values in common: the source, account,
resources, and ID properties all differ, and only the region matches.
Even though these events always arrive together, and logically they seem to be related to an
instance scaling irregularity, it would be difficult for operators to see that relationship. Because IT
operators work with high volumes of events, finding these two among the thousands of other
events is challenging. Further, the instance refresh event is not as severe as the instance
termination event, so it might get lost in the noise of other events.
IBM Cloud Pak for AIOps Event Manager can detect events like these and group them together
based on their temporal relationship. Events that always seem to arrive together are
automatically grouped together. This is called temporal grouping.
Without this grouping, your IT operators might waste time troubleshooting each of these events
individually, and eventually discover by themselves that they are related.
With Event Manager’s built-in grouping analytics, your IT operators can save time by considering
this group of events as a single issue immediately. Your operators also see less “noise” in the
event list, because these two events are reduced to a single parent event.
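To make the idea of a temporal relationship concrete, the following Python sketch scans a small event history and reports pairs of event types that always occur within a short time of each other. The event type names, the history, and the 120-second window are assumptions for this example. Event Manager's temporal analytics are more sophisticated than this simple co-occurrence check.

```python
from collections import defaultdict
from itertools import combinations

WINDOW_SECONDS = 120  # assumed co-occurrence window


def always_together(history, window=WINDOW_SECONDS):
    """Return pairs of event types where every occurrence of one has the other nearby in time."""
    times = defaultdict(list)
    for timestamp, event_type in history:
        times[event_type].append(timestamp)

    def accompanied(a, b):
        # True when each occurrence of a has an occurrence of b within the window.
        return all(any(abs(ta - tb) <= window for tb in times[b]) for ta in times[a])

    return [
        (a, b)
        for a, b in combinations(sorted(times), 2)
        if accompanied(a, b) and accompanied(b, a)
    ]


# Hypothetical history as (seconds, event type) pairs.
history = [
    (0, "refresh-started"), (64, "instance-terminated"),
    (3600, "refresh-started"), (3650, "instance-terminated"),
    (5000, "disk-low"),
    (7200, "refresh-started"), (7290, "instance-terminated"),
]

print(always_together(history))
```

Only the refresh and termination events are reported as a pair. The unrelated disk event never has a companion inside the window, so it is left out.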
Figure: router1, with interface Ethernet0/1 connected to server 10.191.10.31, and a Syslog server.
Failed ICMP ping test:
PING 10.191.10.31 (10.191.10.31) 56(84) bytes of data.
1 packets transmitted, 0 received, 100% packet loss, time 0ms
IBM Cloud Pak for AIOps Event Manager includes a topology service that can discover the
resources in your IT systems and the dependencies among them. Event Manager uses this
topology data to correlate events that have a topological relationship together.
In this example, there are three raw events happening in an IT network:
• At the top left, a network device sent an SNMP trap indicating that a line card failed.
• At the top right, you see a log message from the same network device written to a SYSLOG
server. This log message indicates that a network interface is down.
• At the bottom left, an ICMP ping test failed.
If you consider how these resources are connected, you get better visibility into the chain of
dependencies that caused these three events:
• The server 10.191.10.31 is unreachable because the interface it is connected to (Ethernet0/1)
is down. This explains why the ping test failed.
• The log message event occurred when interface Ethernet0/1 went down.
• The line card in slot 1 of router1 failed, as indicated by the SNMP trap. Line card 1 of router1
contains the interface Ethernet0/1.
Event Manager recognizes how the resources in these events are connected to each other and
groups them together to reduce the overall number of events. Event Manager also suggests the
root cause of the problem is the line card failure, because its failure led to the interface failure,
which in turn led to the server becoming unreachable.
As a result of topology-based event analytics, your operators see:
• These three events grouped into a single event
• A topology map showing how the card, interface, and server are connected
• The event that is the probable root cause of the overall problem
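The reasoning in this example can be sketched as a walk along a dependency chain. In the following Python sketch, the dependency map and resource names are hypothetical stand-ins for data that the topology service would provide, and the map is assumed to contain no cycles. It illustrates the idea only and is not Event Manager's root cause algorithm.

```python
# Hypothetical dependency map: each resource points to the resource it depends on.
DEPENDS_ON = {
    "server 10.191.10.31": "router1 Ethernet0/1",
    "router1 Ethernet0/1": "router1 line card 1",
    "router1 line card 1": None,
}


def probable_root_causes(resources_with_events, depends_on=DEPENDS_ON):
    """Return affected resources that do not depend on any other affected resource."""
    affected = set(resources_with_events)
    roots = []
    for resource in sorted(affected):
        upstream = depends_on.get(resource)
        # Walk upstream until another affected resource or the end of the chain.
        while upstream is not None and upstream not in affected:
            upstream = depends_on.get(upstream)
        if upstream is None:
            roots.append(resource)
    return roots


affected = ["server 10.191.10.31", "router1 Ethernet0/1", "router1 line card 1"]
print(probable_root_causes(affected))
```

The server and the interface both depend on another resource that also has an event, so only the line card remains as the probable root cause.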
These events all have the same value in their Service attribute. Any
attribute of an event can be used.
You can group events based on known relationships. Based on the information you know about
your IT environment, you can automatically group events relating to an incident if they have the
same scope and occur during the same period of time.
The scope can be any attribute of an event. For example, all events that have the same value in
their Service attribute can be automatically grouped together if they occur within five minutes of
each other. Another example could be location: events from the same geographical location that
occur within a 90-second window could also be reduced to a single event.
These scope-based groups are easy to configure and manage. Scope-based groups are an easy
way to reduce the number of overall events.
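The following Python sketch shows one way to think about scope-based grouping. The event fields and the sample data are hypothetical, and the sketch assumes that the time window is measured from the first event in each group. The actual grouping behavior is defined by the scope-based grouping configuration in Event Manager.

```python
from collections import defaultdict


def scope_groups(events, scope="Service", window_seconds=300):
    """Group events that share a scope value and arrive within the window of the group's first event."""
    groups = defaultdict(list)  # scope value -> list of groups
    for event in sorted(events, key=lambda e: e["time"]):
        candidates = groups[event[scope]]
        if candidates and event["time"] - candidates[-1][0]["time"] <= window_seconds:
            candidates[-1].append(event)
        else:
            candidates.append([event])
    return [group for per_scope in groups.values() for group in per_scope]


# Hypothetical events; "time" is in seconds.
events = [
    {"time": 0, "Service": "payments", "Summary": "DB latency high"},
    {"time": 120, "Service": "payments", "Summary": "API timeouts"},
    {"time": 200, "Service": "search", "Summary": "Index lag"},
    {"time": 900, "Service": "payments", "Summary": "DB latency high"},
]

for group in scope_groups(events):
    print(group[0]["Service"], [e["Summary"] for e in group])
```

The first two payments events fall inside the five-minute window and form one group. The later payments event starts a new group because it arrives after the window has closed.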
The purpose of event seasonality is to automatically identify events that occur in a non-random
pattern and show those events to your IT personnel. For example, consider a low disk space event
that occurs every Monday at about 2:00 PM because of a scheduled backup. Instead of wasting
time troubleshooting the low disk space event every Monday afternoon, your team can see that
the event is seasonal, and can choose to suppress it.
IBM Cloud Pak for AIOps Event Manager automatically learns from your event history and
identifies events that occur in a seasonal pattern over time. By finding seasonal events, it is
possible to reduce the number of events that occur at non-random times, which can be done by
adjusting the IT process to compensate for known peaks, by filtering the events, or by
suppressing the events completely. Event Manager’s seasonality analysis can also help determine
where and when anomalies occur that were not previously known.
Your operators see easy-to-understand infographics that show when the event usually occurs and
when it has occurred in the past.
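A simple way to picture seasonality detection is to count how often an event falls into the same weekday and hour. The following Python sketch does exactly that for a hypothetical low disk space history. The 75% threshold and the sample timestamps are assumptions, and Event Manager's seasonality analytics are not limited to this kind of bucket counting.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def seasonal_slot(timestamps, min_share=0.75):
    """Return (weekday, hour) if at least min_share of occurrences fall in that slot, else None."""
    if not timestamps:
        return None
    slots = Counter((WEEKDAYS[t.weekday()], t.hour) for t in timestamps)
    slot, count = slots.most_common(1)[0]
    return slot if count / len(timestamps) >= min_share else None


# Hypothetical history of a low disk space event.
history = [
    datetime(2023, 1, 2, 14, 3),
    datetime(2023, 1, 9, 14, 1),
    datetime(2023, 1, 16, 14, 7),
    datetime(2023, 1, 23, 14, 2),
    datetime(2023, 1, 19, 9, 30),
]

print(seasonal_slot(history))
```

Four of the five occurrences fall on a Monday in the 2:00 PM hour, so the sketch reports that slot as the seasonal pattern.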
IBM Cloud Pak for AIOps Event Manager includes a runbook automation service. This easy-to-use
service helps IT operations management teams to simplify and automate complex
troubleshooting and remedial actions.
Runbook automation provides the following capabilities for creating, managing, and executing
guided tasks and automated activity:
• Event-triggered automated guidance and actions
• Flexible interoperability with management and collaboration tools, both cloud-based and
on-premises
• Runbook execution tracking statistics
Runbooks are guided steps that your IT operations team uses to troubleshoot and resolve
problems. Some organizations might call these standard operating procedures, or playbooks.
When an incident occurs, IBM Cloud Pak for AIOps matches an appropriate runbook to the
problem.
The runbook can be set to run automatically when it is matched to an incident, or it can run with
user approval and participation:
• Manual runbooks: A step-by-step description of the exact procedure an operator should
follow.
• Semi-automated runbooks: Each step describes exactly what an operator should do, and
the operator simply pushes a button to execute an automated task on a target system.
• Fully automated runbooks: The runbook is selected by the system as a response to a trigger and
executed without operator attention. The results of the runbook are stored for technical
review.
Important
The ObjectServer is case-sensitive. All of the default databases are defined with names that
contain all lowercase letters. Any use of the actual database name, in an SQL statement, for
example, must reference the name in lowercase.
Because the ObjectServer runs entirely in memory, it is extremely fast. The ObjectServer creates
checkpoints on disk for recovery after a system failure.
Your IT resources
(including servers, mainframes, Windows systems,
applications, circuit switches, voice switches, IP
routers, SNMP devices, hypervisors, SDN, network
management applications and frameworks, among
many others.)
A probe is lightweight, small-footprint software that obtains event data, converts it to a common
event format, and passes it to the ObjectServer. Each probe collects data from a specific source:
database table, log file, application, and others. The role of a probe is to extract data from the
source and normalize the data into the format of the alerts.status table within the ObjectServer.
Probes are separate from the ObjectServer. As requirements change, probes can be added or
removed without changing the ObjectServer or interrupting service.
Some probes are generic, for example, SNMP and Syslog. Other probes are specific to
applications or devices.
Probes are lightweight, fast, and resilient. If a probe loses contact with the ObjectServer, it can
store events until the ObjectServer becomes available. If a pair of ObjectServers exists, a probe
can be configured to fail over, sending events to the alternative ObjectServer. Because the data is
passed to the Event Manager central store (ObjectServer) through a TCP/IP connection, delivery is
ensured.
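The store-and-forward and failover behavior described here can be pictured with a small Python sketch. The class name, the callables that stand in for ObjectServer connections, and the in-memory buffer are all hypothetical. Real probes are configured through their properties files, not written this way.

```python
from collections import deque


class ProbeSketch:
    """Toy model of a probe that buffers events and fails over between targets."""

    def __init__(self, primary, backup=None):
        # Each target is a callable that raises ConnectionError when unavailable.
        self.targets = [t for t in (primary, backup) if t is not None]
        self.buffer = deque()

    def send(self, event):
        self.buffer.append(event)
        for deliver in self.targets:
            try:
                # Flush stored events in arrival order.
                while self.buffer:
                    deliver(self.buffer[0])
                    self.buffer.popleft()
                return True
            except ConnectionError:
                continue  # try the next target, keep events buffered
        return False


received = []
state = {"up": False}


def primary(event):
    if not state["up"]:
        raise ConnectionError("ObjectServer unavailable")
    received.append(event)


probe = ProbeSketch(primary)
print(probe.send("event-1"))  # stored, because the target is down
state["up"] = True
print(probe.send("event-2"))  # both events are delivered in order
print(received)
```

An event is removed from the buffer only after it has been delivered, so nothing is lost while the target is unreachable.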
Types of probes
• Alcatel-Lucent 1300 XMC
• Alcatel-Lucent 5529 OAD V6
• Alcatel-Lucent 5620 Logfile
• Alcatel-Lucent 5620 SAM 3GPP v8
• Alcatel-Lucent 5620 SAM v13 JMS
• Alcatel-Lucent 5ESS
• Alcatel-Lucent 9353 WNMS
• Alcatel-Lucent DSC DEX
• Alcatel-Lucent ECP
• Alcatel-Lucent ITM-NM/OMS
• Alcatel-Lucent ITM-SC
• Alcatel-Lucent OMC-R
• Alcatel-Lucent OS-OS
• Alcatel-Lucent Wavestar SNMS (CORBA)
• Amazon Web Services
• Apache Pulsar
• Avaya Definity G3
• BMC Patrol
• BMC Patrol V9
• CA Spectrum V9 CORBA
• CA Spectrum V9.4 (CORBA)
• Ciena Blue Planet MCP
• Cisco APIC
• Cisco Evolved Programmable Network Manager
• Cisco Transport Manager 9.0 (CORBA)
• Comverse TRILOGUE INfinity
• Dantel PointMaster
• ECI Network Manager
• Email
• Exec
• FIFO
• Genband IEMS
• Generic 3GPP
• Generic Log File Java
• Generic Multi-Technology Operations Systems Interface (MTOSI)
• Generic TMF814
• Glenayre VMS
• Heartbeat
• HP Network Node Manager-i
• HP Operations Manager
• HPE Operations Manager i
• HTTP Server Error Log
• Huawei M2000 MML
• Huawei U2000 3GPP (CORBA)
• Huawei U2000
• IBM Event Streams for IBM Cloud
• IBM SevOne Network Performance Management (NPM)
• IBM Tivoli Common Event Infrastructure (CEI)
• IBM Tivoli EIF
• IBM Turbonomic
• IBM WebSphere MQ
• iDirect Pulse
• IEC CIM Advanced Metering Infrastructure
• Itron OpenWay Collection Engine (OWCE) EMS
• JDBC
• Juniper Contrail
• Juniper Contrail Alerts
• Kafka
And many more!
Overview © Copyright IBM Corporation 2022, 2023
This slide is a partial list of IBM Cloud Pak for AIOps probes. There are over 100 probes for
different technologies and data types.
The library of probes changes continuously. New or updated probes are typically released on a
quarterly basis.
Figure labels: JSON formatted HTTP request (POST), Webhook API, ObjectServer
{
  "resource": {
    "type": "service",
    "name": "TPA-cassandra",
    "cluster": "TPA-datalayer-d",
    "displayName": "Cassandra DB",
    "location": "wdc",
    "application": "cassandra",
    "hostname": "TPA-datalayer-d.ibmserviceengage.com"
  },
  "summary": "Cassandra response time is above 2000ms",
  "severity": "Minor",
  "sender": {
    "type": "synthetics",
    "name": "db-synthetic-mon"
  },
  "type": {
    "statusOrThreshold": "> 2000ms",
    "eventType": "Response time > 2000ms"
  },
  "resolution": false
}
You can use a webhook to insert event information into IBM Cloud Pak for AIOps Event Manager
from any event source that can send the information in JSON format.
Inbound webhooks provide an API endpoint so that other systems can send event data to Event
Manager with an HTTP request. Most types of webhooks use a predefined mapping to translate
the content of the source system’s HTTP request into the normalized event fields in the Event
Manager ObjectServer.
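As a minimal sketch of what a sending system does, the following Python code builds an HTTP POST request that carries an event as JSON. The URL is a placeholder, not a real endpoint: the actual webhook URL, and any authentication it requires, come from the integration you create in Event Manager. The event fields are a subset of the example on this slide.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL generated for your webhook integration.
WEBHOOK_URL = "https://event-manager.example.com/webhook/placeholder"


def build_webhook_request(event, url=WEBHOOK_URL):
    """Build (but do not send) a JSON POST request for an inbound webhook."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "resource": {"type": "service", "name": "TPA-cassandra"},
    "summary": "Cassandra response time is above 2000ms",
    "severity": "Minor",
    "resolution": False,
}

request = build_webhook_request(event)
print(request.get_method(), request.full_url)
# To send the event, pass the request to urllib.request.urlopen().
# This requires a reachable endpoint, so it is not done here.
```

The request is built separately from sending it, which makes it easy to inspect the payload before you point the code at a real endpoint.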
Gateways provide a mechanism to extract event data from an ObjectServer and send it to a
different destination.
There are two types of ObjectServer gateways: unidirectional and bidirectional.
• Unidirectional gateways are used in hierarchical architectures, and to forward specific
events to other ObjectServers.
• Bidirectional gateways are used primarily to synchronize data in two ObjectServers operating
as a resilient pair (failover).
Other gateways include:
• Message bus gateways send event records from the ObjectServer to a message bus, such as
Kafka, HTTP/HTTPS, Java Message Service (JMS), and more.
• Database gateways allow ObjectServer events to be written to a database. When events are
deleted from the ObjectServer, they are no longer available to be viewed by Event Manager
users. The database gateway provides a mechanism for historical tracking of events.
• Trouble Ticketing gateways interface with ticketing systems (TSRM, Remedy, Clarify, and
others) and raise tickets based on specified event criteria. During the life of the event and
trouble ticket, the gateway can continue to update severities, journals, and other data.
• SNMP gateways allow ObjectServer event data to be forwarded as SNMP traps.
• Flat File gateways write event data to a file in a user-specified format.
Event Manager provides several user interfaces for displaying events, topology, runbooks, and
other data. One of these user interfaces is called WebGUI.
The Event Manager WebGUI is accessed through a browser. Users access the WebGUI to see
events, topology maps, runbooks, dashboards, and more. Administrators also use the WebGUI to
configure and customize some features of Event Manager.
The WebGUI is highly customizable and has been included with Event Manager for
several years.
Another way for users to work with events is the AIOps user interface. This modern interface
helps your users focus on the results of event analytics. Like the WebGUI, the AIOps user
interface is web-based and can also be used to configure and customize Event Manager.
Figure: Impact enriches the original event (144.124.108.101, Error 773, 10:05) with business
data (E-Com-1 Inc, Gold, Call customer, 9-5, 30 minutes).
Impact is a component of the IBM Cloud Pak for AIOps. Impact transforms operational IT data
into usable information. Using that information, operators and troubleshooters can manage and
extract value from complex IT environments.
Impact is used to process high-volume event streams and to do the following tasks:
• Gather additional information about an event or series of events (aggregation)
• Alert staff about high-priority conditions (escalation)
• Decide which events should be ignored (suppression)
• Set markers in diverse data sources (correlation)
• Take action on IT resources (autocorrection)
A key feature of Impact is its ability to connect to various aspects of customer data. Impact can
query database tables and retrieve selected data for event enrichment. This feature allows
Impact to associate business intelligence with events. With this additional information, IT
operators can better prioritize their response to service outages according to the business
impact.
This slide shows an example of how Impact enriches event fields from the ObjectServer to add
extra business information to the event. The resulting event provides the IT operations staff
with live information such as customer name, location, and contact details, and the event is
re-prioritized according to the SLA of the particular customer. The following steps explain the
role of Impact:
1. A probe inserts an event into the Event Manager ObjectServer. The event contains only sparse
technical data. In this example, the event comes from a probe, but it could also come from any
other incoming integration, such as a webhook.
2. The Impact event reader service reads the new event from the ObjectServer and Impact
begins processing the event according to a policy.
3. Impact uses the IP address in the event to look up more information in a DB2 provisioning
database. Impact finds business data in the database, such as customer name, location, and
site contact information.
4. Impact uses the customer name from the preceding step to look up SLA information in an
Oracle database. Impact finds the SLA agreement (Gold), the hours of operation, and the time
frame in which the customer must be contacted in case of a problem.
5. Impact adds the extra business information from the two different databases to the event and
sends the event back to the ObjectServer. Impact can also alter existing attributes of the
event, such as changing the Severity to critical before it sends the event back to the
ObjectServer.
6. From the event list, IT operators see the enriched event. This means they have all the
information they need to react to this event and fulfill the SLA contract in seconds without the
need to switch to any other tool. Without Impact, operators and troubleshooters would
spend valuable time finding this additional information manually, and could risk violating the
terms of the SLA.
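The enrichment steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not an Impact policy: the two dictionaries are hypothetical in-memory stand-ins for the DB2 provisioning database and the Oracle SLA database, the location value and the field names are invented, and the rule that a Gold SLA raises the Severity to critical (5) is an assumption made for the example.

```python
# Hypothetical stand-in for the DB2 provisioning database, keyed by IP address.
PROVISIONING = {
    "144.124.108.101": {"Customer": "E-Com-1 Inc", "Location": "example-site"},
}

# Hypothetical stand-in for the Oracle SLA database, keyed by customer name.
SLA = {
    "E-Com-1 Inc": {"SLA": "Gold", "Hours": "9-5", "ContactWithinMinutes": 30},
}

def enrich(event):
    """Return a copy of the event with business data added (steps 3 to 5)."""
    enriched = dict(event)
    business = PROVISIONING.get(event.get("Node"))
    if business is None:
        return enriched          # nothing known about this IP address
    enriched.update(business)    # step 3: customer name and location
    sla = SLA.get(business["Customer"])
    if sla is not None:
        enriched.update(sla)     # step 4: SLA details
        if sla["SLA"] == "Gold":
            enriched["Severity"] = 5   # step 5: assumed re-prioritization rule
    return enriched

original = {"Node": "144.124.108.101", "Summary": "Error 773", "Severity": 3}
enriched = enrich(original)
```

The original event is left untouched and the enriched copy carries the extra fields, which mirrors how the operator sees one event with all of the business context attached.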
Impact is versatile and highly configurable. In this example, Impact looked up data in two
databases, but Impact can connect to many different types of data sources to find data, such as
HTTP endpoints or JMS topics. Impact can also send event records to destinations other than an
ObjectServer, such as email, an HTTP API, an SNMP trap, a Kafka topic, or another database.
Impact includes a web-based administrator interface, which is used to control and configure how
Impact processes data.
You do not use Impact in this course.
Event Manager terminology: WebSphere administrative
console
1.2. About your lab environment
Figure: lab environment. An infrastructure host (DNS server and HAProxy load balancer), a bastion host, and an OpenShift cluster. The IBM Cloud Pak for AIOps in your lab is running in a Red Hat OpenShift cluster.
This course focuses on IBM Cloud Pak for AIOps Event Manager running on the Red Hat OpenShift
Container Platform.
Your lab environment includes the following systems and servers, which are running in virtual
machines.
• A Red Hat OpenShift v4.8 cluster:
▪ 3 control plane nodes, for cluster control and management: control0, control1, and
control2.
▪ 3 compute nodes, to run the application workloads that make up IBM Cloud Pak for AIOps
Event Manager: compute0, compute1, and compute2.
• An infrastructure virtual machine, which provides DNS services for your cluster. The
infrastructure host also runs an HAProxy load balancer.
• An NFS server, to provide persistent storage for your lab activities.
• A bastion virtual machine, to use for all of your lab work. The bastion virtual machine has client
software installed that you use to access and manage your OpenShift cluster.
Important
Do all of your lab work from the bastion virtual machine. There is no need to connect directly to
any of the other virtual machines in your lab environment.
Review questions
1. True or False: AIOps reduces time to react to events and diagnose incidents.
2. What kind of software collects data from your IT resources and sends it to the Event Manager
ObjectServer?
a. Probe
b. Container engine
c. WebGUI
d. Load balancer
3. Which lab server do you access to perform all of your lab exercises?
a. bastion
b. control0
c. The infrastructure host
d. compute0
Review answers
1. True or False: AIOps reduces time to react to events and diagnose incidents.
The answer is True.
2. What kind of software collects data from your IT resources and sends it to the Event Manager
ObjectServer?
a. Probe
b. Container engine
c. WebGUI
d. Load balancer
The answer is A.
3. Which lab server do you access to perform all of your lab exercises?
a. bastion
b. control0
c. The infrastructure host
d. compute0
The answer is A.
Exercise: Overview
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• Passwords are provided in the lab guide. Some passwords are stored in Kubernetes secrets, so
you must retrieve them.
• Many steps require you to be logged in to your Red Hat OpenShift cluster. If you see
unexpected errors when you run a command, make sure you are logged in.
• Some of the commands and code examples you use in this course are lengthy. At times, these
long text examples do not copy and paste accurately from the PDF lab guide to your lab
environment. As a convenience, you can find plain-text versions of these long examples in the
following file on your bastion host:
/home/netcool/ClassFiles/longCodeExamples.txt
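Kubernetes stores the values in a secret's data section as base64-encoded strings, so a retrieved password must be decoded before you can use it. The following Python sketch illustrates the decoding step only. The secret name, key, and password are hypothetical placeholders, and the JSON stands in for what a command such as oc get secret <name> -o json might return. The lab guide gives the actual commands to use.

```python
import base64
import json

# Hypothetical stand-in for the JSON returned when a secret is retrieved.
# The secret name, key, and value are invented for illustration.
sample_secret_json = json.dumps({
    "kind": "Secret",
    "metadata": {"name": "example-secret"},
    "data": {"password": base64.b64encode(b"example-password").decode("ascii")},
})

def decode_secret_value(secret_json, key):
    """Return the plain-text value stored under `key` in a secret's data section."""
    secret = json.loads(secret_json)
    encoded = secret["data"][key]
    return base64.b64decode(encoded).decode("utf-8")

password = decode_secret_value(sample_secret_json, "password")
```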
Overview
This unit describes two ways to connect Event Manager to data sources: probes and webhooks.
References
About probes:
https://www.ibm.com/docs/SSSHTQ_8.1.0/pdf/omn_pdf_prgw_master.pdf
https://www.ibm.com/docs/netcoolomnibus/8.1?topic=gateways-setting-up-probes-acquire-event-data
https://www.ibm.com/docs/noi/1.6.7?topic=sources-connecting-event-cloud-deployment
About webhooks:
https://www.ibm.com/docs/noi/1.6.7?topic=systems-configuring-incoming-integrations
Topics
• Probes
• Webhooks
2.1. Probes
Probes
About probes
Probes:
• Are software data collectors
• Are designed to collect data from specific sources
▪ Log files
▪ Database tables
▪ SNMP traps
▪ Applications
▪ Many others
Figure: probes, including the SNMP Probe (MTTrapd) and the Syslog Probe, receive source data such as SNMP traps and send events to the ObjectServer.
Probes are lightweight software designed to collect data from a specific source and produce
ObjectServer events.
Typical questions regarding probes:
• What probes are required?
It depends upon the event source. For example, devices that generate SNMP traps use the
MTTrapd probe. Devices that generate log messages use the Syslog probe.
• How many probes are required?
It depends upon the event source. For example, one MTTrapd probe can receive traps from
many devices. You require one Syslog probe for every UNIX log file, because each probe
reads a single file.
• Where do the probes run?
It depends upon the event source. For example, the MTTrapd probe can run any place where it
can receive SNMP traps. The Syslog probe must run on the server with the UNIX log file.
• What operating system is required by the probe?
It varies by probe. For example, the MTTrapd probe is supported on AIX, Solaris, HP-UX,
Linux, and Windows. The Syslog probe is not supported on Windows because it reads a UNIX
log file.
In this example, there are three probes collecting data and sending events to the ObjectServer:
• The MTTrapd probe listens for incoming SNMP traps sent by SNMP-enabled devices. The
probe converts the OID in the SNMP traps into events, then sends the events to the
ObjectServer.
• The Syslog probe watches a log file. When an interesting message is written to the log, the
probe converts the fields in the message into an event, then sends the event to the
ObjectServer.
• The Ping probe reads a list of IP addresses or host names, then performs a ping sweep of the
resources in the list. The probe converts failed ICMP ping responses into events, then sends
the events to the ObjectServer.
Types of probes
• Alcatel-Lucent 1300 XMC
• Alcatel-Lucent 5529 OAD V6
• Alcatel-Lucent 5620 Logfile
• Alcatel-Lucent 5620 SAM 3GPP v8
• Alcatel-Lucent 5620 SAM v13 JMS
• Alcatel-Lucent 5ESS
• Alcatel-Lucent 9353 WNMS
• Alcatel-Lucent DSC DEX
• Alcatel-Lucent ECP
• Alcatel-Lucent ITM-NM/OMS
• Alcatel-Lucent ITM-SC
• Alcatel-Lucent OMC-R
• Alcatel-Lucent OS-OS
• Alcatel-Lucent Wavestar SNMS (CORBA)
• Amazon Web Services
• Apache Pulsar
• Avaya Definity G3
• BMC Patrol
• BMC Patrol V9
• CA Spectrum V9 CORBA
• CA Spectrum V9.4 (CORBA)
• Ciena Blue Planet MCP
• Cisco APIC
• Cisco Evolved Programmable Network Manager
• Cisco Transport Manager 9.0 (CORBA)
• Comverse TRILOGUE INfinity
• Dantel PointMaster
• ECI Network Manager
• Email
• Exec
• FIFO
• Genband IEMS
• Generic 3GPP
• Generic Log File Java
• Generic Multi-Technology Operations Systems Interface (MTOSI)
• Generic TMF814
• Glenayre VMS
• Heartbeat
• HP Network Node Manager-i
• HP Operations Manager
• HPE Operations Manager i
• HTTP Server Error Log
• Huawei M2000 MML
• Huawei U2000 3GPP (CORBA)
• Huawei U2000
• IBM Event Streams for IBM Cloud
• IBM SevOne Network Performance Management (NPM)
• IBM Tivoli Common Event Infrastructure (CEI)
• IBM Tivoli EIF
• IBM Turbonomic
• IBM WebSphere MQ
• iDirect Pulse
• IEC CIM Advanced Metering Infrastructure
• Itron OpenWay Collection Engine (OWCE) EMS
• JDBC
• Juniper Contrail
• Juniper Contrail Alerts
• Kafka
And many more!
Incoming integrations © Copyright IBM Corporation 2022, 2023
This slide shows a partial list of IBM Cloud Pak for AIOps probes. There are over 100 different
types of probes to support the diverse technologies found in modern IT environments. New or
updated probes are released on a regular basis.
Probe operation
BUILD INSERT COMMAND @fields are collected into an SQL Insert command
Table 1. Five stages of probe operation
Stage Description
Process The tokenized event stream is parsed through the rules file and
ObjectServer alerts.status fields (@ fields) are set.
Forward The composed event is forwarded to the ObjectServer. If problems occur in
forwarding, the probe might fail over or store and forward.
After forwarding, the probe clears variables, retrieves a new event, and repeats the cycle as
necessary.
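The cycle described above (tokenize the stream, process the tokens through rules, collect the resulting fields into an SQL insert, and forward it) can be sketched as follows. This is a simplified illustration rather than the behavior of any real probe binary: the "key=value;" input format, the function names, and the exact text of the insert statement are assumptions made for the example.

```python
def tokenize(raw):
    """Split a hypothetical 'key=value; key=value' stream into tokens."""
    tokens = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            tokens[name.strip()] = value.strip()
    return tokens

def process(tokens):
    """Stand-in for the rules file: assign tokens to alerts.status fields."""
    return {
        "Node": tokens.get("Node", ""),
        "Summary": tokens.get("Summary", ""),
        "Severity": 5 if tokens.get("Severity") == "Critical" else 2,
    }

def sql_literal(value):
    """Render integers bare and strings quoted, doubling embedded quotes."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def build_insert(fields):
    """Collect the fields into a single insert statement (illustrative syntax)."""
    columns = ", ".join(fields)
    values = ", ".join(sql_literal(v) for v in fields.values())
    return f"insert into alerts.status ({columns}) values ({values});"

forwarded = []  # stands in for the connection to the ObjectServer

raw_event = "Node=router1; Summary=link down; Severity=Critical"
statement = build_insert(process(tokenize(raw_event)))
forwarded.append(statement)
```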
Probe Basics
A probe consists of a binary, a .rules file, and a .props file:
• Binary: retrieves and tokenizes event streams ($OMNIHOME/probes/nco_p_<probename>)
• Properties: run-time settings of the probe ($OMNIHOME/probes/<arch>/<probename>.props)
• Rules: instructions for processing events ($OMNIHOME/probes/<arch>/<probename>.rules)
All probes have at least three files: a binary executable (nco_p_probename), an interpreted rules
file (probename.rules), and a properties file (probename.props).
The properties file sets run time parameters and determines the behavior of a probe. This
includes, among other parameters, where to record a log file, and how verbose the log file
messages should be.
The binary collects the event stream and splits it into individual tokens. The binary interprets and
applies the rules file and forwards the processed event to the ObjectServer.
The primary purpose of the rules file is to assign tokens to ObjectServer fields. The rules file can
also manipulate data, perform calculations, and derive additional data and add it to the event. The
rules file can derive this additional data using lookup tables and other methods.
When installing probes on a remote machine, you must install the probes and common files on
that machine. These common files are installed when you install the probe.
Probes require access to the interfaces file to determine how to communicate with the
ObjectServer. Use the nco_xigen or nco_igen utility to define the ObjectServer in the interfaces
file.
Modify the properties and rules file as necessary and start the probe using the following
command:
$OMNIHOME/probes/nco_p_probename
The following example shows the $OMNIHOME/probes/<arch>/ directory of a machine that is
running the Simnet and Syslog probes.
$ pwd
/opt/IBM/tivoli/netcool/omnibus/probes/linux2x86
$ ls -l
total 232
netcool ncoadmin 144 default
netcool ncoadmin 27100 nco_p_simnet
netcool ncoadmin 18772 nco_p_syntax
netcool ncoadmin 137970 nco_p_syslog
netcool ncoadmin 1220 simnet.def
netcool ncoadmin 4007 simnet.props
netcool ncoadmin 1496 simnet.rules
netcool ncoadmin 2234 syntax.props
netcool ncoadmin 528 syntax.rules
netcool ncoadmin 2451 syslog.props
netcool ncoadmin 23337 syslog.rules
Rules file
• Contains program steps, which are run for each event to
manipulate incoming data and assign it to alerts.status fields
• Found in $OMNIHOME/probes/<arch>/<probename>.rules
• Inbound tokens start with dollar sign ($); ObjectServer fields start with @
• Major function of rules file is to define Identifier field
• Field values of alerts.status might be set by the rules file
• Additional information can be added through the rules file
• Probe can have multiple associated rules files (include files)
Not all data contained in an event is relevant to the processing of that event. The rules file defines
how the probe rationalizes or adds to the contents of an incoming event to create a meaningful
alert.
An important function of the rules file is to define the Identifier field used for de-duplication. If the
Identifier is made too specific, for example by incorporating the time of the event, little
de-duplication takes place. However, if the identifier is not specific enough, for example by
omitting a card, port or slot number, wrong events are de-duplicated.
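The effect of the Identifier on de-duplication can be illustrated with a small Python sketch. It is not how the ObjectServer is implemented; it simply keeps one row per Identifier and counts repeats in a Tally value. The event fields and the three identifier recipes are made up for the example.

```python
def deduplicate(events, make_identifier):
    """Keep one row per Identifier and count repeated events in Tally."""
    table = {}
    for event in events:
        identifier = make_identifier(event)
        if identifier in table:
            table[identifier]["Tally"] += 1
        else:
            table[identifier] = {**event, "Identifier": identifier, "Tally": 1}
    return table

events = [
    {"Node": "sw1", "Port": "1", "Summary": "link down", "Time": "10:05"},
    {"Node": "sw1", "Port": "1", "Summary": "link down", "Time": "10:06"},
    {"Node": "sw1", "Port": "2", "Summary": "link down", "Time": "10:07"},
]

# Too specific: the time makes every event unique, so nothing de-duplicates.
too_specific = deduplicate(
    events, lambda e: e["Node"] + e["Port"] + e["Summary"] + e["Time"])

# Not specific enough: omitting the port merges events from different ports.
too_generic = deduplicate(events, lambda e: e["Node"] + e["Summary"])

# Balanced: repeats on the same port merge, different ports stay separate.
balanced = deduplicate(events, lambda e: e["Node"] + e["Port"] + e["Summary"])
```

With this sample data, the first recipe keeps three rows, the second collapses everything into one row, and the third keeps two rows, one of which has a Tally of 2.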
All field values of the alerts.status table are normally populated by the rules file using the
information from the incoming event. It is also possible to add extra information to the event in
the rules file, such as customer, department, and application data.
Rules files can selectively update fields in alerts.status, overriding ObjectServer de-duplication
settings.
The probe rules file is where the incoming token stream is parsed and assigned to ObjectServer
fields.
It is a best practice to implement as much functionality as possible at the probe rules file level,
before events are sent to the ObjectServer. This means the ObjectServer has less work to do when
the event arrives. For example, dropping events that can be discarded is best done at the probe
rules file level.
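# Assign incoming tokens ($) directly to ObjectServer fields (@).
@Node = $Node
@Summary = $Summary
# Map the textual severity token to a numeric ObjectServer severity.
switch($Severity) {
    case "Critical":
        @Severity = 5
    case "Major":
        @Severity = 4
    default:
        @Severity = 2
}
# For "interface ... down" messages, use the interface name as the AlertKey.
if (regmatch(@Summary, "interface.*down")) {
    @AlertKey = extract(@Summary, "interface (.*) is")
}
# Build the Identifier that the ObjectServer uses for de-duplication.
@Identifier = @Node + @AlertKey + @Summary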
Note
When a probe receives an event stream from the source, it splits the stream into tokens. The set
of tokens is not fixed and might change depending on event data received. The tokens are
identified in the rules file by $, for example $Node is a token holding the node name.
In the probe rules file, an ObjectServer field value is denoted with a @. For example, @Node
references the value of the Node field. It is the @fieldnames in the alerts.status table of the
ObjectServer which make up an event and are shown in event lists. To populate the @fieldnames,
tokens need to be assigned to them, for example:
• Direct Assignment: @Node=$Node
• Concatenation: @Summary=$Summary + $Group
• Adding text: @Summary=$Node + " has problem " + $Summary
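For readers who are more familiar with a general-purpose language, the token-to-field logic can be mirrored in Python. This is an analogy only: a probe runs its rules file, not Python. The sketch maps a severity token to a number, extracts an interface name with a regular expression, and concatenates fields into an Identifier. The function name and the sample tokens are hypothetical.

```python
import re

# Textual severity tokens mapped to numeric severities; anything else gets 2.
SEVERITY_MAP = {"Critical": 5, "Major": 4}

def apply_rules(tokens):
    """Build ObjectServer-style fields from incoming tokens."""
    fields = {
        "Node": tokens.get("Node", ""),        # direct assignment
        "Summary": tokens.get("Summary", ""),
        "AlertKey": "",
    }
    fields["Severity"] = SEVERITY_MAP.get(tokens.get("Severity"), 2)

    # For "interface ... down" messages, capture the interface name.
    if re.search(r"interface.*down", fields["Summary"]):
        match = re.search(r"interface (.*) is", fields["Summary"])
        if match:
            fields["AlertKey"] = match.group(1)

    # Concatenation, as used to build the Identifier.
    fields["Identifier"] = fields["Node"] + fields["AlertKey"] + fields["Summary"]
    return fields

fields = apply_rules(
    {"Node": "router1", "Summary": "interface eth0 is down", "Severity": "Major"})
```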
Lookup tables
• Technique for performing event enrichment
▪ Adding data to the event record that does not appear in the probe source
• Data is formatted in a text file
▪ Tab delimited
• File contains multiple columns
▪ Key
▪ Data item 1
▪ Data item 2
▪ Data item N
• Suitable for static data
▪ Building names
▪ Addresses
Lookup tables provide a way to make extra information available to a probe so that it can be
included in an event.
Lookup files are useful at the probe level for basic event enrichment so long as the data set is
fairly static. If the data in the lookup table is dynamic, it is better to store it in a database table (for
example, the ObjectServer or other external database) and then perform event enrichment with
an ObjectServer trigger or with Impact.
A lookup table can be defined two ways. One way is a reference to an external file containing the
table. The second way is by placing the table in the rules file itself.
For an external file table, you specify a pointer to the file table as follows:
table TabNam="/opt/netcool/omnibus/probes/<arch>/file"
This reference must be at the top of the rules file, as the first uncommented line above the
ProbeWatch section.
The file must have the following format:
key[TAB]value
key[TAB]value
For a lookup table embedded in the rules file, the definition takes the format:
table TabNam={{"key","value"},{"key","value"}…}
In the rules file, the lookup() function looks the same for both table types. It uses two
arguments: a value to look up and the name of the table to read from:
@result=lookup(@Key,TabNam)
labExample.lookup
interfaceEth0_1 LONDON Bureau of Machine Analytics
interfaceEth0_2 BRUSSELS Department of Computer Algorithms
interfaceFe2_1 SYDNEY Department of Computer Algorithms
interfaceEth1_3 ROME Oaca Industries, JP
In this example, imagine you work for a communications provider. Part of your business is to
provide managed network services to your customers. Each of your customers has their own
managed network interfaces connected to your equipment, which you are monitoring with Event
Manager. The goal in this example is to enrich incoming events:
• To add the name of the customer each interface is assigned to
• To add the location
This slide shows an excerpt from a rules file and a lookup table.
In the lookup table, the first column holds the key, which is the interface name. The
other two columns list the location and customer for each interface. There is nothing special
about the lookup table file, except that it:
• Must be delimited with tabs.
• Must store the table key in the first column.
• Must be saved in a location and with permissions so that the probe can read it.
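To make the file format concrete, the following Python sketch parses a tab-delimited lookup table with the same rows as labExample.lookup and uses it to enrich an event. It only imitates what the probe's lookup() function does. The field names Location and Customer, and the choice of AlertKey as the lookup key, are assumptions for the example.

```python
# Same rows as labExample.lookup, with explicit tab characters between columns.
LOOKUP_TEXT = (
    "interfaceEth0_1\tLONDON\tBureau of Machine Analytics\n"
    "interfaceEth0_2\tBRUSSELS\tDepartment of Computer Algorithms\n"
    "interfaceFe2_1\tSYDNEY\tDepartment of Computer Algorithms\n"
    "interfaceEth1_3\tROME\tOaca Industries, JP\n"
)

def load_lookup(text):
    """Parse tab-delimited text into {key: [data item 1, data item 2, ...]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        columns = line.split("\t")
        table[columns[0]] = columns[1:]   # first column is the key
    return table

def enrich(event, table):
    """Add location and customer when the event's key is in the table."""
    enriched = dict(event)
    row = table.get(event.get("AlertKey"))
    if row is not None:
        enriched["Location"], enriched["Customer"] = row[0], row[1]
    return enriched

table = load_lookup(LOOKUP_TEXT)
event = enrich({"AlertKey": "interfaceEth1_3", "Summary": "link down"}, table)
```

Because the delimiter is a tab, a value such as "Oaca Industries, JP" can contain commas and spaces without any quoting.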
The Netcool Knowledge Library (NcKL) is a collection of rules files. It is distributed with Event
Manager as a separate download.
This library of rules files:
• Uses the include technique to incorporate multiple individual files
▪ One “leader” rules file contains multiple include statements
• Provides predefined rules files for multiple vendors and technologies
▪ Packaged as individual files (over 2000 in the current version)
▪ Easy to add or remove through configuration
• Provides rules for the MTTrapd and Syslog probes
To install the Netcool Knowledge Library:
• Unpack rules files into target directory - $NC_RULES_HOME
• Run supplied SQL file to add ObjectServer customizations
• Configure probe to use the NcKL “leader” rules file
$NC_RULES_HOME/syslog.rules – Syslog Probe
$NC_RULES_HOME/snmptrap.rules – MTTrapd Probe
2.2. Webhooks
Webhooks
Probes are not the only way to send events to the IBM Cloud Pak for AIOps. You can use a
webhook to send events to an API, which then normalizes the incoming data into an event record.
Data sent to a webhook must be in a JSON-formatted HTTP request.
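As a rough illustration of a JSON-formatted HTTP request, the following Python sketch builds a POST request with the standard library but does not send it. The URL and the payload field names are hypothetical placeholders. A real webhook URL is generated when you create the integration, and the fields it expects depend on how the webhook is defined.

```python
import json
import urllib.request

# Hypothetical placeholder; a real URL comes from the webhook you create.
WEBHOOK_URL = "https://example.invalid/webhook/placeholder"

def build_webhook_request(url, payload):
    """Return a POST request that carries the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_webhook_request(
    WEBHOOK_URL,
    {"summary": "disk nearly full", "severity": "Major"},  # invented fields
)

# Sending is deliberately left out of this sketch. In a real environment it
# would be a call such as urllib.request.urlopen(request).
```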
Incoming integrations
Several applications and platforms can send event data to a webhook API. The IBM Cloud Pak for
AIOps has predefined webhook integrations that already know how to map incoming HTTP
request fields into ObjectServer fields. There are many predefined webhook integrations to ingest
data from diverse sources, such as Amazon’s Simple Notification Service (SNS), Microsoft Azure,
Jenkins, and many more.
This slide shows the Incoming integrations page where you can configure connections to other
systems. Not all of the integrations shown on this slide use webhooks to accept data, but most of
them do.
Even if there is no predefined integration for a data source in your environment, you can send data
to an Event Manager webhook. You can create a custom webhook to map the fields within any
incoming JSON HTTP request to an event record.
In this example, a custom webhook extracts text from an incoming HTTP request and uses the
text to populate an event record:
• The resource.hostname field in the incoming HTTP request is mapped to the Node field of the
ObjectServer.
• The resource.location field in the incoming HTTP request is mapped to the Location field of
the ObjectServer.
• The summary and resource.cluster fields in the incoming HTTP request are combined with
some static text, and mapped to the Summary field of the ObjectServer.
• The severity field in the incoming HTTP request is mapped to the Severity field of the
ObjectServer.
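The mapping in this example can be sketched in Python as a function that reads dotted paths from the incoming JSON and builds an event record. In the product, you define this mapping in the webhook editor, not in Python. The sample payload and the static text used in the Summary are invented for illustration.

```python
def get_path(payload, dotted_path):
    """Follow a dotted path such as 'resource.hostname' through nested dicts."""
    value = payload
    for part in dotted_path.split("."):
        value = value[part]
    return value

def map_to_event(payload):
    """Map incoming JSON fields to event fields, as in the example above."""
    return {
        "Node": get_path(payload, "resource.hostname"),
        "Location": get_path(payload, "resource.location"),
        # The summary and cluster are combined with some static text
        # (the wording of the static text is an assumption).
        "Summary": "{} (cluster: {})".format(
            get_path(payload, "summary"),
            get_path(payload, "resource.cluster"),
        ),
        "Severity": get_path(payload, "severity"),
    }

incoming = {
    "summary": "Pod restart loop",
    "severity": "Critical",
    "resource": {"hostname": "node-7", "location": "Rome", "cluster": "prod-eu"},
}
event = map_to_event(incoming)
```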
IBM Cloud Pak for Watson AIOps Event Manager provides an easy-to-use tool to create custom
inbound webhooks.
Review questions
1. True or False: Probes are the only way to send events to the IBM Cloud Pak for AIOps.
2. What is a probe rules file?
a. A configuration file that controls how a probe creates an event from incoming data.
b. A configuration file that controls how a probe handles failover and recovery.
c. A checkpoint file that saves the current state of the probe to disk.
d. A checkpoint file that saves all outbound events to disk.
Review answers
1. True or False: Probes are the only way to send events to the IBM Cloud Pak for AIOps.
The answer is False.
2. What is a probe rules file?
a. A configuration file that controls how a probe creates an event from incoming data.
b. A configuration file that controls how a probe handles failover and recovery.
c. A checkpoint file that saves the current state of the probe to disk.
d. A checkpoint file that saves all outbound events to disk.
The answer is A.
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• You edit several text files in these labs. Take your time and make sure your changes are
accurate before you save and close each file.
• Don't forget to copy your webhook URL. After you save your webhook, you will not be able to
see the URL again.
Overview
This unit shows you how to configure and use the event analytics features included with Event
Manager to reduce the number of actionable events.
Temporal and seasonal event analytics © Copyright IBM Corporation 2022, 2023
The IBM Cloud Pak for AIOps Event Manager uses advanced analytics to help you manage your
events. One of these analytics features is temporal grouping. Event Manager looks into your event
history to discover sets of events which tend to occur within a short time of each other. These sets
of events are considered to share a temporal relationship. When such events occur again, they are
grouped together to reduce “noise” and the overall number of actionable events.
The temporal grouping analytics algorithm works on unique identifiers for the events. For events
to be grouped, the set of events must be seen at least three times, and the events within each set
must arrive within 20 minutes (the default) of each other.
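The two thresholds can be illustrated with a simplified Python sketch that counts how often a candidate set of event identifiers occurs together within the time window. This is not the algorithm that Event Manager uses. It only demonstrates the "at least three times, within 20 minutes" rule, and the function names and sample history are hypothetical.

```python
from datetime import datetime, timedelta

def count_cooccurrences(history, candidate, window=timedelta(minutes=20)):
    """Count non-overlapping windows in which every identifier in the
    candidate set appears. `history` is a list of (timestamp, identifier)."""
    relevant = sorted((ts, ident) for ts, ident in history if ident in candidate)
    count = 0
    skip_until = None
    for start, _ in relevant:
        if skip_until is not None and start <= skip_until:
            continue   # already counted as part of the previous occurrence
        seen = {ident for ts, ident in relevant if start <= ts <= start + window}
        if seen == set(candidate):
            count += 1
            skip_until = start + window
    return count

def qualifies_for_grouping(history, candidate, min_occurrences=3):
    """Apply the 'seen at least three times' threshold."""
    return count_cooccurrences(history, candidate) >= min_occurrences

# Three days on which A, B, and C arrive within 33 seconds of each other.
base = datetime(2023, 3, 1, 10, 0, 0)
history = []
for day in range(3):
    t0 = base + timedelta(days=day)
    history += [
        (t0, "A"),
        (t0 + timedelta(seconds=10), "B"),
        (t0 + timedelta(seconds=33), "C"),
    ]
history.append((base + timedelta(days=5), "A"))   # a stray event on its own
```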
In this example, Event Manager has noticed that over time, six specific events always seem to
arrive together, within 33 seconds of each other. You can see the historical occurrences of these
events in the screen capture at the top of the slide. The next time the set of events arrives, Event
Manager groups them together in the event list, which you can see at the bottom of the slide.
There are six events in this example. Without this grouping, your IT operators might waste time
troubleshooting each of these events individually, and eventually discover that they are related
only after a lengthy investigation.
With Event Manager’s built-in grouping analytics, your IT operators can save time by considering
this group of events as a single issue immediately. Your operators also see less “noise” in the
event list, because these six events were consolidated and reduced to a single parent event.
Event Manager decorates the event list as events arrive to indicate that these events share a
temporal relationship. In the screen capture at the bottom of the slide, the Grouping column
shows a temporal grouping icon. You also see a tool tip if you hover over the icon. If you click the
icon or the Investigate link in the parent event, you see more information about why the events
were grouped together and examples of past occurrences.
By default, no configuration is required to start using the temporal grouping feature.
Event Manager continuously learns from subsequent events which match the profile of a temporal
group. For example, if the set of events arrives again, but this time with a few more or fewer events
in the set, Event Manager adjusts accordingly.
The result of the temporal grouping algorithms is a grouping policy. A policy defines
which events occurred together in the past and what action to take when they occur again, for
example, grouping the events together in the event list.
With temporal grouping, you can choose the deployment mode: Deploy first or Review first. In
Deploy first mode, temporal policies are enabled automatically, without the need for manual
review. In Review first mode, temporal policies are not enabled until they are manually reviewed
and approved by an administrator.
Another aspect of temporal grouping is pattern matching. Event Manager can identify patterns
within existing temporal groups and apply these patterns to new events that have not been seen
before.
Event Manager identifies recurring problems across multiple temporal groups, extracts the
resource information from those groups, and creates a pattern. This pattern can then be used to
group new instances of the problem events which match the problem signature but occur on new,
previously unseen resources.
In this example, the events at the top of the slide have been seen several times in the past. They
are grouped together by temporal grouping analytics. These events have arrived together in the
past and Event Manager has learned to group them together if they arrive again.
The events at the bottom of the slide are new; Event Manager has never seen them before.
However, with temporal pattern grouping, Event Manager will group them together the first time
they arrive based on the problem signature pattern learned from the events at the top of the slide.
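One way to picture this generalization is to strip the resource name out of each known temporal group, keep only the set of event types as a signature, and then look for that signature on resources that have not been seen before. The Python sketch below does exactly that. It is a deliberately simplified assumption about how such a pattern could work, not a description of Event Manager's implementation, and the resource names, event types, and the requirement of two distinct resources are all invented.

```python
from collections import defaultdict

def learn_patterns(temporal_groups, min_resources=2):
    """Return event-type signatures that recur on several distinct resources.
    Each group is a list of (resource, event_type) pairs."""
    resources_by_signature = defaultdict(set)
    for group in temporal_groups:
        signature = frozenset(event_type for _resource, event_type in group)
        for resource, _event_type in group:
            resources_by_signature[signature].add(resource)
    return {
        signature
        for signature, resources in resources_by_signature.items()
        if len(resources) >= min_resources
    }

def resources_matching(new_events, patterns):
    """Return resources whose new events contain a learned signature."""
    types_by_resource = defaultdict(set)
    for resource, event_type in new_events:
        types_by_resource[resource].add(event_type)
    return sorted(
        resource
        for resource, types in types_by_resource.items()
        if any(pattern <= types for pattern in patterns)
    )

known_groups = [
    [("router-a", "link down"), ("router-a", "bgp peer lost")],
    [("router-b", "link down"), ("router-b", "bgp peer lost")],
]
patterns = learn_patterns(known_groups)

new_events = [
    ("router-z", "link down"),
    ("router-z", "bgp peer lost"),
    ("switch-q", "fan failure"),
]
matched = resources_matching(new_events, patterns)
```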
The result of temporal event analytics is a policy. Policies are created automatically when Event
Manager detects a temporal relationship among events. Policies take action against incoming
events, such as grouping them together.
You view and manage policies with the Event Manager user interface. Tasks you can perform
include:
• You can view the historical occurrences of a group.
• You can disable policies that are no longer valid or that do not produce the groups you want.
• You can edit temporal pattern policies. (Temporal and seasonality policies cannot be edited.)
• If your environment requires you to approve the automatically generated policies before they
are enabled, you can approve them in the user interface.
Seasonality
• Identifies events that recur with predictable regularity
• Identifies “invisible” chronic issues
• Helps to prioritize maintenance tasks
• Provides valuable insights into incidents
The purpose of event seasonality is to automatically identify events that occur in a non-random
pattern and show those events to your IT personnel. For example, consider a low disk space event
that occurs every Monday at about 2:00 PM because of a scheduled backup. Instead of wasting
time troubleshooting the low disk space event every Monday afternoon, your team can see that
the event is seasonal, and can choose to suppress it.
IBM Cloud Pak for AIOps Event Manager automatically learns from your event history and
identifies events that occur in a seasonal pattern over time. After you find seasonal events, you
can reduce the number of events that occur at non-random times by
adjusting the IT process to compensate for known peaks, by filtering the events, or by
suppressing the events completely. Event Manager’s seasonality analysis can also help determine
where and when anomalies occur that were not previously known.
Your operators see easy-to-understand infographics that show when the event usually occurs and
when it has occurred in the past.
Like temporal grouping, the result of seasonal event analytics is a policy.
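A very simple way to picture seasonality detection is to count how often an event falls into each (day of week, hour) slot and flag slots that account for most of its occurrences. The following Python sketch does this for the low disk space example. It is only an illustration, not the analysis that Event Manager performs, and the 80 percent threshold and the sample timestamps are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def seasonal_slots(timestamps, min_share=0.8):
    """Return (weekday, hour) slots that hold at least `min_share` of all
    occurrences. Weekday 0 is Monday, as in datetime.weekday()."""
    slots = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    total = sum(slots.values())
    return [slot for slot, n in slots.items() if total and n / total >= min_share]

# Eight consecutive Mondays at about 2:00 PM, plus one unrelated occurrence.
first_monday = datetime(2023, 3, 6, 14, 0)
occurrences = [
    first_monday + timedelta(weeks=week, minutes=3 * week) for week in range(8)
]
occurrences.append(datetime(2023, 3, 9, 9, 30))   # a Thursday morning

slots = seasonal_slots(occurrences)
```

With this sample data, eight of the nine occurrences fall in the Monday 14:00 slot, so that slot is reported as seasonal.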
Event Manager periodically runs training jobs for temporal and seasonal analytics. These jobs run
once a day and search the last three months of your event history. If any historical events have a
temporal relationship or are seasonal, Event Manager creates policies to act on future events.
Training occurs periodically on a fixed schedule in the background, with trained models sent to
the policy service in the form of actionable policies.
There are two analytics training jobs: one for temporal grouping and one for seasonal analysis. If
needed, you can force these jobs to run manually.
Use the Event Manager user interface to manage temporal or seasonal event analytics settings.
This slide shows the page in the user interface where you change global event analytics settings.
Note
To disable an individual policy, archive it. Archiving a policy stops it from processing future events
that match the temporal or seasonal profile. Use the Policies page in the Event Manager user
interface to archive individual policies.
About temporal grouping configuration
You can configure the following settings for temporal analysis:
• You can enable and disable temporal grouping.
• You can set automatically generated temporal policies to require approval before they start
taking action, or you can set them to start running immediately.
You can configure the following settings for temporal pattern analysis:
• You can enable and disable temporal pattern grouping.
• You can set automatically generated temporal pattern policies to require approval before
they start taking action, or you can set them to start running immediately.
• You can choose to exclude any event fields that you do not want to be considered as part of a
temporal pattern.
Review questions
1. True or False: Seasonal events are events that have been correlated with weather conditions.
2. What is a temporal group?
a. A group of events that always seem to arrive together.
b. A group of events that expire immediately.
c. A group of events that you set to expire together.
d. A group of events that are automatically rejected.
Review answers
1. True or False: Seasonal events are events that have been correlated with weather conditions.
The answer is False.
2. What is a temporal group?
a. A group of events that always seem to arrive together.
b. A group of events that expire immediately.
c. A group of events that you set to expire together.
d. A group of events that are automatically rejected.
The answer is A.
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• After you run training manually, don't forget to delete the trainer pod.
Unit 4. Topology
Estimated time
00:45
Overview
This unit teaches you how to add topology data to Event Manager. You then learn how to group
different segments of a topology together to further reduce events and calculate root cause.
Topology overview
IBM Cloud Pak for AIOps benefits:
• Up-to-date visibility of your resources and their connectivity
• Historical comparison (what has changed?)
• Reads and loads topology data from complex infrastructures and services
• Groups events that have a topological relationship
• Probable cause
The topology service within the IBM Cloud Pak for AIOps can discover the resources in your IT
systems and the dependencies among them. The topology information that AIOps discovers is
used to deliver several valuable features, including:
• Dynamic and interactive topology maps that show users how resources are connected and
how they depend on each other. The maps include a “rewind” and a “Delta” mode where you
can compare your current topology with a past state to easily identify what has changed.
• The topology service can read from predefined sources of topology data, such as Amazon Web
Services, Azure, Kubernetes, Cisco, and OpenStack. For custom data sources, you can load
topology data with flat files or through an API. This is important, because modern services and
applications are increasingly deployed in environments that take advantage of distributed and
virtualized infrastructure.
• The IBM Cloud Pak for AIOps can match incoming events to resources in your topology. This
provides several benefits:
▪ Multiple events from a segment of your topology are grouped together, to reduce the
number of overall events.
▪ AIOps calculates the event within the group that is most likely to be the probable cause of
the overall problem.
▪ Users can see up-to-date status of the resources in a topology map. If there is an event
associated with a resource, it will be decorated in the map. Users can also launch a
contextual map from event lists, which shows the topology of the affected resources.
This slide shows an example of a topology map. These maps are interactive and dynamic. Notice
the following features of the map:
1. At the top of the map, you can change the number of hops. This increases the number of
connected resources displayed in the map. There is also a filter button here, where you can
include or exclude resources and relationships in the map.
2. At the right of the map, you can zoom in or out, change the layout, fit to screen and pan. You
can also zoom in and out with the mouse wheel, and pan by dragging the mouse.
3. At the bottom of the map is a timeline. You can move the pin in the timeline to rewind the map
to a past state. You can also place two pins at different points in the timeline, and the map
shows indicators of what has changed between the two points in time.
4. Resources that have events associated with them are decorated in the map.
Not shown on this slide is the share button, where you can export the map as PNG or SVG images,
or as a direct link.
You can also interact with the resources and relationships in the map. For example, you can
right-click a line in the map to learn more about the relationship between two resources.
These maps are also highly customizable. Administrators can change many properties of these
maps, for example the icons, the line styles, and the interaction with the map itself.
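The effect of the hop count can be illustrated with a short sketch. This is not how the product renders maps; it is a plain breadth-first search over a handful of relationships borrowed from the example topology file in this unit. The function name resources_within_hops is hypothetical, and ignoring the direction of each relationship is an assumption made for the illustration.

```python
from collections import deque

# Relationships borrowed from the example topology file in this unit.
EDGES = [
    ("Server101", "router06"),
    ("router06", "WAN_Firewall"),
    ("Richard", "Server101"),
    ("Server101", "CPU_01"),
    ("Server101", "CPU_02"),
]


def resources_within_hops(edges, seed, hops):
    """Return the resources reachable from seed in at most `hops` relationships."""
    neighbors = {}
    for source, target in edges:
        # Assumption: direction is ignored when deciding what to display.
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        resource, distance = queue.popleft()
        if distance == hops:
            continue  # Do not expand past the requested hop count.
        for nxt in neighbors.get(resource, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return seen


one_hop = resources_within_hops(EDGES, "Server101", 1)
two_hops = resources_within_hops(EDGES, "Server101", 2)
print(sorted(one_hop))
print(sorted(two_hops - one_hop))
```

Raising the hop count from 1 to 2 in this sketch adds only WAN_Firewall, because it is the one resource that is two relationships away from Server101.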
Topology grouping
The IBM Cloud Pak for AIOps can group resources in a topology together. These resource groups
make it easier to find and visualize collections of related resources in a large topology, and groups
enable event correlation between resources in a group.
In the case of event correlation, events coming from resources of the same group are combined
together visually in event lists to reduce the amount of “noise” events that your end users see.
Grouped events allow your IT staff to focus on the group of events and their probable cause,
rather than trying to troubleshoot each event individually.
In this example, you see a group of six resources in a topology map. Incoming events that match
members of the topology group are correlated together in the event list. The event list makes it
clear that the events have been correlated together because there is a topological relationship
among the resources that emitted the events.
Notice the following details about the events in the list:
• A group of events is represented by a synthetic parent event. Synthetic parent events have the
string GROUP: in their summary field.
• There is an icon present in the Grouping and Topology columns of the test events. This icon
means that the node in the event is a member of a topology group.
• There is a numeric value in the Probable Cause column of the grouped events. The event with
the highest probable cause value is the event which Event Manager considers to be the root
cause of the problem.
• Users see fewer actionable events. In this example, six events have been reduced to a single
parent event.
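The event list details above can be sketched in a few lines of Python. The sketch assumes the probable cause values have already been calculated; it only shows how a synthetic parent event and the highest-scoring child relate to each other. The field names, the scores, and the function summarize_group are illustrative assumptions, not the product's data model.

```python
# Hypothetical grouped events; field names and scores are illustrative only.
events = [
    {"Node": "if-1", "Summary": "Interface down", "ProbableCause": 35},
    {"Node": "if-2", "Summary": "Interface down", "ProbableCause": 35},
    {"Node": "card-3", "Summary": "Line card failure", "ProbableCause": 90},
]


def summarize_group(group_name, grouped_events):
    """Return a synthetic parent event and the child with the highest score."""
    probable_cause = max(grouped_events, key=lambda e: e["ProbableCause"])
    parent = {
        # Synthetic parent events carry the string GROUP: in their summary.
        "Summary": f"GROUP: {group_name}",
        "ChildCount": len(grouped_events),
    }
    return parent, probable_cause


parent, cause = summarize_group("line-card-3", events)
print(parent["Summary"], "->", cause["Node"])
```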
Probable cause
When events are grouped together based on their topology, the IBM Cloud Pak for AIOps
automatically suggests which event is likely the root cause of an overall problem.
AIOps recognizes how the resources in these events are connected to each other and applies
additional analytic processing to find probable cause:
• Classification of the events based on the event summary, using natural language
processing (NLP)
• Computation of the paths, dependencies, and scores between events
In this example, the resources are all related to a line card that is inserted into a network switch.
Resources in this group include network interfaces, the slot that the card is using, and the card
itself. Six incoming events have arrived: five events from interfaces and one event from the line
card. AIOps has determined that the root cause of the problem is the line card failure event,
because its failure led to the interface failures.
As a result of topology-based event analytics, your operators see:
• These six events grouped into a single event
• A topology map showing how the card, interface, and slot are connected
• The event that is the probable root cause of the overall problem, indicated by the longest bar
in the Probable Cause column.
An observer is a service that extracts topology information from a data source and inserts it into
the topology service database.
Observer jobs are a definition of the access details for a target data source. For example, you
configure a File observer job with the name of the topology file you want to load. You configure a
Kubernetes observer job with details about the cluster and namespaces you want to discover.
Observer jobs are triggered to retrieve data. Each type of observer can support multiple jobs, for
example you can create two Kubernetes observer jobs to periodically retrieve data from two
different Kubernetes clusters. You can manage observer jobs with the user interface or an API.
Observer jobs can be long-running or transient. For example, a REST observer “load” job is a
one-off, transient job (unless scheduled to run at set intervals), where a REST observer “listen”
job is long-running and runs until explicitly stopped, or until the observer is stopped.
Most observers are technology or vendor specific, such as the DNS or VMware observers.
However, the File observer and the REST observer are special because you can use them to obtain
topology data from custom data sources. The File observer and the REST observer are also useful
for testing.
Important
It is important to note that topology data discovered from different observer jobs is separate. This
means that the resources and relationships that are discovered by one observer job are
considered to be a completely separate topology than the resources and relationships that are
discovered by a second observer job. For example, if you define two Kubernetes observer jobs to
discover two different Kubernetes clusters, the topology of the two clusters will not be linked in
topology maps. If you need to join topologies from two different observer jobs, use a merge rule,
which is described later in this unit.
NOTE: Deleting an observer job does not delete the resources or relationships in the topology
database. Topology data must be deleted through the job itself, manually, or through a scripted
removal process.
Example topology file
V:{"_operation":"InsertReplace","uniqueId":"router06","matchTokens":["router-06","router06"],"name":"router06","entityTypes":["router"]}
V:{"_operation":"InsertReplace","uniqueId":"CPU_01","matchTokens":["CPU_01"],"tags":["TagA","TagB"],"name":"cpu01","entityTypes":["cpu"]}
V:{"_operation":"InsertReplace","uniqueId":"CPU_02","matchTokens":["CPU_02"],"tags":["WAIOpsDemo"],"name":"CPU_02","entityTypes":["cpu"]}
V:{"_operation":"InsertReplace","uniqueId":"WAN_Firewall","matchTokens":["WAN_Firewall"],"tags":["WAIOpsDemo"],"name":"WAN_Firewall","entityTypes":["firewall"]}
V:{"_operation":"InsertReplace","uniqueId":"Server101","matchTokens":["Server101"],"tags":["WAIOpsDemo"],"name":"Server101","entityTypes":["computer"]}
V:{"_operation":"InsertReplace","uniqueId":"Richard","matchTokens":["Richard"],"tags":["WAIOpsDemo"],"name":"Richard","entityTypes":["person"]}
E:{"_toUniqueId":"router06","_edgeType":"connectedTo","_fromUniqueId":"Server101"}
E:{"_toUniqueId":"WAN_Firewall","_edgeType":"uses","_fromUniqueId":"router06"}
E:{"_toUniqueId":"Server101","_edgeType":"uses","_fromUniqueId":"Richard"}
E:{"_toUniqueId":"CPU_01","_edgeType":"contains","_fromUniqueId":"Server101"}
E:{"_toUniqueId":"CPU_02","_edgeType":"contains","_fromUniqueId":"Server101"}
The File observer loads topology data from a plain text file. Each line is a JSON element that
describes a resource or a relationship. The values in each line are read by the observer and
converted to properties in the topology database. You must upload topology files to the File
observer container so they can be read by observer jobs. One File observer job can load data from
only one file.
Within a topology file, values in square brackets represent an array of multiple values: for
example, the first line in the slide has two values for matchTokens; the second line has two values
for tags.
The start of each line represents an action:
• V: The line creates a resource.
• E: The line creates a relationship.
• W: The line causes the File observer job to wait a specified time before it continues to the next
line.
• D: The line deletes a resource.
In this example, six resources and five relationships have been loaded from the file. The lines that
add resources include properties such as uniqueId, tags, and entityTypes. The lines that add
relationships include properties such as _edgeType and the direction of the relationship
(_fromUniqueId and _toUniqueId).
You can add user-defined properties in the file. For example, you could add the custom property
myBankDept with the line:
V:{"_operation":"InsertReplace","uniqueId":"Server101","matchTokens":["Server101"],"name":"Server101","entityTypes":["computer"],"myBankDept":"retail"}
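The line format described above is simple enough to read with a few lines of Python. The following is a minimal sketch, not the File observer's own parser. It handles only the V: and E: actions, because the payload formats for W: and D: lines are not shown in this unit. The function name parse_topology is hypothetical.

```python
import json

# Three lines in the same format as the example topology file.
SAMPLE = """\
V:{"_operation":"InsertReplace","uniqueId":"router06","matchTokens":["router-06","router06"],"name":"router06","entityTypes":["router"]}
V:{"_operation":"InsertReplace","uniqueId":"Server101","matchTokens":["Server101"],"tags":["WAIOpsDemo"],"name":"Server101","entityTypes":["computer"]}
E:{"_toUniqueId":"router06","_edgeType":"connectedTo","_fromUniqueId":"Server101"}
"""


def parse_topology(text):
    """Split topology file text into resources (V:) and relationships (E:)."""
    resources = {}
    relationships = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The action is everything before the first colon; the rest is JSON.
        action, _, payload = line.partition(":")
        if action == "V":
            record = json.loads(payload)
            resources[record["uniqueId"]] = record
        elif action == "E":
            relationships.append(json.loads(payload))
        # W: and D: lines are skipped in this sketch.
    return resources, relationships


resources, relationships = parse_topology(SAMPLE)
print(len(resources), "resources,", len(relationships), "relationships")
```

A check like this can be useful for validating a topology file before you upload it to the File observer container, because a single malformed JSON line raises an error immediately.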
To delete all resources that have been loaded by a File observer job, point the job to an empty
file. To delete only specific resources, create a file that contains a delete (D:) line for each of
those resources.
The REST observer accepts topology data from HTTP requests. In this example, a REST observer
job named jp-listen has accepted data from two different HTTP requests and modeled a
topology.
The first request creates a resource named IBM. The second request creates a resource named
Armonk, New York and a relationship between them.
These requests were sent to the REST observer resource API, which is at the URL:
https://<your route>/1.0/rest-observer/rest/resources
Although it is not shown on this slide, you can use the REST Observer references API to add
relationships at the following URL:
https://<your route>/1.0/rest-observer/rest/references
There are other REST observer APIs. These APIs are well documented and include a swagger
interface for development and testing. You can access the swagger interface here:
https://<your route>/1.0/rest-observer/swagger
There are two types of REST observer jobs: listen jobs and bulk replace jobs.
To delete all resources that have been loaded by a REST observer job, use the DELETE method in
a series of HTTP requests.
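The following sketch shows what building a request for the resources API might look like in Python. Only the URL path comes from the text above. The POST method, the payload fields (borrowed from the File observer example), the entity type, and the host name topology.example.com are assumptions, and authentication and job identification headers are deliberately left out. The request is built but never sent; use the swagger interface to confirm the actual contract.

```python
import json
from urllib.request import Request


def build_resource_request(route, resource):
    """Build (but do not send) a request for the REST observer resources API."""
    url = f"https://{route}/1.0/rest-observer/rest/resources"
    body = json.dumps(resource).encode("utf-8")
    # Assumption: resources are created with POST and a JSON body.
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical route and payload, for illustration only.
request = build_resource_request(
    "topology.example.com",
    {"uniqueId": "IBM", "name": "IBM", "entityTypes": ["organization"]},
)
print(request.get_method(), request.full_url)
```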
Properties are key-value pairs that are associated with resources or relationships.
The Topology Service has two categories of properties, generic and user defined:
• Generic properties are few in number and constrained to a single data type, such as integer or
string. These properties are indexed by the search service and you can use them to search for
resources in the user interface. Some important generic properties are:
▪ _id: The tenant-unique identifier of a resource, independent of the uniqueId.
▪ uniqueId: The string by which the provider knows the resource. It might be a UUID that
the provider can use to look up information about the resource in its own local data store.
The uniqueId only needs to be unique within the context of its provider.
▪ matchTokens are used to store strings which match incoming events to resources.
▪ mergeTokens are used to combine records from separate topologies, based on merge
rules.
▪ name is the string which will be used in the UI. It does not need to be unique and should
be fairly short and human readable.
▪ tags can be used to store strings which can be searched in the user interface.
▪ entityTypes is the type(s) of object the resource represents, for example server, CPU, or
container. There is a set of predefined entity types with associated icons, or you can add
custom entityTypes.
User-defined properties are free form and are not constrained to any given data type. Observers
are free to add new user properties as they are needed. User-defined properties are not indexed
by the search service. You cannot use these properties to search for resources in the user
interface.
A topology group is a limited collection of resources within your larger overall topology.
Administrators group resources together that have a common purpose or function that they want
to monitor.
For example, imagine AIOps has discovered the topology of a large Kubernetes cluster. You are
responsible for the health of a key application that is running in the cluster, and you want to monitor
the microservices that make up that application as a group. You can create a topology group that
includes only the components of the application you are interested in. Events that arrive from the
microservices in your topology group will be correlated in event lists to reduce the number of
overall events.
Topology groups are created by Topology templates. Templates define:
• The members of the group (which resources to include or exclude)
• The visual appearance of the group in a topology map
• Whether or not events from group members are correlated together
There are three types of topology templates:
• A dynamic template automatically creates and updates multiple resource groups based on
your criteria.
• A tag-based resource group template defines a single group of resources that share a common
set of tags.
• An exact template defines a single set of specific resources.
Topology rules
• Merge: merge a common resource from separate topologies to stitch the topologies together
• Match: use a resource property to match incoming events
• Tag: use a resource property to add a tag
• History: exclude unimportant property changes from history retention
• Business criticality: use a resource property to populate the businessCriticality
property
Rules change the way the topology service processes data. You can define rules with the user
interface or with an API. The following types of rules are available:
• Merge rules merge a common resource from separate topologies to stitch the topologies
together. In a merge rule, you specify which property should be used to correctly identify the
resource that is common to multiple topologies.
• Match rules copy the value of another property to the matchTokens property. The matchTokens
property is used to match incoming events to resources.
• Tag rules copy the value of another property to the tags property so that it becomes
searchable in the user interface.
Hint
Any field that isn't indexed and can therefore not normally be searched for becomes searchable if
copied to the tags property.
• History rules identify properties that change every observation, but that don't indicate an
important change, for example a host's sysUpTime property. A history rule excludes properties
like these from being retained in history.
• Business criticality rules copy a particular property of a resource into the resource's
businessCriticality property, which is then used to define business criticality.
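The rule types above can be pictured as simple operations on resource records. The sketch below is only a conceptual model in Python, not the topology service's rule engine. The functions apply_copy_rule and merge_by_property, and the hostname and ipAddress properties, are hypothetical; myBankDept is the custom property from the File observer example.

```python
def apply_copy_rule(resource, source_property, target_property):
    """Model a match or tag rule: copy one property's value into a list property."""
    value = resource.get(source_property)
    if value is None:
        return resource
    updated = dict(resource)
    existing = list(updated.get(target_property, []))
    if value not in existing:
        existing.append(value)
    updated[target_property] = existing
    return updated


def merge_by_property(topology_a, topology_b, merge_property):
    """Model a merge rule: pair resources that share a value for merge_property."""
    index = {r[merge_property]: r for r in topology_a if merge_property in r}
    return [
        (index[r[merge_property]], r)
        for r in topology_b
        if r.get(merge_property) in index
    ]


server = {
    "uniqueId": "Server101",
    "matchTokens": ["Server101"],
    "hostname": "server101.example.com",  # hypothetical property
    "myBankDept": "retail",
}
# Match rule: events that carry the host name can now match this resource.
matched = apply_copy_rule(server, "hostname", "matchTokens")
# Tag rule: the department becomes searchable through the tags property.
tagged = apply_copy_rule(server, "myBankDept", "tags")

# Merge rule: two observer jobs see the same machine under different IDs.
job_a = [{"uniqueId": "Server101", "ipAddress": "10.0.0.5"}]
job_b = [
    {"uniqueId": "vm-42", "ipAddress": "10.0.0.5"},
    {"uniqueId": "vm-43", "ipAddress": "10.0.0.6"},
]
pairs = merge_by_property(job_a, job_b, "ipAddress")
print(matched["matchTokens"], tagged["tags"], len(pairs))
```

In this model the match and tag rules are the same operation with different target properties, which mirrors how the text describes both as copying a value from another property.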
Topology dashboard
The WebGUI user interface and the AIOps user interface both provide a topology dashboard
where each user can display the groups that are most important to them. To create a dashboard,
users mark topology groups they want to include on the dashboard as favorites.
Notice that there are three distribution charts in the example. This is because three topology
groups were marked as favorites. The size of each chart represents the number of resources in the
group. The colors of each distribution chart represent the severity of events that are currently
active for the resources in each group.
If you click a distribution chart, the bottom of the dashboard displays more information about the
group:
• A topology map of the group
• Historical event status of the group
• Historical changes to members and relationships in the group
Review questions
1. True or False: The IBM Cloud Pak for AIOps models only servers and network devices.
2. What is an edge?
a. A relationship between two resources, such as a network connection.
b. A group of resources manually added to a topology.
c. An endpoint device, such as a workstation or IP phone.
d. A topology template.
Review answers
1. True or False: The IBM Cloud Pak for AIOps models only servers and network devices.
The answer is False.
2. What is an edge?
a. A relationship between two resources, such as a network connection.
b. A group of resources manually added to a topology.
c. An endpoint device, such as a workstation or IP phone.
d. A topology template.
The answer is A.
Exercise: Topology
These are the tasks you complete during the lab exercises for this unit.
Lab tips
Adding the token in the merge rule is a three-step process:
Step 1: Tokens field is empty
Step 2: Enter the token and click Add
Overview
This unit shows you how to configure another event reduction technique: scope-based groups.
You also see how groups can be combined into super groups, to achieve even greater event
reduction.
Scope-based groups
Scope-based group: a group of events that share the same value of a resource attribute and
arrive around the same time.
You can group events together based on known relationships in your IT systems. Scope-based
policies copy the value of a specified event field to the ScopeID field of the event. Incoming events
that meet the conditions of a scope-based policy are enriched by copying a value to their
ScopeID field. Events that have the same ScopeID value and arrive around the same time are
grouped together.
Consider the three examples on this slide.
Applications example: A collection of applications compose a service. Events from these
applications all have the value shoppingCart in their Service field. The ScopeID is set to the
value of Service by a policy. All events where ScopeID='shoppingCart' that arrive within 300
seconds of each other are grouped.
Line of business example: A collection of storage arrays is assigned to the research team.
Events from these arrays all have the value Research in their Department field and the string
storage in their node name. The ScopeID is set to the value of Department by a policy. All events
where node contains 'storage' AND Department='Research' that arrive within 2 minutes of
each other are grouped.
Location example: Events from equipment in a specific location all have the value Istanbul-Labs
in their Location field. The ScopeID is set to the value of Location by a policy. All events where
Location='Istanbul-Labs' that arrive within 10 minutes of each other are grouped.
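The enrichment step in these examples can be modeled as a condition plus a field copy. The sketch below expresses the line of business example in Python. The ScopePolicy class and its attribute names are hypothetical and are not the product's policy format.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Event = Dict[str, str]


@dataclass
class ScopePolicy:
    """Hypothetical model of a scope-based policy."""
    name: str
    condition: Callable[[Event], bool]
    source_field: str

    def apply(self, event: Event) -> Event:
        """Copy source_field into ScopeID when the event meets the condition."""
        if not self.condition(event):
            return event
        enriched = dict(event)
        enriched["ScopeID"] = event[self.source_field]
        return enriched


# Line of business example: node contains 'storage' AND Department='Research'.
research_storage = ScopePolicy(
    name="research-storage",
    condition=lambda e: "storage" in e.get("Node", "")
    and e.get("Department") == "Research",
    source_field="Department",
)

hit = research_storage.apply({"Node": "storage-array-7", "Department": "Research"})
miss = research_storage.apply({"Node": "web-01", "Department": "Research"})
print(hit.get("ScopeID"), miss.get("ScopeID"))
```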
Create a policy to start using scope-based groups. To create a policy, define the following details:
• Name: A unique name for the policy.
• Description: A meaningful description.
• Priority: The priority of the policy.
• Condition: Set conditions so the policy can select which events to enrich.
• Action: The event field you want to copy to ScopeID.
• Time window/quiet period: Either the time window in which events with the same ScopeID are
grouped together, or the quiet period: the number of seconds that must pass with no further
events before the system stops adding events to the current occurrence of the group.
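To illustrate the time window setting, the sketch below groups events that share a ScopeID when they arrive within a fixed window measured from the first event of the group. This is one possible interpretation chosen for the illustration; the product's exact windowing behavior is not specified here, and the quiet period variant is not modeled. The function group_by_scope and the use of FirstOccurrence as an arrival time in seconds are assumptions.

```python
def group_by_scope(events, window_seconds):
    """Group events with the same ScopeID that arrive within window_seconds
    of the first event in the group (fixed-window interpretation)."""
    groups = []
    current = {}  # ScopeID -> most recent group for that scope
    for event in sorted(events, key=lambda e: e["FirstOccurrence"]):
        scope = event.get("ScopeID")
        if not scope:
            continue  # Events without a ScopeID are not scope-grouped.
        group = current.get(scope)
        expired = (
            group is None
            or event["FirstOccurrence"] - group["start"] > window_seconds
        )
        if expired:
            group = {"ScopeID": scope, "start": event["FirstOccurrence"], "events": []}
            groups.append(group)
            current[scope] = group
        group["events"].append(event)
    return groups


events = [
    {"ScopeID": "shoppingCart", "FirstOccurrence": 0},
    {"ScopeID": "Research", "FirstOccurrence": 50},
    {"ScopeID": "shoppingCart", "FirstOccurrence": 120},
    {"ScopeID": "shoppingCart", "FirstOccurrence": 299},
    {"ScopeID": "shoppingCart", "FirstOccurrence": 700},
]
groups = group_by_scope(events, window_seconds=300)
for g in groups:
    print(g["ScopeID"], g["start"], len(g["events"]))
```

With a 300-second window, the shoppingCart event at 700 seconds falls outside the first group and starts a new occurrence of the group.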
Scope-based groups are effective at event reduction. Users see fewer overall events when using
Scope-based groups. In this example, seven events have been combined into a single parent
event using scope-based groups.
Users see a Venn diagram icon in the Grouping column of event lists. The Venn diagram icon
represents a scope-based group. Clicking on one of the Venn diagram icons in the group takes
users to a detail page, where they can see more information about the group and the scope.
Super groups
Super group: a group of events that have been joined together from other groups.
When events are in more than one group, scope-based groups can overlap with other groups. The
IBM Cloud Pak for AIOps automatically combines scope-based groups with other groups to
create super groups. This event super group function achieves further event and ticket reduction.
In this example, an event from a node named computer1000 is in two groups: a topology group and
a scope-based group. The Venn diagram icon and the topology icon represent the type of each
group. AIOps has combined the two groups into a super group, reducing nine events into a single,
actionable parent event.
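Conceptually, a super group is the union of groups that share at least one event. The sketch below shows that idea with plain Python sets. It is an illustration of overlapping-set merging, not the product's algorithm, and the function build_super_groups and the event IDs are hypothetical.

```python
def build_super_groups(groups):
    """Merge groups (sets of event IDs) that share at least one event."""
    super_groups = []
    for group in groups:
        merged = set(group)
        remaining = []
        for existing in super_groups:
            if existing & merged:
                merged |= existing  # Overlap: absorb the existing group.
            else:
                remaining.append(existing)
        remaining.append(merged)
        super_groups = remaining
    return super_groups


topology_group = {1, 2, 3, 4, 5}
scope_group = {5, 6, 7, 8, 9}   # Event 5 is in both groups.
unrelated_group = {20, 21}

result = build_super_groups([topology_group, scope_group, unrelated_group])
print([sorted(g) for g in result])
```

Because event 5 belongs to both the topology group and the scope-based group, the two are joined into one super group of nine events, while the unrelated group is left as it is.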
Review questions
1. What is a super group?
a. A group of events that have been joined together from other groups.
b. The global grouping setting, used to disable grouping.
c. A group of events that all come from the same node.
d. A group of events that are more important to the business than other events.
2. What is a scope-based group?
a. A group of events from scoping management software.
b. A group of events that are always suppressed.
c. A group of events that share the same value of a resource attribute and arrive around the same time.
d. A group of events that are always escalated.
Review answers
1. What is a super group?
a. A group of events that have been joined together from other groups.
b. The global grouping setting, used to disable grouping.
c. A group of events that all come from the same node.
d. A group of events that are more important to the business than other events.
The answer is A.
2. What is a scope-based group?
a. A group of events from scoping management software.
b. A group of events that are always suppressed.
c. A group of events that share the same value of a resource attribute and arrive around the same time.
d. A group of events that are always escalated.
The answer is C.
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• This exercise depends on a topology group that you created in an earlier unit: Unit 4 “Topology.” Before you
continue, be sure to complete section 6 of the Topology unit.
Unit 6. Runbooks
Estimated time
00:45
Overview
This unit shows you how to work with runbooks, which operators use to fix incoming problems.
These runbooks can be automated, so that the fix can be run at the push of a button; or configured
to run without any human interaction at all.
About runbooks
• IBM Runbook Automation is an easy-to-use service that empowers IT Operations
Management teams to:
▪ Create, manage, and execute guided tasks and automated activity
▪ Quickly set up event-triggered automated guidance and actions
▪ Interoperate with management and collaboration tools, both cloud-based and on-premises
▪ Automatically track runbook and automation execution activity statistics
▪ Share expertise: SMEs can author runbooks once; users access their knowledge repeatedly
IT systems are growing. The number of events is increasing, and so is the pressure to move from
finding problems to fixing them. IBM Runbook Automation assists operational and
expert teams in developing consistent and reliable procedures for daily operational tasks.
Use IBM Runbook Automation to build and execute runbooks that help IT staff solve common
operational problems. IBM Runbook Automation can automate procedures that do not require
human interaction, thereby increasing the efficiency of IT operations processes. Operators are
freed from time-consuming manual tasks and can spend more time innovating.
A runbook is a controlled set of automated and manual steps that support system and network
operational processes. A runbook orchestrates all types of infrastructure elements, like
applications, network components, or servers.
With IBM Runbook Automation, you can record standard manual activities so that they are run
consistently across the organization. The next step is to replace manual steps with automated
tasks. In this way, your organization can progress along the automation maturity curve:
• Start with standard manual activities that are documented and consistent across the
organization.
• Replace manual steps with automated tasks.
• Transition to fully automated procedures.
There are three types of runbooks, each requiring a different level of human interaction:
• Manual runbooks: A step-by-step description of the exact procedure an operator should
follow. Operators use their standard tools (for example: terminal emulator, PuTTY, GitHub) to
interact with the target system.
• Semi-automated runbooks: Each step describes exactly what an operator should do, and
the operator simply pushes a button to execute an automated task on a target system.
• Fully automated runbooks: The runbook is selected by the system as a response to a trigger
and executed without operator attention. The results of the runbook are stored for technical
review.
Runbook personas
The IBM Cloud Pak for AIOps includes tools for the following personas within your team:
• Authors create and edit runbooks, along with other runbook service objects
• Users or operators run runbooks that have been created for them and leave feedback
There are two categories of tools to use when working with runbooks:
• Authoring and configuration tools: subject matter experts on your team author runbooks,
create runbook objects, and view feedback. Administrators configure connections to target
systems.
• User tools: Users run runbooks and leave feedback. If configured, users can run runbooks
from an event list.
Runbook parameters
Runbooks can include parameters. Parameters can be used as variables that get substituted by a
value in the text of a runbook step, or in an automated step. The value of these parameters can be:
• Populated by the user at runtime
• Populated with default values by the runbook author
• Automatically populated with values from an IT event
• Automatically populated from a previous step in the runbook
In this example, a manual runbook guides users through the process of restarting a DB2 database
instance. The runbook has two parameters, HOSTNAME and DB2INSTANCE, because it needs the host
where the database is running and the name of the instance to restart. In this example runbook,
the user starts by entering values for HOSTNAME and
DB2INSTANCE. After the parameters are populated, the text in the runbook substitutes the two
parameters with the user-defined values.
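Parameter substitution of this kind can be illustrated with Python's standard string.Template class. The $NAME placeholder syntax, the step text, the default instance name, and the host name below are assumptions made for the sketch; they are not the runbook editor's own syntax or the content of the example runbook.

```python
from string import Template

# Hypothetical step text; $NAME placeholders are Python's syntax, not the product's.
step_text = Template("Log in to $HOSTNAME and restart the DB2 instance $DB2INSTANCE.")


def render_step(template, defaults, supplied):
    """Fill in a step, letting user-supplied values override author defaults."""
    values = {**defaults, **supplied}
    # substitute() raises KeyError if a parameter has no value at all.
    return template.substitute(values)


author_defaults = {"DB2INSTANCE": "db2inst1"}
user_input = {"HOSTNAME": "dbhost01.example.com"}

rendered = render_step(step_text, author_defaults, user_input)
print(rendered)
```

The same merge order could be extended so that values parsed from an event or returned by a previous step take their place between the author's defaults and the user's input.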
In the case of semi-automated and fully-automated runbooks, the runbook service connects to
the target systems and takes an action. There are several ways to connect to your target systems:
• With SSH: IBM Runbook Automation can run scripts and stream commands over SSH to target
systems. The runbook service saves the output of the command or script.
• With an HTTP interface: You can use HTTP methods such as GET, POST, and DELETE to
connect the runbook service to a web service. The runbook service saves the response.
• With Ansible Tower: Use the automation features of Ansible Tower, such as playbooks, job
templates, job workflow templates, credential management, and the integration of external
version control systems for playbooks.
• GitHub external runbook library: A special-use connection. Used to retrieve runbook content
from a GitHub repository.
Runbook automations
Automations are pushbuttons for users that connect to a target system and take an action.
Automations can be included in semi-automated and fully automated runbooks.
• Automations depend on connections to target systems
• Automation types correspond to connection types (for example SSH or HTTP)
With semi-automated and fully-automated runbooks, users use push buttons to run commands.
These buttons are called automations. When users click a Run button, the runbook service
connects to the target system and executes some action.
This example shows the configuration of a runbook automation named FTP_SERVICE_STATUS. The
automation in this example connects to a server and streams a command to it using SSH.
Pushbutton automation is key to consistency. With automations like this example, runbook
users cannot make a mistake entering commands because they do not need to enter any
commands.
Running a runbook
This example shows a semi-automated runbook. The user simply clicks the Run button and the
runbook service connects to the target system and runs the command for the user. The user then
sees the output of the command and the result and can move on to the next step.
After all steps in the runbook are complete, users are prompted to leave feedback.
Creating runbooks
Runbooks are quick and easy to create. Runbook authors use a simple tool to build runbooks.
Runbook triggers
Triggers match incoming events to runbooks that are likely to fix the problem. End users can
select an event in an event list, then start a runbook to fix the problem that caused the event.
Triggers perform the following actions:
• Match incoming events to runbooks based on conditions within the event
• Enrich events that match runbooks with runbook details, such as RunbookID and
RunbookStatus
• Parse text from events to use as parameter values
• Start fully automated runbooks in response to matching events
This slide shows the trigger configuration page. In this example, events that meet the following
conditions are matched to a runbook named Fix FTP Service.
Summary='FTP service provided by vsftpd is down' AND Service='FTP'
The trigger also reads the Node field of the incoming event and uses it to populate a parameter
within the selected runbook: HOST.
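The trigger in this example can be modeled as a set of field conditions plus a mapping from event fields to runbook parameters. The sketch below uses the condition, runbook name, and HOST parameter from the text; the dictionary layout, the function match_trigger, and the node name ftp01 are hypothetical and do not reflect how the product stores triggers.

```python
# Hypothetical representation of the trigger described above.
TRIGGER = {
    "runbook": "Fix FTP Service",
    "conditions": {
        "Summary": "FTP service provided by vsftpd is down",
        "Service": "FTP",
    },
    "parameters": {"HOST": "Node"},  # runbook parameter -> event field
}


def match_trigger(trigger, event):
    """Return (runbook name, parameter values) if every condition matches."""
    for field, expected in trigger["conditions"].items():
        if event.get(field) != expected:
            return None
    parameters = {
        name: event.get(field) for name, field in trigger["parameters"].items()
    }
    return trigger["runbook"], parameters


event = {
    "Node": "ftp01",
    "Summary": "FTP service provided by vsftpd is down",
    "Service": "FTP",
}
match = match_trigger(TRIGGER, event)
print(match)
```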
Runbooks are delivered to end users in event lists. If an event matches a runbook, users click the
dot in the Runbook column in an event list to learn more about the runbook and run it.
WebGUI users can launch a runbook with a right-click tool.
Runbook history
Runbook authors and approvers can view runbook history. With runbook history, you can:
• View the results of runbooks that have run in the past. If the runbook contained an
automation, the output of the target system is saved in the runbook’s history.
• Compare user feedback from different versions of your runbook. This makes it easy to see if
your runbook improves as you edit it over time.
• See the average time it took to run each version of the runbook.
You can offload historic runbook executions to a file, to reduce the footprint of the database and
increase database performance.
After all steps in the runbook are complete, users are prompted to leave feedback:
• The user can give a star rating to the runbook to indicate how useful it was or how well it
worked. This star rating is visible to other runbook users and runbook authors.
• The user can add a comment. This comment will be visible to runbook authors.
• The user must click either Runbook did not work or Runbook worked to exit. This action is
used to calculate the success rate of the runbook.
Feedback helps your team's subject matter experts and runbook authors know if the runbook was
clear and worked as intended, or if it needs improvement.
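As a rough illustration of how such feedback could be summarized, the sketch below computes a success rate and an average star rating from a list of feedback records. The formula (runs marked as worked divided by all runs with feedback), the record layout, and the function summarize_feedback are assumptions; the text does not state how the product calculates these values.

```python
def summarize_feedback(records):
    """Return (success rate, average stars) for a list of feedback records."""
    worked = sum(1 for r in records if r["worked"])
    # Assumed formula: share of runs the user marked as "Runbook worked".
    success_rate = worked / len(records) if records else 0.0
    rated = [r["stars"] for r in records if r.get("stars") is not None]
    average_stars = sum(rated) / len(rated) if rated else None
    return success_rate, average_stars


# Hypothetical feedback from four runs of one runbook version.
feedback = [
    {"worked": True, "stars": 5},
    {"worked": True, "stars": 4},
    {"worked": False, "stars": None},  # The user left no star rating.
    {"worked": True, "stars": 3},
]
success_rate, average_stars = summarize_feedback(feedback)
print(success_rate, average_stars)
```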
The user interface includes a runbook statistics dashboard. This dashboard shows data about
runbooks that have run in the past.
The runbook statistics dashboard is divided into two sections: runbook metrics are displayed on
the left and runbook execution records are shown on the right.
Review questions
1. True or False: The runbook service can connect to a target using SSH.
2. What is a runbook trigger?
a. A runbook trigger matches incoming events to runbooks that are likely to fix the problem.
b. A runbook trigger enables or disables all runbooks.
c. A runbook trigger starts AI training jobs.
d. A runbook trigger is a JDBC driver, used to read and write data to internal databases.
Review answers
1. True or False: The runbook service can connect to a target using SSH.
The answer is True.
2. What is a runbook trigger?
a. A runbook trigger matches incoming events to runbooks that are likely to fix the problem.
b. A runbook trigger enables or disables all runbooks.
c. A runbook trigger starts AI training jobs.
d. A runbook trigger is a JDBC driver, used to read and write data to internal databases.
The answer is A.
Exercise: Runbooks
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• The SSH connection works only if you save the key in a file named authorized_keys
in the directory /home/netcool/.ssh
• Triggers must have a description. You cannot save a trigger without a description.
Unit 7. Triggers
Estimated time
01:00
Overview
Triggers are an automated way to alter data, including event data, in the central Event Manager
database. This unit teaches you how to create and use triggers.
Unit objectives
• Learn to interact with Event Manager using an SQL interface
• Understand how triggers change events in the Event Manager database
7.1. ObjectServer structure and SQL
What is an ObjectServer?
• An ObjectServer is the central database where Event Manager stores events
• An ObjectServer is a collection of multiple databases
• ObjectServers:
Receive event data from probes, webhooks, and other monitors
Process event data using tools and automations
Transfer event data using gateways
Display event data to the user
• ObjectServers come in pairs for redundancy: a primary ObjectServer and a backup
ObjectServer
• ObjectServers can be deployed in tiers to process high volumes of events, for example a
collection tier to receive events along with an aggregation tier to process events
ObjectServers are the central data stores within Event Manager, including the database where all
events are stored.
ObjectServer databases
Initially, the ObjectServer has the following databases, among others:
• alerts: Alert data, and event list configuration
• catalog: System catalog containing ObjectServer metadata (can be viewed but not modified)
• custom: Database for tables added by users
• iduc_system: Channel setup for accelerated event notification (AEN)
• master: Compatibility with previous releases; Desktop ObjectServer tables
• persist: Triggers, procedures and signals
• security: Authentication information for users, roles, groups, permissions
• service: service.status table for service display (used mostly with monitors)
• tools: User tools and menu structure
• transfer: Used by the ObjectServer gateways
• Identifier
• Serial
• ServerSerial
• ServerName
• StateChange
• FirstOccurrence
• LastOccurrence
• Type
• Acknowledged
• Node
• Tally
• ExpireTime
• Severity
• OwnerUID/GID
• AlertGroup
• AlertKey
• Manager
• Summary
• Class
The ObjectServer table alerts.status defines the structure of the Event Manager event record.
Event Manager probes, webhooks, and other monitors populate the alerts.status table.
In the example of a probe, the probe collects information from some source, and breaks the
information into pieces referred to as tokens. The probe assigns a token to a column in the
alerts.status table. The probe populates the table by creating an SQL INSERT statement. That
statement is forwarded to the ObjectServer, and causes a record to be created in the
alerts.status table.
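The probe flow described above can be sketched in Python. This is only an illustration, not probe code: the function names build_insert and format_value, the token values, and the Identifier string are hypothetical, and the sketch assumes that string values contain no quote characters.

```python
def format_value(value):
    """Render a token value as an SQL literal (integers bare, strings single-quoted)."""
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "'" in text:
        # Quote escaping is outside the scope of this sketch.
        raise ValueError("this sketch does not handle embedded quotes")
    return f"'{text}'"


def build_insert(table, tokens):
    """Build an INSERT statement from a mapping of column name to token value."""
    columns = ", ".join(tokens)
    values = ", ".join(format_value(v) for v in tokens.values())
    return f"insert into {table} ({columns}) values ({values});"


# Hypothetical tokens that a probe might extract from a link-down message.
tokens = {
    "Identifier": "router1LinkDown3",
    "Node": "router1",
    "Severity": 5,
    "Summary": "Link down on port 3",
}
statement = build_insert("alerts.status", tokens)
print(statement)
```

Each token becomes one column and value pair, which is why the choice of Identifier matters: it is the column that the ObjectServer later uses to recognize a repeated event.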
This slide lists some of the key fields in the alerts.status table:
• Identifier: This is the unique identifier for a row in the alerts.status table and is key to event
deduplication. It is essential that the identifier identifies repeated events appropriately.
• Serial: This is an automatically populated field and is a unique reference for an event within a
particular ObjectServer. Event Manager automatically assigns a number to this field when a
new event arrives in the ObjectServer.
• ServerSerial and ServerName: Unique values for the server that received the event first. This
is important for architectures with multiple ObjectServers and gateways.
• StateChange: This is a time field updated by triggers. Event Manager updates this field with
the current time each time the state of an event changes (either from the data source or the
ObjectServer).
• FirstOccurrence: This is a time field that is updated by triggers. It contains a timestamp of
when the event first arrives in the ObjectServer. It should not subsequently be changed.
• LastOccurrence: This is a time field updated by triggers. It contains a timestamp of the last
occurrence of the event. Unlike FirstOccurrence, its value changes if another instance of the
same event arrives.
• Type: An integer field that generally takes one of three values: 0, 1, or 2. A type of 0 means that
the type has not been set. A type of 1 means that the event is a problem event (link down, for
example). A type of 2 means that the event is a resolution event (link up).
• Acknowledged: Events in Event Manager can be acknowledged by a user. It is this field that
represents the acknowledgement of an event. It is an integer field but behaves as a Boolean:
0=unacknowledged and 1=acknowledged.
• Node: Identifies the managed entity from which the alarm originated. This could be a device or
host name, service name, application name, or other entity. The Node column must contain
a name that allows direct communication with the entity, or that can be resolved to allow
direct communication.
• Tally: The ObjectServer maintains a count (or tally) of the total number of recurrences of an
event.
• ExpireTime: An integer field that can optionally be used to automatically remove events.
When set to a non-zero value, it represents the number of seconds that the event remains in
the ObjectServer. A trigger checks any event with a non-zero value. After the event has been in
the system for longer than the configured number of seconds, it is removed.
• Severity: This field denotes the severity or priority of the event within the ObjectServer. It is
an integer field and has these values by default:
▪ 0: Clear
▪ 1: Indeterminate
▪ 2: Warning
▪ 3: Minor
▪ 4: Major
▪ 5: Critical
• OwnerUID: The owner ID of the event in alerts.status table.
• OwnerGID: The group ID of the event in alerts.status table.
• AlertGroup: The descriptive name of the failure type indicated by the alert. For example:
Interface Status or CPU Utilization.
• AlertKey: Indicates the managed object instance referenced by the alert. For example, the
disk partition indicated by a file system full alert or the switch port indicated by a utilization
alert.
• Manager: Normally denotes the data source that processed the event.
• Summary: A summary of the problem associated with the event.
• Class: A way of classifying equipment types in events. Enables tools to be assigned against
events of specific equipment types.
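The integer conventions described for Type, Acknowledged, and ExpireTime can be modeled with a short Python sketch. The Event class and helper names are hypothetical. Measuring the expiry interval from the last occurrence of the event is an assumption of this sketch; the text above says only that the event is removed after it has been in the system longer than the configured number of seconds.

```python
from dataclasses import dataclass

# Meanings of the Type column as described above.
TYPE_MEANINGS = {0: "not set", 1: "problem", 2: "resolution"}


@dataclass
class Event:
    identifier: str
    last_occurrence: int   # seconds since the epoch
    event_type: int = 0
    acknowledged: int = 0  # integer that behaves as a Boolean
    expire_time: int = 0   # 0 means the event never expires


def is_acknowledged(event):
    return event.acknowledged == 1


def is_expired(event, now):
    """Return True when a non-zero ExpireTime has elapsed (assumed reference: last occurrence)."""
    if event.expire_time == 0:
        return False
    return now - event.last_occurrence > event.expire_time


link_down = Event("router1LinkDown3", last_occurrence=1000, event_type=1, expire_time=300)
print(TYPE_MEANINGS[link_down.event_type], is_expired(link_down, now=1200))
```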
One of the primary roles of a data source such as a probe is to convert machine information from
the event source into human-readable text in the ObjectServer event record. There are times
when it might be important to see the original machine data, typically when debugging some
problem. You can configure the data source to send extra data to the ObjectServer whenever an
event is created. This additional data is saved in the alerts.details table. There is a database link
between alerts.status and alerts.details, which enables a user to view the details that are related
to a specific event.
When working with events, you might want to track the history of a particular event:
• Who has owned it
• What severity levels it has passed through
• What automations have acted on it
When working with events in Event Manager, you often want to track the history of an event. You
want to know who owns the event, what severity levels it passes through, what automations act
upon it, and more. You can use the journal to track the history. When a new journal entry is added,
the data is stored in the alerts.journal table.
The journal entry contains the name of the user, the date, time, and text that describes the
operation. This information provides an important chronological history of actions that are taken
against the event.
You can add entries to the journal manually. You can also add to the journal with a trigger.
ObjectServer SQL is a subset of ANSI SQL, which is used widely throughout Event Manager. For
example, it is used in the SQL file that creates the ObjectServer, and is used in automations in the
ObjectServer. ObjectServer SQL commands can be roughly divided into three functional areas:
• Data Definition: Used in SQL files to define and create databases and tables
• Data Manipulation: Used by triggers, tools, and filters to retrieve, modify, and delete data
• System Administration: Used on the command line to manage the system
Names in the ObjectServer are case sensitive, including database names, table names, and column
names.
This unit covers data manipulation commands, which are useful for ObjectServer triggers.
Command-line access
• The nco_sql command provides access to the ObjectServer's SQL interface
$OMNIHOME/bin/nco_sql -server <ObjectServer Name>
Requires ObjectServer user name and password
If not specified, root is the assumed user
The nco_sql command is available inside of ObjectServer containers and on computers
where ObjectServers run
Use the nco_sql tool to access the command-line SQL interface of an ObjectServer.
nco_sql -server <OBJECTSERVER_NAME> -user <NAME> -password <password>
If no user name is specified, the system assumes root.
Use the SELECT command to retrieve one or more rows or partial rows of data from an existing
table.
You can select an event with the following command:
select * from database.table where FieldName=condition;
Select all non-hidden columns:
select *
From the database and table specified:
from database.table
Using the following condition:
where FieldName=condition
In this example, the command retrieves all columns from the alerts.status table and displays
every record with a Severity of critical.
select * from alerts.status where Severity=5;
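To experiment with the shape of this query without an ObjectServer, you can use a stand-in database. The following Python sketch uses SQLite, which is not ObjectServer SQL; the reduced four-column table and the sample rows are hypothetical. Attaching an in-memory database under the name alerts lets the statement keep the database.table form.

```python
import sqlite3

# SQLite stands in for the ObjectServer here; this is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS alerts")
conn.execute(
    "CREATE TABLE alerts.status ("
    "Identifier TEXT PRIMARY KEY, Node TEXT, Severity INTEGER, Summary TEXT)"
)
conn.executemany(
    "INSERT INTO alerts.status VALUES (?, ?, ?, ?)",
    [
        ("web01Down", "web01", 5, "Web server down"),
        ("sw02Port7", "sw02", 3, "Port utilization high"),
        ("db01Disk", "db01", 0, "Disk space recovered"),
    ],
)

# The statement from the text: every column of every critical event.
critical = conn.execute("select * from alerts.status where Severity=5").fetchall()
for row in critical:
    print(row)
```

Only the web01Down row has a Severity of 5, so it is the only row returned.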
Inequality comparisons
SELECT * FROM alerts.status WHERE Grade >= 3;
ObjectServer SQL supports the ability to retrieve specific fields or columns. The following example
only retrieves the Summary and Class fields.
select Summary, Class from alerts.status;
ObjectServer SQL supports logical operators:
AND
OR
ObjectServer SQL supports comparison operators.
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
<> Not equal to
LIKE and NOT LIKE are typically used in string comparisons and often in conjunction with regular
expressions. The following metacharacters are the most commonly used:
. Match any single character (for example, link.n matches link2n, but not link21n)
* Match zero or more occurrences of the preceding character (for example, link* matches lin,
link, or linkkk)
+ Match one or more occurrences of the preceding character (for example, link+ matches link
and linkkk, but not lin or linxk)
[ ] Match any single character within the given range (for example, link[0-5] matches link2 but
not link9)
^ Ensure that the pattern matches at the beginning of the string (for example, ^link.* matches
linknorth, but not northlink)
$ Ensure that the pattern matches at the end of the string (for example, .*link$ matches
northlink but not linknorth)
In regular expressions, a backslash (\) escapes special characters (match literal value).
The NOT keyword inverts the result of any comparison.
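The metacharacter examples above can be checked with a short Python sketch. Python's re module is used here as a stand-in and is an assumption: it is not the ObjectServer regular expression engine, but it behaves the same way for these particular patterns. The matches helper is hypothetical.

```python
import re

# (pattern, text, expected result) taken from the examples in the list above.
EXAMPLES = [
    (r"link.n", "link2n", True),
    (r"link.n", "link21n", False),
    (r"link*", "lin", True),
    (r"link*", "linkkk", True),
    (r"link+", "linkkk", True),
    (r"link+", "lin", False),
    (r"link+", "linxk", False),
    (r"link[0-5]", "link2", True),
    (r"link[0-5]", "link9", False),
    (r"^link.*", "linknorth", True),
    (r"^link.*", "northlink", False),
    (r".*link$", "northlink", True),
    (r".*link$", "linknorth", False),
]


def matches(pattern, text):
    """Return True when the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in EXAMPLES:
    print(f"{pattern!r:14} {text!r:12} {matches(pattern, text)} (expected {expected})")
```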
• Multiple fields
UPDATE alerts.status
SET Severity = 5, Service = 'Web Host'
WHERE Grade = 4 AND Customer like 'ISP';
The UPDATE command modifies existing rows. Unlike SELECT, it does not require an asterisk or a
list of field names. The statement updates the named table, applying the SET assignments to
every row that matches the WHERE condition.
For example, this statement first locates any record in the alerts.status table with a Severity of 3.
It changes the Severity to 4 for every record found.
update alerts.status set Severity=4 where Severity=3;
The INSERT command creates a new row of data in an existing table. If you are not inserting
values for every column in the row, specify the columns to populate as a comma-separated list
within parentheses, followed by the VALUES keyword, followed by a comma-separated list of
values within parentheses.
INSERT statement
insert into database.table (IntegerField, StringField, IntegerField2,
StringField2) values (3, 'text', 3, 'more text');
String field values are single-quoted. You must specify a value for the primary key columns in the
INSERT command.
UPDATING keyword
The optional UPDATING keyword forces the specified columns to be updated if the insert is
deduplicated.
Example:
insert into alerts.status (Identifier, Severity, Tally, Serial) values
('ControlMachineStats15', 5, 12, 21) updating (Severity);
In the preceding example, a new record is inserted into the alerts.status table. The new record
will have the following four fields specified.
• Identifier: ControlMachineStats15
• Severity: 5
• Tally: 12
• Serial: 21
If a record already exists in alerts.status with the same identifier, then deduplication occurs.
During deduplication, only specific fields are updated, and Severity is not one of them. By
including the updating (Severity) text, the ObjectServer is forced to update the Severity field.
Without that text, the Severity field does not change.
Delete statement
This statement deletes the rows from the table using the specified condition, for example:
delete from database.table where condition;
This statement removes every record from alerts.status that is Green/Clear (Severity=0):
delete from alerts.status where Severity=0;
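The effect of the UPDATE and DELETE examples in this section can be tried against a stand-in database. As before, this Python sketch uses SQLite rather than an ObjectServer, which is an assumption made for illustration; the reduced table and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for the ObjectServer; the table is a reduced, hypothetical alerts.status.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS alerts")
conn.execute("CREATE TABLE alerts.status (Identifier TEXT PRIMARY KEY, Severity INTEGER)")
conn.executemany(
    "INSERT INTO alerts.status VALUES (?, ?)",
    [("web01Down", 5), ("sw02Port7", 3), ("db01Disk", 0)],
)

# Raise every Minor (3) event to Major (4).
conn.execute("update alerts.status set Severity=4 where Severity=3")

# Remove every Clear (0) event.
conn.execute("delete from alerts.status where Severity=0")

remaining = conn.execute(
    "select Identifier, Severity from alerts.status order by Identifier"
).fetchall()
print(remaining)
```

After both statements run, the Clear event is gone and the former Minor event is stored with a Severity of 4.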
The Event Manager administrator tool is a powerful configuration tool that helps you customize
and manage ObjectServer databases. The administrator tool is a thick client, which you download
and install with the ObjectServer on-premises software. The administrator tool connects to
ObjectServers in your environment, whether your ObjectServers are on-premises or running in
Red Hat OpenShift.
The administrator tool includes a tool called the SQL workbench. This tool helps you create and
validate SQL commands.
This slide shows an SQL statement in the SQL workbench, along with a list of columns in the
alerts.status table.
7.2. Automations and triggers
Triggers (automations)
Triggers are a way to respond to events that happen within the ObjectServer.
Triggers are used for the following tasks, among others:
• Automate management of events
• Perform actions automatically on receipt of certain events
• Incorporate escalation procedures
• Correlate events
Triggers detect changes in the ObjectServer and run automated responses to these changes. This
enables the ObjectServer to process alerts without requiring an operator to take action.
Triggers are also called automations.
Trigger types
There are three types of triggers:
• Database: Runs when a database operation occurs on an ObjectServer table
• Temporal: Runs on a timed basis
• Signal: Runs when a system or user-defined signal is raised
When you create a trigger, you must configure some settings. Some of the setting values are
common across all trigger types. Some triggers do not use some settings.
Trigger names are character strings. A trigger name cannot contain spaces or special characters
except for the underscore. The trigger name must be unique.
Triggers are organized into trigger groups. You must choose a group when you create a trigger.
The WHEN setting is used to configure an optional condition that must be met before a trigger can
activate. The condition that is defined in the WHEN setting is in addition to the type of trigger. For
example, you create a temporal trigger with a frequency of every hour. The trigger activates every
hour. If you create a WHEN condition, the trigger runs every hour only if the WHEN condition is
met.
The ACTION setting contains the commands that run when the trigger is activated. The
commands are typically one or more SQL commands.
Trigger groups
Triggers are organized into trigger groups. Trigger groups are used to organize and control one or
more triggers.
A trigger group has a name, and the name has the same constraints as a trigger name. A trigger
group can be used to control the activation of multiple triggers. If you disable a trigger group, you
prevent the activation of all triggers that belong to that group.
WHEN clause
You can determine when the trigger fires:
• Day of week
• Time of day
• Severity at a certain level:
new.Severity>3
• Deduplication period is below a certain amount:
(new.LastOccurrence - new.FirstOccurrence) < 60
The WHEN setting is used to define a condition that must be met before a trigger activates. For
example, a user might create a trigger that runs once every hour to automatically delete certain
events from the ObjectServer. After the trigger is enabled, it activates every hour, on every day
of the week. The user might not want the trigger to remove events on Saturday or Sunday. The
user can add a WHEN condition to the trigger to test for the day of the week. In the WHEN
condition, the user specifies that the trigger does not activate on Saturday or Sunday. After the
WHEN condition is added, the trigger activates once every hour on every day of the week except
Saturday and Sunday.
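The weekend example can be modeled in Python to show how a WHEN condition gates an hourly trigger. This is a sketch of the logic only, not ObjectServer syntax; the function names when_condition and activations are hypothetical.

```python
from datetime import datetime, timedelta


def when_condition(now):
    """Return True on weekdays. datetime.weekday() gives Monday=0 through Sunday=6."""
    return now.weekday() < 5


def activations(start, hours):
    """List the hourly firing times at which the WHEN condition lets the action run."""
    times = [start + timedelta(hours=n) for n in range(hours)]
    return [t for t in times if when_condition(t)]


# Friday 27 January 2023, 22:00: four hourly firings cross into Saturday.
start = datetime(2023, 1, 27, 22, 0)
for fired in activations(start, hours=4):
    print(fired)
```

The trigger still fires on its schedule at the weekend; the WHEN condition only decides whether the action runs on that firing.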
Trigger actions
• Trigger actions are SQL code blocks
• They can be designed to manipulate data in the action statement itself
• They can run a predefined procedure
The ACTION contains the commands that run when the trigger activates. The ACTION can contain
a single SQL command, a block of commands, or a command that runs a procedure.
This slide contains a list of some of the SQL commands that you can use within an SQL code block.
These commands are commonly used in triggers.
IF condition THEN
action_command_list
[ELSEIF condition THEN
action_command_list
...]
[ELSE
action_command_list]
END IF;
The ELSEIF and ELSE sections, shown in brackets, are optional.
ObjectServer SQL supports logical controls within any code block. The most common control is
the IF statement.
For instance, you might want to implement the logic that if Grade is 99, do one thing; else if Grade
is 98, do something else. You can use IF blocks for this task.
You can use the FOR EACH ROW loop to perform actions on a set of rows that match a certain
condition. The following example increases the severity of all alerts in the alerts.status table that
have a severity of 3 to a severity of 4.
FOR EACH ROW alert_row in alerts.status WHERE alert_row.Severity=3
BEGIN
SET alert_row.Severity = 4;
END;
Triggers that use FOR EACH ROW are also known as row-level triggers.
About the EVALUATE clause
Generally, use of the EVALUATE clause is relatively inefficient and its use should be avoided
whenever possible. When a trigger contains an EVALUATE clause, a temporary table is created to
hold the results of the SELECT statement in the EVALUATE clause. The amount of time and
resources that this temporary table consumes depends on the number of columns that are
selected and the number of rows matched by the condition in the WHERE clause.
In most cases, you can replace an EVALUATE clause with a FOR EACH ROW clause which cursors
over the data and does not incur the processor usage of creating a temporary table.
A suitable use for an EVALUATE clause is when a GROUP BY clause is being applied to an SQL
query.
In summary, triggers activate based on three types of conditions: time, database change, and
signal. When the trigger activates, an optional WHEN clause is evaluated. If the WHEN condition is
met, the trigger continues. The optional EVALUATE setting creates a temporary table that contains
records that meet some condition. The ACTION setting contains the commands that run.
ACTION begin
delete from alerts.status
where Severity = 0 and
StateChange < (getdate() - 120);
end
A temporal trigger activates based on a time frequency. Do not confuse time in this case with time
of day. The frequency defines how frequently the trigger activates, not when it activates. For
example, a temporal trigger with a frequency of 1 hour activates every hour. The trigger might not
activate on even hour boundaries, for example, 8:00, 9:00, 10:00. The activation is based on when
the ObjectServer starts. After the ObjectServer starts, the trigger activates every hour.
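The ACTION of the temporal trigger shown above deletes clear events whose state last changed more than 120 seconds ago. The following Python sketch models that condition on a list of dictionaries. It illustrates the logic only; the function name and the sample events are hypothetical, and now plays the role of getdate().

```python
def delete_stale_clear_events(events, now, max_age=120):
    """Keep every event except clear ones whose StateChange is older than max_age seconds."""
    cutoff = now - max_age
    return [
        event
        for event in events
        if not (event["Severity"] == 0 and event["StateChange"] < cutoff)
    ]


events = [
    {"Identifier": "A", "Severity": 0, "StateChange": 800},  # clear and old: deleted
    {"Identifier": "B", "Severity": 0, "StateChange": 950},  # clear but recent: kept
    {"Identifier": "C", "Severity": 5, "StateChange": 100},  # old but not clear: kept
]
remaining = delete_stale_clear_events(events, now=1000)
print([event["Identifier"] for event in remaining])
```

Both parts of the condition must hold, so a recently cleared event survives until a later run of the trigger.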
ACTION begin
set old.Tally = old.Tally + 1;
set old.LastOccurrence = new.LastOccurrence;
set old.StateChange = getdate();
set old.InternalLast = getdate();
set old.Summary = new.Summary;
set old.AlertKey = new.AlertKey;
if ((old.Severity = 0)and(new.Severity > 0))
then
set old.Severity = new.Severity;
end if;
end
One of the key features of the ObjectServer is data deduplication. This feature provides for
significant event volume reduction by storing a single copy of an event regardless of how many
times it repeats.
In a database trigger, the ObjectServer looks for a database operation to occur against a table,
rather than a time interval. The database operation can be Delete, Insert, Reinsert, or Update. The
Pre/Post Action selector determines whether the action executes before or after the specified
database operation.
The Apply to Row/Statement setting controls how many times the action runs. When set to Row
(the default), the contents of the Action tab run once for each row affected. When set to
Statement, the action runs only once, regardless of how many rows were affected.
On the Action tab, triggers also have access to the implicit variables new and old, whose values
are automatically set by the system.
• Row fields before change: old.fieldname (for example, old.Severity)
• Row fields after change: new.fieldname (for example, new.Severity)
In some operations, new or old row variables might not be available. For example, if a row is
deleted, there is no new row to read or modify. The availability of implicit variables depends on
the database operation performed.
The When tab is empty, so the trigger always runs.
The Evaluate tab is not used in this trigger.
The Action tab selectively updates fields on the existing event, replacing the old values with the
incoming new values. It also increments the Tally, and updates the Severity only if the existing
event is clear (0) and the incoming event is not.
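The deduplication ACTION shown above can be modeled line for line in Python. This sketch mirrors the trigger logic on plain dictionaries; the function name deduplicate and the sample values are hypothetical, and now stands in for getdate().

```python
def deduplicate(old, new, now):
    """Apply the deduplication action: old is the stored row, new is the incoming row."""
    old["Tally"] = old["Tally"] + 1
    old["LastOccurrence"] = new["LastOccurrence"]
    old["StateChange"] = now
    old["InternalLast"] = now
    old["Summary"] = new["Summary"]
    old["AlertKey"] = new["AlertKey"]
    # Severity is copied only when a cleared event recurs as a problem.
    if old["Severity"] == 0 and new["Severity"] > 0:
        old["Severity"] = new["Severity"]
    return old


stored = {
    "Tally": 1, "LastOccurrence": 100, "StateChange": 100, "InternalLast": 100,
    "Summary": "Link down", "AlertKey": "port3", "Severity": 0,
}
incoming = {"LastOccurrence": 160, "Summary": "Link down again", "AlertKey": "port3", "Severity": 4}
result = deduplicate(stored, incoming, now=161)
print(result["Tally"], result["Severity"], result["LastOccurrence"])
```

If the stored event is not clear, for example because an operator lowered its severity, a repeat occurrence increments the Tally but leaves the Severity alone.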
A signal is an occurrence in the ObjectServer that can be detected and acted upon. Signals can
have triggers attached to them. The ObjectServer can then respond with a specific action when a
signal is raised.
System signals are raised by the ObjectServer on changes to the system, for example:
• System startup, system shutdown
• Client connect, client disconnect, connection failure
• Backup success or failure
When a system signal is raised, attributes that identify the cause of the signal are attached to the
signal:
%signal.at, %signal.server, %signal.node
These attributes cannot be deleted or modified.
The Settings tab lets you choose the signal to execute this trigger. The signal can be a System or
User signal.
The When, Evaluate, and Action tabs function as in previous types of triggers covered.
The ObjectServer includes several system triggers, which are signal triggers. System signals are
raised automatically based on some condition. For example, the connect system signal is raised
when a component connects to the ObjectServer. A disconnect signal is raised when a component
disconnects from the ObjectServer. If a signal trigger is configured based on the connect signal,
the trigger activates and creates a new event when a component connects to the ObjectServer.
Signal triggers are used to automate numerous system functions, for example:
• Auditing: The triggers generate ObjectServer events when administrative changes are made,
such as the addition of new fields in an ObjectServer table.
• Profiling: Triggers collect and report statistics regarding the performance of the ObjectServer.
• Connections: Triggers generate events that are based upon connects, disconnects, and
connection failures.
• Failover and Failback: Triggers are used to implement controlled failover and failback in
ObjectServer high availability configurations.
You can create user signals. For user signals, the signal must be raised in some fashion. The signal
can be raised with a tool, or from within another trigger. For example, a database trigger can be
configured to activate based on a DELETE to the alerts.status table. In the ACTION section of the
database trigger, you can configure a statement to RAISE a user signal.
Procedures
Procedures are executable code called to perform common operations.
• Two types of procedures:
SQL procedures manipulate data in an ObjectServer database
External procedures run an executable on a remote system
• Can call procedures from nco_sql, a trigger, or a tool
• Syntax:
{EXECUTE|CALL} [ PROCEDURE ] procedure_name(expr,...);
• Example:
EXECUTE PROCEDURE myproc();
or with parameters:
EXECUTE PROCEDURE myproc('text',@Node);
Procedures are objects that can be called in SQL to perform an SQL operation or an external
operation. A procedure is similar to a macro in a programming language. It is a prebuilt collection
of code (SQL statements) that can be called from another process. The behavior of the procedure
can be adjusted, based upon variables (parameters) that are passed to the procedure when
called.
The procedures are stored in an appropriate table in the ObjectServer. The tables are
catalog.sql_procedures or catalog.external_procedures.
An external procedure calls a script to run on the ObjectServer machine or any other machine
defined in process control (PA). The process must be in the form of a script, and can accept
command-line parameters passed as variables from the caller. External procedures can accept
only IN type parameters. After the script is launched, it cannot return parameters back to the
caller.
You must run process control (PA) to use external procedures. The external procedure is called
within the ObjectServer. The ObjectServer notifies process control that a command must be run
on a host. The ObjectServer passes the host, user, and command name to process control. The
process control daemon runs the command.
This slide shows the definition for the jinsert procedure. The jinsert procedure creates a journal
record. The parameter values identify the corresponding event record and the message to place
in the journal entry. You can use this procedure to create a journal entry when a trigger modifies
an event record.
In this example, the mail_on_critical trigger has been revised to call the jinsert procedure to
produce a journal entry.
When constructing the expression in the WHERE clause, pay attention to the type of each field.
Perform all integer tests first (Severity=5), string comparisons next (Node='XYZ'), and regular
expression tests last.
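The reason for this ordering can be illustrated in Python, where the and operator stops evaluating as soon as one test fails. This unit does not describe how the ObjectServer evaluates conditions internally, so treat the sketch only as an illustration of why cheap tests belong first. The make_filter helper, the pattern, and the sample events are hypothetical.

```python
import re


def make_filter(counter):
    """Build a filter that counts how often the expensive regular expression test runs."""
    pattern = re.compile(r"^link.*down$")

    def regex_test(summary):
        counter["regex_calls"] += 1
        return pattern.search(summary) is not None

    def selects(event):
        # Cheapest test first: integer, then string, then regular expression.
        return (
            event["Severity"] == 5
            and event["Node"] == "XYZ"
            and regex_test(event["Summary"])
        )

    return selects


counter = {"regex_calls": 0}
selects = make_filter(counter)
events = [
    {"Severity": 3, "Node": "XYZ", "Summary": "link3 down"},  # rejected by the integer test
    {"Severity": 5, "Node": "ABC", "Summary": "link3 down"},  # rejected by the string test
    {"Severity": 5, "Node": "XYZ", "Summary": "link3 down"},  # reaches the regex test
]
selected = [event for event in events if selects(event)]
print(len(selected), counter["regex_calls"])
```

Of the three events, only one reaches the regular expression test, because the two cheaper tests reject the others first.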
When creating the filter condition, include criteria that ensure that the trigger does not select the
same event over and over. This is important because triggers perform automated actions, and a
trigger that keeps matching the same event repeats its action on every run.
If multiple temporal triggers are defined to run at the same frequency (for example, every five
minutes), stagger the priority settings among them. Staggering them ensures that they do not all
run at the same time.
Triggers are a powerful and useful feature. Most teams introduce new triggers periodically to
resolve some operational situation. Over time, it might be difficult to determine why a trigger
exists and what it was designed to accomplish. Take the time to add a short description to any
new triggers. Consider capturing details, such as when the trigger was created, the author, and
why the trigger was created.
Avoid editing the standard triggers that come with Event Manager. Instead, make a copy of the
existing trigger and edit the copy.
Unit summary
• Learn to interact with Event Manager using an SQL interface
• Understand how triggers change events in the Event Manager database
Review questions
1. True or False: A temporal trigger runs on a timed basis.
2. Which is the best way to insert a journal entry when an ObjectServer trigger fires?
a. The JINSERT stored procedure.
b. The AUTO_TRIGGER stored procedure.
c. With the topology service.
d. With the runbook service.
Review answers
1. True or False: A temporal trigger runs on a timed basis.
The answer is True.
2. Which is the best way to insert a journal entry when an ObjectServer trigger fires?
a. The JINSERT stored procedure.
b. The AUTO_TRIGGER stored procedure.
c. With the topology service.
d. With the runbook service.
The answer is A.
Exercise: Triggers
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• This exercise depends on steps you completed earlier in labs for Unit 2: “Incoming
integrations.” Before you continue, be sure to complete the labs for the “Incoming
integrations” unit of this course.
• After 10 minutes of inactivity, the Event Manager Administrator tool logs you out. If this
happens, double-click the ObjectServer to open the tool again. If you are logged out, you also
see errors in the terminal. You can safely ignore errors like these:
ERROR : Code-0 : Mon Jan 30 14:52:57 EST 2023 : bastion.labs.ihost.com/10.100.1.8 :
TextEditorPanel.getFullWordAt : : Exception: Invalid location
ERROR : Code-0 : Mon Jan 30 14:52:58 EST 2023 : bastion.labs.ihost.com/10.100.1.8 :
TextEditorPanel.getFullWordAt : : Exception: Invalid location
ERROR : Code-0 : Mon Jan 30 14:52:59 EST 2023 : bastion.labs.ihost.com/10.100.1.8 :
TextEditorPanel.getFullWordAt : : Exception: Invalid location
ERROR : Code-0 : Mon Jan 30 14:53:00 EST 2023 : bastion.labs.ihost.com/10.100.1.8 :
TextEditorPanel.getFullWordAt : : Exception: Invalid location
Overview
This unit discusses how to manage user access to the Event Manager user interfaces.
If you run the IBM Cloud Pak for AIOps Event Manager on Red Hat OpenShift, users must be
created in one of two ways:
• With the OpenLDAP interface, if you are using the included OpenLDAP pod that comes with
Event Manager on OpenShift
• On your own enterprise LDAP server, if you are using the LDAP proxy option
Adding users and groups with the WebSphere Administrative Console
Figure 8-3. Adding users and groups with the WebSphere Administrative Console
If you use the included OpenLDAP pod that comes with Event Manager on OpenShift, you can use
the WebSphere Administrative Console to manage users and groups.
newExampleUser.ldif
dn: uid=jdeans,ou=users,dc=mycluster,dc=icp
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: Jeanie Deans
uid: jdeans
givenName: Jeanie Deans
sn: jdeans
userPassword: p@ssw0rd
Apply with:
ldapadd -c -x -w $LDAP_BIND_PWD -D $LDAP_BIND_DN -H ldapi:/// -f newExampleUser.ldif
addExampleUserToGroup.ldif
dn: cn=icpadmins,ou=groups,dc=mycluster,dc=icp
changetype: modify
add: member
member: uid=jdeans,ou=users,dc=mycluster,dc=icp
Apply with:
ldapmodify -w $LDAP_BIND_PWD -D $LDAP_BIND_DN -H ldapi:/// -f addExampleUserToGroup.ldif
You can add users and groups directly to the OpenLDAP server that comes with Event Manager on
OpenShift.
To add and modify objects to OpenLDAP directly:
1. Connect directly to the pod named evtmanager-openldap-0, which runs the OpenLDAP
server.
2. After you are connected to the pod, create an .ldif text file to configure the objects you want.
3. Finally, run an ldap command to apply the configuration in your new file.
In this example, the file named newExampleUser.ldif describes a user with the ID jdeans. After you
create this file in the pod, you run the ldapadd command to add the user in the file to the
OpenLDAP server running in the pod.
Next, the file named addExampleUserToGroup.ldif describes a change: modify the icpadmins
group by adding jdeans as a member. After you create this file in the pod, you run the ldapmodify
command to add the user to the group.
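
If you need to create several users, you might generate the .ldif text instead of typing it by hand. The following Python sketch reproduces the two example files shown above. It is an illustration only: the function names user_ldif and group_member_ldif are hypothetical, the base DN dc=mycluster,dc=icp is taken from the example and might differ in your environment, and the code assumes plain ASCII values that need no LDIF encoding.

```python
def user_ldif(uid, common_name, password, base_dn="dc=mycluster,dc=icp"):
    """Return LDIF text for a new user entry, modeled on newExampleUser.ldif.

    Assumption: values are plain ASCII and do not start with a space or
    colon, so no base64 encoding is applied.
    """
    lines = [
        f"dn: uid={uid},ou=users,{base_dn}",
        "objectClass: top",
        "objectClass: person",
        "objectClass: organizationalPerson",
        "objectClass: inetOrgPerson",
        f"cn: {common_name}",
        f"uid: {uid}",
        f"givenName: {common_name}",
        f"sn: {uid}",
        f"userPassword: {password}",
    ]
    return "\n".join(lines) + "\n"


def group_member_ldif(uid, group, base_dn="dc=mycluster,dc=icp"):
    """Return LDIF text that adds an existing user to a group,
    modeled on addExampleUserToGroup.ldif."""
    lines = [
        f"dn: cn={group},ou=groups,{base_dn}",
        "changetype: modify",
        "add: member",
        f"member: uid={uid},ou=users,{base_dn}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print both documents; save each one to its own file in the pod
    # before you run ldapadd or ldapmodify against it.
    print(user_ldif("jdeans", "Jeanie Deans", "p@ssw0rd"))
    print(group_member_ldif("jdeans", "icpadmins"))
```

The sketch only produces text. You still copy each generated file into the evtmanager-openldap-0 pod and apply it with the ldapadd or ldapmodify command from the example.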
DASH Roles
IBM Dashboard Application Services Hub (DASH) provides visualization and dashboard
services. DASH includes roles that can be associated with users or groups.
• User roles: the permissions in the role are granted to a single user
• Group roles: the permissions in the role are granted to a group of users
What is DASH?
IBM Dashboard Application Services Hub (DASH) provides visualization and dashboard services.
DASH has a single console for administering IBM products and related applications.
DASH relies on an application named IBM Jazz for Service Management (JazzSM). JazzSM
provides shared integration services, such as data, administrative, dashboard, reporting, and
security.
The WebGUI uses a client/server architecture and it is hosted inside DASH. Users connect to
DASH to access the WebGUI.
All of these user interface components run in an application server. WebSphere Application Server
is the web application server used to run Jazz for Service Management and its dependent
applications: DASH and WebGUI.
IBM Cloud Pak for AIOps running in Red Hat OpenShift uses LDAP for user authentication. The
following applications are configured to use a common federated user repository:
• IBM Dashboard Application Services Hub
• IBM Jazz for Service Management
• WebSphere Application Server
This means that you can centrally administer the users of all products that run in your instance of
Dashboard Application Services Hub.
About DASH roles
DASH includes roles that can be associated with users or groups. Groups logically categorize
users into units with common functional goals. Roles determine the data that users and groups
can view, and the actions that they can perform.
A best practice is to assign roles to groups, rather than users.
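
To see why group-based assignment is easier to maintain, consider the following Python sketch. It is not DASH code. It assumes that a user's effective roles are the union of any roles assigned directly to the user and the roles assigned to each group that the user belongs to. The group name noc_operators and the function name effective_roles are hypothetical.

```python
def effective_roles(user, user_groups, group_roles, user_roles=None):
    """Return the set of roles a user holds directly or through groups.

    Assumption: effective roles are the union of direct and group roles.
    """
    roles = set((user_roles or {}).get(user, ()))
    for group in user_groups.get(user, ()):
        roles |= set(group_roles.get(group, ()))
    return roles


# Hypothetical data: one group carries the roles for every operator.
group_roles = {"noc_operators": {"ncw_user", "netcool_rw", "noi_operator"}}
user_groups = {"jdeans": ["noc_operators"]}

if __name__ == "__main__":
    print(sorted(effective_roles("jdeans", user_groups, group_roles)))
```

With this model, you change the permissions of every operator by editing one group entry, and a new user needs only a group membership rather than a list of individual roles.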
Key roles
The following roles grant permission to the features described in this course:
• inasm_admin
• inasm_editor
• inasm_operator
• iscadmins
• iscusers
• ncw_admin
• ncw_dashboard_editor
• ncw_gauges_editor
• ncw_gauges_viewer
• ncw_user
• netcool_ro
• netcool_rw
• noi_engineer
• noi_lead
• noi_operator
This slide is a list of roles that are important for Event Manager users and administrators. These
roles provide the following permissions:
inasm_operator: A user with the inasm_operator role can access the Topology UI, and use it to
search for and visualize the resources in the topology service core application.
inasm_editor: The same permissions as inasm_operator, plus a user with the inasm_editor role
can add comments to resources from the Topology Viewer Context (right-click) menu. (A user with
the inasm_operator role can view comments, but not add new ones.)
inasm_admin: The same permissions as inasm_editor, plus a user with the inasm_admin role has
access to administrator tools, where they can define custom UI elements for the Topology Viewer.
iscadmins: This role grants access to the Dashboard Application Services Hub administrative
features.
iscusers: Users who are assigned this role can access the DASH welcome page and access their
credential store. All users have this role by default.
ncw_admin: Users with the ncw_admin role have access to the administrative functions of
WebGUI and the AIOps user interface.
ncw_dashboard_editor: A user with this role can edit event dashboards, which are monitor boxes
that show events by category.
ncw_gauges_editor: A user with this role can edit the Gauges page and widgets, which display
self-monitoring data about Event Manager.
ncw_gauges_viewer: Users with this role can view Event Manager self-monitoring data on a
Gauges page.
ncw_user: This is the base role that allows users to access the WebGUI and the AIOps user
interface. All users require this role.
netcool_ro: This role gives read-only access to event management functions in the user
interfaces.
netcool_rw: This role gives read and write access to event management functions in the user
interfaces.
The following three roles provide access to event analytics and runbook features:
noi_operator
Event Analytics: The noi_operator role can open the Incident Viewer from the Event
Viewer, but cannot click through on the seasonality and grouping icons in the Incident
Viewer. The Temporal group and Seasonal event panels are not available with this role. The
See more info option is not available on individual events with this role. Also, policies
cannot be approved or rejected with this role.
Runbook: View alerts and run the runbooks that are linked to those alerts. This role does
not have read access to the runbook library or other runbook pages.
noi_engineer
Event Analytics: The noi_engineer role can perform all operations on the UI, except for
managing policies.
Runbook: Like noi_operator, plus full read/write access to the runbook pages (Library,
Execution, Automations, Triggers).
noi_lead
Event Analytics: The noi_lead role can perform all operations on the UI. With the noi_lead
role, you can manage policies. This feature is not available to other roles.
Runbook: Like noi_engineer, plus full access to the administration of automation
connections and API keys, and full access to the runbook settings.
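
The three noi_ roles build on each other: each role keeps the permissions of the role before it and adds more. The following Python sketch models that layering so that you can check which role a task requires. It is a simplified illustration of the descriptions above, not product code. The capability labels (for example, manage_policies) and the function names are hypothetical.

```python
# Roles listed from least to most privileged, as described above.
NOI_ROLE_ORDER = ["noi_operator", "noi_engineer", "noi_lead"]

# Capabilities that each role adds to the previous one.
# The labels are hypothetical summaries of the role descriptions.
ADDED_CAPABILITIES = {
    "noi_operator": {"open_incident_viewer", "run_linked_runbooks"},
    "noi_engineer": {"view_analytics_details", "edit_runbook_pages"},
    "noi_lead": {"manage_policies", "administer_runbook_settings"},
}


def capabilities(role):
    """Return all capabilities of a role, including inherited ones."""
    level = NOI_ROLE_ORDER.index(role)
    granted = set()
    for name in NOI_ROLE_ORDER[: level + 1]:
        granted |= ADDED_CAPABILITIES[name]
    return granted


def can(role, capability):
    """Return True if the role includes the capability."""
    return capability in capabilities(role)


if __name__ == "__main__":
    for role in NOI_ROLE_ORDER:
        print(role, "can manage policies:", can(role, "manage_policies"))
```

In this model, only noi_lead can manage policies, which matches the statement that policy management is not available to the other roles.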
Review questions
1. True or False: You can use the WebSphere Administrative Console to manage users and
groups.
2. Which role is used to work with topology?
a. inasm_admin
b. topology_user
c. topology_admin
d. ibm_all_admin
Review answers
1. True or False: You can use the WebSphere Administrative Console to manage users and
groups.
The answer is True.
2. Which role is used to work with topology?
a. inasm_admin
b. topology_user
c. topology_admin
d. ibm_all_admin
The answer is A.
These are the tasks you complete during the lab exercises for this unit.
Lab tips
• During your lab, you will log in and out of the user interfaces several times to test new users
• Close the browser between user sessions for best results
Unit 9. Summary
Estimated time
00:10
Overview
This unit summarizes what you have learned, and directs you to other resources to help you
continue learning.
Unit objectives
• Explain how the course met its learning objectives
• Identify IBM credentials that are related to this course
• Locate resources for further study and skill development
Course objectives
• Describe the event management capabilities of the IBM Cloud Pak for AIOps
• Connect Event Manager to incoming data sources
• Work with temporal and seasonal event analytics
• Configure the Event Manager topology service
• Create scope-based groups
• Create runbooks and map them to incoming events
• Work with triggers
• Manage users
https://ibm.biz/swat_explains
Additional resources (1 of 5)
• IBM Cloud Education course information
View and download course materials and
course corrections.
http://ibm.biz/CourseInfo
• IBM Developer
IBM's official developer program offers access
to software trials and downloads, how-to
information, and expert practitioners.
https://developer.ibm.com/
Additional resources (2 of 5)
• IBM Automation Community
Learn about Blockchain, Blueworks Live, BPM,
Workflow, Case, Content Management,
Decision Management, Robotic Process
Automation, Platform, and Cloud Pak for
Automation
https://community.ibm.com/community/user/automation/home
Additional resources (3 of 5)
• IBM Training
Search the IBM Training website for courses
and education information.
https://www.ibm.com/training
• Learning Journeys
Learning Journeys describe a recommended
collection of learning content to acquire skills
for a specific technology or role.
https://www.ibm.com/training/journeys/#tab-ibm-cloud
Additional resources (4 of 5)
• IBM Redbooks
IBM Redbooks are developed and published by
the IBM International Technical Support
Organization (ITSO). Redbooks typically provide
positioning and value guidance, installation and
implementation experiences, typical solution
scenarios, and step-by-step "how-to" guidelines.
http://www.redbooks.ibm.com/
• IBM Documentation
IBM Documentation is the primary home for IBM
product documentation.
https://www.ibm.com/docs
Additional resources (5 of 5)
• IBM Marketplace
Learn about IBM offerings for Cloud, Cognitive,
Data and Analytics, Mobile, Security, IT
Infrastructure, and Enterprise and Business
Solutions.
https://www.ibm.com/products
Unit summary
• Explain how the course met its learning objectives
• Identify IBM credentials that are related to this course
• Locate resources for further study and skill development
Course completion
You have completed this course:
Configuring IBM Cloud Pak for AIOps Event
Manager