SSR System Arch
Disclaimer
The contents of this document are subject to revision without notice due to
continued progress in methodology, design and manufacturing. Ericsson shall
have no liability for any error or damage of any kind resulting from the use
of this document.
Trademark List
Ericsson.
Contents
1 Overview
1.1 Scope
1.2 Audience
2 SSR Functional Architecture
2.1 Hardware Architecture
2.2 Software Architecture
2.3 System Redundancy
3 Architectural Support For Features
3.1 Layer 2 Cross-Connection and VPWS on SSR
3.2 Circuits
3.3 Link Aggregation Groups
3.4 Port and Circuit Mirroring
3.5 Routing
3.6 MPLS
3.7 Forwarding
Management
3.12 Event Tracking Interface
3.13 Failure Event Notification Processes
4 Administration
4.1 Accessing the SSR System Components
4.2 Configuration Management
Glossary
1 Overview
For an overview of the SSR platform with use cases, see the SSR System
Description.
1.1 Scope
This description covers the hardware, software, and functional aspects of
the product. It describes the internal functionality of the SSR and provides
a background for internal technical training. It includes a description of the
process modules and their interaction.
1.2 Audience
This document is intended for Ericsson employees in Research and
Development and Technical Support.
• Chassis
• Power modules
• Fan trays
The SSR contains the newest generation chassis controller card designed to
improve the performance and scalability of the control plane functions. The
platform takes advantage of the latest generation of Intel x86 processor, along
with higher memory densities to dramatically improve performance. SSR 8000
uses the switch fabric, and all cards in the chassis are connected to the central
switch fabric. Some key advantages of this design are a simpler backplane,
distributed intelligence, and scalability. Broadcom’s FE600 device (a 96-port
switch fabric chip) is used for the central switch fabric. Every SSR system uses
RPSW and ALSW switch cards. The 8020 system also uses SW cards.
• Eight switch fabric cards, including two RPSW cards that package a route
processor complex with the switch fabric and two ALSW cards that package
an internal switch (for control traffic) and alarm hardware.
• Backplane with 28 vertical slots divided into two stacked card cages, each
holding 10 full-height line and service cards and 4 half-height switch cards.
• Eight power entry modules (PEMs) in dedicated slots at the bottom of the
chassis. The PEMs blind mate into a custom vertical power backplane
to which the customer’s DC terminal lugs attach via bus bars. After
conditioning, the output power exits the power backplane via bus bars to
the backplane. Each supply requires an A (primary) 60-amp 54 V direct
current (DC) feed. An identical redundant B feed to the supply is Or'd inside
the PEM to provide a single load zone for N+1 redundancy. Total available
power is 14.7 kW, based on seven active 2.1 kW power supplies.
Note: Or'd power refers to two signals (DC power in this case), which are
logically combined such that the output is true if either one is true
(X= A+B). This is accomplished by connecting the two sources
with low-voltage diodes, with the higher DC voltage becoming the
output level. In contrast, with and'd power both signals must be
true for the output to be true (X=A*B).
• Chassis 38 rack units (RU) (57.75 in) high, including cable management,
which fits in a 600 mm BYB501 cabinet.
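The N+1 power budget and the Or'd feed behavior described above can be checked with a short calculation. This is a minimal sketch, not SSR software: the function names are hypothetical, the 2.1 kW rating and the PEM count come from the text, and the diode forward-voltage drop is ignored as a simplifying assumption.

```python
PEM_RATING_W = 2100  # per-PEM output from the text (2.1 kW), in watts to avoid float rounding


def available_power_w(installed_pems: int, rating_w: int = PEM_RATING_W) -> int:
    """Usable power with N+1 redundancy: one PEM is held in reserve."""
    if installed_pems < 2:
        raise ValueError("N+1 redundancy needs at least two PEMs")
    return (installed_pems - 1) * rating_w


def ored_feed_voltage(feed_a_v: float, feed_b_v: float) -> float:
    """Idealized diode-OR: the higher of the two feeds becomes the output level.

    Assumption: the diode forward-voltage drop is ignored.
    """
    return max(feed_a_v, feed_b_v)


print(available_power_w(8))          # eight PEMs installed, seven counted as active
print(ored_feed_voltage(54.0, 0.0))  # feed B lost, feed A still carries the load
```

With eight PEMs the first call yields 14700 W, which matches the 14.7 kW figure quoted for the chassis.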
The SSR 8010 chassis is a 10-slot version of the 20-slot SSR 8020 router. It
uses the same line cards, service cards, and switch cards as the SSR 8020.
It also shares the same PEMs, fan tray, cable management, and air filter.
Basically, it is an SSR 8020 without an upper card cage.
• Six PEMs in dedicated slots at the bottom of the chassis. The two rows
of three horizontally mounted PEMs blind mate into horizontal power
backplanes to which the customer’s DC terminal lugs attach via bus bars.
Each supply requires an A (primary) 60-amp 54 V DC feed. An identical
redundant B feed to the supply is Or'd internal to each supply to provide a
single load zone for N+1 redundancy. After conditioning, the output power
exits the power backplane via bus bars to the backplane. Total available
power is 10.5 kW, based on five active 2.1 kW power supplies.
• A 4-port 10GE or 20-port GE and 2-port 10GE line card, which allows up to
four 10GE XFP plug-ins or 20 SFP plug-ins and two 10GE XFP plug-ins.
• A 1-port 100GE or 2-port 40GE line card, which allows up to one 100GE or
two 40GE C form-factor pluggable (CFP) plug-ins; required for Broadband
Network Gateway (BNG) services.
• A Smart Services Card (SSC), which provides advanced services that are
beyond the scope of the terminating and forwarding capabilities provided
by the line cards. The SSC targets both control- and user-plane
applications.
Unlike a line card, an SSC does not have I/O interfaces that are used for
traffic processing. It receives all its traffic from other line cards via the
switch fabric. The SSC supports a single application per card and offers
complete installation flexibility. It occupies a single slot in the chassis and
can be plugged into any usable card slot.
From a design perspective, only two switch card variants exist, because the
SW card is a depopulated version of the ALSW card. All SSR systems require
both the RPSW and ALSW cards. The SSR 8020 system also requires SW
cards for expanded fabric capacity. The switch cards natively support 100
Gbps line card slots.
All line cards are connected to all fabric switches, as illustrated in Figure 3 for
the SSR 8020, for example.
On the SSR line cards, the fabric access processor (FAP) with Interlaken
interfaces connects to the FE600s on each switch card using a Broadcom
proprietary protocol. There are multiple serializer/deserializer (SerDes) links
(Differential High-Speed Serial Interfaces) from each line card to each FE600
in the system. Table 2 describes the configuration and throughput of the data
plane on the two chassis.
• Timing control bus (TCB) used by the ALSW cards to provide reference
clock and epoch clock support for the system.
• Selection control bus (SCB) used by the ALSW cards to arbitrate the active
route processor in the system and to communicate that information to all
switch cards.
• Common equipment, such as fan trays and PEMs, is controlled by the
active RPSW card over I2C (two-wire standard interface) buses.
The CMB is one of three similar buses used to both control and distribute
information to the other elements of the chassis. The CMB provides redundant,
low-level communication and control from the RPSW cards to the line cards,
ALSW cards, and SW cards (see Figure 5). Figure 4 displays the CMB
interconnections. The master and standby RPSW card declaration defines
the master and standby CMBs. There is no hardware switchover. If a CMB
failure is detected, the RPSW card must decide if it is still master-capable. If
not, the ALSW card declares a new master RPSW card, and the new master
CMB follows.
The bus consists of an 8 MHz clock and a bidirectional synchronous data line.
The CMB also has an active-low interrupt line sourced by the bus recipient and
received by the RPSW cards, as well as a detect/reset signal. Not all CMB
interfaces need to support all CMB functions. The CMBs between the two
RPSW cards support only a reduced set of functions.
The ALSW and SW cards use the Shiba field-programmable gate array (FPGA).
For more information, see the Shiba Functional Specification with EngDoc ID
HW-HLD-0032 (http://cdmweb.ericsson.se/WEBLINK/ViewDocs?DocumentName=62%2F15941-FCP1217270&Latest=true).
The RPSW cards support a variation of the CMB called the route processor
management bus (RPMB). The Phalanx Complex Programmable Logic Device
(CPLD) supports the RPMB with the following features on the RPSW card.
The CPLD allows a specialized feature set without compromising the original
SMB specification.
The RPSW card connects the other switch cards (ALSW and SW) through PCI
Express Gen 1 x1 lane ports, as shown in Figure 6. The PLX 8615 PCI Express
ports are enumerated as port 0 for the x4 lane Gen 2 interface to the Jasper
Forest processing chip and ports 1 through 8 for the x1 lane Gen 1 interfaces to
the switch card FPGAs on all switch cards, including the onboard FPGA. As
Figure 6 shows, all transactions to all FPGAs on all switch cards emanate from
the processor complex through the PLX 8615, which allows a homogeneous
interface for software. Port 0 is the UP port to the host and is configured as a
PCIe Gen 2 (5.0 GT/s) interface. Ports 1 through 8 are configured as a PCIe
Gen 1 (2.5 GT/s) interface.
Figure 7 illustrates the control plane GE interconnection system used for line
card and RPSW card control communication. This is separate from the fabric
and forwarding plane connections. This interface is the backplane Ethernet,
with a central Ethernet switch on each ALSW card. The line cards support
1GE links, and the RPSW cards support a 10GE link rate from the Ethernet
switch on the ALSW cards.
The Intel i82599 Dual 10GE Network Interface Controller provides a x4 lane
Gen 2 (5.0 GT/s) interface from the Ethernet switch to the Jasper Forest
processor complex. A direct control plane connects the RPSW cards through
redundant 1GE links. The Intel i82580 Quad GE Network Controller provides
a x4 lane Gen 1 (2.5 GT/s) interface from the Ethernet switch to the Jasper
Forest processor complex. The Jasper Forest processor complex is capable of
handling 3–6 Gbps data bandwidth from the Ethernet control plane. The 10GE
uplinks provide future expansion capability. One Ethernet 10/100/1000Base-T
external system management port is also provided on each RPSW card from
the i82580 Quad Ethernet LAN controller.
The TCB is a redundant bus sourced by the ALSW cards. Its primary purpose
is as a conduit for timing information between the ALSW card and the line cards
(see Figure 8). The bus commands provide support for SyncE clock distribution.
For more information, see Section 2.1.4.5 on page 13.
Redundant buses are mastered from each ALSW card in the system. Each bus
is wired as a clock and data pair, wired in a dual star pattern to each line card in
the system (see Figure 9). The epoch clocks are two synchronized counters
operating synchronously to the TCB. They are used throughout the system for
coordinated event logging. The TCB provides the capability to distribute a
synchronized epoch clock to all line card processing elements in the system.
The SCB clocks from each ALSW card are run from the local 100 MHz oscillator
used for the system FPGA clock. The data is non-return-to-zero (NRZ).
plesiochronous network using the physical layer interface. All network elements
along the synchronization path must be SyncE-enabled.
The core process on the SSR (timing control module daemon TCMd) is located
on the active RPSW card. It performs the following functions:
SyncE produces alarms when port, line card, chassis or ALSW faults are
generated. For more information about the alarms, see Alarms and Probable
Causes.
• BITS output ports have no dedicated input source selector and are always
timed from the equipment clock.
• The SSR SyncE implementation does not currently support SNMP. The
following counters and statistics are, however, available via CLI show
commands:
◦ Input source used to drive a port’s transmit timing, and the state of that
source (freerun, holdover, locked).
• SSR CLI and front end components that implement the new configuration
and monitoring functions. The configurations are in three configuration
areas: Equipment Clock, SyncE port, and BITS.
• RCM – manages and stores the configuration. The configuration splits into
two paths in RCM. SyncE port configuration follows the DSL configuration
path, through CTL_MGR, CSM, PAD, and down to the line card. The
configuration intended for TCMd, which includes Clock Selector and BITS,
flows through TCM_MGR and then directly to TCMd.
• TCMd – the main SyncE process. It controls all the timing features.
• PAD implements several new SLAPIs to configure and monitor the SyncE
port.
• CSM provides card state notifications to the TCMd, including line card OIR,
ALSW OIR, and redundancy changes. CSM is not expected to change and is
provided here for clarity.
• The TCM hardware is physically located on the ALSW card. Both Active
and Standby TCM are controlled by the Active RPSW. The active TCM
performs its function for the system. The Standby TCM is in warm standby:
• Equipment Clock PLL is synchronized to the Active output; and BITS inputs
and outputs are disabled.
Each ALSW card (active and standby) has an equipment clock Stratum-3/3E
module. The module on the active ALSW card performs monitoring of
synchronization input sources and provides the equipment clock to the
chassis. The module on the standby ALSW card waits in warm standby mode,
synchronized to the active clock.
On the standby ALSW card, BITS inputs and outputs are disabled. On the
active ALSW card:
• BITS inputs are enabled, monitored for faults, and if so configured, are
available as synchronization input sources.
Two Y-cables, one each for BITS A and BITS B, are required for BITS input
and output redundancy.
When there is no active RPSW or no active ALSW card, the system is said to
be in headless operation. Line card software detects the absence of an active
RPSW or active ALSW and takes the following compensating measures:
• If no ALSW card is present in the system, the line card switches to the
local oscillator.
Note: The SyncE function is in this degraded state during the two to three
minutes required to complete a software upgrade. During that short
period, the system operates without an active RPSW card, though
it continues to forward traffic.
The SCB is a redundant bus sourced by the ALSW cards. Its primary purpose
is to act as a conduit for RPSW mastership selection between ALSW cards.
Redundant buses are mastered from each ALSW card in the system. Each bus
is wired as a clock and data pair, wired in a dual star pattern to each line card in
the system (see Figure 11). The SCB clocks from each ALSW card are run from
the local 100 MHz oscillator used for the system FPGA clock. The data is NRZ.
The signals from each ALSW card to its mate run at a higher frequency
to ensure that the messages from each internal logic state machine are
synchronized at all times and to ensure a smaller window of uncertainty during
the selection process. The cross-coupled links run at 100 MHz and use
low-voltage differential signaling (LVDS) for both data and clock.
The SCB links to the RPSW card are slightly different because the RPSW
cards need both the IEEE 1588 clock synchronization updates and the epoch
clock updates. The ALSW cards overlay the TCB functionality onto the SCB
bus so that the RPSW FPGA must check only one interface to get all required
status and information.
Network propagated timing and building integrated timing supply (BITS) timing
are supported on the ALSW cards, which contain aggregation, selection,
and distribution logic for the system reference clock. The implementation is
compliant with SONET/SDH (ITU-T G.823/824) and SyncE (ITU-T G.8261)
standards.
The line card clocking architecture supports line-side clock recovery and
forwarding to the central timing logic and line-side transmit clocking support.
Figure 14 shows the basic block diagram for clocking support on the 40-port
GE and 10-port 10GE line cards. Figure 15 shows the diagram for clocking
support for the 1-port 100GE or 2-port 40GE line cards.
Logic to support line-side Tx clocking selects its reference from the following
input sources: a free-running oscillator, a BITS clock, or any looped-back
line-side receive clock. The clock multiplexing function resides in a line
card–specific FPGA. The SI53xx family of voltage-controlled crystal oscillators
(VCXOs) is used to provide clock smoothing and glitchless Tx clock switchover.
• CPU
◦ Native quad core (four CPUs in a single device), with 32 KB Layer 1
(L1) and 256 KB L2 cache per core
◦ Sixteen PCI Express Gen 1 and Gen 2 SerDes lanes, allocated across
the bridges
• BIOS
• USB
◦ Interfaces to CMB, SCB, epoch clock, dune fabric CPU interface, I2C
interfaces to common equipment
◦ Dual PCI Express Gen 1 x1 interface (one hard core and one soft core)
The ALSW card implements the user plane fabric (FE600 device), plus the
system’s control plane Ethernet switch, timing circuits, and alarm indicators.
Figure 17 provides a diagram of the ALSW card components.
For information about the ALSW card role in SyncE, see Section 2.1.4.6 on
page 17.
The following describes the Shiba FPGA and major block functionality.
• Provides RPSW card access to all devices on the ALSW card, including:
◦ CMB slave
Figure 18 shows the layout and components of the 10-port 10GE line card,
which is based on two NP4 NPUs. Each NPU supports five ports of 10GE and
accesses the other line cards and services cards through the FAP. The card
also has a local processor that translates high-level messages to and from the
route processor into low-level device commands on the card. This card also
has two 10GE physical interface adapters that connect with the ports.
Figure 19 illustrates the 40-port GE line card. It is based on a single NP4 NPU
that supports all 40 ports of GE and accesses the other line and services cards
in the system through the FAP. The card also contains a local processor and a
GE physical interface adapter connecting the ports.
This high capacity card is based on two NP4 NPUs that process GE traffic,
running in simplex rather than duplex. The card also contains a local processor
and a GE physical interface adapter connecting the ports. The card accesses
the other line and services cards in the system through the FAP.
Only supported CFPs are allowed to power up. Also, hardware memory DIMMs
are now ‘keyed’ so that only approved DIMMs are allowed on the card. The
card does not boot with unapproved DIMMs.
This card can be configured to run in 40Gb or 100Gb mode using the card
mode command. You must reload the router to switch from one mode to the
other.
Figure 21 illustrates the 4-port 10GE, or 20-port GE and 2-port 10GE line card,
which supports BNG application services on the SSR.
The line interface uses pluggable SFPs for GE bandwidth or pluggable XFPs
for 10GE bandwidth.
This card has two forwarding complexes, iPPA3LP (ingress) and ePPA3LP
(egress), which provide 40G forwarding bandwidth. The control path is based
on the same LP (Freescale™ PowerPC™ MPC8536) as the NP4 line cards,
but it runs at a higher frequency (1.5 GHz) on this card, and is equipped with
4 MB of RAM. The PPA3 clock rate is also set at the maximum of 750 MHz to
improve data path forwarding performance. Flow control is supported by the
(Vitesse) Ethernet MAC. The FANG FPGAs associated with the IPPA3 and
EPPA3 NPUs provide interfaces with the Fabric through the FAP, carrying
20G of data throughput.
• Proxied—Modules that are not PFE aware do not talk directly with PPA, but
communicate through the PLd, which proxies messages in both upstream
and downstream directions.
The PLd therefore has a proxy thread and associated endpoint for every
feature that it proxies. Since there are two PFEs, the registration messages
are proxied and multiplexed into a single registration by PLd.
Configuration messages for a given feature are sent to the PLd thread
proxying that feature. The PLd determines the target PPA based on the
PFEid in the IPC header (some messages are sent to both PFEs).
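The proxy dispatch described above can be pictured as a small routing step. The following is an illustrative sketch only: the numbering of the two PFEs, the `BOTH_PFES` sentinel, the header layout, and the function names are all assumptions and do not come from the PLd implementation.

```python
from typing import Dict, List

PFE_IDS = (0, 1)  # two PFEs per card, per the text; the numbering is assumed
BOTH_PFES = -1    # hypothetical sentinel meaning "deliver to both PFEs"


def target_pfes(pfe_id: int) -> List[int]:
    """Map the PFEid from an IPC header to the PFE(s) that receive the message."""
    if pfe_id == BOTH_PFES:
        return list(PFE_IDS)
    if pfe_id in PFE_IDS:
        return [pfe_id]
    raise ValueError(f"unknown PFEid {pfe_id}")


def proxy_downstream(header: Dict[str, int], payload: bytes,
                     queues: Dict[int, List[bytes]]) -> None:
    """Forward one configuration message to the queue of each target PPA."""
    for pfe in target_pfes(header["pfe_id"]):
        queues[pfe].append(payload)


queues: Dict[int, List[bytes]] = {0: [], 1: []}
proxy_downstream({"pfe_id": 1}, b"cfg-a", queues)          # one PFE only
proxy_downstream({"pfe_id": BOTH_PFES}, b"cfg-b", queues)  # both PFEs
```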
When circuit information comes from the IP OS to the PPA3LP (or PFE), the
method to indicate that the circuit should be added or deleted is as follows:
◦ 8 cores
◦ 20 MB L3 cache
◦ 2 MB L2 cache
• Eight memory channels with eight dual in-line memory module (DIMM)
sockets
◦ DDR3L-1333 MHz
◦ AMC.3 storage
• FAP
2.1.11 Backplane
The SSR 8000 backplane links together the different components of the SSR
chassis infrastructure. The backplane supports the major components in the
chassis, including the line cards, switch cards, PEMs, fan trays, EEPROM
card, and power backplane.
The SSR system uses the double-star architecture in which all line cards
communicate with all switch cards. The chassis supports each line card
interfacing with up to eight switch cards (see Figure 3), and the communication
between them relies on the backplane traces. Any line card slot can host either
a line card or a service card. Another important function of the backplane is
to distribute power from the PEMs through the power backplane to the entire
system.
The cards are vertically aligned in the front of the backplane. PEMs plug
into the power backplane located in the bottom part of the chassis. Bus bars
transfer power from the power backplane to the backplane, which distributes
the power to all cards and fan trays in the system.
The SSR 8020 has eight PEMs in dedicated slots at the bottom of the chassis,
and the SSR 8010 has six. The PEMs blind mate into a custom vertical power
backplane to which the customer’s DC terminal lugs attach via bus bars. After
conditioning, the output power exits the power backplane via bus bars to the
backplane. Each supply requires an A (primary) 60 amp 54 V DC feed. An
identical redundant B feed to the supply is Or'd inside the PEM to provide a
single load zone for N+1 redundancy. Total available power is 14 kW based on
seven active 2 kW power supplies.
Each PEM is equipped with status LEDs on the front surface to the right
of the inject/eject lever. See SSR 8000 Power Entry Modules for definitions of
the LED states.
The SSR fan tray is under command of the system’s 1+1 redundant RPSW
controller cards. Each RPSW card interfaces to the fan tray with a dedicated I2C
bus, each augmented with reset, interrupt request, and insertion status signals.
The SSR fan tray incorporates a controller board. The functions of the controller
board are:
The Thermal Manager varies the speed of the fans in response to thermal
events reported by the service layer. The thermal events are based on
temperatures reported by the cards installed in the chassis. There are four card
thermal states: Normal, Warm, Hot, and Extreme. There are two fan speeds:
High (full speed) and Low (40% of full speed). The fans speed up to full speed
when the temperature changes from Normal to any of the other three states.
For example, the fans speed up from Low to High if the temperature changes
from Normal to Warm, Hot, or Extreme. If the temperature goes to Normal
from Extreme, Hot, or Warm, the fan state changes to the Hysteresis state,
which responds to past and current events, and a 10-minute hysteresis timer
starts. At the end of that time period, the fans slow to Low speed, unless the
temperature goes up during that period. If the temperature goes up, the timer is
cleared, and the fans stay at High speed.
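The fan speed behavior described above amounts to a three-state machine. The sketch below is a hypothetical model, not the Thermal Manager code: the class and method names are invented, time is passed in as seconds by the caller, and a single aggregated card thermal state is assumed rather than one state per card.

```python
from enum import Enum
from typing import Optional

HYSTERESIS_SECONDS = 10 * 60  # 10-minute hysteresis timer from the text


class FanState(Enum):
    LOW = "low"                # 40% of full speed
    HIGH = "high"              # full speed
    HYSTERESIS = "hysteresis"  # still at full speed, waiting to slow down


class ThermalManager:
    """Hypothetical model; names are not taken from the SSR software."""

    def __init__(self) -> None:
        self.fan_state = FanState.LOW
        self.timer_deadline: Optional[float] = None  # when the hysteresis timer expires

    def on_thermal_event(self, card_state: str, now: float) -> None:
        """Handle a card thermal state: 'normal', 'warm', 'hot', or 'extreme'."""
        if card_state != "normal":
            # Any elevated state forces full speed and clears a running timer.
            self.fan_state = FanState.HIGH
            self.timer_deadline = None
        elif self.fan_state is FanState.HIGH:
            # Back to Normal: keep full speed but start the hysteresis timer.
            self.fan_state = FanState.HYSTERESIS
            self.timer_deadline = now + HYSTERESIS_SECONDS

    def tick(self, now: float) -> None:
        """Slow the fans once the hysteresis timer has expired."""
        if self.fan_state is FanState.HYSTERESIS and now >= self.timer_deadline:
            self.fan_state = FanState.LOW
            self.timer_deadline = None


tm = ThermalManager()
tm.on_thermal_event("hot", now=0.0)      # Normal to Hot: fans go to High
tm.on_thermal_event("normal", now=60.0)  # back to Normal: hysteresis timer starts
tm.tick(now=120.0)                       # timer still running, fans stay at full speed
tm.tick(now=660.0)                       # timer expired, fans drop to Low
```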
Fan failure detection notes when speeds deviate more than 15 percentage
points from the commanded set point. Failure modes:
• If a fan fails, the rest of the fans in the same fan tray run at full speed (fault
speed).
• If I2C communication with the host is lost, the watchdog timer expires,
and all fans run at full speed.
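The failure rule above reduces to a threshold comparison. In this minimal sketch the function names are hypothetical and fan speeds are assumed to be expressed as a percentage of full speed; for simplicity the failed fan is also commanded to full speed along with the rest of the tray.

```python
from typing import List

FAILURE_THRESHOLD_POINTS = 15  # percentage points, from the text


def fan_failed(commanded_pct: float, measured_pct: float) -> bool:
    """True when the measured speed is more than 15 points from the set point."""
    return abs(measured_pct - commanded_pct) > FAILURE_THRESHOLD_POINTS


def tray_set_points(commanded_pct: float, measured_pcts: List[float]) -> List[float]:
    """If any fan in the tray has failed, command every fan to full (fault) speed."""
    if any(fan_failed(commanded_pct, m) for m in measured_pcts):
        return [100.0] * len(measured_pcts)
    return [commanded_pct] * len(measured_pcts)
```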
The SSR runs the Ericsson IP Operating System that is built around the
Linux OS. It uses the Linux kernel to implement basic services, such as
process management, scheduling, device management, and basic networking
functionality, as well as to provide some of the functionality of the system (like
ping and traceroute). Although the operating system routing stack depends on
many Linux services, this dependence is not visible to the operator.
The operating system provides general interfaces for configuring and interacting
with the system that are OS-independent, such as the command-line interface
(CLI), Simple Network Management Protocol (SNMP), and console logs. Even
OS-specific information, like lists of processes and counters, is displayed
through the CLI in an OS-independent way. As a result, the operator does
not have to interact with the Linux OS directly or even be aware that the OS
used is Linux. However, it is possible to get access to the Linux shell and
directly perform Linux operations. This is intended to be done only by support
personnel, because it provides superuser access, and doing something wrong
can bring the system down. Also, there is not much additional information that
can be extracted through the Linux shell when compared to the information
provided in the CLI. Such sessions are typically used for internal debugging in
our labs.
2.2.2.1 AAA
AAA is not directly involved with the PFE resource management, but instead
communicates with PPA through IPC. For example, AAA provisions service
traffic volume limits directly to PPA and receives the volume limit exceed
events from PPA. AAAd will be one of the users of the new AFE layer. The
2.2.2.2 ARPd
ARP entries are maintained in a database residing on the control plane. They
are maintained in the form of adjacencies that associate an identifier (the
adjacency ID) with a resolved ARP entry (associating a context-specific IPv4
address with a given MAC address).
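The adjacency model described above can be sketched as a table keyed by context and IPv4 address. This is an illustration under stated assumptions: the class, its methods, and the ID allocation scheme are hypothetical and do not describe the ARPd database.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ArpEntry:
    context: str                # routing context that scopes the address
    ip: ipaddress.IPv4Address
    mac: str


class AdjacencyTable:
    """Hypothetical control-plane store; not the ARPd data structure."""

    def __init__(self) -> None:
        self._next_id = 1
        self._by_id: Dict[int, ArpEntry] = {}
        self._by_key: Dict[Tuple[str, ipaddress.IPv4Address], int] = {}

    def resolve(self, context: str, ip: str, mac: str) -> int:
        """Record a resolved entry and return its adjacency ID (reused on update)."""
        key = (context, ipaddress.IPv4Address(ip))
        adj_id = self._by_key.get(key)
        if adj_id is None:
            adj_id = self._next_id
            self._next_id += 1
            self._by_key[key] = adj_id
        self._by_id[adj_id] = ArpEntry(context, key[1], mac)
        return adj_id

    def lookup(self, adj_id: int) -> ArpEntry:
        """Return the resolved ARP entry behind an adjacency ID."""
        return self._by_id[adj_id]
```

Because the key includes the context, the same IPv4 address in two contexts receives two distinct adjacency IDs.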
You can explicitly configure a static ARP entry using the ip host command.
Multiple entries can be defined per port.
When two SSRs are configured in ICR pairs, ARPd on both peers can
communicate and synchronize their ARP caches. To enable this, enter the ip
arp sync icr command on an ICR interface. The feature works with BGP-based
ICR, Multi-Chassis LAG (MC-LAG), and VRRP ICR if ARP synchronization is
enabled. When it is enabled, the ARP daemon becomes a client of ICRlib and
uses it to communicate with ARPd on the ICR peer chassis. ARPd on the
active and standby peers sends application messages over ICRlib with ARP
entries to be added or deleted.
2.2.2.3 BGP
The BGP module is based on a strict user thread (pthread) model, with the
exception that the keepalive thread runs at a higher priority than the rest. This
is in contrast with other daemons in which all threads run at the same priority.
• The BGP daemon requests and releases policies from the Routing Policy
Manager (RPM) daemon using the Routing Policy Library (RPL) API. BGP
uses every policy supported by the RPM, and much of the RPM function
is specific to BGP. For example, community lists, extended community
lists, autonomous system (AS) paths, and much of the route map function
are used only by BGP.
• BGP installs both IPv4 and IPv6 routes in the Routing Information Base
(RIB). For the case of RFC 4364 or 6VPE VPNs, the nonconnected next
hop can be a label-switched path (LSP). BGP monitors next hops for RIB
resolution and supports Bidirectional Forwarding Detection (BFD) for peer
failure detection.
• MPLS labels allocated by BGP are downloaded to the Label Manager (LM).
BGP allocates labels for both RFC 4364 VPNs and 6VPE VPNs.
• BGP registers with the Interface and Circuit State Manager (ISM) for all
interfaces in a context in which a BGP instance is configured. It also
registers for port events associated with that interface.
When the SSR node is used as a BGP route-reflector, you can conserve
memory and CPU resources on the controller card and line cards by filtering BGP
routes to reflect routes to its iBGP clients. This is useful if BGP routes do
not need to go to the line cards (for example, when the router is not in the
forwarding path toward the BGP route destinations). To reduce the size of the
RIB and FIB tables, you can filter which routes are downloaded from BGP to
the RIB and FIB before being advertised to peer routers.
Note: To avoid the risk of dropped traffic, design networks so that the routes
that are advertised by the router with this feature enabled do not
attract traffic. This option is not well suited for cases in which the
route-reflector is also used as a PE or ASBR node.
2.2.2.4 CFMd
• Path Discovery via Link trace Message and Reply (LTM and LTR)
2.2.2.5 CLI
The SSR Execution CLI (EXEC-CLI) is the primary user interface with the
system. This is a multi-instance process that runs one instance for each CLI
connection to the system.
The CLI is a data-driven state machine using a parse tree (or parse chain) to
define the various system modes. A parse tree is a collection of parse nodes,
each linked together to form the tree. Each node in the parse tree defines
the keyword or token type that can be accepted for a specified command-line
input. The CLI parser has several parse trees. Each parse tree is defined as a
mode. The two main modes are exec mode and config mode. Exec mode is
used for examining the state of the system and performing operations on the
node. Config mode is used for changing the configuration of the box. Each
mode has several nested submodes.
A parser control structure contains the state of the parser, information for the
current mode, user input, and arguments that have been parsed.
The parser starts parsing a command by starting at the first node, which is
the root of the parse tree for the current mode stored in the parser control
structure. The root is pushed onto the parser stack (LIFO), and the parser
loops until the parser stack is empty. The loop pops the node on top of the
stack and calls the token-dependent parsing function. If that token type has
an alternate transition, the token function first pushes the alternate transition
onto the parser stack without consuming any of the input. The parser tries all
the alternates, attempting all tokens at a given level in the tree. The token
function then attempts to parse the token. If the token is parsed successfully,
the token function consumes the input and pushes the accept transition onto
the parser stack.
The leaves in the tree can be one of three types: an EOL token or the special
nodes CLI_NONE and CLI_NOALT. The EOL token signifies the acceptance of
the command. After reaching EOL and parsing successfully, the parser saves
the parser control structure. If parsing has finished successfully, the parser
takes the saved parser control structure and calls the function pointed to by the
EOL macro to execute the command.
The NONE corresponds to CLI_NONE and marks the end of the alternate
nodes of the current branch, but not the end of the alternate nodes for the
current level. The NOALT corresponds to CLI_NOALT and marks the end of all
alternate nodes, both for the current branch and the current level. The parser
distinguishes between the two to detect ambiguous commands.
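The stack-driven parse loop described above can be sketched in a few lines. This is a simplified illustration, not the EXEC-CLI parser: the `Node` fields and the `parse` function are hypothetical, keywords must match exactly (no abbreviations), each stack entry carries its own input position so that alternates start from the same unconsumed input, and the CLI_NONE/CLI_NOALT ambiguity detection is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One parse node. Field names are illustrative, not the real structure."""
    keyword: Optional[str] = None       # None marks an EOL leaf
    accept: Optional["Node"] = None     # next node when the token matches
    alternate: Optional["Node"] = None  # sibling to try at the same level
    action: Optional[Callable[[List[str]], str]] = None  # called at EOL


def parse(root: Node, line: str) -> Optional[str]:
    tokens = line.split()
    saved: Optional[Tuple[Node, List[str]]] = None
    stack: List[Tuple[Node, int]] = [(root, 0)]
    while stack:
        node, pos = stack.pop()
        if node.alternate is not None:
            stack.append((node.alternate, pos))   # pushed without consuming input
        if node.keyword is None:                  # EOL leaf
            if pos == len(tokens):
                saved = (node, tokens)            # save state, execute after the loop
        elif pos < len(tokens) and tokens[pos] == node.keyword:
            if node.accept is not None:
                stack.append((node.accept, pos + 1))  # consume token, follow accept
    if saved is None or saved[0].action is None:
        return None                               # no branch reached EOL
    node, args = saved
    return node.action(args)


# A tiny exec-mode tree with two hypothetical commands.
eol_version = Node(action=lambda args: "version action")
eol_clock = Node(action=lambda args: "clock action")
clock = Node(keyword="clock", accept=eol_clock)
version = Node(keyword="version", accept=eol_version, alternate=clock)
root = Node(keyword="show", accept=version)

print(parse(root, "show clock"))
```

Pushing the alternate before the accept transition means the accept branch is explored first, while the alternates at the same level are still tried afterwards.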
After parsing a command, the parser calls the action routine, which executes
the command. The CLI control structure contains information about the parsing
of the command. This lets the action routine know if the default or no option
was added at the beginning of the command. Also, certain parser macros store
values in the control structure. For example, numbers, strings, or addresses,
based on the tokens that were matched for this command, are stored in the
arguments inside the control structure. The action routine uses these values to
interact with the system and execute the specified action.
The CLI has a limited set of interactions with the system, because all access
is controlled through a data communication layer (DCL). The majority of all
DCL calls directly interact with the RCM, except in situations where a direct
connection is needed.
The Ericsson CLI helps Ericsson platforms running the Ericsson IP Operating
System to provide common information models and operation, administration,
and maintenance (OAM) components across all network elements (NEs).
The Ericsson Common Information Model (ECIM) that is common among
all Ericsson NEs includes logical models for OAM functions, such as fault
management and equipment management. The operating system uses
the ECIM and Common Operations and Management (COM) to supply an
OAM solution to all platforms running the OS. For example, the MPLS-TP
provisioning on the SSR supported by the OAM solution is the same MPLS-TP
it supports on other NEs.
• OAM support (NETCONF and CLI) used for platform applications, such as
for Enhanced Packet Gateway (EPG).
2.2.2.6 CLIPS
2.2.2.7 CLS
Classifier (CLS) is the module that handles ACL and access group processing.
Initially, CLS receives access groups from the RCM. The access groups
determine which ACLs are retrieved from the RPM. The RPM then pushes this
information to CLS, where it is processed into CLS data structures. Although
the ACLs are not in their original format, the ACLs are considered to be
processed into CLS raw format. This format is not suitable for download to the
line card, but is used within CLS to enable easier analysis of the ACLs for
building. Another level of processing is required to transform the ACLs into a
format for transfer to the line card. Platform-dependent libraries and capabilities
process the raw format ACLs into a platform-specific format that is easily
transferred to the line card. If no processing libraries exist, a default processor
creates ACL rule sets as a basic array of a rules data structure. The rules data
structure is a globally defined structure that is understood by all platforms.
• QoSMgr informs CLS when a circuit binds to a QoS policy that uses an
ACL.
• In the case of forward policies that are configured on a link group, QoSMgr
informs CLS of grouping and ungrouping. It also provides a slot mask
for link aggregation group (LAG) pseudocircuits so that appropriate ACL
provisioning can occur.
Every operation that occurs in the database must either completely finish or
appear as if it was not started at all. An operation cannot be left half done;
otherwise, it corrupts the database. This means that every operation in RDB
must be atomic.
As changes occur to data in the database, all the user's operations are saved
into a transaction log instead of being performed directly to the database.
When users have completed modifications, they can issue either an abort or
a commit of the transaction. The abort operation removes the transaction log
and all related locks, leaving the database in its prior consistent state. When a
commit is issued, it must be performed to the database in one atomic operation.
Because a transaction log can contain numerous different modifications, the
transaction log must remain persistent to ensure that it is always completely
performed.
The database has no knowledge of the type of data that it contains within its
memory. All records are represented as a combination of a key and a data
buffer. When a user modifies a record within a transaction, a lock is created for
that record to prevent other users from modifying it at the same time. If another
user accesses a locked record, the transaction is blocked until that record is
released or until the initial user rolls back access to the record. This locking
guarantees a one-to-many relationship between transactions and records. A
record can belong to at most one transaction, and no two transactions can
be modifying the same record at one time. This preserves atomicity during a
commit operation, because each committed transaction is guaranteed to modify
only the records that are in its control.
When a transaction completes, all locks that were created for it are removed,
and the transactional memory is reclaimed. From the point of view of any other
user, the transaction was committed in one operation, because locks prevented
access to every record in the log until the transaction completed.
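The transaction log and record locking described above can be illustrated
with a small Python sketch. It is a toy model under stated assumptions: the
class names Database, Transaction, and LockedError are hypothetical, records
are held in an in-memory dictionary, and a conflicting access raises an
exception instead of blocking as the real database does.

```python
class LockedError(Exception):
    """Raised when a record is already locked by another transaction."""


class Database:
    """Toy store of key/data-buffer records with per-record locks."""

    def __init__(self):
        self.records = {}   # key -> data buffer (bytes)
        self.locks = {}     # key -> Transaction that owns the lock

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.log = []       # ordered (key, value); value None means delete

    def write(self, key, value):
        owner = self.db.locks.get(key)
        if owner is not None and owner is not self:
            # Assumption: raise instead of blocking, to keep the sketch short.
            raise LockedError(key)
        self.db.locks[key] = self       # a record belongs to one transaction
        self.log.append((key, value))   # logged, not yet applied

    def _release(self):
        for key in [k for k, t in self.db.locks.items() if t is self]:
            del self.db.locks[key]
        self.log = []

    def abort(self):
        """Drop the log and locks; the database keeps its prior state."""
        self._release()

    def commit(self):
        """Apply the whole log, then release the locks."""
        for key, value in self.log:
            if value is None:
                self.db.records.pop(key, None)
            else:
                self.db.records[key] = value
        self._release()
```

Because other transactions cannot touch a locked record until the locks are
released, they observe either none or all of the logged changes.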
The database is kept redundant across controllers using the same two-phase
commit procedure for applying the transaction logs to the persistent storage. If
a redundant controller is present, the transaction log is first replicated to the
standby and committed, before it is committed on the active. This ensures that
information is not distributed until it is guaranteed to be redundant.
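The standby-first ordering can be sketched as follows. This is an
illustration only: the Store class and the redundant_commit function are
hypothetical, and a failure is modeled as an exception raised by the standby
store.

```python
class Store:
    """Hypothetical persistent store on one controller card."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.records = {}

    def apply(self, log):
        if self.fail:
            raise OSError(f"{self.name}: commit failed")
        for key, value in log:
            self.records[key] = value


def redundant_commit(log, active, standby=None):
    """Commit the log on the standby first (if present), then on the active."""
    if standby is not None:
        # If this raises, the active store has not been modified.
        standby.apply(log)
    active.apply(log)
```

With this ordering, data that is visible on the active controller has already
been stored on the standby controller.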
The database is a library, but requires many threads to perform the tasks
needed.
2.2.2.9 CSM
need to have a platform-dependent portion. Instead, they use the hooks in the
CMA API to interface directly with the target-specific processes (such as PAd).
CMA abstracts the details of the chassis so that the other software that is
involved in chassis management (mostly CSM) can be generic and portable to
other chassis architectures and types. This is not a separate process but rather
a library that is linked with the process that needs the abstraction.
It is also used for Clientless IP Service Selection (CLIPS), interacting with the
CLIPS daemon to appropriately assign IP addresses.
2.2.2.11 DOT1Qd
The following figure illustrates the DOT1Qd interactions with other modules:
2.2.2.12 ESMC
The ESMC PDU is composed of the standard Ethernet header for a slow
protocol, an ITU-T G.8264 specific header, a flag field, and a type length value
(TLV) structure.
The NPU driver provides the data path for ESMC PDU punting and insertion.
Fabric Manager (running on the RPSW card) configures the switch fabric and
monitors its performance. Fabric Manager performs the initial configuration of
the fabric when the system starts and changes the fabric configuration when
fabric cards fail or RPSW switchovers occur.
2.2.2.14 FLOWd
FLOWd controls which circuits have packet flow classification enabled and what
their classification attributes are, and it provides an infrastructure for
enabling and controlling the classification of packets into flows. Along with
FLOWd, the IPFIX daemon (IPFIXd) uses profiles to control its operations, and
some of these profiles need to be delivered to the appropriate PFEs. FLOWd
provides the infrastructure for delivering these profiles and related messages
to the PFEs, as well as for automatically applying default attributes to
circuits that are configured for IPFIX but have not been explicitly configured
for flow.
The Ericsson IP operating system control plane modules see the forwarding
plane as a set of line cards with a unique slot number per card. Each card
is split into ingress and egress functionality. The service layer (basic and
advanced) configures the forwarding plane in terms of logical functional blocks
(LFBs), with an API set for each forwarding block or LFB. An LFB is a logical
representation of a forwarding function, such as:
See the line card architecture diagrams in Section 2.1.9 on page 27. The FABL
and ALd configure these card resources for fast path forwarding (data packets
processing through the forwarding engine) and might also help in slow path
functionality, that is, functionality that requires special handling of packets,
such as ICMP, VRRP, or BFD.
Ingress
4 Services such as ACL, QoS policing, and policy-based routing (PBR) are
applied on circuits. If any of these services are applied, the packet is
submitted for special processing. Otherwise, the packet is forwarded to
the next stage.
6 FIB—Contains the best routes that RIB downloads to the forwarding plane.
Each line card maintains its own FIB to make routing decisions. For IP
packets, FIB is used for longest prefix match routing. The circuit determines
in which context (or which FIB instance) the packet lookup is done.
8 LFIB—Label FIB is used to look up MPLS labeled packets for further MPLS
actions (SWAP, POP, PUSH, PHP).
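The longest prefix match used for the FIB lookup can be illustrated with a
short Python sketch. It is not the forwarding-plane implementation (which
runs on the NPU); the lookup function and the dictionary-based FIB keyed by
context name are assumptions made for illustration, and the linear scan is
chosen for clarity rather than speed.

```python
import ipaddress


def lookup(fibs, context, destination):
    """Return the next hop of the longest matching prefix, or None.

    fibs maps a context name to that context's FIB, which maps
    prefix strings to next-hop strings (both hypothetical layouts).
    """
    dest = ipaddress.ip_address(destination)
    best_len = -1
    best_next_hop = None
    for prefix, next_hop in fibs[context].items():
        network = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if dest in network and network.prefixlen > best_len:
            best_len = network.prefixlen
            best_next_hop = next_hop
    return best_next_hop
```

The circuit selects the context, and therefore the FIB instance, before the
lookup is done. In the sketch this corresponds to the context argument.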
SSC
• IPv4 tunnel next hops, including IPv4 GRE and IPv4 in IPv4 next
hops
Egress
2.2.2.16 Healthd
The Health Monitoring daemon (Healthd) monitors the health of the system
based on the Unit Test Framework (UTF). It inherits several benefits from the
UTF (including a fully scriptable Python interface with a potential SWIG C/C++
interface). Healthd is composed of four major functional components:
For more information about the Healthd feature, see Section 5.1 on page 228.
2.2.2.17 HR
• AAA receives the subscriber provisioning details and sends them to RCM.
HR listens to the socket bound to the local port 80, waiting for packets.
• When a subscriber attempts to send HTTP traffic, the ingress PFE forwards
the traffic to the local port 80.
• HR uses the packet's circuit information (URL, message, and timeout value)
to construct an HTTP REDIRECT message to return to the subscriber.
When the redirect is successful, HR informs AAA to remove the policy from
the circuit, or add an ACL that allows access.
2.2.2.19 ISM
Interface State Manager (ISM) monitors and disseminates the state of all
interfaces, ports, and circuits in the system. ISM is the common hub for
Ericsson IP Operating System event messages. ISM records can display
valuable troubleshooting information using the show ism command. For
information about interpreting the various forms of the command, see Section
5.10 on page 264.
ISM receives events from the Card State Manager (CSM), the Interface
Manager (IFmgr) in RCM, or from media back ends (MBEs). Each component
creates events and sends them to ISM. For a component to listen to the events
that ISM receives, it must register as an ISM client.
The CSM and IFm components talk to a special ISM endpoint that takes a
configuration type message and converts it to an ISM event for processing.
CSM announces all port events, while IFm announces all interface events and
static circuit creation and initial configuration.
MBEs in the system talk to ISM through the MBE endpoint. Before an event
can be sent to ISM, an MBE must register using a unique and static MBE
identifier. After an MBE has registered with ISM, it can send any type of event
to announce changes to the system. ISM takes all events received from all
MBEs and propagates these events to interested clients. Clients must register
with ISM using the client endpoint. This registration also includes the scope of
which circuit and interface events a client is interested in. Registration reduces
the overhead that ISM has of sending every event to every registered client. A
client can be registered with as many different scopes as needed.
When ISM receives an event, it marks the event as received and passes the
event to interested clients. ISM tries not to send duplicate events to a client,
but if it does, the client must handle the duplication. ISM sends events in a
specific order, starting first with circuit events and followed by interface events
in circuit/interface order. All circuit delete events are sent before any other
circuit events, and all interface delete events are sent before any other interface
events. This order is to ensure that deleted nodes are removed from the system
as quickly as possible, because they might interfere with other nodes trying to
take their place.
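The delivery order described above can be expressed as a sort key. The
following Python sketch is illustrative only: the Event tuple and the
delivery_order function are hypothetical names, and real ISM events carry
far more state.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    kind: str     # "circuit" or "interface"
    action: str   # "delete", "create", "update", ...
    name: str


def delivery_order(events: List[Event]) -> List[Event]:
    """Circuit events before interface events; deletes first within each."""
    def rank(event: Event):
        kind_rank = 0 if event.kind == "circuit" else 1
        action_rank = 0 if event.action == "delete" else 1
        return (kind_rank, action_rank)

    # sorted() is stable, so events with equal rank keep their arrival order.
    return sorted(events, key=rank)
```

For example, an interface create, a circuit update, an interface delete, and
a circuit delete received in that order are delivered as circuit delete,
circuit update, interface delete, interface create.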
For examples of the role of ISM in system processes, see the BNG session
diagrams in Section 3.8.2 on page 151.
2.2.2.20 IS-IS
Like the OSPF modules, the IS-IS module does most of its work in a global
worker thread running the dispatcher. Additionally, the dispatcher thread
receives IPC messages from other daemons and processes them using the IPC
task dispatcher capability.
The IS-IS MO thread handles configuration, clear, and show messages and
runtime data structures in the MO thread. A mutex is used to avoid data
structure contention problems. This mutex is tied into the dispatcher library.
IS-IS non-stop routing (NSR) is not enabled by default. You can enable it
with the nonstop-routing command in IS-IS configuration mode. To verify
that IS-IS information is being synchronized to the standby RPSW card, you
can use the show isis database, show isis adjacency, and show
isis interface commands on the active and standby RPSW cards. To
support NSR, pre-switchover and pre-restart adjacencies must be maintained, so
the data needed to maintain an adjacency must be synchronized from the active
to the standby RPSW card. When IS-IS NSR is enabled, each neighbor’s MAC
address is synchronized so that IIH packets containing the neighbor’s MAC
address can be sent out. To support this:
• IPC communicates between the active IS-IS process and the standby IS-IS
process and between the standby ISM and the standby IS-IS process.
• The IS-IS process on the standby RPSW controller card is always started
when there is IS-IS configuration, and the IS-IS endpoints are open. As a
result, the standby IS-IS process registers with all open endpoints on the
standby RPSW card.
• The standby IS-IS process also receives and processes information from
the standby RCM process.
2.2.2.21 L2TP
component and the backend daemon (L2TPd). The L2TP RCM component
manages all L2TP related configuration within the configuration database, and
sends the details to L2TPd, which communicates them to the other modules.
2.2.2.22 LDP
During the LDP session, LSRs send LDP label mapping and withdrawal
messages. LSRs allocate labels to directly connected interfaces and learn
about labels from neighbors. If a directly connected interface is shut down, an
LSR withdraws the label and stops advertising it to its neighbors. If a neighbor
stops advertising a label to an LSR, the label is withdrawn from the LSR's
Label Forwarding Information Base (LFIB). Teardown of LDP adjacencies or
sessions results if Hello or keepalive messages are not received within the
timeout interval.
• LM—LDP installs LDP LSPs in LM, and LM communicates with LDP for
L2VPN/VPLS/Port-Pw PW bring up.
• RIB—LDP registers with RIB for route redistribution and prefix registrations.
2.2.2.23 LGd
Link Group Daemon (LGd) is responsible for link aggregation and running the
Link Aggregation Control Protocol (LACP).
• FABL and ALd—The control plane sends and receives control packets to
or from line cards via the kernel. At the line card, FABL receives outgoing
packets from the kernel, and sends incoming packets to the kernel. FABL
APIs are defined to get packets between FABL and the ALd. NP4 driver
APIs are defined to get packets between the ALd and the NPU.
• DOT1Q CLI
• DOT1QMgr
• DOT1Qd
• Clients—Components that are registered with ISM for link group
information. This includes the line card PFEs. For example, label
manager (LM) registers with ISM to receive LAG group messages, level-1
pseudocircuits, and level-2 pseudocircuits (802.1Q PVCs).
For details about the LG information flow, see Section 3.3.6 on page 99.
The Label Manager (LM) is the Ericsson IP Operating System daemon that
manages label requests and reservations from various MPLS protocols such
as LDP and RSVP, and configures LSPs and PWs in the system. It installs
LSPs and Layer 2 routes in RIB next-hop label forwarding entry (NHLFE). It
provisions the LFIB entries on the ingress side, and the MPLS adjacencies
on the egress side in the forwarding plane. It also handles MPLS-related
operator configurations and MPLS functionality, such as MPLS ping and MPLS
traceroute. L2VPN functionality is handled in LM, including configuration and
setting up of PWs. Virtual private wire services (VPWSs) or virtual leased lines
(VLLs) use a common framework for PW establishment.
The SSR supports a single, platform-wide label space that is partitioned per
application (for example, LDP and RSVP). An LM library that facilitates label
allocation is linked per application. Applications install the allocated labels in
LM using the LM API.
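A per-application partition of a single label space can be sketched as
follows. This is an assumption-laden illustration, not the LM library: the
LabelSpace class and the label ranges in the usage example are hypothetical.

```python
class LabelSpace:
    """One platform-wide label space, split into per-application ranges."""

    def __init__(self, partitions):
        # partitions: application name -> (first, last), inclusive.
        ordered = sorted(partitions.values())
        for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
            if next_first <= prev_last:
                raise ValueError("partitions overlap")
        self._ranges = dict(partitions)
        self._next = {app: first for app, (first, _) in partitions.items()}
        self._free = {app: [] for app in partitions}

    def allocate(self, app):
        """Return an unused label from the application's own partition."""
        if self._free[app]:
            return self._free[app].pop()
        _, last = self._ranges[app]
        label = self._next[app]
        if label > last:
            raise RuntimeError(f"label partition for {app} is exhausted")
        self._next[app] = label + 1
        return label

    def release(self, app, label):
        """Return a label to the application's partition for reuse."""
        first, last = self._ranges[app]
        if not first <= label <= last:
            raise ValueError("label is outside the application's partition")
        self._free[app].append(label)
```

For example, LabelSpace({"ldp": (16, 17), "rsvp": (18, 19)}) gives LDP and
RSVP disjoint ranges, so the two applications can never hand out the same
label.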
The LM also handles PWs. The SSR supports PWs in L2VPNs (also known
as VPWS and VLLs) that provide Layer 2–emulated services such as Ethernet
VLANs over IP/MPLS packet-switched networks.
• RIB—The LM installs LSP routes and L2 routes through RIB. It also queries
RIB for next hops.
• LM stores the adjacency IDs for label next hops (ingress label map (ILM)
entries) and LSP next hops (FTN entries) in the shared memory so that
LM can retrieve them after an LM process is restarted or when an RPSW
switchover occurs.
The MPLS module, most of which is managed by the label manager (LM)
module, is responsible for programming the forwarding plane with label
information as well as managing the label allocation. It accepts requests for
labels from various protocols (LDP, RSVP, BGP, and MPLS static), allocates
the labels, synchronizes the allocated labels with the standby RPSW controller
card so that they can be recovered in the event of a switchover, and then
programs the labels to the forwarding plane or returns the allocated labels
to the protocols that requested them so that they can use them. The MPLS
module also accepts configurations from the CLI, mostly for enabling MPLS
functionality on interfaces and protocols.
MPLS static is the daemon that is responsible for configuring static LSPs. To
configure a static LSP on the ingress label edge router (iLER), an operator
specifies the LSP's next hop, the egress label, and the egress peer of the LSP.
The configuration commands are sent to the MPLS static daemon, which uses
the LM API to configure the LSP in LM. Static label swap entries and static label
pop entries (also called ILM entries) can also be configured through the MPLS
static daemon on the LSR and egress LER, respectively.
2.2.2.27 NATd
NATd interacts with the following components to maintain the NAT address
translations in the line card PFEs:
• NATd organizes and distributes the details to the PFEs (through ISM and
ALd), and sends updates when configuration changes occur.
• RIB uses NAT data in calculating routes and downloads them to FIB.
2.2.2.28 NDd
• Support for multiple link types, including Ethernet, L2TP LNS tunnels (in
the future), and LAG.
2.2.2.29 OSPF
This module implements the OSPFv2 (RFC 2328) and OSPFv3 (RFC 5340)
protocols. It also supports the OSPF MIB as described in RFC 4750.
OSPF interacts with RIB both to install OSPF routes and to receive redistributed
routes from other routing instances that might also be OSPF. OSPF installs
connected routes as well as LSP shortcut routes.
• Prefix lists are used for prioritized RIB download of selected IPv4
prefixes.
The RPL library provides APIs for all interaction with policy objects.
• SNMP—OSPF supports SNMP queries via the IPC thread with IPC Request
and Reply. Since no objects or state machines are modified, this can be
done without worrying about contention as long as the operating system
run-to-completion user thread model is maintained. SNMP notifications are
sent directly from the dispatcher thread to the SNMP Module.
2.2.2.30 PAd
The Platform Admin Daemon (PAd) is a process that runs on both the active
and standby RPSW controller cards. It supports configuring line cards and
ports, monitors the line card hardware (card and port status), and
implements the switchover functionality. The PAd process contains the
operating system drivers that are used for communicating with the line card
hardware. Through these drivers, PAd can configure the line card hardware
and ports, monitor the state of cards and ports, and detect conditions such
as port down, card crashes, and card pulls.
• PAd receives configuration information for cards and ports from the CSM.
PAd communicates the status information for cards and ports to the
CSM process, which in turn propagates the information to the rest of the
operating system.
• The RPSW PM process is responsible for starting the PAd process prior
to any other RP applications and waits for PAd to determine and report
the active/standby state using the redundancy library before launching
any other applications.
2.2.2.31 PEM
2.2.2.32 PIM
The PIM daemon is based on a strict user thread (pthread) model with many
specialized threads. Multicast cache entries are maintained per group and
interface in a hierarchical database with (*,G) entries and (S,G) entries for
each active group. The cache entries for each (S,G) include all the outgoing
interfaces.
2.2.2.33 Ping
2.2.2.34 PM
The process manager (PM) monitors the health of every other process in the
system. The PM is the first Ericsson IP Operating System process started when
the system boots. It starts all the other processes in the system. The list of
processes to be started is described in a text file that is packaged with the SSR
software distribution. The PM also monitors the liveness of the processes and, if
any process dies or appears to be stuck, it starts a new instance of the process.
In SSR, the PM subsystem is distributed, with a master PM process running on
the active RPSW card and slave PM processes running on the slave RPSW
card, SSC cards, and line cards. The system cannot recover from failures of
the PM processes. If the PM master process crashes, a switchover is initiated.
2.2.2.35 PPP
2.2.2.36 PPPoE
The PPP over Ethernet (PPPoE) module, which manages PPPoE configuration
and subscriber session setup and tear down, consists of two components,
an RCM manager component and the backend daemon (PPPoEd). The
PPPoE RCM component manages all PPPoE related configuration within the
configuration database, which then gets downloaded to PPPoEd.
The following figure illustrates PPPoE daemon interactions with other modules:
2.2.2.37 QoS
On the SSR, the quality of service (QoS) module implements the resource
reservation control mechanism and configures the forwarding that implements
the services to guarantee quality of service.
• Ethernet ports
• Link groups
For more information about applying QoS policies, class maps, and scheduling
to circuits, see Configuring Circuits for QoS.
QoSMgr also learns properties of circuits (L2 bindings) from the respective
MBE Managers as needed by the forwarding modules to enforce/implement
certain functionalities.
2.2.2.38 RADIUS
2.2.2.39 RCM
The RCM engine is responsible for initializing all component managers and
for maintaining the list of all backend processes for communication. The set
of managers and backend processes is set at compile time. The registration
of managers with backend daemons occurs during RCM initialization, and each
manager is responsible for notifying the RCM engine of the backend
processes with which it communicates.
The RCM engine provides a session thread for processing any connection
requests from the interface layer. When a new interface layer component (CLI,
NetOpd, and so on) wants to communicate through the DCL to RCM, it starts a
new session with the RCM engine. Each session has a separate thread in RCM
for processing DCL messages. Because the RCM managers are stateless, the
threads only have mutual exclusion sections within the configuration database.
Each session modifies the database through a transaction. These transactions
provide all thread consistency for the RCM component managers.
The RCM has many other threads. These threads are either dynamically
spawned to perform a specific action or they live for the entire life of the RCM
process.
Routing Information Base (RIB) is the operating system daemon that collects
all routes from all routing protocols or clients (such as BGP, OSPF, IS-IS, and
static routes) and determines the best path for each route based on the routing
distance. A route is composed of a prefix (for example, 20.20.20.0/24) and a
path through a next-hop (for example, 10.10.10.10) residing on a circuit (for
example, circuit_1/interface_1) and is always associated with a distance. The
distances are set by the standards for the route sources. For example, for a
connected adjacency, the distance is 0, for OSPF it is 110, and for IS-IS it is
115. The set of routes with best paths (the ones with the lowest distances)
constitute the Forwarding Information Base (FIB), which RIB downloads to
the forwarding plane.
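The best-path selection by distance can be sketched in Python as follows. The
Path tuple and the build_fib function are hypothetical names, and the sketch
ignores equal-cost paths: when two paths have the same distance, it simply
keeps the first one.

```python
from typing import Dict, List, NamedTuple


class Path(NamedTuple):
    source: str     # routing protocol or client that supplied the path
    next_hop: str
    distance: int


def build_fib(rib: Dict[str, List[Path]]) -> Dict[str, Path]:
    """For each prefix, keep the path with the lowest distance."""
    return {
        prefix: min(paths, key=lambda path: path.distance)
        for prefix, paths in rib.items()
        if paths
    }
```

With the distances given above, a prefix learned from both OSPF (110) and
IS-IS (115) is installed with the OSPF path, and a connected path (0) wins
over both.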
RIB is also responsible for route resolution and redistribution. Route resolution
consists of recursively finding the best connected next hop for a non-connected
remote peer address. Typically, routes from a non-connected iBGP remote
peer are resolved on a connected next-hop derived from IGP routes. Route
redistribution consists of relaying a set of routes from one source domain (such
as OSPF as an IGP) to another destination domain (for example, BGP as an
EGP), filtered by a specified routing policy (such as an ACL-based policy).
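Recursive route resolution can be illustrated with the following Python
sketch. It is a simplified model under stated assumptions: the resolve
function, its arguments, and the max_depth loop guard are hypothetical, and
each step picks the longest matching prefix among the known routes.

```python
import ipaddress


def resolve(routes, connected, address, max_depth=8):
    """Follow next hops until one lies in a connected subnet.

    routes:    prefix string -> next-hop address string (non-connected)
    connected: list of directly connected subnet strings
    Returns the connected next hop as a string, or None if unresolved.
    """
    addr = ipaddress.ip_address(address)
    for _ in range(max_depth):
        if any(addr in ipaddress.ip_network(net) for net in connected):
            return str(addr)
        matches = [
            (ipaddress.ip_network(prefix), next_hop)
            for prefix, next_hop in routes.items()
            if addr in ipaddress.ip_network(prefix)
        ]
        if not matches:
            return None
        # Recurse on the next hop of the longest matching prefix.
        _, next_hop = max(matches, key=lambda match: match[0].prefixlen)
        addr = ipaddress.ip_address(next_hop)
    return None  # depth limit reached, for example because of a loop
```

In the iBGP case mentioned above, the remote peer address is not connected,
so the sketch follows the IGP route for that address until it reaches a next
hop on a connected subnet.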
RIB is one of the fundamental daemons in the operating system. It has a major
impact on the transient period from boot to steady state. On the active RPSW
card, the RIB startup and booting sequence directly impacts how packets flow
in and out of the box as it configures the routing tables in the forwarding plane
and the connectivity to the management interface (RIB installs the management
subnet routes in the kernel). The speed at which RIB collects the routes from its
clients, selects the best path, and downloads these routes to all the line cards is
a major factor in the time for the SSR to reach steady state on loading.
• ISM—RIB is a client of ISM, which is how RIB learns about circuits and
interfaces and thereby configures subnet routes and subscriber routes. RIB
is also an ISM MBE, primarily for setting BFD flags on some circuit types.
For details about the RIB boot process, see Section 3.5.1.1 on page 113. For
the role of RIB in subscriber session management, see Figure 62.
For the role of RIB in LAG constituent to PFE mapping, see Section 3.3.6 on
page 99.
• RIB for route downloads and registrations (RIB distributes RIP routes)
2.2.2.42 RSVP
The following RSVP RFCs are supported: RFC 3031, RFC 3032, RFC 3209,
and RFC 4090 (facility protection only).
• RIB—RSVP queries RIB for the outgoing interface and next-hop for a given
prefix. RSVP also registers for BFD sessions through RIB.
2.2.2.43 SNMPd
The MIB is a virtual database of defined objects used to manage the network
device. MIB objects are organized hierarchically, each with a unique object
identifier (OID).
The information received by SNMP comes from various sources: ISM, RIB,
and any other client that generates SNMP notifications. ISM and RIB have
dedicated threads for communication, whereas all other clients communicate
with a notification endpoint. This endpoint is used for generating trap requests
from the system.
The SNMP Research component supports v1, v2, and v3 of the SNMP protocol.
Versions v1 and v2 are considered obsolete by the IETF but are still supported for customers
using these older versions. The SNMP Research package has only a few
customizations, and they relate to making the component context aware. The
context has been added to the protocol community string and is parsed by
the package. With these changes, multiple instances and their contexts are
supported.
2.2.2.44 Staticd
The Static daemon (Staticd) supports both interface (connected) and gateway
(non-connected) IP and IPv6 static routes that can be configured either through
the CLI or the NetOp Element Management System (EMS). Additionally,
configured gateway routes may be verified using the proprietary Dynamically
Verified Static Route (DVSR) protocol that periodically pings the specified
gateway.
For details about static route resolution, see Section 3.5.5 on page 120.
2.2.2.45 STATd
The main task of the statistics daemon (STATd) is to maintain counters from the
line cards. It collects the counters from the forwarding plane, aggregates and
processes counters, and allows various applications in the system (including
CLI) to access these counters. STATd provides limited counter resiliency for
some restart cases. STATd maintains the following types of counters: context,
port, circuit, pseudo circuit, and adjacency. STATd maintains counters in a tree
of Counter Information Base (CIB) entries. These entries hold the counters for
contexts, ports, circuits, and adjacencies. Each CIB entry contains configuration
information and counters. CIBs are placed into various aggregation general
trees to allow walking circuit hierarchies. Each CIB can contain optional counter
values. Recent optimizations allocate memory only when certain counters are
needed so that the memory footprint of STATd is reduced. For each type of
counter value, the following versions are kept:
2 Push—The line card updates STATd with the latest counters. The
forwarding plane sends messages to STATd either when triggered by
certain events or periodically. A reliability mechanism is implemented
for only the data that is deemed too important to be lost when STATd
restarts. STATd acknowledges this data when received, which requires the
forwarding plane to perform additional processes when STATd restarts. It
must resend all the data that was not acknowledged by STATd because it
may have been lost.
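The acknowledge-and-resend behavior of the push model can be sketched as
follows. The ForwardingPlaneSender class, the sequence numbers, and the
transport callback are hypothetical and stand in for the real messaging
between the forwarding plane and STATd.

```python
class ForwardingPlaneSender:
    """Keeps reliable counter updates until STATd acknowledges them."""

    def __init__(self):
        self._next_seq = 0
        self._unacked = {}   # sequence number -> payload

    def push(self, payload, transport):
        """Send a reliable update and remember it until it is acknowledged."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = payload
        transport(seq, payload)
        return seq

    def ack(self, seq):
        """STATd confirmed receipt; the update no longer needs resending."""
        self._unacked.pop(seq, None)

    def resend_unacked(self, transport):
        """Called when STATd restarts: resend everything not acknowledged."""
        for seq in sorted(self._unacked):
            transport(seq, self._unacked[seq])
```

Only data sent through push is tracked. In this sketch, data that is not
deemed important enough would be sent directly through the transport without
being stored.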
A bulk statistics schema specifies the type of data to be reported, the reporting
frequency, and other details. The information is collected in a local file and then
is transferred to a remote management station. STATd manages the creation,
deletion, and configuration of bulk statistics schemas. The schema selects the
information to be reported and determines its format. You can associate the
system, contexts, ports, or 802.1Q PVCs with a schema, which will include
the associated data in the information reported by the schema. When this
association happens in CLI, STATd is notified and adds a work item in the
schema definition so that the related data can be collected when the schema is
processed periodically. This functionality builds on the counters maintained by
STATd and does not introduce any new dependencies on the forwarding plane.
• Kernel—Interacts for bulk statistics but not for management port counters.
The system manager module (Sysmon) and Logger module manage system
event messages. Together, they produce the system log (Syslog), used to
monitor and troubleshoot the SSR system. The SSR should be configured to
automatically store the Syslogs on an external Syslog server. For information
about how to configure, access, and collect logs, see Logging and Basic
Troubleshooting Techniques.
2.2.2.47 TCMA
2.2.2.48 TCMd
• TCM Agent (TCMA), running on the line card in the Application Layer
daemon (ALd) context
• Card Admin daemon (CAD), Load Index Map (LIM) drivers, Packet
Input/Output (PKTIO) hardware module, and the NPU driver on the line
cards.
2.2.2.49 TSM
Figure 30 illustrates the control information flow for SSC traffic slice
management (TSM).
When SSR nodes are configured in an ICR pair (in the BGP-based model),
TSM packet steering changes to a more complex model. In this case, packets
are steered to specific SSCs by using service maps as well as multiple,
dynamically created, TSFT tables. For a diagram of the steering flow with this
configuration enabled, see Figure 87.
2.2.2.50 Tunneld
The Tunnel Manager (Tunneld) process implements soft tunnels on the SSR,
adding only an encapsulation without a tunnel entry endpoint in the forwarding
plane. It handles tunnels according to the next-hop types in the Forwarding
Information Base (FIB), including:
• GRE tunnels
• IP-in-IP tunnels
2.2.2.51 VRRP
As of release 12.2, the VRRP Hot Standby (HS) feature is intended to provide
hitless switchover and process restart. Both the controller card
and line card store state information for VRRP service. When a controller card
switchover occurs, the newly active controller card recovers states by retrieving
them from the line card. No synchronization occurs between the active and
standby RPSW processes. Running the reload switch-over or process
restart vrrp commands on a VRRP router in owner state does not cause it
to lose its owner state.
When the VRRPd is running on the active node, the standby daemon is also
running. It receives all ISM and RCM messages but not RIB or line card
messages. During switchover, the standby daemon takes over and sends all
the sessions to the line cards. When the line cards receive the sessions, they
compare them with a local copy to determine which ones to send back to the
RPSW card. When that process is complete, the line card receives an EOF
message from the RPSW card and cleans up the stale sessions.
VRRP assigns a virtual router identifier (VRID). Using the VRID, a virtual MAC
address is derived for the virtual router using the notation 00-00-5E-00-01-XX,
where XX is the VRID. This MAC address is installed in the ARP table and is
used in packet forwarding by the owner router. The VRRP implementation consists
of three components: RCM manager, backend daemon, and forwarding.
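The virtual MAC derivation can be illustrated with a short sketch. The function name is hypothetical; the 1 to 255 VRID range comes from the VRRP standard, not from this document.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the virtual MAC 00-00-5E-00-01-XX for a given VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # XX is the VRID written as two hexadecimal digits.
    return f"00-00-5E-00-01-{vrid:02X}"
```

For example, VRID 10 maps to 00-00-5E-00-01-0A.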
2.2.2.52 XCd
• Switch fabric to which all line cards connect (on the SSR 8020, 7 + 1 at
5.57 GHz and on the SSR 8010, 3 + 1 at 6.25 GHz). When a fully loaded
SSR 8020 chassis incurs three switch card failures, the system continues
to switch traffic at line rate. If a fourth card fails, switching falls below line
rate. On the SSR 8020, each line card is connected to all eight switch
fabric cards (four on the SSR 8010). Each line card has 32 links of 6.25
Gbps, which are distributed on switch fabric cards for connectivity (four
links per switch fabric card).
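The link arithmetic for the SSR 8020 can be checked with a small calculation. This is only a sketch of the raw link numbers given above; it ignores encoding and cell overhead and does not model how much capacity line-rate switching actually requires. Function names are hypothetical.

```python
LINKS_PER_LINE_CARD = 32   # from the text
LINK_RATE_GBPS = 6.25      # from the text
FABRIC_CARDS_8020 = 8      # from the text


def links_per_fabric_card(links=LINKS_PER_LINE_CARD, cards=FABRIC_CARDS_8020):
    """Links from one line card to each switch fabric card."""
    return links // cards


def raw_fabric_gbps(working_cards, links_per_card=4, rate=LINK_RATE_GBPS):
    """Raw (pre-overhead) link capacity from one line card into the fabric."""
    return working_cards * links_per_card * rate
```

With all eight fabric cards in service, one line card has 8 x 4 x 6.25 = 200 Gbps of raw link capacity; each failed fabric card removes 25 Gbps.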
You can also install multiple SSC cards for high availability. If one of them fails,
the line cards steer packets to the SSC cards that remain in service.
The redundancy model on an SSR system uses a two-tier process to select its
active and standby system components. The selection process at the lowest
level is controlled by hardware that resides on the ALSW cards. This hardware
is responsible for the selection of the primary ALSW and master (active) RPSW
cards and their associated busses (CMB, PCIe, SCB, TCB). For definitions
of ALSW primary/secondary state and RPSW master/standby state, see
HW-HLD-0031. Once the primary ALSW and master RPSW cards are selected,
the selection of active and standby ALSW cards at the next tier is controlled by
software operating on the master RPSW card for components such as the GigE
control plane switch, timing distribution, and alarm logic.
• The new design takes advantage of the co-location of the PAd process with
all other RPSW processes in a single Linux instance and accounts for the
removal of the SCL links between the RPSW cards.
• The hardware signals and shared memory used for implementing the M2M
and Red Link are replaced with direct messaging between M2M and Red
Link components using raw sockets and TCP over Ethernet.
• The SCB arbitration logic on the ALSW cards is used for master (active)
RPSW card selection. The PAd process monitors the overall health of both
RPSW cards and coordinates with the hardware for RPSW failovers.
Because the controller cards are not involved in ingress to egress traffic
forwarding, and because each line card maintains its own FIB to make routing
decisions, when a controller card is temporarily unavailable (such as during
switchover), traffic continues to be forwarded.
2.3.1 Active and Standby RPSW and ALSW Card Selection During Startup
RPSW cards contain internal file systems that store the operating system image,
its associated files, and the configuration database. A synchronization process
ensures that the standby card is always ready to become the active card.
• When either the software release or the firmware on the active controller
card is upgraded, the standby controller card automatically synchronizes its
software or firmware version to that of the active controller.
• The configuration databases of the active and standby cards are always
synchronized.
Selection of the active and standby RPSW and the primary and secondary
ALSW cards occurs in the following scenarios at chassis startup.
4 PM starts the PAd process, which in turn instantiates the Controller Selector
and the M2M and Red Link threads.
5 PAd waits for a callback from ipcInitialize3() to notify it that the
active/standby determination is made, at which point it calls ipcProcessIsReady().
7 The Controller Selector writes its mastership capability to the ALSW card by
calling slShelfCtrlSetHwMasterCapable() at a regular interval of 3 seconds
as long as it is master capable. The periodic call to this function prevents
ALSW selector HW watchdog timeout.
10 The ALSW hardware notifies the Controller Selector whether it is the active
or standby candidate. If the RPSW card is going active, the Controller
Selector calls the slShelfCtrlGoActive SLAPI. If the RPSW card is going
standby, the Controller Selector calls the slShelfCtrlGoStandby SLAPI.
11 Once the PAd process has established the active/standby state, it updates
the state information stored in the redundancy library, which is published
to other applications.
14 PAd gets the ipcInitialize3() callback and completes the remainder of its
initialization in parallel with other processes.
16 The PAd on the active RPSW card invokes the registered callbacks to
synchronize admin and realtime layer state information to the standby PAd
process.
17 The remaining applications synch their state information from the active to
the standby card using IPC and update their IPC checkpoint information as each
process completes.
Note: The term active candidate refers to the RPSW card that has been
selected by the ALSW card HW selector to go active but has not yet
gone active.
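The Controller Selector behavior in the startup steps above (periodic master-capable refresh, then a go-active or go-standby call) can be sketched as follows. This is a toy model, not the PAd implementation: the Python classes and method names are hypothetical stand-ins, and only the SLAPI names recorded as strings and the 3-second interval come from the text.

```python
MASTER_CAPABLE_INTERVAL_S = 3  # refresh period stated in the text


class RecordingShelfCtrl:
    """Stand-in for the SLAPI layer; records calls instead of touching hardware."""

    def __init__(self):
        self.calls = []

    def set_hw_master_capable(self):
        self.calls.append("slShelfCtrlSetHwMasterCapable")

    def go_active(self):
        self.calls.append("slShelfCtrlGoActive")

    def go_standby(self):
        self.calls.append("slShelfCtrlGoStandby")


class ControllerSelector:
    """Toy model of the Controller Selector on one RPSW card."""

    def __init__(self, shelf_ctrl):
        self.shelf_ctrl = shelf_ctrl
        self.master_capable = True
        self.last_refresh = None
        self.role = "unknown"

    def tick(self, now_s):
        """Call regularly; refreshes the ALSW selector watchdog when due."""
        if not self.master_capable:
            return
        due = (self.last_refresh is None
               or now_s - self.last_refresh >= MASTER_CAPABLE_INTERVAL_S)
        if due:
            self.shelf_ctrl.set_hw_master_capable()
            self.last_refresh = now_s

    def on_hw_selection(self, selected_active):
        """The ALSW hardware reports whether this card is the active candidate."""
        if selected_active:
            self.shelf_ctrl.go_active()
            self.role = "active"
        else:
            self.shelf_ctrl.go_standby()
            self.role = "standby"
```

Driving tick() once per second for seven seconds produces three refresh calls (at 0, 3, and 6 seconds), which is the pattern that keeps the hardware watchdog from timing out.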
1 Both ALSW cards wait for the first PCIe write from an RPSW card to start the
primary ALSW selection algorithm in the ALSW FPGA.
2 The primary ALSW card selects the desired master (active) RP and notifies
the RP of its selection.
3 Once PAd and CMBd have finished initializing, the ALSW Selector
determines which ALSW card is active based on health of the ALSW card.
If both ALSW cards are equally healthy, the primary ALSW card is chosen.
4 The ALSW Selector calls slAlSwGoActive for the active ALSW card and
calls slAlSwGoStandby for the standby ALSW card.
1 The active and standby cards synchronize with the configuration database
and flash, and collect required state information.
3 The active RPSW file system is monitored for changes by DLM. Whenever
the file system is modified, DLM synchs the changes across to the standby
RPSW card.
4 State information in PAd is actively synched to the mate PAd process via
the PAd Redundancy Module over the Red Link.
5 Events from line cards received at the SL Upper on the active PAd process
are synched across to the standby PAd process via the Redundancy
Module during normal operation.
6 All other processes synchronize state as required using IPC and DDL.
For more information about the SSR file system, see Managing Files.
• The active RPSW card is removed from the chassis. The standby RPSW
card detects that the active card has been removed and the CMBd module
on the standby RPSW card reports the event to the Card Detection
subsystem. The Card Detection subsystem forwards the event to the
Controller Selector, which updates the mate status information.
• The active RPSW card's ejector switch is opened. The active RPSW
card CMBd receives the ejector switch open event and forwards it to the
Controller Selector state machine.
• The standby RPSW card detects that the active RPSW card has a
hardware or software fault and takes over control of the system. The active
and standby RPSW cards exchange fault information with each other
through the exchange of Fault Condition notifications over the Red Link.
As with the SE800, each RPSW card keeps track of its own local faults
and its mate’s faults and uses these as the inputs to the failover trigger
algorithm. Whenever a new fault is detected, both RPSW cards receive
the notification and the Fault Handler forwards the fault to the Controller
Selector. Typically, RPSW software failures do not result in RPSW card
failovers. If software failures cause an RPSW failover, the fault is reported
using the Fault Condition event and the fault is treated in the same way
as a hardware failure.
• The active ALSW card detects the loss of RPSW-initiated heartbeats and
initiates a hardware failover. The standby RPSW card is interrupted and
notified of the RP mastership change from hardware. The Controller
Selector running on the standby RPSW card starts a software failover.
• The active RPSW card's PAd process crashes. The PM process actively
monitors all processes. If PAd crashes, PM exits, which triggers RPSW
switchover in the same way as in the previous case.
• The user enters the reload switch-over alsw command. The PAd
ALSW selector calls slAlSwGoStandby on the active ALSW card, which
internally checks the primary/secondary status of the active ALSW card.
Because the ALSW card is primary, the driver demotes the primary ALSW
card and then promotes the secondary ALSW card.
• The user enters the reload standby alsw command. The PAd ALSW
selector processes the reload request without regard for the primary or
secondary status of the active ALSW card. The PAd ALSW selector calls
slAlSwGoStandby to reload the active ALSW card, and the driver checks
the primary/secondary status of the active ALSW card. Because the ALSW
card is primary, the driver demotes the primary ALSW card and then
promotes the secondary ALSW card.
• The user opens the ejector switch on the primary ALSW card. An interrupt
arrives at the driver software, which notifies PAd that the ejector was
opened. The PAd ALSW Selector evaluates the request and determines
that it is the highest priority request. The PAd ALSW Selector calls
slAlSwGoStandby to perform the switchover. When the slAlSwGoStandby
call is made, the driver also checks the primary/secondary status of the
active ALSW card. The driver demotes the primary ALSW card and then
promotes the secondary ALSW card.
• The primary ALSW card is removed from the chassis. The secondary
ALSW card generates a software interrupt and starts the B5 timer. The
driver handles the software interrupt and checks the state of the pulled
ALSW card. Because the ALSW card has been pulled, the driver does
nothing. The B5 timeout occurs, and the secondary ALSW card promotes
itself to primary.
• The primary ALSW card fails. The secondary ALSW card does not detect
the Inform signal and generates a software interrupt and starts the B5
timer. The driver handles the software interrupt and checks the state of the
failed ALSW card. Because the ALSW card has failed, the driver does
nothing. The B5 timeout occurs, and the secondary ALSW card promotes
itself to primary.
You can configure local cross-connections between two service instances
(as in Figure 32) or, for VPWS, between a service instance and a PW instance
(as in Figure 33).
Under Ethernet port configuration, you define service instances with match
options, which designate different Layer 2 service instances for carrying specific
types of traffic, similar to ACLs. Any traffic not matched is dropped.
• default—Default match option for the port. This match option specifies a
default circuit that captures packets that do not match the criteria for any
other service instance.
11 Unmatched packets.
You can enable automatic VLAN tag modification for packets between SIs.
Under a service instance VLAN rewrite configuration, use the ingress and
egress commands to modify the Layer 2 tags of an incoming packet. Possible
tag operations are push, pop, and swap. VLAN tag manipulation guidelines:
• VLAN tag rewrites can be performed for the two outer tags only.
• The following constructs are valid for push and swap operations only:
  ◦ dot1q vlan-id
  ◦ dot1ad tag
  ◦ priority-tagged
• Tags swapped and pushed by the router must match the egress side
match options.
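The three tag operations can be sketched on a simple tag stack. This is an illustration only: the function name and the (TPID, VLAN ID) tuple representation are assumptions, and for brevity the sketch acts on the outermost tag only.

```python
def rewrite_tags(tags, op, new_tag=None):
    """Apply one rewrite operation to a tag stack (outermost tag first).

    tags    -- list of (tpid, vlan_id) tuples, hypothetical representation
    op      -- "push", "pop", or "swap"
    new_tag -- tag to add (push) or substitute (swap)
    """
    tags = list(tags)
    if op in ("push", "swap") and new_tag is None:
        raise ValueError(f"{op} requires a new tag")
    if op == "push":
        return [new_tag] + tags
    if op in ("pop", "swap") and not tags:
        raise ValueError(f"cannot {op} on an untagged packet")
    if op == "pop":
        return tags[1:]
    if op == "swap":
        return [new_tag] + tags[1:]
    raise ValueError(f"unknown operation: {op}")
```

For example, pushing (0x88A8, 200) onto a packet tagged (0x8100, 100) yields a double-tagged packet whose egress side must then satisfy a matching two-tag option.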
The SSR supports NGL2 over PPA3LP cards, including a subset of the
cross-connection types (untagged packets are not supported):
• Match dot1q x, which accepts tag type 8100 plus a single tag X.
• Match dot1q *, which accepts tag type 8100 plus any single tag in the
range from 1 to 4095.
• Match dot1ad x, which accepts a port ethertype tag plus a single tag X.
• Match dot1ad *, which accepts a port ethertype tag plus a single tag
in the range from 1 to 4095.
• Match dot1ad x:y, which accepts an outer tag X with port ethertype and
inner tag Y with type 8100. Packets having more than two tags (X:Y:*) are
also accepted in this case, but only the two outer tags (X:Y) are checked.
Note: This is equivalent to transport pvc X:Y in SmartEdge.
• The SSR NGL2 supports the egress rewrite swap rule only for the topmost tag.
• The egress rewrite swap rule for the innermost tag must support the
configuration Match dot1ad x:*, which accepts an outer tag X with port
ethertype and an inner tag in the range from 1 to 4095 with type 8100. Packets
having more than two tags are also accepted, but only two tags are checked.
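The match options listed above can be expressed as a small classifier. This is a sketch under stated assumptions: the function and tuple formats are hypothetical, the port ethertype value 0x88A8 is an assumed example (it is configured per port), and single-tag options are read here as requiring exactly one tag.

```python
DOT1Q_TPID = 0x8100
PORT_ETHERTYPE = 0x88A8  # assumption: example value for the port ethertype


def _vid_ok(want, vid):
    """'*' accepts any tag value from 1 to 4095; otherwise require equality."""
    if want == "*":
        return 1 <= vid <= 4095
    return vid == want


def matches(option, tags, port_ethertype=PORT_ETHERTYPE):
    """Return True if a tag stack (outermost first) satisfies a match option.

    option -- ("dot1q", x), ("dot1ad", x), or ("dot1ad", x, y); x or y may be "*"
    tags   -- list of (tpid, vlan_id) tuples
    """
    kind, *vids = option
    if not tags:
        return False  # untagged packets are not supported
    if len(vids) == 1 and len(tags) != 1:
        return False  # single-tag options: exactly one tag (assumed reading)
    if len(tags) < len(vids):
        return False
    outer_tpid = DOT1Q_TPID if kind == "dot1q" else port_ethertype
    expected = [outer_tpid] + [DOT1Q_TPID] * (len(vids) - 1)
    # zip stops after the configured tags, so extra inner tags are not checked.
    return all(tpid == exp and _vid_ok(want, vid)
               for (tpid, vid), exp, want in zip(tags, expected, vids))
```

With these assumptions, a three-tag packet whose two outer tags are 10 and 20 satisfies ("dot1ad", 10, 20), because only the two outer tags are checked.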
QoS MDRR queueing is supported for NGL2 over PPA3LP based cards, as well
as the following QoS services that are supported on the other SSR line cards:
The LM, RIB, and XC modules interact with PPA3LP to maintain the VLL to
PFE mappings. When changes occur, ISM informs LM, RIB, and XC; then:
• LM calculates new slot masks from the ISM message details and a
download is triggered. LM uses slot based downloads for both ingress and
egress entries. Learned PFE masks are downloaded for LAG adjacencies
(FIB nexthop and LM adjacency) in nexthop/adjacency messages.
• RIB calculates new slot masks and stores them. Learned PFE masks are
sent in egress PFE download objects (FIB nexthop and RIB adjacency) in
nexthop/adjacency messages.
• For physical circuits, the PFE determines whether the message applies to it.
If it does, the PFE processes the message (add, delete, modify). If it does
not, the PFE discards it.
• For pseudo circuits, XCd provides the PFE mask, and the PPA does a
lookup on it to decide whether to process (add, delete, modify) or drop
the message.
• The ISM module picks up all the circuit-related information and sends it
down to the Iface process in the line card using IPC.
• The XCd process on the RPSW card picks up all the XCd
configuration-related information (that needs to be applied to a
circuit/interface, for example binding a bypass) and sends it to the XCd
module on the line card using IPC.
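The PFE-mask check described earlier for pseudo circuits can be sketched as a bit test. This assumes the mask is a plain bitmask with one bit per PFE index, which the text does not state; the function names and the circuit table are hypothetical.

```python
def pfe_should_process(pfe_mask: int, pfe_index: int) -> bool:
    """True if the bit for this PFE is set in the mask (assumed encoding)."""
    return bool((pfe_mask >> pfe_index) & 1)


def handle_circuit_message(table, pfe_index, op, circuit, pfe_mask, attrs=None):
    """Apply an add/modify/delete message, or drop it if not addressed to us.

    Returns True if the message was processed and False if it was dropped.
    """
    if not pfe_should_process(pfe_mask, pfe_index):
        return False
    if op == "delete":
        table.pop(circuit, None)
    elif op in ("add", "modify"):
        table[circuit] = attrs
    else:
        raise ValueError(f"unknown operation: {op}")
    return True
```

For example, with mask 0b0101 the message is processed by PFE 0 and PFE 2 and dropped by PFE 1.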
The FABL software running on the line card processes the messages received
from the RP and calls the appropriate APIs for configuring hardware. For XCd,
the Iface module receives the circuit information from the ISM process running
on the RPSW card. The Iface process communicates with the Eth-VLAN
module if the circuit requires handling for VLAN encapsulation. Similarly, the
Iface module communicates with the XCd module running on the line card
when it determines that the given circuit requires XCd-specific handling. Figure
34 shows the steps and modules involved in the creation of XCd and Ethernet
VLANs and illustrates the potential points of failure (PoF) during the creation.
3.2 Circuits
The Ericsson IP operating system on the SSR uses the concept of circuit to
identify logical endpoints for traffic in the forwarding plane. Think of circuits as
light-weight interfaces where services like QoS and ACLs can be applied. The
router uses both physical circuits, which correspond to a real endpoint in the
forwarding plane (such as a port or VLAN), and pseudo circuits, which correspond
to logical endpoints (such as a link group).
Each circuit has a 64-bit ID that is unique across the SSR chassis. IDs are
assigned dynamically by the control plane and are used by the software to
identify circuits. They appear in the various show commands. The IDs are not
flat 64-bit numbers but have internal structures that are used when displaying
them in show commands; for example, 1/1:511:63:31/1/1/3. The first two
numbers are the slot and the port where the circuit exists, the next number
is a channel that is applicable to only certain types of circuits, and the other
numbers are an internal ID and type information. Physical circuits are always
associated with a real slot and port, whereas pseudo circuits have a slot value
of 255 and the port value provides additional type information for the circuit.
Note: The SSR also supports subscriber circuits for PPP/PPPoE, CLIPS,
and L2TP subscriber sessions. For information about the creation and
termination of these circuits, see Section 3.8.2 on page 151.
• Card
• Ethernet port
Note: When creating a port and binding an interface to it, the interface is
translated into a unique integer in the box, called the interface grid. An
interface grid looks like 0x1…01 and it keeps increasing. Interfaces are
configured in a context and stored in an interface tree. Circuits also
have grids, but in this case they are called circuit handles, which are
64 bits long and have the following structure:
slot/port/channel/subchannel/[unused]/owner/level/running number
The slot number is the identifier of the line card, and the port number is
the identifier of the port on that line card. The channel number is the
channel number; however, on the SSR, which only supports Ethernet, it
is 1023 because Ethernet is not channelized. The subchannel number
for Ethernet is 63 or FF. The owner is the process that created the
circuit and depends on the type of the circuit. Level 0 is for a port, and
level 1 is for a circuit under a port (Ethernet). The running number
distinguishes sibling circuits; for example, if we have 2 VLANs, only this
field is different. An example circuit handle for
physical port 4 on line card 2 is: 2/4/1023/63/../1/0/1.
To see the entire circuit tree (all the circuit handles), use the show
ism circuit summary command. Circuits are stored in a radix tree
based on the circuit handle. The structure of circuit handles speeds up
searches. For example, if a port down message comes in, you can
walk the circuit structure based only on the first few bits (slot/port) and
view every circuit that is under that specific port. The circuit structure
stores circuit information, which can be displayed by using the show
ip route circuit cct_handle command (for the whole tree or a
specific circuit). Circuits are unique and global on the router. They are
not connected to any context or routing table, but if you type a specific
circuit handle, the operating system displays the context in which the
circuit is used.
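The displayed handle structure and the slot/port walk can be illustrated with a short sketch. It parses only the textual form shown in the example above; the bit widths of the 64-bit handle are not given here and are not modeled. The names are hypothetical, and a linear filter stands in for the radix-tree prefix walk.

```python
from typing import NamedTuple


class CircuitHandle(NamedTuple):
    slot: int
    port: int
    channel: int
    subchannel: int
    unused: str      # shown as ".." in the example; kept as text
    owner: int
    level: int
    running: int


def parse_handle(text: str) -> CircuitHandle:
    """Parse slot/port/channel/subchannel/[unused]/owner/level/running number."""
    slot, port, channel, subchannel, unused, owner, level, running = text.split("/")
    return CircuitHandle(int(slot), int(port), int(channel), int(subchannel),
                         unused, int(owner), int(level), int(running))


def circuits_under_port(handles, slot, port):
    """Return every circuit whose handle starts with the given slot/port."""
    return [h for h in handles if (h.slot, h.port) == (slot, port)]
```

Because slot and port are the leading fields, a port-down event only needs the handles that share that prefix, which is what the radix tree provides efficiently.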
To verify that circuits have been configured on line cards as expected, use any
of the following (hidden) commands:
Link groups support fast failover, as well as the quality of service (QoS) policing,
metering, and queueing features. Although you set the QoS configuration for
link groups at the link-group level in link group configuration mode, policing,
metering, and queueing are performed internally per constituent port.
• Link redundancy—Using a link group with two ports on the same line card
to provide link redundancy.
SPGs are allocated per-LAG using the formula (N * (N-1) + 1), where N is the
configured maximum-links parameter for the LAG. Within each LAG, one of
the SPG-IDs (the “+ 1” in the formula) is reserved for mapping some special
circuits, and the (N * (N-1)) is designed such that if one LAG member port fails,
there will be (N-1) entries in the SPG table referring to that failed port, so those
entries can easily be redistributed among the remaining (N-1) available ports.
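The effect of the N * (N-1) sizing can be seen in a small sketch. The round-robin initial fill and the function names are assumptions for illustration; the reserved "+ 1" SPG-ID is counted but not modeled.

```python
from collections import Counter
from itertools import cycle


def spg_count(max_links: int) -> int:
    """Number of SPG-IDs allocated for a LAG: N * (N-1) + 1."""
    return max_links * (max_links - 1) + 1


def build_spg_table(n: int):
    """Map N * (N-1) SPG-IDs to ports 0..N-1 (assumed round-robin fill)."""
    return [i % n for i in range(n * (n - 1))]


def fail_port(table, failed: int, n: int):
    """Hand the failed port's entries out evenly to the remaining ports."""
    survivors = cycle(p for p in range(n) if p != failed)
    return [next(survivors) if p == failed else p for p in table]


table = build_spg_table(4)             # 12 entries, 3 per port
after = fail_port(table, failed=2, n=4)
print(Counter(table), Counter(after))
```

With N = 4, each port starts with N-1 = 3 entries; when one port fails, its 3 entries go one each to the 3 surviving ports, leaving every survivor with 4.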
All traffic in the same subgroup goes through the same port, rather than being
spread by source and destination address as in load balancing. ISM assigns an SPG-ID
to every LAG circuit, regardless of whether the circuit is packet-hashed or
circuit-hashed. ISM also maintains the mapping from SPG-ID to LAG port
in the SPG-ID table. Assignment of circuits to SPG-IDs is not as simple as
round-robin – a new circuit is assigned the SPG-ID that would lead to the best
balancing. Fair balancing is only considered for circuit-hashed circuits, where
forwarded traffic uses the SPG-IDs. For packet-hashed circuits, the SPG-ID is
only used for injected control traffic, so those circuits are not included in the
balancing calculations. It is possible for all packet-hashed circuits on a given
LAG to be assigned the same SPG-ID.
For packet-hashed traffic, load balancing is a function of the hash key computed
for the packet from various packet header fields and the distribution of LAG
ports in the hash table.
• The number of available ports can be divided equally among all rows in the
hash table. For example, a 3-port LAG may not be perfectly distributed in a
table, whereas a 4-port LAG is likely to be, depending on implementation.
• The set of input flows (IP headers, either 2-tuple or 5-tuple) hashes equally
to different hash table rows. Generally this is done by building a sequence
of flows with IP addresses varying by 1.
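The first condition can be checked by counting how many hash table rows each port receives. The 64-row table and the round-robin fill are assumptions for illustration (64 rows corresponds to indexing with 6 bits of the hash result); the actual table layout depends on implementation.

```python
from collections import Counter


def fill_hash_table(num_ports: int, rows: int = 64):
    """Assign ports to hash table rows round-robin (assumed fill policy)."""
    return [row % num_ports for row in range(rows)]


def rows_per_port(num_ports: int, rows: int = 64) -> Counter:
    """Count the rows that select each port."""
    return Counter(fill_hash_table(num_ports, rows))


print(rows_per_port(3))  # uneven: one port gets 22 rows, the others 21
print(rows_per_port(4))  # even: 16 rows per port
```

Under these assumptions, a 3-port LAG cannot be balanced exactly because 64 is not divisible by 3, whereas a 4-port LAG divides evenly.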
Load balancing hash keys are computed depending on ingress circuit type, and
node configuration. A key is computed by running an algorithm on a set of
inputs, which are fields copied from the packet header. For example when the
input circuit is configured for L3 forwarding, the key is built with fields from the
L3 (IP) and sometimes Layer 4 (L4) (TCP/UDP) headers. When the input circuit
is configured for L2 forwarding, the key can be built from the L2 header, or we
can configure load balancing hashing to look deeper in the packet. You can
configure more header data to be considered by using the global configuration
mode service load-balance ip command.
For general load balancing information, see Section 3.7.7 on page 144.
There is a LAG hash table for every LAG that maps hash results to LAG
constituents. LAG topology and configuration influence the efficiency of load
balancing.
Note: When links are added or removed from a LAG via configuration, tables
are also reshuffled to achieve optimal balancing.
Table 5 assumes that each constituent port is the same speed (GE, 10GE).
Hash keys are built only once for each packet during packet parsing, but some
packets need to use them for more than one path selection. For example, an
IP route may use ECMP, and some or all of the ECMP paths might be over
LAGs. ECMP is very similar to LAG in that some of the bits from the hash result
are used to select a path, but in the ECMP case it is an L3 path as opposed to the
L2 path in LAG. Imagine the case of a 2-path ECMP where the first path leads
to a 2-port LAG shown in Figure 35.
Figure 35 2-Path ECMP Where The First Path Leads to a 2-port LAG
Assuming a similar table is used for ECMP path selection as for LAG port
selection, if both algorithms use the same hash result then both algorithms will
select the same path. Thus all traffic that hits this LAG for this ECMP route will
use port X, resulting in bad load balancing unless there are lots of different
routes pointing to this LAG. The solution is to use a different hash result for
different path selections. We don’t want to compute a new result using new
fields from the packet, but since the original result is generally 32 bits, we can
build multiple different results from that result. If each result needs to be 8
bits, we can extract four different results from the original result. If the hashing
algorithm is good, then its results are well distributed in 32 bits, and thus any N
bits from the 32 bit result should be well distributed.
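Splitting one 32-bit hash result into independent selectors can be sketched as follows. CRC-32 is used here only as a stand-in hash; the algorithm used by the forwarding plane is not named in this document. The function names and the slice assignment (slice 0 for ECMP, slice 1 for LAG) are hypothetical.

```python
import zlib


def hash_slices(key: bytes, width: int = 8):
    """Split a 32-bit hash of the key into 32/width slices, low bits first."""
    result = zlib.crc32(key) & 0xFFFFFFFF
    mask = (1 << width) - 1
    return [(result >> shift) & mask for shift in range(0, 32, width)]


def select(slice_value: int, num_choices: int) -> int:
    """Pick one of num_choices paths or ports from a hash slice."""
    return slice_value % num_choices


flow_key = b"10.0.0.1|10.0.0.2|6|1234|80"   # example header fields
slices = hash_slices(flow_key)
ecmp_path = select(slices[0], 2)   # L3 path selection uses one slice
lag_port = select(slices[1], 2)    # L2 port selection uses a different slice
```

Because the two selections read different bits of the same result, the LAG port choice is no longer tied to the ECMP path choice, and no second hash computation is needed.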
• When the receiving port is in a LAG, the port SRAM contains a LAG flag
and the Constituent ID (CID) for the port.
• An LACP bit indicates whether LACP is active. If it is set for standby, then
only LACP packets are accepted.
• The VLAN demux table for the port leads to the pseudo-circuit.
• During L2/L3/L4 packet parsing, a hash key is built from various header
fields such as the source and destination IP address.
• The LAG next hop contains a pseudo-adjacency ID, plus either an SPG-ID
(for circuit-hashed circuits) or an LG-ID (for packet-hashed circuits).
• For circuit-hashed circuits, the Destination Cookie and CID are determined
using the SPG ID.
• For packet-hashed circuits, the Destination Cookie and CID are determined
using the LG ID plus 6 bits of the computed hash result.
• The packet is sent to the fabric with the Pseudo-adjacency ID, the
Destination Cookie, and the CID.
Packets are sent on the egress path with the following process:
• The ICH indicates that the egress circuit is a LAG pseudo-circuit and
provides the CID.
• The CID is used as an index to find the per-constituent stats offset, metering
token-bucket (if needed), and queueing details.
RIB support for LAG is handled differently for trunk LAGs and access
LAGs. For trunk LAGs, RIB produces multi-adjacency trunk LAG next hops
(representing the constituents) in the FIBs, as in Figure 36.
RIB supports access LAG with single pseudo-adjacency and SPG-ID based
next hops. FIB entries pointing to such a nexthop assign the data traffic to one
of the constituent ports in the LAG next hop referenced by the SPG-ID. RIB
acts on slot-mask and SPG-ID changes from ISM to update the next hops on
all NPU ingress flows and the adjacencies on the respective slots. No packet
hashing is supported by access LAGs. FIB lookup gives the access LAG next
hop and the corresponding pseudo adjacency (PW-Adj) with the SPG-ID. The
SPG-ID lookup (the SPG table is present on all the NPUs) gives the physical
adjacency, and the packet is sent across the backplane to the egress path with
the PW-Adj and the physical adjacency.
For access LAGs, RIB produces a single adjacency for each constituent in the
FIBs, as in Figure Y.
Each LSP circuit has the following values in the circuit handle:
For MPLS over LAG, the Ethernet circuit is a pseudo circuit, and each LSP
adjacency is a pseudo adjacency capable of transmitting packets to any
constituent on that ePFE.
The pseudo adjacency is stored on all slots in the LAG slot mask.
Transit LSPs over LAG interfaces are similar to the ingress case, except that
label mappings pointing to the LSP pseudo adjacencies are stored on all slots
of the ingress path and indicate that either packet hashing or circuit hashing is
needed; see the following example.
2 LGMgr assigns a valid LG-ID to the link group and creates the
pseudo-circuit handle.
3 LGMgr informs the other managers about the new link group and then
sends the message to LGd with all the LAG attributes.
4 LGd stores the information and sends an IPC message about the new link
group to ISM, which contains the pseudo-circuit handle and the default
attribute flags. It also sends the circuit ethernet config message and
the min-link and max-link configuration messages to ISM. The first time that
the default values are sent, LGd also sends the SPG egress mode flag to
ISM. If not set explicitly, the default mode of round robin is sent. Then the
MBE EOF message is sent to indicate the end of link-group information.
6 ISM also sends the UP/DOWN state of the LAG circuit, the circuit
ethernet config message received from LGd, the SPG group
create message, and the LG config message with the link group flags
to all the clients.
1 When the link group is deleted, LG CLI sends a link-group delete message
to LGMgr.
2 LGMgr informs the other managers about the LAG deletion (with an LG
callback).
3 LGMgr deletes the circuit handle for this LAG from its database and frees
the allocated LG-ID for reuse.
6 ISM sends the delete message first for the level 2 circuits under the link
group. ISM then sends the state of all the level 2 interfaces whose unbind
messages were sent by LGd. ISM also sends the currently active link group
configurations to all the clients, then sends an ungroup message for each
constituent on the LAG circuit.
7 ISM sends the delete message first for the level 1 circuits under the link
group. ISM then sends the state of all the level 1 interfaces whose unbind
messages were sent by LGd. ISM also sends the currently active link group
configurations to all the clients, then sends an ungroup message for each
constituent on the LAG circuit.
1 When a new constituent link has been added to a LAG, LG CLI sends
the grouping information to LGMgr.
4 LGd sends the link group event with the level 1 pseudo-circuit handle and
the physical link's handle to ISM. Along with this, LGd also sends the LG
Ethernet and circuit configuration messages with Ethernet flags.
5 ISM forwards the LG group message to all the clients. Then ISM forwards
the Ethernet configuration for the member circuit and the state of the
member circuit and port circuit to them. ISM then sends the number of links
that can pass traffic to all the clients.
2 LGMgr informs the other managers about the deletion, and then sends the
LG ungroup event to ISM with the level 1 pseudo-circuit handle and the
physical link's handle. LGMgr also sends the circuit configuration message
about the deletion to ISM.
3 ISM forwards the LG ungroup message to all the clients and then forwards
the Ethernet configuration for the member circuit to them.
4 ISM also forwards the Ethernet configuration for the parent circuit (without
the deleted link) to all the clients.
When an 802.1Q PVC is created in a link group using the dot1q pvc option
command in link-group configuration mode, a level 2 pseudo circuit under
the link-group is created with the default set of attributes. The addition is
propagated throughout the system with the process illustrated in Figure 42.
5 ISM forwards the dot1q configuration for this circuit to all clients, along
with the state of the circuit.
The SSR (unified) LAG model consists of two basic modes for load balancing:
packet hashing and circuit hashing (“subscriber protection”). Circuit hashing
has a faster selection mechanism and better failover properties (if replicated),
which favors its use for subscriber-facing routing. However, replicating all L2
circuits for all ports uses too many resources in the forwarding plane if the
number of subscriber circuits is high. To overcome the high resource load
in the forwarding plane, economical mode was introduced for circuit-hashed
circuits (at the price of slower failover handling). The use case for this feature
is networks where scaling is more important than fast failover, for example,
subscriber-facing LAGs.
• PWFQ and MDRR queuing policies are not supported on 10GE ports
• All ports in a particular LAG must belong to the same card type (for
example, no mixing 10GE and 1GE ports)
NGL2 circuits (service instances) under economical LAG are always circuit
hashed, by default.
Note: For multicast traffic, you must configure “replicate and load-balance”
on the parent circuit to transmit multicast packets on the parent circuit,
with Remote Multicast Replication (RMR); otherwise, the multicast
traffic is transmitted per child circuit.
Slot 1 Egress:
Size of Sub-protection group table : 8
No entries in sub-protection group table : 2
Sub-protection group table is at : 0x43edac0
Slot 4 Egress:
Size of Sub-protection group table : 8
No entries in sub-protection group table : 2
Sub-protection group table is at : 0x43ed940
• DOT1Q sets the LAG options for replicating (to allow circuits to be
replicated on all the constituent ports) and packet-hashing load-balancing.
• If there are changes, DOT1Q sends them to PPA3LP to change the slot
masks.
• DOT1Q asks ISM to get the PFE mask from the parent circuit handle.
Role of ISM
• ISM downloads all the L1 circuit events to the constituent PFEs registered
with ISM. All L2 circuits created under the LAG are marked by default as
economical and circuit-hashed. All L2 circuit events are sent to the Home
Slot. Non-Home Slots only receive L2 circuit configuration events.
Role of CLS
• Initially, when CLSd receives the configuration from RCM, CLSd obtains
the PFE mask using the ISM API, which enables retrieving the PFE mask
for any type of circuit.
• CLSd sends a message to ISM to “subscribe” to the events for this circuit.
After that, every time the PFE mask for this ULAG changes, CLSd
receives the cct-config event from ISM, containing the new PFE mask for
this circuit.
• When the circuit is unconfigured, CLS “unsubscribes” from the PFE mask
update events from ISM for this circuit. The only significant interface
change required is to introduce the PFE mask into the cct-config event
data (currently it has only the slot mask).
• ISM notifies PPPoE that a circuit has been created when a LAG is
created.
• PPPoE receives a circuit up message when a port has been configured for
LAG, and PPPoE marks the circuit as ready.
• When a PADI is received, PPPoE extracts the real physical circuit handle
from the CMSG and uses it to check the PFE session limit by calling the
ISM API (already done today).
• PPPd receives CCT-create and CCT-Cfg from ISM, creates a circuit, and
notifies PPPoE that the circuit is ready to negotiate LCP.
• PPPoE receives the PPP message and replies to the client with a PADS.
• PPP calls an ISM API with the ULAG handle, and ISM returns the
active physical slot, which is filled in the CMSG.
• When a port belonging to the ULAG link group goes down, ISM sends a
circuit down event on the ULAG circuit, and PPP brings down the PPP link.
• For PPPoE, the PFE complex mask is queried from ISM by passing the real
received cct handle. PPP uses the PAL layer to add the PFE complex mask for
a pseudo cct handle.
Port and circuit mirroring are not supported on the 4-port 10GE or 20-port GE
and 2-port 10GE line card with PPA3LP NPU.
To prevent mirroring loops, for physical forward output types, you cannot
enable mirroring for a port if it is already in use as a mirror output.
To support local mirroring, this feature includes support for both physical ports
and next-generation Layer 2 (NGL2) service instances as forward output. NGL2
service instance forward outputs allow VLAN tag manipulation on the frame.
Port or circuit mirroring is enabled after the following configuration steps are
completed:
• Ethernet port
• Pseudowire instance
• Mirror source ports (a port cannot be both a source of mirror traffic and
a mirror destination)
• Link groups
• Ingress mirroring is performed as the packet is received from the wire and
includes all packets received from the port before any modifications are
made to the packet. Error packets and packets that exceed the ICFD queue limit
are not mirrored. However, any packet that the NP4 pipeline has
accepted for processing is mirrored, including multicast packets.
The Egress Port Mirror Key contains the outgoing port number copied from
the PSID in the Egress Circuit search result. For rate-limiting, the L2 packet
length is also included in the key. The rate-limit is applied to the packet as
it is received from the fabric.
• Mirroring decisions are supported by the NP4 ALd mirror policy cache
(entries are added when the policy is created, updated, or deleted). The
table enables later communication between FABL and ALd, which use the
policy IDs. The policy table includes such data as policy state and ID, the
next hop ID, the forwarding-mirror class, and other policy attributes. Each
PFE has a table and policies are shared by ingress and egress mirroring.
Policy bindings are also maintained in ALd in another table, which includes
such data as the circuit handle and mirror policy handle.
3.5 Routing
On the SSR, IPv4 and IPv6 route information is collected from the different
routing protocols in the Routing Information Base (RIB) on the RPSW controller
card, which calculates the best routes and downloads them to the Forwarding
Information Base (FIB) stored on the line cards.
RIB boots occur for a variety of reasons, such as restart of the RIB or ISM
processes or the switchover during the upgrade from one SSR OS release to
another.
From the RIB perspective, a route is a prefix pointing to one or more next
hops, as seen in Figure 45. Each next hop is a path to the route endpoint or
to another router.
• A next-hop key, which consists of the triplet (IP address, circuit handle,
and interface grid).
• A next-hop grid, which is a unique identifier that begins with 0x3, followed
by the next-hop-type, followed by a unique index.
Next-hop types are listed in the grid.h file. There are hundreds of next-hop
types. The two basic types of next-hop are:
For each connected next-hop, RIB also creates an adjacency ID. The
next-hop grid is used on ingress by the packet forwarding engine (PFE) and
the adjacency ID is used on egress for indexing next-hops. RIB downloads
the same structure for both ingress and egress. There is a one-to-one
mapping between the next-hop key, the next-hop grid, and the adjacency
ID (if present).
Adding a new route to RIB always begins with the rib_rt_add() thread, the
main entry point, triggered by routing protocols (such as BGP, OSPF, IS-IS,
or RIP) adding routes to the RIB by sending messages to it. RIB has a route
thread which performs a basic IPC message check and finds the client that is
attempting to add the route. If the client is registered, the message is assigned
into a client queue. Each client has its own queue on top of a scheduler. The
scheduler depletes these queues, specifically taking 2 IPC messages from
each client’s queue using the round-robin method. In a single IPC message
there might be a few hundred routes packed.
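The queueing behavior described above can be sketched as follows. This is a minimal illustration only, not the RIB implementation: the class and method names are hypothetical, and rejecting messages from unregistered clients is an assumption (the text does not say what happens to them). The only value taken from the text is the quota of two IPC messages per client per pass.

```python
from collections import OrderedDict, deque

MESSAGES_PER_TURN = 2  # from the text: two IPC messages per client per pass


class ClientQueueScheduler:
    """Hypothetical model of per-client queues drained round-robin."""

    def __init__(self):
        self.queues = OrderedDict()  # client name -> deque of IPC messages

    def register(self, client):
        self.queues.setdefault(client, deque())

    def enqueue(self, client, message):
        # Assumption: messages from unregistered clients are rejected.
        if client not in self.queues:
            raise KeyError(f"client {client!r} is not registered")
        self.queues[client].append(message)

    def drain(self):
        """Yield (client, message) pairs until every queue is empty."""
        while any(self.queues.values()):
            for client, queue in self.queues.items():
                for _ in range(MESSAGES_PER_TURN):
                    if not queue:
                        break
                    yield client, queue.popleft()


sched = ClientQueueScheduler()
for name in ("ospf", "static"):
    sched.register(name)
for i in range(3):
    sched.enqueue("ospf", f"ospf-msg-{i}")
sched.enqueue("static", "static-msg-0")
order = list(sched.drain())
```

With three queued OSPF messages and one static message, the first pass serves two OSPF messages and the static message, and the third OSPF message waits for the second pass, so a busy client cannot starve the others.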
RIB can have multiple paths (next hops) to the same prefix. This occurs when
next hops with the same prefix are added by different routing protocols. The
router saves each path provided by each protocol. For example, OSPF adds
10.10.10.0/24 → 2.2.2.2 and staticd adds 10.10.10.0/24 → 1.1.1.1. Paths
to the prefix 10.10.10.0/24 that are added by different protocols are stored
in different linked lists. Theoretically, the prefix 10.10.10.0/24 can be reached
by an access circuit (AC) path (regular non-LSP or IGP routes), an adjacency
path (ARP entries from directly connected networks in the same subnet), or an
LSP path (MPLS entries). If the prefixes added by different clients collide, the
router stores them in the appropriate linked lists. When the router forwards a
packet and several paths (next hops) to the same prefix are available in the
RIB, the router must select which one to use for that packet.
Which path does the router use? Each path has a distance, that is, a route
priority, and the path with the lowest distance is selected as the preferred route.
The adjacency-supplied next hop has the lowest distance (0), because packets
need only to be passed through a port on the current router to reach their
destination. The distance parameter sometimes indicates which client created
the next hop, because each protocol has a unique distance default value. The
linked lists are sorted by distance. The path with the lowest distance becomes
the active path. RIB downloads only the active path to the line cards; thus the
collection of prefixes and active paths forms the FIB stored in the line cards.
Every time there is a change in the active path (for example, a new protocol
with lower distance has added the same route), it must be downloaded to
update the FIBs.
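As a rough sketch of the selection rule above, the active path is simply the candidate with the lowest distance. The Path type and the distance values below are illustrative assumptions, not SSR defaults or RIB data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    protocol: str
    distance: int


def active_path(paths):
    """Return the path with the lowest distance, or None if there are none."""
    return min(paths, key=lambda p: p.distance, default=None)


# Distance values are illustrative only.
paths = [
    Path("2.2.2.2", "ospf", 110),
    Path("1.1.1.1", "static", 1),
]
best = active_path(paths)
```

Here the static path wins because its distance is lower. If a new path with a still lower distance were added for the same prefix, rerunning the selection would yield a different active path, which corresponds to the case where RIB must update the FIBs.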
even though it was learned from an LSP path. Although routes from LSP
paths by default are FIB ineligible, all LSP paths must be downloaded to FIB,
because these paths must be available on the line card for use by L2VPN
and L3VPN routes. L2VPN and L3VPN routes are selected by MPLS criteria
and not their distance values.
Interface subnet routes are internal routes that are added automatically by RIB
every time an IP interface is configured and is up. These routes are added
automatically for each IP interface, even if the interface is not used for sending
or receiving traffic yet.
1. If both the port and interface circuit are up when an interface is bound
to a port, the interface comes up.
2. If the interface has an IP address, RIB creates a subnet route called the
locally connected route triggered by the corresponding ISM event, as
shown in Figure 47. At this point, the router does not know whether a router,
a host, or an entire LAN is on the far end of the link.
The ARP resolution process occurs when traffic is being sent or received
in the interface. If there is traffic sent out of the interface, ARP is involved
and tries to resolve the destination IP address. When it is resolved, an
adjacency route is installed. When a routing protocol installs a route that
has this interface as a next hop, the process described in Section 3.5.1.4
on page 115 is used.
Note: The hidden routes (routes that are not immediately visible through
show commands) in Table 6 are created automatically to punt IP
packets sent to specified IP interfaces onward to the RPSW card for
further processing. The next hops of hidden routes have a special
grid (0x314). Their circuit and interface attributes are zeros. For more
information on punted packets, see Section 3.7.4 on page 141.
The unresolved connected next hop at 10.10.10.0/24 in the first row of Table 6
and illustrated in Figure 47 is resolved by the following process:
Because the locally connected next-hop is connected, the router knows its
circuit and the interface, and RIB creates an adjacency ID for it. However,
its IP address is zero, which means that when a packet destined to a host in
this subnet (for example, 10.10.10.2) is received and the longest prefix match
points to this next hop, the packet forwarding engine (PFE) knows where to
send it but can’t send it yet, because the destination MAC address is unknown
and therefore it cannot create its Layer 2 encapsulation. The PFE sends an
ARP cache miss message to ARPd with the packet's destination address.
ARPd queries RIB for the source address. RIB finds the subnet route entry with
longest prefix match from the interface address, which is sent back to ARP
to use as the source address toward the given destination. ARP creates the
packet and sends it to the Network Processing Unit (NPU), which forwards it
on the appropriate link. If the destination is alive, it replies with its MAC address,
which goes up to ARPd. Since there is no ARP table on the line card NPU,
ARPd inserts the following adjacency route into RIB (MAC ADD message):
This is called a connected adjacency next hop (resolved next hop), which has
exactly the same circuit handle and interface grid as the locally connected next
hop except that it has a valid MAC address and IP address.
Note: When a connected route is created, RIB downloads both the prefix and
next-hops and the egress adjacencies to the PFEs.
Another technique is to Telnet from both directions (from and to the SSR)
and observe the Telnet packets with the tcpdump command. This technique
works if you are the only person using Telnet to reach the SSR, which is likely. If you
see your packets in tcpdump, RIB is not processing them; if not, you need to
double-check that your routes are installed properly on the line card.
You can also use the hidden show ism client RIB log command to see if
ISM sent the interface up, circuit up, and port up messages to RIB, whether all
three messages were sent, and when they were sent. To filter these messages,
provide the circuit handle in the command: show ism client RIB log
cct handle circuit_handle detail. The output of this command
displays the IP address. If you only know the IP address, the circuit handle can
be derived from the show ip route ip-addr detail command.
The first step in debugging a route is to enter the show ip route prefix
detail command and verify whether or not RIB has the route prefix. If not,
enter the command show {static | ospf | bgp} route prefix to
verify whether or not the routing client has the route.
Note: When debugging routes, using the show ospf route command
queries OSPF for its routes. Using the show ip route ospf
command queries RIB for its OSPF routes.
If RIB has the route, it displays the adjacency. The first byte of the adjacency is
the card number (for example, 0x8 is card 9). If the first byte is 0xff, it means a
pseudocircuit, which can be located on any card. ISM decides on which card to
put the circuit and tells RIB in the slot mask field of the circuit. If the traffic is
congested on that card, ISM relocates it to another card; the slot mask indicates
the actual position. A zero adjacency means it is blackholed. Packets routed to
a blackhole adjacency are discarded.
Verify that a card is installed using the show chassis command. Does it
match the slot/port fields of the circuit handle?
If the cards are installed in the chassis as expected, verify the next hop using
the show ip route next-hop nh-hex-id detail command. The command
output displays the next-hop IP address, the resolved MAC address for that
address, the interface grid, the FIB card bits, and the reference counters, which
indicate how many routes and next hops are pointing to this next hop. The FIB
card bits indicate on which cards the next hop has been downloaded. Using
the binary representation of these bits, the upper bits indicate the egress slots,
while the lower bits indicate which ingress cards have it. A 1 in a binary position
means that the card with that number has the next hop.
To verify that the MAC address matches the one that ARPd has for that specific
next hop IP address, enter the show arp cache command. If the destination
host is down, there is no resolved MAC address. If you have a resolved MAC
address but RIB does not have the adjacency, there is a miscommunication
between ARP and RIB.
The interface grid identifies the interface that is bound to a specific circuit. To
see what interface status RIB has, use the show ip route interface
iface_grid command. The show ism interface iface_grid command
provides the same information from the ISM perspective. In these commands
the iface_grid argument is the internal interface ID in hexadecimal numbering.
To verify what information the FIB has regarding the destination IP address,
use the show card x fabl fib * command. The output provides detail
on what RIB has downloaded, including the /32 entry for the adjacency. If that
is not present, verify the next-hop grid that was downloaded using the show
card x fabl fib nexthop {all | NH-ID} command. The output
displays the valid IP address, circuit handle, and interface for all next hops in
FIB. To verify that it has an adjacency ID and a MAC address, use the show
card x fabl fib adjacency {all | adj-id} command. If the route
does not have an adjacency or MAC address, then the problem may be ARPd.
Note: The adjacency index follows the card number in the adjacency ID. For
example, if the adjacency is 0x80000001 (card 9), the adjacency index
is 0x0000001.
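The adjacency ID conventions described in this section can be illustrated with a small decoder. The exact bit layout is an assumption inferred from the example in the note (0x80000001 is card 9 with index 0x0000001): the sketch treats the top four bits as a zero-based card field and the remaining 28 bits as the index, and treats a top byte of 0xff as a pseudocircuit. The function name and the pseudocircuit index mask are hypothetical.

```python
PSEUDO_CIRCUIT_PREFIX = 0xFF  # top byte value that marks a pseudocircuit


def decode_adjacency(adj_id):
    """Split a 32-bit adjacency ID into (card, index) under the assumed layout."""
    if adj_id == 0:
        # A zero adjacency is a blackhole; packets routed to it are discarded.
        return ("blackhole", None)
    if (adj_id >> 24) == PSEUDO_CIRCUIT_PREFIX:
        # Assumption: the low 24 bits carry the index for pseudocircuits.
        return ("pseudocircuit", adj_id & 0x00FFFFFF)
    slot_field = adj_id >> 28        # assumed: top 4 bits, zero-based
    index = adj_id & 0x0FFFFFFF      # assumed: remaining 28 bits
    return (slot_field + 1, index)   # 0x8 maps to card 9, as in the note


card, index = decode_adjacency(0x80000001)
```

For the value in the note, this yields card 9 and index 1. For a pseudocircuit, the card must still be read from the slot mask of the circuit, because ISM decides where the circuit is placed.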
If adjacencies or next hops are missing from the FIB, check the logs to see
what has been downloaded by RIB using the show card x fabl fib log
[rib] [ingress | egress] command. Note the message timestamps, which
may indicate when FIB received a route entry from RIB.
You can configure up to eight static routes for a single destination. Among
multiple routes with the same destination, preferred routes are selected in the
following order:
2 If two or more routes have the same distance and cost values, the
equal-cost multipath (ECMP) route is preferred.
3 When redistributing static routes, routing protocols ignore the cost value
assigned to those routes. When static routes are redistributed through
dynamic routing protocols, only the active static route to a destination is
advertised.
The SSR resolves next hops for the routes, adds them to RIB, and downloads
them to FIB. For the example in Figure 48, an interface with IP address
10.10.10.1/24 is bound, and the 10.10.10.0/24 route to the locally connected
next hop (unresolved next-hop) is downloaded by RIB to line cards.
2 The ingress FIB lookup results in an unresolved next hop. The packet is
forwarded to egress.
3 Because the 10.10.10.2 address is not resolved, a message is sent from
egress forwarding to ARPd to resolve 10.10.10.2. The packet is buffered at
egress.
7 ARPd forwards the broadcast ARP request through egress to the physical
output.
10 ARPd sends a message to RIBd with the MAC address resolved for
10.10.10.2.
Figure 49 illustrates the steps to add the static route to RIB and download
it to FIB.
The static route (with the same prerequisites as the static routing sequences) is
added with the following steps:
1 STATICd queries and registers for next-hop 10.10.10.2 for static route
30.1.1.1.
3 When the next hop has been added, STATICd sends a message to RIBd to
add the static route.
If the next hop is unresolved when a packet arrives matching the static route,
the next-hop IP address is resolved using ARP, as shown in Figure 48.
To verify that static routes have been added to RIB and downloaded to FIB,
use the show ip route command.
3.5.6 IPv6
To enable the SSR to serve as an IPv6-aware node, the SSR platform
supports configuring IPv6 interfaces toward the neighboring nodes and
enabling IPv6 address-based routing and forwarding. The router supports
IPv6 address-based routing protocols (OSPFv3, IS-IS, RIPng, and BGP), IPv6
neighbor discovery (ND), IPv6 link-local address configuration and forwarding
support, and 6PE and 6VPE support. Support for tunneling mechanisms like
6PE, 6VPE, and IPv6 over IPv4 enables customers to route IPv6 packets
through intermediate IPv4 address-based networks.
IPv6 configuration is supported for all physical and link group circuit types. Also
dual stack with simultaneous IPv4 and IPv6 configuration is supported for these
circuit types. IPv6 is also supported on GRE circuits for tunneling IPv6 packets
in IPv4 GRE tunnels as well as in IP in IP tunnel circuits for 6in4 tunneling.
IPv6 and RIB—All IPv6 routing protocols and IPv6 static routing install
routes with RIB. As with IPv4 routes, RIB qualifies the best possible routes and
installs them into the routing table. The responsibility of the IPv6 RIB includes:
• Installing the next hop and the route, in the same order
Link-local addresses have link scope, meaning they are relevant over a single
link. Packets with source or destination link-local addresses are not forwarded
by routers to another node. IPv6 link-local support allows customers to
configure link-local addresses per interface or let the system assign a link-local
address per interface automatically.
RIB sends all link-local next hops, both local and remote, to all line cards. RIB
re-downloads them to all the next hop’s Home Slots whenever next-hop circuit
slot masks change. FABL downloads all remote link-local addresses and only
line card-specific local link-local next hops to forwarding.
FABL keeps a link-local database to resolve link local addresses into next-hop
IDs with the following information:
• Link local address and circuit handle to next-hop for remote destinations
FABL stores both remote and local link-local table entries, whereas fast-path
forwarding only stores the local link-local table. The fast-path forwarding
link-local table of local destinations is used to perform source link-local address
validation for incoming packets to ensure that only correctly addressed link-local
packets are sent to the RPSW.
The FABL link-local database is kept in shared memory. Insertion and deletion
of entries of the database is done only by the FABL-FIB process. Other
processes have read-only permission for lookup.
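Because the same link-local address can appear on more than one link, the remote lookup needs both the address and the circuit handle as its key. The sketch below models that with an ordinary dictionary. It is a hypothetical stand-in, not the shared-memory database itself: the class name, method names, and the next-hop ID value are assumptions, and the single-writer rule is not modeled.

```python
import ipaddress


class LinkLocalTable:
    """Hypothetical in-memory stand-in for the FABL link-local database."""

    def __init__(self):
        self._remote = {}  # (address, circuit handle) -> next-hop ID

    def add_remote(self, address, circuit_handle, next_hop_id):
        addr = ipaddress.IPv6Address(address)
        if not addr.is_link_local:
            raise ValueError(f"{address} is not a link-local address")
        self._remote[(addr, circuit_handle)] = next_hop_id

    def lookup_remote(self, address, circuit_handle):
        """Return the next-hop ID, or None if the pair is unknown."""
        key = (ipaddress.IPv6Address(address), circuit_handle)
        return self._remote.get(key)


table = LinkLocalTable()
# The next-hop ID below is made up for illustration.
table.add_remote("fe80::300:50", 0x40080001, 0x31100001)
hit = table.lookup_remote("fe80::300:50", 0x40080001)
miss = table.lookup_remote("fe80::300:50", 0x40080002)
```

The second lookup misses even though the address matches, because the circuit handle differs. This is why the circuit handle must accompany a remote link-local address.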
The SSR supports ping to both remote and local link-local destinations. In the
ping ipv6 command, the circuit handle must be included with the link-local
address to identify a remote link-local destination. For more information, see
Command List.
Figure 50 illustrates the process for pinging a remote link-local address with link
layer address (MAC) not yet resolved:
For example:
1 A user sends an IPv6 Ping packet to FABL PAKIO with link-local destination
ID fe80::300:50 and circuit handle: 0x40080001
c Redoes lookup with zero IPv6 address and context ID as key. This
returns the interface next hop ID.
3 FABL PAKIO forwards the packet with the link-local next hop to the NPU.
5 FABL ND stores and throttles the packet for resolution and sends an ND
Resolve Request to NDd for the destination address.
10 The link-local destination is resolved into the MAC next hop for the
destination IPv6 address. RIB installs it to FABL and FIB.
12 The NPU carries out an ICMP Ping exchange with the next hop to verify it
and forwards the ICMP-Reply message to FABL-PAKIO, which forwards
it back to the originator.
Physical and link group pseudo circuits can be configured to carry both IPv4
and IPv6 traffic at the same time. Such circuits are called dual stack circuits. A
single circuit handle will represent state and counters required to support both
IPv4 and IPv6 traffic. This support is provided when a single interface has both
IPv4 and IPv6 addresses configured. Both single bind and multi-bind interfaces
with such configurations are supported. Each of the addresses (IPv4 and IPv6)
can be added or removed without affecting traffic for the other address type.
For example, a circuit can move between single stack and dual stack without
affecting traffic flow for the address family that is not modified. Dual stack
configuration is supported over unified LAG circuits in both packet hashed and
circuit pinned mode. Separate ingress and egress counters are maintained for
IPv6 and IPv4 traffic per circuit when a circuit is configured as dual stack.
3.6 MPLS
IP routing suffers from four delays: propagation, queueing, processing, and
transmission. These delays cause IP routed data to be rough and disorderly,
making IP routing unsuitable for multimedia applications, especially phone
calls. The largest delay is for processing, because lookups in the tree storing
the routing table take time. To resolve this problem, MPLS and the label
concept were invented.
In MPLS, the complete analysis of the packet header is performed only once,
when it enters an MPLS-enabled network. At each incoming (ingress) point of
the MPLS-enabled network, packets are assigned a label stack by an edge
label-switched router (LSR). Packets are forwarded along a label-switched
path (LSP), where each LSR makes forwarding decisions based on the label
stack information. LSPs are unidirectional. When an LSP is set up, each LSR
is programmed with the label stack operations (for example, pop-and-lookup,
swap, or push) that it is to perform for that label.
Once an MPLS router receives a packet with a label, the SSR performs a
lookup on the label and decides what to do. If it is an intermediate router along
the LSP path, it swaps the label with a different one and forwards the packet
unless the next hop is the edge of the LSP path. In the latter case, the router
pops off the label and forwards the packet to the edge, which does not have to
do label lookup but only pure IP routing. This is called penultimate hop popping
(PHP). Logically the MPLS tunnel terminates at the edge, even though the
eLER (edge LER) does no label handling.
At the egress point, an edge LSR removes the label and forwards (for example,
through IP routing longest prefix match lookup) the packet to its destination.
MPLS uses RSVP, LDP, or BGP to communicate labels and their meanings
among LSRs.
• At the ingress LSR, the IP TTL field is decremented and propagated to the
MPLS TTL field located in the label header.
• At the egress LSR, the MPLS TTL field replaces the IP TTL field, and the
label is popped.
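The TTL handling in the bullets above can be traced with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the transit step (each intermediate LSR decrements the MPLS TTL by one) reflects general MPLS behavior rather than anything stated in this section.

```python
def ingress_push(ip_ttl):
    """Ingress LSR: decrement the IP TTL and copy it into the new label."""
    ip_ttl -= 1
    return ip_ttl, ip_ttl  # (IP TTL, MPLS TTL)


def transit_swap(mpls_ttl):
    """Transit LSR (assumed general MPLS behavior): decrement the MPLS TTL."""
    return mpls_ttl - 1


def egress_pop(mpls_ttl):
    """Egress LSR: the MPLS TTL replaces the IP TTL when the label is popped."""
    return mpls_ttl


ip_ttl, mpls_ttl = ingress_push(64)
for _ in range(2):            # two transit LSRs along the LSP
    mpls_ttl = transit_swap(mpls_ttl)
final_ip_ttl = egress_pop(mpls_ttl)
```

In this trace a packet that enters with an IP TTL of 64 leaves the LSP with an IP TTL of 61, so the hops inside the MPLS network remain visible in the IP TTL.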
3.6.4 MPLS-TE
Another situation where shortest path LSP is insufficient is when QoS is
imposed on the network and the routers involved in traffic need to reserve
resources for the flows. This process is called MPLS traffic engineering
(MPLS-TE). RSVP is used for MPLS-TE scenarios. RSVP also can handle
failures by installing backup paths that handle traffic when a failure in the
primary path is detected. The backup LSP feature of RSVP is called fast reroute
(FRR). Backup paths can be set to bypass a node, a link, or even a full LSP. To
detect failures, network administrators can use MPLS ping or configure virtual
circuit connectivity verification (VCCV) or Bidirectional Forwarding Detection
(BFD). Of these methods, BFD provides the fastest automatic detection of
MPLS connection failures.
In Figure 52, suppose that an LSP (ABCD) is configured between nodes A and
D. If a link or node fails along the LSP, traffic could be dropped until an operator
reconfigures (which is a slow mechanism) the LSP around the problem (for
example, AEFD if the fault is between B and C).
A bypass LSP functions exactly like any other RSVP LSP except that it does not
carry traffic under normal conditions. When a link or node failure is detected,
traffic is quickly rerouted onto a bypass LSP to circumvent the failure.
3.6.7 L3VPN
Figure 54 shows the high-level design of the MPLS functionality in the Ericsson
IP Operating System.
• Enabling LDP
• Configuring L2VPNs
These MPLS configurations are mainly stored in the LM, RSVP, and LDP
processes and are used to configure the various line cards. Thus, multiple
processes cooperate to configure MPLS LSPs and the dataplane for MPLS
forwarding. For example, RIB installs the FEC entry (route pointing to an MPLS
next hop), whereas LM installs the NHLFEs. LM is also the control plane entity
handling the MPLS-ping and MPLS-traceroute. LDP and RSVP are processes
that handle label distribution. BGP can also interact with LM to configure
L3VPNs. The major LM data structures and their relationships are used to store
the information passed from RCM or communicated from other processes such
as ISM (for example, circuit state), LDP (for example, PW inner label), and RIB
(for example, next-hop related information).
MPLS implementation is managed in both the control plane and the data plane:
• The control plane runs the protocols such as LDP, BGP, and RSVP-TE and
configures the data plane through LM.
LSPs are typically set up dynamically through LDP or RSVP, but they can also
be set up through static configuration. The SSR supports a single platform-wide
label space that is partitioned per application (for example, LDP and RSVP). An
LM library is linked per application, which facilitates the label allocation. The
applications later install the allocated label to LM via the LM API. Although
LSPs or MPLS tunnels may utilize label stacks that are more than one label
deep (for example, PWs consisting of an outer tunnel label and an inner PW ID
label), this section describes setting up only the outer-label tunnel LSP.
The SSR supports the following ways to configure such tunnel LSPs:
• Static—An operator manually configures the label values and label stack
operations (push, pop, swap) to use on a specific interface, along with all
the needed information such as the egress IP of the LSP, the next-hop IP
address, and so on. The configuration is forwarded from CLI to RCM to
the MPLS-static daemon, which uses the LM API to configure the LSP
through the LM daemon.
An LSP can be set up over a Link Aggregation Group (LAG) where different
circuits over different line cards are participating in the LAG. For that topology,
an ingress next-hop structure (FTN case) or an ILM entry can point to an
array of adjacency IDs belonging to the same or different egress line cards.
The label to be used is the same across the LAG (the same label is used for
all adjacencies) and the data plane load-balances flows across the various
adjacencies/circuits in the LAG.
ECMP can also be used between two endpoints when several LSPs are set
up with equal cost between these two endpoints. RIB configures an array of
next hops for the same prefix. Each next hop points to an MPLS adjacency
holding the label for that LSP.
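One common way to spread flows over an array of adjacencies, whether LAG members or ECMP next hops, is to hash the fields that identify a flow and use the result as an index into the array. The sketch below shows that idea only. The choice of CRC32 and of the flow fields is an assumption, since this section does not state which hash or which fields the data plane uses, and the function name and adjacency values are hypothetical.

```python
import zlib


def pick_adjacency(flow, adjacencies):
    """Map a flow tuple onto one adjacency ID; a given flow always maps the same way."""
    key = "|".join(str(field) for field in flow).encode()
    return adjacencies[zlib.crc32(key) % len(adjacencies)]


# Illustrative adjacency IDs on three different egress cards.
adjacencies = [0x80000001, 0x90000001, 0xA0000001]
flow = ("10.1.1.1", "10.2.2.2", 6, 1234, 80)  # src, dst, protocol, ports
first = pick_adjacency(flow, adjacencies)
second = pick_adjacency(flow, adjacencies)
```

Because the index depends only on the flow fields, all packets of one flow follow the same adjacency and stay in order, while different flows are distributed across the members.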
From the dataplane perspective, there is no difference whether the LSP is set
up statically or dynamically. Packets are received on a specific port and a
specific circuit (for example, tagged or untagged VLAN).
The forwarding abstraction layer (FABL) constantly monitors the state of each
port (updated by the drivers in PAd) referenced by the adjacency associated
with a next-hop structure (FTN case) or an ILM entry. When the egress circuit
of the main LSP goes down, FABL converts a port failure to a failure of the
circuits that are created on the failed port. If an alternative adjacency is
configured (for either bypass or backup LSPs since FABL does not differentiate
between the two), FABL immediately swaps the main LSP adjacency with the
configured backup adjacency.
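The failover behavior described above amounts to a swap of two adjacency references when a port goes down. The sketch below illustrates this with hypothetical names (LspEntry, on_port_down) and a plain dictionary that maps adjacency IDs to ports. It is not the FABL data model. As in the text, the alternate adjacency may belong to either a bypass or a backup LSP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LspEntry:
    active_adjacency: int
    alternate_adjacency: Optional[int] = None  # bypass or backup, not distinguished


def on_port_down(failed_port, adjacency_port, entries):
    """Swap in the alternate adjacency for entries whose active adjacency used the failed port."""
    swapped = []
    for entry in entries:
        if adjacency_port.get(entry.active_adjacency) != failed_port:
            continue  # this entry does not egress through the failed port
        if entry.alternate_adjacency is None:
            continue  # nothing configured to fall back on
        entry.active_adjacency, entry.alternate_adjacency = (
            entry.alternate_adjacency,
            entry.active_adjacency,
        )
        swapped.append(entry)
    return swapped


adjacency_port = {0x10: "1/1", 0x20: "2/1"}  # illustrative adjacency-to-port map
protected = LspEntry(active_adjacency=0x10, alternate_adjacency=0x20)
unprotected = LspEntry(active_adjacency=0x10)
changed = on_port_down("1/1", adjacency_port, [protected, unprotected])
```

Only the protected entry is switched. The unprotected entry keeps pointing at the failed adjacency until the control plane installs a new path.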
In the control plane, the configuration of a bypass LSP differs from that of a
backup LSP. Even though a bypass LSP is created by RSVP through LM, when
a bypass LSP is configured in the data plane, LM notifies RSVP about that LSP.
Upon receiving the notice, RSVP rechecks its database for all eligible LSPs
that can benefit from the created bypass LSP. RSVP reconfigures all eligible
LSPs with NFRR with backup adjacencies (all eligible LSPs are automatically
reconfigured, not reinstalled). For a backup LSP, RSVP instructs LM to append
a backup LSP to an existing LSP. When RSVP realizes that the primary LSP
has failed, the backup LSP is reinstalled to LM as the primary LSP. The
configuration supports two backup LSPs for the same LSP. However, RSVP
downloads only one backup per primary LSP depending on availability (for
example, if the first backup fails, RSVP replaces it with the second backup
LSP). Finally, if both a backup LSP and a bypass LSP are configured for the
same primary LSP, RSVP configures only the backup LSP.
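The selection rules in this paragraph can be summarized in a short decision function. This is a sketch of the stated rules only. The function name and data shapes are hypothetical, and returning nothing when backups are configured but none is available is an assumption, because the text does not cover that case.

```python
def protection_to_install(backups, bypass=None):
    """Return the single protection LSP to install for a primary LSP, or None.

    backups: list of (name, is_available) in configured order (at most two).
    bypass:  name of an eligible bypass LSP, if any.
    """
    if len(backups) > 2:
        raise ValueError("at most two backup LSPs per primary LSP")
    if backups:
        # A configured backup takes precedence over any bypass LSP.
        for name, available in backups:
            if available:
                return ("backup", name)  # only one backup is downloaded
        return None  # assumption: no fallback to bypass in this case
    return ("bypass", bypass) if bypass else None


choice = protection_to_install([("b1", False), ("b2", True)], bypass="byp")
```

Here the first backup is unavailable, so the second backup is chosen, and the bypass LSP is ignored because a backup is configured.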
3.7 Forwarding
The SSR 8000 introduces a forwarding abstraction layer (FABL) to scale the
function across different line cards. The FABL is network processor-independent
and communicates with the control plane processes on the route processor
(RP). It also communicates with an adaptation layer daemon (ALd) on the line
card LP that contains and isolates network processor-dependent functions.
The combination of FABL and ALd maintains the data structures used by the
NPU to control forwarding. The NPU is responsible for forwarding packets to
other line cards, receiving control packets sent to the SSR, and sending control
packets originated by the SSR. The SSR supports unicast, multicast, punted,
SSC-punted, and host-sourced packet flows.
SSR line cards and SSC cards support the following forwarding types:
outgoing circuits, the multicast packets are replicated either in the fabric
element (FE), egress PFE, or both.
The SSR line cards and the SSC card also support the following control
information flow types:
• SSC Traffic Slice Management—To support traffic steering over SSC, SSR
introduces traffic slice management (TSM) to install either target next hop
In unicast forwarding, packets are forwarded from one line card to another
across the switch fabric. Figure 58 illustrates the packet flow path of an incoming
unicast packet from an ingress interface to an egress interface. IPv4 and IPv6
unicast IP addresses are supported.
2. Task optimized processor (TOP) stages of the ingress PFE process packets
with microcode processing, which includes forwarding functions such as
FIB lookup and ingress services such as rate limiting, QoS propagation,
and ACL marking. At the end of TOP processing, the egress port of the
packet is determined.
4. Packets are scheduled from the VOQ of the ingress FAP (iFAP) to the
egress FAP (eFAP). iFAP converts the ITHM to a fabric traffic management
(TM) header (FTMH), and the FE uses the FTMH to route the packets to
the destination eFAP.
5. The eFAP transfers the packets to the PFE. The packet then enters the
PFE following the processing flow in Step 1.
Note: Each PFE can simultaneously process ingress and egress traffic.
Packets flowing through the PFE use the same TOP stages but
undergo different processing flows. Each FAP also simultaneously
supports ingress and egress traffic.
1. As ingress packets enter the PFE, they are classified as multicast packets.
3. At the end of TOP processing, PFE places the Multicast Adjacency ID in the
ITHM header. The packet is then enqueued in one of the fabric multicast
queues (FMQ) in iFAP. iFAP includes an FMGID and indicates that the
packet is of type multicast in the FTMH header.
4. The FE in the switch card receives the packets and transmits them to all
eFAPs specified in the FMGID.
5. The eFAP transfers the packets to the egress PFE. The egress PFE
identifies the packets as multicast from the forwarding header and performs
a lookup on the Multicast Adjacency ID. The results include the number of
port replications and a list of adjacency IDs.
• Egress packet steering—At the last SSC, packets are enqueued to the
egress queue toward line cards. The forwarding module then forwards the
packet to an egress line card based on regular FIB lookup.
When SSR nodes are configured in an ICR pair (in the BGP-based model),
TSM packet steering changes to a more complex model. In this case, packets
are steered to specific SSCs by using service maps as well as multiple TSFT
tables (dynamically created). For a diagram of the steering flow, see Figure 87.
In the egress punt path, the punt destination is based on egress adjacency
lookup. The processing flow is the same as that of the ingress punt path.
The following types of packets can be punted to the LP from the ingress path:
• Network control packets (for example, BGP, OSPF, BPDU, ARP, ICMP,
and IGMP)
• Packets destined for the SSR (for example, Telnet, FTP, SNMP)
Packets with an adjacency that has an unresolved ARP can be punted from
the egress path.
FABL processes on the LP either handle the packets or send them to the RP.
Figure 57 illustrates the processing path for packets sourced from either the
LP or the RP. The corresponding processing paths are summarized in the
following steps.
1. The LP prepends the packet with a Host Transmit Header (HTH) and then
transmits the packet toward the PFE. Source packets are buffered in the
PFE TM and then delivered to the TOP stages for further processing.
2. Unlike forwarding packets, sourced packets entering the TOP stages are
not subjected to ACL or rate limiting. Sourced packets primarily undergo
FIB lookup in the ingress microcode processing path.
3. At the end of TOP stages, the packet is queued to the VOQ of the iFAP in
the same way as Step 3 for the unicast forwarding flow.
Note: The priority in the Host Transmit Header is preserved while queuing
to VOQ.
1. See Step 1 for the ingress path, with the exception that FIB lookup is not
required, and the prepended HTH also includes forwarding information
such as the next-hop adjacency and priority.
3. See Step 6 for unicast forwarding flow, with the exception that the egress
PFE TM sends the packet to the egress MAC interface, which in turn sends
the packets to the interface.
The SSR 8000 family architecture separates the control and forwarding planes.
The separation of the route processing and control functions (performed by the
operating system software running on the controller card) from the forwarding
function (performed on the individual traffic cards) provides the following
benefits:
In software, the router also supports the following types of redundant routes:
• MPLS route redundancy using LDP. For more information, see Configuring
MPLS.
The SSR supports traffic load balancing in the following redundant topologies:
• MPLS networks with label edge router (LER) and label-switched path (LSP)
transit node (P router) configurations
For information about load balancing, see Section 3.7.7 on page 144.
Table 7 provides details of load balancing for traffic in LER configurations (see
RFC 6178).
• To use Layer 4 data (the 5-tuple hashing option) in the algorithm, use the
service load-balance ip command with the layer-4 keyword. The 5-tuple
consists of the source and destination IP addresses, the IP protocol, and
the source and destination ports for User Datagram Protocol (UDP) or
Transmission Control Protocol (TCP) streams.
• Fragmented packets
Table 7 describes the hashing for LER load balancing for different applications.
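As a rough illustration of 5-tuple hashing, the following Python sketch maps a flow to one of several equal-cost paths. The hash function used by the SSR is not described in this section, so CRC32 is only a stand-in, and the function name, key format, and modulo selection are assumptions.

```python
import zlib


def select_path(src_ip, dst_ip, protocol, src_port, dst_port, num_paths):
    """Pick a path index from the 5-tuple; CRC32 is a stand-in hash."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_paths


flow = ("10.0.0.1", "192.0.2.9", 6, 40000, 443)
# Every packet of the same flow yields the same key, so the flow stays
# on one path.
print(select_path(*flow, num_paths=4))
```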
Load balancing hashing for egress LERs (eLERs) depends on the topology of
the network into which packets are being forwarded. For example, if traffic is
being forwarded:
• Into a LAG path with multiple links, LAG hashing is used. For more
information, see Configuring Link Aggregation Groups.
You can configure load balancing in networks using BGP route advertisement
by using the BGP multipath capabilities. By default, BGP multipath is disabled,
which means that BGP installs a single path in the RIB for each destination.
If that path fails and no other path has been installed for that prefix, traffic
destined for that prefix is lost until the path is available again.
When BGP multipath is enabled with the multi-paths command, BGP installs
multiple best equal-cost paths in the routing table for load-balancing traffic to
BGP destinations. With multipath, the paths can be:
• In a VPN context, a combination of iBGP and eBGP, where only one eBGP
path is allowed and the number of allowed iBGP equal-cost paths is equal to
the maximum number of paths allowed (configured with the multi-paths
eibgp path-num command) minus one. For example, if you configure
eibgp 7, six iBGP paths and one eBGP path are installed in the RIB.
When BGP multipath capabilities are enabled, even though multiple paths
are installed in the RIB, BGP advertises only one path (the BGP best path)
to its peers.
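The eiBGP arithmetic described above (one eBGP path, with the remaining paths available for iBGP) can be written as a small helper. This is a sketch only. The function name is hypothetical, and the valid range of the path-num argument is not stated in this section.

```python
def eibgp_path_split(path_num):
    """Split the eibgp path-num limit into (iBGP paths, eBGP paths)."""
    if path_num < 1:
        raise ValueError("path_num must be at least 1")
    ebgp = 1                 # only one eBGP path is allowed
    ibgp = path_num - ebgp   # the remainder may be iBGP equal-cost paths
    return ibgp, ebgp


print(eibgp_path_split(7))  # (6, 1)
```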
• Random input fields from different traffic streams. If multiple fields are used,
changes in those fields between different streams should not be correlated.
In testing, hashing might not work well if there is correlation between the
multiple fields used in the hashing. For example, using field1 (one byte) and
field2 (one byte) to generate the hash key (= field1 XOR field2) for testing
purposes, traffic streams can be generated in the following ways:
1 Field 2 stays the same and field 1 is incremented by 1 for each stream
(a typical testing method). This achieves a good hashing result. It is not
exactly random, but it is simple and works well for one-field hashing.
2 Both field 1 and field 2 are incremented by 1 for each stream. This might
appear random, but it is not, because 0 XOR 0 = 1 XOR 1 = 2 XOR 2 = ...
All the hash keys are the same. Starting field 1 and field 2 with
different values helps, but not much. Incrementing field 1 by 1 and field 2
by 2 helps somewhat, but the fields are still correlated, so this is not
the best method.
3 Use a random number generator for both field 1 and field 2. This is the
best way to verify hashing effect.
For LSR label stack hashing, if the testing equipment generates label
stacks of (1, 1), (2, 2), and so on, the load balancing effect is not optimal.
However, adding a few outer label values with many more inner label values
should work. For LER hashing, if both the source IP address and destination
IP address are incremented by 1 for each stream, the load balancing does not
work well. For L4 hashing, if both the source port and destination port are
incremented by 1 for each stream, the load balancing again does not work
well. In general, when multiple input fields are used, testing should be
done using separate random generators for each field.
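The effect of correlated test fields can be reproduced with the XOR hash key from the example above. The following Python sketch compares the three stream-generation methods. Selecting a link with a modulo over four links is an assumption made for illustration and is not a description of the SSR hashing hardware.

```python
import random
from collections import Counter

NUM_LINKS = 4
NUM_STREAMS = 256


def link_for(field1, field2):
    """Hash key = field1 XOR field2 (one byte each), then pick a link."""
    return (field1 ^ field2) % NUM_LINKS


def distribution(streams):
    """Count how many streams land on each link."""
    counts = Counter(link_for(f1, f2) for f1, f2 in streams)
    return [counts.get(link, 0) for link in range(NUM_LINKS)]


# Method 1: field 2 fixed, field 1 incremented by 1 per stream.
one_field = [(i % 256, 5) for i in range(NUM_STREAMS)]
# Method 2: both fields incremented together; the XOR is always 0.
correlated = [(i % 256, i % 256) for i in range(NUM_STREAMS)]
# Method 3: independent random values for each field.
rng = random.Random(1)
randomized = [(rng.randrange(256), rng.randrange(256))
              for _ in range(NUM_STREAMS)]

print("one field :", distribution(one_field))
print("correlated:", distribution(correlated))
print("random    :", distribution(randomized))
```

With method 2, every stream lands on the same link because the hash key never changes, which matches the behavior described in the text.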
The PPA3LP NPU subsystem receives and processes packets coming from
various interfaces, such as access-side interfaces, trunk-side interfaces, and
control interfaces. Based on the processing results, the NPU either drops the
packets or forwards them to different interfaces. The NPU subsystem also
provides platform-dependent APIs that FABL uses to send the configurations
that populate the databases required for forwarding traffic. The NPU
also provides APIs to punt control packets to, and receive them from, the
FABL PAKIO module on the line card processors.
To support the BNG application, the NPU subsystem provides support for
forwarding IPv4 and IPv6 packets to and from the following types of subscribers
that are brought up either statically by configuration or dynamically by protocols.
The BNG application supports the following subscriber configurations.
As illustrated in Figure 61, the BNG application suite is logically divided into
three groups: BNG protocols, BNG control, and BNG services.
The router control plane communicates with the ALd processes on the line card
processor, which in turn communicate with the line card NPU.
Session connection information flow, using ALd as a liaison with the platform
dependent hardware:
• ISM forwards the circuit details to IFace. IFace sends a request to ALd to
enable the PPPoE encapsulation.
• ALd forwards it to PPPoE via the kernel. PPPoE returns a PPPoE Active
Discovery Offer (PADO) message to ALd.
• PPPoE creates the circuit with PPP attributes and forwards the details
to ISM.
• ISM forwards the circuit details to IFace. IFace forwards the details to ALd.
• The client and PPPoE negotiate the session details via the kernel, ALd,
and the NPU.
• If the interface binding configured is for both IPv4 and IPv6, ISM informs
IFace.
• ND messages are exchanged between the client and NDd via the kernel,
ALd, and the NPU (all modules are informed).
• DHCPv6 and the client negotiate session details via the kernel, ALd, and
the NPU (all modules are informed).
• FIB sends the routes and next hop to ALd to be used for packet forwarding.
Note: The role of ALd in subscriber session negotiations is assumed, but not
shown in the rest of the session diagrams.
In static sessions, a fixed number of circuits are created manually. When the
sessions are set up, the subscribers are bound to the circuits by the ISM. The
Ericsson IP Operating System supports the following circuit encapsulation
types for static sessions:
Session Bring-Up
1. When the circuit is created, the CLI module transfers the configuration to
the router configuration manager (RCM) module.
2. RCM requests ISM to create the circuit and activate the session. RCM
sends the circuit configuration details to ISM along with the request.
3. ISM creates the circuit and propagates these updates to the other modules.
5. AAA sends a request to ISM for binding the interface with the configured
circuit.
7. The line card sends the subscriber status to the statistics daemon (STATd),
which further propagates the status to AAA.
8. When the client sends an ARP request message, ARP responds with an
“ARP Response” and adds the MAC address of the client to the Routing
Information Base (RIB).
9. RIB downloads the route, next-hop, and adjacency information to the line
card.
Session Termination
4. AAA sends a request to ISM to unbind the interface from the circuit.
5. ISM unbinds the interface and propagates these updates to the other
modules.
6. The line card sends the subscriber status to the statistics daemon (STATd),
which further propagates the status to AAA.
Session Bring-Up
To set up a static subscriber session using DHCP, you must configure circuits
and bind the subscribers to the circuits. The subscribers should also be
configured with DHCP_Max_Leases (RBAK VSA #3).
The following figure illustrates how a static subscriber session with DHCP is
created:
2. AAA sends a request to ISM for binding the interface with the circuit. AAA
sends the circuit configuration information along with the request.
7. AAA requests the ISM to update the circuit configuration with the IP
address.
9. ARP provides RIB with the MAC address of the host client.
10. RIB downloads the route and next-hop information to the line card.
11. The DHCP daemon forwards the “DHCPOFFER” message to the client.
12. The client sends a “DHCPREQUEST” to the DHCP daemon, which further
forwards this request to the DHCP server.
13. The DHCP server responds with the “DHCPACK” message, which contains
all the configuration information requested by the client.
2. The DHCP daemon running on the router receives the message and
requests AAA to remove the IP address of the client from the AAA database.
3. AAA sends an “ip host del” message to the ISM along with the circuit
configuration of the circuit to be removed.
4. ISM deletes the circuit and propagates these updates to the other modules.
Session Bring-Up
Static CLIPS sessions are static circuits that stay up as long as the port is up.
The CLIPS session is brought down only when the port is down or the CLIPS
PVC is unconfigured.
1. After the circuit is created, the CLI transfers the configuration to RCM.
3. CLIPS requests ISM to create the circuit and activate the session, sending
the circuit configuration details to ISM along with the request.
4. ISM creates the circuit and propagates these updates to the other modules.
6. AAA sends a request to ISM for binding the interface with the configured
circuit.
8. When the client sends an ARP request message, ARP responds with an
“ARP Response” and adds the MAC address of the client to the Routing
Information Base (RIB).
9. RIB downloads the route, next-hop, and adjacency information to the line
card.
Session Termination
3. CLIPS sends a “session down” message to AAA to bring down the currently
active session.
4. AAA sends a request to ISM to unbind the interface from the circuit.
5. ISM unbinds the interface and propagates these updates to the other
modules.
6. The line card sends the subscriber status to STATd, which further
propagates the status to AAA.
Point-to-Point Protocol (PPP) is available on the router over untagged Ethernet
and 802.1Q PVCs. PPP over Ethernet (PPPoE) is a client-server connection
technology for subscribers to access the Internet and IP services over an
Ethernet connection.
2. When the router receives it, the PPPoE process returns a unicast PPPoE
Active Discovery Offer (PADO) message to the client, containing its
MAC address, server name, and the services it offers.
3. Assuming that the client selects the router's offer, it returns a PPPoE
Active Discovery Request (PADR) for connection.
4. PPPoE sends a request to ISM to create the circuit with the PPPoE session
ID.
ISM creates it and passes the circuit information to the other BNG modules.
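The server side of the discovery exchange can be sketched as a small handler. The packet codes below are the values defined in RFC 2516. The class, its method names, and the dictionary that stands in for the circuit request to ISM are hypothetical, and the session confirmation (PADS) reply is taken from RFC 2516 rather than from the steps above.

```python
# PPPoE discovery codes as defined in RFC 2516.
PADI, PADO, PADR, PADS = 0x09, 0x07, 0x19, 0x65


class PppoeServer:
    """Minimal server-side discovery handler (illustrative only)."""

    def __init__(self):
        self.next_session_id = 1
        self.circuits = {}  # session ID -> client MAC; stands in for ISM

    def handle(self, code, client_mac):
        """Return (reply code, session ID), or None if no reply is sent."""
        if code == PADI:
            # Offer service to the client with a unicast PADO.
            return (PADO, 0)
        if code == PADR:
            # The client accepted the offer: allocate a session ID and
            # record the circuit (the request to ISM in the text).
            session_id = self.next_session_id
            self.next_session_id += 1
            self.circuits[session_id] = client_mac
            return (PADS, session_id)
        return None  # other codes are ignored in this sketch


server = PppoeServer()
print(server.handle(PADI, "00:11:22:33:44:55"))
print(server.handle(PADR, "00:11:22:33:44:55"))
```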
The following figure illustrates the rest of the IPv6 connection process.
1. After the session/circuit has been established, the client and PPP negotiate
the IPv4 address using an IPCP message exchange.
3. AAA sends ISM the request to bind the subscriber circuit to the interface.
7. The client and DHCPv6 exchange messages to assign the IPv6 PD prefix.
The following figure illustrates the IPv4 PPP connection communication flow.
1. After the session/circuit has been established, the client and PPP negotiate
the IPv4 address and exchange their interface IDs using an IPCP message
exchange:
2. AAA sends the interface binding and the session configuration with the IP
address to ISM.
1. The client sends an IPCP termination request (IPCP term req) to PPP.
1. The client sends an IPv6CP termination request (IPv6CP term req) to PPP.
4. PPP sends a stack down message to AAA, which passes the configuration
change to ISM.
To bring down the PPPoE session, the router performs the following process:
4. PPP sends an LCP termination message to the client and reports to AAA
that the session is down.
5. AAA reports to ISM that the interface binding has been removed.
9. AAA reports the change in configuration to ISM and ISM passes the circuit
deletion to the other modules.
2. The DHCP daemon running on the router receives the message and
initiates a request with CLIPS to create a session.
3. CLIPS requests the ISM to create the circuit and activate the session.
CLIPS sends the circuit configuration details to the ISM along with the request.
4. ISM creates the circuit and propagates these updates to the other modules.
8. AAA sends a request to ISM for binding the interface with the configured
circuit.
12. The DHCP daemon requests AAA to add the IP host address.
13. AAA requests the ISM to update the circuit configuration with the IP
address.
15. ARP provides the RIB with the MAC address of the host client.
16. RIB downloads the route, next-hop and adjacency information to the line
card.
17. The DHCP daemon forwards the “DHCPOFFER” message to the client.
18. The client sends a “DHCPREQUEST” to the DHCP daemon, which further
forwards this request to the DHCP server.
19. The DHCP server responds with the “DHCPACK” message, which contains
all the configuration information requested by the client.
Session Termination
2. The DHCP daemon running on the router receives the message and sends
a request to CLIPS to delete the session.
4. AAA sends a request to ISM to unbind the interface from the circuit.
6. The line card sends the subscriber status to STATd, which further
propagates the status to AAA.
10. The DHCP daemon forwards the “DHCPRELEASE” message to the DHCP
server.
Session Bring-Up
1. When the line card receives an IP packet from the client, it initiates a
request with CLIPS to create a session for the client.
2. CLIPS requests ISM to create the circuit and activate the session (and
sends the circuit configuration details to ISM along with the request).
3. ISM creates the circuit and propagates these updates to the other modules.
8. AAA sends a request to ISM for binding the interface with the configured
circuit.
10. When the client sends an ARP request message, ARP sends an “ARP
Response” with the MAC address of the client to RIB.
11. RIB downloads the route, next-hop, and adjacency information to the line
card.
Session Termination
1. If IP packets are not received within the configured time, the line card sends
an "idle-timeout" message to AAA.
4. CLIPS sends a “session down” message to AAA to bring down the currently
active session.
5. AAA sends a request to ISM to unbind the interface from the circuit.
6. ISM unbinds the interface and propagates these updates to the other
modules.
7. The line card sends the subscriber status to STATd, which further
propagates the status to AAA.
This section describes the connection flows for LAC and LNS sessions.
2. When the router receives it, the PPPoE process returns a unicast PPPoE
Active Discovery Offer (PADO) message to the client, containing its
MAC address, server name, and the services it offers.
3. Assuming that the client selects the router's offer, it returns a PPPoE
Active Discovery Request (PADR) for connection.
4. PPP sends a unit message to ISM and ISM sends updates to the other
modules, as well as a circuit up message to PPP.
5. PPP informs PPPoE that the unit is ready, and a synchronization between
them occurs.
7. The client and PPP carry out an LCP message exchange before beginning
authentication.
10. If successful, PPP sends a CHAP success message to the client and
sends the session details to ISM, and ISM sends updates to the rest of
the modules.
12. L2TP, the forwarding plane (FWD), and the LNS negotiate the tunnel setup.
18. PPPoE sends the session account information to AAA, which sends it to
ISM.
20. Packets to and from the client are now tunneled to the LNS.
4. PPP sends the LCP termination request to the line card, which further
propagates this message to the client.
8. The line card sends the subscriber status to the PPPoE daemon, which
further forwards the message to AAA.
9. AAA requests ISM to bring the session down and delete the circuit.
10. ISM deletes the circuit and propagates these updates to all the other
modules.
1. To initiate a tunnel between the LAC and LNS, the LAC sends a “Start
Control Connection Request (SCCRQ)” to L2TP.
2. The L2TP process gets the route from RIB and responds with a “Start
Control Connection Reply (SCCRP)”.
6. The L2TP process requests ISM to create and activate the circuit.
b ISM creates the circuits and propagates these updates to all the other
modules.
10. The LAC and PPP carry out an LCP message negotiation.
11. PPP sends the LAC a CHAP challenge and the LAC sends a CHAP
response.
13. If the subscriber was authenticated, PPP sends the LAC a CHAP success
message.
14. If the subscriber is configured for dual-stack, LAC and PPP carry out an
IPv6CP message exchange.
16. PPP sends circuit updates to ISM and ISM propagates the updates to all
the other modules.
17. AAA sends circuit updates to ISM and ISM propagates the updates to all
the other modules.
18. RIB downloads the subscriber route and next hop to FWD/FIB.
4. ISM unbinds the circuit and propagates the updates to all the other modules.
5. The line card sends a subscriber status message to STATd, which further
forwards the message to AAA.
7. AAA requests ISM to bring the session down and mark the state.
9. L2TP sends a CDN message to the LAC and a circuit delete message
to ISM.
10. ISM deletes the circuit and propagates these updates to all the other
modules.
3.9.1 QoS
Quality of Service (QoS) manages traffic flow through the SSR. The SSR
implementation is similar to QoS on the SmartEdge at the customer level. The
primary differences are internal, based on the SSR hardware architecture.
Customer-level features differ only in that there are fewer supported features
on the SSR and minor differences, such as supported ranges and valid values
for some commands.
Figure 83 illustrates the line card components used by QoS to process ingress
and egress traffic.
Ingress traffic is processed by a line card and then forwarded across the
switch fabric to another line card where egress processing takes place. QoS
processing of ingress network traffic is performed in the PFE on the NPU
before it is transferred to the fabric access processor (FAP), which forwards
traffic across the switch fabric. Some internal QoS functions are performed
in the FAP to maintain end-to-end QoS performance. For example, the FAP
schedules traffic according to the PD-QoS priority and buffers traffic in virtual
output queues (VOQs) as it is sent across the fabric.
The NPU Task optimized processor (TOP) packet processor performs some
egress QoS functions and sends traffic through the NPU Traffic Manager (TM)
for scheduling out through the line card ports.
FAP can also function for egress traffic. Likewise, an egress FAP pertains to
functioning for egress traffic.
• QoS classification through class definitions and policy access control lists
(ACLs)
While performing in the egress role, the FAP controls traffic scheduling across
the fabric and to the NPU.
For internal system QoS functionality, the FAP performs the following tasks:
For egress traffic, the FAP provides the following internal system QoS functions:
The QoSMgr and QoSd modules provide the following QoS services on the
line cards:
◦ QoS CLI and Data Collection Layer (DCL)—Includes the CLI parse
chain definitions for all QoS CLI show and configuration commands
and their command handler functions, which forward the resulting
events to the Router Configuration Module (RCM) by way of the DCL.
• QoS Forwarding Plane (FWD QoS)—In general, the design and operation
of the RBOS forwarding plane software is beyond the scope of this
document (as is FABL). However, it is expected that the provisioning
events signaled over the QoSd -> FABL and FABL -> FALd interfaces will
ultimately be driven by, closely mirror, and reference elements derived from
the QoS PI -> PD interface on the RP.
7 Event queued to QoSd—The QoSMgr queues a record with all the relevant
change information, including cookies, for delivery to QoSd if and when
the configuration transaction is committed. The QoSMgr also registers a
callback to back out any required RTDB and PD-specific changes in the
event that the transaction is aborted.
10 FALd API invoked—FABL code on the line card must take the configuration
event messages received from QoSd and invoke the appropriate PD
forwarding APIs to implement the changes, supplying the relevant
PD_COOKIE objects that contain all the allocated resource IDs and other
platform-specific information needed to enact the configuration change.
Each circuit has a single PFE ID. A PFE ID represents and abstracts all
the necessary hardware devices that might need to apply QoS services in
a particular direction (ingress or egress) for a particular physical device (for
example, non-aggregate). The PFE ID cannot change for the lifetime of the
circuit. If, instead, there are multiple network processor units (NPUs) handling,
for instance, the egress packet forwarding path for a single VLAN, the PD
domain presents the NPUs to the PI control plane as a single device.
• Circuit-level attributes:
◦ Propagated settings
In the Ericsson IP Operating System, the RCM tracks which PFEs are
associated with each circuit and tracks when QoS policies and secondary
configuration objects must be referenced on a particular PFE.
The first time that a QoS object is bound to a circuit hosted on a particular PFE,
the RCM invokes RP-QoS-PD APIs to validate the object’s parameters for the
specific PFE and to allocate the resources needed to instantiate the object on
the PFE. If the instantiation operation fails due to parameter incompatibility or
insufficient resources, the circuit binding operation fails and the appropriate error
is signaled back to the provisioning entity (usually AAA or the CLI). Sources of
PFE scope information include:
The following commands can be used to verify that the QoS and ACL
configurations have been downloaded to the line cards:
• Class maps
• Class definitions
• Policy ACLs
• Forward policies
• Mirror policies
RP QoS creates only one concurrent instantiation of a QoS object per unique
slot/PFE-id instance when:
• The same PFE ID is associated with a physical circuit instance for both
egress and ingress purposes.
QoS signals the object creation to PD-RP-QoS API for validation and resource
allocation purposes only one time for both ingress and egress, and similarly
sends only one object creation message to FABL (or CLSMgr for ACL).
The QoS control plane is responsible for fulfilling all dependencies and
guaranteeing that the provisioning messages are delivered to the FABL for
each PFE in the correct order.
◦ Instantiate the QoS policy on the PFE(s) that are referenced by the
circuit.
◦ Provision the bindings that inherit from the root binding above.
◦ If required by any resulting cookie, update the binding for each circuit
that inherits the root binding.
◦ When all circuit bindings that reference a QoS policy have been
removed from a PFE, the QoS policy can be removed from the PFE.
◦ When all policies that reference a QoS secondary object have been
removed from a PFE, the secondary object can be removed from the
PFE.
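The ordering rules above amount to per-PFE reference counting: a policy is instantiated on a PFE before the first binding that uses it, and it can be removed only after the last binding is gone. The following Python sketch shows that bookkeeping. The class, method names, and message tuples are hypothetical and do not reflect the actual QoSMgr or QoSd interfaces.

```python
from collections import defaultdict


class PfeObjectTracker:
    """Tracks how many circuit bindings reference each policy per PFE."""

    def __init__(self):
        self.refs = defaultdict(int)  # (pfe_id, policy) -> binding count
        self.log = []                 # provisioning messages, in order

    def bind(self, pfe_id, policy, circuit):
        key = (pfe_id, policy)
        if self.refs[key] == 0:
            # First reference on this PFE: instantiate the policy first.
            self.log.append(("create", pfe_id, policy))
        self.refs[key] += 1
        self.log.append(("bind", pfe_id, policy, circuit))

    def unbind(self, pfe_id, policy, circuit):
        key = (pfe_id, policy)
        if self.refs[key] == 0:
            raise ValueError("policy is not bound on this PFE")
        self.refs[key] -= 1
        self.log.append(("unbind", pfe_id, policy, circuit))
        if self.refs[key] == 0:
            # Last binding removed: the policy can leave the PFE.
            self.log.append(("delete", pfe_id, policy))


tracker = PfeObjectTracker()
tracker.bind(1, "gold", "circuit-a")
tracker.bind(1, "gold", "circuit-b")
tracker.unbind(1, "gold", "circuit-a")
tracker.unbind(1, "gold", "circuit-b")
for message in tracker.log:
    print(message)
```

The same pattern would apply one level down, between policies and the secondary objects they reference.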
• Almost all current QoSd -> PPA/FABL messages have been extended to
include the relevant PFE ID and QoS PD_COOKIE object of the appropriate
type.
3 Instantiate the policy on the PFE if the PFE has not yet referenced it.
4 Instantiate the combined binding of the policy plus all relevant secondary
objects to the circuit on the PFE.
5 Signal the circuit binding record to QoSd with any PD_COOKIES associated
with the above instantiation and allocation operations.
6 Send an individual creation message to FABL for the PFE for each new
object instantiation and the binding message itself, each including any
PD_COOKIE.
When a CLI change is aborted rather than committed, the relevant PD-specific
functions are invoked to back out the change on the PD side. The PI code
supplies both the current cookie that reflects the aborted change and the
original cookie that reflects the desired state to return to. QoSMgr also cleans
up any of its own memory or RTDB records or state associated with the
aborted changes. For static circuits, cookie information is stored in an extended
qos_media_conf_t binding record in the RDB.
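The abort handling described above can be pictured as an undo log that pairs each changed circuit with its original cookie. The following Python sketch is a simplified model under that assumption. The class and its methods are hypothetical, and plain strings stand in for PD_COOKIE objects.

```python
class QosTransaction:
    """Applies binding changes and can restore the original cookies."""

    def __init__(self, pd_state):
        self.pd_state = pd_state   # circuit -> cookie currently in effect
        self.undo = []             # (circuit, original cookie) records

    def change(self, circuit, new_cookie):
        # Remember the original cookie before applying the new one.
        self.undo.append((circuit, self.pd_state.get(circuit)))
        self.pd_state[circuit] = new_cookie

    def commit(self):
        # The changes stay in effect; nothing is left to back out.
        self.undo.clear()

    def abort(self):
        # Back out in reverse order, returning each circuit to the
        # cookie that reflects the state before the transaction.
        while self.undo:
            circuit, original = self.undo.pop()
            if original is None:
                self.pd_state.pop(circuit, None)
            else:
                self.pd_state[circuit] = original


state = {"circuit-a": "cookie-1"}
txn = QosTransaction(state)
txn.change("circuit-a", "cookie-2")
txn.change("circuit-b", "cookie-3")
txn.abort()
print(state)  # {'circuit-a': 'cookie-1'}
```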
The SSR associates a 6-bit internal Packet Descriptor (PD) QoS value with
each packet. The value is initialized when the packet is received on ingress
and remains associated with the packet as it transits the forwarding plane.
The upper 3 bits hold the priority and the lower 3 bits hold the drop precedence.
• On MPLS circuits—The EXP value from the relevant MPLS label is copied
to the upper 3 bits of PD QoS (inverting the upper 3 bits if the internal
representation is zero-highest) and the lower 3 bits are set to zero.
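The bit layout of the PD QoS value and the MPLS EXP mapping can be expressed with a few shifts and masks. In the following Python sketch, the function names and the zero_highest flag are hypothetical, and "inverting" is interpreted as a bitwise inversion of the 3-bit field.

```python
def pack_pd_qos(priority, drop_precedence):
    """Build the 6-bit PD QoS value: priority in bits 5-3, DP in bits 2-0."""
    if not 0 <= priority <= 7 or not 0 <= drop_precedence <= 7:
        raise ValueError("both fields are 3-bit values (0-7)")
    return (priority << 3) | drop_precedence


def unpack_pd_qos(pd_qos):
    """Return (priority, drop precedence) from a 6-bit PD QoS value."""
    return (pd_qos >> 3) & 0x7, pd_qos & 0x7


def pd_qos_from_exp(exp, zero_highest=False):
    """Copy the MPLS EXP bits to the priority field and clear the DP."""
    priority = (~exp & 0x7) if zero_highest else exp
    return pack_pd_qos(priority, 0)


print(pack_pd_qos(5, 2))                      # 42
print(unpack_pd_qos(42))                      # (5, 2)
print(pd_qos_from_exp(5))                     # 40
print(pd_qos_from_exp(5, zero_highest=True))  # 16
```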
Table 10 defines the QoS propagation types for supported ingress circuits
and forwarding types. As noted above, only one ingress propagation type
is applied according to precedence rules, type of packet received, circuit
type, and configuration. For example, a label-switched (MPLS) packet could
propagate from an IP packet or Ethernet frame by configuring the use-ip or
use-ethernet command within the MPLS class map.
Table 11 describes the QoS propagation types for egress circuit and forwarding
types.
In the Ericsson IP Operating System, you can control the mapping between
packet header external priority and drop-precedence markings (IP TOS/DSCP,
Ethernet 802.1q, 802.1p, and MPLS EXP) and their internal representations
in the PD QoS value.
On the SSR, ingress propagation of QoS markings can also be enabled per
port or per service instance. For incoming packets, this feature enables the
use of 802.1p priority bits in the 802.1q Ethernet header or MPLS EXP bits in
the outermost MPLS label to set the PD QoS priority value that determines
ingress priority treatment for each packet. Each NPU has four ingress queues
for incoming traffic. The PD QoS priority value assigned to a packet determines
which of the four queues the packet is admitted to. Each PD QoS priority value
has a fixed mapping to a queue, as shown in Table 12.
The queues are serviced in strict priority order on each NPU. Each
higher-priority queue is fully serviced before the next lower-priority queue is
serviced. Under congestion, the highest-priority packets are most likely to be
forwarded, and the lowest-priority packets are most likely to be discarded due
to queue overflow. If the port-propagate commands are not configured, all
packets received on the port are treated as the lowest-priority traffic, for ingress
oversubscription purposes (they are assigned to PD-QoS priority value 0 and
ingress queue 3).
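Strict priority servicing can be sketched as a loop that always restarts from the highest-priority queue. The following Python example assumes that queue 0 is the highest-priority queue, which is consistent with queue 3 holding the lowest-priority traffic. The function name and the packet labels are hypothetical.

```python
from collections import deque

NUM_QUEUES = 4  # queue 0 is served first; queue 3 holds lowest priority


def service(queues, budget):
    """Dequeue up to `budget` packets in strict priority order."""
    sent = []
    while len(sent) < budget:
        for queue in queues:       # always rescan from the highest queue
            if queue:
                sent.append(queue.popleft())
                break
        else:
            break                  # every queue is empty
    return sent


queues = [deque() for _ in range(NUM_QUEUES)]
queues[0].extend(["ctl1", "ctl2"])
queues[3].extend(["be1", "be2", "be3"])
print(service(queues, budget=3))  # ['ctl1', 'ctl2', 'be1']
```

If the service budget is smaller than the offered load, the packets left behind are those in the lowest-priority queues, which is why they are the first to be lost to queue overflow.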
The exception is MPLS transit LSR, where the default MPLS propagation is
always applied. The default propagation for IP is to copy the full PD-QoS
to DSCP. For other propagation types, the default uses the 8P0D mapping
schema with eight priorities and zero drop precedence (DP) values. When
translating from a 3-bit value to a 6-bit value, the 3-bit value is copied to the
priority field and the DP is cleared to zero. Other mapping schema types reduce
the number of priorities represented to enable some DP values in the encoding.
For more information, see mapping-schema.
The SSR supports up to 128 class maps per line card for the marking types
listed above.
NP4-based cards use the instantaneous queue length for performing a queue
admittance test for a WRED Drop decision at the instant in which a packet
is to be enqueued. Whenever a packet is submitted for enqueuing, a queue
admittance test is performed to determine whether to enqueue the packet or
drop it. The instantaneous queue length is compared against the configured
WRED curve queue occupancy thresholds to determine which drop probability
value to use in the drop decision.
On the other hand, the RED algorithm devised by Floyd & Jacobson uses
a moving weighted average queue length in which the queue length is
sampled and averaged over time (giving higher weight to the most recent
value). For more information, see "Calculating the average queue length" in
http://www.icir.org/floyd/papers/red/red.html. Other NPUs in SSR might support
exponential weight where the weight value influences the responsiveness of
the admittance test to changes in the queue occupancy. The Floyd & Jacobson
RED algorithm applies a filter to reduce the effects of short-term traffic bursts.
This filter may be tuned by adjusting the value of the exponential weight.
However, NP4 does not perform this aspect of the algorithm and may be more
susceptible to dropping bursty traffic. To mitigate traffic loss, it is recommended
that you configure NP4 WRED curves with more lenient drop probabilities and
thresholds than you would select when an exponential weight is used.
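The difference between the two admittance tests can be seen by feeding the same burst to both. The following Python sketch uses a linear RED curve and the Floyd and Jacobson moving average, avg = (1 - w) * avg + w * sample. The threshold values, the weight, and the curve shape are assumptions chosen for illustration and are not NP4 defaults.

```python
def drop_probability(queue_len, min_th, max_th, max_p):
    """Linear RED curve between the two occupancy thresholds."""
    if queue_len < min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return max_p * (queue_len - min_th) / (max_th - min_th)


def ewma(samples, weight):
    """Weighted moving average: avg = (1 - w) * avg + w * sample."""
    avg = 0.0
    for sample in samples:
        avg = (1 - weight) * avg + weight * sample
    return avg


burst = [10, 10, 10, 10, 90]   # queue lengths; the last sample is a burst
instant = burst[-1]
average = ewma(burst, weight=0.25)

# Instantaneous test: the burst alone pushes the queue onto the curve.
print(drop_probability(instant, 40, 100, 0.1))
# Averaged test: the filter keeps the average below the minimum threshold.
print(drop_probability(average, 40, 100, 0.1))
```

In this example the averaged test yields a drop probability of zero for the burst, while the instantaneous test does not, which illustrates why more lenient curves are suggested when no averaging is performed.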
Each queue can have up to three drop profiles, where the drop profile for a given
packet is determined by the PD-QoS value, as configured in the congestion
avoidance map. Per queue, a PD-QoS value can be assigned to only one drop
profile. However, multiple PD-QoS values can select the same drop profile.
Because a congestion avoidance map accommodates up to 8 queues, the map
can specify up to 8 queue depths, 8 exponential weights, and 24 drop profiles.
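The limits above can be checked with a simple validation routine. In the following Python sketch, a congestion avoidance map is modeled as a dictionary from queue number to PD-QoS assignments. This representation, the function name, and the profile names are hypothetical. Using the PD-QoS value as a dictionary key ensures that each value selects only one drop profile per queue.

```python
MAX_QUEUES = 8
MAX_PROFILES_PER_QUEUE = 3


def validate_map(ca_map):
    """ca_map: queue number -> {PD-QoS value: drop profile name}.

    Returns the total number of drop profiles used by the map.
    """
    if len(ca_map) > MAX_QUEUES:
        raise ValueError("a map accommodates at most 8 queues")
    total = 0
    for queue, assignments in ca_map.items():
        profiles = set(assignments.values())
        if len(profiles) > MAX_PROFILES_PER_QUEUE:
            raise ValueError(f"queue {queue} uses more than 3 drop profiles")
        total += len(profiles)
    return total  # never exceeds 8 * 3 = 24


example = {
    0: {56: "green", 57: "green", 58: "yellow"},  # two values share a profile
    1: {0: "red"},
}
print(validate_map(example))  # 3
```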
• Drop profile templates (which define the shape of the RED curve):
The limits for NP4 are 256 scaling factors and 128 (that is, 16 × 8) templates.
The overhead profile works in conjunction with a priority weighted fair queuing
(PWFQ) policy's configured rate maximum value. The PWFQ policy defines the
rate of traffic flow. The overhead profile defines the encapsulation overhead
and the available bandwidth on the access line. These attributes are associated
with an overhead profile:
• Layer 1 overhead = reserved bytes per packet—If the reserved bytes per
packet attribute is set to 12, 64-byte packets are shaped as if they were
76 bytes.
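The arithmetic behind the reserved bytes attribute can be sketched as follows. The function names are hypothetical, and the payload rate calculation assumes that the shaper charges every packet at its shaped length against the PWFQ rate maximum.

```python
def shaped_length(packet_bytes, reserved_bytes_per_packet):
    """Length the shaper accounts for, including the reserved overhead."""
    return packet_bytes + reserved_bytes_per_packet


def payload_rate(rate_maximum_bps, packet_bytes, reserved_bytes_per_packet):
    """Rate left for the packets themselves when each one is charged
    at its shaped length against the configured rate maximum."""
    shaped = shaped_length(packet_bytes, reserved_bytes_per_packet)
    return rate_maximum_bps * packet_bytes / shaped


print(shaped_length(64, 12))             # 64-byte packets are shaped as 76 bytes
print(payload_rate(76_000_000, 64, 12))  # share of a 76 Mbps maximum left for packets
```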
As of SSR, Release 12.2, the router supports binding QoS services to service
instances.
Layer 2 service instances are configured under Ethernet ports or link groups.
For a link group, and for any L2 service instances under that link group, QoS
bindings are applied per constituent.
The following QoS services are now extended to service instances on SSR (on
40-port Gigabit Ethernet (GE) and 10-port GE cards):
• Only the two outermost layers of 802.1Q tags are read or updated for 802.1p
priority bits for ingress and egress propagation.
This functionality is extended using the existing architecture for Layer 3 circuits
(dot1q PVCs) with the following changes:
• Creating a service instance circuit triggers the creation of an h-node for the
circuit. In addition, newly created circuits inherit hierarchical, inherited,
policing, and metering bindings from the parent circuit (port or LAG), if the
parent circuit has such bindings. Unlike 802.1Q PVCs, service instance
circuits do not inherit from other 802.1Q PVCs.
• When you attach a dot1q profile to a service instance, the profile is sent as
an optional attribute of the service instance through FABL to the line card.
For 802.1Q PVCs, the profile is sent as part of the 802.1Q object to FABL.
• The profile can include information to propagate the 802.1p priority to and
from the PD-QoS priority marking. Unlike PVCs, the propagation from
the Ethernet priority can happen from the outer or inner tag. Similarly,
propagation to Ethernet can be done for outer, inner, or both Ethernet tags.
This feature requires no architectural modifications to FABL or ALd and no
microcode changes.
3.9.2 ACLs
3.9.2.1 Overview
Access control lists (ACLs) provide advanced traffic handling for incoming or
outgoing traffic. They contain rules that match subsets of the traffic to actions
that determine what is to be done with the matching traffic. The ACLs can be:
• QoS policies, which are the interface to all QoS support in the system.
IP filtering ACLs contain permit and deny statements. Their order is significant,
because the rules are matched in the order that they are specified. They have
an implicit deny all statement at the end so that all traffic that is not explicitly
permitted is dropped. If a non-existent ACL is applied to an interface, all traffic
passes through.
Policy ACLs contain rules that map subsets of the traffic to classes. Their
order is significant, because the rules are matched in the order that they are
specified. Traffic that does not match any of the statements is assigned to
the default class.
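The matching rules for the two ACL types can be summarized in a minimal sketch. It matches on source address only, and the rule layout and function names are assumptions for illustration; real ACL rules match on more fields.

```python
import ipaddress


def ip_filter(acl, src_ip):
    """acl: ordered list of (action, network) rules, or None if the ACL
    applied to the interface does not exist."""
    if acl is None:
        return "permit"  # non-existent ACL: all traffic passes through
    addr = ipaddress.ip_address(src_ip)
    for action, network in acl:  # rules are matched in configured order
        if addr in ipaddress.ip_network(network):
            return action
    return "deny"  # implicit deny all at the end of the list


def policy_classify(acl, src_ip, default_class="default"):
    """acl: ordered list of (class_name, network) rules."""
    addr = ipaddress.ip_address(src_ip)
    for class_name, network in acl:
        if addr in ipaddress.ip_network(network):
            return class_name
    return default_class  # unmatched traffic goes to the default class


filter_acl = [("deny", "10.0.0.0/24"), ("permit", "10.0.0.0/8")]
print(ip_filter(filter_acl, "10.0.0.5"))   # first matching rule wins
print(ip_filter(filter_acl, "192.0.2.1"))  # falls through to the implicit deny
```

Reversing the two rules in `filter_acl` would permit 10.0.0.5, which is why the rule order is significant.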
• There is no support for packet logging when IP ACL filters are applied to
Layer 2 circuits.
• Ethernet ports
• L2VPN ports
• Transport-enabled circuits
The CLS module filters, classifies, and counts packets that are processed
using ACLs. CLS receives multiple ACLs from multiple components
(QOS/NAT/FORWARD/FILTER/PKT_DBG/RPF) and these ACLs are applied
to one or more circuits. Each packet is processed by a sequential lookup in
each of the configured ACLs.
• CLS RPSW PI—Manages the processing of ACLs and access groups into
ACL rule sets. An access group is a container that is bound to one or more
circuits. ACLs on the same circuit are separated into different access
groups by their service category. An ACL rule set is a set of ACLs applied
to the PFE as a single lookup list. It can contain part of an access group
(one or more ACLs within the access group), the whole access group, or
multiple access groups. The ACL rule set does not have to be on the same
circuit or even within the same service category.
ACL rule sets are built based on PD capabilities. The PD code provides
capabilities, validation, resource allocation, and utility functions to process
multiple ACLs into one or more ACL rule sets. A cookie for the ACL rule set
is returned by the PD library.
CLS RP PI also manages the distribution of the ACL rule sets to the
appropriate line cards. The ACL rule set can be reused by multiple circuits
within the same service on the same PFE. The distribution of the ACL rule
sets to the ACL FABL code is transactional in nature. This means that
ACL rule sets can be rolled back if not completed correctly (for example,
if CLS RP restarts).
• ACL FABL PI processes the following messages:
As the ACL rule set is received (using the download_msg), and based
on the information within the summary message, the ACL rule set
is decompressed and added to the database. If there is a FABL or
CLS restart during this phase, the database entry is removed and a
re-download of all ACL rule sets is requested.
• CLS sends the done_msg when all download messages have been
sent, signaling to the FABL that the ACL rule set has been completely
downloaded.
At this point, FALd has added the ACL rule set to the hardware.
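The transactional nature of the download can be sketched as a small state holder. This is an illustration only: the class and method names are hypothetical, and the assumption that the summary message carries the expected number of download messages is not confirmed by the text.

```python
class AclRuleSetStore:
    """Tracks rule sets that are still downloading and those that completed."""

    def __init__(self):
        self.pending = {}   # rule set ID -> partial download state
        self.complete = {}  # rule set ID -> assembled rule set

    def on_summary(self, rule_set_id, expected_chunks):
        # Assumption: the summary announces how many download messages follow.
        self.pending[rule_set_id] = {"expected": expected_chunks, "chunks": []}

    def on_download(self, rule_set_id, chunk):
        self.pending[rule_set_id]["chunks"].append(chunk)

    def on_done(self, rule_set_id):
        entry = self.pending.pop(rule_set_id)
        if len(entry["chunks"]) != entry["expected"]:
            raise ValueError("done received before the download was complete")
        self.complete[rule_set_id] = b"".join(entry["chunks"])

    def on_restart(self):
        """Roll back partial entries. Returns True if a download was
        interrupted, in which case the caller requests a re-download."""
        interrupted = bool(self.pending)
        self.pending.clear()
        return interrupted


store = AclRuleSetStore()
store.on_summary("rs-1", expected_chunks=2)
store.on_download("rs-1", b"rule-a;")
store.on_download("rs-1", b"rule-b;")
store.on_done("rs-1")

store.on_summary("rs-2", expected_chunks=3)
store.on_download("rs-2", b"rule-c;")
needs_redownload = store.on_restart()  # rs-2 was only partially received
```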
The ALd layer provides APIs and functionality that are platform-specific. For
CLS/ACLs, this layer interacts with drivers to configure the PFEs to perform
packet classification. It receives the ACLs in a well-known format. If necessary,
the ALd converts the ACLs into a hardware-dependent version of the ACLs.
The ALd also provides functions to extract information such as logs and
counters from the PFE.
PD can be considered part of the ALd that exists on the controller card and line
card. The PD Libraries provide resource management and reference IDs for
the ACL rule set. PD also provides special build functions to build ACL rule
sets into PD objects.
the ACL, FABL re-invokes the ALd APIs to add the ACL rule set to the PFE.
The ALd is expected to reply to additions after it completes this task. In this
case, the ACL rule set cannot be used until the reply has been received.
All requests to use the ACL rule set should be blocked until the reply has
been received.
• Applying ACLs to circuits—The CLS and ACL FABL process the bind_msg
message received from the RP to apply an ACL rule set to a particular
circuit. Under certain circumstances, the RP denies all packets (stopping
all traffic). Otherwise, it activates the ACL rule set in the PD code.
• Downloading the ACL rule sets failure—If a failure is detected during the
downloading of an ACL rule set, the ACL database entry is marked failed
and a failure message is sent to the RP. If the failure occurred before the
ACL database entry was created, a temporary entry is created and marked
failed. Because the failure message is asynchronous, bind messages could
have been received for the ACL rule set. Any bind messages are discarded.
Eventually a delete message is received from the RP to remove the ACL
rule set. If the failure occurred while the ACL rule set was being downloaded
to the database, the ACL cannot be removed from the ALd and therefore
no message is sent to the ALd. Instead, the ACL is removed from the ACL
database, and a reply message is sent to the RP to free up its resources.
• During a CLS restart—For ACL rule sets that are removed during a CLS
restart, the FABL receives a delete message. Because CLS does not
know about the circuit-to-ACL mappings, the FABL unbinds the ACL rule
set. After it is unbound, the ACL rule set is removed from the ALd. When
each ACL rule set is unbound, the ALd sends a reply to the RP to free
the resources on it.
• During an ALd or PFE restart, the FABL does not know what was processed
by the ALd or what information was lost. Therefore, the FABL reapplies all
configurations to the ALd and PFE for each corresponding PFE.
The ACL database stores the following information about ACL rule sets in
persistent memory:
• PD cookie
The ACL database is transactional and can roll back ACL rule sets before they
are completed. The ACL rule set-to-circuit mappings are kept in the PI data
structure associated with each circuit. This information is called a feature block.
During CLS restarts, because an ACL could have been removed without the
RP sending an unbind message, the CLS generates an unbind message that
is sent to the ALd, based on the linked list of circuits in the ACL database.
For each circuit, the feature block stores the list of ACL rule sets required for
the circuit. The list is created when bind requests are received from the RP and
contains enough information to re-create the bind request to ALd.
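A feature block that can replay its bindings might look like the following sketch. The field names (`cookie`, `direction`) and the callback signature are assumptions; the text states only that the stored list is sufficient to re-create the bind request to ALd.

```python
from collections import defaultdict


class FeatureBlocks:
    """Per-circuit list of ACL rule set bindings (hypothetical layout)."""

    def __init__(self):
        self.by_circuit = defaultdict(list)

    def on_bind(self, circuit, cookie, direction):
        # Record enough to rebuild the bind request later.
        self.by_circuit[circuit].append((cookie, direction))

    def on_unbind(self, circuit, cookie, direction):
        self.by_circuit[circuit].remove((cookie, direction))

    def replay(self, send_bind):
        """Re-send every stored bind, for example after an ALd or PFE restart."""
        count = 0
        for circuit, binds in self.by_circuit.items():
            for cookie, direction in binds:
                send_bind(circuit, cookie, direction)
                count += 1
        return count


blocks = FeatureBlocks()
blocks.on_bind("circuit-1", 7, "in")
blocks.on_bind("circuit-1", 9, "out")
blocks.on_bind("circuit-2", 7, "in")
blocks.on_unbind("circuit-1", 9, "out")

sent = []
replayed = blocks.replay(lambda circuit, cookie, direction:
                         sent.append((circuit, cookie, direction)))
```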
For configuration and verification information about VRRP ICR, see Configuring
Inter-chassis Redundancy.
For more configuration and operations information, see Configuring BGP for
Inter-Chassis Redundancy.
Note: The BGP-based ICR infrastructure on the SSR is generic and can be
used with multiple applications on the SSR. As of SSR 13B, EPG is the
only product using the infrastructure. The architecture is designed to
accommodate the addition of new applications in future releases.
• The ICRd now interacts with BGP through the BGP Lib.
• ICRd influences the attributes with which BGP may now advertise and
withdraw prefixes.
• Based on the preference level configured on the node and the first VIP that
is added (on both nodes), ICRd determines which chassis to bring up as
active and which as standby.
• TSM will include the ICR client library and will add and install VIPs from
the EPG application to ICRd.
• TSM APIs have been enhanced to support the installation of routes with
ICR prefix attributes (ICR or ICR tracked VIPs).
• TSM will now propagate ICR state and transport information for every
update received from ICRd back to the application, by means of callbacks.
EPG-TSM interactions
• EPG will now install ICR specific routes with the ICR or tracking attributes.
• These represent VIP addresses within a context that the application may
choose to track.
• EPG will now use the ICR transport library to send and receive information
between its peer nodes.
• The ICR library is now available on the SSC and provides for a reliable
transport interface.
• EPG/application components on the SSC may send out packets that will
be traffic steered to the SSC on the standby node, with the active node
being the initiator of the traffic.
• A new ICR steering table has been added and the application can manage
this steering table to retrieve the provisioned traffic slice forwarding table.
• A new PAP ID (the ICR PAPID) has been introduced. This is functionally
like a regular kernel PAPID.
• The line card NPU now supports steering of ICR packets by looking into the
ICR header and extracting the application ID and application type.
When two SSRs are part of a high availability cluster like BGP-based ICR,
MC-LAG, or VRRP ICR (and ARP synchronization is enabled), the ARP
daemon becomes a client of ICRlib and uses it to communicate with ARPd on
the ICR peer chassis. ARPd on the active and standby peers sends application
messages over ICRlib with ARP entries to be added or deleted.
To enable the active and standby nodes in a BGP-based ICR pair to be
seen externally as one, they are configured with the same IP address. ICRd
manages the active and standby node states, and all traffic flows to the active
node. Sessions are synchronized between both nodes by means of ICR message
transmission and flow control.
For configuration and operation information about this ICR model, see
Configuring BGP for Inter-Chassis Redundancy.
When an ICR loopback interface is created, it includes the local and remote
IP addresses and local and remote UDP ports. ICR transport packets include
the destination card type (RPSW or SSC) to enable packets to be punted to
the RPSW card or steered to an SSC, based on traffic slice forwarding tables
(TSFTs).
When SSR nodes are configured in an ICR pair (in this model), TSM forwarding
changes to a more complex forwarding structure:
Figure 87 illustrates the traffic steering flow at the line card NPU.
To direct these forwarding streams, a service map has been added to the
TSFTs at each hosted next hop.
For unicast forwarding, TSMd interacts with RIB to install, remove, and
redistribute routes for components such as EPG in FIB.
• When changes occur, TSMd sends the prefix and tracking attribute to ICRd
using the ICR library API.
If peer detection is configured, BGP monitors the prefix state; if the active
node is not detected before the timer expires, BGP uses the callback
function to inform ICRd.
For more information, see Configuring Multichassis LAG and Event Tracking.
• All communicating interfaces between the MC-LAG node pair are enabled
for ICR.
In using both levels of peer detection logic, the primary method is BFD and
the secondary is ICR.
• A "split brain" situation (where both the Home Slot and Backup Home Slot
could be active) could occur if an ICR transport link-down event causes
a BFD-down event.
When you configure single-session BFD on the ICR transport link (between
SSR1 and SSR2 in Figure 88), the link state events are propagated to the
MC-LAG via the ETI. When the link group process receives a link down (false)
event, it takes action to renegotiate link availability, effectively influencing
a switchover. When a BFD-down event is received by the standby node,
the standby node takes over. If BFD is disabled, LGd uses the ICR state to
determine whether switchover is needed. BFD event configurations are sent to
ALd via RCM -> RIB -> FABL-BFD. If a BFD state changes in the forwarding
engine, ALd fetches the stored ETI object from its database and publishes it
to all subscribing processes.
If you enable sub-second chassis failover, the following process occurs when
the active links (1, 2, and 3 in Figure 88) fail; for example, if the power is
switched off in the active chassis. The system detects the active chassis failure
(single-session BFD over LAG detects ICR link down) and quickly reacts (BFD
propagates the event to LAG through the ETI). Finally, the system switches
over to the standby MC-LAG (LAG signals the remote switch to enable the link
toward SSR 2, and at the same time LAG enables the pathway from SSR 2
toward the trunk).
For double-barrel static routes with both next-hops reachable, if the router
detects a failure of the link to the primary next-hop, packet forwarding is
switched to the backup next-hop.
The router supports detecting the following failures in the primary path:
You can configure double-barrel static routes among multiple routes to a single
destination.
The following line cards do not currently support double-barrel static routes:
• A negation-flag for the ETI object. By default, line cards monitor the ETI
object and switch to using the backup when the ETI object is FALSE. If the
(non-configurable) negation-flag is set, then line-cards switch to using the
backup when the ETI object is TRUE.
• RIB adds the routes to FIB on line cards. If a client adds multiple routes
to the same destination with the same cost and distance, RIB treats it
as an ECMP route. RIB allows clients to add ECMP of double-barrels,
or mixed ECMPs where some paths are double-barrel and some are
single-barrel. RIB clients register for next hops and prefixes to be able to
return double-barrel next-hops where applicable.
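The effect of the negation flag on a double-barrel next hop can be written as a small decision function. The function and parameter names are hypothetical; the logic follows the description above.

```python
def select_next_hop(primary, backup, eti_object_state, negation_flag=False):
    """Return the next hop a line card would use for a double-barrel route.

    Default: switch to the backup when the ETI object is FALSE.
    With the negation flag set: switch to the backup when it is TRUE.
    """
    use_backup = eti_object_state if negation_flag else not eti_object_state
    return backup if use_backup else primary


# Addresses are documentation examples, not taken from any configuration.
print(select_next_hop("192.0.2.1", "192.0.2.2", eti_object_state=False))
```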
To support Ethernet CFM over MC-LAG, two new communication paths have
been added to the router:
Figure 89 illustrates the flow of messages between SSR modules in the active
and standby peers, after LGd gets an update from CFM when a remote MEP
goes to the DOWN state.
LGd publishes an ETI event notifying that the LAG is going active. FABL LACP
and FFN subscribe to the event and take action accordingly. LGd does
not include CFM’s UP message in calculating whether the LAG meets the
configured min-link requirement.
CFM-FABL then takes the decision to publish the state change to LGd or
not, based on information stored in its database.
• If the number of active links exceeds the number of min-links
and MC-LAG is supposed to go to active state, then LGd moves the
MC-LAG from standby state to active state and sends the respective event to
the modules through ETI or ISM.
Ethernet CFM depends on each EVC being assigned a Home Slot by CFMd.
During operation, CFMd regularly synchronizes the local MEP information from
the remote MEP.
Single-session BFD over LAG is based on Home Slot and Backup Home Slot
definition and selection. With SSR release 12.2 and higher, changes to FABL
enable syncing up the BFD state from the Home Slot to the Backup Home Slot
(and also to non-Home slots). The goal is for BFD clients to not detect a link
failure within the LAG.
For each BFD session, the control plane picks one line card as the Home Slot,
and the PFE on that line card handles the transmission and reception of BFD
messages. A packet that arrives on a non-Home Slot for a session is redirected
to the Home Slot. It is forwarded using loopback adjacency processing (based
on a loopback adjacency stored in an internal packet header added to the
packet on ingress), which indicates that the Home Slot is to perform the
processing for the packet.
If the active Home Slot fails, the backup Home Slot becomes the new active
one and takes over BFD sessions quickly.
ALd maintains a global table and provides allocations of the Watchdog counters
and OAM real-time counters for all the keepalive protocols such as VRRP,
BFD, 802.1ag, CFM, and Y.1731. ALd maintains the permitted allocations for
each timer level (3.3 ms, 10 ms, 100 ms, 1 s). When all available OAM timer
resources and pre-allocated watchdog counters are allocated, it returns an
appropriate error code for further allocation requests.
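The allocation behavior can be sketched as a per-level pool. The limits and the return codes below are hypothetical placeholders; the actual ALd limits and error codes are not given in this description.

```python
class OamTimerPool:
    """Tracks allocations per timer level against a permitted maximum."""

    def __init__(self, limits):
        self.limits = dict(limits)
        self.used = {level: 0 for level in limits}

    def allocate(self, level):
        if level not in self.limits:
            return "ERR_UNKNOWN_LEVEL"
        if self.used[level] >= self.limits[level]:
            return "ERR_NO_RESOURCE"  # pool for this level is exhausted
        self.used[level] += 1
        return "OK"

    def release(self, level):
        if self.used.get(level, 0) > 0:
            self.used[level] -= 1


# Hypothetical limits, chosen only to keep the example short.
pool = OamTimerPool({"3.3ms": 2, "10ms": 4, "100ms": 8, "1s": 16})
results = [pool.allocate("3.3ms") for _ in range(3)]
print(results)
```

Keeping one counter per timer level lets a fast level run out without affecting sessions that use slower timers.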
• Adding, deleting, and updating BFD sessions and their attributes in the
BFD table on NPU. The update BFD session API updates any change
from Home Slot to backup Home Slot. The update BFD session API also
communicates state information to non-Home slots to track BFD over LAG
sessions.
• Read/get one BFD table entry for a BFD session or read/get all BFD entries
from the BFD table in NPU.
• Processing BFD status update messages from NPU, triggering FFN events
(on session timeout or on session going down), and updating FABL-BFD
about status changes or remote-peer parameter changes.
BFD uses a homing table in the NPU (per ingress PFE) to fast switch from Home
Slot to backup when the Home Slot goes down. When the Home Slot of a BFD
session goes down, the BFD FFN client handler code in ALd PD (in all other
slots) toggles the control bit to make the backup Home Slot become active in all
20 Home Slot table entries corresponding to the original slot that went down.
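The fast switch can be pictured as flipping one control bit per affected homing table entry. The entry layout and names below are assumptions for illustration and do not reflect the NPU table format.

```python
from dataclasses import dataclass


@dataclass
class HomingEntry:
    home_slot: int
    backup_slot: int
    use_backup: bool = False  # control bit toggled on Home Slot failure

    def active_slot(self):
        return self.backup_slot if self.use_backup else self.home_slot


def on_slot_down(table, failed_slot):
    """Toggle the control bit in every entry homed on the failed slot."""
    toggled = 0
    for entry in table:
        if entry.home_slot == failed_slot and not entry.use_backup:
            entry.use_backup = True
            toggled += 1
    return toggled


table = [HomingEntry(1, 2), HomingEntry(1, 3), HomingEntry(4, 2)]
toggled = on_slot_down(table, failed_slot=1)
print(toggled, [entry.active_slot() for entry in table])
```

Because only a bit is flipped, no session state has to be rebuilt in the data path at the moment of failure, which is what makes the switch fast.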
Tracked-object-action
The action to be taken when a tracked-object event is
generated.
Tracked-object-event
The event.
Tracked-object-publisher
Entity that publishes the tracked-object-event. For SSR,
release 12.2, BFD is the tracked-object-publisher.
Tracked-object-subscriber
The entity that acts when it receives a
tracked-object-event. For SSR, release 12.2,
LAG is the tracked-object-subscriber.
ETI can enable high-speed reactions to outages in MC-LAG, PW,
or LAG links. For example (see Figure 90):
1 Power fails in the active chassis, SSR 1, resulting in links 1, 2, and 3 being
down. SSR 2 must detect this in less than a second and switch over with
minimal data-path traffic disruption.
2 The BFD session over the link group between SSR 1 and SSR 2 detects
the inter-chassis links being down.
4 LAGd signals the remote Ethernet switch to begin sending traffic to SSR 2.
At the same time, LAGd enables the pathway from SSR 2 to the switch.
The states can be up, down, or some other state published by the publisher.
A compile-time list of states is supported in the system.
ETI interacts with the RPSW modules, RCM, Event Publishing, and Event
Subscribing in the following process:
3 RIB propagates this information to ALd PI BFD on the associated line card,
so that it can publish the events as they happen. The ALd PI BFD publishes
the current status of the tracked-object immediately after this.
6 LAGd registers the action handlers for the event with a subscription tag
(a quick reference for the event subscriber). For example, this could
correspond to the subscription-side LAG instance.
7 The ETI process registers with the ETI library as a subscriber for the event
so that it can maintain statistics and perform some generic optional actions,
such as logging and SNMP traps, for the event.
8 When the event publisher is ready to publish an event, it calls the ETI
library publishing API.
9 The send-side library distributes the event through the ETI transport. The
ETI transport delivers the event to LAGd and the ETI process through TIPC
SOCK_RDM reliable multicast.
The ETI library calls the event subscriber action handler, with the
subscription tag.
10 A copy of the event is also sent to the ETI process on the RPSW card.
This is not a time critical or urgent message, and is delivered with regular
priority reliable messaging.
11 The ETI process handler is called, which maintains statistics, logs, traps,
and so forth.
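The publish and subscribe steps above can be sketched with an in-process dispatcher. This is a simplification under stated assumptions: the class and handler names are hypothetical, and the sketch calls handlers directly instead of using the TIPC transport.

```python
class EtiBus:
    """Delivers a published event to every handler registered for it."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_name, handler, tag):
        # The tag is handed back to the handler as a quick reference.
        self._subscribers.setdefault(event_name, []).append((handler, tag))

    def publish(self, event_name, state):
        delivered = 0
        for handler, tag in self._subscribers.get(event_name, []):
            handler(event_name, state, tag)
            delivered += 1
        return delivered


actions = []
stats = {"events": 0}


def lag_handler(event_name, state, tag):
    # Subscriber action: react to a down event for the tagged LAG instance.
    if state == "down":
        actions.append(("switchover", tag))


def eti_process_handler(event_name, state, tag):
    # Bookkeeping subscriber: statistics only in this sketch.
    stats["events"] += 1


bus = EtiBus()
bus.subscribe("bfd-session-1", lag_handler, tag="lag-instance-10")
bus.subscribe("bfd-session-1", eti_process_handler, tag=None)
delivered = bus.publish("bfd-session-1", "down")
```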
To support the proper version of TIPC, the RPSW, line card, and SSC kernels
have been upgraded to Linux 3.0. The TIPC module is loaded when each
card comes up.
For an SSC, if the Platform Admin daemon (PAD) is notified about card level
critical faults (such as thermal or voltage) from CMS and other modules, it
sends a Card Down event before starting the card deactivation processing,
causing ETI to publish a Card Down message. The SSC Card Down condition
can also be triggered by a CLI configuration change, a kernel panic, an SSC
PM crash, or by pulling the SSC out of the chassis.
Note: Line cards are supported by FFN, which is not supported by ETI. For
more information, see Section 3.13.2 on page 220.
• The SSC kernel supports detection of kernel panics and sends CMB
messages to CMS.
• To support LAG on SSC for egress (SSC -> LC) traffic, SSC ALd subscribes
to receive line card FFN events. When it receives such events, it recovers
by sending packets to LAG constituents that are still reported to be up.
The FFN mechanism is not currently based on the ETI infrastructure; see
Section 3.12 on page 215. In the future, it may be, because the ETI
infrastructure is expected to provide a much more flexible and generic interface
than the current FFN functionality, to cover multiple event types, and to be
easily extended to new events.
4 Administration
A.
You may also need to access the Linux kernel to perform the following actions:
The SSR system components are accessed through the primary and secondary
RPSW controller cards, the OK mode, and the Linux shell.
You can also access line cards and SSC cards to collect data and troubleshoot.
[local]Ericsson#start shell
sh-3.2#
The # prompt indicates that you are at the Linux shell level.
To log on to the standby RPSW card from the CLI, enter the following command:
[local]Ericsson#telnet mate
To log on to the standby RPSW card from the Linux shell on the active RPSW
card, enter the following command:
sh-3.2$telnet rpsw2
This example assumes that RPSW1 is active (if RPSW2 is active, enter
the telnet rpsw1 command). To verify the active card, enter the show
chassis card command.
To log on to a line card or SSC from the RPSW CLI (with the root password)
and open the line card ALd CLI, enter the following command:
[local]Ericsson#start shell
sh-3.2$ ssh root@lc-cli-slot-num
root@lc-1[1]:/root> /usr/lib/siara/bin/ald_cli
1. Enter the reload command in exec mode from the console port.
3. If you typed ssr* within 5 seconds, the ok prompt appears. The system
sets the autoboot time limit to 5 seconds; however, during some operations,
such as a release upgrade, the system sets the time limit to 1 second to
speed up the process and then returns it to 5 seconds when the system
reboots. If you missed the time limit, the reload continues; start again with
Step 1.
You can access the Linux prompt and perform root-level troubleshooting or
recovery if you:
When the SSR is reloaded and you type ssr within 5 seconds, a Linux bash
shell (not the same shell as the one started by the start shell command) is
entered, and the regular Linux prompt displays. From this shell, you can run
various utilities and applications, if you have the correct permissions, as in a
standard Linux system.
A non-root user can perform many actions, execute most utilities, and run many
applications; however, only a root user can perform certain operations that
are privileged and affect the system functionality (for example, shutting down
the system; stopping, starting, or restarting processes; or using SSH).
1. Console logon—The user root can log on from the serial console after the
login: prompt. The default root user password is root. After a successful
root username and password logon, the exec_cli is executed and the Ericsson IP
CLI is available for SSR configuration. At this point, administrator configuration
is allowed.
2. SSH login—As per the SSH configuration, the user root is not permitted to
log on through the management interface via SSH. If the user attempts this,
the password will be prompted three times and each attempt will display a
permission denied error. All system administrators can log on via SSH if, and
only if, they provide the correct username and password combination. Once
logged on, the Ericsson IP CLI is started for the system administrator. To
access the Linux shell from the Ericsson IP CLI, execute the start shell
command. At the Linux CLI, the user does not have root privileges.
the system administrator. To access the Linux shell from the Ericsson IP
CLI, execute the start shell command. At the Linux CLI, the system
administrator is root and has root privileges. If you have a Linux user
configured in the system, you can connect to the chassis through SSH using
that user's information.
The SSR has two 16-GB internal disks. Storage is divided into four
independent partitions: p01, p02, /flash on the first disk, and /md on the second
disk:
• The p01 and p02 partitions are system boot partitions that store operating
system image files. One is the active partition and one is the alternate
partition.
The active partition always stores the current operating system image
files. The alternate partition is either empty or stores the operating system
image files from another release. Only one alternate configuration can
be stored at a time.
The RPSW cards in the SSR ship with the current operating system release
installed in the active partition, either p01 or p02. The system loads the
software release when the system is powered up.
• The /flash and /md partitions are internal storage partitions used for
managing configuration files, core dump files, and other operating system
files. The /flash partition is 8 GB in size and is primarily used for storing and
managing configuration files. The /md partition is 16 GB in size and stores
all kernel and application core files and log files.
You can also mount a USB flash drive in the external slot of an RPSW card
for transferring software images, logs, configuration files, and other operating
system files. The USB flash drive is not intended for continuous storage.
Note: The USB flash drive cannot be accessed when it is at the firmware
ok> prompt.
Each line card has one 2 GB internal storage disk that is partitioned in four
parts: /p01, /p02, /flash, and /var (/var/md).
To see the file system organization, use the show disk command. In the
following example, a USB drive has been inserted and mounted.
[local]Ericsson#show disk
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sda1 3970536 1082032 2688396 29% /p01
rootfs 3970556 1429364 2341080 38% /
/dev/sdb1 15603420 1506468 13310588 10% /var
/dev/sda3 7661920 149600 7126176 2% /flash
/dev/sdc1 1957600 1551776 405824 79% /media/flash
When both legacy CLI and Ericsson CLI configurations are created on the SSR,
they are combined in a single configuration file.
A configuration file can have a text version and a binary version. The system
generates both versions when you enter the save configuration command
in exec mode.
By default, the system loads the binary version of the system configuration file,
ericsson.bin, from the local file system during system power on or reload. If
the binary version does not exist, or if it does not match the ericsson.cfg file,
the system loads the ericsson.cfg file. The ericsson.cfg file is loaded
on the system at the factory and should exist on initial power-up. However, if
the ericsson.cfg file has been removed, the system generates a minimal
configuration, which you can then modify.
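The load order described above can be expressed as a short decision function. How the system determines that the binary file matches ericsson.cfg is not described here, so the sketch takes that result as an input; the behavior when only the binary file exists is also an assumption.

```python
def choose_startup_config(bin_exists, cfg_exists, bin_matches_cfg):
    """Return which configuration source is used at power on or reload."""
    if bin_exists and cfg_exists and bin_matches_cfg:
        return "ericsson.bin"  # default: binary version
    if cfg_exists:
        return "ericsson.cfg"  # binary missing or not matching
    # Assumption: without ericsson.cfg the system falls back to a
    # generated minimal configuration, even if a binary file exists.
    return "minimal configuration"


print(choose_startup_config(bin_exists=True, cfg_exists=True, bin_matches_cfg=False))
```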
You can modify the active system configuration in the following ways:
With an interactive configuration, you begin a CLI session and access global
configuration mode by entering the configure command in exec mode.
In global configuration mode, you can enter any number of configuration
commands.
The operating system supports comment lines within configuration files. To add
a comment to your configuration file, begin the line using an exclamation point
(!). Any line that begins with an ! is not processed as a command.
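As an illustration of the comment rule, the following sketch filters a configuration text before further processing, for example in an offline tool that inspects saved configuration files. The function name is hypothetical, and skipping blank lines is an added assumption.

```python
def command_lines(config_text):
    """Return the lines that would be processed as commands."""
    commands = []
    for line in config_text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no command
        if line.startswith("!"):
            continue  # comment line, not processed as a command
        commands.append(line)
    return commands


sample = "! show non-local messages on the console\nlogging console\n!\n"
print(command_lines(sample))
```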
Note: For operations that request the use of a transfer protocol, such as FTP,
SCP, or TFTP, it is assumed that a system is configured and reachable
by the SSR.
For the customer instructions for which data to collect when submitting a
customer service request (CSR), see Data Collection Guideline.
5.1 HealthD
Health Monitoring daemon (Healthd) is a new feature that enhances the
debuggability of the SSR. Its two main purposes are monitoring the system
health and supporting operators in troubleshooting system issues. The
high-level functions of Healthd include:
• Detecting system problems before they happen (when possible) or as they
happen, and taking preconfigured actions
Note: For security reasons, Healthd refuses any connections to the Healthd
TCP port with a destination IP address other than the loopback
address. Therefore, you must connect to Healthd from the local node
with proper authentication. For further protection of the RPSW card,
you can also independently configure filters or ACLs on the line cards
to deny connections from the outside to the Healthd port. It is assumed
that only Ericsson-trained personnel can use Healthd. Healthd’s user
interface and functionality must not be exposed to operators.
Healthd is initialized by the Process Manager (PM) when the system is coming
up, with the following process:
3 After the actions are loaded, the generic configuration script runs and loads
the Healthd functionality, which applies to all Release 12.2 SSR nodes.
The script configures Healthd events that run on a periodic basis. The
active RPSW and standby RPSW scripts can be different; you set a global
variable to determine which script to load.
5 When the generic and per-node scripts have loaded and run, you can log in
to the Healthd console and add new configurations and operations or run
troubleshooting scripts. A history function helps you save the configuration
in a file.
Healthd is built on top of the UTF core library to allow reuse of code developed
by various teams to configure and query their own daemons. This architecture
is summarized in the following figure.
5.2 Logging
The operating system contains two log buffers: main and debug. By default,
messages are stored in the main log. If the system restarts, for example, as a
result of a logging daemon or system error, and the logger daemon shuts down
and restarts cleanly, the main log buffer is saved in the /md/loggd_dlog.bin
file, and the debug log buffer is saved in the /md/loggd_ddbg.bin file. You
can view the contents of the main log file using the show log command in
exec mode.
Persistent and startup log files are stored by default in /md/log. In the
following example, the dir command shows the log files stored in that location:
[local]Ericsson#dir /md/log*
Contents of /md/log*
-rw-r--r-- 1 root root 16 Jun 05 05:26 /md/loggd_ddbg.bin
-rw-r--r-- 1 root root 394312 Jun 05 05:26 /md/loggd_dlog.bin
-rw-r--r-- 1 root root 2156739 Jun 05 23:09 /md/loggd_persistent.lo
-rw-r--r-- 1 root root 9751732 Jun 01 02:34 /md/loggd_persistent.lo
-rw-r--r-- 1 root root 261145 Jun 05 23:09 /md/loggd_startup.log
-rw-r--r-- 1 root root 346712 Jun 05 05:26 /md/loggd_startup.log.1
Collect system logs from both the active and standby route processor/switch
(RPSW) cards to attach to a CSR. The files are named messages.x.gz; they
can be found in the /var/log directory through the Linux shell mode. The log file
must include the time of the failure. Timestamps before and after the event
occurred must also be included in the CSR. It is important to verify exactly
which file contains the actual failure, because the active message log file is
eventually overwritten. For example, the failure can be in
/var/log/messages.2.gz instead of the current message log. Verify the logging
configuration on the router by collecting the output of the show configuration
log command.
Note: You cannot use the show log command to display the contents of
the debug buffer, unless you enable the logging debug command
in global configuration mode. However, enabling the logging debug
command can quickly fill up the log buffer with debug and non-debug
messages. To prevent the main buffer from filling up with debug
messages and overwriting more significant messages, disable the
logging debug command in context configuration mode.
By default, log messages for local contexts are displayed in real time on the
console, but non-local contexts are not. To display non-local messages in real
time, use the logging console command in context configuration mode.
However, log messages can be displayed in real time from any Telnet session
using the terminal monitor command in exec mode (for more information,
see the command in the Command List).
All log messages contain a numeric value indicating the severity of the event
or condition that caused the message to be logged. Many log messages are
normal and do not indicate a system problem.
Table 13 lists event severity levels in log messages and their respective
descriptions.
To disable the display of INFO messages on the console, use the no form
of the commands.
To prepare for troubleshooting, collect system logs from both the active and
standby RPSW cards. The files are named messages.x.gz. They are
located in the /var/log directory through the Linux shell mode (see Section
4.1.1 on page 222). The log file includes the time of the failure. Timestamps
before and after the event occurred must also be included in the CSR. It
is important to verify exactly in which file the actual failure is, because the
active message log file is eventually overwritten. For example, the file can
be in /var/log/messages.2.gz instead of the current message log. Verify
the logging configuration on the router by collecting the output of the show
configuration log command.
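The check described above (confirming which message file actually contains the failure) can be sketched offline, after the files have been copied from /var/log. This is a minimal illustration, not an Ericsson tool: the function names are hypothetical, and it assumes each line starts with a syslog-style timestamp such as "Sep 6 07:43:36". Because that timestamp carries no year, the caller supplies one, so a file that spans a year boundary is not handled.

```python
import gzip
import re
from datetime import datetime

# Leading syslog-style timestamp, for example "Sep 6 07:43:36" (no year).
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def line_time(line, year):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = STAMP.match(line)
    if not m:
        return None
    text = f"{year} {m.group(1)} {m.group(2)} {m.group(3)}"
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def time_span(lines, year):
    """Return (earliest, latest) timestamp found in the lines, or None."""
    times = [t for t in (line_time(l, year) for l in lines) if t is not None]
    return (min(times), max(times)) if times else None


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def files_covering(paths, failure, year):
    """List the files whose time span includes the failure time."""
    hits = []
    for path in paths:
        with open_log(path) as handle:
            span = time_span(handle, year)
        if span and span[0] <= failure <= span[1]:
            hits.append(path)
    return hits
```

For example, files_covering(["messages", "messages.2.gz"], datetime(2013, 9, 6, 7, 44), 2013) would return only the files whose first and last timestamps bracket 07:44 on that day (the year 2013 is illustrative only).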
For information about collecting logs and show commands for troubleshooting,
see Section 5.4 on page 256.
Be sure that you have logging enabled on the console during a reload or
switchover. To turn on event logging, configure logging in global configuration
mode.
Note: Enabling event logging with these hidden commands can be very useful
in troubleshooting, but the volume of data produced can impact SSR
performance. Disable these commands after troubleshooting.
The since, until, and level keywords are only available after specifying
the active keyword or the file filename construct.
The show log active all command prints all current active logs in the
buffers. When the buffer is full, the log is wrapped out of the buffer and written
into a series of archive files named messages.x.gz. These files are located
in the /var/log directory through the Linux shell mode, as shown in the
following example.
[local]Ericsson#start shell
#cd /var/log
#ls -l
total 56
-rwxr-xr-x 1 11244 44 0 Jun 3 21:37 authlog
-rw-r--r-- 1 root 44 12210 Aug 12 18:46 cli_commands
-rw-r--r-- 1 root 44 415 Aug 12 18:47 commands
-rwxr-xr-x 1 11244 10000 1178 Sep 6 17:58 messages
#less messages
Sep 6 07:43:36 127.0.2.6 Sep 6 07:39:52.327: %LOG-6-SEC_STANDBY:
Sep 6 07:39:52.214: %SYSLOG-6-INFO: ftpd[83]:
Data traffic: 0 bytes in 0 files
Sep 6 07:44:51 127.0.2.6 Sep 6 07:39:52.328: %LOG-6-SEC_STANDBY:
Sep 6 07:39:52.326: %SYSLOG-6-INFO: ftpd[83]:
Total traffic: 1047 bytes in 1 transfer
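When reviewing saved log files such as the one above, it can help to pull out the message tags programmatically. The following sketch extracts each %NAME-N-NAME tag and its numeric severity from a line. The labels "facility" and "mnemonic" are names chosen here for illustration, and the helper assumes the common syslog convention that a lower number is more severe; check that against the severity table for this system before relying on it.

```python
import re

# Matches tags such as "%LOG-6-SEC_STANDBY:" or "%OSPF-7-LSDB:".
TAG = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")


def parse_tags(line):
    """Return every (facility, severity, mnemonic) tag found in a line."""
    return [(fac, int(sev), mnem) for fac, sev, mnem in TAG.findall(line)]


def most_severe(line):
    """Return the numerically lowest severity in the line, or None.

    Assumption: a lower number means a more severe event.
    """
    tags = parse_tags(line)
    return min(sev for _, sev, _ in tags) if tags else None
```

A line relayed from the standby card carries two tags (the outer LOG tag and the original message tag), which is why parse_tags returns a list rather than a single value.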
When you enter the reload command from the CLI, or the reboot command
from the boot ROM, the system copies its log and debug buffers into the
following files:
/md/loggd_dlog.bin
/md/loggd_ddbg.bin
As an aid to debugging, you can display these files using the show log
command:
By default, the timestamps in all logs and debug output are accurate to the
second. You can configure accuracy to the millisecond by entering the following
commands.
[local]Ericsson#configure
[local]Ericsson(config)#commit
When the controller cards cannot boot up, you cannot access them remotely.
To determine the cause, collect the information shown in the console while
the controller card is booting.
1. Connect the console cable between your PC and the RPSW controller
card console port.
2. Set connection parameters such as baud rate, data bits, parity, and stop bits
correctly (for XCRP4, typically set them to 9600, 8, N, and 1, respectively).
4. Start the controller card by powering on the power supply or inserting the
card back into the slot.
You can view the ISP log in the CLI using the show isp-log command, or you
can extract the ISP log from /flash/isp.log using the copy command with
the scp keyword. The ISP log is persistent across switchovers and reboots.
When the ISP log file reaches the size limit you set with the isp-log size
command, the system stops writing log entries in the file, logs an entry in the ISP
file stating that the file is full, and displays the following system error messages.
To resume logging entries in the ISP log file, extract the ISP log file using the
following command:
copy /flash/isp.log scp://user@hostname/isp.log clear
Note: This command clears the isp.log file after it is successfully copied to
another location, enabling ISP logging to resume. If you disable the ISP
log or change the size limit, the system removes the existing ISP log
file. Also, if you change the ISP log file size limit to a lower setting than
the current file size, the system deletes all entries from the ISP log file.
You can also use the copy command with the tftp keyword for extracting
the file.
You can use the information in the ISP log to manually compute system
downtime and other statistics or, in the event of a problem, you can send the
extracted file to your support representative for analysis.
• Event type. See Logging for the specific event types in the ISP log file.
• Event timestamp. The time that the event occurred in Universal Time
Coordinated (UTC) format.
• Trigger method. If a user performed the action, the ISP log records the
trigger method as manual. If the system performed the action, the ISP log
records the trigger method as auto.
To collect basic data for submitting or escalating a TR, perform the following
steps.
The basic macro runs the commands listed in Table 14, grouped by focus.
2. For specific processes or the SSC1 card, run the command a second time
with an appropriate keyword. For example, for AAA problems, enter the
show tech-support aaa command.
Optionally, you can collect other data relevant to the problem. See Table 15.
Note: Whenever possible, run these commands when the problem is still
present. If the issue is related to traffic, counters, or other changing
elements, run the command again after a short interval (3–5 minutes).
Note: ASE, ATM, Flowd, L2TP, Mobile IP, PPP, and PPPoE are not supported on
the router.
For information on the SSR functions covered by the basic command (without
any keywords), see Table 14.
For the procedures to capture the output of the command, see Section 5.3.1
on page 237.
Table 15 describes the commands included in the command with the keywords.
It also includes the macro names run with each keyword.
Note: The show tech-support ase command is the same as the
ase-tech macro.
• To save your CLI session to a file on your computer, use the capture or
logging function in your terminal emulation software.
• Use the UNIX script command on the terminal server before logging on
to the router and running the show command to save the output to a file in
your working directory.
To save the output of the show tech-support command to /md and then to
an external drive:
To use the script command to save the output to a file in your working
directory:
1. Access the router from a UNIX environment (for example, from a terminal
server), and enter the script filename command.
working-directory> script show_tech.log
Script started, file is show_tech.log
working-directory> telnet 10.10.10.2
Trying 10.10.10.2...
Connected to isp-224.
Escape character is '^]'.
isp-224
login: admin
Password:
[local]isp-224#
[local]isp-224#term len 0
[local]isp-224#show tech-support
4. When the command has completed and the CLI prompt appears,
enter the exit command twice to exit the router and then the script.
The script completes with the message, Script done, file is
show_tech.log.
For information about collecting data for troubleshooting, see Data Collection
Guideline.
5.5 Debugging
The Ericsson IP Operating System includes many debugging messages to
troubleshoot system processes. By default, debugging is not enabled because
of its performance impact, but you can enable it when needed for troubleshooting.
Debugging in the router can be a context-specific task or a context-independent
(global) task.
Debugging messages are sent to the syslog, console, or log files, depending on
what is configured. For more information, see Logging.
To debug all contexts on your router, use the system-wide local context. You
see debug output related to this context and all contexts running on the router.
For example, to see all OSPF instances on the router, issue the debug ospf
lsdb command in the local context.
When you debug from the local context, the software displays debug output
for all contexts. When a debug function is context specific, the debug output
generated by the local context includes a context ID that you can use to
determine the source of the event (the context in which the event has its origin).
You can then navigate to the context that contains the event and collect
additional information to troubleshoot it.
The following example displays debug output from a local context. The debug
output generated using the show debug command includes the context ID
0005, which is highlighted in bold. To find the source of the debug event (the
context name) for context ID 0005, issue the show context all command.
In the Context ID column, look for the context ID with the last four digits
0005—in this case, 0x40080005, which indicates that the source of the debug
event is context Re-1.
[local]Ericsson#show debug
OSPF:
lsdb debugging is turned on
[local]Ericsson#
Apr 18 12:21:04: %LOG-6-SEC_STANDBY: Apr 18 12:21:04: %CSM-6-PORT:
ethernet 3/7 link state UP, admin is UP
Apr 18 12:21:04: %LOG-6-SEC_STANDBY: Apr 18 12:21:04: %CSM-6-PORT:
ethernet 3/8 link state UP, admin is UP
Apr 18 12:21:05: %CSM-6-PORT: ethernet 3/7 link state UP, admin is UP
Apr 18 12:21:05: %CSM-6-PORT: ethernet 3/8 link state UP, admin is UP
Apr 18 12:21:05: [0002]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72
Apr 18 12:21:05: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.2
Update Router LSA 200.1.2.1/200.1.2.1/80000009 cksum ce79 len 36
Apr 18 12:21:05: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3
Update Sum-Net LSA 0.0.0.0/200.1.3.1/80000001 cksum bb74 len 28
Apr 18 12:21:05: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3
Update Router LSA 200.1.3.1/200.1.3.1/8000000a cksum 142 len 36
Apr 18 12:21:05: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72
Apr 18 12:21:05: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72
Apr 18 12:21:06 [0005]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
//Associated with Context ID 0x40080005. This is context specific output,
in this case, context Re-1.
----------------------------------------------------------------
[local]Ericsson# show context all
Context Name Context ID VPN-RD Description
-----------------------------------------------------------------
local 0x40080001
Rb-1 0x40080002
Rb-2 0x40080003
Rb-3 0x40080004
Re-1 0x40080005 // The source of the debug event for
Re-2 0x40080006 // Context ID 0005 is context Re-1.
Re-3 0x40080007
[local]Ericsson#
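The lookup shown above (matching the four-digit ID in a debug line to the last four digits of a context ID) can also be done on captured output. The sketch below is illustrative only; the function names are hypothetical, and it assumes the two text layouts seen in this example: a bracketed four-digit ID in debug lines and a name followed by a 0x-prefixed eight-digit ID in the show context all output.

```python
import re

# A row of "show context all": context name, then an ID such as 0x40080005.
ROW = re.compile(r"^(\S+)\s+(0x[0-9a-fA-F]{8})\b")
# The bracketed ID in a debug line, for example "[0005]:".
DEBUG_ID = re.compile(r"\[([0-9a-fA-F]{4})\]:")


def context_table(show_output):
    """Map the last four hex digits of each context ID to its context name."""
    table = {}
    for line in show_output.splitlines():
        m = ROW.match(line.strip())
        if m:
            table[m.group(2)[-4:].lower()] = m.group(1)
    return table


def context_of(debug_line, table):
    """Return the context name for a debug line, or None if it has no ID."""
    m = DEBUG_ID.search(debug_line)
    return table.get(m.group(1).lower()) if m else None
```

Lines without a bracketed ID, such as the CSM port messages in the example, return None, which is consistent with their being system-wide rather than context specific.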
The current context affects the output of some debug commands. For example,
the debug ospf lsdb command can be context specific because multiple
contexts can exist, each running its own protocols. In this example, you see
only the OSPF debug output from context MyService. If you run the same
command from the local context, you see output from all contexts that have
OSPF enabled. The context ID in the debug message logs shows all the
contexts for which this debug event is applicable. To debug a specific context
for OSPF, navigate to that context—in this example, MyService.
[local]Ericsson#context MyService
[MyService]Ericsson#terminal monitor
[MyService]Ericsson#debug ospf lsdb
OSPF:
lsdb debugging is turned on
[MyService]Ericsson#
Feb 27 15:11:24: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 1.1.1.1/1.1.1.1/8000000c cksum ba60 len 36
Feb 27 15:11:24: [0001]: %OSPF-7-LSDB: OSPF-1: Delete
Net:192.1.1.1[1.1.1.1] Area: 0.0.0.0
Feb 27 15:11:24: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 1.1.1.1/1.1.1.1/8000000d cksum b861 len 36
Feb 27 15:12:09: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Net
LSA 192.1.1.1/1.1.1.1/80000002 cksum 1b4a len 32
Feb 27 15:12:09: [0001]: %OSPF-7-LSDB: OSPF-1: Delete
Net:192.1.1.1[1.1.1.1] Area: 0.0.0.0
Feb 27 15:12:09: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 2.2.2.2/2.2.2.2/80000005 cksum 6ec8 len 36
Feb 27 15:12:09: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 1.1.1.1/1.1.1.1/80000010 cksum 4f30 len 48
Feb 27 15:12:09: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Net
LSA 192.1.1.1/1.1.1.1/80000003 cksum 194b len 32
Feb 27 15:12:14: [0001]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update
Router LSA 2.2.2.2/2.2.2.2/80000006 cksum 237a len 48
• slot/port—13/1
• Authority (the application that made the circuit, in this case, ATM) is 1, the
level of circuit (in this example, a traffic-bearing Layer 2 circuit) is 2, and the
internal ID (a sequential uniquely assigned number) is 11.
In the second example, the debug aaa authen function in the local context
is system-wide because no context identifiers are displayed in the output. In
the third example, the local context displays context identifiers 0002, 0003,
and 0004, which indicate that the sources of the LSA updates are context
specific. When you issue the show context all command, these contexts
are displayed as 0x40080002, 0x40080003, and 0x40080004.
Figure 95
[local]Ericsson#terminal monitor
information, such as the task name, task owner, priority, and instruction queue
that were active at the time the core file was created.
Collect crash files from both the active and standby RPSW cards.
If a system crash occurred and core dump files were generated, you can
display the files using the following commands.
• show crashfiles
On SSRs, core dump files are placed in the /md directory in the /flash partition
(a directory under root FS mounted on the internal CF card), or in the
/md directory on a mass-storage device (the external CF card located in the
front of RPSW), if it is installed in the system.
Use double slashes (//) if the pathname to the directory on the remote server
is an absolute pathname. Use a single slash (/) if it is a relative pathname (for
example, under the hierarchy of the username account home directory).
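The slash rule can be illustrated with a small helper that builds the remote location string. This is a sketch under assumptions: the function name is hypothetical, and the scheme://user@host/path layout is taken from the copy example for the ISP log elsewhere in this section. Joining the host and the path with one slash produces the double slash automatically when the path itself is absolute.

```python
def remote_url(scheme, user, host, path):
    """Build a remote file location for a copy command.

    An absolute path ("/var/cores/x") yields "//" after the host;
    a relative path ("cores/x") yields a single "/".
    """
    if not path:
        raise ValueError("path must not be empty")
    return f"{scheme}://{user}@{host}/{path}"
```

With these illustrative values, remote_url("scp", "user", "hostname", "/var/cores/x") gives scp://user@hostname//var/cores/x, and the relative form "cores/x" gives scp://user@hostname/cores/x.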
To display information about existing core dump files, use the show
crashfiles command.
Note: This command does not display information about crash files that have
been transferred to a bulkstats receiver, which is a remote file server.
Note: You can initiate a manual core dump by forcing a crash on any SSR
process or card. However, doing so can destroy other troubleshooting
evidence. Before generating core dumps, collect already existing crash
files and send them to your customer support representative.
To force a core dump for a process without restarting the process, enter the
following command:
5.7 Statistics
You can configure the router to produce bulk statistics to monitor your system.
For more information, see Configuring Bulkstats.
The SSR detects and disables unsupported transceivers when they are installed,
because they can potentially cause power and thermal problems in the router.
This feature does not affect any existing operation or service on supported
transceivers and does not require any configuration. Typically, users see
it in the following use cases:
An alarm is raised to alert the user to the situation. When the user replaces
the transceiver with an Ericsson-approved one, the alarm is cleared and
port operations are restored.
• When the user configures a port on a line card already inserted in the
chassis with an unsupported transceiver, the transceiver is disabled and
the port remains in the down state. An alarm is raised to alert the user to
the situation. When the user replaces the transceiver with an
Ericsson-approved one, the alarm is cleared and port operations are restored.
This topic is also covered in the SSR Line Card Troubleshooting Guide.
Be sure that you have logging enabled on the console during a restart or
switchover. To turn on event logging, configure logging in global configuration
mode.
Note: Enabling event logging with these hidden commands can be useful
in troubleshooting, but the volume of data produced can impact
performance. Disable these commands after troubleshooting.
1. Enter global configuration mode and enable event logging with retention.
5.10.1.2 After a Switchover, Does the Newly Active RPSW Controller Card Contain
Complete Information?
In the ISM commands in this section, clients are SSR processes that receive
information from ISM, and media back ends (MBEs) send information to ISM.
After a switchover, to examine the ISM events that were on the standby RPSW
card (and just became active), use the show ism global command.
• The standby RPSW card is synchronized with the current state of the active
RPSW in real time. To view the messages updating the standby RPSW,
enter the show ism client SB-ISM log or the show ism client
SB-ISM log cct-handle detail command. SB-ISM is the ISM
running on the standby RPSW. You can also use the show redundancy
command to examine the RPSW card status.
• To view client and MBEs registering and EOF received and sent files, enter
the (hidden) show sys status process ism-name command.
• To view the process crash information, enter the show proc crash-info
command.
• To view the ISM core dump of the system, enter the show crashfiles
command.
• If the kernel kills a process for consuming too much memory in the system,
the output is Kernel kills with signal 9, no core dump.
• If there are any link group problems, get the MBE and client circuit logging
of aggregate and constituent circuits. Enter the following commands to
view the link-group problems:
2. To view the events that were received by ISM, use the show ism global
event-in log detail or the show ism mbe log detail command.
3. To show when events were processed by ISM, use the show ism
global complete log detail command. You can filter it by a circuit
or interface.
• To view information about all clients, use the show ism client
command.
• To view detailed information about a specific client, use the show ism
client client-name detail command.
• To view details about blocking clients, use the show ism
client client-name det command, as in the following example:
In this example:
Note:
• Ensure that the table version for each client is similar; otherwise, the
higher-priority client blocks the lower-priority clients from
subsequent updates.
• View the IPC Q/sent/err/drop values. The Q and sent values
should be the same if the system is running stably. They
might differ for a short duration, but the sent value should then
become equal to the Q value.
1. To view information about all MBEs, use the show ism mbe command.
2. To view information about a specific MBE, use the show ism mbe
mbe-name detail command.
The status of L3 v4 proto and L3v6 proto (ENABLED and UP in this case)
indicates whether the dual stack is up.
6. To examine the messages going out of ISM, use the show ism client
client-name log cct handle cct-handle detail or the show ism
client client-name log interface int-grid detail command.
7. To list the circuits on a router, use the show ism circuit command; this can
enable you to look up a circuit handle, as in the following example.
Caution!
Risk of system instability. Because restarting ISM has a major impact on
all modules and the process of resynching all modules after restart is time
consuming, only restart ISM on production systems that are already down or
during a maintenance window.
If you do need to restart ISM (for example, if a disconnect exists in the
ISM messages to and from the MBEs), you can restart the ISM process to
synchronize them. After the restart, all MBEs resend information to ISM. All
clients are then populated with this information. To restart the ISM process, use
the process restart ism command.
POD tests verify the correct operation of the controller cards, backplane, fan
trays, power modules, and each installed line card during a power-on or reload
sequence. These tests also run whenever a controller card or line card is
installed in a running system. The POD for each component consists of a series
of tests, each of which can indicate a component failure.
During each test, the POD displays results and status. If an error occurs, the
test lights the FAIL LED on the failing card but does not stop loading the SSR
software. A backplane or fan tray that fails lights the FAN LED on the fan tray.
The maximum test time is 130 seconds: 60 seconds for a controller card, 10
seconds for the backplane and fan tray, and 5 seconds for each installed line
card. If the system has two controller cards, the controller tests run in parallel.
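The time budget above can be checked with simple arithmetic. The sketch below uses only the per-component figures stated in the text; the function and constant names are hypothetical. Because the controller tests run in parallel, a single 60-second allotment is counted regardless of whether one or two controller cards are present. That the 130-second maximum corresponds to 12 installed line cards is an inference from the numbers, (130 - 60 - 10) / 5 = 12, not a statement from the source.

```python
CONTROLLER_S = 60      # controller card tests (two controllers run in parallel)
BACKPLANE_FAN_S = 10   # backplane and fan tray tests
PER_LINE_CARD_S = 5    # each installed line card


def pod_time_bound(line_cards):
    """Return the upper bound on POD test time, in seconds."""
    if line_cards < 0:
        raise ValueError("line_cards must be non-negative")
    return CONTROLLER_S + BACKPLANE_FAN_S + PER_LINE_CARD_S * line_cards
```

For a chassis with 4 line cards this bound is 90 seconds; with 12 line cards it reaches the stated 130-second maximum.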
To display results from a POD, enter one of the following commands in any
mode:
In general, if a component fails to pass its POD tests, you might need to replace
it. Contact your local technical support representative for more information
about the results of a failed test.
POD tests are enabled by default in the SSR software. If they have been
disabled, you can enable them with the diag pod command in global
configuration mode. You can also set the level of POD to run on startup or
installation using the command with the level level construct. For more
information, see
You can use OSD to verify hardware status or isolate a fault in a field
replaceable unit (FRU).
If a component fails to pass POD or OSD tests, you might need to replace it.
Contact your local technical support representative for more information about
the results of a failed test.
Five levels of tests are supported, but not all cards support all levels of tests.
Table 18 lists the levels and types of tests performed and the components for
which the tests are supported on the routers.
The OSD tests verify the correct operation of the standby RPSW card, the
standby ALSW card, line cards, and switch fabric cards in the chassis.
Before testing chassis components, put each installed card in the OOS or OSD
state using one of the following commands.
You cannot test the active RPSW or ALSW card, but you can view the results
using the out-of-service-diag command. To execute OSD on active
RPSW or ALSW cards, you must change the state from active to standby using
the switchover command.
Note: The correspondence between the card name that appears in the CLI
and the line card type is found in the "Card Types" section of the
Configuring Cards document.
Note: If the level you select is not supported for the unit you want to test, the
tests run at the highest level for that unit.
Table 19 lists the available parameters for an OSD session of the diag
out-of-service command.
A session log stores the most recent results for each card in main memory and
also on the internal file system for low-level software. In addition, a history file
on the internal file system stores the results for the previous 10 sessions.
You can display partial test results while the tests are in progress. A notification
message displays when the session is complete. To view test results, enter the
show diag out-of-service command in any mode at any time. You can
display the latest results for a traffic or standby controller card from the log or
the results for one or more sessions from the history file.
Note: If you are connected to the system using the Ethernet management
port, you must enter the terminal monitor command in exec
mode before you start the test session so that the system displays
the completion message. For more information about the terminal
monitor command, see Basic Troubleshooting Techniques.
To display the results from OSD sessions, use one of the following commands.
You can enter the commands in any mode.
• Display results for all components from the last initiated session using the
show diag out-of-service command.
• Display results for line cards and switch fabric cards using the show diag
out-of-service card slot command.
• Display results for the standby RPSW or standby ALSW card using the
show diag out-of-service standby command.
• Display results for the active RPSW or active ALSW card using the show
diag out-of-service active command.
• Display results for the last n sessions with the show diag
out-of-service history n command. The latest session is displayed
first. You can list up to 10 sessions.
In general, if a unit fails a test, you should replace it. Contact your local technical
support representative for more information about the results of a failed test.
Table 20 lists the states of the LEDs when an OSD session runs on a line card
or standby RPSW card in an SSR chassis.
To clear or display the results from OSD sessions, perform the tasks
described in Table 23. Enter the clear diag out-of-service and
diag out-of-service commands in exec mode. Enter the show diag
out-of-service command, which can display results for up to 10 sessions
from the history log, in any mode.
For instructions to run OSD for chassis components, such as an RPSW card,
ALSW card, SW card, line card, or service card, and return them to the
in-service state, see the General Troubleshooting Guide.
Glossary
AAA
Authentication, Authorization, and Accounting

AC
attachment circuit

ACLs
Access control lists

ALd
adaptation layer daemon

ALSW
Alarm Switch

ALSW
Alarm Switch card

AMC
Advanced Mezzanine Card

ARPd
Address Resolution Protocol daemon

ATCA
Advanced Telecom Computing Architecture

BFD

BGP
Border Gateway Protocol

BNG
Broadband network gateway

CIB
Counter Information Base

CLS
Classifier

CM
Configuration Management

CMA
Chassis Management Abstraction

CMB
Card Management Bus

COM
Common Operations and Management

CPLD
Complex Programmable Logic Device

CSM
Card State Manager

CSM
Card State Module

CSPF
Constrained Shortest Path First

DCL
Data Communication Layer

DHCP
Dynamic Host Configuration Protocol

DMA
Direct memory access

DP

DSCP
Differentiated Services Code Point

DU
Downstream Unsolicited

eFAP
egress FAP

eLER
egress LER

EPG
Enhanced Packet Gateway

ESI
Enterprise Southbridge Interface

EXP
MPLS experimental priority bits

FABL
forwarding abstraction layer

FALd
Forwarding Adaptation Layer

FAP
fabric access processor

FIB
Forwarding Information Base

FM
Fault Management

FMM
Fabric Multicast Manager

FRR
Fast Reroute

FTMH
fabric traffic management (TM) header

FTP

GE
Gigabit Ethernet

GPIO

GPRS
general packet radio service

GRE
Generic Routing Encapsulation

GTP
GPRS tunneling protocol

HRH
Host Receive Header

HTH
Host Transmit Header

iBGP
internal Border Gateway Protocol

iFAP
ingress FAP

IFmgr
Interface Manager

IGMP
Internet Group Management Protocol

IGMPd
Internet Group Management Protocol daemon

IGP
Interior Gateway Protocol

iLER
ingress Label Edge Router

ILM
ingress label map

IPC
Inter-process communication

IPG
Inter-Packet Gap

IS-IS

ISM
Interface State Manager

ITHM

L2VPNs
Layer 2 Virtual Private Networks

L2
Layer 2

L3
Layer 3

L3VPN
Layer 3 VPN

LACP
Link Aggregation Control Protocol

LAG
Link Aggregation Group

LDP
Label Distribution Protocol

LER
Label Edge Router

LFBs
Logical Functional Blocks

LFIB
Label Forwarding Information Base

LGd
Link Group Daemon

LM
label manager

LP
Local Processor

LSAs
Link State Advertisements

LSP
Label-switched path

LSR

MACs
Media Access Controllers

MBEs

McastMgr
Multicast Manager

MFIB
Multicast Forwarding Information Base

MIB
Management Information Base

MO
Managed Object

MPLS
Multiprotocol Label Switching

MPLS-TE
MPLS traffic engineering

MRU
maximum receive unit

MW
middleware

NBI
Northbound Interface

NEBS
Network Equipment Building Standards

NHFRR
Next-Hop Fast Reroute

NHLFE
Next-Hop Label Forwarding Entry

NPU
Networking Processing Unit

NTMH
Network Processor TM header

OAM
Operation, Administration, and Maintenance

OCXO

OFW
Open Firmware

OIFs

OSD
out-of-service diagnostics

OSPF
Open Shortest Path First

PAd
Platform Admin daemon

PCI
Peripheral Component Interface

PCI-E
PCI Express

PD-QoS
Packet descriptor QoS

PEM
Protocol Encapsulation Manager

PEMs
Power Entry Modules

PFE
packet forwarding engine on the NPU

PHY
Physical Interface Adapter

PD
platform dependent

PI
platform independent

PI-RP-QoS
Platform-independent RP QoS

PICMG
PCI Industrial Computer Manufacturers Group

PIM
Protocol Independent Multicast

PM

PWFQ
priority weighted fair queuing

PWM
Pulse Width Modulation

QoS
Quality of service

QoSd
QoS Daemon

QoSLib
QoS shared library

QoSMgr
QoS RCM Manager

QPI
Quick Path Interface (Bus connecting CPU chips on SSC card)

RCM
Router Configuration Module

RDB
Redundant (Configuration) Database

RED
random early detection

RIB
Routing Information Base

RP
Route Processor

RPF
Reverse path forwarding

RPL
Routing Policy Library

RPM
Routing Policy Manager

RSVP
Resource Reservation Protocol

RTC
real time clock

SAs
support agents

SATA
Serial Advanced Technology Attachment

SCB
Selection control bus

SCP
Secure Copy Protocol

SerDes
Serializer/deserializer

SFTP
Shell FTP

SI
service instance

SNMP
Simple Network Management Protocol

SPI
Service Provider Interface

SPI
system packet interface

SSC
Smart Services Card

SSDs
solid-state drives

SVLAN
Service VLAN

SW

TFTP
Trivial FTP

TM
(fabric) traffic management

TM
Traffic Management

TOP
Task optimized processor

TS
Traffic steering

TSM
traffic slice management

TX
Transaction

UARTs
Universal Asynchronous Receiver/Transmitters

UDP
User Datagram Protocol

UTC
Universal Time Coordinated

VLLs
virtual leased lines

VLP
Very Low Profile

VPN
Virtual Private Network