PCIe
Transaction Layer
Outline
PCIe Basic
◦ Topology
◦ Configuration Header
◦ Enumeration
Transaction Layer
◦ Transaction Layer Packet(TLP)
◦ TLP Header
◦ TLP Type
◦ Flow control
◦ Virtual channel / Traffic class
◦ Ordering
PCIe Basic
◦ Topology
◦ Configuration space
◦ Enumeration
Topology
PCIe interfaces are connected by a Link.
◦ Link : A point-to-point connection. A Link connects exactly two
interfaces, and the topology contains no loops.
Component:
◦ Root Complex
◦ Switch
◦ Bridge
◦ EndPoint
Root Complex
Interface between the CPU, the PCIe hierarchy and memory.
RC acts on behalf of the CPU to communicate
with the rest of the system.
Switch/Bridge
Switches allow more devices to be attached to a
single PCIe Port.
PCIe-PCI Bridges provide an interface to other
buses, such as PCI or PCI-X.
Endpoint
Endpoints act as initiators and Completers of
transactions on the bus.
Requester/Completer
◦ Requester : initiates requests
◦ Completer: Services requests
Configuration Header-1
Devices and bridges contain registers
that store information about, or the status of,
the device.
The standard registers at the start of the configuration
space are called the Header.
◦ Type 0 : EP
◦ Type 1 : Switch / Bridge
Configuration Header-2
Configuration software allocates memory
space for each enumerated device.
Software can then interact with a device by accessing
those memory locations.
Each port (upstream/downstream) has its own
configuration header.
Configuration Header-3
• PCIe extends the configuration space of each function to
4 KB; the added region is called the “Extended Configuration Space”.
(PCI configuration space is 256 Bytes.)
• With the maximum number of BDFs, PCIe configuration space occupies
at most 256 MB of memory space:
• 4 KB * 256 (Bus) * 32 (Device) * 8 (Function)
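The 256 MB figure can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the bit layout of the offset (bus in bits 27:20, device in 19:15, function in 14:12) is the usual memory-mapped configuration convention, assumed here rather than taken from the slides.

```python
# Sizes from the slide: 4 KB per function, 256 buses, 32 devices, 8 functions.
CFG_SPACE_BYTES = 4 * 1024
MAX_BUS, MAX_DEV, MAX_FUNC = 256, 32, 8


def total_config_window() -> int:
    """Memory needed to map the configuration space of every possible function."""
    return CFG_SPACE_BYTES * MAX_BUS * MAX_DEV * MAX_FUNC


def config_offset(bus: int, dev: int, func: int, reg: int = 0) -> int:
    """Byte offset of a register inside the window (assumed bus-major layout)."""
    if not (0 <= bus < MAX_BUS and 0 <= dev < MAX_DEV and 0 <= func < MAX_FUNC):
        raise ValueError("BDF out of range")
    if not 0 <= reg < CFG_SPACE_BYTES:
        raise ValueError("register offset out of range")
    return (bus << 20) | (dev << 15) | (func << 12) | reg


print(total_config_window() // (1024 * 1024), "MB")  # 256 MB
print(hex(config_offset(1, 0, 0)))                   # 0x100000
```

Each function gets its own 4 KB slice, so the offset of bus 1, device 0, function 0 is exactly 1 MB into the window.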
Enumeration
Enumeration software searches the hierarchy for EPs and switches/
bridges and gives each of them an ID (BDF).
Bus number (Maximum: 256)
Device number (Maximum: 32)
Function number (Maximum: 8)
◦ Pri = Primary Bus Number
◦ Sec = Secondary Bus Number
◦ Sub = Subordinate Bus Number
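How Pri, Sec and Sub end up being assigned can be sketched as a depth-first walk. The topology below and the bridge names (`rp0`, `sw_up`, `sw_dn0`, `sw_dn1`) are hypothetical, and real enumeration software also probes devices and functions, which this sketch leaves out.

```python
def enumerate_bridges(topology: dict, root: str = "root") -> dict:
    """Depth-first bus numbering.

    topology maps a bridge name to the list of bridges found on its
    secondary bus. Returns {bridge: {"pri", "sec", "sub"}}.
    """
    result = {}
    next_bus = 0  # bus 0 is the root bus

    def visit(bridge: str, primary: int) -> None:
        nonlocal next_bus
        next_bus += 1
        secondary = next_bus              # next unused bus number
        for child in topology.get(bridge, []):
            visit(child, secondary)
        # Subordinate = highest bus number found below this bridge.
        result[bridge] = {"pri": primary, "sec": secondary, "sub": next_bus}

    for bridge in topology.get(root, []):
        visit(bridge, 0)
    return result


# Hypothetical hierarchy: one root port leading to a switch with two downstream ports.
topology = {"root": ["rp0"], "rp0": ["sw_up"], "sw_up": ["sw_dn0", "sw_dn1"]}
for name, numbers in enumerate_bridges(topology).items():
    print(name, numbers)
```

The Subordinate number is only known once everything below a bridge has been visited, which is why it is written after the recursion returns.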
Transaction Layer
Transaction Layer
◦ Transaction Layer Packet(TLP)
◦ TLP Header
◦ TLP Type
◦ TLP Routing
◦ Flow control
◦ Virtual channel / Traffic class
◦ Ordering
Layering Overview
Transaction Layer
◦ In response to requests from the Software Layer, generates
outbound packets.
Data Link Layer
◦ Is responsible for Link management and performs three
major functions:
◦ TLP error correction
◦ flow control
◦ Link power management.
Physical Layer
◦ The spec divides the Physical Layer discussion into two
portions:
◦ logical part : 8b/10b encoding, scrambling, serializing, etc.
◦ electrical part : Driving differential signal
Transaction Layer Packet(TLP)-1
Transaction Layer Packet(TLP)-2
Types of requests
◦ Indicates the type of request from the requester.
Ex. An endpoint that wants to write memory issues
a Memory Write request.
Routing
◦ Indicates the target of the request, i.e. where the TLP
should be delivered.
Ordering
◦ When multiple requests reach a switch, decides which
one should pass first.
TLP Header
Format (Fmt)
Type
Traffic Class(TC)
2 or 3DW Could be changed
Attribute(Attr)
Lightweight Notification(LN)
TLP Hint(TH)
TLP Digest(TD)
Poisoned Data(EP)
Address Type(AT)
Length
TLP Header – Format & Type
The Fmt & Type fields define the basic form of the TLP.
A TLP header is either 3DW or 4DW, and a TLP may carry a prefix.
Fmt[2:0]:
◦ Fmt[2] : If set, this is a TLP Prefix.
◦ Fmt[1] : If set, the TLP carries a data payload.
◦ Fmt[0] : If set, the header is 4DW; otherwise it is 3DW.
Type[4:0]
◦ Encodes the type of TLP chosen by the initiator, e.g. Memory Read, Configuration
Write, etc.
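As an illustration, the first header byte can be split into Fmt and Type as below. This sketch assumes the PCIe Base Specification encoding, in which Fmt occupies bits 7:5 of byte 0, Fmt[1] marks a data payload and Fmt[0] marks a 4DW header; the function name is illustrative.

```python
def decode_fmt_type(byte0: int) -> dict:
    """Split header byte 0 into Fmt[2:0] (bits 7:5) and Type[4:0] (bits 4:0)."""
    fmt = (byte0 >> 5) & 0x7
    is_prefix = bool(fmt & 0b100)
    return {
        "fmt": fmt,
        "type": byte0 & 0x1F,
        "is_prefix": is_prefix,
        # For a prefix, the data and header-size bits do not apply.
        "has_data": None if is_prefix else bool(fmt & 0b010),
        "header_dw": None if is_prefix else (4 if fmt & 0b001 else 3),
    }


examples = [("MRd, 3DW header", 0x00), ("MWr, 4DW header", 0x60), ("CplD", 0x4A)]
for name, byte0 in examples:
    print(name, decode_fmt_type(byte0))
```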
TLP Types-1
TLP types can be sorted roughly into 5 categories:
◦ IO Read/Write : IORd, IOWr
  ◦ Read/Write data from/to a Legacy EP.
◦ Memory Read/Write : MRd, MWr, MRdLk, AtomicOps
  ◦ Read/Write data from/to main memory.
◦ Configuration Read/Write : CfgRd0, CfgWr0 (Type 0), CfgRd1, CfgWr1 (Type 1)
  ◦ Read/Write configuration registers of EPs.
  ◦ Type 0 for EPs, Type 1 for bridges.
◦ Message : Msg, MsgD
  ◦ The RC uses Message TLPs to control or read the status of an EP/Switch.
  ◦ This TLP type takes the place of the sideband signals of legacy buses.
◦ Completion : Cpl, CplD
  ◦ Indicates that the TLP is serving the requester’s TLP.
TLP Types-2
4DW or 3DW
With data?
AtomicOPs
What’s the message
Posted & Non-Posted Requests
Requests can be separated into Posted and Non-posted.
Posted requests
◦ The request doesn’t need a response (completion).
◦ Memory Write, Message request.
Non-posted requests
◦ The request needs a response (completion).
◦ IO Read/Write, Memory Read, Configuration Read/Write.
Request Type : Posted / Non-posted
◦ Memory Write : Posted
◦ Message : Posted
◦ Memory Read, Memory Read Lock : Non-posted
◦ AtomicOps : Non-posted
◦ IO Read, IO Write : Non-posted
◦ Configuration Read, Configuration Write : Non-posted
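The classification can be expressed as a small lookup. The sketch below uses the TLP mnemonics from the TLP Types slide; the `AtomicOp` label and the function name are illustrative.

```python
POSTED = {"MWr", "Msg", "MsgD"}
NON_POSTED = {
    "MRd", "MRdLk", "AtomicOp",
    "IORd", "IOWr",
    "CfgRd0", "CfgWr0", "CfgRd1", "CfgWr1",
}


def needs_completion(request: str) -> bool:
    """Return True for non-posted requests, which must be answered by a Completion."""
    if request in POSTED:
        return False
    if request in NON_POSTED:
        return True
    raise ValueError(f"not a request TLP: {request}")


print(needs_completion("MWr"))   # False
print(needs_completion("IOWr"))  # True
```

Note that IO Write and Configuration Write are non-posted even though they carry data: the requester still waits for a Completion.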
TLP Routing
Address routing : The destination of TLP is targeted by address.
◦ Memory request
◦ IO request
◦ Message
ID routing : The destination of TLP is targeted by ID.
◦ Configuration request
◦ Completion
◦ Message
Implicit routing
◦ Message
Address Routing
Address routing is used for
◦ IO
◦ Memory
The address is either 32 bits (3DW header) or
64 bits (4DW header, for addresses above 4 GB).
ID Routing
ID routing is used for
◦ Configuration
◦ Completion
The RC/Switch forwards the TLP to the
proper target according to its BDF.
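One way to picture ID routing inside a switch: each downstream port compares the target bus number with its own Secondary..Subordinate range. This is a simplified sketch; the port names and bus ranges are hypothetical.

```python
from typing import Dict, Optional, Tuple


def pick_downstream_port(ports: Dict[str, Tuple[int, int]], target_bus: int) -> Optional[str]:
    """Return the port whose [secondary, subordinate] range contains target_bus."""
    for name, (secondary, subordinate) in ports.items():
        if secondary <= target_bus <= subordinate:
            return name
    return None  # no port claims the bus: the TLP is not for this subtree


# Hypothetical switch: dn0 owns bus 3, dn1 owns buses 4 to 6.
switch_ports = {"dn0": (3, 3), "dn1": (4, 6)}
print(pick_downstream_port(switch_ports, 5))  # dn1
print(pick_downstream_port(switch_ports, 9))  # None
```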
Implicit routing
Implicit routing is used for
◦ Message
Message routing subfield Type[2:0]
◦ 000b : Route to RC
◦ 001b : Route by address
◦ 010b : Route by ID
◦ 011b : Broadcast downstream
◦ 100b : terminate at receiver
◦ 101b : Gather & route to RC
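The routing subfield can be decoded directly from the 5-bit Type field. The sketch below assumes that Message TLPs use Type 10rrrb, with rrr being the routing subfield listed above; names are illustrative.

```python
MESSAGE_ROUTING = {
    0b000: "Route to RC",
    0b001: "Route by address",
    0b010: "Route by ID",
    0b011: "Broadcast downstream",
    0b100: "Terminate at receiver",
    0b101: "Gather & route to RC",
}


def message_routing(tlp_type: int) -> str:
    """Decode Type[2:0] of a Message TLP (assumed Type[4:3] == 10b)."""
    if (tlp_type >> 3) & 0x3 != 0b10:
        raise ValueError("not a Message TLP")
    return MESSAGE_ROUTING.get(tlp_type & 0x7, "Reserved")


print(message_routing(0b10100))  # Terminate at receiver
```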
TLP Header – other field
Length : Payload size (in DW)
Attr[2:1] : Related to ordering
TC[2:0] : Similar to a priority; a larger value means
higher priority.
TD : If set, the TLP carries an ECRC (TLP Digest).
EP : If set, the TLP data is poisoned.
AT : Address Type
LN : Lightweight Notification
T8, T9 : Tag extension bits
Attr[0] : No Snoop
TH : TLP Processing Hint
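To make the field positions concrete, here is a sketch that extracts these fields from the first header DW. It assumes DW0 is assembled with header byte 0 as the most significant byte, and the bit positions follow the header layout of the PCIe Base Specification (4.0); treat them as assumptions if you target a different revision.

```python
def decode_dw0_fields(dw0: int) -> dict:
    """Extract TC, Attr, LN, TH, TD, EP, AT and Length from header DW0."""
    # Attr[2] sits in byte 1, Attr[1:0] in byte 2.
    attr = (((dw0 >> 18) & 0x1) << 2) | ((dw0 >> 12) & 0x3)
    length = dw0 & 0x3FF
    return {
        "tc": (dw0 >> 20) & 0x7,
        "id_based_ordering": bool(attr & 0b100),
        "relaxed_ordering": bool(attr & 0b010),
        "no_snoop": bool(attr & 0b001),
        "ln": bool((dw0 >> 17) & 1),
        "th": bool((dw0 >> 16) & 1),
        "td": bool((dw0 >> 15) & 1),
        "ep": bool((dw0 >> 14) & 1),
        "at": (dw0 >> 10) & 0x3,
        "length_dw": length if length else 1024,  # 0 encodes 1024 DW
    }


# MWr with a 3DW header, TC=1, Relaxed Ordering set, 4 DW of payload.
print(decode_dw0_fields(0x40102004))
```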
Traffic Class
During initialization, the device driver communicates with
system software, which decides the TC values to use for each type of
packet.
The TC value defaults to zero so packets that don’t
need priority service won’t accidentally interfere
with those that do.
Traffic Classes define eight priorities, specified
by a 3-bit TC field within each TLP header (with
ascending priority; TC 0-7).
TLP Hint (TH)
Adding hints about how the system should handle TLPs targeting memory space can improve latency and
traffic congestion.
With TH set
Attribute Field
Attr[2] : ID-Based Ordering
Attr[1] : Relaxed Ordering
Attr[0] : No Snoop
No Snoop : The memory transaction does not require
hardware cache coherency, so caches do not need to be snooped or updated.
Lightweight Notification(1/2)
The LN protocol provides a notification service for when
cachelines of interest are updated.
LN Requester (LNR) : a client subsystem in an
Endpoint that sends LN Read/Write Requests and
receives LN Messages.
LN Completer (LNC) : a service subsystem in the host
that receives LN Read/Write Requests, and sends LN
Messages when registered cachelines are updated.
Lightweight Notification(2/2)
LN Read Example
1. An LNR sends an LN Read to a Memory Space
range that has an associated LNC,
requesting a copy of a cacheline.
2. The LNC returns the requested line to the LNR
and records that the LNR has requested
notification when that line is updated.
3. Later, the LNC notifies the LNR via an LN
Message when some entity updates the line, so
the LNR can take appropriate action.
TLP Header – Length
One TLP can carry at most 4 KB of
payload.
A Length of 00 0000 0000b represents 1024 DW.
To represent a TLP that transfers no data,
the Length field works together with the DW
BE field: Length is 1 DW and all byte enables are 0.
TLP Header – DW BE
The DW Byte Enables form an 8-bit field: 4 bits for the First DW and 4 bits for the Last DW.
Because PCIe accesses memory in DW-aligned units,
the BEs indicate which bytes are valid
in the first and last DW of the data.
Ex. If Byte 0 and Byte 1 of the First DW are not
accessed, the 1st DW BE is 1100b; the low 00b means
Byte 0 and Byte 1 are not accessed.
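The byte enables for a contiguous transfer can be derived from its byte address and size. The sketch below is illustrative only: it assumes a contiguous byte range and does not check the 4 KB boundary or maximum-length rules.

```python
def byte_enables(addr: int, nbytes: int) -> tuple:
    """Return (length_dw, first_dw_be, last_dw_be) for nbytes starting at addr."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    end = addr + nbytes                      # one past the last byte
    first_off = addr & 0x3                   # first valid byte in the first DW
    last_off = (end - 1) & 0x3               # last valid byte in the last DW
    length_dw = ((end + 3) >> 2) - (addr >> 2)
    first_be = (0xF << first_off) & 0xF      # enable bytes first_off..3
    last_be = 0xF >> (3 - last_off)          # enable bytes 0..last_off
    if length_dw == 1:
        # Single-DW request: one mask in First DW BE, Last DW BE is 0000b.
        return 1, first_be & last_be, 0x0
    return length_dw, first_be, last_be


length_dw, first_be, last_be = byte_enables(0x1002, 6)
print(length_dw, format(first_be, "04b"), format(last_be, "04b"))  # 2 1100 1111
```

The printed First DW BE of 1100b matches the example above: the transfer starts at byte 2, so bytes 0 and 1 of the first DW are disabled.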
Address Type (AT)
The Address Type (AT) field indicates the type of address
present in the request header.
00b : Address is untranslated.
01b : Translation Request; the address needs to be translated into a physical address.
10b : Address has already been translated into a physical address.
Tag
The Tag is generated by the Requester, and it must be
unique among all outstanding Requests from that Requester that require a
Completion.
The Requester ID and the Tag together form the Transaction ID.
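The uniqueness rule can be sketched as a small tag pool: a tag is taken when a non-posted request is issued and returned when its Completion arrives. The class and method names are hypothetical, and the Transaction ID is modeled simply as a (Requester ID, Tag) pair.

```python
class TagPool:
    """Hands out tags that are unique among outstanding requests."""

    def __init__(self, requester_id: int, tag_bits: int = 5):
        self.requester_id = requester_id
        self.free = list(range(1 << tag_bits))   # 5 bits -> 32 tags
        self.outstanding = set()

    def allocate(self) -> tuple:
        """Reserve a tag and return the Transaction ID (requester_id, tag)."""
        if not self.free:
            raise RuntimeError("no free tag: too many outstanding requests")
        tag = self.free.pop(0)
        self.outstanding.add(tag)
        return (self.requester_id, tag)

    def release(self, tag: int) -> None:
        """Call when the Completion for this tag has been received."""
        self.outstanding.remove(tag)
        self.free.append(tag)


pool = TagPool(requester_id=0x0100)
first = pool.allocate()
second = pool.allocate()
print(first, second)      # two different tags for the same requester
pool.release(first[1])    # the tag becomes reusable after its Completion
```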
IO request
IO Requests are made for Legacy devices.
TLP Type field 00010b = IO request.
Fmt[1] indicates whether the TLP carries data.
An IO request header is always 3DW.
An IO request’s TC is always 000b.
Length for an IO request is always 1DW.
Last DW BE must be all 0.
Memory Request
Type can be:
◦ 00000b : Memory Read/Write
◦ 00001b : Memory Read Locked
Length indicates the data size of this
transfer.
◦ 10’h1 = 1DW
◦ 10’h2 = 2DW
◦ 10’h3ff = 1023DW
◦ 10’h0 = 1024DW(4KB)
The address is DW-aligned.
Configuration Requests
Only the RC can initiate a Configuration Request.
Configuration Requests are routed by ID.
A bridge converts a Type 1 request into a Type 0 request when the
target bus is its secondary bus.
TC must be 000b.
By default the Tag uses only bits 4:0 (32 outstanding transactions);
if the Extended Tag bit is set, it supports 256.
Ext Reg Number & Register Number fields:
Used to address a register in configuration space.
Completions
A Completion responds to a non-posted Request and always uses a 3DW header.
A Completion copies the following fields from the request into
its own header:
◦ Requester ID
◦ Tag
◦ TC
◦ Attribute bits
The Completion Status field defines 4 values:
◦ 000b : Successful Completion (SC)
◦ 001b : Unsupported Request (UR)
◦ 010b : Configuration Request Retry Status (CRS)
◦ 100b : Completer Abort (CA)
Byte Count : Bytes remaining to satisfy the read request.
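When a read is answered with several Completions, Byte Count lets the requester track progress: each Completion reports the bytes still owed, counting its own payload. A minimal sketch, ignoring address alignment and using an arbitrary chunk size:

```python
def split_completions(total_bytes: int, chunk_bytes: int) -> list:
    """Model the Completions returned for one read request."""
    completions = []
    remaining = total_bytes
    while remaining > 0:
        payload = min(chunk_bytes, remaining)
        completions.append({
            "status": 0b000,            # Successful Completion (SC)
            "byte_count": remaining,    # includes this Completion's payload
            "payload_bytes": payload,
        })
        remaining -= payload
    return completions


for cpl in split_completions(256, 128):
    print(cpl)
```

For a 256-byte read returned in two 128-byte Completions, the first carries Byte Count 256 and the second 128; the request is satisfied when a Completion's Byte Count equals its own payload size.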
Message Requests
Message Requests are used to replace the sideband
signals of PCI.
All Message Requests use a 4DW header.
Message routing subfield Type[2:0]
◦ 000b : Route to RC
◦ 001b : Route by address
◦ 010b : Route by ID
◦ 011b : Broadcast downstream
◦ 100b : terminate at receiver
◦ 101b : Gather & route to RC
Message Code
This spec defines the following groups of Messages:
◦ INTx Interrupt Signaling
◦ Power Management
◦ Error Signaling
◦ Locked Transaction Support
◦ Slot Power Limit Support
◦ Vendor-Defined Messages
◦ Latency Tolerance Reporting (LTR) Messages
◦ Optimized Buffer Flush/Fill (OBFF) Messages
◦ Device Readiness Status (DRS) Messages
◦ Function Readiness Status (FRS) Messages
◦ Precision Time Measurement (PTM) Messages
Flow Control-1
Virtual Channels are hardware buffers that act
as queues for outgoing packets.
Flow Control checks that the buffer on the other side of
the link is able to accept the TLP before it is sent.
Flow Control mechanisms can improve
transmission efficiency if multiple Virtual
Channels (VCs) are used.
Flow Control-2
Each VC's Flow Control buffer at the receiver is managed
separately for each category.
◦ There are 6 types of buffer for each VC: a Header buffer and a Data buffer for each category.
Three categories :
◦ Posted Transactions
◦ Non-Posted Transactions
◦ Completions
The credit is the unit of flow-control accounting for a VC.
◦ Different types of TLP use credits of different sizes.
◦ Ex. 1 unit for a posted request header is 5DW, but for a completion
header it is 4DW.
Minimum Flow Control
Posted Request header (PH):
◦ 1 unit; 4DW HDR + Digest = 5DW
Posted Request data (PD)
◦ Max_Payload_Size / 16 bytes (credits)
◦ Ex. 1024 bytes / 16 = 64 units
Non-Posted Request header (NPH)
◦ 1 unit; 4DW HDR + Digest = 5DW
Non-Posted Request data (NPD)
◦ 1 unit. Credit Value = 4DW
Completion HDR (CPLH)
◦ 1 unit. Credit Value = 4DW
Completion Data (CPLD)
◦ Max_Payload_Size / 16 bytes (credits)
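The credit cost of a single TLP follows from these unit sizes: one header credit, plus one data credit per 16 bytes (4 DW) of payload, rounded up. A minimal sketch with illustrative names:

```python
CREDIT_BYTES = 16  # one data credit covers 4 DW

HEADER_POOL = {"posted": "PH", "non_posted": "NPH", "completion": "CPLH"}
DATA_POOL = {"posted": "PD", "non_posted": "NPD", "completion": "CPLD"}


def credits_needed(category: str, payload_bytes: int = 0) -> dict:
    """Return the header and data credits one TLP consumes."""
    data_credits = -(-payload_bytes // CREDIT_BYTES)  # ceiling division
    return {HEADER_POOL[category]: 1, DATA_POOL[category]: data_credits}


print(credits_needed("posted", 1024))     # {'PH': 1, 'PD': 64}
print(credits_needed("non_posted"))       # {'NPH': 1, 'NPD': 0}
print(credits_needed("completion", 100))  # {'CPLH': 1, 'CPLD': 7}
```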
Flow Control-3
Flow Control is a function of the Transaction Layer,
carried out in cooperation with the Data Link Layer.
◦ The Data Link and Physical layers process the DLLPs.
Flow Control uses DLLPs (Data Link Layer Packets) to
communicate with the other side of the link. The DLLP
sent by the receiver carries its buffer space information.
Responsibility
◦ Devices Report Available Buffer Space
◦ Receivers Register Credits
◦ Transmitters Check Credits
Data Link Layer Packet(DLLP)
Byte0[5:4]
◦ 00b : Posted
◦ 01b : Non-posted
◦ 10b : Completion
VC ID
◦ Indicates which VC is being updated.
HdrFC field
◦ An 8-bit field; supports at most 127 units.
DataFC field
◦ A 12-bit field; supports at most 2047 units.
Flow Control-4
Transmitter Elements
◦ Transactions Pending Buffer
◦ Credits Consumed counter
◦ Credit Limit counter
◦ Flow Control Gating Logic
Receiver Elements
◦ Flow Control Buffer
◦ Credit Allocated
◦ Credits Received counter
Counters Roll Over
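Because the counters roll over, the gating logic compares them with modulo arithmetic rather than a plain subtraction. The sketch below follows the transmitter check as it is usually stated for PCIe flow control (header counters are 8 bits wide, data counters 12 bits); the function name is illustrative.

```python
def may_transmit(credit_limit: int, credits_consumed: int,
                 credits_needed: int, field_bits: int) -> bool:
    """Flow Control gating check that stays valid across counter rollover."""
    modulus = 1 << field_bits
    required = (credits_consumed + credits_needed) % modulus
    # Allowed while the limit is ahead of the requirement by no more than
    # half the counter range.
    return (credit_limit - required) % modulus <= modulus // 2


# Header counters (8 bits): the limit has wrapped to 2, consumed is still at 250.
print(may_transmit(2, 250, 1, field_bits=8))   # True: 8 credits are available
print(may_transmit(2, 250, 9, field_bits=8))   # False: 9 credits are too many
```

This is also why the DLLP fields stay below half their range (127 and 2047 units): the comparison would otherwise be ambiguous after rollover.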
Virtual Channel
VCs are hardware buffers that act as queues for
outgoing packets.
Each port must include the default VC0, but may
have as many as eight (from VC0 to VC7).
A VC with a higher index has higher priority.
The VC configuration registers are called the Virtual
Channel Capability Block.
Virtual Channel Capability Block
What information does it include?
◦ VC count
◦ VC ID
◦ TC/VC Mapping
◦ VC Arbitration Capability
◦ Port Arbitration Capability
◦ Arbitration table
TC/VC Mapping
Configuration software sets the TC/VC map during
initialization.
Configuration software assigns each VC an ID.
Configuration software determines the number of VCs to
be used.
Rules regarding the TC/VC mapping:
◦ TC0 is always mapped to VC0; this mapping is hardwired. Other TCs may be mapped
to any VC.
◦ A TC may not be mapped to more than one VC.
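These two rules are easy to check mechanically. The sketch below validates a candidate map given as {VC ID: list of TCs}; the data layout and function name are illustrative, not a register format.

```python
def validate_tc_vc_map(vc_to_tcs: dict) -> dict:
    """Check the mapping rules and return the resulting {TC: VC} table."""
    tc_to_vc = {}
    for vc, tcs in vc_to_tcs.items():
        if not 0 <= vc <= 7:
            raise ValueError(f"VC{vc} out of range")
        for tc in tcs:
            if not 0 <= tc <= 7:
                raise ValueError(f"TC{tc} out of range")
            if tc in tc_to_vc:
                raise ValueError(f"TC{tc} is mapped to more than one VC")
            tc_to_vc[tc] = vc
    if tc_to_vc.get(0) != 0:
        raise ValueError("TC0 must be mapped to VC0")
    return tc_to_vc


# Two VCs: low traffic classes on VC0, high traffic classes on VC1.
print(validate_tc_vc_map({0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}))
```

Several TCs may share one VC, as in the example; only mapping one TC to several VCs is rejected.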
VC Arbitration
VC arbitration determines the order of packet
transmission based on TC number.
Software can choose among the arbitration policies provided by
hardware.
The VC capability registers provide three basic VC arbitration schemes:
◦ Strict Priority Arbitration
◦ Group Arbitration
◦ Hardware fixed Arbitration
Strict Priority VC Arbitration
The default priority scheme is based on the inherent
priority of VC IDs(VC0=lowest priority and VC7=highest
priority).
Strict priority arbitration enables minimal latency for
high-priority transactions.
The mechanism is automatic and requires no
configuration.
Strict priority has the potential to starve low-priority
channels for bandwidth.
Group VC Arbitration-1
A field in Port VC Capability Register 1 selects the boundary
that separates the Low-Priority and High-Priority VC groups.
The High-Priority group uses Strict Priority; for the Low-Priority group,
software can choose the arbitration scheme.
Group VC Arbitration-2
Selection for Low-Priority Arbitration Scheme
◦ Hardware Fixed : a hardware-based method that requires no
additional software setup.
◦ Weighted Round Robin : Software loads a table into the
register field. The VC arbiter repeatedly scans all table
entries in sequence and sends packets from the VC
specified in each entry.
Group VC Arbitration-3
WRR supports different numbers of phases.
Port Arbitration
In Switch ports and Root Ports, packets from
multiple ingress ports can all target the same VC of the
same outgoing port, so arbitration is needed for
access to that VC.
Port arbitration usually needs software
configuration for each virtual channel supported.
Port Arbitration Policy
Software sets up the Port Arbitration Table (PAT). The
table is scanned phase by phase; each phase specifies the
port number from which the next packet is
accepted.
WRR Arbitration Mechanisms
◦ Access ports according to the PAT.
◦ If the scanned port has no pending transaction, the port
is skipped and the arbiter moves to the next phase immediately.
Time-Based, Weighted Round Robin
(TBWRR)
This mechanism is required for isochronous support.
Rather than immediately advancing to the next phase, the
time-based arbiter waits until the current virtual timeslot
elapses before advancing.
This ensures that transactions are accepted from the
ingress port buffer at regular intervals.
The timeslot is currently defined as 100 ns.
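The difference between plain WRR and time-based WRR shows up when a port named in the table has nothing to send. The sketch below scans a table once in both modes. It is a simplified model: every granted packet is assumed to take exactly one timeslot, and the table and queue contents are made up for illustration.

```python
from collections import deque

TIMESLOT_NS = 100  # timeslot length quoted on the slide


def arbitrate(table, queues, time_based=False):
    """Scan the port arbitration table once.

    Returns a list of (time_ns, port, packet) grants.
    """
    grants = []
    now = 0
    for port in table:
        queue = queues.get(port)
        if queue:
            grants.append((now, port, queue.popleft()))
            now += TIMESLOT_NS
        elif time_based:
            now += TIMESLOT_NS  # TBWRR idles until the timeslot elapses
        # Plain WRR: an empty port is skipped without spending any time.
    return grants


def make_queues():
    # Hypothetical ingress queues: port 1 has nothing pending.
    return {0: deque(["a0", "a1"]), 1: deque(), 2: deque(["c0"])}


table = [0, 1, 0, 2]  # one port number per phase
print(arbitrate(table, make_queues()))
print(arbitrate(table, make_queues(), time_based=True))
```

With plain WRR the grants happen at 0, 100 and 200 ns; with TBWRR the idle phase for port 1 pushes the later grants to 200 and 300 ns, which keeps each port's service times regular.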
Port Arbitration
VC0 Port Arb.
VC1 Port Arb.
Transaction Ordering-1
PCI Express ordering rules apply to transactions of the
same Traffic Class (TC).
Transactions in different TCs have no ordering requirement between them (they are unrelated).
Ordering relationships defined by the PCIe spec are
based on TLP type. TLPs are divided into three
categories:
◦ Posted
◦ Completion
◦ Non-Posted
Transaction Ordering-2
If TLP2 is sent with the proper ordering attribute set,
it can pass TLP1 and does not need to wait
for TLP1 to finish.
Relaxed Ordering
By default, transactions must stay in order as they pass through the buffers in bridges and switches.
RO allows switches to reorder transactions to improve performance.
When the RO attribute bit (Attr[1]) is set, software has verified that the transaction is unrelated to other transactions, so it
may be reordered ahead of them.
Attr[2] : ID-Based Ordering
Attr[1] : Relaxed Ordering
ID Ordering
Transactions from different EPs normally have no relationship
to one another, so IDO allows them to pass each other.
Software enables the use of IDO by setting the IDO enable bits in the
Device Control 2 Register.
Relaxed and ID ordering apply within the same VC.
Ordering Rules Table
A PCIe-PCI bridge must allow passing to prevent deadlock.
TLPs with the same Transaction ID are not allowed to pass each other.