Everything you wanted to
know about RISC-V
Paolo Bonzini
November June 2024
What is RISC-V?
●
A new(ish) RISC-based instruction set and
architecture
●
Started in 2010 at UC Berkeley
●
Should simple, clean, easy to understand…
●
… while supporting a wide range of applications
Timeline
●
2011: ISA specification published, now CC-BY
●
2012: first physical processor tapeout
●
2014: “Instruction sets should be free”
●
2015: RISC-V Foundation formed
●
2017: GCC and Linux port upstream
●
2019-2020: basic and privileged ISA frozen
Vendors
●
SiFive (Berkeley)
●
NVIDIA, Western Digital
●
FPGA vendors, e.g. Xilinx (AMD)
●
Rivos, Ventana
What does it look like?
●
Overall similar to MIPS or SPARC
– 3-operand instructions
– Limited size for immediates
– Memory operand available only for loads and stores
– No processor flags, including carry bit
– No 8- and 16-bit arithmetic and comparisons
●
Suitable for position independent code
Hello, world!
.data
hello: .ascii "Hello world!\n"
.text
.globl _start
_start:
li a0, 1 # load immediate
lla a1, hello # load local address
li a2, 13
li a7, 64 # write
ecall
Li a0, 0
li a7, 93 # exit
ecall
Hello, world!
.data
hello: .ascii "Hello world!\n"
.text
.globl _start
_start:
addi a0, zero, 1
auipc a1, %pcrel_hi(hello) # load high 20 bits
add a1, a1, %pcrel_lo(hello) # adjust low 12 bits
addi a2, zero, 13
addi a7, zero, 64 # write
ecall
addi a0, zero, 0
addi a7, zero, 93 # exit
ecall
32 general purpose registers
●
1 zero (zero, x0)
●
12 callee-save (s0-s11, x8/x9/x18-x27)
●
8 arguments (a0-a7, x10-x17)
●
7 caller-save (t0-t6, x5-x7/x28-x31)
●
4 “special” (ra, sp, gp, tp, x1-x4)
Hello, world!
.data
hello: .ascii "Hello world!\n"
.text
.globl _start
_start:
addi x10, x0, 1
auipc x11, %pcrel_hi(hello) # load high 20 bits
add x11, x11, %pcrel_lo(hello) # adjust low 12 bits
addi x12, x0, 13
addi x17, x0, 64 # write
ecall
addi x10, x0, 0
addi x17, x0, 93 # exit
ecall
Privileged architecture
●
“Control and status registers” (CSR)
●
Three operating modes (plus hypervisor)
– Machine, Supervisor, User
●
Three interrupt sources
– Software interrupt, timer interrupt, external interrupt
●
Each mode can delegate interrupts and
exceptions to the lower privilege
Main privileged instructions
●
CSR access
●
EBREAK (breakpoint)
●
ECALL (call higher level)
●
MRET / SRET (return to S or U respectively)
Important CSRs
●
Processor information ●
Exception cause and
(M only) saved PC
●
Processor status ●
Timer and cycle
●
Interrupt state counter
(pending/enabled) ●
Page table address
●
Scratch register
●
Exception vector
SBI (Supervisor Binary Interface)
●
Standard interface to M mode
– Timers
– Power management
– IPIs
●
Accessible with ECALL from supervisor mode
●
Open source implementation: OpenSBI
Kernel world switch
●
Swap user tp with ●
Restore registers
sscratch (CSRRW) from stack (incl. sp)
●
tp is now per-CPU base ●
Swap kernel tp with
●
Load kernel stack sscratch
pointer ●
Execute SRET
●
Save registers to kernel
stack
Extensions
●
Modular design with optional extensions
●
Example: RV64IMAFD
– 64-bit, full Integer ISA
– M = Multiplication/division
– A = Atomic operations + LL/SC
– FD = Floating point support (32-/64-bit)
Extensions
●
1x RV64IMAC
●
1x RV32IMFC
●
4x RV64GC_Zba_Zbb
– G = IMAFD
– Zba / Zbb = bit manipulation
instructions
More on extensions
●
Zname – because 26 extensions are not enough
●
Xname – vendor-specific
●
Sname – privileged extensions
●
Zicsr and Zifencei – almost universal
– CSRs might not be present on low-end hardware
– ifence.i added after ratifications
●
C – compressed encoding
Memory order
●
s390, x86: total store ordering
– Store-load can appear reordered to other CPUs
– Load-load, load-store, store-store preserved
●
ARM, PPC: relaxed memory order
– Manually specify acquire and release relationship
●
RISC-V: relaxed unless Ztso is present
Compressed encoding
●
16-bit instructions for higher code density
●
About half of the instructions can be
compressed
●
8% fewer bytes than x86-64, 28% fewer bytes
than ARMv7
●
But...
Compressed encoding
●
No instruction alignment
●
Uses ¾ of the encoding space
11
31 0
00 01
10
15 0
More extensions
●
Packed SIMD in general purpose regs (P, Zp*)
●
Vector (V, Zv*)
●
Bit manipulation and crypto (Zb*)
●
Control-flow integrity (Zicfi*)
Why so many extensions?
●
Original designers’ idea was to justify every
instruction quantitatively
– Also they didn’t know OSes very much
– Examples: Zacas (atomic compare-and-swap,
LL/SC was considered enough), Zihintpause
●
New applications, e.g. BF16 floating point
Why so many extensions?
●
Original idea was to use macro-op fusion
– Detect pairs of compressed of instructions
– Decode them as a single instruction
●
Didn’t quite work out
– Zba: address generation (shift-by-1-and-add)
– Zicond: rs2 != 0 ? rs1 : 0, rs2 == 0 ? rs1 : 0
●
It can only get worse
Enter profiles...
●
Profiles describe a baseline set of extensions
– RVA22 = Application processors, 2022
– RVM23 = Microcontrollers
●
Coming next: RVA23, RVB23
– RVA = General purpose OS (e.g., Fedora)
– RVB = Custom OS (e.g. Yocto)
●
Example difference: H (hypervisor extension)
How to try it?
●
HiFive boards cost several $100
●
Most cheap boards don’t run mainline kernel
●
QEMU (also supports dozens of extensions)
●
fedoraproject.org/wiki/Architectures/RISC-V
/Installing still in progress
●
Contacts: Richard Jones, Andrea Bolognani