Lecture 04 RISC-V ISA
CSCE 513 Computer Architecture
Department of Computer Science and Engineering
Yonghong Yan
https://passlab.github.io/CSCE513
1
Acknowledgement
• Slides adapted from
– Computer Science 152: Computer Architecture and
Engineering, Spring 2016 by Dr. George Michelogiannakis from
UCB
• Reference contents
– CAQA A.9
– COD textbook, chapter 2
2
Review: ISA Principles -- Iron-code Summary
• Section A.2—Use general-purpose registers with a load-store architecture.
• Section A.3—Support these addressing modes: displacement (with an address offset
size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Section A.4—Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and
64-bit IEEE 754 floating-point numbers.
– 16-bit FP is now also used for deep learning on GPUs
• http://www.nextplatform.com/2016/09/13/nvidia-pushes-deep-learning-inference-new-pascal-gpus/
• Section A.5—Support these simple instructions, since they will dominate the number
of instructions executed: load, store, add, subtract, move register-register, and shift.
• Section A.6—Compare equal, compare not equal, compare less, branch (with a PC-
relative address at least 8 bits long), jump, call, and return.
• Section A.7—Use fixed instruction encoding if interested in performance, and use
variable instruction encoding if interested in code size.
• Section A.8—Provide at least 16 general-purpose registers, be sure all addressing
modes apply to all data transfer instructions, and aim for a minimalist ISA.
– Often use separate floating-point registers.
– The justification is to increase the total number of registers without raising problems in
the instruction format or in the speed of the general-purpose register file. This
compromise, however, is not orthogonal.
3
What is RISC-V
• RISC-V (pronounced "risk-five") is an ISA standard
– An open-source instruction set architecture (ISA) based on reduced instruction
set computing (RISC) principles
– Preceded by RISC-I, II, III, and IV
• Most ISAs (x86, ARM, Power, MIPS, SPARC) are proprietary
– Commercially protected by patents
– This prevents practical efforts to reproduce the computer systems
• RISC-V is open
– Permitting any person or group to construct compatible computers
– Use associated software
• Originated in 2010 by researchers at UC Berkeley
– Krste Asanović, David Patterson and students
• As of 2017, version 2 of the user-level ISA is frozen
– User-Level ISA Specification v2.2
– Draft Compressed ISA Specification v1.79
– Draft Privileged ISA Specification v1.10
• Sources: https://riscv.org/ and https://en.wikipedia.org/wiki/RISC-V
4
Goals in Defining RISC-V
• A completely open ISA that is freely available to academia and industry
• A real ISA suitable for direct native hardware implementation, not just
simulation or binary translation
• An ISA that avoids "over-architecting" for
– a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-
order) or
– implementation technology (e.g., full-custom, ASIC, FPGA), but which allows
efficient implementation in any of these
• RISC-V ISA includes
– A small base integer ISA, usable by itself as a base for customized accelerators or
for educational purposes, and
– Optional standard extensions, to support general-purpose software development
– Optional custom extensions
• Support for the revised 2008 IEEE-754 floating-point standard
5
RISC-V ISA Principles
• Generally kept very simple and extendable
• Separated into multiple specifications
– User-Level ISA spec (compute instructions)
– Compressed ISA spec (16-bit instructions)
– Privileged ISA spec (supervisor-mode instructions)
– More …
• ISA support is given by RV + word-width + extensions
supported
– E.g. RV32I means 32-bit RISC-V with support for the I(nteger)
instruction set
6
User Level ISA
• Defines the normal instructions needed for computation
– A mandatory Base integer ISA
• I: Integer instructions:
– ALU
– Branches/jumps
– Loads/stores
– Standard Extensions
• M: Integer Multiplication and Division
• A: Atomic Instructions
• F: Single-Precision Floating-Point
• D: Double-Precision Floating-Point
• C: Compressed Instructions (16 bit)
• G = IMAFD: Integer base + four standard extensions
– Optional extensions
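The naming scheme (RV + word width + extensions) can be illustrated with a small parser. This is only a sketch: the function name parse_isa_name and the EXTENSIONS table are hypothetical, and only the single-letter extensions listed on these slides plus the G shorthand are handled.

```python
import re

# Single-letter extensions named on the slides (illustrative, not exhaustive).
EXTENSIONS = {
    "I": "base integer",
    "M": "integer multiplication and division",
    "A": "atomic instructions",
    "F": "single-precision floating-point",
    "D": "double-precision floating-point",
    "C": "compressed (16-bit) instructions",
}


def parse_isa_name(name):
    """Split a name like 'RV32IMAFD' into (word width, list of extension letters)."""
    match = re.fullmatch(r"RV(32|64)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognized ISA name: {name!r}")
    width = int(match.group(1))
    letters = match.group(2).replace("G", "IMAFD")  # G is shorthand for IMAFD
    unknown = [ch for ch in letters if ch not in EXTENSIONS]
    if unknown:
        raise ValueError(f"unsupported extension letters: {unknown}")
    return width, list(letters)


if __name__ == "__main__":
    for isa in ("RV32I", "RV64G", "RV32IMAC"):
        width, exts = parse_isa_name(isa)
        print(isa, "->", width, "bits;", ", ".join(EXTENSIONS[e] for e in exts))
```

For example, parse_isa_name("RV64G") yields a 64-bit width with the letters I, M, A, F, and D.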
7
RISC-V ISA
• Both 32-bit and 64-bit
address space variants
– RV32 and RV64
• Easy to subset/extend
for education/research
– RV32IM, RV32IMA,
RV32IMAFD, RV32G
• SPEC on the website
– www.riscv.org
8
RV32/64 Processor State
• Program counter (pc)
• 32 32/64-bit integer registers
(x0-x31)
– x0 always contains a 0
– x1 to hold the return address on a
call.
• 32 floating-point (FP) registers
(f0-f31)
– Each can contain a single- or
double-precision FP value (32-bit
or 64-bit IEEE FP)
• FP control and status register (fcsr), used
for FP rounding mode &
exception reporting
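A minimal model of the integer register state described above, assuming a hypothetical RegisterFile class (it is not taken from any RISC-V simulator). It shows the one special rule: writes to x0 are discarded, so x0 always reads as 0.

```python
class RegisterFile:
    """Sketch of the 32 integer registers x0-x31 plus the pc (hypothetical helper)."""

    def __init__(self, xlen=32):
        self.mask = (1 << xlen) - 1   # registers are 32 or 64 bits wide
        self.regs = [0] * 32
        self.pc = 0

    def read(self, number):
        return self.regs[number]

    def write(self, number, value):
        if number != 0:               # x0 is hardwired to zero
            self.regs[number] = value & self.mask


if __name__ == "__main__":
    rf = RegisterFile(xlen=32)
    rf.write(0, 123)   # ignored
    rf.write(1, 0x400) # x1 conventionally holds the return address
    print(rf.read(0), hex(rf.read(1)))
```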
9
RV64G in One Table
10
Load/Store Instructions
11
ALU Instructions
12
Control Flow Instructions
13
RISC-V Dynamic Instruction Mix for SPECint2006
14
RISC-V Hybrid Instruction Encoding
• Instruction lengths of 16, 32, 48, 64, … bits
• Base instruction set (RV32) always has fixed 32-bit
instructions, with lowest two bits = 11 (binary)
• All branches and jumps have targets at 16-bit granularity
(even in the base ISA, where all instructions are fixed at 32 bits)
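The length of an instruction can be read from its lowest bits. The sketch below is an assumption based on the length-encoding figure in the user-level specification (only the "low two bits = 11 means 32 bits or longer" rule appears on the slide); the function name is hypothetical and the patterns should be checked against the specification.

```python
def instruction_length_bits(first_halfword):
    """Return the instruction length in bits given its lowest 16 bits.

    Returns None for the longer encodings (80 bits and up) not handled here.
    """
    if first_halfword & 0b11 != 0b11:
        return 16                      # compressed instruction
    if first_halfword & 0b11100 != 0b11100:
        return 32                      # base ISA: low two bits are 11
    if first_halfword & 0b111111 == 0b011111:
        return 48
    if first_halfword & 0b1111111 == 0b0111111:
        return 64
    return None


if __name__ == "__main__":
    # Low half of 0x00650333 (add x6, x10, x6) is 0x0333.
    print(instruction_length_bits(0x0333))
```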
15
Four Core RISC-V Instruction Formats
https://github.com/riscv/riscv-opcodes/blob/master/opcodes
(Figure: the four core instruction formats. Labeled fields: additional opcode
bits/immediate, register source 2, register source 1, additional opcode bits,
destination register, and the 7-bit opcode field, whose low 2 bits = 11 in binary.)
• Instructions are aligned on a four-byte boundary in memory. There are variants!
• The sign bit of immediates is always bit 31 of the instruction.
• Register fields never move.
16
With Variants
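(Figure: the instruction formats with their variants. Labeled fields: additional
opcode bits/immediate, register source 2, register source 1, additional opcode bits,
destination register, and the 7-bit opcode field, whose low 2 bits = 11 in binary.)
• The variants are based on the handling of the immediates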
17
RISC-V Encoding Summary
Immediate Encoding Variants
• 32-bit Immediate produced by each base instruction format
– Instruction bit: inst[y]
19
RISC-V Addressing Summary
• Base (register + offset) addressing, i.e., displacement addressing
R-Format Encoding Example
funct7   rs2     rs1     funct3  rd      opcode
7 bits   5 bits  5 bits  3 bits  5 bits  7 bits
Example: add x6, x10, x6
0        6       10      0       6       51
0000000  00110   01010   000     00110   0110011
0000 0000 0110 0101 0000 0011 0011 0011 (binary) = 00650333 (hexadecimal)
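The packing in the example above can be reproduced with a few shifts. This is a minimal sketch; encode_r_type is a hypothetical helper name, and the field values are the ones from the add x6, x10, x6 example.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (
        (funct7 & 0x7F) << 25
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1F) << 7
        | (opcode & 0x7F)
    )


if __name__ == "__main__":
    # add x6, x10, x6: funct7=0, rs2=6, rs1=10, funct3=0, rd=6, opcode=51
    word = encode_r_type(0, 6, 10, 0, 6, 51)
    print(f"{word:032b}")
    print(f"{word:08x}")  # 00650333
```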
RISC-V I-Format Instructions
immediate  rs1     funct3  rd      opcode
12 bits    5 bits  3 bits  5 bits  7 bits
• Immediate arithmetic and load instructions
– rs1: source or base address register number
– immediate: constant operand, or offset added to base address
• 2s-complement, sign extended
• Design Principle: Good design demands good compromises
– Different formats complicate decoding, but allow 32-bit instructions
uniformly
– Keep formats as similar as possible
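A sketch of I-format packing and of recovering the sign-extended immediate. The helper names are hypothetical; the example uses ADDI, whose opcode is 0x13 with funct3 = 0.

```python
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack an I-format instruction; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return (
        (imm & 0xFFF) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1F) << 7
        | (opcode & 0x7F)
    )


def decode_i_immediate(word):
    """Extract bits 31:20 and sign-extend them (2s-complement)."""
    imm = (word >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    word = encode_i_type(-1, 6, 0, 5, 0x13)   # addi x5, x6, -1
    print(f"{word:08x}", decode_i_immediate(word))
```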
RISC-V S-Format Instructions
imm[11:5]  rs2     rs1     funct3  imm[4:0]  opcode
7 bits     5 bits  5 bits  3 bits  5 bits    7 bits
• Different immediate format for store instructions
– rs1: base address register number
– rs2: source operand register number
– immediate: offset added to base address
• Split so that rs1 and rs2 fields always in the same place
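The split immediate can be sketched as follows. The helper names are hypothetical; the example uses SW, whose opcode is 0x23 with funct3 = 2. Note that rs1 and rs2 sit at the same bit positions as in the R-format.

```python
def encode_s_type(imm, rs2, rs1, funct3, opcode):
    """Pack an S-format instruction; the 12-bit immediate is split in two."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    imm &= 0xFFF
    return (
        ((imm >> 5) & 0x7F) << 25     # imm[11:5]
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | (imm & 0x1F) << 7           # imm[4:0]
        | (opcode & 0x7F)
    )


def decode_s_immediate(word):
    """Reassemble imm[11:5] and imm[4:0], then sign-extend."""
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    word = encode_s_type(8, 5, 6, 2, 0x23)    # sw x5, 8(x6)
    print(f"{word:08x}", decode_s_immediate(word))
```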
Integer Computational Instructions (ALU)
• I-type (immediate): all immediates in all instructions are sign
extended; I-type instruction mnemonics end with I
– ADDI: adds sign-extended 12-bit immediate to rs1
– SLTI(U): set less than immediate
– ANDI/ORI/XORI: logical operations
– SLLI/SRLI/SRAI: shifts by constants
24
Integer Computational Instructions (ALU)
• U-type (upper immediate): LUI and AUIPC carry a 20-bit immediate
– LUI/AUIPC: load upper immediate/add upper immediate to pc
• Writes 20-bit immediate to top of destination register.
• Used to build large immediates.
• 12-bit immediates are signed, so have to account for sign when
building 32-bit immediates in 2-instruction sequence (LUI high-
20b, ADDI low-12b)
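The sign adjustment in the LUI/ADDI sequence can be worked through in a short sketch. The function names are hypothetical; the model assumes a 32-bit register. When the low 12 bits have their top bit set, ADDI will subtract, so the LUI part is rounded up by one to compensate.

```python
MASK32 = 0xFFFFFFFF


def split_lui_addi(value):
    """Split a 32-bit constant into (20-bit LUI immediate, signed 12-bit ADDI immediate)."""
    value &= MASK32
    low = value & 0xFFF
    if low >= 0x800:                 # ADDI sign-extends this to a negative number
        low -= 0x1000
    high = ((value - low) >> 12) & 0xFFFFF   # upper part, rounded up when low < 0
    return high, low


def rebuild(high, low):
    """Model LUI (high << 12) followed by ADDI (add signed low) on a 32-bit register."""
    return ((high << 12) + low) & MASK32


if __name__ == "__main__":
    for constant in (0x12345678, 0x12345FFF, 0x00000800, 0xFFFFFFFF):
        high, low = split_lui_addi(constant)
        print(f"{constant:#010x}: lui {high:#x}, addi {low}")
```

For 0x12345FFF the split is LUI 0x12346 followed by ADDI -1, not LUI 0x12345.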
25
Integer Computational Instructions
• R-type (Register)
– rs1 and rs2 are the source registers; rd is the destination
– ADD/SUB: add and subtract
– SLT, SLTU: set less than (signed, unsigned)
– SLL, SRL, SRA: shift left logical, shift right logical, shift right arithmetic
• NOP is encoded as ADDI x0, x0, 0
26
Control Transfer Instructions
NO architecturally visible delay slots
• Unconditional jumps
– JAL (UJ-type): jump and link; target is PC+offset, and PC+4 is written to rd
(x1 by convention)
• Offset scaled by 1-bit left shift, so a jump can target any 16-bit
instruction boundary (same for branches)
– JALR: jump and link register; target = rs1 + 12-bit signed immediate
27
Control Transfer Instructions
NO architecturally visible delay slots
• Conditional branches: SB-type, PC+offset target
– Compare two registers; target is PC+(immediate<<1)
– 12-bit signed immediate split across two fields (signed offset in multiples of two)
– Branches do not have a delay slot
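A sketch of how the branch offset is scattered across the instruction and turned back into a target address. The helper names are hypothetical; the bit positions follow the SB-type layout in the user-level specification (offset bit 12 in instruction bit 31, bits 10:5 in 30:25, bits 4:1 in 11:8, bit 11 in bit 7), with branch opcode 0x63 and funct3 = 0 for BEQ.

```python
def encode_branch(offset, rs2, rs1, funct3, opcode=0x63):
    """Pack an SB-type branch; offset is a signed byte offset, a multiple of 2."""
    if offset % 2 != 0 or not -4096 <= offset < 4096:
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF
    return (
        ((imm >> 12) & 0x1) << 31
        | ((imm >> 5) & 0x3F) << 25
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | ((imm >> 1) & 0xF) << 8
        | ((imm >> 11) & 0x1) << 7
        | (opcode & 0x7F)
    )


def branch_target(pc, word):
    """Recover the signed offset from the instruction word and add it to pc."""
    imm = (
        ((word >> 31) & 0x1) << 12
        | ((word >> 7) & 0x1) << 11
        | ((word >> 25) & 0x3F) << 5
        | ((word >> 8) & 0xF) << 1
    )
    if imm & 0x1000:
        imm -= 0x2000
    return (pc + imm) & 0xFFFFFFFF


if __name__ == "__main__":
    word = encode_branch(16, rs2=2, rs1=1, funct3=0)   # beq x1, x2, +16
    print(f"{word:08x}", hex(branch_target(0x1000, word)))
```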
28
Loads and Stores
• Store instructions (S-type)
– MEM(rs1+imm) = rs2
• Loads (I-type)
– rd = MEM(rs1 + imm)
29
Specifications and Software
From riscv.org and github.com/riscv
• Specification from RISC-V website
– https://riscv.org/specifications/
• RISC-V software includes
– GNU Compiler Collection (GCC) toolchain (with GDB, the debugger)
• https://github.com/riscv/riscv-tools
– LLVM toolchain
– A simulator ("Spike")
• https://github.com/riscv/riscv-isa-sim
– Standard simulator QEMU
• https://github.com/riscv/riscv-qemu
• Operating system support exists for Linux
– https://github.com/riscv/riscv-linux
• A JavaScript ISA simulator to run a RISC-V Linux system on a web
browser
– https://github.com/riscv/riscv-angel
30
RISC-V Implementations
• For RISC-V implementations, UC Berkeley created Chisel, an
open-source hardware construction language that is a
specialized dialect of Scala.
– Chisel: Constructing Hardware In a Scala Embedded Language
– https://chisel.eecs.berkeley.edu/
• In-order Rocket core and chip generator
– https://github.com/freechipsproject/rocket-chip
• Out-of-order BOOM core
– https://github.com/ucb-bar/riscv-boom
• UCB Sodor cores for education (single-cycle, and pipelines of
1 to 5 stages)
– https://github.com/ucb-bar/riscv-sodor
31
RISC-V Implementations
• A list from
– https://riscv.org/risc-v-cores/
• IIT Madras (India) is developing six open-source RISC-V
CPU designs (SHAKTI) for six distinct use cases
– https://shaktiproject.bitbucket.io/index.html
• SiFive HiFive Unleashed
– First Linux RISC-V Board
• First shipment: June 2018
– https://www.sifive.com/
– https://github.com/sifive/freedom
32
Additional Information
33
Calling Convention
• C Datatypes and Alignment
– RV32 employs an ILP32 integer model, while RV64 is LP64
– Floating-point types are IEEE 754-2008 compatible
– All data types are kept naturally aligned when stored in memory
– char is implicitly unsigned
– In RV64, 32-bit types, such as int, are stored in integer registers as proper sign extensions of
their 32-bit values; that is, bits 63..31 are all equal
• This restriction holds even for unsigned 32-bit types
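The RV64 rule for 32-bit values can be sketched as below. The function name is hypothetical; it computes the 64-bit register image of a 32-bit value, so bits 63..31 all match bit 31 whether the C type is signed or unsigned.

```python
def rv64_register_image(value32):
    """Return the 64-bit register contents that hold a 32-bit value on RV64."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:                 # bit 31 set: fill bits 63..32 with ones
        return value32 | 0xFFFFFFFF00000000
    return value32                           # bit 31 clear: upper bits are zero


if __name__ == "__main__":
    # The unsigned int 0xFFFFFFFF is still held as all ones in the register.
    print(hex(rv64_register_image(0xFFFFFFFF)))
    print(hex(rv64_register_image(0x7FFFFFFF)))
```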
34
Calling Convention
• RVG Calling Convention
– If the arguments to a function are conceptualized as fields of a C struct, each with
pointer alignment, the argument registers are a shadow of the first eight pointer-
words of that struct
• Floating-point arguments that are part of unions or array fields of structures are passed in
integer registers
• Floating-point arguments to variadic functions (except those that are explicitly named in
the parameter list) are passed in integer registers
– The portion of the conceptual struct that is not passed in argument registers is
passed on the stack
• The stack pointer sp points to the first argument not passed in a register
– Arguments smaller than a pointer-word are passed in the least-significant bits of
argument registers
– When primitive arguments twice the size of a pointer-word are passed on the stack,
they are naturally aligned
• When they are passed in the integer registers, they reside in an aligned even-odd register
pair, with the even register holding the least-significant bits
– Arguments more than twice the size of a pointer-word are passed by reference
35
Calling Convention
• The stack grows downward and the stack pointer is always kept 16-byte aligned
• Values are returned from functions in integer registers a0 and a1 and floating-point
registers fa0 and fa1 (named v0, v1, fv0, and fv1 in earlier versions of the specification)
– Floating-point values are returned in floating-point registers only if they are primitives or
members of a struct consisting of only one or two floating-point values
– Other return values that fit into two pointer-words are returned in a0 and a1
– Larger return values are passed entirely in memory; the caller allocates this memory
region and passes a pointer to it as an implicit first parameter to the callee
36
Memory Model
• RISC-V: Relaxed memory model
37
Control and Status Register (CSR) Instructions
• CSR Instructions
• Timer and counters
38
Data Formats and Memory Addresses
Data formats:
8-b bytes, 16-b half words, 32-b words, and 64-b double words
Some issues
• Byte addressing: byte addresses within a 32-bit word, listed from the
most significant byte to the least significant byte
– Little Endian (RISC-V): 3 2 1 0
– Big Endian: 0 1 2 3
• Word alignment
– Suppose the memory is organized in 32-bit words.
– Can a word address begin only at 0, 4, 8, ...?
– Byte addresses: 0 1 2 3 4 5 6 7
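Byte order and word alignment can be tried out with a tiny byte-addressed memory. This is a sketch with hypothetical helper names; it uses Python's int.to_bytes and int.from_bytes to place the bytes of a 32-bit word.

```python
def store_word(memory, address, value, byteorder="little"):
    """Write a 32-bit value into a bytearray at the given byte address."""
    memory[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, byteorder)


def load_word(memory, address, byteorder="little"):
    """Read a 32-bit value from a bytearray at the given byte address."""
    return int.from_bytes(memory[address:address + 4], byteorder)


def is_word_aligned(address):
    """A 32-bit word is naturally aligned when its address is a multiple of 4."""
    return address % 4 == 0


if __name__ == "__main__":
    memory = bytearray(8)
    store_word(memory, 0, 0x0A0B0C0D, "little")   # RISC-V order
    store_word(memory, 4, 0x0A0B0C0D, "big")
    print(memory.hex(" "))
    print([is_word_aligned(a) for a in range(8)])
```

In little-endian order the least significant byte (0x0D) lands at the lowest address; in big-endian order the most significant byte (0x0A) does.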
39
ISA Design
• RISC-V has 32 integer registers and can have 32 floating-point registers
– Register number 0 is a constant 0
– Register number 1 is the return address (link register)
• The memory is addressed by 8-bit bytes
• In the base ISA, instructions must be aligned on 32-bit boundaries (16-bit with compressed instructions)
• Like many RISC designs, it is a "load-store" machine
– The only instructions that access main memory are loads and stores
– All arithmetic and logic operations occur between registers
• RISC-V can load and store 8 and 16-bit items, but it lacks 8 and 16-bit arithmetic, including
comparison-and-branch instructions
• The 64-bit instruction set includes 32-bit arithmetic
40
ISA Design for Performance
• Features to increase a computer's speed, while reducing its cost
and power usage
– placing most-significant bits at a fixed location to speed sign-extension, and a bit-
arrangement designed to reduce the number of multiplexers in a CPU
41
ISA Design
• Intentionally lacks condition codes, and even lacks a carry bit
– To simplify CPU designs by minimizing interactions between instructions
• Builds comparison operations into its conditional branches
42
ISA Design
• The lack of a carry bit complicates multiple-precision arithmetic
– GMP, MPFR
• Does not detect or flag most arithmetic errors, including overflow, underflow
and divide by zero
– No special instruction set support for overflow checks on integer arithmetic operations.
• Most popular programming languages do not support checks for integer overflow, partly
because most architectures impose a significant runtime penalty to check for overflow on
integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior
– Floating-point exceptions are instead recorded as flags in the Floating-Point Control and Status Register (fcsr)
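Without flags, carry and overflow are recovered in software with ordinary comparisons. The sketch below models two such checks on 32-bit values; the function names are hypothetical, and the comments name the RISC-V instructions each step is meant to stand for (an assumed mapping, not a verified instruction sequence).

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a 2s-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add_with_carry_out(a, b):
    """Unsigned add; the carry is 1 exactly when the wrapped sum is below an operand."""
    a &= MASK32
    total = (a + b) & MASK32          # ADD wraps modulo 2**32
    return total, int(total < a)      # SLTU carry, sum, a


def add_with_overflow_check(a, b):
    """Signed add of 32-bit values a and b; returns (wrapped sum, overflowed)."""
    total = to_signed32(a + b)        # ADD
    b_negative = b < 0                # SLTI t, b, 0
    sum_below_a = total < a           # SLT  u, sum, a
    return total, b_negative != sum_below_a   # BNE t, u, overflow


if __name__ == "__main__":
    print(add_with_carry_out(0xFFFFFFFF, 1))
    print(add_with_overflow_check(0x7FFFFFFF, 1))
```

The carry version is what a multiple-precision library needs to chain word-sized additions.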
43
ISA Design
• Lacks the "count leading zero" and bit-field operations normally used to
speed software floating-point in a pure-integer processor
• No branch delay slot, a position after a branch instruction that can be filled
with an instruction which is executed regardless of whether the branch is
taken or not
– Delay slots can improve the performance of pipelined processors
– Omitted in RISC-V because they complicate both multicycle CPUs and superscalar CPUs
• Lacks address-modes that "write back" to the registers
– For example, it does not do auto-incrementing
44
ISA Design
• A load or store can add a twelve-bit signed offset to a register that contains
an address. The upper 20 bits of a 32-bit absolute address can be generated by a
separate instruction (LUI)
– RISC-V was designed to permit position-independent code. It has a special instruction to
generate 20 upper address bits that are relative to the program counter. The lower twelve bits
are provided by normal loads, stores and jumps
– LUI (load upper immediate) places the U-immediate value in the top 20 bits of the destination
register rd, filling in the lowest 12 bits with zeros
– AUIPC (add upper immediate to pc) is used to build pc-relative addresses: it forms a 32-bit offset
from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc,
and places the result in register rd
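A pc-relative address can be formed by pairing AUIPC with a 12-bit immediate. The sketch below uses hypothetical helper names and assumes a 32-bit address space: it picks the two immediates for a target and then models the two instructions to confirm the result.

```python
MASK32 = 0xFFFFFFFF


def pc_relative_pair(pc, target):
    """Choose (20-bit AUIPC immediate, signed 12-bit low immediate) to reach target from pc."""
    offset = (target - pc) & MASK32
    low = offset & 0xFFF
    if low >= 0x800:                 # the low part is sign-extended when added
        low -= 0x1000
    high = ((offset - low) >> 12) & 0xFFFFF
    return high, low


def execute_auipc_addi(pc, high, low):
    """Model AUIPC rd, high followed by ADDI rd, rd, low."""
    rd = (pc + (high << 12)) & MASK32   # AUIPC: pc + (U-immediate << 12)
    return (rd + low) & MASK32          # ADDI: add the signed low 12 bits


if __name__ == "__main__":
    pc, target = 0x00010000, 0x00012804
    high, low = pc_relative_pair(pc, target)
    print(hex(high), low, hex(execute_auipc_addi(pc, high, low)))
```

Because only the offset is encoded, the same instruction pair reaches the same relative location wherever the code is loaded, which is the basis for position-independent code.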
45