Risc V Processor 1
Risc V Processor 1
Under Supervision of
Dr. Anjan Kumar
Contents
1 Introduction ........................................................................................................................ 3
1.1 Purpose of the Documentation ..................................................................................... 3
1.2 Overview of RISC Architecture................................................................................... 3
1.2.1 Historical Context ........................................................................................... 4
1.2.2 Advantages of RISC ........................................................................................ 4
1.3 Scope of the Document................................................................................................ 4
2 RISC Architecture Overview.............................................................................................. 5
2.1 Key Features of RISC .................................................................................................. 5
2.1.1 Simplified Instruction Set ................................................................................ 5
2.1.2 Load-Store Architecture .................................................................................. 5
2.1.3 Single-Cycle Execution ................................................................................... 5
2.1.4 Pipelining ........................................................................................................ 5
2.1.5 Large Register Set ........................................................................................... 6
2.2 Block Diagram............................................................................................................. 6
2.3 Components ................................................................................................................. 6
2.4 Historical Context ........................................................................................................ 7
2.5 Advantages of RISC .................................................................................................... 7
2.6 Applications ................................................................................................................. 7
3 Design Overview ................................................................................................................ 7
3.1 Processor Components ................................................................................................ 8
3.1.1 Instruction Memory......................................................................................... 8
3.1.2 Data Memory .................................................................................................. 8
3.1.3 Control Unit .................................................................................................... 8
3.1.4 ALU (Arithmetic Logic Unit) ......................................................................... 8
3.1.5 Register File .................................................................................................... 8
3.1.6 Instruction Decode (instruction decode) ......................................................... 9
3.1.7 ALU (alu 64bit) .............................................................................................. 9
3.1.8 Data Memory (data memory) ......................................................................... 9
3.1.9 Cache Memory (cache memory)..................................................................... 9
3.1.10 Clock Generator (clock generator).................................................................. 9
3.1.11 Branch Predictor (branch predictor) ............................................................... 9
3.1.12 Floating Point Unit (fpu) ................................................................................. 9
3.1.13 Interrupt Controller (interrupt controller) ..................................................... 10
3.1.14 I/O Controller (io controller) ........................................................................ 10
3.1.15 I/O Handler (io handler) ............................................................................... 10
3.1.16 Debug Monitor (debug monitor) ................................................................... 10
3.2 Pipeline Stages........................................................................................................... 10
3.2.1 Instruction Fetch (IF) .................................................................................... 10
3.2.2 Instruction Decode (ID) ................................................................................ 10
3.2.3 Execution (EX).............................................................................................. 10
3.2.4 Memory Access (MEM) ............................................................................... 11
3.2.5 Write Back (WB) .......................................................................................... 11
3.3 Control Signals .......................................................................................................... 11
3.4 Data Flow .................................................................................................................. 11
4 Testing .............................................................................................................................. 11
1
4.1 Test Methodology ...................................................................................................... 12
4.1.1 Unit Testing .................................................................................................. 12
4.1.2 Integration Testing ........................................................................................ 12
4.1.3 System Testing .............................................................................................. 12
4.1.4 Regression Testing ........................................................................................ 12
4.2 Test Cases .................................................................................................................. 12
4.2.1 Load Word (LW) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2.2 Store Word (SW) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2.3 Arithmetic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2.4 Branch Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3.1 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3.2 Debugging Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.1 Support for Additional Instructions . . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Performance Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3 Integration with Peripheral Components . . . . . . . . . . . . . . . . . . . . . 15
5.3.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.3.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.4 Enhanced Testing Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.4.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.4.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.5 Cache Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.5.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.5.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.6 Exception Handling Improvements . . . . . . . . . . . . . . . . . . . . . . . . 15
5.6.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.6.2 Potential Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.1 Key Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.1.1 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.1 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.2 Research Papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.3 Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.4 Additional Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.1 Code Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.1.1 source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2
1 Introduction
The rapid advancement of computing technology has driven the need for efficient and high-
performance processors capable of handling increasingly complex tasks. The processor top1
is a 64-bit RISC (Reduced Instruction Set Computer) processor designed to execute a variety
of instructions efficiently while maintaining a modular and scalable architecture. This design
aims to balance performance, simplicity, and flexibility in a wide range of applications.
• Assist in debugging and optimization efforts by providing clear descriptions of each com-
ponent’s functionality.
• Simplicity: A reduced set of instructions simplifies the control logic required for execu-
tion. This simplicity allows for faster instruction decoding and execution while minimiz-
ing hardware complexity.
• Load-Store Architecture: In RISC architectures, only load (LW) and store (SW) in-
structions can access memory directly; all other operations are performed using registers.
This design minimizes memory access times and enhances performance by keeping data
in fast-access registers.
• Single-Cycle Execution: Most RISC instructions are designed to complete in one clock
cycle, which increases overall speed. This predictability simplifies timing analysis and
improves pipeline efficiency.
3
• Large Register Set: RISC processors typically feature a larger number of general-
purpose registers (16 to 32). This reduces the frequency of memory accesses and allows
for more efficient data handling during program execution.
• Scalability: The modular nature of RISC designs allows for easy scaling in terms of
performance and integration with other systems.
• Detailed descriptions of each component within the processor top1, including instruction
memory, data memory, control unit, ALU (Arithmetic Logic Unit), and register file.
• A breakdown of the pipeline stages (IF, ID, EX, MEM, WB) and their functions in exe-
cuting instructions.
• Implementation details including code structure and simulation environment used during
development.
• Future work opportunities for enhancing the processor design with additional features or
optimizations.
By providing this comprehensive documentation, we aim to facilitate understanding and
further development of RISC processor designs based on the processor top1 architecture. The
insights gained from this project can serve as a foundation for future research or practical
applications in various computing domains.
4
2 RISC Architecture Overview
The RISC (Reduced Instruction Set Computer) architecture is a design philosophy that empha-
sizes a small, highly optimized instruction set that can be executed within a single clock cycle.
This section provides an in-depth look at the key features of RISC architecture and illustrates
its advantages over other architectures, particularly CISC (Complex Instruction Set Computer).
2.1.4 Pipelining
Pipelining is a technique used in RISC architectures that allows multiple instructions to be pro-
cessed simultaneously across different stages of execution. Each stage of the pipeline performs
a specific function:
• Instruction Fetch (IF): Fetches the next instruction from memory.
• Instruction Decode (ID): Decodes the fetched instruction and generates control signals.
5
This overlapping of instruction execution significantly increases throughput by allowing
new instructions to enter the pipeline before previous ones have completed.
2.3 Components
• Instruction Memory: Stores program instructions that are fetched during the IF stage.
• Data Memory: Manages data storage and retrieval during MEM operations.
• Control Unit: Generates control signals based on opcode and function codes extracted
from instructions.
• ALU (Arithmetic Logic Unit): Executes arithmetic and logical operations as dictated
by control signals.
• Register File: Contains general-purpose registers that store intermediate values during
execution.
6
2.4 Historical Context
RISC architecture emerged in the early 1980s as a response to the increasing complexity of
CISC architectures like x86 and IBM System/360. Researchers aimed to create processors
that could execute instructions more efficiently by minimizing instruction complexity while
maximizing instruction throughput.
Early implementations of RISC architecture include:
• MIPS Architecture: One of the first successful commercial RISC architectures, widely
used in embedded systems.
• ARM Architecture: Known for its energy efficiency, ARM has become prevalent in
mobile devices and embedded systems.
• Energy Efficiency: Simpler instructions require less power to execute, making RISC
processors ideal for battery-powered devices such as smartphones and tablets.
• Scalability: The modular nature of RISC designs allows for easy scaling in terms of
performance and integration with other systems.
2.6 Applications
RISC processors are widely used across various domains due to their efficiency and perfor-
mance characteristics:
• Embedded Systems: Many embedded applications utilize RISC processors for their low
power consumption and high performance.
• Mobile Devices: ARM-based processors dominate the mobile market due to their energy
efficiency.
3 Design Overview
The design of the processor top1 is based on the principles of RISC architecture, emphasizing
simplicity, efficiency, and modularity. This section provides a detailed description of the key
components of the processor and how they interact within the overall architecture.
7
3.1 Processor Components
The processor top1 consists of several critical components that work together to execute in-
structions efficiently:
• Implementation: The control unit decodes the instruction during the instruction decode
(ID) stage and produces signals that control various components such as the ALU, data
memory, and register file.
• Implementation: The register file typically includes multiple registers (e.g., 32 registers)
that can be accessed by their identifiers (rs1, rs2, rd). The register file supports read and
write operations based on control signals generated during instruction execution.
8
3.1.6 Instruction Decode (instruction decode)
• Function: Decodes fetched instructions to extract opcode, function codes, source regis-
ters, and destination registers.
• Implementation: This module processes the instruction fetched from memory to provide
necessary information for execution and control signal generation.
9
3.1.13 Interrupt Controller (interrupt controller)
• Function: Manages interrupt requests from external devices or internal events.
• Implementation: This module prioritizes interrupts and signals the processor when an
interrupt should be serviced.
• Implementation: This module coordinates data transfer between I/O devices and mem-
ory or registers.
• Implementation: This module allows developers to set breakpoints, inspect register val-
ues, and monitor execution flow.
10
3.2.4 Memory Access (MEM)
This stage handles reading from or writing to data memory as required by load/store instruc-
tions using data memory. If a load operation is detected, data is fetched from memory; if a
store operation is detected, data is written to memory.
• mem to reg: Determines if data comes from memory or directly from ALU output.
• alu src: Indicates whether the second operand for ALU operations comes from an imme-
diate value or a register.
• The fetched instructions are decoded in ID stage using instruction decode, generating
control signals for subsequent stages.
• Operands are retrieved from registers or immediate values for execution in EX stage
using alu 64bit.
• Data is accessed in MEM stage if required by load/store instructions using data memory.
• Results are written back to registers in WB stage using register file for future use.
4 Testing
Testing is a critical phase in the development of the processor top1 to ensure that all compo-
nents function correctly and that the processor operates as intended. This section outlines the
testing methodology, specific test cases, and results obtained from the testing process.
11
4.1 Test Methodology
The testing methodology for the processor top1 involves several key steps:
• Functional Verification: Ensuring that each module performs its intended function cor-
rectly.
• Boundary Testing: Checking how modules handle edge cases, such as maximum and
minimum values.
• Data Flow Verification: Ensuring that data passes correctly between modules (e.g., from
instruction memory to instruction decode).
• Control Signal Verification: Checking that control signals generated by the control unit
correctly influence other components.
• Expected Outcome: After executing a load instruction, the specified register should
contain the correct data fetched from memory.
12
4.2.2 Store Word (SW)
• Description: Tests storing data from a register into memory.
• Expected Outcome: After executing a store instruction, the specified memory address
should contain the correct data stored from the register.
• Expected Outcome: The result of arithmetic operations should match expected values
based on input operands.
• Expected Outcome: The PC should correctly point to the next instruction based on
branch outcomes.
• Expected Outcome: The processor should correctly respond to interrupts and execute
appropriate handlers.
4.3 Results
The simulation results indicate that the processor top1 operates as intended, successfully exe-
cuting instructions and managing data flow between components. Key findings include:
• All unit tests for individual modules passed without errors.
• Integration tests confirmed that data flows correctly between modules, with control sig-
nals functioning as expected.
• System tests demonstrated that end-to-end instruction execution was successful for a
variety of test cases.
• Performance metrics showed that the processor achieved optimal throughput with mini-
mal latency during instruction execution.
13
4.3.2 Debugging Insights
During testing, several issues were identified and resolved:
• Minor bugs in control signal generation were fixed, improving overall stability.
5 Future Work
The design and implementation of the processor top1 provide a solid foundation for further
development and enhancement. This section outlines several areas for future work that could
improve performance, expand functionality, and adapt the processor for various applications.
• String and Array Processing: Implement instructions optimized for handling strings
and arrays, which are common in high-level programming languages.
• Dynamic Voltage and Frequency Scaling (DVFS): Implement DVFS techniques to op-
timize power consumption based on workload requirements.
14
5.3 Integration with Peripheral Components
5.3.1 Description
Developing interfaces for external devices or integrating with other processors in a system-on-
chip (SoC) environment can enhance functionality and expand application areas.
• Stress Testing: Conduct stress tests that push the processor to its limits, identifying
potential failure points or performance degradation under heavy workloads.
15
5.6.2 Potential Enhancements
• Advanced Interrupt Handling: Develop more sophisticated interrupt handling mecha-
nisms that prioritize interrupts based on urgency.
• Fault Tolerance Mechanisms: Implement strategies that allow the processor to recover
gracefully from errors or faults during execution.
6 Conclusion
The processor top1 represents a significant achievement in the design and implementation of a
64-bit RISC (Reduced Instruction Set Computer) processor. Throughout this project, we have
adhered to the principles of RISC architecture, emphasizing simplicity, efficiency, and modular-
ity. The design effectively integrates various components, including instruction memory, data
memory, control units, ALUs, and pipeline stages, to create a functional and high-performance
processor.
• Future Potential: The design lays a strong foundation for future work, including support
for additional instructions, performance optimizations, and integration with peripheral
components. The potential for enhancements ensures that the processor can adapt to
evolving technological demands.
• This documentation aims to serve as a comprehensive resource for understanding the de-
sign and functionality of the processor top1, providing guidance for future developments
and potential applications in various computing domains.
16
7 References
This section lists the resources and materials that were referenced or consulted during the
design, implementation, and documentation of the processor top1. These references provide
foundational knowledge and context for the concepts discussed throughout this document.
7.1 Books
• Hennessy, J. L., & Patterson, D. A.(2017). Computer Architecture: A Quantitative Ap-
proach (6th ed.). Morgan Kaufmann. This book provides a comprehensive overview of
computer architecture principles, including RISC design and performance evaluation.
• Patterson, D. A., & Hennessy, J. L.* (2013). Computer Organization and Design: The
Hardware/Software Interface (5th ed.). Morgan Kaufmann. This text covers fundamen-
tal concepts in computer organization and design, focusing on the interaction between
hardware and software.
• David A. Patterson & John L. Hennessy* (2018). Computer Organization and Design
RISC-V Edition: Fundamentals of Computer Engineering. Morgan Kaufmann. This
book specifically addresses RISC-V architecture and provides insights into modern RISC
designs.
17
7.4 Additional Materials
• Lecture Notes from University Courses: Various university courses on computer architec-
ture and digital design provided foundational knowledge that informed the design choices
made during this project.
• Open-source Projects: Insights gained from examining open-source RISC processor im-
plementations available on platforms like GitHub contributed to understanding practical
design considerations.
8 Appendices
module alu_64bit (
input logic [63:0] a, // Operand A
input logic [63:0] b, // Operand B
output logic [63:0] result, // Result of the operation
output logic zero // Zero flag
);
// ALU control signal encoding
localparam ADD = 4’b0000; // Addition
localparam AND = 4’b0010; // Logical AND
localparam OR = 4’b0011; // Logical OR
localparam XOR = 4’b0100; // Logical XOR
localparam SLL = 4’b0101; // Logical left shift
localparam SRL = 4’b0110; // Logical right shift
localparam SRA = 4’b0111; // Arithmetic right shift
always_comb begin
case (alu_ctrl)
ADD: result = a + b;
SUB: result = a - b;
AND: result = a & b;
XOR: result = a ˆ b;
SLL: result = a << b[5:0]; // Shift amount limited to 6 bits
SRL: result = a >> b[5:0];
SRA: result = $signed(a) >>> b[5:0];
default: begin
result = 64’b0; // Default case for invalid operation
zero = 1’b1; // Set zero flag if invalid operation occurs
end
endcase
module ALU_Control(
input wire [1:0] ALUOp, [3:0] Funct,
output reg [3:0] Operation
18
);
always @(ALUOp or Funct)
begin
if(ALUOp == 2’b00)
Operation <= 4’b0010;
else if(ALUOp == 2’b01)
Operation <= 4’b0110;
else if(ALUOp == 2’b10)
begin
if(Funct == 4’b0000)
Operation <= 4’b0010;
else if(Funct == 4’b1000)
Operation <= 4’b0110;
else if(Funct == 4’b0111)
Operation <= 4’b0000;
else if(Funct == 4’b0110)
Operation <= 4’b0001;
end
end
endmodule
Listing 2: ALU Control
module Adder(
input [63:0] a, [63:0] b,
output [63:0] out
);
assign out = a + b;
endmodule
Listing 3: Adder
module Control_Unit(
input wire [6:0] Opcode,
output reg Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, [1:0]
ALUOp
);
always @(Opcode)
begin
if(Opcode == 7’b0110011)
begin
ALUSrc <= 0;
MemtoReg <= 0;
RegWrite <= 1;
MemRead <= 0;
MemWrite <= 0;
Branch <= 0;
ALUOp <= 2’b10;
end
else if(Opcode == 7’b0000011)
begin
ALUSrc <= 1;
MemtoReg <= 1;
RegWrite <= 1;
MemRead <= 1;
MemWrite <= 0;
Branch <= 0;
ALUOp <= 2’b00;
19
end
else if(Opcode == 7’b0100011)
begin
ALUSrc <= 1;
MemtoReg <= 1;
RegWrite <=0;
MemRead <= 0;
MemWrite <= 1;
Branch <= 0;
ALUOp <= 2’b00;
end
else if(Opcode == 7’b1100011)
begin
ALUSrc <= 0;
MemtoReg <= 0;
RegWrite <=0;
MemRead <= 0;
MemWrite <= 0;
Branch <= 1;
ALUOp <= 2’b01;
end
else
begin
ALUSrc <= 0;
MemtoReg <= 0;
RegWrite <=0;
MemRead <= 1;
MemWrite <= 0;
Branch <= 0;
ALUOp <= 2’b00;
end
end
endmodule
module Data_Memory(
input wire [63:0] Mem_Addr, wire [63:0] Write_Data, wire clk, MemWrite,
MemRead, wire [1:0] wordSize,
output reg [63:0] Read_Data
);
reg [7:0] dataMem [63:0];
initial
begin
dataMem[0] = 8’b0;
dataMem[1] = 8’b0;
dataMem[2] = 8’b0;
dataMem[3] = 8’b0;
dataMem[4] = 8’b0;
dataMem[5] = 8’b0;
dataMem[6] = 8’b0;
dataMem[7] = 8’b0;
dataMem[8] = 8’b0;
dataMem[9] = 8’b0;
dataMem[10] = 8’b0;
dataMem[11] = 8’b0;
dataMem[12] = 8’b0;
dataMem[13] = 8’b0;
20
dataMem[14] = 8’b0;
dataMem[15] = 8’b0;
dataMem[16] = 8’b0;
dataMem[17] = 8’b0;
dataMem[18] = 8’b0;
dataMem[19] = 8’b0;
dataMem[20] = 8’b0;
dataMem[21] = 8’b0;
dataMem[22] = 8’b0;
dataMem[23] = 8’b0;
dataMem[24] = 8’b0;
dataMem[25] = 8’b0;
dataMem[26] = 8’b0;
dataMem[27] = 8’b0;
dataMem[28] = 8’b0;
dataMem[29] = 8’b0;
dataMem[30] = 8’b0;
dataMem[31] = 8’b0;
dataMem[32] = 8’b0;
dataMem[33] = 8’b0;
dataMem[34] = 8’b0;
dataMem[35] = 8’b0;
dataMem[36] = 8’b0;
dataMem[37] = 8’b0;
dataMem[38] = 8’b0;
dataMem[39] = 8’b0;
dataMem[40] = 8’b0;
dataMem[41] = 8’b0;
dataMem[42] = 8’b0;
dataMem[43] = 8’b0;
dataMem[44] = 8’b0;
dataMem[45] = 8’b0;
dataMem[46] = 8’b0;
dataMem[47] = 8’b0;
dataMem[48] = 8’b0;
dataMem[49] = 8’b0;
dataMem[50] = 8’b0;
dataMem[51] = 8’b0;
dataMem[52] = 8’b0;
dataMem[53] = 8’b0;
dataMem[54] = 8’d11;
dataMem[55] = 8’d12;
dataMem[56] = 8’d13;
dataMem[57] = 8’d14;
dataMem[58] = 8’b0;
dataMem[59] = 8’b0;
dataMem[60] = 8’b0;
dataMem[61] = 8’b0;
dataMem[62] = 8’b0;
dataMem[63] = 8’b0;
end
21
if(wordSize[1])
begin
dataMem[Mem_Addr[5:0]+7] <= Write_Data[7:0];
dataMem[Mem_Addr[5:0]+6] <= Write_Data[15:8];
dataMem[Mem_Addr[5:0]+5] <= Write_Data[23:16];
dataMem[Mem_Addr[5:0]+4] <= Write_Data[31:24];
dataMem[Mem_Addr[5:0]+3] <= Write_Data[39:32];
dataMem[Mem_Addr[5:0]+2] <= Write_Data[47:40];
dataMem[Mem_Addr[5:0]+1] <= Write_Data[55:48];
dataMem[Mem_Addr[5:0]+0] <= Write_Data[63:56];
end
else
begin
dataMem[Mem_Addr[5:0]+1] <= Write_Data[7:0];
dataMem[Mem_Addr[5:0]+0] <= Write_Data[15:8];
end
end
else
begin
if(wordSize[1])
begin
dataMem[Mem_Addr[5:0]+3] <= Write_Data[7:0];
dataMem[Mem_Addr[5:0]+2] <= Write_Data[15:8];
dataMem[Mem_Addr[5:0]+1] <= Write_Data[23:16];
dataMem[Mem_Addr[5:0]+0] <= Write_Data[31:24];
end
else
begin
dataMem[Mem_Addr[5:0]+0] <= Write_Data[7:0];
end
end
end
end
22
else
begin
if(wordSize[1])
begin
Read_Data[31:0] <= {dataMem[Mem_Addr[5:0]+0],
dataMem[Mem_Addr[5:0]+1],
dataMem[Mem_Addr[5:0]+2],
dataMem[Mem_Addr[5:0]+3]};
Read_Data[63:32] <= {32{dataMem[Mem_Addr[5:0]][7]}};
end
else
begin
Read_Data[7:0] <= dataMem[Mem_Addr[5:0]];
Read_Data[63:8] <= {56{dataMem[Mem_Addr[5:0]][7]}};
end
end
end
end
endmodule
Listing 5: Data Memory
module EXRegister(
input wire [63:0] PC_in,
[63:0] data1_in,
[63:0] data2_in,
[63:0] immData_in,
[4:0] rs1_in,
[4:0] rs2_in,
[4:0] rd_in,
[3:0] Funct_in,
wire Branch_in, MemRead_in, MemtoReg_in, MemWrite_in,
ALUSrc_in, RegWrite_in,
[1:0] ALUOp_in,
wire clk, reset,
output reg [63:0] PC_out,
reg [63:0] data1_out,
reg [63:0] data2_out,
reg [63:0] immData_out,
reg [4:0] rs1_out,
reg [4:0] rs2_out,
reg [4:0] rd_out,
reg [3:0] Funct_out,
reg Branch_out, reg MemRead_out, reg MemtoReg_out, reg
MemWrite_out, reg ALUSrc_out, reg RegWrite_out,
reg [1:0] ALUOp_out
);
always @(posedge reset or posedge clk)
begin
if(reset)
begin
PC_out <= 64’b0;
data1_out <= 64’b0;
data2_out <= 64’b0;
immData_out <= 64’b0;
rs1_out <= 5’b0;
rs2_out <= 5’b0;
23
rd_out <= 5’b0;
Funct_out <= 4’b0;
Branch_out <= 1’b0;
MemRead_out <= 1’b0;
MemtoReg_out <= 1’b0;
MemWrite_out <= 1’b0;
ALUSrc_out <= 1’b0;
RegWrite_out <= 1’b0;
ALUOp_out <= 2’b0;
end
else
begin
PC_out <= PC_in;
data1_out <= data1_in;
data2_out <= data2_in;
immData_out <= immData_in;
rs1_out <= rs1_in;
rs2_out <= rs2_in;
rd_out <= rd_in;
Funct_out <= Funct_in;
Branch_out <= Branch_in;
MemRead_out <= MemRead_in;
MemtoReg_out <= MemtoReg_in;
MemWrite_out <= MemWrite_in;
ALUSrc_out <= ALUSrc_in;
RegWrite_out <= RegWrite_in;
ALUOp_out <= ALUOp_in;
end
end
endmodule
Listing 6: EXRegister
module ForwardingUnit(
input wire [4:0] rdMem, wire regWriteMem, wire [4:0] rdWb, wire
regWriteWb, wire [4:0] rs1, wire [4:0] rs2,
output reg [1:0] ForwardA, reg [1:0] ForwardB
);
always @(rdMem or rdWb or regWriteMem or regWriteWb or rs1 or rs2)
begin
if (regWriteMem && (rdMem !== 5’b0) && (rdMem === rs2))
ForwardB = 2’b10;
else if (regWriteWb && (rdWb !== 5’b0) && (rdWb === rs2))
ForwardB <= 2’b01;
else
ForwardB <= 2’b00;
Listing 7: ForwardingUnit
24
module IDRegister(
input wire [63:0] PC_in, [31:0] ins_in, wire clk, reset,
output reg [63:0] PC_out, reg [31:0] ins_out
);
always @(posedge reset or posedge clk)
begin
if(reset)
begin
ins_out <= 32’b0;
PC_out <= 64’b0;
end
else
begin
PC_out <= PC_in;
ins_out <= ins_in;
end
end
endmodule
Listing 8: IDRegister
module Instruction_Memory(
input wire [63:0] Inst_Address,
output reg [31:0] Instruction
);
reg [7:0] insMem [15:0];
initial
begin
insMem[15] = 8’h02;
insMem[14] = 8’h95;
insMem[13] = 8’h34;
insMem[12] = 8’h23;
insMem[11] = 8’h00;
insMem[10] = 8’h14;
insMem[9] = 8’h84;
insMem[8] = 8’h93;
insMem[7] = 8’h0;
insMem[6] = 8’h9A;
insMem[5] = 8’h84;
insMem[4] = 8’hB3;
insMem[3] = 8’h02;
insMem[2] = 8’h85;
insMem[1] = 8’h34;
insMem[0] = 8’h83;
end
always @(Inst_Address)
begin
Instruction <= {insMem[Inst_Address[3:0]+3], insMem[Inst_Address
[3:0]+2], insMem[Inst_Address[3:0]+1], insMem[Inst_Address[3:0]+0]};
end
endmodule
module MEMRegister(
input wire [63:0] PC_in,
[63:0] aluResult_in,
[63:0] data2_in,
25
[4:0] rd_in,
wire Branch_in, MemRead_in, MemtoReg_in, MemWrite_in,
RegWrite_in, zero_in,
clk, reset,
output reg [63:0] PC_out,
reg [63:0] aluResult_out,
reg [63:0] data2_out,
reg [4:0] rd_out,
reg Branch_out, reg MemRead_out, reg MemtoReg_out, reg
MemWrite_out, reg RegWrite_out, reg zero_out
);
always @(posedge reset or posedge clk)
begin
if(reset)
begin
PC_out <= 64’b0;
aluResult_out <= 64’b0;
data2_out <= 64’b0;
rd_out <= 5’b0;
Branch_out <= 1’b0;
MemRead_out <= 1’b0;
MemtoReg_out <= 1’b0;
MemWrite_out <= 1’b0;
RegWrite_out <= 1’b0;
zero_out <= 1’b0;
end
else
begin
PC_out <= PC_in;
aluResult_out <= aluResult_in;
data2_out <= data2_in;
rd_out <= rd_in;
Branch_out <= Branch_in;
MemRead_out <= MemRead_in;
MemtoReg_out <= MemtoReg_in;
MemWrite_out <= MemWrite_in;
RegWrite_out <= RegWrite_in;
zero_out <= zero_in;
end
end
endmodule
module Program_Counter(
input clk, reset, [63:0] PC_In,
output reg [63:0] PC_Out
);
always @(posedge reset or posedge clk)
begin
if(reset)
PC_Out <= 64’d0;
else
PC_Out <= PC_In;
end
endmodule
26
Listing 11: Program Counter
module WBRegister(
input wire [63:0] aluResult_in,
[63:0] memData_in,
[4:0] rd_in,
wire MemtoReg_in, RegWrite_in,
clk, reset,
output reg [63:0] aluResult_out,
reg [63:0] memData_out,
reg [4:0] rd_out,
reg MemtoReg_out, reg RegWrite_out
);
always @(posedge reset or posedge clk)
begin
if(reset)
begin
aluResult_out <= 64’b0;
memData_out <= 64’b0;
rd_out <= 5’b0;
MemtoReg_out <= 1’b0;
RegWrite_out <= 1’b0;
end
else
begin
aluResult_out <= aluResult_in;
memData_out <= memData_in;
rd_out <= rd_in;
MemtoReg_out <= MemtoReg_in;
RegWrite_out <= RegWrite_in;
end
end
endmodule
module extractor(
input wire [31:0] ins,
output reg [63:0] imm_data
);
reg sel1, sel2;
always @(ins)
begin
sel1 = ins[5];
sel2 = ins[4];
if(sel1)
begin
imm_data[9:0] <= {ins[30:25] , ins[11:8]};
imm_data[63:10] <= {54{ins[30]}};
end
else
begin
if(sel2)
begin
imm_data[11:0] <= {ins[31:25] , ins[11:7]};
imm_data[63:12] <= {52{ins[31]}};
27
end
else
begin
imm_data[11:0] <= ins[31:20];
imm_data[63:12] <= {52{ins[31]}};
end
end
end
endmodule
Listing 13: extractor
module mux(
input wire [63:0] in1, wire [63:0] in2, wire sel,
output reg [63:0] out
);
always @(in1 or in2 or sel)
begin
if (sel)
out <= in2;
else
out <= in1;
end
endmodule
module mux2(
input wire [63:0] in1, wire [63:0] in2, wire [63:0] in3, wire [1:0] sel,
output wire [63:0] out
);
wire [63:0] tempResult;
mux m1 (.in1(in1), .in2(in2), .sel(sel[0]), .out(tempResult));
mux m2 (.in1(tempResult), .in2(in3), .sel(sel[1]), .out(out));
endmodule
module parser(
input wire [31:0] ins,
output wire [6:0] opcode,
wire [4:0] rd,
wire [2:0] funct3,
wire [4:0] rs1,
wire [4:0] rs2,
wire [6:0] funct7
);
assign opcode = ins[6:0];
assign rd = ins[11:7];
assign funct3 = ins[14:12];
assign rs1 = ins[19:15];
assign rs2 = ins[24:20];
assign funct7 = ins[31:25];
endmodule
28
module registerFile(
input wire [4:0] rs1, [4:0] rs2, [4:0] rd, [63:0] writeData, wire
regWrite, clk, reset,
output wire [63:0] ReadData1, wire [63:0] ReadData2
);
reg [63:0] registerArr [31:0];
always @(posedge reset or posedge clk)
begin
if(reset)
begin
registerArr[0] = 64’d0;
registerArr[1] = 64’d0;
registerArr[2] = 64’d0;
registerArr[3] = 64’d0;
registerArr[4] = 64’d0;
registerArr[5] = 64’d0;
registerArr[6] = 64’d0;
registerArr[7] = 64’d0;
registerArr[8] = 64’d0;
registerArr[9] = 64’d0;
registerArr[10] = 64’d15;
registerArr[11] = 64’d0;
registerArr[12] = 64’d0;
registerArr[13] = 64’d0;
registerArr[14] = 64’d0;
registerArr[15] = 64’d0;
registerArr[16] = 64’d0;
registerArr[17] = 64’d0;
registerArr[18] = 64’d0;
registerArr[19] = 64’d0;
registerArr[20] = 64’d0;
registerArr[21] = 64’d4;
registerArr[22] = 64’d0;
registerArr[23] = 64’d0;
registerArr[24] = 64’d0;
registerArr[25] = 64’d0;
registerArr[26] = 64’d0;
registerArr[27] = 64’d0;
registerArr[28] = 64’d0;
registerArr[29] = 64’d0;
registerArr[30] = 64’d0;
registerArr[31] = 64’d0;
end
else
begin
if(regWrite)
begin
registerArr[rd] <= writeData;
end
end
end
assign ReadData1 = registerArr[rs1];
assign ReadData2 = registerArr[rs2];
endmodule
Listing 17: registerFile
29
module RISC_V_Processor(
input clk, reset,
output [63:0] debug_pc, // Expose PC value
output [31:0] debug_instruction, // Expose current instruction
output [63:0] debug_alu_result // Expose ALU result
);
// Wire declarations
wire [63:0] PC_In, PC_Out, NextPC_In, BranchPC_In, imm_data, ReadData1,
ReadData2, writeData, ReadData3, ALUResult, MemRead_Data, updatedData1
, updatedData2;
wire [31:0] Instruction;
wire [6:0] opcode, funct7;
wire [4:0] rd, rs1, rs2;
wire [3:0] Operation;
wire [2:0] funct3;
wire [1:0] ALUOp, ForwardA, ForwardB;
wire Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, zero;
wire shouldBranch;
// IF/ID
wire [63:0] pcID;
wire [31:0] insID;
// ID/EX
wire [63:0] data1EX, data2EX, immDataEX, pcEX;
wire [4:0] rdEX, rs1EX, rs2EX;
wire [3:0] functEX;
wire BranchEX, MemReadEX, MemtoRegEX, MemWriteEX, ALUSrcEX, RegWriteEX;
wire [1:0] ALUOpEX;
// EX/MEM
wire [63:0] aluResultMEM, data2MEM, pcMEM;
wire [4:0] rdMEM;
wire BranchMEM, MemReadMEM, MemtoRegMEM, MemWriteMEM, RegWriteMEM,
zeroMEM;
// MEM/WB
wire [63:0] aluResultWB, MemRead_DataWB;
wire [4:0] rdWB;
wire MemtoRegWB, RegWriteWB;
// Instruction Fetch
Program_Counter pc(.clk(clk), .reset(reset), .PC_In(PC_In), .PC_Out(
PC_Out));
Adder adder(.a(PC_Out), .b(64’d4), .out(NextPC_In));
Instruction_Memory insMem(.Inst_Address(PC_Out), .Instruction(Instruction
));
mux newPCMux(.in1(NextPC_In), .in2(pcMEM), .out(PC_In), .sel(shouldBranch
));
// Instruction Decode
parser insPar(.ins(insID), .opcode(opcode), .rd(rd), .funct3(funct3), .
rs1(rs1), .rs2(rs2), .funct7(funct7));
30
Control_Unit ct(.Opcode(opcode), .Branch(Branch), .MemRead(MemRead), .
MemtoReg(MemtoReg), .MemWrite(MemWrite), .ALUSrc(ALUSrc), .RegWrite(
RegWrite), .ALUOp(ALUOp));
extractor e(.ins(insID), .imm_data(imm_data));
registerFile rf(.rs1(rs1), .rs2(rs2), .rd(rdWB), .clk(clk), .reset(reset)
, .regWrite(RegWriteWB), .writeData(writeData), .ReadData1(ReadData1),
.ReadData2(ReadData2));
// Execute
ALU_Control aluCt(.ALUOp(ALUOpEX), .Funct(functEX), .Operation(Operation)
);
Adder adder1(.a(pcEX), .b(immDataEX << 1), .out(BranchPC_In));
mux2 data1ForwardingMux (.in1(data1EX), .in2(writeData), .in3(
aluResultMEM), .sel(ForwardA), .out(updatedData1));
mux2 data2ForwardingMux (.in1(data2EX), .in2(writeData), .in3(
aluResultMEM), .sel(ForwardB), .out(updatedData2));
ForwardingUnit fwd(.rdMem(rdMEM), .regWriteMem(RegWriteMEM), .rdWb(rdWB),
.regWriteWb(RegWriteWB), .rs1(rs1EX), .rs2(rs2EX),
.ForwardA(ForwardA), .ForwardB(ForwardB));
mux aluSrcMux(.in1(updatedData2), .in2(immDataEX), .out(ReadData3), .sel(
ALUSrcEX));
ALU_64_bit alu(.a(updatedData1), .b(ReadData3), .ALUOp(Operation), .
Result(ALUResult), .zero(zero));
// Memory
31
Data_Memory uut(.Mem_Addr(aluResultMEM), .Write_Data(updatedData2), .clk(
clk), .MemWrite(MemWriteMEM), .MemRead(MemReadMEM), .wordSize(2’b11),
.Read_Data(MemRead_Data));
// Debug outputs
assign debug_pc = PC_Out;
assign debug_instruction = Instruction;
assign debug_alu_result = ALUResult;
endmodule
Listing 18: RISC V Processor
module tb();
always
#5 clk = ˜clk;
endmodule
8.2 Results
32
Figure 2: ALU 64 Bit
33
Figure 4: Adder
34
Figure 6: Data memory
Figure 7: Execute
35
Figure 8: Extractor
36
Figure 10: Instruction Decode Register
37
Figure 12: Memory Register
38
Figure 14: MUX 2
39
Figure 16: Program Counter
40
Figure 18: Write Back Register
41
Figure 20: Synthesis of RISC V Processor
42