Basic HLS Tutorial-2022.2

This document provides an introduction to designing a two frequency PWM modulator system using C++ and Vivado. It discusses the motivation and objectives which are to generate a digital PWM signal modulated by 1Hz and 3.5Hz sine waves. It proposes using a lookup table to store 256 samples of a sine wave period to represent the analog signal digitally. It then provides an overview of high-level synthesis using Vivado to transform C code into RTL, allowing hardware designers to work at a higher level of abstraction. The tutorial will walk through creating an HLS project, developing the C algorithm, synthesis to RTL, and integrating the developed IP into Vivado projects.

Uploaded by

Salim Hajji

Basic HLS Tutorial

Using the C++ language and the Vivado Design Suite to design a two-frequency PWM
modulator system

www.so-logic.net 2023/01/17 1
Contents

1 INTRODUCTION
1.1 Motivation
1.2 Purpose of this Tutorial
1.3 Objectives of this Tutorial
1.4 One Possible Solution for the Modulator Design
1.5 About HLS
1.6 Design Steps
1.7 Vivado HLS Design Flow

2 DEVELOPING CUSTOM IP CORE USING HLS
2.1 Create a New Project
2.2 Develop C Algorithm
2.3 Verify C Algorithm
2.3.1 C Simulation Output Files
2.4 Synthesize C Algorithm into an RTL Implementation (High-Level Synthesis)
2.4.1 C Synthesis Output Files
2.4.2 C Synthesis Results
2.4.3 Clock, Reset, and RTL Output
2.4.4 Applying Optimization Directives
2.5 Verify the RTL Implementation
2.5.1 Using C/RTL Co-Simulation
2.5.2 Analyzing RTL Simulations
2.6 Package the RTL Implementation
2.6.1 Packaging IP using Vivado IP (.zip) Format

3 USING DEVELOPED IP CORE IN VIVADO DESIGN SUITE
3.1 Create a New Project with Included Developed IP Core
3.2 Create ARM-based Hardware Platform with Integrated Developed IP Core
3.3 Debug the Design with Included Developed IP Core

Chapter 1

INTRODUCTION
1.1 Motivation
"Basic HLS Tutorial" is a document made for beginners who are entering the world of embedded system design
using FPGAs. This tutorial explains, step by step, the procedure of designing a simple digital system using
C/C++/SystemC languages and Xilinx Vivado Design Suite.

1.2 Purpose of this Tutorial

Introduction
This tutorial shows you how to create, simulate, and test a project and run it on your
development board.

The following project is designed for:

Design Environment: Vivado 2022.2

Programming Language: C
Device: Sozius Development Board

After completing this tutorial, you will be able to:

Launch and navigate the Vivado High-Level Synthesis (HLS) tool

Create a project using New Project Creation Wizard

Develop a C algorithm for your design

Verify a C algorithm of your design

Synthesize a C algorithm into an RTL implementation (High-Level Synthesis)

Generate reports and analyze the design

Verify the RTL implementation

Package the RTL implementation

1.3 Objectives of this Tutorial


In this tutorial, a PWM signal modulated using a sine wave with two different frequencies (1 Hz and 3.5
Hz) will be created.

The frequency that is chosen depends on the position of the two-state on-board switch.


PWM Signal

Pulse-width modulation (PWM) uses a rectangular pulse wave whose pulse width is modulated by some other signal (in
our case we will use a sine wave) resulting in the variation of the average value of the waveform. Typically, PWM signals
are used to either convey information over a communications channel or control the amount of power sent to a load. To
learn more about PWM signals, please visit http://en.wikipedia.org/wiki/Pulse-width_modulation.

Figure 1.1: Example of the PWM signal

Figure 1.1 illustrates the principle of pulse-width modulation. In this picture an arbitrary signal is used to
modulate the PWM signal, but in our case a sine wave will be used.
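The principle shown in Figure 1.1 can be sketched in C++ as a counter compared against a modulating level. This is only an illustrative behavioral model, not the modulator code developed later in the tutorial; the names `pwm_output`, `high_steps`, and `kPwmPeriod`, as well as the choice of 4096 counter steps per PWM period, are assumptions made for this sketch.

```cpp
#include <cstdint>

// Assumed number of counter steps in one PWM period (hypothetical value).
constexpr std::uint32_t kPwmPeriod = 4096;

// Hypothetical model of one PWM output bit.
// counter : free-running counter value
// level   : modulating sample, 0 .. kPwmPeriod (larger value => wider pulse)
inline bool pwm_output(std::uint32_t counter, std::uint32_t level) {
    // The output is high while the counter is below the modulating level.
    return (counter % kPwmPeriod) < level;
}

// Counts the counter steps of one period during which the output is high.
// Dividing the result by kPwmPeriod gives the duty cycle (average value).
inline std::uint32_t high_steps(std::uint32_t level) {
    std::uint32_t high = 0;
    for (std::uint32_t c = 0; c < kPwmPeriod; ++c) {
        if (pwm_output(c, level)) {
            ++high;
        }
    }
    return high;
}
```

With this model, a level of 1024 keeps the output high for a quarter of each period, so a slowly changing level (for example a sine sample) changes the average value of the waveform.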

1.4 One Possible Solution for the Modulator Design

One Possible Solution

Considering that we are working with digital systems and signals, our task will be to generate a digital
representation of an analog (sine) signal with two frequencies: 1 Hz and 3.5 Hz.

Figure 1.2: Sine wave with 256 samples


Figure 1.2 shows the sine wave that will be used to modulate the PWM signal.

One period of the sine wave is represented with 256 (2^8) samples, where each sample can take one of 4096 (2^12)
possible values. Since the sine wave is a periodic signal, we only need to store samples of one period of the
signal.

Note : All sine signals with the same amplitude, regardless of their frequency, look the same
during one period of the signal. The only thing that differs between those sine signals is the duration of the
signal period. This means that the sample rate of those signals is different.
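To make the note concrete, the following sketch computes the sample rate needed for each of the two frequencies when one period holds 256 samples. The function names and the 100 MHz clock used in the usage note are assumptions for illustration only.

```cpp
// One sine period is stored as 256 samples.
constexpr int kSamplesPerPeriod = 256;

// Samples that must be read per second to reproduce a sine of the given frequency.
constexpr double sample_rate_hz(double sine_hz) {
    return sine_hz * kSamplesPerPeriod;
}

// Clock cycles between two consecutive table reads for an assumed system clock.
constexpr double clock_cycles_per_sample(double clock_hz, double sine_hz) {
    return clock_hz / sample_rate_hz(sine_hz);
}

static_assert(sample_rate_hz(1.0) == 256.0, "1 Hz needs 256 samples per second");
static_assert(sample_rate_hz(3.5) == 896.0, "3.5 Hz needs 896 samples per second");
```

For example, with an assumed 100 MHz clock, `clock_cycles_per_sample(100e6, 1.0)` gives 390625 cycles between reads, while the 3.5 Hz sine needs a read roughly every 111607 cycles. The table content is the same in both cases; only the read rate changes.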

The sine wave can therefore be generated by reading the sample values of one period, stored in a table,
at the appropriate rate. In our case, the values will be generated using the sine function from the
C numerics library (math.h) and will be stored in an array.
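A minimal sketch of such a table is shown below. It assumes that the sine values are shifted and scaled from the range [-1, 1] to the unsigned 12-bit range [0, 4095]; the exact scaling used by the tutorial's modulator is not given in this section, and the names `make_sine_table`, `kSamples`, and `kMaxValue` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

constexpr int kSamples = 256;    // 2^8 samples per period
constexpr int kMaxValue = 4095;  // largest 12-bit value (2^12 - 1)

// Fills a table with one period of a sine wave, shifted and scaled
// from [-1, 1] to the unsigned range [0, 4095] (assumed scaling).
inline std::array<std::uint16_t, kSamples> make_sine_table() {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::array<std::uint16_t, kSamples> table{};
    for (int i = 0; i < kSamples; ++i) {
        const double s = std::sin(two_pi * i / kSamples);  // -1 .. 1
        table[i] = static_cast<std::uint16_t>(
            std::lround((s + 1.0) * 0.5 * kMaxValue));
    }
    return table;
}
```

The table peaks a quarter of the way through the period (index 64) and reaches its minimum at three quarters (index 192). Reading the entries in order and wrapping around at index 255 reproduces the periodic signal.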

1.5 About HLS

High-Level Synthesis (HLS) Tool

Xilinx Vivado High-Level Synthesis (HLS) is a tool that transforms a C specification into a
register transfer level (RTL) implementation that you can synthesize into a Xilinx field programmable gate array
(FPGA).

You can write C specifications in C, C++, SystemC, or as an Open Computing Language (OpenCL) API C
kernel, and the FPGA provides a massively parallel architecture with benefits in performance, cost, and power
over traditional processors.

By targeting an FPGA as the execution fabric, HLS enables a software engineer to optimize code for throughput,
power, and latency without the need to address the performance bottleneck of a single memory space and limited
computational resources.

This allows the implementation of computationally intensive software algorithms into actual products, not just
functionality demonstrators.

HLS Benefits

High-level synthesis bridges hardware and software domains, providing the following primary benefits:

Improved productivity for hardware designers

Hardware designers can work at a higher level of abstraction while creating high-performance hardware.

Improved system performance for software designers

Software developers can accelerate the computationally intensive parts of their algorithms on a new
compilation target, the FPGA.

HLS Design Methodology

Using a high-level synthesis design methodology allows you to:

Develop algorithms at the C-level

Work at a level that is abstract from the implementation details, which consume development time.

Verify at the C-level

Validate the functional correctness of the design more quickly than with traditional hardware
description languages.


Control the C synthesis process through optimization directives

Create specific high-performance hardware implementations.

Create multiple implementations from the C source code using optimization directives

Explore the design space, which increases the likelihood of finding an optimal implementation.

Create readable and portable C source code

Retarget the C source into different devices as well as incorporate the C source into new projects.

HLS Phases

High-level synthesis includes the following phases:

Scheduling

Determines which operations occur during each clock cycle based on:

Length of the clock cycle or clock frequency

Time it takes for the operation to complete, as defined by the target device

User-specified optimization directives

If the clock period is longer or a faster FPGA is targeted, more operations are completed within a single
clock cycle, and all operations might complete in one clock cycle. Conversely, if the clock period is shorter
or a slower FPGA is targeted, high-level synthesis automatically schedules the operations over more clock
cycles, and some operations might need to be implemented as multicycle resources.

Binding

Determines which hardware resource implements each scheduled operation. To implement the optimal
solution, high-level synthesis uses information about the target device.

Control logic extraction

Extracts the control logic to create a finite state machine (FSM) that sequences the operations in the
RTL design.
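As an illustration of the three phases, consider the small function below. The comments describe one way the phases could play out; the actual schedule and binding depend on the clock period, the target device, and the directives, so this is a sketch under stated assumptions rather than a description of what the tool will produce. The function name `mac_example` is hypothetical.

```cpp
// Hypothetical function used only to illustrate the three HLS phases.
int mac_example(char x, char a, char b, char c) {
    // Scheduling: with a long enough clock period, the multiplication and
    // both additions could fit into one clock cycle. With a shorter period,
    // the multiplication could be placed in one cycle and the additions
    // in the next.
    // Binding: the multiplication might be implemented by a DSP block or
    // by LUT logic, depending on the target device.
    // Control logic extraction: the resulting FSM steps through the
    // scheduled cycles; loops and array accesses would add more states.
    char y = static_cast<char>(x * a + b + c);
    return y;
}
```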

1.6 Design Steps


Figure 1.3: Design Steps

This tutorial proceeds step by step, with the aim of explaining the whole procedure of designing a digital
system using the Vivado HLS tool.

1. First, we will develop the algorithm at the C-level.

Work at a level that is abstract from the implementation details, which consume development time.

2. Then we will verify the algorithm at the C-level.

Validate the functional correctness of the design more quickly than with traditional hardware description
languages.

3. After that, we will synthesize the C algorithm into an RTL implementation.

Using Vivado HLS tool we will automatically create an RTL implementation of our C algorithm. Vivado HLS
will automatically create data path and control path modules required to implement our algorithm in hardware.

4. Then, we will generate comprehensive reports and analyze the design.

After synthesis, Vivado HLS automatically creates synthesis reports to help you understand the performance of
the implementation.

5. Then, we verify the RTL implementation.

You can use the C test bench to verify that the RTL is functionally identical to the original C code.

6. At the end, package the RTL implementation into a selection of IP formats.

Using Vivado HLS, you can export the RTL and package the final RTL output files as IP.


1.7 Vivado HLS Design Flow

The Xilinx Vivado HLS tool synthesizes a C function into an IP block that you can integrate into a hardware
system. It is tightly integrated with the rest of the Xilinx design tools and provides comprehensive language
support and features for creating the optimal implementation for your C algorithm.

The following figure shows an overview of the Vivado HLS design flow.

HLS Design Flow

Figure 1.4: Vivado HLS Design Flow

Inputs and Outputs

Vivado HLS Inputs

Following are the inputs to Vivado HLS:

C function written in C, C++, SystemC, or an OpenCL API C kernel

This is the primary input to Vivado HLS. The function can contain a hierarchy of sub-functions.

Constraints

Constraints are required and include the clock period, clock uncertainty, and FPGA target. The clock
uncertainty defaults to 12.5% of the clock period.

Directives

Directives are optional and direct the synthesis process to implement a specific behavior or
optimization.

C test bench and any associated files

Vivado HLS uses the C test bench to simulate the C function prior to synthesis and to verify the RTL
output using C/RTL co-simulation.


You can add the C input files, directives, and constraints to a Vivado HLS project interactively using the Vivado
HLS graphical user interface (GUI) or using Tcl commands at the command prompt. You can also create a Tcl
file and execute the commands in batch mode.

Vivado HLS Outputs

Following are the outputs from Vivado HLS:

RTL implementation files in hardware description language (HDL) formats

This is the primary output from Vivado HLS. Using Vivado synthesis, you can synthesize the RTL into
a gate-level implementation and an FPGA bitstream file. The RTL is available in the following industry
standard formats:

VHDL (IEEE 1076-2000)

Verilog (IEEE 1364-2001)

Vivado HLS packages the implementation files as an IP block for use with other tools in the Xilinx design
flow. Using logic synthesis, you can synthesize the packaged IP into an FPGA bitstream.

Report files

This output is the result of synthesis, C/RTL co-simulation, and IP packaging.

Test Bench, Language Support, and C Libraries

Vivado HLS Rules

In any C program, the top-level function is called main(). In the Vivado HLS design flow, you can specify any
sub-function below main() as the top-level function for synthesis. You cannot synthesize the top-level function
main(). Following are additional rules:

Only one function is allowed as the top-level function for synthesis.

Any sub-functions in the hierarchy under the top-level function for synthesis are also synthesized.

If you want to synthesize functions that are not in the hierarchy under the top-level function for synthesis,
you must merge the functions into a single top-level function for synthesis.
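The rules above can be sketched with a small function hierarchy. All names here (`scale`, `top_function`, `unrelated_helper`) are hypothetical and only serve to show which functions would fall inside the synthesized hierarchy.

```cpp
// Sub-function: it is called from the top-level function, so it is in the
// hierarchy under it and would be synthesized as well.
static int scale(int sample, int gain) {
    return sample * gain;
}

// The single function chosen as the top-level function for synthesis
// (the name is an assumption for this sketch).
int top_function(int sample, int gain, int offset) {
    return scale(sample, gain) + offset;
}

// Not in the hierarchy under top_function: it would not be synthesized
// unless it were merged into (called from) the top-level function.
int unrelated_helper(int v) {
    return v - 1;
}
```

In a project, `main()` would live in the test bench and call `top_function`; `main()` itself is never the function selected for synthesis.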

Test Bench

When using the Vivado HLS design flow, it is time consuming to synthesize a functionally incorrect C function
and then analyze the implementation details to determine why the function does not perform as expected. To
improve productivity, use a test bench to validate that the C function is functionally correct prior to synthesis.

Vivado HLS Test Bench

The C test bench includes the function main() and any sub-functions that are not in the hierarchy under the
top-level function for synthesis.

These functions verify that the top-level function for synthesis is functionally correct by providing stimuli to
the function for synthesis and by consuming its output.

Vivado HLS uses the test bench to compile and execute the C simulation.


During the compilation process, you can select the Launch Debugger option to open a full C-debug environment,
which enables you to analyze the C simulation.

Note : Because Vivado HLS uses the test bench to both verify the C function prior to synthesis and to
automatically verify the RTL output, using a test bench is highly recommended.

Vivado HLS Language Support

Vivado HLS supports the following standards for C compilation/simulation:

ANSI-C (GCC 4.6)

C++ (G++ 4.6)

SystemC (IEEE 1666-2006, version 2.2)

C, C++, and SystemC Language Constructs

Vivado HLS supports many C, C++, and SystemC language constructs and all native data types for each
language, including float and double types. However, synthesis is not supported for some constructs, including:

Dynamic memory allocation

An FPGA has a fixed set of resources, and the dynamic creation and freeing of memory resources is
not supported.

Operating system (OS) operations

All data to and from the FPGA must be read from the input ports or written to output ports. OS
operations, such as file read/write or OS queries like time and date, are not supported. Instead, the C
test bench can perform these operations and pass the data into the function for synthesis as function
arguments.
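The two restrictions above suggest a coding style like the following sketch: buffers have a size known at compile time, and all data reaches the function through its arguments. The function name `sum_window` and the buffer size `kWindow` are assumptions made for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kWindow = 8;  // assumed fixed buffer size

// Synthesizable style: no malloc/new and no file or OS access inside
// the function; the data arrives through the function arguments.
int sum_window(const int samples[kWindow]) {
    int local[kWindow];  // fixed-size storage instead of dynamic allocation
    int sum = 0;
    for (std::size_t i = 0; i < kWindow; ++i) {
        local[i] = samples[i];
        sum += local[i];
    }
    return sum;
}

// The test bench (which is not synthesized) is the place to read input
// data from a file or to query the OS, and then pass the data in as
// arguments to the function above.
```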

OpenCL API C Language Constructs

Vivado HLS supports the OpenCL API C language constructs and built-in functions from the OpenCL API C
1.0 embedded profile.

C Libraries

C libraries contain functions and constructs that are optimized for implementation in an FPGA. Using these
libraries helps to ensure high quality of results (QoR), that is, the final output is a high-performance design
that makes optimal use of the resources. Because the libraries are provided in C, C++, OpenCL API C, or
SystemC, you can incorporate the libraries into the C function and simulate them to verify the functional
correctness before synthesis.

Vivado HLS C Libraries

Vivado HLS provides the following C libraries to extend the standard C languages:

Arbitrary precision data types

Half-precision (16-bit) floating-point data types

Math operations

Video functions


Xilinx IP functions, including fast Fourier transform (FFT) and finite impulse response (FIR)

FPGA resource functions to help maximize the use of shift register LUT (SRL) resources

C Libraries Example

C libraries ensure a higher QoR than standard C types. Standard C types are based on 8-bit boundaries (8-bit,
16-bit, 32-bit, 64-bit). However, when targeting a hardware platform, it is often more efficient to use data types
of a specific width.

For example, a design with a filter function for a communications protocol requires 10-bit input data and 18-bit
output data to satisfy the data transmission requirements. Using standard C data types, the input data must
be at least 16 bits and the output data must be at least 32 bits. In the final hardware, this creates a datapath
between the input and output that is wider than necessary, uses more resources, has longer delays (for example,
a 32-bit by 32-bit multiplication takes longer than an 18-bit by 18-bit multiplication), and requires more clock
cycles to complete.

Using an arbitrary precision data type in this design instead, you can specify the exact bit sizes
in the C code prior to synthesis, simulate the updated C code, and verify the quality of the output using C
simulation prior to synthesis. Arbitrary precision data types are provided for C and C++ and allow you to
model data types of any width from 1 to 1024 bits. Some C++ types can be modeled up to 32768
bits.

Note : Arbitrary precision types are only required on the function boundaries, because Vivado HLS optimizes
the internal logic and removes data bits and logic that do not fanout to the output ports.
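The effect of exact bit widths can be illustrated in plain C++ by masking values to the required number of bits. In a Vivado HLS project one would normally use the arbitrary precision types from the tool's `ap_int.h` header (for example `ap_uint<10>`); because that header ships with the tool, the sketch below uses standard integer types so that it compiles anywhere. The helper names and the 8-bit coefficient width are assumptions.

```cpp
#include <cstdint>

// Keeps only the low `bits` bits of a value (models an unsigned N-bit type).
constexpr std::uint32_t truncate_bits(std::uint32_t value, unsigned bits) {
    return value & ((1u << bits) - 1u);
}

// Hypothetical filter step: 10-bit input, 8-bit coefficient, 18-bit output.
// A 10-bit by 8-bit product needs at most 18 bits, so no information is lost.
constexpr std::uint32_t filter_step(std::uint32_t in10, std::uint32_t coeff8) {
    return truncate_bits(truncate_bits(in10, 10) * truncate_bits(coeff8, 8), 18);
}

static_assert(filter_step(1023, 255) == 260865,
              "largest 10-bit by 8-bit product fits in 18 bits");
```

The datapath here never needs more than 18 bits, whereas the same computation written with `short` inputs and an `int` output would describe a 16-bit by 16-bit multiplication feeding a 32-bit result.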

Synthesis, Optimization, and Analysis

Vivado HLS is project based. Each project holds one set of C code and can contain multiple solutions. Each
solution can have different constraints and optimization directives. You can analyze and compare the results
from each solution in the Vivado HLS GUI.

Vivado HLS Synthesis, Optimization, and Analysis Steps

Following are the synthesis, optimization, and analysis steps in the Vivado HLS design process:

1. Create a project with an initial solution.

2. Verify the C simulation executes without error.

3. Run synthesis to obtain a set of results.

4. Analyze the results.

After analyzing the results, you can create a new solution for the project with different constraints and
optimization directives and synthesize the new solution. You can repeat this process until the design has the
desired performance characteristics. Using multiple solutions allows you to proceed with development while still
retaining the previous results.

Vivado HLS Optimization

Using Vivado HLS, you can apply different optimization directives to the design, including:

Instruct a task to execute in a pipeline, allowing the next execution of the task to begin before the current
execution is complete.


Specify a latency for the completion of functions, loops, and regions.

Specify a limit on the number of resources used.

Override the inherent or implied dependencies in the code and permit specified operations. For example,
if it is acceptable to discard or ignore the initial data values, such as in a video stream, allow a memory
read before write if it results in better performance.

Select the I/O protocol to ensure the final design can be connected to other hardware blocks with the
same I/O protocol.

Note : Vivado HLS automatically determines the I/O protocol used by any sub-functions. You cannot control
these ports except to specify whether the port is registered.

You can use the Vivado HLS GUI to place optimization directives directly into the source code. Alternatively,
you can use Tcl commands to apply optimization directives.
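As an example of a directive placed directly in the source code, the sketch below requests pipelining of a loop with the `PIPELINE` pragma. The function name `accumulate` and the array length are assumptions; a standard C++ compiler ignores the pragma (possibly with a warning), so the function behaves the same in C simulation.

```cpp
constexpr int kLen = 16;  // assumed array length

// Sums the elements of a fixed-size array.
int accumulate(const int data[kLen]) {
    int acc = 0;
    for (int i = 0; i < kLen; ++i) {
// Ask the HLS tool to start a new loop iteration every clock cycle
// (initiation interval of 1), if the design allows it.
#pragma HLS PIPELINE II=1
        acc += data[i];
    }
    return acc;
}
```

Keeping directives in a separate Tcl file instead leaves the C source untouched, which can be convenient when several solutions with different directives share the same code.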

Vivado HLS Analysis

When synthesis completes, Vivado HLS automatically creates synthesis reports to help you understand the
performance of the implementation. In the Vivado HLS GUI, the Analysis Perspective includes the Performance
tab, which allows you to interactively analyze the results in detail.

Figure 1.5: Example of performance tab

The Performance tab shows the following for each state:

C0: The first state includes read operations on ports a, b, and c and the addition operation.

C1 and C2: The design enters a loop and checks the loop increment counter and exit condition. The
design then reads data into variable x, which requires two clock cycles. Two clock cycles are required,
because the design is accessing a block RAM, requiring an address in one cycle and a data read in the
next.

C3: The design performs the calculations and writes output to port y. Then, the loop returns to the start.
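The code behind Figure 1.5 is not reproduced in this section. The states described above are consistent with a small function such as the following sketch, in which the array argument would be implemented as a block RAM interface; the function name `perf_example` and the array length of 3 are assumptions.

```cpp
// Hypothetical function whose schedule could match the states above.
void perf_example(const int in[3], char a, char b, char c, int out[3]) {
    // C0: read ports a, b, and c; the addition b + c can be done once,
    // before the loop starts.
    const int sum_bc = b + c;
    for (int i = 0; i < 3; ++i) {
        // C1 and C2: check the loop counter and read in[i]. Reading from a
        // block RAM takes an address cycle and a data cycle.
        const int x = in[i];
        // C3: perform the calculation and write the result to the output.
        out[i] = a * x + sum_bc;
    }
}
```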

RTL Verification

Vivado HLS RTL Verification

If you added a C test bench to the project, you can use it to verify that the RTL is functionally identical to the
original C.

The C test bench verifies the output from the top-level function for synthesis and returns zero to the top-level
function main() if the RTL is functionally identical.


Vivado HLS uses this return value for both C simulation and C/RTL co-simulation to determine if the results
are correct.

If the C test bench returns a non-zero value, Vivado HLS reports that the simulation failed.

Important: Even if the output data is correct and valid, Vivado HLS reports a simulation failure if the test
bench does not return the value zero to function main().
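A self-checking test bench that follows this convention could look like the sketch below. The function under test (`double_it`) and the helper `run_test_bench` are hypothetical stand-ins; in a real project the checking logic would sit in `main()` and call the top-level function for synthesis.

```cpp
#include <cstdio>

// Stand-in for the top-level function for synthesis (hypothetical).
int double_it(int x) {
    return 2 * x;
}

// Body of a self-checking test bench. In a real project this logic lives
// in main(), and its return value is what the tool inspects.
int run_test_bench() {
    int errors = 0;
    for (int x = 0; x < 10; ++x) {
        const int expected = x + x;  // golden reference value
        if (double_it(x) != expected) {
            std::printf("Mismatch at x=%d\n", x);
            ++errors;
        }
    }
    // Zero signals a pass; any non-zero value is reported as a failure.
    return errors == 0 ? 0 : 1;
}
```

A matching `main()` would simply be `int main() { return run_test_bench(); }`. Forgetting to return zero on success is enough for the simulation to be reported as failed, even when every comparison passes.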

Vivado HLS automatically creates the infrastructure to perform the C/RTL co-simulation and automatically
executes the simulation using one of the following supported RTL simulators:

Vivado Simulator (XSim)

ModelSim simulator

VCS

NCSim

Riviera

If you select Verilog or VHDL HDL for simulation, Vivado HLS uses the HDL simulator you specify. The Xilinx
design tools include Vivado Simulator. Third-party HDL simulators require a license from the third-party
vendor. The VCS, NCSim, and Riviera HDL simulators are only supported on the Linux operating system.

RTL Export

Vivado HLS RTL Export

Using Vivado HLS, you can export the RTL and package the final RTL output files as IP in any of the following
Xilinx IP formats:

Vivado IP Catalog
Import into the Vivado IP catalog for use in the Vivado Design Suite.

System Generator for DSP

Import the HLS design into System Generator.

Synthesized Checkpoint (.dcp)

Import directly into the Vivado Design Suite the same way you import any Vivado Design Suite checkpoint.

Note : The synthesized checkpoint format invokes logic synthesis and compiles the RTL implementation into a
gate-level implementation, which is included in the IP package.

For all IP formats except the synthesized checkpoint, you can optionally execute logic synthesis from within
Vivado HLS to evaluate the results of RTL synthesis. This optional step allows you to confirm the estimates
provided by Vivado HLS for timing and area before handing off the IP package. These gate-level results are not
included in the packaged IP.

Note : Vivado HLS estimates the timing and area resources based on built-in libraries for each FPGA. When
you use logic synthesis to compile the RTL into a gate-level implementation, perform physical placement of the
gates in the FPGA, and perform routing of the inter-connections between gates, logic synthesis might make
additional optimizations that change the Vivado HLS estimates.

Chapter 2

DEVELOPING CUSTOM IP CORE USING HLS

In the previous chapter, we defined the structure of the microprocessor-based system that will be used as
a part of the solution for PWM signal generation. In this chapter, we will explain how to generate this system
using the Vivado HLS tool.

2.1 Create a New Project

The first step in creating a new HLS design is to create a new project. We will create a new project using
the Vivado HLS New Project wizard. The New Project wizard will create an APP project file for us. This is
the place where Vivado HLS organizes our design files and saves the design status whenever the processes are
run.

To create a new project:

- Launch the Vivado HLS software:

Select Start -> All Programs -> Xilinx Design Tools -> Vitis 2022.2 -> Vitis HLS 2022.2 and the
Vivado HLS Welcome page will appear, see Figure 2.1.


Create New Project

Figure 2.1: The Vivado HLS Welcome Page

As can be seen from the Figure above, the Vitis HLS Welcome page contains a lot of usable Quick Start
options:

Create Project - Launch the project setup wizard.

Open Project - Navigate to an existing project or select from a list of recent projects.

Clone Examples - Clone Example projects from GitHub repository to create a local copy for your use.

Tutorials - Opens the "Vivado Design Suite Tutorial: High-Level Synthesis" (UG871).

User Guide - Opens this document, the "Vivado Design Suite User Guide: High-Level Synthesis"
(UG902).

Release Notes Guide - Opens the "Vivado Design Suite User Guide: Release Notes, Installation, and
Licensing" (UG973) for the latest software version.

If any projects were previously opened, they will be shown in the Recent Projects pane, otherwise this window
is not shown in the Welcome screen.

- In the Vitis HLS Welcome page, choose the Create Project option to open the Project wizard.

- In the Project Configuration dialog box specify the name and the location of the new project:

In the Project name field type modulator as the name of the new project

In the Location field click the Browse button to specify the location where project data will be stored.


Figure 2.2: Create a New Vivado Project dialog box

Note : This step is not required when the project is specified as SystemC, because Vivado HLS
automatically identifies the top-level functions.

- Click Next.

Add/Remove Files - C Based Source Files

- In the Add/Remove Files dialog box, specify the C-based design files:

Specify modulator as the top-level function in the Top Function field.

Figure 2.3: Add/Remove Files dialog box

Click the New File... button and in the Save As dialog box specify modulator.cpp as the new file name in
the File name field and click Save.


Figure 2.4: Save As dialog box

After adding the new modulator.cpp C++ file, it should appear as a part of the Design Files section.

Figure 2.5: Add/Remove Files dialog box with added file

Click Next.

Note : You can use the Add Files button to add the existing source code les to the project.

Important: Do not add header files (with the .h suffix) to the project using the Add Files button (or with
the associated add_files Tcl command).

In this example there is only one C++ design file (modulator.cpp). When there are multiple C files to be
synthesized, you must add all of them to the project at this stage. Any header files that exist in the local directory
are automatically included in the project. If the header resides in a different location, use the Edit CFLAGS...
button to add the standard gcc/g++ search path information (for example, -I<path_to_header_file_dir>).


Add/Remove Files - C Based TestBench Files

- In the second Add/Remove Files dialog box, specify the C-based testbench files:

Figure 2.6: Add/Remove Files dialog box

Click the New File... button and in the Save As dialog box specify modulator_tb.cpp as the new testbench
file name in the File name field and click Save.

Figure 2.7: Save As dialog box with testbench le

After adding the new modulator_tb.cpp testbench file, it should appear as a part of the TestBench
Files section.


Figure 2.8: Add/Remove TestBench Files dialog box with added testbench file

Click Next.

Note : The testbench and all files used by the test bench (except header files) must be included. You can add
files one at a time, or select multiple files to add using the Ctrl and Shift keys.

Note : For SystemC designs with header files associated with the test bench but not the design file, you must
use the Add Files button to add the header files to the project.

In most of the example designs provided with Vivado HLS, the test bench is in a separate file from the design.
Having the test bench and the function to be synthesized in separate files keeps a clean separation between the
process of simulation and synthesis. If the test bench is in the same file as the function to be synthesized, the
file should be added as a source file and a test bench file.

As with the C source files, click the Add Files button to add the C test bench and the Edit CFLAGS button
to include any C compiler options.

If the test bench files exist in a directory, the entire directory can be added to the project, rather than the
individual files, using the Add Folders button.

Both C simulation and RTL co-simulation execute in subdirectories of the solution.

If you do not include all the files used by the test bench (for example, data files read by the test bench), C and
RTL simulation might fail due to an inability to find the data files.

The Solution Configuration window (shown in Figure 2.9) specifies the technical specifications of the
first solution.

A project can have multiple solutions, each using a different target technology, package, constraints, and/or
synthesis directives.


Solution Configuration

- In the Solution Configuration dialog box accept the default solution name (solution1), the clock period
(10 ns), and a blank clock uncertainty (when left blank, it defaults to 12.5% of the clock period).

Figure 2.9: Solution Configuration dialog box

The Solution Configuration dialog box allows you to specify the details of the first solution:

Solution Name: Vivado HLS provides the initial default name solution1, but you can specify any name
for the solution.

Clock Period: The clock period specified in units of ns or a frequency value specified with the MHz
suffix (for example, 100 MHz).

Uncertainty: The clock period used for synthesis is the clock period minus the clock uncertainty. Vivado
HLS uses internal models to estimate the delay of the operations for each FPGA. The clock uncertainty
value provides a controllable margin to account for any increases in net delays due to RTL logic synthesis,
place, and route. If not specified in nanoseconds (ns) or as a percentage, the clock uncertainty defaults to
12.5% of the clock period.

Part: Click to select the appropriate technology, as shown in the following figure.

- In the Solution Configuration dialog box click the part selection button to open the part selection window.

You can use the filter to reduce the number of devices in the device list. If the target is a board, specify
boards in the top-left corner and the device list is replaced by a list of the supported boards (and Vivado HLS
automatically selects the correct target device).

- In the Device Selection Dialog dialog box choose a default Xilinx part or board for your project. The
main component of the Sozius development board is Zynq-7000 AP SoC, so in the Default Part dialog
box select the Parts option and set the filter parameters in the same way as shown in the figure below.


Device Selection

Figure 2.10: Device Selection Dialog dialog box

- Select xc7z020clg400-1 part and click OK.

In the Solution Configuration dialog box, the selected part name now appears under the Part Selection
heading.

Figure 2.11: Solution Configuration dialog box with selected board

- In the Solution Configuration dialog box, click Finish to open the created Vitis HLS project.


Vivado HLS Project

Figure 2.12: Vitis HLS Project

After the new project has been created, the Vivado HLS project appears within a few seconds, see Figure
2.12.

When Vivado HLS creates a new project, it also creates a directory with the name and at the location that we
specified in the GUI (see Figure 2.2). That means that all project data will be stored in the project_name
(modulator) directory.

In the Vivado HLS project you can notice the following:

The project name appears on the top line of the Explorer window

A Vivado HLS project arranges information in a hierarchical form

The project holds information on the design source, test bench, and solutions

The solution holds information on the target technology, design directives, and results

There can be multiple solutions within a project, and each solution is an implementation of the same
source code.

Note : At any time, you can change project or solution settings using the corresponding Project Settings and/or
Solution Settings buttons in the toolbar.

The Vivado HLS GUI consists of four panes:

Explorer Pane

Shows the project hierarchy. As you proceed through the validation, synthesis, verification, and IP
packaging steps, sub-folders with the results of each step are created automatically inside the solution
directory (named csim, syn, sim, and impl respectively).

When you create new solutions, they appear inside the project hierarchy alongside solution1.


Flow Navigator Pane

The Flow Navigator pane provides access to commands and processes as described in Using the Flow
Navigator to take your source code through simulation, synthesis, and exported output.

Information Pane

Shows the contents of any files opened from the Explorer pane. When operations complete, the report file
opens automatically in this pane.

Console Pane

Shows the messages produced when Vivado HLS runs. Errors and warnings appear in Console pane
tabs.

Figure 2.13: Vivado HLS GUI

In the Vivado HLS GUI you can also find:

Toolbar Buttons

You can perform the most common operations using the Toolbar buttons.

When you hold the cursor over the button, a popup tool tip opens, explaining the function. Each button
also has an associated menu item available from the pull-down menus.

Perspectives

The perspectives provide convenient ways to adjust the windows within the Vivado HLS GUI.

Synthesis Perspective

The default perspective allows you to synthesize designs, run simulations, and package the IP.

Debug Perspective


Includes panes associated with debugging the C code. You can open the Debug Perspective after the
C code compiles (unless you use the Optimizing Compile mode as this disables debug information).

Analysis Perspective

Windows in this perspective are configured to support analysis of synthesis results. You can use
the Analysis Perspective only after synthesis completes.

2.2 Develop C Algorithm

The first step within an HLS project is to develop a C algorithm for your design.

In this tutorial the actual algorithm will be written in the C++ programming language.

As already explained, with the modulator project creation we have already created two empty C++ files,
modulator.cpp and modulator_tb.cpp.

Now it is time to write their content, as well as the content of the modulator.h header file that will be stored
in the same directory where these two files are saved.

The content of these three files can be found in the text below.

modulator.cpp

#include "ap_int.h"
#include "math.h"
#include "modulator.h"

// function that calculates sine wave sample values
void init_sine_table(ap_uint<width> *sine)
{
    float temp;

    init_sine: for (int i = 0; i < sine_samples; i++)
    {
        // sin(2*pi*i / N) * (2^(width-1) - 1) + 2^(width-1) - 1, N = 2^depth
        sine[i] = (ap_uint<width>)(sin(2*3.14*i/sine_samples)*(sine_ampl/2.0-1.0)+sine_ampl/2.0-1.0);
    }
}

// pwm generator
void modulator(
    ap_uint<1> sel,      // signal used for selecting frequency
    ap_uint<1> *pwm_o)   // pointer to pwm output
{
    static ap_uint<depth> counter = 0;         // counter for sine wave sample counting
    static ap_uint<width> sine[sine_samples];  // samples of the sine wave signal

    // sine table initialization
    init_sine_table(sine);

    // hold pwm_o high for specified number of clock cycles
    onloop: for (ap_uint<20> j = 0; j < (ap_uint<20>)(period[sel]*sine[counter]); j++)
    {
        *pwm_o = 1;
    }

    // hold pwm_o low for specified number of clock cycles
    offloop: for (ap_uint<20> j = 0; j < (ap_uint<20>)(period[sel]*(sine_ampl - sine[counter])); j++)
    {
        *pwm_o = 0;
    }

    counter++;
}
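To inspect the values that the sine table formula produces without the HLS headers, the same expression can be evaluated in plain C++. The following is only a minimal sketch under stated assumptions: std::uint16_t stands in for ap_uint<12>, pi is computed with std::acos instead of the constant 3.14 used in the listing, and the names make_sine_table, kSamples and kAmpl are hypothetical and not part of the tutorial sources.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

constexpr int kSamples = 256;   // mirrors sine_samples from modulator.h
constexpr int kAmpl    = 4096;  // mirrors sine_ampl from modulator.h

// Hypothetical stand-alone version of the sine table formula:
// sin(2*pi*i/N) * (A/2 - 1) + (A/2 - 1), truncated to an unsigned integer.
std::array<std::uint16_t, kSamples> make_sine_table()
{
    const double pi = std::acos(-1.0);  // assumption: more precise than 3.14
    std::array<std::uint16_t, kSamples> table{};
    for (int i = 0; i < kSamples; ++i) {
        const double s = std::sin(2.0 * pi * i / kSamples);
        table[i] = static_cast<std::uint16_t>(s * (kAmpl / 2.0 - 1.0) + kAmpl / 2.0 - 1.0);
    }
    return table;
}
```

Sample 0 sits at mid-scale (2047), and all samples stay within the 12-bit range because the sine is scaled by 2047 and offset by 2047.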


modulator_tb.cpp

#include <iostream>
#include "ap_int.h"
#include "modulator.h"

using namespace std;

ap_uint<1> pwm_o; // pulse width modulated signal

int main(int argc, char **argv)
{
    // one full sine period (256 samples) at the first frequency (sel = 0)
    for (int i = 0; i < 256; i++)
        modulator(0, &pwm_o);

    // one full sine period (256 samples) at the second frequency (sel = 1)
    for (int i = 0; i < 256; i++)
        modulator(1, &pwm_o);

    return 0;
}

modulator.h

#ifndef __PWM_H__
#define __PWM_H__

#include "ap_int.h"
#include <cmath>
using namespace std;

#define depth 8 // the number of bits used to represent sample count of sine wave
#define width 12 // the number of bits used to represent amplitude value

#define sine_samples 256 // maximum number of samples in one period of the signal
#define sine_ampl 4096 // maximum amplitude value of the sine wave

#define refclk_frequency 100000000 // reference clock frequency (100 MHz)

#define freq_low 1 // first frequency for the PWM signal, specified in Hz

#define freq_high 3.5 // second frequency for the PWM signal, specified in Hz

// minimum duration of high value of pwm signal for two different frequencies
const float period[2] = {(float)(refclk_frequency/(sine_ampl*sine_samples*freq_low)),
(float)(refclk_frequency/(sine_ampl*sine_samples*freq_high))};

// Prototype of top level function for C-synthesis

void modulator(
ap_uint<1> sel, // signal used for selecting frequency
ap_uint<1> *pwm_o); // pointer to pwm output

#endif
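The two entries of the period array can be checked by hand. The sketch below repeats the arithmetic with typed constants; the names kRefClkHz, kAmpl, kSamples and the three period_* constants are illustrative and do not appear in the tutorial sources. One detail worth noting, as an observation rather than a correction of the design: because freq_low is defined as the integer 1, the first expression is evaluated with integer division, so the fractional part of 95.367 is discarded before the cast to float.

```cpp
// Same numeric values as in modulator.h (names are illustrative).
constexpr int kRefClkHz = 100000000;  // reference clock, 100 MHz
constexpr int kAmpl     = 4096;       // sine_ampl
constexpr int kSamples  = 256;        // sine_samples

// freq_low = 1 (an int): every operand is an integer, so the division truncates.
constexpr float period_low_int = static_cast<float>(kRefClkHz / (kAmpl * kSamples * 1));

// Same expression with a floating-point frequency, for comparison.
constexpr float period_low_fp = static_cast<float>(kRefClkHz / (kAmpl * kSamples * 1.0));

// freq_high = 3.5 (a double): evaluated in floating point.
constexpr float period_high = static_cast<float>(kRefClkHz / (kAmpl * kSamples * 3.5));
```

Under these assumptions period_low_int is 95, period_low_fp is about 95.37, and period_high is about 27.25 clock cycles per amplitude step.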

Add the Content of the Source Files

To add the content of the modulator.cpp and modulator_tb.cpp files, do the following steps:

- In the Vivado HLS Explorer pane expand the Source folder and double-click on the modulator.cpp C++ file
to open it.

Figure 2.14: Source folder with modulator.cpp file


- Paste the content of the modulator.cpp listing given above into the opened modulator.cpp file and click the Save button.

- Repeat the same procedure for the modulator_tb.cpp test bench file. In the Vivado HLS Explorer
pane expand the Test Bench folder and double-click on the modulator_tb.cpp file to open it.

Figure 2.15: Test Bench folder with modulator_tb.cpp file

- Paste the content of the modulator_tb.cpp listing given above into the opened modulator_tb.cpp file and click the Save button.

- To create the modulator.h header file, write it in a text editor and save it in the same
folder where the rest of the files are stored.

By doing so, the modulator.h header file will be automatically included in the project and you should find it in
the Includes folder of the Explorer pane.

The content of the modulator.h header file can also be found in the text above.

2.3 Verify C Algorithm

The second step within an HLS project is to confirm that the C code is correct.

This process is called C Validation or C Simulation.

Verification in the Vivado HLS flow can be separated into two distinct processes:

1. Pre-synthesis validation that validates the C program correctly implements the required functionality.

2. Post-synthesis verification that verifies the RTL is correct.

Both processes are referred to as simulation: C simulation and C/RTL co-simulation.

Before synthesis, the function to be synthesized should be validated with a test bench using C simulation. A
C test bench includes a top-level function main() and the function to be synthesized. It might include other
functions. An ideal test bench has the following attributes:

The test bench is self-checking and verifies that the results from the function to be synthesized are correct.

If the results are correct, main() returns a value of 0. Otherwise, main() should
return a non-zero value.
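The two attributes above can be illustrated with a small self-checking routine. This is a hedged sketch, not the tutorial's test bench: high_cycles is a hypothetical plain C++ stand-in for the function under test (it models only the onloop bound, period multiplied by the sine sample), and run_checks is a hypothetical helper whose return value a test bench main() would pass on.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the function under test: number of clock cycles
// the PWM output is held high for one sine sample.
std::uint32_t high_cycles(float period, std::uint32_t sample)
{
    return static_cast<std::uint32_t>(period * sample);
}

// Self-checking routine: compares each result with a hand-computed expected
// value and returns the number of mismatches (0 means success).
int run_checks()
{
    struct Case { float period; std::uint32_t sample; std::uint32_t expected; };
    const Case cases[] = {
        {95.0f, 0,    0},
        {95.0f, 2047, 194465},
        {95.0f, 4094, 388930},
    };

    int errors = 0;
    for (const Case &c : cases) {
        const std::uint32_t got = high_cycles(c.period, c.sample);
        if (got != c.expected) {
            std::cout << "Mismatch for sample " << c.sample << ": got " << got
                      << ", expected " << c.expected << '\n';
            ++errors;
        }
    }
    return errors;
}
```

A test bench built this way would end main() with `return run_checks();`, so that any mismatch produces a non-zero return value and the simulation is reported as failed.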

Vivado HLS synthesizes an OpenCL API C kernel. To simulate an OpenCL API C kernel, you must use a
standard C test bench. You cannot use the OpenCL API C host code as the C test bench.


Run C Simulation

- Click the Run C Simulation toolbar button to open the C Simulation dialog box.

Figure 2.16: Run C Simulation button

Figure 2.17: C Simulation dialog box

Another way to open the C Simulation dialog box is to choose the Project -> Run C Simulation option
from the main HLS toolbar menu.

In the C Simulation dialog box you can find the following options:

Launch Debugger - This option compiles the C code and automatically opens the debug perspective.
From within the debug perspective the Synthesis perspective button (top left) can be used to return to
the synthesis perspective.

Build Only - This option compiles the C code, but does not run the simulation. Details on executing
the C simulation are covered in "Reviewing the Output of C Simulation" document.

Clean Build - This option removes any existing executable and object files from the project before
compiling the code.

Optimizing Compile - By default the design is compiled with debug information, allowing the compilation
to be analyzed in the debug perspective. This option uses a higher level of optimization effort when
compiling the design but removes all information required by the debugger. This increases the compile
time but should reduce the simulation run time.

- In the C Simulation dialog box, just click OK.

If no option is selected in the C Simulation dialog box, the C code is compiled and the C simulation is
automatically executed. The results are shown in Figure 2.18. When the C code is simulated successfully,
the Console window displays a message.


Figure 2.18: Vivado HLS after C simulation

The design is now ready for synthesis.

Note : If the C simulation ever fails, select the Launch Debugger option in the C Simulation dialog box,
compile the design, and automatically switch to the Debug perspective. There you can use a C debugger to fix
any problems.

2.3.1 C Simulation Output Files

When C simulation completes, a folder csim is created inside the solution1 folder.

Figure 2.19: Explorer window with C Simulation Output Files

The folder csim/build is the primary location for all files related to the C simulation:

Any files read by the test bench are copied to this folder

The C executable file csim.exe is created and run in this folder


Any files written by the test bench are created in this folder.

If the Build Only option is selected in the C Simulation dialog box, the file csim.exe is created in this folder,
but the file is not executed. The C simulation is run manually by executing this file from a command shell. On
Windows the Vivado HLS command shell is available through the start menu.

The folder csim/report contains a log file of the C simulation.

The next step in the Vivado HLS design flow is to execute synthesis.

2.4 Synthesize C Algorithm into an RTL Implementation (High-Level Synthesis)

In this step, you synthesize the C design into an RTL design and review the synthesis report.

- Click the Run C Synthesis toolbar button or use the Solution -> Run C Synthesis -> Active Solution
option from the main Vivado HLS menu to synthesize the design to an RTL implementation.

Figure 2.20: Run C Synthesis button

- In the C Synthesis - Active Solution dialog box, click OK.

Figure 2.21: C Synthesis - Active Solution dialog box

During the synthesis process messages are echoed to the console window. These include information
messages that show how the synthesis process is proceeding and provide details on each synthesis
step.

When synthesis completes, the synthesis report for the top-level function opens automatically in the Information
pane as shown in the following figure.


C Synthesis Report

Figure 2.22: Information pane with synthesis report

The synthesis report provides details on both the performance and area of the RTL design. This sub-chapter
explains only the report categories that are important for the current stage of design development.

A detailed explanation of all synthesis report categories is presented in Table 2.1 of sub-chapter 2.4.2 C
Synthesis Results.

You can quickly review the performance metrics displayed in the Simplified Synthesis report to determine if
the design meets your requirements. The synthesis report contains information on the following performance
metrics:

Issue Type - Shows any issues with the results.

Latency - Number of clock cycles required for the function to compute all output values.

Initiation interval (II) - Number of clock cycles before the function can accept new input data.

Loop iteration latency - Number of clock cycles it takes to complete one iteration of the loop.

Loop iteration interval - Number of clock cycles before the next iteration of the loop starts to process
data.

Loop latency - Number of cycles to execute all iterations of the loop.

Resource Utilization - Amount of hardware resources required to implement the design based on the
resources available in the FPGA, including look-up tables (LUT), registers, block RAMs, and DSP48s.
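How the loop metrics above relate to each other can be sketched with two small helper functions. These are commonly used first-order approximations, stated here as assumptions: the figures in an actual synthesis report may differ by a few cycles for loop entry and exit, and the function names are hypothetical.

```cpp
#include <cstdint>

// Non-pipelined loop: each iteration starts only after the previous one
// finishes, so the loop latency is roughly trip count x iteration latency.
std::uint64_t sequential_loop_latency(std::uint64_t trip_count,
                                      std::uint64_t iteration_latency)
{
    return trip_count * iteration_latency;
}

// Pipelined loop: a new iteration starts every `ii` cycles (the loop
// iteration interval), and the last iteration still needs its full latency.
std::uint64_t pipelined_loop_latency(std::uint64_t trip_count,
                                     std::uint64_t iteration_latency,
                                     std::uint64_t ii)
{
    if (trip_count == 0) {
        return 0;
    }
    return ii * (trip_count - 1) + iteration_latency;
}
```

For example, a loop with 100 iterations and an iteration latency of 3 cycles takes about 300 cycles when executed sequentially and about 102 cycles when pipelined with an interval of 1.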

If you specified the Run C Synthesis command on multiple solutions, the Console view reports the synthesis
transcript for each of the solutions as they are synthesized. After synthesis has completed, instead of the
Simplified Synthesis report, Vitis HLS displays a Report Comparison to compare the synthesis results for all of
the synthesized solutions.


Figure 2.22 shows the Information pane with the synthesis report (Synthesis Summary (solution1)).

To open the synthesis details of solution1, in the Explorer window, expand solution1, then expand syn
and under report double-click on the modulator_csynth.rpt report file.

Figure 2.23: Explorer window with selected modulator synthesis report file

In the Performance Estimates pane, expand Timing/Summary and you can see that the clock period is
set to 10 ns, see Figure 2.24. Vivado HLS targets a clock period of Clock Target minus Clock Uncertainty
(10.00 - 2.7 = 7.3 ns in this example).

Figure 2.24: Performance Estimates report - Timing Summary

The clock uncertainty ensures there is some timing margin available for the (at this stage) unknown net delays
due to place and route.

The estimated clock period (worst-case delay) is 6.932 ns, which meets the 7.3 ns timing requirement.
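The timing check described above is simple arithmetic and can be written down as a short sketch. The helper names are hypothetical; the numbers in the usage note are the ones quoted from the report in this section.

```cpp
// Clock period the scheduler actually targets: target minus uncertainty.
constexpr double effective_period_ns(double target_ns, double uncertainty_ns)
{
    return target_ns - uncertainty_ns;
}

// The design meets timing when the estimated worst-case delay fits into
// the effective period.
constexpr bool meets_timing(double estimated_ns, double target_ns, double uncertainty_ns)
{
    return estimated_ns <= effective_period_ns(target_ns, uncertainty_ns);
}

// Highest clock frequency implied by an estimated period, in MHz.
constexpr double estimated_fmax_mhz(double estimated_ns)
{
    return 1000.0 / estimated_ns;
}
```

With a 10 ns target, 2.7 ns uncertainty and a 6.932 ns estimate, meets_timing returns true, and the estimate corresponds to a clock of roughly 144 MHz before place and route.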

In the Performance Estimates pane, expand Latency/Summary and you can see:

The design has a latency of ? clock cycles: it takes ? clocks to output the results.

The interval is ? clock cycles: the next set of inputs is read after ? clocks. This is one cycle after the final
output is written. This indicates the design is not pipelined. The next execution of this function (or next
transaction) can only start when the current transaction completes.

Note : In our design Vitis HLS can't calculate latency values.

In the Performance Estimates pane, expand Latency/Detail and you can see:
There are no sub-blocks in this design. Expanding the Instance section shows no sub-modules in the
hierarchy.

Expanding the Loop section you can see that all the latency delay is due to the RTL logic synthesized
from the loops named onloop and offloop. This logic executes ? times (Trip Count). Each execution
requires 1 clock cycle (Iteration Latency), for a total of ? clock cycles, to execute all iterations of the logic
synthesized from this loop (Latency).

As we already said, in our design Vitis HLS can't calculate latency values.
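Although the tool cannot derive the latency, the loop bounds can be evaluated by hand for a given sine sample. The sketch below models the two loop bounds from modulator.cpp in plain C++. It rests on several assumptions: std::uint32_t replaces ap_uint<20> (all values used here stay below 2^20), the period value 95 is the result of evaluating the first entry of the period array, loop overhead and the sine table initialization are ignored, and the function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t kAmpl = 4096;  // mirrors sine_ampl

// Iterations of onloop: cycles with pwm_o held high.
std::uint32_t on_cycles(float period, std::uint32_t sample)
{
    return static_cast<std::uint32_t>(period * sample);
}

// Iterations of offloop: cycles with pwm_o held low (sample must be <= kAmpl).
std::uint32_t off_cycles(float period, std::uint32_t sample)
{
    return static_cast<std::uint32_t>(period * (kAmpl - sample));
}

// Total loop iterations for one call of the modulator function.
std::uint32_t cycles_per_call(float period, std::uint32_t sample)
{
    return on_cycles(period, sample) + off_cycles(period, sample);
}
```

With a period of 95, every call takes 95 x 4096 = 389120 loop iterations regardless of the sample value; only the split between high and low time changes. The 256 calls of one sine period then add up to 99614720 cycles, which is close to one second at 100 MHz and consistent with the intended 1 Hz modulation.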

Figure 2.25: Performance Estimates report - Loop Latency Detail

In the Utilization Estimates pane, under the Summary section, you can see:

The design uses 1 BRAM_18K memory, 3 DSP, 629 flip-flops and 1459 LUTs. At this stage, the device
resource numbers are estimates.

The resource utilization numbers are estimates because RTL synthesis might be able to perform additional
optimizations, and these figures might change after RTL synthesis.

Figure 2.26: Utilization Estimates report - Summary

In the Utilization Estimates pane, expand Detail/Instance section and you will see:

Figure 2.27: Utilization Estimates report - Detail Instance

The resources specified here are used by the sub-blocks instantiated at this level of the hierarchy. Although
our design does not have any hierarchy, Vitis HLS introduced it when performing multiplication of a floating-point
value and an unsigned integer value (see the loop bounds of onloop and offloop in the modulator.cpp source code).
There are six instances created by Vivado HLS:

fmul_32ns_32ns_32_4_max_dsp_1_U5 and fmul_32ns_32ns_32_4_max_dsp_1_U6 - used for single-precision
floating-point multiplication

grp_modulator_Pipeline_offloop_fu_94 - block that implements the "offloop" loop

grp_modulator_Pipeline_onloop_fu_87 - block that implements the "onloop" loop

sitofp_64ns_32_6_no_dsp_1_U8 - used for converting a signed-integer value to a single-precision
floating-point value

uitofp_32ns_32_6_no_dsp_1_U7 - used for converting an unsigned-integer value to a single-precision
floating-point value.

For each instance Vitis HLS reports how many resources are necessary to implement it (number of BRAMs,
DSPs, FFs, LUTs).

- In the Interface pane, expand Summary section.

Figure 2.28: Interface report - Summary

The Interface report shows the ports and I/O protocols created by interface synthesis:

The design has a clock and reset port (ap_clk and ap_rst). These are associated with the Source Object
modulator, the design itself.

There are additional ports associated with the design as indicated by Source Object modulator. Synthesis
has automatically added some block level control ports: ap_start, ap_done, ap_idle, and ap_ready.

The Interface Synthesis tutorial provides more information about these ports.

Scalar input argument sel is implemented as a data port with no I/O protocol (ap_none).

Finally, the function output pwm_o is a 1-bit data port with an associated output
valid signal indicator pwm_o_ap_vld.

2.4.1 C Synthesis Output Files

When synthesis completes, the folder syn is now available in the solution1 folder.

Figure 2.29: Explorer window with C Synthesis Output Files


This folder contains the following elements:

The verilog and vhdl folders contain the output RTL files.

The top-level file has the same name as the top-level function for synthesis

There is one RTL file created for each sub-function that has not been inlined into a higher level
function

There could be additional RTL files to implement sub-blocks of the RTL hierarchy, such as block
RAM, and pipelined multipliers

The report folder contains a report file for the top-level function and one for every sub-function that has
not been inlined into a higher level function by Vitis HLS. The report for the top-level function provides
details on the entire design.

Important : Xilinx does not recommend using the RTL files generated in the syn/verilog or syn/vhdl folder for
synthesis in the Vivado tool. You should instead use the packaged output files for use with the Vitis application
acceleration development flow, or the Vivado Design Suite.

In cases where Vitis HLS uses Xilinx IP in the generated RTL code, such as with floating point designs, the
verilog and vhdl folders contain a script to create that IP during RTL synthesis by the Xilinx tools. If you
use the files in the syn/verilog or syn/vhdl folder directly for RTL synthesis, you must also correctly use any
script files present in those folders. If the packaged output is used, this process is performed automatically by
the Xilinx tools.

2.4.2 C Synthesis Results

The two primary features provided to analyze the RTL design are:

1. Synthesis reports

2. Analysis Perspective

In addition, if you are more comfortable working in an RTL environment, Vivado HLS creates two projects
during the IP packaging process:

Vivado Design Suite project

Vivado IP Integrator project

Synthesis Reports

When synthesis completes, the synthesis report for the top-level function opens automatically in the Information
pane (Figure 2.22). The report provides details on both the performance and area of the RTL design. The
Outline tab on the right-hand side can be used to navigate through the report.

The following table explains the categories in the synthesis report.

Table 2.1: Synthesis Report Category

Category - Description

General Information - Details on when the results were generated, the version of the software
used, the project name, the solution name, and the technology details.

Performance Estimates -> Timing - The target clock frequency, clock uncertainty, and the estimate of the
fastest achievable clock frequency.

Performance Estimates -> Latency -> Summary - Reports the latency and initiation interval for this block and any
sub-blocks instantiated in this block. Each sub-function called at this level
in the C source is an instance in this RTL block, unless it was inlined.
The latency is the number of cycles it takes to produce the output. The
initiation interval is the number of clock cycles before new inputs can be
applied. In the absence of any PIPELINE directives, the latency is one
cycle less than the initiation interval (the next input is read when the
final output is written).

Performance Estimates -> Latency -> Detail - The latency and initiation interval for the instances (sub-functions) and
loops in this block. If any loops contain sub-loops, the loop hierarchy is
shown. The min and max latency values indicate the latency to execute
all iterations of the loop. The presence of conditional branches in the
code might make the min and max different. The Iteration Latency is the
latency for a single iteration of the loop. If the loop has a variable latency,
the latency values cannot be determined and are shown as a question mark
(?). See the text after this table. Any specified target initiation interval is
shown beside the actual initiation interval achieved. The tripcount shows
the total number of loop iterations.

Utilization Estimates -> Summary - This part of the report shows the resources (LUTs, Flip-Flops, DSP48s)
used to implement the design.

Utilization Estimates -> Details -> Instance - The resources specified here are used by the sub-blocks instantiated at
this level of the hierarchy. If the design has no RTL hierarchy, there
are no instances reported. If any instances are present, clicking on the
name of the instance opens the synthesis report for that instance.

Utilization Estimates -> Details -> Memory - The resources listed here are those used in the implementation of
memories at this level of the hierarchy. Vivado HLS reports a single-port BRAM
as using one bank of memory and reports a dual-port BRAM as using two
banks of memory.

Utilization Estimates -> Details -> FIFO - The resources listed here are those used in the implementation of any
FIFOs implemented at this level of the hierarchy.

Utilization Estimates -> Details -> Shift Register - A summary of all shift registers mapped into Xilinx SRL components.
Additional mapping into SRL components can occur during RTL synthesis.

Utilization Estimates -> Details -> Expressions - This category shows the resources used by any expressions such as
multipliers, adders, and comparators at the current level of hierarchy. The
bit-widths of the input ports to the expressions are shown.

Utilization Estimates -> Details -> Multiplexors - This section of the report shows the resources used to implement
multiplexors at this level of hierarchy. The input widths of the multiplexors
are shown.

Utilization Estimates -> Details -> Register - A list of all registers at this level of hierarchy is shown here. The report
includes the register bit-widths.

Interface Summary -> Interface - This section shows how the function arguments have been synthesized
into RTL ports. The RTL port names are grouped with their protocol
and source object: these are the RTL ports created when that source object
is synthesized with the stated I/O protocol.


Certain Xilinx devices use stacked silicon interconnect (SSI) technology. In these devices, the total available
resources are divided over multiple super logic regions (SLRs). When you select an SSI technology device as
the target technology, the utilization report includes details on both the SLR usage and the total device usage.

Important : When using SSI technology devices, it is important to ensure that the logic created by Vivado HLS
fits within a single SLR. For information on using SSI technology devices.

A common issue for new users of Vivado HLS is seeing a synthesis report similar to the following figure. The
latency values are all shown as a ? (question mark).

Vivado HLS performs analysis to determine the number of iterations of each loop. If the loop iteration limit is
a variable, Vivado HLS cannot determine the maximum upper limit.

If the latency or throughput of the design is dependent on a loop with a variable index, Vivado HLS reports
the latency of the loop as being unknown (represented in the reports by a question mark ?).

The TRIPCOUNT directive can be applied to the loop to manually specify the number of loop iterations
and ensure the report contains useful numbers. The -max option tells Vivado HLS the maximum number of
iterations that the loop iterates over, the -min option specifies the minimum number of iterations performed
and the -avg option specifies an average tripcount.

Note : The TRIPCOUNT directive does not impact the results of synthesis.

The tripcount values are used only for reporting, to ensure the reports generated by Vivado HLS show meaningful
ranges for latency and interval. This also allows a meaningful comparison between different solutions.

If the C assert macro is used in the code, Vivado HLS can use it to both determine the loop limits automatically
and create hardware that is exactly sized to these limits.
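In source code, the TRIPCOUNT directive is usually written as a loop_tripcount pragma inside the loop body. The sketch below shows the pragma and the assert macro on a loop with a variable bound, similar in shape to onloop and offloop. It is an illustration under stated assumptions: hold_cycles is a hypothetical function, and the min, max and avg values are example numbers (95 x 4096 and half of it), not values taken from the tutorial. A standard C++ compiler ignores the HLS pragma, possibly with an unknown-pragma warning, so the function also runs in plain C simulation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop with a variable bound: counts how many cycles an
// output would be held for a given limit.
std::uint32_t hold_cycles(std::uint32_t limit)
{
    // The assert states the largest limit the caller may pass; according to
    // the text above, the tool can use it to size the loop logic.
    assert(limit <= 389120);

    std::uint32_t cycles = 0;
    hold: for (std::uint32_t j = 0; j < limit; ++j) {
        // Reporting only: gives the latency report a range instead of "?".
#pragma HLS loop_tripcount min=0 max=389120 avg=194560
        ++cycles;
    }
    return cycles;
}
```

Because the pragma affects reporting only, the generated hardware is the same with or without it; the assert, in contrast, can influence the hardware that is created.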

Analysis Perspective

In addition to the synthesis report, you can use the Analysis Perspective to analyze the results.

The Analysis Perspective provides both a tabular and graphical view of the design performance and resources
and supports cross-referencing between both views.

The Module Hierarchy pane provides an overview of the entire RTL design.

This view can navigate throughout the design hierarchy.

The Module Hierarchy pane shows the resources and latency contribution for each block in the RTL
hierarchy.

The Performance Profile pane provides details on the performance of the block currently selected in the
Module Hierarchy pane, in this case, the modulator block highlighted in the Module Hierarchy pane.

The performance of the block is a function of the sub-blocks it contains and any logic within this level
of hierarchy. The Performance Profile pane shows items at this level of hierarchy that contribute to the
overall performance.

Performance is measured in terms of latency and the initiation interval. This pane also includes details
on whether the block was pipelined or not.

In this example, you can see that two loops (onloop and offloop) are implemented as logic at this level of
hierarchy.

The Schedule Viewer is displayed by default in the Analysis perspective. The Schedule Viewer pane gives
you a more detailed view of the synthesized RTL. You can identify any loop dependencies that are preventing
parallelism, timing violations, and data dependencies.


To open the Schedule Viewer pane, right-click on the module in the Module Hierarchy pane, in our case
the modulator module, and select the Open Schedule Viewer option, see Figure 2.30.

Figure 2.30: Open Schedule Viewer option

Navigate through the module hierarchy window to view the scheduling of each individual block. In addition
to the existing performance profile feature, the viewer provides the II and timing violations in the performance
pane itself, so you can identify any modules with violations.

The vertical axis shows the operations in a solid gray bar and the horizontal axis shows the cycles in
consecutive order.

The vertical dashed line is the clock uncertainty, which the tool leaves for Vivado back-end processes.

The left column of the viewer shows the top-level functions, which are scheduled as shown on the right.

The new viewer shows operations which span multiple cycles or are shorter than one full cycle. The
displayed operation length reflects the estimated timing of an operation which is utilized by the scheduler.

This viewer also displays more information on operator dependencies. When you select an operation, it
selects an operator, and then you can see the specific operator dependency within and across cycles. This
gives you a detailed analysis of data and loop-carried dependencies.

The viewer also displays timing related information. If the design has a critical or failing path, the module
hierarchy shows a negative slack window and you can navigate to the affected module. This information
can help you identify any bottlenecks in the design. By optimizing the algorithm, you can quickly converge
on the optimal hardware implementation.

The Schedule Viewer pane shows how the operations in this particular block are scheduled into clock cycles.

The left-hand column lists the resources.

Sub-blocks are green.

Operations resulting from loops in the source are coloured yellow.

Standard operations are purple.

The modulator has two main parts:

A loop called onloop, and
A loop called offloop.

The top row lists the control states in the design. Control states are the internal states used by Vivado
HLS to schedule operations into clock cycles. There is a close correlation between the control states and
the final states in the RTL FSM, but there is no one-to-one mapping.

The following figure shows that you can select an operation and right-click it (Goto Source option)
to open the associated variable in the source code view. You can see that the write operation implements
the writing of data into the buf array from the input array variable.

Figure 2.31: C Source Code Correlation

The Analysis Perspective is a highly interactive feature. More information on the Analysis Perspective can be
found in the Design Analysis section of the Vivado Design Suite Tutorial, "High-Level Synthesis (UG871)".

Note : Even if a Tcl flow is used to create designs, the project can still be opened in the GUI and the Analysis
Perspective used to analyze the design.

Use the Synthesis perspective button to return to the synthesis view.

Generally, after design analysis you can create a new solution to apply optimization directives. Using a new
solution for this allows the different solutions to be compared.

2.4.3 Clock, Reset, and RTL Output

The most typical use of Vivado HLS is to create an initial design, then perform optimizations to meet the
desired area and performance goals.

Solutions offer a convenient way to ensure the results from earlier synthesis runs can be both preserved and
compared.

- In the Vivado HLS main toolbar, press the New Solution button to open the Solution Configuration
dialog box.

Figure 2.32: New Solution button

Another way to open the Solution Configuration dialog box is to use the Project -> New Solution option
from the main Vivado HLS menu, see Figure 2.33.

Figure 2.33: New Solution option

The Solution Wizard has the same options as the final window in the New Project wizard (Figure 2.11)
plus an additional option that allows any directives and custom constraints applied to an existing solution to
be conveniently copied to the new solution, where they can be modified or removed.

New Solution

- In the Solution Configuration dialog box, leave all parameters unchanged and click Finish.

Figure 2.34: Solution Configuration dialog box

After the new solution has been created, optimization directives can be added (or modified if they were copied
from the previous solution). The next section explains how directives can be added to solutions. Custom
constraints are applied using the configuration options.

2.4.4 Applying Optimization Directives

The first step in adding optimization directives is to open the source code in the Information pane.

Figure 2.35: Information pane with opened source code

As shown in Figure 2.35, expand the Source container located at the top of the Explorer pane, and
double-click the source file (modulator.cpp) to open it for editing in the Information pane.

With the source code active in the Information pane, select the Directive tab on the right to display and
modify directives for the file. The Directive tab contains all the objects and scopes in the currently opened
source code to which you can apply directives.

Note : To apply directives to objects in other C files, you must open the file and make it active in the
Information pane.

Although you can select objects in the Vivado HLS GUI and apply directives, Vivado HLS applies all directives
to the scope that contains the object. For example, you can apply an INTERFACE directive to an interface
object in the Vivado HLS GUI. Vivado HLS applies the directive to the top-level function (scope), and the
interface port (object) is identified in the directive. In the following example, port data_in on function foo is
specified as an AXI4-Lite interface:

set_directive_interface -mode s_axilite "foo" data_in
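
The same directive can also be written as a pragma inside the source. The following is a minimal sketch, assuming a hypothetical top-level function foo with a single argument data_in; the function body is a placeholder and is not taken from this tutorial. The mode= spelling follows the Vitis HLS pragma syntax (older Vivado HLS releases may expect the mode without the keyword). A standard C++ compiler simply ignores the pragma.

```cpp
// Hypothetical top-level function; only the pragma placement matters here.
int foo(int data_in) {
// Pragma form of: set_directive_interface -mode s_axilite "foo" data_in
#pragma HLS INTERFACE mode=s_axilite port=data_in
    return data_in + 1;  // placeholder body, not from the tutorial
}
```

Note that the pragma is placed inside the function body, because the top-level function is the scope that contains the interface port.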

You can apply optimization directives to the following objects and scopes:

Interfaces

When you apply directives to an interface, Vivado HLS applies the directive to the top-level function,
because the top-level function is the scope that contains the interface.

Functions

When you apply directives to functions, Vivado HLS applies the directive to all objects within the scope of
the function. The effect of any directive stops at the next level of function hierarchy. The only exception
is a directive that supports or uses a recursive option, such as the PIPELINE directive that recursively
unrolls all loops in the hierarchy.

Loops

When you apply directives to loops, Vivado HLS applies the directive to all objects within the scope
of the loop. For example, if you apply a LOOP_MERGE directive to a loop, Vivado HLS applies the
directive to any sub-loops within the loop but not to the loop itself.

Note : The loop to which the directive is applied is not merged with siblings at the same level of
hierarchy.

Arrays

When you apply directives to arrays, Vivado HLS applies the directive to the scope that contains the
array.

Regions

When you apply directives to regions, Vivado HLS applies the directive to the entire scope of the
region. A region is any area enclosed within two braces. For example:

{
    the scope of the region
}

Note : You can apply directives to a region in the same way you apply directives to functions and loops.

Insert Directive

- To apply a directive, select an object in the Directive tab (in our case, sel), right-click on it and choose
Insert Directive... option to open the Vitis HLS Directives Editor dialog box.

Figure 2.36: Insert Directive option

- In the Vitis HLS Directives Editor dialog box click on the Directive drop-down menu and select the
appropriate directive.

The drop-down menu shows only directives that you can add to the selected object or scope. For example,
if you select an array object, the drop-down menu does not show the PIPELINE directive, because an array
cannot be pipelined.

Figure 2.37: Vitis HLS Directives Editor dialog box

In the Vitis HLS Directives Editor dialog box, you can specify either of the following Destination settings:

Source File - Vivado HLS inserts the directive directly into the C source file as a pragma.

Directive File - Vivado HLS inserts the directive as a Tcl command into the file directives.tcl in the
solution directory.

The following table describes the advantages and disadvantages of both approaches.

Table 2.2: Tcl Commands vs Pragmas

Directive Format: Directives file (Tcl Command)
    Advantages: Each solution has independent directives. This approach is ideal for design exploration.
    If any solution is re-synthesized, only the directives specified in that solution are applied.
    Disadvantages: If the C source files are transferred to a third party or archived, the directives.tcl
    file must be included. The directives.tcl file is required if the results are to be re-created.

Directive Format: Source Code (Pragma)
    Advantages: The optimization directives are embedded into the C source code. Ideal when the C source
    files are shipped to a third party as C IP. No other files are required to recreate the same results.
    Useful approach for directives that are unlikely to change, such as TRIPCOUNT and INTERFACE.
    Disadvantages: If the optimization directives are embedded in the code, they are automatically applied
    to every solution when re-synthesized.

- In the Vitis HLS Directive Editor dialog box:

choose INTERFACE as a directive for sel input port in the Directive drop-down list
leave selected Directive File as a Destination
choose ap_none I/O protocol as a mode (optional) option in the Options section
leave all other parameters unchanged and click OK.

Figure 2.38: Vitis HLS Directives Editor dialog box with necessary settings

- Apply the same directive with the same settings to the pwm_o output port. The Directive tab, with the
directives applied to the selected ports, should look as shown in the following figure.

Figure 2.39: Vitis HLS Directives Editor dialog box with necessary settings

The following table lists the optimization directives provided by Vivado HLS.

Table 2.3: Vivado HLS Optimization Directives

Directive - Description

ALLOCATION - Specifies a limit for the number of operations, cores or functions used. This can force the
sharing of hardware resources and may increase latency.
ARRAY_MAP - Combines multiple smaller arrays into a single large array to help reduce block RAM resources.
ARRAY_PARTITION - Partitions large arrays into multiple smaller arrays or into individual registers, to improve
access to data and remove block RAM bottlenecks.
ARRAY_RESHAPE - Reshapes an array from one with many elements to one with greater word width. Useful for
improving block RAM accesses without using more block RAM.
DATA_PACK - Packs the data fields of a struct into a single scalar with a wider word width.
DATAFLOW - Enables task-level pipelining, allowing functions and loops to execute concurrently. Used to
minimize interval.
DEPENDENCE - Used to provide additional information that can overcome loop-carried dependencies and allow
loops to be pipelined (or pipelined with lower intervals).
EXPRESSION_BALANCE - Allows automatic expression balancing to be turned off.
FUNCTION_INSTANTIATE - Allows different instances of the same function to be locally optimized.
INLINE - Inlines a function, removing all function hierarchy. Used to enable logic optimization across
function boundaries and improve latency/interval by reducing function call overhead.
INTERFACE - Specifies how RTL ports are created from the function description.
LATENCY - Allows a minimum and maximum latency constraint to be specified.
LOOP_FLATTEN - Allows nested loops to be collapsed into a single loop with improved latency.
LOOP_MERGE - Merges consecutive loops to reduce overall latency, increase sharing and improve logic
optimization.
LOOP_TRIPCOUNT - Used for loops which have variable bounds. Provides an estimate for the loop iteration
count. This has no impact on synthesis, only on reporting.
OCCURRENCE - Used when pipelining functions or loops, to specify that the code in a location is executed
at a lesser rate than the code in the enclosing function or loop.
PIPELINE - Reduces the initiation interval by allowing the concurrent execution of operations within a
loop or function.
PROTOCOL - Specifies a region of the code to be a protocol region. A protocol region can
be used to manually specify an interface protocol.
RESET - Used to add or remove reset on a specific state variable (global or static).
RESOURCE - Specifies that a specific library resource (core) is used to implement a variable (array, arithmetic
operation or function argument) in the RTL.
STREAM - Specifies that a specific array is to be implemented as a FIFO or RAM memory channel
during dataflow optimization.
UNROLL - Unrolls for-loops to create multiple independent operations rather than a single collection of
operations.

Applying Optimization Directives to Global Variables

Directives can only be applied to scopes or objects within a scope. As such, they cannot be directly applied to
global variables which are declared outside the scope of any function.

To apply a directive to a global variable, apply the directive to the scope (function, loop or region) where the
global variable is used. Open the Directive tab on a scope where the variable is used, apply the directive and
enter the variable name manually in the Directives Editor.
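
As a rough illustration of the pragma form, the sketch below places the directive in the function that uses the global and names the global through variable=. The array gain_table and the function scale are hypothetical names chosen for this example, and the positional complete keyword follows the Vivado HLS spelling (Vitis HLS also accepts type=complete). A standard C++ compiler ignores the pragma.

```cpp
// Hypothetical global array, declared outside the scope of any function.
int gain_table[4] = {1, 2, 3, 4};

int scale(int sample, int index) {
// The directive lives in the scope that uses the global variable
// and refers to it by name.
#pragma HLS ARRAY_PARTITION variable=gain_table complete dim=1
    return sample * gain_table[index & 3];  // index wrapped to 0..3
}
```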

Applying Optimization Directives to Class Objects

Optimization directives can also be applied to objects or scopes defined in a class. The difference is typically
that classes are defined in a header file. Use one of the following actions to open the header file:

From the Explorer pane, open the Includes folder, navigate to the header file, and double-click the file
to open it.

From within the C source, place the cursor over the header file name (in the #include statement), hold
down the Ctrl key, and click the header file to open it.

The Directive tab is then populated with the objects in the header file and directives can be applied.

Important : Care should be taken when applying directives as pragmas to a header file. The file might be used
by other people or used in other projects. Any directives added as a pragma are applied each time the header
file is included in a design.

Applying Optimization Directives to Templates

To apply optimization directives manually on templates when using Tcl commands, specify the template
arguments and class when referring to class methods. For example, given the following C++ code:

template <uint32 SIZE, uint32 RATE>

void DES10<SIZE,RATE>::calcRUN() {...}

The following Tcl command is used to specify the INLINE directive on the function:

set_directive_inline DES10<SIZE,RATE>::calcRUN
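
For comparison, a pragma placed in the method body applies to every instantiation of the template. The sketch below is only modelled on the DES10 example above: the class layout, the total member and the method body are assumptions made for illustration, and unsigned is used in place of the uint32 type, which is not defined here.

```cpp
// Hypothetical class template modelled on the DES10 example in the text;
// the member and the method body are placeholders.
template <unsigned SIZE, unsigned RATE>
class DES10 {
public:
    unsigned total = 0;
    void calcRUN();
};

template <unsigned SIZE, unsigned RATE>
void DES10<SIZE, RATE>::calcRUN() {
// Pragma alternative to the set_directive_inline Tcl command.
#pragma HLS INLINE
    total = SIZE * RATE;
}
```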

The following section outlines the various optimizations and techniques you can use to direct Vivado HLS to
produce a micro-architecture that satisfies the desired performance and area goals.

2.4.4.1 Clock, Reset, and RTL Output

Clock Frequency

For C and C++ designs only a single clock is supported. The same clock is applied to all functions in the design.

For SystemC designs, each SC_MODULE may be specified with a different clock. To specify multiple clocks
in a SystemC design, use the -name option of the create_clock command to create multiple named clocks and
use the CLOCK directive or pragma to specify which function contains the SC_MODULE to be synthesized
with the specified clock. Each SC_MODULE can only be synthesized using a single clock. Clocks may be
distributed through functions, such as when multiple clocks are connected from the top-level ports to individual
blocks, but each SC_MODULE can only be sensitive to a single clock.

The clock period, in ns, is set in Solution -> Solution Settings... (main Vivado HLS menu option).
Vivado HLS uses the concept of a clock uncertainty to provide a user-defined timing margin. Using the clock
frequency and device target information, Vivado HLS estimates the timing of operations in the design, but it
cannot know the final component placement and net routing: these operations are performed by logic synthesis
of the output RTL. As such, Vivado HLS cannot know the exact delays.

To calculate the clock period used for synthesis, Vivado HLS subtracts the clock uncertainty from the clock
period, as shown in the following figure.

Figure 2.40: Clock Period and Margin

This provides a user-specified margin to ensure downstream processes, such as logic synthesis and place & route,
have enough timing margin to complete their operations. If the FPGA device is mostly utilized, the placement
of cells and routing of nets to connect the cells might not be ideal and might result in a design with larger than
expected timing delays. For a situation such as this, an increased timing margin ensures Vivado HLS does not
create a design with too much logic packed into each clock cycle and allows RTL synthesis to satisfy timing in
cases with less than ideal placement and routing options.

By default, the clock uncertainty is 12.5% of the cycle time. The value can be explicitly specified beside the
clock period.
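
The arithmetic described above can be sketched as follows. This is only a worked example of the subtraction stated in the text, using the 12.5% default quoted above; the function names are made up for illustration and are not part of any tool API.

```cpp
// Effective period targeted by the scheduler:
// clock period minus clock uncertainty.
double effective_period_ns(double period_ns, double uncertainty_ns) {
    return period_ns - uncertainty_ns;
}

// Default uncertainty quoted in the text: 12.5% of the clock period.
double default_uncertainty_ns(double period_ns) {
    return 0.125 * period_ns;
}
```

For a 10 ns clock, this gives an uncertainty of 1.25 ns and an effective period of 8.75 ns for scheduling.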

Vivado HLS aims to satisfy all constraints: timing, throughput, latency. However, even if a constraint cannot be
satisfied, Vivado HLS always outputs an RTL design.

If the timing constraints inferred by the clock period cannot be met Vivado HLS issues message SCHED-644,
as shown below, and creates a design with the best achievable performance.

@W [SCHED-644] Max operation delay (<operation_name> 2.39ns) exceeds the effective cycle time

Even if Vivado HLS cannot satisfy the timing requirements for a particular path, it still achieves timing on
all other paths. This behavior allows you to evaluate if higher optimization levels or special handling of those
failing paths by downstream logic synthesis can pull in and ultimately satisfy the timing.

Important : It is important to review the constraint report after synthesis to determine if all constraints are met.
The fact that Vivado HLS produces an output design does not guarantee the design meets all performance
constraints. Review the Performance Estimates section of the design report.

The option relax_ii_for_timing of the config_schedule command can be used to change the default timing
behavior. When this option is specified, Vivado HLS automatically relaxes the II for any pipeline directive when
it detects a path is failing to meet the clock period. This option only applies to cases where the PIPELINE
directive is specified without an II value (and an II=1 is implied). If the II value is explicitly specified in the
PIPELINE directive, the relax_ii_for_timing option has no effect.

A design report is generated for each function in the hierarchy when synthesis completes and can be viewed in
the solution reports folder. The worst-case timing for the entire design is reported as the worst case in each
function report. There is no need to review every report in the hierarchy.

If the timing violations are too severe to be further optimized and corrected by downstream processes, review
the techniques for specifying an exact latency and specifying exact implementation cores before considering a
faster target technology.

Reset

Typically the most important aspect of RTL configuration is selecting the reset behavior. When discussing reset
behavior it is important to understand the difference between initialization and reset.

Initialization Behavior

In C, variables defined with the static qualifier and those defined in the global scope are by default initialized
to zero. Optionally, these variables may be assigned a specific initial value. For these types of variables, the
initial value in the C code is assigned at compile time (at time zero) and never again. In both cases, the same
initial value is implemented in the RTL.

During RTL simulation the variables are initialized with the same values as the C code.

The same variables are initialized in the bitstream used to program the FPGA. When the device powers
up, the variables will start in their initialized state.

The variables start with the same initial state as the C code. However, there is no way to force a return to this
initial state. To return to their initial state the variables must be implemented with a reset.

Controlling the Reset Behavior

The reset port is used in an FPGA to return the registers and block RAM connected to the reset port to an
initial value any time the reset signal is applied. The presence and behavior of the RTL reset port is controlled
using the config_rtl configuration.

To access the config_rtl configuration:

In the Vivado HLS Explorer pane, select Solution2, right-click on it and choose the Solution Settings...
option,

In the Solution Settings (solution2) dialog box, select the General option and click the Add... button to
open the RTL Configurations dialog box,

In the RTL Configurations dialog box, click the Command drop-down list and choose the config_rtl
command,

Leave all other settings unchanged and click OK,

In the Solution Settings (solution2) dialog box, click OK.

Important : In our design, we do not need to use the reset port, so this config_rtl configuration is not needed for
our design!

The reset settings include the ability to set the polarity of the reset and whether the reset is synchronous or
asynchronous but more importantly it controls, through the reset option, which registers are reset when the
reset signal is applied.

Important : When AXI4 interfaces are used on a design the reset polarity is automatically changed to active-Low
irrespective of the setting in the config_rtl configuration. This is required by the AXI4 standard.

The reset option has four settings:

none - No reset is added to the design.

control - This is the default and ensures all control registers are reset. Control registers are those used
in state machines and to generate I/O protocol signals. This setting ensures the design can immediately
start its operation state.

state - This option adds a reset to control registers (as in the control setting) plus any registers or
memories derived from static and global variables in the C code. This setting ensures static and global
variables initialized in the C code are reset to their initialized value after the reset is applied.

all - This adds a reset to all registers and memories in the design.

Finer grain control over reset is provided through the RESET directive. If a variable is a static or global, the
RESET directive is used to explicitly add a reset, or the variable can be removed from those being reset by
using the RESET directive's off option. This can be particularly useful when static or global arrays are present
in the design.

Initializing and Resetting Arrays

Arrays are often defined as static variables, which implies all elements are initialized to zero, and arrays are
typically implemented as block RAM. When reset options state or all are used, all arrays implemented
as block RAM are forced to return to their initialized state after reset. This may result in two very undesirable
attributes in the RTL design:

Unlike a power-up initialization, an explicit reset requires the RTL design to iterate through each address
in the block RAM to set the value: this can take many clock cycles if the number of elements, N, is large,
and requires more area resources to implement.

A reset is added to every array in the design.

To prevent placing reset logic onto every such block RAM and incurring the cycle overhead to reset all elements
in the RAM:

Use the default control reset mode and use the RESET directive to specify individual static or global
variables to be reset.

Alternatively, use reset mode state and remove the reset from specific static or global variables using the
off option to the RESET directive.
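
Both uses of the RESET directive can be sketched in pragma form as shown below. The function accumulate and its variables are hypothetical and serve only to show where the pragmas go; a standard C++ compiler ignores them, so the code runs as ordinary C++.

```cpp
// Hypothetical accumulator; all names are illustrative.
int accumulate(int sample) {
    static int total = 0;        // initialized once, at time zero
    static int history[8] = {};  // static array, typically a memory in RTL
// Explicitly add a reset for total (useful with reset mode "control").
#pragma HLS RESET variable=total
// Exclude the array from reset (useful with reset mode "state").
#pragma HLS RESET variable=history off
    history[total & 7] = sample;  // slot index wrapped to 0..7
    total += sample;
    return total;
}
```

In practice only one of the two pragmas is usually needed, depending on which reset mode is selected for the solution.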

RTL Output

Various characteristics of the RTL output by Vivado HLS can be controlled using the config_rtl
configuration:

Specify the type of FSM encoding used in the RTL state machines.

Add an arbitrary comment string, such as a copyright notice, to all RTL files using the -header option.

Specify a unique name with the prefix option which is added to all RTL output file names.

Force the RTL ports to use lower case names.

The default FSM coding style is onehot. Other possible options are auto, binary, and gray. If you select auto,
Vivado HLS implements the style of encoding using the onehot default, but Vivado Design Suite might extract
and re-implement the FSM style during logic synthesis. If you select any other encoding style (binary, onehot,
gray), the encoding style cannot be re-optimized by Xilinx logic synthesis tools.

The names of the RTL output files are derived from the name of the top-level function for synthesis. If different
RTL blocks are created from the same top-level function, the RTL files will have the same name and cannot
be combined in the same RTL project. The prefix option allows RTL files generated from the same top-level
function (and which by default have the same name as the top-level function) to be easily combined in the same
directory. The lower_case_name option ensures that only lower case names are used in the output RTL. This
option ensures the IO protocol ports created by Vivado HLS, such as those for AXI interfaces, are specified as
s_axis_<port>_tdata in the final RTL rather than the default port name of s_axis_<port>_TDATA.

2.4.4.2 Optimizing for Throughput

Use the following optimizations to improve throughput or reduce the initiation interval.

Task Pipelining

Pipelining allows operations to happen concurrently. The task does not have to complete all operations before
it begins the next operation. Pipelining is applied to functions and loops. The throughput improvements in
function pipelining are shown in the following figure.

Figure 2.41: Function Pipelining Behavior

Without pipelining the function reads an input every 3 clock cycles and outputs a value after 2 clock cycles.
The function has an Initiation Interval (II) of 3 and a latency of 2. With pipelining, a new input is read every
cycle (II=1) with no change to the output latency or resources used.

Loop pipelining allows the operations in a loop to be implemented in a concurrent manner as shown in the
following figure. In this figure, (a) shows the default sequential operation where there are 3 clock cycles between
each input read (II=3), and it requires 8 clock cycles before the last output write is performed.

In the pipelined version of the loop shown in (b), a new input sample is read every cycle (II=1) and the final
output is written after only 4 clock cycles: substantially improving both the II and latency while using the same
hardware resources.
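
The cycle counts quoted above can be reproduced with a simple model. The sketch below assumes a loop of 3 iterations whose body has 3 single-cycle stages (read, compute, write); these values are not stated in the text and are chosen only because they match the 8-cycle and 4-cycle figures. The function name is made up for illustration.

```cpp
// Cycles that elapse before the write stage of the last iteration begins,
// for n iterations started every ii cycles, each with depth one-cycle stages.
int cycles_before_last_write(int n, int ii, int depth) {
    return (n - 1) * ii + (depth - 1);
}
```

Under these assumptions, II=3 gives (3 - 1) * 3 + 2 = 8 cycles and II=1 gives (3 - 1) * 1 + 2 = 4 cycles.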

Figure 2.42: Loop Pipelining Behavior

Tasks are pipelined using the PIPELINE directive. The initiation interval defaults to 1 if not specified but may
be explicitly specified.

Pipelining is applied to the specified task, not to the hierarchy below: all loops in the hierarchy below are
automatically unrolled. Any sub-functions in the hierarchy below the specified task must be pipelined individually.
If the sub-functions are pipelined, the pipelined tasks above them can take advantage of the pipeline performance.
Conversely, any sub-function below the pipelined task that is not pipelined may be the limiting factor in the
performance of the pipeline.

There is a difference in how pipelined functions and loops behave:

In the case of functions, the pipeline runs forever and never ends.

In the case of loops, the pipeline executes until all iterations of the loop are completed.
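
A pipelined loop can be sketched in pragma form as follows. The function scale_by_two, its fixed length of 8 and the loop label are hypothetical; the pragma requests II=1 for the loop body and is ignored by a standard C++ compiler, so the function behaves identically in C simulation.

```cpp
// Hypothetical example loop; names and sizes are illustrative.
void scale_by_two(const int in[8], int out[8]) {
SCALE_LOOP:
    for (int i = 0; i < 8; i++) {
// Request a new loop iteration every clock cycle.
#pragma HLS PIPELINE II=1
        out[i] = in[i] * 2;  // read, compute, write
    }
}
```

Here the pipeline runs until all 8 iterations have completed, matching the loop behavior described above.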

Partitioning Arrays to Improve Pipelining

Pipelining increases the throughput of the system, but sometimes the existing data interfaces do not have sufficient
data throughput to deliver all the necessary data to the data processing system. In this case the pipelined
system runs below its potential and the effect of pipelining is limited. This issue is typically caused by
arrays. Arrays are implemented as block RAM which only has a maximum of two data ports. This can limit the
throughput of a read/write (or load/store) intensive algorithm. The bandwidth can be improved by splitting the
array (a single block RAM resource) into multiple smaller arrays (multiple block RAMs), effectively increasing
the number of ports.

Arrays are partitioned using the ARRAY_PARTITION directive. Vivado HLS provides three types of array
partitioning, as shown in the following figure. The three styles of partitioning are:

block - The original array is split into equally sized blocks of consecutive elements of the original array.

cyclic - The original array is split into equally sized blocks interleaving the elements of the original array.

complete - The default operation is to split the array into its individual elements. This corresponds to
resolving a memory into registers.

Figure 2.43: Array Partitioning

For block and cyclic partitioning the factor option specifies the number of arrays that are created. In the
preceding figure, a factor of 2 is used, that is, the array is divided into two smaller arrays. If the number of
elements in the array is not an integer multiple of the factor, the final array has fewer elements.
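
The difference between the block and cyclic styles can be modelled as a mapping from an element index to the partition that holds it. The sketch below is only an illustrative model of the descriptions above, not tool code; the function names are made up, and the block size is assumed to be the element count divided by the factor, rounded up, so that the final array is the one with fewer elements.

```cpp
// Partition holding element i of an n-element array, block style:
// consecutive chunks of ceil(n / factor) elements.
int block_partition(int i, int n, int factor) {
    int chunk = (n + factor - 1) / factor;  // ceil(n / factor)
    return i / chunk;
}

// Partition holding element i, cyclic style:
// elements are interleaved across the partitions.
int cyclic_partition(int i, int factor) {
    return i % factor;
}
```

With 8 elements and a factor of 2, block partitioning places elements 0 to 3 in the first array and 4 to 7 in the second, while cyclic partitioning places even indices in the first array and odd indices in the second.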

When partitioning multi-dimensional arrays, the dimension parameter is used to specify which dimension is
partitioned. The following example code is used to show how the dimension parameter affects the
partitioning:

void example (...) {

int my_array[10][6][4];
...
}

The example demonstrates how partitioning dimension 3 results in 4 separate arrays and partitioning dimension
1 results in 10 separate arrays. If zero is specified as the dimension, all dimensions are partitioned.

my_array[10][6][4] -> ARRAY_PARTITION, mode=complete, partition dimension = 3 ->
    my_array_0[10][6]
    my_array_1[10][6]
    my_array_2[10][6]
    my_array_3[10][6]

my_array[10][6][4] -> ARRAY_PARTITION, mode=complete, partition dimension = 1 ->
    my_array_0[6][4]
    my_array_1[6][4]
    my_array_2[6][4]
    my_array_3[6][4]
    my_array_4[6][4]
    my_array_5[6][4]
    my_array_6[6][4]
    my_array_7[6][4]
    my_array_8[6][4]
    my_array_9[6][4]

my_array[10][6][4] -> ARRAY_PARTITION, mode=complete, partition dimension = 0 ->
    10x6x4 = 240 registers
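
In pragma form, the dimension is selected with the dim option. The sketch below is a hypothetical function built around the my_array declaration from the example; the function name, arguments and body are assumptions made for illustration, and the positional complete keyword follows the Vivado HLS spelling. A standard C++ compiler ignores the pragma.

```cpp
// Hypothetical function using the my_array shape from the example.
int sum_last_dim(int a, int b) {
    int my_array[10][6][4] = {};
// dim=3 splits the last dimension into 4 arrays of [10][6];
// dim=1 would give 10 arrays of [6][4]; dim=0 gives 240 registers.
#pragma HLS ARRAY_PARTITION variable=my_array complete dim=3
    for (int k = 0; k < 4; k++) {
        my_array[a][b][k] = k + 1;
    }
    int sum = 0;
    for (int k = 0; k < 4; k++) {
        sum += my_array[a][b][k];  // 4 reads, one per partitioned array
    }
    return sum;
}
```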

The config_array_partition configuration determines how arrays are automatically partitioned based on the
number of elements. This configuration is accessed through the Vivado HLS menu Solution -> Solution
Settings -> General -> Add -> config_array_partition.

The partition thresholds can be adjusted and partitioning can be fully automated with the throughput_driven
option. When the throughput_driven option is selected, Vivado HLS automatically partitions arrays to achieve
the specified throughput.

Loop Unrolling to Improve Pipelining

By default loops are kept rolled in Vivado HLS. That is to say that the loops are treated as a single entity: all
operations in the loop are implemented using the same hardware resources for each iteration of the loop.

Vivado HLS provides the ability to unroll or partially unroll for-loops using the UNROLL directive.

The following figure shows both the powerful advantages of loop unrolling and the implications that must be
considered when unrolling loops. This example assumes the arrays a[i], b[i] and c[i] are mapped to block RAMs.

This example shows how easy it is to create many different implementations by the simple application of loop
unrolling.

Figure 2.44: Loop Unrolling Details

Rolled Loop - When the loop is rolled, each iteration is performed in a separate clock cycle. This
implementation takes four clock cycles, only requires one multiplier and each block RAM can be a single-
port block RAM.

Partially Unrolled Loop - In this example, the loop is partially unrolled by a factor of 2. This
implementation requires two multipliers and dual-port RAMs to support two reads or writes to each
RAM in the same clock cycle. This implementation does however only take 2 clock cycles to complete:
half the initiation interval and half the latency of the rolled loop version.

Unrolled Loop - In the fully unrolled version all loop operations can be performed in a single clock cycle.
This implementation however requires four multipliers. More importantly, this implementation requires
the ability to perform 4 reads and 4 write operations in the same clock cycle. Because a block RAM only
has a maximum of two ports, this implementation requires the arrays to be partitioned.

To perform loop unrolling, you can apply the UNROLL directive to individual loops in the design. Alternatively,
you can apply the UNROLL directive to a function, which unrolls all loops within the scope of the function.

If a loop is completely unrolled, all operations are performed in parallel, provided data dependencies allow. If
operations in one iteration of the loop require the result from a previous iteration, they cannot execute in
parallel but will execute as soon as the data is available. A completely unrolled loop results in multiple copies
of the logic in the loop body.

Partial loop unrolling does not require the unroll factor to be an integer factor of the maximum iteration
count. Vivado HLS adds an exit check to ensure partially unrolled loops are functionally identical to the
original loop. For example, given the following code:


for(int i = 0; i < N; i++) {

a[i] = b[i] + c[i];
}

Loop unrolling by a factor of 2 effectively transforms the code to look like the following example, where the
break construct is used to ensure the functionality remains the same:

for(int i = 0; i < N; i += 2) {
    a[i] = b[i] + c[i];
    if(i+1 >= N) break;
    a[i+1] = b[i+1] + c[i+1];
}

Because N is a variable, Vivado HLS may not be able to determine its maximum value (it could be driven from
an input port). If you know that the unrolling factor, 2 in this case, is an integer factor of the maximum iteration
count N, the skip_exit_check option removes the exit check and associated logic. The effect of unrolling can
now be represented as:

for(int i = 0; i < N; i += 2) {
    a[i] = b[i] + c[i];
    a[i+1] = b[i+1] + c[i+1];
}

This helps minimize the area and simplify the control logic.
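
As a minimal sketch of how this is written in the source code, the function below applies the UNROLL pragma with
a factor of 2 and the skip_exit_check option. The function name vec_add and the fixed trip count N = 8 are
assumptions made for illustration only. An ordinary C++ compiler ignores the HLS pragma, so the same code can be
used in C simulation.

```cpp
#include <cassert>

// Illustrative trip count (an assumption): even, so a factor of 2 divides it.
const int N = 8;

// Element-wise addition; vec_add is a hypothetical name.
void vec_add(const int b[N], const int c[N], int a[N]) {
    // skip_exit_check is only safe when the factor divides the trip count.
    assert(N % 2 == 0);
    for (int i = 0; i < N; i++) {
// Request two copies of the loop body per iteration, without the exit check.
#pragma HLS UNROLL factor=2 skip_exit_check
        a[i] = b[i] + c[i];
    }
}
```

If N were odd, the skip_exit_check option would have to be dropped so that the exit check is kept.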

2.4.4.3 Optimizing for Latency

To reduce the latency (the delay in data processing) of the RTL design produced by Vivado HLS synthesis,
you can use the following optimization directives:

Latency Constraints

Loop Merging

Loop Flattening

Latency Constraints

Vivado HLS supports the use of a latency constraint on any scope. Latency constraints are specified using the
LATENCY directive.

When a maximum and/or minimum LATENCY constraint is placed on a scope, Vivado HLS tries to ensure all
operations in that scope complete within the range of clock cycles specified.

The LATENCY directive applied to a loop specifies the required latency for a single iteration of the loop. It
specifies the latency for the loop body, as the following example shows:

for (int i=0; i<N; i++) {

#pragma HLS latency max=10
..Loop Body...
}

In this example, the LATENCY directive specifies that one execution of the loop body must take no more
than 10 clock cycles.

If the intention is to limit the total latency of all loop iterations, the latency directive should be applied to a
region that encompasses the entire loop, as in this example:


Region_Loop: {
#pragma HLS latency max=10
for (int i=0; i<N; i++)
{
..Loop Body...
}
}

In this case, even if the loop is unrolled, the latency directive sets a maximum limit on all loop operations.

If Vivado HLS cannot meet a maximum latency constraint, it relaxes the latency constraint and tries to achieve
the best possible result.

If a minimum latency constraint is set and Vivado HLS can produce a design with a lower latency than the
minimum required, it inserts dummy clock cycles to meet the minimum latency.
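
The sketch below shows a minimum and a maximum latency constraint placed together on a function scope. The
function name sum_samples, the array length LEN and the cycle counts 4 and 40 are arbitrary values assumed for
illustration; they are not taken from the modulator design.

```cpp
#include <cassert>

const int LEN = 16;  // illustrative array length (an assumption)

// Sums LEN samples; sum_samples is a hypothetical name.
int sum_samples(const int in[LEN]) {
// Ask for a function latency of at least 4 and at most 40 clock cycles.
#pragma HLS LATENCY min=4 max=40
    int acc = 0;
    for (int i = 0; i < LEN; i++) {
        acc += in[i];
    }
    return acc;
}
```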

Loop Merging

All rolled loops imply and create at least one state in the design FSM. When there are multiple sequential loops
it can create additional unnecessary clock cycles and prevent further optimizations.

The following figure shows a simple example where a seemingly intuitive coding style has a negative impact on
the performance of the RTL design.

Figure 2.45: Loop Directives

In Figure 2.45, "Without Loop Merging" shows how, by default, each rolled loop in the design creates
at least one state in the FSM. Moving between those states costs clock cycles: assuming each loop iteration
requires one clock cycle, it takes a total of 11 cycles to execute both loops:

1 clock cycle to enter the add loop.

4 clock cycles to execute the add loop.

1 clock cycle to exit add and enter sub.

4 clock cycles to execute the sub loop.

1 clock cycle to exit the sub loop.

For a total of 11 clock cycles.

In this simple example it is obvious that an else branch in the ADD loop would also solve the issue but in a
more complex example it may be less obvious and the more intuitive coding style may have greater advantages.


The LOOP_MERGE optimization directive is used to automatically merge loops. The LOOP_MERGE directive
seeks to merge all loops within the scope in which it is placed. In the above example, merging the loops
creates a control structure similar to that shown in (B) in the preceding figure, which requires only 6 clocks to
complete.

Merging loops allows the logic within the loops to be optimized together. In the example above, using a
dual-port block RAM allows the add and subtraction operations to be performed in parallel.
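
A minimal sketch of how the directive might be applied is given below. It assumes two consecutive four-iteration
loops, one adding and one subtracting, in the spirit of the add and sub loops discussed above; the function name
add_sub and its argument list are hypothetical and are not copied from the figure.

```cpp
#include <cassert>

// Hypothetical function with two consecutive loops over the same index range.
void add_sub(const int b[4], const int c[4], const bool d[4], int a[4]) {
// Ask Vivado HLS to merge the consecutive loops in this function scope.
#pragma HLS LOOP_MERGE
    for (int i = 3; i >= 0; i--) {   // add loop
        if (d[i]) a[i] = b[i] + c[i];
    }
    for (int i = 3; i >= 0; i--) {   // sub loop
        if (!d[i]) a[i] = b[i] - c[i];
    }
}
```

Because both loops have the same bounds and write different elements in each iteration, the result is the same
whether the loops run one after the other or as a single merged loop.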

Loop Flattening

In a similar manner to the consecutive loops discussed in the previous section, it requires additional clock cycles
to move between rolled nested loops. It requires one clock cycle to move from an outer loop to an inner loop
and from an inner loop to an outer loop.

The following example illustrates how, if no care is taken, entering and exiting the inner loop costs an
additional 200 clock cycles over the 100 iterations of the outer loop (2 cycles per outer iteration).

void func (int a, int b, int c, int d)
{
...
outer_loop: while(j<100) {
inner_loop: while(i<6) { // 1 cycle to enter inner
...
LOOP_BODY
...
} // 1 cycle to exit inner
}
...
}

Vivado HLS provides the set_directive_loop_flatten command to allow labeled perfect and semi-perfect nested
loops to be flattened, removing the need to re-code for optimal hardware performance and reducing the number
of cycles it takes to perform the operations in the loop.

Perfect loop nest - only the innermost loop has loop body content, there is no logic specified between
the loop statements and all the loop bounds are constant.

Semi-perfect loop nest - only the innermost loop has loop body content, there is no logic specified
between the loop statements but the outermost loop bound can be a variable.

For imperfect loop nests, where the inner loop has variable bounds or the loop body is not exclusively inside
the inner loop, designers should try to restructure the code, or unroll the loops in the loop body to create a
perfect loop nest.
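
As an illustration, the sketch below shows a perfect loop nest (constant bounds, all logic in the innermost loop)
with the LOOP_FLATTEN pragma placed in the innermost loop body. The function name increment_all and the bounds
ROWS and COLS are assumptions chosen only for this example.

```cpp
#include <cassert>

const int ROWS = 4;  // illustrative constant bounds (assumptions)
const int COLS = 6;

// Adds 1 to every element; increment_all is a hypothetical name.
void increment_all(int m[ROWS][COLS]) {
    for (int r = 0; r < ROWS; r++) {        // outer loop: no logic of its own
        for (int c = 0; c < COLS; c++) {    // inner loop: holds the whole body
// Flatten this loop together with the loop enclosing it.
#pragma HLS LOOP_FLATTEN
            m[r][c] += 1;
        }
    }
}
```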

2.4.4.4 Optimizing for Area

To reduce the hardware resources needed to implement the RTL design generated by Vivado HLS,
you can use the following optimization directives:

Bit-Width Narrowing

Function Inlining

Array Mapping

Array Reshaping

Resource Allocation


Bit-Width Narrowing

The bit-widths of the variables in the C function directly impact the size of the storage elements and operators
used in the RTL implementation. If a variable only requires 12 bits but is specified as an integer type (32-bit),
larger and slower 32-bit operators are used, reducing the number of operations that can be
performed in a clock cycle and potentially increasing initiation interval and latency.

Use the appropriate precision for the data types.

Confirm the size of any arrays that are to be implemented as RAMs or registers. Any over-sized
elements waste hardware resources.

Pay special attention to multiplications, divisions, modulus or other complex arithmetic operations. If
these variables are larger than they need to be, they negatively impact both area and performance.
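
The following sketch illustrates the idea for a 12-bit sample. It assumes the arbitrary-precision type ap_uint
from the ap_int.h header shipped with Vivado HLS; when that header is not on the include path (for example with a
plain desktop compiler) it falls back to the narrowest standard types, which is only a stand-in for simulation.
The names sample_t, product_t, scale and HAVE_AP_INT are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Use the HLS arbitrary-precision types when the header is available.
#if defined(__has_include)
#  if __has_include(<ap_int.h>)
#    include <ap_int.h>
#    define HAVE_AP_INT 1
#  endif
#endif

#ifdef HAVE_AP_INT
typedef ap_uint<12> sample_t;   // exactly 12 bits in hardware
typedef ap_uint<24> product_t;  // 12 x 12 bits needs at most 24 bits
#else
// Fallback for a plain C++ compiler: narrowest standard types that fit.
typedef uint16_t sample_t;
typedef uint32_t product_t;
#endif

// Multiplies two 12-bit values; scale is a hypothetical name.
product_t scale(sample_t x, sample_t gain) {
    return x * gain;
}
```

With the 12-bit types the multiplier is 12 x 12 bits rather than 32 x 32 bits.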

Function Inlining

Function inlining removes the function hierarchy. A function is inlined using the INLINE directive.

Inlining a function may improve area by allowing the components within the function to be better shared or
optimized with the logic in the calling function. This type of function inlining is also performed automatically
by Vivado HLS. Small functions are automatically inlined.

Inlining allows function sharing to be better controlled. For functions to be shared, they must be used within
the same level of hierarchy. In this code example, function top calls f1 twice and calls function fsub once.

void fsub (int p, int q)
{
    int q1 = q + 10;
    f1(p,q1); // the third instance of f1 function
    ...
}

void top (int a, int b, int c, int d)
{
    ...
    f1(a,b); // the first instance of f1 function
    f1(a,c); // the second instance of f1 function
    fsub(a,d);
    ...
}

Inlining function fsub and using the ALLOCATION directive to specify that only 1 instance of function f1 is used
results in a design which has only one instance of function f1: one-third the area of the example above.

void fsub (int p, int q)
{
#pragma HLS INLINE
    int q1 = q + 10;
    f1(p,q1);
    ...
}

void top (int a, int b, int c, int d)
{
#pragma HLS ALLOCATION instances=f1 limit=1 function
    ...
    f1(a,b);
    f1(a,c);
    fsub(a,d);
    ...
}

The INLINE directive optionally allows all functions below the specified function to be recursively inlined by
using the recursive option. If the recursive option is used on the top-level function, all function hierarchy in the
design is removed.

The INLINE off option can be applied to functions to prevent them from being inlined. This option
may be used to prevent Vivado HLS from automatically inlining a function.


The INLINE directive is a powerful way to substantially modify the structure of the code without actually
performing any modifications to the source code and provides a very powerful method for architectural exploration.
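
The recursive and off options can be sketched as follows. All function names here (saturate, add_offset, stage,
inline_demo_top) are hypothetical: stage requests recursive inlining, so the small helper it calls is dissolved
into it, while saturate is marked INLINE off so that it is kept as a separate level of hierarchy.

```cpp
#include <cassert>

// Kept as its own block: INLINE off prevents automatic inlining.
int saturate(int x) {
#pragma HLS INLINE off
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

// Small helper called from stage().
int add_offset(int x, int k) {
    return x + k;
}

// The recursive option also inlines the functions called below this one.
int stage(int a, int b) {
#pragma HLS INLINE recursive
    return add_offset(a, b) * 2;
}

// Hypothetical top-level function tying the pieces together.
int inline_demo_top(int a, int b) {
    return saturate(stage(a, b));
}
```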

Array Mapping

When there are many small arrays in the C code, mapping them into a single larger array typically reduces
the number of block RAMs required.

Each array is mapped into a block RAM. The basic block RAM unit provided in an FPGA is 18K. If many small
arrays do not use the full 18K, a better use of the block RAM resources is to map many of the small arrays into
a larger array. If an array is larger than 18K, it is automatically mapped into multiple 18K units. In
the synthesis report, review Utilization Report -> Details -> Memory for a complete understanding of
the block RAMs in your design.

The ARRAY_MAP directive supports two ways of mapping small arrays into a larger one:

Horizontal mapping - corresponds to creating a new array by concatenating the original arrays.
Physically, this gets implemented as a single array with more elements.

Vertical mapping - corresponds to creating a new array by concatenating the original words in the
array. Physically, this gets implemented as a single array with a larger bit-width.

Horizontal Array Mapping

The following code example has two arrays that would result in two RAM components.

void func (...) {

int8 array1[M];
int12 array2[N];
...

loop_1: for (i=0; i<M; i++) {

array1[i] = ...;
array2[i] = ...;
...
}
...
}

Arrays array1 and array2 can be combined into a single array, specified as array3 in the following example:

void func (...) {

int8 array1[M];
int12 array2[N];

#pragma HLS ARRAY_MAP variable=array1 instance=array3 horizontal

#pragma HLS ARRAY_MAP variable=array2 instance=array3 horizontal
...

loop_1: for (i=0; i<M; i++) {

array1[i] = ...;
array2[i] = ...;
...
}
...
}

In this example, the ARRAY_MAP directive transforms the arrays as shown in the following figure.


Figure 2.46: Horizontal Array Mapping

When using horizontal mapping, the smaller arrays are mapped into a larger array. The mapping starts at
location 0 in the larger array and follows the order in which the commands are specified. In the Vivado HLS GUI,
this is based on the order the arrays are specified using the menu commands. In the Tcl environment, this is
based on the order the commands are issued.

When you use the horizontal mapping shown in Figure 2.46, the implementation in the block RAM appears as
shown in the following figure.

Figure 2.47: Memory for Horizontal Mapping
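
The address arithmetic behind horizontal mapping can be modeled in plain C++. The sketch below is only a
software model written for illustration, not code produced by the tool: array1 is mapped first and therefore
starts at location 0, and array2 follows at offset M. The sizes M and N and the accessor names array1_at and
array2_at are assumptions, and int is used for every element for simplicity.

```cpp
#include <cassert>

const int M = 4;  // illustrative sizes (assumptions)
const int N = 3;

// Software model of the merged array: M + N elements.
int array3[M + N];

// array1 is mapped first, so its elements start at location 0.
int& array1_at(int i) {
    assert(i >= 0 && i < M);
    return array3[i];
}

// array2 follows array1, so its elements are offset by M.
int& array2_at(int i) {
    assert(i >= 0 && i < N);
    return array3[M + i];
}
```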

Vertical Array Mapping

In vertical mapping, arrays are concatenated to produce an array with a higher bit-width. Vertical mapping
is applied using the vertical option of the ARRAY_MAP directive. The following code shows the same example
as before with vertical mapping applied.

void func (...) {

int8 array1[M];
int12 array2[N];

#pragma HLS ARRAY_MAP variable=array2 instance=array3 vertical

#pragma HLS ARRAY_MAP variable=array1 instance=array3 vertical
...

loop_1: for (i=0;i<M;i++) {

array1[i] = ...;
array2[i] = ...;
...
}
...
}

The structure of the array3 array, which is the result of vertically mapping the array1 and array2 arrays, is shown
in Figure 2.48.


Figure 2.48: Vertical Array Mapping

In vertical mapping, the arrays are concatenated in the order specified by the command, with the first array
starting at the LSB and the last array specified ending at the MSB. After vertical mapping, the newly formed
array is implemented in a single block RAM component as shown in the following figure.

Figure 2.49: Memory for Vertical Mapping
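
The bit-level layout that results from vertical mapping can be modeled as follows. This is a software sketch under
stated assumptions: array2 (12 bits) is named first in the directives above, so it occupies bits 11..0, and array1
(8 bits) occupies bits 19..12 of a 20-bit word. Unsigned types are used to keep the bit manipulation simple, and
the helper names pack_word, unpack_array1 and unpack_array2 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs one 8-bit element of array1 and one 12-bit element of array2
// into a single 20-bit word (held in a 32-bit integer in this model).
uint32_t pack_word(uint8_t a1, uint16_t a2) {
    assert(a2 < (1u << 12));              // array2 elements are 12 bits wide
    return (uint32_t(a1) << 12) | a2;     // array1 above array2
}

// Extracts bits 19..12 (the array1 element).
uint8_t unpack_array1(uint32_t w) {
    return uint8_t((w >> 12) & 0xFFu);
}

// Extracts bits 11..0 (the array2 element).
uint16_t unpack_array2(uint32_t w) {
    return uint16_t(w & 0xFFFu);
}
```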

Array Reshaping

The ARRAY_RESHAPE directive combines ARRAY_PARTITION with the vertical mode of ARRAY_MAP
and is used to reduce the number of block RAMs while still allowing the beneficial attribute of partitioning:
parallel access to the data.

Given the following example code:

void func (...) {

int array1[N];
int array2[N];
int array3[N];

#pragma HLS ARRAY_RESHAPE variable=array1 block factor=2 dim=1

#pragma HLS ARRAY_RESHAPE variable=array2 cyclic factor=2 dim=1
#pragma HLS ARRAY_RESHAPE variable=array3 complete dim=1
...
}

The ARRAY_RESHAPE directive transforms the arrays into the form shown in the following figure.


Figure 2.50: Array Reshaping

The ARRAY_RESHAPE directive allows more data to be accessed in a single clock cycle. In cases where more
data can be accessed in a single clock cycle, Vivado HLS may automatically unroll any loops consuming this data,
if doing so will improve the throughput. The loop can be fully or partially unrolled to create enough hardware to
consume the additional data in a single clock cycle. This feature is controlled using the config_unroll command
and the option tripcount_threshold. In the following example, any loops with a tripcount of less than 16 will
be automatically unrolled if doing so improves the throughput.

config_unroll -tripcount_threshold 16
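
The sketch below combines a cyclic reshape by a factor of 2 with a loop unrolled by the same factor, so that both
elements stored in one reshaped word can be consumed together. The function name sum_reshaped, the local buffer
buf and the size N are assumptions made for illustration.

```cpp
#include <cassert>

const int N = 8;  // illustrative size (an assumption)

// Sums a local copy of the input; sum_reshaped is a hypothetical name.
int sum_reshaped(const int in[N]) {
    int buf[N];
// Cyclic reshaping by 2: buf becomes N/2 words, each holding two elements.
#pragma HLS ARRAY_RESHAPE variable=buf cyclic factor=2 dim=1
    for (int i = 0; i < N; i++) {
        buf[i] = in[i];
    }
    int acc = 0;
    for (int i = 0; i < N; i++) {
// Two elements arrive per word, so unroll by the same factor.
#pragma HLS UNROLL factor=2
        acc += buf[i];
    }
    return acc;
}
```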

Resource Allocation

During synthesis Vivado HLS performs the following basic tasks:

First, elaborates the C, C++ or SystemC source code into an internal database containing operators.

The operators represent operations in the C code such as additions, multiplications, array reads, and
writes.

Then, maps the operators on to cores which implement the hardware operations.

Cores are the specific hardware components used to create the design (such as adders, multipliers, pipelined
multipliers, and block RAM).

Control is provided over each of these steps, allowing you to control the hardware implementation at a fine level
of granularity.

Limiting the Number of Operators

Explicitly limiting the number of operators to reduce area may be required in some cases: the default operation
of Vivado HLS is to first maximize performance. Limiting the number of operators in a design is a useful
technique to reduce the area, because it forces the operations to be shared.

The ALLOCATION directive allows you to limit how many operators, cores, or functions are used in a
design. For example, suppose a design called mac_unit has 317 multiplications but the FPGA only has 256 multiplier
resources (DSP48s). The ALLOCATION directive shown below directs Vivado HLS to create a design with a
maximum of 256 multiplication (mul) operators:


int32 mac_unit (int16 d[317]) {
    static int32 mac;
    int i;
#pragma HLS ALLOCATION instances=mul limit=256 operation
    for (i=0; i<317; i++) {
#pragma HLS UNROLL
        mac += mac * d[i];
    }
    return mac;
}

You can use the type option to specify whether the ALLOCATION directive limits operations, cores, or functions.
The following table lists all the operations that can be controlled using the ALLOCATION directive.

Table 2.4: Vivado HLS Operators

Operator    Description
add         Integer Addition
ashr        Arithmetic Shift-Right
dadd        Double-precision floating point addition
dcmp        Double-precision floating point comparison
ddiv        Double-precision floating point division
dmul        Double-precision floating point multiplication
drecip      Double-precision floating point reciprocal
drem        Double-precision floating point remainder
drsqrt      Double-precision floating point reciprocal square root
dsub        Double-precision floating point subtraction
dsqrt       Double-precision floating point square root
fadd        Single-precision floating point addition
fcmp        Single-precision floating point comparison
fdiv        Single-precision floating point division
fmul        Single-precision floating point multiplication
frecip      Single-precision floating point reciprocal
frem        Single-precision floating point remainder
frsqrt      Single-precision floating point reciprocal square root
fsub        Single-precision floating point subtraction
fsqrt       Single-precision floating point square root
icmp        Integer Compare
lshr        Logical Shift-Right
mul         Multiplication
sdiv        Signed Divider
shl         Shift-Left
srem        Signed Remainder
sub         Subtraction
udiv        Unsigned Division
urem        Unsigned Remainder

Controlling the Hardware Cores

When synthesis is performed, Vivado HLS uses the timing constraints specified by the clock, the delays specified
by the target device, together with any directives specified by you, to determine which core is used to implement
the operators. For example, to implement a multiplier operation, Vivado HLS could use the combinational
multiplier core or a pipelined multiplier core.

The cores which are mapped to operators during synthesis can be limited in the same manner as the operators.
Instead of limiting the total number of multiplication operations, you can choose to limit the number of
combinational multiplier cores, forcing any remaining multiplications to be performed using pipelined multipliers (or
vice versa). This is performed by setting the ALLOCATION directive type option to core.

The RESOURCE directive is used to explicitly specify which core to use for specific operations. In the following
example, the directive informs Vivado HLS to use a 2-stage pipelined multiplier for the multiplication that
produces variable c. It is left to Vivado HLS to decide which core to use for variable d.

int func (int a, int b) {

int c, d;

#pragma HLS RESOURCE variable=c latency=2

c = a*b;
d = a*c;

return d;
}

In the following example, the RESOURCE directive specifies that the add operation for variable temp is
implemented using the AddSub_DSP core. This ensures that the operation is implemented using a DSP48
primitive in the final design. By default, add operations are implemented using LUTs.

void apint_arith(int16 inA, int16 inB, int17 *out1) {

int17 temp;
#pragma HLS RESOURCE variable=temp core=AddSub_DSP
temp = inB + inA;
*out1 = temp;
}

The following table lists the cores used to implement standard RTL logic operations (such as add, multiply, and
compare).

Table 2.5: Functional Cores

Core          Description
AddSub        This core is used to implement both adders and subtractors.
AddSubnS      N-stage pipelined adder or subtractor. Vivado HLS determines how many pipeline stages are
              required.
AddSub_DSP    This core ensures that the add or sub operation is implemented using a DSP48 (using the
              adder or subtractor inside the DSP48).
DivnS         N-stage pipelined divider.
DSP48         Multiplications with bit-widths that allow implementation in a single DSP48 macrocell. This
              can include pipelined multiplications and multiplications grouped with a pre-adder,
              post-adder, or both. This core can only be pipelined with a maximum latency of 4. Values above
              4 saturate at 4.
Mul           Combinational multiplier with bit-widths that exceed the size of a standard DSP48 macrocell.
              Note: Multipliers that can be implemented with a single DSP48 macrocell are mapped to
              the DSP48 core.
MulnS         N-stage pipelined multiplier with bit-widths that exceed the size of a standard DSP48
              macrocell. Note: Multipliers that can be implemented with a single DSP48 macrocell are mapped
              to the DSP48 core.
Mul_LUT       Multiplier implemented with LUTs.

The following table lists the cores used to implement storage elements, such as registers or memories.

Table 2.6: Storage Cores

Core               Description
FIFO               A FIFO. Vivado HLS determines whether to implement this in the RTL with a block RAM
                   or as distributed RAM.
FIFO_BRAM          A FIFO implemented with a block RAM.
FIFO_LUTRAM        A FIFO implemented as distributed RAM.
FIFO_SRL           A FIFO implemented with an SRL.
RAM_1P             A single-port RAM. Vivado HLS determines whether to implement this in the RTL with a
                   block RAM or as distributed RAM.
RAM_1P_BRAM        A single-port RAM implemented with a block RAM.
RAM_1P_LUTRAM      A single-port RAM implemented as distributed RAM.
RAM_2P             A dual-port RAM that allows read operations on one port and both read and write operations
                   on the other port. Vivado HLS determines whether to implement this in the RTL with a
                   block RAM or as distributed RAM.
RAM_2P_BRAM        A dual-port RAM implemented with a block RAM that allows read operations on one port
                   and both read and write operations on the other port.
RAM_2P_LUTRAM      A dual-port RAM implemented as distributed RAM that allows read operations on one port
                   and both read and write operations on the other port.
RAM_S2P_BRAM       A dual-port RAM implemented with a block RAM that allows read operations on one port
                   and write operations on the other port.
RAM_S2P_LUTRAM     A dual-port RAM implemented as distributed RAM that allows read operations on one port
                   and write operations on the other port.
RAM_T2P_BRAM       A true dual-port RAM with support for both read and write on both ports, implemented
                   with a block RAM.
ROM_1P             A single-port ROM. Vivado HLS determines whether to implement this in the RTL with a
                   block RAM or with LUTs.
ROM_1P_BRAM        A single-port ROM implemented with a block RAM.
ROM_nP_BRAM        A multi-port ROM implemented with a block RAM. Vivado HLS automatically determines
                   the number of ports.
ROM_1P_LUTRAM      A single-port ROM implemented with distributed RAM.
ROM_nP_LUTRAM      A multi-port ROM implemented with distributed RAM. Vivado HLS automatically
                   determines the number of ports.
ROM_2P             A dual-port ROM. Vivado HLS determines whether to implement this in the RTL with a
                   block RAM or as distributed ROM.
ROM_2P_BRAM        A dual-port ROM implemented with a block RAM.
ROM_2P_LUTRAM      A dual-port ROM implemented as distributed ROM.
XPM_MEMORY         Specifies the array is to be implemented with an UltraRAM. This core is only usable with
                   devices supporting UltraRAM blocks.

The RESOURCE directive uses the assigned variable as the target for the resource. If the assignment specifies
multiple identical operators, the code must be modified to ensure there is a single variable for each operator to
be controlled.
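
A minimal sketch of such a modification is shown below. It assumes an expression inA * inB * inC in which only the
first multiplication should use a 2-stage pipelined multiplier; the function name mul3 and the intermediate
variable tmp are hypothetical. Splitting the expression gives the first multiplication its own variable, which
the RESOURCE directive can then target.

```cpp
#include <cassert>

// Hypothetical function: only the first multiplication is constrained.
int mul3(int inA, int inB, int inC) {
    int tmp;
    int result;
// tmp names the first multiplier only, so the directive has a single target.
#pragma HLS RESOURCE variable=tmp latency=2
    tmp = inA * inB;     // controlled by the RESOURCE directive above
    result = tmp * inC;  // core choice is left to Vivado HLS
    return result;
}
```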

2.5 Verify the RTL Implementation


Post-synthesis verification is automated through the C/RTL co-simulation feature, which reuses the pre-synthesis
C test bench to perform verification on the output RTL.

C/RTL co-simulation uses the C test bench to automatically verify the RTL design. The verification process
consists of three phases:

1. The C simulation is executed and the inputs to the top-level function, or the Device-Under-Test (DUT),
are saved as input vectors.

2. The input vectors are used in an RTL simulation using the RTL created by Vivado HLS. The outputs
from the RTL are saved as output vectors.

3. The output vectors from the RTL simulation are applied to the C test bench, after the function for synthesis,
to verify the results are correct. The C test bench performs the verification of the results.

The following messages are output by Vivado HLS to show the progress of the verification.

C simulation:

[SIM-14] Instrumenting C test bench (wrapc)

[SIM-302] Generating test vectors(wrapc)

At this stage, since the C simulation was executed, any messages written by the C test bench are output in
the console window or log file.

RTL simulation:

[SIM-333] Generating C post check test bench

[SIM-12] Generating RTL test bench
[SIM-323] Starting Verilog simulation (Issued when Verilog is the RTL verified)
[SIM-322] Starting VHDL simulation (Issued when VHDL is the RTL verified)

At this stage, any messages from the RTL simulation are output in the console window or log file.

C test bench results checking:

[SIM-316] Starting C post checking

[SIM-1000] C/RTL co-simulation finished: PASS (If test bench returns a 0)
[SIM-4] C/RTL co-simulation finished: FAIL (If the test bench returns non-zero)

Figure 2.51 shows the RTL verification flow.


Figure 2.51: RTL Verification Flow

The following is required to use the C/RTL co-simulation feature successfully:

The test bench must be self-checking and return a value of 0 if the test passes or a non-zero value
if the test fails.

The correct interface synthesis options must be selected.

Any 3rd-party simulators must be available in the search path.

Any arrays or structs on the design interface cannot use certain optimization directives, or combinations of
optimization directives, that co-simulation does not support.

To verify that the RTL design produces the same results as the original C code, use a self-checking test bench to
execute the verification. The following code example shows the important features of a self-checking test bench:

int main () {
int ret=0;
...
// Execute (DUT) Function
...

// Write the output results to a file

...

// Check the results

ret = system("diff --brief -w output.dat output.golden.dat");

if (ret != 0) {
printf("Test failed !!!\n");
ret=1;
}
else {
printf("Test passed !\n");
}
...
return ret;
}

This self-checking test bench compares the results against known good results in the output.golden.dat file.

In the Vivado HLS design flow, the return value of function main() indicates the following:

Zero: Results are correct.

Non-zero value: Results are incorrect

Note: The test bench can return any non-zero value. A complex test bench can return different
values depending on the type of difference or failure. If the test bench returns a non-zero value after C
simulation or C/RTL co-simulation, Vivado HLS reports an error and simulation fails.

Constrain the return value to an 8-bit range for portability and safety, because the system environment
interprets the return value of the main() function.


If the test bench does not check the results but returns zero, Vivado HLS indicates that the simulation test
passed even though the results were not actually checked.
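
The points above can be sketched as a self-checking test bench that compares results in memory instead of calling
diff. Everything named here is an assumption for illustration: dut_double stands in for the synthesized function,
the input and golden values are invented, and run_test_bench is a hypothetical helper whose return value a real
main() would simply pass on.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for the function under test (hypothetical DUT).
int dut_double(int x) {
    return 2 * x;
}

// Runs the DUT on a few inputs and compares against golden values.
// Returns 0 on success, otherwise the mismatch count clamped to 255 so
// that it stays within an 8-bit exit-status range.
int run_test_bench() {
    const int n = 4;
    const int input[n]  = {0, 1, 5, -3};
    const int golden[n] = {0, 2, 10, -6};
    int mismatches = 0;
    for (int i = 0; i < n; i++) {
        int got = dut_double(input[i]);
        if (got != golden[i]) {
            std::printf("Mismatch at %d: got %d, expected %d\n", i, got, golden[i]);
            mismatches++;
        }
    }
    std::printf("%s\n", mismatches == 0 ? "Test passed!" : "Test failed!");
    return mismatches > 255 ? 255 : mismatches;
}
```

In an actual test bench, main() would end with return run_test_bench(); so that Vivado HLS sees 0 only when every
output matches.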

After ensuring that the preceding requirements are met, you can use C/RTL co-simulation to verify the RTL
design using Verilog or VHDL. The default simulation language is Verilog, but you can also specify VHDL.
While the default simulator is Vivado Simulator (XSim), you can use any of the following simulators to run
C/RTL co-simulation:

Vivado Simulator (XSim)

ModelSim simulator

VCS simulator (Linux only)

NC-Sim simulator (Linux only)

Riviera simulator (PC only)

2.5.1 Using C/RTL Co-Simulation


To perform C/RTL co-simulation from the GUI:

- In the main Vivado HLS toolbar menu, click the C/RTL Cosimulation button. This option opens the
simulation wizard window.

Figure 2.52: Run C/RTL Cosimulation button

Figure 2.53: C/RTL Co-simulation dialog box

- In the C/RTL Co-simulation dialog box set the following parameters:


choose Vivado XSIM in the drop down list

select VHDL
choose all in the Dump Trace drop down list
leave all other parameters unchanged and click OK.

Figure 2.54: C/RTL Co-simulation dialog box with set parameters

As the previous figure shows, the C/RTL Co-simulation dialog box has an Options section with the following
options:

Setup Only - This option creates all the files (wrappers, adapters, and scripts) required to run the
simulation but does not execute the simulator. The simulation can be run in the command shell from
within the appropriate RTL simulation folder <solution_name>/sim/<RTL>.

Optimizing Compile - This option ensures a high level of optimization is used to compile the C test
bench. Using this option increases the compile time but the simulation executes faster.

Input Arguments - This option allows the specification of any arguments required by the test bench.

Dump Trace - This option generates a trace file for every function, which is saved to the <solution>/sim/<RTL>
folder. The drop-down menu allows you to select which signals are saved to the trace file. You can choose
to trace all signals in the design, trace just the top-level ports, or trace no signals. For details on using
the trace file, see the documentation for the selected RTL simulator.

Compiled Library Location - This option specifies the location of the compiled library for a third-party
RTL simulator.

Note : If you are simulating with a third-party RTL simulator and the design uses IP, you must use
an RTL simulation model for the IP before performing RTL simulation. To create or obtain the RTL
simulation model, contact your IP provider.

When you press the OK button in the C/RTL Co-simulation dialog box, the co-simulation process begins. The
co-simulation flow can be traced in the Vivado HLS Console window.

Vivado HLS executes the RTL simulation in the project sub-directory <SOLUTION>/sim/<RTL>, where

SOLUTION is the name of the solution.

RTL is the RTL type chosen for simulation.

Any files written by the C test bench during co-simulation and any trace files generated by the simulator are
written to this directory.

2.5.2 Analyzing RTL Simulations


Optionally, you can review the waveform from C/RTL cosimulation using the Open Wave Viewer... toolbar
button.

Figure 2.55: Open Wave Viewer toolbar button

To view RTL waveforms, you must select the following options before executing C/RTL cosimulation:

Verilog/VHDL Simulator Selection - Select Vivado XSIM.

Dump Trace - Select all or port.

When C/RTL cosimulation completes, the Open Wave Viewer toolbar button opens the RTL waveforms in
the Vivado IDE, see Figure 2.56.

Figure 2.56: Waveform Viewer window opened in Vivado IDE

Note : When you open the Vivado IDE using this method, you can only use the waveform analysis features,
such as zoom, pan, and waveform radix.


In the Waveform Viewer window, expand the Design Top Signals folder, find the sel[0:0] port (in the C
Inputs -> return(wire) folder) and the pwm_o[0:0] port (in the C Outputs -> return(wire) folder), and
expand them as well, see Figure 2.57. Zoom in a few times around the spot where the sel[0:0] port changes its
value from 0 to 1 and you will see the PWM signal period change. You can also notice the change of the duty cycle
of the PWM signal, as it is being modulated by the sine wave. When sel[0:0]=0, the period of the PWM signal is 3.5
times longer than when sel[0:0]=1, as expected.
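
The period comparison described above can also be done on sampled data instead of by eye. The following C++ sketch measures the distance between two rising edges of a one-bit waveform. The function names and the sample counts (7 and 2 samples per period) are illustrative assumptions chosen so that the ratio is 3.5; they are not values taken from the co-simulation.

```cpp
#include <cstddef>
#include <vector>

// Returns the number of samples between the first two rising edges (0 -> 1)
// of a sampled one-bit waveform, or 0 if fewer than two edges are present.
std::size_t period_in_samples(const std::vector<int>& wave) {
    bool have_first = false;
    std::size_t first_edge = 0;
    for (std::size_t i = 1; i < wave.size(); ++i) {
        const bool rising = (wave[i - 1] == 0) && (wave[i] == 1);
        if (!rising) continue;
        if (have_first) return i - first_edge;
        first_edge = i;
        have_first = true;
    }
    return 0;
}

// Builds an illustrative square wave: "high" samples at 1, then
// (period - high) samples at 0, repeated "cycles" times.
std::vector<int> square_wave(std::size_t period, std::size_t high,
                             std::size_t cycles) {
    std::vector<int> wave;
    for (std::size_t c = 0; c < cycles; ++c)
        for (std::size_t i = 0; i < period; ++i)
            wave.push_back(i < high ? 1 : 0);
    return wave;
}
```

Measuring rising edge to rising edge is used here because the duty cycle of a PWM signal changes from period to period while the period itself stays constant for a given sel value.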

RTL Simulations Results

Figure 2.57: Waveform Viewer window with cosimulation results

2.6 Package the RTL Implementation


The final step in the Vivado HLS flow is to export the RTL design as a block of Intellectual Property (IP) which
can be used by other tools in the Xilinx design flow.

The RTL design can be packaged into the following output formats:

Vivado IP (.zip)

Vivado IP for System Generator

In addition to the packaged output formats, the RTL files are available as standalone files (not part of a packaged
format) in the verilog and vhdl directories located within the implementation directory
<project_name>/<solution_name>/impl.

When Vivado HLS reports on the results of synthesis, it provides an estimation of the results expected after
RTL synthesis: the expected clock frequency, the expected number of registers, LUTs and block RAMs. These
results are estimations because Vivado HLS cannot know what exact optimizations RTL synthesis performs or
what the actual routing delays will be, and hence cannot know the final area and timing values.

Before exporting a design, you have the opportunity to execute logic synthesis and confirm the accuracy of
the estimates. The evaluate option invokes RTL synthesis during the export process and synthesizes the RTL
design to gates.

Note : The RTL synthesis option is provided to confirm the reported estimates. In most cases, these RTL results
are not included in the packaged IP.


For most export formats, the RTL synthesis is executed in the verilog or vhdl directories, but the results of
RTL synthesis are not included in the packaged IP.

2.6.1 Packaging IP using Vivado IP (.zip) Format


Upon completion of synthesis and RTL verification:

- Open the Export RTL dialog box by clicking the Export RTL toolbar button or choosing the Solution
-> Export RTL option from the main Vivado HLS menu.

Figure 2.58: Export RTL option

- In the Export RTL dialog box, choose the Vivado IP (.zip) option from the Export Format drop-down list
and fill in the IP Configuration section (as shown in the following figure).

Figure 2.59: Export RTL dialog box

The Configuration options allow the following identification tags to be embedded in the exported package.
These fields can be used to help identify the packaged RTL inside the Vivado IP Catalog.

The configuration information is used to differentiate between multiple instances of the same design when the
design is loaded into the IP Catalog. For example, if an implementation is packaged for the IP Catalog and then
a new solution is created and packaged as IP, the new solution by default has the same name and configuration
information. If the new solution is also added to the IP Catalog, the IP Catalog will identify it as an updated
version of the same IP and the last version added to the IP Catalog will be used.

An alternative method is to use the prefix option in the config_rtl configuration to rename the output design
and files with a unique prefix.

- In the IP Configuration section, provide the following configuration settings:

Vendor: so-logic

Library: hls

Version: 1.0

Description: An IP generated by Vitis HLS

Display Name: hls_modulator_v1.0

- In the Export RTL dialog box, click OK.

When you press the OK button in the Export RTL dialog box, Vivado HLS starts exporting the RTL model in the
chosen format.

After the packaging process is complete, the .zip file archive in the directory <project_name>/<solution_name>/impl/ip
can be imported into the Vivado IP Catalog and used in any Vivado design (RTL or IP Integrator).
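
The core name so-logic_hls_modulator_1_0 that appears later in this tutorial looks like a combination of the Vendor, Library, core name and Version fields. The following C++ sketch reproduces that pattern. The composition rule is inferred from the name alone and the helper ip_core_id is hypothetical, so treat it as an illustration rather than a description of how the tool builds the name.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: joins vendor, library, core name and version with
// underscores, replacing the dot in the version ("1.0" -> "1_0").
std::string ip_core_id(const std::string& vendor, const std::string& library,
                       const std::string& name, std::string version) {
    std::replace(version.begin(), version.end(), '.', '_');
    return vendor + "_" + library + "_" + name + "_" + version;
}
```

With the settings used in this tutorial, ip_core_id("so-logic", "hls", "modulator", "1.0") gives so-logic_hls_modulator_1_0. Changing any of the fields changes the identifier, which is one way to tell apart two exported versions of the same design.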

Important : In this tutorial we will only export the IP to the IP Catalog!

If you choose Vivado IP for System Generator format option, this package will be written to the
<project_name>/<solution_name>/impl/sysgen directory and will contain everything necessary to import
the design to System Generator.

A Vivado HLS generated System Generator package may be imported into System Generator using the following
steps:

1. Inside the System Generator design, right-click and use the XilinxBlockAdd option to instantiate a new block.

2. Scroll down the list in the dialog box and select Vivado HLS.

3. Double-click on the newly instantiated Vivado HLS block to open the Block Parameters dialog box.

4. Browse to the solution directory where the Vivado HLS block was exported. Using the example
<project_name>/<solution_name>/impl/sysgen, browse to the <project_name>/<solution_name> directory and
select apply.

Chapter 3

USING DEVELOPED IP CORE IN VIVADO DESIGN SUITE

This chapter shows how to integrate a custom IP into an ARM-based embedded system using the Xilinx Vivado IDE
and the Vitis tool.

Sozius Development Board

The main component of the Sozius development board is Zynq-7000 AP SoC.

The Zynq-7000 AP SoC is composed of two major functional blocks:

1. Processing System (PS) and

2. Programmable Logic (PL)

Since the existing LEDs and switches on the Sozius board are connected to the PS part of the Zynq FPGA, using them
would require programming the PS part of the Zynq FPGA, which is not the topic of this tutorial. It is the main topic
of the "Basic Embedded System Design" tutorial.

In our design we will program the PL part of the Zynq FPGA with a model that will be created using the IP Integrator
tool.

The PS part is also required to generate the clock signal for the Modulator HLS design, since the only reference clock
source on the Sozius board is connected to the PS part of the Zynq FPGA.

The properly configured PS part is described in the sozius_xz_lab_ps_bd component.

3.1 Create a New Project with Included Developed IP Core

First, a new project must be created.

Create a new project using the Vivado IDE New Project wizard and include the developed IP core (hls_modulator_v1.0)
in the new project.

Create a New Project

- Close the Vitis HLS tool and open Vivado IDE tool.


- In the Vivado IDE tool, create a new project, modulator_hls, targeting the Sozius development board, and
save it in the same directory where the Vivado HLS modulator project is saved. For details on how to create a
Vivado project, see Chapter 2.2 Creating a New Project in the Basic FPGA Tutorial.

- In the Vivado IDE, click the Settings command in the Project Manager section to open the Settings dialog
box.

- In the Settings dialog box, under the General section, change the Target language to VHDL instead of
Verilog.

Include Developed IP Core into the Project

- In the Settings dialog box, expand the IP option in the Project Settings list and select the Repository
command.

- In the IP Repository window, click the "+" icon to add the desired repository.

- In the IP Repositories dialog box, find the HLS modulator/solution2/impl/ip folder, where the required
so-logic_hls_modulator_1_0 IP core is stored, select it and click Select.

- In the Add Repository dialog box, click OK to add the selected IP core to the Repository Manager.

- In the Settings dialog box, click OK. The required so-logic_hls_modulator_1_0 IP core should
appear in the IP Catalog of your project.

- In the Flow Navigator, under the Project Manager, click the IP Catalog command to verify the presence
of the previously created IP in the IP Catalog. In the Search field, type the name of the IP core (in our case
hls_modulator_v1.0) and you should find it under the VITIS HLS IP section, see Figure 3.1.

Verify the Presence of the Developed IP Core in the IP Catalog

Figure 3.1: IP Catalog with added hls_modulator_v1.0 IP core

3.2 Create ARM-based Hardware Platform with Integrated Developed IP Core

This sub-chapter shows how to build the Zynq-7000 All Programmable (AP) SoC processor design "modulator_hls"
using the Vivado IDE and the Tcl programming interface.

In this sub-chapter, you will instantiate a few IPs in the IP Integrator tool and then stitch them together to
create an IP-based system design.

At the end, you will run the synthesis and implementation processes and generate the bitstream file.

Create modulator_sozius_arm_rtl.vhd and sozius_components_package.vhd Source Files

The following steps describe how to create an ARM-based hardware platform for the Sozius development board.

- First, we will create the modulator_sozius_arm_rtl.vhd and sozius_components_package.vhd files
using the Vivado text editor and save them into the working directory.

The modulator_sozius_arm_rtl.vhd file will hold the top-level module of our design, in which the Zynq PS
component configured for the Sozius development board will be instantiated.

The sozius_components_package.vhd file will contain the Sozius PS module component declaration.

The content of both files is presented in the text below.

modulator_sozius_arm_rtl.vhd

-- Make reference to libraries that are necessary for this file:

-- the first part is a symbolic name, the path is defined depending of the tools
-- the second part is a package name
-- the third part includes all functions from that package
-- Better for documentation would be to include only the functions that are necessary

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

library unisim;
use unisim.vcomponents.all;

library work;
use work.sozius_components_package.all;

entity modulator_sozius_arm is
port(
-- ps io
ps_ddr3_addr : inout std_logic_vector(14 downto 0);
ps_ddr3_ba : inout std_logic_vector(2 downto 0);
ps_ddr3_cas_n : inout std_logic;
ps_ddr3_ck_n : inout std_logic;
ps_ddr3_ck_p : inout std_logic;
ps_ddr3_cke : inout std_logic;
ps_ddr3_cs_n : inout std_logic;
ps_ddr3_dm : inout std_logic_vector( 3 downto 0);
ps_ddr3_dq : inout std_logic_vector(31 downto 0);
ps_ddr3_dqs_n : inout std_logic_vector( 3 downto 0);
ps_ddr3_dqs_p : inout std_logic_vector( 3 downto 0);
ps_ddr3_odt : inout std_logic;
ps_ddr3_ras_n : inout std_logic;
ps_ddr3_reset_n : inout std_logic;
ps_ddr3_we_n : inout std_logic;
ps_ddr_vrn : inout std_logic;
ps_ddr_vrp : inout std_logic;
ps_clk_i : inout std_logic;
ps_por_n_i : inout std_logic;
ps_srst_n_i : inout std_logic;
ps_phy_mdc_io : inout std_logic;
ps_phy_mdio_io : inout std_logic;
ps_phy_rx_clk_io : inout std_logic;
ps_phy_rx_ctrl_io : inout std_logic;

ps_phy_rxd_io : inout std_logic_vector(3 downto 0);
ps_phy_tx_clk_io : inout std_logic;
ps_phy_tx_ctrl_io : inout std_logic;
ps_phy_txd_io : inout std_logic_vector(3 downto 0);
ps_i2c_scl_io : inout std_logic;
ps_i2c_sda_io : inout std_logic;
ps_led_error_n_io : inout std_logic;
ps_led_front_n_io : inout std_logic_vector(1 downto 0);
ps_led_sdcard_n_io : inout std_logic;
ps_sw0_a_io : inout std_logic;
ps_sw0_b_io : inout std_logic;
ps_sw1_a_io : inout std_logic;
ps_sw1_b_io : inout std_logic;
ps_sw2_a_io : inout std_logic;
ps_sw2_b_io : inout std_logic;
ps_sw3_a_io : inout std_logic;
ps_sw3_b_io : inout std_logic;
ps_uart_rx_io : inout std_logic;
ps_uart_tx_io : inout std_logic;
ps_qspi_cs_n_io : inout std_logic;
ps_qspi_data_io : inout std_logic_vector(3 downto 0);
ps_qspi_clk_io : inout std_logic;
ps_sdio_clk_io : inout std_logic;
ps_sdio_cmd_io : inout std_logic;
ps_sdio_data_io : inout std_logic_vector(3 downto 0);
ps_usb_clk_io : inout std_logic;
ps_usb_data_io : inout std_logic_vector(7 downto 0);
ps_usb_dir_io : inout std_logic;
ps_usb_nxt_io : inout std_logic;
ps_usb_stp_io : inout std_logic
);
end entity;

architecture structural of modulator_sozius_arm is

-- Between architecture and begin is declaration area for types, signals and constants
-- Everything declared here will be visible in the whole architecture

-- declaration for fixed signal PL to PS

signal pl_clk0_s : std_logic;
signal pl_reset_n_s : std_logic;

-- ps signals
signal ps_mio_s : std_logic_vector(53 downto 0);

begin

-- instance of processor system PS

sozius_xz_lab_ps_bd_i: component sozius_xz_lab_ps_bd

port map (
ddr3_addr => ps_ddr3_addr,
ddr3_ba => ps_ddr3_ba,
ddr3_cas_n => ps_ddr3_cas_n,
ddr3_ck_n => ps_ddr3_ck_n,
ddr3_ck_p => ps_ddr3_ck_p,
ddr3_cke => ps_ddr3_cke,
ddr3_cs_n => ps_ddr3_cs_n,
ddr3_dm => ps_ddr3_dm,
ddr3_dq => ps_ddr3_dq,
ddr3_dqs_n => ps_ddr3_dqs_n,
ddr3_dqs_p => ps_ddr3_dqs_p,
ddr3_odt => ps_ddr3_odt,
ddr3_ras_n => ps_ddr3_ras_n,
ddr3_reset_n => ps_ddr3_reset_n,
ddr3_we_n => ps_ddr3_we_n,
fixed_io_ddr_vrn => ps_ddr_vrn,
fixed_io_ddr_vrp => ps_ddr_vrp,
fixed_io_mio => ps_mio_s,
fixed_io_ps_clk => ps_clk_i,
fixed_io_ps_porb => ps_por_n_i,
fixed_io_ps_srstb => ps_srst_n_i,
pl_uart_1_rxd => '0',
pl_uart_1_txd => open,
pl_spi_0_io0_i => '0',
pl_spi_0_io0_o => open,
pl_spi_0_io0_t => open,
pl_spi_0_io1_i => '0',
pl_spi_0_io1_o => open,
pl_spi_0_io1_t => open,
pl_spi_0_sck_i => '0',
pl_spi_0_sck_o => open,
pl_spi_0_sck_t => open,
pl_spi_0_ss1_o => open,
pl_spi_0_ss2_o => open,
pl_spi_0_ss_i => '0',
pl_spi_0_ss_o => open,
pl_spi_0_ss_t => open,
pl_iic_1_scl_i => '0',
pl_iic_1_scl_o => open,

pl_iic_1_scl_t => open,
pl_iic_1_sda_i => '0',
pl_iic_1_sda_o => open,
pl_iic_1_sda_t => open,
sdio_0_cdn => '1', -- pl_sd_cd_n_i,
usbind_0_port_indctl => open,
usbind_0_vbus_pwrfault => '1', -- pl_usb_fault_n_i,
usbind_0_vbus_pwrselect => open,
pl_clk0 => pl_clk0_s,
pl_reset_n => pl_reset_n_s
);

-- assignment of MIO to board names

ps_mio_s (53) <= ps_phy_mdio_io;

ps_mio_s (52) <= ps_phy_mdc_io;
ps_mio_s (51) <= ps_uart_tx_io;
ps_mio_s (50) <= ps_uart_rx_io;
ps_mio_s (49) <= ps_led_error_n_io;
ps_mio_s (48 downto 47) <= ps_led_front_n_io(1 downto 0);
ps_mio_s (46) <= ps_led_sdcard_n_io;
ps_mio_s (45 downto 42) <= ps_sdio_data_io;
ps_mio_s (41) <= ps_sdio_cmd_io;
ps_mio_s (40) <= ps_sdio_clk_io;
ps_mio_s (39) <= ps_usb_data_io(7);
ps_mio_s (38) <= ps_usb_data_io(6);
ps_mio_s (37) <= ps_usb_data_io(5);
ps_mio_s (36) <= ps_usb_clk_io;
ps_mio_s (35) <= ps_usb_data_io(3);
ps_mio_s (34) <= ps_usb_data_io(2);
ps_mio_s (33) <= ps_usb_data_io(1);
ps_mio_s (32) <= ps_usb_data_io(0);
ps_mio_s (31) <= ps_usb_nxt_io;
ps_mio_s (30) <= ps_usb_stp_io;
ps_mio_s (29) <= ps_usb_dir_io;
ps_mio_s (28) <= ps_usb_data_io(4);
ps_mio_s (27) <= ps_phy_rx_ctrl_io;
ps_mio_s (26 downto 23) <= ps_phy_rxd_io;
ps_mio_s (22) <= ps_phy_rx_clk_io;
ps_mio_s (21) <= ps_phy_tx_ctrl_io;
ps_mio_s (20 downto 17) <= ps_phy_txd_io;
ps_mio_s (16) <= ps_phy_tx_clk_io;
ps_mio_s (15) <= ps_i2c_sda_io;
ps_mio_s (14) <= ps_i2c_scl_io;
ps_mio_s (13) <= ps_sw3_b_io;
ps_mio_s (12) <= ps_sw3_a_io;
ps_mio_s (11) <= ps_sw2_b_io;
ps_mio_s (10) <= ps_sw2_a_io;
ps_mio_s (9) <= ps_sw1_b_io;
ps_mio_s (8) <= ps_sw1_a_io;
ps_mio_s (7) <= ps_sw0_b_io;
ps_mio_s (6) <= ps_qspi_clk_io;
ps_mio_s (5 downto 2) <= ps_qspi_data_io;
ps_mio_s (1) <= ps_qspi_cs_n_io;
ps_mio_s (0) <= ps_sw0_a_io;

end architecture;
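
The MIO assignments above are easy to get wrong by one bit, so a quick coverage check can help. The following C++ sketch transcribes the index ranges from the listing and verifies that every bit of the 54-bit ps_mio_s vector is assigned exactly once. It is a stand-alone consistency check written for this illustration, not part of the tutorial's design flow; the names MioRange, kMioRanges and covers_all_mio_bits_once are hypothetical.

```cpp
#include <array>
#include <vector>

// Inclusive bit range {high, low}; single bits use high == low.
struct MioRange { int high; int low; };

// Ranges transcribed from the ps_mio_s assignments in
// modulator_sozius_arm_rtl.vhd, in the order they appear.
const std::vector<MioRange> kMioRanges = {
    {53, 53}, {52, 52}, {51, 51}, {50, 50}, {49, 49}, {48, 47}, {46, 46},
    {45, 42}, {41, 41}, {40, 40}, {39, 39}, {38, 38}, {37, 37}, {36, 36},
    {35, 35}, {34, 34}, {33, 33}, {32, 32}, {31, 31}, {30, 30}, {29, 29},
    {28, 28}, {27, 27}, {26, 23}, {22, 22}, {21, 21}, {20, 17}, {16, 16},
    {15, 15}, {14, 14}, {13, 13}, {12, 12}, {11, 11}, {10, 10}, {9, 9},
    {8, 8},   {7, 7},   {6, 6},   {5, 2},   {1, 1},   {0, 0}};

// Returns true if every bit 0..53 is covered by exactly one range.
bool covers_all_mio_bits_once(const std::vector<MioRange>& ranges) {
    std::array<int, 54> hits{};  // per-bit assignment count, zero-initialized
    for (const MioRange& r : ranges) {
        if (r.low < 0 || r.high > 53 || r.low > r.high) return false;
        for (int bit = r.low; bit <= r.high; ++bit) ++hits[bit];
    }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

For the ranges listed in this tutorial the check passes, meaning no MIO bit is left unassigned or assigned twice.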

sozius_components_package.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

package sozius_components_package is

component sozius_xz_lab_ps_bd is
port (
pl_clk0 : out std_logic;
pl_reset_n : out std_logic;
ddr3_cas_n : inout std_logic;
ddr3_cke : inout std_logic;
ddr3_ck_n : inout std_logic;
ddr3_ck_p : inout std_logic;
ddr3_cs_n : inout std_logic;
ddr3_reset_n : inout std_logic;
ddr3_odt : inout std_logic;
ddr3_ras_n : inout std_logic;
ddr3_we_n : inout std_logic;
ddr3_ba : inout std_logic_vector ( 2 downto 0 );
ddr3_addr : inout std_logic_vector ( 14 downto 0 );
ddr3_dm : inout std_logic_vector ( 3 downto 0 );
ddr3_dq : inout std_logic_vector ( 31 downto 0 );
ddr3_dqs_n : inout std_logic_vector ( 3 downto 0 );

ddr3_dqs_p : inout std_logic_vector ( 3 downto 0 );
fixed_io_mio : inout std_logic_vector ( 53 downto 0 );
fixed_io_ddr_vrn : inout std_logic;
fixed_io_ddr_vrp : inout std_logic;
fixed_io_ps_srstb : inout std_logic;
fixed_io_ps_clk : inout std_logic;
fixed_io_ps_porb : inout std_logic;
sdio_0_cdn : in std_logic;
usbind_0_port_indctl : out std_logic_vector ( 1 downto 0 );
usbind_0_vbus_pwrselect : out std_logic;
usbind_0_vbus_pwrfault : in std_logic;
pl_iic_1_sda_i : in std_logic;
pl_iic_1_sda_o : out std_logic;
pl_iic_1_sda_t : out std_logic;
pl_iic_1_scl_i : in std_logic;
pl_iic_1_scl_o : out std_logic;
pl_iic_1_scl_t : out std_logic;
pl_spi_0_sck_i : in std_logic;
pl_spi_0_sck_o : out std_logic;
pl_spi_0_sck_t : out std_logic;
pl_spi_0_io0_i : in std_logic;
pl_spi_0_io0_o : out std_logic;
pl_spi_0_io0_t : out std_logic;
pl_spi_0_io1_i : in std_logic;
pl_spi_0_io1_o : out std_logic;
pl_spi_0_io1_t : out std_logic;
pl_spi_0_ss_i : in std_logic;
pl_spi_0_ss_o : out std_logic;
pl_spi_0_ss1_o : out std_logic;
pl_spi_0_ss2_o : out std_logic;
pl_spi_0_ss_t : out std_logic;
pl_uart_1_txd : out std_logic;
pl_uart_1_rxd : in std_logic
);
end component;

component sozius_xz_ps_bd is

Add modulator_sozius_arm_rtl.vhd and sozius_components_package.vhd Source Files into the Project

- Once the modulator_sozius_arm_rtl.vhd and sozius_components_package.vhd files are created, add them
to the "modulator_hls" project using the Flow Navigator Add Sources option.

Note : To create and add these modules, follow the steps explained in Chapter 2.4.1 Creating
a Module Using Vivado Text Editor, in the Basic FPGA Tutorial.

Configure Zynq PS Part to work on Sozius Development Board

Next, we must configure the Zynq PS part to work on the Sozius development board.

This includes a number of configuration steps, one of them being the proper configuration of the PS GPIO
module to connect to the LEDs and switches that are present on the Sozius board.

All these PS configuration steps can be done using the Vivado GUI, by creating a block design. Since this task
includes a lot of manual settings of the Zynq PS, a better approach is to do this manual configuration
only once and then create a Tcl script file that can be used in all future configurations of the Zynq PS part.

The Tcl script that should be used to correctly configure the Zynq PS to work on the Sozius board is
sozius_xz_lab_ps_bd.tcl.

This Tcl script file is too long to be shown in the tutorial, so ask your instructor for details.

- Execute the presented Tcl file in the Vivado IDE.

Execute Tcl File in the Vivado IDE

- Go to the Tcl Console window in the Vivado IDE, type the following command and press Enter:

source <path>/sozius_xz_lab_ps_bd.tcl

<path> stands for the full path to the folder where the sozius_xz_lab_ps_bd.tcl Tcl file is stored.


Figure 3.2: Tcl Console window

Block Diagram after Vivado Tcl Script Execution

After Vivado has finished the Tcl script execution, the created block diagram containing the Zynq PS will be
visible in the Vivado IDE.

Figure 3.3: Block diagram of Zynq PS configured to run on Sozius board

Add All the Necessary IPs into the Design Canvas

- The next step will be to add all the necessary IPs into the design canvas.

Note : For details on how to add the necessary IPs to the Vivado project, see "Chapter 11.1 IP
Integrator" in the "Basic FPGA Tutorial".

Add the following IPs into the design canvas:

hls_modulator_v1.0 (modulator_0 )

Binary Counter (c_counter_binary_0 )

ILA (Integrated Logic Analyzer) (ila_0 )

VIO (Virtual Input/Output) (vio_0 )

Utility Vector Logic (util_vector_logic_0 )

2x Constant (xconstant )


Block Diagram with Added Necessary IPs

After adding all the necessary IPs to the design canvas, it should look as shown in the
following figure.

Figure 3.4: IP Integrator design canvas with instantiated all the necessary IPs

The next step will be to make all the necessary IP re-customizations.

- Double-click on the Binary Counter (c_counter_binary_0) IP and in the Binary Counter (12.0)
Re-customization IP dialog box set the following parameters:

in the Basic tab:

set the Output Width value to 32, see Figure 3.5, and

in the Control tab:

enable the Clock Enable (CE) and Synchronous Clear (SCLR) options, see Figure 3.6, and click OK.
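
The Output Width setting determines how long the counter can run before it wraps around to zero. The following C++ sketch computes that interval as 2^width divided by the clock frequency. The counting clock frequency is not stated in this section, so the 100 MHz value used in the note below is an assumption for illustration only, and the helper name max_count_seconds is hypothetical.

```cpp
#include <cmath>

// Longest interval, in seconds, that a free-running counter of "width_bits"
// bits can count at "clk_hz" before it wraps around to zero.
double max_count_seconds(int width_bits, double clk_hz) {
    return std::ldexp(1.0, width_bits) / clk_hz;  // 2^width_bits / clk_hz
}
```

Under the assumed 100 MHz clock, a 32-bit counter wraps after about 42.9 seconds, whereas a 16-bit counter would wrap after about 0.66 ms. A wide counter therefore leaves ample headroom for measuring slow signals such as the PWM pulses in this design.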


Binary Counter (c_counter_binary_0 ) Re-customization - Basic Tab

Figure 3.5: Binary Counter (12.0) re-customization IP dialog box - Basic tab


Binary Counter (c_counter_binary_0 ) Re-customization - Control Tab

Figure 3.6: Binary Counter (12.0) re-customization IP dialog box - Control tab

- Double-click on the Utility Vector Logic (util_vector_logic_0) IP and in the Utility Vector Logic (2.0)
Re-customization IP dialog box set the following parameters:

change the C_OPERATION to not and

set the C_SIZE to 1, see Figure 3.7, and
click OK.


Utility Vector Logic (util_vector_logic_0 ) Re-customization

Figure 3.7: Utility Vector Logic (2.0) re-customization IP dialog box

- Double-click on the ILA IP and in the ILA (Integrated Logic Analyzer (6.2)) dialog box, in the General
Options, set the following parameters:

select Native as Monitor Type

set 2 as Number of Probes

enable the Capture Control option in the Trigger And Storage Settings section, as shown in
Figure 3.8.

ILA (Integrated Logic Analyzer) (ila_0) Re-customization - General Options Tab

Figure 3.8: ILA (Integrated Logic Analyzer (6.2)) Re-customize IP dialog box - General Options

and in the Probe Ports(0..7), set the following parameters:

set 32 bits as the Probe Width[1..4096] value of the PROBE0 probe, as shown in Figure 3.9, and
click OK.

ILA (Integrated Logic Analyzer) (ila_0) Re-customization - Probe Ports(0..1) Tab

Figure 3.9: ILA (Integrated Logic Analyzer (6.2)) Re-customize IP dialog box - Probe Ports(0..7)

- In case of 2x Constant (xconstant ) IP cores:

leave Constant (xconstant_0 ) IP core with default values, where Const Val value is 1, and
in the Constant (xconstant_1 ) IP core, change the Const Val value to be 0, see Figure 3.10.


Constant xconstant Re-customization

Figure 3.10: Constant Re-customize IP dialog box

VIO (Virtual Input/Output) (vio_0 ) and hls_modulator_v1.0 (modulator_0 ) Re-customizations

- In case of VIO (Virtual Input/Output) (vio_0 ) and hls_modulator_v1.0 (modulator_0 ) IP cores,

leave all parameters unchanged.

After all the necessary IP re-customizations, the IP Integrator design canvas should look as shown in
Figure 3.11.

Figure 3.11: IP Integrator design canvas after all necessary IP re-customizations

- The next step is to manually connect the IPs. Connect all the IPs in the same way as shown in
Figure 3.12.

Manually Connect the IPs

Figure 3.12: IP Integrator design canvas with manually connected IPs

Note : As you can see from Figure 3.12, we connected all the IPs, except:

all clock ports, and

ap_done, ap_idle and ap_ready ports of the hls_modulator_v1.0 ap_ctrl bus.

- In the IP Integrator window, click the Run Connection Automation link and the list of the ports/interfaces
that can use the Connection Automation feature will show up.

- In the Run Connection Automation dialog box enable All Automation (0 out of 4 selected) and click
OK.

After running the connection automation, the connections will be made and highlighted in the IP Integrator
design canvas, see Figure 3.13.


Run Connection Automation

Figure 3.13: IP Integrator design canvas after running connection automation

Validate and Save Block Design

- From the sidebar menu of the design canvas, run the IP subsystem design rule checks by clicking the Validate
Design button.

- In the Validate Design dialog box, click OK.

- At this point, you should save the IP integrator design.

Use the File -> Save Block Design command from the main menu to save the design.

Create and Add Constraints File

- Now it is time to create the constraints file for the Sozius board, sozius_xz_modulator_vio.xdc.

Open the Vivado text editor, copy your constraints code into it or write it directly, and save the constraints file in
your working directory.

- Add the sozius_xz_modulator_vio.xdc constraints file to the modulator_hls project using the Flow Navigator
Add Sources -> Add constraints option.

The complete sozius_xz_modulator_vio.xdc constraints file can be found in the text below.

Note : If you do not know how to add constraints files to the project, please see Sub-chapter 9.1 Creating XDC
File for XDC constraints file, in the Basic FPGA Tutorial.

sozius_xz_modulator_vio.xdc

# set properties for bitstream generation

set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
#set_property BITSTREAM.GENERAL.XADCENHANCEDLINEARITY ON [current_design]
#set_property BITSTREAM.GENERAL.XADCPOWERDOWN ENABLE [current_design]

# set configuration bank voltages

set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]

# set condition for power analyzer

set_operating_conditions -ambient_temp 50
set_operating_conditions -board small
set_operating_conditions -airflow 250

set_operating_conditions -heatsink low
set_operating_conditions -board_layers 12to15

# pins must be implemented !!

set_property PACKAGE_PIN V13 [get_ports pl_phy_reset_n_o]
set_property IOSTANDARD LVCMOS33 [get_ports pl_phy_reset_n_o]

Synthesize, Implement, Generate Bitstream File and Program Device

- Synthesize your design with the Run Synthesis option from the Flow Navigator / Synthesis (see "Sub-chapter
7.5.2 Run Synthesis", in the "Basic FPGA Tutorial" ).

- Implement your design with the Run Implementation option from the Flow Navigator / Implementation
(see "Sub-chapter 9.2.2 Run Implementation", in the "Basic FPGA Tutorial" ).

- Generate the bitstream file with the Generate Bitstream option from the Flow Navigator / Program and
Debug (see "Sub-Chapter 9.3 Generate Bitstream File", in the "Basic FPGA Tutorial" ).

- Program your Sozius device (see "Sub-Chapter 9.4 Program Device", in the "Basic FPGA Tutorial" ).

Note : Because the Sozius development board is used, it is necessary to arm the processor that is present on the
board. To arm the processor, you have to run some dummy application (e.g. hello world) using the SDK tool.
If you are not familiar with this procedure, open "Sub-Chapter 9.4 Program Device" in the "Basic FPGA
Tutorial", where you can find all of this information and more.

3.3 Debug the Design with Included Developed IP Core


Vivado Logic Analyzer is an integrated logic analyzer in the Vivado Design Suite.

In this chapter you will learn how to debug your ARM-based system using the Vivado logic analyzer, and you
will take advantage of its functions to debug your design and discover potential root causes of problems in it.

The next step in our design process is to set up the ILA core. When the debug cores are detected upon
refreshing a hardware device, the default dashboard for each debug core is automatically opened. The default
ILA Dashboard can be seen in Figure 3.14.

Set Up the ILA Core

Figure 3.14: ILA Dashboard

- Open the VIO dashboard by clicking the hw_vios tab and press blue + button in the middle of the VIO
dashboard to add the probes.

- In the Add Probes window, select both offered probes, Net and vio_0_probe_out0, and click OK, see
Figure 3.15.

Add Probes to the VIO Dashboard

Figure 3.15: Add Probes to the VIO window

Note : In the VIO Probes window, you can observe the rate of change of the modulator_0_pwm_o
(PWM) signal. You can change the frequency of the modulator_0_pwm_o signal by changing the value of
the vio_0_probe_out0 probe from 0 to 1 and from 1 to 0. The default vio_0_probe_out0 value is 0.

Add Probes to the Trigger Setup Window

- Return to the ILA dashboard by clicking the hw_ila_1 tab and in the Trigger Setup window press the blue
+ button in the middle to add the probes.

- In the Add Probes window select modulator_0_pwm_o_1 probe and click OK.

Set the Compare Values in the Trigger Setup Window

Now, when the ILA debug probe Net_1 is in the Trigger Setup window, we will create trigger conditions
and debug probe compare values.

- In the Trigger Setup window, leave == (equal) value in the Operator cell, [H] (Hexadecimal) value in
the Radix cell and set the Value parameter to be 0 (logical zero).

Figure 3.16: Changing the Compare Values in the Trigger Setup window

Change the Capture Mode in the ILA Settings

- In the ILA Settings window, change the Capture mode to be BASIC in the Capture Mode Settings
section.

Figure 3.17: ILA Settings window

Add Probes to the Capture Setup Window

- In the Capture Setup window press blue + button in the middle to add the probes.

- In the Add Probes window select only modulator_0_pwm_o_1 probe and click OK.

Set the Compare Values in the Capture Setup Window

- In the Capture Setup window, leave the == (equal) value in the Operator cell and the [B] (Binary) value in the
Radix cell, and set the Value parameter to F (1-to-0 transition).
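
To see what the F (1-to-0 transition) capture condition stores, consider the following cycle-level C++ sketch. It assumes that the counter's CE input is driven by the PWM signal and its SCLR input by the inverted PWM signal; this wiring is inferred from the IP settings used in this chapter and is not confirmed by the text. The sketch is not cycle-accurate with respect to the real hardware, and the function name capture_pulse_widths is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Cycle-level sketch (assumed wiring): CE = pwm, SCLR = NOT pwm.
// On every clock the ILA samples pwm and the counter output Q, then the
// counter updates. A sample is stored only on a 1-to-0 transition of pwm.
std::vector<std::uint32_t> capture_pulse_widths(const std::vector<int>& pwm) {
    std::vector<std::uint32_t> captured;
    std::uint32_t q = 0;  // counter output Q
    int prev_pwm = 0;     // pwm value at the previous clock
    for (int level : pwm) {
        // Capture condition "F": pwm was 1 and is now 0.
        if (prev_pwm == 1 && level == 0) captured.push_back(q);
        if (level == 0) {
            q = 0;  // SCLR active while pwm is low
        } else {
            ++q;    // CE active while pwm is high
        }
        prev_pwm = level;
    }
    return captured;
}
```

For the input sequence 0,1,1,1,0,0,1,1,0 the sketch stores the values 3 and 2, one per PWM pulse. Each stored sample is therefore the width of the pulse that has just ended, which is why the captured counter values trace out the modulating signal.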

Figure 3.18: Changing the Compare Values in the Capture Setup window

Run ILA Core Trigger

- After we have set all the ILA core parameters, we can run (arm) the ILA core trigger.

Once the ILA core captured data has been uploaded to the Vivado IDE, it is displayed in the Waveform
Viewer.

Note : After triggering the ILA core, in the waveform viewer change the c_counter_binary_0_Q[31:0] Waveform
Style from Digital to Analog. Your captured waveform should then look like the waveform in the
following figure.

Captured Waveform of the Sine Signal when vio_0_probe_out0 =0

Figure 3.19: Captured waveform of the sine signal, when sel=0

Change the Value of the vio_0_probe_out0 Signal and Arm the Trigger

- Return to the VIO Probes window and change the Value of the vio_0_probe_out0 signal from 0 to 1.

- Arm the trigger once more. After the ILA core triggers, your captured waveform should look like the
waveform in the following figure.

Captured Waveform of the Sine Signal when vio_0_probe_out0 =1

Figure 3.20: Captured waveform of the sine signal, when sel=1

By comparing the waveforms shown in Figures 3.19 and 3.20, we can observe that they differ in amplitude.
This is expected, since the waveforms actually represent the width of the PWM pulse generated by the
modulator module. The frequencies of the two generated PWM signals differ (one has a frequency of 1 Hz
and the other of 3.5 Hz), while the PWM pulse width measurement module always uses the same frequency for
measuring the duration of the PWM pulse. When the PWM frequency increases, the duration of the PWM pulse
decreases, which in turn decreases the amplitude of the output signal of the PWM pulse width measurement
module.
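The relationship described above can be sketched with a simplified model: the measured value is the number of ticks of a fixed measurement clock counted while the PWM output is high. The function name pulse_width_counts, its parameters, and the numeric values in the usage note are assumptions chosen for illustration and are not taken from the tutorial's source files. The actual PWM period in the design may relate to the 1 Hz and 3.5 Hz values through a constant factor, which would not change the ratio between the two cases.

```cpp
#include <cassert>

// Simplified model (an assumption for illustration, not the tutorial's code):
// number of measurement-clock ticks counted while the PWM output is high
// during one PWM period.
//   duty        - fraction of the PWM period spent high, in [0, 1]
//   pwm_freq_hz - frequency of the PWM signal being measured
//   meas_clk_hz - fixed frequency used by the pulse width measurement module
double pulse_width_counts(double duty, double pwm_freq_hz, double meas_clk_hz) {
    assert(duty >= 0.0 && duty <= 1.0);
    assert(pwm_freq_hz > 0.0 && meas_clk_hz > 0.0);
    const double pulse_seconds = duty / pwm_freq_hz;  // high time within one period
    return pulse_seconds * meas_clk_hz;               // ticks of the fixed clock
}
```

With the same duty cycle and measurement clock, calling pulse_width_counts with 3.5 Hz instead of 1 Hz gives a count that is smaller by a factor of 3.5 under this model, which is consistent with the lower amplitude seen when sel=1.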
