Video Processing with FPGAs
Course Prepared by
Digitronix Nepal
www.digitronixnepal.com
Video Processing Masterclass with FPGA
Section 1. Overview of High Level Synthesis Tool
Lecture 1 : HLS Introduction, VIVADO HLS Overview
Lecture 2 : HLS Design Flow, review of C/C++ on HLS and HLS Libraries
Lecture 3 : Lab1: Counter Design and Synthesizing in VIVADO HLS
Objective of the Section
After completing this section, you will be able to:
• Describe and explain High-Level Synthesis (HLS) tools
• Explain the HLS design flow, HLS constructs and libraries
• Design a counter (a basic circuit) in HLS
Lecture 1 : HLS Introduction, VIVADO HLS Overview
• Vivado HLS is a tool that synthesizes digital hardware directly from a high-level
description written in C or C++. From the C/C++ source it generates (synthesizes)
VHDL, Verilog and SystemC source.
• The defining aspect of HLS is that the designed functionality and its
hardware implementation are kept separate: in RTL-level design the description
inherently fixes the hardware architecture, whereas the C-based description
does not, and this provides great flexibility.
• The HLS process provides an integrated mechanism for generating and
assessing variations of the hardware implementation, making it easy and
convenient to find the best architecture.
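To make this separation concrete, here is a minimal sketch (the function dot8 and its loop label are hypothetical, not part of the course lab). The C code fixes only what is computed. The commented-out directives show two alternative architectures that could each be tried as a separate solution of the same project without touching the algorithm. A standard C++ compiler treats the file as ordinary code.

```cpp
#include <cassert>

// Hypothetical 8-element dot product. The C code defines only the result;
// the hardware architecture is chosen separately through directives.
int dot8(const int a[8], const int b[8]) {
    int acc = 0;
dot_loop:
    for (int i = 0; i < 8; i++) {
        // Enabling one of these directives changes the architecture, not the result:
        //   #pragma HLS PIPELINE II=1  -> start a new loop iteration every clock cycle
        //   #pragma HLS UNROLL         -> replicate the body (up to 8 multipliers in parallel)
        acc += a[i] * b[i];
    }
    return acc;
}
```

Because both variants compute the same value, the C result can be checked once and the hardware variants compared purely on area and latency.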
Need for High-Level Synthesis
• Algorithm-based approaches are becoming popular because they shorten
design time and time to market (TTM)
• Larger designs make hardware design and verification at the HDL level challenging
• The industry trend is moving toward hardware acceleration to enhance
performance and productivity
• CPU-intensive tasks can be offloaded to hardware accelerators in the FPGA
• Hardware accelerators take a lot of time to understand and design
• The Vivado HLS tool converts an algorithmic description written in C/C++
into a hardware description (RTL)
• Elevates the abstraction level from RTL to algorithms
• High-level synthesis is essential for maintaining design productivity for
large designs
High-Level Synthesis: HLS
➢High-Level Synthesis
• Converts C/C++/OpenCL source into RTL (VHDL/Verilog) or IP format.
• Extracts control and dataflow from the source code.
• The user can apply directives (pragmas) that control how the design is
implemented and optimized.
➢HLS:
• Projects are created from modules, integrating the library (header) files.
• Enables design optimization for resource utilization and latency.
[Figure: C, C++ and SystemC sources plus constraints/directives → Vivado HLS → VHDL, Verilog and SystemC → RTL export (IP-XACT, Sys Gen, PCore)]
VIVADO HLS Tool Overview:
• HLS was developed to implement complex signal-processing and
mathematical algorithms on FPGAs, which are quite complex to implement
directly in an HDL.
• HLS converts C/C++ source into HDL source, i.e. VHDL, Verilog or SystemC.
• HLS provides many libraries and functions for signal processing and math
computation.
Invoke Vivado HLS from the Windows Menu
The first step is to open or create a project.
Vivado HLS GUI
[Figure: Vivado HLS GUI with the Project Explorer pane, Information pane, Auxiliary pane and Console pane]
Vivado HLS Projects and Solutions
• Vivado HLS is project based
• A project specifies the C/C++/OpenCL code which will be synthesized
• Each project is based on one set of source code or main module, and the
project can have a user-defined name
• A project can contain multiple solutions
• Solutions are different implementations of the same source code
• Solutions are auto-named solution1, solution2, etc., or can have user-defined
names
• Each solution can have a different clock frequency, target board and
synthesis directives
[Figure: project level and solution level in the Project Explorer]
• Projects and solutions are stored in a hierarchical directory
structure
• Top-level is the project directory
• The local storage disk directory structure is identical to the structure
shown in the GUI project explorer (except for the source code location)
Reference: Xilinx
Vivado HLS Step 1: Create or Open a project
• Start a new project
• The GUI will start the project wizard to guide you through all the steps
Optionally, use the toolbar button to open a new project
• Open an existing project
• All results, reports and directives are automatically saved/remembered
• Use “Recent Project” menu for quick access
Lecture 2 : HLS Design Flow, review of C/C++ on HLS and HLS Libraries
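• The standard C/C++ libraries can be used in the C/C++ sources; however, only
code that maps to hardware is synthesizable (for example, dynamic memory
allocation and calls into the operating system, such as file access, cannot be synthesized).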
• Besides the standard C/C++ libraries, there are HLS libraries for
• Video Processing
• Signal Processing
• Mathematical calculations
• HLS is widely used to implement algorithms for signal processing,
machine vision, neural networks, etc.
HLS Design Flow:
[Figure: C, C++ and SystemC sources plus constraints/directives → Vivado HLS → VHDL, Verilog and SystemC → RTL export (IP-XACT, Sys Gen, PCore)]
Reference: Xilinx
The Key Attributes of C code
Example: a 4-tap FIR filter (fir)

    typedef int data_t;   // assumed types; the original example defines
    typedef int coef_t;   // data_t, coef_t and acc_t in a separate header
    typedef int acc_t;

    void fir (
        data_t *y,        // output sample
        coef_t c[4],      // coefficients
        data_t x          // input sample
    ) {
        static data_t shift_reg[4];   // static: keeps past samples between calls
        acc_t acc;
        int i;

        acc = 0;
        loop: for (i = 3; i >= 0; i--) {
            if (i == 0) {
                acc += x * c[0];
                shift_reg[0] = x;
            } else {
                shift_reg[i] = shift_reg[i-1];
                acc += shift_reg[i] * c[i];
            }
        }
        *y = acc;
    }

Functions: functions in the source code represent the design hierarchy; the same hierarchy appears in hardware.
Top-Level IO: the arguments of the top-level function determine the hardware RTL (VHDL/Verilog) interface ports: input, output or in/out.
Data Types: all variables have a defined type. The data type influences area and performance, since some types are 8 bits wide and others 16 bits or more.
Loops: functions may contain loops. How loops are handled has a major impact on area and performance, as loops can consume a large number of LUTs and FFs.
Arrays: arrays can influence the device IO and become performance bottlenecks. Arrays must be declared with a fixed size; arrays of undefined size are not supported in HLS.
Operators: operators may need to be shared to control area, or given specific hardware implementations to meet performance. Operations consume LUTs, so the choice of operators affects resource consumption and performance.
Resource or control sharing can be planned, and operations pipelined, in HLS.
Reference: Xilinx
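Operator sharing can be sketched as follows, assuming a hypothetical function poly3 with three multiplications. Without a directive, HLS may build three multipliers; the ALLOCATION directive (spelling as we understand the Vivado HLS pragma; check the user guide for your tool version) limits the design to one multiplier that is shared over several cycles, trading latency for area. A standard compiler ignores the pragma, so the C behavior is unchanged.

```cpp
#include <cassert>

// Hypothetical polynomial a*x^2 + b*x + c, which uses three multiplications.
int poly3(int x, int a, int b, int c) {
    // Allow only one multiplier instance: all three products share it,
    // so the result takes more clock cycles but uses fewer DSP/LUT resources.
#pragma HLS ALLOCATION instances=mul limit=1 operation
    int x2 = x * x;
    return a * x2 + b * x + c;
}
```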
Functions & RTL Hierarchy
• Each function is translated into an RTL block
• Verilog module, VHDL entity
Source code (my_code.c):

    void A() { ..body A..}
    void B() { ..body B..}
    void C() {
        B();
    }
    void D() {
        B();
    }
    void foo_top() {
        A(…);
        C(…);
        D(…);
    }

RTL hierarchy: foo_top contains instances of A, C and D; C and D each contain their own instance of B.
• Functions may be inlined to dissolve their hierarchy
• Small functions may be automatically inlined
Reference: Xilinx
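Inlining can also be controlled explicitly. In this sketch (the names mac, clip255 and weighted_sum_top are hypothetical), `#pragma HLS INLINE` dissolves mac into its caller, while `#pragma HLS INLINE off` keeps clip255 as its own RTL block even though it is small enough to be inlined automatically. A standard compiler ignores both pragmas.

```cpp
#include <cassert>

// Multiply-accumulate helper: inlined, so no separate RTL block is created.
int mac(int acc, int a, int b) {
#pragma HLS INLINE
    return acc + a * b;
}

// Saturate to 0..255: kept as its own RTL block (module/entity).
int clip255(int v) {
#pragma HLS INLINE off
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Hypothetical top-level function: weighted sum of 4 values, clipped to 8 bits.
int weighted_sum_top(const int a[4], const int w[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc = mac(acc, a[i], w[i]);
    }
    return clip255(acc);
}
```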
Types = Operator Bit-sizes
Code → Operations → Types
• From any C code, such as the fir example, the operations are extracted automatically: reads of x and c (RDx, RDc), comparisons (>=, ==), subtraction (-), addition (+), multiplication (*) and the write of y (WRy).
• The C types define the size of the hardware used; this is handled automatically.
Standard C types:
• long long (64-bit), int (32-bit), short (16-bit), char (8-bit)
• float (32-bit), double (64-bit)
• unsigned types
Arbitrary precision types:
• C: ap(u)int types (1-1024)
• C++: ap_(u)int types (1-1024) and ap_fixed types
• C++/SystemC: sc_(u)int types (1-1024) and sc_fixed types
• These can be used to define any variable to be a specific bit-width (e.g. 17-bit, 47-bit, etc.)
Reference: Xilinx
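As a worked example of picking a bit-width, consider the accumulator of a 4-tap FIR with 8-bit signed samples and 8-bit signed coefficients. A common conservative rule is product width (8 + 8 = 16 bits) plus ceil(log2(taps)) = 2 growth bits, giving 18 bits instead of a 32-bit int. The helpers clog2 and fir_acc_bits below are hypothetical, written in standard C++ so they compile anywhere; the ap_int line is shown only as a comment.

```cpp
#include <cassert>

// ceil(log2(n)) for n >= 1: extra bits needed when adding n values.
constexpr int clog2(unsigned n) {
    return (n <= 1) ? 0 : 1 + clog2((n + 1) / 2);
}

// Conservative accumulator width for a signed FIR filter: each product needs
// data_bits + coef_bits, and summing 'taps' products can add clog2(taps) bits.
constexpr int fir_acc_bits(int data_bits, int coef_bits, unsigned taps) {
    return data_bits + coef_bits + clog2(taps);
}

// Worst case for 4 taps: 4 * (-128) * (-128) = 65536, which needs 18 signed bits.
static_assert(fir_acc_bits(8, 8, 4) == 18, "4-tap 8x8-bit FIR needs 18 bits");

// In Vivado HLS the result could then be applied as, for example:
//   #include "ap_int.h"
//   typedef ap_int<18> acc_t;   // instead of a 32-bit int
```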
Loops
• By default, loops are rolled
• Each C loop iteration ➔ Implemented in the same state
• Each C loop iteration ➔ Implemented with the same resources

    void foo_top (…) {
        ...
        Add: for (i=3;i>=0;i--) {
            b = a[i] + b;
            ...
        }
    }

[Figure: after synthesis, foo_top contains a single adder that reads a[N] and feeds its result back into b on each iteration]
• Loops require labels if they are to be referenced by Tcl directives (the GUI will auto-add labels)
• Loops can be unrolled if their indices are statically determinable at elaboration time
• Not when the number of iterations is variable
• Unrolled loops result in more elements to schedule but greater operator mobility
• Let’s look at an example …
Reference: Xilinx
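A minimal sketch of unrolling, assuming a fixed trip count N = 4 (the function names are hypothetical). The first version keeps the Add loop from the slide and adds an UNROLL directive; the second spells out by hand what full unrolling means: N copies of the loop body, which HLS can then schedule with more freedom. A standard compiler ignores the pragma, so both functions return the same value.

```cpp
#include <cassert>

const int N = 4;  // trip count known at compile time, so the loop can be unrolled

// Rolled by default (one adder reused N times); UNROLL asks HLS to replicate the body.
int sum_unrolled_by_hls(const int a[N]) {
    int b = 0;
Add:
    for (int i = N - 1; i >= 0; i--) {
#pragma HLS UNROLL
        b = a[i] + b;
    }
    return b;
}

// Full unrolling written out by hand for N = 4: four copies of the loop body.
int sum_unrolled_by_hand(const int a[N]) {
    int b = 0;
    b = a[3] + b;
    b = a[2] + b;
    b = a[1] + b;
    b = a[0] + b;
    return b;
}
```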
Arrays in HLS
• An array in C code is implemented by a memory in the RTL
• By default, arrays are implemented as RAMs, optionally a FIFO
    void foo_top(int x, …)
    {
        int A[N];
        L1: for (i = 0; i < N; i++)
            A[i+x] = A[i] + i;
    }

[Figure: synthesis maps A[N] (elements 0 to N-1) to a single-port RAM (SPRAMB) inside foo_top, with ports DIN, DOUT, ADDR, CE and WE]
• The array can be targeted to any memory resource in the library
• The ports (Address, CE active high, etc.) and sequential operation (clocks from address to data
out) are defined by the library model
• All RAMs are listed in the Vivado HLS Library Guide
• Arrays can be merged with other arrays and reconfigured
• To implement them in the same memory or one of different widths & sizes
• Arrays can be partitioned into individual elements
• Implemented as smaller RAMs or registers
Reference: Xilinx
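A minimal sketch of array partitioning, assuming a hypothetical 4-tap moving sum (moving_sum4 is not from the course). Every element of window is read and written on each call, which a single-port RAM could only serve over several clock cycles; the ARRAY_PARTITION directive splits the array into individual registers that can all be accessed in parallel. A standard compiler ignores the pragma, so the function also runs as plain C++.

```cpp
#include <cassert>

// Hypothetical 4-tap moving sum over the last four input samples.
int moving_sum4(int x) {
    static int window[4] = {0, 0, 0, 0};  // static: state kept between calls
    // Split 'window' into 4 separate registers instead of one RAM.
#pragma HLS ARRAY_PARTITION variable=window complete dim=1
shift:
    for (int i = 3; i > 0; i--) {
        window[i] = window[i - 1];  // shift older samples along
    }
    window[0] = x;

    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += window[i];
    }
    return sum;
}
```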
Top-Level IO Ports
• Top-level function arguments
• All top-level function arguments have a default hardware port type
• When the array is an argument of the top-level function
• The array/RAM is “off-chip”
• The type of memory resource determines the top-level IO ports
• Arrays on the interface can be mapped & partitioned
• E.g. partitioned into separate ports for each element in the array
    void foo_top(int A[3*N], int x)
    {
        L1: for (i = 0; i < N; i++)
            A[i+x] = A[i] + i;
    }

[Figure: synthesis maps the argument A to a dual-port RAM (DPRAMB) outside foo_top, with ports DIN0, DOUT0, ADDR0, CE0, WE0 and DIN1, DOUT1, ADDR1, CE1, WE1]
• The number of ports is defined by the RAM resource
• Default RAM resource:
• Dual-port RAM if this improves performance, otherwise single-port RAM
Reference: Xilinx
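The same foo_top can be written with its interface made explicit. This sketch assumes N = 8 (the slide leaves N unspecified) and uses the ap_memory INTERFACE directive, which requests a RAM-style port (address, chip enable, write enable, data) for A; we believe this is already the default for array arguments in Vivado HLS, so the directive mainly documents intent. A standard compiler ignores the pragma.

```cpp
#include <cassert>

const int N = 8;  // assumption: the slide does not give a value for N

// A is "off-chip": synthesis turns it into RAM interface ports on foo_top.
void foo_top(int A[3 * N], int x) {
#pragma HLS INTERFACE ap_memory port=A
    // Alternative (assumed Vivado HLS syntax) to force a single-port RAM interface:
    //   #pragma HLS RESOURCE variable=A core=RAM_1P
L1:
    for (int i = 0; i < N; i++) {
        A[i + x] = A[i] + i;  // caller must keep i + x < 3 * N
    }
}
```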
The following libraries are included with Vivado HLS:
• Arbitrary Precision Data Types: integer and fixed-point types (ap_cint.h, ap_int.h and systemc.h)
• HLS Stream: models for streaming data structures, designed to obtain the best performance and area (hls_stream.h)
• HLS Math: extensive support for the synthesis of the standard C (math.h) and C++ (cmath) math libraries. The support includes floating-point and fixed-point functions: abs, atan, atanf, atan2, ceil, ceilf, copysign, copysignf, cos, cosf, coshf, expf, fabs, fabsf, floorf, fmax, fmin, logf, fpclassify, isfinite, isinf, isnan, isnormal, log, log10, modf, modff, recip, recipf, round, rsqrt, rsqrtf, 1/sqrt, signbit, sin, sincos, sincosf, sinf, sinhf, sqrt, tan, tanf, trunc
• HLS Video: a video library for modeling video designs in C++, with video functions, specific data types, memory line buffers and memory windows (hls_video.h). Through the hls::Mat data type, Vivado HLS is also compatible with existing OpenCV functions: AXIvideo2cvMat, AXIvideo2CvMat, AXIvideo2IplImage, cvMat2AXIvideo, CvMat2AXIvideo, cvMat2hlsMat, CvMat2hlsMat, CvMat2hlsWindow, hlsMat2cvMat, hlsMat2CvMat, hlsMat2IplImage, hlsWindow2CvMat, IplImage2AXIvideo, IplImage2hlsMat, AbsDiff, AddS, AddWeighted, And, Avg, AvgSdv, Cmp, CmpS, CornerHarris, CvtColor, Dilate, Duplicate, EqualizeHist, Erode, FASTX, Filter2D, GaussianBlur, Harris, HoughLines2, Integral, InitUndistortRectifyMap, Max, MaxS, Mean, Merge, Min, MinMaxLoc, MinS, Mul, Not, PaintMask, PyrDown, PyrUp, Range, Remap, Reduce, Resize, Set, Scale, Sobel, Split, SubRS, SubS, Sum, Threshold, Zero
• HLS IP: integrates the LogiCORE IP FFT and FIR Compiler (hls_fft.h, hls_fir.h, ap_shift_reg.h)
• HLS Linear Algebra: support for the following functions: cholesky, cholesky_inverse, matrix_multiply, qrf, qr_inverse, svd (hls_linear_algebra.h)
• HLS DSP: support for the following functions: atan2, awgn, cmpy, convolution_encoder, nco, qam_demod, qam_mod, sqrt, viterbi_decoder (hls_dsp.h)
Reference: Xilinx
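Of these libraries, HLS Stream matters most for video pipelines, where pixels arrive in order and each is read once. The sketch below uses a hypothetical Stream<T> stand-in built on std::queue so it compiles without Vivado HLS; it only mimics the read()/write()/empty() calls of hls::stream<T> from hls_stream.h, and threshold_line is an illustrative function, not a library API.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for hls::stream<T>: a FIFO with read/write/empty.
// (In Vivado HLS you would #include "hls_stream.h" and use hls::stream<T>.)
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {                 // caller must not read an empty stream
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Binary threshold over one line of 8-bit pixels: read each pixel once,
// in order, and write one output pixel per input pixel.
void threshold_line(Stream<unsigned char>& in, Stream<unsigned char>& out,
                    int width, unsigned char level) {
    for (int i = 0; i < width; i++) {
        unsigned char p = in.read();
        out.write(static_cast<unsigned char>(p > level ? 255 : 0));
    }
}
```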
Lecture 3 : Lab1: Counter Design and Synthesizing in VIVADO HLS
Counter C++ Module:

    #include <iostream>
    using namespace std;

    int main()
    {
        int count = 0;
        bool reset = false;
        while (1) {
            cout << count << endl;
            if (reset == true)
                count = 0;
            count++;
            if (count > 15)
                count = 0;
            // Crude software delay so the count is readable on the console;
            // an optimizing compiler may remove this empty loop.
            for (int i = 0; i < 450000000; i++);
        }
        return 0;
    }
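The lab code above is a console program: HLS synthesizes a top-level function rather than main(), and the cout output and busy-wait delay do not map to hardware. A minimal synthesizable counterpart might look like the sketch below, where counter_top is a hypothetical name and each call models one clock cycle. It is a sketch under these assumptions, not the course's lab solution.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical top-level function for a 4-bit counter with synchronous reset.
// 'count' is static, so HLS keeps it in a register between calls (clock cycles).
uint8_t counter_top(bool reset) {
    static uint8_t count = 0;
    if (reset) {
        count = 0;
    } else {
        // Masking to 4 bits wraps 15 -> 0, like "if (count > 15) count = 0".
        // With Vivado HLS, an ap_uint<4> (ap_int.h) would wrap the same way.
        count = (count + 1) & 0x0F;
    }
    return count;
}
```

Keeping the state in a static variable and returning one value per call is what lets HLS turn the C function into a clocked register plus incrementer.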
Design Steps:
• Open VIVADO HLS and create a new project named “counter”
• Add the C++ source and target the ZedBoard FPGA
• To synthesize the design, go to Run C Synthesis (Active Solution)
• Expand the syn folder; it should contain the generated VHDL, Verilog and
SystemC sources.
• To simulate the C/C++ source we need a separate source (a test bench); we
will see the simulation process in the next section (Lab 2).
HLS Design References:
Key Documents
For the most current links to Vivado High-Level Synthesis resources, use the
Design Hub View in Vivado Document Navigator and select "High-Level
Synthesis".
Name: Description
• UG1197 UltraFast High-Level Productivity Design Methodology Guide: Methodology guide
• WP416 Vivado Design Suite: Vivado Design Suite Backgrounder
• UG871 Vivado Design Suite Tutorial: High-Level Synthesis Tutorial
• UG902 Vivado Design Suite User Guide: High-Level Synthesis User Guide
• UG958 Vivado Design Suite Reference Guide: Model-based DSP Design using System Generator
Application Notes
• XAPP599: Floating Point Design with Vivado HLS
• XAPP745: Processor Control of Vivado HLS Designs
• XAPP793: Implementing Memory Structures for Video Processing
• XAPP890: Zynq All Programmable SoC Sobel Filter Implementation
• XAPP1167: Accelerating OpenCV Applications with Zynq-7000 AP SoC using Vivado HLS Video Libraries
Let’s go to VIVADO HLS for the project
Thank You!