Building Smart SoCs
Using Virtual Prototyping for the Design and SoC Integration of Deep
Learning Accelerators
Holger Keding
Solutions Architect
© Accellera Systems Initiative 1
Agenda
• Deep Learning Market and Technology Trends
• How to Design a Deep Learning Accelerator (DLA)
• Analytical Performance Modeling
• Shift Left Architecture Analysis and Optimization with Virtual Prototyping
• Example
• Import network algorithms as prototxt and generate an analytical model spreadsheet
• Find suitable configuration and scaling parameters in the analytical model
• Validate first results and explore the architecture for dynamic and power aspects using
Virtual Platforms
• Summary
Increasing number of AI Accelerators
Source: Qualcomm AI Day Speaker Presentation 2019
Deep Learning Technology Trends
New Neural Network algorithms
– Higher accuracy, lower size, and less processing
– But: less data re-use, fewer cycles per byte
Neural Network Compiler optimizations
– Loop tiling, unrolling, and parallelization
– Splitting and fusing of Neural Network layers
– Memory layout optimization across layers
– Optimized code generation to utilize available hardware accelerators
Deep Learning Accelerator optimizations
– Schedule workload on parallel hardware engines
– Optimize and reduce data transfers to and from memory
[Diagram: Neural Network → Neural Network Compiler → AI SoC with Deep Learning Accelerator, multi-core CPU, SRAM, IO, interconnect, and DDR/HBM]
AI SoC Design Challenges
Brute-force Processing of Huge Data Sets
• Choosing the right algorithm and architecture: CPU, GPU, FPGA, vector DSP, ASIP
– CNN graphs evolve fast and time to market is short, so one cannot optimize for a single graph
– Joint design of algorithm, compiler, and target architecture
– Joint optimization of power, performance, accuracy, and cost
• Highly parallel compute drives memory requirements
– High on-chip and chip-to-chip bandwidth at low latency
– High memory bandwidth requirements for parameters and layer-to-layer communication
• Performance analysis requires realistic workloads to consider dynamic effects
– Scheduling of AI operators on parallel processing elements
– Unpredictable interconnect and memory access latencies
Large Design Space drives Differentiation by
AI Algorithm & Architecture
Agenda
• Deep Learning Market and Technology Trends
• How to Design a Deep Learning Accelerator (DLA)
• Analytical Performance Modeling
• Shift Left Architecture Analysis and Optimization with Virtual Prototyping
• Example
• Import network algorithms as prototxt and generate an analytical model spreadsheet
• Find suitable configuration and scaling parameters in the analytical model
• Validate first results and explore the architecture for dynamic and power aspects using
Virtual Platforms
• Summary
How to design a DLA?
Modeling options, connected by refine steps in one direction and validate/back-annotate loops in the other:
Analytical Models
+ Good first order
+ Results within minutes
– Omits dynamic effects
High-Level Architecture (~ varying accuracy)
+ Good for hardware exploration
+ Simulations in minutes/hours
Functional LT Model (VDK)
+ Good for SW development
+ Simulations in minutes/hours
+ Trace ops, memory accesses
– Low timing accuracy
RTL Simulation
+ Perfect accuracy
– High computational needs
– High turn-around costs
Analytical Performance Models
Simple Example: Amdahl’s Law [1]
[1] Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities (1967)
• Simple insightful formula, with restricted applicability, though.
• “All models are wrong but some are useful” (George Box, 1978)
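Amdahl's law itself is one line of math; as a minimal sketch (the function and the example numbers below are illustrative, not from the talk), it shows why the "restricted applicability" caveat matters for massively parallel accelerators:

```python
# Amdahl's law: overall speedup when a fraction p of the work can be
# spread over n processing elements. A minimal sketch, not tool code.
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallelizable work, 128 PEs give only ~17x speedup;
# the serial fraction dominates.
print(amdahl_speedup(0.95, 128))
```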
Analytical Models – Roofline Models (1)
Theoretical maximum compute power p(freq_clk, #resources) is reached with ILP or SIMD; with only thread-level parallelism, the observed performance stays below it.
Operational intensity example: 2 operations / 8 bytes fetched = 0.25 ops/byte
Roofline: an insightful visual performance model for multicore architectures (Williams, Waterman, Patterson, 2009)
Analytical Models – Roofline Models (2)
Below the compute roof, attainable performance is bounded by op_intensity · mem_bandwidth_peak; in the roofline plot this bound is a line whose slope is the maximum memory bandwidth, capped by the theoretical maximum compute power.
Operational intensity example: 2 operations / 8 bytes fetched = 0.25 ops/byte
Analytical Models – Roofline Models (3)
Kernels left of the ridge point are memory bound; kernels right of it are compute bound.
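The roofline bound reduces to a single min() expression. A small sketch, with assumed (not measured) compute and bandwidth roofs:

```python
# Roofline model: attainable performance is the minimum of the compute
# roof and the memory-bound ceiling (operational intensity times peak
# memory bandwidth). The roof values below are illustrative assumptions.
def attainable_gflops(op_intensity, peak_gflops, peak_mem_gbs):
    return min(peak_gflops, op_intensity * peak_mem_gbs)

PEAK_GFLOPS = 512.0   # assumed compute roof
PEAK_MEM_GBS = 64.0   # assumed peak memory bandwidth

# 0.25 ops/byte (2 ops per 8 fetched bytes, as on the slide) -> memory bound:
print(attainable_gflops(0.25, PEAK_GFLOPS, PEAK_MEM_GBS))   # 16.0
# Ridge point is at 512/64 = 8 ops/byte; above it -> compute bound:
print(attainable_gflops(16.0, PEAK_GFLOPS, PEAK_MEM_GBS))   # 512.0
```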
Example: Analytical Model for CNN Convolutional Layer (1)
Conv1 of AlexNet – maths textbook convolution algorithm:

for(row=0; row<oh; row++){
  for(col=0; col<ow; col++){
    for(k=0; k<oc; k++){
      for(ti=0; ti<ic; ti++){
        for(i=0; i<kh; i++){
          for(j=0; j<kw; j++){
            L: outputfm[k][row][col] +=
                 kernels[k][ti][i][j] * inputfm[ti][sw*row+i][sh*col+j];
}}}}}}

n_MAC = oh · ow · oc · kw · kh · ic = 55 · 55 · 96 · 11 · 11 · 3 = 105,415,200
Example: Analytical Model for CNN Convolutional Layer (2)
Conv1 of AlexNet – but here we assume an unlimited amount of local memory:
n_MAC = oh · ow · oc · kw · kh · ic = 55 · 55 · 96 · 11 · 11 · 3 = 105,415,200
d_MAC = d_ifmap + d_kernel = (iw · ih · ic + kw · kh · ic · oc) · B_i ≈ 0.38 MB
⇒ Operational Intensity I = n_MAC / d_MAC ≈ 278 ops/B
Example: Analytical Model for CNN Convolutional Layer (3)
Conv1 of AlexNet – opposite extreme: we assume no local memory:
n_MAC = oh · ow · oc · kw · kh · ic = 55 · 55 · 96 · 11 · 11 · 3 = 105,415,200
d_MAC = 2 · n_MAC · B_i ≈ 420 MB
⇒ Operational Intensity I = n_MAC / d_MAC = 1/4 ops/B
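The two extremes can be checked with a few lines of arithmetic. A sketch that reproduces the slide numbers, assuming a 227×227 Conv1 input and 2 bytes per element (both assumptions, chosen to match the figures above):

```python
# Operational intensity of AlexNet Conv1 under the two extreme
# memory assumptions from the slides.
oh, ow, oc = 55, 55, 96   # output height/width/channels
kw, kh, ic = 11, 11, 3    # kernel width/height, input channels
iw, ih = 227, 227         # assumed Conv1 input size
Bi = 2                    # assumed bytes per element

n_mac = oh * ow * oc * kw * kh * ic            # 105,415,200 MACs

# Unlimited local memory: input feature map and kernels fetched once.
d_best = (iw * ih * ic + kw * kh * ic * oc) * Bi
# No local memory: every MAC fetches both operands from DRAM.
d_worst = 2 * n_mac * Bi

print(n_mac)              # 105415200
print(n_mac / d_best)     # ~278 ops/byte
print(n_mac / d_worst)    # 0.25 ops/byte
```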
Example: Analytical Model for CNN Convolutional Layer (4)
Conv1 of AlexNet – practical setup: limited amount of local memory.
Maths textbook convolution algorithm:

for(row=0; row<oh; row++){
  for(col=0; col<ow; col++){
    for(k=0; k<oc; k++){
      for(ti=0; ti<ic; ti++){
        for(i=0; i<kh; i++){
          for(j=0; j<kw; j++){
            L: outputfm[k][row][col] +=
                 kernels[k][ti][i][j] * inputfm[ti][sw*row+i][sh*col+j];
}}}}}}
Example: Analytical Model for CNN Convolutional Layer (5)
Conv1 of AlexNet – with very simple tiling
Practical setup: limited amount of local memory, tiled along width + height + channel + kernel.
Example: Analytical Model for CNN Convolutional Layer (6)
Conv1 with tiling
Source: Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, Chen Zhang, 2015
Now it gets more tricky: we take into account non-integer ratios between the tiling parameters and the channel dimensions. Tiling also brings the operational intensity closer to the optimum HW utilization point.
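A sketch of how such a tiled analytical model counts off-chip traffic, loosely in the spirit of the Zhang 2015 formulation cited above. The tile sizes, the reuse pattern, and the 2-bytes-per-element assumption are all illustrative, not the talk's actual numbers:

```python
# Estimate DRAM traffic for one conv layer under simple tiling; the
# ceil() calls handle the non-integer tile/dimension ratios mentioned
# on the slide.
from math import ceil

def conv_traffic_bytes(M, N, R, C, K, Tm, Tn, Tr, Tc, Bi=2):
    """M/N: output/input channels, R/C: output rows/cols, K: kernel
    size, Tm/Tn/Tr/Tc: tile sizes, Bi: bytes per element (assumed)."""
    # Trip counts over the tiles (non-integer ratios -> ceil).
    tm, tn = ceil(M / Tm), ceil(N / Tn)
    tr, tc = ceil(R / Tr), ceil(C / Tc)
    # Each (tr, tc, tm, tn) tile loads an input patch and a weight tile;
    # each (tr, tc, tm) tile writes one output tile.
    d_in = tm * tn * tr * tc * (Tn * (Tr + K - 1) * (Tc + K - 1)) * Bi
    d_wght = tm * tn * tr * tc * (Tm * Tn * K * K) * Bi
    d_out = tm * tr * tc * (Tm * Tr * Tc) * Bi
    return d_in + d_wght + d_out

# AlexNet-Conv1-like shape with a modest (assumed) tiling:
traffic = conv_traffic_bytes(M=96, N=3, R=55, C=55, K=11,
                             Tm=48, Tn=3, Tr=11, Tc=11)
n_mac = 55 * 55 * 96 * 11 * 11 * 3
print(n_mac / traffic)  # operational intensity, between the two extremes
```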
Example: Analytical Model, Mapping Conv to HW Resources
The number of MAC cells can be configured to scale peak performance up or down; MAC cell number and depth should match the tiling parameters.
[Roofline plots: attainable performance over operational intensity (operations/byte)]
Analytical Model as Python-Generated Spreadsheet
Expressions represent both the algorithm and the HW → calculate attainable performance while exploring different numbers of MAC cells and their depth.
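A minimal sketch of what such a generated spreadsheet could look like: one row per (MAC cells, depth) configuration with its peak and roofline-attainable performance. The clock, bandwidth, and intensity values are assumptions for illustration, and plain CSV stands in for the actual spreadsheet format:

```python
# Generate a tiny "analytical model spreadsheet" as CSV.
import csv

FREQ_GHZ = 1.0        # assumed clock frequency
MEM_BW_GBS = 25.6     # assumed peak DDR bandwidth
OP_INTENSITY = 42.0   # ops/byte, e.g. from a tiled conv model

with open("dla_model.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["mac_cells", "depth", "peak_gops", "attainable_gops"])
    for mac_cells in (16, 32, 64, 128):
        for depth in (8, 16):
            # 2 ops (multiply + accumulate) per MAC per cycle
            peak = 2 * mac_cells * depth * FREQ_GHZ
            attainable = min(peak, OP_INTENSITY * MEM_BW_GBS)
            w.writerow([mac_cells, depth, peak, attainable])
```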
Analytical Model Summary
What is achieved and what comes next?
What we have seen:
+ Good first-order analysis of static effects
+ Results within minutes
~ Requires deep understanding of both algorithm and architecture
What is not covered:
– Implementation overhead is hard to predict and not 'priced in' in the first round
– Omits dynamic effects
– Joint performance and power analysis is difficult
How to design a DLA?
Modeling options, connected by refine steps in one direction and validate/back-annotate loops in the other:
Analytical Models
+ Good first order
+ Results within minutes
– Omits dynamic effects
High-Level Architecture (~ varying accuracy)
+ Good for hardware exploration
+ Simulations in minutes/hours
Functional LT Model (VDK)
+ Good for SW development
+ Simulations in minutes/hours
+ Trace ops, memory accesses
– Low timing accuracy
RTL Simulation
+ Perfect accuracy
– High computational needs
– High turn-around costs
Shift Left Architecture Analysis and Optimization
Translate the Neural Network into an NN workload model, map it onto a model of the AI SoC (Deep Learning Accelerator, multi-core CPU, SRAM, IO, interconnect, DDR/HBM), and explore power and performance; the results feed back into the Neural Network Compiler and the mapping.
Platform Architect Ultra
Providing a Comprehensive Library of Generic and Vendor-Specific Models
Flow: capture workload model → capture architecture model → analyze power & performance → exploration.
Interconnect models
• Generic: SBL-TLM2-FT (AXI), SBL-GCCI (ACE, CHI)
• IP-specific: Arteris FlexNoC & Ncore, Arm AHB/APB, Arm PL300, Arm SBL-301, Arm SBL-400, Synopsys DW AXI
Memory subsystems
• Generic multiport memory controller (GMPMC)
• DesignWare uMCTL2 memory controller
• DesignWare LPDDR5 memory controller
• RTL co-simulation/emulation
Traffic, processors, RTL
• Task-based and trace-based workload models
• User traffic scenarios
• Cycle-accurate processor models for ARM, ARC, Tensilica, CEVA
• Co-simulate with RTL
Workload Modeling and Mapping
• Workload Model
– Task-level parallelism and dependencies
– Characterized with processing cycles and memory accesses (e.g. cycles: 2000, rd_bytes: 0x200, wr_bytes: 0)
• SoC Platform Model
– Accurate SystemC transaction-level models of processing elements, interconnect, and memory
• Map workload to platform
• Analyze performance metrics
– End-to-end constraints
– Workload activity
– Utilization of resources
– Interconnect metrics: latency, throughput, contention, outstanding transactions, …
[Diagram: task graph (Task A; Task B: read image; Task C: read kernel; Task D: proc conv) mapped and recorded on a virtual prototype with ACC, DMA, interconnect, and memory subsystem]
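The task-graph idea above can be sketched in a few lines: each task carries processing cycles plus read/write bytes, and dependencies give a critical-path estimate. The task numbers and the memory-cost factor are illustrative assumptions, not Platform Architect output:

```python
# Minimal characterized task graph in the spirit of the workload model.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cycles: int
    rd_bytes: int = 0
    wr_bytes: int = 0
    deps: list = field(default_factory=list)

a = Task("A", cycles=0, rd_bytes=0x200)                       # e.g. fetch descriptor
b = Task("B (read image)", cycles=0, rd_bytes=0x200, deps=[a])
c = Task("C (read kernel)", cycles=0, rd_bytes=0x100, deps=[a])
d = Task("D (proc conv)", cycles=2000, deps=[b, c])

def finish_cycles(t: Task, mem_cycles_per_byte=0.1) -> float:
    """Critical path: a task starts after all its dependencies finish;
    its duration is compute cycles plus a simple memory-access cost."""
    start = max((finish_cycles(p, mem_cycles_per_byte) for p in t.deps),
                default=0.0)
    mem = (t.rd_bytes + t.wr_bytes) * mem_cycles_per_byte
    return start + t.cycles + mem

print(finish_cycles(d))
```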
System Level Power Modeling
• Workload Model
– Task-level parallelism and dependencies
– Characterized with processing cycles and memory accesses
• SoC Platform Model
– Accurate SystemC transaction-level models of processing elements, interconnect, and memory
• System-level Power Overlay Model
– Define a power state machine per component (e.g. sleep, idle, active; page miss/hit for memory)
– Bind IP power models to the Virtual Prototype
– Measure power and performance based on real activity and utilization, with energy/power recording
[Diagram: task graph mapped onto the virtual prototype; per-component power state machines]
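The power overlay principle reduces to integrating per-state power over the time spent in each state. A sketch with assumed state names, power values, and an assumed activity trace (not real IP power model data):

```python
# Power overlay sketch: power per state, energy integrated from the
# time spent in each state during a simulation run.
POWER_MW = {"sleep": 1.0, "idle": 20.0, "active": 250.0}

# (state, duration in ms) as it might be recorded from a virtual
# prototype run; values are illustrative.
trace = [("idle", 2.0), ("active", 5.0), ("idle", 1.0),
         ("active", 3.0), ("sleep", 4.0)]

energy_uj = sum(POWER_MW[s] * dt for s, dt in trace)  # mW * ms = uJ
total_ms = sum(dt for _, dt in trace)
avg_power_mw = energy_uj / total_ms

print(energy_uj, avg_power_mw)
```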
Platform Architect Ultra AI Exploration Pack (XP)
Exploration & optimization of AI designs
• Automated generation of workloads from AI frameworks
– AI Operator Library for Neural Network modeling, e.g. Convolution, Matmul, MaxPool, BatchNorm, etc.
– Example workload model of the ResNet50 Neural Network
– Utility to convert a prototxt description to a CNN workload model using the AI operator library
• AI-centric HW architecture model library
– VPUs configured to represent AI compute and DMA engines
– Interconnect and memory subsystem models
– Example performance model of the NVIDIA Deep Learning Accelerator (NVDLA)
• AI-centric analysis views: memory + processing utilization
Workload Model of One Convolution Layer
Tasks: read input, read coefficients, calculate convolutions, write output feature maps. The model is parameterized by AI algorithm params, mapping params, and workload params; the scaling parameters reflect the DLA architecture and can be taken from the analytical model.
Agenda
• Deep Learning Market and Technology Trends
• How to Design a Deep Learning Accelerator (DLA)
• Analytical Performance Modeling
• Shift Left Architecture Analysis and Optimization with Virtual Prototyping
• Example
• Import network algorithms as prototxt and generate an analytical model spreadsheet
• Find suitable configuration and scaling parameters in the analytical model
• Validate first results and explore the architecture for dynamic and power aspects using
Virtual Platforms
• Summary
Example: Resnet-18 (Inference) with NVDLA
Import the Resnet18 Neural Network as prototxt, generate the Resnet18 task graph (ResNet-18 workload model generated with AI-XP), and map it onto the NVDLA platform.
Goals: 100 ms latency, minimize power, minimize energy
Optimize hardware configuration:
– SIMD width
– Burst size, outstanding transactions
– Speed of DDR memory and of data path
Example: Brief Overview of NVDLA
Convolution Engine (CONV_CORE)
• Works on two sets of data: offline-trained kernels (weights) and input
features (images)
• Configurable MAC units and convolution buffer (RAM)
• Executes operations such as tf.nn.conv2d
Single Data Point Processor (SDP)
• Applies linear and non-linear (activation) functions to individual data points.
• Executes e.g. tf.nn.batch_normalization, tf.nn.bias_add, tf.nn.elu, tf.nn.relu,
tf.sigmoid, tf.tanh, and more.
Planar Data Processor (PDP)
• Applies common CNN spatial operations such as min/max/avg pooling
• Executes e.g. tf.nn.avg_pool, tf.nn.max_pool, tf.nn.pool.
Cross-channel Data Processor (CDP)
• Processes data from different channels/features, e.g. local response normalization
(LRN) function
• Executes e.g. tf.nn.local_response_normalization
Data Reshape Engine (RUBIK)
• Performs data format transformations (splitting, slicing, merging, …)
• Executes e.g. tf.nn.conv2d_transpose, tf.concat, tf.slice, etc.
VP Simulation Results of Initial Configuration
[Analysis views: task trace, transaction trace, DDR utilization, resource utilization, throughput, outstanding transactions]
Performance is limited by processing → use a wider SIMD data path.
Simulation Reveals Implementation Effects… (1)
Differences between calculated and measured data read/write amounts
AlexNet (Norm1):
Expected: 580,800 bytes
Measured: 654,720 bytes
Inflation by ~12.72%: "dark bandwidth"
Simulation Reveals Implementation Effects… (2)
Differences between calculated and measured execution time
Convolutional Layers 1&2 of LeNet on NVDLA
Back-Annotate Simulation Findings To Analytical Model
The Caffe .prototxt feeds both the Platform Architect simulation model and the spreadsheet/analytical model; findings from simulation are back-annotated into the analytical model.
Impact of SIMD Width on Performance
[Resource utilization of CONV datapath (yellow), CONV DMA (red), and other components for SIMD widths 8, 16, 32, 64, and 128]
With narrow SIMD the design is processing bound; performance gains diminish with wider SIMD, and at SIMD 128 the CONV DMA load makes the design memory-bandwidth bound while the CONV PE load drops.
DDR Memory Bandwidth and Power Improvement
[Resource utilization (DMA, Conv PE) and power consumption (Conv PE power, DDR power) for SIMD-64 vs SIMD-128]
SIMD-128 is 25% faster and its total energy is 10% lower, while SIMD-64 has 20% lower average power.
Resnet 18 Example Sweep
Goal: 100 ms latency, minimize power & energy
Sweep parameters:
– Burst size: 16, 32
– Outstanding transactions: 4, 8
– DDR memory speed: DDR4-1866, DDR4-2400
– Clock frequency of data path: 1, 1.33, 2 GHz
– SIMD width: 64, 128 operations per cycle
Followed by sensitivity and root-cause analysis.
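The sweep above is the cross-product of the listed parameter values; a small sketch of enumerating it (a real sweep would launch one virtual-platform simulation per configuration):

```python
# Enumerate the sweep configurations from the slide's parameter lists.
from itertools import product

sweep = {
    "burst_size": [16, 32],
    "outstanding_tx": [4, 8],
    "ddr_speed": ["DDR4-1866", "DDR4-2400"],
    "datapath_ghz": [1.0, 1.33, 2.0],
    "simd_width": [64, 128],
}

configs = [dict(zip(sweep, values)) for values in product(*sweep.values())]
print(len(configs))  # 2*2*2*3*2 = 48 simulation runs
```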
Sweep Over Hardware Parameters, Latency
[Latency results across the sweep: outstanding transactions, data-path GHz, SIMD width, DDR4 speed, burst size]
Power/Performance/Energy Trade-off Analysis
[Trade-off plot over outstanding transactions, datapath GHz, SIMD width, burst size, and DDR speed; the optimal solution is highlighted]
Example: Resnet-18 with NVDLA
Generate the task graph from the Resnet18 Neural Network and map it onto the NVDLA platform.
Goal:
– 100 ms latency, minimize energy
Optimized hardware configuration:
– SIMD width: 128 operations per cycle
– Burst size: 32 bytes
– Outstanding transactions: 8
– Speed of DDR memory: DDR4-1866
– Speed of data path: 1 GHz
Summary
• Be fast and get it right!
• Shift Left with Virtual Prototyping
• Joint Optimization of Algorithm,
Architecture, and Compiler
[Flow: Neural Network → generate task graph → analytical model → map onto virtual HW platform → analyze power/performance and sensitivity → explore & refine]
Thank You
Questions