AMD RYZEN™ PROCESSOR

SOFTWARE OPTIMIZATION
KEN MITCHELL
AGENDA
• Abstract
• Speaker Biography
• Products
• Microarchitecture
• Data Flow
• Best Practices
• Optimizations

AMD PUBLIC | GDC22 | AMD Ryzen™ Processor Software Optimization | March 2022 2
ABSTRACT

• Join AMD for an introduction to the AMD Ryzen™ family of processors which power
today’s game consoles and PCs.
• Learn about Ryzen™ products.
• Dive into instruction sets, cache hierarchies, resource sharing, and simultaneous
multi-threading.
• Discover profiling tools and techniques.
• Gain insight into code optimization
opportunities and lessons learned with
examples including C/C++, assembly, and
hardware performance-monitoring counters.

SPEAKER BIOGRAPHY
• Ken Mitchell is a Principal Member of
Technical Staff in the AMD Game
Engineering team where he focuses on
helping game developers utilize AMD
processors efficiently. His previous work
includes automating & analyzing PC
applications for performance projections of
future AMD products as well as developing
benchmarks. Ken studied computer science
at the University of Texas at Austin.

PRODUCTS

AMD RYZEN™ 6000 SERIES MOBILE PROCESSORS
MODEL | GRAPHICS MODEL | CORES | THREADS | MAX. BOOST CLOCK | BASE CLOCK | DEFAULT TDP
AMD Ryzen™ 9 6980HX | AMD Radeon™ 680M | 8 | 16 | Up to 5.0GHz | 3.3GHz | 45W
AMD Ryzen™ 9 6980HS | AMD Radeon™ 680M | 8 | 16 | Up to 5.0GHz | 3.3GHz | 35W
AMD Ryzen™ 9 6900HX | AMD Radeon™ 680M | 8 | 16 | Up to 4.9GHz | 3.3GHz | 45W
AMD Ryzen™ 9 6900HS | AMD Radeon™ 680M | 8 | 16 | Up to 4.9GHz | 3.3GHz | 35W
AMD Ryzen™ 7 6800H | AMD Radeon™ 680M | 8 | 16 | Up to 4.7GHz | 3.2GHz | 45W
AMD Ryzen™ 7 6800HS | AMD Radeon™ 680M | 8 | 16 | Up to 4.7GHz | 3.2GHz | 35W
AMD Ryzen™ 7 6800U | AMD Radeon™ 680M | 8 | 16 | Up to 4.7GHz | 2.7GHz | 15-28W
AMD Ryzen™ 5 6600H | AMD Radeon™ 660M | 6 | 12 | Up to 4.5GHz | 3.3GHz | 45W
AMD Ryzen™ 5 6600HS | AMD Radeon™ 660M | 6 | 12 | Up to 4.5GHz | 3.3GHz | 35W
AMD Ryzen™ 5 6600U | AMD Radeon™ 660M | 6 | 12 | Up to 4.5GHz | 2.9GHz | 15-28W

AMD RYZEN™ 5000 SERIES DESKTOP PROCESSORS

Model | Integrated Graphics | Cores | Threads | Total L3 Cache (MB) | Max Boost Clock | Base Clock | Default TDP
AMD Ryzen™ 9 5950X | - | 16 | 32 | 64 | Up to 4.9 GHz | 3.4 GHz | 105 W
AMD Ryzen™ 9 5900X | - | 12 | 24 | 64 | Up to 4.8 GHz | 3.7 GHz | 105 W
AMD Ryzen™ 7 5800X3D | - | 8 | 16 | 96 | Up to 4.5 GHz | 3.4 GHz | 105 W
AMD Ryzen™ 7 5800X | - | 8 | 16 | 32 | Up to 4.7 GHz | 3.8 GHz | 105 W
AMD Ryzen™ 5 5600X | - | 6 | 12 | 32 | Up to 4.6 GHz | 3.7 GHz | 65 W
AMD Ryzen™ 7 5700G | Radeon™ | 8 | 16 | 16 | Up to 4.6 GHz | 3.8 GHz | 65 W
AMD Ryzen™ 5 5600G | Radeon™ | 6 | 12 | 16 | Up to 4.4 GHz | 3.9 GHz | 65 W

AMD RYZEN™ THREADRIPPER™ PRO 5000WX SERIES PROCESSORS

Model | Integrated Graphics | Cores | Threads | Max Boost Clock | Base Clock | Default TDP
AMD Ryzen™ Threadripper™ PRO 5995WX | - | 64 | 128 | Up to 4.5 GHz | 2.7 GHz | 280 W
AMD Ryzen™ Threadripper™ PRO 5975WX | - | 32 | 64 | Up to 4.5 GHz | 3.6 GHz | 280 W
AMD Ryzen™ Threadripper™ PRO 5965WX | - | 24 | 48 | Up to 4.5 GHz | 3.8 GHz | 280 W
AMD Ryzen™ Threadripper™ PRO 5955WX | - | 16 | 32 | Up to 4.5 GHz | 4.0 GHz | 280 W
AMD Ryzen™ Threadripper™ PRO 5945WX | - | 12 | 24 | Up to 4.5 GHz | 4.1 GHz | 280 W

MICROARCHITECTURE

“ZEN 3”
• +19% IPC Improvement
• Unified 8-Core CCD
• 32MB L3$ per CCD
• Improved Load Store Unit
• Wider FP & Int
• New Instructions
• Improved SMT fairness

SIMULTANEOUS MULTI-THREADING
• High performance cores have gaps in utilization which may be filled by additional hardware
threads. This is Simultaneous Multi-Threading (SMT).
• Although each hardware thread has its own program counter and architectural register set,
they share core resources.
[Figure: program threads A and B run on hardware threads #1 and #2 of one core. Each hardware
thread has its own program counter and architectural register set; the scheduler, register
files, and execution units are shared.]

CORE RESOURCE SHARING DEFINITIONS

Category | Definition
Competitively shared | Resource entries are assigned on demand. A thread may use all resource entries.
Watermarked | Resource entries are assigned on demand. When in two-threaded mode a thread may not use more resource entries than are specified by a watermark threshold.
Statically partitioned | Resource entries are partitioned when entering two-threaded mode. A thread may not use more resource entries than are available in its partition.

CORE RESOURCE SHARING EVOLUTION

Resource Competitively Shared Watermarked Statically Partitioned

Integer Scheduler X
Integer Register File X
Load Queue X
Floating Point Physical Register X
Floating Point Scheduler X
Memory Request Buffers X
Op Queue X
Store Queue X
Write Combining Buffer X
Retire Queue X

DESKTOP CACHE HIERARCHY EVOLUTION

Core | uOP/Core (K) | L1I/Core (KB) | L1D/Core (KB) | L2/Core (KB) | L3/CCX (MB)
“Zen 3” | 4 | 32 | 32 | 512 | 32*
“Zen 2” | 4 | 32 | 32 | 512 | 16
“Zen 1” | 2 | 64 | 32 | 512 | 8

*excluding products with AMD 3D V-Cache™ technology.

INSTRUCTION SET EVOLUTION

CLFLUSHOPT

WBNOINVD
MONITORX
XSAVEOPT
FSGSBASE
VPCLMUL

OSXSAVE
PCLMUL
RDSEED

XSAVES
XSAVEC
XGETBV

CLZERO
MOVBE
RDRND

SSE4.2
XSAVE
SSE4.1

SSSE3
SMAP
CLWB

SMEP
VAES

AVX2
BMI2

FMA
F16C
ADX

AVX
SHA

AES
BMI
Core
“Zen 3” 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
“Zen 2” 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
“Zen 1” 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
“Jaguar” 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0

SOFTWARE PREFETCH INSTRUCTIONS
Prefetch T0|T1|T2|NTA
• Load a cache line from the specified memory address into the data-cache level specified by
the locality reference hint T0, T1, T2, or NTA.
• Lines filled into the L2 cache with PREFETCHNTA are marked for quicker eviction from the L2,
and when evicted from the L2 are not inserted into the L3.
[Figure: cache hierarchy showing L1 (32 KB), L2 (512 KB), L3 (32,768 KB), and memory
(gigabytes), annotated "Fill lines" and "Aggressively evict PrefetchNTA lines".]

HARDWARE PREFETCHERS L1

Category | Definition
L1 Stream | Uses history of memory access patterns to fetch additional sequential lines in ascending or descending order.
L1 Stride | Uses memory access history of individual instructions to fetch additional lines when each access is a constant distance from the previous.
L1 Region | Uses memory access history to fetch additional lines when the data access for a given instruction tends to be followed by a consistent pattern of other accesses within a localized region.

HARDWARE PREFETCHERS L2

Category | Definition
L2 Stream | Uses history of memory access patterns to fetch additional sequential lines in ascending or descending order.
L2 Up/Down | Uses memory access history to determine whether to fetch the next or previous line for all memory accesses.

AMD PREFERRED CORE
• Some AMD products have cores which are faster than other cores.
• The system BIOS describes the CPPC Highest Performance ranking for each logical processor.
• The Windows Kernel creates a PerformanceSchedulingClass ranking based on this information
and uses it during scheduling.
• Logical processor 0 and CCD0 may not be the fastest.
• Testing done by AMD performance labs February 12, 2022 on an AMD reference motherboard
equipped with 16GB DDR4-3200MHz, Ryzen™ 9 5950X with Radeon™ RX 6900 XT, Win11 Pro x64
22000.493. Hypothetical example shown. Actual results may vary.
[Chart: PerformanceSchedulingClass (higher is better) per logical processor; logical processor
axis labeled 0 to 30.]

DATA FLOW

AMD RYZEN™ 7 6800U MOBILE PROCESSOR

[Figure: data-flow diagram. Core (cclk): 32B fetch from a 32K 8-way I-Cache; 32K 8-way D-Cache
with 3*32B load and 2*32B store; 512K 8-way L2 I+D Cache. 16M 16-way L3 I+D Cache (l3clk).
Data Fabric (fclk) connecting the Unified Memory Controller (uclk), DRAM Channel (memclk),
RDNA2 graphics, Media, and the IO Hub Controller (lclk). Link widths labeled in the figure:
32B/cycle, 4x32B/cycle, 64B/cycle, and 8B/cycle.]

• AMD Ryzen™ 7 6800U, 15W TDP, 8 Cores, 16 Threads, up to 4.7 GHz max boost clock, 2.7 GHz base clock,
integrated GPU.

AMD RYZEN™ 9 5950X DESKTOP PROCESSOR

[Figure: package with two CCDs and one IOD, plus a data-flow diagram. Core (cclk): 32B fetch
from a 32K 8-way I-Cache; 32K 8-way D-Cache with 3*32B load and 2*32B store; 512K 8-way L2
I+D Cache. 32M 16-way L3 I+D Cache (l3clk). Data Fabric (fclk) connecting the Unified Memory
Controller (uclk), DRAM Channel (memclk), and the IO Hub Controller (lclk). Link widths labeled
in the figure: 32B/cycle, 32B/cycle R, 16B/cycle W, 64B/cycle, and 16B/cycle.]

• AMD Ryzen™ 9 5950X, 105W TDP, 16 Cores, 32 Threads, up to 4.9 GHz max boost clock, 3.4 GHz base clock.
• Two Core Complex Die (CCD). Each CCD has one 32M L3 Cache Cluster.

AMD RYZEN™ THREADRIPPER™ PRO 5995WX PROCESSOR

[Figure: package diagram with CCDs numbered 0 through 7 arranged around the IOD, shown with
the data-flow diagram.]

• AMD Ryzen™ Threadripper™ Pro 5995WX, 280W TDP, 64 Cores, 128 Threads, up to 4.5 GHz boost, 2.7 GHz
base.
• Two CCDs per Data Fabric Quadrant shown.

BEST PRACTICES

REDUCE BUILD TIMES
Msbuild.exe UE4.sln -target:Engine\UE4:Rebuild -property:Configuration=Shipping -property:Platform=Win64
[Chart: rebuild time in seconds (less is better) by system configuration.
VS2017, Without Virus Exclusion Folders: 231. VS2022, With Virus Exclusion Folders: 119.]
• Performance of UE4.27.2 binaries compiled with Microsoft Visual Studio.
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

USE THE LATEST COMPILER AND WINDOWS® SDK
Msbuild.exe UE4.sln -target:Engine\UE4:Rebuild -property:Configuration=Shipping -property:Platform=Win64
[Chart: rebuild time in seconds (less is better) by Visual Studio Build Tools version.
2017 v15.9.43: 205. 2019 v16.11.9: 121. 2022 v17.0.5: 119.]
• Get the latest build and link time improvements.
• Get the latest library and runtime optimizations.
• Performance of UE4.27.2 binaries compiled with Microsoft Visual Studio.
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

ADD VIRUS AND THREAT PROTECTION EXCLUSIONS
Msbuild.exe UE4.sln -target:Engine\UE4:Rebuild -property:Configuration=Shipping -property:Platform=Win64
[Chart: rebuild time in seconds (less is better) by folder exclusions.
None: 150. C:\UnrealEngine-4.27.2-release: 119. C:\: 119.]
• WARNING: Not recommended for CI/CD systems. Exclusions may make your device vulnerable to threats.
• Add project folders to virus and threat protection settings exclusions for faster build times.
• Faster rebuild time after optimization!
• Performance of UE4.27.2 binaries compiled with Microsoft Visual Studio 2022 v17.0.5
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

PREFER SHIPPING CONFIGURATION BUILDS FOR CPU PROFILING
[Chart: UE4.27.2 InfiltratorDemo DX12 1080p, average FPS (higher is better) by build
configuration. Shipping: 193. Test: 192. Development: 158.]
• Debug and development builds may greatly reduce performance.
• Stats collection may cause cache pollution.
• Logging may create serialization points.
• Debug builds may disable multi-threading optimizations.
• While investigating open issues, developers may submit change requests which enable debug
features on Test and Shipping configurations. Be sure to disable debug features before you ship!
• Performance of UE4.27.2 binaries compiled with Microsoft Visual Studio 2022 v17.0.5
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

DISABLE ANTI-TAMPER WHILE CPU PROFILING
• Build a binary similar to the Shipping configuration but without Anti-Tamper or Anti-Cheat, which may
prevent CPU profiling tools from properly loading symbols.

AUDIT CONTENT
• Ask artists to recommend profiling scenes of interest!
• For example, an indoor dungeon, an outdoor city, an outdoor forest, large crowds, or a specific time of day.

• Run UE4Editor MapCheck!

• It may find some performance issues.
• https://docs.unrealengine.com/en-US/BuildingWorlds/LevelEditor/MapErrors/index.html

• Use Unity AssetPostprocessor!

• Enforce minimum standards.
• https://docs.unity3d.com/Manual/BestPracticeUnderstandingPerformanceInUnity4.html

• Check stats before CPU profiling!

• If the scene far exceeds its draw budget or has many duplicate objects, consider reporting the issue to its artists
and profiling a different scene. Otherwise, you may risk profiling hot spots which may not be hot after the art
issues are resolved.

TEST COLD SHADER CACHE FIRST TIME USER EXPERIENCE
rem Run as administrator
rem Disable Steam Shader Pre-Caching before running this script
rem Reboot after running this script to clear any shaders still in system memory

setlocal enableextensions
cd /d "%~dp0"
rmdir /s /q "%LOCALAPPDATA%\D3DSCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\DxCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\GLCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\VkCache"
rmdir /s /q "%ProgramData%\NVIDIA Corporation\NV_Cache"
rmdir /s /q "%ProgramFiles(x86)%\Steam\steamapps\shadercache"

OPTIMIZATIONS

TOPICS
• Use the AMD Core Counts Sample
• Use Modern Sync APIs
• Avoid False Sharing
• Prefer data access patterns matching hardware prefetcher behaviors
• Use Software Prefetch instructions for linked data structures experiencing cache misses
• Align Memcpy source and destination pointers
• Avoid Penalties while mixing SSE and AVX instructions
• Support Hybrid Graphics
• Use Preferred Video and Audio Codecs

USE THE AMD CORE COUNTS SAMPLE
• This advice is specific to AMD processors and is not general guidance for all processor vendors
• Many applications show SMT benefits and use of all logical processors is recommended
• However, games often suffer from SMT and cache contention on the main or render threads during
gameplay
• Creating the thread pool based on physical core count rather than logical processor count may reduce this
contention
• Profile your game to determine the ideal thread count
• Game initialization—including decompressing assets and compiling/warming shaders—may benefit
from logical processors using SMT dual-thread mode
• Game play may prefer physical core count using SMT single-thread mode
• See https://gpuopen.com/learn/cpu-core-counts/
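The sizing rule above can be sketched as a small helper. This is only an illustration: `Phase`, `workerCount`, and the `threadsPerCore` parameter are hypothetical names, and the physical core count is simply derived from the logical count rather than queried from the platform as the AMD Core Counts sample does. Profiling should still decide the final number.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical phases: asset loading / shader warm-up vs. gameplay.
enum class Phase { Initialization, Gameplay };

// logicalProcessors: e.g. the value of std::thread::hardware_concurrency().
// threadsPerCore:    2 when SMT is enabled, 1 otherwise (assumed to be known).
unsigned workerCount(unsigned logicalProcessors, unsigned threadsPerCore, Phase phase) {
    assert(threadsPerCore >= 1);
    // hardware_concurrency() may report 0 when the count is unknown.
    logicalProcessors = std::max(1u, logicalProcessors);
    const unsigned physicalCores = std::max(1u, logicalProcessors / threadsPerCore);
    // Initialization may benefit from every logical processor;
    // gameplay may prefer one worker per physical core.
    return (phase == Phase::Initialization) ? logicalProcessors : physicalCores;
}
```

For a 16-thread, 8-core part with SMT enabled this yields 16 workers during initialization and 8 during gameplay.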

USE MODERN SYNC APIS
[Chart: Sync API Test, execution time in milliseconds per API (less is better), axis 0 to
25,000; series: Core Isolation Memory Integrity Off and Core Isolation Memory Integrity On.]
• Prefer std::mutex which has good performance and low cpu utilization.
• Performance of binaries compiled with Microsoft Visual Studio 2022 v17.0.4.
• Testing done by AMD technology labs, January 3, 2022 on the following system. Test
configuration: AMD Ryzen™ 5950X, NZXT Kraken X62 cooler, 16GB (2 x 8GB DDR4-3600 16-16-16-36)
memory, AMD Radeon™ RX 6900 XT GPU with driver 21.11.2 (November 11, 2021), 2TB M.2 NVME SSD,
AMD Reference Motherboard, Windows® 11 x64 build 21H2, 1920x1080 resolution. Actual results may
vary.

USE MODERN SYNC APIS
[Chart: Sync API Test, total CPU utilization per API (less is better), axis 0% to 100%;
series: Core Isolation Memory Integrity Off and Core Isolation Memory Integrity On.]
• Prefer std::mutex which has good performance and low cpu utilization.
• Performance of binaries compiled with Microsoft Visual Studio 2022 v17.0.4.
• Testing done by AMD technology labs, January 3, 2022 on the following system. Test
configuration: AMD Ryzen™ 5950X, NZXT Kraken X62 cooler, 16GB (2 x 8GB DDR4-3600 16-16-16-36)
memory, AMD Radeon™ RX 6900 XT GPU with driver 21.11.2 (November 11, 2021), 2TB M.2 NVME SSD,
AMD Reference Motherboard, Windows® 11 x64 build 21H2, 1920x1080 resolution. Actual results may
vary.

USE MODERN SYNC APIS: SHARED CODE
#include "intrin.h"
#include <algorithm>  // std::fill
#include <chrono>
#include <cstdio>     // wprintf
#include <cstdlib>    // strtof, EXIT_SUCCESS
#include <numeric>
#include <thread>
#include <vector>
#include <mutex>
#include <Windows.h>
#define LEN 128

alignas(64) float b[LEN][4][4];
alignas(64) float c[LEN][4][4];

void fn();  // worker; a different definition is given for each lock variant

int main(int argc, char* argv[]) {
    using namespace std::chrono;
    float b0 = (argc > 1) ? strtof(argv[1], NULL) : 1.0f;
    float c0 = (argc > 2) ? strtof(argv[2], NULL) : 2.0f;
    std::fill((float*)b, (float*)(b + LEN), b0);
    std::fill((float*)c, (float*)(c + LEN), c0);
    int num_threads = std::thread::hardware_concurrency();
    std::vector<std::thread> threads = {};
    auto t0 = high_resolution_clock::now();
    for (size_t i = 0; i < num_threads; ++i) {
        threads.push_back(std::thread(fn));
    }
    for (size_t i = 0; i < num_threads; ++i) {
        threads[i].join();
    }
    auto t1 = high_resolution_clock::now();
    wprintf(L"time (ms): %lli\n",
        duration_cast<milliseconds>(t1 - t0).count());
    return EXIT_SUCCESS;
}

USE MODERN SYNC APIS: BAD USER SPIN LOCK
namespace MyLock {
    typedef unsigned LOCK, *PLOCK;
    enum { LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 };
    void Lock(PLOCK pl) {
        while (LOCK_IS_TAKEN ==
            _InterlockedCompareExchange(
                reinterpret_cast<long*>(pl),
                LOCK_IS_TAKEN, LOCK_IS_FREE)) {
        }
    }
    void Unlock(PLOCK pl) {
        _InterlockedExchange(reinterpret_cast<long*>(pl),
            LOCK_IS_FREE);
    }
}

MyLock::LOCK gLock;

void fn() {
    alignas(64) float a[LEN][4][4];
    std::fill((float*)a, (float*)(a + LEN), 0.0f);
    float r = 0.0;
    for (size_t iter = 0; iter < 100000; iter++) {
        MyLock::Lock(&gLock);
        for (int m = 0; m < LEN; m++)
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        a[m][i][j] += b[m][i][k] * c[m][k][j];
        r += std::accumulate((float*)a,
            (float*)(a + LEN), 0.0f);
        MyLock::Unlock(&gLock);
    }
    wprintf(L"result: %f\n", r);
}

USE MODERN SYNC APIS: IMPROVED USER SPIN LOCK
namespace MyLock {
    typedef unsigned LOCK, *PLOCK;
    enum { LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 };
    void Lock(PLOCK pl) {
        // Read the lock first; only attempt the interlocked exchange when it looks free.
        while ((LOCK_IS_TAKEN == *pl) ||
            (LOCK_IS_TAKEN ==
                _InterlockedExchange(reinterpret_cast<long*>(pl), LOCK_IS_TAKEN))) {
            _mm_pause();
        }
    }
    void Unlock(PLOCK pl) {
        _InterlockedExchange(reinterpret_cast<long*>(pl),
            LOCK_IS_FREE);
    }
}

alignas(64) MyLock::LOCK gLock;

void fn() {
    alignas(64) float a[LEN][4][4];
    std::fill((float*)a, (float*)(a + LEN), 0.0f);
    float r = 0.0;
    for (size_t iter = 0; iter < 100000; iter++) {
        MyLock::Lock(&gLock);
        for (int m = 0; m < LEN; m++)
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        a[m][i][j] += b[m][i][k] * c[m][k][j];
        r += std::accumulate((float*)a,
            (float*)(a + LEN), 0.0f);
        MyLock::Unlock(&gLock);
    }
    wprintf(L"result: %f\n", r);
}
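
For readers outside MSVC, the same test-and-test-and-set idea can be sketched with std::atomic. This is an illustrative variant, not the code measured in the charts: `SpinLock` is a hypothetical name, and std::this_thread::yield() stands in for _mm_pause() so the sketch builds on non-x86 targets.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Portable test-and-test-and-set spin lock (illustrative sketch).
class SpinLock {
public:
    bool try_lock() {
        // Plain load first: waiters spin on a shared copy of the line and only
        // attempt the read-modify-write when the lock looks free.
        return !taken_.load(std::memory_order_relaxed) &&
               !taken_.exchange(true, std::memory_order_acquire);
    }
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();  // stand-in for _mm_pause()
        }
    }
    void unlock() { taken_.store(false, std::memory_order_release); }

private:
    // Kept on its own cache line, like alignas(64) gLock above.
    alignas(64) std::atomic<bool> taken_{false};
};
```

Even with these changes a user-mode spin lock keeps waiting threads busy, which is why the slides go on to prefer OS-backed primitives.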

USE MODERN SYNC APIS: WAITFORSINGLEOBJECT
// MyLock not required. Let the OS do the work!

HANDLE hMutex;

int main(int argc, char* argv[]) {
    hMutex = CreateMutex(NULL,FALSE,NULL);
    // otherwise main is the same as before.
    // ...
}

void fn() {
    alignas(64) float a[LEN][4][4];
    std::fill((float*)a, (float*)(a + LEN), 0.0f);
    float r = 0.0;
    for (size_t iter = 0; iter < 100000; iter++) {
        WaitForSingleObject(hMutex, INFINITE);
        for (int m = 0; m < LEN; m++)
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        a[m][i][j] += b[m][i][k] * c[m][k][j];
        r += std::accumulate((float*)a,
            (float*)(a + LEN), 0.0f);
        ReleaseMutex(hMutex);
    }
    wprintf(L"result: %f\n", r);
}

USE MODERN SYNC APIS: STD::MUTEX
// MyLock not required. Let the OS do the work!
std::mutex mutex;

void fn() {
    alignas(64) float a[LEN][4][4];
    std::fill((float*)a, (float*)(a + LEN), 0.0f);
    float r = 0.0;
    for (size_t iter = 0; iter < 100000; iter++) {
        mutex.lock();
        for (int m = 0; m < LEN; m++)
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        a[m][i][j] += b[m][i][k] * c[m][k][j];
        r += std::accumulate((float*)a,
            (float*)(a + LEN), 0.0f);
        mutex.unlock();
    }
    wprintf(L"result: %f\n", r);
}
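
The explicit lock()/unlock() pair above can also be written with a scoped guard, which releases the mutex on every exit path. The sketch below is a separate, minimal illustration; `gMutex`, `gShared`, and `addShared` are hypothetical names and not part of the benchmark.

```cpp
#include <cassert>
#include <mutex>

std::mutex gMutex;   // hypothetical global, playing the role of `mutex` above
long gShared = 0;    // hypothetical state protected by gMutex

// Adds `delta` to the shared value and returns the new total.
long addShared(long delta) {
    std::lock_guard<std::mutex> guard(gMutex);  // locks here, unlocks at scope exit
    gShared += delta;
    return gShared;
}
```

The guard costs nothing extra over the manual calls; it only removes the chance of leaving the mutex held after an early return or exception.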

AVOID FALSE SHARING
[Chart: False Sharing Test, execution time in milliseconds (less is better).
Before optimization: 28,598. After optimization: 2,422.]
• Reduced execution time by 90% after optimization!
• Performance of binaries compiled with Microsoft Visual Studio 2022 v17.0.5.
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

AVOID FALSE SHARING
#include <chrono>
#include <cstdio>    // wprintf
#include <cstdlib>   // srand, rand, EXIT_SUCCESS
#include <malloc.h>  // _aligned_malloc, _aligned_free
#include <numeric>
#include <thread>
#include <vector>

#if defined (APPLY_OPTIMIZATION)
/* 64 bytes */
struct alignas(64) ThreadData { unsigned long sum; };
#else
/* 4 bytes */
struct ThreadData { unsigned long sum; };
#endif

using namespace std::chrono;
#define NUM_ITER 100000000

void fn(ThreadData* p, size_t seed) {
    srand(static_cast<unsigned int>(seed));
    p->sum = 0;
    for (int i = 0; i < NUM_ITER; i++) {
        p->sum += rand() % 2;
    }
}

int main(int argc, char* argv[]) {
    int numThreads = std::thread::hardware_concurrency();
    ThreadData* a = static_cast<ThreadData*>(_aligned_malloc(
        numThreads*sizeof(ThreadData), 64));
    if (nullptr == a) return EXIT_FAILURE;
    std::vector<std::thread> threads = {};
    auto t0 = high_resolution_clock::now();
    for (size_t i = 0; i < numThreads; ++i) {
        threads.push_back(std::thread(fn, &a[i], i));
    }
    for (size_t i = 0; i < numThreads; ++i) {
        threads[i].join();
    }
    auto t1 = high_resolution_clock::now();
    wprintf(L"time (ms): %lli\n",
        duration_cast<milliseconds>(t1 - t0).count());
    for (size_t i = 0; i < numThreads; ++i) {
        wprintf(L"sum[%llu] = %lu\n", i, (* (a + i)).sum);
    }
    _aligned_free(a);
    return EXIT_SUCCESS;
}
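
The layout difference behind this result can be checked at compile time. The following is a small portable sketch, assuming a 64-byte cache line as in the alignas(64) above; `PackedCounter`, `PaddedCounter`, and `sameCacheLine` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed line size, matching alignas(64)

struct PackedCounter { unsigned long sum; };                      // neighbours share a line
struct alignas(kCacheLine) PaddedCounter { unsigned long sum; };  // one counter per line

static_assert(sizeof(PaddedCounter) == kCacheLine, "padded to a full line");
static_assert(alignof(PaddedCounter) == kCacheLine, "starts on a line boundary");

// True when both objects start on the same 64-byte line.
template <typename T>
bool sameCacheLine(const T* a, const T* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}

// Static storage honours alignas without relying on C++17 aligned new.
std::array<PaddedCounter, 4> gPadded;
alignas(kCacheLine) std::array<PackedCounter, 4> gPacked;
```

With the packed layout, threads updating neighbouring counters write to the same line; with the padded layout each counter has a line to itself.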

PREFER DATA ACCESS PATTERNS MATCHING HARDWARE PREFETCHER
BEHAVIORS

Streaming

Stride

STREAMING HARDWARE PREFETCHER

Uses history of memory access patterns to fetch additional sequential lines in ascending or descending order.

alignas(64) float a[LEN];

// …
float sum = 0.0f;
for (size_t i = 0; i < LEN; i++) {
sum += a[i]; // streaming prefetch
}

STRIDE HARDWARE PREFETCHER

Uses memory access history of individual instructions to fetch additional lines when each access is a constant
distance from the previous.

struct S { double x1, y1, z1, w1; char name[256]; double x2, y2, z2, w2; };
alignas(64) S a[LEN];
// …
double sumX1 = 0.0f, sumX2 = 0.0f;
for (size_t i = 0; i < LEN; i++) {
sumX1 += a[i].x1; // stride prefetch 0
sumX2 += a[i].x2; // stride prefetch 1
}
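
To make the "constant distance" concrete, the sketch below restates the loop as a function and spells out the stride. It assumes the same struct S as above; `kStrideBytes` and `sumFields` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct S { double x1, y1, z1, w1; char name[256]; double x2, y2, z2, w2; };

// Distance in bytes between a[i].x1 and a[i + 1].x1 (and likewise for x2).
constexpr std::size_t kStrideBytes = sizeof(S);

static_assert(offsetof(S, x1) == 0, "x1 is the first member");
static_assert(offsetof(S, x2) == 4 * sizeof(double) + 256, "x2 follows the name buffer");

// Sums both fields; each of the two loads walks memory with the same fixed stride.
void sumFields(const S* a, std::size_t n, double& sumX1, double& sumX2) {
    sumX1 = 0.0;
    sumX2 = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sumX1 += a[i].x1;  // address advances by kStrideBytes each iteration
        sumX2 += a[i].x2;  // same stride, different starting offset
    }
}
```

Because each load instruction sees its own constant stride, the two accesses are the kind of per-instruction pattern the stride prefetcher description refers to.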

USE SOFTWARE PREFETCH INSTRUCTIONS FOR LINKED DATA
[Chart: Nvidia PhysX 4.1 KaplaDemo on AMD Ryzen™ 7 4700G with NVidia GeForce RTX™ 2080,
average FPS at start of demo (higher is better). Before optimization: 125. After optimization: 210.]
• Over 60% faster after optimization!
• Performance of binaries compiled with Microsoft Visual Studio 2019 v16.8.3.
• Testing done by AMD technology labs, January 4, 2021 on the following system. Test
configuration: AMD Ryzen™ 7 4700G, AMD Wraith Spire Cooler, 16GB (2 x 8GB DDR4-3200 at
22-22-22-52) memory, NVidia GeForce RTX™ 2080 GPU with driver 460.89 (December 15, 2020),
512GB M.2 NVME SSD, AMD Ryzen™ Reference Motherboard, Windows® 10 x64 build 20H2, 1920x1080
resolution. Actual results may vary.

USE SOFTWARE PREFETCH INSTRUCTIONS FOR LINKED DATA…
// Copyright (c) 2021 NVIDIA Corporation. All rights reserved
// ConvexRenderer.cpp from https://github.com/NVIDIAGameWorks/PhysX/tree/4.1/physx
void ConvexRenderer::updateTransformations()
{
    for (int i = 0; i < (int)mGroups.size(); i++) {
        ConvexGroup *g = mGroups[i];
        if (g->texCoords.empty())
            continue;
        float* tt = &g->texCoords[0];
        for (int j = 0; j < (int)g->convexes.size(); j++) {
            const Convex* c = g->convexes[j];
#if defined(APPLY_OPTIMIZATION)
            int distance = 4; // TODO find ideal number
            size_t future = (j + distance) % g->convexes.size();
            _mm_prefetch(0x0F8 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mPxActor
            _mm_prefetch(0x100 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mLocalPose
            _mm_prefetch(0x148 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mMaterialOffset.x
            _mm_prefetch(0x14C + (char*)(g->convexes[future]), _MM_HINT_NTA); // mMaterialOffset.y
            _mm_prefetch(0x150 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mMaterialOffset.z
            _mm_prefetch(0x164 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mSurfaceMaterialId
            _mm_prefetch(0x160 + (char*)(g->convexes[future]), _MM_HINT_NTA); // mMaterialId
#endif
            PxMat44 pose(c->getGlobalPose());
            float* mp = (float*)pose.front();
            float* ta = tt;
            for (int k = 0; k < 16; k++) {
                *(tt++) = *(mp++);
            }
            PxVec3 matOff = c->getMaterialOffset();
            ta[3] = matOff.x;
            ta[7] = matOff.y;
            ta[11] = matOff.z;
            int idFor2DTex = c->getSurfaceMaterialId();
            int idFor3DTex = c->getMaterialId();
            const int MAX_3D_TEX = 8;
            ta[15] = (float)(idFor2DTex*MAX_3D_TEX + idFor3DTex);
        }
        glBindTexture(GL_TEXTURE_2D, g->matTex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, g->texSize,
            g->texSize, GL_RGBA, GL_FLOAT, &g->texCoords[0]);
        glBindTexture(GL_TEXTURE_2D, 0);
    }
}
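
Stripped of the PhysX specifics, the pattern is "while working on element j, request element j + distance". The sketch below shows that shape on a vector of pointers; `Item`, `sumMaterialIds`, and the `PREFETCH_NTA` macro are hypothetical, and the macro falls back to a no-op where the SSE intrinsic is unavailable. The best distance is workload-dependent and has to be found by measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define PREFETCH_NTA(p) _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)0)  // no-op where the intrinsic is unavailable
#endif

struct Item { float pose[16]; int materialId; };  // hypothetical pointed-to object

// Sums materialId over objects reached through pointers, prefetching the object
// `distance` elements ahead (wrapping around, as in the code above).
long sumMaterialIds(const std::vector<const Item*>& items, std::size_t distance) {
    long total = 0;
    for (std::size_t j = 0; j < items.size(); j++) {
        const std::size_t future = (j + distance) % items.size();
        PREFETCH_NTA(items[future]);  // a hint only; results do not depend on it
        total += items[j]->materialId;
    }
    return total;
}
```

The pointer array itself is read sequentially, so the hardware prefetchers can follow it; the software prefetch targets the scattered objects the pointers lead to.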

ALIGN MEMCPY SOURCE AND DESTINATION POINTERS
• Update the compiler for the latest memcpy , memset , and other C runtime optimizations!
• Memcpy behavior is undefined if dest and src overlap.
• The compiler may generate Rep Move String instructions which have defined overlapping behavior.
• Alignas(64) may allow faster rep movs microcode.
• Alignas(4096) may reduce store-to-load conflicts.
• The processor uses linear address bits 0 thru 11 to determine Store-To-Load-Forward eligibility.
• PMCx024 LsBadStatus2 StliOther counts store-to-load conflicts where a load was unable to complete
due to a non-forwardable conflict with an older store.
• Alignas(4096) may benefit probe filtering on AMD Threadripper™ and EPYC™ processors.
• Aligning to the bit_floor may provide a good balance of cache hits and alignment:
• std::clamp(std::bit_floor(count), 4, 4096);
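
As a worked illustration of the last bullet, the sketch below computes the alignment for a given copy size. `floorPow2` and `copyAlignment` are hypothetical helper names; `floorPow2` mirrors C++20 std::bit_floor so that the sketch also builds with older standards. When std::clamp is used directly, all three arguments need the same type (for example std::size_t).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Largest power of two <= n, or 0 when n == 0 (same result as std::bit_floor).
std::size_t floorPow2(std::size_t n) {
    std::size_t p = 0;
    if (n != 0) {
        p = 1;
        while (p <= n / 2) p *= 2;  // compare against n / 2 to avoid overflow
    }
    return p;
}

// Alignment suggested above: clamp(bit_floor(count), 4, 4096).
std::size_t copyAlignment(std::size_t count) {
    const std::size_t lo = 4;
    const std::size_t hi = 4096;
    return std::min(std::max(floorPow2(count), lo), hi);
}
```

For example, a 100-byte buffer would be aligned to 64 bytes, while anything of 4 KB or more is capped at page alignment.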

AVOID PENALTIES WHILE MIXING SSE AND AVX INSTRUCTIONS
[Chart: mesh_to_sdf.exe --maxload, AVX2 (8-wide), execution time in milliseconds (less is
better). Before optimization: 31,512. After optimization: 12,219.]
• There is a significant penalty for mixing SSE and AVX instructions when the upper 128 bits
of the YMM registers contain non-zero data.
• Benchmark execution time was reduced by 60% after VZeroUpper optimization.
• Performance of binaries compiled with Microsoft Visual Studio 2022 v17.0.5.
• Testing done by AMD technology labs, February 5, 2022 on the following system. Test
configuration: AMD Ryzen™ Threadripper™ PRO 5995WX, Enermax LIQTECH TR4 II series 360mm liquid
cooler, 256GB (8 x 32GB 2R RDDR4-3200 at 24-22-22-52) memory, AMD Radeon™ RX 6800 XT GPU with
driver 21.10.2 (October 25, 2021), 2TB M.2 NVME SSD, AMD Reference Motherboard, Windows® 11 x64
build 21H2, 1920x1080 resolution. Actual results may vary.

AVOID PENALTY FOR MIXING SSE AND AVX INSTRUCTIONS
• Use PMCx00E Floating Point Dispatch Faults > 0 to find code which may be missing VZeroUpper or
VZeroAll instructions during AVX to SSE and SSE to AVX transitions.
• Optimization 1:
• Use the /arch:AVX compiler flag.
• AVX is supported by 94% of users according to the January 2022 Steam Hardware & Software Survey.
• Optimization 2:
• Return a __m256 value using pass-by-reference in the function parameter list rather than the
function return type.
• Optimization 3:
• Use __forceinline on the function definition.
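Optimizations 2 and 3 can be combined in one small pattern, sketched below under stated assumptions. The names SKETCH_FORCEINLINE, min8 and min8_arrays are hypothetical and do not come from the talk. The AVX path is only compiled when the build targets AVX (for example /arch:AVX on MSVC or -mavx on GCC and Clang); otherwise a scalar loop stands in so the file still builds.

```cpp
#include <cassert>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Portable spelling of "force inline" (assumption: MSVC, GCC, or Clang).
#if defined(_MSC_VER)
#define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCEINLINE inline
#endif

#if defined(__AVX__)
// The 256-bit result is written through `ret` instead of being returned,
// and the function is force-inlined into its caller.
SKETCH_FORCEINLINE void min8(const __m256 a, const __m256 b, __m256& ret)
{
    ret = _mm256_min_ps(a, b);
}
#endif

// Hypothetical caller: element-wise minimum of two arrays of 8 floats.
void min8_arrays(const float* a, const float* b, float* out)
{
    assert(a && b && out);
#if defined(__AVX__)
    __m256 r;
    min8(_mm256_loadu_ps(a), _mm256_loadu_ps(b), r);
    _mm256_storeu_ps(out, r);
#else
    for (int i = 0; i < 8; ++i)
        out[i] = a[i] < b[i] ? a[i] : b[i];
#endif
}
```

Whether this removes dispatch faults in a given code base depends on the compiler and flags, so it is worth re-checking PMCx00E after the change.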

AVOID PENALTY FOR MIXING SSE AND AVX INSTRUCTIONS
// Before Optimization
__m256 udTriangle_sq_precalc_SIMD_8grid(
    const __m256 p_x, const __m256 p_y,
    const __m256 p_z, const tri_precalc_t &pc )
{
    // ...
    __m256 res = _mm256_blendv_ps( res1, res0, cmp );
    return res;
}

// After Optimization
void udTriangle_sq_precalc_SIMD_8grid(
    const __m256 p_x, const __m256 p_y,
    const __m256 p_z, const tri_precalc_t& pc,
    __m256 &ret )
{
    // ...
    ret = _mm256_blendv_ps( res1, res0, cmp );
}

Before the optimization, FP_DISPATCH_FAULTS may occur because there is no VZeroUpper or VZeroAll instruction during the AVX to SSE transition.

After the optimization, FP_DISPATCH_FAULTS have been reduced because there is a VZeroUpper instruction during the AVX to SSE transition.
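In hand-written intrinsic code the same transition can also be marked explicitly. The sketch below is a minimal illustration and not code from the talk; scale8 is a hypothetical function. It calls _mm256_zeroupper() after the last 256-bit operation so the upper halves of the YMM registers are clear before control returns to code that may use legacy SSE. Compilers that target AVX often insert this instruction themselves, so the explicit call mainly matters where that does not happen. A scalar fallback keeps the file buildable without AVX.

```cpp
#include <cassert>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical example: multiply 8 floats by k.
void scale8(const float* in, float k, float* out)
{
    assert(in && out);
#if defined(__AVX__)
    const __m256 v = _mm256_loadu_ps(in);
    const __m256 s = _mm256_set1_ps(k);
    _mm256_storeu_ps(out, _mm256_mul_ps(v, s));
    // Clear the upper 128 bits of all YMM registers before leaving AVX code.
    _mm256_zeroupper();
#else
    for (int i = 0; i < 8; ++i)
        out[i] = in[i] * k;
#endif
}
```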

SUPPORT HYBRID GRAPHICS
• Use IDXGIFactory6::EnumAdapterByGpuPreference with DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE for game applications.
• The user may change preferences per application in Graphics settings.
• Testing done by AMD performance labs January 24, 2022 on a Dell G5 15 SE laptop equipped with 16GB DDR4-3200MHz, Ryzen™ 9 4900H with Radeon™ RX 5600M, Win11 Pro x64 22000.434.
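A minimal sketch of the adapter query follows. The helper name high_performance_adapter_name is hypothetical and error handling is reduced to the essentials. The DXGI calls are compiled only on Windows; on other platforms the function simply reports failure so the file still builds.

```cpp
#include <cassert>
#include <string>

#if defined(_WIN32)
#include <dxgi1_6.h>
#pragma comment(lib, "dxgi.lib")
#endif

// Hypothetical helper: stores the description of the adapter that DXGI ranks
// first for high performance. Returns false and leaves `name` empty on
// failure or on non-Windows builds.
bool high_performance_adapter_name(std::wstring& name)
{
    name.clear();
#if defined(_WIN32)
    IDXGIFactory6* factory = nullptr;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return false;

    IDXGIAdapter1* adapter = nullptr;
    // Index 0 is the most preferred adapter for the requested preference.
    const HRESULT hr = factory->EnumAdapterByGpuPreference(
        0, DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE, IID_PPV_ARGS(&adapter));
    if (SUCCEEDED(hr)) {
        DXGI_ADAPTER_DESC1 desc = {};
        if (SUCCEEDED(adapter->GetDesc1(&desc)))
            name = desc.Description;
        adapter->Release();
    }
    factory->Release();
    return !name.empty();
#else
    return false;
#endif
}
```

A game would normally create its device on the returned adapter rather than only reading its name. Because the ordering follows the per-application choice in Graphics settings, the user's preference is still respected.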

USE PREFERRED VIDEO AND AUDIO CODECS
• Prefer H264 video and AAC audio codecs as recommended by the Unreal Engine Electra Plugin.
• Hardware accelerated codecs may increase hours of battery life and reduce CPU work.
• Radeon™ RX 6500 XT and Radeon™ RX 6400 Supported Rendering Format:
• 4K H264 Decode Yes.
• WMV3 Decode No.
• See amd.com for more.

SOFTWARE OPTIMIZATION GUIDES
• AMD Family 19h is “Zen 3”
• AMD Family 17h Models 30h is “Zen 2”
• See https://developer.amd.com/resources/developer-guides-manuals/

Design faster. Render faster. Iterate faster.

DISCLAIMER AND NOTICES
Disclaimer The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and
typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences
between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security
vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this
information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without
obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO
REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF
ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD is not responsible for any electronic virus or damage or losses therefrom that may be caused by changes or modifications that you make to
your system, including but not limited to antivirus software. Changes to your system configurations and settings, including but not limited to
antivirus software, is done at your sole discretion and under no circumstances will AMD be liable to you for any such changes. You assume all risk
and are solely responsible for any damages that may arise from or are related to changes that you make to your system, including but not limited
to antivirus software.
AMD, the AMD Arrow logo, Ryzen™, Threadripper™, Radeon™, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other
product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Microsoft,
Windows, and Visual Studio are registered trademarks of Microsoft Corporation in the US and/or other countries. Unreal® is a trademark or
registered trademark of Epic Games, Inc. in the United States of America and elsewhere. NVIDIA is a trademark and/or registered trademark of
NVIDIA Corporation in the U.S. and/or other countries. Steam is a trademark and/or registered trademark of Valve Corporation. PCIe is a
registered trademark of PCI-SIG.
AMD products or technologies may include hardware to accelerate encoding or decoding of certain video standards but require the use of
additional programs/applications.
2022 Advanced Micro Devices, Inc. All rights reserved.

DISCLAIMER AND NOTICES
Code sample on slide 48 is modified.
Copyright (c) 2022 NVIDIA Corporation. All rights reserved. Code Sample is licensed subject to the following:
“Redistribution and use in source and binary forms, with or without modification, are permitted provided that the
following conditions are met: Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other materials provided with the
distribution. Neither the name of NVIDIA CORPORATION nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED
BY THE COPYRIGHT HOLDERS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.”
MeshToSDF, Copyright 2022 Mikkel Gjoel under MIT License. https://github.com/pixelmager/MeshToSDF
Infiltrator Demo uses the Unreal® Engine. Unreal® is a trademark or registered trademark of Epic Games, Inc. in the
United States of America and elsewhere.
Unreal® Engine, Copyright 1998 – 2022, Epic Games, Inc. All rights reserved.

DISCLAIMER AND NOTICES
• Claim “Zen 3” +19% IPC uplift
• Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz
frequency on 8-core "Zen 2" Ryzen™ 7 3800XT and "Zen 3" Ryzen™ 7 5800X desktop processors configured with Windows® 10,
NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary. R5K-003
• Design faster. Render faster. Iterate faster. Create more, faster with AMD Ryzen™ processors
• Testing by AMD Performance Labs as of September 23, 2020 using a Ryzen™ 9 5950X and Intel Core i9-10900K configured with
DDR4-3600C16 and NVIDIA GeForce RTX 2080 Ti. Results may vary. R5K-039
• The information contained herein is for informational purposes only and is subject to change without notice. Timelines, roadmaps,
and/or product release dates shown herein are plans only and subject to change. "Zen 2" and "Zen 3" are codenames for AMD
architectures, and are not product names. GD-122
• Engineering projections are not a guarantee of final performance. Performance projections by AMD engineering staff based on
expected Ryzen™ Threadripper™ Pro 5000 WX series processors vs Ryzen™ Threadripper™ Pro 3000 WX series processors. Specific
projections are based on reference design platforms and are subject to change when final products are released in market.

