Survey of Nvidia RTX Technology
Russian Text © The Author(s), 2020, published in Programmirovanie, 2020, Vol. 46, No. 4.
*e-mail: [email protected]
**e-mail: [email protected]
***e-mail: [email protected]
Received December 25, 2019; revised January 9, 2020; accepted January 13, 2020
DOI: 10.1134/S0361768820030068
SANZHAROV et al.
Fig. 1. Performance ratio between primary and secondary rays, i.e., the factor by which performance drops when switching from primary to secondary rays.
Finally, there is a group of solutions aimed at developing hardware extensions for graphics processors or at developing similar massively parallel programmable systems. One of the first programmable solutions of this type was presented in [9]. Tree traversal and intersection calculations were implemented in a special block with fixed functionality, while user programs (shaders) were executed on the so-called Shader Processing Unit (SPU), which was very similar in architecture to early GPU processor cores. Like SaarCOR, the work [9] used packet tracing, which caused significant performance drops for rays diverging in different directions – also called incoherent rays. The same problem is observed in many GPU ray tracing implementations [10, 11].

One of the solutions to the random memory access problem was proposed in [12]. It involves dividing the memory request stream into at least two streams: the data stream for the rays (ray stream) and the data stream for the scene (the BVH tree, scene stream). It can be said that [12] expands on the traditional approach of hiding memory latency with deep pipelining, widely used in GPUs: once a treelet (a fragment of the BVH tree) is loaded into the cache, it is traversed by all the rays that are currently being processed on the GPU. The authors of [12] claim that in this way they manage to avoid random access.

For GPUs, in addition to incoherent rays, there is also the problem of an irregular distribution of work. When there are few active threads/rays in a SIMD thread group (warp), the efficiency of the SIMD GPU processor is reduced. To solve this problem, thread compaction and path regeneration were used in [10, 11], and a block regeneration technique was proposed in [13]. In [8, 10, 14], the authors used the idea of grouping the BVH tree into so-called treelets – small fragments of the BVH tree. The main difference of [14] is that treelets can store bounding volume data with a reduced precision of 5 bits per plane instead of the 32 bits of the standard floating-point type. This reduces the memory load and improves the performance of the GPU cache. In addition, the solution proposed in [14] is relatively cheap in terms of the occupied die area, i.e., the number of transistors used.

There are also solutions aimed at hardware implementation of ray tracing for mobile systems, where power consumption is an important parameter [15, 16]. These works target an implementation of classical ray tracing [17] and, unlike many of the works discussed above, use a MIMD architecture with VLIW processors to reduce energy and efficiency losses during calculations for diverging rays.

To summarize, many hardware implementations of ray tracing have been developed so far; a more complete review can be found in [18]. In addition, some commercial companies have also presented their solutions [19], although at present they are not publicly available. Thus, RTX is the first such technology available to the general public. But since this technology is closed, it is unclear what particular acceleration methods RTX uses. To understand this, we examined Nvidia RTX as a black box, conducting various experiments and measuring performance. For this purpose, we implemented a basic path tracing algorithm using the Vulkan interface for RTX.

2.1. Path Tracing on GPU

GPU ray tracing by itself is a concise and independent task that can be solved effectively in a variety of ways. However, the problem changes radically when it is necessary to build an extensible ray-tracing-based
Fig. 2. Test scenes. Sponza and Crysponza are low-detail scenes with predominantly rectangular geometry. San Miguel contains a lot of non-rectangular forms and is the closest to scenes found in practical applications. The Hairballs scene uses instancing intensively; its base mesh consists of geometric forms that are difficult for a BVH tree – thin hairs.
software system with a large number of different features while at least approximately maintaining the initial level of performance. This task is largely non-trivial even for CPU implementations, but on the GPU it requires special approaches. Currently, there are three general approaches:

1) "Uber-kernel" – an approach in which the code is organized, manually or automatically (usually the latter), in the form of a finite state machine inside one computational kernel. The state machine is used to reduce register pressure, since each state in the top-level switch operator gets all the registers available to the program (kernel) at its disposal. The main disadvantages of this approach are significant performance losses on branching (when different threads execute different states) and the influence of different states on each other's performance, because the kernel requires as many registers as its heaviest state [20, 21].

2) "Separate kernel" – an approach in which the code is organized (usually manually) in the form of several kernels communicating with each other explicitly through data buffers in memory [20]. This approach overcomes the main disadvantages of the uber-kernel and, thanks to the explicit division into kernels, allows maintaining the performance of critical sections of code. However, it has an increased development complexity due to the need for explicit data transfer, which is especially noticeable in the presence of sorting or compaction of threads [13]. In addition, there are increased overhead costs for launching kernels and waiting for them to finish execution, as well as for the data transfer itself. Therefore, this approach can slow
Table 1. Millions of rays per second at 1024 × 1024 resolution and 1 sample per pixel, GTX1070 GPU (software implementation of RTX by Nvidia)

Scene               Primary, MRays/sec.   Secondary, MRays/sec.   Tertiary, MRays/sec.
down the program in simple scenarios where the overhead becomes comparable to the useful work of the kernels.

3) "Wavefront path tracing" – a complex approach based on grouping work and data for rays into separate queues [21]. Queues are executed in different kernels, and the results are stored in memory, also by calling specific kernels. Thanks to the grouping by conditional shaders, wavefront path tracing has lower losses on branching than the previous approaches. However, sorting and compaction of threads are strictly required in this approach, so its overhead is even higher than in the case of the previous approach.

3. KNOWN DETAILS

Currently, RTX technology is available through hardware-software interfaces (Application Programming Interfaces, or APIs) such as DirectX12, Vulkan, and OptiX. The Vulkan API is of most interest to our study, as it was designed specifically to provide developers with the most transparent access to the functionality of GPUs at a low level. This approach differs, for example, from OptiX, in which Nvidia seeks to hide implementation details to make life easier for the application developer. As for DirectX12, a careful analysis reveals that Microsoft is adding some of its own functionality implemented in its HLSL compiler. Among the additional features available in DirectX12, it is worth mentioning the so-called "inline" ray tracing feature that appeared in the new version (called DXR Tier 1.1). This feature allows calling ray tracing functions in an arbitrary shader (pixel, compute, etc.) without creating a special ray tracing pipeline [22]. In this case, the calling code performs all the work necessary to use the results of ray tracing: calculations for ray intersections with one or another kind of primitive, ray misses, etc.

Based on this, we chose Vulkan as the main API for our experiments. Ray tracing in Vulkan is exposed as a separate type of pipeline along with the traditional graphics and compute pipelines. To use this pipeline, it is first necessary to build an acceleration structure in the form of a two-level tree. The lower level of the tree (Bottom Level Acceleration Structure, or BLAS) is built for individual objects (RTX supports user-defined geometric primitives) or meshes. The top level of the tree (Top Level Acceleration Structure, or TLAS) is built for instances of these objects/meshes. Regarding the construction of acceleration structures in RTX, the latest information was presented at the SIGGRAPH conference in 2019 [23].
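To make the two-level scheme concrete, here is a minimal CPU-side sketch (in Python, for illustration only) of the BLAS/TLAS layout described above: one bottom-level structure per mesh and a top level over its placed instances. The class and function names (`Blas`, `Instance`, `Tlas`) are ours, not Vulkan's; real acceleration structures are opaque, driver-built GPU objects. The sketch reduces each BLAS to its bounding box and performs only the top-level step of traversal, a ray/AABB slab test per instance:

```python
# Illustrative two-level acceleration structure: BLAS per mesh geometry,
# TLAS over placed instances of those BLASes. Not the Vulkan API.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Blas:
    """Bottom level: geometry of one mesh, reduced here to its AABB."""
    aabb_min: Vec3
    aabb_max: Vec3

@dataclass
class Instance:
    """A placement of a BLAS in the scene: BLAS index plus a translation."""
    blas: int
    offset: Vec3

@dataclass
class Tlas:
    """Top level: all instances; traversal tests world-space AABBs."""
    blases: List[Blas]
    instances: List[Instance]

    def world_aabb(self, inst: Instance) -> Tuple[Vec3, Vec3]:
        b = self.blases[inst.blas]
        lo = tuple(a + o for a, o in zip(b.aabb_min, inst.offset))
        hi = tuple(a + o for a, o in zip(b.aabb_max, inst.offset))
        return lo, hi

    def hit_instances(self, origin: Vec3, direction: Vec3) -> List[int]:
        """Indices of instances whose world AABB the ray hits (slab test)."""
        hits = []
        for i, inst in enumerate(self.instances):
            lo, hi = self.world_aabb(inst)
            tmin, tmax = 0.0, float("inf")
            ok = True
            for o, d, l, h in zip(origin, direction, lo, hi):
                if abs(d) < 1e-12:
                    if o < l or o > h:  # ray parallel to slab and outside it
                        ok = False
                        break
                else:
                    t0, t1 = (l - o) / d, (h - o) / d
                    if t0 > t1:
                        t0, t1 = t1, t0
                    tmin, tmax = max(tmin, t0), min(tmax, t1)
                    if tmin > tmax:
                        ok = False
                        break
            if ok:
                hits.append(i)
        return hits

# One unit-cube BLAS instanced twice, in the spirit of the heavily
# instanced Hairballs scene (one base mesh, many placements).
tlas = Tlas(
    blases=[Blas((0, 0, 0), (1, 1, 1))],
    instances=[Instance(0, (0, 0, 0)), Instance(0, (5, 0, 0))],
)
hits = tlas.hit_instances(origin=(-1, 0.5, 0.5), direction=(1, 0, 0))
```

In a real renderer, each top-level hit would then descend into the corresponding BLAS to intersect the actual triangles; the point of the two levels is that moving or duplicating an object only touches the cheap TLAS, not the expensive per-mesh BLAS builds.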
Table 2. Millions of rays per second at 1024 × 1024 resolution and 1 sample per pixel, RTX2070 GPU (hardware-accelerated RTX implementation)

Scene               Primary, MRays/sec.   Secondary, MRays/sec.   Tertiary, MRays/sec.
Sponza, RTX         970                   534                     490
Sponza, Hydra       480                   122                     130
Crysponza, RTX      788                   386                     337
Crysponza, Hydra    276                   92                      80
San Miguel, RTX     286                   180                     151
San Miguel, Hydra   127                   48                      42
Hairballs, RTX      282                   238                     289
Hairballs, Hydra    61                    50                      56
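The rows above convert directly into the primary-to-secondary drop factor plotted in Fig. 1 (drop = primary rate / secondary rate); a quick check in Python:

```python
# Primary-to-secondary performance drop (the Fig. 1 metric) computed
# from the Table 2 rows: drop = primary MRays/sec / secondary MRays/sec.
table2 = {
    "Sponza, RTX": (970, 534, 490),
    "Sponza, Hydra": (480, 122, 130),
    "Crysponza, RTX": (788, 386, 337),
    "Crysponza, Hydra": (276, 92, 80),
    "San Miguel, RTX": (286, 180, 151),
    "San Miguel, Hydra": (127, 48, 42),
    "Hairballs, RTX": (282, 238, 289),
    "Hairballs, Hydra": (61, 50, 56),
}

drop = {name: primary / secondary
        for name, (primary, secondary, _) in table2.items()}
# Hardware RTX stays around a factor of 2 or less on every scene
# (e.g. Sponza: 970/534 ~ 1.82), while the Hydra software renderer
# drops by up to ~4x (Sponza: 480/122 ~ 3.9).
```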
Fig. 3. Comparison of the open-source path tracing implementation in HydraRenderer and the RTX path tracer on GTX1070 (left) and RTX2070 (right). In both images, the part to the left of the dashed lines shows performance for primary (coherent) rays; the part to the right of the dashed lines shows performance for secondary divergent rays.
generation shader, we spawned a random number of rays (10 to 40) and measured the drop in performance. Next, we conducted a similar experiment with ordinary computations, where some heavy computation (for example, Perlin noise evaluation) was likewise performed a random number of times, from 10 to 40 (Fig. 4a).

Experiment 3. In this experiment, our goal was to check for the presence of internal queues in RTX that transmit data between the various stages of the ray tracing pipeline. To do this, we sequentially increased the ray payload and measured the percentage drop in performance to understand at what point the data transfer becomes a bottleneck (Fig. 4b).

5. RESULTS

Result 1. Nvidia RTX is primarily aimed at accelerating random access to memory when tracing a large number of diverging rays. This conclusion follows from Fig. 3, on the right. On a small scene (Sponza), for primary (coherent) rays the hardware implementation of Nvidia RTX outperforms the open-source software implementation from [26] by no more than 2 times. However, for secondary rays this ratio reaches 5–6 times. In addition, on a heavy scene (Hairballs) RTX achieves the same 4–5 times advantage, and the fact that the acceleration is preserved on scenes where memory is a bottleneck confirms our assumption.

Result 2. RTX implements some sort of mechanism for ray grouping. This is confirmed by the analysis of ray tracing performance degradation presented in Fig. 1. One can notice the following: first, the hardware implementation (rtx2070, the first column in Fig. 1) has a significant lead over all software implementations and does not slow down by more than about 2 times on any test scene. Second, on the Hairballs scene, where ray grouping cannot help in principle due to the high complexity of the geometry, the hardware implementation and the open-source software implementations (the hydra2070 and hydra1070 columns), which do not perform ray grouping, behave identically and do not significantly lose performance. At the same time, the Nvidia RTX software implementation (gtx1070) demonstrates unexpected behavior: on the simple Sponza scene it is in the lead, but on the other, more complex scenes it is significantly outperformed by the open-source implementation. That is, it has a significantly higher percentage performance drop in the transition from primary to secondary rays.

This can be caused by one of two main reasons:

1) If the software implementation of RTX (on GTX1070) is made in the form of a monolithic kernel, then the performance advantage on a simple scene is a result of reduced overhead, since there is no need to transfer data between different kernels. At the same time, the losses on complex scenes are a direct consequence of the known shortcomings of uber-kernels [20, 21].

2) If the software implementation of RTX (on GTX1070) has the form of "wavefront path tracing", then without proper hardware support for work distribution this approach is apparently not effective enough.

We consider the first scenario more probable; however, since RTX is a closed-source technology, the second option cannot be completely ruled out.

Result 3. RTX implements some sort of internal mechanism for irregular work distribution. This
mechanism apparently works on a principle similar to "wavefront path tracing" [21]. This conclusion is confirmed by the following observation made during experiment 2: when we generated a random number of rays (from 10 to 40) for each pixel, we measured a 2-times performance drop compared to the scenario of consistently generating 10 rays per pixel. On the other hand, when we repeated the same experiment with Perlin noise computation, the performance drop was exactly 4 times, as it should be on the GPU, since all threads in a SIMD warp must wait for the slowest one to complete execution (Fig. 4a).

Result 4. RTX implements data transfer between different stages of the pipeline (i.e., different user programs) through queues on the chip. This is confirmed by the nature of the performance drop with increasing ray payload (Fig. 4b). It is further confirmed by the introduction of "Mesh shaders" in RTX hardware.

Result 5. Nvidia RTX is an extremely complex technology that is difficult to implement effectively in software. This conclusion is confirmed by the low efficiency of the RTX software implementation from Nvidia itself on the GTX1070 graphics card, which loses 2–3 times in performance to a simple open-source ray tracing software implementation (Table 1). The low performance in this case is probably the result of the high flexibility of the technology and the desire to make it as general as possible; without proper hardware support, such a complex implementation is slow.

6. CONCLUSIONS

Nvidia RTX technology is a fairly general mechanism combining various hardware functionalities, which can be used not only in ray tracing but also in other applications (see [24] as an example of such use). The main mechanisms used by RTX include: (1) arranging random memory access during the tracing of diverging rays and (2) a mechanism for GPU work creation, which includes (3) data transfer between different kernels through an on-chip cache. For the user, RTX greatly simplifies development and provides high flexibility. On the other hand, this technology significantly limits portability, since RTX is implemented as a separate type of pipeline in Vulkan, and code developed for RTX can hardly be used in any other way. This problem is partially solved in DirectX12 (DXR Tier 1.1) by the introduction of "inline" ray tracing, which allows the use of RTX in the "traditional" graphics/compute pipelines. However, the use of DirectX12 itself reduces portability even more.

REFERENCES

1. Meißner, M., et al., VIZARD II: a reconfigurable interactive volume rendering system, Proc. High-Performance Graphics on Graphics Hardware, ACM Eurographics, Eurographics Assoc., 2002, pp. 137–146.
2. Pfister, H., et al., The VolumePro real-time ray-casting system, in Computer Graphics and Interactive Techniques, New York: Association for Computing Machinery, 1999, pp. 251–260.
3. Schmittler, J., Wald, I., and Slusallek, P., SaarCOR: a hardware architecture for ray tracing, Proc. ACM Special Interest Group on Computer Graphics Conf. on Graphics Hardware, Eurographics Assoc., 2002, pp. 27–36.
4. Schmittler, J., et al., Realtime ray tracing of dynamic scenes on an FPGA chip, Proc. ACM SIGGRAPH/EUROGRAPHICS Conf. on Graphics Hardware, ACM, 2004, pp. 95–106.
5. Hall, D., The AR350: today's ray trace rendering processor, Proc. Eurographics/SIGGRAPH Workshop on Graphics Hardware – Hot 3D Session 1, Los Angeles, 2001.
6. Seiler, L., et al., Larrabee: a many-core x86 architecture for visual computing, ACM Trans. Graph., 2008, vol. 27, no. 3, art. 18.
7. Spjut, J., et al., TRaX: a multi-threaded architecture for real-time ray tracing, Proc. Symp. on Application Specific Processors, Institute of Electrical and Electronics Engineers, 2008, pp. 108–114.
8. Kopta, D., et al., An energy and bandwidth efficient ray tracing architecture, in High-Performance Graphics, ACM, 2013, pp. 121–128.
9. Woop, S., Schmittler, J., and Slusallek, P., RPU: a programmable ray processing unit for realtime ray tracing, ACM Trans. Graph., 2005, vol. 24, no. 3, pp. 434–444.
10. Aila, T. and Karras, T., Architecture considerations for tracing incoherent rays, in High-Performance Graphics, Eurographics Assoc., 2010, pp. 113–122.
11. Novák, J., Havran, V., and Dachsbacher, C., Path regeneration for interactive path tracing, Proc. EUROGRAPHICS, Norrköping, 2010, pp. 61–64.
12. Shkurko, K., et al., Dual streaming for hardware-accelerated ray tracing, in High Performance Graphics, ACM, 2017, p. 12.
13. Frolov, V.A. and Galaktionov, V.A., Low overhead path regeneration, Progr. Comput. Software, 2016, vol. 42, no. 6, pp. 382–387.
14. Keely, S., Reduced precision hardware for ray tracing, Proc. High-Performance Graphics, Los Angeles, 2014, pp. 29–40.
15. Nah, J.H., et al., RayCore: a ray-tracing hardware architecture for mobile devices, ACM Trans. Graph., 2014, vol. 33, no. 5, p. 162.
16. Lee, W.J., et al., SGRT: a mobile GPU architecture for real-time ray tracing, Proc. High-Performance Graphics, ACM, 2013, pp. 109–119.
17. Whitted, T., An improved illumination model for shaded display, ACM Spec. Interest Group Comput. Graph. Interact. Techn., 1979, vol. 13, no. 2, p. 14.
18. Deng, Y., et al., Toward real-time ray tracing: a survey on hardware acceleration and microarchitecture techniques, ACM Comput. Surv., 2017, vol. 50, no. 4, p. 58.
19. Imagination Technologies. PowerVR Ray Tracing, 2019. https://www.imgtec.com/graphics-processors/architecture/powervr-ray-tracing/
20. Frolov, V.A., Kharlamov, A.A., and Ignatenko, A.V., Biased solution of integral illumination equation via irradiance caching and path tracing on GPUs, Progr. Comput. Software, 2011, vol. 37, no. 5, pp. 252–259.
21. Laine, S., Karras, T., and Aila, T., Megakernels considered harmful: wavefront path tracing on GPUs, Proc. 5th High-Performance Graphics Conf. (HPG'13), New York: ACM, 2013, pp. 137–143.
22. Microsoft. DirectX DXR ray tracing specification, 2019. https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html
23. Viitanen, T., Acceleration data structure hardware (and software), Proc. SIGGRAPH 2019, Los Angeles, 2019.
24. Wald, I., et al., RTX beyond ray tracing: exploring the use of hardware ray tracing cores for tet-mesh point location, Proc. High-Performance Graphics 2019, Strasbourg, July 8–10, 2019.
25. Nvidia. Ray tracing developer resources, 2019. https://developer.nvidia.com/rtx/raytracing
26. Ray Tracing Systems, Keldysh Institute of Applied Mathematics, Moscow State University. HydraRenderer: open-source rendering system, 2019. https://github.com/Ray-Tracing-Systems/HydraAPI