
ISSN 0361-7688, Programming and Computer Software, 2020, Vol. 46, No. 4, pp. 297–304. © Pleiades Publishing, Ltd., 2020.
Russian Text © The Author(s), 2020, published in Programmirovanie, 2020, Vol. 46, No. 4.

Survey of Nvidia RTX Technology

V. V. Sanzharov^(a,*), V. A. Frolov^(b,c,**), and V. A. Galaktionov^(b,***)
^a Gubkin Russian State University of Oil and Gas, Moscow, Russia
^b Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, Russia
^c Moscow State University, Moscow, Russia

*e-mail: [email protected]
**e-mail: [email protected]
***e-mail: [email protected]
Received December 25, 2019; revised January 9, 2020; accepted January 13, 2020

Abstract—Nvidia RTX is a proprietary hardware-accelerated ray tracing technology. Since its implementation details are unknown, there were many questions in the development community about the hardware implementation: which stages of the ray tracing pipeline are hardware-accelerated, and which of them can be efficiently implemented in software? In this paper, we present the results of our experiments with RTX aimed at understanding the inner workings of this technology. We tried to address the questions intriguing developers around the world: What kind of acceleration can be obtained in practical applications compared to software implementations, and what is the technological basis of this acceleration? How arduous is it to develop a rendering system with support for hardware acceleration that can, at the same time, work on a GPU without RTX (using a software implementation of ray tracing), or even perform calculations on a CPU? How effective is the software emulation of RTX (available on some previous-generation Nvidia GPUs), and to what extent is it possible to bring its effectiveness to that of the hardware-accelerated one? How hard would it be to create an analog of RTX if one needs to run an application on GPUs from other manufacturers?

DOI: 10.1134/S0361768820030068

1. INTRODUCTION

Ray tracing is the basic operation not only in realistic computer graphics but also in many other applications, including physical simulation, collision detection, computational geometry, neutron transport simulation in reactors, medical data visualization, scientific visualization, etc. There were quite a few implementations of hardware acceleration for ray tracing in the past, but none of them were widely available – that is, integrated into a consumer-grade graphics accelerator. Therefore, it is difficult to overestimate the importance of the emergence of a technology such as Nvidia RTX.

For researchers and developers around the world who use ray tracing in their solutions today, it is crucial to know the feasibility of adopting hardware acceleration technology (partially or in full). First, the complexity of GPU software development is 2–5 times higher than that of CPU development, especially when using specific GPU functionality. Second, Nvidia RTX is a closed-source technology provided today exclusively by a single manufacturer of graphics accelerators.

2. RELATED WORK

The first specialized hardware solutions related to ray tracing were PCI cards for volumetric data visualization, which implemented ray marching and Phong shading (for example, [1, 2]).

Another notable early implementation was the SaarCOR architecture [3] and its updated version on an FPGA [4]. The SaarCOR chip implemented the entire ray tracing algorithm: scene and camera data were loaded into a separate DRAM memory connected to the chip. Like the works mentioned earlier, SaarCOR used packet tracing (in groups of 64 rays). SaarCOR used deep pipelining to conceal the high latency of memory access, similar to modern GPUs: while some groups of rays load data, other groups that already have data on the chip can calculate ray-triangle intersections. SaarCOR and related systems did not spread; their main drawback is that the solution as a whole is highly specialized.

An alternative to a specialized chip is the placement of a large number of conventional processors on a single board with a PCI-e interface [5–8]. Such solutions have the highest flexibility and can be used not only for ray tracing acceleration. But they did not achieve popularity because of high cost.



Fig. 1. Performance ratio between primary and secondary rays, i.e. how many times does performance drop when switching from
primary to secondary rays.

Finally, there is a group of solutions aimed at developing hardware extensions for graphics processors or developing similar massively parallel programmable systems. One of the first programmable solutions of this type was presented in [9]. Tree traversal and intersection calculations were implemented in a special block with fixed functionality, while user programs (shaders) were executed on the so-called Shader Processing Unit (SPU), which was very similar in architecture to the early GPU processor cores. Like SaarCOR, the work [9] used packet tracing, which caused significant performance drops for rays diverging in different directions – also called incoherent rays. The same problem is observed in many GPU ray tracing implementations [10, 11].

One of the solutions to the random memory access problem was proposed in [12]. It involves dividing the memory request stream into at least two streams: the data stream for the rays (ray stream) and the data stream for the scene (BVH tree, scene stream). It can be said that [12] expands on the traditional approach of hiding memory latency with the help of deep pipelining, widely used in GPUs: once a treelet (a fragment of the BVH tree) is loaded into the cache, it is traversed by all the rays that are currently being processed on the GPU. The authors of [12] claim that in this way they manage to avoid random access.

For GPUs, in addition to incoherent rays, there is also the problem of an irregular distribution of work. When there are few active threads/rays in a SIMD thread group (warp), the efficiency of the SIMD GPU processor is reduced. To solve this problem, thread compaction and path regeneration were used in [10, 11], and a block regeneration technique was proposed in [13]. In [8, 10, 14], the authors used the idea of grouping the BVH tree into so-called treelets – small fragments of the BVH tree. The main difference of [14] is that treelets can store bounding volume data with a reduced precision of 5 bits per plane instead of the standard 32-bit floating point type. This reduces the memory load and improves the performance of the GPU cache. In addition, the solution proposed in [14] is relatively cheap in terms of the occupied die area, i.e., the number of transistors used.

There are also solutions aimed at hardware implementation of ray tracing for mobile systems, where power consumption is an important parameter [15, 16]. These works target classical ray tracing [17] and, unlike many of the works discussed above, use a MIMD architecture with VLIW processors to reduce energy and efficiency losses during calculations for diverging rays.

To summarize, many hardware implementations of ray tracing have been developed so far; a more complete review can be found in [18]. In addition, some commercial companies have also presented their solutions [19], although at present they are not publicly available. Thus, RTX is the first such technology available to the general public. But since this technology is closed, it is unclear what particular acceleration methods RTX uses. To understand this, we examined Nvidia RTX as a black box, conducting various experiments and measuring performance. For this purpose, we implemented a basic path tracing algorithm using the Vulkan interface for RTX.

2.1. Path Tracing on GPU

GPU ray tracing by itself is a concise and independent task that can be solved effectively in a variety of ways. However, the problem changes radically when it is necessary to build an extensible ray-tracing based


Sponza (66 K triangles), Cry Sponza (262 K triangles), San Miguel (11 M triangles), Hairballs (224 M triangles)

Fig. 2. Test scenes. Sponza and Crysponza are low-detail scenes with predominantly rectangular geometry. San Miguel contains a lot of non-rectangular forms and is the one closest to scenes found in practical applications. The Hairballs scene uses instancing intensively; its base mesh consists of geometric forms that are difficult for a BVH tree – thin hairs.

software system with a large number of different features while maintaining at least approximately the initial level of performance. This task is largely non-trivial even for CPU implementations, but on the GPU it requires special approaches. Currently, there are three general approaches:

1) "Uber-kernel" – an approach in which the code is organized manually or automatically (usually the latter) in the form of a finite state machine inside one computational kernel. The state machine is used to reduce register pressure, since each state in the top switch operator gets all the registers available to the program (kernel) at its disposal. The main disadvantages of this approach are significant performance losses on branching (when different threads execute different states) and the influence of different states on each other's performance, because the kernel requires as many registers as the heaviest state [20, 21].

2) "Separate kernel" – an approach in which the code is organized (usually manually) in the form of several kernels communicating with each other explicitly through data buffers in memory [20]. This approach solves the main disadvantages of the uber-kernel, and thanks to the explicit division into kernels, it allows maintaining the performance of critical sections of code. However, it has increased development complexity due to the need for explicit data transfer, which is especially noticeable in the presence of sorting or compaction of threads [13]. In addition, there are increased overhead costs for launching kernels and waiting for them to finish execution, as well as for the data transfer itself. Therefore, this approach can slow


Table 1. Millions of rays per second for 1024 × 1024 resolution and 1 sample per pixel, GTX1070 GPU (software imple-
mentation of RTX by Nvidia)
Scene Primary, MRays/sec. Secondary, MRays/sec. Tertiary, MRays/sec.

Sponza, RTX 103 42 38


Sponza, Hydra 214 62 59
Crysponza, RTX 95 24 14
Crysponza, Hydra 132 42 37
San Miguel, RTX 26 10 6
San Miguel, Hydra 55 23 20
Hairballs, RTX 22 9.5 7
Hairballs, Hydra 27.6 22.8 25
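The primary-to-secondary slowdown plotted in Fig. 1 can be recomputed directly from the table above; a minimal sketch (the MRays/sec values are copied verbatim from Table 1; the dictionary layout and helper name are ours, for illustration only):

```python
# Performance drop when switching from primary to secondary rays,
# computed from Table 1 (GTX1070, software RTX vs. Hydra), as in Fig. 1.
TABLE1 = {
    # (scene, implementation): (primary, secondary, tertiary) in MRays/sec
    ("Sponza", "RTX"): (103, 42, 38),
    ("Sponza", "Hydra"): (214, 62, 59),
    ("Crysponza", "RTX"): (95, 24, 14),
    ("Crysponza", "Hydra"): (132, 42, 37),
    ("San Miguel", "RTX"): (26, 10, 6),
    ("San Miguel", "Hydra"): (55, 23, 20),
    ("Hairballs", "RTX"): (22, 9.5, 7),
    ("Hairballs", "Hydra"): (27.6, 22.8, 25),
}

def drop_factor(primary, secondary):
    """How many times throughput falls from primary (coherent) to secondary rays."""
    return primary / secondary

for (scene, impl), (p, s, _t) in TABLE1.items():
    print(f"{scene:10s} {impl:5s} drop: {drop_factor(p, s):.2f}x")
```

Note how small the drop is for Hydra on Hairballs (about 1.2×) compared to the software RTX implementation on the same scene (about 2.3×), which anticipates the discussion in Result 2.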

down the program in simple scenarios where the overhead becomes comparable with the useful work of the kernels.

3) "Wavefront path tracing" – a complex approach based on grouping work and data for rays in separate queues [21]. Queues are executed in different kernels, and the results are stored in memory, also by calling specific kernels. Thanks to the grouping by conditional shaders, wavefront path tracing has lower losses on branching than the previous approaches. However, sorting and compaction of threads are strictly required in this approach, so its overhead is even higher than in the case of the previous approach.

3. KNOWN DETAILS

Currently, RTX technology is available through hardware-software interfaces (Application Programming Interfaces, or APIs) such as DirectX12, Vulkan, and OptiX. The Vulkan API is of most interest to our study, as it was designed specifically to provide developers with the most transparent access to the functionality of GPUs at a low level. This approach differs, for example, from OptiX, in which Nvidia seeks to hide implementation details to make life easier for the application developer. As for DirectX12, a careful analysis reveals that Microsoft is adding some functionality of its own, implemented in their HLSL compiler. Among the additional features available in DirectX12, it is worth mentioning the so-called "inline" ray tracing feature that appeared in the new version (called DXR Tier 1.1). This feature allows calling ray tracing functions in an arbitrary shader (pixel, compute, etc.) without creating a special ray tracing pipeline [22]. In this case, the calling code performs all the work necessary to use the results of ray tracing: calculations for ray intersections with one or another kind of primitive, ray misses, etc.

Based on this, we chose Vulkan as the main API for our experiments. Ray tracing in Vulkan is exposed as a separate type of pipeline along with the traditional graphics and compute pipelines. To use this pipeline, it is first necessary to build an acceleration structure in the form of a two-level tree. The lower level of the tree (Bottom Level Acceleration Structure, or BLAS) is built for individual objects (RTX supports user-defined geometric primitives) or meshes. The top level of the tree (Top Level Acceleration Structure, or TLAS) is built for instances of these objects/meshes. Regarding the construction of acceleration structures in RTX, the latest information was presented at the SIGGRAPH conference in 2019 [23].

Table 2. Millions of rays per second for 1024 × 1024 resolution and 1 sample per pixel, RTX2070 GPU (hardware acceler-
ated RTX implementation)
Scene Primary, MRays/sec. Secondary, MRays/sec. Tertiary, MRays/sec.
Sponza, RTX 970 534 490
Sponza, Hydra 480 122 130
Crysponza, RTX 788 386 337
Crysponza, Hydra 276 92 80
San Miguel, RTX 286 180 151
San Miguel, Hydra 127 48 42
Hairballs, RTX 282 238 289
Hairballs, Hydra 61 50 56
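The per-bounce throughput values in Tables 1 and 2 follow from formula (4.1) applied to differences of cumulative trace times, as described in Experiment 1 below; a sketch of that computation (the 10/35/60 ms timings are made up for illustration):

```python
# Per-bounce MRays/sec from cumulative trace times (Experiment 1, formula 4.1):
# measure total time at depths 1..N, subtract successive depths to isolate one
# bounce, then divide the rays launched per bounce by that time.
WIDTH, HEIGHT, SPP = 1024, 1024, 1  # resolution and samples/pixel from the paper

def per_bounce_mrays(cumulative_times_sec):
    """cumulative_times_sec[d] = total time to trace to depth d+1."""
    rays_per_bounce = WIDTH * HEIGHT * SPP
    mrays, prev = [], 0.0
    for t in cumulative_times_sec:
        dt = t - prev            # time attributable to this bounce alone
        mrays.append(rays_per_bounce / dt / 1e6)
        prev = t
    return mrays

# hypothetical: 10 ms to depth 1, 35 ms cumulative to depth 2, 60 ms to depth 3
print(per_bounce_mrays([0.010, 0.035, 0.060]))
```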


Fig. 3. Comparison of the open-source path tracing implementation in HydraRenderer and the RTX path tracer on GTX1070 (left) and RTX2070 (right). In both images, the part to the left of the dashed line shows performance for primary (coherent) rays, and the part to the right of the dashed line shows performance for secondary, divergent rays.

The ray tracing pipeline itself has five programmable stages: (1) ray generation; (2) ray miss; (3) the nearest intersection (ray hit), which is called after the intersection is found; (4) a special stage for the implementation of opacity and similar effects (confusingly called "any hit"); and, finally, (5) the intersection of the ray with the geometric primitive. Instead of the fifth stage, one can use the built-in implementation of the ray-triangle intersection, which is apparently implemented in hardware [24].

Rays carry the so-called "ray payload" – some data such as coordinates or color – between the stages of the pipeline. Nvidia recommends making the ray payload as small as possible, just as when transferring data between stages of the graphics pipeline [25]. From [23], it can be noticed that data between the various stages of the ray tracing pipeline is transmitted not through memory but through internal queues. However, the larger the ray payload, the less these queues can help.

It is interesting to note that the sub-pass functionality in the Vulkan API in principle allows transferring data between different processing cores without unloading it into memory, and thus simulating the data transfer mechanism used in RTX (in accordance with [25]). However, it would be necessary to arrange computations in the graphics pipeline in a specific way, since sub-passes were designed specifically with the deferred shading algorithm in mind. In addition, the sub-pass mechanism is primarily intended for use on mobile graphics processors, and at the moment there is no guarantee that sub-passes are not completely ignored on desktop GPUs – that is, in reality, all data may actually be transmitted through memory regardless of sub-pass use.

Finally, it should be noted that Nvidia has been developing ray tracing as part of their OptiX product for the last 5-7 years. Therefore, we once again draw attention to the work [21], in which the so-called "wavefront path tracing" was proposed.

4. EXPERIMENTS

To test our hypotheses about the internal structure of RTX, we implemented a basic path tracing algorithm and compared it to the open-source path tracing implementation in OpenCL in Hydra Renderer [26]. To measure performance in all our experiments, we used two graphics cards: GTX1070 and RTX2070. For the GTX1070, ray tracing support in Vulkan is implemented in software, while the RTX2070 has hardware acceleration.

Experiment 1. In the first experiment, we measured the time for different ray tracing depths, from which we obtained the time for each bounce (by sequentially subtracting the time for the previous depth from the time for the current depth). From these values, we estimated the number of rays per second for each bounce using formula (4.1):

    rays = (width × height × spp) / t_s    (4.1)

where t_s is the measured time of the bounce in seconds. Using the Nvidia Nsight Graphics software, we measured the time spent on the "vkCmdTraceRaysNV" call in Vulkan and compared it with the time spent on the execution of the OpenCL kernel "BVH4TraversalInstKernel" in Hydra Renderer [26]. In this experiment, we were interested in performance for several cases: coherent (primary) and diverging rays (secondary and tertiary, Fig. 1), as well as the dependence of performance on the complexity of the geometry (Fig. 2, Tables 1, 2).

Experiment 2. The purpose of our next experiment was to check for the existence of an internal mechanism for the distribution of irregular work, when some rays spawn many new rays and others generate few or no new rays at all. Such recursive path tracing is, in a sense, a traditional "challenge" for GPU implementations, since a naive implementation of this algorithm on a GPU using a stack in a single kernel is extremely inefficient [20, 21]. In order to get conclusive results, we conducted the experiment as follows: in the ray


Fig. 4. Performance sustainability in different experiments: (a) performance drop (in times) for a loop with a random iteration number (10 to 40), comparing the initial version, RTX ray tracing (10–40 rays), and Perlin noise (10–40 evaluations); (b) performance drop (in percent) on GTX1070 and RTX2070 depending on ray payload size in bytes.

generation shader, we spawned a random number of rays (10 to 40) and measured the drop in performance. Next, we conducted a similar experiment with ordinary computations, where some heavy computation (for example, Perlin noise evaluation) was also performed randomly from 10 to 40 times (Fig. 4a).

Experiment 3. In this experiment, our goal was to check for the presence of internal queues in RTX that transmit data between the various stages of the ray tracing pipeline. To do this, we sequentially increased the ray payload and measured the percentage drop in performance to understand at what point the data transfer becomes a bottleneck (Fig. 4b).

5. RESULTS

Result 1. Nvidia RTX is primarily aimed at accelerating random access to memory when tracing a large number of diverging rays. This conclusion follows from Fig. 3, on the right. On a small scene (Sponza), for primary (coherent) rays, the hardware implementation of Nvidia RTX outperforms the open-source software implementation from [26] by no more than 2 times. However, for secondary rays, this ratio reaches 5–6 times. In addition, on a heavy scene (Hairballs), RTX achieves the same 4–5 times advantage, and the fact that the acceleration is preserved for scenes where memory is a bottleneck confirms our assumption.

Result 2. RTX implements some sort of mechanism for ray grouping. This is confirmed by the analysis of the ray tracing performance degradation presented in Fig. 1. One can notice the following: first, the hardware implementation (rtx2070, the first column in Fig. 1) has a significant lead over all software implementations and does not slow down by more than 2 times on any test scene. Second, on the Hairballs scene, where ray grouping cannot help in principle due to the high complexity of the geometry, the hardware implementation and the open-source software implementations (hydra2070 and hydra1070 columns), which do not perform ray grouping, behave identically and do not significantly lose performance. At the same time, the Nvidia RTX software implementation (gtx1070) demonstrates unexpected behavior: on the simple Sponza scene it is in the lead, but on the other, more complex scenes it is significantly outperformed by the open-source implementation. That is, it has a significantly higher performance drop in the transition from primary to secondary rays.

This can be caused by one of two main reasons:

1) If the software implementation of RTX (on GTX1070) is made in the form of a monolithic kernel, then the performance advantage on a simple scene is a result of reduced overhead, since there is no need to transfer data between different kernels. At the same time, the losses on complex scenes are a direct consequence of the known shortcomings of uber-kernels [20, 21].

2) If the software implementation of RTX (on GTX1070) has the form of "wavefront path tracing," then without proper hardware support for work distribution this approach is apparently not effective enough.

We consider the first scenario more probable; however, since RTX is a closed-source technology, the second option cannot be completely ruled out.

Result 3. RTX implements some sort of internal mechanism for irregular work distribution. This


mechanism apparently works on a principle similar to "wavefront path tracing" [21]. This conclusion is confirmed by the following observation during experiment 2: when for each pixel we generated a random number of rays (from 10 to 40), we measured a 2 times performance drop compared to the scenario of consistently generating 10 rays per pixel. On the other hand, when we repeated the same experiment with Perlin noise computation, the performance drop was exactly 4 times, as it should be on the GPU due to the fact that all threads in a SIMD warp group must wait for the slowest one to complete execution (Fig. 4a).

Result 4. RTX implements data transfer between different stages of the pipeline (i.e., different user programs) through queues on the chip. This is confirmed by the nature of the performance drop with increasing ray payload (Fig. 4b). It is further confirmed by the introduction of "Mesh shaders" in RTX hardware.

Result 5. Nvidia RTX is an extremely complex technology that is difficult to implement effectively in software. This conclusion is confirmed by the low efficiency of the RTX software implementation from Nvidia itself on the GTX1070 graphics card, which loses 2–3 times in performance to a simple open-source ray tracing software implementation (Table 1). The low performance in this case is probably the result of the high flexibility of the technology and the desire to make it as general as possible; without proper hardware support, such a complex implementation is slow.

6. CONCLUSIONS

Nvidia RTX technology is a fairly general mechanism combining various hardware functionalities, which can be used not only in ray tracing but also in other applications (see [24] as an example of such use). The main mechanisms used by RTX include: (1) arranging random memory access during the tracing of diverging rays and (2) a mechanism for GPU work creation, which includes (3) data transfer between different kernels through a cache on the chip. For the user, RTX greatly simplifies development and provides high flexibility. On the other hand, this technology significantly limits portability, since RTX is implemented as a separate type of pipeline in Vulkan, and there is practically no other way to use code developed for RTX. This problem is partially solved in DirectX12 (DXR Tier 1.1) by the introduction of "inline" ray tracing, which allows the use of RTX in the "traditional" graphics/compute pipelines. However, the use of DirectX12 itself reduces portability even more.

REFERENCES

1. Meißner, M., et al., VIZARD II: a reconfigurable interactive volume rendering system, Proc. High-Performance Graphics on Graphics Hardware, ACM Eurographics, Eurographics Assoc., 2002, pp. 137–146.
2. Pfister, H., et al., The VolumePro real-time ray-casting system, in Computer Graphics and Interactive Techniques, New York: Association for Computing Machinery, 1999, pp. 251–260.
3. Schmittler, J., Wald, I., and Slusallek, P., SaarCOR: a hardware architecture for ray tracing, Proc. ACM Special Interest Group on Computer Graphics Conf. on Graphics Hardware, Eurographics Assoc., 2002, pp. 27–36.
4. Schmittler, J., et al., Realtime ray tracing of dynamic scenes on an FPGA chip, Proc. ACM SIGGRAPH/EUROGRAPHICS Conf. on Graphics Hardware, ACM, 2004, pp. 95–106.
5. Hall, D., The AR350: today's ray trace rendering processor, Proc. Eurographics/SIGGRAPH Workshop on Graphics Hardware – Hot 3D Session 1, Los Angeles, 2001.
6. Seiler, L., et al., Larrabee: a many-core x86 architecture for visual computing, ACM Trans. Graph., 2008, vol. 27, no. 3, art. 18.
7. Spjut, J., et al., TRaX: a multi-threaded architecture for real-time ray tracing, Proc. Symp. on Application Specific Processors, IEEE, 2008, pp. 108–114.
8. Kopta, D., et al., An energy and bandwidth efficient ray tracing architecture, in High-Performance Graphics, ACM, 2013, pp. 121–128.
9. Woop, S., Schmittler, J., and Slusallek, P., RPU: a programmable ray processing unit for realtime ray tracing, ACM Trans. Graph. (TOG), 2005, vol. 24, no. 3, pp. 434–444.
10. Aila, T. and Karras, T., Architecture considerations for tracing incoherent rays, in High-Performance Graphics, Eurographics Assoc., 2010, pp. 113–122.
11. Novák, J., Havran, V., and Dachsbacher, C., Path regeneration for interactive path tracing, Proc. EUROGRAPHICS, Norrköping, 2010, pp. 61–64.
12. Shkurko, K., et al., Dual streaming for hardware-accelerated ray tracing, in High Performance Graphics, ACM, 2017, p. 12.
13. Frolov, V.A. and Galaktionov, V.A., Low overhead path regeneration, Progr. Comput. Software, 2016, vol. 42, no. 6, pp. 382–387.
14. Keely, S., Reduced precision hardware for ray tracing, Proc. High-Performance Graphics, Los Angeles, 2014, pp. 29–40.
15. Nah, J.H., et al., RayCore: a ray-tracing hardware architecture for mobile devices, ACM Trans. Graph. (TOG), 2014, vol. 33, no. 5, p. 162.
16. Lee, W.J., et al., SGRT: a mobile GPU architecture for real-time ray tracing, Proc. High-Performance Graphics, ACM, 2013, pp. 109–119.
17. Whitted, T., An improved illumination model for shaded display, ACM Spec. Interest Group Comput. Graph. Interact. Techn., 1979, vol. 13, no. 2, p. 14.
18. Deng, Y., et al., Toward real-time ray tracing: a survey on hardware acceleration and microarchitecture techniques, ACM Comput. Surv. (CSUR), 2017, vol. 50, no. 4, p. 58.


19. Imagination Technologies. PowerVR Ray Tracing, 2019. https://www.imgtec.com/graphics-processors/architecture/powervr-ray-tracing/
20. Frolov, V.A., Kharlamov, A.A., and Ignatenko, A.V., Biased solution of integral illumination equation via irradiance caching and path tracing on GPUs, Progr. Comput. Software, 2011, vol. 37, no. 5, pp. 252–259.
21. Laine, S., Karras, T., and Aila, T., Megakernels considered harmful: wavefront path tracing on GPUs, Proc. 5th High-Performance Graphics Conf. (HPG'13), New York: ACM, 2013, pp. 137–143.
22. Microsoft. DirectX, DXR ray tracing specification, 2019. https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html
23. Viitanen, T., Acceleration data structure hardware (and software), Proc. SIGGRAPH 2019, Los Angeles, 2019.
24. Wald, I., et al., RTX beyond ray tracing: exploring the use of hardware ray tracing cores for tet-mesh point location, Proc. High-Performance Graphics 2019, Strasbourg, July 8–10, 2019.
25. Nvidia. Ray tracing developer resources, 2019. https://developer.nvidia.com/rtx/raytracing
26. Ray Tracing Systems, Keldysh Institute of Applied Mathematics, Moscow State University. Hydra Renderer: open-source rendering system, 2019. https://github.com/Ray-Tracing-Systems/HydraAPI


