Global Illumination in Real-Time using Voxel Cone Tracing on Mobile Devices

Conrad Wahlén

Master of Science Thesis in Electrical Engineering
LiTH-ISY-EX--16/5011--SE
Acknowledgments
The process of writing this thesis has been a long one. A bit longer than I (and
others) thought at the beginning. But I am grateful to everyone involved for the
support and for pushing me over the finish line.
Special thanks to Mindroad and Åsa Detterfelt, to Mikael Persson, and to Ingemar Ragnemalm.
Contents

Acronyms

1 Introduction
  1.1 Motivation
  1.2 Purpose
  1.3 Problem Statement
  1.4 Limitations
  1.5 Source Code
  1.6 Additional Details

2 Theoretical Background
  2.1 Light Transport
    2.1.1 Rendering equation
    2.1.2 Bi-directional Reflectance Distribution Function
  2.2 Evolution of GPUs
  2.3 Programming Graphics
    2.3.1 OpenGL (ES)

5 Results
  5.1 Comparisons
    5.1.1 Hardware
    5.1.2 Timing method
    5.1.3 Visual comparison
    5.1.4 Average rendering time
    5.1.5 Average time per step
    5.1.6 Soft shadow angle
    5.1.7 Voxel grid size
  5.2 Analysis
    5.2.1 Image comparison
    5.2.2 Average rendering time
    5.2.3 Average time per step
    5.2.4 Soft shadows varying angle
    5.2.5 Voxel grid size
  5.3 Future Work

6 Conclusions
  6.1 Experiments
    6.1.1 Method
    6.1.2 Improvements
  6.2 Problem Statement
    6.2.1 Possibility
    6.2.2 Scaling
    6.2.3 Limits
  6.3 Mobile and desktop development
    6.3.1 OpenGL and OpenGL ES
    6.3.2 Android
    6.3.3 Hardware

Bibliography
Acronyms
AO Ambient Occlusion.
AR Augmented Reality.
GI Global Illumination.
IR Instant Radiosity.
PM Photon Mapping.
PT Path Tracing.
RT Ray Tracing.
VR Virtual Reality.
1 Introduction
As humans we are predominantly ocular creatures; vision is our main sensory input for interpreting and understanding the world around us. It is not surprising, then, that recreating images of our world has been done since the dawn of humanity. Computer graphics has enabled an unprecedented opportunity to simulate and capture realistic images of our own and other realities. The development of computational resources for these tasks has dramatically increased the quality and complexity of rendered scenes. Still, a lot of work remains before it is possible to interact with the rendered scenes.
1.1 Motivation
Ever since humans drew paintings on the walls of caves, we have been interested in making images and models of this world. In this innate passion, art and physics share a common ancestor.
The invention of computer graphics has created a unique opportunity to merge art and physics: to create works of art that not only look real but stem from computational models of the real world, and to create unreal worlds that still behave as if they were real.
To achieve realism, both direct and indirect light must be simulated. Direct light is light that shines directly on a surface; indirect light has first interacted with the scene in some way. Combining the two results in Global Illumination (GI).
Thanks to the work in [9], there is also an equation that can be used to calculate GI at a point, referred to as the rendering equation. This equation is very difficult (which in science means practically impossible) to solve exactly for most cases, but by approximating it, it is possible to find solutions that are good enough for most purposes. As computational power grows, fewer approximations need to be made.
For most interactive and real-time applications, direct light and its effects are simple to compute. The problem with GI stems from the complexity of indirect light, since environmental interactions range from simple bounces to effects such as caustics. These effects are usually approximated with techniques that use a minimal amount of resources: precomputed textures where advanced lighting has been calculated ahead of time, or screen-space information used to approximate indirect shadows, called screen space Ambient Occlusion (AO).
By using GI techniques, it is not only possible to replace many of these special-purpose lighting solutions, but also to add effects that are otherwise difficult to simulate and that add a lot of realism, for example caustics and soft shadows, both direct and indirect.
1.2 Purpose
Traditionally, GI has been used for offline rendering [18], meaning it is not used in interactive or real-time applications. The increase in hardware performance and the development of new algorithms have led to implementations that are able to produce real-time frame rates. There have also been demonstrations of simple variants on low-end hardware such as mobile devices.
While mobile hardware is still far from as capable as high-end desktop hardware, the chip architecture and the mobility it offers are unique. Considering the rise of Virtual Reality (VR) and Augmented Reality (AR), it offers a truly wireless experience. Making high-end graphics available on low-end hardware allows these experiences to be more immersive and easier to use.
An alternative is presented in [5], where graphics is computed on a server and streamed to the device. The drawback of this approach is the need for a network connection, which limits the mobility. A solution like this could also benefit from knowing the limits of the device.
1.3 Problem Statement

• Is there a method for Global Illumination that scales well enough to be used on limited hardware such as a mobile device?
• What are the limiting factors of the mobile device? And are there any potential benefits of using mobile devices for GI?
1.4 Limitations
The solution will only be available on devices with the following specifications.
• Frame rate
• Dynamic scenes
• Graphical glitches
• Visual Quality
The solution will be exclusively tested on a Samsung S7 Edge with the Mali T880
MP12 Graphics Processing Unit (GPU).
2 Theoretical Background
The integration is over the upper hemisphere (Ω+) oriented around the normal of the point. The incoming radiance (Li) is the outgoing radiance from a certain direction at another point in the scene. The second function (fr) is the Bi-directional Reflectance Distribution Function (BRDF) and is explained in the next section. The final term is a scaling factor based on the incident angle of the incoming light.
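For reference, the rendering equation introduced in [9] is commonly written as

    L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega^+} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot \vec{n})\, \mathrm{d}\omega_i

where L_e is the light emitted at the point itself; the notation here is the common one and may differ slightly in detail from the form used elsewhere in this thesis.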
The goal of a GI algorithm is to solve (or approximate an answer to) these
equations.
2.1.2 Bi-directional Reflectance Distribution Function

The BRDF gives the amount of outgoing light from the point x in the direction ω that comes from the direction ωi.
There are two restrictions for a BRDF to be physically plausible. It has to conserve energy; it cannot send out more light than is coming in. And it has to be symmetric; the outgoing radiance has to be the same if the incoming and outgoing directions are swapped.
Different surfaces reflect incoming light in different ways. The two extremes are: the light is spread evenly over the hemisphere around the surface normal, or the light is reflected in a single direction. The first case is called a Lambertian surface; an example is matte paper. Regardless of where you look at the surface from, the brightness will be the same. The other extreme is a mirror.
The first case is usually referred to as the diffuse part of the light calculation. A more general formulation of the second extreme, where the light is not simply mirrored but spread around the direction of reflection, is usually called the specular part. Combining the diffuse and specular parts creates a good approximation of most common BRDFs.
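As an illustration of how a diffuse and a specular part can be combined, the sketch below evaluates a Lambertian term plus a normalized Phong-style lobe. The vector helpers and the choice of the Phong lobe are assumptions made for illustration, not the exact BRDF used later in this implementation.

    #include <algorithm>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3  scale(const Vec3& v, float s)     { return {v.x*s, v.y*s, v.z*s}; }
    static Vec3  add(const Vec3& a, const Vec3& b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }

    // Mirror reflection of the incoming direction wi (pointing away from the
    // surface, towards the light) around the surface normal n.
    static Vec3 reflect(const Vec3& wi, const Vec3& n) {
        return add(scale(n, 2.0f * dot(n, wi)), scale(wi, -1.0f));
    }

    // Lambertian (diffuse) part plus a normalized Phong (specular) lobe.
    // kd and ks are the diffuse and specular albedos, shininess controls how
    // tightly the specular lobe is spread around the mirror direction.
    Vec3 brdf(const Vec3& wi, const Vec3& wo, const Vec3& n,
              const Vec3& kd, const Vec3& ks, float shininess) {
        const float pi = 3.14159265f;
        Vec3 diffuse = scale(kd, 1.0f / pi);                          // view independent
        Vec3 r = reflect(wi, n);                                      // ideal mirror direction
        float lobe = std::pow(std::max(dot(r, wo), 0.0f), shininess); // falls off away from r
        Vec3 specular = scale(ks, lobe * (shininess + 2.0f) / (2.0f * pi));
        return add(diffuse, specular);
    }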
To describe more of the effects mentioned at the start of the chapter, more details can be added to the rendering equation. For example, transmitted radiance can be described with a Bi-directional Transmittance Distribution Function (BTDF). However, this thesis will not cover these effects.
The evolution of the GPU (a massively parallel processor) from a pure graphics processor to a more general processing unit is therefore quite natural. This has also meant that the way a GPU is used in graphics has changed, from simply calculating transformations and simple light models to more advanced light simulation approaches (trying to solve the rendering equation). Some of these approaches are discussed in the next chapter.
3 Global Illumination Algorithms

3.1.1 Radiosity
Radiosity is an iterative finite element method for solving the rendering equation. It typically only deals with diffuse light transport. Although radiosity had been used in other fields earlier, the first computer graphics implementation can be found in [7]. An overview of the algorithm is given in algorithm 1.
The algorithm starts by discretizing the scene into patches. This is done to make the patches similar in size, instead of the possibly large triangles a model is made of, and it means an alternative representation of the scene is necessary.
Each patch is then paired with every other patch and a mutual light transport (form factor) is calculated. If the patches are occluded from each other this transport is 0. The transport depends on the size of one patch projected on a hemisphere around the other patch. This means that patches with similar normals will not transmit much light to each other, and similarly for patches that are far apart, since their projections will be rather small.
When the transport of light has been calculated, the final image is created by iterative application of the light exchange between patches. In the first iteration only patches receiving direct light are lit. Each following iteration adds one bounce of the light. This iterative process can continue either until convergence (maxError) or until a desired result has been achieved (maxIteration).
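A minimal, single-channel sketch of this gathering iteration is shown below. The Patch fields and the precomputed form-factor matrix F are placeholders for illustration.

    #include <cmath>
    #include <vector>

    struct Patch {
        float emission;    // direct light emitted by the patch
        float reflectance; // fraction of incoming light that is re-emitted
        float radiosity;   // current estimate of outgoing light
    };

    // F[i][j] holds the precomputed mutual transport (form factor) from patch j
    // to patch i: 0 when the patches are occluded from each other, small when
    // they face away from each other or are far apart.
    void solveRadiosity(std::vector<Patch>& patches,
                        const std::vector<std::vector<float>>& F,
                        int maxIterations, float maxError) {
        // Iteration 0: only patches that receive direct light are lit.
        for (Patch& p : patches) p.radiosity = p.emission;

        for (int it = 0; it < maxIterations; ++it) {
            float error = 0.0f;
            std::vector<float> next(patches.size());
            for (size_t i = 0; i < patches.size(); ++i) {
                // Gather one additional bounce of light from every other patch.
                float gathered = 0.0f;
                for (size_t j = 0; j < patches.size(); ++j)
                    if (i != j) gathered += F[i][j] * patches[j].radiosity;
                next[i] = patches[i].emission + patches[i].reflectance * gathered;
                error = std::max(error, std::fabs(next[i] - patches[i].radiosity));
            }
            for (size_t i = 0; i < patches.size(); ++i) patches[i].radiosity = next[i];
            if (error < maxError) break; // converged before reaching maxIterations
        }
    }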
The Virtual Point Light (VPL) algorithm loops for a certain number of bounces. In the first iteration only point lights from direct light sources are considered; in the following iterations, light sources placed by tracing the scene model the indirect light bounces.
For each point light that is traced into the scene a shadow map is created. Different implementations of this algorithm use different shadow map resolutions depending on how many bounces the point light represents: since the diffuse light is very low frequency (no details), the more bounces the light has made, the less detail is needed.
For the final rendering, the shadow maps for all lights are combined to make an approximation of indirect shadows and lights. If too few lights are rendered, the result will be a banded image.
In the algorithm above, the indirect lights are found using an algorithm similar to the VPL algorithm in section 3.2.1, but tracing more points. This is possible since only the locations and directions of the point lights are stored and no shadow maps are generated. The light information is then inserted into the Light Propagation Volume, which is a grid representing the whole scene.
which mutates the paths and saves the ones that contribute the most to the result. This alteration produces a better result faster, especially in scenes with narrow passages for the light.
When the paths have been established, the result is rendered by taking each pixel's contribution from the paths it is connected to. To get a good result a lot of paths are needed, which takes a lot of time to compute.
The algorithm starts by tracing photons from each light source into the scene. A photon can be absorbed, reflected or transmitted when it hits a surface. When a photon is absorbed it is saved to the photon map of the scene. The map acts as a density map of photons in the scene, storing the positions, directions and colors of the photons that have been traced.
To render the resulting scene, the photon map is sampled for the photons closest to each pixel's surface point. From them an estimate of the incoming light at that point is created, which is used to compute the final pixel value.
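A brute-force sketch of this gathering step is shown below. A real photon map stores the photons in a kd-tree and weights them more carefully; the structures and constants here are simplified assumptions.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Photon {
        float pos[3];
        float power[3];   // stored color/flux of the photon
    };

    static float dist2(const float a[3], const float b[3]) {
        float d0 = a[0]-b[0], d1 = a[1]-b[1], d2 = a[2]-b[2];
        return d0*d0 + d1*d1 + d2*d2;
    }

    // Density estimate at point x: gather the k nearest photons and divide
    // their summed power by the area of the disc that encloses them.
    void estimateRadiance(const std::vector<Photon>& map, const float x[3],
                          size_t k, float outRgb[3]) {
        std::vector<std::pair<float, size_t>> byDist; // (squared distance, index)
        byDist.reserve(map.size());
        for (size_t i = 0; i < map.size(); ++i)
            byDist.push_back({dist2(map[i].pos, x), i});
        k = std::min(k, byDist.size());
        std::partial_sort(byDist.begin(), byDist.begin() + k, byDist.end());

        float sum[3] = {0.0f, 0.0f, 0.0f};
        for (size_t i = 0; i < k; ++i)
            for (int c = 0; c < 3; ++c) sum[c] += map[byDist[i].second].power[c];

        float r2 = (k > 0) ? byDist[k - 1].first : 1.0f; // squared radius of the gather disc
        float area = 3.14159265f * r2;
        for (int c = 0; c < 3; ++c) outRgb[c] = sum[c] / area;
    }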
voxels from fragments. [17] adds a triangle-based algorithm to create a hybrid solution. A single-pass solid voxelization algorithm is shown in [6]. During the voxelization, information about the scene is saved in each voxel. Exactly what information to save is an implementation detail and several variations exist; in [17] three different representations of voxel data are compared.
After the scene has been voxelized, the direct light information should be added to the voxels. There are several variations to this problem as well. In [4] the light information is injected into the voxels by rasterizing the scene from each light source and adding the light information from each fragment. In [16] and [17] the reflected light is instead calculated at the moment of voxelization using the material data and a simple shadow map.
There are some alternatives for storing the voxel data. In [3] a sparse octree data structure is used for the voxel representation. The sparse octree removes the memory needed to store empty voxels and only stores actual voxels as leaves in the tree. The main issue with this approach is that the data structure is difficult to implement and update efficiently on the GPU.
A full octree is shown in [19]; this approach is simple and can be implemented as a 3D texture, which also allows for simple mipmapping. The drawback is that it consumes a lot of memory even for scenes that are mostly empty, and the structure does not offer a simple way of finding non-empty voxels. However, by constructing an active-voxel list in the voxelization step, only active voxels need to be updated when lights or objects change, as shown in [17].
A more recent approach is to use clipmaps, as shown in [16]. This is a modification of the full octree where, instead of storing all data at the detailed levels, those levels are clipped by distance. The mentioned drawback of this approach is flickering on smaller objects that are far from the camera.
In the final step of the algorithm each fragment is cone traced. Larger cones are used for diffuse light and narrower ones for shadows and specular light. Cone tracing steps through the voxel representation of the scene, each step sampling it using quadrilinear interpolation. The cone angle determines how quickly the step size and mip level increase.
3.4 Conclusion
An overview of the different algorithms presented in this chapter is shown in
table 3.1.
4 Implementation of Voxel Cone Tracing

4.1 Overview
The main outline of the implementation is presented in figure 4.1 below. Each step of the algorithm is presented in the following sections.
    \theta = \max(\vec{n} \cdot \vec{l},\, 0) \qquad (4.1)
Figure 4.2: Shadow map result (left) and scene shaded with shadow map
(right)
should only be performed for pixels that are going to be displayed. The data stored for each fragment is: albedo (diffuse color from texture or a set color), position (in world coordinates), normal, tangent, bitangent (only on desktop), and depth. On mobile only four textures could be written, in addition to the depth buffer, in one pass. This resulted in the bitangent being calculated in the shaders from the normal and tangent, instead of rendering the scene a second time.
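As an illustration of such a G-buffer, the sketch below allocates four color attachments plus a depth attachment with OpenGL ES 3.x. The formats, the attachment order and the function name are assumptions; on ES 3.1 the floating-point attachments additionally require the EXT_color_buffer_float extension.

    #include <GLES3/gl31.h>

    // Allocates a G-buffer with four color attachments (the mobile limit noted
    // above) plus a depth attachment. The chosen formats are illustrative.
    GLuint createGBuffer(int width, int height, GLuint tex[4], GLuint* depthTex) {
        GLuint fbo;
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);

        const GLenum formats[4] = {
            GL_RGBA8,    // albedo
            GL_RGBA16F,  // world-space position
            GL_RGBA16F,  // normal
            GL_RGBA16F   // tangent (bitangent reconstructed in the shader)
        };
        GLenum drawBuffers[4];
        glGenTextures(4, tex);
        for (int i = 0; i < 4; ++i) {
            glBindTexture(GL_TEXTURE_2D, tex[i]);
            glTexStorage2D(GL_TEXTURE_2D, 1, formats[i], width, height);
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i,
                                   GL_TEXTURE_2D, tex[i], 0);
            drawBuffers[i] = GL_COLOR_ATTACHMENT0 + i;
        }
        glDrawBuffers(4, drawBuffers);

        glGenTextures(1, depthTex);
        glBindTexture(GL_TEXTURE_2D, *depthTex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_DEPTH_COMPONENT24, width, height);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                               GL_TEXTURE_2D, *depthTex, 0);

        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        return fbo;
    }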
4.4 Voxelization
As the voxelization of the scene is crucial for the tracing part, it is important that a good result is reached in this step. The implemented voxelization algorithm is from [3], which is a method that is simple to implement and gives good results for scenes with a mix of large and small triangles, as seen in [17]. In this approach the rendering pipeline is utilized to create the voxels. The steps are described in algorithm 7. There are some issues with this approach, which are discussed in [17].
The reason for choosing 3D textures in this implementation is mainly simplicity. Clipmaps would have been the preferred alternative, but little documentation for 3D clipmaps was found during the information gathering process.
normal and will then project the input triangle along that axis for rasterization. In the fragment shader the fragment data, color and shadow map result, is used to create a voxel (shown in section 4.4.2). The count part of the voxel is set to eight, and the shadow map result is multiplied by eight. This is done so that the first iteration of the mipmapping is not a special case. The voxel is then inserted into a 3D texture using the fragment coordinates (x, y and z) as texture coordinates. The order in which they are used depends on the dominant axis.
An active voxel list is also created, which contains the positions of all voxels that are not empty along with the count of active voxels. Each position is stored in a 32-bit integer in an R11G11B10 layout, which allows for at least 1024 integer positions in each dimension. This list is used both for mipmapping the 3D texture and for rendering the voxels. It could also be used to update relevant parts of the texture in dynamic scenes.
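Assuming the 11/11/10-bit layout referred to above, packing and unpacking a voxel position into one 32-bit word could look like the sketch below; the function names are placeholders.

    #include <cstdint>

    // Packs an integer voxel coordinate into one 32-bit word using 11 bits for
    // x, 11 bits for y and 10 bits for z. Even the 10-bit component allows
    // 1024 positions, matching the grid resolutions used in this thesis.
    inline uint32_t packVoxelPos(uint32_t x, uint32_t y, uint32_t z) {
        return (x & 0x7FFu) | ((y & 0x7FFu) << 11) | ((z & 0x3FFu) << 22);
    }

    inline void unpackVoxelPos(uint32_t p, uint32_t& x, uint32_t& y, uint32_t& z) {
        x = p & 0x7FFu;
        y = (p >> 11) & 0x7FFu;
        z = (p >> 22) & 0x3FFu;
    }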
if a value should be overwritten or not. This ordering makes lit fragments more important than unlit ones. The reasoning behind the other bits is explained in the next section.
4.5 Mipmapping
The mipmap pipeline starts during the voxelization process with the creation of the first level of the active voxel list. The active voxel list consists in part of an indirect draw command buffer for each mipmap level, which is used for drawing only the non-empty voxels. The second part is an indirect compute command buffer for each mipmap level, which can be used for compute shader operations on the voxels, for example for creating all the mipmap levels in the 3D texture, as seen in algorithm 8.
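The indirect command buffers mentioned above follow the fixed layouts that OpenGL (ES) 3.1 defines for glDrawArraysIndirect and glDispatchComputeIndirect. The sketch below shows those layouts and how one mipmap level could be dispatched without a CPU round trip; the buffer handle and offset are placeholders.

    #include <GLES3/gl31.h>

    // Layout of a glDrawArraysIndirect command (one per mipmap level); the
    // voxelization and mipmapping passes only have to bump `count` on the GPU
    // as voxels are appended to the active voxel list.
    struct DrawArraysIndirectCommand {
        GLuint count;          // number of active voxels to draw
        GLuint instanceCount;  // 1
        GLuint first;          // 0
        GLuint reserved;       // must be zero in OpenGL ES
    };

    // Layout of a glDispatchComputeIndirect command (one per mipmap level).
    struct DispatchIndirectCommand {
        GLuint num_groups_x;
        GLuint num_groups_y;
        GLuint num_groups_z;
    };

    // Issues the mipmap compute pass for one level; the group counts were
    // written by the GPU during the previous pass, so no CPU readback is needed.
    void dispatchMipLevel(GLuint dispatchBuffer, GLintptr levelOffset) {
        glBindBuffer(GL_DISPATCH_INDIRECT_BUFFER, dispatchBuffer);
        glDispatchComputeIndirect(levelOffset);
    }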
The compute shader goes through the active voxel list and uses it to calculate the values of the voxels on the next level in the list. Since the data is stored in a 3D texture, 8 voxels are used to create the data on the next level. The voxel data from the current level is combined in the following way. Each voxel atomically adds data to the next-level voxel to maximize parallelization: for each voxel, the next-level counter increases by 1, the light counter increases following equation 4.5, and the color of the next-level voxel increases by 1/8 of the color value of the current voxel. This allows the sampling of the voxels to produce the average, though some precision is lost.
    \mathrm{light}_{\mathrm{next}} =
    \begin{cases}
        1 & \text{if } \mathrm{light}_{\mathrm{current}} > \mathrm{counter}_{\mathrm{current}}/2 \\
        0 & \text{otherwise}
    \end{cases} \qquad (4.5)
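The sketch below shows, on the CPU for clarity, how one parent voxel could be built from its eight children according to these rules. The actual implementation performs the same additions with atomics in a compute shader, and the voxel layout used here is an assumption.

    #include <cstdint>

    struct Voxel {
        float    color[3]; // stored color, accumulated as an average
        uint32_t count;    // number of contributing voxels (set to 8 at voxelization)
        uint32_t light;    // lit counter (shadow-map result times 8 at voxelization)
    };

    // Builds one parent voxel from its eight children following the rules above;
    // on the GPU each child is handled by its own thread using atomic adds.
    Voxel downsample(const Voxel children[8]) {
        Voxel parent = {{0.0f, 0.0f, 0.0f}, 0u, 0u};
        for (int i = 0; i < 8; ++i) {
            if (children[i].count == 0) continue;        // empty child: not in the active list
            parent.count += 1;                           // counter increases by 1 per child
            // Equation 4.5: a child contributes light only if more than half of
            // its own contributors were lit.
            if (children[i].light > children[i].count / 2) parent.light += 1;
            for (int c = 0; c < 3; ++c)
                parent.color[c] += children[i].color[c] / 8.0f; // 1/8 of the child color
        }
        return parent;
    }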
the radius r that the sample should occupy is given by equation 4.6. From that, the mip level is given by equation 4.7. To find the next sample radius, equation 4.8 is used, where the last factor is the cone ratio, which is constant for a given angle θ of the cone being traced.
    r = d \cdot \sin\frac{\theta}{2} \qquad (4.6)

    r_{i+1} = (r_i + d_i) \cdot \frac{\sin\frac{\theta}{2}}{1 - \sin\frac{\theta}{2}} \qquad (4.8)
Two special cases can be noted in these equations: θ = 0, which leads to mip = log2(0), and θ = π, which leads to a division by zero in equation 4.8. These cases are handled with two different approaches. A θ ≈ 0 would mean that the cone is close to a ray and should therefore use other methods for tracing. The case of θ = π means that the cone covers the whole hemisphere, which would not give useful results; instead, to sample a larger angle, multiple cones with different directions are used.
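A sketch of the resulting marching loop is shown below. It follows equations 4.6 and 4.8; the exact mip-level formula (equation 4.7), the sampling callback and the front-to-back accumulation are assumptions made for illustration.

    #include <algorithm>
    #include <cmath>
    #include <functional>

    struct Vec3   { float x, y, z; };
    struct Sample { float rgb[3]; float occlusion; };   // value read from the voxel mipmaps

    // Samples the voxel structure at a world-space position and mip level
    // (quadrilinear interpolation on the GPU); supplied by the caller here.
    using VoxelSampler = std::function<Sample(const Vec3&, float mip)>;

    // Marches one cone with full opening angle theta from origin along dir,
    // accumulating color and occlusion front to back.
    Sample coneTrace(const VoxelSampler& sampleVoxels, Vec3 origin, Vec3 dir,
                     float theta, float voxelSize, float maxDistance) {
        const float s         = std::sin(theta * 0.5f);
        const float coneRatio = s / (1.0f - s);          // constant factor of eq. 4.8

        Sample out = {{0.0f, 0.0f, 0.0f}, 0.0f};
        float d = voxelSize;                             // start one voxel from the surface
        float r = d * s;                                 // eq. 4.6: radius the sample occupies
        while (d < maxDistance && out.occlusion < 1.0f) {
            Vec3 p = {origin.x + dir.x * d, origin.y + dir.y * d, origin.z + dir.z * d};
            float mip = std::max(0.0f, std::log2(2.0f * r / voxelSize)); // assumed form of eq. 4.7
            Sample v = sampleVoxels(p, mip);

            float w = (1.0f - out.occlusion) * v.occlusion;  // front-to-back blending
            for (int c = 0; c < 3; ++c) out.rgb[c] += w * v.rgb[c];
            out.occlusion += w;

            float rNext = (r + d) * coneRatio;           // eq. 4.8: next sample radius
            d += std::max(r + rNext, voxelSize);         // guard: theta near 0 is traced as a ray elsewhere
            r  = rNext;
        }
        out.occlusion = std::min(out.occlusion, 1.0f);
        return out;
    }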
In this implementation two different cone traces are used. The one described above is the general cone trace; it is used for the soft shadows and could also be used for specular reflections. To get the shadows, a cone is traced from each pixel towards the light. If a sample point contains filled voxels, the shadow value decreases (zero is full shadow and one is full light) depending on how filled the sampling area is. The trace is stopped if it reaches the boundary of the scene or the shadow value has decreased to zero. The angle of the cone determines how soft the edges of the shadow will be. The total shadow value for each pixel is calculated using equation 4.9.
The resulting color for each pixel is then calculated using equation 4.10, where c is the color, either from a fragment or from a trace. As seen in the equation, the result only considers diffuse light, both direct and indirect. Multiplying the resulting diffuse trace with the color of the fragment is important to get realistic color interactions: a red indirect light should not color a green surface, for example; this approximates different colors being absorbed by different materials.
4.7 Summary
Table 4.1 below gives a short summary of the alternatives for the implementation. The alternatives in bold are the ones implemented in this thesis. The other alternatives were considered but disregarded for the reasons presented earlier in this chapter. The table does not list every available choice, but it covers the most important ones.
5 Results

5.1 Comparisons
Five different scenes were rendered to compare the performance of the algorithm
and show the increasing detail and realism added by the implemented algorithm.
5.1.1 Hardware
The algorithm was tested on the following hardware to compare the performance
of the implemented algorithm on a mobile, laptop and desktop GPU.
Further specifications of the GPUs are shown below in table 5.1. An asterisk in the VRAM column signifies memory shared with the Central Processing Unit (CPU).
5.1.2 Timing method

CPU timers were used to time individual steps for each frame by waiting for the command queue to execute. CPU timers were used because there are no GPU timers available in OpenGL ES 3.1, and using them only on the other platforms could create skewed results. Each scene was rendered for a number of frames before a frame average over 5 frames was recorded. This was then repeated by restarting the program, to get a fair average over multiple runs.
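A minimal sketch of this timing approach: the command queue is drained before and after the step so that a CPU clock brackets the GPU work. The step itself is passed in as a callable.

    #include <GLES3/gl31.h>
    #include <chrono>

    // Measures one rendering step in milliseconds. glFinish() forces the GPU
    // command queue to drain before and after the step, so the CPU clock
    // brackets the actual GPU work (OpenGL ES 3.1 has no GPU timer queries).
    template <typename Step>
    double timeStepMs(Step&& runStep) {
        glFinish();
        auto start = std::chrono::steady_clock::now();
        runStep();                                   // e.g. voxelization, mipmapping or tracing
        glFinish();
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - start).count();
    }

For example, a pass could be measured as double ms = timeStepMs([&]{ voxelizeScene(); });, where voxelizeScene is a placeholder for the actual pass.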
The Cornell box scene was run in multiple lighting conditions as follows.
• Scene 1: No GI
• Scene 2: No GI and shadow mapping
• Scene 3: AO and shadow mapping
• Scene 4: Diffuse indirect light, AO and shadow mapping
Each item in the list adds to the rendering workload and increases the visual quality of the scene.
• In the first example the scene is shaded with a simple Phong shader with both diffuse and specular reflections of the global light.
• In the second example a shadow map is added to increase the realism.
Platform CS RT V M Tr Tot
Mobile 1.93 2.53 300.98 17.48 533.18 856.10
Laptop 0.22 0.89 2.87 0.96 22.23 27.16
Desktop 0.17 0.35 1.32 0.47 7.97 10.27
Table 5.3: Average time (ms) per step for scene 5
5.2 Analysis
The results from the mobile and the desktop show the same final rendering of the scene. The performance difference between the platforms is clear, and though most of the results do not show a real-time solution on mobile, the AO does run at real-time frame rates.
• Use low resolution light rendering and extrapolate to high resolution model
rendering.
6 Conclusions
The results from the previous chapter show that the implemented algorithm does not reach real-time frame rates on mobile. However, the scalability of the algorithm results in real-time AO. Even though most of the code is usable on both mobile and desktop, there are some differences worth noting.
6.1 Experiments
The results of the experiments, as shown in the previous chapter, indicate that more development is required before real-time GI is realistic on mobile hardware. The only applicable real-time use of the implemented algorithm was the AO. It might be possible, with extensive optimization, to realize diffuse indirect light as well, but the tracing of shadows, and therefore also specular indirect light, is a long way off. Using Voxel Cone Tracing (VCT) on a mobile device for something like AO might increase the visual quality compared to screen space AO, but it has a high cost in memory and does not work for dynamic objects.
The voxelization times on the mobile platform are high compared to the other platforms. One reason for this might be that the only measurement on the mobile platform was during the initial construction, since the 3D texture could not be cleared. It is therefore difficult to say how the voxelization would perform when done continuously. Unfortunately there is no clearTexture in the next version of OpenGL ES (3.2) either, so a continuous voxelization would have to be implemented with Vulkan, if at all possible. It is, however, possible to voxelize the scene each frame on a laptop or desktop. This also means that the voxel structure is completely static on the mobile platform.
In [24] it is suggested that mipmapping is a bottleneck, but in this thesis it is demonstrated that this is not the case. The active voxel list reduces the mipmapping to only those sections of the 3D texture that contain voxels, which causes the mipmapping to require less time than the actual voxelization. Another possibility, which was not explored in this thesis, is using the active voxel list to update light changes.
The choice of isotropic voxels was made partly because of the performance and memory aspects, but also to utilize the glGenerateMipmap command, which would have been used instead of developing a custom mipmap solution. Unfortunately, that command did not work while working on this implementation, and mipmapping was done using the active voxel list instead.
6.1.1 Method
The different scenes used in the previous chapter were selected to highlight some important differences in visual quality and performance. The reason no specular bounce was demonstrated is that the same code was used to trace shadows, which gave a more distinct visual difference; since a specular trace performs a similar function, the performance difference would be insignificant.
Unfortunately there was no simple way of clearing a 3D texture in OpenGL ES 3.1, which meant that the voxelization could not be dynamic without allocating a new texture each frame. This resulted in the timings for creating the voxel representation being based on fewer measurements, and more importantly, the measurements were taken on the first run of the function. This might skew the result towards a slower time than expected because of extra initialization costs.
6.1.2 Improvements
A problem with the voxel representation is that objects are empty inside. This causes several minor problems. For example, when tracing between two nearby objects, the initial sampling offset might cause the first sample to be taken from inside the hollow object. This causes the ambient occlusion tracing to fail, and the error is easy to see. In figure 6.1, for example, the shadow looks good a bit away from where the object touches the ground, but closer to the contact point the shadow disappears.
The empty voxel objects also cause problems with the mipmapping. Deciding whether a voxel at the next resolution should be empty or filled must depend on the existence of a single voxel, rather than on the number of filled voxels. This might cause objects to grow too much in higher mip levels. With filled objects, small objects would instead be removed in higher mip levels.
6.2.1 Possibility
Figure 6.1: Ambient occlusion error visible below the yellow ball
There have been previous results, as in [1], showing that simple GI algorithms are possible. But the results in this thesis indicate that the performance of modern mobile hardware is still not sufficient: a state of the art algorithm is still too much to handle without extensive optimizations, and with the current results in mind, the resulting visual quality might no longer be of interest.
In short, advanced GI using VCT is still far from mobile graphics.
6.2.2 Scaling
Is there a method for GI that scales well enough to be used on limited hardware
such as a mobile device?
When talking about scaling there are two major aspects to consider: memory scaling and performance scaling.
VCT requires a lot of memory for the voxel representation of the scene. For example, the final implementation used in this thesis used a 3D texture with a resolution of 256³, with each voxel represented by 4 bytes. This means 256³ × 4 ≈ 67 MB is needed. Increasing the resolution to 512 in each dimension yields 512³ × 4 ≈ 536 MB. For good quality in larger scenes a high resolution is needed to sufficiently represent small objects. In this thesis each voxel described a part of the scene using only 4 bytes, describing color and light. Using more data per voxel would make it possible to include additional information, such as a direction or even directed colors, which should increase the visual quality and resolve some visual errors. An example of storing spherical harmonics for each voxel can be seen in [17]. This means that the baseline for visual quality and memory is quite high, especially for larger scenes. Improvements such as clipmaps instead of mipmaps help to reduce the memory footprint, but VCT still consumes a lot of memory compared to methods without alternative representations of the scene.
VCT is a tracing algorithm with little dependency between traces. But in the case of cone tracing, each trace consists of many cones, each looping over steps. This can result in imbalanced workloads between threads. If the threads are grouped, as on most GPUs, this leads to idle threads; therefore, simply adding more threads does not scale performance as effectively.
To scale the performance of VCT there are other options. Resolution is an important parameter that influences performance, since each pixel results in a new trace, likely with a similar result. Two options to reduce this dependency are:
6.2.3 Limits
What are the limiting factors of the mobile device? And are there any potential benefits of using mobile devices for Global Illumination?
The major limiting factor of the mobile device is the lack of GPU performance, which in turn is limited by power and size. A comparison of desktop and mobile GPUs in [13] also shows that different considerations need to be made when implementing and optimizing algorithms. When it comes to the particular algorithm implemented in this thesis, there are some potential benefits of using mobile hardware for advanced computer graphics (which could possibly be shown with more comprehensive experiments). Since the algorithm is completely implemented on the GPU, moving data between the CPU and the GPU is not a factor for performance, which in other cases can make a big difference. As explained in the previous section, the workload for threads in the same workgroup might be imbalanced, which can cause a problem on a GPU architecture that clusters multiple threads together (like most desktop GPUs). However, the mobile GPU used in this thesis has 12 independent cores, meaning that this should not be a problem.
6.3.2 Android
When developing for a mobile platform, third-party tools and code are less available than for desktop development, especially for native development and GPU programming on mobile. Development for mobile devices is very specific to the model and brand of the device, which determine the OpenGL implementation and the available features. Compared to bigger programming environments, the Android NDK is not very commonly used, which affects the possibility of finding answers to problems on Google and Stack Overflow. Just comparing the android tag to the android-ndk tag on Stack Overflow speaks of the difference: over 900000 questions for android and just over 9700 for android-ndk (October 2016). OpenGL is a bit closer, with 27000 for opengl and 12000 for opengl-es. The NDK does allow more low-level control, which is necessary for certain optimizations, but it is also more complicated and lacks many of the included helper functions available when using higher-level Java code.
Utilizing the very latest features may also cause problems. The mobile device used was able to run OpenGL ES 3.2, which has some useful features. However, no libraries were available that made it possible to compile code for the newer API, since it was not part of the available version of Android, and no third-party solutions could be found.
6.3.3 Hardware
When developing for mobile it is not straightforward to obtain output and performance data, especially not directly on the device. Even though current high-end mobile devices are powerful, they are still not able to stand completely on their own: a laptop or desktop is still needed for debugging, tracing and output. Even though the screen has a high resolution, there is no multi-tasking, and because of the small physical size the amount of text and input that can be displayed is limited.
Another major difference is the lack of input methods on mobile devices. Everything has to be made for touch instead of mouse and keyboard, which makes it much more difficult to modify variables live and to navigate a 3D environment.
Development on specific hardware can be both helpful and problematic. Performance evaluation tools are usually available for most platforms, and Android is no exception. However, the Android Studio performance tool required root access, which was not available at the time of writing, and seemed limited when it came to OpenGL. Since the application used native C++ code and OpenGL, the number of applicable tools decreased further. Many of the mobile GPU manufacturers have their own tools, which in the Mali case were very useful. However, the profiling tool for measuring the performance of the application was not available in a free version.
Bibliography
[1] Minsu Ahn, Inwoo Ha, Hyong-Euk Lee, and James D. K. Kim. Real-time
global illumination on mobile device. In Mobile Devices and Multimedia:
Enabling Technologies, Algorithms, and Applications, volume 9030, pages
903005–903005–5, 2014. Cited on pages 15 and 41.
[3] Cyril Crassin and Simon Green. Octree-based sparse voxelization using the
gpu hardware rasterizer. OpenGL Insights, pages 303–318, 2012. Cited on
pages 13, 14, and 19.
[4] Cyril Crassin, Fabrice Neyret, Miguel Sainz, Simon Green, and Elmar Eisemann. Interactive indirect illumination using voxel cone tracing. In Computer Graphics Forum, volume 30, pages 1921–1930. Wiley Online Library, 2011. Cited on page 14.
[5] Cyril Crassin, David Luebke, Michael Mara, Morgan McGuire, Brent Oster, Peter Shirley, Peter-Pike Sloan, and Chris Wyman. Cloudlight: A system for amortizing indirect lighting in real-time rendering. Journal of Computer Graphics Techniques, 4(4), 2015. Cited on page 2.
[6] Elmar Eisemann and Xavier Décoret. Single-pass gpu solid voxelization for real-time applications. In Proceedings of Graphics Interface 2008, GI ’08, pages 73–80, Toronto, Ont., Canada, 2008. Canadian Information Processing Society. ISBN 978-1-56881-423-0. URL http://dl.acm.org/citation.cfm?id=1375714.1375728. Cited on page 14.
[7] Cindy M Goral, Kenneth E Torrance, Donald P Greenberg, and Bennett Battaile. Modeling the interaction of light between diffuse surfaces. In ACM SIGGRAPH Computer Graphics, volume 18, pages 213–222. ACM, 1984. Cited on page 10.
[8] Henrik Wann Jensen. Realistic image synthesis using photon mapping, volume 364. A K Peters, Natick, 2001. Cited on page 13.
[12] Eric P Lafortune and Yves D Willems. Bi-directional path tracing. In Proceedings of Third International Conference on Computational Graphics and Visualization Techniques (Compugraphics ’93), pages 145–153, 1993. Cited on page 12.
[13] Arian Maghazeh, Unmesh D Bordoloi, Petru Eles, and Zebo Peng. General
purpose computing on low-power embedded gpus: Has it come of age? In
Embedded Computer Systems: Architectures, Modeling, and Simulation
(SAMOS XIII), 2013 International Conference on, pages 1–10. IEEE, 2013.
Cited on page 42.
[14] James McLaren. Cascaded voxel cone tracing, 2014. CEDEC Presentation.
Cited on page 42.
[17] Randall Rauwendaal. Voxel based indirect illumination using spherical harmonics. PhD thesis, Oregon State University, 2013. Cited on pages 14, 19, 20, and 42.
[18] Tobias Ritschel, Carsten Dachsbacher, Thorsten Grosch, and Jan Kautz. The
state of the art in interactive global illumination. Comput. Graph. Forum,
31(1):160–188, February 2012. ISSN 0167-7055. Cited on pages 2 and 9.
[19] Andrei Simion, Victor Asavei, Sorin Andrei Pistirica, and Ovidiu Poncea.
Practical GPU and voxel-based indirect illumination for real time computer
games. In 20th International Conference on Control Systems and Computer
Science (CSCS), pages 379–384. IEEE, 2015. Cited on page 14.
[20] Sinje Thiedemann, Niklas Henrich, Thorsten Grosch, and Stefan Müller.
Voxel-based global illumination. In Symposium on Interactive 3D Graphics
and Games, pages 103–110. ACM, 2011. Cited on page 13.
[21] Eric Veach and Leonidas J Guibas. Metropolis light transport. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 65–76. ACM Press/Addison-Wesley Publishing Co., 1997. Cited on page 12.
[25] Gerald A Winer, Jane E Cottrell, Virginia Gregg, Jody S Fournier, and Lori A
Bica. Fundamentally misunderstanding visual perception: Adults’ belief in
visual emissions. American Psychologist, 57:417, 2002. Cited on page 5.