**Chapter 2: One Dimensional MC Integration**
Integration is all about computing areas and volumes, so we could have framed Chapter 1 in an
integral form if we wanted to make it maximally confusing. But sometimes integration is the most
natural and clean way to formulate things. Rendering is often such a problem. Let’s look at a
classic integral:
$$ I = \int_{0}^{2} x^2 dx $$
In computer sciency notation, we might write this as:
$$ I = area( x^2, 0, 2 ) $$
We could also write it as:
$$ I = 2 \cdot average(x^2, 0, 2) $$
This suggests an MC approach:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#include <cstdlib>
#include <iostream>

int main() {
    int N = 1000000;
    double sum = 0;  // must be initialized before we accumulate into it
    for (int i = 0; i < N; i++) {
        double x = 2*drand48();  // uniform sample in [0,2)
        sum += x*x;
    }
    std::cout << "I = " << 2*sum/N << "\n";
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This, as expected, produces approximately the exact answer we get with algebra, $I = 8/3$. But we
could also do it for functions that we can’t analytically integrate, like $\log(\sin(x))$. In
graphics, we often have functions we can evaluate but can’t write down explicitly, or functions we
can only probabilistically evaluate. That is in fact what the ray tracing `color()` function of the
last two books is -- we don’t know what color is seen in every direction, but we can statistically
estimate it in any given direction.
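For instance, here is a minimal sketch of that idea (the interval $(0, \pi)$ is chosen arbitrarily
for this illustration): the same uniform estimator applied to $\log(\sin(x))$, which has no
elementary antiderivative, but whose exact value over $(0, \pi)$ happens to be known to be
$-\pi \log 2 \approx -2.1776$:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#include <cstdlib>
#include <cmath>
#include <iostream>

int main() {
    int N = 1000000;
    double sum = 0;
    for (int i = 0; i < N; i++) {
        double x = M_PI*drand48();      // uniform sample in [0, pi)
        if (x > 0) sum += log(sin(x));  // skip the measure-zero endpoint x = 0
    }
    // interval width times the average; should print roughly -2.1776
    std::cout << "I = " << M_PI*sum/N << "\n";
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~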
One problem with the random program we wrote in the first two books is that small light sources
create too much noise, because our uniform sampling doesn’t sample the light often enough. We could
lessen that problem if we sent more random samples toward the light, but then we would need to
downweight those samples to adjust for the over-sampling. How do we do that adjustment? To do that,
we will need the concept of a _probability density function_.
First, what is a _density function_? It’s just a continuous form of a histogram. Here’s an example
from the histogram Wikipedia page:
![Figure 2-1][fig02-1]
If we added data for more trees, the histogram would get taller. If we divided the data into more
bins, it would get shorter. A discrete density function differs from a histogram in that it
normalizes the frequency y-axis to a fraction or percentage (just a fraction times 100). A
continuous histogram, where we take the number of bins to infinity, can’t be a fraction because the
height of all the bins would drop to zero. A density function is one where we take the bins and
adjust them so they don’t get shorter as we add more bins. For the case of the tree histogram above
we might try:
$$ \text{Bin-height} = \frac{\text{Fraction of trees between height } H \text{ and } H'}{H' - H} $$
That would work! We could interpret that as a statistical predictor of a tree’s height:
$$ \text{Probability a random tree is between } H \text{ and } H' = \text{Bin-height} \cdot (H' - H) $$
If we wanted to know about the chances of being in a span of multiple bins, we would sum.
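For example, with made-up numbers: if the 2 m--3 m bin has height 0.4 and the 3 m--4 m bin has
height 0.2, then
$$ \text{Probability a random tree is between 2 m and 4 m} = 0.4 \cdot 1 + 0.2 \cdot 1 = 0.6 $$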
A _probability density function_, henceforth pdf, is that fractional histogram made continuous.
Let’s make a _pdf_ and use it a bit to understand it more. Suppose I want a random number $r$
between 0 and 2 whose probability is proportional to itself: $r$. We would expect the pdf $p(r)$
to look something like the figure below. But how high should it be?
![Figure 2-2][fig02-2]
The height is just $p(2)$. What should that be? We could reasonably make it anything by
convention, and we should pick something that is convenient. Just as with histograms we can sum up
(integrate) the region to figure out the probability that $r$ is in some interval $(x_0, x_1)$:
$$ \text{Probability } x_0 < r < x_1 = C \cdot area(p(r), x_0, x_1) $$
where $C$ is a scaling constant. We may as well make $C = 1$ for cleanliness, and that is exactly
what is done in probability. And we know $r$ must take on _some_ value between 0 and 2, so the
total probability there must be 1. For this case,
$$ area(p(r), 0, 2) = 1 $$
Since $p(r)$ is proportional to $r$, _i.e._, $p = C' \cdot r$ for some other constant $C'$:
$$
area(C'r, 0, 2) = \int_{0}^{2} C' r \, dr
= \left[ \frac{C' r^2}{2} \right]_{0}^{2}
= 2C'
$$
Setting $2C' = 1$ gives $C' = 1/2$, so $p(r) = r/2$.
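As a quick worked example with this _pdf_:
$$ \text{Probability } 0 < r < 1 = area(r/2, 0, 1) = \int_{0}^{1} \frac{r}{2} \, dr = \frac{1}{4} $$
(keep that $1/4$ in mind; it will show up again shortly).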
How do we generate a random number with that pdf $p(r)$? For that we will need some more machinery.
Don’t worry, this doesn’t go on forever!
Given a random number from `d = drand48()` that is uniform and between 0 and 1, we should be able
to find some function $f(d)$ that gives us what we want. Suppose $e = f(d) = d^2$. Then $e$ no
longer has a uniform _pdf_: the _pdf_ of $e$ will be bigger near 0 than it is near 1 (squaring a
number between 0 and 1 makes it smaller). To take this general observation to a function, we need
the cumulative probability distribution function $P(x)$:
$$ P(x) = area(p, -\infty, x) $$
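For example, this lets us verify the claim above about $e = d^2$: the probability that $e$ is less
than $x$ is the probability that $d$ is less than $\sqrt{x}$, which for a uniform $d$ is just
$\sqrt{x}$, so for $0 < x < 1$
$$ P(x) = \sqrt{x}, \qquad p(x) = \frac{d}{dx} \sqrt{x} = \frac{1}{2\sqrt{x}} $$
and that _pdf_ is indeed bigger near 0 than near 1.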
Note that for $x$ where we didn’t define $p(x)$, $p(x) = 0$, _i.e._, the probability of an $x$
there is zero. For our example _pdf_ $p(r) = r/2$, integrating from 0 up to $x$ gives
$area(p, 0, x) = \frac{x^2}{4}$, so the $P(x)$ is:
$$ P(x) =
\begin{cases}
0 & : x < 0 \\
\frac{x^2}{4} & : 0 \leq x \leq 2 \\
1 & : x > 2
\end{cases}
$$
One question is, what’s up with $x$ versus $r$? They are dummy variables -- analogous to the
function arguments in a program. If we evaluate $P$ at $x = 1.0$, we get:
$$ P(1.0) = \frac{1}{4} $$
This says _the probability that a random variable with our pdf is less than one is 25%_ . This
gives rise to a clever observation that underlies many methods to generate non-uniform random
numbers. We want a function `f()` that when we call it as `f(drand48())` we get a return value with
a pdf of $x/2$. We don’t know what that is, but we do know that 25% of what it returns
should be less than 1, and 75% should be above 1. If $f()$ is increasing, then we would expect
$f(0.25) = 1.0$. This can be generalized to figure out $f()$ for every possible input:
$$ f(P(x)) = x $$
That means $f$ just undoes whatever $P$ does. So,
$$ f(x) = P^{-1}(x) $$
The -1 means “inverse function”. Ugly notation, but standard. For our purposes what this means is,
if we have pdf $p()$ and its cumulative distribution function $P()$ , then if we do this to a random
number we’ll get what we want:
$$ e = P^{-1}(drand48()) $$
For our _pdf_ $p(x) = x/2$ , and corresponding $P(x)$ , we need to compute the inverse of $P$. If
we have
$$ y = \frac{x^2}{4} $$
we get the inverse by solving for $x$ in terms of $y$:
$$ x = \sqrt{4y} $$
Thus our random number with density $p$ we get by:
$$ e = \sqrt{4 \cdot drand48()} $$
Note that this ranges from 0 to 2 as hoped, and if we send in $1/4$ for `drand48()` we get 1 as
desired.
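As a quick numerical sanity check (a sketch along the lines of the earlier program), the average of
many such samples should approach the analytic mean of this _pdf_,
$\int_{0}^{2} r \cdot \frac{r}{2} \, dr = 4/3$:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#include <cstdlib>
#include <cmath>
#include <iostream>

int main() {
    int N = 1000000;
    double sum = 0;
    for (int i = 0; i < N; i++)
        sum += sqrt(4*drand48());   // sample with pdf p(r) = r/2 via the inverse CDF
    // should print roughly 1.3333 (= 4/3)
    std::cout << "average = " << sum/N << "\n";
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~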
We can now sample our old integral
$$ I = \int_{0}^{2} x^2 \, dx $$
We need to account for the non-uniformity of the _pdf_ of $x$. Where we sample too much we should
down-weight. The _pdf_ is a perfect measure of how much or little sampling is being done. So the
weighting function should be proportional to $1/pdf$ . In fact it is exactly $1/pdf$ :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#include <cstdlib>
#include <cmath>
#include <iostream>
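
// A sketch completing the estimator from the derivation above:
// draw x with pdf p(x) = x/2, then weight each x*x sample by 1/p(x).

inline double pdf(double x) {
    return 0.5*x;
}

int main() {
    int N = 1000000;
    double sum = 0;
    for (int i = 0; i < N; i++) {
        double x = sqrt(4*drand48());   // e = sqrt(4*drand48()) from above
        sum += x*x / pdf(x);            // down-weight where we over-sample
    }
    std::cout << "I = " << sum/N << "\n";
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~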