Cosmology Lecture Notes
Computational Cosmology
Moritz Münchmeyer
I Basics of Cosmology 2
6 Inflation 26
6.1 The flatness problem 27
6.2 The horizon problem 27
6.3 Inflationary expansion 30
6.4 The field theory of inflation 32
6.5 The quantum field theory of inflation 34
6.6 Primordial perturbations from inflation 35
II Introduction to Computation and Statistics in Cosmology 40
10 Basics of Statistics 60
10.1 Estimators 60
10.2 Likelihoods, Posteriors, Bayes Theorem 61
10.3 Gaussian Likelihoods 62
10.4 Using the likelihood and the posterior 64
10.5 Fisher forecasting 65
10.6 Sampling the posterior: MCMC 68
10.7 Other algorithms beyond MCMC 70
10.8 Goodness of fit 71
10.9 Model comparison 72
14.3 Mask and mode coupling 84
14.4 Pseudo-Cl estimator and PyMaster 86
14.5 Wiener filtering 87
14.6 Likelihood of the CMB 88
14.7 Tools to sample the CMB likelihood 88
16 Primordial non-Gaussianity 91
16.1 Primordial bispectra 91
16.2 CMB bispectrum 92
16.3 Optimal estimator for bispectra 93
16.4 The separability trick 94
21 Overview of LSS Perturbation Theory 110
21.1 Fluid approximation 110
21.2 Standard (Eulerian) Perturbation Theory 111
21.3 Lagrangian Perturbation theory (LPT) 115
26 Non-Gaussianity 141
26.1 Tightening measurements of cosmological parameters 141
26.2 Primordial non-Gaussianity 142
Part I
Basics of Cosmology
In the first part of this course we will briefly review basics of cosmology. My main goal is to
introduce the coordinates as well as the physical parameters that we want to measure in data
later on.
Further reading
There are many excellent textbooks that go deeper into these foundations. This section is based primarily on the textbooks
• Daniel Baumann - Cosmology (2021), as well as his TASI lecture notes (arXiv: 0907.5424). Material similar to the textbook is also available here: http://cosmology.amsterdam/education/cosmology/.
• David Tong's lecture notes: http://www.damtp.cam.ac.uk/user/tong/teaching.html — I recommend all of David's lecture notes.
• Another popular textbook which we will use is Dodelson - Modern Cosmology.
The basic unit of distance in astronomy is the parsec,
\[ 1\,\mathrm{pc} = 3.26\,\mathrm{ly} . \tag{2.1} \]
Parsecs are the typical distance of nearby stars. A parsec is equal to the distance at which 1 AU (astronomical unit, the average distance between Earth and the Sun) is seen at an angle of one arcsecond, which is 1/3600 of a degree. The size of our galaxy is more conveniently given in kiloparsecs (kpc). The Milky Way is about 30 kpc in diameter, and we are about 8 kpc from the center. The distance to other galaxies is usually given in megaparsecs (Mpc). The nearest spiral galaxy, Andromeda, is about 1 Mpc away from us. The comoving distance (we'll explain the term "comoving" soon) to the edge of the observable universe is about 14.3 Gpc.
An order of magnitude estimate is that the observable universe contains about 100 billion
galaxies and a typical galaxy contains about 100 billion stars. There is no reason to believe that
our galaxy or star are particularly special in the cosmological sense.
3 Expansion of the Universe
In this section we want to understand the equations that govern the evolution of the entire
universe. Cosmology can be understood in two steps:
• On large scales (i.e. after smoothing out small scale irregularities such as galaxies), the
universe is uniform. By studying its average contents we can understand the back-
ground expansion of the universe. The Cosmological Principle states: On the
largest scales, the universe is spatially homogeneous and isotropic.
• On smaller scales, there are initially small and later very large inhomogeneities (such as
galaxies). The evolution of these cosmological perturbations on top of the background
expansion is much more complicated, but tells us much of what we know about the universe.
Following the standard practice of cosmology courses, we will first discuss the uniform large-scale
universe and then later discuss perturbations.
To start describing the universe mathematically we first need to define coordinates. A crucial feature of cosmology is that space-time cannot be treated as static, because the universe has expanded substantially over its history.
A large part of theoretical and computational cosmology can be done by assuming that the uni-
verse is flat. To date there is no experimental evidence for any curvature of space on large scales.
We thus focus on flat expanding space-time and are brief on the generalization to curvature.
In Euclidean 3-space the squared distance between two nearby points is
\[ dl^2 = dx^2 + dy^2 + dz^2 = \sum_{i,j=1}^{3} \delta_{ij}\, dx^i dx^j , \tag{3.1} \]
where the Kronecker delta δij = diag(1, 1, 1) is the metric. If we were to use spherical coordi-
nates instead we would get
\[ dl^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2 = \sum_{i,j=1}^{3} g_{ij}\, dx^i dx^j , \tag{3.2} \]
where (x¹, x², x³) = (r, θ, φ) and the metric is g_{ij} = diag(1, r², r² sin²θ).
Since Einstein we know that physics is really happening in space-time and that distances in
time and space are not independently invariant. We instead need a metric that turns space-time
coordinates xµ = (ct, xi ) into the invariant space-time distance (also called invariant line
element)
\[ ds^2 = \sum_{\mu,\nu=0}^{3} g_{\mu\nu}\, dx^\mu dx^\nu \equiv g_{\mu\nu}\, dx^\mu dx^\nu , \tag{3.3} \]
where in the second step we used the Einstein summation convention.
In the specific case of special relativity where space-time is not curved (Minkowski space),
using Euclidean coordinates, this line element is
\[ ds^2 = -c^2 dt^2 + \sum_{i,j=1}^{3} \delta_{ij}\, dx^i dx^j \tag{3.4} \]
\[ \phantom{ds^2} = -c^2 dt^2 + d\mathbf{x}^2 \tag{3.5} \]
and the Minkowski metric is g_{µν} = diag(−1, 1, 1, 1). Recall from special relativity that ds² can be positive (spacelike), negative (timelike), or zero (null).
In general relativity, the metric depends on the position in space-time, gµν (t, x). The metric is
of course coordinate dependent. To give a physical description of curvature that is independent
of the choice of coordinates one needs to use the formalism of Riemannian geometry. In this course
we won’t need much of that.
For a flat expanding universe, the FLRW metric is
\[ ds^2 = -c^2 dt^2 + a^2(t)\, \delta_{ij}\, dx^i dx^j , \tag{3.6} \]
where a(t) is the scale factor. The spatial coordinates r are called the comoving coordinates. The comoving coordinate of an object does not change due to the expansion of spacetime. The comoving coordinate system expands with spacetime, as illustrated in Fig. 1. In computational cosmology we usually work with comoving coordinates (e.g. comoving galaxy positions in an N-body simulation). The scale factor is usually defined to be equal to 1 today, a(t_today) = 1. To define the coordinates we also need to set some origin O where r = 0 and t = 0.
We also define the physical coordinates r_phys(t) = a(t) r(t). If an object has a trajectory r(t) in comoving coordinates and r_phys = a(t) r in physical coordinates, the physical velocity of the object is
\[ \mathbf{v}_{\rm phys} \equiv \frac{d\mathbf{r}_{\rm phys}}{dt} = \frac{da}{dt}\,\mathbf{r} + a(t)\,\frac{d\mathbf{r}}{dt} \equiv H(t)\,\mathbf{r}_{\rm phys} + \mathbf{v}_{\rm pec} , \tag{3.7} \]
where we have introduced the Hubble parameter
\[ H \equiv \frac{\dot a}{a} \tag{3.8} \]
and the peculiar velocity
\[ \mathbf{v}_{\rm pec} \equiv a(t)\,\dot{\mathbf{r}} . \tag{3.9} \]
Figure 1. Comoving coordinate grid on an expanding spacetime.
The first term Hrphys is the Hubble flow, which is the physical velocity of the object due to
the expansion of space between the origin and the object. This expression is a version of Hubble’s
law (though not the original one where H is time-independent). The second term, the peculiar
velocity, describes the motion of the object relative to the cosmological rest frame. Typical
peculiar velocities of galaxies are hundreds of km/s, so β = v/c ≃ 10⁻³. The present day value of the Hubble parameter is¹
\[ H_0 \simeq 67.8\ \mathrm{km\ s^{-1}\ Mpc^{-1}} . \tag{3.10} \]
This is telling us that a galaxy 1 Mpc away will be seen to be retreating at a speed of 67.8 km/s
due to the expansion of space. Galaxies that are farther away than a few Mpc thus have a
larger recession speed due to the Hubble flow than due to their peculiar velocities. A common
definition of the Hubble parameter is given by introducing h so that H0 = 100 h km s−1 Mpc−1
with h ≈ 0.678.
It is often useful to write the metric in polar coordinates:
\[ ds^2 = -c^2 dt^2 + a^2(t)\left[ dr^2 + r^2 d\Omega^2 \right] , \tag{3.11} \]
where
\[ d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2 \tag{3.12} \]
is the metric on the unit two-sphere. This metric is useful to describe observations by an observer at the coordinate center of the universe. The radial coordinate r is called the comoving distance to the origin.
A further way to write the metric is by introducing conformal time
\[ d\eta = \frac{dt}{a(t)} . \tag{3.13} \]
¹ You may notice that I am primarily a CMB cosmologist (see the "Hubble tension").
Conformal time slows down with the expansion of the universe. The metric is then
\[ ds^2 = a^2(\eta)\left[ -c^2 d\eta^2 + d\mathbf{x}^2 \right] . \tag{3.14} \]
The scale factor is now a time-dependent overall factor in front of a static metric. Conformal coordinates are especially useful to analyze light rays and causality.
Note that for flat space-time (k = 0) there is no difference between the comoving distance r and the radial coordinate χ. Finally, using again conformal time, the metric including curvature can also be brought into a conformally static form.
3.4 Redshift
The expansion of the universe means that light rays which travel through the universe also
get stretched. This is called cosmological redshift. We define the dimensionless redshift
parameter
\[ z = \frac{\lambda_0 - \lambda_1}{\lambda_1} = \frac{f_1 - f_0}{f_0} , \tag{3.20} \]
where λ_0 is the observed wavelength and λ_1 is the emitted wavelength. It turns out that the ratio of wavelengths scales as the ratio of the scale factors:
\[ \frac{\lambda_0}{\lambda_1} = \frac{a(t_0)}{a(t_1)} . \tag{3.21} \]
This result is intuitive: the photon wave is stretched with the expansion of space. Let’s derive
this result starting from the metric. In general relativity, light rays travel along null geodesics,
meaning that ds = 0. A light ray on a radial direction (with fixed θ and ϕ) will thus obey
\[ c\, dt = \pm a(t)\, d\chi , \tag{3.22} \]
where the minus sign describes light moving towards us (i.e. as t gets larger χ gets smaller).
Thus we have
\[ \frac{c\, dt}{a(t)} = \pm d\chi . \tag{3.23} \]
Let’s consider a crest of the light wave to be emitted at time t1 from distance χ1 . We observe
the crest at time t0 at position χ0 = 0. Thus we get the integral equation
\[ \int_{t_1}^{t_0} \frac{c\, dt}{a(t)} = \int_0^{\chi_1} d\chi \qquad \text{(first crest)} \tag{3.24} \]
The next crest of the wave is emitted at time t1 + δt1 and received at time t0 + δt0 . Thus the
integral equation is
\[ \int_{t_1+\delta t_1}^{t_0+\delta t_0} \frac{c\, dt}{a(t)} = \int_0^{\chi_1} d\chi \qquad \text{(second crest)} \tag{3.25} \]
The right-hand sides of these two relations are the same, and thus we have
\[ \int_{t_1}^{t_0} \frac{dt}{a(t)} = \int_{t_1+\delta t_1}^{t_0+\delta t_0} \frac{dt}{a(t)} , \tag{3.26} \]
from which it follows that
\[ \int_{t_1}^{t_1+\delta t_1} \frac{dt}{a(t)} = \int_{t_0}^{t_0+\delta t_0} \frac{dt}{a(t)} . \tag{3.27} \]
Because a(t) does not change significantly in a single tick δt1 or δt0 it follows that
\[ \frac{\delta t_1}{a(t_1)} = \frac{\delta t_0}{a(t_0)} . \tag{3.28} \]
The time difference between two wave crests is δt = λ/c, with c the same at emission and reception,
and thus we confirm that
\[ \frac{\lambda_0}{\lambda_1} = \frac{a(t_0)}{a(t_1)} . \tag{3.29} \]
The same result can be derived more formally by considering massless particles in General
Relativity (see Baumann’s book Sec 2.2). With the scale factor normalized to a(t0 ) = 1 we find
from Eq. (3.20) that
\[ 1 + z = \frac{1}{a(t_1)} . \tag{3.30} \]
Redshifts of galaxies are roughly in the range 0 < z < 10 (at higher redshifts no galaxies have had
time to form since the big bang) and the redshift of the CMB is about z = 1100. For example, a
galaxy at redshift 2 is observed when the universe was 1/3 of its current size.
For small redshifts, expanding the scale factor to first order, one finds
\[ v = zc = H_0 d , \tag{3.33} \]
which is the famous linear relation between distance and recession velocity found by Hubble.
This relation is valid for z ≪ 1. In the next section we describe the Hubble law valid at any
distance.
frame of reference in which the events are simultaneous. Here our frame of reference is the one
provided by the global FLRW metric.
The proper distance dp between the origin and a point at coordinate (r, θ, ϕ) at fixed time t is
\[ d_p = \int ds = \int a(t)\, d\chi = a(t)\, \chi . \tag{3.34} \]
The proper distance and its time derivative appear in the Hubble law:
\[ \dot d_p = \dot a\, \chi = \frac{\dot a}{a}\, a\chi = H(t)\, d_p . \tag{3.35} \]
The Hubble law in flat space-time discussed above in Eq. 3.7 agrees with this expression.
The Hubble distance (or Hubble radius) is defined as the distance where the recession
velocity of an object without peculiar velocity becomes equal to the speed of light. This is the
case when
\[ \dot d_p = H_0 d_p = c , \tag{3.36} \]
and thus
\[ d_H(t_0) = \frac{c}{H_0} . \tag{3.37} \]
For H_0 = 67.8 km s⁻¹ Mpc⁻¹ this gives d_H ≈ 4420 Mpc. This is the distance of galaxies that are
currently receding at the speed of light (not at the time when their light was emitted).
Note that the Hubble constant has units of inverse time. Another common definition is thus the Hubble time
\[ t_H = \frac{1}{H_0} = \frac{a}{\dot a} = 4.55 \times 10^{17}\,\mathrm{s} = 14.4\,\mathrm{Gyr} . \tag{3.38} \]
This turns out to be pretty close to the age of the universe, which is somewhat accidental. The
Hubble time is the age the universe would have, if the expansion had been linear, which is not
the case as we shall soon see.
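As a quick check of this unit conversion (a minimal sketch; the megaparsec value is the standard constant):

```python
# Convert H0 from km/s/Mpc to 1/s and compute the Hubble time of Eq. (3.38).
MPC_KM = 3.0857e19            # one megaparsec in kilometers
H0 = 67.8                     # km/s/Mpc

H0_inv_s = H0 / MPC_KM        # H0 in 1/s
t_H = 1.0 / H0_inv_s          # Hubble time in seconds
GYR_S = 1e9 * 365.25 * 24 * 3600

print(f"H0 = {H0_inv_s:.2e} 1/s, t_H = {t_H:.2e} s = {t_H / GYR_S:.1f} Gyr")
# -> H0 ~ 2.2e-18 1/s, t_H ~ 4.55e17 s ~ 14.4 Gyr
```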
Measuring the Hubble parameter directly is difficult. One needs to measure both the distance of objects as well as their recession speed, and distances in particular are not directly observable. A primary tool to do this are type Ia supernovae (SNe Ia), but we won't be covering this method this semester. However, the Hubble parameter can also be measured somewhat more indirectly with the CMB and with LSS.
4.1 Cosmological fluids and equation of state
According to the cosmological principle, we want to consider homogeneous and isotropic contents
of the universe, which are called cosmological fluids and specified by:
• energy density ρ(t). This has units of energy per volume, and thus E⁴ in natural units.
• pressure P(t). This is the flux of momentum across a surface of unit area (which is equivalent to force per area if there were a wall). The units are also E⁴ in natural units.
Note that positive pressure leads to gravitational attraction (i.e. it makes the universe want to contract), not to expansion as one might intuit from a balloon. This is because the kinetic energy of the particles contributes to the positive energy density, which attracts gravitationally.
The relation between the energy density and pressure, P = P(ρ), is called the equation of state of the fluid. Note that in GR both energy density and pressure gravitate, i.e. they are part of the energy-momentum tensor. The equation of state is calculated in (relativistic) thermodynamics.
The two main forms of cosmological fluids in the universe are non-relativistic particles, also
called dust or simply matter and relativistic particles such as photons, also called radiation.
The cosmological fluid is made up of particles which obey the relativistic relation
\[ E^2 = p^2 c^2 + m^2 c^4 . \tag{4.1} \]
The two fluids come from considering this equation in its two limits:
• Non-relativistic particles (mc² ≫ pc): the kinetic energy, and hence the pressure, is negligible compared to the rest mass energy, so to good approximation P = 0, i.e. w = 0.
• Relativistic particles (pc ≫ mc²): here one finds P = ρ/3, i.e. w = 1/3.
The factor 1/3 comes from 3-dimensional space.
Finally we need the equation of state parameter of dark energy, which is
\[ w = \frac{P}{\rho} = -1 \qquad \text{(dark energy)} . \tag{4.4} \]
We’ll talk more about this exotic substance later. The key point is that this substance has a
negative pressure that leads to gravitational repulsion.
This is the expression of energy conservation in a cosmological setting. Note however that energy is a subtle concept in cosmology, due to the broken time translation invariance.
Using the equation of state P = wρ and assuming a single substance with given w we get
\[ \frac{\dot\rho}{\rho} = -3(1+w)\,\frac{\dot a}{a} = -3(1+w)\,H . \tag{4.9} \]
To find the relation between ρ and a we can integrate this equation to get:
\[ \log\frac{\rho}{\rho_0} = -3(1+w)\log\frac{a}{a_0} , \tag{4.10} \]
and thus
\[ \rho(a) = \rho_0\, a^{-3(1+w)} , \tag{4.11} \]
where we’ve used the fact that a(t0 ) = 1 and where ρ0 is the density today (i.e. at a = 1).
Using their equation of state we find the following scalings for our three substances:
• Matter (w = 0):
\[ \rho_m \propto \frac{1}{a^3} . \tag{4.12} \]
This is just the dilution with the volume that grows as V ∝ a3 .
• Radiation (w = 1/3):
\[ \rho_r \propto \frac{1}{a^4} . \tag{4.13} \]
Radiation is not only diluted with the volume but in addition there is a linear redshift effect on the wavelength and thus on the energy E = hc/λ.
• Dark energy (w = −1):
\[ \rho_\Lambda = \mathrm{const.} \tag{4.14} \]
Dark energy has a constant energy density. It does not dilute with the expansion of space. A
universe where ρΛ ̸= 0 will always ultimately be dominated by dark energy. There are also
more complicated dark energy models where dark energy is not the cosmological constant.
In these, the equation of state can deviate from w = −1 and can also be time dependent.
So far there is no evidence for such models.
The different substances dilute differently with an expanding universe, and thus their mutual
importance changes. This is a crucial result in cosmology. In addition, note that total energy is
not conserved. This is due to the broken time translation invariance in an expanding universe.
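The scalings of Eq. (4.11) are easy to tabulate; a minimal sketch:

```python
# Dilution of the cosmological fluids, rho = rho0 * a**(-3*(1+w)) (Eq. 4.11).
def rho_over_rho0(a, w):
    return a ** (-3.0 * (1.0 + w))

a = 1e-3  # when the universe was 1000 times smaller than today
for name, w in [("matter", 0.0), ("radiation", 1.0 / 3.0), ("dark energy", -1.0)]:
    print(f"{name:12s} rho/rho0 = {rho_over_rho0(a, w):.1e}")
# matter scales as a^-3 (1e9 here), radiation as a^-4 (1e12), dark energy stays 1
```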
4.4 Friedmann equation
The continuity equation tells us ρ(a) but it is not enough to determine a(t) or ρ(t) for a given
collection of homogeneous fluids. For this we need the famous Friedmann Equation. The
dynamics of the scale factor is dictated by the energy density ρ(t) through the Friedmann equation
\[ H^2 \equiv \left(\frac{\dot a}{a}\right)^2 = \frac{8\pi G}{3c^2}\,\rho - \frac{kc^2}{R_0^2 a^2} , \tag{4.15} \]
where R_0 is the curvature scale, and, as in the FLRW metric, k is either −1, 0, or +1 determining the curvature of space, and G is Newton's gravitational constant given by
\[ G \approx 6.674 \times 10^{-11}\,\mathrm{m^3\, kg^{-1}\, s^{-2}} . \tag{4.16} \]
The Friedmann Equation, continuity equation, and equation of state together form a closed set
of equations that determines the background evolution of the universe.
By taking the time derivative of the Friedmann equation and using the continuity equation one
can derive a further useful equation which is called the acceleration equation or the second
Friedmann Eq. or the Raychaudhuri equation. It gives the acceleration rate of the scale
factor as
\[ \frac{\ddot a}{a} = -\frac{4\pi G}{3c^2}\,(\rho + 3P) . \tag{4.17} \]
4.5 Solutions to the Friedmann equation for a single fluid in flat space
The Friedmann equation is easy to solve if we consider a flat universe k = 0 with only a single
type of fluid. The Friedmann equation, combined with the continuity-equation scaling ρ ∝ a^{−3(1+w)}, gives
\[ \frac{\dot a}{a} = \frac{D}{a^{3(1+w)/2}} , \tag{4.20} \]
with D a constant. We can then integrate this equation:
\[ \int_0^a da'\, a'^{\frac{1}{2}(1+3w)} = D \int_0^t dt' , \tag{4.21} \]
where we picked time t = 0 to be the time of the big bang where a(t = 0) = 0. This leads to
\[ \frac{2}{3(1+w)}\, a^{\frac{3}{2}(1+w)} = D\, t . \tag{4.22} \]
The common convention is that today at time t0 the scale factor is a(t0 ) = 1. This time is given
by
\[ t_0 = \left[ D\, \frac{3}{2}(1+w) \right]^{-1} . \tag{4.23} \]
Plugging this definition of t0 in our solution we can write it as
\[ a(t) = \left(\frac{t}{t_0}\right)^{2/(3+3w)} . \tag{4.24} \]
Let’s now consider the three types of fluids:
• Matter: For w = 0 we get
\[ a(t) = \left(\frac{t}{t_0}\right)^{2/3} . \tag{4.25} \]
This is known as the Einstein-de Sitter universe. It can be used to approximate our current
universe if we neglect dark energy. In this universe the Hubble constant today is
\[ H_0 = \left.\frac{\dot a}{a}\right|_{a=1} = \frac{2}{3}\,\frac{1}{t_0} . \tag{4.26} \]
With H0 ≃ 70 km s−1 Mpc−1 this gives an age of the universe of
\[ t_0 \simeq 10^{10}\,\mathrm{yrs} . \tag{4.27} \]
It turns out that there are stars that are older than that, which shows that our universe
does not contain only matter.
4.6 Critical density
From the Friedmann equation Eq.(4.15), there is a certain density ρ for which the universe would
be flat, i.e. k = 0:
\[ \rho_{\rm crit} = \frac{3c^2 H^2}{8\pi G} . \tag{4.34} \]
The Hubble parameter is time dependent so the critical density also varies. Today, the critical
energy density is
\[ \rho_{\rm crit,0} = \frac{3c^2 H_0^2}{8\pi G} , \tag{4.35} \]
and the critical mass density is thus
\[ \frac{\rho_{\rm crit,0}}{c^2} \simeq 0.9 \times 10^{-26}\,\mathrm{kg\ m^{-3}} , \tag{4.36} \]
which is about five hydrogen atoms per cubic meter. The subscript 0 for today is often dropped
in the literature.
A very useful and common definition is the density of all fluids together relative to the critical
density, called the density parameter:
\[ \Omega_{\rm TOT}(t) = \frac{\rho_{\rm TOT}(t)}{\rho_{\rm crit}(t)} . \tag{4.37} \]
Our constraints on Ω today are roughly Ω_TOT = 0.999 ± 0.002. Of course, curvature could still exist below our current sensitivity. Before the discovery of dark energy, several measurements pointed to Ω ∼ 0.3.
It’s important to note that a flat universe will remain flat forever. To see this we re-write the
Friedmann equation as
\[ 1 - \Omega_{\rm TOT}(t) = -\frac{kc^2}{R_0^2 a^2 H^2} , \tag{4.38} \]
from which for k = 0 it follows that Ω_TOT(t) = 1 at all times.
However flatness is dynamically unstable. If the density is just slightly above or below the
critical density, the universe will become more curved quickly. This poses the question of why our
universe was so flat to begin with that it is still flat today. This can be explained by cosmological
inflation as we will discuss later.
For a universe containing several fluids, the Friedmann equation reads
\[ H^2 = \frac{8\pi G}{3c^2} \sum_{w=m,r,\Lambda} \rho_w - \frac{kc^2}{R_0^2 a^2} . \tag{4.39} \]
It is useful to put the curvature on an equal footing with the fluids by defining
\[ \rho_k = -\frac{3kc^4}{8\pi G R_0^2 a^2} , \tag{4.41} \]
\[ \Omega_k = \frac{\rho_{k,0}}{\rho_{\rm crit,0}} = -\frac{kc^2}{R_0^2 H_0^2} . \tag{4.42} \]
Defining in the same way Ω_X = ρ_{X,0}/ρ_{crit,0} for each fluid, the Friedmann equation becomes
\[ \frac{H^2}{H_0^2} = \frac{\Omega_r}{a^4} + \frac{\Omega_m}{a^3} + \frac{\Omega_k}{a^2} + \Omega_\Lambda . \tag{4.43} \]
Writing H = ȧ/a and integrating dt = da/(aH) then gives the age of the universe as a function of the scale factor:
\[ t(a) = H_0^{-1} \int_0^a \frac{da'}{\sqrt{\Omega_r a'^{-2} + \Omega_m a'^{-1} + \Omega_k + \Omega_\Lambda a'^2}} , \tag{4.46} \]
which can be evaluated numerically. The age of the universe today is given by setting a = 1.
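A minimal numerical evaluation of Eq. (4.46) with scipy, using illustrative Planck-like density parameters (the exact values are not fixed by these notes):

```python
import numpy as np
from scipy.integrate import quad

H0 = 67.8 * 1e3 / 3.0857e22          # H0: km/s/Mpc -> 1/s
Om, Or, OL = 0.31, 9e-5, 0.69        # illustrative Planck-like values
Ok = 1.0 - Om - Or - OL

def integrand(a):
    # integrand of Eq. (4.46): 1 / sqrt(Or/a^2 + Om/a + Ok + OL*a^2)
    return 1.0 / np.sqrt(Or * a**-2 + Om * a**-1 + Ok + OL * a**2)

def t_of_a(a):
    integral, _ = quad(integrand, 0.0, a)
    return integral / H0             # in seconds

GYR_S = 3.156e16
print(f"t0 = t(a=1) = {t_of_a(1.0) / GYR_S:.2f} Gyr")   # -> ~13.8 Gyr
```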
Note that Ω_m, Ω_r and Ω_Λ change over time, but in a flat universe Ω_TOT = 1 does not change, as we saw above. In the same way a closed universe stays closed and an open universe stays open (Ω_TOT changes but not its sign).
The main ΛCDM parameters, with values as measured by Planck, are:
• Matter density Ω_m = 0.310 ± 0.007. This is the combined density of cold dark matter
and baryons.
• Baryon density ΩB h2 = 0.0224 ± 0.0002 (i.e. ΩB ≈ 0.05). Baryons are a part of matter
(the rest being dark matter ΩCDM ≈ 0.26). We need both of these components to fit
observations for reasons that we will discuss later.
• Hubble constant H0 = (67.9 ± 0.7) km s−1 Mpc−1 . There is currently a famous ∼ 5σ
disagreement of different measurements called the Hubble tension which we will talk
about later.
We’ll discuss these in the section about inflation. Finally a sixth parameter of ΛCDM is the
so-called
• Optical depth τ . This parameter describes how transparent the universe is for CMB light.
Physically it depends on how many free electrons there are, which depends on the process
of reionization.
To find the redshift where matter and radiation were equal, we equate their densities:
\[ \rho_m(z_{\rm eq}) = \rho_r(z_{\rm eq}) \]
\[ \rho_{m,0}\,(1+z_{\rm eq})^3 = \rho_{r,0}\,(1+z_{\rm eq})^4 \]
\[ \rho_{\rm crit,0}\,\Omega_m\,(1+z_{\rm eq})^3 = \rho_{\rm crit,0}\,\Omega_r\,(1+z_{\rm eq})^4 \]
\[ z_{\rm eq} = \frac{\Omega_m}{\Omega_r} - 1 \approx 3250 . \]
Therefore the equality of the two happened when the scale factor was about 3000 times smaller than today. Using a(t) = (t/t_0)^{2/3}, valid during matter domination, gives t_eq ≈ 70,000 yrs. A more accurate calculation using Eq. (4.46) gives t_eq ≈ 50,000 yrs.
The growth of perturbations (such as those in CMB) depends sensitively on which component
is dominating the universe. Radiation pressure suppresses structure growth. We will study this
topic later.
We may also ask about matter-dark energy equality. Using a similar calculation one finds that
this happened about 4 billion years ago, which is relatively recently in cosmological terms.
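Both equalities follow from the density scalings in a line of algebra each; a small sketch (the Ω values are illustrative Planck-like numbers, not fixed by these notes):

```python
# Matter-radiation and matter-Lambda equality from the scalings of Eq. (4.11).
Om, Or, OL = 0.31, 9.2e-5, 0.69      # illustrative values

z_eq_mr = Om / Or - 1.0              # rho_m a^-3 = rho_r a^-4  ->  1+z = Om/Or
a_eq_mL = (Om / OL) ** (1.0 / 3.0)   # rho_m a^-3 = rho_Lambda  ->  a^3 = Om/OL

print(f"matter-radiation equality: z ~ {z_eq_mr:.0f}")              # ~3300
print(f"matter-Lambda equality:    z ~ {1.0 / a_eq_mL - 1.0:.2f}")  # ~0.3
```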
• Evolution of the different fluids. Plot Ω_X for m, r, Λ as well as their sum as a function of the scale factor from a = 10⁻⁵ to a = 100 on a log-x plot. Do the same as a function of time.
• Evolution of H(a) and aH(a). Plot these functions for the same range of the scale factor. Mark the radiation, matter and DE dominated regions. Do the same as a function of log time and linear time. (A starting-point code sketch for these plots is given below.)
generally match observations very well. Beyond that energy scale, theorists have come up with
different models. Of course, this is an opportunity to probe physics beyond the standard model.
I want to give only a very brief overview of this material. While important, these topics have
been worked out in detail and have been put into code packages that can be used without detailed
understanding for most practical purposes.
5.1 Overview
We can get an idea of the hot big bang simply from the following facts:
• The temperature in the past was T = T_0/a(t). This is because for radiation (which dominates the thermodynamics even in the matter dominated universe) we have λ ∝ a as we have derived, and the temperature of a black body scales as λ_peak ∝ 1/T (Wien's displacement law). We have already calculated a(t).
• The temperature of the universe now is T_0 ≃ 2.73 K, which is the temperature of the Cosmic Microwave Background. A clump of gas without any source of heat (also no gravitational heating) will be at this temperature.
• We know the masses of particles and binding energies of bound states. For example the
binding energy of electrons in atoms is of order eV. If the kinetic energy exceeds the binding
energy, the bound state will be broken up. The binding energy of nuclei is of order 1 MeV, and the binding energy of quarks inside nucleons is of order 1 GeV. This tells us very roughly at what
temperature these bounds states form. The actual temperatures are lower because of the
tail of the Boltzmann distribution.
These facts correctly suggest a thermal history that is illustrated in Fig. 2. The key events in
the thermal history of the universe are also listed in table 1. To understand it in more detail we
need to review some thermodynamics.
| Event | Temperature | Energy | Time |
|---|---|---|---|
| Inflation | < 10^28 K | < 10^16 GeV | > 10^-34 s |
| Dark matter decouples | ? | ? | ? |
| Baryogenesis (matter-antimatter asymmetry, GUT?, quark-gluon plasma) | ? | ? | ? |
| EW phase transition (symmetry breaking due to Higgs) | 10^15 K | 100 GeV | 10^-11 s |
| Hadrons form (protons, neutrons) from quark-gluon plasma; QCD phase transition | 10^12 K | 150 MeV | 10^-5 s |
| Neutrinos decouple (weak interaction) | 10^10 K | 1 MeV | 1 s |
| Big Bang Nucleosynthesis (BBN): sets element abundances | 10^9 K | 100 keV | 200 s |
| Atoms form (helium, hydrogen) | 3400 K | 0.30 eV | 260,000 yrs |
| Photons decouple (transparent universe) | 2900 K | 0.25 eV | 380,000 yrs |
| First stars | 50 K | 4 meV | 100 million yrs |
| First galaxies | 20 K | 1.7 meV | 1 billion yrs |
| Dark energy | 3.8 K | 0.33 meV | 9 billion yrs |
| Today | 2.7 K | 0.24 meV | 13.8 billion yrs |

Table 1. Key events in the history of the universe (adapted from Baumann table 1.2).
of momentum: f(p, t). To treat inhomogeneities we will later need the full f(x, p, t) and its differential equation, the Boltzmann equation.
A key quantity is the interaction rate per particle,
\[ \Gamma = n \sigma v , \]
with n the number density of particles, σ their cross-section, and v their velocity (all three are in general a function of temperature). Note that this is the interaction rate per particle (which is why it is linear in n), not the total interaction rate per volume. Γ has units of inverse time.
In an expanding universe, it turns out (from the Boltzmann equation) that particles can be
in thermal equilibrium if Γ ≫ H, that is the interaction rate is much larger than the Hubble
rate. To understand this better remember that the age of the universe is roughly tage ≃ H −1 ,
and we want the typical interaction time to be much smaller than the age of the universe. To
Figure 2. Thermal history of the universe (plot from the Particle Data Group).
summarize, particles fall out of thermal equilibrium when their interaction rate drops below the
Hubble expansion rate of the universe. At that moment, the particles stop interacting with the
rest of the thermal bath, which is called decoupling, and a relic abundance is created. Both
the creation and the annihilation of such relic particles is negligibly small after decoupling.
We will first assume that we are in thermal equilibrium, but later discuss beyond equilibrium
phenomena.
In thermal equilibrium, the phase space distribution of a bosonic or fermionic species is
\[ f(p) = \frac{1}{e^{(E(p)-\mu)/T} \mp 1} , \]
where the − sign is for bosons (Bose-Einstein) and the + sign is for fermions (Fermi-Dirac). It gives the probability that a particle chosen at random has the momentum p. The distributions depend on the temperature T and the chemical potential µ (which can depend on temperature and thus on time in an
expanding universe). The chemical potential describes the response of a system to a change in
particle numbers. Since for photons µ = 0 and for particle-antiparticle pairs µX = −µX̄ we
can ignore the chemical potential in much of the following discussion. The chemical potential
is important when the particle number changes, for example during recombination where the
number of free electrons changes.
From the distribution functions we can calculate the important thermodynamic quantities.
• The number density is
\[ n(T) = \frac{g}{(2\pi)^3} \int d^3p\, f(p, T) , \]
where g is the number of internal degrees of freedom of the particle (e.g. number of spin states).
• The pressure is
\[ P(T) = \frac{g}{(2\pi)^3} \int d^3p\, f(p, T)\, \frac{p^2}{3E(p)} , \tag{5.6} \]
where E(p) is the relativistic energy of the particles. Note that in the ultra-relativistic case E = p and thus P = ρ/3 as expected.
In thermal equilibrium we can have several different particle species with masses mi and
chemical potential µi but at the same temperature T .
For relativistic particles (T ≫ m) one finds for the number density
\[ n = \frac{\zeta(3)}{\pi^2}\, g\, T^3 \times \begin{cases} 1 & \text{(bosons)} \\ \tfrac{3}{4} & \text{(fermions)} \end{cases} , \tag{5.7} \]
where ζ(3) ≈ 1.202 is a value of the Riemann zeta function, and for the energy density
\[ \rho = \frac{\pi^2}{30}\, g\, T^4 \times \begin{cases} 1 & \text{(bosons)} \\ \tfrac{7}{8} & \text{(fermions)} \end{cases} . \tag{5.8} \]
Note the scaling with temperature and the fact that bosons and fermions only differ in a constant
factor. A typical use of this result is to calculate the number density and energy density of photons
today, given the observed temperature of the CMB, T0 ≈ 2.73 K:
\[ n_{\gamma,0} = \frac{2\zeta(3)}{\pi^2}\, T_0^3 \approx 410\ \text{photons}\ \mathrm{cm^{-3}} , \]
\[ \rho_{\gamma,0} = \frac{\pi^2}{15}\, T_0^4 \approx 4.6 \times 10^{-34}\,\mathrm{g\ cm^{-3}} . \]
In terms of the critical density, the photon energy density is then Ω_γ = ρ_{γ,0}/ρ_{crit,0} ≈ 5 × 10⁻⁵.
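As a numerical check of the photon number density (a sketch; the conversion constants k_B and ℏc are standard values, not from these notes):

```python
import math

kB = 8.617333e-5       # Boltzmann constant in eV/K
hbarc = 1.973270e-7    # hbar*c in eV*m
T0 = 2.73              # CMB temperature today in K

x = kB * T0 / hbarc                         # T0 in natural units of 1/m
n_gamma = 2 * 1.202057 / math.pi**2 * x**3  # formula above, zeta(3) ~ 1.202057
print(f"n_gamma = {n_gamma * 1e-6:.0f} photons/cm^3")   # -> ~410
```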
For non-relativistic particles (m ≫ T), the result for bosons and fermions is the same. The integral gives
\[ n = g \left(\frac{mT}{2\pi}\right)^{3/2} e^{-m/T} . \tag{5.10} \]
The exponential suppression is called Boltzmann suppression. Physically it means that parti-
cles and anti-particles still annihilate when the temperature becomes low, but they are no longer
created in pair production. This means that when the temperature of the universe falls below
the particle mass and the particle is still in thermal equilibrium (i.e. Γ ≫ H), then the particle’s
abundance and energy density drop rapidly.
For non-relativistic particles one also gets
\[ \rho = mn \tag{5.11} \]
and
\[ P = nT , \tag{5.12} \]
which is the ideal gas law PV = N k_B T with k_B = 1; since T ≪ m we have P ≪ ρ, i.e. the fluid is effectively pressureless.
The total radiation energy density can then be written as
\[ \rho_r = \frac{\pi^2}{30}\, g_*(T)\, T^4 , \tag{5.13} \]
where the parameter g*, called the effective number of relativistic degrees of freedom, is a weighted sum of the multiplicity factors of all particles. This factor is defined as
\[ g_*(T) = \sum_{i \in \text{bosons}} g_i \left(\frac{T_i}{T}\right)^4 + \frac{7}{8} \sum_{i \in \text{fermions}} g_i \left(\frac{T_i}{T}\right)^4 . \tag{5.14} \]
Here we are allowing the possibility that the species have a different temperature T_i from the photon temperature T, hence the power-law factors in this definition of g*. The effective number of relativistic degrees of freedom is plotted in Fig. 3. Above about 100 GeV all particles of the
Figure 3. Evolution of effective number of relativistic degrees of freedom assuming the Standard Model
particle content. The EW and QCD phase transitions are also indicated. From Daniel Baumann’s cos-
mology lectures.
standard model are relativistic. Considering all quarks, leptons, gauge bosons, gluons and the Higgs (with their helicity or spin and their anti-particles) this adds up to g* = 106.75. The fractional number is possible due to the 7/8 prefactor. As the universe cools down, the heavier particles drop out of this sum. In the end we are left with 3.38 effective relativistic degrees of freedom.
Of these, 2 are for the photon (2 polarisations) and the rest is for neutrinos. The counting of
relativistic degrees of freedom for neutrinos is subtle. In particular the neutrino temperature
today (∼ 1.9 K) is not the same as that of the photons (∼ 2.7 K) because they decouple from
the thermal bath before electrons and protons annihilate (which heats the thermal bath). Also
today neutrinos are not relativistic anymore.
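The counting that gives g* = 106.75 can be written out explicitly; a sketch of the Standard Model bookkeeping at T ≫ 100 GeV, where all species share the photon temperature (so all T_i/T factors in Eq. (5.14) are 1):

```python
# Eq. (5.14) with T_i = T for all Standard Model species.
bosons = {
    "photon": 2, "W+ and W-": 6, "Z": 3, "gluons": 16, "Higgs": 1,
}   # total 28
fermions = {
    "quarks": 6 * 3 * 2 * 2,       # 6 flavors x 3 colors x 2 spins x (particle+anti)
    "charged leptons": 3 * 2 * 2,  # 3 flavors x 2 spins x (particle+anti)
    "neutrinos": 3 * 1 * 2,        # 3 flavors x 1 helicity x (particle+anti)
}   # total 90

g_star = sum(bosons.values()) + 7.0 / 8.0 * sum(fermions.values())
print(g_star)   # -> 106.75
```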
Several physical observables are sensitive to the effective number of relativistic species, and
this is an avenue to detect new physics. We usually parametrise the “extra” degrees of freedom
by the effective number of neutrino species Neff . Constraints on Neff come from
• Element abundances: BBN is sensitive to the expansion rate, which is sensitive to Neff .
• The CMB power spectrum and so-called CMB spectral distortions also constrain Neff at a
later time.
The constraint from Planck is N_eff = 2.99 ± 0.34. The theory expectation from the standard model is not exactly 3 but rather 3.046, because neutrino decoupling is not instantaneous: the neutrinos end up with a slightly distorted (non-Fermi-Dirac) distribution, due to the energy dependence of the weak interaction, and share a bit of the heating of the thermal bath.
5.4 Beyond Equilibrium: The integrated Boltzmann equation
To study processes that are not in thermal equilibrium we need the Boltzmann equation. The
(integrated) Boltzmann equation for a homogeneous particle species ni is given in general by:
\[ \frac{1}{a^3}\,\frac{d(n_i a^3)}{dt} = C_i[\{n_j\}] . \tag{5.15} \]
The left hand side is just the conservation of particle number if the right hand side is zero.
The right hand side is the collision term that describes the interaction with all other particle
species nj . The collision term includes cross-sections between particles, which is where the
standard model of particle physics and QFT scattering amplitude calculations come in. Solving
this equation goes beyond this course material.
We want to again point out one important non-equilibrium phenomenon: the freeze out.
The terms decoupling and freeze-out are closely related. Decoupling means that interactions
effectively stop and freeze-out means the creation of a relic density. Above we found that when
the temperature of the universe falls below the particle mass (thus we are in the non-relativistic
regime) and the particle is still in thermal equilibrium, then the particle’s abundance and energy
density are exponentially suppressed in m/T . We have also discussed that for the particles to be
in thermal equilibrium, we need their interaction rate to be larger than the expansion rate Γ ≫ H.
However if the particle drops out of thermal equilibrium before the Boltzmann suppression kicks
in, i.e. Γ < H, we say that it "freezes out". In this case some relic density of the massive particle remains, constant in comoving volume (unless the particle decays, such as neutrons).
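The freeze-out mechanism can be illustrated with the standard dimensionless form of the integrated Boltzmann equation for an annihilating species, dY/dx = −(λ/x²)(Y² − Y_eq²) with x = m/T. This is a toy sketch: the value of λ (roughly ⟨σv⟩ m M_pl up to constants) is made up for illustration, and physical prefactors are dropped:

```python
import numpy as np
from scipy.integrate import solve_ivp

lam = 1e5   # illustrative interaction strength, not a physical value

def Yeq(x):
    return x**1.5 * np.exp(-x)   # Boltzmann-suppressed equilibrium abundance

def dYdx(x, Y):
    return [-lam / x**2 * (Y[0]**2 - Yeq(x)**2)]

sol = solve_ivp(dYdx, [1.0, 100.0], [Yeq(1.0)],
                method="LSODA", rtol=1e-8, atol=1e-12)
print(f"relic Y(x=100) = {sol.y[0, -1]:.1e}   vs   Yeq(100) = {Yeq(100.0):.1e}")
# Y tracks Yeq until the reactions become slow (Gamma ~ H), then freezes out at
# a constant relic value while Yeq keeps dropping exponentially.
```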
If we had time to study the integrated Boltzmann equation in more detail, we would in
particular examine:
• The formation of the light elements during the Big Bang nucleosynthesis (BBN). This is
one of the big successes of the standard model of cosmology, making predictions for the
abundance of elements that agree very well with data (with the possible exception of the
lithium problem).
• The production of dark matter (which likely has a relic density set in the early universe).
• The decoupling of neutrinos in the early universe, when the weak interaction becomes too
weak to couple them to the thermal bath.
• The neutron freeze-out, which sets the initial neutron-to-proton ratio. Remember that free neutrons decay.
• The period of recombination where electrons and nuclei form neutral atoms and the universe
becomes transparent.
• Baryogenesis, the somewhat unknown process that led to the matter-antimatter asymmetry
observed today.
5.5 Beyond Homogeneity: The Einstein-Boltzmann equations
So far in this unit we have been considering the homogeneous universe. Of course, the universe
is only interesting because it is not homogeneous. Here I want to outline what the full set of
equations are that govern the universe, without assuming homogeneity. The relevant equations
with inhomogeneity are still the Einstein equation and the Boltzmann equation, which of course
are coupled to each other. The Boltzmann equation for an inhomogeneous anisotropic fluid with
phase space density f (x, p, t) is schematically
\[ \frac{df_a(\mathbf{x}, \mathbf{p}, t)}{dt} = C[\{f_b(\mathbf{x}, \mathbf{p}, t)\}] , \tag{5.16} \]
where fa is the phase space density of particle a and we have other particles {fb } which also have
Boltzmann Equations. Note that unlike the integrated Boltzmann equation for number densities
we saw above, this is an equation for the full phase space density. The phase space density goes
into the energy-momentum tensor of Einstein's equation. For completeness, the energy-momentum tensor for a given phase space distribution function f(x, p, t) is
\[ T^\mu{}_\nu(\mathbf{x}, t) = \frac{g}{\sqrt{-\det[g_{\alpha\beta}]}} \int \frac{dP_1\, dP_2\, dP_3}{(2\pi)^3}\, \frac{P^\mu P_\nu}{P^0}\, f(\mathbf{x}, \mathbf{p}, t) \]
(Dodelson Eq 3.20) where the degeneracy factor g counts the internal states. The EM tensor of
course defines the metric through the Einstein equation
\[ G_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu} . \tag{5.17} \]
These equations can be solved analytically in relativistic cosmological perturbation the-
ory. To do so, one expands the metric in perturbations around FLRW, and the particle content
in perturbations around the average density. Relativistic cosmological perturbation theory is in
particular required to calculate the Cosmic Microwave Background. In fact, the full Boltzmann
equation is not required for an analytic treatment, since one can make a fluid approximation on
large enough scales. However, on scales where the mean free path length of the photon becomes
important (during recombination) such an approximation must break down. In practice, we thus
solve the Einstein-Boltzmann equations of the early universe numerically, with codes such as
CAMB and CLASS. Even the numerical solution starts with a perturbative ansatz, which gives
the linearized Einstein-Boltzmann equations. As we will discuss more, linear perturbation
theory is enough to calculate the CMB to excellent precision.
On the other hand, in the late universe, perturbations are non-linear. This is the domain
of structure formation. Fortunately, in this domain relativistic effects are small and we can
work with Newtonian perturbation theory and Newtonian simulations. In summary, in
cosmology one very rarely needs perturbation theory that is both (general) relativistic and non-
linear. We will learn much more about perturbations in the CMB and in large-scale structure
during this course.
6 Inflation
To complete our overview of the evolution of the universe, we need to discuss the earliest (highest
energy) epoch of the universe which we can currently understand, the period of cosmological
inflation. Unlike the hot big bang, inflation is still somewhat speculative. It makes predictions
that we can verify with observations, but these predictions are not so unique that we would
consider the theory to be proven. There is however, in the opinion of most (but not all) cosmol-
ogists, no competitive theory that would be equally attractive. In fact, inflation is such a good
framework to set up the initial conditions of the universe that it is almost treated as a fact by
many cosmologists. It does in particular the following things for us:
• It makes the universe flat even if it started out not being flat. There is some debate and ongoing research about this question (e.g. does inflation even start in an inhomogeneous universe), but the majority opinion seems to be that it works.
• It solves the horizon problem, which is that we find thermal equilibrium and correlations between parts of the universe that would not be causally connected without inflation.
• It sets up the horizon exit for the later re-entry of perturbations, which is crucial
to explain the matter and CMB power spectrum (turnover and BAO phases).
• It solves the so-called magnetic monopole problem, which depends on a speculative GUT
theory and may thus not exist. We will not cover this problem.
In my opinion it is near certain that accelerated expansion and a quantum origin of primordial
perturbations really happened in the universe. Whether this happened through a weakly coupled
slowly rolling scalar field, as is the case in most inflation models, is less certain. Whether anything
happened “before inflation” and whether this question makes sense is not known, and likely won’t
be answered before we have a complete non-perturbative theory of quantum gravity.
Inflation is treated beautifully in all the references that we pointed out for this unit, and I will
compress the material very significantly.
causal contact at that time. More than that, as we will see, different parts of the CMB sky have
small temperature perturbations which are correlated. To establish a correlation, clearly causal
contact is also required. It turns out that in the hot big bang picture which we developed so
far, parts of the CMB that are further than about 1 degree apart in angle were not in causal
contact prior to recombination without inflation. This is the horizon problem which inflation
solves. Let’s now put this into equations.
We then integrate this equation for a light ray, ds = 0. If the Big Bang "started" with the singularity at t_i ≡ 0, then the comoving particle horizon at time t is:
\[ d_h^{\rm comov}(t) \equiv \chi = c \int_0^t \frac{dt'}{a(t')} = c(\eta - \eta_i) , \tag{6.3} \]
where the h stands for "horizon". Recall our definition of conformal time, η = ∫ dt/a(t). For
a scale factor that goes to zero at tBB = 0 (i.e. matter or radiation dominated) we can set
ηi = 0 and the comoving horizon is just η. The size of the physical particle horizon is (using
Eq.(3.34))
\[ d_h^{\rm phys}(t) = a(t)\, d_h^{\rm comov}(t) = a(t)\, c \int_{t_{\rm BB}}^{t} \frac{dt'}{a(t')} . \tag{6.4} \]
The particle horizon can be nicely illustrated in a spacetime diagram. To understand that,
we start from the metric Eq.(6.1). We see that a light ray is given by χ(η) = ±c η + const. and
thus can be drawn as a 45◦ angle in the χ-cη plane. The resulting diagram is called a spacetime
diagram. The spacetime diagram for the particle horizon is shown in Fig. 4. One often also
defines the event horizon which is the forward (rather than backwards) light-cone and tells us
what events we can influence in the
future.
In a flat universe with a(t) = (t/t_0)^{2/(3(1+w))} and w = const., the physical particle horizon today is:
\[ d_h^{\rm phys} = \frac{2}{1+3w}\, c H_0^{-1} . \tag{6.5} \]
[Figure 4: spacetime diagram of an observer's particle horizon; axes are conformal distance and conformal time, with the big bang singularity at the bottom.]
Figure 5. Particle horizon for CMB perturbations without inflation. CMB regions we see today in the
sky were causally disconnected in the past.
During matter domination, the particle horizon at time t is
\[ d_h(t) = c\, a(t) \int_0^t \frac{dt'}{a(t')} = 3ct . \tag{6.7} \]
Let’s write this in terms of red shift:
a(t) ∼ tn (6.11)
if n > 1. As t′ approaches 0 we get more and more contributions to dh so it diverges. Recall
from Eq.(6.3) that dh = c(η − ηi ). We see that an early accelerating phase buys us conformal
time and allows all regions of the universe to have been in causal contact in the inflationary past.
For inflation, the natural choice is to use time coordinates so that inflation starts at ηi = −∞
(since the lower end of the integral leads to the divergence) and ends at conformal time ηf = 0.
Patching the spacetime of inflation to the spacetime of the later universe together we get Fig. 6,
which shows that light cones now overlap. For accelerated power law expansion there is still a
big bang at t = 0 where a(t) = 0, but there is an infinite amount of conformal time after that.
However we should not expect that our equations hold for a(t) smaller than the Planck length,
since we don’t know non-perturbative quantum gravity.
Inflation models naturally generate exponential expansion rather than power law expansion,
i.e.
a(t) ∝ eHinf t (6.15)
Unlike the power law acceleration we considered above, for exponential expansion there is no
big bang in the past since a(t) > 0 at all times. This means that there is no natural choice for
t = 0 and our time integral can go from ti = −∞ to tf , which again makes the comoving horizon
diverge, if exponential expansion went on infinitely long. The Hubble parameter has dimension of
energy (or inverse time) so the exponent is dimensionless as it should be. Exponential expansion
means that a patch of space time of physical size d_i grows to a size d_f = d_i e^{H_inf T} in time T. We define the number of e-folds of inflation by N = H_inf T.
Let’s estimate how many e-folds we need to solve the horizon problem, by inflating a causally
connected patch before inflation to the size of our current universe. Before inflation started,
the physical particle horizon had some value d_i that was causally connected. A natural scale is the physical Hubble distance Eq. (3.37), d_i = c H_inf^{-1}. After exponential expansion, this connected patch has the physical size d_f = e^N d_i. Then, due to the expansion of the universe since the end of inflation, the patch grows to
\[ d_{\rm now} = \frac{d_f}{a_{\rm inf}} = \frac{e^N d_i}{a_{\rm inf}} = \frac{e^N c}{H_{\rm inf}\, a_{\rm inf}} , \tag{6.16} \]
where ainf is the scale factor at the end of inflation, the beginning of the ordinary evolution of
the universe. We want dnow to be much larger than the Hubble horizon today, i.e. dnow ≫ cH0−1 .
It thus follows that
\[ e^N > a_{\rm inf}\, \frac{H_{\rm inf}}{H_0} . \tag{6.17} \]
Most of the relative expansion since the end of inflation happened during the radiation era, in which H ∝ 1/a². Thus we have H_inf/H_0 = 1/a_inf², from which we get
\[ e^N > \left(\frac{H_{\rm inf}}{H_0}\right)^{1/2} = a_{\rm inf}^{-1} . \tag{6.18} \]
In this relation H0 is known but Hinf or equivalently ainf are not. A possible value of H during
inflation could be 10^14 GeV or below, so let's use this value as an example. The Hubble constant today is H_0 ∼ 10^{-18} s^{-1}. Let's convert this to GeV via E = ℏω, where ℏ ∼ 10^{-15} eV s. This gives H_0 ∼ 10^{-33} eV = 10^{-42} GeV. Thus we have
\[ e^N > \left(\frac{10^{14}\,\mathrm{GeV}}{10^{-42}\,\mathrm{GeV}}\right)^{1/2} = 10^{28} . \tag{6.19} \]
Solving for N this gives the often quoted estimate that we need around 60 e-folds of inflation.
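The arithmetic in one line (with the values assumed in the text):

```python
import math

H_inf_GeV = 1e14     # assumed inflation scale from the example above
H0_GeV = 1e-42       # Hubble constant today converted to GeV

N_min = 0.5 * math.log(H_inf_GeV / H0_GeV)   # Eq. (6.18): e^N > (H_inf/H0)^(1/2)
print(f"N > {N_min:.0f} e-folds")            # -> ~64, the often quoted ~60
```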
We can also estimate how long inflation needs to last from N = H_inf T. With H_inf = 10^14 GeV ∼ 10^38 s^{-1} we get T ∼ 10^{-36} s. So inflation can be extremely brief. However, from the discussion
presented here, there is no limit for how long inflation can last in principle, we only set a lower
limit. In particle physics models of inflation there can be an upper limit. The length of inflation is
also connected to the subject of eternal inflation. In some models inflation never ends globally.
Accelerated expansion also solves the flatness problem. To drive accelerated expansion as in
Eq.(6.11) one needs an inflation energy density that goes as
\[ \rho_{\rm inf} \sim \frac{1}{a^{2/n}} \tag{6.20} \]
with n > 1 (this follows from Eq.(4.24) and Eq.(4.11)). This clearly dilutes more slowly than
curvature, radiation and matter. In fact, in the case of exponential expansion, ρinf does not
dilute at all, like dark energy, as we have seen. This means that after a long period of inflation,
curvature, matter and radiation will all have diluted away to negligible amounts and the universe
is empty except for ρinf .
While this solves the flatness problem, we are left with a new problem: why is the universe
not empty. What is missing is a mechanism that ends inflation and converts the inflationary
energy density into ordinary (relativistic) matter and radiation. This mechanism exists and is
called reheating. Interestingly, reheating is somewhat natural in a quantum field theory of
inflation, i.e. there are simple models that have this behavior. In summary, the matter and
radiation in our universe is believed to have been created with the energy in the field that drove
inflation.
6.4 The field theory of inflation
The dynamics of inflation can be modeled by a scalar field φ, the inflaton, coupled to gravity:
\[ S = S_{\rm EH} + S_\phi = \int d^4x \sqrt{-g}\left[ \frac{M_{\rm pl}^2}{2} R - \frac{1}{2} g^{\mu\nu}\,\partial_\mu\phi\,\partial_\nu\phi - V(\phi) \right] . \tag{6.21} \]
The action (6.21) is the sum of the gravitational Einstein-Hilbert action, S_EH, and the action of a scalar field with canonical kinetic term, S_φ. The potential V(φ) describes the self-interactions
Figure 6. Particle horizon for CMB perturbations with inflation (adapted from 0907.5424).
of the scalar field (in addition there can be derivative self-interactions). Assuming the FRW
metric for gµν and restricting to the case of a homogeneous field ϕ(t, x) ≡ ϕ(t), the scalar energy-
momentum tensor
\[ T^{(\phi)}_{\mu\nu} \equiv -\frac{2}{\sqrt{-g}}\,\frac{\delta S_\phi}{\delta g^{\mu\nu}} \tag{6.22} \]
takes the form of a perfect fluid (see e.g. Baumann’s book for the math) with
\[ \rho_\phi = \frac{1}{2}\dot\phi^2 + V(\phi) , \tag{6.23} \]
\[ p_\phi = \frac{1}{2}\dot\phi^2 - V(\phi) . \tag{6.24} \]
The resulting equation of state
\[ w_\phi \equiv \frac{p_\phi}{\rho_\phi} = \frac{\frac{1}{2}\dot\phi^2 - V}{\frac{1}{2}\dot\phi^2 + V} , \tag{6.25} \]
shows that a scalar field can lead to negative pressure (wϕ < 0) and accelerated expansion
(wϕ < −1/3, see Eq.(4.24)) if the potential energy V dominates over the kinetic energy 12 ϕ̇2 .
This is why the field needs to be rolling slowly. Inflation ends when the field rolls into a steeper
part of the potential, as shown in the figure below. Finally the field rolls into a minimum and starts oscillating around it. During this phase of oscillation, the inflaton acts like pressure-less matter (the oscillation-average of Eq. (6.25) is w_φ = 0) and decays into other particles (those of the standard model if this is all there is). This process is called reheating and is very model dependent and very complicated. It turns out that predictions for cosmology don't depend much on reheating; all we need is that the inflaton energy is ultimately transformed into a thermal bath of standard model particles.
[Figure 7: sketch of the inflaton potential V(φ), with regions labeled "inflation", "end of inflation", and "reheating".]
6.5 The quantum field theory of inflation
• If you have taken QFT, you know that the standard method to quantize a field theory for which
you know the Lagrangian or Hamiltonian is to promote the field ϕ(x, t) and its conjugate
momentum π(x, t) to operators and impose canonical commutation relations between them.
This is correct here too. Schematically
\[ \left[ \hat\phi(\mathbf{x}, t),\, \hat\pi(\mathbf{x}', t) \right] = i\,\delta^3(\mathbf{x} - \mathbf{x}') . \tag{6.26} \]
• The quantization of inflation leads to exactly the kind of primordial perturbations we need
to seed the structure formation of the universe. As you know, in quantum mechanics there
is a fundamental uncertainty on quantities which means that the inflaton field ϕ cannot be
exactly homogeneous but rather must have small “quantum wiggles” in it. A heuristic way
to think about this is through Heisenberg's uncertainty principle in the form Δt ΔE ∼ 1 (where we set ℏ = 1). The time scale is set by the Hubble time, Δt ∼ H_inf^{-1}, and the energy fluctuations are set by the fluctuations in φ, i.e. ΔE ∼ δφ. The uncertainty relation then predicts that we should see fluctuations of size δφ ∼ H_inf.
• Because the potential is so flat during inflation, interaction terms such as ϕ3 are very small.
Inflation is thus an almost free (i.e. linear) field theory. The quantization of inflation thus
leads to a collection of (nearly) uncoupled quantum harmonic oscillators. Said differently,
the Fourier modes of the inflaton field act like independent harmonic oscillators.
• Inflation includes, and requires, perturbative quantum gravity. Well below the Planck
scale, we can quantize gravity in the sense of an effective field theory that integrates out the
unknown UV physics of the ultimate quantum gravity theory. Depending on the details of
the model, in particular its energy scale, inflation can be more or less sensitive to unknown
quantum gravity “UV physics”.
6.6 Primordial perturbations from inflation
As we shall see later, perturbations are best discussed in Fourier space, because these Fourier
modes evolve almost independently (i.e. they are not coupled in the free theory). We therefore
express perturbations in terms of two closely related quantities:
• The gravitational potential Φ. With some gauge subtleties on superhorizon scales, Φ is related to the energy density perturbation by Poisson's equation, ∇²Φ ∝ δρ.
• The comoving curvature perturbation R. Under some gauge conditions this is the curvature that a local observer would observe. This quantity is also conserved on superhorizon scales (see next section).
The distinction of these is not important in this course. The first main point to take away here is
that we need a single scalar field to describe the scalar curvature perturbations of the
universe (which are those induced by scalar density perturbations). This does not include tensor
perturbations which are gravitational waves. Scalar and tensor perturbations together make up
the full metric. So far there is no experimental evidence of primordial tensor perturbations.
It turns out that the inflaton perturbations δϕ generate curvature perturbations as
\[ \mathcal{R} \approx -\frac{H\, \delta\phi}{\dot\phi} , \tag{6.30} \]
where ϕ̇ is the inflaton speed. We won’t derive this equation. The second main point to take
away is thus that we can calculate the curvature perturbations R in a given inflation model, and
that they are sourced by inflaton quantum perturbations δϕ.
A closely related quantity is the comoving Hubble radius
\[ d_H^{\rm comov} = \frac{1}{aH} . \tag{6.31} \]
It gives the size of the Hubble radius in comoving coordinates. An accelerating phase (ä > 0) is
equivalent to a shrinking Hubble radius
\[ \ddot a > 0 \iff \frac{d}{dt}\left(\frac{1}{aH}\right) < 0 . \tag{6.32} \]
We see that the comoving Hubble radius is shrinking during inflation, but growing during ordinary
cosmological evolution (matter and radiation). You can think of the Hubble radius as the “size
of the currently causally connected patch at time t”, while the particle horizon tells us about the
“size of the causally connected patch when considering the entire past”.
The comoving Hubble radius is crucial to understand the behavior of cosmological pertur-
bations. For these perturbations R(k, t) we can study their equations of motion and find that
their behavior (i.e. whether they grow, stay constant or get smaller with time) depends crucially
on their size compared to the comoving Hubble radius. We can divide perturbations into two classes by comparing their comoving wave number k with the comoving Hubble radius: super-horizon modes with k < aH, and sub-horizon modes with k > aH.
Figure 8. Horizon exit and re-entering of perturbations. On the y-axis is the comoving scale of the
perturbation 1/k and the scale of the comoving Hubble radius 1/(aH). The modes that leave the horizon
the latest (the smallest wave length λ) re-enter the horizon first (last out, first in).
• The curvature perturbation Rk exits the horizon during inflation and stops evolving. This
can be proven in relativistic perturbation theory. Horizon exit is also connected to the fact
that these perturbations classicalize (i.e. get a concrete value that we observe, like in a
measurement). Note that the curvature perturbation does not change due to re-heating,
which is why the details of reheating don’t matter for cosmological predictions.
• At some point after the end of inflation the curvature perturbation Rk re-enters the horizon
and starts evolving again. Later we will see that the time when perturbations re-enter the
horizon (during radiation domination or matter domination) is crucial for their amplitude.
We have not yet discussed how perturbations evolve in time. This depends on whether they are
subhorizon or superhorizon and whether they evolve during radiation domination or during matter
domination, or later during Lambda domination. The study of subhorizon and superhorizon
evolution of perturbations is required to understand the qualitative properties of the matter
power spectrum. We will briefly get back to this later in the course.
To lowest order, inflation predicts Gaussian curvature perturbations, which are fully characterized by their power spectrum P_R(k), defined by
\[ \langle \mathcal{R}_{\mathbf{k}} \mathcal{R}_{\mathbf{k}'} \rangle = (2\pi)^3\, \delta(\mathbf{k} + \mathbf{k}')\, P_{\mathcal{R}}(k) , \qquad \Delta_s^2 \equiv \Delta_{\mathcal{R}}^2 = \frac{k^3}{2\pi^2}\, P_{\mathcal{R}}(k) . \tag{6.35} \]
Figure 9. Reconstruction of the primordial power spectrum from Planck (1807.06211). The k scales
where the power spectrum is constrained by the CMB are limited by the size of the observable universe
to the left and by the small-scale damping of the CMB to the right. One can clearly see that ns ̸= 1.
Here, ⟨ ... ⟩ defines the ensemble average of the fluctuations. The power spectrum is often approx-
imated by a power law form
ns (k⋆ )−1+ 1 αs (k⋆ ) ln(k/k⋆ )
k 2
∆2s (k) = As (k⋆ ) , (6.36)
k⋆
where k⋆ is an arbitrary reference or pivot scale. The scale-dependence of the power spectrum is
defined by the scalar spectral index (or tilt)
\[ n_s - 1 \equiv \frac{d \ln \Delta_s^2}{d \ln k} , \tag{6.37} \]
where scale-invariance corresponds to the value n_s = 1. We may also define the running of the spectral index by
\[ \alpha_s \equiv \frac{d n_s}{d \ln k} . \tag{6.38} \]
d ln k
The free parameters we want to measure are thus
• The primordial spectral amplitude A_s. Planck measured A_s ∼ 2.1 × 10⁻⁹ at the pivot scale k_⋆ = 0.05 Mpc⁻¹.
• The scalar spectral index n_s. Planck measured n_s ≈ 0.965, clearly different from the scale-invariant value n_s = 1 (see Fig. 9).
• The running of the spectral index α_s (although no running has been detected).
This parametrization of the primordial power spectrum is very useful in practice. However, one
can also directly reconstruct the power spectrum without a parametrization, as shown in Fig. 9.
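Evaluating the parametrization Eq. (6.36) is a one-liner; a sketch with illustrative Planck-like values:

```python
import numpy as np

As, ns, alpha_s, k_star = 2.1e-9, 0.965, 0.0, 0.05   # k in 1/Mpc, illustrative

def Delta2_s(k):
    # Eq. (6.36): power law with (optional) running of the tilt
    exponent = ns - 1.0 + 0.5 * alpha_s * np.log(k / k_star)
    return As * (k / k_star) ** exponent

for k in [1e-4, 1e-3, 1e-2, 1e-1]:
    print(f"k = {k:7.1e} 1/Mpc   Delta^2_s = {Delta2_s(k):.3e}")
# ns < 1 (red tilt): slightly more power on large scales (small k)
```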
In the same way inflation predicts small primordial tensor fluctuations. These are the
primordial gravitational waves and are also given by a Gaussian power spectrum for the two
polarization modes. For these one can measure in particular
• the tensor amplitude, usually expressed through the tensor-to-scalar ratio r ≡ A_t/A_s (and in principle the tensor tilt n_t).
Primordial gravitational waves have not been detected so there are only bounds on these param-
eters. The current constraint is about r < 0.05, i.e. tensor modes are less than 5% of the scalar
modes. This bound will be significantly improved by upcoming CMB experiments. A detection
of non-zero r is perhaps the best chance for a big fundamental physics discovery in the coming
decade.
Finally, both scalar and tensor perturbations are not expected to be precisely Gaussian. At
the very least, the coupling to gravity which is a non-linear theory, leads to some mode-coupling
between perturbations. More than that, the inflaton potential as well as so-called derivative
interactions also lead to primordial non-Gaussianity. The most obvious way to look for
primordial non-Gaussianity is to search for a non-zero 3-point function
\[ \langle \mathcal{R}_{\mathbf{k}_1} \mathcal{R}_{\mathbf{k}_2} \mathcal{R}_{\mathbf{k}_3} \rangle = (2\pi)^3\, \delta^3(\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3)\, B(k_1, k_2, k_3) . \]
This three-point function is called the bispectrum. Non-Gaussianity can come in many different bispectrum shapes (as well as higher N-point functions). The most famous bispectrum amplitude parameter is f_NL, which measures the size of the bispectrum relative to the square of the power spectrum.
In many inflation models primordial non-Gaussianity is too small to be detected any time soon
but there are also well-motivated scenarios where a detection could be around the corner. We
will get back to this topic, which is a main research topic of mine, in more detail later in this
course.
Here we have discussed the initial conditions in the curvature field. This curvature field is
then converted into initial conditions for matter and radiation. In most models, the initial
conditions for the fluids δr , δCDM and δbaryons are all the same (up to an overall amplitude),
since they are seeded by the same curvature field. Such initial conditions are called adiabatic
initial conditions. There could in principle also be perturbations where the different fluids have
different perturbations. Such perturbations are called isocurvature perturbations. Currently
there is no experimental evidence for isocurvature perturbations, and the most straightforward models of inflation and reheating don't generate them. In this course, as in most cosmological
analyses, we will assume adiabatic initial conditions.
Part II
Introduction to Computation and Statistics
in Cosmology
In this section we introduce some of the main computational tools and data types used in cos-
mology. A practical goal will be to be able to analyze a dark matter simulation, extract its power
spectrum, and run MCMC to determine its cosmological parameters. We also want to learn
how to Fisher forecast experimental sensitivity, and compare it to the result in our simulation
analysis. Analyzing a dark matter simulation comes without the practical complications of a real
CMB or galaxy survey. We will discuss these real-world complications in later units.
Further reading
The general references of Part 1 all contain some material on statistics and data analysis. Lecture notes or reviews that are specifically about data analysis in cosmology include:
• Verde - A practical guide to Basic Statistical Techniques for Data Analysis in Cosmology.
arxiv:0712.3028
• Leclercq, Pisani, Wandelt - Cosmology: from theory to data, from data to theory. arxiv:1403.1260
These initial conditions of the universe then evolve forward in time, according to the
standard model of cosmology or its extensions. For example, the distributions of matter in
the sky δm (which can be probed e.g. by mapping galaxies δg ) is given by some complicated
function F of the initial conditions which depends on the ΛCDM parameters (as well as other
physical constants).
$$\mathcal{R} \;\xrightarrow{\;F_\Lambda(\mathcal{R},\,t)\;}\; \delta_m(t) \qquad (7.2)$$
The function (or simulation) that connects the initial conditions to whatever we observe in the
data is sometimes called the forward model and it depends on physical parameters Λ that
we want to measure such as Ωm or the mass of neutrinos mν . By measuring δm , we can learn
both about primordial parameters such as As , ns and the parameters that influence the time
evolution which we called Λ here. Roughly speaking, the function F is known exactly on large
scales, approximately known on intermediate scales, and computationally intractable on small
scales. A typical course on theoretical cosmology would now spend some weeks calculating the function Eq.(7.2) analytically in cosmological perturbation theory, which amounts to
solving the Euler and Poisson equations perturbatively. Instead, we will focus on how to
perform data analysis and just use results from perturbation theory where needed. We will get
back to (non-relativistic) perturbation theory in Sec. 21.
In some modern analyses in cosmology one tries to reconstruct the initial conditions R(x)
directly from data such as the galaxy density δg (x). However, in the vast majority of analyses,
we don’t aim to reconstruct the initial conditions directly, but only their statistical parameters
such as As , ns , together with the parameters of cosmological time evolution Λ. This makes sense
because the theory of the initial conditions only makes predictions for statistical parameters. For
example, no theory can predict where in space a galaxy will form, but we can predict statistical
properties of the galaxy field. For the same reason we don’t usually have to analyze the volumetric
data δg (x) directly but instead only summary statistics of this data.
The most important summary statistic (which in the Gaussian case carries all the information)
is the power spectrum of the field. In many cases, we will measure the observed power spectrum
Pgobs of the galaxy data, and compare it to the theoretical power spectrum Pgtheo (Λ, As , ns ) which
depends on cosmological parameters. By adjusting these parameters so that Pgtheo matches Pgobs
we arrive at a measurement of our cosmological parameters. What we said here for galaxy
density measurements, is also true for all other data sources that probe the matter and radiation
distribution of the universe, in particular the Cosmic Microwave Background (CMB). The CMB
is a particularly clean probe of cosmology because, as we shall see, it is linear in the initial
conditions. Schematically, the “forward model” of the CMB is the linear mapping
$$\mathcal{R}_k = T^\Lambda(k)\, \Theta_k^{\rm CMB} \qquad (7.3)$$
where $\Theta_k^{\rm CMB}$ are the Fourier modes of the CMB temperature perturbations and $T^\Lambda(k)$ is the so-called linear transfer function, which depends on cosmological parameters Λ. On the other hand, for the non-linear galaxy field, the Fourier modes are coupled to each other in a complicated way.
In the present section we will develop the tools to analyze the matter distribution through
the power spectrum in a simulated cosmological volume. This setup is already enough to write
interesting papers in cosmology. In later units, we will use the same tools to analyze realistic
data from the CMB and galaxy surveys, which comes with many interesting complications. We
will then also discuss how to go beyond the power spectrum to extract even more information
from cosmological data.
• Primary CMB anisotropies. The primary CMB is the jewel of cosmological data. This is
because it has perfectly understood physics, with a linear map to the initial conditions. Our
best constraints on primordial physics come from the primary CMB. On the other hand,
it cannot directly probe late time physics such as dark energy. For primordial physics,
the only limitation is the number of independent modes. Modes here means either
independent pixels or independent Fourier modes. First, the CMB is a 2d probe, while e.g.
a galaxy survey is a 3d probe. Second, because of the free streaming length of photons,
primary CMB anisotropies are damped away on small scales. This limits the number of
available modes in the CMB to roughly
$$N_{\rm CMB} \sim \ell_{\rm max}^2, \qquad (8.1)$$
where ℓmax is the maximum multipole scale, as we will see later. The Baryon Acoustic
Oscillations in the power spectrum of the CMB reveal cosmological parameters such as
Ωm and ΩB . The CMB is also polarized. While so-called E-mode polarization has been
measured and roughly doubles the information in the CMB, cosmologists look for primordial
B-mode polarization which would reveal the presence of primordial gravitational
waves.
• Secondary CMB anisotropies. Two things happen to photons on the way from re-
combination to us. First, all photons are gravitationally lensed by the intervening matter.
From the observed CMB one can reconstruct the so-called lensing potential, which is a
weighted radial integral over the matter density on the line of sight. In this way, the CMB
can also be used to probe physics that happens at later times in the universe, such as the
“clumping” of non-relativistic neutrinos due to their non-zero mass. Second, a part of the
CMB photons (a few percent) will hit a free electron and get re-scattered. Depending on
the radial velocity of the electron, the photon will either gain or lose energy. This is the
Sunyaev-Zeldovich (SZ) effect. The SZ effect can for example be used to probe the
temperature of gas in clusters.
• Large-scale structure (LSS) with galaxy surveys. The distribution of galaxies probes
the initial conditions of the universe as well as later time physics such as dark energy and
neutrino masses. Galaxies are arranged in a cosmic web of voids, filaments, walls and
clusters. The advantage over the CMB is that this is a 3-dimensional probe, and that it is
not affected by the CMB damping scale, so that we can in principle probe far more modes.
The disadvantage is that the smaller modes are very non-linear and hard to model. There is
however a redshift dependent scale of gravitational collapse, where primordial information
should be entirely erased. The number of modes is
$$N_{\rm LSS} \sim \left(\frac{k_{\rm max}}{k_{\rm min}}\right)^3, \qquad (8.2)$$
so it goes cubic rather than quadratic since it is a volumetric probe. The resulting number depends very sensitively on the experiment and theoretical assumptions, which we will revisit later. Roughly speaking, current experiments have fewer independent accessible modes
than the CMB but future experiments will have more. As in the case of the CMB, light from
galaxies is also lensed. This lensing distorts the image of galaxies, which is called cosmic
shear or galaxy weak lensing. Weak lensing probes the same cosmological volume
as the galaxy positions, but it probes all matter (including dark matter) rather than
only luminous matter, which gives somewhat different information. Using large-scale
structure, one can measure the Baryon Acoustic Oscillations in the power spectrum.
These provide a standard ruler that can measure distances, and thus the expansion history
of the universe.
• Large-scale structure (LSS) with intensity mapping. The universe can of course not
only be probed by identifying galaxies in the visible spectrum but in general by mapping
any sort of radiation. In particular, one can map known emission and absorption lines
of both atoms and molecules in the universe. There are many different such lines that
I won’t review here. A current exciting experimental front is 21cm intensity mapping
which looks for the 21cm spin-flip transition line of neutral hydrogen. The universe contains
plenty of neutral hydrogen. The hardware for a 21cm interferometer is in principle cheap,
only requiring a set of antennas and a supercomputer to correlate them. Achieving the
required frequency resolution for precise redshifts is easy. However, due to extremely large
foregrounds, this technique is not yet quite ready for cosmology. Even further in the future,
it may be possible to do 21 cm intensity mapping of the dark ages, the time before the
first galaxies formed. In principle, there is a gigantic amount of primordial information hidden there (N ∼ 10^18 ). On the time scale of several decades it may be possible to
access this information. A different intensity mapping technique, that is already in use, is
Lyman-α mapping which looks for the Lyman-α forest, absorption lines in the emission
of distant quasars due to neutral hydrogen in the intergalactic medium.
These are the data sources for which we are developing tools in this course. They have in
common that they probe the universe as a density field. There is a different category of probes
which looks at individual objects. Some of the main probes here are:
• Type 1a Supernova Distance Measurements. The discovery of dark energy was made
possible by measuring distances (rather than redshifts) using type 1a supernovae. Their
key property is that they have a known brightness (standard candle), so one can measure
the so called luminosity distance. Type 1a SN thus probe the expansion history of the
universe.
• Strong lensing (of quasars and galaxies by galaxies and galaxy clusters). If there is a
dense enough chunk of matter in front of a cosmological light source, one can get multiple
images or Einstein rings. From these strongly lensed images one can obtain a measurement
of the lens profile (thus probing the dark matter profile) and, if the light source is time
variable, one can get time delay measurements. These time delay measurements can be
used to measure the Hubble constant.
The list above is not meant to be complete, but covers the most important data sources for cos-
mology (rather than for the large field of multi-messenger astrophysics, which studies individual
sources).
• The number density n(x) of a discrete tracer such as galaxies where n(x) = δN (x)/δV .
By construction, the spatial average of δ vanishes, $\langle \delta(\mathbf{x})\rangle = 0$.
In this section we will primarily think about non-relativistic cosmology (such as galaxy sur-
veys), so that the pressure can be ignored. However, the statistical techniques we develop are
equally important for relativistic cosmology. We will get back to relativistic physics in the unit
about the CMB.
In the following I will use the notation f (x) for a general scalar field. In general f (x) also depends on time as f (x, t), but in this section we won't need explicit time dependence (and it is easy to add to the notation). A random field is characterized by its probability distribution functional
$$\mathcal{P}[f] \qquad (9.3)$$
and by its correlation functions. Correlation functions of fields are expectation values of products of fields
at different spatial points. The two point correlator is
$$\xi(\mathbf{x}, \mathbf{y}) \equiv \langle f(\mathbf{x})\, f(\mathbf{y})\rangle = \int \mathcal{D}f\; \mathcal{P}[f]\, f(\mathbf{x})\, f(\mathbf{y}), \qquad (9.8)$$
where the integral is a functional integral (or path integral) over field configurations. This is the
usual definition of an expectation value in statistics.
By statistical homogeneity, the correlation function can only depend on the difference of the
positions x + r and x, and statistical isotropy enforces dependence on the magnitude only. In this case the correlation function is given by
$$\xi(\mathbf{x} + \mathbf{r}, \mathbf{x}) = \xi(r), \qquad r = |\mathbf{r}|. \qquad (9.9)$$
The proof of this intuitive statement can easily be found in textbooks. The correlation function
of galaxies and other observable fields can be measured and is used to probe properties of the
universe.
We define the Fourier transform and its inverse as
$$f(\mathbf{k}) = \int d^3x\; f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}} \qquad (9.10)$$
and
$$f(\mathbf{x}) = \int \frac{d^3k}{(2\pi)^3}\; f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}. \qquad (9.11)$$
A nice discussion of other consistent Fourier conventions is in appendix A of 0907.5424.
Cosmology needs Fourier space (also called k-space or momentum space) as much as position
space (also called x-space or configuration space), so let’s review some properties:
• If the position space field is real we have f (k) = f ∗ (−k). This can be shown by Fourier transforming f (x) = f ∗ (x).
• Under spatial translation, the Fourier transform gets a phase factor:
$$\hat{T}_a f(\mathbf{k}) = \int d^3x\; f(\mathbf{x} - \mathbf{a})\, e^{-i\mathbf{k}\cdot\mathbf{x}} \qquad (9.12)$$
$$= \int d^3x'\; f(\mathbf{x}')\, e^{-i\mathbf{k}\cdot\mathbf{x}'}\, e^{-i\mathbf{k}\cdot\mathbf{a}} \qquad (9.13)$$
$$= f(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{a}}, \qquad (9.14)$$
where x′ = x − a.
The plane waves obey the orthogonality relation
$$\int d^3x\; e^{\pm i\mathbf{k}\cdot\mathbf{x}} = (2\pi)^3\, \delta_D(\mathbf{k}). \qquad (9.15)$$
The delta function has the dimension of the inverse of its argument, thus here in 3d it has dimension [k −3 ] = [length]3 . This is also the orthogonality relation for plane waves in an infinite volume. In the other direction the delta function is
$$\delta_D(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3k\; e^{\pm i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')}. \qquad (9.16)$$
Using these delta function definitions you can check that the Fourier transform of the Fourier
transform returns the original function as it must.
The power spectrum is defined by the Fourier-space two-point function. Using the definitions above,
$$\langle f(\mathbf{k})\, f^*(\mathbf{k}')\rangle = \int d^3x \int d^3x'\; e^{-i\mathbf{k}\cdot\mathbf{x} + i\mathbf{k}'\cdot\mathbf{x}'}\; \xi(|\mathbf{x} - \mathbf{x}'|)$$
$$= (2\pi)^3\, \delta_D(\mathbf{k} - \mathbf{k}')\, P(k), \qquad P(k) \equiv \int d^3r\; e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(r),$$
where, in the second line, we introduced r ≡ x − x′ and then performed the integral over x′ which gives us a Dirac delta function. We see that different Fourier modes are uncorrelated. This is a consequence of translation invariance. The power spectrum can also be written in the equivalent form
$$\langle f(\mathbf{k})\, f(\mathbf{k}')\rangle = (2\pi)^3\, \delta_D(\mathbf{k} + \mathbf{k}')\, P(k) \qquad (9.21)$$
(note the change of signs and conjugates) due to the reality condition.
The power spectrum P(k) and the correlation function ξ(r) are related by the 3-dimensional
Fourier transform. We can simplify this relation as follows. Using spherical coordinates, k · r = kr cos θ, we have
$$P(k) = \int d^3r\; e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(r) \qquad (9.22)$$
$$= \int_0^{2\pi} d\phi \int_{-1}^{1} d(\cos\theta) \int_0^\infty dr\, r^2\, e^{-ikr\cos\theta}\, \xi(r) \qquad (9.23)$$
$$= 2\pi \int_0^\infty dr\, \frac{r^2}{ikr}\left[e^{ikr} - e^{-ikr}\right] \xi(r) \qquad (9.24)$$
$$= \frac{4\pi}{k} \int_0^\infty dr\, r\, \sin(kr)\, \xi(r) \qquad (9.25)$$
$$= 4\pi \int_0^\infty dr\, r^2\, j_0(kr)\, \xi(r), \qquad (9.26)$$
where
$$j_0(x) = \frac{\sin x}{x} \qquad (9.27)$$
is a spherical Bessel function of order zero. These functions are frequently encountered in cos-
mology. In the other direction, one can express ξ in terms of the power spectrum as
$$\xi(r) = \int \frac{d^3k}{(2\pi)^3}\; e^{i\mathbf{k}\cdot\mathbf{r}}\, P(k) \qquad (9.28)$$
$$= \int \frac{dk\, k^2}{2\pi^2}\; j_0(kr)\, P(k). \qquad (9.29)$$
The power spectrum has the dimension [length]3 . It is often useful to define the dimensionless power spectrum by multiplying with k 3 :
$$\Delta^2(k) = \frac{k^3}{2\pi^2}\, P(k), \qquad (9.30)$$
which we encountered before in Eq.(6.35). There are different conventions around for the π and the 2 factor.
A set of N Gaussian variables fi has the joint PDF
$$p(f_1, \ldots, f_N) \propto \exp\left(-\frac{1}{2}\sum_{ij} f_i\, (C^{-1})_{ij}\, f_j\right), \qquad (9.31)$$
where the positive definite, symmetric N × N matrix Cij = ⟨fi fj ⟩ is called the covariance matrix. A random field f : R³ → R is a Gaussian random field (GRF) if for arbitrary collections of field points (x1 , ..., xN ) the variables [f (x1 ), ..., f (xN )] are joint Gaussian variables. Since any N-point function can be calculated from the field PDF, the GRF is fully defined in terms of its covariance matrix, which is the 2-point function. As we see from the PDF, a Gaussian random field is not necessarily homogeneous and isotropic. To make it so, we need to enforce that the covariance matrix is
$$C_{ij} = \xi(|\mathbf{x}_i - \mathbf{x}_j|). \qquad (9.32)$$
Here we have discretized the PDF, i.e. we wrote f as a finite dimensional vector rather than
an infinite dimensional function. In principle, for the continuous fields that we have discussed so
far, we should express the GRF using a Gaussian functional which is schematically
$$\mathcal{F}[f(\mathbf{x})] \propto \exp\left(-\frac{1}{2}\int d^3x\, d^3y\; f(\mathbf{x})\, C(\mathbf{x}-\mathbf{y})\, f(\mathbf{y})\right) \qquad (9.33)$$
In momentum space the covariance takes the form $\langle f_{\mathbf{k}_i} f^*_{\mathbf{k}_k}\rangle \propto P(k_i)\, \delta_{ik}$, with Kronecker delta δik . The covariance matrix for a homogeneous field is thus diagonal in
momentum space, i.e. the covariance matrix between different Fourier modes is zero. Note
that the Fourier modes fk , unlike the position space field, are complex numbers. We will get
back to the precise definition of a Gaussian random field on a finite volume (i.e. with discrete
Fourier modes) shortly, including the proportionality factor.
Some special cases of the power-law spectrum P (k) = A k n are n = 0, which is called white noise, and n = 1, which is called the Harrison-Zeldovich power spectrum, as we will see below.
9.3.2 Potential and Density Power Spectra
To discuss typical power spectra we need to discriminate between the power spectrum of the
gravitational potential Φ and the power spectrum of resulting matter perturbations δm , which
are related by the Poisson equation. The primordial gravitational potential Φ is closely related to
the primordial curvature perturbation R as we discussed in Sec. 6.6.1. We are switching notation
to Φ now instead of R because the following is also valid in Newtonian gravity which works with a
Newtonian gravitational potential Φ. The Poisson equation for matter in an expanding space-time
is
$$\nabla^2 \Phi(\mathbf{x}) = \frac{4\pi G}{c^2}\, a^2\, \bar{\rho}\, \delta(\mathbf{x}), \qquad (9.38)$$
which in momentum space is
$$-k^2\, \Phi(\mathbf{k}) = \frac{4\pi G}{c^2}\, a^2\, \bar{\rho}\, \delta(\mathbf{k}). \qquad (9.39)$$
Thus the power spectra of the two quantities will be related by
$$P_\delta(k) \propto k^4\, P_\Phi(k). \qquad (9.40)$$
The relation between the density power spectrum and the dimensionless primordial potential power spectrum is thus
$$P_\delta(k) \propto k^4\, \frac{2\pi^2}{k^3}\, \Delta_\Phi^2(k) \propto k\, \Delta_\Phi^2(k). \qquad (9.41)$$
If the dimensionless primordial power spectrum is constant (rather than k dependent), i.e. if the primordial curvature perturbations have the same amplitude on all scales, then the density power spectrum is
$$P_\delta(k) \propto k. \qquad (9.42)$$
This is called a Harrison-Zeldovich Power Spectrum. The potential fluctuations are said to
be scale-invariant primordial fluctuations. This is the case ns = 1 (remember that ns ∼ 0.96,
so it’s close).
We also sometimes need the variance of the field (also called the zero-lag correlation function), given by
$$\sigma_f^2 \equiv \langle f^2(\mathbf{x})\rangle = \xi_f(0) = \frac{1}{(2\pi)^3}\int d^3k\; P_f(k). \qquad (9.43)$$
In terms of the dimensionless power spectrum this reads
$$\sigma_f^2 = \int d\ln k\; \Delta_f^2(k), \qquad (9.44)$$
where
$$\Delta_f^2(k) \equiv \frac{k^3}{2\pi^2}\, P_f(k). \qquad (9.45)$$
The dimensionless power spectrum is thus the contribution to the variance per log wave number. If the dimensionless power spectrum has a peak at some k∗ then fluctuations in f are dominated by wavelengths of order 2π/k∗ . Note that the integral Eq.(9.44) is divergent in the large k limit unless the field is smoothed at some scale so that the power spectrum goes to zero. We will get back to the smoothing of fields.
9.3.3 Illustrating Power Law Power Spectra in 2d
We'd like to get some intuition for how Gaussian density fields with power-law spectra look. We will illustrate these fields in 2d (as appropriate for the CMB).
Let’s have a look at the position space correlation function for arbitrary dimension d:
$$\langle f(\mathbf{x})\, f(\mathbf{y})\rangle = \int \frac{d^dk}{(2\pi)^d} \int \frac{d^dk'}{(2\pi)^d}\; e^{-i\mathbf{k}\cdot\mathbf{x} - i\mathbf{k}'\cdot\mathbf{y}}\; \langle f(\mathbf{k})\, f(\mathbf{k}')\rangle \qquad (9.46)$$
$$= \int \frac{d^dk}{(2\pi)^d}\; e^{-i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\; P_f(k). \qquad (9.47)$$
We now specialize to power-law power spectra,
$$P(k) = A\, k^n. \qquad (9.48)$$
For n = 0 the correlation function is proportional to a Dirac delta function, which means that all pixels are uncorrelated. This is called white noise. This is illustrated in Fig.10 top left.
If we decrease n, for example P (k) = A k −1 the correlation between points increases, i.e.
nearby points become more likely to have a similar value (Fig.10). There is a special value of n
for which the field becomes scale invariant, i.e. the correlation between any two points becomes
independent of distance. In dimension d this is n = −d, so in 3d it is n = −3 as we have already
seen above and in 2d it is n = −2. To show this, we rescale the correlation function by a factor λ
$$\langle \Phi(\lambda\mathbf{x})\, \Phi(\lambda\mathbf{y})\rangle = \int \frac{d^dk}{(2\pi)^d}\; e^{-i\lambda\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\; k^{-d} \qquad (9.50)$$
$$= \int \frac{d^dk'}{(2\pi)^d}\; \lambda^{-d}\, e^{-i\mathbf{k}'\cdot(\mathbf{x}-\mathbf{y})} \left(\frac{k'}{\lambda}\right)^{-d} \qquad (9.51)$$
$$= \langle \Phi(\mathbf{x})\, \Phi(\mathbf{y})\rangle, \qquad (9.52)$$
where we changed variables to k′ = λk. If we go more negative with n than the scale invariant
value, then the universe becomes more inhomogeneous on larger scales (i.e. we see larger pertur-
bations if we zoom out). This would be inconsistent with the cosmological principle, which requires the universe to be homogeneous on large scales.
Figure 10. 2d Gaussian random fields with power spectrum P (k) = A k n for various n. For n = 0 we get
white noise and for lower n we get progressively more correlation. The scale invariant case is n = −2 in
2d. This plot was made with the Pylians library. The script is provided with the course material.
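For intuition, such maps can also be generated with a few lines of numpy. The following is a minimal sketch (not the course's Pylians script; grid size, amplitude and seed are arbitrary choices) that colors white noise in Fourier space with √P(k):

```python
import numpy as np

def gaussian_random_field_2d(npix, n, A=1.0, seed=0):
    """Draw a 2d Gaussian random field with power-law spectrum P(k) = A k^n."""
    rng = np.random.default_rng(seed)
    k1d = 2 * np.pi * np.fft.fftfreq(npix)           # frequencies of the FFT grid
    kmag = np.sqrt(k1d[:, None]**2 + k1d[None, :]**2)
    kmag[0, 0] = 1.0                                 # avoid division by zero for n < 0
    power = A * kmag**n
    power[0, 0] = 0.0                                # remove the mean (k = 0) mode
    noise = rng.standard_normal((npix, npix))        # white noise has a flat spectrum
    fk = np.fft.fft2(noise) * np.sqrt(power)         # color the noise with sqrt(P)
    return np.real(np.fft.ifft2(fk))

field = gaussian_random_field_2d(256, n=-2)          # the scale-invariant case in 2d
```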
9.4 Matter Power Spectrum and Boltzmann Codes
Apart from power-law power spectra, the most important power spectrum in this course is perhaps
the matter power spectrum. It can be written approximately as two different power laws.
In linear theory the density perturbations evolve as
$$\delta(\mathbf{k}, t) = T(k)\, D(a(t))\, \delta(\mathbf{k}, t_i),$$
where ti is the initial time, taken just after inflation. The function D(a(t)) is called the growth function and T (k) is called the transfer function. There are various possible conventions for T and D, but the key point is that the time and k dependence factorizes. It follows that the power spectrum evolves as
$$P(k, t) = T^2(k)\, D^2(a(t))\, P(k, t_i).$$
The transfer function of a mode thus crucially depends on its comoving wavenumber k compared to the wave number keq of the mode that entered the Hubble horizon at the time of matter-radiation equality. Modes
larger than this (k < keq ) enter the horizon in the matter dominated era and modes smaller than
this (k > keq ) will have entered during radiation domination. The form of the transfer functions
comes from the fact that radiation domination stops (or more precisely, slows to logarithmic) the growth of perturbations, as we will discuss later.
If we start with the power-law spectrum P ∼ k n , then it subsequently evolves to
$$P(k) = \begin{cases} k^{n} & \text{for } k < k_{\rm eq} \\ k^{n-4} & \text{for } k > k_{\rm eq} \end{cases}$$
with the turnover near k ≈ keq ∼ 0.01 Mpc−1 . As we have discussed, for a Harrison-Zeldovich power spectrum we have n = 1, while we measure n ≈ 0.96. The linear matter power spectrum,
scaled to today, is shown in Fig. 11.
Figure 11. The linear matter power spectrum scaled to z = 0 from Planck 2018 CMB data and various
galaxy surveys. We see the power spectrum turnover at keq .
Such linear power spectra are computed with Boltzmann codes. The two community standards are:
• CAMB. https://camb.info/. CAMB, written in Fortran, has been the community standard for a long time. It comes with a nice python wrapper and documentation https://camb.readthedocs.io/en/latest/ and a demo notebook https://camb.readthedocs.io/en/latest/CAMBdemo.html.
• CLASS. http://class-code.net/. CLASS is written in C and comes with a python wrapper called classy.
Both of these codes can for example generate the black theory curve in Fig. 11. In general these
codes are only correct in the linear regime (CMB, LSS for k ≲ 0.1 Mpc−1 at z = 0). However
they have some extensions to calculate power spectra in the non-linear regime. These are based
on results from non-linear perturbation theory or N-body simulations.
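For example, with the CAMB python wrapper the linear matter power spectrum can be obtained roughly as follows (a sketch following the CAMB demo notebook; the parameter values are illustrative):

```python
import camb

pars = camb.CAMBparams()
pars.set_cosmology(H0=67.5, ombh2=0.022, omch2=0.122)
pars.InitPower.set_params(As=2.1e-9, ns=0.965)
pars.set_matter_power(redshifts=[0.0], kmax=2.0)

results = camb.get_results(pars)
# kh in h/Mpc, pk in (Mpc/h)^3; pk[0] is the z = 0 linear matter power spectrum
kh, z, pk = results.get_matter_power_spectrum(minkh=1e-4, maxkh=1.0, npoints=200)
```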
9.5 Random scalar fields in discrete coordinates
While analytic work is usually done with continuous distributions, numerical work usually uses
a discrete data representation. For example, the 3d matter distribution can be represented as a
box of 3d pixels. Such a finite box also has a finite set of discrete Fourier modes. We work in 3d
but adapting to 2d is straightforward. The discrete Fourier transform pair on a box of volume Vbox with pixel volume Vpix is
$$f(\mathbf{x}) = \frac{1}{V_{\rm box}} \sum_{\mathbf{k}_i} f(\mathbf{k}_i)\, e^{i\mathbf{k}_i\cdot\mathbf{x}} \qquad (9.59)$$
$$= \int_{V_k} \frac{d^3k}{(2\pi)^3}\; f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}, \qquad (9.60)$$
$$f(\mathbf{k}) = V_{\rm pix} \sum_{\mathbf{x}_i} f(\mathbf{x}_i)\, e^{-i\mathbf{k}\cdot\mathbf{x}_i} \qquad (9.61)$$
$$= \int_{V_{\rm box}} d^3x\; f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}. \qquad (9.62)$$
The larger the grid size K (the number of pixels per side), the more high-frequency modes we can resolve. The lowest Fourier mode, which covers one side length with one whole mode, is called the fundamental mode
$$k_f = \frac{2\pi}{L}. \qquad (9.64)$$
The total set of Fourier modes is
$$\mathbf{k} = k_f\, (n_x, n_y, n_z), \qquad (9.65)$$
where (nx , ny , nz ) is a set of whole numbers that runs from −K/2 to K/2. The finite number of Fourier modes leads to cosmic variance, as we will discuss further shortly.
The power spectrum is given by
$$\langle f(\mathbf{k}_i)\, f^*(\mathbf{k}_j)\rangle = V_{\rm box}\, P_f(k_i)\, \delta_{ij}. \qquad (9.66)$$
• In our conventions, for a dimensionless f (xi ) the Fourier modes have again dimension
[length]3 . The power spectrum also has dimension [length]3 . The Kronecker delta is dimen-
sionless.
• For discrete modes the reality condition reads again f−k = fk∗ .
• If your data is not periodic there will be “spurious transfer of power” (aliasing) in your FT.
We’ll address this when we talk about experimental masks.
• The highest frequency that we can resolve is the Nyquist frequency of the grid, given by
$$k_{\rm Ny} = \frac{K}{2}\, k_f = \frac{K\pi}{L} = \frac{\pi}{H}, \qquad (9.67)$$
where H = L/K is the grid spacing.
9.5.2 Gaussian random field in discrete coordinates
For a Gaussian random field, of course our discrete Fourier modes are drawn from a Gaussian
distribution. Since they are complex numbers, let’s understand precisely what that means. This
will also suggest how we can generate such a field in code.
We split modes into their real and imaginary parts as f (k) = a(k) + ib(k). The reality of
f requires f−k = fk∗ and hence the real and imaginary parts of fk must satisfy the constraints
a−k = ak and b−k = −bk . For a homogeneous and isotropic Gaussian process, ak and bk are drawn from independent Gaussians of equal variance. One then finds
$$\langle f_\mathbf{k}\, f_{\mathbf{k}'}\rangle = \langle a_\mathbf{k} a_{\mathbf{k}'}\rangle + i\left(\langle a_\mathbf{k} b_{\mathbf{k}'}\rangle + \langle a_{\mathbf{k}'} b_\mathbf{k}\rangle\right) - \langle b_\mathbf{k} b_{\mathbf{k}'}\rangle = \sigma_k^2\, \delta_{\mathbf{k},-\mathbf{k}'},$$
where we have taken into account that a−k = ak and b−k = −bk and that the two random variables a and b are uncorrelated. One can also change variables from a, b to polar coordinates r, ϕ and find that the PDF of r is a Rayleigh distribution and the PDF of the phase is constant:
$$p(r) = \frac{2r}{\sigma^2}\, e^{-r^2/\sigma^2}, \qquad p(\phi) = \frac{1}{2\pi}. \qquad (9.70)$$
Comparing with the above we have σk2 = Vbox Pf (k).
Figure 12. Discrete Fourier grid and discrete modes (blue points) contributing to a wavenumber bin
(blue shaded region) centered around k.
We estimate the power spectrum by averaging over all modes in a bin around wavenumber k:
$$\hat{P}(k) = \frac{1}{N_k V_{\rm box}} \sum_{\mathbf{k}_i \in [k^-, k^+)} f(\mathbf{k}_i)\, f^*(\mathbf{k}_i),$$
where Nk is the number of cells in the k-bin. The modes are illustrated in Fig.12. This estimate assumes statistical isotropy (i.e. the power spectrum depends only on the magnitude of the wave vector). It is easy to see that this estimator is unbiased:
$$\langle \hat{P}\rangle = \frac{1}{N_k V_{\rm box}} \sum_{\mathbf{k}_i \in [k^-, k^+)} \langle f(\mathbf{k}_i)\, f^*(\mathbf{k}_i)\rangle \qquad (9.71)$$
$$= \frac{1}{N_k V_{\rm box}} \sum_{\mathbf{k}_i \in [k^-, k^+)} V_{\rm box}\, P_f(k_i) \qquad (9.72)$$
$$= P_f(k), \qquad (9.73)$$
where we have used Eq.(9.66). In the last step, for a finite bin width, in principle we should average the theory power spectrum over the same modes, but in practice for narrow bins this is not necessary (i.e. all modes in the bin have almost the same theoretical power spectrum). Power
spectrum estimation is no longer as easy when statistical isotropy is broken by the experiment
(i.e. we only observe a part of the sky) but we will deal with this difficulty in later chapters.
We note that here we estimate the power spectrum, which is defined as an expectation value
over the PDF of the random field (as in Eq.(9.8) but in Fourier space) from a single universe.
This is possible because the modes are independent so they are all drawn from the same PDF,
whether in the same universe or in different universes.
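In code, the binned estimator can be implemented directly on the FFT grid. Below is a minimal sketch, assuming a cubic periodic box of side boxsize and using the conventions of Eqs. (9.61) and (9.66):

```python
import numpy as np

def measure_pk(field, boxsize, nbins=20):
    """Binned power spectrum estimate P_hat(k) = (1/(N_k V_box)) sum |f(k_i)|^2."""
    npix = field.shape[0]
    vbox = boxsize**3
    vpix = (boxsize / npix)**3
    fk = np.fft.fftn(field) * vpix                   # f(k) = V_pix sum_x f(x) e^{-ik.x}
    pk3d = (np.abs(fk)**2 / vbox).ravel()            # per-mode power |f(k)|^2 / V_box
    k1d = 2 * np.pi * np.fft.fftfreq(npix, d=boxsize / npix)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing='ij')
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    kf = 2 * np.pi / boxsize                         # fundamental mode, Eq. (9.64)
    kny = np.pi * npix / boxsize                     # Nyquist frequency, Eq. (9.67)
    edges = np.linspace(kf, kny, nbins + 1)
    ibin = np.digitize(kmag, edges) - 1
    kmean, pk, nmodes = np.zeros(nbins), np.zeros(nbins), np.zeros(nbins)
    for i in range(nbins):
        sel = ibin == i
        nmodes[i] = sel.sum()
        if nmodes[i] > 0:
            kmean[i], pk[i] = kmag[sel].mean(), pk3d[sel].mean()
    return kmean, pk, nmodes
```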
It is also useful to calculate the number of modes in a power spectrum bin analytically (rather than numerically from the FT grid). The number of modes in a bin of width ∆k is given by the volume of the k-shell divided by the volume k_f³ per mode,
$$N_k = \frac{4\pi k^2\, \Delta k}{k_f^3} = \frac{V_{\rm box}\, k^2\, \Delta k}{2\pi^2}. \qquad (9.74)$$
To make progress, we need an important theorem for Gaussian fields called Wick’s theorem
(which also appears in QFT). Wick’s theorem states that the higher order correlation functions
of a Gaussian random field of mean zero can be expressed as certain products of the two point
function. This implies that the 3-point function ⟨f (k1 )f (k2 )f (k3 )⟩ of such a field must vanish,
as do all odd N-point function. On the other hand for a 4-point functions, as we have in our
calculation, Wick’s theorem states:
A general discussion of Wick’s theorem can be found in some cosmology text books. Using this
relation in our calculation we get
$$\langle \hat{P}^2(k)\rangle - \langle \hat{P}(k)\rangle^2 = \frac{1}{N_k^2} \sum_{\mathbf{k}_i, \mathbf{k}_j \in [k\pm]} P(k_i)\, P(k_j) + \frac{2}{N_k^2} \sum_{\mathbf{k}_i \in [k\pm]} P^2(k_i) - P^2(k) \qquad (9.79)$$
$$= \frac{2}{N_k}\, P^2(k). \qquad (9.80)$$
Thus the relative error on the power spectrum is given by:
$$\frac{\Delta P}{P} = \sqrt{\frac{2}{N_k}}. \qquad (9.81)$$
The factor of 2 comes from the modes being complex (thus in the k-sphere we double counted) and the √N may be familiar from the error bar in a histogram. From Eq.(9.74) we see that the error scales with the box volume as $V_{\rm box}^{-1/2}$. So we need four times the cosmological volume to reduce
the error bar by a factor of 2. Remember that this calculation is only correct in the Gaussian
case. For the smaller scale non-linear power spectrum the variance also gets a contribution from
the so-called connected 4-point function which cannot be reduced to 2-point functions by Wick’s
theorem.
The error that results from a finite number of Fourier modes in a given cosmological volume
is called cosmic variance. Since the observable universe is also finite in size, we will never be
able to measure the power spectrum on large scales precisely. This is reflected in the error bars
in Fig. 11.
In a real survey we observe signal plus noise, $\delta_g^{\rm obs}(\mathbf{k}) = \delta_g(\mathbf{k}) + n(\mathbf{k})$, where the noise has power spectrum N(k) and $\langle \delta_g(\mathbf{k})\, n^*(\mathbf{k}')\rangle = 0$. The last equation means that they are mutually uncorrelated. This is a good approximation in
many situations. If we now calculate the expectation value of the observed power spectrum we
get
$$\langle \hat{P}^{\rm obs}\rangle = \frac{1}{N_k V_{\rm box}} \sum_{\mathbf{k}_i \in [k^-, k^+)} \langle \delta_g^{\rm obs}(\mathbf{k}_i)\, \delta_g^{{\rm obs}*}(\mathbf{k}_i)\rangle \qquad (9.86)$$
$$= P_g(k) + N(k). \qquad (9.87)$$
That means that to measure the true power spectrum Pg , we need to subtract the noise power
spectrum from the observed spectrum. We can also calculate the variance
$$V[\hat{P}^{\rm obs}(k)] = \frac{2}{N_k}\left(P_g(k) + N(k)\right)^2. \qquad (9.88)$$
In the case of a galaxy survey, the noise is to good approximation given by the comoving number
density n̄g as
$$N_g = \frac{1}{\bar{n}_g}. \qquad (9.89)$$
This is called Poisson noise or shot noise. If P is much larger than N then the error is cosmic variance dominated, while if N is larger than P the error is noise dominated. For a
CMB experiment the noise power spectrum depends on the angular resolution and sensitivity of
the CMB detector. For both CMB and galaxy surveys, on large scales the error is always cosmic
variance dominated.
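Schematically, in code (a sketch; the observed band powers, mode counts and number density would come from the estimator above and from the survey):

```python
import numpy as np

def debias_pk(pk_obs, nmodes, nbar):
    """Subtract shot noise N = 1/nbar (Eq. 9.89); attach the Gaussian error of Eq. (9.88)."""
    noise = 1.0 / nbar
    pk_g = pk_obs - noise
    sigma = np.sqrt(2.0 / nmodes) * (pk_g + noise)
    return pk_g, sigma
```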
10 Basics of Statistics
We now discuss how to measure and forecast cosmological parameters. Above we have learned
how to
• calculate a theoretical power spectrum, such as the power spectrum Pm of the matter
density, using for example CAMB
• estimate the observed power spectrum P̂ obs from the data, for example the matter distribution δm^obs , with the estimator Eq. (9.86).
To measure cosmological parameters Λ from the power spectrum, schematically one adjusts
the parameters Λ so that Pm matches P̂ obs up to noise. While we are primarily considering the
power spectrum here, the same methodology can be used for other summary statistics such as
the 3-point function. In this section we discuss how this parameter fitting works in detail, and
also how we can forecast parameter sensitivity without having taken any data. Let’s start with
reviewing some concepts from statistics.
10.1 Estimators
While we have already used the concept, let’s define what an estimator is. If a random variable
x is characterized by a PDF p(x|λ) dependent on a parameter λ, then an estimator for λ is a
function E(x) used to infer the value of the parameter. If a given dataset {xobs } is drawn from
the distribution p(x|λ), then λ̂ = E(xobs ) is the estimate of the parameter λ from the given
observations. We often use a “hat” over the variable to indicate an estimator. Since E is a
function of a random variable, it is itself a random variable. A random variable obtained as a
function of another set of random variables is often referred to as a statistic.
An estimator for a parameter λ is unbiased if its average value is equal to the true value of
the parameter:
⟨λ̂⟩ = λ. (10.1)
We want our estimator to be unbiased. However, biased estimators can also be useful, since it
can be possible to “unbias” them.
After unbiasedness, the second key property of an estimator is its expected error or variance, given by
$$\mathrm{Var}[\hat{\lambda}] = \langle \hat{\lambda}^2\rangle - \langle \hat{\lambda}\rangle^2. \qquad (10.4)$$
We try to find an estimator that is unbiased and that has as small an error as possible. One can
often show which estimator will have the smallest possible error bar. Such an estimator is called
an optimal estimator. We already saw an optimal estimator, the one for the power spectrum
in Eq. (9.86), although we did not prove optimality.
If we have several estimators, we are also interested in their covariance
$$\mathrm{Cov}[\hat{\lambda}_i, \hat{\lambda}_j] = \langle (\hat{\lambda}_i - \langle\hat{\lambda}_i\rangle)(\hat{\lambda}_j - \langle\hat{\lambda}_j\rangle)\rangle. \qquad (10.5)$$
From the covariance, one can also calculate their cross-correlation (which is between −1 and
1) as
$$\mathrm{Corr}[\hat{\lambda}_i, \hat{\lambda}_j] = \frac{\mathrm{Cov}[\hat{\lambda}_i, \hat{\lambda}_j]}{\sqrt{\mathrm{Cov}[\hat{\lambda}_i, \hat{\lambda}_i]\, \mathrm{Cov}[\hat{\lambda}_j, \hat{\lambda}_j]}}, \qquad (10.7)$$
which tells us whether the estimators are correlated, anti-correlated or uncorrelated (Corr = 0).
10.2 Likelihoods, Posteriors, Bayes Theorem
The likelihood is the probability of observing the data d given a model M with parameters λ,
$$\mathcal{L}(d|\lambda, M), \qquad (10.8)$$
where the line | is read as “given”. It is often possible to write down the likelihood function
analytically. The likelihood does not tell us what model and model parameters are likely given
the data (rather it answers the opposite question). It is the posterior probability
$$\mathcal{P}(\lambda|d, M) \qquad (10.9)$$
that measures parameters for us. From now on I will drop the label M , since a likelihood and a
posterior are always only defined assuming some model (e.g. Lambda-CDM), and are different
if you assume a different model. To connect the posterior and the likelihood we need Bayes theorem:
$$\mathcal{P}(\lambda|d) = \frac{\mathcal{L}(d|\lambda)\, \mathcal{P}(\lambda)}{\mathcal{P}(d)}. \qquad (10.10)$$
• The prior P(λ) of the parameters in model M before the data is analyzed. The choice of
the prior can be somewhat tricky but often flat or Gaussian works.
• The evidence P(d) which is the probability of seeing the data d under any parameters λ
of the model. The evidence can also be written as
$$\mathcal{P}(d) = \int \mathcal{L}(d|\lambda)\, \mathcal{P}(\lambda)\, d\lambda. \qquad (10.11)$$
The evidence can be difficult to calculate numerically because it is often a huge multidi-
mensional integral. However in many cases we do not need to evaluate it, since it only
depends on the data and thus does not change our measurement of model parameter λ.
The evidence is however useful for model selection as we will discuss later.
10.3 Gaussian Likelihoods
Both the likelihood and the posterior are of course probability distributions. It turns out that a
Gaussian likelihood is often a good approximation of the data (while the posterior is often not
Gaussian). Consider first the simple case of a single Gaussian random variable. Imagine you
want to measure a person's weight w (following Dodelson's example). To get an error bar, you
will measure the weight m times. In each measurement, you get data di which is given by the
true value plus some (Gaussian) noise: di = w + ni . If our measurements are independent, then
the likelihood is
$$\mathcal{L}(\{d_i\}_{i=1}^m | w, \sigma_w) = \frac{1}{(2\pi\sigma_w^2)^{m/2}} \exp\left(-\frac{\sum_{i=1}^m (d_i - w)^2}{2\sigma_w^2}\right), \qquad (10.12)$$
which is the product of the likelihoods of the individual measurements. The parameters of our
model here are the true weight w and the variance of the data σ 2 . To find the maximum likelihood
estimator for our parameters, in this simple case we can do the maximization analytically. We are
assuming a flat prior here, so that the prior does not change our estimate. Taking the derivative
we get
$$\frac{\partial \mathcal{L}}{\partial w} = \mathcal{L} \times \frac{1}{\sigma_w^2} \sum_{j=1}^m (d_j - w). \qquad (10.13)$$
Setting this to zero gives the estimator $\hat{w} = \frac{1}{m}\sum_{i=1}^m d_i$, which one can guess of course. In the same way we can calculate the $\sigma_w^2$ estimator from
$$\frac{\partial \mathcal{L}}{\partial \sigma_w^2} = \mathcal{L} \times \left(-\frac{m}{2\sigma_w^2} + \frac{\sum_{i=1}^m (d_i - w)^2}{2\sigma_w^4}\right). \qquad (10.16)$$
Setting this to zero gives
$$\hat{\sigma}_w^2 = \frac{1}{m} \sum_{i=1}^m (d_i - w)^2, \qquad (10.17)$$
which is the well-known estimator of the variance. Taking into account that w is also estimated
from the same data one gets m → m − 1.
We can also calculate the variance (error) of our two estimators using Eq.(10.4). The answer
is
$$\mathrm{Var}[\hat{w}] = \frac{\sigma_w^2}{m} \qquad (10.18)$$
and
$$\mathrm{Var}[\hat{\sigma}_w^2] = \frac{2}{m}\, (\sigma_w^2)^2. \qquad (10.19)$$
These calculations are written out in Dodelson’s textbook.
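A quick numerical check of these results (a toy example; the true values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
w_true, sigma_w, m = 70.0, 2.0, 100
d = w_true + sigma_w * rng.standard_normal(m)        # m noisy weight measurements

w_hat = d.mean()                                     # maximum likelihood estimate of w
sigma2_hat = np.sum((d - w_hat)**2) / (m - 1)        # variance estimate with m -> m-1
err_w = np.sqrt(sigma2_hat / m)                      # Var[w_hat] = sigma_w^2 / m
print(f"w = {w_hat:.2f} +- {err_w:.2f}")
```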
Often we are interested in measuring only a subset of the parameters, while other parameters
are considered nuisance parameters. In the weight example, we may be interested in measuring
w but do not have knowledge of σw . Then, given the full posterior P (w, σw |{di }), we can calculate the marginalized posterior
$$P(w|\{d_i\}) = \int_0^\infty d\sigma_w\; P(w, \sigma_w|\{d_i\}). \qquad (10.20)$$
For our power spectrum measurement, the Gaussian likelihood of the binned band powers is
$$-2\ln \mathcal{L}(\{\hat{P}(k_i)\}|\lambda) = \sum_{ij} \left[\hat{P}(k_i) - P^{\rm theo}(k_i, \lambda)\right] (C^{-1})_{ij} \left[\hat{P}(k_j) - P^{\rm theo}(k_j, \lambda)\right]. \qquad (10.21)$$
Here we have dropped the determinant term of the Gaussian, assuming that the covariance matrix
does not depend on cosmological parameters (a common assumption) and included off-diagonal
terms between different modes of the power spectrum (which are needed in the non-linear regime
and when including masks). The covariance matrix (defined in Eq.(10.5)) in a real experiment
can usually not be calculated analytically but is estimated using simulations. We’ll talk more
about this in the large-scale structure unit. We also note that, while the covariance matrix is in
principle parameter dependent, it is usually better to evaluate it at fixed fiducial values. This
has to do with the power spectrum likelihood not being exactly Gaussian, see arxiv:1204.4724
for details. For this reason we have also dropped the determinant term of the Gaussian.
Alternatively, one can write a Gaussian likelihood directly for the field's Fourier modes:
$$-2\ln \mathcal{L}(\{\delta_m(\mathbf{k})\}|\lambda) = \sum_{k_i, k_j} \delta(\mathbf{k}_i)\, \left(\mathrm{Cov}(\lambda)^{-1}\right)_{k_i, k_j} \delta^\dagger(\mathbf{k}_j) + \log |\mathrm{Cov}(\lambda)|, \qquad (10.22)$$
where the covariance is given in terms of the power spectrum P theo (k, p) and diagonal in the
homogeneous case. We have already discussed this PDF in Sec. 9.5.2 in a different notation.
The dagger † is required because the Fourier modes are complex. In this equation we have kept
a parameter dependent covariance matrix, and thus the determinant term of the Gaussian. The
determinant here is required if we want to make the PDF dependent on cosmological parameters.
In practice, the Gaussian form is often a good approximation. If we can assume Gaussianity then our task in specifying the likelihood is vastly
simplified because we “only” need to determine the right covariance matrix (usually from a set
of simulations).
However we should stress that likelihoods are not always Gaussian or nearly Gaussian. For
small number statistics, Poissonian likelihoods are also common. If the likelihood is more compli-
cated, we need a different approach. Fortunately, if we have enough simulations, we can instead
learn L(d|p) as a free function from simulations, using machine learning. This approach is called
likelihood-free inference (LFI) (meaning that we must learn the likelihood). We will get back
to these methods in Sec. 28. Of course learning a free PDF is much more difficult than determin-
ing just a covariance matrix. In my impression, LFI with ∼ 10 variables (such as the cosmological
parameters) often works, but it gets difficult in much higher dimension.
10.4 Using the likelihood and the posterior
From the likelihood one can define the maximum likelihood estimator (MLE), obtained by solving
$$\left.\frac{\partial \ln \mathcal{L}(d|\lambda)}{\partial \lambda}\right|_{\lambda = \hat{\lambda}} = 0 \qquad (10.23)$$
for λ̂. Sometimes this can be done analytically. In general the MLE does not have to be the
optimal estimator though under common assumptions it is. Further, from the posterior one can
define an estimator called the maximum aposteriori estimator (MAP) given by solving
$$\left.\frac{\partial \ln \mathcal{P}(\lambda|d)}{\partial \lambda}\right|_{\lambda = \hat{\lambda}} = 0 \qquad (10.24)$$
for λ̂. If the prior is flat, which is sometimes a good and sometimes a bad choice, the MLE and
the MAP are the same. I may add more details on estimation theory later. Using estimators
such as MLE under some model is typical for the frequentist approach to statistics. In this
approach, the error bar is set by calculating the covariance of the estimator analytically, or if
that is not possible, by estimating it from simulations (Monte Carlo).
On the other hand, the Bayesian approach considers the complete posterior density. A
Bayesian would often sample from the posterior using MCMC, and summarize the posterior
by quantities such as the posterior mean (which is not the same as the MAP). Power spectrum
analysis is usually done in a Bayesian way using MCMC.
Sometimes the difference between frequentist and Bayesian statistics is also presented in terms
of the use of priors and of updating beliefs with new data. In my opinion there is no need to
be either a frequentist or a Bayesian and you can consistently use concepts from both sides. A
common complaint about frequentist analysis in cosmology is that we have only one universe and
cannot repeat the experiment. However one can still run simulations of different initial conditions
or analytically integrate over initial conditions. As long as you correctly interpret your math (e.g.
you do not claim that a 3-sigma frequentist excess of some estimator is equivalent to a 3 sigma
detection of your favorite new physics model) you won’t have inconsistencies. The full Bayesian
method formally answers the interesting physical questions most directly, but it is not always
computationally tractable and not always needed.
10.5 Fisher forecasting
In many situations we want to know the error on our parameters that an experiment can achieve
before having taken any data. Theory papers need to estimate whether their effect is observable,
and experiments need to be designed to meet specified sensitivity goals. These forecasts are
commonly made using the Fisher forecasting formalism (a different option is running MCMC on
synthetic data). We first discuss Fisher forecasting for Gaussian likelihoods, but the formalism
also generalizes to other likelihoods.
If a given observed variable Oa is characterized by Gaussian distributed errors, then its like-
lihood is
$$\mathcal{L} \propto e^{-\chi^2/2}, \qquad (10.25)$$
where the χ² statistic is defined as
$$\chi^2 = \sum_a \frac{\left[O_a(\lambda) - \hat{O}_a\right]^2}{\mathrm{Var}[O_a]}, \qquad (10.26)$$
where Ôa are the measured values of our observable, for example the power spectrum bins
P̂ (kα ). To find the best fit parameters λ̂ we minimize χ2 (which is equivalent to maximizing the
likelihood). We assume here that the variance is not parameter dependent and thus we don’t
need the determinant term in the likelihood.
If we first work in the 1-dimensional case with only one variable λ we can expand the χ2
around its minimum:
$$\chi^2(\lambda) = \chi^2(\bar{\lambda}) + \frac{1}{2} \left.\frac{\partial^2 \chi^2}{\partial \lambda^2}\right|_{\lambda = \bar{\lambda}} (\lambda - \bar{\lambda})^2. \qquad (10.27)$$
The linear term vanishes at the minimum. The quadratic term describes the local curvature of the likelihood. It tells us how narrow or wide the minimum is, and thus what its error bar is. If we define
$$F \equiv \frac{1}{2} \left.\frac{\partial^2 \chi^2}{\partial \lambda^2}\right|_{\lambda = \bar{\lambda}}, \qquad (10.28)$$
then we can estimate the minimum possible error on λ as 1/√F . Note that the Fisher matrix depends on where we have assumed the minimum to be, i.e. it depends on the fiducial parameters λ̄ of our forecast.
If we compute F explicitly we get
" #
∂Oa 2 ∂2O
X 1 a
Fλλ = + Oa − Ôa 2
. (10.29)
α
Var [Oa ] ∂λ ∂λ
To forecast F we will not have observed data. Rather we should take the expectation value, which simplifies our expression because $\langle O_a - \hat{O}_a\rangle = 0$ at the minimum (because the measurements will fluctuate around the truth). Thus
$$F_{\lambda\lambda} = \langle \mathcal{F}_{\lambda\lambda}\rangle \qquad (10.30)$$
$$= \sum_a \frac{1}{\mathrm{Var}[O_a]} \left(\frac{\partial O_a}{\partial \lambda}\right)^2. \qquad (10.31)$$
This quantity is called the Fisher Information F . For several variables, this generalizes to the Fisher information matrix
$$F_{ij} = \sum_a \frac{1}{\mathrm{Var}[O_a]}\, \frac{\partial O_a}{\partial \lambda_i}\, \frac{\partial O_a}{\partial \lambda_j}.$$
From the Fisher matrix one can obtain two different errors. If we have several parameters and
we assume all parameters except λ are known, then
$$\sigma_\lambda = \frac{1}{\sqrt{F_{\lambda\lambda}}} \qquad \text{(unmarginalized)}. \qquad (10.35)$$
More commonly, we want to know the error on λ if all other parameters are marginalized over. This is obtained by inverting the Fisher matrix as follows:
$$\sigma_\lambda = \sqrt{(F^{-1})_{\lambda\lambda}} \qquad \text{(marginalized)}. \qquad (10.36)$$
Often the marginalized errors are significantly larger than the unmarginalized ones. An illustra-
tion of this in the 2-parameter case is shown in Fig. 13.
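Numerically, a Fisher forecast only needs derivatives of the observable at the fiducial point. A minimal sketch (model, var and the step sizes dlam are user-supplied; derivatives by central finite differences):

```python
import numpy as np

def fisher_matrix(model, var, lam0, dlam):
    """F_ij = sum_a (1/Var[O_a]) (dO_a/dlam_i)(dO_a/dlam_j), cf. Eq. (10.31)."""
    lam0, dlam = np.asarray(lam0, float), np.asarray(dlam, float)
    npar = len(lam0)
    derivs = []
    for i in range(npar):
        step = np.zeros(npar)
        step[i] = dlam[i]
        derivs.append((model(lam0 + step) - model(lam0 - step)) / (2 * dlam[i]))
    F = np.zeros((npar, npar))
    for i in range(npar):
        for j in range(npar):
            F[i, j] = np.sum(derivs[i] * derivs[j] / var)
    return F

# unmarginalized vs marginalized error on parameter 0 (Eqs. 10.35 and 10.36):
# 1/np.sqrt(F[0, 0])   vs   np.sqrt(np.linalg.inv(F)[0, 0])
```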
The Rao-Cramer bound states that these are the smallest errors any unbiased estimator can achieve. For maximum likelihood estimators and large enough data sets the Rao-Cramer bound is saturated, which is why we wrote an equal sign in the previous section. Some details about the Rao-Cramer bound can be found in Appendix A of 1001.4707. In cosmology we usually assume that the Rao-Cramer bound is saturated in our forecasts.
Figure 13. Marginalized and unmarginalized error on parameter λ1 in a 2 parameter Fisher matrix.
To include a Gaussian prior of width σλ on a parameter (for example from a different measurement), we add a term to the corresponding diagonal Fisher matrix element:
$$F_{\lambda\lambda} \to F_{\lambda\lambda} + \frac{1}{\sigma_\lambda^2}. \qquad (10.40)$$
Sometimes we want to marginalize over a subset of the parameters only. This can be done as
follows:
• invert F
• remove the rows and columns of parameters we want to marginalize over, to arrive at a
smaller matrix which we call G−1
• invert G−1 again to obtain the Fisher matrix G of the remaining parameters.
For a Gaussian likelihood with mean µ(λ) and covariance C(λ), the Fisher matrix takes the form
$$F_{ij} = \mu_{,i}^T\, C^{-1}\, \mu_{,j} + \frac{1}{2}\, \mathrm{Tr}\left[C^{-1} C_{,i}\, C^{-1} C_{,j}\right],$$
where ,i is the partial derivative with respect to λi . Here I have used matrix notation rather than index notation (sum over i, j). This form of the Fisher matrix appears very often in cosmology papers. A derivation of this result can be found for example in arxiv:0906.0664. If our likelihood is a Gaussian random field with mean zero, the first term is zero.
10.6 Sampling the posterior: MCMC
The standard way to explore the posterior in cosmology is Markov Chain Monte Carlo (MCMC) sampling, which draws samples from the posterior distribution. From these samples we can also make the famous corner plots that are often shown in cosmology papers (see for example the Planck results in 1807.06209 Fig. 5). A key property of MCMC is that it scales approximately linearly with the number of parameters, so we can do quite high-dimensional problems. MCMC methods do not require the target distribution (often the posterior distribution in Bayesian inference) to be normalized. However, they do require the ability to evaluate the unnormalized version of the target distribution up to a constant factor.
How does MCMC sampling work? There is a really nice discussion in Dodelson-Schmidt
Sec. 14.6 which I will briefly summarize. A Markov Chain is an algorithm where we draw a
new sample λ′ from λ, but without considering earlier samples. The algorithm is completely
described by the conditional probability K(λ′ |λ) that takes us from a sample λ to the next one,
λ′ . The fundamental requirement on K, in order for the MCMC sampler to sample from the
right posterior, is called detailed balance:
$$K(\lambda'|\lambda)\, P(\lambda) = K(\lambda|\lambda')\, P(\lambda'). \qquad (10.45)$$
This means that the rate for the forward reaction λ → λ′ is the same as for the reverse reaction
λ′ → λ, which means we have reached an equilibrium distribution. If we start with a distribution
of λ that follows P (λ), then an algorithm that obeys detailed balance will stay in this distribution.
Further, if you start with an arbitrary sample λinit , after drawing sufficiently many samples, the
algorithm will end up in distribution and have forgotten about its starting point (in the same way
as we can reach thermodynamic equilibrium from any initial conditions if we wait long enough).
This is called the burn in phase of MCMC. MCMC is closely connected to thermodynamics,
where an equilibrium distribution loses its memory of the initial conditions.
There are different choices for K(λ′ |λ) that obey detailed balance. A common choice is the
Metropolis Hastings algorithm. In this algorithm, we draw the next parameter sample from
a Gaussian, symmetric around the current parameter sample. This sample is then accepted
with a probability given by
$$p_{\rm acc}(\lambda', \lambda) = \min\left(\frac{P(\lambda')}{P(\lambda)},\, 1\right). \qquad (10.46)$$
If the new sample is not accepted, we repeat the previous step in the chain. You can check that
this procedure obeys detailed balance. The free parameter here is the width of the Gaussian from
which the next parameter is drawn. If it is too small, the sampler will take a long time to map out
the PDF and may get stuck in local minima. If it is large, the sampler will have a low acceptance
rate since most samples will be very unlikely. Many algorithms adjust this value dynamically. A
good acceptance rate is about 1/3.
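A minimal Metropolis-Hastings sampler is only a few lines. The sketch below uses a Gaussian proposal of fixed width step (an assumption; real samplers often adapt it), where log_post is any unnormalized log-posterior:

```python
import numpy as np

def metropolis_hastings(log_post, x0, step, nsteps, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp = log_post(x)
    chain = np.empty((nsteps, len(x)))
    for i in range(nsteps):
        x_new = x + step * rng.standard_normal(len(x))   # symmetric Gaussian proposal
        logp_new = log_post(x_new)
        if np.log(rng.uniform()) < logp_new - logp:      # accept with prob min(P'/P, 1)
            x, logp = x_new, logp_new
        chain[i] = x                                     # on rejection, repeat old sample
    return chain

# example: sample a 1d Gaussian posterior with mean 1 and standard deviation 2
chain = metropolis_hastings(lambda x: -0.5 * ((x[0] - 1.0) / 2.0)**2,
                            x0=[0.0], step=1.0, nsteps=10000)
```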
The most popular sampler (currently) in cosmology is called emcee, which implements an
algorithm called “Affine Invariant Markov Chain Monte Carlo (MCMC) Ensemble sampler”.
Emcee and other popular algorithms use several so called walkers which sample from the PDF
in parallel. In practice it is usually not important that you understand your MCMC algorithm
at a fundamental level, but it is critical that you use it correctly:
• We need to discard samples from the burn-in phase. One can often clearly see the burn-in
phase in the chain plots coming from the sampler (see Sec. 11 for an example).
• MCMC Samples are not statistically independent. It takes a while until the “memory” of a
sample is forgotten. This is called the auto-correlation length. One can pick one sample
per auto-correlation length for analysis. This is called thinning of the chain. Samplers
usually come with some estimator of the auto-correlation length.
• Sometimes one can misjudge the convergence and auto-correlation length of an MCMC
chain. Chains may be slowly drifting or even oscillating, without being noticeable at the
chain length we probed. There is no absolutely guaranteed method to avoid such problems.
• The many Monte Carlo walkers (typically 20 or more) should give statistically equivalent
samples. Comparing the different chains and their “mixing” helps judging the convergence
of the MCMC, for example using the Gelman-Rubin statistic.
A typical length of an MCMC chain could be 100,000 samples. Roughly speaking, for an auto-correlation length of 100 samples this would give us 1000 independent samples (see the emcee
documentation for best practices of auto-correlation analysis and thinning). We’ll see example
MCMC results in Sec. 11.
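In practice a minimal emcee run looks roughly like this (a sketch with a toy log-probability; the walker number and chain length are arbitrary choices):

```python
import numpy as np
import emcee

def log_prob(theta):
    return -0.5 * np.sum(theta**2)       # toy unnormalized log-posterior

ndim, nwalkers = 2, 32
p0 = 1e-2 * np.random.randn(nwalkers, ndim)          # walkers start in a small ball
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 10000)

tau = sampler.get_autocorr_time()                    # auto-correlation length estimate
samples = sampler.get_chain(discard=1000, thin=int(np.max(tau)), flat=True)
```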
A common question is what prior we should use. Common choices are
• Flat priors in some window (constant probability per dλ). This is the most common case.
• Priors that are flat in the log of λ in some window (constant probability per d ln λ). This
is useful if we are unsure even about the order of magnitude of the parameter.
• Priors that are Gaussian, particularly coming from a previous independent measurement.
Note first that if the data is very informative, then the likelihood will completely dominate the
posterior and the prior becomes irrelevant (as long as it is nonzero at the maximum of the
likelihood). Conversely if the data is weak, the choice of prior changes the result substantially.
In that case no strong measurement can be made. The main reasons to put an informative prior
are
• If we have a strong and trustworthy measurement for a parameter from a different uncor-
related experiment and we want to include that information (usually as a Gaussian prior).
• If we have a physical theory that gives a reliable prior, such as that Ωm cannot be negative
or that the primordial curvature perturbations are Gaussian.
10.7 Other algorithms beyond MCMC
A different approach that is starting to be used in astrophysics is variational inference.
In variational inference, one fits a simpler variational distribution to approximate the true
posterior. This is useful in cases where it would be too expensive to sample from the true
posterior. However we still need to be able to evaluate the unnormalized posterior at some points
to fit the variational distribution.
10.8 Goodness of fit
If Y is the sum of the squares of n independent standard Gaussian random variables, it follows the χ² distribution with n degrees of freedom:
$$P(Y) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\; Y^{n/2 - 1}\, e^{-Y/2}. \qquad (10.48)$$
The χ2 distribution has the following properties:
• when n ≫ 1, the chi-squared distribution starts to look like the Gaussian distribution, with
mean n and variance 2n.
For example, the power spectrum estimator uses the sum of squares of Gaussian modes δ(k) and
thus is χ2 distributed (and approximately Gaussian for enough modes).
The χ² distribution arises as the sum of squares of the residuals d − d_model in a least-squares model fit:
$$\chi^2_{k_{\rm dof}} = \left[d - d_{\rm model}(\lambda)\right]^T C^{-1} \left[d - d_{\rm model}(\lambda)\right]. \qquad (10.49)$$
This is also the form of a Gaussian likelihood with parameter-independent data covariance C (such as in our power spectrum likelihood).
If our model is a good fit to the data we should have
$$\chi^2 \approx k_{\rm dof},$$
where
$$k_{\rm dof} = N_{\rm data\ points} - N_{\rm fitted\ parameters}. \qquad (10.51)$$
As a consistency check, if we have as many model parameters as data points we should get a perfect fit without residuals. We can use the properties of the χ² distribution such as the variance
or the P-value to quantify whether the fit is good. So for a good fit we would have
$$\chi^2 \approx k_{\rm dof} \pm \sqrt{2 k_{\rm dof}}. \qquad (10.53)$$
If the fit is good, this implies for example that 68% of the data points are within the 1σ error. Otherwise we see how many sigmas we are away from a good fit.
If the χ² is higher than expected it can mean either that the model is wrong (e.g. missing physics) or that we have underestimated the data errors.
It can also happen that χ2 is smaller than expected if we overestimated our data error.
One sometimes also defines the reduced χ2 :
$$\chi^2_{\rm red} = \frac{\chi^2}{N_{\rm data\ points}}. \qquad (10.54)$$
In the common case that N_data points ≫ N_fitted parameters the reduced χ² should be around 1.
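Such a goodness-of-fit check takes a few lines with scipy (a sketch; d, d_model and C are the data vector, model prediction and data covariance):

```python
import numpy as np
from scipy import stats

def goodness_of_fit(d, d_model, C, n_params):
    """chi^2 of Eq. (10.49) and its p-value for k_dof = N_data - N_params."""
    r = d - d_model
    chi2 = float(r @ np.linalg.solve(C, r))
    kdof = len(d) - n_params
    p_value = stats.chi2.sf(chi2, kdof)   # probability of a chi^2 at least this large
    return chi2, kdof, p_value
```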
10.9 Model comparison
A common task is to compare two models A and B by the difference of their best-fit χ², ∆χ² = χ²_B − χ²_A. Let's assume that A is a subset of B, for example A is ΛCDM and B is ΛCDM extended with a free equation of state parameter for dark energy (so that w = −1 in A but free in B). Of course the fit must be better or equal in model B. If the ∆χ² is large (negative), then model B is a much better fit than model A. According to Wilks' theorem, ∆χ² can be quantified by a χ² distribution with degrees of freedom k_dof,B − k_dof,A (which would be 1 in the dark energy example).
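For instance, converting a hypothetical ∆χ² into a p-value under Wilks' theorem:

```python
from scipy import stats

delta_chi2 = 9.0        # hypothetical improvement of model B over model A
extra_params = 1        # k_dof,B - k_dof,A, e.g. the dark energy example
p_value = stats.chi2.sf(delta_chi2, df=extra_params)
```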
There are also model comparison tests for cases in which B is NOT a subset of A, in particular
the Bayesian Information Criterion (BIC) and the Akaike Information Criterion
(AIC) (see e.g. Huterer’s book).
The most consistent, but computationally challenging, way to compare models is using the
Bayesian approach. Here we calculate the Bayes Factor
$$B_{AB} = \frac{P(d|A)}{P(d|B)} \qquad (10.56)$$
from the evidence ratios of the two models. If we have priors on the models, we get the posterior
odds
$$\frac{P(A|d)}{P(B|d)} = B_{AB}\, \frac{P(A)}{P(B)}. \qquad (10.57)$$
The Bayes factor is difficult to evaluate since we need to integrate over the entire model parameter
space (Eq.(10.11)). According to the Jeffreys’ scale, for equal prior models, B > 3 is considered
weak evidence, B > 12 is considered moderate evidence and B > 150 is considered strong evidence
(less than 1/150 chance probability) for one model over the other.
Part III
Cosmic Microwave Background
Due to its linearity, the primary CMB is the cleanest probe of cosmology we have. While the
primary CMB temperature perturbations have been mapped out almost to cosmic variance,
upcoming experiments will measure E-mode polarization in more detail, while primordial B-
mode polarization has not been detected at all and is a major science target. Secondary
CMB anisotropies, which are induced by the re-scattering of CMB photons on charges, and
by gravitational lensing, have been detected but are far from being fully exploited for cosmology
and astrophysics. In this section I will focus more on secondary anisotropies and data analysis
methods, and be brief on primary CMB physics which is interesting but mostly worked out. We
will also, for the first time in this course, discuss “real world” data analysis issues such as detector
noise and the mask, which makes even power spectrum estimation rather complicated. Finally I
will discuss the topic of foreground cleaning, which is relevant also for many other types of data.
Further reading
The general references of Unit 1 all contain a discussion of the CMB. In addition I recommend
• Anthony Challinor’s 2015 lecture notes Part III Advanced Cosmology - Physics of the
Cosmic Microwave Background.
The Ylm are familiar from quantum mechanics as the position-space representation of the eigenstates of the angular momentum operators L̂² = −∇² and L̂z = −i∂ϕ (setting ℏ = 1):
$$\hat{L}^2\, Y_{lm} = l(l+1)\, Y_{lm}, \qquad \hat{L}_z\, Y_{lm} = m\, Y_{lm},$$
with l an integer ≥ 0 and m an integer with |m| ≤ l.
The spherical harmonics are orthonormal over the sphere,
$$\int d\hat{n}\; Y_{lm}(\hat{n})\, Y^*_{l'm'}(\hat{n}) = \delta_{ll'}\, \delta_{mm'}. \qquad (12.4)$$
There are various phase conventions for the Ylm ; here we adopt $Y^*_{lm} = (-1)^m\, Y_{l,-m}$ so that $f^*_{lm} = (-1)^m\, f_{l,-m}$ for a real field.
The spherical harmonics are products of associated Legendre polynomials and an azimuthal phase factor:
$$Y_{lm}(\theta, \phi) = \sqrt{\frac{2l+1}{4\pi}\, \frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi}.$$
The correspondence between multipoles and angles is l ∼ π/Θ where Θ is in radians.
The two-point function of a field on the sphere can be written as
$$C(\theta) = \langle f(\hat{n})\, f(\hat{n}')\rangle = \sum_l \frac{2l+1}{4\pi}\, C_l\, P_l(\mu),$$
where µ = n̂ · n̂′ = cos θ and we used the addition theorem for spherical harmonics,
$$\sum_m Y_{lm}(\hat{n})\, Y^*_{lm}(\hat{n}') = \frac{2l+1}{4\pi}\, P_l(\hat{n}\cdot\hat{n}'). \qquad (12.11)$$
We see that the 2-point function depends only on the angle, as we require from isotropy. The
inverse relation, going from position space to momentum space, is
$$C_l = 2\pi \int_{-1}^{1} d\cos\theta\; C(\theta)\, P_l(\cos\theta). \qquad (12.12)$$
In analogy to what we did in Eq.(9.43), we can calculate the variance of the field:
$$C(0) = \sum_l \frac{2l+1}{4\pi}\, C_l \approx \int \frac{l(l+1)\, C_l}{2\pi}\; d\ln l. \qquad (12.13)$$
The quantity
$$D_l = \frac{l(l+1)\, C_l}{2\pi} \qquad (12.14)$$
is commonly plotted and gives the contribution to the variance per log range in l. For a scale
invariant power spectrum we have Dl = const.
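Concretely, with healpy one can draw a Gaussian map from a given Cℓ and measure the spectrum back (a sketch; the toy spectrum, nside and lmax are arbitrary choices):

```python
import numpy as np
import healpy as hp

lmax = 512
ell = np.arange(lmax + 1)
cl = np.zeros(lmax + 1)
cl[1:] = 1.0 / (ell[1:] * (ell[1:] + 1.0))   # scale invariant: D_l = const

m = hp.synfast(cl, nside=256)                # Gaussian map realization from C_l
cl_hat = hp.anafast(m, lmax=lmax)            # measured angular power spectrum
dl_hat = ell * (ell + 1) * cl_hat / (2 * np.pi)   # D_l of Eq. (12.14)
```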
Figure 14. Examples of the spherical Bessel function jl (x) (from Baumann’s Cosmology Lectures).
We project a 3d random field F (x) over a 2-sphere of radius r, centred on the origin, to form
the field f (n̂) = F (rn̂). Expanding F (x) in Fourier modes, we have
$$f(\hat{n}) = \int \frac{d^3k}{(2\pi)^3}\; F(\mathbf{k})\, e^{ikr\hat{k}\cdot\hat{n}} \qquad (12.15)$$
$$= 4\pi \sum_{lm} i^l \int \frac{d^3k}{(2\pi)^3}\; F(\mathbf{k})\, j_l(kr)\, Y^*_{lm}(\hat{k})\; Y_{lm}(\hat{n}). \qquad (12.16)$$
Here, r = |x| and jl (kr) are the spherical Bessel functions. Extracting the spherical multipoles
of f (n̂) from above, we have
$$f_{lm} = 4\pi\, i^l \int \frac{d^3k}{(2\pi)^3}\; F(\mathbf{k})\, j_l(kr)\, Y^*_{lm}(\hat{k}).$$
The spherical Bessel functions peak at kr ≈ l. This means that the observed multipoles l mainly probe spatial structure in the 3d field F (x) with wavenumber k ≈ l/r, but higher k also con-
tribute. The Bessel functions are oscillatory and need to be evaluated precisely. Examples are
plotted in Fig.14. Evaluating such Bessel function integrals numerically in cosmology is very com-
mon and there are methods developed to speed them up, in particular the FFTlog algorithm
(e.g. 1705.05022).
Now we relate the power spectrum of the projected field to the power spectrum of the 3d field:
$$\langle f_{lm} f^*_{l'm'}\rangle = (4\pi)^2\, i^l (-i)^{l'} \int \frac{d^3k}{(2\pi)^3} \int \frac{d^3k'}{(2\pi)^3}\; \langle F(\mathbf{k})\, F^*(\mathbf{k}')\rangle\; j_l(kr)\, j_{l'}(k'r)\, Y^*_{lm}(\hat{k})\, Y_{l'm'}(\hat{k}') \qquad (12.19)$$
$$= 4\pi\, i^l (-i)^{l'} \int \frac{dk\, k^2}{2\pi^2}\; P_F(k)\, j_l(kr)\, j_{l'}(kr) \int d\hat{k}\; Y^*_{lm}(\hat{k})\, Y_{l'm'}(\hat{k}) \qquad (12.20)$$
$$= 4\pi\, \delta_{ll'}\, \delta_{mm'} \int \frac{dk\, k^2}{2\pi^2}\; P_F(k)\, j_l^2(kr). \qquad (12.21)$$
The angular power spectrum is thus
$$C_l = 4\pi \int \frac{dk\, k^2}{2\pi^2}\; P_F(k)\, j_l^2(kr) \qquad (12.22)$$
$$= 4\pi \int d\ln k\; \Delta^2(k)\, j_l^2(kr), \qquad (12.23)$$
where in the last step we defined the dimensionless power spectrum as in Eq.(9.30).
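As a sketch, Eq. (12.23) can be evaluated by brute force with scipy (FFTlog would be the fast alternative); delta2 is any callable returning the dimensionless 3d power spectrum:

```python
import numpy as np
from scipy.special import spherical_jn
from scipy.integrate import simpson

def cl_projected(ell, delta2, r, kmin=1e-4, kmax=1.0, nk=200000):
    """C_l = 4 pi int dlnk Delta^2(k) j_l(kr)^2, Eq. (12.23), by direct integration."""
    lnk = np.linspace(np.log(kmin), np.log(kmax), nk)
    k = np.exp(lnk)
    jl = spherical_jn(ell, k * r)   # oscillatory: check convergence by increasing nk
    return 4.0 * np.pi * simpson(delta2(k) * jl**2, x=lnk)

# example: a constant Delta^2 (scale-invariant fluctuations)
cl10 = cl_projected(10, lambda k: 2.1e-9 * np.ones_like(k), r=14000.0)
```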
12.6 Flatsky coordinates
For an experiment that covers only a small part of the sky, spherical harmonics are not necessary.
Instead, one can use flat-sky coordinates. These coordinates are defined at the tangential surface
to the sphere at some point in the sky. In flatsky coordinates, we can use an ordinary 2-d Fourier
transform:
$$f(\hat{n}) = \int \frac{d^2l}{(2\pi)^2}\; f_\mathbf{l}\, e^{i\mathbf{l}\cdot\mathbf{x}}, \qquad (12.30)$$
with inverse Fourier transform
$$f(\mathbf{l}) = \int d^2\hat{n}\; f(\hat{n})\, e^{-i\mathbf{l}\cdot\mathbf{x}}. \qquad (12.31)$$
One can formally relate the spherical harmonics expression to the Fourier modes by taking the large-$\ell$ limit of the Legendre polynomials (see e.g. the Liddle & Lyth book, Sec. 10.3). The correspondence between the power spectra is simply
\[ C_\ell = C_\ell^{\rm flat}. \qquad (12.33) \]
To work with the flatsky approximation numerically, we need to discretize the Fourier transform in the same way as we did for the 3d field.
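As a sketch of such a discretization (conventions and helper names are my own), a naive flat-sky power spectrum can be computed from a 2D map with NumPy FFTs and radial binning in $\ell$:

import numpy as np

def flat_sky_cl(map2d, pix_size_rad, nbins=30):
    # naive flat-sky power spectrum of a 2D map via FFT and radial binning
    ny, nx = map2d.shape
    area = (ny * pix_size_rad) * (nx * pix_size_rad)
    fl = np.fft.fft2(map2d) * pix_size_rad**2      # discretized Fourier transform
    p2d = np.abs(fl)**2 / area                     # 2D power spectrum estimate
    lx = np.fft.fftfreq(nx, d=pix_size_rad) * 2 * np.pi
    ly = np.fft.fftfreq(ny, d=pix_size_rad) * 2 * np.pi
    lmod = np.sqrt(lx[None, :]**2 + ly[:, None]**2)
    bins = np.linspace(0, lmod.max(), nbins + 1)
    idx = np.digitize(lmod.ravel(), bins)
    # average |f_l|^2 in annuli; empty bins yield NaN
    cl = np.array([p2d.ravel()[idx == i].mean() for i in range(1, nbins + 1)])
    ell = 0.5 * (bins[1:] + bins[:-1])
    return ell, cl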
We now apply this machinery to the CMB. The temperature anisotropy is expanded in spherical harmonics,
\[ \Theta(\hat n) \equiv \frac{T(\hat n) - T_0}{T_0} = \sum_{\ell m} a_{\ell m} Y_{\ell m}(\hat n), \qquad (13.1) \]
and the multipoles are related to the primordial curvature perturbation $\mathcal{R}_{\mathbf{k}}$ through the transfer function $\Delta_{T\ell}(k)$,
\[ a_{\ell m} = 4\pi (-i)^\ell \int \frac{d^3k}{(2\pi)^3} \Delta_{T\ell}(k) \, \mathcal{R}_{\mathbf{k}} \, Y_{\ell m}(\hat k). \qquad (13.4) \]
Using the addition theorem we get
\[ C_\ell^{TT} = \frac{2}{\pi} \int k^2 dk \, \underbrace{P_\mathcal{R}(k)}_{\text{inflation}} \, \underbrace{\Delta^2_{T\ell}(k)}_{\text{evolution, projection}}. \qquad (13.5) \]
On large scales, the Sachs-Wolfe approximation gives $\Delta_{T\ell}(k) \approx \frac{1}{3} j_\ell(k[\tau_0 - \tau_{\rm rec}])$, so that
\[ C_\ell^{TT} = \frac{2}{9\pi} \int k^2 dk \, P_\mathcal{R}(k) \, j_\ell^2(k[\tau_0 - \tau_{\rm rec}]). \qquad (13.7) \]
This is sometimes called the “snapshot approximation” or “instantaneous recombination approx-
imation”.
On smaller scales, we need to take into account that recombination does not happen instanta-
neously, but rather in a finite time window (with a comoving width of about 10 Mpc). Further,
some CMB perturbations on large scales are also sourced at later times in the universe (in partic-
ular during reionization). To take these effects into account in the transfer functions, one needs
to do a line-of-sight integral over a "source term" $S(k, \tau)$ as follows:
\[ \Delta_{T\ell}(k) = \int_0^{\tau_0} d\tau \, S(k,\tau) \, j_\ell(k[\tau_0 - \tau]). \qquad (13.8) \]
The source term $S(k, \tau)$ comes from solving the Boltzmann equation, and the Bessel function is again due to the spherical projection. CAMB and CLASS calculate these transfer functions for us. The full details are explained in one of the famous papers of cosmology, astro-ph/9603033, which proposed the method.
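For concreteness, here is a minimal sketch of obtaining $C_\ell^{TT}$ from CAMB's Python interface (the parameter values are illustrative, not a recommendation):

import camb

pars = camb.CAMBparams()
pars.set_cosmology(H0=67.5, ombh2=0.022, omch2=0.122, tau=0.06)
pars.InitPower.set_params(As=2e-9, ns=0.965)
pars.set_for_lmax(2500, lens_potential_accuracy=1)
results = camb.get_results(pars)
# lensed spectra in muK^2; columns are TT, EE, BB, TE, rows are l
cls = results.get_cmb_power_spectra(pars, CMB_unit='muK')['total']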
The shape of the CMB temperature power spectrum (Fig. 15) can be understood in three regions:
• On the largest scales (Region I), modes re-enter the horizon only after recombination and thus have not evolved. This gives an approximately flat power spectrum in $D_\ell$. See Sec. 6.6.2 for a discussion of horizon exit and re-entry.
• Intermediate scales (Region II) are dominated by the baryon acoustic oscillations (BAO). The BAO are oscillations in the primordial photon-baryon plasma.
• On smaller scales (Region III) the primary perturbations are getting exponentially sup-
pressed due to diffusive damping (also called Silk damping). As the photons move
from over-dense to under-dense regions, they effectively smooth out the fluctuations in
the photon-baryon fluid on their typical scattering length scale. This leads to a suppression
Figure 15. The CMB temperature power spectrum (plot from Baumann’s Cosmology Lectures).
of the anisotropy in the CMB at small scales (large multipole moments). On these small
scales, secondary anisotropies due to lensing and kSZ start to dominate. We will discuss
secondary anisotropies in Sec. 17.
On all of these scales, the anisotropies are generated by three different effects, which appear
as terms in the transfer function:
• The Sachs-Wolfe (SW) effect is the largest contribution. It combines the temperature
inhomogeneity in the primordial plasma (due to the density perturbations) with the redshift
of the photons (which have to “climb out of their potential well” when they are in an
overdense region). Due to the redshift it turns out that colder CMB spots correspond to
higher density regions.
• The Doppler effect is the change in photon energy due to scattering off moving electrons.
• The Integrated Sachs-Wolfe effect (ISW) describes the additional gravitational redshift
due to the evolution of the metric potentials along the line-of-sight. This effect occurs during
radiation domination (early ISW) and during dark energy domination (late ISW). The late
ISW adds power at very low l < 10.
For a plot of the different contributions see for example Baumann’s book Fig. 7.7. As was the
case for the matter power spectrum, the CMB power spectrum is very sensitive to cosmological
parameters. For a nice illustration see Plate 4 in the review astro-ph/0110414v1.
A real measurement of the CMB differs from the idealized signal discussed so far in several ways:
• The experiment has a finite resolution, which is described by the beam.
• There are sources of noise (from the detector and e.g. the atmosphere).
• There is a finite region of the sky that is observed (described by the mask), which breaks
statistical isotropy.
• There are foregrounds such as galactic dust and synchrotron radiation, which obscure the
true CMB signal.
To infer cosmological parameters, we need to compare the theory prediction $C_\ell^{\rm theo}$ (which depends on cosmological parameters in a way determined by the laws of physics) with the data $C_\ell^{\rm obs}$. There are in principle two ways to approach this problem.
• In backward modelling, one first tries to remove the experimental effects from the data, to arrive at a reconstruction of the signal as it would have been without experimental effects. This reconstructed signal is then compared with the theory. The advantage of this approach is that one can easily compare measurements from different experiments at map level. Most of our discussion below is of this sort.
• In forward modelling, one models the experimental effects on the theory result. We would model how the theory power spectrum changes due to the experimental effects and compare this $C_\ell^{\rm theo, forward}$ to $C_\ell^{\rm obs}$. This approach has the advantage that it is easier to propagate errors and one can cleanly separate theory from data.
The observed temperature is modeled as
\[ \Theta_{\rm obs}(\hat n) = \int d\hat n' \, B(\hat n, \hat n') \, \Theta(\hat n') + n(\hat n), \]
where $\Theta(\hat n')$ is the true CMB temperature signal, $B(\hat n, \hat n')$ is the beam or point-spread function (PSF), which tells us how the detector responds to the distribution on the sky, and $n(\hat n)$ is noise, which is uncorrelated with the signal. As we can see, the beam acts as a convolution in real space. The observed $a^{\rm obs}_{\ell m}$ are then given by
\[ a^{\rm obs}_{\ell m} = \int d\Omega \, Y^*_{\ell m}(\hat n) \, \Theta_{\rm obs}(\hat n). \qquad (14.2) \]
It is often a good approximation that the beam is constant on the sky and isotropic. In this case one gets
\[ a^{\rm obs}_{\ell m} = B_\ell \, a_{\ell m} + n_{\ell m}. \qquad (14.4) \]
For a Gaussian beam, the $B_\ell$ are given by
\[ B_\ell = \exp\left( -\frac{\ell^2 \Theta_{\rm beam}^2}{2} \right), \qquad (14.5) \]
where $\Theta_{\rm beam}$ is related to the width of the beam. For small $\ell$ the beam is approximately 1 ($\ell\Theta_{\rm beam} \ll 1$), while for large $\ell$ it is approximately zero (i.e. it washes out anisotropies on these scales). The noise can often be approximated as Gaussian, in which case it is fully determined by the 2-point function
\[ \langle n_{\ell m} n^*_{\ell' m'} \rangle = N_\ell \, \delta_{\ell\ell'} \delta_{mm'}, \qquad (14.6) \]
where $N_\ell$ is called the noise power spectrum. There are various forms of noise, as we will discuss. You can often download the $B_\ell$ and $N_\ell$ of a CMB experiment such as Planck.
If the noise power spectrum is known (from measurements and modelling of the detector), the noise is Gaussian, and the beam and noise are isotropic, one can show (e.g. Dodelson 14.4.1) that the unbiased power spectrum estimator is
\[ \hat C_\ell = B_\ell^{-2} \left( \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} \left| a^{\rm obs}_{\ell m} \right|^2 - N_\ell \right). \qquad (14.7) \]
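A minimal healpy sketch of Eq. (14.7), simulating a beam-convolved sky with white noise and then debiasing the estimate (cl_theory, the input theory spectrum, is assumed given; all numbers are illustrative):

import numpy as np
import healpy as hp

nside, lmax = 256, 500
fwhm = np.radians(10.0 / 60.0)                # hypothetical 10 arcmin beam
bl = hp.gauss_beam(fwhm, lmax=lmax)
npix = hp.nside2npix(nside)
sigma_pix = 10.0                              # hypothetical white noise per pixel [muK]
nl = sigma_pix**2 * hp.nside2pixarea(nside) * np.ones(lmax + 1)   # white N_l

# simulate: beam-smoothed signal plus white noise
sky = hp.synfast(cl_theory, nside, lmax=lmax, fwhm=fwhm)
obs = sky + sigma_pix * np.random.randn(npix)

# Eq. (14.7): subtract the noise bias and deconvolve the beam
cl_hat = (hp.anafast(obs, lmax=lmax) - nl) / bl**2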
Two important examples of correlated noise are:
• Atmospheric noise, which grows on large angular scales and can be understood in terms of Kolmogorov turbulence. Atmospheric noise is correlated between pixels (but nearly uncorrelated in Fourier space, like the CMB).
• 1/f noise in the detector, which is also correlated between pixels (but nearly uncorrelated in Fourier space). It leads to a "stripy" noise pattern that depends on the scanning strategy of the experiment. This noise is important on large scales and falls roughly as $1/\ell$. It turns out that a wide variety of detectors lead to noise that grows on large angular scales with roughly a 1/f spectrum. This noise comes from fluctuations in the instrument and environment over time. One approach to reduce 1/f noise is to scan across the sky at a rate faster than the time variation of the detector noise.
We’ll illustrate these in a computational notebook from the CMB data analysis summer school
linked above.
14.2 Simple power spectrum estimator: Transfer function and bias
The naive power spectrum estimator
\[ \hat C_\ell^{\rm naive} = \frac{1}{2\ell+1} \sum_m a^*_{\ell m} a_{\ell m} \qquad (14.9) \]
will, when applied to a masked field, result in a biased measurement of the true theoretical (unmasked) power spectrum.
As a first step to improve the result, one can apodize the mask (and/or use the related technique of inpainting), which means that we smooth out the sharp boundaries of the mask. Many possible apodizations have been proposed, with different trade-offs between sensitivity loss, coupling of adjacent modes, and ringing. The apodized mask smoothly reduces the signal to zero on the boundary. This also means that by apodization we make our data periodic (since it goes to zero on all sides); aperiodic maps generate spurious power in the Fourier transform. However, even after apodization our power spectrum estimate remains biased.
Let's first discuss a simple method to obtain an unbiased measurement from the naive power spectrum of the apodized data. This approach generalizes to a more optimal method we will describe later. The naive $\hat C_\ell^{\rm naive}$ are related to the true $C_\ell$ by a transfer function $M_\ell$ (which in addition to the mask includes the beam) and a noise bias $N_\ell$ as follows:
\[ \langle \hat C_\ell^{\rm naive} \rangle = M_\ell \, C_\ell + N_\ell. \]
The transfer function can be estimated with Monte Carlo simulations:
• First generate a large number of simulations with known power spectrum and no noise.
• From the pairs of true power spectrum and measured power spectrum, calculate the transfer function $M_\ell$.
The noise bias can be computed by running noise-only simulations through the naive power spectrum estimator and computing the average power spectrum. An example of the whole procedure is given in the CMB summer school notebook on power spectrum estimation (on the flat sky).
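Schematically, and assuming a known input spectrum cl_input, a mask, a noise level sigma_pix and a data map data_map (all placeholders), the Monte Carlo procedure could look like:

import numpy as np
import healpy as hp

nside, lmax, nsims = 128, 250, 50
npix = hp.nside2npix(nside)

def naive_cl(m):
    # masked "naive" estimator, Eq. (14.9) applied to the masked map
    return hp.anafast(mask * m, lmax=lmax)

# transfer function from signal-only simulations
M_l = np.mean([naive_cl(hp.synfast(cl_input, nside, lmax=lmax)) / cl_input[:lmax + 1]
               for _ in range(nsims)], axis=0)
# noise bias from noise-only simulations
N_l = np.mean([naive_cl(sigma_pix * np.random.randn(npix))
               for _ in range(nsims)], axis=0)
cl_unbiased = (naive_cl(data_map) - N_l) / M_l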
A useful approximation is that the measured power spectrum is related to the true power spectrum simply by the sky area fraction $f_{\rm sky}$ covered by the experiment (a number between 0 and 1),
\[ \langle \hat C_\ell^{\rm naive} \rangle \approx f_{\rm sky} \, C_\ell. \]
This approximation does not take into account mode coupling due to the mask, but it does take into account the reduced survey area, and is especially useful for Fisher forecasting.
14.3 Mask and mode coupling
The effect of the mask is described by a window function $W(\hat n)$ that multiplies the true sky. In the simplest case, the mask is a discrete function, $W = 1$ for observed pixels and zero otherwise. More generally, the mask can be apodized and take smooth values between 0 and 1. The window function can be expanded in spherical harmonics as
\[ w_{\ell m} = \int d\hat n \, W(\hat n) \, Y^*_{\ell m}(\hat n). \qquad (14.12) \]
The masked multipoles are then
\begin{align}
\tilde a_{\ell m} &= \int d\hat n \, \Theta(\hat n) W(\hat n) Y^*_{\ell m}(\hat n) \qquad (14.14a) \\
&= \sum_{\ell' m'} a_{\ell' m'} \int d\hat n \, Y_{\ell' m'}(\hat n) W(\hat n) Y^*_{\ell m}(\hat n) \qquad (14.14b) \\
&= \sum_{\ell' m'} a_{\ell' m'} \, K_{\ell m \ell' m'}(W), \qquad (14.14c)
\end{align}
where Kℓmℓ′ m′ is the coupling kernel between different modes. The ãℓm are still Gaussian
variables, as they are the sum of Gaussian variables (the “true” aℓm that expand the true Θ).
However, the multipole coefficients of the temperature field on the partial sky are not independent
anymore, as the sky cut introduces the coupling represented by Eq. 14.14c.
By expanding the mask in spherical harmonics, the coupling kernel can be written as follows:
\begin{align}
K_{\ell_1 m_1 \ell_2 m_2} &= \int d\hat n \, Y_{\ell_1 m_1}(\hat n) W(\hat n) Y^*_{\ell_2 m_2}(\hat n) \qquad (14.15a) \\
&= \sum_{\ell_3 m_3} w_{\ell_3 m_3} \int d\hat n \, Y_{\ell_1 m_1}(\hat n) Y_{\ell_3 m_3}(\hat n) Y^*_{\ell_2 m_2}(\hat n) \qquad (14.15b) \\
&= \sum_{\ell_3 m_3} w_{\ell_3 m_3} (-1)^{m_2} \left[ \frac{(2\ell_1+1)(2\ell_2+1)(2\ell_3+1)}{4\pi} \right]^{1/2} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ m_1 & -m_2 & m_3 \end{pmatrix}, \qquad (14.15c)
\end{align}
where we used the Gaunt integral
\begin{align}
g^{\ell \ell' \ell''}_{m m' m''} &= \int d\Omega \, Y_{\ell m}(\hat n) Y_{\ell' m'}(\hat n) Y_{\ell'' m''}(\hat n) \qquad (14.16) \\
&= \sqrt{\frac{(2\ell+1)(2\ell'+1)(2\ell''+1)}{4\pi}} \begin{pmatrix} \ell & \ell' & \ell'' \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \ell & \ell' & \ell'' \\ m & m' & m'' \end{pmatrix}, \qquad (14.17)
\end{align}
which expresses the integral over three spherical harmonics in terms of Wigner 3j symbols. The
coupling kernel is singular and therefore Eq. 14.14c cannot be inverted to compute the true aℓm .
This makes sense as a small part of the observed sky should not allow us to reconstruct the true
entire sky.
14.4 Pseudo-Cl estimator and PyMaster
The standard approach for CMB power spectrum estimation is the pseudo-$C_\ell$ approach from astro-ph/0105302. Pseudo-$C_\ell$ are near optimal in most cases and fast to evaluate. The pseudo-$C_\ell$ approach is also, modestly, called the "MASTER" estimator (Monte Carlo Apodised Spherical Transform EstimatoR).
The cut-sky coefficients can be used to define the pseudo-$C_\ell$ power spectrum
\[ \tilde C_\ell = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} \tilde a_{\ell m} \tilde a^*_{\ell m}. \qquad (14.18) \]
From Eq. 14.14c, a relation between the true power spectrum and the pseudo-power spectrum can be derived by taking the ensemble average (in the same way as in section 14.2, but now taking into account mode coupling):
\begin{align}
\langle \tilde C_{\ell_1} \rangle &= \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \langle \tilde a_{\ell_1 m_1} \tilde a^*_{\ell_1 m_1} \rangle \qquad (14.19a) \\
&= \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \sum_{\ell_2 m_2} \sum_{\ell_3 m_3} \langle a_{\ell_2 m_2} a^*_{\ell_3 m_3} \rangle \, K_{\ell_1 m_1 \ell_2 m_2}[W] \, K^*_{\ell_1 m_1 \ell_3 m_3}[W] \qquad (14.19b) \\
&= \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \sum_{\ell_2} \langle C_{\ell_2} \rangle \sum_{m_2=-\ell_2}^{\ell_2} \left| K_{\ell_1 m_1 \ell_2 m_2}[W] \right|^2 \qquad (14.19c) \\
&= \sum_{\ell_2} M_{\ell_1 \ell_2} \langle C_{\ell_2} \rangle. \qquad (14.19d)
\end{align}
The last line, Eq. 14.19d, can be obtained by expanding the coupling kernels in spherical harmonics and making use of the orthogonality relations of the Wigner 3j symbols. The coupling matrix $M_{\ell_1\ell_2}$ is thus given by
\[ M_{\ell_1 \ell_2} = \frac{2\ell_2+1}{4\pi} \sum_{\ell_3} (2\ell_3+1) \, \mathcal{W}_{\ell_3} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ 0 & 0 & 0 \end{pmatrix}^2, \qquad (14.20) \]
where $\mathcal{W}_\ell$ is the power spectrum of the mask. The coupling matrix can be evaluated numerically without needing to run simulations. The unbiased power spectrum estimator is then
\[ \hat C_\ell = \sum_{\ell'} M^{-1}_{\ell\ell'} \, \tilde C_{\ell'}. \qquad (14.21) \]
If we observe a sufficiently large part of the sky, the coupling matrix $M_{\ell\ell'}$ is invertible. When we see only a small part of the sky, the matrix can become singular: the information on some modes is entirely lost to the masked region of the sky. In such a case, it makes sense to bin the $\ell$ into larger bins until the matrix becomes invertible again.
The state-of-the-art public implementation of the MASTER approach is called NaMaster, with Python interface pymaster; it is documented in 1809.09603. NaMaster can compute pseudo-$C_\ell$ on the full sky and in the flat-sky approximation, and also includes polarization (and foreground mode deprojection, which we have not yet discussed). The pseudo-$C_\ell$ formalism also extends in a straightforward way to
the cross-correlation of two fields (as long as we use the same mask for them). The pseudo-Cl
are then
\[ \tilde C_\ell^{ab} = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} \tilde a_{\ell m} \tilde b^*_{\ell m} \qquad (14.22) \]
for two fields a and b (such as CMB temperature and E-mode polarization).
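In code, a basic spin-0 pseudo-$C_\ell$ analysis with pymaster might look like the following sketch (mask and temp_map are assumed to be given healpy arrays; the binning choice is illustrative):

import pymaster as nmt

nside = 256
b = nmt.NmtBin.from_nside_linear(nside, nlb=16)   # bandpowers of width 16
f = nmt.NmtField(mask, [temp_map])                # spin-0 field with its mask
w = nmt.NmtWorkspace()
w.compute_coupling_matrix(f, f, b)                # the matrix M of Eq. (14.20)
cl_coupled = nmt.compute_coupled_cell(f, f)       # pseudo-C_l, Eq. (14.18)
cl = w.decouple_cell(cl_coupled)                  # unbiased bandpowers, Eq. (14.21)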
14.5 Wiener filtering
For a data set with N pixels, the direct inversion of a dense N × N covariance matrix is impossible for current CMB maps with millions of pixels. Conjugate gradient solvers are therefore usually employed to perform Wiener filtering of CMB data. But the computational costs are enormous at Planck resolution, and Wiener filtering a large ensemble of maps remains very difficult even with large computing resources. An example of a small Wiener filtered CMB map is shown in Fig. 16.
Often it can be assumed that the noise covariance N is diagonal in pixel space, i.e. the noise is assumed uncorrelated between pixels. We can represent the mask as a limiting case of anisotropic noise by taking the noise level to be infinite in masked pixels (in a code implementation, we set the corresponding entries of $N^{-1}$ to zero). On the other hand, the signal covariance matrix S is diagonal in harmonic space. Because the two matrices are not diagonal in any common basis (except in the special case of a full-sky observation without mask, where the noise is also diagonal in harmonic space), the matrix inversion is computationally hard.
The Wiener filter is the optimal reconstruction of the signal given the noise, for a Gaussian field with known power spectrum. That means it is the maximum a posteriori solution of the posterior
\[ -\log P(s|d) = \frac{1}{2} (s-d)^T N^{-1} (s-d) + \frac{1}{2} s^T S^{-1} s + {\rm const}. \qquad (14.24) \]
Minimizing with respect to $s$ gives the Wiener filter solution
\[ \hat s_{\rm WF} = \left( S^{-1} + N^{-1} \right)^{-1} N^{-1} d. \]
The Wiener filter also minimizes the mean squared error (MSE) between the true signal and the reconstructed signal. For more discussion see 1905.05846.
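The following toy 1D sketch (my own illustration, with arbitrary numbers and a roughly normalized signal draw) solves the Wiener filter equation $(S^{-1}+N^{-1})\,s = N^{-1} d$ with a conjugate gradient solver, representing the mask as pixels with $N^{-1}=0$:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 512
k = np.fft.rfftfreq(n)
S_k = 1.0 / (1e-3 + k**2)            # hypothetical signal power, diagonal in Fourier space
Ninv = np.ones(n)                     # inverse noise variance, diagonal in pixel space
Ninv[200:260] = 0.0                   # masked pixels: infinite noise, N^{-1} = 0

def apply_A(x):                       # A = S^{-1} + N^{-1}, never built as a dense matrix
    Sinv_x = np.fft.irfft(np.fft.rfft(x) / S_k, n)
    return Sinv_x + Ninv * x

rng = np.random.default_rng(0)
signal = np.fft.irfft(np.sqrt(S_k * n / 2) * rng.standard_normal(k.size), n)  # toy realization
data = signal + rng.standard_normal(n)        # unit white noise; masked pixels ignored via Ninv

A = LinearOperator((n, n), matvec=apply_A, dtype=float)
s_wf, info = cg(A, Ninv * data)       # solves (S^{-1} + N^{-1}) s = N^{-1} d

The point of the LinearOperator is that only matrix-vector products are needed, each costing one FFT pair; this is exactly why conjugate gradients scale to maps with millions of pixels.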
Based on the Wiener filtered data, one can then construct the Quadratic Maximum Likelihood (QML) estimator, which is the provably optimal power spectrum estimator. We refer to appendix B of 1909.09375 for a discussion of this estimator. It involves Wiener filtering the data and then estimating the mode coupling matrix from simulations.
Figure 16. Wiener filtering example (plot from 1905.05846). Left: true signal, Middle: Observed noisy
and masked data, Right: Wiener filtered data, which is a reconstruction of the underlying true map.
A similar project, more widely used in the large-scale structure community, is CosmoSIS (COSMOlogical Survey Inference System), https://cosmosis.readthedocs.io/.
If you want to combine, say, the Planck and DES likelihoods to sample cosmological parameters, perhaps with some extension of LambdaCDM, a practical approach is to set up this analysis in Cobaya. You should not try to analyze e.g. the Planck data directly from map level for a power spectrum analysis, since you would have to redo all the hard work of the Planck collaboration to make a reliable likelihood with correct covariance. Collaborations also release their likelihoods directly, without going through Cobaya. Sometimes these can be directly imported as Python modules; see for example the ACT CMB likelihood here: https://github.com/ACTCollaboration/pyactlike. We will look at an example script that uses Cobaya.
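A schematic Cobaya input, combining a CAMB theory code with a Planck-lite likelihood and an MCMC sampler (likelihood name, prior and fixed parameter values are illustrative; consult the Cobaya documentation for the likelihoods installed on your system):

from cobaya.run import run

info = {
    "theory": {"camb": None},
    "likelihood": {"planck_2018_highl_plik.TTTEEE_lite_native": None},
    "params": {
        "H0": {"prior": {"min": 60, "max": 80}},      # sampled parameter
        "ombh2": 0.02237, "omch2": 0.1200,            # fixed for brevity
        "As": 2.1e-9, "ns": 0.9649, "tau": 0.0544,
    },
    "sampler": {"mcmc": {"max_samples": 10000}},
}
updated_info, sampler = run(info)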
Figure 17. Generation of CMB polarization by scattering of a quadrupole anisotropy. Bold blue lines
are hotter, thin red lines are colder. Figure adapted from Dodelson-Schmidt.
I is the intensity of the light, which is proportional to the temperature. The CMB is a sum of unpolarized light ($Q = U = V = 0$) and linearly polarized light ($Q \neq 0$, $U \neq 0$, $V = 0$). Therefore a CMB experiment measures an intensity map I as well as a Q and a U map. Only exotic theories of the early universe can produce circular polarization ($V \neq 0$). The polarization fraction of the CMB is about 10%.
While T is a scalar and does not change under rotation, the quantities Q and U transform under a rotation by an angle $\psi$ as a spin-2 field, $(Q \pm iU)(\hat n) \to e^{\mp 2i\psi} (Q \pm iU)(\hat n)$. This is because they are "headless vectors", so that a 180° rotation brings them back to themselves. The harmonic analysis of $Q \pm iU$ therefore requires an expansion on the sphere in terms of tensor (spin-2) spherical harmonics:
\[ (Q \pm iU)(\hat n) = \sum_{\ell m} a_{\pm 2, \ell m} \, {}_{\pm 2}Y_{\ell m}(\hat n). \qquad (15.2) \]
While Q and U maps come naturally out of experiments, for theoretical analysis it is more convenient to work with scalar quantities. These can be obtained as follows:
\begin{align}
a_{E,\ell m} &\equiv -\frac{1}{2} \left( a_{2,\ell m} + a_{-2,\ell m} \right), \qquad (15.3) \\
a_{B,\ell m} &\equiv -\frac{1}{2i} \left( a_{2,\ell m} - a_{-2,\ell m} \right), \qquad (15.4)
\end{align}
which are the multipole coefficients of the scalar E-mode and B-mode fields:
\begin{align}
E(\hat n) &= \sum_{\ell m} a_{E,\ell m} Y_{\ell m}(\hat n), \qquad (15.5) \\
B(\hat n) &= \sum_{\ell m} a_{B,\ell m} Y_{\ell m}(\hat n). \qquad (15.6)
\end{align}
Pure E-mode fields are curl-free and pure B-mode fields are divergence free, in close analogy with
electrodynamics.
The angular power spectra are defined as before,
\[ C_\ell^{XY} \equiv \frac{1}{2\ell+1} \sum_m \langle a^*_{X,\ell m} a_{Y,\ell m} \rangle, \qquad X, Y = T, E, B. \qquad (15.7) \]
The auto power spectra are TT, EE and BB. Some of the cross power spectra are zero: although E and B are both invariant under rotations, they behave differently under parity transformations. E-modes are parity even (like temperature) and B-modes are parity odd. For this reason, in a parity invariant early universe, the cross power spectrum TE is non-zero while TB and EB are zero. Note however that secondary (non-primordial) anisotropies and foregrounds can generate non-zero TB and EB correlations.
A crucial physical insight from the late nineties (astro-ph/9609169) is that scalar (density) perturbations create only E-modes and no B-modes, while tensor (gravitational wave) perturbations create both E-modes and B-modes. For this reason, current and upcoming experiments search for primordial B-modes as a signature of primordial gravitational waves. Note however that foregrounds and gravitational lensing do generate B-modes, and these have to be cleaned out in order not to confuse them with a primordial signal.
Once we have calculated the E-mode and B-mode power spectra, which are scalars, cosmological analysis works in much the same way as for the temperature T. For example, CAMB and CLASS can calculate polarization transfer functions $\Delta_{E\ell}(k)$ and $\Delta_{B\ell}(k)$, so that the power spectrum of EE is
\[ C_\ell^{EE} = \frac{2}{\pi} \int k^2 dk \, P_\mathcal{R}(k) \, \Delta^2_{E\ell}(k), \qquad (15.8) \]
and similarly for TE and BB.
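In practice, healpy performs the spin-2 harmonic transforms for us; given (I, Q, U) maps (assumed given below), a sketch of extracting the polarization spectra is:

import healpy as hp

# anafast on a (T, Q, U) triplet returns (TT, EE, BB, TE, EB, TB)
cl_tt, cl_ee, cl_bb, cl_te, cl_eb, cl_tb = hp.anafast([t_map, q_map, u_map], pol=True)

# equivalently, map2alm with pol=True returns the (a_Tlm, a_Elm, a_Blm) coefficients
alm_t, alm_e, alm_b = hp.map2alm([t_map, q_map, u_map], pol=True)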
16 Primordial non-Gaussianity
The cosmic microwave background is the ideal probe of primordial non-Gaussianity, i.e. of
interactions (and thus correlations) between the primordial modes. This is because of the linearity
of the CMB. In the future, it may be possible to beat the CMB constraints with large-scale
structure, but this is probably at least a decade away (with the exception of so called “local non-
Gaussianity”). Good reviews on primordial non-Gaussianity are 1001.4707 (which this section
is based on) and 1003.6097. The formalism we are discussing here also generalizes to other
bispectra (i.e. 3 point correlators), including non-primordial ones and bispectra of galaxy
surveys.
16.1 Primordial bispectra
The primordial bispectrum $B_\Phi$ of the primordial potential $\Phi$ is defined by
\[ \langle \Phi(\mathbf{k}_1) \Phi(\mathbf{k}_2) \Phi(\mathbf{k}_3) \rangle = (2\pi)^3 \, \delta_D^3(\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3) \, B_\Phi(k_1, k_2, k_3). \]
Here, the delta function enforces the triangle condition, that is, the constraint that the wavevectors in Fourier space must close to form a triangle, $\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3 = 0$.
A well studied model is the local model, in which contributions from 'squeezed' triangles are dominant, that is, with e.g. $k_3 \ll k_1, k_2$. In this model, non-Gaussianity is created as
\[ \Phi(\mathbf{x}) = \phi_G(\mathbf{x}) + f_{\rm NL} \left( \phi_G^2(\mathbf{x}) - \langle \phi_G^2 \rangle \right), \]
where $\phi_G$ is a Gaussian field and $f_{\rm NL}$ is called the nonlinearity parameter. The bound on local $f_{\rm NL}$ from Planck is about $|f_{\rm NL}| \lesssim 5$. For this model one can show that
\[ B_\Phi(k_1, k_2, k_3) = 2 f_{\rm NL} \left[ P_\Phi(k_1) P_\Phi(k_2) + P_\Phi(k_2) P_\Phi(k_3) + P_\Phi(k_3) P_\Phi(k_1) \right]. \qquad (16.5) \]
A different primordial bispectrum that is often considered is the equilateral model, with shape function
\[ S^{\rm equil}(k_1, k_2, k_3) = \frac{(k_1 + k_2 - k_3)(k_2 + k_3 - k_1)(k_3 + k_1 - k_2)}{k_1 k_2 k_3}. \qquad (16.7) \]
Unlike the local model, this one peaks for equilateral triangles, so the local and equilateral models probe different kinds of correlations.
16.2 CMB bispectrum
The CMB angular bispectrum is obtained by inserting the exponential integral form of the delta function into the bispectrum definition. The last integral, over the angular part of $\mathbf{x}$, is the Gaunt integral, while $x$ is the radial conformal distance. The full bispectrum $B^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3}$ can be expressed in terms of the reduced bispectrum $b_{\ell_1 \ell_2 \ell_3}$ as
\[ B^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} = \mathcal{G}^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b_{\ell_1 \ell_2 \ell_3}. \qquad (16.13) \]
The reduced bispectrum is given by
\begin{align}
b_{\ell_1 \ell_2 \ell_3} = \left( \frac{2}{\pi} \right)^3 \int x^2 dx \int dk_1 \, dk_2 \, dk_3 \, (k_1 k_2 k_3)^2 \, B_\Phi(k_1, k_2, k_3) \qquad (16.14) \\
\times \, \Delta_{\ell_1}(k_1) \Delta_{\ell_2}(k_2) \Delta_{\ell_3}(k_3) \, j_{\ell_1}(k_1 x) j_{\ell_2}(k_2 x) j_{\ell_3}(k_3 x), \qquad (16.15)
\end{align}
which relates the primordial bispectrum, predicted by inflationary theories, to the reduced bispectrum observed in the cosmic microwave sky. This formula is the equivalent of the power spectrum relation
\[ C_\ell = \frac{2}{\pi} \int dk \, k^2 P_\Phi(k) \, \Delta^2_\ell(k). \qquad (16.16) \]
16.3 Optimal estimator for bispectra
For a full-sky observation it can be shown that the optimal estimator for $f_{\rm NL}$ is
\[ \hat f_{\rm NL} = \frac{1}{N} \sum_{\{\ell_i, m_i\}} \frac{\mathcal{G}^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b^{f_{\rm NL}=1}_{\ell_1 \ell_2 \ell_3}}{C_{\ell_1} C_{\ell_2} C_{\ell_3}} \, a_{\ell_1 m_1} a_{\ell_2 m_2} a_{\ell_3 m_3}, \qquad (16.17) \]
with normalization factor
\[ N = \sum_{\{\ell_i, m_i\}} \frac{\left( \mathcal{G}^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b^{f_{\rm NL}=1}_{\ell_1 \ell_2 \ell_3} \right)^2}{C_{\ell_1} C_{\ell_2} C_{\ell_3}}, \qquad (16.18) \]
where $b_{\ell_1 \ell_2 \ell_3}$ is the reduced bispectrum and $\mathcal{G}^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3}$ is the Gaunt integral. This estimator can be interpreted as summing up all mode triplets weighted by their expected signal-to-noise. See 1001.4707 and 1003.6097 for two different derivations of this result. This kind of estimator is called a cubic estimator, because it uses three copies of the $a_{\ell m}$ (the power spectrum estimator, on the other hand, is a quadratic estimator).
The noise and beam of the experiment can be included with the replacements
\[ C_\ell \to B_\ell^2 C_\ell + N_\ell, \qquad b_{\ell_1\ell_2\ell_3} \to B_{\ell_1} B_{\ell_2} B_{\ell_3} \, b_{\ell_1\ell_2\ell_3}, \]
where $B_\ell$ is the beam and $N_\ell$ is the noise power spectrum (constant for uncorrelated white noise). The noise is assumed to be Gaussian (which is a very good approximation because the bispectrum, if non-zero, is much smaller than the power spectrum). Including the effect of the mask is a little harder and we won't review it here; it involves adding a linear term to the cubic estimator above. Details can be found in the same reviews.
In practice, primordial bispectrum analysis proceeds as follows: theorists have come up with a large collection of theoretically motivated bispectrum templates $b_{\ell_1\ell_2\ell_3}$ (such as local and equilateral), and the bispectrum estimator has been run on all of these templates (Planck 1905.05697). While no statistically significant detection has been made, many models (or at least part of their parameter space) have been excluded in this way. Instead of running bispectrum estimators, one can also measure the bispectrum in bins (as we do for the power spectrum); all such measurements are consistent with zero.
16.4 The separability trick
I want to mention one more important aspect of bispectrum estimation, which also often occurs in cosmology. As written above, the bispectrum estimator is computationally intractable: it is a sum over six variables $\ell, m$, all of which go up to about 2500 at Planck resolution. Fortunately, the estimator can be rewritten in a much better form. Suppose the reduced bispectrum is separable, i.e. it can be written in the form
\[ b_{\ell_1 \ell_2 \ell_3} = \int x^2 dx \left[ X_{\ell_1}(x) Y_{\ell_2}(x) Z_{\ell_3}(x) + {\rm perms.} \right]. \]
In that case, using the definition of the Gaunt integral, the estimator can be rewritten as
\[ E(a) = \frac{1}{N} \int dx \, x^2 \int d\Omega_{\hat n} \, M_X(x, \hat n) M_Y(x, \hat n) M_Z(x, \hat n) + {\rm perms.}, \qquad (16.23) \]
where
\begin{align}
M_X(x, \hat n) &\equiv \sum_{\ell m} \frac{a_{\ell m} X_\ell(x)}{C_\ell} Y_{\ell m}(\hat n), \\
M_Y(x, \hat n) &\equiv \sum_{\ell m} \frac{a_{\ell m} Y_\ell(x)}{C_\ell} Y_{\ell m}(\hat n), \\
M_Z(x, \hat n) &\equiv \sum_{\ell m} \frac{a_{\ell m} Z_\ell(x)}{C_\ell} Y_{\ell m}(\hat n). \qquad (16.24)
\end{align}
By a detailed examination of the operations, one finds that this reduces the computational cost from $\mathcal{O}(\ell_{\rm max}^5)$ to $\mathcal{O}(\ell_{\rm max}^3)$ operations, which can easily be calculated in practice. This rewriting of the estimator is sometimes called a fast position space estimator (since we work with the maps M in position space rather than harmonic space). Not all theoretical shapes are separable; however, it is often possible to expand non-separable shapes into separable ones (see 0912.5516).
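As a sketch of this position-space trick (variable names are placeholders), the filtered maps of Eq. (16.24) can be built with healpy by weighting the $a_{\ell m}$ with the $\ell$-dependent factors and transforming back to pixel space:

import numpy as np
import healpy as hp

# alm: observed multipoles; Xl, Yl, Zl: separable template factors at one radius x;
# cl: power spectrum used for inverse-variance weighting (all assumed given)
def filtered_map(alm, fl, cl, nside):
    # multiply a_lm by the l-dependent weight f_l / C_l, then go to pixel space
    return hp.alm2map(hp.almxfl(alm, fl / cl), nside)

M_X = filtered_map(alm, Xl, cl, nside)
M_Y = filtered_map(alm, Yl, cl, nside)
M_Z = filtered_map(alm, Zl, cl, nside)
# the x-integrand of Eq. (16.23) is then a simple pixel-space product
integrand = np.sum(M_X * M_Y * M_Z) * hp.nside2pixarea(nside)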
17 CMB lensing
Gravitational lensing by the large-scale structure between us and the last-scattering surface remaps the CMB on the sky. Lensing:
• smooths the acoustic peaks,
• introduces non-Gaussianity,
• makes B-mode polarization by lensing E-modes; thus de-lensing is important for B-mode searches.
The lensing effect can be used to reconstruct the lensing potential, a map of the integrated
mass density of the universe on large scales. The lensing potential can be used as a probe of
cosmological parameters including neutrino masses and dark energy. An important feature of
lensing is that it probes the entire mass density (since any mass and energy gravitate in
General Relativity), while e.g. a galaxy survey only probes luminous matter. This is why lensing
is critical to probe dark matter.
My brief discussion of CMB lensing is based on the review astro-ph/0601594v4. Another good
review is 0911.0612. We will only discuss temperature, but polarization is lensed in the same
way.
The lensed temperature field is the unlensed field evaluated at a deflected position,
\[ \tilde\Theta(\hat n) = \Theta(\hat n + \boldsymbol\alpha), \]
where $\boldsymbol\alpha$ is the deflection angle. This result follows from General Relativity. At lowest order the deflection angle is a pure gradient, $\boldsymbol\alpha = \nabla\psi$. The lensing potential is defined by
\[ \psi(\hat n) \equiv -2 \int_0^{\chi_*} d\chi \, \frac{\chi_* - \chi}{\chi_* \chi} \, \Psi(\chi \hat n; \eta_0 - \chi), \qquad (17.2) \]
where $\chi$ is the comoving radial distance along the line of sight, $\chi_*$ is the comoving distance to recombination, $\eta_0$ is the conformal time today, and $\Psi$ is the Newtonian gravitational potential.
From this one can calculate the power spectrum $C_\ell^\psi$ of the lensing potential, defined by $\langle \psi_{\ell m} \psi^*_{\ell' m'} \rangle = \delta_{\ell\ell'} \delta_{mm'} C_\ell^\psi$, where
\[ C_\ell^\psi = 16\pi \int \frac{dk}{k} \int_0^{\chi_*} d\chi \int_0^{\chi_*} d\chi' \, P_\Psi(k; \eta_0 - \chi, \eta_0 - \chi') \, j_\ell(k\chi) j_\ell(k\chi') \, \frac{\chi_* - \chi}{\chi_* \chi} \, \frac{\chi_* - \chi'}{\chi_* \chi'}, \qquad (17.4) \]
which in the linear regime can be calculated by a Boltzmann solver like CAMB, depending on cosmological parameters. $P_\Psi(k; \eta_0 - \chi, \eta_0 - \chi')$ is the power spectrum of $\Psi$ between unequal times.
It is also often useful to work with the CMB convergence,
\[ \kappa(\hat n) = -\frac{1}{2} \nabla^2 \psi(\hat n), \qquad (17.5) \]
and thus, in the flat-sky approximation,
\[ \kappa(\boldsymbol\ell) = \frac{\ell^2}{2} \psi(\boldsymbol\ell). \qquad (17.6) \]
The convergence probes the integrated matter density between us and the CMB (since from the Poisson equation $\nabla^2\Psi \propto \delta\rho$, with density perturbation $\delta\rho$). A visual example of the quantities involved in CMB lensing is uploaded to the lecture files.
Going to Fourier space, one can show that to first order the lensed CMB field is given by
\[ \tilde\Theta(\boldsymbol\ell) \approx \Theta(\boldsymbol\ell) - \int \frac{d^2\ell'}{2\pi} \, \boldsymbol\ell' \cdot (\boldsymbol\ell - \boldsymbol\ell') \, \psi(\boldsymbol\ell - \boldsymbol\ell') \, \Theta(\boldsymbol\ell'). \qquad (17.10) \]
This shows that there is now mode coupling between modes $\Theta(\boldsymbol\ell)$ and $\Theta(\boldsymbol\ell')$, for a fixed lensing potential $\psi$. That means that there will be off-diagonal components in the covariance matrix of the observed CMB. We could now also derive the power spectrum $\tilde C_\ell^\Theta$ of the lensed CMB field by calculating $\langle \tilde\Theta(\boldsymbol\ell) \tilde\Theta^*(\boldsymbol\ell) \rangle$.
Averaging over realizations of the unlensed temperature field $\Theta$, to first order in the lensing potential, gives
\begin{align}
\langle \tilde\Theta(\boldsymbol\ell) \tilde\Theta^*(\boldsymbol\ell - \mathbf{L}) \rangle_\Theta &= \delta(\mathbf{L}) C_\ell^\Theta - \int \frac{d^2\ell'}{2\pi} \, \boldsymbol\ell' \cdot (\boldsymbol\ell - \boldsymbol\ell') \, \psi(\boldsymbol\ell - \boldsymbol\ell') \, \langle \Theta(\boldsymbol\ell') \Theta^*(\boldsymbol\ell - \mathbf{L}) \rangle + \ldots \\
&= \delta(\mathbf{L}) C_\ell^\Theta + \frac{1}{2\pi} \left[ (\mathbf{L} - \boldsymbol\ell) \cdot \mathbf{L} \, C^\Theta_{|\boldsymbol\ell - \mathbf{L}|} + \boldsymbol\ell \cdot \mathbf{L} \, C_\ell^\Theta \right] \psi(\mathbf{L}), \qquad (17.12)
\end{align}
where the ellipsis denotes the analogous first-order term from the second lensed field.
To estimate the lensing potential we thus want to sum over all quadratic combinations $\tilde\Theta(\boldsymbol\ell) \tilde\Theta^*(\boldsymbol\ell - \mathbf{L})$ with some weighting factor $g$ that needs to be determined:
\[ \hat\psi(\mathbf{L}) \equiv N(\mathbf{L}) \int \frac{d^2\ell}{2\pi} \, \tilde\Theta(\boldsymbol\ell) \, \tilde\Theta^*(\boldsymbol\ell - \mathbf{L}) \, g(\boldsymbol\ell, \mathbf{L}), \qquad (17.13) \]
where $g(\boldsymbol\ell, \mathbf{L})$ is the weighting function and $N(\mathbf{L})$ is a normalization. This strategy originates from astro-ph/0301031 and has been re-used for many different applications.
To find the weighting function and normalization, we impose two conditions: the estimator should be unbiased, $\langle \hat\psi(\mathbf{L}) \rangle_\Theta = \psi(\mathbf{L})$, and its variance should be minimal.
As was the case for the bispectrum, this estimator can be re-written as a fast position space
estimator.
This estimator (with minor modifications to take into account the mask and noise) is for example used in the recent ACT CMB lensing analysis (2004.01139), which also works in flat-sky coordinates. The spherical harmonics version of this estimator was used in the Planck analysis (1807.06210). However, because lensing is a non-linear operation, the quadratic estimator is not optimal in general. For existing experiments (Planck, ACT) it is optimal, but for Simons Observatory it will already be slightly sub-optimal, and for future very high resolution experiments it can be very sub-optimal. A completely optimal lensing analysis can be made with a field-level likelihood, but is computationally extremely expensive. References on this topic include 1708.06753, 1704.08230, astro-ph/0209489.
17.4 Physics with CMB lensing
Once the lensing potential is reconstructed, one can estimate its power spectrum in the usual
way and use the lensing power spectrum in a cosmological analysis to constrain parameters.
Lensing is in particular a great probe of the size of matter perturbations at later times. By
comparing the amplitude of primordial (primary) CMB perturbations with the amplitude of
late time perturbations from lensing, one can study the growth of structure in the universe.
This can for example be used to constrain neutrino masses, as the free streaming of neutrinos suppresses growth. The measured growth of structure at late times is currently an exciting topic in cosmology, with evidence for a disagreement with Lambda-CDM (see e.g. 2203.06142, 2304.05203), called the S8 tension or $\sigma_8$ tension.
Another important application of the lensing potential is to cross-correlate it with a dif-
ferent tracer of matter, such as a galaxy survey. Such cross-power spectra can also be very
sensitive to various cosmological parameters, for example local primordial non-Gaussianity (e.g.
1710.09465).
The most important scattering-induced secondary anisotropies are the Sunyaev-Zeldovich (SZ) effects:
• The thermal Sunyaev-Zeldovich (tSZ) effect is the scattering of CMB photons off hot electrons, i.e. off their thermal velocities.
• The kinetic Sunyaev-Zeldovich (kSZ) effect is the scattering of CMB photons off electrons that have a bulk (peculiar) velocity.
These effects are by far the most important SZ effects. There are however smaller effects, including the polarized SZ effect and the rotational SZ effect. The total probability of a CMB photon to be re-scattered between recombination and Earth is about 5%. This probability is related to the optical depth and the visibility function. To my knowledge there is no comprehensive review of SZ anisotropies.
In passing I want to mention that apart from lensing and electron scattering there is a class
of secondary anisotropies which come from the evolution of gravitational potentials over time.
These cause the (late time) ISW effect, the Rees-Sciama effect (also called non-linear
ISW effect) and the moving lens effect. Finally of course all these secondary effects are
combined, for example SZ anisotropies are lensed, and there can be multiple scatterings etc.
These higher order effects are not yet detectable.
Galaxy clusters are the largest objects in the universe to have undergone gravitational collapse. Their comoving size is a few Mpc, and their angular sizes range from about one arcminute to about one degree (depending on size and distance). Clusters can be detected in various ways, e.g. by galaxy surveys in the optical, by tSZ emission, or by X-ray astronomy (Bremsstrahlung emission of the electrons on the nuclei). The temperature, measured from X-rays, is typically a few keV.
The thermal SZ effect generated by a gas of electrons at temperature $T_e$ leads to a spectral distortion of the CMB emission law. The difference between the distorted CMB photon distribution $I_\nu$ and the original CMB blackbody spectrum $B_\nu(T_{\rm CMB})$ is proportional to the Compton-y parameter,
\[ y = \frac{\sigma_T k_B}{m_e c^2} \int dl \, n_e T_e, \]
where $T_e$ is the electron temperature, $m_e$ the electron mass, $c$ the speed of light, $n_e$ the electron density, and $\sigma_T$ the Thomson cross section. A multi-frequency CMB detector can measure the Compton-y map, which is sourced by the tSZ effect.
Clusters can be found blindly in a CMB map with a matched filter; schematically, the filtered map is
\[ M(\hat n) \propto \sum_{\ell m} \frac{S_\ell}{C_\ell + N_\ell} \, \Theta_{\ell m} \, Y_{\ell m}(\hat n), \]
where $S_\ell$ is the spherical harmonics transform of the radial profile of the signal $S(r)$ (e.g. the tSZ profile), $\Theta_{\ell m}$ is the CMB map, and $C_\ell + N_\ell$ is the CMB power spectrum plus the instrumental noise power spectrum. The output of the matched filter is a "heat map" of detection probabilities, which has its maxima where a tSZ source exists. A matched filter usually comes with some parameter to scan over, e.g. the radius of the profile.
Some details on the matched filter method can be found e.g. here: 2106.03718. An application of the matched filter to a completely different problem (finding primordial particle production), and a discussion of why it is optimal, can be found here: 1910.00596 (Sec. 3B).
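A minimal flat-sky matched filter sketch (my own illustration; normalization conventions vary): the data are weighted in Fourier space by the template over the total power, normalized to unit response to the profile.

import numpy as np

# inputs assumed given: data_map (2D), profile_map (2D template centered on pixel (0,0)),
# p2d (2D total power spectrum, CMB + noise, on the same FFT grid)
def matched_filter(data_map, profile_map, p2d):
    d_k = np.fft.fft2(data_map)
    s_k = np.fft.fft2(profile_map)
    filt = np.conj(s_k) / p2d                 # weight each mode by profile / total power
    norm = np.sum(np.abs(s_k)**2 / p2d)       # normalization -> unit response to the profile
    return np.real(np.fft.ifft2(filt * d_k)) * d_k.size / norm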
For tSZ sources, one often wants to understand signals at the low-mass and therefore low signal-to-noise end, where the matched filter may not be able to pick up the signal. With an external
catalogue of galaxy clusters, one can co-add the signals from objects in the external catalogue
to boost the signal to noise. This is called tSZ stacking. From the stack, one can then infer
parameters of cluster physics, such as the radial profile of the gas temperature. Stacking local
sources with an external catalogue to enhance SNR is also a generally important technique. For
example, it was recently used to detect a 21cm intensity signal with CHIME (2202.01242).
The kSZ temperature perturbation is, schematically,
\[ \frac{\Delta T(\hat n)}{T} = -\sigma_T \int dl \, n_e \, \frac{v_r}{c}, \]
where $n_e$ is the electron density and $v_r$ is the radial velocity of the structure that contains the electron (not the velocity caused by temperature). A nice application, which I developed with my collaborators, is to use this signal to reconstruct the velocity field, by making a template for $n_e$ using a survey of the galaxy density $\delta_g$. One can then write a quadratic estimator (as in the case of lensing, but here I chose to work in spherical harmonics) for the velocities, which is schematically
\[ \hat v_r(L, M) = N \sum_{\ell m \ell' m'} g(L, M, \ell, m, \ell', m') \, \Theta_{\ell m} \, \delta_g(\ell', m', z), \qquad (18.5) \]
where again we can find the weights $g(L, M, \ell, m, \ell', m')$ that deliver an unbiased minimum variance estimator. Here $z$ is the redshift of the galaxy bin. The reconstructed velocity map has similar cosmological applications as the lensing potential map. This method will be promising for Simons Observatory. More details can be found in 1707.08129 and 1810.13423. The quadratic estimator is not the only way to do cosmology with the kSZ; a review of methods can be found in 1810.13423.
Figure 18. The CMB power spectrum ClT T from primary CMB, gravitational lensing, late-time kSZ
(z < 6) and reionization kSZ. We have only shown contributions with blackbody frequency dependence.
Non-blackbody contributions (CIB, tSZ) can be mostly removed using multifrequency analysis. Note that
the kSZ from both late times and reionization is not known very precisely; the curves come from different
theoretical models or simulations. Plot from 1810.13423.
The most important Galactic foreground components are:
• Synchrotron radiation, which is emitted by relativistic cosmic ray (CR) electrons accelerated by the Galactic magnetic field.
• Thermal dust radiation is (modified) blackbody emission from interstellar dust grains with typical temperatures T ∼ 20 K.
Figure 19. CMB foreground components in temperature (left) and polarization (right). Plot from Planck:
1502.01588, see there for more details.
• Spinning dust radiation is emitted by the smallest interstellar dust grains and molecules, which can rotate at GHz frequencies.
All of these components have different spectral characteristics, which is essential for foreground cleaning. While in temperature the CMB is of similar amplitude as the foregrounds (depending on frequency), in polarization the foregrounds dominate at all frequencies. A plot of the various components compared to the primary CMB is shown in Fig. 19.
The internal linear combination (ILC) technique starts from maps $T_i$ observed in a set of frequency channels $i$, modeled as
\[ T_i = a_i s + f_i + n_i, \qquad (19.1) \]
where $s$ is the common signal that we want to estimate (such as the CMB), $f_i$ are the foregrounds in channel $i$, and $n_i$ is the noise in this channel. The coefficient $a_i$ is the frequency dependence or spectral energy distribution (SED) of the signal. This is the only physical input required for the ILC; in the case of the CMB it is the known blackbody spectrum. We also need to assume that the signal is statistically independent of the noise and foregrounds. Note that the signal does not have to be the CMB; it could also be e.g. the tSZ temperature.
The equation above is basis-independent. In the real space ILC we work in pixel space, so that
\[ T_i(\hat n) = a_i s(\hat n) + f_i(\hat n) + n_i(\hat n), \qquad (19.2) \]
and for the harmonic space ILC we use spherical harmonics,
\[ T^i_{\ell m} = a_i s_{\ell m} + f^i_{\ell m} + n^i_{\ell m}. \qquad (19.3) \]
The harmonic space version is optimal if the fields are statistically isotropic; however, galactic foregrounds are not isotropic. The real space ILC, on the other hand, can deal with statistical anisotropy but is not suited for scale-dependent behavior. Both advantages can be combined in a wavelet basis, which is localized in position space and harmonic space at the same time; this results in the Needlet ILC (NILC). NILC is one of Planck's four component separation methods.
The ILC estimate is a linear combination of the input maps,
\[ \hat s = \sum_i w_i T_i, \qquad (19.4) \]
with weights $w_i$ chosen so that $\hat s$ is unbiased and has minimum variance. This can be done as a constrained optimization using a Lagrange multiplier. The result is
\[ \mathbf{w} = \frac{\mathbf{A}^T \mathbf{C}^{-1}}{\mathbf{A}^T \mathbf{C}^{-1} \mathbf{A}}, \qquad (19.5) \]
where $\mathbf{A}$ is the vector of the SED coefficients $a_i$. The covariance matrix is estimated from the data; for example, in harmonic space we have
\[ C_{ij}(\ell) = \frac{1}{2\ell+1} \sum_m T^i_{\ell m} T^{j*}_{\ell m}. \qquad (19.6) \]
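In code, the ILC weights of Eq. (19.5) are one line of linear algebra (a sketch; C would be the per-multipole empirical covariance of Eq. (19.6)):

import numpy as np

def ilc_weights(C, a):
    # w = C^{-1} a / (a^T C^{-1} a); C: (nfreq, nfreq) covariance,
    # a: SED vector of the wanted signal (all ones for the CMB in thermodynamic units)
    Cinv_a = np.linalg.solve(C, a)
    return Cinv_a / (a @ Cinv_a)

# the cleaned map (or alm) is then the weighted sum over frequency channels:
# s_hat = np.einsum('i,i...->...', w, T_maps)   # schematic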
A more general, parametric approach writes the data as
\[ \mathbf{T} = \mathbf{A} \mathbf{s} + \mathbf{n}. \]
Here $\mathbf{T}$ is the vector of observed frequency channels, $\mathbf{A}$ is the mixing matrix (describing how each signal, such as the tSZ, projects into each frequency), and $\mathbf{s}$ is the vector of components that we want to determine. In principle we simply want to invert this equation to obtain the components $\mathbf{s}$. Because of the noise and the fact that the matrix is in general non-invertible (not even a square matrix), there is quite a range of possible solutions, reviewed in astro-ph/0702198v2, depending on the various prior assumptions that one can make. Again, there are different possible bases to work in, such as real space and harmonic space. Solving for $s(\hat n)$ can also be done with an optimizer or using MCMC sampling (e.g. Planck's Commander pipeline). One can also include external data for the various signal components, to make useful templates.
Interestingly, it is even possible to determine the components if the mixing matrix is not known, provided the components of the linear mixture can be assumed to be statistically independent.
This is possible because statistical independence is a strong mathematical property and often
a physically plausible one. This direction is called blind separation or independent com-
ponent analysis (ICA). ICA ideas are used in Planck’s SMICA pipeline (Spectral Matching
Independent Component Analysis).
Part IV
Large-Scale Structure
We now move on to 3-dimensional probes of the large-scale structure (LSS) of the universe such
as galaxy surveys. We have access to such 3-dimensional data at a much later time (redshift
z ≲ 10) than the CMB (redshift z ≃ 1100). A major complication compared to the CMB is that
matter evolves non-linearly at these later times, both due to gravitation and due to “baryonic”
physics. Further, most of the matter density δm in the universe is not directly observable. A
large part is contained in dark matter, and even most baryonic matter is contained in dilute gas
rather than luminous stars. To probe most cosmological parameters, ideally we’d like to measure
the matter power spectrum Pm (k), but we can only measure the power spectrum of tracers of
large-scale structures, such as different galaxy populations. We thus need to learn how these
tracers relate to the matter density, which can be done on large enough scales with the
bias expansion. Further, we need to take into account that in cosmology we can only measure
the redshift of galaxies, but not their absolute distance (unless they contain a standard candle).
We thus need to study the topic of redshift space distortions.
In this unit we primarily learn to analyze galaxy survey data (but other 3-dimensional probes
of the universe work almost the same). Galaxy survey data comes in two broad classes: Spec-
troscopic galaxy surveys take a spectrum of each galaxy, to obtain a precise redshift. Pho-
tometric galaxy surveys take pictures of the sky in several wavelengths, which allows a rough
determination of the redshift. The latter is much easier to do experimentally, so the galaxy
sample sizes are much larger, but on the other hand the lack of precise distances loses a lot of
information. Both survey types have different strengths. The geometry of space (dark energy)
can best be probed with spectroscopic surveys which deliver precise BAO measurements.
Further reading
The general references of Unit 1 all contain a discussion of galaxy surveys. In addition I recom-
mend
• The classic review “Large-Scale Structure of the Universe and Cosmological Perturbation
Theory” astro-ph/0112551.
• For the connection between matter and galaxies, the galaxy bias, there is the review “Large-
Scale Galaxy Bias” 1611.09787
• Specifically on EFT of LSS there are lecture notes from Senatore, Baldauf, Ivanov, and
Philcox.
20 The galaxy power spectrum at linear scales
We first start with a discussion of the linear galaxy power spectrum. We will introduce galaxy bias, shot noise and redshift space distortions, but defer a more detailed discussion to later. This section follows Dodelson-Schmidt chapter 11. Later we will extend our discussion to non-linear scales.
On large scales, the galaxy overdensity traces the matter overdensity linearly,
\[ \delta_g(\mathbf{x}, \tau) = b_1(\tau) \, \delta_m(\mathbf{x}, \tau). \]
Here $b_1$ means that this is the first order bias. The bias depends on conformal time $\tau$, or equivalently redshift $z$. We will briefly discuss the derivation of this result, as well as higher order biases, later. The bias depends sensitively on the galaxy sample considered and is in general redshift dependent. A typical galaxy bias for a survey like DESI could be $b_1 \sim 2$, i.e. the overdensities of galaxies are twice as large as those of matter.
In addition, galaxies are discrete tracers, which adds a stochastic term: $\delta_g^{\rm obs} = b_1 \delta_m + n$, where $n$ is white noise (i.e. pixels have uncorrelated noise). In terms of the power spectrum we get
\[ P_g(k) = b_1^2 \, P_m(k) + N(k), \]
where the (shot-) noise is approximately the inverse of the comoving galaxy density,
\[ N(k) = \frac{1}{\bar n_g}. \qquad (20.4) \]
Note in particular that shot noise is flat in $k$. While the $1/\bar n_g$ approximation is not very precise, especially at high halo density (where halos may not form independently of each other), the fact that the noise is flat on large scales holds to good approximation.
20.3 Velocity field on large scales
We also need to know the velocity perturbations on large scales, which are the source of redshift space distortions. On linear scales, the matter velocity and matter density perturbations are related by
\[ \mathbf{u}_m(\mathbf{k}, \tau) = f \, aH \, \frac{i\mathbf{k}}{k^2} \, \delta(\mathbf{k}, \tau). \qquad (20.5) \]
We will derive this result in Sec. 21.2.2. The factor $f$ is called the linear growth rate. The growth rate is close to unity for a ΛCDM universe and exactly 1 for a flat matter-dominated cosmology. Notice that the velocity in Fourier space is proportional to the wavevector $\mathbf{k}$.
In a galaxy survey we measure the redshift $z$ of each galaxy, not its true (not measurable) comoving distance $\chi_{\rm true}$. We define the 3-dimensional position of the galaxy in redshift space as
\[ \mathbf{x}_{\rm obs} \equiv \chi(z) \, \hat n, \]
where $\chi(z)$ is the comoving distance corresponding to redshift $z$ if $z$ were only due to the Hubble expansion. The function $\chi(z)$ depends on cosmological parameters, and we evaluate it at some fiducial cosmological parameters. The fact that these parameters are not exactly known is also important, and leads to the Alcock-Paczynski effect that we will discuss below. For now assume that the cosmological parameters are known.
In reality, galaxies do move with respect to the background frame, and their redshift is given by the Hubble flow and their peculiar velocity $\mathbf{u}$ as
\[ 1 + z = \frac{1}{a_{\rm em}} \left( 1 + \mathbf{u}_g \cdot \hat n \right) \equiv \frac{1}{a_{\rm em}} \left( 1 + u_\parallel \right), \qquad (20.8) \]
where $a_{\rm em}$ is the scale factor at which the light from the galaxy was emitted (the above is a non-relativistic approximation; galaxies don't move faster than $\sim 1\%$ of the speed of light). The observed position of the galaxy in redshift space is thus given by a correction $\Delta x_{\rm RSD}$ to the true position $\mathbf{x}$ of the galaxy:
\[ \mathbf{x}_{\rm obs} = \mathbf{x} + \frac{u_\parallel}{aH} \, \hat n \equiv \mathbf{x} + \Delta x_{\rm RSD} \, \hat n. \]
20.5 Redshift space distortions of the density field
To measure cosmological parameters, we need to be able to calculate the observed galaxy power spectrum with RSD included. On linear scales, this effect was derived by Kaiser in 1987 and leads to the Kaiser redshift term. We will only summarize the calculation; see Dodelson for more details. The starting point is the observation that, since RSD neither creates nor destroys galaxies, the densities in redshift space and configuration space must be related by
\[ n_{g,{\rm obs}}(\mathbf{x}_{\rm obs}) \, d^3x_{\rm obs} = n_g(\mathbf{x}) \, d^3x. \]
We can write the volume element in spherical coordinates as $d^3x = x^2 dx \, d\Omega$ and $d^3x_{\rm obs} = x_{\rm obs}^2 dx_{\rm obs} \, d\Omega$, where $d\Omega$ is the same in both coordinates. Therefore the densities are related by a Jacobian $J$, $n_{g,{\rm obs}}(\mathbf{x}_{\rm obs}) = J \, n_g(\mathbf{x})$, with
\[ J \equiv \frac{d^3x}{d^3x_{\rm obs}} = \frac{dx}{dx_{\rm obs}} \frac{x^2}{x_{\rm obs}^2} \qquad (20.13) \]
\[ J \approx 1 - \frac{1}{aH} \frac{\partial}{\partial x} u_\parallel. \qquad (20.14) \]
Writing the number densities in terms of perturbations, $n = \bar n (1 + \delta)$, it follows (to first order in perturbations) that
\[ 1 + \delta_{g,{\rm obs}}(\mathbf{x}_{\rm obs}) = 1 + \delta_g(\mathbf{x}[\mathbf{x}_{\rm obs}]) - \frac{1}{aH} \frac{\partial}{\partial x} u_\parallel(\mathbf{x}[\mathbf{x}_{\rm obs}]). \qquad (20.15) \]
We now have the building blocks needed to calculate the galaxy power spectrum. We first note that in the above equation we can set $\mathbf{x}_{\rm obs} = \mathbf{x}$ at lowest order in the perturbations; expanding the arguments of $\delta_g$ and $u$ would only lead to higher order terms. We also use linear galaxy bias to express $\delta_g$ in terms of $\delta_m$. We can further set the galaxy velocity $\mathbf{u}_g$ equal to the matter velocity $\mathbf{u}_m$; physically this is because the velocities are sourced by the gravitational attraction of all the matter in the universe, not just that of the galaxies. With these approximations we get
\[ \delta_{g,{\rm RSD}}(\mathbf{x}) = b_1 \delta_m(\mathbf{x}) - \frac{\partial}{\partial x} \frac{\mathbf{u}_m(\mathbf{x}) \cdot \hat x}{aH}. \qquad (20.16) \]
Next we introduce the distant observer approximation, also called the plane parallel approximation. The idea is to take the line of sight $\hat x$ to coincide with the z-axis and treat it as fixed, neglecting changes from galaxy to galaxy. This is justified for galaxies that are relatively nearby on the sky. We can then replace $\mathbf{u}_m(\mathbf{x}) \cdot \hat x \to \mathbf{u}_m(\mathbf{x}) \cdot \hat e_z$. Using the distant observer approximation, we can evaluate the Fourier transform $\delta_{g,{\rm RSD}}(\mathbf{k})$ as follows:
\[ \delta_{g,{\rm RSD}}(\mathbf{k}) = \int d^3x \, e^{-i\mathbf{k}\cdot\mathbf{x}} \left[ b_1 \delta_m(\mathbf{x}) - \frac{\partial}{\partial x} \frac{\mathbf{u}_m(\mathbf{x}) \cdot \hat e_z}{aH} \right], \qquad (20.17) \]
which can be evaluated, using Eq. 20.5 for the velocities, to give
\[ \delta_{g,{\rm RSD}}(\mathbf{k}) = \left( b_1 + f \mu_k^2 \right) \delta_m(\mathbf{k}), \]
where $\mu_k = \hat e_z \cdot \hat k$ is the cosine of the angle between the line of sight and the wavevector of the perturbation. This is called the Kaiser redshift space distortion. The apparent overdensity in redshift space is thus larger than in configuration space (except for transverse perturbations, where $\mu_k = 0$).
Since the power spectrum now depends on $\mu_k$, it is convenient to expand it in multipoles,
\[ P^{(\ell)}_{g,{\rm RSD}}(k) = \frac{2\ell+1}{2} \int_{-1}^{1} d\mu_k \, P_\ell(\mu_k) \, P_{g,{\rm RSD}}(k, \mu_k), \qquad (20.20) \]
using Legendre polynomials (as appropriate for an azimuthally symmetric function). The power spectrum is then
\[ P_{g,{\rm obs}}(k, \mu_k) = \sum_\ell P_\ell(\mu_k) \, P^{(\ell)}_{g,{\rm obs}}(k). \qquad (20.21) \]
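Because the Kaiser $\mu$-dependence factorizes from $P_{\rm lin}(k)$, the multipoles of Eq. (20.20) are just numerical coefficients times the linear spectrum. A quick quadrature check (b1 and f are illustrative numbers):

import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import eval_legendre

b1, f = 2.0, 0.8
mu, wts = leggauss(16)          # Gauss-Legendre nodes and weights on [-1, 1]

def kaiser_coeff(ell):
    # (2l+1)/2 * Integral dmu P_l(mu) (b1 + f mu^2)^2, cf. Eq. (20.20)
    kernel = (2 * ell + 1) / 2 * eval_legendre(ell, mu) * (b1 + f * mu**2)**2
    return np.sum(wts * kernel)

# monopole check: analytic value is b1^2 + 2 b1 f / 3 + f^2 / 5
print(kaiser_coeff(0), b1**2 + 2*b1*f/3 + f**2/5)
# P_l(k) = kaiser_coeff(l) * P_lin(k) for l = 0, 2, 4; higher multipoles vanish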
The fiducial cosmology used to convert redshifts into distances is in general not exactly the true one; this distance error is the Alcock-Paczynski effect mentioned above. One can again propagate this error through the Jacobian, as we did in our derivation of RSD. This leads to an additional anisotropy of the measured power spectrum. The derivation can be found in Dodelson-Schmidt 11.1.3.
To connect to angular measurements, we select galaxies in a redshift bin with a radial window function $W(\chi)$, normalized to unity, which drops to zero outside of the bin. The angular galaxy density in the bin is then given by
\[ \Delta_g(\hat n) = \int_0^\infty d\chi \, W(\chi) \, \delta_{g,{\rm obs}}(\mathbf{x} = \hat n \chi, \tau = \tau_0 - \chi). \qquad (20.24) \]
Its angular power spectrum is given by
\[ C_g(\ell) = \frac{2}{\pi} \int k^2 dk \int_0^\infty d\chi \, W(\chi) j_\ell(k\chi) \int_0^\infty d\chi' \, W(\chi') j_\ell(k\chi') \, P_{g,{\rm obs}}(k, \tau(\chi), \tau(\chi')). \qquad (20.27) \]
Note that this includes a non-equal time power spectrum, which takes into account the different times probed along the light-cone. In the so-called Limber approximation this becomes
\[ C_g(\ell) = \int \frac{d\chi}{\chi^2} \, W^2(\chi) \, P_{g,{\rm obs}}\!\left( k = \frac{\ell + 1/2}{\chi}, \, \tau(\chi) \right). \qquad (20.28) \]
This approximation avoids evaluating the Bessel function integrals and is used in many cosmology papers. The Limber approximation is valid if the radial extent of the bin is much larger than the physical scale corresponding to the multipole $\ell$ under consideration. More about the accuracy of the Limber approximation can be found in 0809.5112.
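A minimal Limber sketch (helper names are my own; for brevity the power spectrum is taken to be time-independent, with pk_of_k a given callable interpolator):

import numpy as np
from scipy.integrate import simpson

def limber_cl(ell, chi, W, pk_of_k):
    # Eq. (20.28): chi is a comoving distance grid, W the normalized radial window on it
    k = (ell + 0.5) / chi
    integrand = W**2 / chi**2 * pk_of_k(k)
    return simpson(integrand, x=chi)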
21 Overview of LSS Perturbation Theory
Figure 20. The standard deviation of the density field, Eq. (21.3), when smoothed over different scales R,
where R is the width of the smoothing filter in position space, at redshift z = 0. The value at R = 8h−1 Mpc
is the definition of the common cosmological parameter σ8 .
A useful diagnostic of non-linearity is the density field smoothed on a scale R,
\[ \delta_W(\mathbf{x}) = \int d^3x' \, W(\mathbf{x} - \mathbf{x}') \, \delta(\mathbf{x}'), \]
where $W(\mathbf{x})$ is the filtering kernel, which we can take to be isotropic. This filtering corresponds to a multiplication in Fourier space,
\[ \delta_W(\mathbf{k}) = W(k) \, \delta(\mathbf{k}), \]
where $W(k)$ is the Fourier transform of the isotropic filtering kernel, such as a real-space tophat.
The variance of the filtered field is
\begin{align}
\sigma_W^2 \equiv \langle \delta_W^2(\mathbf{x}) \rangle &= \int \frac{d^3k}{(2\pi)^3} \int \frac{d^3k'}{(2\pi)^3} \langle \delta_W(\mathbf{k}) \delta^*_W(\mathbf{k}') \rangle \, e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{x}} \qquad (21.3) \\
&= \int \frac{d^3k}{(2\pi)^3} P_L(k) \, |W(k)|^2 \qquad (21.4) \\
&= \frac{1}{2\pi^2} \int d\ln k \, k^3 P_L(k) \, |W(k)|^2, \qquad (21.5)
\end{align}
which is plotted in Fig. 20 as a function of the smoothing scale. We can read off on which scales the perturbations become smaller than unity, indicating that a perturbative expansion in $\delta_k$ is possible there.
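As a sketch, Eq. (21.5) with a real-space tophat window directly gives $\sigma_W(R)$, and hence $\sigma_8$, from a tabulated linear spectrum (the arrays k, Pk are assumed given):

import numpy as np
from scipy.integrate import simpson

def sigma_R(k, Pk, R):
    # sigma_W(R) of Eq. (21.5) for a real-space tophat of radius R
    x = k * R
    W = 3 * (np.sin(x) - x * np.cos(x)) / x**3    # Fourier transform of the tophat
    integrand = k**3 * Pk * W**2 / (2 * np.pi**2)
    return np.sqrt(simpson(integrand, x=np.log(k)))

# sigma_8 is then sigma_R(k, Pk, 8.0) for a z = 0 linear spectrum in Mpc/h units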
21.2.1 Equations of motion
To describe the universe as a fluid we need the following variables
• δ(x, τ): Overdensity of matter, related to the density ρ(x, τ) by δ(x, τ) = ρ(x, τ)/ρ̄(τ) − 1
• v(x, τ ): Fluid velocity. Note that in the fluid approximation we cannot describe a situation
where matter clumps of different velocity pass through each other.
• σij (x, τ ): Viscous stress tensor. σij = 0 for a perfect fluid, which we consider here, but it
becomes important in the EFTofLSS.
In the collisionless fluid approximation we consider the total matter distribution, dark matter
and baryons, together.
The equations of motion, which can be derived from the collisionless Boltzmann equation in the Newtonian limit, are
\begin{align}
\dot\delta + \nabla \cdot \left[ (1+\delta) \mathbf{v} \right] &= 0, \\
\dot{\mathbf{v}} + \mathcal{H} \mathbf{v} + (\mathbf{v} \cdot \nabla) \mathbf{v} &= -\nabla\Phi, \\
\nabla^2 \Phi &= \frac{3}{2} \mathcal{H}^2 \Omega_m(\tau) \, \delta.
\end{align}
Here and below dots indicate derivatives with respect to conformal time, and $\mathcal{H} = aH$ is the comoving Hubble parameter. The Euler equation is the equivalent of F = ma for a fluid element: the left hand side is the "convective time derivative", and the right hand side has a force term due to the gravitational potential and a friction term due to the Hubble expansion.
One can solve these equations perturbatively on scales where the perturbations are small, so that the perturbative expansion converges.
Linearizing (dropping all terms quadratic in perturbations) yields the following equations for the first-order fields $\delta_1$, $\mathbf{v}_1$:
\[ \dot\delta_1 + \nabla \cdot \mathbf{v}_1 = 0, \qquad \dot{\mathbf{v}}_1 + \mathcal{H} \mathbf{v}_1 = -\nabla\Phi. \]
Combining them gives a second-order equation for $\delta_1$, whose growing-mode solution is
\[ \delta_1(\mathbf{x}, \tau) = D(\tau) \, \delta_L(\mathbf{x}), \]
where $\delta_L(\mathbf{x})$ is the linear density field set by inflation (and k-dependent transfer functions that take into account mode evolution in the early universe, see Sec. 9.4.1). This drops a "decaying mode". The growth factor is given by the integral solution
\[ D(\tau) = D_0 \, \mathcal{H}(\tau) \int_0^{a(\tau)} \frac{da'}{\mathcal{H}^3(a')}, \qquad (21.14) \]
and the linear growth rate is
\[ f(\tau) \equiv \frac{d\log D(\tau)}{d\log a}. \qquad (21.15) \]
We see that densities evolve according to $D(\tau)$, while velocities are enhanced by a factor of $\mathcal{H}(\tau) f(\tau)$.
Switching to Fourier space, and writing $\theta \equiv \nabla \cdot \mathbf{v}$ for the velocity divergence, we obtain
\[ \delta_1(\mathbf{k}, \tau) = D(\tau) \, \delta_L(\mathbf{k}), \qquad \theta_1(\mathbf{k}, \tau) = -\mathcal{H}(\tau) f(\tau) D(\tau) \, \delta_L(\mathbf{k}), \]
and
\[ \langle \delta_L(\mathbf{k}) \delta_L(\mathbf{k}') \rangle' = P_L(k), \]
where $P_L(k)$ is the power spectrum of the initial conditions. We have dropped a momentum-conserving Dirac delta function (indicated by the prime in the expectation value $\langle\rangle'$, as is often done).
At higher orders, one solves the equations of motion perturbatively in the fields $\delta$ and $\theta$. Explicitly, we begin with the series solutions
\[ \delta(\mathbf{k}, \tau) = \sum_{n=1}^{\infty} D^n(\tau) \, \delta^{(n)}(\mathbf{k}), \qquad (21.20) \]
\[ \theta(\mathbf{k}, \tau) = -\mathcal{H}(\tau) f(\tau) \sum_{n=1}^{\infty} D^n(\tau) \, \theta^{(n)}(\mathbf{k}), \qquad (21.21) \]
where the n-th order solution contains n copies of the linear solution, δ (1) (k) = δL (k). We
have assumed separability in time and space which is an excellent approximation (and exact
for Einstein de-Sitter universes), though deviations can occur at high order. The n-th order
contribution takes the form:
\[ \delta^{(n)}(\mathbf{k}) = \int_{\mathbf{p}_1 \ldots \mathbf{p}_n} F_n(\mathbf{p}_1, \ldots, \mathbf{p}_n) \, \delta^{(1)}(\mathbf{p}_1) \cdots \delta^{(1)}(\mathbf{p}_n) \, (2\pi)^3 \delta_D(\mathbf{p}_1 + \ldots + \mathbf{p}_n - \mathbf{k}), \qquad (21.22) \]
\[ \theta^{(n)}(\mathbf{k}) = \int_{\mathbf{p}_1 \ldots \mathbf{p}_n} G_n(\mathbf{p}_1, \ldots, \mathbf{p}_n) \, \delta^{(1)}(\mathbf{p}_1) \cdots \delta^{(1)}(\mathbf{p}_n) \, (2\pi)^3 \delta_D(\mathbf{p}_1 + \ldots + \mathbf{p}_n - \mathbf{k}). \qquad (21.23) \]
This is a convolution of n linear density fields with a kernel, $F_n$ or $G_n$. The kernels up to second order are given by:
\[ F_1 = 1, \qquad F_2(\mathbf{p}_1, \mathbf{p}_2) = \frac{5}{7} \alpha(\mathbf{p}_1, \mathbf{p}_2) + \frac{2}{7} \beta(\mathbf{p}_1, \mathbf{p}_2), \qquad (21.24) \]
\[ G_1 = 1, \qquad G_2(\mathbf{p}_1, \mathbf{p}_2) = \frac{3}{7} \alpha(\mathbf{p}_1, \mathbf{p}_2) + \frac{4}{7} \beta(\mathbf{p}_1, \mathbf{p}_2), \qquad (21.25) \]
where (with symmetrization over the momenta implied)
\[ \alpha(\mathbf{p}_1, \mathbf{p}_2) = \frac{\mathbf{p}_1 \cdot \mathbf{k}}{p_1^2}; \qquad \beta(\mathbf{p}_1, \mathbf{p}_2) = \frac{k^2 \, \mathbf{p}_1 \cdot \mathbf{p}_2}{2 p_1^2 p_2^2}; \qquad \mathbf{k} = \mathbf{p}_1 + \mathbf{p}_2. \qquad (21.26) \]
The power spectrum then receives loop corrections,
\[ P(k) = P^{(11)}(k) + P^{(22)}(k) + 2 P^{(13)}(k) + \ldots, \]
where $P^{(ij)}(k) = \langle \delta^{(i)}(\mathbf{k}) \delta^{(j)}(-\mathbf{k}) \rangle'$, and we have assumed Gaussian initial conditions, such that any correlator involving an odd number of linear density fields vanishes. In the same way we can compute higher-order correlators. The next most important one is the three-point function, or bispectrum, which at lowest (tree) order is given by
\[ B(k_1, k_2, k_3) = 2 F_2(\mathbf{k}_1, \mathbf{k}_2) P_L(k_1) P_L(k_2) + 2\ {\rm perms.}, \]
with higher-order contributions containing loop integrals over the linear power spectrum.
21.3 Lagrangian Perturbation theory (LPT)
There is a second important way to perform perturbation theory, in a different set of variables.
This is the Lagrangian formulation. Let’s briefly outline this approach. One can describe a fluid
in two ways:
• In the Eulerian picture described above, we describe the matter density ρ(x, t) and the
velocity field v(x, t) as a function of a fixed spatial coordinate x.
• In the Lagrangian picture, instead of working with densities, we describe the movement of
particles (or fluid elements) from their initial comoving coordinate q to their later comoving
Eulerian coordinate x by defining the displacement field Ψ(q, τ ) so that
x(τ ) = q + Ψ(q, τ ).
All coordinates are comoving, so the expansion of the Universe does not change them. Note
that Ψ = 0 initially so that q is the same as the usual comoving coordinate at initial time,
τ = 0. Once we have calculated the displacement field, using Lagrangian perturbation
theory, we can estimate the observable density field ρ(x, t) from it.
Lagrangian perturbation theory looks similar to SPT, i.e. we can calculate a series solution of the form
\[ \Psi(\mathbf{q}, \tau) = \sum_{n=1}^{\infty} D^n(\tau) \, \Psi^{(n)}(\mathbf{q}). \qquad (21.29) \]
As in the Eulerian case, the n-th order solution can be written as a convolution over n copies of the linear density field $\delta_L$:
\[ \Psi^{(n)}(\mathbf{k}, \tau) = \frac{i}{n!} \int_{\mathbf{p}_1 \ldots \mathbf{p}_n} \mathbf{L}_n(\mathbf{p}_1, \ldots, \mathbf{p}_n) \, \delta_L(\mathbf{p}_1) \cdots \delta_L(\mathbf{p}_n) \, (2\pi)^3 \delta_D(\mathbf{p}_1 + \ldots + \mathbf{p}_n - \mathbf{k}). \qquad (21.30) \]
However, the integrals over the kernels are in general harder to evaluate than those of SPT.
Some comments on the relation of Eulerian and Lagrangian PT:
• The first-order LPT solution, called the Zeldovich approximation, and its second-order extension, called 2-LPT, are remarkably good at reproducing the full non-linear density field on intermediate scales. They substantially outperform the first and second order SPT solutions. However, by including so-called IR resummation one can improve SPT, and ultimately both Eulerian and Lagrangian perturbation theory give equivalent results (see e.g. Senatore's EFTofLSS lecture notes).
• The Zeldovich approximation (1-LPT) and 2-LPT are used to set up initial conditions for
N-body simulations. N-body simulations track particles, so the use of Lagrangian particle
displacements makes intuitive sense. The reason why N-body simulations need perturbation
theory is to set up initial particle displacements (of equal mass particles) that incorporate
the initial inhomogeneities from inflation, as well as to speed up computation time by
treating small density fluctuations analytically until they grow sufficiently to require N-
body simulation.
Figure 21. Plots showing a comparison of N-Body data (black boxes) with theoretical SPT power spectra
at tree level (dotted), one loop (solid red), and two loop (dashed blue) orders. The left and right plots
show the comparison at redshifts 0 and 1 respectively. Each curve has been divided by the no-wiggle
(broadband) power spectrum for clarity of range. The plots are taken from 0905.0479.
• Observations only provide us with Eulerian densities, since we cannot look back in time to
observe the movement of a chunk of matter to its initial position. Observations are thus
closer to Eulerian theory. However N-body simulations readily provide both displacement
fields and densities.
The dual description of structure formation in the Eulerian and Lagrangian picture continues
to be important even for machine learning based methods. For example, a neural network struc-
ture formation emulator can either be trained to output Eulerian density fields ρ(x) or to output
the displacement field Ψ of particles, and indeed both have been tried.
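As a sketch connecting the two pictures, the Zeldovich (1-LPT) displacement of the previous section can be computed from a linear density grid with a few FFTs (grid and box size are placeholders):

import numpy as np

def zeldovich_displacement(delta_lin, boxsize):
    # first-order LPT: Psi(k) = (i k / k^2) * delta_L(k)
    n = delta_lin.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid division by zero at k = 0
    dk = np.fft.fftn(delta_lin)
    dk[0, 0, 0] = 0.0                      # remove the mean (zero mode)
    psi = [np.real(np.fft.ifftn(1j * ki / k2 * dk)) for ki in (kx, ky, kz)]
    return np.stack(psi, axis=-1)          # shape (n, n, n, 3)

Moving particles from their grid positions q to q + Psi(q) and depositing them on a mesh then gives the (approximate) non-linear Eulerian density field.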
Figure 22. Top panel: z = 0 matter overdensity power spectrum in a 1D CDM-like model calculated analytically using linear theory, the Zeldovich approximation (LPT at any order), and SPT to the specified order in the overdensity. Note that even 20-loop SPT does not perform any better than two-loop SPT. Figure taken from 1502.07389.
the universe is close to a perfect fluid and traces the initial conditions very well (which implies minimal mode mixing). However, linear SPT begins to fail at scales ∼ O(0.01) h/Mpc at z = 0. One might consider adding the next-order terms in PT, the one-loop terms P^(13) and P^(22), to our theoretical fit to improve the range of SPT. This is shown by the solid red curve in Fig. 21. Clearly, one-loop SPT does not improve the fit much over linear theory. One may then be tempted to add even higher loop orders to improve the fit; the dashed blue curve shows the performance of SPT up to two loops. Interestingly, adding higher-order loops does not improve the fit: the fitted curve oscillates around the true data points. Carrying this exercise to very high loop order reveals the same pattern, as illustrated in Fig. 22. Hence, we deduce that SPT fails to fit the nonlinear matter power spectrum even on scales k ≪ k_NL(z = 0) ≡ 0.3 h/Mpc. Therefore, SPT needs to be improved.
When deriving SPT, the solution to the nonlinear coupled equations (Euler, Poisson and continuity) for the matter overdensity contrast δ(k) in Fourier space was given in terms of corrections to the linear solution δ^(1),

\delta(k, \tau) = \sum_{n=1}^{\infty} D^n(\tau)\, \delta^{(n)}(k), \qquad \delta^{(n)}(k) = \int_{p_1 \dots p_n} F_n(p_1, \dots, p_n)\, \delta^{(1)}(p_1) \cdots \delta^{(1)}(p_n)\, (2\pi)^3 \delta_D(p_1 + \dots + p_n - k),

with F_n(..) the symmetrized kernel of the nth-order solution.
The above expansion hinges on the assumption of perturbativity, which requires that each nth-order correction be smaller than the (n−1)th-order term. This is needed for the perturbative solution to exist in the first place. However, as we will show below, the loop terms inherently contain contributions from internal momenta (or modes) where our perturbation theory is bound to break down. This lack of a clear small expansion parameter is the prime reason for the failure of SPT. To this end, consider the one-loop term P^(13):
P^{(13)}(k) = \langle \delta^{(1)}_k \delta^{(3)}_p \rangle' + \langle \delta^{(3)}_k \delta^{(1)}_p \rangle'   (22.3)

= 6\, P^{(11)}(k) \int \frac{d^3 q}{(2\pi)^3}\, F_3(\vec{k}, \vec{q}, -\vec{q})\, P^{(11)}(q)   (22.4)

where P^{(11)}(k) = \langle \delta^{(1)}_k \delta^{(1)}_p \rangle' is the linear matter power spectrum and the prime denotes that we have absorbed the Dirac delta function and the factor of (2\pi)^3. On very large scales, i.e. in the limit k \to 0, the kernel F_3 \to k^2/q^2. Hence, we find that the UV (k/q \to 0) limiting behavior of P^{(13)} is

\lim_{k \to 0} P^{(13)}(k) \approx -\frac{61}{630\pi^2}\, k^2 P^{(11)}(k) \int_0^{\infty} dq\, P^{(11)}(q).   (22.5)
The integral in the above expression runs over all internal momenta q, and hence the integrand P^(11)(q) is evaluated on very small scales. This raises serious concerns, as the linear power spectrum P^(11) is not valid on scales beyond ∼ k_NL, and yet we are summing over all scales down to those of individual galaxies, stars, planets and even dust!
The concern about summing internal momenta over very small scales leads to a related issue with SPT. Suppose the linear power spectrum can be approximated by a power law,

P^{(11)}(k) \propto k^n.   (22.6)

This is an excellent approximation on very large and on quasi-linear scales, where n ≈ 1 and n ≈ −1.5 respectively. These values are reflective of our universe, where the initial conditions are dominantly adiabatic. Upon substituting the power-law linear power spectrum into the UV limiting integral for P^(13), we find that the integral diverges if n ≥ −1. Fortunately, for adiabatic initial conditions the spectral index n → −3 as k → ∞, and hence the integral converges. However, in models extending beyond the pure adiabatic assumption, such as those featuring a small fraction of strongly blue-tilted (n > −1) primordial isocurvature fluctuations, the integral diverges. Consequently, SPT fails to meet the general requirement of applicability to arbitrary initial conditions (see 1301.7182 for details). A numerical illustration of this cutoff sensitivity is sketched below.
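The sensitivity to the upper limit of the internal-momentum integral can be checked numerically. A minimal sketch, assuming a pure power-law P^(11)(q) ∝ q^n:

import numpy as np

def uv_integral(n, lam, q_min=1e-4):
    # int dq q^n up to the cutoff lam; this is the UV-sensitive part of P13
    q = np.logspace(np.log10(q_min), np.log10(lam), 2000)
    return np.trapz(q**n, q)

for n in (-1.5, -1.0, -0.25):
    vals = [uv_integral(n, lam) for lam in (1.0, 10.0, 100.0)]
    print(f"n = {n:+.2f}: integral at Lambda = 1, 10, 100 ->", np.round(vals, 2))

For n = −1.5 the result saturates as the cutoff grows, while for n ≥ −1 it keeps increasing with the cutoff, illustrating the divergence discussed above.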
In the next subsection, we will show how these problems can be ameliorated by the EFTofLSS
formalism.
The key problem in SPT was the evaluation of loop integrals over scales where the internal propagator (the linear power spectrum) is known to break down. Hence, a straightforward solution is to regulate these integrals by evaluating them up to a finite cutoff scale Λ. Similar to ‘cutoff regularization’ in QFT, we can evaluate the UV limit of the one-loop term P^(13) up to a scale Λ as

\lim_{k \to 0} P^{(13)}(k, \Lambda) \sim k^2 P^{(11)}(k) \int_0^{\Lambda} dq\, P^{(11)}(q).   (22.7)
By choosing Λ ≪ k_NL we are guaranteed that the integrand P^(11)(q) is evaluated on perturbative scales. This seemingly simple solution has an inherent problem: by choosing an arbitrary cutoff scale Λ, we have made our final SPT predictions Λ-dependent. This is easy to see for two choices of cutoff scale Λ₁ < Λ₂ ≪ k_NL, for which

P^{(13)}(k, \Lambda_2) = P^{(13)}(k, \Lambda_1) + 6\, P^{(11)}(k) \int_{\Lambda_1}^{\Lambda_2} \frac{d^3 q}{(2\pi)^3}\, F_3(\vec{k}, \vec{q}, -\vec{q})\, P^{(11)}(q).   (22.8)

In the limit k ≪ Λ₁, we can approximate the integral in the above expression using the UV limit derived earlier. Hence we obtain

P^{(13)}(k, \Lambda_2) = P^{(13)}(k, \Lambda_1) - \text{const} \times k^2 P^{(11)}(k) \int_{\Lambda_1}^{\Lambda_2} dq\, P^{(11)}(q)   (22.9)

= P^{(13)}(k, \Lambda_1) - k^2 P^{(11)}(k)\, \left[ f(\Lambda_2) - f(\Lambda_1) \right],   (22.10)

where f(Λ) ∝ ∫^Λ dq P^(11)(q) collects the cutoff dependence.
However, our data points, whether from an N-body simulation or from observations, are Λ-independent. So although we have made a positive step towards resolving the failure of SPT, we have introduced an arbitrary scale into our theory: the cutoff may be physically motivated, but the resulting Λ-dependent prediction is not an accurate description of the data.
The cutoff regularization procedure suggests that SPT can be explicitly restricted to scales k < Λ, where Λ is some coarse-graining scale. Hence, we must look for a new ‘effective’ theory that applies to the perturbative long-wavelength modes. This is reminiscent of the effective field theory (EFT) approach in QFT. In the EFT picture, we define the partition function Z of our theory as

Z = \int \mathcal{D}\phi\, e^{iS[\phi]}   (22.11)

where S[\phi] is the action, a functional of the field ϕ. EFT hinges on the argument that to describe the low-energy regime of field configurations at k ≪ Λ, we do not need the high-momentum field modes. To visualize this, consider a complete UV theory in which we can split the underlying field into long and short modes, ϕ = ϕ_l + ϕ_s. Thus,

Z = \int \mathcal{D}\phi\, e^{iS[\phi]} = \int \mathcal{D}\phi_l\, \mathcal{D}\phi_s\, e^{iS[\phi_l, \phi_s]}.   (22.12)
Since the partition function must remain the same in both descriptions, integrating out the short modes defines a new action, e^{iS_{\rm eff}[\phi_l]} \equiv \int \mathcal{D}\phi_s\, e^{iS[\phi_l, \phi_s]}. This S_eff[ϕ_l] yields a low-energy effective theory, obtained by integrating out the ultraviolet (UV) modes and applying a rescaling, and it differs from S[ϕ]. It is important to note that the new low-energy effective action S_eff[ϕ_l] may incorporate residual effects from the small-scale modes. The feedback of these small-scale modes on the large scales forms the essence of the success of the Effective Field Theory of Large-Scale Structure (EFTofLSS) formalism.
Similar to the above discussion, we propose that the matter overdensity field δ can be broken into long and short wavelength modes, where the long-wavelength modes are chosen such that they are perturbative. This is achieved in principle by smoothing the matter overdensity field δ(x) over a smoothing radius R ∼ Λ⁻¹. The smoothing procedure integrates out all short-scale (x < R, or k > Λ) information. Hence, we define a new EFT of LSS in terms of the smoothed field variables [δ]_Λ, [π]_Λ and [ϕ]_Λ, where [O]_Λ denotes the smoothing of an operator O and π = ρv is the momentum density.
Here, δ and v are the DM number-density fluctuation and peculiar velocity field respectively. We can construct the equations of motion for an effective fluid by coarse-graining the fluid equations with a smoothing window function; the smoothing guarantees that the Boltzmann hierarchy can be truncated, leaving us with an effective fluid. We define our isotropic smoothing window function W_Λ(x̄, x̄′) as a Gaussian in the radial separation r = |x̄ − x̄′| with smoothing radius Λ⁻¹,

W_\Lambda(\bar{x}, \bar{x}') = \frac{\Lambda^3}{(2\pi)^{3/2}}\, e^{-\Lambda^2 r^2 / 2},

with Fourier transform

W_\Lambda(k) = e^{-k^2 / (2\Lambda^2)}   (22.21)
where Λ now represents a comoving cutoff scale in k-space. We regularize our observable quantities by smoothing them, which in real space is a convolution with the filter (window function). The effective long-wavelength quantity is

A_l(\bar{x}) = \int d^3 x'\, W_\Lambda(\bar{x}, \bar{x}')\, A(\bar{x}'),   (22.22)

and we split the fields into short- and long-wavelength fluctuations by defining the short-wavelength quantity as

A_s(\bar{x}) = A(\bar{x}) - A_l(\bar{x}).   (22.23)
In Fourier space this is simply

A_l(k) = W_\Lambda(k)\, A(k), \qquad A_s(k) = \left[ 1 - W_\Lambda(k) \right] A(k).

Specifically, for the fields δ, v and ϕ the effective long-wavelength fluctuations are defined as

\delta_l(\bar{x}) = \int d^3 x'\, W_\Lambda(\bar{x}, \bar{x}')\, \delta(\bar{x}'),   (22.26)

\phi_l(\bar{x}) = \int d^3 x'\, W_\Lambda(\bar{x}, \bar{x}')\, \phi(\bar{x}'),   (22.27)

\left( 1 + \delta_l(\bar{x}) \right) \bar{v}_l(\bar{x}) = \int d^3 x'\, W_\Lambda(\bar{x}, \bar{x}')\, \left[ 1 + \delta(\bar{x}') \right] \bar{v}(\bar{x}').   (22.28)
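In practice this coarse-graining is a single multiplication in Fourier space. A minimal sketch, assuming a periodic field on a cubic grid (hypothetical input):

import numpy as np

def smooth_field(field, box_size, lam):
    """Apply the Gaussian window W_Lambda(k) = exp(-k^2/(2 lam^2)) of Eq. (22.21)."""
    N = field.shape[0]
    field_k = np.fft.rfftn(field)
    k = 2 * np.pi * np.fft.fftfreq(N, d=box_size / N)
    kz = 2 * np.pi * np.fft.rfftfreq(N, d=box_size / N)
    kx, ky, kz = np.meshgrid(k, k, kz, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    window = np.exp(-k2 / (2 * lam**2))
    return np.fft.irfftn(field_k * window, s=field.shape)

# delta_l = smooth_field(delta, box_size=1000.0, lam=0.3)  # long-wavelength part
# delta_s = delta - delta_l                                # short-wavelength part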
By applying the smoothing operation to the Euler, Poisson, and continuity equations, and after numerous simplifications, we obtain the following set of fluid equations (see 1206.2926):

\nabla^2 \phi_l - \frac{3}{2}\, \mathcal{H}^2 \rho_0\, \delta_l = 0,   (22.29)

\partial_\tau \delta_l + \nabla \cdot \left[ (1 + \delta_l)\, \bar{v}_l \right] = 0,   (22.30)

\partial_\tau \bar{v}_l + \mathcal{H}\, \bar{v}_l + (\bar{v}_l \cdot \nabla)\, \bar{v}_l + \nabla \phi_l = -\frac{1}{\rho_l} \left( \partial_j \left[ \tau^j_i \right]_\Lambda + \partial_j \left[ \tau^j_i \right]_{\partial^2} \right),   (22.31)
where

\rho_l(x) \equiv \rho_0\, (1 + \delta_l(x)),   (22.32)

and

\left[ \tau^j_i \right]_\Lambda = \left[ \rho(\bar{x}')\, v_s^j(\bar{x}')\, v_{s,i}(\bar{x}') + \frac{2\, \partial^j \phi_s(\bar{x}')\, \partial_i \phi_s(\bar{x}') - \delta^j_i\, \partial^k \phi_s(\bar{x}')\, \partial_k \phi_s(\bar{x}')}{8\pi G} \right]_\Lambda,   (22.33)

\left[ \tau^j_i \right]_{\partial^2} = \rho_l(\bar{x})\, \frac{\partial_m v_l^{\,i}(\bar{x})\, \partial^m v_l^{\,j}(\bar{x})}{\Lambda^2} + \frac{2\, \partial_k \partial_i \phi_l(x)\, \partial^k \partial^j \phi_l - \delta^j_i\, \partial_k \partial^m \phi_l(x)\, \partial^k \partial_m \phi_l}{8\pi G\, \Lambda^2}.   (22.34)
We see that the long-wavelength fluctuations obey an Euler equation in which the stress tensor τ^j_i receives contributions from two terms, induced by the short-wavelength ([τ^j_i]_Λ) and long-wavelength ([τ^j_i]_{∂²}) fluctuations respectively. The long-wavelength contributions are suppressed by a factor of 1/Λ² and can be neglected in the limit Λ → ∞; in this large-Λ limit the leading stress tensor is sourced by the short wavelengths. These residual stress terms arise because multiplication and smoothing do not commute, i.e. [AB]_Λ ≠ [A]_Λ [B]_Λ. Physically speaking, the intuition for the above stress tensor is that small-scale modes appearing in the fluid equations non-linearly modify the dynamics of the large-scale modes.
Although we started with the EoM of a pressureless fluid, the effective pressure of the ‘imperfect’ matter fluid after smoothing (in the limit Λ → ∞) is given as

p_{\rm eff} = \frac{1}{3} \left[ \tau^k_k \right]_\Lambda   (22.35)

= \frac{1}{3} \left( \left[ \rho(\bar{x}')\, v_{s;k}(\bar{x}')\, v_s^k(\bar{x}') \right]_\Lambda + \left[ \frac{\partial^k \phi_s(\bar{x}')\, \partial_k \phi_s(\bar{x}')}{8\pi G} \right]_\Lambda \right).   (22.36)
Hence, we see that the small-scale fluctuations induce an effective pressure perturbation on the long-wavelength fluid. One can also see the effect of the small-scale velocity fluctuations by taking the first term in Eq. (22.33) and writing it as

\frac{1}{\rho_l} \left[ \tau^{ij} \right]_\Lambda \sim \frac{1}{\rho_l} \left[ \rho(\bar{x}')\, v_s^i(\bar{x}')\, v_s^j(\bar{x}') \right]_\Lambda   (22.37)

\sim \delta_l \left[ v_s^i(\bar{x}')\, v_s^j(\bar{x}') \right]_\Lambda   (22.38)

\sim \delta_l\, c_s^2\, \delta^{ij} + \mathcal{O}(\partial_k v_s).   (22.39)
The parameter c_s² is the sound speed squared due to the residual pressure of the small scales. The effective stress tensor we have identified thus depends explicitly on the short-wavelength fluctuations, which are large, strongly coupled, and therefore impossible to compute within the effective theory. The next key step in the EFT description is to expand this stress tensor in powers of derivatives and of δ_l, with the expansion coefficients (such as c_s²) parameterized rather than computed.
Since we treat the matter as a collisionless and pressureless fluid, it is convenient to introduce the notation of fluid dynamics to understand the various terms that arise from the smoothing procedure, such as the induced stress tensor in Eqs. (22.33) and (22.34). To this end, consider the Navier-Stokes equation for a fluid velocity ū,

\rho \left( \frac{\partial \bar{u}}{\partial t} + \bar{u} \cdot \nabla \bar{u} \right) + \rho \nabla \phi = -\nabla p + \nabla \cdot \left[ \eta \left( \nabla \bar{u} + (\nabla \bar{u})^T - \frac{2}{3} (\nabla \cdot \bar{u})\, I \right) + \zeta\, (\nabla \cdot \bar{u})\, I \right]   (22.40)

where the coefficients ζ and η are the bulk and shear viscosity. Similarly, we can re-frame the smoothed stress tensor τ by expanding the small-scale modes around their expectation value, with a perturbation that is modulated by the long-wavelength modes. Hence we write

\left[ \tau^{ij} \right]_\Lambda = \delta^{ij} \left( p_b + c_s^2\, \delta\rho_l - \frac{c_{bv}^2}{aH}\, \rho_b\, \partial_k v_l^k \right) - \frac{3 \rho_b\, c_{sv}^2}{4aH} \left( \partial^j v_l^i + \partial^i v_l^j - \frac{2}{3}\, \delta^{ij}\, \partial_k v_l^k \right) + \Delta\tau^{ij} + \cdots   (22.41)
where the parameters c_bv and c_sv are the coefficients related to the bulk and shear viscosity of the effective fluid, Δτ is a stochastic term (due to small-scale fluctuations) uncorrelated with the smoothed field, and · · · represents terms of higher order in derivatives and in powers of δ_l. The various coefficients c_s², c_sv², c_bv² encapsulate the backreaction of ‘ultraviolet (UV) physics’ of the Universe, i.e. physics operating on scales beyond our cutoff Λ, on the large-scale effective fluid. This seemingly simple addition of the EFTofLSS over SPT is the most significant difference between the two PT formalisms. The free parameters of our new theory, the EFTofLSS, are obtained by fitting to observed data or simulations. This way, the EFTofLSS captures the backreaction of small scales on large scales without making any assumptions about small-scale physics. Some of the most complex ‘baryonic’ effects can also be treated in this way, while remaining completely agnostic about their intricate physics (see 1412.5049 and 2010.02929).
Inserting this expansion into the smoothed Euler equation and keeping the leading terms, the viscosity coefficients of Eq. (22.41) enter the one-loop power spectrum only through the combination c²(Λ) = c_s²(Λ) + f c_sv²(Λ) + c_bv²(Λ), where we have made the Λ-dependence of these free parameters explicit. Here, f is the logarithmic growth rate, f = d ln D / d ln a.
Finally, the Fourier-space matter power spectrum up to one loop² is given as

P_\Lambda^{\rm EFT}(k, z) = D^2\, P_\Lambda^{(11)}(k) + D^4 \left[ P_\Lambda^{(13)}(k) + P_\Lambda^{(22)}(k) \right] + D^2\, P_\Lambda^{\rm ctr}(k, z)   (22.44)

where P_Λ^ctr is referred to as the ‘counterterm’ contribution and D ≡ D(z) is the normalized growth function. The counterterm contribution is

P_\Lambda^{\rm ctr}(k, z) = -c_\Lambda^2(z)\, k^2\, P_\Lambda^{(11)}(k).   (22.45)
At this order, there are two key differences between the SPT and EFTofLSS predictions: (1) the
loop integrals extend only to Λ, since we have smoothed the fields, and (2) the appearance of the
final term involving the effective sound-speed c2Λ .
² We have neglected the contribution from the stochastic term Δτ, which will remain subdominant for the cosmologies of our interest.
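In practice, the single free parameter c_Λ² is fitted to data. A minimal sketch of a one-parameter least-squares fit of Eqs. (22.44)-(22.45) to a measured nonlinear spectrum, with all input arrays hypothetical:

import numpy as np

def fit_c2(k, P11, P13, P22, P_nl, D, k_fit=0.1):
    """Fit c2 so that the EFT model matches P_nl on scales k < k_fit."""
    m = k < k_fit
    residual = P_nl[m] - (D**2 * P11[m] + D**4 * (P13[m] + P22[m]))
    basis = -D**2 * k[m]**2 * P11[m]                     # counterterm template
    return np.dot(basis, residual) / np.dot(basis, basis)  # closed-form 1-parameter LSQ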
22.3.1 Renormalization
The EFT power spectrum as given above appears to be Λ-dependent, due to the inherent dependence of the long-wavelength field δ_l on the smoothing scale Λ. However, we will show that the additional Λ-dependent term P_Λ^ctr is precisely what we need to make the entire one-loop spectrum approximately Λ-independent. To this end, consider the linear power spectrum P_Λ^(11)(k):

P_\Lambda^{(11)}(k \ll \Lambda) = \langle \delta_{k,\Lambda}^{(1)}\, \delta_{p,\Lambda}^{(1)} \rangle'   (22.46)

= \langle W_\Lambda(k)\, \delta_k^{(1)}\, W_\Lambda(p)\, \delta_p^{(1)} \rangle'   (22.47)

= W_\Lambda^2(k)\, P^{(11)}(k)   (22.48)

\approx P^{(11)}(k)   (22.49)
where we used δ_{k,Λ} ≡ W_Λ(k) δ_k, and in the last line we approximated the smoothing kernel W_Λ(k) ≈ 1 for k ≪ Λ. Hence, the linear power spectrum is Λ-independent on all scales of interest, i.e. those much larger than the smoothing scale. Now, let us consider the P_Λ^(13)(k) term:

P_\Lambda^{(13)}(k) = 6\, P_\Lambda^{(11)}(k) \int \frac{d^3 q}{(2\pi)^3}\, F_3(\vec{k}, \vec{q}, -\vec{q})\, P_\Lambda^{(11)}(q).   (22.50)
Hence, we find that the P_Λ^(13) term has an explicit Λ-dependence due to the smoothing procedure. This Λ-dependence is similar to the one we derived for the corresponding P^(13) term in SPT, and it leads to similar problems, since the complete one-loop power spectrum must be Λ-independent. However, unlike SPT, the EFTofLSS contains an additional term at one-loop order, the counterterm contribution. This contribution has exactly the spectral shape P_Λ^ctr(k, z) = −c_Λ²(z) k² P_Λ^(11)(k) needed to cancel the apparent Λ-dependence of P_Λ^(13). To see this, consider the sum of P_Λ^(13) and P_Λ^ctr on scales k ≪ Λ, using the approximation P_Λ^(11)(k) ≈ P^(11)(k) for all Λ such that k ≪ Λ. Hence,

D^2 P_{\Lambda_2}^{(13)}(k) + P_{\Lambda_2}^{\rm ctr}(k, z) = D^2 P_{\Lambda_1}^{(13)}(k) - D^2 k^2 P^{(11)}(k) \left[ f(\Lambda_2) - f(\Lambda_1) \right] - c_{\Lambda_2}^2(z)\, k^2 P^{(11)}(k)   (22.55)

= D^2 P_{\Lambda_1}^{(13)}(k) - k^2 P^{(11)}(k) \left[ c_{\Lambda_2}^2(z) + D^2 f(\Lambda_2) - D^2 f(\Lambda_1) \right]   (22.56)

= D^2 P_{\Lambda_1}^{(13)}(k) - c_{\Lambda_1}^2(z)\, k^2 P^{(11)}(k),   (22.57)

where in the last line we identified c_{Λ1}²(z) = c_{Λ2}²(z) + D² [f(Λ2) − f(Λ1)], i.e. the counterterm coefficient runs with the cutoff in exactly the way required.
Therefore, we find that the counterterm in the EFTofLSS ‘renormalizes’ the P^(13) one-loop term such that the apparent Λ-dependence vanishes. The bare coefficient c²(z) changes as we vary Λ, and it varies in precisely the manner needed to cancel any change in the P^(13) term. For this reason, c² is also known as the ‘ultraviolet counterterm’. In other words, although the individual loop integrals and counterterms are Λ-dependent, their sum is not: as desired, the overall theory is independent of the cutoff scale Λ.
So far we have only considered the P^(13) loop term, but the above argument can be applied to any loop term. Specifically, the apparent Λ-dependence of the P^(22) term scales as k⁴. This k⁴ dependence is exactly canceled (absorbed) by the lowest-order stochastic term Δτ in our EFT expansion. However, since k ≪ Λ, the k⁴ dependence is subdominant compared to k² P^(11)(k) on our scales of interest. Therefore, the Λ-dependence of the P^(22) term is usually neglected, along with any contribution from ⟨Δτ_k Δτ_p⟩′.
Based on the above discussion, we write the full EFT power spectrum at one-loop order, as first derived in 1206.2926:

P^{\rm EFT}(k, z) = D^2(z)\, P^{(11)}(k) + D^4(z)\, P^{(22)}(k) + D^4(z)\, P_\Lambda^{(13)}(k) - D^2(z)\, c_\Lambda^2(z)\, k^2\, P_\Lambda^{(11)}(k),   (22.58)
where we remind the reader that the LHS is Λ-independent even though the individual terms P_Λ^(13)(k) and P_Λ^ctr vary with Λ. Note that c² > 0 implies a positive residual pressure, and hence the power is reduced on quasi-linear scales. However, c² is simply the coefficient of an EFT operator consistent with the symmetries and power counting, and we have made no assumption about the positivity of this coefficient.
The bare counterterm can be split as c_Λ²(z) = c̃_Λ²(z) + c_phy²(z), where the cutoff-dependent piece c̃_Λ²(z) cancels the Λ-dependence of the loop, while the physical piece c_phy²(z) captures the genuine backreaction of small scales on large-scale modes. Hence, we can rewrite the full one-loop EFT matter power spectrum as

P^{\rm EFT}(k, z) = D^2(z)\, P^{(11)}(k) + D^4(z)\, P^{(22)}(k) + D^4(z)\, P_\Lambda^{(13)}(k) - 2 D^2(z)\, \tilde{c}_\Lambda^2(z)\, k^2 P^{(11)} - 2 D^2(z)\, c_{\rm phy}^2(z)\, k^2 P^{(11)}.   (22.60)

Since the c̃_Λ²(z) term must cancel the Λ-dependence of P_Λ^(13) at all redshifts, it should vary with redshift exactly like D²(z). Hence,

P^{\rm EFT}(k, z) = D^2(z)\, P^{(11)}(k) + D^4(z)\, P^{(22)}(k) + D^4(z) \left[ P_\Lambda^{(13)}(k) - 2\, \tilde{c}_\Lambda^2(0)\, k^2 P^{(11)} \right] - 2 D^2(z)\, c_{\rm phy}^2(z)\, k^2 P^{(11)},   (22.61)
where c̃_Λ²(0) is the value of the counterterm at z = 0. Note that c_phy²(z) can have an arbitrary redshift dependence, contingent upon the evolution of the residual pressure induced by gravitational clustering. More importantly, the only IR-surviving quantity inherited from the UV effects is the renormalized parameter c_phy²(z). When analyzing cosmologies with different initial conditions and cosmological parameters, a comparison of c_phy²(z) can act as an additional distinguishing feature; for instance, see Fig. 3 in 2306.09456, where the authors show the variation of c_phy²(z) as a function of a cosmological parameter that alters the small-scale power. From the structure of the counterterm contribution, we expect c_phy²(z) ∼ 1/k_NL²(z). This gives c_phy²(z = 0) ≈ O(10) (Mpc/h)² for k_NL(z = 0) ≈ 0.3 h/Mpc, using the renormalization scheme of 2306.09456. Note that this value differs from the usual O(1) value typically quoted for the bare c²(z).
Figure 23. Comparison of the matter power spectrum obtained from SPT and the EFTofLSS against data from N-body simulations. Each curve has been divided by the linear power spectrum. The fully nonlinear N-body power spectrum is plotted as black boxes. The red and blue dashed curves show one- and two-loop results from SPT, while the corresponding EFTofLSS curves are shown in solid colors. Figure taken from O. Philcox’s presentation, 2020.
Figure 24. Similar to Fig. 23, but here the curves are divided by the nonlinear (NL) matter power spectrum obtained from N-body simulations. Taken from 1507.05326.
Recall that if the linear power spectrum is a power law, P^(11)(k) ∝ k^n, some of the loop terms diverge for n ≥ −1. Such conditions can arise naturally if we consider mixed primordial initial conditions consisting of adiabatic fluctuations plus a small fraction of blue-tilted CDM isocurvature power, as shown in Fig. 25. This is an example from our own research, published in 2306.09456; we briefly mentioned isocurvature in Sec. 6.6.3.
For such cosmologies, in SPT one is forced to choose a particularly small value of Λ to avoid large spurious contributions from small scales. In the EFTofLSS, however, there are no such divergences: the domain of integration is bounded, and since the internal momenta q are limited by Λ and the integrand is analytic, the loop integrals are guaranteed to be finite. Any divergences lurking in the high-q regime are absorbed into the counterterms. In Fig. 26 we plot EFT curves for the pure adiabatic and mixed cosmologies. For both cases we choose an arbitrarily large value of the cutoff scale, Λ ≈ 100 h/Mpc. For
Figure 25. Comparison of the linear matter power spectra at redshift z = 2 for pure adiabatic and mixed initial conditions. For the mixed case, we show two examples in which the power deviates from the adiabatic scenario on small scales, with spectral indices n = −0.25 (dashed) and n = −1 (dotted) respectively. Taken from 2306.09456.
the pure adiabatic case, the counterterm tends to an asymptotic value as Λ → ∞ and is an O(1) number, as shown in the figure. However, due to the divergent structure of the P_Λ^(13) term in the mixed scenario with lim_{k→∞} P^(11)(k) ∝ k^{−0.25}, the counterterm runs with Λ. For our choice of Λ = 100 h/Mpc, we find c_Λ²(z = 1) ≈ −6.23 (Mpc/h)². While a negative value of c², in contrast to the adiabatic case, may seem alarming, recall that the only IR-surviving quantity inherited from the UV effects is the physically relevant parameter c_phy². For the pure adiabatic and mixed cases, the physical parameters c_phy² are nearly identical in magnitude; the small difference is due to the larger power on small scales in the mixed scenario. On the other hand, given that the bare c_Λ² can become negative for Λ ≳ O(3) h/Mpc, the interpretation of this parameter is unclear, which leaves room for more intricate UV dynamics being at play in the mixed (isocurvature) scenario. A nonlinear UV model exploration of this issue may be useful to further elucidate the difference.
Figure 26. Fit of the one-loop EFT power spectrum to the N-body data. Note that we plot the scaled power spectrum, k × P(k), on the y-axis for clarity. For the mixed case, we use the fiducial value n = −0.25. The value of the bare c_Λ² (at cutoff Λ = 100 h/Mpc) one-loop EFT parameter is given in the label for each EFT curve; note that its value for the mixed case is negative. The one-loop EFT curves are accurate up to k ≈ 0.5 h/Mpc at redshift z = 1. We also plot the approximate theoretical error band expected from two-loop contributions.
Biased tracers such as galaxies can be described within the same framework: the galaxy overdensity δ_g is expanded in all operators O that can be constructed from the coarse-grained linear overdensity δ_l^(1),

\delta_g(x) = \sum_O \left( b_O + \epsilon_O(x) \right) O(x) + b_\epsilon\, \epsilon(x)   (22.62)

= b_1\, \delta(x) + b_\epsilon\, \epsilon(x)
+ \frac{b_2}{2}\, \delta^2(x) + b_{\mathcal{G}_2}\, \mathcal{G}_2(x) + \epsilon_\delta(x)\, \delta(x)
+ b_{\delta \mathcal{G}_2}\, \delta(x)\, \mathcal{G}_2(x) + \frac{b_3}{6}\, \delta^3(x) + b_{\mathcal{G}_3}\, \mathcal{G}_3(x) + b_{\Gamma_3}\, \Gamma_3(x) + \epsilon_{\delta^2}(x)\, \delta^2(x) + \epsilon_{\mathcal{G}_2}(x)\, \mathcal{G}_2(x)
+ b_{\nabla^2 \delta}\, \nabla^2 \delta(x) + b_{\nabla^2 \epsilon}\, \nabla^2 \epsilon(x)   (22.63)
where all the operators O in the above expression are coarse-grained, and the subscripts l or Λ have been dropped for brevity. In Fourier space the Laplacian takes the form ∇² → (k/k_*)², where k_* is some characteristic scale of clustering for biased tracers, and we restrict ourselves to scales k/k_* ≪ 1. Hence every insertion of a Laplacian is equivalent to a second-order correction to an operator O, and the derivative operators in the last line of Eq. (22.63) are counted as approximately cubic order in the bias expansion. Therefore, Eq. (22.63) is a double expansion in density fluctuations and their derivatives. The operator sets {δ², G₂, ε_δ δ} and {G₂δ, δ³, G₃, Γ₃, ε_{δ²}δ², ε_{G₂}G₂, ∇²δ, ∇²ε} are of second and third order respectively; we refer the reader to 1611.09787 for definitions and details of these operators. Notably, operators non-local in δ, such as G₂ = (∇_i∇_j Φ)² − (∇²Φ)², arise naturally from gravitational evolution and from renormalization requirements. This was first shown in 1402.5916.
22.7 Application of the EFTofLSS to simulations and real data
Finally, within the EFTofLSS, we model the perturbative galaxy-galaxy power spectrum P_gg at one-loop level as the sum of deterministic, stochastic and counterterm parts:

P_{gg} = P_{gg}^{\rm det} + P_{gg}^{\rm sto} + P_{gg}^{\rm ctr}.   (22.64)

During cosmological parameter inference from simulations or observational data, we fit this theoretical power spectrum with the relevant number of bias and counterterm parameters. The theoretical spectrum can be obtained from existing codes such as CLASS-PT (2004.10607) for EPT, and velocileptors (2012.04636) for a more complicated LPT implementation. CLASS-PT is an adaptation of the CLASS code designed to compute the non-linear power spectra of dark matter and biased tracers using one-loop cosmological perturbation theory in Eulerian coordinates. It handles both Gaussian and non-Gaussian initial conditions, and it is an easy-to-use and convenient code for LSS analysis. Now consider the simplest case, where we fit the one-loop spectrum to data in real space: there is then only one free counterterm parameter, c², which is often absorbed into the Laplacian bias coefficient. In redshift space, discussed in Sec. 20.6, we usually consider only the first three even multipoles ℓ = 0, 2, 4 and attach an independent counterterm to each multipole spectrum; a minimal sketch of this multipole structure is given below. In Fig. 27 we show the results of a blinded challenge performed and reported in 2003.08277 using the EFTofLSS in redshift space for the first two multipoles.
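Here is a minimal sketch of the multipole structure at lowest (Kaiser) order, with one independent k² counterterm per multipole; the actual one-loop model implemented in CLASS-PT contains many additional terms, and all inputs here are hypothetical:

import numpy as np
from numpy.polynomial.legendre import Legendre

def P_ell(k, P_lin, b1, f, c_ell, ell):
    """Kaiser multipole of order ell with a k^2 counterterm of amplitude c_ell."""
    mu = np.linspace(-1, 1, 201)
    L_ell = Legendre.basis(ell)(mu)            # Legendre polynomial L_ell(mu)
    kaiser = (b1 + f * mu**2) ** 2             # lowest-order redshift-space factor
    integral = np.trapz(kaiser * L_ell, mu)    # projection onto multipole ell
    return (2 * ell + 1) / 2 * integral * P_lin - c_ell * k**2 * P_lin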
In Fig. 28 we show the results of a recent cosmological parameter inference performed using four independent Baryon Oscillation Spectroscopic Survey (BOSS) datasets across two redshift bins (z_eff = 0.38, 0.61) in flat ΛCDM, marginalizing over 7 nuisance parameters for each dataset (28 in total) and varying 5 cosmological parameters (ω_b, ω_cdm, H₀, A_s, Σm_ν). The theory model includes a complete perturbation-theory description that properly takes into account the non-linear effects of dark matter clustering, short-scale physics, galaxy bias, redshift-space distortions, and large-scale bulk flows. The constraints on H₀ and Ω_m obtained from the EFT analysis of BOSS data are already competitive with the CMB measurements of Planck for the same cosmological model with varied neutrino masses. This highlights the success of the EFTofLSS and sets the stage for precision cosmology from future surveys.
23 N-body simulations
The fluid approximation breaks down on small scales; for example, the velocity field is no longer single-valued at a point in space once shell crossing happens (i.e. clouds of mass pass through each other). Next to perturbation theory of fluids, the second main way to evaluate the dynamics of the universe is N-body simulation of (dark) matter. N-body simulations are not intrinsically perturbative, and can thus in principle extend our reach to non-perturbative scales, extracting cosmological parameters with more sensitivity. On the other hand, N-body simulations are computationally costly, and it is difficult to simulate the volume of a galaxy survey at the required resolution. In addition, dark-matter-only N-body simulations are only valid on scales where baryonic feedback is unimportant; to go to smaller scales, one needs even more computationally expensive hydrodynamical simulations (Sec. 23.3).
Figure 27. (Taken from 2003.08277.) The upper panels show a comparison of the data for the monopole and the quadrupole with the best-fit EFT model, and the residuals for the best-fit model (right panel); the quadrupole data points are slightly shifted for better visibility. The lower panels show different contributions to the monopole (left) and quadrupole (right) power spectra, together with the data errors and the two-loop estimate. We plot absolute values; some terms are negative. Here, k⁴-ctr is the counterterm contribution associated with the Finger-of-God effect.
Figure 28. (Taken from 1909.05277.) Left panel: the posterior distribution for the late-Universe parameters H₀, Ω_m and σ₈ obtained with priors on ω_b from Planck (gray contours) and BBN (blue contours). For comparison we also show the Planck 2018 posterior (red contours) for the same model (flat ΛCDM with massive neutrinos). Right panel: the monopole (black dots) and quadrupole (blue dots) power spectrum moments of the BOSS data for the high-z (upper panel) and low-z (lower panel) north galactic cap (NGC) samples, along with the best-fit theoretical model curves, plotted in solid black and blue.
N-body simulations solve the gravitational dynamics by tracking N = N_side³ particles from their initial (almost uniform) positions to their late-time positions as a function of time. The equations of motion are simply Newtonian gravity in an expanding spacetime:

\frac{dx^i}{dt} = \frac{p^i}{m a}   (23.1)

\frac{dp^i}{dt} = -H p^i - \frac{m}{a} \frac{\partial \phi}{\partial x^i}   (23.2)

Introducing the superconformal momentum p_c = a p, which is conserved in the absence of perturbations, this can be rewritten as

\frac{dx^i}{dt} = \frac{p_c^i}{m a^2}   (23.3)

\frac{dp_c^i}{dt} = -m \frac{\partial \phi}{\partial x^i}   (23.4)

Solving these equations numerically for a large number of particles, such as 1000³, leads to a beautiful and physically accurate matter distribution.
A computationally efficient and widely used method to solve these equations is the leapfrog scheme, where positions and momenta are evaluated with an offset of half a time step:

x_{(i)}(t) \quad \text{and} \quad p_{c,(i)}(t - \Delta t / 2).   (23.5)

After generating the initial conditions (usually using LPT, as discussed in Sec. 21.3), the algorithm proceeds as follows:

1. Compute the gravitational potential generated by the collection of particles, and take its gradient to obtain ∇ϕ(x, t).

2. Kick: advance the momenta by a full time step using this force.

3. Drift: advance the positions by a full time step using the updated momenta, set t → t + Δt, and repeat.
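A minimal sketch of one such kick-drift update for Eqs. (23.3)-(23.4), assuming a user-supplied function grad_phi(x, t) for the force computation of step 1 and a scale factor function a_of_t (both hypothetical):

import numpy as np

def leapfrog_step(x, p_c, t, dt, m, a_of_t, grad_phi):
    # kick: advance momenta by a full step, from t - dt/2 to t + dt/2 (Eq. 23.4)
    p_c = p_c - m * grad_phi(x, t) * dt
    # drift: advance positions from t to t + dt using the mid-step momenta (Eq. 23.3)
    a = a_of_t(t + dt / 2)
    x = x + p_c / (m * a**2) * dt
    return x, p_c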
23.3 Baryonic simulations
To take into account baryonic forces, one uses (magneto-)hydrodynamic simulations. These can be implemented using smoothed-particle hydrodynamics (SPH), i.e. still using particles but with additional forces, or with a (moving) mesh. Unfortunately it is not possible to simulate these processes from first principles (e.g. how an AGN blows out gas), so one needs to approximate them with a so-called subgrid model. Different subgrid models lead to different answers: for example, in the CAMELS simulations, the same initial conditions with different subgrid models can change the galaxy density by 30% or so. So while dark matter simulations, given enough resolution, are in principle arbitrarily accurate, the same is not true once we include baryonic physics. This is a key difficulty for simulation-based inference on small scales.
Figure 29. Halo formation. Halos form where the smoothed density field crosses the critical density. For
illustration, we plot a single large-scale mode (dashed) and a few small scale modes. Figure adapted from
Baumann’s Cosmology book.
• Matter perturbations on large scales are Gaussian and grow with the growth factor D(z).
• In spots where the smoothed density field crosses the critical density δ_c, a halo will form. Because perturbations grow, new halos form over time. It turns out that the critical density is independent of the halo mass or smoothing scale and is about δ_c ≈ 1.69, which can be derived from Newtonian gravity (spherical collapse). This is illustrated in Fig. 29.
• Since this picture depends on the smoothing scale R, in principle smaller halos can be
contained in larger ones. This is handled more carefully in the extended Press-Schechter
formalism.
We don’t have time to derive the mathematical results, but I want to show you the widely used result. The halo mass function can be expressed as

n(m, z) = \frac{\rho_m}{m^2}\, f(\sigma, z)\, \left| \frac{d \ln \sigma(m, z)}{d \ln m} \right|,   (24.2)

where ρ_m is the mean matter density. The quantity σ²(m, z) is the variance of the mass within a sphere of radius R(m), defined as

\sigma^2(m, z) = \frac{1}{2\pi^2} \int_0^{\infty} dk\, k^2\, P^{\rm lin}(k, z)\, W^2(kR)   (24.3)
Figure 30. Sheth-Tormen mass function at different redshifts (from 2108.04279).
with the Fourier-space top-hat window

W(kR) = \frac{3 \left[ \sin(kR) - kR \cos(kR) \right]}{(kR)^3}.   (24.4)

R can be interpreted as the radius from which we need to collect primordial mass to form the halo. The term f(σ, z) is called the halo multiplicity, and one often assumes the Sheth-Tormen halo multiplicity function

f(\sigma, z) = A \sqrt{\frac{2a}{\pi}} \left[ 1 + \left( \frac{\sigma^2}{a \delta_c^2} \right)^p \right] \frac{\delta_c}{\sigma}\, \exp\left( -\frac{a \delta_c^2}{2 \sigma^2} \right)   (24.6)

with A = 0.3222, a = 0.75, p = 0.3, and δ_c = 1.686. The resulting mass function is plotted in Fig. 30, and a minimal numerical sketch is given below.
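As a concrete illustration, here is a minimal sketch of Eqs. (24.3), (24.4) and (24.6); the arrays k and P_lin are hypothetical tabulated inputs.

import numpy as np

def sigma2(R, k, P_lin):
    """Variance of the density field smoothed on scale R, Eq. (24.3)."""
    x = k * R
    W = 3 * (np.sin(x) - x * np.cos(x)) / x**3    # top-hat window, Eq. (24.4)
    return np.trapz(k**2 * P_lin * W**2, k) / (2 * np.pi**2)

def f_ST(sigma, A=0.3222, a=0.75, p=0.3, delta_c=1.686):
    """Sheth-Tormen multiplicity, Eq. (24.6), written in terms of nu = delta_c/sigma."""
    nu = delta_c / sigma
    return A * np.sqrt(2 * a / np.pi) * (1 + (a * nu**2) ** (-p)) * nu * np.exp(-a * nu**2 / 2)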
The halo mass function, as a function of cosmological parameters, can also be “learned” from
simulations. This is done for example in 1804.05866, 2003.12116. By measuring the HMF from the
data and comparing it to the theoretical expectation from simulations one can then in principle
measure cosmological parameters. This is called cluster abundance or cluster counting. While
small halos may be very sensitive to unknown baryonic physics, the largest halos are dominated
by gravity and might provide reliable measurements.
To derive the halo bias, one splits the density field into short-wavelength and long-wavelength (background) parts,

\delta = \delta_h + \delta_b.   (24.7)
Figure 31. Example of contributions to the 1-halo and 2-halo power spectra.
The short modes will eventually form halos. The long modes can be interpreted as locally shifting the critical density required for the short modes to form halos; this is illustrated in Fig. 29 (the dashed line is the long mode). By expanding the mass function to linear order in δ_b, one can derive the linear halo bias:

b_h(m, z) = 1 + \frac{1}{\delta_c} \frac{d \log f}{d \log \sigma}   (24.8)
Note that the halo bias satisfies a consistency relation,

\int_{-\infty}^{\infty} d\ln m\; m\, n(m, z)\, \frac{m}{\rho_m(z)}\, b_h(m, z) = 1,   (24.9)

i.e. the total matter field, comprised of all halos, is unbiased. Note that the bias can be smaller than one (and even negative for voids, which preferentially form in underdense regions). The bias of typical galaxies in a survey is larger than one.
24.4.1 Dark matter
In Fourier space, the dark matter power spectrum is given by

P_{mm}(k, z) = P_{mm}^{1h}(k, z) + P_{mm}^{2h}(k, z)   (24.10)

P_{mm}^{1h}(k, z) = \int_{-\infty}^{\infty} d\ln m\; m\, n(m, z) \left( \frac{m}{\rho_m} \right)^2 |u(k|m, z)|^2   (24.11)

P_{mm}^{2h}(k, z) = P^{\rm lin}(k, z) \left[ \int_{-\infty}^{\infty} d\ln m\; m\, n(m, z)\, \frac{m}{\rho_m}\, b_h(m, z)\, u(k|m, z) \right]^2   (24.12)
In these expressions, m is the halo mass, ρ_m is the present-day cosmological matter density, n(m, z) is the halo mass function (i.e. the differential number density of halos with respect to mass), u(k|m, z) is the normalized Fourier transform of the halo profile, P^lin(k) is the linear matter power spectrum, and b_h(m, z) is the linear halo bias. The one-halo term is the shot noise convolved with the profile.
We need u(k|m, z), the Fourier transform of the dark matter halo density profile, which for spherically symmetric profiles is defined as

u(k|m, z) = \int_0^{r_{\rm vir}} dr\, 4\pi r^2\, \frac{\sin(kr)}{kr}\, \frac{\rho(r|m, z)}{m}.   (24.13)
We assume that halos are truncated at the virial radius, and have mass

m = \int_0^{r_{\rm vir}} dr\, 4\pi r^2\, \rho(r|m, z).   (24.14)

Note that with this definition of the mass, u(k|m, z) → 1 as k → 0. Returning to the two-halo term and using the consistency relation in Eq. (24.9), this property of u(k|m, z) ensures that P_{mm}^{2h}(k, z) ≃ P^{lin}(k, z) in the limit k → 0, as it should.
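A minimal sketch of evaluating Eq. (24.13) numerically for a truncated NFW profile; the virial radius r_vir and concentration c are hypothetical inputs, and the profile normalization is fixed by Eq. (24.14):

import numpy as np

def u_nfw(k, r_vir, c):
    """Normalized Fourier transform of a truncated NFW profile, Eq. (24.13)."""
    r_s = r_vir / c
    r = np.linspace(1e-3 * r_vir, r_vir, 4000)
    rho = 1.0 / ((r / r_s) * (1 + r / r_s) ** 2)     # NFW shape (unnormalized)
    m = np.trapz(4 * np.pi * r**2 * rho, r)          # Eq. (24.14) fixes the norm
    # np.sinc(x) = sin(pi x)/(pi x), so sin(kr)/(kr) = np.sinc(k r / pi)
    integrand = 4 * np.pi * r**2 * np.sinc(k * r / np.pi) * rho / m
    return np.trapz(integrand, r)                     # tends to 1 as k -> 0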
Halo model power spectra can be calculated with various codes, such as https://github.com/borisbolliet/class_sz. The halo model can also be used to calculate higher N-point functions such as the bispectrum. While the halo model is powerful, remember that the assumption that all matter sits in a set of spherical halos is not a very realistic one.
The standard power spectrum estimator for galaxy surveys starts from the weighted galaxy fluctuation field

F(r) = \frac{w(r)}{I^{1/2}} \left[ n(r) - \alpha\, n_s(r) \right],   (25.1)

where n and n_s are the observed number density fields of the galaxy catalog and of a synthetic catalog of random objects, respectively. Here we have assigned the galaxies to a regular grid using some mass assignment scheme such as CIC. The factor w(r) is a general weight factor, which we discuss shortly. The factor α normalizes the synthetic catalog to the number density of the galaxies, so that ⟨F⟩ = 0. The field F(r) is normalized by the factor I, defined as I ≡ \int dr\, w^2(r)\, \bar{n}^2(r).
The estimator for the multipole moments (recall that we are in redshift space) of the power spectrum is

\hat{P}_\ell(k) = \frac{2\ell + 1}{I} \int \frac{d\Omega_k}{4\pi} \left[ \int dr_1 \int dr_2\, F(r_1)\, F(r_2)\, e^{i k \cdot (r_1 - r_2)}\, L_\ell(\hat{k} \cdot \hat{r}_h) \right] - P_\ell^{\rm noise}(k),   (25.2)

where Ω_k is the solid angle in Fourier space, r_h ≡ (r₁ + r₂)/2 is the line of sight to the mid-point of the pair of objects, and L_ℓ is the Legendre polynomial of order ℓ. The shot noise is

P_\ell^{\rm noise}(k) = (1 + \alpha) \int dr\, \bar{n}(r)\, w^2(r)\, L_\ell(\hat{k} \cdot \hat{r}),   (25.3)
The covariance matrix of the estimated multipoles is usually obtained from a large set of mock catalogs, where N_m is the number of mock catalogs and P̄_ℓ(k) is the mean power spectrum,

\bar{P}_\ell(k) = \frac{1}{N_m} \sum_{n=1}^{N_m} P_{\ell, n}(k).   (25.6)

This is done at a fiducial cosmology. There are some subtleties with covariance matrix estimation; see in particular the Hartlap factor correction, which affects the inverse covariance matrix at the level of a few percent. A minimal sketch is given below.
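A minimal sketch of the mock-based covariance with the Hartlap correction; P_mocks is a hypothetical array of shape (N_m, N_bins) containing the mock measurements of P̂_ℓ(k):

import numpy as np

def inverse_covariance(P_mocks):
    """Sample covariance from mocks, with the Hartlap debiasing of the inverse."""
    N_m, N_bins = P_mocks.shape
    C = np.cov(P_mocks, rowvar=False)           # sample covariance around the mean
    hartlap = (N_m - N_bins - 2) / (N_m - 1)    # corrects the bias of C^{-1}
    return hartlap * np.linalg.inv(C)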
26 Non-Gaussianity
Let’s briefly discuss going beyond the power spectrum. Here we are concerned not primarily with primordial non-Gaussianity (see Sec. 16 in the CMB unit), but rather with non-Gaussianity from gravitational and baryonic interactions.
One can measure the galaxy bispectrum and extract cosmological parameters from it, together with the power spectrum. Note that the bispectrum and power spectrum estimators have a nonzero cross-covariance due to mode coupling; they are not independent. On the perturbative scales that we can use for cosmological analysis, including the bispectrum improves cosmological parameter constraints by 10 to 30% (2206.08327). Bispectrum parameter estimation works the same way as power spectrum parameter estimation, i.e. we need a bispectrum estimator, a theoretical model of the bispectrum, and a likelihood with covariance.
Of course there are even higher N-point correlation functions. The next one is the galaxy trispectrum,

T_g(k_1, k_2, k_3, k_4) \sim \langle \delta_g(k_1)\, \delta_g(k_2)\, \delta_g(k_3)\, \delta_g(k_4) \rangle.   (26.2)

The trispectrum is not yet normally used for galaxy survey analysis, but should squeeze some more signal-to-noise out of cosmological parameter constraints (in particular by breaking degeneracies with biases). Higher N-point functions become progressively more difficult to model theoretically and more computationally difficult to estimate from the data. In the perturbative regime, higher N-point functions carry progressively less signal-to-noise, since they are of higher order in the small initial perturbations; so there is no point in continuing this to ever higher order correlators. On non-perturbative scales, it is likely that N-point functions are not the right tool, as we discuss below.
26.2 Primordial non-Gaussianity
Higher N-point functions are also a way to measure primordial non-Gaussianity (see e.g. the review 1412.4671). As in the CMB, the most promising observable is in general the bispectrum. The problem with non-Gaussianity estimation is to tell apart the signal coming from non-linear evolution from that of primordial origin. The degeneracy of the two signals severely degrades constraints on primordial non-Gaussianity from galaxy surveys: even next-generation galaxy surveys can only roughly match (2211.14899) existing constraints from Planck for equilateral and orthogonal non-Gaussianity. In the far future, however, we hope that intensity mapping of the dark ages can improve constraints by orders of magnitude (1610.06559).
The situation is better for local non-Gaussianity, or any signal that peaks in the squeezed limit. Interestingly, in that case there is an observable signal in the galaxy power spectrum called scale-dependent bias, which leads to a characteristic upturn of the galaxy power spectrum on large scales. Scale-dependent bias is likely to improve the constraint on f_NL^local by a factor of 10 or so over Planck within the next 10 years. I have spent a lot of time with this signal in my own research and hope to add a discussion here later.
• cosmic shear tomography (C_ℓ^{κ_{gal,i} κ_{gal,j}}).
The indices i and j indicate redshift bins. Adding the CMB lensing convergence field κ_CMB, we can extend the data vector with 3 more two-point functions: the auto-spectrum C_ℓ^{κ_CMB κ_CMB} and the cross-spectra C_ℓ^{κ_CMB δ_g,i} and C_ℓ^{κ_CMB κ_gal,i}. This can be called a 6x2 analysis. The angular power spectrum between redshift bin i of observable A and redshift bin j of observable B at multipole ℓ (using the Limber approximation) is given by

C_{AB}^{ij}(\ell) = \int d\chi\, \frac{W_A^i(\chi)\, W_B^j(\chi)}{\chi^2}\, P_m\!\left( \frac{\ell + 1/2}{\chi},\, z(\chi) \right),   (27.1)
where χ is the comoving distance, P_m(k, z) is the matter power spectrum, and W_A^i(χ), W_B^j(χ) are the weight functions of observables A, B, given by

W_{\delta_g}^i(\chi) = b_g^i\, \frac{n_{\rm lens}^i(z(\chi))}{\bar{n}_{\rm lens}^i}\, \frac{dz}{d\chi},   (27.2)

W_{\kappa_g}^i(\chi) = \frac{3 H_0^2 \Omega_m}{2 c^2}\, \frac{\chi}{a(\chi)} \int_{\chi_{\rm min}^i}^{\chi_{\rm max}^i} d\chi'\, \frac{n_{\rm source}^i(z(\chi'))}{\bar{n}_{\rm source}^i}\, \frac{dz}{d\chi'}\, \frac{\chi' - \chi}{\chi'},   (27.3)

W_{\kappa_{\rm CMB}}(\chi) = \frac{3 H_0^2 \Omega_m}{2 c^2}\, \frac{\chi}{a(\chi)}\, \frac{\chi_* - \chi}{\chi_*},   (27.4)
where χ^i_{min/max} are the minimum and maximum comoving distances of redshift bin i. Here a(χ) is the scale factor, Ω_m the matter density fraction at present, H₀ the Hubble constant, b_g^i the galaxy bias in bin i, and χ_* the comoving distance to the surface of last scattering. Note that the weight function of κ_CMB does not depend on redshift bins. We have encountered the galaxy density and CMB convergence weight functions before in these lectures; the galaxy lensing weight function integrates the lensing effect over the source density in bin i. Details on cross-correlation analyses are given in 1607.01761 and 2108.00658. Of course, one can also consider bispectra involving the three signals.
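To make the structure of Eq. (27.1) concrete, here is a minimal sketch of the Limber integral for the CMB lensing auto-spectrum; the functions P_m(k, z) and z_of_chi(chi), and the constants chi_star, H0, Omega_m, are hypothetical inputs.

import numpy as np

def C_kappa_kappa(ell, P_m, z_of_chi, chi_star, H0, Omega_m, c=299792.458):
    """Limber integral with the CMB lensing kernel of Eq. (27.4); 1/a = 1 + z."""
    chi = np.linspace(10.0, chi_star - 1.0, 500)
    z = z_of_chi(chi)
    W = 1.5 * H0**2 * Omega_m / c**2 * chi * (1 + z) * (chi_star - chi) / chi_star
    P = np.array([P_m((ell + 0.5) / x, zx) for x, zx in zip(chi, z)])
    return np.trapz(W**2 / chi**2 * P, chi)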
28.1 Overview
In recent years, a lot of effort has been made in the community to go beyond power spectra, bispectra and Gaussian likelihood approximations. The hope, of course, is to extract more sensitive parameter constraints from the data. The broad tools we use for this include simulations, optimization (auto-differentiation), and the many forms of machine learning. I will try to give you a broad overview with suitable references for further study. Despite massive effort, it is still somewhat debated whether these methods really allow us to get better parameter constraints from real experiments, because the methods need to be robust with respect to non-linear small-scale physics, which is difficult to achieve. Currently most state-of-the-art constraints still come from more traditional analyses. See 2405.02252 for a recent quantitative comparison of some of these methods with traditional approaches.
Modern methods can be broadly classified into two categories, which we discuss in more detail below: simulation-based (implicit-likelihood) inference from learned summary statistics, and explicit field-level inference that samples the initial conditions together with the parameters.
In both cases we need a forward model, which can be a simulator, a neural network, or even analytic perturbation theory. The forward model maps cosmological parameters and initial conditions to observable data. A crucial aspect of the forward model is, of course, that it is accurate on the scales we are interested in and that it is computationally tractable to evaluate (as often as required by the chosen parameter inference method). These conditions are not easy to meet.
• Power spectrum and bispectrum, of course.
• Cluster or galaxy number density and mass distribution. Cluster counting, especially of very massive clusters, which are less sensitive to baryonic physics, can be used to constrain cosmological parameters.
• Topological data analysis, which aims to use the distribution of topological features (“simplices”) in the data.
• The simulation can be trusted on the scales that the neural network gets to see (e.g. we can first filter out small scales to make the summary more robust, at the cost of sensitivity).
These conditions do not necessarily hold in practice, so there is still some interest in coming up with new “hand-made” summary statistics. A neural network is usually trained to directly output estimates of the cosmological parameters, while for other summary statistics there is a second step involved in mapping them to cosmological parameters. A neural network can also be trained to estimate error bars and covariances for its measurements. However, a more robust approach is to learn these error bars after training, in a second step, which we discuss now.
The starting point for simulation-based inference (SBI) is a training set of simulated parameter-data pairs,

\{ (\theta_n, x_n) \}_{n=1}^{N_{\rm sim}}.   (28.1)
• In Neural Likelihood Estimation (NLE) we learn the likelihood p(x|θ), i.e. the conditional probability of the summary x given the parameters θ. If we have learned the full distribution, we can do fast amortized inference, which means that we do not need to run new simulations to get the posterior for a new set of observations. On the other hand, it can be too expensive to run enough such simulations, in which case one can use a Sequential Neural Likelihood Estimator (SNLE), which focuses on learning the likelihood near the observed data. Once the neural likelihood is learned, we multiply by a prior and run the usual MCMC.
• In Neural Posterior Estimation (NPE) one learns p(θ|x) directly, so one does not have to run an MCMC anymore.
To learn either the likelihood or the posterior we need a parametric density model, that is, some function that we can fit to the data set of Eq. (28.1) by adjusting its parameters. The state of the art is to use so-called normalizing flows, for example the masked autoregressive flow (MAF). A normalizing flow is a neural network that transforms a simple base distribution (usually a Gaussian) into a complicated target distribution by learning a series of diffeomorphisms (i.e. invertible and differentiable changes of coordinates). Fitting/training the normalizing flow works by adjusting its weights using auto-differentiation, in the same way as ordinary neural networks are trained (though with a different loss function). By now, SBI methods are fairly well established, and, as in the case of MCMC, you don’t necessarily have to understand the methods in great detail to use them. A key challenge is to check that the learned likelihood or posterior is correct (especially that it is not overconfident); one approach to do so is called simulation-based calibration. My lecture slides on AI in Physics, https://ai.physics.wisc.edu/teaching/, contain more details on the above methods, for example the required neural network training objectives.
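As an illustration of the training objective, here is a minimal sketch of neural posterior estimation with a conditional Gaussian head, a simple stand-in for a full normalizing flow such as a MAF; all tensor names are hypothetical, and in practice one would use a mature package (for example sbi).

import torch
import torch.nn as nn

class GaussianNPE(nn.Module):
    """Learn p(theta|x) as a Gaussian whose mean and width depend on x."""
    def __init__(self, dim_x, dim_theta, width=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x, width), nn.ReLU(),
                                 nn.Linear(width, 2 * dim_theta))
    def log_prob(self, theta, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_sigma.exp()).log_prob(theta).sum(-1)

def train(model, theta, x, steps=1000, lr=1e-3):
    # maximize the average log posterior of the true parameters given the data
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = -model.log_prob(theta, x).mean()
        opt.zero_grad(); loss.backward(); opt.step()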
28.2.5 Theory emulators
An approach that is related to SBI and gaining popularity in cosmology is the generation of neural-network-based emulators of summary statistics, in particular of the power spectrum. Even for linear physics, running CAMB or CLASS at each point in a Monte Carlo chain is annoyingly slow. To speed this up, cosmologists have trained neural networks to emulate the Boltzmann solver, i.e. to provide the power spectrum P^{theo,lin}(k, Θ) as a function of Θ. An example of this is Cosmopower (2106.03846). Using an emulator, the MCMC runs much faster than with the full Boltzmann solver.
Using simulations with different cosmological parameters, one can also build an emulator of the non-linear matter power spectrum; this was done for example in 2207.12345. It is also possible to combine dark matter simulations with a biasing model (1910.07097) to obtain a bias-dependent emulator of the halo (or galaxy) power spectrum. Emulators can be paired with SBI by learning their likelihood with neural density estimators. A minimal sketch of the emulator idea follows.
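To make the idea concrete, here is a minimal sketch in the spirit of such emulators (not the actual Cosmopower implementation): an MLP mapping cosmological parameters to log P(k) at fixed k-bins.

import torch
import torch.nn as nn

emulator = nn.Sequential(
    nn.Linear(5, 256), nn.ReLU(),   # e.g. (omega_b, omega_cdm, H0, A_s, n_s)
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 100),            # log P(k) at 100 fixed k-bins
)
# Train with an MSE loss on (Theta, log P) pairs from a Boltzmann code,
# then call emulator(Theta) inside the MCMC instead of CAMB/CLASS.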
We now turn to field-level inference. Suppose the forward model f maps the initial conditions s and parameters Θ to the noiseless data d = f(s, Θ), and we observe

d_{\rm obs} = d + n,   (28.2)

where the noise n has covariance N. The likelihood of observing d_obs given s is then

\log L(d_{\rm obs}|s, \Theta) = -\frac{1}{2} \left( f(s, \Theta) - d_{\rm obs} \right)^T N^{-1} \left( f(s, \Theta) - d_{\rm obs} \right) + {\rm const}.   (28.3)

We want to turn this around and get the posterior P(s, Θ|d). This will give us the joint PDF of the initial conditions of the universe and the cosmological parameters, and thus measure both of them. We need a prior on s, which is that it is a Gaussian field with some primordial parameters Θ′ (e.g. A_s, n_s) that define the primordial power spectrum. The prior is

\log P(s|\Theta') = -\frac{1}{2}\, s^T S^{-1}(\Theta')\, s - \frac{1}{2} \log |S(\Theta')| + {\rm const}.
Using Bayes’ theorem we get the posterior

\log P(s, \Theta, \Theta'|d) = -\frac{1}{2} \left( f(s, \Theta) - d_{\rm obs} \right)^T N^{-1} \left( f(s, \Theta) - d_{\rm obs} \right) - \frac{1}{2}\, s^T S^{-1}(\Theta')\, s   (28.7)

- \frac{1}{2} \log |S(\Theta')| + \log P(\Theta') + \log P(\Theta) + {\rm const}.   (28.8)
Usually we don’t care about the initial conditions (the “phases”) of the perturbations and only want to know the cosmological parameters. In this case we need to marginalize to get

P(\Theta, \Theta'|d) = \int ds\, P(s, \Theta, \Theta'|d).   (28.9)
The posterior Eq. (28.7) is not so different conceptually from the posterior of the power spectrum, but here our variables include the entire cosmological field. The reason it is even possible to write down a posterior PDF for the field is that we know the PDF of the initial conditions, and we can forward model the field to late times. Assuming that our forward model is correct, this analysis is statistically optimal, i.e. it includes all available information. The problem is to handle such a huge computational problem: the data d and initial conditions s can easily be 1000³-dimensional, and a normal MCMC would never converge. Fortunately there are techniques for extremely high dimensional inference. Before discussing inference, let’s get an overview of the forward models. Forward modelling does not have to be done with the galaxy density, of course; weak lensing or intensity mapping are also appealing targets.
There are different types of forward models that are being used, in particular:
• Perturbation theory at field level plus a bias expansion (1808.02002). This case is the most tractable, but of course cannot go beyond the regime of perturbation theory. It is still somewhat unclear to what extent a PT forward model can outperform the traditional analysis (2307.04706).
• Hybrid EFT (1910.07097), a combination of dark matter simulations and a bias expansion
in Lagrangian space.
Recall, however, that to draw a single sample of the posterior we need to call the forward model, so our forward model should be fast enough to be evaluated millions of times. Above we have not written out nuisance parameters of the forward model, and one can also include stochastic sampling to model galaxy formation, but our discussion captures the essential features. As a side note, these same forward models (minus the requirement of differentiability) can also be used to generate the SBI training data in Eq. (28.1).
Next to the forward model, we need an inference algorithm to approximate the posterior and/or sample from it. All of these require that the forward model be differentiable with respect to all parameters, including the initial conditions s. This is where auto-differentiation, for example Jax or pytorch, comes in. Inference algorithms that have been proposed include:
• Finding the MAP and making a Gaussian approximation around it for error bars (1706.06645). Finding the MAP by gradient descent is faster than sampling, but it is hard to get reliable error bars; a minimal sketch is given below.
• Hamiltonian Monte Carlo (HMC) (1203.3639). This is the most reliable but also the most computationally intensive approach. Recently a different, also promising, variant of Monte Carlo was used: 2307.09504.
In all of these cases it is difficult to deal with a multimodal posterior, which is expected at small scales. Even without that problem, it is difficult to generate enough independent samples and be sure the posterior surface is well covered. Also, not all parameters are created equal: it is hard, for example, to sample band powers of the initial power spectrum. While appealing in principle, it is still very hard to use this approach in practice on data, especially with a non-perturbative forward model. On the other hand, forward modeling can in principle strongly improve constraints by breaking parameter degeneracies present in N-point function analyses (2112.14645), and it is the only provably optimal approach.
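As a minimal illustration of gradient-based field-level inference, here is a sketch of MAP estimation for the posterior Eq. (28.7) at fixed Θ, assuming a differentiable forward model and diagonal covariances (all inputs hypothetical):

import jax
import jax.numpy as jnp

def log_posterior(s, d_obs, N_diag, S_diag, forward):
    chi2 = jnp.sum((forward(s) - d_obs) ** 2 / N_diag)   # likelihood term, Eq. (28.3)
    prior = jnp.sum(s**2 / S_diag)                        # Gaussian prior on s
    return -0.5 * (chi2 + prior)

grad_fn = jax.grad(log_posterior)   # gradient with respect to the field s

def find_map(s0, d_obs, N_diag, S_diag, forward, lr=0.1, steps=500):
    s = s0
    for _ in range(steps):
        s = s + lr * grad_fn(s, d_obs, N_diag, S_diag, forward)  # gradient ascent
    return s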
28.4 Generative Machine Learning at field or point cloud level
I want to briefly mention one more major area of machine learning research in cosmology: generative modeling at the field level. A generative model can emulate a simulation and be potentially much faster than the original simulation it was trained on. Making simulations (or emulations, if you prefer) is also possible without generative (probabilistic) modelling; for example, one can train a deterministic U-net to go from the initial conditions, which are very fast to generate, to the late-time matter distribution. Generative modeling, on the other hand, includes a step of random sampling, i.e. every time we run the machine learning model we get a different result.
The main machine learning models that can generate cosmological distributions such as δm (x)
at high resolution are
and all of them have been used in cosmology. One can model either density fields or displacement fields of particles. Very recently, people have also started to work with point cloud models (2311.17141), which don’t generate a field (image) but a set of points (galaxies). All three generative methods can also work with point clouds.
A main use for generative models is to speed up simulations. For example, one may be able to upgrade a low-resolution dark matter simulation to an emulated high-resolution hydro simulation, or populate a low-resolution dark matter simulation with realistic galaxies. Of course, to train all of these models one needs some high-resolution training simulations. The generation process can also be conditioned on cosmological parameters; such conditional generative models can also be run in inverse mode to estimate cosmological parameters. There is a rapidly growing literature in this field, and I have only cited a small subset thereof.