
Lecture Notes on

Computational Cosmology
Moritz Münchmeyer

Physics Department, University of Wisconsin-Madison

Version: June 27, 2024


Contents

1 Introduction

I Basics of Cosmology

2 Scales and Units

3 Expansion of the Universe
3.1 The metric
3.2 Flat expanding space-time
3.3 Curved expanding space-time
3.4 Redshift
3.5 Hubble’s law at close distance
3.6 Proper distance, Hubble’s law, Hubble distance, Hubble time

4 Dynamics of the Homogeneous Universe
4.1 Cosmological fluids and equation of state
4.2 Solving Einstein’s Equation
4.3 Continuity Equation
4.4 Friedmann equation
4.5 Solutions to the Friedmann equation for a single fluid in flat space
4.6 Critical density
4.7 General solution to the Friedmann equation
4.8 Lambda-CDM and its parameters
4.9 Matter-Radiation Equality
4.10 Numerical examples

5 Early Universe Thermodynamics
5.1 Overview
5.2 Statistical mechanics description of the universe
5.3 Thermal equilibrium
5.4 Beyond Equilibrium: The integrated Boltzmann equation
5.5 Beyond Homogeneity: The Einstein-Boltzmann equations

6 Inflation
6.1 The flatness problem
6.2 The horizon problem
6.3 Inflationary expansion
6.4 The field theory of inflation
6.5 The quantum field theory of inflation
6.6 Primordial perturbations from inflation

II Introduction to Computation and Statistics in Cosmology

7 From Initial Conditions to Observed Data

8 Overview of Observed Data

9 Random Fields in Cosmology
9.1 Random scalar fields in Euclidean space
9.2 Gaussian Random Fields
9.3 Power Law Power Spectra
9.4 Matter Power Spectrum and Boltzmann Codes
9.5 Random scalar fields in discrete coordinates
9.6 Power spectrum estimation

10 Basics of Statistics
10.1 Estimators
10.2 Likelihoods, Posteriors, Bayes Theorem
10.3 Gaussian Likelihoods
10.4 Using the likelihood and the posterior
10.5 Fisher forecasting
10.6 Sampling the posterior: MCMC
10.7 Other algorithms beyond MCMC
10.8 Goodness of fit
10.9 Model comparison

11 Analyzing an N-body simulation

III Cosmic Microwave Background

12 Random fields on the sphere
12.1 Spherical harmonics
12.2 2-point function
12.3 Discretization with HEALPix and Pixell
12.4 Projections of 3D random fields to the sphere
12.5 Power spectrum estimator and covariance
12.6 Flatsky coordinates

13 Primary CMB power spectrum
13.1 Transfer functions and line-of-sight solution
13.2 The physics of the CMB power spectrum

14 Analyzing the CMB power spectrum
14.1 Beam and Noise
14.2 Simple power spectrum estimator: Transfer function and bias
14.3 Mask and mode coupling
14.4 Pseudo-Cl estimator and PyMaster
14.5 Wiener filtering
14.6 Likelihood of the CMB
14.7 Tools to sample the CMB likelihood

15 Polarization and primordial B-modes

16 Primordial non-Gaussianity
16.1 Primordial bispectra
16.2 CMB bispectrum
16.3 Optimal estimator for bispectra
16.4 The separability trick

17 Secondary anisotropies: CMB lensing
17.1 CMB lensing potential
17.2 Lensed CMB map
17.3 Quadratic estimator for lensing
17.4 Physics with CMB lensing

18 Secondary anisotropies: Sunyaev-Zeldovich effect
18.1 Thermal SZ effect
18.2 Matched filter and tSZ stacking
18.3 Kinetic SZ effect

19 Foregrounds and foreground cleaning
19.1 Galactic foregrounds of the CMB
19.2 The ILC algorithm
19.3 Component separation

IV Large-Scale Structure

20 The galaxy power spectrum at linear scales
20.1 Linear galaxy bias
20.2 Shot noise
20.3 Velocity field on large scales
20.4 Redshift space
20.5 Redshift space distortions of the density field
20.6 Redshift space distortions of the galaxy power spectrum
20.7 Alcock–Paczynski effect
20.8 Redshift-binned angular correlation functions

21 Overview of LSS Perturbation Theory
21.1 Fluid approximation
21.2 Standard (Eulerian) Perturbation Theory
21.3 Lagrangian Perturbation Theory (LPT)

22 Effective Field Theory of Large-Scale Structure*
22.1 Problems with SPT
22.2 Coarse graining and effective fluid
22.3 EFTofLSS solution and renormalization
22.4 EFTofLSS matter power spectrum result
22.5 Application to iso-curvature perturbations
22.6 From dark matter to galaxies: The bias expansion
22.7 Application of the EFTofLSS to simulations and real data

23 N-body simulations
23.1 Equations for particles
23.2 Evaluating the potential
23.3 Baryonic simulations

24 Halos and Galaxies
24.1 Halos and halo mass profile
24.2 Halo mass function
24.3 Halo bias
24.4 Halo model

25 Analyzing a Galaxy Survey Power Spectrum
25.1 Power spectrum estimator
25.2 Covariance matrix estimation

26 Non-Gaussianity
26.1 Tightening measurements of cosmological parameters
26.2 Primordial non-Gaussianity

27 Galaxy Weak Lensing

28 Modern Inference Methods
28.1 Overview
28.2 Simulation-based Inference
28.3 Probabilistic Forward Modeling at Field Level
28.4 Generative Machine Learning at field or point cloud level
1 Introduction
This is a one-semester course on Computational Cosmology, aimed in particular at beginning graduate students working in cosmology. My goal in these lectures is to focus on the computational and statistical methods that data-oriented cosmologists need, while being brief on theoretical foundations that can be learned later in more detail. For example, topics that are not treated in any detail here are inflation, BBN and relativistic perturbation theory. These topics are typically covered in a course on theoretical cosmology. There are many excellent texts on these topics.
These lectures are primarily about the two most important data sources for cosmology: the Cosmic Microwave Background (CMB) and Large-Scale Structure (LSS) surveys. Of course there are many other useful probes of the universe, such as supernovae, strong lensing and gravitational waves. However, we won’t have time to cover these in any detail this semester.
For the CMB and LSS we will cover how cosmological parameters can be measured from data
using both the standard methods and more recent developments. My goal is to be broad rather
than detailed, and mention many of the techniques that all cosmologists should know. I also try
to provide good references for each section so you can dig deeper.
The course is organized as follows. In Unit 1, we discuss fundamentals of cosmology for those of you who are completely new to the field. If you have studied theoretical cosmology before, you can perhaps skip this unit. In Unit 2, we introduce common statistical and computational tools required for data analysis, such as random fields, correlation functions, likelihoods and MCMC. We finish this unit by analyzing an N-body simulation to measure its cosmological parameters. In Unit 3 we discuss the CMB, focusing on data analysis methods. We include brief discussions of several advanced topics, such as lensing and kSZ. In Unit 4, we study data analysis in large-scale structure. We discuss both the classic power spectrum analysis and more recent developments such as simulation-based inference.
Please check out my website https://munchmeyer.physics.wisc.edu/lecture-notes/ for
computational notebooks, links, and updated versions of this course. If you find any errors or
typos, please drop me an email so that I can fix them ([email protected]). If you use these
notes for teaching or learning, I’d also be very happy to hear about it. If you want to build on these notes, the LaTeX version is available on request.
I thank Utkarsh Giri for contributing the MCMC analysis of the Quijote Power Spectrum and
Sai Tadepalli for developing and teaching the section on EFTofLSS. I also thank Jacob Audette
for making the front page graphic.

Part I
Basics of Cosmology
In the first part of this course we will briefly review basics of cosmology. My main goal is to
introduce the coordinates as well as the physical parameters that we want to measure in data
later on.

Further reading
There are many excellent textbooks that go deeper into these foundations. This section is based primarily on the textbooks

• Daniel Baumann - Cosmology (2021), as well as his TASI lecture notes (arXiv: 0907.5424). Material similar to the textbook is also available here: http://cosmology.amsterdam/education/cosmology/.

• Dragan Huterer - A course in cosmology (2023)

I’m also using David Tong’s cosmology lecture notes from

• http://www.damtp.cam.ac.uk/user/tong/teaching.html

I recommend all of David’s lecture notes. Another popular textbook which we will use is

• Dodelson, Schmidt - Cosmology (2020).

2 Scales and Units


The universe is obviously enormous. In fact, while the observable universe is finite (due to
the finite age of the universe and the finite speed of light), it is not known whether the universe
is finite as a whole.
Cosmologists like to measure distances in parsec (pc). The conversion from pc to lightyears is

1 pc = 3.26 ly (2.1)

Parsecs are the typical distance of nearby stars. A parsec is equal to the distance at which 1 AU (astronomical unit – the average distance between Earth and the Sun) is seen at an angle of one arcsecond, which is 1/3600 of a degree. The size of our galaxy is more conveniently given in kiloparsec (kpc). The Milky Way is about 30 kpc in diameter, and we are about 8 kpc from the center. The distance to other galaxies is usually given in megaparsec (Mpc). The nearest spiral galaxy, Andromeda, is about 1 Mpc away from us. The comoving distance (we’ll explain the term “comoving” soon) to the edge of the observable universe is about 14.3 Gpc.
An order of magnitude estimate is that the observable universe contains about 100 billion
galaxies and a typical galaxy contains about 100 billion stars. There is no reason to believe that
our galaxy or star are particularly special in the cosmological sense.
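As a quick numerical check of the conversion in Eq. (2.1), here is a minimal sketch in plain Python (the constants below are standard values, not taken from these notes), computing a parsec directly from its definition:

    import numpy as np

    # A parsec is the distance at which 1 AU subtends an angle of one arcsecond.
    AU = 1.495978707e11                 # astronomical unit in meters
    LY = 9.4607e15                      # light year in meters
    arcsec = np.deg2rad(1.0 / 3600.0)   # one arcsecond in radians

    pc = AU / np.tan(arcsec)            # exact form of the small-angle definition
    print(f"1 pc = {pc:.4e} m = {pc / LY:.3f} ly")   # ~3.086e16 m, ~3.26 ly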

3 Expansion of the Universe
In this section we want to understand the equations that govern the evolution of the entire
universe. Cosmology can be understood in two steps:

• On large scales (i.e. after smoothing out small-scale irregularities such as galaxies), the universe is uniform. By studying its average contents we can understand the background expansion of the universe. The Cosmological Principle states: On the largest scales, the universe is spatially homogeneous and isotropic.

• On smaller scales, there are initially small and later very large inhomogeneities (such as
galaxies). The evolution of these cosmological perturbations on top of the background
expansion is much more complicated, but tells us much of what we know about the universe.

Following the standard practice of cosmology courses, we will first discuss the uniform large-scale
universe and then later discuss perturbations.
To start describing the universe mathematically we first need to define coordinates. A crucial feature of cosmology is that space-time cannot be treated statically, because the universe expands very substantially during its history.
A large part of theoretical and computational cosmology can be done by assuming that the uni-
verse is flat. To date there is no experimental evidence for any curvature of space on large scales.
We thus focus on flat expanding space-time and are brief on the generalization to curvature.

3.1 The metric


We can define a space-time through the so-called metric. The metric can be thought of as a mathematical object that turns coordinate distances (which are a matter of coordinate choice) into physical distances (which are invariant and thus physically meaningful). For 3-dimensional Euclidean space the physical distance dl is related to the Euclidean coordinate distance (dx, dy, dz) by

dl^2 = dx^2 + dy^2 + dz^2 = \sum_{i,j=1}^{3} \delta_{ij} \, dx^i dx^j ,    (3.1)

where the Kronecker delta δij = diag(1, 1, 1) is the metric. If we were to use spherical coordinates instead we would get

dl^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta \, d\phi^2 = \sum_{i,j=1}^{3} g_{ij} \, dx^i dx^j ,    (3.2)

where (x1 , x2 , x3 ) = (r, θ, ϕ) and the metric is gij = diag(1, r2 , r2 sin2 θ).
Since Einstein we know that physics is really happening in space-time and that distances in
time and space are not independently invariant. We instead need a metric that turns space-time
coordinates xµ = (ct, xi ) into the invariant space-time distance (also called invariant line
element)

ds^2 = \sum_{\mu,\nu=0}^{3} g_{\mu\nu} \, dx^\mu dx^\nu \equiv g_{\mu\nu} \, dx^\mu dx^\nu .    (3.3)

In the specific case of special relativity where space-time is not curved (Minkowski space),
using Euclidean coordinates, this line element is

ds^2 = -c^2 dt^2 + \sum_{i,j=1}^{3} \delta_{ij} \, dx^i dx^j    (3.4)

     = -c^2 dt^2 + d\mathbf{x}^2    (3.5)

and the Minkowski metric is gµν = diag(−1, 1, 1, 1). Recall from special relativity that ds² can be positive, negative or zero (null).
In general relativity, the metric depends on the position in space-time, gµν(t, x). The metric is of course coordinate dependent. To give a physical description of curvature that is independent of the choice of coordinates one needs to use the formalism of Riemannian geometry. In this course we won’t need much of that.

3.2 Flat expanding space-time


The space-time metric of cosmology, assuming a flat space (but not flat space-time) is a simple
generalization of Minkowski space, where we scale the spatial part of the metric with the time
dependent scale factor a(t):

ds2 = −c2 dt2 + a(t)2 dr2 (3.6)

The spatial coordinates r are called the comoving coordinates. The comoving coordinate of an object does not change due to the expansion of space-time. The comoving coordinate system expands with space-time, as illustrated in Fig. 1. In computational cosmology we usually work with comoving coordinates (e.g. comoving galaxy positions in an N-body simulation). The scale factor is usually defined to be equal to 1 today, a(ttoday) = 1. To define the coordinates we also need to set some origin O where r = 0 and t = 0.
We also define the physical coordinates r_phys(t) = a(t) r(t). If an object has a trajectory r(t) in comoving coordinates and r_phys = a(t) r in physical coordinates, the physical velocity of the object is

v_{\rm phys} \equiv \frac{d r_{\rm phys}}{dt} = \frac{da}{dt} r + a(t) \frac{dr}{dt} \equiv H(t) \, r_{\rm phys} + v_{\rm pec} ,    (3.7)

where we have introduced the Hubble parameter

H \equiv \frac{\dot{a}}{a}    (3.8)

and the peculiar velocity

v_{\rm pec} \equiv a(t) \dot{r}    (3.9)

Figure 1. Comoving coordinate grid on an expanding spacetime.

The first term H r_phys is the Hubble flow, which is the physical velocity of the object due to the expansion of space between the origin and the object. This expression is a version of Hubble’s law (though not the original one where H is time-independent). The second term, the peculiar velocity, describes the motion of the object relative to the cosmological rest frame. Typical peculiar velocities of galaxies are hundreds of km/s, so β = v/c ≃ 10⁻³. The present-day value of the Hubble parameter is¹

H0 ≃ 67.8 km s−1 Mpc−1    (3.10)

This is telling us that a galaxy 1 Mpc away will be seen to be receding at a speed of 67.8 km/s due to the expansion of space. Galaxies that are farther away than a few Mpc thus have a larger recession speed due to the Hubble flow than due to their peculiar velocities. A common parametrization of the Hubble constant is given by introducing h so that H0 = 100 h km s−1 Mpc−1, with h ≈ 0.678.
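To make the comparison between the Hubble flow and peculiar velocities concrete, here is a minimal numerical check (plain Python, using the value from Eq. (3.10)):

    H0 = 67.8   # km/s/Mpc, Eq. (3.10)
    for d in [1.0, 10.0, 100.0]:                       # distances in Mpc
        print(f"d = {d:6.1f} Mpc  ->  v = {H0 * d:8.1f} km/s")
    # Beyond a few Mpc the Hubble flow exceeds typical peculiar
    # velocities of galaxies (hundreds of km/s).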
It is often useful to write the metric in polar coordinates:

ds^2 = -c^2 dt^2 + a^2(t) \left[ dr^2 + r^2 d\Omega^2 \right]    (3.11)

where

d\Omega^2 = d\theta^2 + \sin^2\theta \, d\phi^2    (3.12)

is the metric on the unit two-sphere. This metric is useful to describe observations by an observer
at the coordinate center of the universe. The radial coordinate r is called the comoving distance
to the origin.
A further way to write the metric is by introducing conformal time
d\eta = \frac{dt}{a(t)}    (3.13)

¹ You may notice that I am primarily a CMB cosmologist (see the “Hubble tension”).

Conformal time slows down with the expansion of the universe. The metric is then

ds^2 = a^2(\eta) \left[ -c^2 d\eta^2 + dr^2 + r^2 d\Omega^2 \right]    (3.14)

The scale factor is now a time-dependent overall factor in front of a static metric. Conformal coordinates are especially useful to analyze light rays and causality.

3.3 Curved expanding space-time

The generalization of the line element (3.11) to curved space-time is

ds^2 = -c^2 dt^2 + a^2(t) \left[ \frac{dr^2}{1 - kr^2/R_0^2} + r^2 d\Omega^2 \right]    (3.15)

This general form is called the Friedmann-Lemaitre-Robertson-Walker (FLRW) metric.


Here the constant is k = 0 for flat space, k = 1 for positively curved space and k = −1 for
negatively curved space. R0 is the curvature scale, which defines how strongly the space
is curved. The three spaces are the three maximally symmetric three-spaces, i.e. they are
homogeneous and isotropic. This metric is derived in most textbooks. Note that the FLRW metric is not invariant under Lorentz transformations. This means that the universe picks out a preferred rest frame, described by comoving coordinates (physically, the matter content breaks invariance under Lorentz boosts). A closed universe (positive curvature) has a finite volume and the angles of a triangle add up to more than 180° (like on a sphere). An open universe (negative curvature) has infinite volume and the angles of a triangle add up to less than 180°.
This metric is sometimes written by defining the radial coordinate

d\chi \equiv \frac{dr}{\sqrt{1 - kr^2/R_0^2}}    (3.16)

which can be integrated to obtain r = S_k(χ) where

S_k(\chi) \equiv R_0 \begin{cases} \sinh(\chi/R_0), & k = -1 \\ \chi/R_0, & k = 0 \\ \sin(\chi/R_0), & k = +1 \end{cases}    (3.17)

The metric is then

ds^2 = -c^2 dt^2 + a^2(t) \left[ d\chi^2 + S_k^2(\chi) \, d\Omega^2 \right]    (3.18)

Note that for flat space-time (k = 0) there is no difference between the comoving distance r and the radial coordinate χ. Finally, using again conformal time, this metric can be written as

ds^2 = a^2(\eta) \left[ -c^2 d\eta^2 + d\chi^2 + S_k^2(\chi) \, d\Omega^2 \right]    (3.19)
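Eq. (3.17) is straightforward to code up; a minimal sketch (the function name and the default R0 are my choices):

    import numpy as np

    def S_k(chi, k, R0=1.0):
        """Comoving angular distance r = S_k(chi), Eq. (3.17)."""
        if k == -1:
            return R0 * np.sinh(chi / R0)   # open universe
        elif k == 0:
            return chi                      # flat: r and chi coincide
        else:
            return R0 * np.sin(chi / R0)    # closed universe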

3.4 Redshift
The expansion of the universe means that light rays which travel through the universe also
get stretched. This is called cosmological redshift. We define the dimensionless redshift
parameter

z = \frac{\lambda_0 - \lambda_1}{\lambda_1} = \frac{f_1 - f_0}{f_0}    (3.20)

where λ0 is the observed wavelength and λ1 is the emitted wavelength. It turns out that the ratio of wavelengths scales as the ratio of the scale factors:

\frac{\lambda_0}{\lambda_1} = \frac{a(t_0)}{a(t_1)}    (3.21)
This result is intuitive: the photon wave is stretched with the expansion of space. Let’s derive
this result starting from the metric. In general relativity, light rays travel along null geodesics,
meaning that ds = 0. A light ray on a radial direction (with fixed θ and ϕ) will thus obey

c dt = ±a(t)dχ (3.22)

where the minus sign describes light moving towards us (i.e. as t gets larger χ gets smaller).
Thus we have

\frac{c \, dt}{a(t)} = \pm d\chi    (3.23)
Let’s consider a crest of the light wave to be emitted at time t1 from distance χ1 . We observe
the crest at time t0 at position χ0 = 0. Thus we get the integral equation
\int_{t_1}^{t_0} \frac{c \, dt}{a(t)} = \int_0^{\chi_1} d\chi    \qquad \text{(first crest)}    (3.24)

The next crest of the wave is emitted at time t1 + δt1 and received at time t0 + δt0 . Thus the
integral equation is
\int_{t_1+\delta t_1}^{t_0+\delta t_0} \frac{c \, dt}{a(t)} = \int_0^{\chi_1} d\chi    \qquad \text{(second crest)}    (3.25)

The right hand sides of these two relations are the same, and thus we have

\int_{t_1}^{t_0} \frac{dt}{a(t)} = \int_{t_1+\delta t_1}^{t_0+\delta t_0} \frac{dt}{a(t)}    (3.26)
from which it follows that

\int_{t_1}^{t_1+\delta t_1} \frac{dt}{a(t)} = \int_{t_0}^{t_0+\delta t_0} \frac{dt}{a(t)} .    (3.27)

Because a(t) does not change significantly in a single tick δt1 or δt0 it follows that

\frac{\delta t_1}{a(t_1)} = \frac{\delta t_0}{a(t_0)}    (3.28)

The time difference between two wave crests is δt = λ/c, with c the same at emission and reception, and thus we confirm that

\frac{\lambda_0}{\lambda_1} = \frac{a(t_0)}{a(t_1)} .    (3.29)

The same result can be derived more formally by considering massless particles in General Relativity (see Baumann’s book Sec 2.2). With the scale factor normalized to a(t0) = 1 we find from Eq. (3.20) that

1 + z = \frac{1}{a(t_1)}    (3.30)

Redshifts of galaxies are roughly in the range 0 < z < 10 (at higher redshifts no galaxies have had time to form since the big bang) and the redshift of the CMB is about z = 1100. For example, a galaxy at redshift 2 is observed when the universe was 1/3 of its current size.
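Relations like Eq. (3.30) are also built into standard libraries. A small sketch using AstroPy’s bundled Planck 2018 cosmology (assuming astropy is installed; its parameter values differ slightly from the ones quoted in these notes):

    from astropy.cosmology import Planck18

    z = 2.0
    a = 1.0 / (1.0 + z)                # Eq. (3.30)
    print(f"z = {z}: a = {a:.3f}")     # universe was 1/3 of its current size
    print(f"age of the universe at z = 2: {Planck18.age(z):.2f}")
    print(f"lookback time to z = 2:      {Planck18.lookback_time(z):.2f}")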

3.5 Hubble’s law at close distance


To connect to Hubble’s law we Taylor expand the scale factor at some time t close to the present
time t0 ,
a(t) = a(t0 ) + ȧ|t=t0 (t − t0 ) + · · · = a(t0 )[1 + H0 (t − t0 ) + · · · ] (3.31)
where H0 ≡ (ȧ/a)|t=t0 is the Hubble parameter today, which we call the Hubble constant even though it is not constant in time, and t0 − t is called the lookback time. The first order Taylor expansion is valid for nearby sources. Using (3.30) and inverting (3.31) using the binomial series
(1 + x)α ≈ 1 + αx we find
z = H0 (t0 − t1 ) + . . . . (3.32)
In the non-relativistic limit of the Doppler shift of light, z = \sqrt{\frac{1+v/c}{1-v/c}} - 1, we have z = v/c, and this relation can be used at any velocity to define the so-called redshift velocity. We also define a distance d to the object as d = c(t0 − t1). Using these approximations we find the Hubble-Lemaitre law

v = zc = H0 d. (3.33)

which is the famous linear relation between distance and recession velocity found by Hubble.
This relation is valid for z ≪ 1. In the next section we describe the Hubble law valid at any
distance.

3.6 Proper distance, Hubble’s law, Hubble distance, Hubble time


So far we have discussed coordinates; now we discuss the so-called proper distance or instantaneous physical distance. This is the distance between two objects at a given fixed time
(imagine stopping the expansion of the universe and going there with a meter stick). Recall that
in special relativity the proper length of an object is the distance between the two end points
in space in a reference frame where both of these are at rest. The proper distance between two
spacelike-separated events is the distance between the two events, as measured in an inertial

frame of reference in which the events are simultaneous. Here our frame of reference is the one
provided by the global FLRW metric.
The proper distance dp between the origin and a point at coordinate (r, θ, ϕ) at fixed time t is

d_p = \int ds = \int_0^\chi a(t) \, d\chi' = a(t) \, \chi    (3.34)

The proper distance and its time derivative appear in the Hubble law:

\dot{d}_p = \dot{a} \chi = \frac{\dot{a}}{a} \, a\chi = H(t) \, d_p    (3.35)
The Hubble law in flat space-time discussed above in Eq. 3.7 agrees with this expression.
The Hubble distance (or Hubble radius) is defined as the distance where the recession
velocity of an object without peculiar velocity becomes equal to the speed of light. This is the
case when

d˙p = H0 dp = c (3.36)

and thus

d_H(t_0) = \frac{c}{H_0}    (3.37)

For H0 = 67.8 km s−1 Mpc−1 this gives dH = 4420 Mpc. This is the distance of galaxies that are currently receding at the speed of light (not at the time when their light was emitted).
Note that the Hubble constant has units of inverse time. Another common definition is thus the Hubble time

t_H = \frac{1}{H_0} = \frac{a}{\dot{a}} = 4.45 \times 10^{17} \, {\rm s} = 14.4 \, {\rm Gyr}    (3.38)
This turns out to be pretty close to the age of the universe, which is somewhat accidental. The
Hubble time is the age the universe would have, if the expansion had been linear, which is not
the case as we shall soon see.
Measuring the Hubble parameter directly is difficult. One needs to measure both the distance of objects as well as their recession speed, which are not directly observable. A primary tool to do this is Type Ia supernovae (SN Ia), but we won’t be covering this method this semester. However, the Hubble parameter can also be measured somewhat more indirectly with the CMB and with LSS.
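Both the Hubble distance and the Hubble time are easy to reproduce with AstroPy’s unit machinery; a minimal sketch:

    from astropy import units as u
    from astropy.constants import c

    H0 = 67.8 * u.km / u.s / u.Mpc
    d_H = (c / H0).to(u.Mpc)        # Hubble distance, Eq. (3.37)
    t_H = (1 / H0).to(u.Gyr)        # Hubble time, Eq. (3.38)
    print(f"Hubble distance: {d_H:.0f}")    # ~4420 Mpc
    print(f"Hubble time:     {t_H:.1f}")    # ~14.4 Gyr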

4 Dynamics of the Homogeneous Universe


We have seen that the metric of a homogeneous isotropic universe (one with the same matter content everywhere in space) is the FLRW metric. The free quantity that we need to determine is the evolution of the scale factor a(t), which depends on the matter and energy content of the universe. These calculations require both general relativity and thermodynamics. Since this is not the focus of the present course, we will only discuss the results.

4.1 Cosmological fluids and equation of state
According to the cosmological principle, we want to consider homogeneous and isotropic contents
of the universe, which are called cosmological fluids and specified by:

• energy density ρ(t). This has units of energy per volume, and thus E^4 in natural units.

• pressure P(t). This is the flux of momentum across a surface of unit area (which is equivalent to force per area if there were a wall). The units are also E^4 in natural units. Note that positive pressure leads to gravitational attraction (i.e. it wants to contract the universe), not to expansion as would be the case for a balloon. This is because the kinetic energy of the particles contributes to the positive energy density, which attracts gravitationally.

The relation between the energy density and pressure, P = P(ρ), is called the equation of state of the fluid. Note that in GR both energy density and pressure gravitate, i.e. they are part of the energy-momentum tensor. The equation of state is calculated in (relativistic) thermodynamics. The two main forms of cosmological fluids in the universe are non-relativistic particles, also called dust or simply matter, and relativistic particles such as photons, also called radiation.
The cosmological fluid is made up of particles which obey the relativistic relation

E 2 = p2 c2 + m2 c4 (4.1)

The two fluids come from considering this equation in its two limits:

• Non-relativistic particles: pc ≪ mc². Here the energy is dominated by the mass, E ≈ mc², and the velocity of the particles is v ≈ p/m. This is true for example for galaxies and galactic dust.

• Relativistic particles: pc ≫ mc². Here the energy is dominated by the momentum, E ≈ pc, and the velocity of the particles approaches the speed of light, |v| ≈ c. This is true for photons at any time, as well as for massive particles at early times, depending on the temperature of the universe. As we will discuss, the temperature in the past was T = T0/a(t) (where T0 ≃ 2.75 K) and the energy of particles in thermodynamic equilibrium is of course ⟨E⟩ = kB T.

For non-relativistic particles, the equation of state parameter is

w = \frac{P}{\rho} \approx 0    \qquad \text{(matter)}    (4.2)
which means that the pressure of matter is negligible compared to the energy density in its mass.
For example, even though the pressure of the gas in our atmosphere is not strictly zero, the
kinetic energy in the gas molecules is far lower than the energy in their rest mass, and thus the
pressure contributes almost nothing to gravity.
On the other hand, for radiation (including photons, relativistic neutrinos and gravitational waves), the equation of state is

w = \frac{P}{\rho} = \frac{1}{3}    \qquad \text{(radiation)}    (4.3)

The factor 1/3 comes from 3-dimensional space.
Finally we need the equation of state parameter of dark energy, which is

w = \frac{P}{\rho} = -1    \qquad \text{(dark energy)}    (4.4)
We’ll talk more about this exotic substance later. The key point is that this substance has a
negative pressure that leads to gravitational repulsion.

4.2 Solving Einstein’s Equation


In this course we will not derive the equations that govern the evolution of the scale factor, which
are the continuity equation and the Friedmann equation. These are derived in many textbooks,
such as the one from Baumann. One starts with the field equation of general relativity, the
Einstein equation. It plays a role similar to Maxwell’s equations in electrodynamics and is
given by
G_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}    (4.5)
where Gµν is called the Einstein tensor which can be expressed as a function of the metric gµν
and its derivatives and defines the curvature of space.
On the right side of the equation is the energy-momentum tensor Tµν (also called the
stress-energy tensor). The energy-momentum tensor is the source term for the curvature, similar
to electric charge in the Maxwell equations. Both energy density and pressure are sources of
curvature.
For a perfect fluid, in the frame of the comoving observer, the energy-momentum tensor is given by

T^\mu{}_\nu = \begin{pmatrix} -\rho c^2 & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix} .    (4.6)
By plugging this tensor into the Einstein equation, one can derive the equation of motion of the scale factor, the Friedmann equation. There is also a conservation law for the energy-momentum tensor given by

\nabla_\mu T^\mu{}_\nu = 0    (4.7)

where ∇ is the so-called covariant derivative. This equation implies the continuity equation we discuss below.
Some books also motivate the Friedmann equation and the continuity equation by arguments
from Newtonian physics, to avoid GR. This is done in the book by Huterer and the lecture notes
by Tong. I encourage you to check these out.

4.3 Continuity Equation


The first equation of the dynamics of the universe which we will study is the so-called continuity
equation
ρ̇ + 3H(ρ + P ) = 0 (4.8)

This is the expression of energy conservation in a cosmological setting. Note however that energy
is a subtle concept in cosmology, due to the broken time translation invariance.
Using the equation of state P = wρ and assuming a single substance with given w we get
\frac{\dot{\rho}}{\rho} = -3(1+w) \frac{\dot{a}}{a} = -3(1+w) H    (4.9)
To find the relation between ρ and a we can integrate this equation to get:

\log\left(\frac{\rho}{\rho_0}\right) = -3(1+w) \log\left(\frac{a}{a_0}\right)    (4.10)

and thus
ρ(a) = ρ0 a−3(1+w) (4.11)
where we’ve used the fact that a(t0 ) = 1 and where ρ0 is the density today (i.e. at a = 1).
Using their equation of state we find the following scalings for our three substances:

• Matter (w = 0):

\rho_m \propto \frac{1}{a^3}    (4.12)

This is just the dilution with the volume, which grows as V ∝ a³.

• Radiation (w = 1/3):

\rho_r \propto \frac{1}{a^4}    (4.13)

Radiation is not only diluted with the volume; in addition there is a linear redshift effect on the wavelength and thus on the energy E = hc/λ.

• Dark energy (w = −1):

\rho_\Lambda = {\rm const.}    (4.14)

Dark energy has a constant energy density. It does not dilute with the expansion of space. A
universe where ρΛ ̸= 0 will always ultimately be dominated by dark energy. There are also
more complicated dark energy models where dark energy is not the cosmological constant.
In these, the equation of state can deviate from w = −1 and can also be time dependent.
So far there is no evidence for such models.

The different substances dilute differently with an expanding universe, and thus their mutual
importance changes. This is a crucial result in cosmology. In addition, note that total energy is
not conserved. This is due to the broken time translation invariance in an expanding universe.
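The scalings in Eqs. (4.12)–(4.14) all follow from Eq. (4.11); a minimal numerical illustration (plain Python):

    def rho(a, w, rho0=1.0):
        """Energy density scaling rho(a) = rho0 * a**(-3*(1+w)), Eq. (4.11)."""
        return rho0 * a ** (-3.0 * (1.0 + w))

    a = 0.5   # universe at half its current size
    for name, w in [("matter", 0.0), ("radiation", 1.0 / 3.0), ("dark energy", -1.0)]:
        print(f"{name:12s}: rho(a=0.5)/rho0 = {rho(a, w):5.1f}")
    # matter: 8.0, radiation: 16.0, dark energy: 1.0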

4.4 Friedmann equation
The continuity equation tells us ρ(a) but it is not enough to determine a(t) or ρ(t) for a given
collection of homogeneous fluids. For this we need the famous Friedmann Equation. The
dynamics of the scale factor is dictated by the energy density ρ(t) through the Friedmann equation
H^2 \equiv \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3c^2} \rho - \frac{kc^2}{R_0^2 a^2}    (4.15)

where R0 is the curvature scale, and, as in the FLRW metric, k is either -1, 0, or +1 determining
the curvature of space, and G is Newton’s gravitational constant given by

G ≈ 6.67 × 10−11 m3 kg−1 s−2 (4.16)

The Friedmann Equation, continuity equation, and equation of state together form a closed set
of equations that determines the background evolution of the universe.
By taking the time derivative of the Friedmann equation and using the continuity equation one
can derive a further useful equation which is called the acceleration equation or the second
Friedmann Eq. or the Raychaudhuri equation. It gives the acceleration rate of the scale
factor as
\frac{\ddot{a}}{a} = -\frac{4\pi G}{3c^2} (\rho + 3P)    (4.17)
4.5 Solutions to the Friedmann equation for a single fluid in flat space
The Friedmann equation is easy to solve if we consider a flat universe k = 0 with only a single
type of fluid. From the continuity Eq. we had

ρ(t) = ρ0 a−3(1+w) (4.18)

Plugging this into the Friedmann equation for k = 0 we get

\left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3c^2} \frac{\rho_0}{a^{3(1+w)}} = \frac{D^2}{a^{3(1+w)}}    (4.19)

where we defined the constant D^2 = \frac{8\pi G \rho_0}{3c^2}. To solve this equation we take the square root

\frac{\dot{a}}{a} = \frac{D}{a^{\frac{3}{2}(1+w)}}    (4.20)

and then integrate this equation:

\int_0^a da' \, a'^{\frac{1}{2}(1+3w)} = D \int_0^t dt'    (4.21)

where we picked time t = 0 to be the time of the big bang where a(t = 0) = 0. This leads to

\frac{2}{3(1+w)} \, a^{\frac{3}{2}(1+w)} = D \, t    (4.22)

The common convention is that today, at time t0, the scale factor is a(t0) = 1. This time is given by

t_0 = \left[\frac{3}{2} D (1+w)\right]^{-1}    (4.23)

Plugging this definition of t0 into our solution we can write it as

a(t) = \left(\frac{t}{t_0}\right)^{2/(3+3w)}    (4.24)
Let’s now consider the three types of fluids:
• Matter: For w = 0 we get

a(t) = \left(\frac{t}{t_0}\right)^{2/3}    (4.25)

This is known as the Einstein-de Sitter universe. It can be used to approximate our current universe if we neglect dark energy. In this universe the Hubble constant today is

H_0 = \left.\frac{\dot{a}}{a}\right|_{a=1} = \frac{2}{3} \frac{1}{t_0}    (4.26)

With H0 ≃ 70 km s−1 Mpc−1 this gives an age of the universe of

t_0 \simeq 10^{10} \, {\rm yrs}    (4.27)

It turns out that there are stars that are older than that, which shows that our universe does not contain only matter.

• Radiation: For w = 1/3 we get

a(t) = \left(\frac{t}{t_0}\right)^{1/2}    (4.28)

and the relation between H0 and t0 is now

t_0 = \frac{1}{2} H_0^{-1}    (4.29)
• Dark energy: The dark energy density (or vacuum energy density) ρΛ is related to the so-called cosmological constant by

\rho_\Lambda = \frac{\Lambda c^2}{8\pi G}    (4.30)

so that the Friedmann equation with k = 0 reads

H^2 = \frac{\Lambda}{3}    (4.31)

and thus

\frac{\dot{a}}{a} = \sqrt{\frac{\Lambda}{3}}    (4.32)

which is solved by

a(t) = A \exp\left(\sqrt{\Lambda/3} \; t\right)    (4.33)

This shows that for constant energy density we get exponential expansion. This space-time is called de Sitter space, and is used in inflation. Note that our previous calculation Eq. (4.24) fails here because there is no time when a = 0 (no big bang).
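One can verify these analytic solutions by integrating Eq. (4.20) numerically; a sketch for the matter case (units chosen so that t0 = 1, with D fixed by Eq. (4.23)):

    import numpy as np
    from scipy.integrate import solve_ivp

    w, t0 = 0.0, 1.0
    D = 1.0 / (1.5 * (1.0 + w) * t0)                  # Eq. (4.23)

    # da/dt = D * a**(-(1+3w)/2), i.e. Eq. (4.20); start just after the big bang.
    rhs = lambda t, a: [D * a[0] ** (-0.5 * (1.0 + 3.0 * w))]
    t_start = 1e-6
    sol = solve_ivp(rhs, [t_start, 1.0], [t_start ** (2.0 / 3.0)],
                    dense_output=True, rtol=1e-8)

    for t in [0.25, 0.5, 1.0]:
        print(f"t = {t}: numeric a = {sol.sol(t)[0]:.4f}, "
              f"analytic a = {(t / t0) ** (2.0 / 3.0):.4f}")   # Eq. (4.25)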

4.6 Critical density
From the Friedmann equation Eq. (4.15), there is a certain density ρ for which the universe would be flat, i.e. k = 0:

\rho_{\rm crit} = \frac{3 c^2 H^2}{8\pi G}    (4.34)
The Hubble parameter is time dependent, so the critical density also varies. Today, the critical energy density is

\rho_{\rm crit,0} = \frac{3 c^2 H_0^2}{8\pi G}    (4.35)
and the critical mass density is thus

\frac{\rho_{\rm crit,0}}{c^2} \simeq 0.85 \times 10^{-26} \, {\rm kg \, m^{-3}}    (4.36)

which is about five hydrogen atoms per cubic meter. The subscript 0 for today is often dropped in the literature.
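A quick way to verify Eq. (4.36) with AstroPy (the coefficient depends mildly on the assumed H0):

    import numpy as np
    from astropy import units as u
    from astropy.constants import G, m_p

    H0 = 67.8 * u.km / u.s / u.Mpc
    rho_crit = (3 * H0**2 / (8 * np.pi * G)).to(u.kg / u.m**3)   # Eq. (4.35) / c^2
    print(rho_crit)                          # ~8.6e-27 kg/m^3
    print((rho_crit / m_p).to(u.m**-3))      # ~5 proton masses per cubic meter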
A very useful and common definition is the density of all fluids together relative to the critical density, called the density parameter:

\Omega_{\rm TOT}(t) = \frac{\rho_{\rm TOT}(t)}{\rho_{\rm crit}(t)}    (4.37)

Our constraints on Ω today are roughly ΩT OT = 0.999 ± 0.002. Of course, curvature could
still exist but be smaller than that. Before the discovery of dark energy, several measurements
pointed to Ω ∼ 0.3.
It’s important to note that a flat universe will remain flat forever. To see this we re-write the Friedmann equation as

1 - \Omega_{\rm TOT}(t) = -\frac{k c^2}{R_0^2 a^2 H^2}    (4.38)

from which for k = 0 it follows that ΩTOT(t) = 1.
However flatness is dynamically unstable. If the density is just slightly above or below the
critical density, the universe will become more curved quickly. This poses the question of why our
universe was so flat to begin with that it is still flat today. This can be explained by cosmological
inflation as we will discuss later.

4.7 General solution to the Friedmann equation


We now reinstate the curvature term in the Friedmann equation and consider a mix of fluids. The Friedmann equation is then

H^2 = \frac{8\pi G}{3c^2} \sum_{w=m,r,\Lambda} \rho_w - \frac{k c^2}{R_0^2 a^2}    (4.39)

The three fluids have individual density parameters:

\Omega_m = \frac{\rho_m(t)}{\rho_{\rm crit}(t)}, \qquad \Omega_r = \frac{\rho_r(t)}{\rho_{\rm crit}(t)}, \qquad \Omega_\Lambda = \frac{\rho_\Lambda(t)}{\rho_{\rm crit}(t)}    (4.40)

It is useful to write the curvature on an equal footing with the fluids by defining

\rho_k = -\frac{3 k c^4}{8\pi G R_0^2 a^2}    (4.41)

with density parameter

\Omega_k = \frac{\rho_{k,0}}{\rho_{\rm crit,0}} = -\frac{k c^2}{R_0^2 H_0^2}    (4.42)

We can then write the Friedmann equation as

\frac{H^2}{H_0^2} = \frac{\Omega_r}{a^4} + \frac{\Omega_m}{a^3} + \frac{\Omega_k}{a^2} + \Omega_\Lambda    (4.43)

The Ω are related as

\Omega_k = 1 - \Omega_m - \Omega_r - \Omega_\Lambda \equiv 1 - \Omega_{\rm TOT}    (4.44)
Recall that ΩT OT > 1 for the closed universe case and ΩT OT < 1 for the open universe case.
We can now calculate a(t) for an arbitrary universe. Taking the square root of the Friedmann equation,

\frac{da}{a \, dt} = H_0 \sqrt{\Omega_r a^{-4} + \Omega_m a^{-3} + \Omega_k a^{-2} + \Omega_\Lambda}    (4.45)

which we can integrate to obtain

t(a) = H_0^{-1} \int_0^a \frac{da'}{\sqrt{\Omega_r a'^{-2} + \Omega_m a'^{-1} + \Omega_k + \Omega_\Lambda a'^2}}    (4.46)

which can be evaluated numerically. The age of the universe today is given by setting a = 1. Note that Ωm, Ωr and ΩΛ change over time, but in a flat universe ΩTOT = 1 does not change, as we saw above. In the same way a closed universe stays closed and an open universe stays open (ΩTOT changes but not its sign).
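Eq. (4.46) is a one-liner with scipy; a sketch with rough Planck-like parameter values (my choice, for a flat universe):

    import numpy as np
    from scipy.integrate import quad

    H0 = 67.8                                    # km/s/Mpc
    Om, Or = 0.31, 8.4e-5
    OL = 1.0 - Om - Or                           # flat universe: Omega_k = 0

    integrand = lambda a: 1.0 / np.sqrt(Or / a**2 + Om / a + OL * a**2)
    t0_hubble_units, _ = quad(integrand, 0.0, 1.0)       # t(a=1) in units of 1/H0

    gyr_per_hubble_time = 977.8 / H0             # 1/H0 in Gyr for H0 in km/s/Mpc
    print(f"t0 = {t0_hubble_units * gyr_per_hubble_time:.2f} Gyr")   # ~13.8 Gyr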

4.8 Lambda-CDM and its parameters


The main success of cosmology is to establish the so-called standard model of cosmology, or Lambda-CDM model (we will also write ΛCDM). This model fits an amazing amount of different cosmological observations with stunning efficiency. In fact, only 6 parameters are required to fit all this cosmological data (we don’t include known physical constants such as masses of known particles in the counting). The Lambda-CDM (“Lambda cold dark matter”) model has three components: (cold, non-relativistic) matter, radiation and dark energy. We have already met 3 of the 6 ΛCDM parameters. Their current best-fit values from the Planck CMB satellite are

• Matter density Ωm = 0.310 ± 0.007. This is the combined density of cold dark matter
and baryons.

• Baryon density ΩB h2 = 0.0224 ± 0.0002 (i.e. ΩB ≈ 0.05). Baryons are a part of matter
(the rest being dark matter ΩCDM ≈ 0.26). We need both of these components to fit
observations for reasons that we will discuss later.

• Hubble constant H0 = (67.9 ± 0.7) km s−1 Mpc−1 . There is currently a famous ∼ 5σ
disagreement of different measurements called the Hubble tension which we will talk
about later.

In addition, spatial curvature is not detected, Ωk = 0.001 ± 0.002. Radiation Ωr is negligible today. The current density in photons is about Ωγ = 5 × 10−5. Neutrinos today are not relativistic anymore and their density is about Ων = 3.4 × 10−5. The density of dark energy, ΩΛ = 0.6847 ± 0.0073, follows from the other values. Note that the selection of 3 parameters is not unique; for example one could switch ΩΛ and H0, or replace H0 by the age of the universe.
The above parameters define the background expansion of the universe. Another two parameters of ΛCDM are not about the background expansion but rather about the small inhomogeneities (perturbations) that seeded structure formation at the beginning of the universe. They are called

• amplitude of primordial perturbations As and

• spectral index of primordial perturbations ns .

We’ll discuss these in the section about inflation. Finally a sixth parameter of ΛCDM is the
so-called

• Optical depth τ . This parameter describes how transparent the universe is for CMB light.
Physically it depends on how many free electrons there are, which depends on the process
of reionization.

This parameter is required to fit CMB data.


Much of this course will be about how these and other parameters can be measured with data from large-scale structure and the CMB. Other parameters that we routinely fit to data either have a standard model expectation (such as Neff) or are currently compatible with zero (such as the tensor-to-scalar ratio r) and are thus not included in the counting of 6 Lambda-CDM parameters. There is of course no guarantee that 6 parameters are sufficient to fit any future data, and much of cosmology, as in particle physics, is about searching for physics beyond the standard model (of cosmology).

4.9 Matter-Radiation Equality


Over much of the history of the universe one component of the fluids dominated over the others.
As we have seen from the Ω values today, currently we are in a Λ dominated era but matter is not
yet negligible, while radiation is four orders of magnitude smaller. From the scaling ρm ∝ a−3
and ρr ∝ a−4 it is clear that at earlier times matter and radiation dominated over dark energy,
and that at the earliest times radiation dominated over the other two components. We now want
to determine the time of matter-radiation equality.
It turns out that neutrinos were relativistic at the time of matter-radiation equality. Thus we
include them in the radiation density

Ωr = Ωγ + Ων ≈ 8.4 × 10−5 , (4.47)

To find the redshift where matter and radiation were equal, we equate their densities:

\rho_m(z_{\rm eq}) = \rho_r(z_{\rm eq})
\rho_{m,0} \, (1+z_{\rm eq})^3 = \rho_{r,0} \, (1+z_{\rm eq})^4
\rho_{\rm crit,0} \, \Omega_m (1+z_{\rm eq})^3 = \rho_{\rm crit,0} \, \Omega_r (1+z_{\rm eq})^4
z_{\rm eq} = \frac{\Omega_m}{\Omega_r} - 1 \approx 3250
Therefore the equality of the two happened when the scale factor was about 3000 times smaller than today. Using a(t) = (t/t0)^{2/3}, valid during matter domination, gives teq ≈ 70,000 yrs. A more accurate calculation using Eq. (4.46) gives teq ≈ 50,000 yrs.
The growth of perturbations (such as those in CMB) depends sensitively on which component
is dominating the universe. Radiation pressure suppresses structure growth. We will study this
topic later.
We may also ask about matter-dark energy equality. Using a similar calculation one finds that
this happened about 4 billion years ago, which is relatively recently in cosmological terms.
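Both equality calculations are easy to reproduce numerically. A sketch (with these rough parameter values zeq comes out somewhat higher than the number quoted above, which uses slightly different inputs):

    import numpy as np
    from scipy.integrate import quad

    Om, Or = 0.31, 8.4e-5
    OL = 1.0 - Om - Or

    z_eq = Om / Or - 1.0                         # matter-radiation equality
    a_eq = 1.0 / (1.0 + z_eq)

    integrand = lambda a: 1.0 / np.sqrt(Or / a**2 + Om / a + OL * a**2)
    t_eq, _ = quad(integrand, 0.0, a_eq)         # Eq. (4.46), in units of 1/H0
    print(f"z_eq = {z_eq:.0f}")                  # ~3700 for these inputs
    print(f"t_eq = {t_eq * 977.8 / 67.8 * 1e9:.2e} yr")   # ~5e4 yr

    a_mL = (Om / OL) ** (1.0 / 3.0)              # matter-dark energy equality
    print(f"matter-Lambda equality at a = {a_mL:.2f}")    # ~0.77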

4.10 Numerical examples


To understand the expansion of the universe better, I recommend that you make some plots of various quantities. This can be done for example using the Boltzmann solvers CAMB or CLASS, or the library AstroPy. Here are some ideas of what to plot (a minimal sketch for the first item follows the list):

• Evolution of the different fluids. Plot ΩX for m, r, Λ as well as their sum as a function of
the scale factor from a = 10−5 to a = 100 on a log-x plot. Do the same as a function of
time.

• Evolution of H(a) and aH(a). Plot these functions for the same range of the scale factor.
Mark the radiation, matter and DE dominated regions. Do the same as a function of log
time and linear time.

• Evolution of the scale factor a(t).
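A minimal sketch for the first item, using Eq. (4.43) directly rather than querying a Boltzmann code (the parameter values are my rough choices):

    import numpy as np
    import matplotlib.pyplot as plt

    Om, Or = 0.31, 8.4e-5
    OL = 1.0 - Om - Or

    a = np.logspace(-5, 2, 500)
    E2 = Or / a**4 + Om / a**3 + OL          # H^2 / H0^2, Eq. (4.43) with Omega_k = 0
    for name, Ox in [("matter", Om / a**3 / E2),
                     ("radiation", Or / a**4 / E2),
                     ("dark energy", OL / E2)]:
        plt.plot(a, Ox, label=name)
    plt.xscale("log"); plt.xlabel("scale factor a"); plt.ylabel("Omega_X(a)")
    plt.legend(); plt.show()

The three curves show the radiation, matter and dark energy dominated eras in turn; each ΩX(a) is the density of Eq. (4.11) divided by the critical density at that epoch.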

5 Early Universe Thermodynamics


At early times the universe is well approximated by a hot fluid in thermal equilibrium. The further we go back in time, the higher the temperature of the universe. This section covers the so-called hot big bang. Prior to that there was the period of inflation, which we cover next.
As the universe gets hotter (i.e. we go backwards in time), the properties of the fluid and
the relevant physical particles change dramatically. For example, at some point the universe
was too hot to form atoms, nuclei and even nucleons, since their collisions would rip them apart
immediately. Up to temperatures that we have probed with particle colliders (∼ 1 TeV), the
fundamental physics is known through the standard model, and we can thus in principle calculate
what happens (the strong force is difficult to calculate because of the strong coupling, and even
for the other forces some precision calculations are still being improved). These calculations

generally match observations very well. Beyond that energy scale, theorists have come up with
different models. Of course, this is an opportunity to probe physics beyond the standard model.
I want to give only a very brief overview of this material. While important, these topics have
been worked out in detail and have been put into code packages that can be used without detailed
understanding for most practical purposes.

5.1 Overview
We can get an idea of the hot big bang simply from the following facts:

• The temperature in the past was T = T0/a(t). This is because for radiation (which dominates the thermodynamics even in the matter dominated universe) we have λ ∝ a, as we have derived, and the temperature of a black body scales as λpeak = const/T (Wien’s law). We have already calculated a(t).

• The average energy of a particle in thermal equilibrium is ⟨E⟩ = kB T (up to a constant


factor counting degrees of freedom).

• The temperature of the universe now is T0 ≃ 2.75K, which is the temperature of the Cosmic
Microwave Background. A clump of gas without any source of heat (also no gravitational
heating) will be at this temperature.

• We know the masses of particles and binding energies of bound states. For example the binding energy of electrons in atoms is of order eV. If the kinetic energy exceeds the binding energy, the bound state will be broken up. The binding energy of nuclei is of order 1 MeV and the binding energy of nucleons is of order 1 GeV. This tells us very roughly at what temperature these bound states form. The actual temperatures are lower because of the tail of the Boltzmann distribution.

These facts correctly suggest a thermal history that is illustrated in Fig. 2. The key events in the thermal history of the universe are also listed in Table 1. To understand it in more detail we need to review some thermodynamics.

5.2 Statistical mechanics description of the universe


In the very early universe (after inflation), the rate of interactions in the primordial plasma
was very high and the universe was in a state of thermal equilibrium. This simple state is the
beginning of the so-called hot big bang. Later, some particles drop out of thermal equilibrium,
and we will need the Boltzmann equation to describe them.
To define the state of matter in statistical mechanics, we use the phase-space distribution
function f (x, p, t). It gives the probability density that a particle is found at a particular
position x with momentum p at time t. That is, the number of particles N in a phase space element is given by

N(x, p, t) = f(x, p, t) \times \frac{(\Delta x)^3 (\Delta p)^3}{(2\pi)^3}    (5.1)
In the present unit we are mostly concerned with homogeneous and isotropic matter, and thus
the phase space distribution function does not depend on position and only on the magnitude

Event | Temperature | Energy | Time
Inflation | < 10^28 K | < 10^16 GeV | > 10^−34 s
Dark matter decouples | ? | ? | ?
Baryogenesis (matter-antimatter asymmetry, GUT?, quark-gluon plasma) | ? | ? | ?
EW phase transition (symmetry breaking due to Higgs) | 10^15 K | 100 GeV | 10^−11 s
Hadrons form (protons, neutrons) from quark-gluon plasma; QCD phase transition | 10^12 K | 150 MeV | 10^−5 s
Neutrinos decouple (weak interaction) | 10^10 K | 1 MeV | 1 s
Big Bang Nucleosynthesis (BBN): sets element abundances | 10^9 K | 100 keV | 200 s
Atoms form (helium, hydrogen) | 3400 K | 0.30 eV | 260,000 yrs
Photons decouple (transparent universe) | 2900 K | 0.25 eV | 380,000 yrs
First stars | 50 K | 4 meV | 100 million yrs
First galaxies | 20 K | 1.7 meV | 1 billion yrs
Dark energy | 3.8 K | 0.33 meV | 9 billion yrs
Today | 2.7 K | 0.24 meV | 13.8 billion yrs

Table 1. Key events in the history of the universe (adapted from Baumann table 1.2)

of momentum: f (p, t). To treat inhomogeneities we will later need the full f (x, p, t) and its
differential equation, the Boltzmann equation.

5.3 Thermal equilibrium


5.3.1 Thermal equilibrium vs decoupling
The basis to understand the early universe is thermal equilibrium. To be in thermal equilibrium, particles need to interact and we must have waited for long enough so that their thermal bath has become uniform. For example, a typical situation would be a particle X that can annihilate with its anti-particle X̄ into 2 photons, and in return 2 photons can pair-create an X + X̄ pair:
X + X̄ ⇌ 2γ (5.2)
The interaction rate Γ (per particle) for this process is

Γ = nσv,

with n the number density of particles, σ their cross-section, and v their velocity (all three are
in general a function of temperature). Note that this is the interaction rate per particle (which
is why it is linear in n), not the total interaction rate per volume. Γ has units of inverse time.
In an expanding universe, it turns out (from the Boltzmann equation) that particles can be
in thermal equilibrium if Γ ≫ H, that is the interaction rate is much larger than the Hubble
rate. To understand this better remember that the age of the universe is roughly tage ≃ H −1 ,
and we want the typical interaction time to be much smaller than the age of the universe. To

Figure 2. Thermal history of the universe (plot from the Particle Data Group).

summarize, particles fall out of thermal equilibrium when their interaction rate drops below the
Hubble expansion rate of the universe. At that moment, the particles stop interacting with the
rest of the thermal bath, which is called decoupling, and a relic abundance is created. Both
the creation and the annihilation of such relic particles is negligibly small after decoupling.
We will first assume that we are in thermal equilibrium, but later discuss beyond equilibrium
phenomena.

5.3.2 Basics of equilibrium thermodynamics


To simplify notation we will be using natural units with c = 1, ℏ = 1 and kB = 1 (see e.g.
Baumann Appendix C or Huterer 1.7). Particles in thermal equilibrium are either Bose-Einstein
distributed (bosons) or Fermi-Dirac distributed (fermions). The distribution function is
given by
1
f (p, T ) = (E(p)−µ)/T (5.3)
e ±1

where the − sign is for bosons and the + sign is for fermions. It gives the probability that a
particle chosen at random has the momentum p. The distributions depend on the temperature
T and the chemical potential µ (which can depend on temperature and thus on time in an
expanding universe). The chemical potential describes the response of a system to a change in
particle numbers. Since for photons µ = 0 and for particle-antiparticle pairs µX = −µX̄ we
can ignore the chemical potential in much of the following discussion. The chemical potential
is important when the particle number changes, for example during recombination where the
number of free electrons changes.
From the distribution functions we can calculate the important thermodynamic quantities.

• The number density of particles is

n(T) = \frac{g}{(2\pi)^3} \int d^3p \, f(p, T)    (5.4)

where g is the number of internal degrees of freedom of the particle (e.g. number of spin states).

• The energy density is

\rho(T) = \frac{g}{(2\pi)^3} \int d^3p \, f(p, T) \, E(p)    (5.5)

• The pressure is

P(T) = \frac{g}{(2\pi)^3} \int d^3p \, f(p, T) \, \frac{p^2}{3E(p)}    (5.6)

where E(p) is the relativistic energy of the particles. Note that in the ultra-relativistic case E = p and thus P = ρ/3 as expected.

In thermal equilibrium we can have several different particle species with masses mi and
chemical potential µi but at the same temperature T .

5.3.3 Number density and energy density of particles


As we discussed, for our purpose we can set the chemical potential to zero. The integrals above
are evaluated for example in Baumann’s book.
In the relativistic limit one gets that

n = \frac{\zeta(3)}{\pi^2} \, g \, T^3 \times \begin{cases} 1 & \text{(bosons)} \\ 3/4 & \text{(fermions)} \end{cases}    (5.7)

where ζ(3) ≈ 1.202 is the Riemann zeta function, and for the energy density

\rho = \frac{\pi^2}{30} \, g \, T^4 \times \begin{cases} 1 & \text{(bosons)} \\ 7/8 & \text{(fermions)} \end{cases} .    (5.8)

Note the scaling with temperature and the fact that bosons and fermions only differ in a constant
factor. A typical use of this result is to calculate the number density and energy density of photons

today, given the observed temperature of the CMB, T0 ≈ 2.73 K:

n_{\gamma,0} = \frac{2\zeta(3)}{\pi^2} \, T_0^3 \approx 410 \ {\rm photons \ cm^{-3}} ,

\rho_{\gamma,0} = \frac{\pi^2}{15} \, T_0^4 \approx 4.6 \times 10^{-34} \ {\rm g \ cm^{-3}} .
In terms of the critical density, the photon energy density is then

Ωγ h2 ≈ 2.5 × 10−5 . (5.9)

as mentioned earlier.
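The 410 photons per cm³ is a classic number worth verifying once explicitly; a sketch restoring the factors of ℏ, c and kB (in SI units the natural-units T³ becomes (kB T / ℏc)³):

    import numpy as np
    from scipy.special import zeta
    from astropy import units as u
    from astropy.constants import k_B, hbar, c

    T0 = 2.73 * u.K
    # Eq. (5.7) with g = 2 for the photon
    n_gamma = (2 * zeta(3) / np.pi**2) * (k_B * T0 / (hbar * c))**3
    print(n_gamma.to(u.cm**-3))   # ~410 photons / cm^3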
For non-relativistic particles (m ≫ T), the result for bosons and fermions is the same. The integral gives

n = g \left(\frac{mT}{2\pi}\right)^{3/2} e^{-m/T}    (5.10)

The exponential suppression is called Boltzmann suppression. Physically it means that parti-
cles and anti-particles still annihilate when the temperature becomes low, but they are no longer
created in pair production. This means that when the temperature of the universe falls below
the particle mass and the particle is still in thermal equilibrium (i.e. Γ ≫ H), then the particle’s
abundance and energy density drop rapidly.
For non-relativistic particles one also gets

\rho = m n    (5.11)

and

P = n T    (5.12)

which is the ideal gas law PV = N kB T with kB = 1, and thus P ∼ 0 since T ≪ m.

5.3.4 Effective degrees of freedom


The last aspect of thermal equilibrium that we want to mention is the effective number of
(relativistic) degrees of freedom. A general formula for the energy density that includes all
species/particles is (generalizing Eq. (5.8)):

\rho_R \equiv \sum_i \rho_i = \frac{\pi^2}{30} \, g^* \, T^4 .    (5.13)

where the parameter g ∗ , called the effective number of relativistic degrees of freedom, is a weighted
sum of the multiplicity factors of all particles. This factor is defined as
g^*(T) = \sum_{i \in {\rm bosons}} g_i \left(\frac{T_i}{T}\right)^4 + \frac{7}{8} \sum_{i \in {\rm fermions}} g_i \left(\frac{T_i}{T}\right)^4 .    (5.14)

Here we are allowing the possibility that the species have a different temperature Ti from the photon temperature T, hence the power-law factors in this definition of g*. The effective number of relativistic degrees of freedom is plotted in Fig. 3.

Figure 3. Evolution of the effective number of relativistic degrees of freedom assuming the Standard Model particle content. The EW and QCD phase transitions are also indicated. From Daniel Baumann’s cosmology lectures.

Above about 100 GeV all particles of the standard model are relativistic. Considering all quarks, leptons, gauge bosons, gluons and the Higgs (with their helicity or spin and their anti-particles) this adds up to g* = 106.75. The fractional number is possible due to the 7/8 prefactor. As the universe cools down, the heavier particles drop out of this sum. In the end we are left with 3.38 relativistic degrees of freedom. Of these, 2 are for the photon (2 polarisations) and the rest is for neutrinos. The counting of relativistic degrees of freedom for neutrinos is subtle. In particular the neutrino temperature today (∼ 1.9 K) is not the same as that of the photons (∼ 2.7 K), because the neutrinos decouple from the thermal bath before electrons and positrons annihilate (which heats the thermal bath). Also, today neutrinos are not relativistic anymore.
Several physical observables are sensitive to the effective number of relativistic species, and
this is an avenue to detect new physics. We usually parametrise the “extra” degrees of freedom
by the effective number of neutrino species Neff . Constraints on Neff come from

• Element abundances: BBN is sensitive to the expansion rate, which is sensitive to Neff .

• The CMB power spectrum and so-called CMB spectral distortions also constrain Neff at a
later time.

The constraint from Planck is Neff = 2.99 ± 0.34. The theory expectation from the standard
model is not exactly 3 but rather 3.046 which is due to the fact that neutrinos deviate a bit from
a Fermi-Dirac distribution due to the energy dependence of the weak interaction.
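The g* = 106.75 counting is a good consistency check to do once explicitly; a sketch of Eq. (5.14) at T ≫ 100 GeV where all Ti = T (the degree-of-freedom bookkeeping below is the standard one, written out by hand):

    # Internal degrees of freedom of the Standard Model at T >> 100 GeV.
    bosons = {
        "photon": 2,                   # 2 polarizations
        "W+, W-, Z": 3 * 3,            # massive vectors: 3 polarizations each
        "gluons": 8 * 2,               # 8 colors x 2 polarizations
        "Higgs": 1,
    }
    fermions = {
        "quarks": 6 * 2 * 2 * 3,       # 6 flavors x spin x (anti)particle x 3 colors
        "charged leptons": 3 * 2 * 2,  # 3 flavors x spin x (anti)particle
        "neutrinos": 3 * 2,            # 3 flavors x (nu_L, nubar_R)
    }
    g_star = sum(bosons.values()) + 7.0 / 8.0 * sum(fermions.values())
    print(g_star)   # 106.75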

5.4 Beyond Equilibrium: The integrated Boltzmann equation
To study processes that are not in thermal equilibrium we need the Boltzmann equation. The
(integrated) Boltzmann equation for a homogeneous particle species ni is given in general by:

\frac{1}{a^3} \frac{d(n_i a^3)}{dt} = C_i[\{n_j\}]    (5.15)
The left hand side is just the conservation of particle number if the right hand side is zero.
The right hand side is the collision term that describes the interaction with all other particle
species nj . The collision term includes cross-sections between particles, which is where the
standard model of particle physics and QFT scattering amplitude calculations come in. Solving
this equation goes beyond this course material.
We want to again point out one important non-equilibrium phenomenon: the freeze out.
The terms decoupling and freeze-out are closely related. Decoupling means that interactions
effectively stop and freeze-out means the creation of a relic density. Above we found that when
the temperature of the universe falls below the particle mass (thus we are in the non-relativistic
regime) and the particle is still in thermal equilibrium, then the particle’s abundance and energy
density are exponentially suppressed in m/T . We have also discussed that for the particles to be
in thermal equilibrium, we need their interaction rate to be larger than the expansion rate Γ ≫ H.
However if the particle drops out of thermal equilibrium before the Boltzmann suppression kicks
in, i.e. Γ < H, we say that it “freezes out”. In this case some relic density of the massive particle remains, which is constant in comoving volume (unless the particle decays, such as neutrons).
If we had time to study the integrated Boltzmann equation in more detail, we would in
particular examine:

• The formation of the light elements during the Big Bang nucleosynthesis (BBN). This is
one of the big successes of the standard model of cosmology, making predictions for the
abundance of elements that agree very well with data (with the possible exception of the
lithium problem).

• The production of dark matter (which likely has a relic density set in the early universe).

• The decoupling of neutrinos in the early universe, when the weak interaction becomes too
weak to couple them to the thermal bath.

• The neutron freeze-out, which set the initial neutron-to-proton ratio. Remember that free
neutrons decay.

• The period of recombination where electrons and nuclei form neutral atoms and the universe
becomes transparent.

• Baryogenesis, the as-yet-unknown process that led to the matter-antimatter asymmetry
observed today.

5.5 Beyond Homogeneity: The Einstein-Boltzmann equations
So far in this unit we have been considering the homogeneous universe. Of course, the universe
is only interesting because it is not homogeneous. Here I want to outline the full set of
equations that govern the universe, without assuming homogeneity. The relevant equations
with inhomogeneity are still the Einstein equation and the Boltzmann equation, which are of course
coupled to each other. The Boltzmann equation for an inhomogeneous, anisotropic fluid with
phase space density f(x, p, t) is schematically

\frac{df_a(\mathbf{x}, \mathbf{p}, t)}{dt} = C[\{f_b(\mathbf{x}, \mathbf{p}, t)\}], \qquad (5.16)
where f_a is the phase space density of particle species a, and the other species {f_b} obey their
own Boltzmann equations. Note that unlike the integrated Boltzmann equation for number densities
we saw above, this is an equation for the full phase space density. The phase space density enters
the energy-momentum tensor of Einstein's equation. For completeness, the energy-momentum
tensor for a given phase space distribution function f(x, p, t) is

T^\mu{}_\nu(\mathbf{x}, t) = \frac{g}{\sqrt{-\det[g_{\alpha\beta}]}} \int \frac{dP_1\, dP_2\, dP_3}{(2\pi)^3}\, \frac{P^\mu P_\nu}{P^0}\, f(\mathbf{x}, \mathbf{p}, t)

(Dodelson Eq. 3.20), where the degeneracy factor g counts the internal states. The energy-momentum
tensor in turn determines the metric through the Einstein equation

G_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu} \qquad (5.17)
These equations can be solved analytically in relativistic cosmological perturbation the-
ory. To do so, one expands the metric in perturbations around FLRW, and the particle content
in perturbations around the average density. Relativistic cosmological perturbation theory is in
particular required to calculate the Cosmic Microwave Background. In fact, the full Boltzmann
equation is not required for an analytic treatment, since one can make a fluid approximation on
large enough scales. However, on scales where the mean free path of the photon becomes
important (during recombination), such an approximation must break down. In practice, we thus
solve the Einstein-Boltzmann equations of the early universe numerically, with codes such as
CAMB and CLASS. Even the numerical solution starts with a perturbative ansatz, which gives
the linearized Einstein-Boltzmann equations. As we will discuss more, linear perturbation
theory is enough to calculate the CMB to excellent precision.
On the other hand, in the late universe, perturbations are non-linear. This is the domain
of structure formation. Fortunately, in this domain relativistic effects are small and we can
work with Newtonian perturbation theory and Newtonian simulations. In summary, in
cosmology one very rarely needs perturbation theory that is both (general) relativistic and non-
linear. We will learn much more about perturbations in the CMB and in large-scale structure
during this course.
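
As a concrete illustration, here is a minimal sketch of calling CAMB from Python to solve the linearized Einstein-Boltzmann equations; the parameter values are illustrative, and CLASS offers an analogous Python wrapper (classy):

    import camb

    # Illustrative Lambda-CDM parameters (H0 in km/s/Mpc, physical densities omega*h^2)
    pars = camb.set_params(H0=67.5, ombh2=0.022, omch2=0.120,
                           As=2.1e-9, ns=0.965, lmax=2500)

    # Solve the linearized Einstein-Boltzmann equations
    results = camb.get_results(pars)

    # CMB power spectra D_ell = ell(ell+1) C_ell / (2 pi), in muK^2
    powers = results.get_cmb_power_spectra(pars, CMB_unit='muK')
    Dl_TT = powers['total'][:, 0]  # columns are TT, EE, BB, TE
    print(Dl_TT[2:8])              # first few multipoles, starting at ell = 2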

6 Inflation
To complete our overview of the evolution of the universe, we need to discuss the earliest (highest
energy) epoch of the universe which we can currently understand, the period of cosmological

inflation. Unlike the hot big bang, inflation is still somewhat speculative. It makes predictions
that we can verify with observations, but these predictions are not so unique that we would
consider the theory to be proven. There is, however, in the opinion of most (but not all) cosmol-
ogists, no competing theory that would be equally attractive. In fact, inflation is such a good
framework to set up the initial conditions of the universe that it is almost treated as a fact by
many cosmologists. It does in particular the following things for us:

• It makes the universe flat even if it started out not being flat. There is some debate and
ongoing research about this question (e.g. does inflation even start in an inhomogeneous
universe?), but the majority opinion seems to be that it works.

• It solves the horizon problem, which is that we find thermal equilibrium and correlations
between parts of the universe that would not be causally connected without inflation.

• It sets up the horizon exit for the later re-entry of perturbations, which is crucial
to explain the matter and CMB power spectrum (turnover and BAO phases).

• It gives a natural mechanism to generate the initial inhomogeneities of the universe
from quantum perturbations, and provides a framework to calculate their statistical
properties.

• It solves the so-called magnetic monopole problem, which depends on a speculative GUT
theory and may thus not exist. We will not cover this problem.

In my opinion it is near certain that accelerated expansion and a quantum origin of primordial
perturbations really happened in the universe. Whether this happened through a weakly coupled
slowly rolling scalar field, as is the case in most inflation models, is less certain. Whether anything
happened “before inflation” and whether this question makes sense is not known, and likely won’t
be answered before we have a complete non-perturbative theory of quantum gravity.
Inflation is treated beautifully in all the references that we pointed out for this unit, and I will
compress the material very significantly.

6.1 The flatness problem


We have already developed all the tools to understand the flatness problem. We know from
experimental data that |Ω_k| < 0.01. We also know that curvature scales as ρ_k ∝ 1/a², matter
as ρ_m ∝ 1/a³, and radiation as ρ_r ∝ 1/a⁴. From this it is easy to estimate how flat the universe
must have been at the earliest time at which we trust our understanding of physics. If you choose
this time to be the electroweak phase transition (100 GeV, z ≈ 10^15), one finds that |Ω_k(t_EW)| < 10^{-30}
(see e.g. Tong's lecture notes Sec. 1.5.1 for this calculation). The flatness problem is the question
why the universe was so flat to begin with. We would like a physical theory that sets up the
initial conditions of the universe so that it is "naturally" very flat. Inflation does this for us.
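
The scaling behind this number is easy to reproduce: Ω_k grows as a² during radiation domination and as a during matter domination, so we can propagate today's bound backwards in time. A rough sketch (ignoring the brief Λ era and all order-1 factors):

    # Propagate the curvature bound |Omega_k| < 0.01 back to the electroweak era.
    Omega_k_today = 1e-2   # observational bound
    z_eq = 3400            # matter-radiation equality
    z_ew = 1e15            # electroweak era (T ~ 100 GeV)

    growth_matter = 1 + z_eq                          # Omega_k ~ a during the matter era
    growth_radiation = ((1 + z_ew) / (1 + z_eq))**2   # Omega_k ~ a^2 during the radiation era

    Omega_k_ew = Omega_k_today / (growth_matter * growth_radiation)
    print(f"|Omega_k(t_EW)| < {Omega_k_ew:.0e}")      # ~ 1e-29, i.e. extremely flat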

6.2 The horizon problem


The CMB has the same temperature in all directions of the sky. This suggests that all parts of
the CMB sky should have been in thermal equilibrium prior to recombination, which requires

causal contact at that time. More than that, as we will see, different parts of the CMB sky have
small temperature perturbations which are correlated. To establish a correlation, clearly causal
contact is also required. It turns out that in the hot big bang picture which we developed so
far, parts of the CMB that are further than about 1 degree apart in angle were not in causal
contact prior to recombination without inflation. This is the horizon problem which inflation
solves. Let’s now put this into equations.

6.2.1 Comoving and physical particle horizon


If we are sitting at a point in comoving coordinates, say at x = 0, at a time t, the comoving
distance from which we can receive light is limited by the speed of light and age of the universe.
This light cone also limits the patch of the universe that can causally influence us. This finite
distance is called the comoving particle horizon. To evaluate it, we start with the metric in
the form Eq.(3.18) or Eq.(3.19) for a radial trajectory:

ds^2 = a^2(\eta)\left[-c^2 d\eta^2 + d\chi^2\right] \qquad (6.1)

ds^2 = -c^2 dt^2 + a^2(t)\, d\chi^2 \qquad (6.2)

We then integrate this equation for a light ray, ds = 0. If the Big Bang "started" with the
singularity at t_i ≡ 0, then the comoving particle horizon at time t is

d_h^{comov}(t) \equiv \chi = c \int_0^t \frac{dt'}{a(t')} = c(\eta - \eta_i) \qquad (6.3)
where the h stands for "horizon". Recall our definition of conformal time, η = ∫ dt/a(t). For
a scale factor that goes to zero at t_BB = 0 (i.e. matter or radiation dominated) we can set
η_i = 0, and the comoving horizon is just η. The size of the physical particle horizon is (using
Eq.(3.34))

d_h^{phys}(t) = a(t)\, d_h^{comov}(t) = a(t)\, c \int_{t_{BB}}^t \frac{dt'}{a(t')} \qquad (6.4)

The particle horizon can be nicely illustrated in a spacetime diagram. To understand that,
we start from the metric Eq.(6.1). We see that a light ray is given by χ(η) = ±c η + const. and
thus can be drawn as a 45◦ angle in the χ-cη plane. The resulting diagram is called a spacetime
diagram. The spacetime diagram for the particle horizon is shown in Fig. 4. One often also
defines the event horizon, which is the forward (rather than backward) light cone and tells us
what events we can influence in the future.
In a flat universe with a(t) = (t/t_0)^{2/(3(1+w))} and w = const., the physical particle horizon today
is

d_h^{phys} = \frac{2}{1 + 3w}\, \frac{c}{H_0}, \qquad (6.5)

which is equal to cH_0^{-1} up to an order 1 factor.

Figure 4. Spacetime diagram of the particle horizon for a given observer. (Axes: conformal distance vs. conformal time; regions outside the observer's past light cone are causally disconnected.)

Figure 5. Particle horizon for CMB perturbations without inflation. CMB regions we see today in the
sky were causally disconnected in the past. (Axes: conformal distance vs. conformal time; our past light cone intersects recombination far outside the particle horizons grown since the big bang singularity.)

6.2.2 Particle horizon and the CMB


It turns out that the particle horizon at the time of the CMB was much smaller than our particle
horizon today. This is illustrated in Fig.5. The CMB is a particularly clear example, but the
horizon problem exists also for the matter and galaxy distribution.
Let’s calculate the size of the (physical) particle horizon at recombination and today. For a
purely matter-dominated universe (radiation does not substantially change the conclusion), with
a(t) = \left(\frac{t}{t_0}\right)^{2/3}, \qquad (6.6)

the particle horizon at time t is

d_h(t) = c\, a(t) \int_0^t \frac{dt'}{a(t')} = 3ct \qquad (6.7)

Let's write this in terms of redshift:

d_h(t) = 3ct = 3c\, a^{3/2}\, t_0 = 3c\, (1+z)^{-3/2}\, t_0 = 2c\, (1+z)^{-3/2}\, H_0^{-1} \qquad (6.8)

where we used 1 + z = 1/a and H_0 = 2/(3t_0).

We already discussed that recombination happened around z ≈ 1100. We would like to
know how large the particle horizon at recombination is in today's physical distance. From
recombination to today, the distance scale d_h(z) has been stretched by the expansion of the
universe to (a_0/a(t))\, d_h(z) = (1+z)\, d_h(z).
We can compare this to the particle horizon today, which is

d_h(t_0) = \frac{2c}{H_0}. \qquad (6.9)
The distance d_h(z) today subtends an angle on the sky given by the ratio of these two
distances:

\theta \approx \frac{(1+z)\, d_h(z)}{d_h(t_0)} \approx \sqrt{\frac{1}{1100}} \approx 0.03\ \mathrm{rad} \;\Longrightarrow\; \theta \approx 1.7° \qquad (6.10)

Thus, assuming a matter-dominated universe, patches of the sky separated by more than ≈ 1.7°
had no causal contact at the time the CMB was formed. Radiation does not change this estimate
substantially.
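
Note that the ratio in Eq.(6.10) is just (1 + z)^{−1/2}, as follows from Eq.(6.8) and Eq.(6.9); a one-line numerical check:

    import numpy as np

    z_rec = 1100
    theta = (1 + z_rec)**-0.5   # (1+z) d_h(z) / d_h(t0) in matter domination
    print(np.degrees(theta))    # ~ 1.7 degrees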

6.3 Inflationary expansion


Both the horizon problem and the flatness problem can be solved by a sufficiently long time of
accelerated expansion prior to the hot big bang. Let’s see why this happens.
An accelerating phase (ä > 0), assuming power-law expansion for now, is given by

a(t) \sim t^n \qquad (6.11)

with n > 1. Alternatively, we could have exponential expansion,

a(t) \sim e^{H_{inf} t} \qquad (6.12)

with constant H_inf, which also accelerates.


The comoving particle horizon is

d_h(t) = c \int_0^t \frac{dt'}{a(t')} \qquad (6.13)

It is finite only if the integral converges. This was the case for a matter (or radiation) dominated
universe, as we saw above. But for a(t) ∼ t^n we have

\int_0^t \frac{dt'}{t'^n} \to \infty \qquad (6.14)

if n > 1. As t′ approaches 0 we get more and more contributions to dh so it diverges. Recall
from Eq.(6.3) that dh = c(η − ηi ). We see that an early accelerating phase buys us conformal
time and allows all regions of the universe to have been in causal contact in the inflationary past.
For inflation, the natural choice is to use time coordinates so that inflation starts at ηi = −∞
(since the lower end of the integral leads to the divergence) and ends at conformal time ηf = 0.
Patching together the spacetime of inflation and the spacetime of the later universe, we get Fig. 6,
which shows that the light cones now overlap. For accelerated power-law expansion there is still a
big bang at t = 0 where a(t) = 0, but there is an infinite amount of conformal time after it.
However, we should not expect our equations to hold once physical scales become smaller than the
Planck length, since we don't know non-perturbative quantum gravity.
Inflation models naturally generate exponential expansion rather than power-law expansion,
i.e.

a(t) \propto e^{H_{inf} t} \qquad (6.15)

Unlike the power-law acceleration we considered above, for exponential expansion there is no
big bang in the past, since a(t) > 0 at all times. This means that there is no natural choice for
t = 0, and our time integral can go from t_i = −∞ to t_f, which again makes the comoving horizon
diverge if exponential expansion went on infinitely long. The Hubble parameter has dimension of
energy (or inverse time), so the exponent is dimensionless as it should be. Exponential expansion
means that a patch of spacetime of physical size d_i grows to a size d_f = d_i e^{H_{inf} T} in time T. We
define the number of e-folds of inflation by N = H_{inf} T.
Let's estimate how many e-folds we need to solve the horizon problem, by inflating a causally
connected patch before inflation to the size of our current universe. Before inflation started,
the physical particle horizon had some value d_i that was causally connected. A natural scale is
the physical Hubble distance Eq.(3.37), d_i = c H_{inf}^{-1}. After exponential expansion, this connected
patch has the physical size d_f = e^N d_i. Then, due to the expansion of the universe since the end
of inflation, the patch grows to

d_{now} = \frac{d_f}{a_{inf}} = \frac{e^N d_i}{a_{inf}} = \frac{e^N c}{H_{inf}\, a_{inf}} \qquad (6.16)

where a_inf is the scale factor at the end of inflation, the beginning of the ordinary evolution of
the universe. We want d_now to be much larger than the Hubble horizon today, i.e. d_now ≫ cH_0^{-1}.
where ainf is the scale factor at the end of inflation, the beginning of the ordinary evolution of
the universe. We want dnow to be much larger than the Hubble horizon today, i.e. dnow ≫ cH0−1 .
It thus follows that

e^N > a_{inf}\, \frac{H_{inf}}{H_0} \qquad (6.17)

Most of the relative expansion since the end of inflation happened during the radiation era, in
which H ∝ 1/a². Thus we have H_{inf}/H_0 = 1/a_{inf}^2, from which we get

e^N > \left(\frac{H_{inf}}{H_0}\right)^{1/2} = a_{inf}^{-1} \qquad (6.18)

In this relation H_0 is known but H_inf, or equivalently a_inf, are not. A possible value of H during
inflation could be 10^14 GeV or below, so let's use this value as an example. The Hubble constant

In this relation H0 is known but Hinf or equivalently ainf are not. A possible value of H during
inflation could be 1014 GeV or below, so let’s use this value as an example. The Hubble constant

today is H_0 ∼ 10^{-18} s^{-1}. Let's convert this to GeV via E = ℏω, where ℏ ∼ 10^{-15} eV s. This gives
H_0 ∼ 10^{-33} eV = 10^{-42} GeV. Thus we have

e^N > \left(\frac{10^{14}\ \mathrm{GeV}}{10^{-42}\ \mathrm{GeV}}\right)^{1/2} = 10^{28} \qquad (6.19)

Solving for N, this gives the often-quoted estimate that we need around 60 e-folds of inflation.
We can also estimate how long inflation needs to last from N = H_{inf} T. With H_{inf} = 10^{14} GeV ∼
10^{38} s^{-1} we get T ∼ 10^{-36} s. So inflation can be extremely brief. However, from the discussion
presented here there is no limit on how long inflation can last in principle; we have only set a lower
limit. In particle physics models of inflation there can be an upper limit. The length of inflation is
also connected to the subject of eternal inflation: in some models inflation never ends globally.
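
The same arithmetic in a few lines of code, for the assumed example value H_inf = 10^14 GeV (only the orders of magnitude matter here):

    import numpy as np

    hbar_GeV_s = 6.6e-25             # hbar in GeV s (order of magnitude)
    H0_GeV = 1e-18 * hbar_GeV_s      # H0 ~ 1e-18 s^-1 converted to GeV
    H_inf_GeV = 1e14                 # assumed inflationary Hubble scale

    N_min = 0.5 * np.log(H_inf_GeV / H0_GeV)   # from e^N > (H_inf / H0)^(1/2)
    T_min = N_min / (H_inf_GeV / hbar_GeV_s)   # minimal duration in seconds

    print(f"minimum number of e-folds N ~ {N_min:.0f}")  # ~ 65
    print(f"minimum duration T ~ {T_min:.0e} s")         # ~ 4e-37 s, i.e. extremely brief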
Accelerated expansion also solves the flatness problem. To drive accelerated expansion as in
Eq.(6.11), one needs an inflationary energy density that goes as

\rho_{inf} \sim \frac{1}{a^{2/n}} \qquad (6.20)

with n > 1 (this follows from Eq.(4.24) and Eq.(4.11)). This clearly dilutes more slowly than
curvature, radiation and matter. In fact, in the case of exponential expansion, ρ_inf does not
dilute at all, like dark energy, as we have seen. This means that after a long period of inflation,
curvature, matter and radiation have all diluted away to negligible amounts and the universe
is empty except for ρ_inf.
While this solves the flatness problem, we are left with a new problem: why is the universe
not empty? What is missing is a mechanism that ends inflation and converts the inflationary
energy density into ordinary (relativistic) matter and radiation. This mechanism exists and is
called reheating. Interestingly, reheating is somewhat natural in a quantum field theory of
inflation, i.e. there are simple models that have this behavior. In summary, the matter and
radiation in our universe is believed to have been created with the energy in the field that drove
inflation.

6.4 The field theory of inflation


As physicists we would like a theory that “explains” the accelerated expansion (and its end) in
terms of fundamental particles/fields and a Hamiltonian or Lagrangian for them. This will at
the same time allow us to quantize the theory. It turns out that exponential expansion can be
achieved by having a homogeneous scalar field that “slowly rolls down” a very flat potential. An
example potential is illustrated in Fig. 7 (but many other potential shapes can also work). Most
inflation models use a scalar field, the inflaton, and there is a sense (the effective field theory of
inflation) in which more complicated inflation models can also be described by a single scalar degree
of freedom. For those of you who have had QFT, here is the Lagrangian of a scalar field, the
inflaton φ, coupled to gravity:

 
S = \int d^4x\, \sqrt{-g} \left[ \frac{1}{2} R + \frac{1}{2} g^{\mu\nu} \partial_\mu \phi\, \partial_\nu \phi - V(\phi) \right] = S_{EH} + S_\phi. \qquad (6.21)

The action (6.21) is the sum of the gravitational Einstein-Hilbert action, SEH , and the action of
a scalar field with canonical kinetic term, Sϕ . The potential V (ϕ) describes the self-interactions

Figure 6. Particle horizon for CMB perturbations with inflation (adapted from 0907.5424). (Axes: conformal distance vs. conformal time; inflation adds conformal time before the end of inflation/reheating, so the particle horizons of the CMB regions we see today overlap.)

of the scalar field (in addition there can be derivative self-interactions). Assuming the FRW
metric for gµν and restricting to the case of a homogeneous field ϕ(t, x) ≡ ϕ(t), the scalar energy-
momentum tensor

T^{(\phi)}_{\mu\nu} \equiv -\frac{2}{\sqrt{-g}}\, \frac{\delta S_\phi}{\delta g^{\mu\nu}} \qquad (6.22)

takes the form of a perfect fluid (see e.g. Baumann's book for the math) with

\rho_\phi = \frac{1}{2}\dot\phi^2 + V(\phi), \qquad (6.23)

p_\phi = \frac{1}{2}\dot\phi^2 - V(\phi). \qquad (6.24)

The resulting equation of state,

w_\phi \equiv \frac{p_\phi}{\rho_\phi} = \frac{\frac{1}{2}\dot\phi^2 - V}{\frac{1}{2}\dot\phi^2 + V}, \qquad (6.25)

shows that a scalar field can lead to negative pressure (w_\phi < 0) and accelerated expansion
(w_\phi < -1/3, see Eq.(4.24)) if the potential energy V dominates over the kinetic energy \frac{1}{2}\dot\phi^2.
This is why the field needs to be rolling slowly. Inflation ends when the field rolls into a steeper

Figure 7. A typical potential for slow-roll inflation with a scalar field. (The field rolls slowly along the flat part during inflation, reaches a steeper part at the end of inflation, and oscillates around the minimum during reheating.)

part of the potential, as shown in the figure. Finally the field rolls into a minimum and starts
oscillating around it. During this phase of oscillation, the inflaton acts on average like pressureless
matter (averaging Eq.(6.25) over oscillations gives ⟨w_φ⟩ = 0) and decays into other particles (those
of the standard model, if this is all there is). This process is called reheating and is very model
dependent and very complicated. It turns out that predictions for cosmology don't depend much
on reheating; all we need is that the inflaton energy is ultimately transformed into a thermal bath
of standard model particles.
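
To see slow roll at work, here is a minimal sketch that integrates the homogeneous inflaton equation of motion, φ̈ + 3Hφ̇ + V′(φ) = 0 with H² = ρ_φ/3 (reduced Planck units), for the simple quadratic potential V = ½m²φ². The potential, mass and initial conditions are illustrative choices, not a statement about the correct model of inflation:

    import numpy as np
    from scipy.integrate import solve_ivp

    m = 1e-6  # inflaton mass in Planck units (illustrative)

    def V(phi):  return 0.5 * m**2 * phi**2
    def dV(phi): return m**2 * phi

    def rhs(t, y):
        phi, dphi, N = y                              # N = ln(a) counts e-folds
        H = np.sqrt((0.5 * dphi**2 + V(phi)) / 3.0)   # Friedmann equation, M_p = 1
        return [dphi, -3.0 * H * dphi - dV(phi), H]   # EOM for phi, and dN/dt = H

    # Start high up the potential with negligible velocity
    sol = solve_ivp(rhs, (0.0, 2e7), [16.0, 0.0, 0.0],
                    rtol=1e-8, atol=1e-12, dense_output=True)

    phi_m, dphi_m, _ = sol.sol(5e6)   # a time well inside the slow-roll phase
    w_m = (0.5 * dphi_m**2 - V(phi_m)) / (0.5 * dphi_m**2 + V(phi_m))
    print(f"equation of state mid-inflation: w = {w_m:.4f}")  # close to -1
    print(f"total e-folds: {sol.y[2, -1]:.0f}")               # ~ phi_i^2 / 4 ~ 64

The field rolls slowly (w ≈ −1) while the potential dominates, then reaches the steeper part where inflation ends, reproducing the slow-roll estimate N ≈ φ_i²/4 for this potential.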

6.5 The quantum field theory of inflation


Having a field theory of inflation, we can now quantize it. Doing this goes beyond this course
material. I want to make only a few comments:

• If you had QFT, you know that the standard method to quantize a field theory for which
you know the Lagrangian or Hamiltonian is to promote the field ϕ(x, t) and its conjugate
momentum π(x, t) to operators and impose canonical commutation relations between them.
This is correct here too. Schematically
\left[ \hat\phi(\mathbf{x}, t),\, \hat\pi(\mathbf{x}', t) \right] = i\, \delta^3(\mathbf{x} - \mathbf{x}') \qquad (6.26)

where the delta function enforces locality.

• The quantization of inflation leads to exactly the kind of primordial perturbations we need
to seed the structure formation of the universe. As you know, in quantum mechanics there
is a fundamental uncertainty on quantities, which means that the inflaton field φ cannot be
exactly homogeneous but rather must have small "quantum wiggles" in it. A heuristic way
to think about this is through Heisenberg's uncertainty principle in the form Δt ΔE ∼ 1
(where we set ℏ = 1). The time scale is set by the Hubble time, Δt ∼ H_{inf}^{-1}, and the
energy fluctuations are set by the fluctuations in φ, i.e. ΔE ∼ δφ. The uncertainty relation
then predicts that we should see fluctuations of size δφ ∼ H_{inf}.

• Because the potential is so flat during inflation, interaction terms such as φ³ are very small.
Inflation is thus an almost free (i.e. linear) field theory, and its quantization leads to a
collection of (nearly) uncoupled quantum harmonic oscillators. Said differently, the Fourier
modes of the inflaton field act like independent harmonic oscillators.

• Inflation does include and requires perturbative quantum gravity. Well below the Planck
scale, we can quantize gravity in the sense of an effective field theory that integrates out the
unknown UV physics of the ultimate quantum gravity theory. Depending on the details of
the model, in particular its energy scale, inflation can be more or less sensitive to unknown
quantum gravity “UV physics”.

6.6 Primordial perturbations from inflation


In this unit we have focused on the homogeneous evolution of the universe, but inflation cannot
be discussed sufficiently without talking about perturbations, so let's get started with them.

6.6.1 Curvature perturbations from inflation


In cosmology (due to the cosmological principle), quantities such as the inflaton field (or the
energy density etc.) can be split into their homogeneous background value and perturbations
around it:

ϕ(x, t) = ϕhom (t) + δϕ(x, t) (6.27)

As we shall see later, perturbations are best discussed in Fourier space, because these Fourier
modes evolve almost independently (i.e. they are not coupled in the free theory). We therefore
express perturbations as

δϕ(x, t) → δϕ(k, t) (6.28)

where k is the comoving wave vector.


Inflaton perturbations from quantum fluctuations lead to curvature perturbations in the
metric and equivalently density perturbations in the energy density. To discuss these per-
turbations precisely would take us too far into relativistic perturbation theory. This topic is
complicated in particular because of different possible gauge (coordinate) choices. To describe
the scalar curvature field of the universe, you will most frequently encounter:

• The comoving curvature perturbation R. Under some gauge conditions this is the cur-
vature that a local observer would observe. This quantity is also conserved on superhorizon
scales (see next section).

• The curvature perturbation on uniform density hypersurfaces ζ. This is often used
in inflation calculations.

• The Newtonian potential Φ. The metric in Newtonian gauge is

ds^2 = a^2(\eta) \left[ -(1 + 2\Phi)\, d\eta^2 + (1 - 2\Phi)\, \delta_{ij}\, dx^i dx^j \right] \qquad (6.29)

With some gauge subtleties on superhorizon scales, Φ is related to the energy density by
Poisson's equation, ∇²Φ ∝ ρ̄δ.

The distinction of these is not important in this course. The first main point to take away here is
that we need a single scalar field to describe the scalar curvature perturbations of the
universe (which are those induced by scalar density perturbations). This does not include tensor
perturbations which are gravitational waves. Scalar and tensor perturbations together make up
the full metric. So far there is no experimental evidence of primordial tensor perturbations.
It turns out that the inflaton perturbations δφ generate curvature perturbations as

\mathcal{R} \approx -\frac{H\, \delta\phi}{\dot\phi}, \qquad (6.30)

where \dot\phi is the inflaton speed. We won't derive this equation. The second main point to take
away is thus that we can calculate the curvature perturbations R in a given inflation model, and
that they are sourced by the inflaton quantum perturbations δφ.

6.6.2 Horizon exit and re-entering


To understand the physics of inflation, we need one more concept, the comoving Hubble radius.
We have discussed the comoving and physical particle horizon. There is also the comoving and
physical Hubble radius (sometimes called the Hubble horizon). The physical Hubble radius is
d_H^{phys} = 1/H, and we discussed it in Eq.(3.37). The comoving Hubble radius is

d_H^{comov} = \frac{1}{aH} \qquad (6.31)

It gives the size of the Hubble radius in comoving coordinates. An accelerating phase (ä > 0) is
equivalent to a shrinking comoving Hubble radius:

\ddot{a} > 0 \;\Leftrightarrow\; \frac{d}{dt}\left(\frac{1}{aH}\right) < 0 \qquad (6.32)

We see that the comoving Hubble radius is shrinking during inflation, but growing during ordinary
cosmological evolution (matter and radiation). You can think of the Hubble radius as the “size
of the currently causally connected patch at time t”, while the particle horizon tells us about the
“size of the causally connected patch when considering the entire past”.
The comoving Hubble radius is crucial to understand the behavior of cosmological pertur-
bations. For these perturbations R(k, t) we can study their equations of motion and find that
their behavior (i.e. whether they grow, stay constant or get smaller with time) depends crucially
on their size compared to the comoving Hubble radius. We can divide perturbations into two
classes, by comparing their comoving wave number with the comoving Hubble horizon:

subhorizon perturbations: k ≫ aH, \qquad (6.33)

superhorizon perturbations: k ≪ aH. \qquad (6.34)


The size of cosmological perturbations compared to the Hubble radius as a function of time is
illustrated in Fig.8.
We can now summarize what happens to cosmological perturbations:

Figure 8. Horizon exit and re-entering of perturbations. On the y-axis is the comoving scale of the
perturbation 1/k and the scale of the comoving Hubble radius 1/(aH). The modes that leave the horizon
the latest (the smallest wave length λ) re-enter the horizon first (last out, first in).

• Quantum fluctuations create perturbations δφ(k, t) during inflation on subhorizon scales.
These inflaton perturbations generate curvature perturbations as discussed above.

• The curvature perturbation Rk exits the horizon during inflation and stops evolving. This
can be proven in relativistic perturbation theory. Horizon exit is also connected to the fact
that these perturbations classicalize (i.e. get a concrete value that we observe, like in a
measurement). Note that the curvature perturbation does not change due to re-heating,
which is why the details of reheating don’t matter for cosmological predictions.

• At some point after the end of inflation the curvature perturbation Rk re-enters the horizon
and starts evolving again. Later we will see that the time when perturbations re-enter the
horizon (during radiation domination or matter domination) is crucial for their amplitude.

We have not yet discussed how perturbations evolve in time. This depends on whether they are
subhorizon or superhorizon and whether they evolve during radiation domination or during matter
domination, or later during Lambda domination. The study of subhorizon and superhorizon
evolution of perturbations is required to understand the qualitative properties of the matter
power spectrum. We will briefly get back to this later in the course.
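
A quick way to build intuition for Fig. 8 is to sketch the comoving Hubble radius 1/(aH) through the three eras. The toy piecewise model below (with made-up values for the scale factor at reheating and at equality) reproduces the qualitative shape:

    import numpy as np

    # Toy comoving Hubble radius 1/(aH) in arbitrary units:
    # H = const during inflation, H ~ a^-2 (radiation), H ~ a^-3/2 (matter).
    A_REH, A_EQ = 1e-28, 3e-4   # made-up scale factors at reheating and equality

    def comoving_hubble_radius(a):
        if a < A_REH:                  # inflation: 1/(aH) shrinks as 1/a
            return (A_REH / a) * comoving_hubble_radius(A_REH)
        elif a < A_EQ:                 # radiation domination: grows as a
            return a                   # (overall normalization arbitrary)
        else:                          # matter domination: grows as a^(1/2)
            return A_EQ * np.sqrt(a / A_EQ)

    for a in [1e-32, 1e-30, A_REH, 1e-10, A_EQ, 1.0]:
        print(f"a = {a:.0e}:  1/(aH) ~ {comoving_hubble_radius(a):.1e}")

A comoving scale 1/k exits the horizon when the shrinking 1/(aH) drops below it during inflation, and re-enters when the growing 1/(aH) crosses it again after inflation.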

6.6.3 Primordial power spectrum


After inflation we are left with small curvature perturbations which seed the structure formation
of the universe. We will discuss the statistics of these perturbations, and the meaning of their
power spectrum, in more detail soon. However, let’s summarize their properties and parameters
here. The primordial scalar curvature fluctuations R are to very good approximation
Gaussian and thus fully described by the following power spectrum:

\langle \mathcal{R}_{\mathbf{k}} \mathcal{R}_{\mathbf{k}'} \rangle = (2\pi)^3\, \delta(\mathbf{k} + \mathbf{k}')\, P_{\mathcal{R}}(k), \qquad \Delta_s^2 \equiv \Delta_{\mathcal{R}}^2 = \frac{k^3}{2\pi^2}\, P_{\mathcal{R}}(k). \qquad (6.35)

Figure 9. Reconstruction of the primordial power spectrum from Planck (1807.06211). The k scales
where the power spectrum is constrained by the CMB are limited by the size of the observable universe
to the left and by the small-scale damping of the CMB to the right. One can clearly see that n_s ≠ 1.

Here, ⟨...⟩ denotes the ensemble average of the fluctuations. The power spectrum is often approx-
imated by a power-law form,

\Delta_s^2(k) = A_s(k_\star) \left(\frac{k}{k_\star}\right)^{n_s(k_\star) - 1 + \frac{1}{2}\alpha_s(k_\star) \ln(k/k_\star)}, \qquad (6.36)

where k_⋆ is an arbitrary reference or pivot scale. The scale-dependence of the power spectrum is
defined by the scalar spectral index (or tilt),

n_s - 1 \equiv \frac{d \ln \Delta_s^2}{d \ln k}, \qquad (6.37)

where scale-invariance corresponds to the value n_s = 1. We may also define the running of the
spectral index by

\alpha_s \equiv \frac{d n_s}{d \ln k}. \qquad (6.38)
The free parameters we want to measure are thus

• The primordial spectral amplitude A_s. Planck measured A_s ≈ 2.1 × 10^{-9} at the pivot scale
k_⋆ = 0.05 Mpc^{-1}.

• The primordial spectral index ns . Planck measured ns = 0.966 ± 0.005.

• The running of the spectral index αs (although no running has been detected).

This parametrization of the primordial power spectrum is very useful in practice. However, one
can also directly reconstruct the power spectrum without a parametrization, as shown in Fig. 9.
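
For later use, here is Eq.(6.36) in code, using the Planck-like parameter values quoted above (treat them as illustrative defaults):

    import numpy as np

    def Delta2_s(k, A_s=2.1e-9, n_s=0.966, alpha_s=0.0, k_star=0.05):
        """Primordial scalar power spectrum of Eq.(6.36); k and k_star in Mpc^-1."""
        exponent = (n_s - 1.0) + 0.5 * alpha_s * np.log(k / k_star)
        return A_s * (k / k_star)**exponent

    k = np.logspace(-4, 0, 5)   # Mpc^-1
    print(Delta2_s(k))          # slowly decreasing with k, since n_s < 1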

In the same way inflation predicts small primordial tensor fluctuations. These are the
primordial gravitational waves and are also given by a Gaussian power spectrum for the two
polarization modes. For these one can measure in particular

• The primordial tensor amplitude A_t, or the tensor-to-scalar ratio r = A_t/A_s.

• The primordial tensor spectral index nt .

Primordial gravitational waves have not been detected so there are only bounds on these param-
eters. The current constraint is about r < 0.05, i.e. tensor modes are less than 5% of the scalar
modes. This bound will be significantly improved by upcoming CMB experiments. A detection
of non-zero r is perhaps the best chance for a big fundamental physics discovery in the coming
decade.
Finally, both scalar and tensor perturbations are not expected to be precisely Gaussian. At
the very least, the coupling to gravity, which is a non-linear theory, leads to some mode-coupling
between perturbations. More than that, the inflaton potential as well as so-called derivative
interactions also lead to primordial non-Gaussianity. The most obvious way to look for
primordial non-Gaussianity is to search for a non-zero 3-point function

\langle \mathcal{R}_{\mathbf{k}_1} \mathcal{R}_{\mathbf{k}_2} \mathcal{R}_{\mathbf{k}_3} \rangle \neq 0 \qquad (6.39)

This three-point function is called the bispectrum. Non-Gaussianity can come in many different
bispectrum shapes (as well as higher N-point functions). The most famous bispectrum amplitude
parameter is

• the amplitude of local non-Gaussianity f_{NL}

In many inflation models primordial non-Gaussianity is too small to be detected any time soon
but there are also well-motivated scenarios where a detection could be around the corner. We
will get back to this topic, which is a main research topic of mine, in more detail later in this
course.
Here we have discussed the initial conditions in the curvature field. This curvature field is
then converted into initial conditions for matter and radiation. In most models, the initial
conditions for the fluids δr , δCDM and δbaryons are all the same (up to an overall amplitude),
since they are seeded by the same curvature field. Such initial conditions are called adiabatic
initial conditions. There could in principle also be perturbations where the different fluids have
different perturbations. Such perturbations are called isocurvature perturbations. Currently
there is no experimental evidence for isocurvature perturbations, and the most straightforward
models of inflation and reheating don't generate them. In this course, as in most cosmological
analyses, we will assume adiabatic initial conditions.

Part II
Introduction to Computation and Statistics
in Cosmology
In this section we introduce some of the main computational tools and data types used in cos-
mology. A practical goal will be to be able to analyze a dark matter simulation, extract its power
spectrum, and run MCMC to determine its cosmological parameters. We also want to learn
how to Fisher forecast experimental sensitivity, and compare it to the result in our simulation
analysis. Analyzing a dark matter simulation comes without the practical complications of a real
CMB or galaxy survey. We will discuss these real-world complications in later units.

Further reading
The general references of Part 1 all contain some material on statistics and data analysis, in
particular

• Dodelson, Schmidt, chapter 14

• Huterer, chapter 10.

Lecture notes or reviews that are specifically about data analysis in cosmology include:

• Heavens - Statistical techniques in cosmology. arxiv:0906.0664

• Verde - A practical guide to Basic Statistical Techniques for Data Analysis in Cosmology.
arxiv:0712.3028

• Trotta - Bayesian Methods in Cosmology. arxiv:1701.01467

• Leclercq, Pisani, Wandelt - Cosmology: from theory to data, from data to theory. arxiv:1403.1260

7 From Initial Conditions to Observed Data


Let's summarize what we learned in the introductory unit. After inflation, through the process of
reheating, we are left with curvature perturbations R, which were generated by the quantum
fluctuations of the inflaton field φ. Because quantum fluctuations are stochastic, they have to
be described by a probability density P({R}). As we discussed, this distribution is (almost)
Gaussian and, to our current experimental sensitivity, depends only on two parameters:

P_{Gauss}^{A_s,\, n_s}(\{\mathcal{R}\}) \qquad (7.1)

These initial conditions of the universe then evolve forward in time, according to the
standard model of cosmology or its extensions. For example, the distributions of matter in
the sky δm (which can be probed e.g. by mapping galaxies δg ) is given by some complicated

function F of the initial conditions which depends on the ΛCDM parameters (as well as other
physical constants).
\mathcal{R} \;\xrightarrow{\;F^{\Lambda}(\mathcal{R},\, t)\;}\; \delta_m(t) \qquad (7.2)
The function (or simulation) that connects the initial conditions to whatever we observe in the
data is sometimes called the forward model and it depends on physical parameters Λ that
we want to measure such as Ωm or the mass of neutrinos mν . By measuring δm , we can learn
both about primordial parameters such as As , ns and the parameters that influence the time
evolution which we called Λ here. Roughly speaking, the function F is known exactly on large
scales, approximately known on intermediate scales, and computationally intractable on small
scales. A typical course on theoretical cosmology would now spend some weeks with calculating
the function Eq.(7.2) analytically in cosmological perturbation theory, which amounts to
solving the Euler and Poisson equations perturbatively. Instead, we will focus on how to
perform data analysis and just use results from perturbation theory where needed. We will get
back to (non-relativistic) perturbation theory in Sec. 21.
In some modern analyses in cosmology one tries to reconstruct the initial conditions R(x)
directly from data such as the galaxy density δg (x). However, in the vast majority of analyses,
we don’t aim to reconstruct the initial conditions directly, but only their statistical parameters
such as As , ns , together with the parameters of cosmological time evolution Λ. This makes sense
because the theory of the initial conditions only makes predictions for statistical parameters. For
example, no theory can predict where in space a galaxy will form, but we can predict statistical
properties of the galaxy field. For the same reason we don’t usually have to analyze the volumetric
data δg (x) directly but instead only summary statistics of this data.
The most important summary statistic (which in the Gaussian case carries all the information)
is the power spectrum of the field. In many cases, we will measure the observed power spectrum
P_g^{obs} of the galaxy data, and compare it to the theoretical power spectrum P_g^{theo}(Λ, A_s, n_s), which
depends on cosmological parameters. By adjusting these parameters so that P_g^{theo} matches P_g^{obs},
we arrive at a measurement of our cosmological parameters. What we said here for galaxy
density measurements, is also true for all other data sources that probe the matter and radiation
distribution of the universe, in particular the Cosmic Microwave Background (CMB). The CMB
is a particularly clean probe of cosmology because, as we shall see, it is linear in the initial
conditions. Schematically, the “forward model” of the CMB is the linear mapping

\Theta_{\mathbf{k}}^{CMB} = T^{\Lambda}(k)\, \mathcal{R}_{\mathbf{k}} \qquad (7.3)

where Θ_k^{CMB} are the Fourier modes of the CMB temperature perturbations and T^Λ(k) is the
so-called linear transfer function, which depends on the cosmological parameters Λ. On the other
hand, for the non-linear galaxy field, the Fourier modes are coupled to each other in a complicated
way.
In the present section we will develop the tools to analyze the matter distribution through
the power spectrum in a simulated cosmological volume. This setup is already enough to write
interesting papers in cosmology. In later units, we will use the same tools to analyze realistic
data from the CMB and galaxy surveys, which comes with many interesting complications. We

will then also discuss how to go beyond the power spectrum to extract even more information
from cosmological data.

8 Overview of Observed Data


Here is a list of the main sources of data that cosmologists have available.

• Primary CMB anisotropies. The primary CMB is the jewel of cosmological data. This is
because it has perfectly understood physics, with a linear map to the initial conditions. Our
best constraints on primordial physics come from the primary CMB. On the other hand,
it cannot directly probe late time physics such as dark energy. For primordial physics,
the only limitation is the number of independent modes. Modes here means either
independent pixels or independent Fourier modes. First, the CMB is a 2d probe, while e.g.
a galaxy survey is a 3d probe. Second, because of the free streaming length of photons,
primary CMB anisotropies are damped away on small scales. This limits the number of
available modes in the CMB to roughly

N_{CMB} \sim \ell_{max}^2 \sim (2500)^2 \qquad (8.1)

where ℓmax is the maximum multipole scale, as we will see later. The Baryon Acoustic
Oscillations in the power spectrum of the CMB reveal cosmological parameters such as
Ωm and ΩB . The CMB is also polarized. While so-called E-mode polarization has been
measured and roughly doubles the information in the CMB, cosmologist look for primordial
B-mode polarization which would reveal the presence of primordial gravitational
waves.

• Secondary CMB anisotropies. Two things happen to photons on the way from re-
combination to us. First, all photons are gravitationally lensed by the intervening matter.
From the observed CMB one can reconstruct the so-called lensing potential, which is a
weighted radial integral over the matter density on the line of sight. In this way, the CMB
can also be used to probe physics that happens at later times in the universe, such as the
“clumping” of non-relativistic neutrinos due to their non-zero mass. Second, a part of the
CMB photons (a few percent) will hit a free electron and get re-scattered. Depending on
the radial velocity of the electron, the photon will either gain or lose energy. This is the
Sunyaev-Zeldovich (SZ) effect. The SZ effect can for example be used to probe the
temperature of gas in clusters.

• Large-scale structure (LSS) with galaxy surveys. The distribution of galaxies probes
the initial conditions of the universe as well as later time physics such as dark energy and
neutrino masses. Galaxies are arranged in a cosmic web of voids, filaments, walls and
clusters. The advantage over the CMB is that this is a 3-dimensional probe, and that it is
not affected by the CMB damping scale, so that we can in principle probe far more modes.
The disadvantage is that the smaller modes are very non-linear and hard to model. There is

however a redshift dependent scale of gravitational collapse, where primordial information
should be entirely erased. The number of modes is
N_{LSS} \sim \left(\frac{k_{max}}{k_{min}}\right)^3 \qquad (8.2)

so it goes cubic rather than quadratic since it is a volumetric probe. The resulting number
depends very sensitively on the experiment and theoretical assumptions, which we will re-
visit later. Roughly speaking, current experiments have fewer independent accessible modes
than the CMB but future experiments will have more. As in the case of the CMB, light from
galaxies is also lensed. This lensing distorts the image of galaxies, which is called cosmic
shear or galaxy weak lensing. Weak lensing probes the same cosmological volume
as the galaxy positions, but it probes all matter (including dark matter) rather than
only luminous matter, which gives somewhat different information. Using large-scale
structure, one can measure the Baryon Acoustic Oscillations in the power spectrum.
These provide a standard ruler that can measure distances, and thus the expansion history
of the universe.

• Large-scale structure (LSS) with intensity mapping. The universe can of course not
only be probed by identifying galaxies in the visible spectrum but in general by mapping
any sort of radiation. In particular, one can map known emission and absorption lines
of both atoms and molecules in the universe. There are many different such lines that
I won’t review here. A current exciting experimental front is 21cm intensity mapping
which looks for the 21cm spin-flip transition line of neutral hydrogen. The universe contains
plenty of neutral hydrogen. The hardware for a 21cm interferometer is in principle cheap,
only requiring a set of antennas and a supercomputer to correlate them. Achieving the
required frequency resolution for precise redshifts is easy. However, due to extremely large
foregrounds, this technique is not yet quite ready for cosmology. Even further in the future,
it may be possible to do 21cm intensity mapping of the dark ages, the time before the
first galaxies formed. In principle, there is a gigantic amount of primordial information
hidden there (N ∼ 10^{18}). On the time scale of several decades it may be possible to
access this information. A different intensity mapping technique, that is already in use, is
Lyman-α mapping which looks for the Lyman-α forest, absorption lines in the emission
of distant quasars due to neutral hydrogen in the intergalactic medium.

These are the data sources for which we are developing tools in this course. They have in
common that they probe the universe as a density field. There is a different category of probes
which looks at individual objects. Some of the main probes here are:

• Type Ia Supernova Distance Measurements. The discovery of dark energy was made
possible by measuring distances (rather than redshifts) using Type Ia supernovae. Their
key property is that they have a known brightness (standard candle), so one can measure
the so-called luminosity distance. Type Ia SNe thus probe the expansion history of the
universe.

43
• Strong lensing (of quasars and galaxies by galaxies and galaxy clusters). If there is a
dense enough chunk of matter in front of a cosmological light source, one can get multiple
images or Einstein rings. From these strongly lensed images one can obtain a measurement
of the lens profile (thus probing the dark matter profile) and, if the light source is time
variable, one can get time delay measurements. These time delay measurements can be
used to measure the Hubble constant.

• Gravitational waves from compact objects. A very recent addition to cosmological


data are gravitational waves from black hole or neutron star mergers. These were discovered
by LIGO in 2015. This is the first time that we observe the universe by something other than
electromagnetic waves. For cosmology, it is particularly interesting that these black holes
can be used as standard sirens. One can reconstruct their absolute "loudness" from the
shape of the gravitational wave pulse, and then estimate their distance from the observed
loudness. In addition to LIGO's interferometric detector, there is also now strong evidence
for gravitational waves from NANOGrav's pulsar timing, which may have discovered a
gravitational wave background created by supermassive black holes.

The list above is not meant to be complete, but covers the most important data sources for cos-
mology (rather than for the large field of multi-messenger astrophysics, which studies individual
sources).

9 Random Fields in Cosmology


As we have discussed, the universe starts with a random field of initial conditions, which comes
from quantum fluctuations during inflation. This random field then evolves in time, generating
both the CMB perturbations and galaxy density perturbations. Much of cosmology is thus about
calculating and measuring statistical properties of random fields such as their power spectrum,
and their evolution in time. Our next goal is thus to study random fields.
In this section we will describe random fields in 3+1 dimensional space. These are the coor-
dinates in which a super-observer would observe the universe, who observes all of space at equal
time. In reality we can only observe the universe on the light cone, i.e. the farther away the
object is the farther back we also look in time. Observations on the light cone will be described
in Part IV of these lectures. Further, we will be working in comoving coordinates, so that the
background expansion as a function of time is factored out.

9.1 Random scalar fields in Euclidean space


We will start by discussing random scalar fields in Euclidean 3D space. Typical scalar fields are

• The density ρ(x) of a continuous field such as dark matter.

• The number density n(x) of a discrete tracer such as galaxies where n(x) = δN (x)/δV .

• (Over-)density fields based on these quantities:

\delta(\mathbf{x}) = \frac{\rho(\mathbf{x}) - \bar\rho}{\bar\rho} = \frac{\rho(\mathbf{x})}{\bar\rho} - 1 = \frac{n(\mathbf{x})}{\bar n} - 1 \qquad (9.1)

By construction, the spatial average of δ vanishes,

⟨δ(x, t)⟩ = 0 (9.2)

In this section we will primarily think about non-relativistic cosmology (such as galaxy sur-
veys), so that the pressure can be ignored. However, the statistical techniques we develop are
equally important for relativistic cosmology. We will get back to relativistic physics in the unit
about the CMB.
In the following I will use the notation f (x) for a general scalar field. In general f (x) also
depends on time as f (x, t) but in this section we won’t need explicit time dependence (and it is
easy to add to the notation).

9.1.1 PDF of random fields


The scalar fields f (x) are random fields. That means that they are drawn from some probability
density function (PDF) which we will write as

P[f ] (9.3)

To be precise, here f (x) is a continuous function so that P is a probability density functional.


Continuous functions are appropriate for analytic calculations, but simulations as well as data
analysis must discretize space. We will study discrete coordinates below, which you will encounter
in numerical examples.
As far as we know the universe is statistically homogeneous and statistically isotropic,
which means that on average it looks the same in all places and all directions. Mathematically
this can be expressed by defining a translation operator

T̂a f (x) ≡ f (x − a), (9.4)

and a rotation operator


R̂f (x) ≡ f (R−1 x), (9.5)
where R is a rotation matrix. Given these operators the isotropic and homogeneous field PDF
obeys

P[f (x)] = P[T̂a f (x)] (9.6)

and

P[f (x)] = P[R̂f (x)] (9.7)

for any translation or rotation.

9.1.2 Position space correlation functions


The field PDF, which can often be parametrized as a function of a few cosmological parameters,
encodes all there is to know about the random field. Often we don’t work with the field PDF
directly but with statistics that are easier to work with. All of these can in principle be calcu-
lated from the field PDF. In position space the most basic such statistic is the (position space)

correlation function. Correlation functions of fields are expectation values of products of fields
at different spatial points. The two point correlator is
\xi(\mathbf{x}, \mathbf{y}) \equiv \langle f(\mathbf{x}) f(\mathbf{y}) \rangle = \int \mathcal{D}f\; \mathcal{P}[f]\, f(\mathbf{x})\, f(\mathbf{y}), \qquad (9.8)

where the integral is a functional integral (or path integral) over field configurations. This is the
usual definition of an expectation value in statistics.
By statistical homogeneity, the correlation function can only depend on the difference of the
positions x + r and x and statistical isotropy enforces dependence on the magnitude only. In this
case the correlation function is given by

⟨f (x)f (x + r)⟩ = ξ(|r|) = ξ(r) (9.9)

The proof of this intuitive statement can easily be found in textbooks. The correlation function
of galaxies and other observable fields can be measured and is used to probe properties of the
universe.

9.1.3 Fourier Space


We often work in Fourier space (also called momentum space) rather than position space.
The main reason is that on large scales or at early times, the perturbations of the universe
evolve linearly. This means that Fourier modes evolve independently, rather than coupling to
one another. Recall that Fourier transforms are used to solve linear homogeneous differential
equations with constant coefficients: they diagonalize such differential equations, turning
them into algebraic equations in Fourier space. This is because differentiation in real space
corresponds to multiplication with k in Fourier space. This sort of differential equation appears
when we linearize the Euler, Poisson and continuity equations for small perturbations. We will
do this math later. For now, let’s discuss Fourier space.
We use the following conventions for the continuous Fourier transform:

f(\mathbf{k}) = \int d^3x\; e^{-i\mathbf{k}\cdot\mathbf{x}}\, f(\mathbf{x}) \qquad (9.10)

and

f(\mathbf{x}) = \int \frac{d^3k}{(2\pi)^3}\; e^{i\mathbf{k}\cdot\mathbf{x}}\, f(\mathbf{k}). \qquad (9.11)

A nice discussion of other consistent Fourier conventions is in appendix A of 0907.5424.
A nice discussion of other consistent Fourier conventions is in appendix A of 0907.5424.
Cosmology needs Fourier space (also called k-space or momentum space) as much as position
space (also called x-space or configuration space), so let’s review some properties:

• If f (x) is dimensionless, the Fourier modes f (k) have dimension [length]3 .

• If the position space field is real, we have f(k) = f^*(−k). This can be shown by Fourier
transforming f (x) = f ∗ (x).

• Under spatial translation, the Fourier transform picks up a phase factor:

\hat T_{\mathbf{a}} f(\mathbf{k}) = \int d^3x\; f(\mathbf{x} - \mathbf{a})\, e^{-i\mathbf{k}\cdot\mathbf{x}} \qquad (9.12)
= \int d^3x'\; f(\mathbf{x}')\, e^{-i\mathbf{k}\cdot\mathbf{x}'}\, e^{-i\mathbf{k}\cdot\mathbf{a}} \qquad (9.13)
= f(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{a}}, \qquad (9.14)

where x' = x − a.

• The Fourier space representation of the nabla operator is given by ∇ → ik.

We will often use the Dirac delta function identity

\delta_D(\mathbf{k} - \mathbf{k}') = \frac{1}{(2\pi)^3} \int d^3x\; e^{\pm i(\mathbf{k} - \mathbf{k}')\cdot\mathbf{x}}. \qquad (9.15)

The delta function has the dimension of the inverse of its argument; here in 3d it has
dimension [k^{-3}] = [length]^3. This is also the orthogonality relation for plane waves in an infinite
volume. In the other direction, the delta function is

\delta_D(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^3} \int d^3k\; e^{\pm i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}. \qquad (9.16)

Using these delta function identities you can check that the Fourier transform of the Fourier
transform returns the original function, as it must.
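
In numerical work these continuum conventions map onto the discrete FFT with explicit volume factors, a common source of (2π)³ bookkeeping bugs. A minimal sketch of the dictionary on a periodic box (side length L, N³ grid points; the numbers are illustrative):

    import numpy as np

    N, L = 64, 100.0                          # grid points per side, box size
    dV = (L / N)**3                           # cell volume
    f_x = np.random.normal(size=(N, N, N))    # some real field sampled on the grid

    # Approximate the continuum convention f(k) = int d^3x e^{-ik.x} f(x):
    f_k = np.fft.rfftn(f_x) * dV

    # Inverse, f(x) = int d^3k/(2pi)^3 e^{ik.x} f(k): undo the same factor
    f_x_back = np.fft.irfftn(f_k, s=(N, N, N)) / dV
    print(np.allclose(f_x, f_x_back))         # True: round trip recovers the field

    # Angular wave numbers of the grid modes along one axis
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)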

9.1.4 Power spectrum


The famous power spectrum is the 2-point function in Fourier space:

\langle f(\mathbf{k}) f^*(\mathbf{k}') \rangle = \int d^3x\, d^3x'\; e^{-i\mathbf{k}\cdot\mathbf{x}}\, e^{i\mathbf{k}'\cdot\mathbf{x}'}\, \langle f(\mathbf{x}) f(\mathbf{x}') \rangle \qquad (9.17)
= \int d^3r\, d^3x'\; e^{-i\mathbf{k}\cdot\mathbf{r}}\, e^{-i(\mathbf{k} - \mathbf{k}')\cdot\mathbf{x}'}\, \xi(r) \qquad (9.18)
= (2\pi)^3\, \delta_D(\mathbf{k} - \mathbf{k}') \int d^3r\; e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(r) \qquad (9.19)
= (2\pi)^3\, \delta_D(\mathbf{k} - \mathbf{k}')\, P(k), \qquad (9.20)

where, in the second line, we introduced r ≡ x − x' and then performed the integral over x', which
gives us a Dirac delta function. We see that different Fourier modes are uncorrelated. This is a
consequence of translation invariance. The power spectrum can also be written in the equivalent
form

\langle f(\mathbf{k}) f(\mathbf{k}') \rangle = (2\pi)^3\, \delta_D(\mathbf{k} + \mathbf{k}')\, P(k) \qquad (9.21)

(note the change of signs and conjugates) due to the reality condition.
The power spectrum P(k) and the correlation function ξ(r) are related by the 3-dimensional
Fourier transform. We can simplify this relation as follows. Using spherical coordinates,
k·r = kr cos θ, we have

P(k) = \int d^3r\; e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(r) \qquad (9.22)
= \int_0^{2\pi} d\phi \int_{-1}^{1} d(\cos\theta) \int_0^\infty dr\, r^2\, e^{-ikr\cos\theta}\, \xi(r) \qquad (9.23)
= 2\pi \int_0^\infty dr\, \frac{r^2}{ikr} \left[ e^{ikr} - e^{-ikr} \right] \xi(r) \qquad (9.24)
= \frac{4\pi}{k} \int_0^\infty dr\, r \sin(kr)\, \xi(r) \qquad (9.25)
= 4\pi \int_0^\infty dr\, r^2\, j_0(kr)\, \xi(r), \qquad (9.26)

where

j_0(x) = \frac{\sin x}{x} \qquad (9.27)

is the spherical Bessel function of order zero. These functions are frequently encountered in
cosmology. In the other direction, one can express ξ in terms of the power spectrum as

\xi(r) = \int \frac{d^3k}{(2\pi)^3}\; e^{i\mathbf{k}\cdot\mathbf{r}}\, P(k) \qquad (9.28)
= \int \frac{dk\, k^2}{2\pi^2}\; j_0(kr)\, P(k). \qquad (9.29)

The power spectrum has dimension [length]^3. It is often useful to define the dimensionless
power spectrum by multiplying with k^3,

\Delta^2(k) = \frac{k^3}{2\pi^2}\, P(k), \qquad (9.30)

which we encountered before in Eq.(6.35). There are different conventions in the literature for
the factors of π and 2.

9.2 Gaussian Random Fields


The statements we made above are correct for any homogeneous isotropic random field. However
a Gaussian Random Field (GRF) is particularly important in cosmology. Inflationary physics
predicts an (almost) Gaussian Random field, and on large scales, the universe remains Gaussian
through its evolution. Let’s now see how a GRF is defined.

9.2.1 GRFs in Position Space


A vector f = [f_1, ..., f_N] of random variables is called Gaussian if the joint probability density
function (PDF) is a multivariate Gaussian,

P(f) = \frac{1}{\sqrt{(2\pi)^N |C|}} \exp\left( -\frac{1}{2}\, f_i\, C^{-1}_{ij}\, f_j \right), \qquad (9.31)

where the positive definite, symmetric N×N matrix C_{ij} = ⟨f_i f_j⟩ is called the covariance matrix.
A random field f : R³ → R is a Gaussian random field (GRF) if for arbitrary collections of
field points (x1 , ..., xN ) the variables [f (x1 ), ..., f (xN )] are joint Gaussian variables. Since any
N-point function can be calculated from the field PDF, the GRF is fully defined in terms of
its covariance matrix, which is the 2-point function. As we see from the PDF, a Gaussian
random field is not necessarily homogeneous and isotropic. To make it so, we need to enforce
that the covariance matrix is

Cij = ⟨fi fj ⟩ = ξ(|xi − xj |) (9.32)

Here we have discretized the PDF, i.e. we wrote f as a finite dimensional vector rather than
an infinite dimensional function. In principle, for the continuous fields that we have discussed so
far, we should express the GRF using a Gaussian functional, which is schematically

\mathcal{F}[f(\mathbf{x})] \propto \exp\left( -\frac{1}{2} \int d^3x\, d^3y\; f(\mathbf{x})\, C(\mathbf{x} - \mathbf{y})\, f(\mathbf{y}) \right) \qquad (9.33)

In practice we don't usually need this continuous expression.

9.2.2 GRFs in Fourier space


The field PDF has the same mathematical form when expressed in momentum space, where the
vector f = [f_1, ..., f_N] runs over the Fourier modes. In this case, for a homogeneous isotropic field,
the covariance matrix is

\langle f_{\mathbf{k}_i} f^*_{\mathbf{k}_j} \rangle \propto \delta_{ij}\, P(k_i) \qquad (9.34)

with Kronecker delta δ_ij. The covariance matrix of a homogeneous field is thus diagonal in
momentum space, i.e. the covariance between different Fourier modes is zero. Note
that the Fourier modes f_k, unlike the position space field, are complex numbers. We will get
back to the precise definition of a Gaussian random field on a finite volume (i.e. with discrete
Fourier modes) shortly, including the proportionality factor.
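
To connect this to practice, here is a minimal sketch of the standard recipe for drawing a homogeneous, isotropic GRF with a prescribed power spectrum on a grid: FFT real white noise (which is automatically Hermitian) and shape it by √P(k). The grid size and the spectrum are illustrative, and the normalization is one common choice; the proportionality factor mentioned above is made precise later:

    import numpy as np

    N, L = 128, 500.0             # grid points per side and box size (illustrative)
    dV = (L / N)**3               # cell volume

    def P(kmag, A=1.0, n=-2.0):   # a power-law spectrum as in Sec. 9.3
        out = np.zeros_like(kmag)
        mask = kmag > 0           # leave the k = 0 (mean) mode at zero
        out[mask] = A * kmag[mask]**n
        return out

    # |k| on the rfft grid
    k1 = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
    k3 = 2 * np.pi * np.fft.rfftfreq(N, d=L / N)
    KX, KY, KZ = np.meshgrid(k1, k1, k3, indexing="ij")
    kmag = np.sqrt(KX**2 + KY**2 + KZ**2)

    # The FFT of real unit white noise is properly Hermitian; multiplying by a real
    # function of |k| keeps it Hermitian, so the inverse FFT below is exactly real.
    w_k = np.fft.rfftn(np.random.normal(size=(N, N, N)))
    f_k = w_k * np.sqrt(P(kmag) / dV)
    f_x = np.fft.irfftn(f_k, s=(N, N, N))   # real-space GRF with the requested spectrum

    print(f_x.std())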

9.3 Power Law Power Spectra


9.3.1 Power Laws
A typical power spectrum

\langle f(\mathbf{k}) f^*(\mathbf{k}') \rangle = (2\pi)^3\, \delta_D(\mathbf{k} - \mathbf{k}')\, P(k) \qquad (9.35)

is given by a power law,

P(k) = A\, k^n, \qquad (9.36)

where n is called the spectral index. The corresponding dimensionless power spectrum is

\Delta^2(k) \propto A\, k^{n+3} \qquad (9.37)

Some special cases are n = 0, which is called white noise, and n = 1, which is called the Harrison-
Zeldovich power spectrum, as we will see below.

9.3.2 Potential and Density Power Spectra
To discuss typical power spectra we need to discriminate between the power spectrum of the
gravitational potential Φ and the power spectrum of resulting matter perturbations δm , which
are related by the Poisson equation. The primordial gravitational potential Φ is closely related to
the primordial curvature perturbation R as we discussed in Sec. 6.6.1. We are switching notation
to Φ now instead of R because the following is also valid in Newtonian gravity which works with a
Newtonian gravitational potential Φ. The Poisson equation for matter in an expanding space-time
is

\nabla^2 \Phi(\mathbf{x}) = \frac{4\pi G}{c^2}\, a^2\, \bar\rho\, \delta(\mathbf{x}), \qquad (9.38)

which in momentum space reads

-k^2\, \Phi(\mathbf{k}) = \frac{4\pi G}{c^2}\, a^2\, \bar\rho\, \delta(\mathbf{k}). \qquad (9.39)

Thus the power spectra of the two quantities are related by

P_\Phi(k) \propto k^{-4}\, P_\delta(k) \qquad (9.40)

The relation between the density power spectrum and the primordial potential power spectrum
is thus

Pδ (k) ∝ k 4 PΦ (k) ∝ k∆2Φ (k) (9.41)

If the dimensionless primordial power spectrum is constant (rather than k dependent), i.e. if the
primordial curvature perturbations have the same amplitude on all scales, then the density power
spectrum is

P_δ(k) ∝ k ,   P_Φ(k) ∝ k^{n−4} = k^{−3}    (9.42)

This is called a Harrison-Zeldovich Power Spectrum. The potential fluctuations are said to
be scale-invariant primordial fluctuations. This is the case ns = 1 (remember that ns ∼ 0.96,
so it’s close).
We also sometimes need the variance of the field (also called the zero-lag correlation function),
given by

σ_f² ≡ ⟨f²(x)⟩ = ξ_f(0) = (1/(2π)³) ∫ d³k P_f(k) .    (9.43)

which can be written as

σ_f² ≡ ∫ d ln k ∆²_f(k) ,    (9.44)

where

∆²_f(k) ≡ (k³/(2π²)) P_f(k) .    (9.45)

The dimensionless power spectrum is thus the contribution to variance per log wave number. If
the dimensionless power spectrum has a peak at some k∗ then fluctuations in f are dominated by
wavelengths of order 2π/k∗. Note that the integral Eq.(9.44) is divergent in the large-k limit unless
the field is smoothed at some scale so that the power spectrum goes to zero. We will get back to
the smoothing of fields.

9.3.3 Illustrating Power Law Power Spectra in 2d
We'd like to get some intuition for what Gaussian density fields with power-law spectra look like.
We will illustrate these fields in 2d (as appropriate for the CMB).
Let’s have a look at the position space correlation function for arbitrary dimension d:

⟨f(x)f(y)⟩ = ∫ d^d k/(2π)^d ∫ d^d k′/(2π)^d e^{−ik·x − ik′·y} ⟨f(k)f(k′)⟩    (9.46)
           = ∫ d^d k/(2π)^d e^{−ik·(x−y)} P_f(k) .    (9.47)

We again consider a power law of form

P (k) = A k n (9.48)

For any dimension d, for n = 0, we find that

⟨f(x)f(y)⟩ ∼ δ_D^d(x − y)    (9.49)

which means that all pixels are uncorrelated. This is called white noise. This is illustrated in
Fig.10 top left.
If we decrease n, for example P (k) = A k −1 the correlation between points increases, i.e.
nearby points become more likely to have a similar value (Fig.10). There is a special value of n
for which the field becomes scale invariant, i.e. the correlation between any two points becomes
independent of distance. In dimension d this is n = −d, so in 3d it is n = −3 as we have already
seen above and in 2d it is n = −2. To show this, we rescale the correlation function by a factor λ

⟨Φ(λx)Φ(λy)⟩ = ∫ d^d k/(2π)^d e^{−iλk·(x−y)} k^{−d}    (9.50)
            = ∫ d^d k′/(2π)^d λ^{−d} e^{−ik′·(x−y)} (k′/λ)^{−d}    (9.51)
            = ⟨Φ(x)Φ(y)⟩    (9.52)

where we changed variables to k′ = λk. If we go more negative with n than the scale-invariant
value, then the universe becomes more inhomogeneous on larger scales (i.e. we see larger pertur-
bations if we zoom out). This would be inconsistent with the cosmological principle, which requires
the universe to be homogeneous on large scales.

Figure 10. 2d Gaussian random fields with power spectrum P (k) = A k n for various n. For n = 0 we get
white noise and for lower n we get progressively more correlation. The scale invariant case is n = −2 in
2d. This plot was made with the Pylians library. The script is provided with the course material.

9.4 Matter Power Spectrum and Boltzmann Codes
Apart from power-law power spectra, the most important power spectrum in this course is perhaps
the matter power spectrum. It can be written approximately as two different power laws.

9.4.1 Transfer function and Growth function


On large scales the density perturbations of the universe evolve linearly. That means that the
evolution of perturbations δ(k) can be described by a transfer function T as follows:

δ_m(k, t) ∝ T(k) D(a(t)) δ_m(k, t_i)    (9.53)
         ∝ T(k) D(a(t)) k² Φ(k, t_i)    (9.54)

where ti is the initial time, taken just after inflation. The function D(a(t)) is called the growth
function. There are various possible conventions for T and D but the key point is that the time
and k dependence factorizes.
It follows that the power spectrum evolves as

Pm (k, t) = T 2 (k) D2 (a(t)) Pm (k, ti ) (9.55)

For non-relativistic matter the transfer function is

T(k) ∼ 1    for k < k_eq    (9.56)
T(k) ∼ constant × k^{−2}    for k > k_eq    (9.57)

The transfer function thus crucially depends on the mode's comoving wavenumber k compared to the wave number

keq = (aH)eq (9.58)

of the mode that entered the Hubble horizon at the time of matter-radiation equality. Modes
larger than this (k < keq ) enter the horizon in the matter dominated era and modes smaller than
this (k > keq ) will have entered during radiation domination. The form of the transfer functions
comes from the fact that radiation domination stops (or more precisely, slows to logarithmic) the
growth of perturbations, as we will discuss later.
If we start with the power-law spectrum P ∼ k^n, then it subsequently evolves to

P(k) ∝ k^n    for k < k_eq
P(k) ∝ k^{n−4}    for k > k_eq

with the turnover near k ≈ k_eq ∼ 0.01 Mpc⁻¹. As we have discussed, for a Harrison-Zeldovich
power spectrum we have n = 1, while we measure n ≃ 0.96. The linear matter power spectrum,
scaled to today, is shown in Fig. 11.

Figure 11. The linear matter power spectrum scaled to z = 0 from Planck 2018 CMB data and various
galaxy surveys. We see the power spectrum turnover at keq .

9.4.2 Boltzmann Solvers


The polynomial approximation to the matter power spectrum is of course not exact, especially
around the turnover which happens rather gradually. In practice, cosmologists calculate the
power spectrum of Gaussian fields such as the matter density and the CMB temperature with
so-called Boltzmann codes which solve the Einstein-Boltzmann equations, briefly discussed in
Sec. 5.5, numerically. There are currently two main codes that are used by the community:

• CAMB. https://camb.info/. CAMB, written in Fortran, has been the community stan-
dard for a long time. It comes with a nice python wrapper and documentation https:
//camb.readthedocs.io/en/latest/ and a demo notebook https://camb.readthedocs.
io/en/latest/CAMBdemo.html.

• CLASS. https://lesgourg.github.io/class_public/class.html. This is a more re-
cent implementation, written in C, which is becoming increasingly popular. If you need to modify
(rather than just run) a Boltzmann code you may be better off with CLASS. There is also
a useful extension to calculate non-linear power spectra at smaller scales using the EFTofLSS,
called CLASS-PT. We will get back to this topic in the unit on LSS. There is also a python
wrapper https://github.com/lesgourg/class_public/wiki/Python-wrapper.

Both of these codes can for example generate the black theory curve in Fig. 11. In general these
codes are only correct in the linear regime (CMB, LSS for k ≲ 0.1 Mpc−1 at z = 0). However
they have some extensions to calculate power spectra in the non-linear regime. These are based
on results from non-linear perturbation theory or N-body simulations.
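As a concrete starting point, a minimal sketch of a CAMB python call to obtain the linear matter power spectrum could look as follows (the parameter values are purely illustrative, not recommendations):

import camb

pars = camb.CAMBparams()
pars.set_cosmology(H0=67.5, ombh2=0.022, omch2=0.122)   # example Lambda-CDM parameters
pars.InitPower.set_params(As=2e-9, ns=0.965)
pars.set_matter_power(redshifts=[0.0], kmax=2.0)

results = camb.get_results(pars)
# k in h/Mpc, P(k) in (Mpc/h)^3, one row of pk per requested redshift
kh, z, pk = results.get_matter_power_spectrum(minkh=1e-4, maxkh=1.0, npoints=200)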

9.5 Random scalar fields in discrete coordinates
While analytic work is usually done with continuous distributions, numerical work usually uses
a discrete data representation. For example, the 3d matter distribution can be represented as a
box of 3d pixels. Such a finite box also has a finite set of discrete Fourier modes. We work in 3d
but adapting to 2d is straightforward.

9.5.1 Fourier conventions


We now work in a finite pixelized box with side length L, with K grid points per side length and
grid length H = L/K. The box volume is then Vbox = L3 and the pixel volume is Vpix = H 3 =
Vbox /Npix where Npix = K 3 . Our Fourier conventions are then:

f(x) = (1/V_box) Σ_{k_i} f(k_i) e^{i k_i·x}    (9.59)
     = ∫_{V_k} d³k/(2π)³ f(k) e^{i k·x}    (9.60)

f(k) = V_pix Σ_{x_i} f(x_i) e^{−i k·x_i}    (9.61)
     = ∫_{V_box} d³x f(x) e^{−i k·x}    (9.62)

Here we also introduced V_k = (2π/H)³, the volume of the Fourier space cube. Fourier conventions
are non-uniform in the literature, but I am using probably the most common one here. The best
discussion of the discrete Fourier transform in cosmology that I have found is in Donghui Jeong's
PhD thesis, appendix A. The discrete mode orthogonality condition with our conventions is

Σ_{x_i} e^{i k·x_i} (e^{i k′·x_i})* = Σ_{x_i} e^{i(k−k′)·x_i} = N_pix δ_{kk′}    (9.63)

The larger K, the more high-frequency modes we can resolve. The lowest Fourier mode, which
covers one side length with one whole mode, is called the fundamental mode

k_f = 2π/L    (9.64)
The total set of Fourier modes is

k_i ∈ (n_x, n_y, n_z) k_f    (9.65)

where (n_x, n_y, n_z) is a set of integers that run from −K/2 to K/2. The finite number of
Fourier modes leads to cosmic variance, as we will discuss further shortly.
The power spectrum is given by

⟨f(k) f*(k′)⟩ = V_box P_f(k) δ_{kk′}    (9.66)

A few more comments:

• In our conventions, for a dimensionless f (xi ) the Fourier modes have again dimension
[length]3 . The power spectrum also has dimension [length]3 . The Kronecker delta is dimen-
sionless.

• For discrete modes the reality condition reads again f−k = fk∗ .

• If your data is not periodic there will be “spurious transfer of power” (spectral leakage) in your FT.
We’ll address this when we talk about experimental masks.

• The highest frequency that we can resolve is the Nyquist frequency of the grid given by
k_Ny = (K/2) k_f = Kπ/L = π/H    (9.67)
9.5.2 Gaussian random field in discrete coordinates
For a Gaussian random field, of course our discrete Fourier modes are drawn from a Gaussian
distribution. Since they are complex numbers, let’s understand precisely what that means. This
will also suggest how we can generate such a field in code.
We split modes into their real and imaginary parts as f (k) = a(k) + ib(k). The reality of
f requires f−k = fk∗ and hence the real and imaginary parts of fk must satisfy the constraints
a−k = ak and b−k = −bk . For a homogeneous and isotropic Gaussian process these modes are
drawn from:

p(ak , bk ) = p(ak )p(bk ) (9.68)


 2  2
1 ak 1 b
=√ exp − 2 √ exp − k2 (9.69)
πσk σk πσk σk
where the variance is equal to σk2 /2 and is the same for both independent variables ak and bk .
For the expectation value of the product of Fourier coefficients we get

⟨f_k f_k′⟩ = ⟨a_k a_k′⟩ + i(⟨a_k b_k′⟩ + ⟨a_k′ b_k⟩) − ⟨b_k b_k′⟩ = σ_k² δ_{k,−k′} ,

where we have taken into account that a−k = ak and b−k = −bk and that the two random
variables a and b are uncorrelated. One can also change variables from a, b to polar coordinates
r, ϕ and find that the PDF of r is a Rayleigh distribution and the PDF of the phase is constant:
p(r) = (2r/σ_k²) e^{−r²/σ_k²} ,    p(ϕ) = 1/(2π) .    (9.70)
Comparing with the above we have σk2 = Vbox Pf (k).
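This suggests a direct recipe for generating a GRF realization in a box. Below is a minimal numpy sketch, assuming a power-law P(k) and illustrative grid parameters (all numbers are examples, not fixed choices); starting from real-space white noise and using the real FFT keeps the reality condition f(−k) = f(k)* satisfied automatically:

import numpy as np

K, L = 64, 1000.0                      # grid points per side, box side length (assumptions)
Vbox, Vpix = L**3, (L/K)**3
A, n = 1.0e4, -2.0                     # hypothetical power-law amplitude and index

# |k| on the rfft grid layout
kx = 2*np.pi*np.fft.fftfreq(K, d=L/K)
kz = 2*np.pi*np.fft.rfftfreq(K, d=L/K)
kmag = np.sqrt(kx[:,None,None]**2 + kx[None,:,None]**2 + kz[None,None,:]**2)

# White noise in real space has <|w(k)|^2> = Vpix*Vbox in our conventions, so
# rescaling its Fourier modes by sqrt(P(k)/Vpix) gives <|f(k)|^2> = Vbox P(k).
rng = np.random.default_rng(42)
wk = Vpix * np.fft.rfftn(rng.normal(size=(K, K, K)))   # discrete FT, Eq. (9.61)
Pk = np.zeros_like(kmag)
Pk[kmag > 0] = A * kmag[kmag > 0]**n                   # zero the k=0 (mean) mode
fk = wk * np.sqrt(Pk / Vpix)
f = np.fft.irfftn(fk, s=(K, K, K)) / Vpix              # back to f(x), Eq. (9.59)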

9.5.3 Implementing Fourier Transforms


The way to calculate the Fourier transform is of course the famous Fast Fourier Transform
algorithm. In numpy this can be done in n dimensions with numpy.fft.fftn and numpy.fft.rfftn. Numerical
methods are explained in detail in the famous book “Numerical Recipes” (NumReps), which
covers FFTs in detail. A main choice one can make is whether to use a complex FFT or a
real FFT called RFFT. The RFFT enforces the reality condition f−k = fk∗ to be more memory
efficient (it saves only half as many modes) but the code can look somewhat less elegant than
with the complex FFT.

Figure 12. Discrete Fourier grid and discrete modes (blue points) contributing to a wavenumber bin
(blue shaded region) centered around k.

9.6 Power spectrum estimation


9.6.1 Power spectrum estimator
We estimate the power spectrum in bins, i.e. spherical shells of width dk corresponding to an
interval in wavevector magnitude [k−, k+) = [k − dk/2, k + dk/2) centered at k, by averaging
the square of all the modes in this bin:

P̂(k) = (1/(N_k V_box)) Σ_{k_i ∈ [k−,k+)} f(k_i) f*(k_i) ,

where Nk is the number of cells in the k-bin. The modes are illustrated in Fig.12.
This estimate assumes statistical isotropy (i.e. the power spectrum depends only on the
magnitude of the wave vector). It is easy to see that this estimator is unbiased:
⟨P̂⟩ = (1/(N_k V_box)) Σ_{k_i ∈ [k−,k+)} ⟨f(k_i) f*(k_i)⟩    (9.71)
    = (1/(N_k V_box)) Σ_{k_i ∈ [k−,k+)} V_box P_f(k_i)    (9.72)
    = P_f(k)    (9.73)

where we have used Eq.(9.66). In the last step, for a finite bin width, in principle we should
average the theory power spectrum over the same modes, but in practice for narrow bins this is
not necessary (i.e. all modes in the bin have almost the same theoretical power spectrum). Power
spectrum estimation is no longer as easy when statistical isotropy is broken by the experiment
(i.e. we only observe a part of the sky), but we will deal with this difficulty in later chapters.
We note that here we estimate the power spectrum, which is defined as an expectation value
over the PDF of the random field (as in Eq.(9.8) but in Fourier space) from a single universe.
This is possible because the modes are independent so they are all drawn from the same PDF,
whether in the same universe or in different universes.
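A minimal numpy sketch of this binned estimator, following the conventions of Sec. 9.5.1 (the binning choices are illustrative), can be applied directly to a field such as the one generated in Sec. 9.5.2:

import numpy as np

def power_spectrum(f, L, nbins=20):
    # Binned estimator P_hat(k) of Eq. (9.71) for a real field f on a (K,K,K) grid
    K = f.shape[0]
    Vbox, Vpix = L**3, (L/K)**3
    fk = Vpix * np.fft.rfftn(f)                  # discrete FT, Eq. (9.61)
    kx = 2*np.pi*np.fft.fftfreq(K, d=L/K)
    kz = 2*np.pi*np.fft.rfftfreq(K, d=L/K)
    kmag = np.sqrt(kx[:,None,None]**2 + kx[None,:,None]**2 + kz[None,None,:]**2)
    edges = np.linspace(2*np.pi/L, np.pi*K/L, nbins + 1)   # from k_f to k_Nyquist
    which = np.digitize(kmag.ravel(), edges) - 1
    power = (np.abs(fk)**2 / Vbox).ravel()       # |f(k)|^2 / Vbox, cf. Eq. (9.66)
    Pk = np.array([power[which == b].mean() for b in range(nbins)])
    return 0.5*(edges[1:] + edges[:-1]), Pk

Note that averaging over the rfft half-grid leaves the estimate unchanged, since |f(−k)|² = |f(k)|² for a real field.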

It is also useful to calculate the number of modes in a power spectrum bin analytically (rather
than numerically from the FT grid). The number of modes is given by

N_k = V_shell/V_f = 4πk² dk/V_f = 4πk³ d ln k/V_f    (9.74)

where V_f = (2π)³/V_box is called the volume of the fundamental cell. This means that in logarithmic
bins (in which we often plot the power spectrum), the number of modes goes as k³.

9.6.2 Cosmic Variance


We want to calculate the variance of the power spectrum estimator, which tells us how well we
can measure the power spectrum from a given cosmological volume. Recall that the variance of
a random variable X is

V[X] = ⟨(X − ⟨X⟩)²⟩    (9.75)
     = ⟨X²⟩ − (⟨X⟩)²    (9.76)

For the power spectrum estimator we thus have


V[P̂(k)] = ⟨P̂²(k)⟩ − ⟨P̂(k)⟩² = (1/(N_k² V_box²)) Σ_{k_i,k_j ∈ [k±]} ⟨f(k_i) f(−k_i) f(k_j) f(−k_j)⟩ − P²(k)    (9.77)

To make progress, we need an important theorem for Gaussian fields called Wick’s theorem
(which also appears in QFT). Wick’s theorem states that the higher order correlation functions
of a Gaussian random field of mean zero can be expressed as certain products of the two point
function. This implies that the 3-point function ⟨f (k1 )f (k2 )f (k3 )⟩ of such a field must vanish,
as do all odd N-point function. On the other hand for a 4-point functions, as we have in our
calculation, Wick’s theorem states:

⟨f1 f2 f3 f4 ⟩ = ⟨f1 f2 ⟩⟨f3 f4 ⟩ + ⟨f1 f3 ⟩⟨f2 f4 ⟩ + ⟨f1 f4 ⟩⟨f2 f3 ⟩ (9.78)

A general discussion of Wick’s theorem can be found in some cosmology text books. Using this
relation in our calculation we get
⟨P̂²(k)⟩ − ⟨P̂(k)⟩² = (1/N_k²) Σ_{k_i,k_j ∈ [k±]} P(k_i) P(k_j) + (2/N_k²) Σ_{k_i ∈ [k±]} P²(k_i) − P²(k)    (9.79)
                  = (2/N_k) P²(k)    (9.80)
Thus the relative error on the power spectrum is given by:

∆P/P = √(2/N_k)    (9.81)

The factor of 2 comes from the modes being complex (thus in the k-sphere we double counted) and
the √N_k may be familiar from the error bar in a histogram. From Eq.(9.74) we see that the error
scales with the box volume as V_box^{−1/2}. So we need four times the cosmological volume to reduce

the error bar by a factor of 2. Remember that this calculation is only correct in the Gaussian
case. For the smaller scale non-linear power spectrum the variance also gets a contribution from
the so-called connected 4-point function which cannot be reduced to 2-point functions by Wick’s
theorem.
The error that results from a finite number of Fourier modes in a given cosmological volume
is called cosmic variance. Since the observable universe is also finite in size, we will never be
able to measure the power spectrum on large scales precisely. This is reflected in the error bars
in Fig. 11.

9.6.3 Power spectrum estimation with experimental noise


A common situation in cosmology is that we can only measure the field that we are interested in
up to some noise. Let’s assume that we measure the galaxy density field δg up to some noise n,
i.e. that
δgobs (k) = δg (k) + n(k) (9.82)
where n(k) is a Gaussian field of noise. These quantities have the power spectrum

⟨δg (k)δg∗ (k′ )⟩ = (2π)3 δD (k − k′ )Pg (k) (9.83)


⟨n(k)n∗ (k′ )⟩ = (2π)3 δD (k − k′ )N (k) (9.84)
∗ ′
⟨δg (k)n (k )⟩ = 0 (9.85)

The last equation means that they are mutually uncorrelated. This is a good approximation in
many situations. If we now calculate the expectation value of the observed power spectrum we
get
⟨P̂^obs⟩ = (1/(N_k V_box)) Σ_{k_i ∈ [k−,k+)} ⟨δ_g^obs(k_i) δ_g^obs*(k_i)⟩    (9.86)
        = P_g(k) + N(k)    (9.87)

That means that to measure the true power spectrum Pg , we need to subtract the noise power
spectrum from the observed spectrum. We can also calculate the variance
V[P̂^obs(k)] = (2/N_k) (P_g(k) + N(k))²    (9.88)

In the case of a galaxy survey, the noise is to good approximation given by the comoving number
density n̄_g as

N_g = 1/n̄_g .    (9.89)
This is called Poisson noise or shot noise. If P is much larger than N then the error is
cosmic variance dominated, while if N is larger than P the error is noise dominated. For a
CMB experiment the noise power spectrum depends on the angular resolution and sensitivity of
the CMB detector. For both CMB and galaxy surveys, on large scales the error is always cosmic
variance dominated.
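As an illustrative number (not tied to any particular survey): a galaxy sample with n̄_g = 10⁻³ (h/Mpc)³ has, by Eq.(9.89), a shot noise of N_g = 10³ (Mpc/h)³, so scales where P_g(k) is well above this value are measured in the cosmic variance dominated regime.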

10 Basics of Statistics
We now discuss how to measure and forecast cosmological parameters. Above we have learned
how to

• calculate a theoretical power spectrum, such as the power spectrum Pm of the matter
density, using for example CAMB

• estimate the observed power spectrum P̂^obs from the data, for example the matter distribution δ_m^obs, with the estimator Eq. (9.86).

To measure cosmological parameters λ from the power spectrum, schematically one adjusts
the parameters λ so that P_m matches P̂^obs up to noise. While we are primarily considering the
power spectrum here, the same methodology can be used for other summary statistics such as
the 3-point function. In this section we discuss how this parameter fitting works in detail, and
also how we can forecast parameter sensitivity without having taken any data. Let’s start with
reviewing some concepts from statistics.

10.1 Estimators
While we have already used the concept, let’s define what an estimator is. If a random variable
x is characterized by a PDF p(x|λ) dependent on a parameter λ, then an estimator for λ is a
function E(x) used to infer the value of the parameter. If a given dataset {xobs } is drawn from
the distribution p(x|λ), then λ̂ = E(x_obs) is the estimate of the parameter λ from the given
observations. We often use a “hat” over the variable to indicate an estimator. Since E is a
function of a random variable, it is itself a random variable. A random variable obtained as a
function of another set of random variables is often referred to as a statistic.
An estimator for a parameter λ is unbiased if its average value is equal to the true value of
the parameter:
⟨λ̂⟩ = λ. (10.1)
We want our estimator to be unbiased. However, biased estimators can also be useful, since it
can be possible to “unbias” them.
After unbiasedness, the second key property of an estimator is its expected error or variance.
The variance is given by

Var[λ̂] = ⟨(λ̂ − ⟨λ̂⟩)2 ⟩ (10.2)


= ⟨λ̂2 ⟩ − ⟨λ̂⟩2 (10.3)

and the error is given by the square root of the variance


σ_λ = √( ⟨(λ̂ − ⟨λ̂⟩)²⟩ ) ,    (10.4)

We try to find an estimator that is unbiased and that has as small an error as possible. One can
often show which estimator will have the smallest possible error bar. Such an estimator is called
an optimal estimator. We already saw an optimal estimator, the one for the power spectrum
in Eq. (9.86), although we did not prove optimality.

If we have several estimators, we are also interested in their covariance

Cov[λ̂i , λ̂j ] = ⟨( λ̂i − ⟨λ̂i ⟩) (λ̂j − ⟨λ̂j ⟩) ⟩ (10.5)


= ⟨λ̂i λ̂j ⟩ − ⟨λ̂i ⟩⟨λ̂j ⟩ (10.6)

From the covariance, one can also calculate their cross-correlation (which is between −1 and
1) as

Corr[λ̂_i, λ̂_j] = Cov[λ̂_i, λ̂_j] / √( Cov[λ̂_i, λ̂_i] Cov[λ̂_j, λ̂_j] )    (10.7)

which tells us whether the estimators are correlated, anti-correlated or uncorrelated (Corr = 0).

10.2 Likelihoods, Posteriors, Bayes Theorem


The central concept to connect data to theory is the likelihood function. The likelihood is the
probability of measuring data d given a model M with parameters λ. We write it as

L(d|λ, M ) (10.8)

where the line | is read as “given”. It is often possible to write down the likelihood function
analytically. The likelihood does not tell us what model and model parameters are likely given
the data (rather it answers the opposite question). It is the posterior probability

P(λ, M |d) (10.9)

that measures parameters for us. From now on I will drop the label M , since a likelihood and a
posterior are always only defined assuming some model (e.g. Lambda-CDM), and are different
if you assume a different model. To connect the posterior and the Likelihood we need Bayes
theorem:
P(λ|d) = L(d|λ) P(λ) / P(d)    (10.10)

Here we also have:

• The prior P(λ) of the parameters in model M before the data is analyzed. The choice of
the prior can be somewhat tricky but often flat or Gaussian works.

• The evidence P(d) which is the probability of seeing the data d under any parameters λ
of the model. The evidence can also be written as
P(d) = ∫ L(d|λ) P(λ) dλ    (10.11)

The evidence can be difficult to calculate numerically because it is often a huge multidi-
mensional integral. However in many cases we do not need to evaluate it, since it only
depends on the data and thus does not change our measurement of model parameter λ.
The evidence is however useful for model selection as we will discuss later.

10.3 Gaussian Likelihoods
Both the likelihood and the posterior are of course probability distributions. It turns out that a
Gaussian likelihood is often a good approximation of the data (while the posterior is often not
Gaussian). Consider first the simple case of a single Gaussian random variable. Imagine you
want to measure a persons’s weight w (following Dodelson’s example). To get an error bar, you
will measure the weight m times. In each measurement, you get data di which is given by the
true value plus some (Gaussian) noise: di = w + ni . If our measurements are independent, then
the likelihood is  Pm 2
m 1 i=1 (di − w)
L({di }i=1 |w, σw ) = 2 )m/2
exp − 2
(10.12)
(2πσw 2σw
which is the product of the likelihoods of the individual measurements. The parameters of our
model here are the true weight w and the variance of the data σ 2 . To find the maximum likelihood
estimator for our parameters, in this simple case we can do the maximization analytically. We are
assuming a flat prior here, so that the prior does not change our estimate. Taking the derivative
we get  
m  Pm 2
∂L 1  (dj − w) exp − i=1 (di − w)
X
= 2 2 )m/2 2
(10.13)
∂w σw (2πσw 2σw
j=1

which has its maximum at

∂L/∂w = 0   ⇒   Σ_{j=1}^m (d_j − w) = 0    (10.14)

The maximum likelihood estimator is then


ŵ = (1/m) Σ_{i=1}^m d_i    (10.15)

which one can guess of course. In the same way we can calculate the σ_w² estimator from

∂L/∂σ_w² = L × ( −m/(2σ_w²) + Σ_{i=1}^m (d_i − w)²/(2σ_w⁴) )    (10.16)
Setting this to zero gives
σ̂_w² = (1/m) Σ_{i=1}^m (d_i − w)² .    (10.17)
which is the well-known estimator of the variance. Taking into account that w is also estimated
from the same data one gets m → m − 1.
We can also calculate the variance (error) of our two estimators using Eq.(10.4). The answer
is
Var[ŵ] = σ_w²/m    (10.18)

and

Var[σ̂_w²] = (2/m) σ_w⁴    (10.19)
These calculations are written out in Dodelson’s textbook.
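A quick numerical cross-check of these results, with made-up numbers, takes a few lines:

import numpy as np

rng = np.random.default_rng(1)
w_true, sigma_true, m = 70.0, 2.0, 1000        # arbitrary illustration values
d = w_true + sigma_true * rng.normal(size=m)

w_hat = d.mean()                               # Eq. (10.15)
var_hat = ((d - w_hat)**2).sum() / (m - 1)     # Eq. (10.17), with m -> m-1
err_w = np.sqrt(var_hat / m)                   # expected error on w_hat, Eq. (10.18)
print(f"w_hat = {w_hat:.3f} +- {err_w:.3f}")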

62
Often we are interested in measuring only a subset of the parameters, while other parameters
are considered nuisance parameters. In the weight example, we may be interested in measuring
w but do not have knowledge of σw . Then, given the full posterior P (w, σw |di ), we can calculate
the marginalized posterior

P(w|d_i) = ∫₀^∞ dσ_w P(w, σ_w|d_i) .    (10.20)

10.3.1 Power spectrum likelihood


An important example of an (approximately) Gaussian likelihood is the likelihood of the power
spectrum as a function of cosmological parameters λ:

ln L({P̂(k)}|λ) = −(1/2) Σ_{k_i,k_j} [P̂(k_i) − P^theo(k_i, λ)] (Cov⁻¹)_{k_i,k_j} [P̂(k_j) − P^theo(k_j, λ)] + const.    (10.21)

Here we have dropped the determinant term of the Gaussian, assuming that the covariance matrix
does not depend on cosmological parameters (a common assumption) and included off-diagonal
terms between different modes of the power spectrum (which are needed in the non-linear regime
and when including masks). The covariance matrix (defined in Eq.(10.5)) in a real experiment
can usually not be calculated analytically but is estimated using simulations. We’ll talk more
about this in the large-scale structure unit. We also note that, while the covariance matrix is in
principle parameter dependent, it is usually better to evaluate it at fixed fiducial values. This
has to do with the power spectrum likelihood not being exactly Gaussian, see arxiv:1204.4724
for details. For this reason we have also dropped the determinant term of the Gaussian.
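Once the band powers and their covariance are in hand, evaluating this log likelihood is short; a sketch (all inputs are placeholders for the quantities defined above):

import numpy as np

def ln_like(P_hat, P_theo, cov):
    # Gaussian band-power log likelihood, Eq. (10.21), up to a constant
    r = P_hat - P_theo
    return -0.5 * r @ np.linalg.solve(cov, r)   # solve avoids forming cov^-1 explicitly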

10.3.2 Gaussian Field likelihood


The likelihood for the power spectrum is an example of a likelihood for a summary statistic.
In cosmology we also use likelihoods for fields. We have already seen the PDF of a Gaussian
field. When interpreted as a function of the parameters λ of the power spectrum the field-level
likelihood is

−2 ln L({δ_m(k)}|λ) = Σ_{k_i,k_j} δ†(k_i) (Cov(λ)⁻¹)_{k_i,k_j} δ(k_j) + log |Cov(λ)|    (10.22)

where the covariance is given in terms of the power spectrum P^theo(k, λ) and is diagonal in the
homogeneous case. We have already discussed this PDF in Sec. 9.5.2 in a different notation.
The dagger † is required because the Fourier modes are complex. In this equation we have kept
a parameter dependent covariance matrix, and thus the determinant term of the Gaussian. The
determinant here is required if we want to make the PDF dependent on cosmological parameters.

10.3.3 Beyond Gaussianity


Gaussianity is often ensured by the central limit theorem (CLT). For example, in the power
spectrum estimator, we average over many modes in a single k bin, which have a similar variance
and are approximately uncorrelated (at least on roughly linear scales), so the CLT holds to good

approximation. If we can assume Gaussianity then our task in specifying the likelihood is vastly
simplified because we “only” need to determine the right covariance matrix (usually from a set
of simulations).
However we should stress that likelihoods are not always Gaussian or nearly Gaussian. For
small number statistics, Poissonian likelihoods are also common. If the likelihood is more compli-
cated, we need a different approach. Fortunately, if we have enough simulations, we can instead
learn L(d|λ) as a free function from simulations, using machine learning. This approach is called
likelihood-free inference (LFI) (meaning that we must learn the likelihood). We will get back
to these methods in Sec. 28. Of course learning a free PDF is much more difficult than determin-
ing just a covariance matrix. In my impression, LFI with ∼ 10 variables (such as the cosmological
parameters) often works, but it gets difficult in much higher dimension.

10.4 Using the likelihood and the posterior


From the likelihood one can also define an estimator, called the maximum likelihood estima-
tor (MLE) λ̂. It is given by solving

∂ ln L(d|λ)
=0 (10.23)
∂λ λ=λ̂

for λ̂. Sometimes this can be done analytically. In general the MLE does not have to be the
optimal estimator though under common assumptions it is. Further, from the posterior one can
define an estimator called the maximum aposteriori estimator (MAP) given by solving

∂ ln P(λ|d)/∂λ |_{λ=λ̂} = 0    (10.24)

for λ̂. If the prior is flat, which is sometimes a good and sometimes a bad choice, the MLE and
the MAP are the same. I may add more details on estimation theory later. Using estimators
such as MLE under some model is typical for the frequentist approach to statistics. In this
approach, the error bar is set by calculating the covariance of the estimator analytically, or if
that is not possible, by estimating it from simulations (Monte Carlo).
On the other hand, the Bayesian approach considers the complete posterior density. A
Bayesian would often sample from the posterior using MCMC, and summarize the posterior
by quantities such as the posterior mean (which is not the same as the MAP). Power spectrum
analysis is usually done in a Bayesian way using MCMC.
Sometimes the difference between frequentist and Bayesian statistics is also presented in terms
of the use of priors and of updating beliefs with new data. In my opinion there is no need to
be either a frequentist or a Bayesian and you can consistently use concepts from both sides. A
common complaint about frequentist analysis in cosmology is that we have only one universe and
cannot repeat the experiment. However one can still run simulations of different initial conditions
or analytically integrate over initial conditions. As long as you correctly interpret your math (e.g.
you do not claim that a 3-sigma frequentist excess of some estimator is equivalent to a 3 sigma
detection of your favorite new physics model) you won’t have inconsistencies. The full Bayesian
method formally answers the interesting physical questions most directly, but it is not always
computationally tractable and not always needed.

10.5 Fisher forecasting
In many situations we want to know the error on our parameters that an experiment can achieve
before having taken any data. Theory papers need to estimate whether their effect is observable,
and experiments need to be designed to meet specified sensitivity goals. These forecasts are
commonly made using the Fisher forecasting formalism (a different option is running MCMC on
synthetic data). We first discuss Fisher forecasting for Gaussian likelihoods, but the formalism
also generalizes to other likelihoods.
If a given observed variable Oa is characterized by Gaussian distributed errors, then its like-
lihood is
L ∝ e^{−χ²/2} ,    (10.25)

where the χ² statistic is defined as:

χ² = Σ_a [O_a(λ) − Ô_a]² / Var[O_a] ,    (10.26)

where Ôa are the measured values of our observable, for example the power spectrum bins
P̂ (kα ). To find the best fit parameters λ̂ we minimize χ2 (which is equivalent to maximizing the
likelihood). We assume here that the variance is not parameter dependent and thus we don’t
need the determinant term in the likelihood.
If we first work in the 1-dimensional case with only one variable λ we can expand the χ2
around its minimum
χ²(λ) = χ²(λ̄) + (1/2) ∂²χ²/∂λ²|_{λ=λ̄} (λ − λ̄)² .    (10.27)

The linear term vanishes at the minimum. The quadratic term describes the local curvature
of the likelihood. It tells us how narrow or wide the minimum is, and thus what its error bar
is. If we define

F ≡ (1/2) ∂²χ²/∂λ²|_{λ=λ̄} ,    (10.28)

then we can estimate the minimum possible error on λ as 1/√F. Note that the Fisher ma-
trix depends on where we have assumed the minimum to be, i.e. it depends on the fiducial
parameters λ̄ of our forecast.
If we compute F explicitly we get

F_λλ = Σ_a (1/Var[O_a]) [ (∂O_a/∂λ)² + (O_a − Ô_a) ∂²O_a/∂λ² ] .    (10.29)
To forecast F we will not have observed data. Rather we should be taking the expectation
value, which simplifies our expression because ⟨O_a − Ô_a⟩ = 0 at the minimum (because the
measurements will fluctuate around the truth). Thus

F_λλ = ⟨F_λλ⟩    (10.30)
     = Σ_a (1/Var[O_a]) (∂O_a/∂λ)²    (10.31)

This quantity is called the Fisher Information F . For several variables, this generalizes to the
Fisher information matrix:

F_λλ′ = ⟨F_λλ′⟩    (10.32)
      = Σ_a (1/Var[O_a]) (∂O_a/∂λ)(∂O_a/∂λ′)    (10.33)

If the variables are correlated, the Fisher matrix is


F_λλ′ = Σ_{a,b} (∂O_a/∂λ) Cov⁻¹(O_a, O_b) (∂O_b/∂λ′)    (10.34)

From the Fisher matrix one can obtain two different errors. If we have several parameters and
we assume all parameters except λ are known then
σ_λ = 1/√(F_λλ)    (unmarginalized)    (10.35)

More commonly, we want to know the error on λ if all other parameters are marginalized over.
This is obtained by inverting the Fisher matrix as follows

σ_λ = √( (F⁻¹)_λλ )    (marginalized)    (10.36)

Often the marginalized errors are significantly larger than the unmarginalized ones. An illustra-
tion of this in the 2-parameter case is shown in Fig. 13.
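For an observable with diagonal variance, assembling the Fisher matrix Eq.(10.33) numerically takes only a few lines. A hedged sketch with finite-difference derivatives (model, the fiducial point, and the step size are placeholder assumptions, not a fixed prescription):

import numpy as np

def fisher_matrix(model, k, lam_fid, var, step=1e-3):
    # model(k, lam) returns the observables O_a; var contains Var[O_a]
    lam_fid = np.asarray(lam_fid, dtype=float)
    dO = np.empty((lam_fid.size, k.size))
    for i in range(lam_fid.size):
        lp, lm = lam_fid.copy(), lam_fid.copy()
        lp[i] += step
        lm[i] -= step
        dO[i] = (model(k, lp) - model(k, lm)) / (2 * step)  # central derivative
    return np.einsum('ia,a,ja->ij', dO, 1.0 / var, dO)      # Eq. (10.33)

# The errors then follow Eqs. (10.35)-(10.36):
# sigma_unmarg = 1 / np.sqrt(np.diag(F));  sigma_marg = np.sqrt(np.diag(np.linalg.inv(F)))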

10.5.1 Non-Gaussian likelihoods and the Rao-Cramer bound


The Fisher matrix for any likelihood (even non-Gaussian ones) is defined as
F_λλ′ = −⟨ ∂² ln L / (∂λ ∂λ′) ⟩ |_{λ=λ̂}    (10.37)
In general, the Fisher matrix sets a lower bound on the possible error bar, called the Rao-Cramer
bound. The bound is
σ_λ ≥ 1/√(F_λλ)    (unmarginalized)    (10.38)
σ_λ ≥ √( (F⁻¹)_λλ )    (marginalized)    (10.39)

For maximum likelihood estimators and large enough data sets the Rao Cramer bound is sat-
urated, which is why we wrote an equal sign in the previous section. Some details about the
Rao-Cramer bound can be found in Appendix A of 1001.4707. In cosmology we usually assume
that the Rao-Cramer bound is saturated in our forecasts.

10.5.2 Priors, subsets, and combining Fisher matrices


If we want to combine the Fisher forecast of two experiments, we can add their Fisher matrices.
This has to be done before marginalization over nuisance parameters (unless these are independent
for the two experiments). If we want to add a Gaussian prior to a parameter in the Fisher matrix

Figure 13. Marginalized and unmarginalized error on parameter λ1 in a 2 parameter Fisher matrix.

(for example from a different measurement), we add a term to the corresponding diagonal Fisher
matrix element

F_λλ → F_λλ + 1/σ_λ² .    (10.40)
Sometimes we want to marginalize over a subset of the parameters only. This can be done as
follows (a short numpy sketch is given after the list):

• invert F

• remove the rows and columns of parameters we want to marginalize over, to arrive at a
smaller matrix which we call G−1

• invert this smaller matrix to get the new Fisher matrix G
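A minimal sketch of this recipe (F is any Fisher matrix as a numpy array; keep_idx lists the indices of the parameters we keep, all others are marginalized over):

import numpy as np

def marginalize(F, keep_idx):
    # invert, keep the rows/columns of interest, and invert back
    Ginv = np.linalg.inv(F)[np.ix_(keep_idx, keep_idx)]
    return np.linalg.inv(Ginv)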

A code that helps automate these operations is pyfisher (https://pyfisher.readthedocs.io/en/latest/).
Note that numerical inversion of a Fisher matrix can fail if it is not well conditioned,
for example due to numerical inaccuracies.
A common practice is to marginalize all but 2 parameters and then plot their Fisher ellipses.
An illustration is shown in Fig. 13. A review of Fisher forecasting that explains drawing ellipses
is given in 0906.4123.

10.5.3 Fisher matrix for a general Gaussian distribution


Above we have given expressions for the Fisher matrix of a Gaussian distribution with a parameter
independent covariance matrix. This is not always a correct treatment (in particular not for the
likelihood of a Gaussian field as in Sec. 10.3.2). For a general Gaussian
 
L = (1/((2π)^{n/2} (det C)^{1/2})) exp( −(1/2) (d − µ(λ))^T C⁻¹(λ) (d − µ(λ)) )    (10.41)

one can show that the Fisher matrix is

F_ij = µ^T_{,i} C⁻¹ µ_{,j} + (1/2) Tr[C⁻¹ C_{,i} C⁻¹ C_{,j}]    (10.42)

where ,i denotes the partial derivative with respect to λ_i. Here I have used matrix notation rather than
index notation (the data indices are summed over). This form of the Fisher matrix appears very often in cosmology
papers. A derivation of this result can be found for example in arxiv:0906.0664. If our likelihood
is a Gaussian random field with mean zero, the first term is zero.

10.6 Sampling the posterior: MCMC


A typical posterior in cosmology might have between 10 and 100 parameters (physical parameters
plus nuisance parameters). Above we discussed evaluating the posterior P(λ|d). In the case of a
power spectrum analysis, the computational cost to evaluate the posterior is usually dominated
by evaluating P theo (ki , p), for example with CAMB. A single evaluation of the posterior may
take seconds to minutes. It is not possible to simply evaluate the posterior on a grid in high
dimensions, to find the region of large posterior. For example, if we wanted to sample each
axis of the posterior with 10 points and we had 20 parameters, we would need 10²⁰ calls of the
posterior, which is completely impossible even with supercomputing.
Fortunately we don’t need to evaluate the posterior everywhere. In most of the regions it is
vanishingly small. Instead we want to find the region where the posterior is large. You may first
think to descend the gradient of the (negative) log posterior function to find its minimum, and
indeed this is a possible approach when your likelihood is differentiable (either numerically or
analytically, or using auto-differentiation). Note however that there could be multiple minima if
the posterior is “multimodal”. Assuming a single minimum, we can find the MAP by gradient
descent. However, finding the MAP is not enough. We also want to determine error bars and
covariances of the parameters, and be able to marginalize over parameters that we are not inter-
ested in. Again, this is difficult to do at the level of the posterior function in high dimensions
(sometimes approximations are possible).
However there is a much better approach, which can deal with the ∼ 10 to ∼ 100
parameters of a common analysis in cosmology: sampling from the posterior. There are many
such sampling algorithms, the most popular being variants of Markov Chain Monte Carlo.
If we have an algorithm that can sample from the posterior, i.e. give us sets of parameters λ
that are likely under the posterior, with the right probability, we can use these samples to obtain
the posterior mean and variance of these parameters:
λ̄_a = (1/m_sample) Σ_{i=1}^{m_sample} λ_a^i    (10.43)

Var[λ_a] = (1/(m_sample − 1)) Σ_{i=1}^{m_sample} (λ_a^i − λ̄_a)²    (10.44)

We can also make the famous corner plots that are often shown in cosmology papers (see for
example the Planck results in 1807.06209 Fig. 5). A key property of MCMC is that it scales
approximately linearly with the number of parameters, so we can do quite high-dimensional
problems. MCMC methods do not require the target distribution (often the posterior distribution
in Bayesian inference) to be normalized. However, they do require the ability to evaluate the
unnormalized version of the target distribution up to a constant factor.

68
How does MCMC sampling work? There is a really nice discussion in Dodelson-Schmidt
Sec. 14.6 which I will briefly summarize. A Markov Chain is an algorithm where we draw a
new sample λ′ from λ, but without considering earlier samples. The algorithm is completely
described by the conditional probability K(λ′ |λ) that takes us from a sample λ to the next one,
λ′ . The fundamental requirement on K, in order for the MCMC sampler to sample from the
right posterior, is called detailed balance:

P (λ)K(λ′ |λ) = P (λ′ )K(λ|λ′ ) (10.45)

This means that the rate for the forward reaction λ → λ′ is the same as for the reverse reaction
λ′ → λ, which means we have reached an equilibrium distribution. If we start with a distribution
of λ that follows P (λ), then an algorithm that obeys detailed balance will stay in this distribution.
Further, if you start with an arbitrary sample λinit , after drawing sufficiently many samples, the
algorithm will end up in distribution and have forgotten about its starting point (in the same way
as we can reach thermodynamic equilibrium from any initial conditions if we wait long enough).
This is called the burn in phase of MCMC. MCMC is closely connected to thermodynamics,
where an equilibrium distribution loses its memory of the initial conditions.
There are different choices for K(λ′ |λ) that obey detailed balance. A common choice is the
Metropolis Hastings algorithm. In this algorithm, we draw the next parameter sample from
a Gaussian, symmetric around the current parameter sample. This sample is then accepted
with a probability given by

p_acc(λ′, λ) = min( P(λ′)/P(λ), 1 )    (10.46)

If the new sample is not accepted, we repeat the previous step in the chain. You can check that
this procedure obeys detailed balance. The free parameter here is the width of the Gaussian from
which the next parameter is drawn. If it is too small, the sampler will take a long time to map out
the PDF and may get stuck in local minima. If it is large, the sampler will have a low acceptance
rate since most samples will be very unlikely. Many algorithms adjust this value dynamically. A
good acceptance rate is about 1/3.
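The algorithm fits in a dozen lines. A hedged numpy sketch (log_post stands for any function returning the unnormalized log posterior; the proposal width step is a free choice):

import numpy as np

def metropolis(log_post, lam0, step, nsamples, seed=0):
    # Metropolis-Hastings with a symmetric Gaussian proposal, Eq. (10.46)
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam0, dtype=float)
    lp = log_post(lam)
    chain = np.empty((nsamples, lam.size))
    for i in range(nsamples):
        prop = lam + step * rng.normal(size=lam.size)   # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept with prob. min(P'/P, 1)
            lam, lp = prop, lp_prop
        chain[i] = lam          # on rejection, the previous sample is repeated
    return chain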
The most popular sampler (currently) in cosmology is called emcee, which implements an
algorithm called “Affine Invariant Markov Chain Monte Carlo (MCMC) Ensemble sampler”.
Emcee and other popular algorithms use several so called walkers which sample from the PDF
in parallel. In practice it is usually not important that you understand your MCMC algorithm
at a fundamental level, but it is critical that you use it correctly:

• We need to discard samples from the burn-in phase. One can often clearly see the burn-in
phase in the chain plots coming from the sampler (see Sec. 11 for an example).

• MCMC Samples are not statistically independent. It takes a while until the “memory” of a
sample is forgotten. This is called the auto-correlation length. One can pick one sample
per auto-correlation length for analysis. This is called thinning of the chain. Samplers
usually come with some estimator of the auto-correlation length.

• Sometimes one can misjudge the convergence and auto-correlation length of an MCMC
chain. Chains may be slowly drifting or even oscillating, without being noticeable at the
chain length we probed. There is no absolutely guaranteed method to avoid such problems.

• The many Monte Carlo walkers (typically 20 or more) should give statistically equivalent
samples. Comparing the different chains and their “mixing” helps judging the convergence
of the MCMC, for example using the Gelman-Rubin statistic.

• Chain convergence is slowed by degeneracies in the posterior. In such a case, a change of


variables is helpful. Some samplers in cosmology such as CosmoMC have functions to deal
with this problem for power spectrum analysis.

A typical length of an MCMC chain could be 100,000 samples. Roughly speaking, for an auto-
correlation length of 100 samples this would give us 1000 independent samples (see the emcee
documentation for best practices of auto-correlation analysis and thinning). We’ll see example
MCMC results in Sec. 11.
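For reference, a typical emcee call pattern looks roughly like this (a hedged sketch with a toy posterior standing in for a real one; walker count, dimensions, and chain length are illustrative choices):

import numpy as np
import emcee

def log_post(lam):
    # toy standard-normal posterior as a stand-in for a real cosmological posterior
    return -0.5 * np.sum(lam**2)

ndim, nwalkers = 3, 32
p0 = 0.1 * np.random.randn(nwalkers, ndim)      # walkers start near an assumed fiducial point
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_post)
sampler.run_mcmc(p0, 5000)

tau = sampler.get_autocorr_time(tol=0)          # auto-correlation length per parameter
samples = sampler.get_chain(discard=500, thin=max(1, int(tau.max())), flat=True)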
A common question is what prior we should use. Common choices are

• Flat priors in some window (constant probability per dλ). This is the most common case.

• Priors that are flat in the log of λ in some window (constant probability per d ln λ). This
is useful if we are unsure even about the order of magnitude of the parameter.

• Priors that are Gaussian, particularly coming from a previous independent measurement.

Note first that if the data is very informative, then the likelihood will completely dominate the
posterior and the prior becomes irrelevant (as long as it is nonzero at the maximum of the
likelihood). Conversely if the data is weak, the choice of prior changes the result substantially.
In that case no strong measurement can be made. The main reasons to put an informative prior
are

• If we have a strong and trustworthy measurement for a parameter from a different uncor-
related experiment and we want to include that information (usually as a Gaussian prior).

• If we have a physical theory that gives a reliable prior, such as that Ωm cannot be negative
or that the primordial curvature perturbations are Gaussian.

10.7 Other algorithms beyond MCMC


While MCMC still dominates astrophysics, there are other inference algorithms that are becoming
important. For very high dimensional problems, say more than 100 parameters, MCMC becomes
too slow to converge. This is because random jumps become more and more unlikely to result
in accepted samples. Instead, a sampling algorithm that has knowledge of the gradient of the
function can be much more efficient. Such an algorithm is Hamiltonian Monte Carlo (also
called Hybrid Monte Carlo). This approach is in particular used when we want to sample over field
variables, such as the lensing potential or the initial conditions of the universe. We’ll discuss this
more in Sec. 28.3. Another form of MCMC which is useful to know about is Gibbs sampling,
used for example in the Planck low-ℓ likelihood.

A different approach that is starting to be used in astrophysics is variational inference.
In variational inference, one fits a simpler variational distribution to approximate the true
posterior. This is useful in cases where it would be too expensive to sample from the true
posterior. However we still need to be able to evaluate the unnormalized posterior at some points
to fit the variational distribution.

10.8 Goodness of fit


There is one more crucial topic of statistics that we need to discuss: Goodness of fit of the model,
and the related topic of model testing.
Let’s start with the χ2 distribution. This is the distribution of the sum of squares of a
Gaussian. If Xi are Gaussian random variables with mean zero and variance one, then the sum
of n squares

Y = X_1² + X_2² + · · · + X_n²    (10.47)


has a chi-squared distribution with n degrees of freedom with the probability distribution

P(Y) = (1/(2^{n/2} Γ(n/2))) Y^{n/2−1} e^{−Y/2}    (10.48)
The χ2 distribution has the following properties:

• the mean of P (Y ) is equal to the number of degrees of freedom n.

• the variance of P (Y ) is equal to 2n.

• when n ≫ 1, the chi-squared distribution starts to look like the Gaussian distribution, with
mean n and variance 2n.

For example, the power spectrum estimator uses the sum of squares of Gaussian modes δ(k) and
thus is χ2 distributed (and approximately Gaussian for enough modes).
The χ2 distribution arises as the sum of squares of the residuals d−dmodel in a least squares
model fitting:
χ2kdof = [d − dmodel (λ)]T C −1 [d − dmodel (λ)], (10.49)
This is also the form of a Gaussian likelihood with parameter independent data covariance C
(such as in our power spectrum likelihood).
If our model is a good fit to the data we should have

χ2kdof ≈ kdof (10.50)

where
kdof = Ndata points − Nfitted parameters (10.51)
As a consistency check, if we have as many model parameters as data points we should get a
perfect fit without residuals. We can use the properties of the χ2 distribution such as the variance

Var(χ2kdof ) = 2kdof (10.52)

or the P-value to quantify whether the fit is good. So for a good fit we would have
χ² ≈ k_dof ± √(2 k_dof)    (10.53)

If the fit is good, this implies for example that 68% of the data points are within the 1σ error.
Otherwise we see how many sigmas we are away from a good fit.
If the χ2 is higher than expected it can mean either

• that the model does not fit the data

• or that we have underestimated the data error (wrong covariance matrix)

• or that there are systematic errors in our data

• or that the errors in our data are not Gaussian.

It can also happen that χ2 is smaller than expected if we overestimated our data error.
One sometimes also defines the reduced χ2 :

χ²_red = χ² / N_data points    (10.54)

In the common case that Ndata points ≫ Nfitted parameters the reduced χ2 should be around 1.
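A quick way to turn an observed χ² into a goodness-of-fit statement is the survival function of the χ² distribution; a sketch with made-up numbers:

import numpy as np
from scipy import stats

chi2_obs, k_dof = 112.0, 90                       # illustrative values
p_value = stats.chi2.sf(chi2_obs, k_dof)          # probability of chi^2 >= chi2_obs under the model
nsigma = (chi2_obs - k_dof) / np.sqrt(2 * k_dof)  # rough Gaussian-equivalent tension, cf. Eq. (10.53)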

10.9 Model comparison


The simplest way to compare how well models fit is to compare their χ2 . If we have two models
A and B we can calculate the difference in their χ2

∆χ2 = χ2B − χ2A (10.55)

Let's assume that A is a subset of B, for example A is ΛCDM and B is ΛCDM extended with
a free equation of state parameter for dark energy (so that w = −1 in A but free in B). Of
course the fit must be better or equal in model B. If the ∆χ2 is large (negative), then model
B is a much better fit than model A. According to Wilks' theorem, ∆χ² can be quantified by
a χ2 distribution with degrees of freedom kdof,B − kdof,A (which would be 1 in the dark energy
example).
There are also model comparison tests for cases in which B is NOT a subset of A, in particular
the Bayesian Information Criterion (BIC) and the Akaike Information Criterion
(AIC) (see e.g. Huterer’s book).
The most consistent, but computationally challenging, way to compare models is using the
Bayesian approach. Here we calculate the Bayes Factor
B_AB = P(d|A) / P(d|B)    (10.56)

from the evidence ratio of the two models. If we have priors on the models, we get the posterior
odds

P(A|d) / P(B|d) = B_AB P(A) / P(B)    (10.57)

The Bayes factor is difficult to evaluate since we need to integrate over the entire model parameter
space (Eq.(10.11)). According to the Jeffreys’ scale, for equal prior models, B > 3 is considered
weak evidence, B > 12 is considered moderate evidence and B > 150 is considered strong evidence
(less than 1/150 chance probability) for one model over the other.

11 Analyzing an N-body simulation


The section is currently covered in the Colab notebook and problem set. Will be
added here later.
We are going to analyze a few Quijote simulations (https://quijote-simulations.readthedocs.io/).
These are a large set of 45,000 simulations, covering different cosmological parameters. The
side length of the box of each simulation is 1 Gpc/h. A large set of 17,100 simulations is
generated for a fiducial Planck cosmology. This large number of simulations is for example useful
to determine covariance matrices. There are many other simulations with different cosmological
parameters. In total, Quijote contains 700 terabytes of data and required 35 million CPU core
hours.
In an N-body simulation, we track a set of individual particles as they interact gravitationally.
In Quijote, we have 512³ particles in a volume of 1 (Gpc/h)³. These particles are initially placed
on a regular grid, with slight displacements that account for the primordial density perturbations.
Then we solve the equations of Newtonian gravity, with numerical tricks to speed the process up.
We are going to talk more about how N-body simulations work in Sec. 23.

Part III
Cosmic Microwave Background
Due to its linearity, the primary CMB is the cleanest probe of cosmology we have. While the
primary CMB temperature perturbations have been mapped out almost to cosmic variance,
upcoming experiments will measure E-mode polarization in more detail, while primordial B-
mode polarization has not been detected at all and is a major science target. Secondary
CMB anisotropies, which are induced by the re-scattering of CMB photons on charges, and
by gravitational lensing, have been detected but are far from being fully exploited for cosmology
and astrophysics. In this section I will focus more on secondary anisotropies and data analysis
methods, and be brief on primary CMB physics which is interesting but mostly worked out. We
will also, for the first time in this course, discuss “real world” data analysis issues such as detector
noise and the mask, which makes even power spectrum estimation rather complicated. Finally I
will discuss the topic of foreground cleaning, which is relevant also for many other types of data.

Further reading
The general references of Unit 1 all contain a discussion of the CMB. In addition I recommend

• Anthony Challinor’s 2015 lecture notes Part III Advanced Cosmology - Physics of the
Cosmic Microwave Background.

• Ruth Durrer’s textbook on CMB physics.

Both sources go into far more detail than this course.


We will also use the excellent computational notebooks provided by the CMB data analysis
summer school https://sites.google.com/cmb-s4.org/summer-school-2021/notebooks?authuser=
0.

12 Random fields on the sphere


12.1 Spherical harmonics
Consider a random real scalar field on the 2-sphere, denoted f(n̂), where n̂ is a unit vector pointing
in the observation direction. Spherical harmonics are a basis to represent any (well-behaved) function on
the sphere, in close analogy to the Fourier expansion in Euclidean space. The expansion in
spherical harmonics is given by

f(n̂) = Σ_{l=0}^∞ Σ_{m=−l}^{l} f_lm Y_lm(n̂)    (12.1)

The Ylm are familiar from quantum mechanics as the position-space representation of the
eigenstates of the angular momentum operators L̂2 = −∇2 and L̂z = −i∂ϕ (setting ℏ = 1):

∇2 Ylm = −l(l + 1)Ylm , (12.2)


∂ϕ Ylm = imYlm (12.3)

with l an integer ≥ 0 and m an integer with |m| ≤ l.
The spherical harmonics are orthonormal over the sphere,
∫ dn̂ Y_lm(n̂) Y*_l′m′(n̂) = δ_ll′ δ_mm′    (12.4)

The spherical multipole coefficients of f(n̂) are

f_lm = ∫ dn̂ f(n̂) Y*_lm(n̂)    (12.5)

There are various phase conventions for the Y_lm; here we adopt Y*_lm = (−1)^m Y_{l,−m}, so that
f*_lm = (−1)^m f_{l,−m} for a real field.
The spherical harmonics are products of associated Legendre polynomials and an azimuthal
phase factor:

Y_lm(θ, ϕ) = √( (2l+1)/(4π) · (l−m)!/(l+m)! ) P_l^m(cos θ) e^{imϕ}
The correspondence between multipoles and angles is l ∼ π/Θ where Θ is in radians.

12.2 2-point function


It can be shown that for the 2-point correlation function of the flm to be rotationally invariant,
it must have the form

⟨f_lm f*_l′m′⟩ = C_l δ_ll′ δ_mm′    (12.6)
The real quantity Cl is the angular power spectrum of f (n̂). Gaussian random fields on the
sphere are again fully determined by their two-point function (i.e., their covariance) and hence
their angular power spectrum.
As in the Euclidean case, there is a relation between the power spectrum and the 2-point
function in position space. We can calculate it as follows:
⟨f(n̂)f(n̂′)⟩ = Σ_{lm,l′m′} ⟨f_lm f*_l′m′⟩ Y_lm(n̂) Y*_l′m′(n̂′)    (12.7)
           = Σ_{lm} C_l Y_lm(n̂) Y*_lm(n̂′)    (12.8)
           = Σ_l C_l (2l+1)/(4π) P_l(n̂ · n̂′)    (12.9)
           = C(θ) ,    (12.10)

where µ = n̂ · n̂′ = cos θ and we used the addition theorem for spherical harmonics,

Σ_m Y_lm(n̂) Y*_lm(n̂′) = (2l+1)/(4π) P_l(n̂ · n̂′) .    (12.11)

We see that the 2-point function depends only on the angle, as we require from isotropy. The
inverse relation, going from position space to momentum space, is
C_l = 2π ∫_{−1}^{1} d cos θ C(θ) P_l(cos θ) .    (12.12)

In analogy to what we did in Eq.(9.43), we can calculate the variance of the field
C(0) = Σ_l (2l+1)/(4π) C_l ≈ ∫ d ln l · l(l+1)C_l/(2π) .    (12.13)

The quantity

D_l = l(l+1)C_l/(2π)    (12.14)
is commonly plotted and gives the contribution to the variance per log range in l. For a scale
invariant power spectrum we have Dl = const.

12.3 Discretization with HEALPix and Pixell


Unlike in the Euclidean case, pixelizing a sphere is not so straightforward. For example, to put
a measured CMB temperature map on a computer, we need to somehow store the field value
at fixed positions (or pixels) on the sphere. The industry standard to achieve this has been
HEALPIX, although newer alternatives exist.
To quote from the healpix paper (arxiv:0409513): “The simplicity of the spherical form belies
the intricacy of global analysis on the sphere. There is no known point set which achieves the
analogue of uniform sampling in Euclidean space and allows exact and invertible discrete spherical
harmonic decompositions of arbitrary but band-limited functions. Any existing proposition of
practical schemes for the discrete treatment of such functions on the sphere introduces some
(hopefully small) systematic error dependent on the global properties of the point set. The goal
is to minimise these errors and faithfully represent deterministic functions as well as realizations
of random variates both in configuration and Fourier space while maintaining computational
efficiency.”
The approach of the paper is to propose the Hierarchical Equal Area, iso-Latitude
Pixelisation (HEALPix) of the sphere. This approach can be used conveniently with the
healpy package in python. Data from cosmological surveys, such as Planck, is often delivered
in the healpix format. More recently, a different library called Pixell https://github.com/
simonsobs/pixell is also being used.
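In practice one rarely touches the pixelization directly; healpy wraps the common operations. A hedged sketch with an arbitrary toy C_l, drawing a Gaussian sky and measuring its spectrum back:

import numpy as np
import healpy as hp

nside = 128                                  # HEALPix resolution parameter (12*nside^2 pixels)
lmax = 3 * nside - 1
ell = np.arange(lmax + 1)
cl = np.zeros(lmax + 1)
cl[2:] = 1.0 / (ell[2:] * (ell[2:] + 1.0))   # toy spectrum with roughly constant D_l

m = hp.synfast(cl, nside)                    # one GRF realization on the sphere
cl_hat = hp.anafast(m, lmax=lmax)            # angular power spectrum estimate, cf. Eq. (12.24)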

12.4 Projections of 3D random fields to the sphere


In cosmology we often need to project 3d fields onto 2d spheres. For example, we do perturbation
theory in Euclidean space, but we observe on the light cone which is spherically symmetric.
For example, the CMB is approximately given by a projection of the 3-dimensional potential
fluctuations onto a sphere centered around our current position with comoving radius χ⋆ ∼ 13800
Mpc (the distance that light travelled since recombination). This is not precisely correct, since
recombination has a finite width, but we will take this into account later. The spherical projection
we discuss now is needed more generally (e.g to calculate the galaxy power spectrum on the light
cone).

Figure 14. Examples of the spherical Bessel function jl (x) (from Baumann’s Cosmology Lectures).

We project a 3d random field F (x) over a 2-sphere of radius r, centred on the origin, to form
the field f (n̂) = F (rn̂). Expanding F (x) in Fourier modes, we have

f(n̂) = ∫ d³k/(2π)³ F(k) e^{i k r k̂·n̂}    (12.15)
     = Σ_lm [ 4π i^l ∫ d³k/(2π)³ F(k) j_l(kr) Y*_lm(k̂) ] Y_lm(n̂)    (12.16)

where we have used the Rayleigh plane-wave expansion


e^{ik·x} = Σ_l i^l (2l+1) j_l(kr) P_l(k̂ · n̂)    (12.17)
        = 4π Σ_lm i^l j_l(kr) Y*_lm(k̂) Y_lm(n̂)    (12.18)

Here, r = |x| and jl (kr) are the spherical Bessel functions. Extracting the spherical multipoles
of f (n̂) from above, we have

f_lm = 4π i^l ∫ d³k/(2π)³ F(k) j_l(kr) Y*_lm(k̂)

The spherical Bessel functions peak at kr ≈ l. This means that the observed multipoles l mainly
probe spatial structure in the 3D field F (x) with wavenumber k ≈ l/r, but higher k also con-
tribute. The Bessel functions are oscillatory and need to be evaluated precisely. Examples are
plotted in Fig.14. Evaluating such Bessel function integrals numerically in cosmology is very com-
mon and there are methods developed to speed them up, in particular the FFTlog algorithm
(e.g. 1705.05022).

Now we relate the power spectrum of the projected field to the power spectrum of the 3d field:

\langle f_{lm} f^*_{l'm'} \rangle = (4\pi)^2 \, i^l (-i)^{l'} \int \frac{d^3 k}{(2\pi)^3} \int \frac{d^3 k'}{(2\pi)^3} \langle F(\mathbf{k}) F^*(\mathbf{k}') \rangle \, j_l(kr) \, j_{l'}(k'r) \, Y^*_{lm}(\hat{k}) \, Y_{l'm'}(\hat{k}')   (12.19)
= 4\pi \, i^l (-i)^{l'} \int \frac{dk \, k^2}{2\pi^2} P_F(k) \, j_l(kr) \, j_{l'}(kr) \int d\hat{k} \, Y^*_{lm}(\hat{k}) \, Y_{l'm'}(\hat{k})   (12.20)
= 4\pi \, \delta_{ll'} \delta_{mm'} \int \frac{dk \, k^2}{2\pi^2} P_F(k) \, j_l^2(kr)   (12.21)

where we used ⟨F (k)F ∗ (k′ )⟩ = (2π)3 PF (k)δ 3 (k − k′ ). Thus we get

C_l = 4\pi \int \frac{dk \, k^2}{2\pi^2} P_F(k) \, j_l^2(kr)   (12.22)
= 4\pi \int d\ln k \, \Delta^2(k) \, j_l^2(kr)   (12.23)

where in the last step we defined the dimensionless power spectrum as in Eq.(9.30).
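As an illustration of Eq. (12.23), the following sketch evaluates the projection integral on a log-spaced k grid with scipy's spherical Bessel functions (the radius and the toy scale-invariant Δ²(k) are arbitrary choices, and brute-force quadrature is used instead of the faster FFTlog):

```python
import numpy as np
from scipy.special import spherical_jn

r = 14000.0                            # comoving radius of the sphere [Mpc], toy value
lnk = np.linspace(np.log(1e-4), np.log(1.0), 4000)
k = np.exp(lnk)                        # log-spaced wavenumbers [1/Mpc]
delta2 = 1e-9 * np.ones_like(k)        # toy scale-invariant dimensionless spectrum

def cl_projected(l):
    # C_l = 4 pi \int dlnk Delta^2(k) j_l(kr)^2, Eq. (12.23), by trapezoidal rule
    jl = spherical_jn(l, k * r)
    return 4.0 * np.pi * np.trapz(delta2 * jl**2, lnk)

print([cl_projected(l) for l in (10, 100, 500)])
```

Note how the integrand is dominated by k ≈ l/r, reflecting the peak of the Bessel function.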

12.5 Power spectrum estimator and covariance


Power spectrum estimation works in close analogy to the Euclidean case we discussed above. The
power spectrum estimator is simply
\hat{C}_l = \frac{1}{2l+1} \sum_m f_{lm} f^*_{lm}   (12.24)

Let’s first check that the estimator is unbiased:


\langle \hat{C}_l \rangle = \frac{1}{2l+1} \sum_m \langle f_{lm} f^*_{lm} \rangle = \frac{1}{2l+1} \sum_m C_l = C_l .   (12.25)

Now we calculate the variance of the estimator

\mathrm{cov}(\hat{C}_l, \hat{C}_{l'}) = \langle \hat{C}_l \hat{C}_{l'} \rangle - \langle \hat{C}_l \rangle \langle \hat{C}_{l'} \rangle   (12.26)
= \frac{1}{(2l+1)(2l'+1)} \sum_{m,m'} \langle f_{lm} f^*_{lm} f_{l'm'} f^*_{l'm'} \rangle - C_l C_{l'}   (12.27)
= \frac{1}{(2l+1)(2l'+1)} \sum_{m,m'} 2 C_l^2 \, \delta_{mm'} \delta_{ll'}   (12.28)
= \frac{2 C_l^2}{2l+1} \, \delta_{ll'} .   (12.29)
where we used Wick’s theorem. We see that for a Gaussian field the covariance matrix is diagonal
(this does not hold if we have an experimental mask that breaks isotropy as we shall discuss soon).
Our precision is limited by the number of available modes, which gives the cosmic variance
error
\frac{\Delta C_l}{C_l} = \sqrt{\frac{2}{2l+1}}
which is inversely proportional to the square root of the number of modes Nmode = 2l + 1.
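This scaling is easy to verify numerically with healpy (a sketch; nside, lmax, the input spectrum, and the number of simulations are arbitrary toy choices):

```python
import numpy as np
import healpy as hp

nside, lmax, nsim = 128, 150, 200
cl_true = 1.0 / (1.0 + np.arange(lmax + 1))**2      # hypothetical input spectrum

# measure the estimator for many independent realizations of the same sky
cls = np.array([hp.anafast(hp.synfast(cl_true, nside), lmax=lmax)
                for _ in range(nsim)])

ell = np.arange(2, lmax + 1)
scatter = cls.std(axis=0)[2:] / cls.mean(axis=0)[2:]   # measured Delta C_l / C_l
expected = np.sqrt(2.0 / (2 * ell + 1))                # cosmic variance prediction
print(np.mean(scatter / expected))                     # should be close to 1
```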

12.6 Flatsky coordinates
For an experiment that covers only a small part of the sky, spherical harmonics are not necessary.
Instead, one can use flat-sky coordinates. These coordinates are defined on the tangential surface
to the sphere at some point in the sky. In flatsky coordinates, we can use an ordinary 2-d Fourier
transform:
f(\mathbf{x}) = \int \frac{d^2 l}{(2\pi)^2} \, f(\mathbf{l}) \, e^{i\mathbf{l}\cdot\mathbf{x}} .   (12.30)
and inverse Fourier transform
f(\mathbf{l}) = \int d^2 x \, f(\mathbf{x}) \, e^{-i\mathbf{l}\cdot\mathbf{x}} .   (12.31)

One can formally relate the spherical harmonics expression to the Fourier modes by taking the
large-l limit of the Legendre polynomials (see e.g. Liddle, Lyth book Sec 10.3). The correspon-
dence between the power spectra

\langle f_{\ell m} f^*_{\ell' m'} \rangle = \delta_{\ell\ell'} \delta_{mm'} C_\ell \, , \qquad \langle f(\mathbf{l}) f^*(\mathbf{l}') \rangle \equiv (2\pi)^2 \, \delta_D(\mathbf{l} - \mathbf{l}') \, C_\ell^{\rm flat}   (12.32)

is simply

C_\ell = C_\ell^{\rm flat}   (12.33)
To work with the flatsky approximation numerically we need to discretize the Fourier transform
in the same way as we did for the 3d field; a minimal sketch of such a discretized flat-sky power spectrum estimate follows.
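The sketch below bins the discrete 2D FFT of a square patch into an estimate of C_ℓ^flat (the patch size, resolution, and binning are arbitrary illustrative choices, with conventions matching the continuum transforms above):

```python
import numpy as np

def flat_cl(map2d, patch_size, nbins=30):
    """Estimate C_l^flat from a square map covering patch_size radians on a side."""
    n = map2d.shape[0]
    dx = patch_size / n
    fl = np.fft.fftn(map2d) * dx**2              # approximates the continuum transform
    power2d = np.abs(fl)**2 / patch_size**2      # |f(l)|^2 divided by the patch area
    lx = 2 * np.pi * np.fft.fftfreq(n, d=dx)     # discrete multipoles
    lmod = np.sqrt(lx[:, None]**2 + lx[None, :]**2)
    bins = np.linspace(0.0, lmod.max(), nbins + 1)
    idx = np.digitize(lmod.ravel(), bins)
    # average |f(l)|^2 over annuli in l (empty bins give NaN)
    cl = np.array([power2d.ravel()[idx == i].mean() for i in range(1, nbins + 1)])
    ell = 0.5 * (bins[1:] + bins[:-1])
    return ell, cl
```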

13 Primary CMB power spectrum


In this course we don’t cover relativistic perturbation theory, and instead simply use the results
from CAMB or CLASS. Let’s have a look at the results and discuss the main features.
The CMB temperature is given by

\Theta(\hat{n}) \equiv \frac{T(\hat{n}) - T_0}{T_0} = \sum_{\ell m} a_{\ell m} Y_{\ell m}(\hat{n}) \, ,   (13.1)

where T0 ∼ 2.7K is the mean temperature and


a_{\ell m} = \int d\Omega \, Y^*_{\ell m}(\hat{n}) \, \Theta(\hat{n}) .   (13.2)

The CMB power spectrum is then

⟨a∗ℓm aℓ′ m′ ⟩ = CℓT T δℓℓ′ δmm′ (13.3)

13.1 Transfer functions and line-of-sight solution


The linear evolution which relates R and ∆T is given by the transfer function ∆T ℓ (k) through
the k-space integral

a_{\ell m} = 4\pi (-i)^\ell \int \frac{d^3 k}{(2\pi)^3} \, \Delta_{T\ell}(k) \, \mathcal{R}_{\mathbf{k}} \, Y^*_{\ell m}(\hat{k})   (13.4)

Using the addition theorem we get
C_\ell^{TT} = \frac{2}{\pi} \int k^2 dk \, \underbrace{P_{\mathcal{R}}(k)}_{\text{inflation}} \, \underbrace{\Delta^2_{T\ell}(k)}_{\text{evolution, projection}} .   (13.5)

The transfer functions are calculated by CAMB or CLASS; a minimal example of obtaining the resulting spectra with CAMB's Python interface is sketched below.
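This sketch uses the public CAMB Python wrapper (the parameter values are illustrative fiducial choices):

```python
import camb

# fiducial LambdaCDM parameters (illustrative values)
pars = camb.set_params(H0=67.5, ombh2=0.022, omch2=0.122,
                       As=2e-9, ns=0.965, lmax=2500)
results = camb.get_results(pars)

# D_l = l(l+1) C_l / (2 pi) in muK^2; columns are TT, EE, BB, TE
powers = results.get_cmb_power_spectra(pars, CMB_unit='muK')
dl_tt = powers['total'][:, 0]
print(dl_tt[2:10])
```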


On large scales, where modes were still outside of the horizon at recombination (the Sachs-Wolfe
regime), the transfer function \Delta_{T\ell}(k) is simply the Bessel function generated by the spherical
projection we discussed above
\Delta_{T\ell}(k) = \frac{1}{3} j_\ell(k[\tau_0 - \tau_{\rm rec}]) .   (13.6)
The angular power spectrum on large scales (small ℓ) therefore is

C_\ell^{TT} = \frac{2}{\pi} \int k^2 dk \, P_{\mathcal{R}}(k) \, \frac{1}{9} j_\ell^2(k[\tau_0 - \tau_{\rm rec}]) .   (13.7)

This is sometimes called the “snapshot approximation” or “instantaneous recombination approx-
imation”.
On smaller scales, we need to take into account that recombination does not happen instanta-
neously, but rather in a finite time window (with a comoving width of about 10 Mpc). Further,
some CMB perturbations on large scales are also sourced at later times in the universe (in partic-
ular during reionization). To take these effects into account in the transfer functions, one needs
to do a line-of-sight integral over a “source term” S(k, τ ) as follows
\Delta_{T\ell}(k) = \int_0^{\tau_0} d\tau \, S(k, \tau) \, j_\ell(k[\tau_0 - \tau]) .   (13.8)

The source term S(k, τ ) comes from solving the Boltzmann equation, and the Bessel function is
again due to the spherical projection. CAMB and CLASS are calculating these transfer functions
for us. The full details are explained in one of the famous papers of cosmology, astro-ph/9603033,
which proposed the method.

13.2 The physics of the CMB Power spectrum


Let’s have a look at the CMB power spectrum as measured by Planck, ACT and SPT, in Fig.
15. Consider the different regions:

• On the largest scales (Region I), modes re-enter the horizon after recombination and thus
they do not evolve. This gives an approximately flat power spectrum in Dl . See Sec. 6.6.2
for a discussion of horizon exit and re-entry.

• Intermediate scales (Region II) are dominated by the baryon acoustic oscillations
(BAO). The BAO are oscillations in the primordial plasma of photons and baryons.

• On smaller scales (Region III) the primary perturbations are getting exponentially sup-
pressed due to diffusive damping (also called Silk damping). As the photons move
from over-dense to under-dense regions, they effectively smooth out the fluctuations in
the photon-baryon fluid on their typical scattering length scale. This leads to a suppression

Figure 15. The CMB temperature power spectrum (plot from Baumann’s Cosmology Lectures).

of the anisotropy in the CMB at small scales (large multipole moments). On these small
scales, secondary anisotropies due to lensing and kSZ start to dominate. We will discuss
secondary anisotropies in Sec. 17.
On all of these scales, the anisotropies are generated by three different effects, which appear
as terms in the transfer function:
• The Sachs-Wolfe (SW) effect is the largest contribution. It combines the temperature
inhomogeneity in the primordial plasma (due to the density perturbations) with the redshift
of the photons (which have to “climb out of their potential well” when they are in an
overdense region). Due to the redshift it turns out that colder CMB spots correspond to
higher density regions.

• The Doppler effect is the change in photon energy due to scattering off moving electrons.

• The Integrated Sachs-Wolfe effect (ISW) describes the additional gravitational redshift
due to the evolution of the metric potentials along the line-of-sight. This effect occurs during
radiation domination (early ISW) and during dark energy domination (late ISW). The late
ISW adds power at very low l < 10.
For a plot of the different contributions see for example Baumann’s book Fig. 7.7. As was the
case for the matter power spectrum, the CMB power spectrum is very sensitive to cosmological
parameters. For a nice illustration see Plate 4 in the review astro-ph/0110414v1.

14 Analyzing the CMB power spectrum


We will now discuss how to analyze the CMB power spectrum from a real experiment. There are
several experimental complications:

• The experiment has a finite resolution, which is described by the beam.

• There are sources of noise (from the detector and e.g. the atmosphere).

• There is a finite region of the sky that is observed (described by the mask), which breaks
statistical isotropy.

• There are foregrounds such as galactic dust and synchrotron radiation, which obscure the
true CMB signal.

To infer cosmological parameters, we need to compare the theory prediction Cltheo (which
depends on cosmological parameters in a known way determined by the laws of physics) with the
data Clobs . There are in principle two directions from which to approach this problem.

• In backward modelling, one first tries to remove the experimental effects from the data,
to arrive at a reconstruction of the signal had there been no experimental effects. This
reconstructed signal is then compared with the theory. The advantage of this approach is
that one can easily compare measurements from different experiments at map level. Most
of our discussion below is of this sort.

• In forward modelling one models the experimental effects on the theory result. We would
model how the theory power spectrum changes due to the experimental effects and compare
this Cltheo,forward to the Clobs . This approach has the advantage that it is easier to propagate
errors and one can cleanly separate theory from data.

14.1 Beam and Noise


A real CMB experiment that observes the sky has a finite angular resolution, and the detector
(and atmosphere) induce noise in the measurement. We will observe a temperature θobs (n̂) in a
direction n̂ in the sky, which is given by
\theta^{\rm obs}(\hat{n}) = \int d\Omega' \, \theta(\hat{n}') \, B(\hat{n}, \hat{n}') + n(\hat{n})   (14.1)

where θ(n̂′ ) is the true CMB temperature signal, B(n̂, n̂′ ) is the beam or point-spread function
(PSF) which tells us how the detector reacts to the distribution on the sky, and n(n̂) is noise
which is uncorrelated with the signal. As we can see, the beam is a convolution in real space.
The observed a^{\rm obs}_{\ell m} are then given by
a^{\rm obs}_{\ell m} = \int d\Omega \, Y^*_{\ell m}(\hat{n}) \, \Theta^{\rm obs}(\hat{n}) .   (14.2)

In harmonic space we can express this as


a^{\rm obs}_{\ell m} = \sum_{l'm'} B_{lm, l'm'} \, a_{l'm'} + n_{lm}   (14.3)

It is often a good approximation that the beam is constant on the sky and isotropic. In this case
one gets
a^{\rm obs}_{\ell m} = B_l \, a_{lm} + n_{lm}   (14.4)

For a Gaussian beam, the B_l are given by
B_l = \exp\left( - \frac{l^2 \Theta_{\rm beam}^2}{2} \right)   (14.5)
where \Theta_{\rm beam} is related to the width of the beam. For small l the beam is approximately 1
(l\Theta_{\rm beam} \ll 1) while for large l it is approximately zero (i.e. it washes out anisotropies on these
scales). The noise can often be approximated as Gaussian, in which case it is fully determined
by the 2-point function
⟨nlm n∗l′ m′ ⟩ = Nl δll′ δmm′ (14.6)
where N_l is called the noise power spectrum. There are various forms of noise, as we will
discuss. You can often download the B_l and N_l of a CMB experiment such as Planck.
If the noise power spectrum is known (from measurements and modelling of the detector), and
the noise is Gaussian, and the beam and noise are isotropic, one can show (e.g. Dodelson 14.4.1)
that the unbiased power spectrum estimator is
\hat{C}(l) = B_l^{-2} \left( \frac{1}{2l+1} \sum_{m=-l}^{l} \left| a^{\rm obs}_{lm} \right|^2 - N(l) \right)   (14.7)

and the variance of the estimator is


\mathrm{Var}\left[ \hat{C}(l) \right] = \frac{2}{2l+1} \left( C(l) + N(l) B_l^{-2} \right)^2   (14.8)
Compare this to our results without beam and noise in Eq. 12.26. The variance now consists of
the cosmic variance (due to finite mode number) and noise variance (due to noise and beam).
We have been using a continuous function θobs (n̂) in this discussion, which assumes that we
have discretized the sky with such a high resolution that the pixelization doesn’t matter (e.g. large
pixel number in HEALPix). If, for computational reasons, we’d have to limit this pixelization we
would also have to include the so-called pixel window function.
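A short sketch of typical beam and noise curves (the beam FWHM and white-noise level are hypothetical instrument numbers; healpy's gauss_beam parametrizes the Gaussian beam by its FWHM):

```python
import numpy as np
import healpy as hp

lmax = 3000
ell = np.arange(lmax + 1)

fwhm_arcmin = 7.0            # hypothetical beam FWHM
noise_uk_arcmin = 30.0       # hypothetical white-noise level in muK-arcmin

# Gaussian beam transfer function B_l
bl = hp.gauss_beam(np.radians(fwhm_arcmin / 60.0), lmax=lmax)

# white-noise power spectrum N_l: (noise level converted to muK-radian)^2
nl = (noise_uk_arcmin * np.radians(1.0 / 60.0))**2 * np.ones(lmax + 1)

# estimator error per multipole, Eq. (14.8), for some theory spectrum cl_theory:
# var = 2.0 / (2 * ell + 1) * (cl_theory + nl / bl**2)**2
```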
The noise in a CMB experiment is generally a combination of three different types:
• White noise, where each pixel has a noise that is drawn from a Gaussian around zero,
independent from all other pixels. In Sec. 9.3.3 we have seen that in Fourier space this
means that Cl = const. For a satellite like Planck this can be a good approximation.

• Atmospheric noise, which grows larger on large angular scales, can be understood in
terms of Kolmogorov turbulence. Atmospheric noise is correlated between pixels (but nearly
uncorrelated in Fourier space, like the CMB).

• 1/f noise in the detector, which is also correlated between pixels (but nearly uncorrelated
in Fourier space). It leads to a “stripy” noise pattern that depends on the scanning strategy
of the experiment. This noise is important on large scales and falls as 1/l. It turns out
that a wide variety of detectors all lead to noise that goes up on large scales on the sky
with roughly a 1/f spectrum. This noise comes from fluctuations in the instrument and
environment over time. One approach to reduce 1/f noise is to scan angles in the sky at a
faster rate than the time dependence of the detector noise.
We’ll illustrate these in a computational notebook from the CMB data analysis summer school
linked above.

14.2 Simple power spectrum estimator: Transfer function and bias
The naive power spectrum estimator
\hat{C}_l^{\rm naive} = \frac{1}{2l+1} \sum_m a^*_{lm} a_{lm}   (14.9)

will, when applied to a masked field, result in a biased measurement of the true theoretical
(unmasked) power spectrum.
As a first step to improve the result, one can apodize the mask (and/or use the related tech-
nique of inpainting), which means that we smooth out the sharp boundaries of the mask. Many
possible apodizations have been proposed with different trade-offs of sensitivity loss, coupling of
adjacent modes, and ringing. The mask smoothly reduces the signal to zero on the boundary.
This also means that by apodization we make our data periodic (since it goes to zero on all
sides). Aperiodic maps generate spurious power in the Fourier transform. However, even after
apodization our power spectrum estimate remains biased.
Let’s first discuss a simple method how to obtain an unbiased measurement from the naive
power spectrum of the apodized data. This approach generalizes to a more optimal method we
will describe later. The naive Ĉlnaive are related to the true Cl by a transfer function M (which
in addition to the mask includes the beam) and a noise bias Nl as follows

⟨Ĉlnaive ⟩ = Ml ⟨Ĉlunbiased ⟩ + Nl (14.10)

The transfer function can be estimated by Monte Carlo as follows:

• First generate a large number of simulations with known power spectrum C unbiased and no
noise.

• For these simulations estimate Ĉlnaive .

• From the pairs of true power spectrum and measured power spectrum calculate the transfer
function Ml .

The noise bias can be computed by running noise only simulations through the naive power spec-
trum estimator and computing the average power spectrum. An example of the whole procedure
is given in the CMB summer school notebook on power spectrum estimation (on the flat sky).
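A minimal Monte Carlo sketch of the transfer function step (cl_true, the apodized mask, and nside are assumed given; the noise bias is computed analogously from noise-only simulations):

```python
import numpy as np
import healpy as hp

def estimate_transfer(cl_true, mask, nside, nsim=100):
    """Monte Carlo estimate of M_l = <C_l^naive> / C_l^true for a given mask."""
    lmax = len(cl_true) - 1
    cl_naive_mean = np.zeros(lmax + 1)
    for _ in range(nsim):
        m = hp.synfast(cl_true, nside)                  # signal-only simulation
        cl_naive_mean += hp.anafast(m * mask, lmax=lmax)
    cl_naive_mean /= nsim
    ml = np.zeros(lmax + 1)
    good = cl_true > 0          # avoid division by zero (e.g. monopole, dipole)
    ml[good] = cl_naive_mean[good] / cl_true[good]
    return ml
```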
A useful approximation is that the measured power spectrum is related to the true power
spectrum by the sky area fraction fsky covered by the experiment (a number between 0 and 1):

⟨Ĉlnaive ⟩ ≈ fsky ⟨Ĉlunbiased ⟩ (14.11)

This approximation does not take into account mode coupling due to the mask, but it does take
into account the reduced survey area, and is especially useful for Fisher forecasting.

14.3 Mask and mode coupling


Now let’s analyze the problem in more detail. My discussion follows the review 1909.09375.
Our data has a mask W (n) (also called window function or weighting function). In the

simplest case, the mask is a discrete function W = 1 for observed pixels, and zero otherwise.
More generally, the mask can be apodized and have smooth values between 0 and 1. The window
function can be expanded in spherical harmonics as
w_{\ell m} = \int d\hat{n} \, W(\hat{n}) \, Y^*_{\ell m}(\hat{n})   (14.12)

with power spectrum


W_\ell = \frac{1}{2\ell+1} \sum_m |w_{\ell m}|^2   (14.13)
Given the mask and the true temperature anisotropy Θ(n), the spherical harmonic expansion
of the temperature anisotropy field can be written as

\tilde{a}_{\ell m} = \int d\hat{n} \, \Theta(\hat{n}) \, W(\hat{n}) \, Y^*_{\ell m}(\hat{n})   (14.14a)
= \sum_{\ell' m'} a_{\ell' m'} \int d\hat{n} \, Y_{\ell' m'}(\hat{n}) \, W(\hat{n}) \, Y^*_{\ell m}(\hat{n})   (14.14b)
= \sum_{\ell' m'} a_{\ell' m'} \, K_{\ell m \ell' m'}(W) ,   (14.14c)

where Kℓmℓ′ m′ is the coupling kernel between different modes. The ãℓm are still Gaussian
variables, as they are the sum of Gaussian variables (the “true” aℓm that expand the true Θ).
However, the multipole coefficients of the temperature field on the partial sky are not independent
anymore, as the sky cut introduces the coupling represented by Eq. 14.14c.
By expanding the mask in spherical harmonics, the coupling kernel can be written as follows:

K_{\ell_1 m_1 \ell_2 m_2} = \int d\hat{n} \, Y_{\ell_1 m_1}(\hat{n}) \, W(\hat{n}) \, Y^*_{\ell_2 m_2}(\hat{n})   (14.15a)
= \sum_{\ell_3 m_3} w_{\ell_3 m_3} \int d\hat{n} \, Y_{\ell_1 m_1}(\hat{n}) \, Y_{\ell_3 m_3}(\hat{n}) \, Y^*_{\ell_2 m_2}(\hat{n})   (14.15b)
= \sum_{\ell_3 m_3} w_{\ell_3 m_3} (-1)^{m_2} \left[ \frac{(2\ell_1+1)(2\ell_2+1)(2\ell_3+1)}{4\pi} \right]^{1/2} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ m_1 & -m_2 & m_3 \end{pmatrix} ,   (14.15c)
where we used the Gaunt integral
g^{l l' l''}_{m m' m''} = \int d\Omega \, Y_{lm}(\hat{n}) \, Y_{l'm'}(\hat{n}) \, Y_{l''m''}(\hat{n})   (14.16)
= \sqrt{\frac{(2l+1)(2l'+1)(2l''+1)}{4\pi}} \begin{pmatrix} l & l' & l'' \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} l & l' & l'' \\ m & m' & m'' \end{pmatrix} .   (14.17)
which expresses the integral over three spherical harmonics in terms of Wigner 3j symbols. The
coupling kernel is singular and therefore Eq. 14.14c cannot be inverted to compute the true aℓm .
This makes sense as a small part of the observed sky should not allow us to reconstruct the true
entire sky.

14.4 Pseudo-Cl estimator and PyMaster
The standard approach for CMB estimation is the Pseudo-Cl approach from astro-ph/0105302.
Pseudo-Cl are near optimal in most cases and fast to evaluate. The Pseudo-CL approach is
also modestly called the “MASTER” estimator (Monte Carlo Apodised Spherical Transform
EstimatoR).
The cut-sky coefficients can be used to define the pseudo-Cℓ power spectrum

\tilde{C}_\ell = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} \tilde{a}_{\ell m} \tilde{a}^*_{\ell m} .   (14.18)

From Eq. 14.14c, a relation between the true power spectrum and the pseudo-power spectrum can
be derived by taking the ensemble average (in the same way as in section 14.2 but now taking into
account mode coupling):
\langle \tilde{C}_{\ell_1} \rangle = \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \langle \tilde{a}_{\ell_1 m_1} \tilde{a}^*_{\ell_1 m_1} \rangle   (14.19a)
= \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \sum_{\ell_2 m_2} \sum_{\ell_3 m_3} \langle a_{\ell_2 m_2} a^*_{\ell_3 m_3} \rangle \, K_{\ell_1 m_1 \ell_2 m_2}[W] \, K^*_{\ell_1 m_1 \ell_3 m_3}[W]   (14.19b)
= \frac{1}{2\ell_1+1} \sum_{m_1=-\ell_1}^{\ell_1} \sum_{\ell_2} \langle C_{\ell_2} \rangle \sum_{m_2=-\ell_2}^{\ell_2} |K_{\ell_1 m_1 \ell_2 m_2}[W]|^2   (14.19c)
= \sum_{\ell_2} M_{\ell_1 \ell_2} \langle C_{\ell_2} \rangle .   (14.19d)

The last line, Eq. 14.19d, can be obtained by expanding the kernel couplings in spherical
harmonics and making use of the orthogonality relations of the Wigner-3j symbols. The coupling
matrix M_{\ell_1 \ell_2} is thus given by:
M_{\ell_1 \ell_2} = \frac{2\ell_2+1}{4\pi} \sum_{\ell_3} (2\ell_3+1) \, W_{\ell_3} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ 0 & 0 & 0 \end{pmatrix}^2 ,   (14.20)

which can be evaluated numerically without needing to run simulations. The unbiased power
spectrum estimator is then
\hat{C}_\ell = \sum_{\ell'} M^{-1}_{\ell \ell'} \tilde{C}_{\ell'} .   (14.21)

If we observe a sufficiently large part of the sky, the coupling matrix Mℓℓ′ is invertible. When
we see only a smaller part of the sky, the matrix can become singular: some modes end up being
in the masked region of the sky. In such a case, it makes sense to bin the ℓ into larger bins, until
the matrix becomes invertible again.
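As an illustration of Eq. (14.20), here is a brute-force sketch of the coupling matrix using sympy's Wigner-3j symbols (fine for small lmax; production codes use fast recursion relations instead):

```python
import numpy as np
from sympy.physics.wigner import wigner_3j

def coupling_matrix(Wl, lmax):
    """Mode-coupling matrix M_{l1 l2} from the mask power spectrum W_l, Eq. (14.20)."""
    M = np.zeros((lmax + 1, lmax + 1))
    for l1 in range(lmax + 1):
        for l2 in range(lmax + 1):
            s = 0.0
            # the 3j symbol enforces the triangle condition |l1-l2| <= l3 <= l1+l2
            for l3 in range(abs(l1 - l2), min(l1 + l2, len(Wl) - 1) + 1):
                w3j = float(wigner_3j(l1, l2, l3, 0, 0, 0))
                s += (2 * l3 + 1) * Wl[l3] * w3j**2
            M[l1, l2] = (2 * l2 + 1) / (4 * np.pi) * s
    return M
```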
The state-of-the-art public implementation of the MASTER approach is called PyMaster (or
NaMaster when not in Python). It is documented here: 1809.09603. PyMaster can do pseudo-Cl
on fullsky and flatsky, and also includes polarization (and foreground mode deprojection, which
we have not yet discussed). The pseudo-Cl formalism also extends in a straightforward way to

the cross-correlation of two fields (as long as we use the same mask for them). The pseudo-Cl
are then

\tilde{C}_\ell^{ab} = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} \tilde{a}_{\ell m} \tilde{b}^*_{\ell m} .   (14.22)

for two fields a and b (such as CMB temperature and E-mode polarization).
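A minimal end-to-end usage sketch with PyMaster (the map m, the mask, and nside are assumed given; the bandpower width is an arbitrary choice):

```python
import pymaster as nmt

# a field combines the map with its (possibly apodized) mask
f = nmt.NmtField(mask, [m])

# bandpower binning, here bins of 16 multipoles each
b = nmt.NmtBin.from_nside_linear(nside, nlb=16)

# computes the mode-coupling matrix, the pseudo-Cl, and decouples the bandpowers
cl_decoupled = nmt.compute_full_master(f, f, b)[0]
ell_eff = b.get_effective_ells()
```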

14.5 Wiener filtering


The pseudo-CL estimator is fast but it is still not fully optimal. A completely optimal power
spectrum estimator can be developed by using inverse covariance filtering which is also called
Wiener filtering or C −1 filtering. Wiener filtering is required for any optimal analysis of
survey data (including e.g. non-Gaussianity search), but it is computationally very expensive and
thus not always performed. Wiener filtering also replaces (and improves over) mask apodization.
Assuming the data vector d is the linear sum of signal s and noise n with independent covari-
ance matrices S and N , the Wiener filtered data dW F is defined by

dW F = S(S + N )−1 d. (14.23)

For a data set with N pixels, the direct inversion of a dense N ×N covariance matrix is impossible
for current CMB maps with millions of pixels. Conjugate gradient solvers are usually employed
to perform Wiener filtering of CMB data. But the computational costs are enormous for Planck
resolution and Wiener filtering a large ensemble of maps remains very difficult even with large
computing resources. An example of a small Wiener filtered CMB map is shown in Fig. 16.
Often it can be assumed that the noise covariance N is diagonal in pixel space, i.e. the noise is
assumed uncorrelated between pixels. We can represent the mask as a limiting case of anisotropic
noise, by taking the noise level to be infinity in masked pixels. (In a code implementation, we
set the corresponding entries of N −1 to zero). On the other hand, the signal covariance matrix is
diagonal in momentum space. Because the two matrices are not diagonal in any common basis
(except in the special case of a fullsky observation without mask, where the noise is also diagonal
in momentum space), the matrix inversion is computationally hard.
The Wiener filter is the optimal reconstruction of the signal given the noise, for a
Gaussian field with known power spectrum. That means it is the maximum a posteriori solution
of the posterior
- \log P(s|d) = \frac{1}{2} (s-d)^T N^{-1} (s-d) + \frac{1}{2} s^T S^{-1} s + \mathrm{const.}   (14.24)
The Wiener filter also minimizes the mean squared error (MSE) between the true signal and the
reconstructed signal. For more discussion of this see 1905.05846.
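A toy flat-sky sketch of Wiener filtering with a conjugate-gradient solver: we solve the equivalent linear system (S⁻¹ + N⁻¹)s = N⁻¹d, where S is diagonal in Fourier space and N is diagonal in pixel space (the per-mode signal power cl2d, the inverse pixel noise ninv with zeros in masked pixels, and the data map d are all assumed given toy arrays):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

npix = 64  # the toy map is npix x npix

def apply_lhs(svec):
    """Apply (S^-1 + N^-1): S^-1 is diagonal in Fourier space, N^-1 in pixel space."""
    s = svec.reshape(npix, npix)
    sinv_s = np.fft.ifftn(np.fft.fftn(s) / cl2d).real   # S^-1 s, with cl2d > 0
    return (sinv_s + ninv * s).ravel()

A = LinearOperator((npix * npix, npix * npix), matvec=apply_lhs, dtype=float)
rhs = (ninv * d).ravel()            # N^-1 d; masked pixels have ninv = 0
s_wf, info = cg(A, rhs)             # Wiener-filtered map (flattened)
```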
Based on the Wiener filtered data, one can then construct the Quadratic Maximum Like-
lihood (QML) estimator, which is the provably optimal power spectrum estimator. We refer to appendix
B of 1909.09375 for a discussion of this estimator. It involves Wiener filtering the data and then
estimating the mode coupling matrix from simulations.

Figure 16. Wiener filtering example (plot from 1905.05846). Left: true signal, Middle: Observed noisy
and masked data, Right: Wiener filtered data, which is a reconstruction of the underlying true map.

14.6 Likelihood of the CMB


Once we have a power spectrum estimator (from pseudo-CL or QML or a different estimator), as
well as its covariance, we need a likelihood that we can MCMC sample from, to obtain cosmolog-
ical parameters. As in our analysis of a dark matter simulation in Sec. 11, a good approximation
is a Gaussian with a fixed covariance matrix. This choice was made for example by Planck in their
“high-ℓ likelihood” used for ℓ > 30. Planck also had a completely different “low-ℓ likelihood” for
ℓ < 30 (1907.12875). The reason for that is that the power spectrum estimator is based on the
square of Gaussian variables (the alm ), which is not Gaussian distributed. For large enough ℓ we
can average over enough modes to make the distribution of the Ĉℓ Gaussian by the Central Limit
Theorem, but for small ℓ that does not happen. On the other hand, for small ℓ there are fewer
modes involved, so we can allow for a computationally more expensive approach. In particular,
it is possible to make a likelihood at map/pixel level, rather than at power spectrum level. For
details of possible small-scale and large-scale likelihoods, we refer to the review 1909.09375.

14.7 Tools to sample the CMB likelihood


Rather than discussing mathematical details of likelihoods I want to introduce a state-of-the-art
tool for working with CMB likelihoods (and other cosmological probes): Cobaya (code for
Bayesian analysis in Cosmology) https://cobaya.readthedocs.io/. Cobaya for example
will be used to analyze data from Simons Observatory. It builds a common framework that
includes:

• Theory codes (CLASS and CAMB)

• Built-in likelihoods of cosmological experiments (Planck, Bicep-Keck, SDSS etc). Collabo-


rations can release their likelihood as Cobaya modules.

• Various MCMC samplers.

• Tools to analyze the MCMC samples and make common plots.

A similar project, more widely used in the large-scale structure community is CosmoSIS (COS-
MOlogical Survey Inference System) https://cosmosis.readthedocs.io/.
If you want to combine, say, the Planck and DES likelihoods to sample cosmological
parameters, perhaps with some extension of LambdaCDM, a practical approach is to set up
this analysis in Cobaya. You should not try to analyze e.g. the Planck data directly from
map level for a power spectrum analysis, since you’d have to redo all the hard work of the
Planck collaboration to make a reliable likelihood with correct covariance. Collaborations also
release their likelihood directly, without going through Cobaya. Sometimes these can be di-
rectly imported as Python modules. See for example the ACT CMB likelihood here: https:
//github.com/ACTCollaboration/pyactlike. We will look at an example script that uses
Cobaya.
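A minimal Cobaya input sketch (the likelihood name and parameter choices are illustrative, and running the Planck likelihood requires its data to be installed; real analyses sample many more parameters):

```python
from cobaya.run import run

info = {
    "theory": {"camb": None},
    "likelihood": {"planck_2018_lowl.TT": None},
    "params": {
        # one sampled parameter with a flat prior
        "H0": {"prior": {"min": 60.0, "max": 80.0}, "ref": 67.5, "proposal": 0.5},
        # remaining cosmological parameters held fixed for this toy run
        "ombh2": 0.022, "omch2": 0.122, "As": 2e-9, "ns": 0.965, "tau": 0.05,
    },
    "sampler": {"mcmc": {"Rminus1_stop": 0.05}},
}

updated_info, sampler = run(info)
```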

15 Polarization and primordial B-modes


We now give a brief overview of CMB polarization, which is usually expressed in terms of E-
modes and B-modes, so that a complete analysis of CMB perturbations includes T, E, B where
T is the temperature. CMB polarization from E-modes roughly doubles the information on
primordial physics compared to T alone. B-mode polarization, if of primordial origin, would
detect primordial gravitational waves. Details about CMB polarization, B-modes and primordial
gravitational waves can be found in the review 1510.06042 and additional visual explanations of
polarization are presented in astro-ph/9706147. For details on the Stokes parameters see here:
astro-ph/0409734v2. The books by Dodelson and Baumann also have detailed sections on polar-
ization with useful illustrations.
Polarization is generated by the scattering between photons and free electrons. A quadrupolar
anisotropy (in the rest frame of the electron) of incoming unpolarized light, which Thomson
scatters on the electron, leads to outgoing polarized light as shown in Fig. 17. Towards the
end of recombination, when photons decouple from the electrons and protons, density perturbations in
the primordial plasma lead to such a quadrupolar anisotropy. Therefore, there should be some
correlation between temperature and polarization anisotropies.
The mathematical characterization of CMB polarization anisotropies is complicated by the fact
that polarization is not a scalar field. To define the polarization we need the Stokes parameters.
Recall that a monochromatic plane electromagnetic wave can be represented as
E(t) = Ex cos(ωt)x̂ + Ey cos(ωt − φ)ŷ, (15.1)
where we put the phase by convention in the second term. Depending on φ the wave can be
linearly, elliptically or circularly polarized. For any electromagnetic wave (not just a monochromatic
plane wave) the Stokes parameters are defined by the expectation values (time averages) of
the transverse components as
I = |Ex |2 + |Ey |2 ,
Q = |Ex |2 − |Ey |2 ,
U = 2|Ex ||Ey | cos φ,
V = 2|Ex ||Ey | sin φ


Figure 17. Generation of CMB polarization by scattering of a quadrupole anisotropy. Bold blue lines
are hotter, thin red lines are colder. Figure adapted from Dodelson-Schmidt.

I is the intensity of the light, which is proportional to the temperature. The CMB is a sum
of unpolarized light, for which Q = U = V = 0, and linearly polarized light (φ = 0 and
Q ̸= 0, U ̸= 0, V = 0). Therefore a CMB experiment measures an intensity map I and a Q and U
map. Only exotic theories of the early universe can produce circular polarization (V ̸= 0). The
polarization fraction of the CMB is about 10%.
While T is a scalar and does not change under rotation, the quantities Q and U transform
under rotation by an angle ψ as a spin-2 field (Q ± iU )(n̂) → e∓2iψ (Q ± iU )(n̂). This is because
they are “headless vectors” so that a 180◦ rotation brings them back to themselves. The harmonic
analysis of Q ± iU therefore requires expansion on the sphere in terms of tensor (spin-2) spherical
harmonics
(Q \pm iU)(\hat{n}) = \sum_{\ell m} a_{\pm 2, \ell m} \, {}_{\pm 2}Y_{\ell m}(\hat{n}) .   (15.2)

While Q and U maps come naturally out of experiments, for theoretical analysis it is more
convenient to work with scalar quantities. These can be obtained as follows:
a_{E,\ell m} \equiv -\frac{1}{2} \left( a_{2,\ell m} + a_{-2,\ell m} \right)   (15.3)
a_{B,\ell m} \equiv -\frac{1}{2i} \left( a_{2,\ell m} - a_{-2,\ell m} \right)   (15.4)
which are the multipole coefficients of the scalar E-mode and B-mode fields:
E(\hat{n}) = \sum_{\ell m} a_{E,\ell m} Y_{\ell m}(\hat{n})   (15.5)
B(\hat{n}) = \sum_{\ell m} a_{B,\ell m} Y_{\ell m}(\hat{n}) .   (15.6)

Pure E-mode fields are curl-free and pure B-mode fields are divergence free, in close analogy with
electrodynamics.

The angular power spectra are defined as before
C_\ell^{XY} \equiv \frac{1}{2\ell+1} \sum_m \langle a^*_{X,\ell m} a_{Y,\ell m} \rangle \, , \qquad X, Y = T, E, B .   (15.7)

The auto power spectra are T T , EE and BB. Some of the cross power spectra are zero. Although
E and B are both invariant under rotations, they behave differently under parity transformations.
E-modes are parity even (like temperature) and B-modes are parity odd. For this reason, in a
parity invariant early universe, the cross power spectrum T E is non-zero while T B or EB are
zero. Note however that secondary (non-primordial) anisotropies and foregrounds can generate
non-zero T B and EB-correlations.
A crucial physical insight found in the late nineties (astro-ph/9609169) is that scalar (den-
sity) perturbations create only E-modes and no B-modes. On the other hand, tensor
(gravitational wave) perturbations create both E-modes and B-modes. For this reason, current
and upcoming experiments try to detect primordial B-modes to detect gravitational waves. Note
however that foregrounds and gravitational lensing do generate B-modes, and these have to be
cleaned out in order to not confuse them with a primordial signal.
Once we have calculated the E-mode and B-mode power spectra, which are scalars, cosmolog-
ical analysis works in much the same way as for temperature T. For example, CAMB and CLASS
can calculate polarization transfer functions ∆Eℓ (k) and ∆Bℓ (k) so that the power spectrum of
EE is
C_\ell^{EE} = \frac{2}{\pi} \int k^2 dk \, P_{\mathcal{R}}(k) \, \Delta^2_{E\ell}(k)   (15.8)
and similarly for T E and BB.
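With healpy, polarized maps and their spectra are handled in close analogy to temperature (a sketch with toy input spectra; new=True selects the spectrum ordering TT, EE, BB, TE):

```python
import numpy as np
import healpy as hp

nside, lmax = 128, 256
ell = np.arange(lmax + 1)

# toy input spectra; BB is set to zero here
cl_tt = 1.0 / (1.0 + ell)**2
cl_ee = 0.1 * cl_tt
cl_bb = np.zeros(lmax + 1)
cl_te = 0.3 * cl_tt

# generate correlated T, Q, U maps
t, q, u = hp.synfast([cl_tt, cl_ee, cl_bb, cl_te], nside, new=True, pol=True)

# anafast on (T, Q, U) returns the six spectra TT, EE, BB, TE, EB, TB
cls = hp.anafast([t, q, u], lmax=lmax, pol=True)
```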

16 Primordial non-Gaussianity
The cosmic microwave background is the ideal probe of primordial non-Gaussianity, i.e. of
interactions (and thus correlations) between the primordial modes. This is because of the linearity
of the CMB. In the future, it may be possible to beat the CMB constraints with large-scale
structure, but this is probably at least a decade away (with the exception of so-called “local non-
Gaussianity”). Good reviews on primordial non-Gaussianity are 1001.4707 (which this section
is based on) and 1003.6097. The formalism we are discussing here also generalizes to other
bispectra (i.e. 3 point correlators), including non-primordial ones and bispectra of galaxy
surveys.

16.1 Primordial bispectra


Recall that for the primordial potential we found (from statistical homogeneity and isotropy):

\langle \Phi(\mathbf{k}) \Phi^*(\mathbf{k}') \rangle = (2\pi)^3 \, \delta_D(\mathbf{k} - \mathbf{k}') \, P_\Phi(k) \, ,   (16.1)

The equivalent statement for the 3-point correlator is

\langle \Phi(\mathbf{k}_1) \Phi(\mathbf{k}_2) \Phi(\mathbf{k}_3) \rangle = (2\pi)^3 \, \delta_D(\mathbf{k}_{123}) \, B_\Phi(k_1, k_2, k_3) .   (16.2)

Here, the delta function enforces the triangle condition, that is, the constraint that the wavevec-
tors in Fourier space must close to form a triangle, k1 + k2 + k3 = 0.
A well studied model is the local model, in which contributions from ‘squeezed’ triangles
are dominant, that is, with e.g. k3 ≪ k1 , k2 . In this model, non-Gaussianity is created as follows:

\Phi(x) = \Phi_L(x) + \Phi_{NL}(x)   (16.3)
= \Phi_L(x) + f_{NL} \left[ \Phi_L^2(x) - \langle \Phi_L^2(x) \rangle \right]   (16.4)

where fNL is called the nonlinearity parameter. The bound on local fNL from Planck is about |fNL | ≲ 5.
For this model one can show that

BΦ (k1 , k2 , k3 ) = 2fNL [PΦ (k1 )PΦ (k2 ) + PΦ (k2 )PΦ (k3 ) + PΦ (k3 )PΦ (k1 )] (16.5)

The bispectrum is often written in terms of the dimensionless shape function

S(k_1, k_2, k_3) \equiv (k_1 k_2 k_3)^2 \, B_\Phi(k_1, k_2, k_3) .   (16.6)

A different primordial bispectrum that is often considered is the equilateral model with shape
function
(k1 + k2 − k3 )(k2 + k3 − k1 )(k3 + k1 − k2 )
S equil (k1 , k2 , k3 ) = . (16.7)
k1 k2 k3
Unlike the local model, this one peaks for equilateral triangles, so the local and equilateral models
probe different kinds of correlations.
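As a toy illustration of the local model, Eq. (16.4), applied to a 2D Gaussian map (the true local model lives in the 3D potential; the spectrum is arbitrary and fNL is hugely exaggerated so that the effect is visible by eye):

```python
import numpy as np
import healpy as hp

nside, lmax = 128, 256
fnl = 1000.0                                         # exaggerated for visualization
cl_phi = 1e-9 / (1.0 + np.arange(lmax + 1))**2       # toy potential spectrum

phi_gauss = hp.synfast(cl_phi, nside)                # Gaussian field Phi_L
phi_ng = phi_gauss + fnl * (phi_gauss**2 - np.mean(phi_gauss**2))
```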

16.2 CMB bispectrum


The CMB bispectrum is the three point correlator of the aℓm . Using Eq. 13.4 we obtain
B^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} = \langle a_{\ell_1 m_1} a_{\ell_2 m_2} a_{\ell_3 m_3} \rangle   (16.8)
= (4\pi)^3 (-i)^{\ell_1+\ell_2+\ell_3} \int \frac{d^3 k_1}{(2\pi)^3} \frac{d^3 k_2}{(2\pi)^3} \frac{d^3 k_3}{(2\pi)^3} \, \Delta_{\ell_1}(k_1) \Delta_{\ell_2}(k_2) \Delta_{\ell_3}(k_3) \times   (16.9)
\langle \Phi(\mathbf{k}_1) \Phi(\mathbf{k}_2) \Phi(\mathbf{k}_3) \rangle \, Y^*_{\ell_1 m_1}(\hat{k}_1) \, Y^*_{\ell_2 m_2}(\hat{k}_2) \, Y^*_{\ell_3 m_3}(\hat{k}_3)   (16.10)
= \left( \frac{2}{\pi} \right)^3 \int x^2 dx \int dk_1 \, dk_2 \, dk_3 \, (k_1 k_2 k_3)^2 \, B_\Phi(k_1, k_2, k_3) \, \Delta_{\ell_1}(k_1) \Delta_{\ell_2}(k_2) \Delta_{\ell_3}(k_3)   (16.11)
\times \, j_{\ell_1}(k_1 x) \, j_{\ell_2}(k_2 x) \, j_{\ell_3}(k_3 x) \int d\Omega_{\hat{x}} \, Y_{\ell_1 m_1}(\hat{x}) \, Y_{\ell_2 m_2}(\hat{x}) \, Y_{\ell_3 m_3}(\hat{x})   (16.12)

where we have inserted the exponential integral form for the delta function in the bispectrum
definition. The last integral over the angular part of x is the Gaunt integral, while x is the radial
conformal distance. The full bispectrum B^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} can be expressed in terms of the reduced
bispectrum b_{\ell_1 \ell_2 \ell_3} as
B^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} = G^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b_{\ell_1 \ell_2 \ell_3} .   (16.13)

The reduced bispectrum is given by
b_{\ell_1 \ell_2 \ell_3} = \left( \frac{2}{\pi} \right)^3 \int x^2 dx \int dk_1 \, dk_2 \, dk_3 \, (k_1 k_2 k_3)^2 \, B_\Phi(k_1, k_2, k_3)   (16.14)
\times \, \Delta_{\ell_1}(k_1) \, \Delta_{\ell_2}(k_2) \, \Delta_{\ell_3}(k_3) \, j_{\ell_1}(k_1 x) \, j_{\ell_2}(k_2 x) \, j_{\ell_3}(k_3 x) \, ,   (16.15)

which relates the primordial bispectrum, predicted by inflationary theories, to the reduced bis-
pectrum observed in the cosmic microwave sky. This formula is the equivalent of the power
spectrum relation
C_\ell = \frac{2}{\pi} \int dk \, k^2 \, P_\Phi(k) \, \Delta_\ell^2(k) .   (16.16)
16.3 Optimal estimator for bispectra
For a fullsky observation it can be shown that the optimal estimator for fNL is
\hat{f}_{NL} = \frac{1}{N} \sum_{\{\ell_i, m_i\}} \frac{G^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b^{f_{NL}=1}_{\ell_1 \ell_2 \ell_3}}{C_{\ell_1} C_{\ell_2} C_{\ell_3}} \, a_{\ell_1 m_1} a_{\ell_2 m_2} a_{\ell_3 m_3}   (16.17)
N = \sum_{\{\ell_i, m_i\}} \frac{\left( G^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} \, b^{f_{NL}=1}_{\ell_1 \ell_2 \ell_3} \right)^2}{C_{\ell_1} C_{\ell_2} C_{\ell_3}} \, ,   (16.18)

where b_{\ell_1 \ell_2 \ell_3} is the reduced bispectrum, G^{\ell_1 \ell_2 \ell_3}_{m_1 m_2 m_3} is the Gaunt integral, and N is the normal-
ization factor. This estimator can be interpreted as summing up all mode triplets weighted by
their expected signal-to-noise. See 1001.4707 and 1003.6097 for two different derivations of this
result. This kind of estimator is called a cubic estimator, because it uses three copies of the
alm (the power spectrum estimator on the other hand is a quadratic estimator).
The noise and beam of the experiment can be included with the following replacements:

Cℓ → Cℓ Bℓ2 + Nℓ , bℓ1 ℓ2 ℓ3 → bℓ1 ℓ2 ℓ3 Bℓ1 Bℓ2 Bℓ3 ; (16.19)

B is the beam and Nℓ is the noise power spectrum (constant for uncorrelated white noise). The
noise is assumed to be Gaussian (which is a very good approximation because the bispectrum, if
non-zero, is much smaller than the power spectrum). Including the effect of the mask is a little
harder and we won’t review it here. It involves adding a linear term to the cubic estimator
above. Details can be found in the same reviews.
Primordial bispectrum analysis is performed as follows: theorists have come up
with a large collection of theoretically motivated bispectrum templates bℓ1 ℓ2 ℓ3 (such as local
and equilateral), and we have run the bispectrum estimator on all of these templates (Planck
1905.05697). While no statistically significant detection has been made, many models (or at least
part of their parameter space) have been excluded in this way. Instead of running bispectrum
estimators, one can also measure the bispectrum in bins (as we do in the power spectrum), but
all measurements are consistent with zero.

16.4 The separability trick
I want to mention one more important aspect of bispectrum estimation, which also often occurs
in cosmology. As it is written above, the bispectrum estimator is computationally intractable, as
it is a sum over six variables (three ℓ and three m), all of which go up to about 2500 at Planck resolution. Fortunately
the estimator can be rewritten in a much better form. If the primordial shape function is
separable, i.e. it can be written in the form

S(k1 , k2 , k3 ) = X(k1 ) Y (k2 ) Z(k3 ) + 5 perms. , (16.20)

then the reduced bispectrum can be written as


b_{\ell_1 \ell_2 \ell_3} = \int dx \, x^2 \, X_{\ell_1}(x) \, Y_{\ell_2}(x) \, Z_{\ell_3}(x) + 5 \text{ perms} \, ,   (16.21)

where we have defined the quantities:


X_\ell(x) \equiv \int dk \, k^2 \, X(k) \, j_\ell(kx) \, \Delta_\ell(k) \, ,
Y_\ell(x) \equiv \int dk \, k^2 \, Y(k) \, j_\ell(kx) \, \Delta_\ell(k) \, ,   (16.22)
Z_\ell(x) \equiv \int dk \, k^2 \, Z(k) \, j_\ell(kx) \, \Delta_\ell(k) \, .

In that case, using the definition of the Gaunt integral, the estimator can be rewritten as
E(a) = \frac{1}{N} \int dx \, x^2 \int d\Omega_{\hat{n}} \, M_X(x, \hat{n}) \, M_Y(x, \hat{n}) \, M_Z(x, \hat{n}) + \text{perms.} \, ,   (16.23)
where
M_X(x, \hat{n}) \equiv \sum_{\ell m} \frac{a_{\ell m} X_\ell(x)}{C_\ell} Y_{\ell m}(\hat{n}) \, ,
M_Y(x, \hat{n}) \equiv \sum_{\ell m} \frac{a_{\ell m} Y_\ell(x)}{C_\ell} Y_{\ell m}(\hat{n}) \, ,
M_Z(x, \hat{n}) \equiv \sum_{\ell m} \frac{a_{\ell m} Z_\ell(x)}{C_\ell} Y_{\ell m}(\hat{n}) \, .   (16.24)

By a detailed examination of the operations, one finds that this reduces the computational cost
from O(\ell_{\rm max}^5) to O(\ell_{\rm max}^3) operations, which can easily be calculated in practice. This re-
writing of the estimator is sometimes called a fast position space estimator (since we work
with the maps M in position space rather than Fourier space). Not all theoretical shapes are
separable. However it is often possible to expand non-separable shapes into separable shapes (see
0912.5516).
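One x-slice of the filtered maps in Eq. (16.24) can be built with healpy's harmonic filtering (a sketch; the alm of the data, the spectrum cl, and a precomputed array Xl_of_x = X_ℓ(x) at the chosen x are assumed given):

```python
import healpy as hp

# multiply a_lm by the l-dependent filter X_l(x) / C_l, then transform to a map
filtered_alm = hp.almxfl(alm, Xl_of_x / cl)
M_X = hp.alm2map(filtered_alm, nside)    # M_X(x, n) on the sphere for this x
```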

17 Secondary anisotropies: CMB lensing


Lensing is the leading secondary effect on the CMB anisotropies below ℓ ≃ 4000 (Fig. 18). It

• smooths acoustic peaks

• transfers power to small scales

• introduces non-Gaussianity

• makes B-mode polarization by lensing E-modes. Thus de-lensing is important for B-mode
searches.

The lensing effect can be used to reconstruct the lensing potential, a map of the integrated
mass density of the universe on large scales. The lensing potential can be used as a probe of
cosmological parameters including neutrino masses and dark energy. An important feature of
lensing is that it probes the entire mass density (since any mass and energy gravitate in
General Relativity), while e.g. a galaxy survey only probes luminous matter. This is why lensing
is critical to probe dark matter.
My brief discussion of CMB lensing is based on the review astro-ph/0601594v4. Another good
review is 0911.0612. We will only discuss temperature, but polarization is lensed in the same
way.

17.1 CMB lensing potential


Weak lensing remaps the unlensed CMB map as follows. The lensed CMB temperature in a
direction n̂, T̃ (n̂), is given by the unlensed temperature in the deflected direction

T̃ (n̂) = T (n̂′ ) = T (n̂ + α) (17.1)

where α is a deflection angle. This result follows from General Relativity. At lowest order the
deflection angle is a pure gradient, α = ∇ψ. The lensing potential is defined by
\psi(\hat{n}) \equiv -2 \int_0^{\chi_*} d\chi \, \frac{\chi_* - \chi}{\chi_* \chi} \, \Psi(\chi \hat{n}; \eta_0 - \chi) ,   (17.2)

where χ is conformal radial distance along the line of sight, χ∗ is the conformal distance to re-
combination, and η0 is the conformal time today, and Ψ is the Newtonian gravitational potential.
From this one can calculate the power spectrum of the lensing potential C_l^\psi . It is defined by

⟨ψlm ψl∗′ m′ ⟩ = δll′ δmm′ Clψ . (17.3)

where
C_l^\psi = 16\pi \int \frac{dk}{k} \int_0^{\chi_*} d\chi \int_0^{\chi_*} d\chi' \, P_\Psi(k; \eta_0 - \chi, \eta_0 - \chi') \, j_l(k\chi) \, j_l(k\chi') \left( \frac{\chi_* - \chi}{\chi_* \chi} \right) \left( \frac{\chi_* - \chi'}{\chi_* \chi'} \right) .   (17.4)
which in the linear regime can be calculated by a Boltzmann solver like CAMB, depending on
cosmological parameters. P_\Psi(k; \eta_0 - \chi, \eta_0 - \chi') is the power spectrum between unequal times.
It is also often useful to work with the CMB convergence given by
\kappa(\hat{n}) = -\frac{1}{2} \nabla^2 \psi(\hat{n})   (17.5)

and thus
\kappa(\mathbf{l}) = \frac{l^2}{2} \psi(\mathbf{l})   (17.6)
The convergence probes the integrated matter density between us and the CMB (since from the
Poisson equation ∇2 Ψ(n) ∝ ρ with density ρ). A visual example of the quantities involved in
CMB lensing is uploaded to the lecture files.

17.2 Lensed CMB map


We will now use flatsky coordinates, which are often used for CMB lensing, and simplify the
expressions. Of course, everything can also be expressed in spherical harmonics. Our flat-sky 2D
Fourier transform convention for the temperature field here is:
Z 2 Z 2
d l d x
Θ(x) = il·x
Θ(l)e , Θ(l) = Θ(x)e−il·x . (17.7)
2π 2π
The power spectrum for our statistically isotropic temperature field is diagonal in l, and is given
by
⟨Θ(l)Θ∗ (l′ )⟩ = ClΘ δ(l − l′ ). (17.8)
For weak lensing, to good approximation, the lensing effect can be described by Taylor expan-
sion. To first order we have

Θ̃(x) = Θ(x′ ) = Θ(x + ∇ψ)


≈ Θ(x) + ∇a ψ(x)∇a Θ(x) (17.9)

Going to Fourier space, one can show that to first order the lensed CMB field is given by
\tilde{\Theta}(\mathbf{l}) \approx \Theta(\mathbf{l}) - \int \frac{d^2 l'}{2\pi} \, \mathbf{l}' \cdot (\mathbf{l} - \mathbf{l}') \, \psi(\mathbf{l} - \mathbf{l}') \, \Theta(\mathbf{l}')   (17.10)

This shows that there will now be some mode coupling between modes Θ(l) and Θ(l′ ), assum-
ing a fixed lensing potential ψ. That means that there will be off-diagonal components in the
covariance matrix of the observed CMB. We could now also derive the power spectrum of the
lensed CMB field \tilde{C}_l^\Theta by calculating \langle \tilde{\Theta}(\mathbf{l}) \tilde{\Theta}^*(\mathbf{l}) \rangle.
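A toy flat-sky implementation of the first-order remapping, Eq. (17.9), with gradients computed spectrally on a periodic square patch (the input maps are assumed given toy arrays):

```python
import numpy as np

def lens_first_order(theta, psi, patch_size):
    """First-order lensed map: Theta + grad(psi) . grad(Theta), Eq. (17.9)."""
    n = theta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=patch_size / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")

    def grad(f):
        fk = np.fft.fftn(f)
        return (np.fft.ifftn(1j * kx * fk).real,
                np.fft.ifftn(1j * ky * fk).real)

    px, py = grad(psi)
    tx, ty = grad(theta)
    return theta + px * tx + py * ty
```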

17.3 Quadratic estimator for lensing


We now outline how the CMB lensing potential can be measured. The standard approach to
do this is the quadratic estimator formalism. The quadratic estimator can be used in many
situations in cosmology where a large scale “background” field (here the lensing potential) mod-
ulates the statistics of small scale observables (here the CMB temperature perturbations). As
we have seen, for a fixed lensing potential, the distribution of the observed temperature will not
be isotropic. This suggests that we may be able to use the quadratic off-diagonal terms of the
ψ-fixed correlation ⟨Θ̃(l)Θ̃(l′ )⟩Θ to constrain the lensing potential in our sky realization. The
subscript in the expectation value means that Θ here is a random variable while ψ is a fixed
realization.

Averaging over realizations of the unlensed temperature field Θ to first order in the lensing
potential gives
\langle \tilde{\Theta}(\mathbf{l}) \tilde{\Theta}^*(\mathbf{l} - \mathbf{L}) \rangle_\Theta = \delta(\mathbf{L}) C_l^\Theta - \int \frac{d^2 l'}{2\pi} \, \mathbf{l}' \cdot (\mathbf{l} - \mathbf{l}') \, \psi(\mathbf{l} - \mathbf{l}') \langle \Theta(\mathbf{l}') \Theta^*(\mathbf{l} - \mathbf{L}) \rangle
+ \int \frac{d^2 l'}{2\pi} \, \mathbf{l}' \cdot (\mathbf{l} - \mathbf{L} - \mathbf{l}') \, \psi^*(\mathbf{l} - \mathbf{L} - \mathbf{l}') \langle \Theta(\mathbf{l}) \Theta^*(\mathbf{l}') \rangle   (17.11)
= \delta(\mathbf{L}) C_l^\Theta + \frac{1}{2\pi} \Big[ (\mathbf{L} - \mathbf{l}) \cdot \mathbf{L} \, C^\Theta_{|\mathbf{l}-\mathbf{L}|} + \mathbf{l} \cdot \mathbf{L} \, C_l^\Theta \Big] \psi(\mathbf{L})   (17.12)

To estimate the lensing potential we thus want to sum over all quadratic combinations
Θ̃(l)Θ̃∗ (l − L) with some weighting factor g that needs to be determined:

\hat{\psi}(\mathbf{L}) \equiv N(\mathbf{L}) \int d^2 l \, \tilde{\Theta}(\mathbf{l}) \tilde{\Theta}^*(\mathbf{l} - \mathbf{L}) \, g(\mathbf{l}, \mathbf{L}),   (17.13)

where g(l, L) is the weighting function and N (L) is a normalization. This strategy is originally
from astro-ph/0301031 and has been re-used for many different applications.
To find the weighting function and normalization, we impose two conditions:

• The estimator should be unbiased, i.e. ⟨ψ̂(L)⟩Θ = ψ(L)

• The variance (error) of our estimator should be as small as possible.

It can be shown that this gives the weights


g(\mathbf{l}, \mathbf{L}) = \frac{(\mathbf{L} - \mathbf{l}) \cdot \mathbf{L} \, C^\Theta_{|\mathbf{l}-\mathbf{L}|} + \mathbf{l} \cdot \mathbf{L} \, C_l^\Theta}{2 \, \tilde{C}_l^{\rm tot} \, \tilde{C}_{|\mathbf{l}-\mathbf{L}|}^{\rm tot}} .   (17.14)

and the normalization is


N(\mathbf{L})^{-1} = \int \frac{d^2 l}{(2\pi)^2} \Big[ (\mathbf{L} - \mathbf{l}) \cdot \mathbf{L} \, C^\Theta_{|\mathbf{l}-\mathbf{L}|} + \mathbf{l} \cdot \mathbf{L} \, C_l^\Theta \Big] g(\mathbf{l}, \mathbf{L}).   (17.15)

As was the case for the bispectrum, this estimator can be re-written as a fast position space
estimator.
This estimator (with minor modifications to take into account the mask and noise) is for
example used in the recent ACT CMB lensing analysis (2004.01139), which is also in flatsky
coordinates. The spherical harmonics version of this estimator was used in the Planck analysis
(1807.06210). However, because lensing is a non-linear operation, the quadratic estimator is
not optimal in general. For existing experiments (Planck, ACT), it is optimal, but for Simons
Observatory it will already be slightly sub-optimal and for future very high resolution experiments
it can be very suboptimal. A completely optimal lensing analysis can be made with a field-
level likelihood, but is computationally extremely expensive. References on this topic include
1708.06753,1704.08230,astro-ph/0209489.

17.4 Physics with CMB lensing
Once the lensing potential is reconstructed, one can estimate its power spectrum in the usual
way and use the lensing power spectrum in a cosmological analysis to constrain parameters.
Lensing is in particular a great probe of the size of matter perturbations at later times. By
comparing the amplitude of primordial (primary) CMB perturbations with the amplitude of
late time perturbations from lensing, one can study the growth of structure in the universe.
This for example can be used to constrain neutrino masses, as the free streaming of neutrinos
suppresses growth. The measured growth of structure at late times is currently an exciting topic in
cosmology, with evidence for a disagreement with Lambda-CDM (see eg. 2203.06142,2304.05203)
called the S8 tension or σ8 tension.
Another important application of the lensing potential is to cross-correlate it with a dif-
ferent tracer of matter, such as a galaxy survey. Such cross-power spectra can also be very
sensitive to various cosmological parameters, for example local primordial non-Gaussianity (e.g.
1710.09465).

18 Secondary anisotropies: Sunyaev-Zeldovich effect


Apart from being gravitationally lensed, the second thing that happens to photons on the way
from the CMB to us is that they are re-scattered by charges (mostly free electrons) in inverse Compton
scattering. This re-scattering is called the Sunyaev-Zeldovich effect. This effect comes in
several variants:

• The thermal Sunyaev-Zeldovich (tSZ) effect is the scattering of photons on hot elec-
trons, i.e. on their thermal velocities.

• The kinetic Sunyaev-Zeldovich (kSZ) effect is the scattering of photons on electrons


due to the electron’s bulk movement (i.e. all the electrons in a galaxy move on average
with the velocity of the galaxy).

These effects are by far the most important SZ effects. There are however smaller effects including
the polarized SZ effect and the rotational SZ effect. The total probability of a CMB photon
to be re-scattered between recombination and Earth is about 5%. This probability is related to
the optical depth and the visibility function. To my knowledge there is no comprehensive
review on SZ anisotropies.
In passing I want to mention that apart from lensing and electron scattering there is a class
of secondary anisotropies which come from the evolution of gravitational potentials over time.
These cause the (late time) ISW effect, the Rees-Sciama effect (also called non-linear
ISW effect) and the moving lens effect. Finally of course all these secondary effects are
combined, for example SZ anisotropies are lensed, and there can be multiple scatterings etc.
These higher order effects are not yet detectable.

18.1 Thermal SZ effect


The tSZ is generated by any hot ionized gas. In CMB maps, the tSZ is visible in particular from
galaxy clusters, which are the largest massive structures in the Universe, formed by gravitational

collapse. Their comoving size is a few Mpc and their angular sizes range from about one arcminute
to about one degree (depending on size and distance). Clusters can be detected in various ways,
e.g. by galaxy surveys in the optical, by tSZ emission, or by X-ray astronomy (Bremsstrahlung
emission of the electrons on the nuclei). The temperature, measured from X-ray, is typically a
few keV.
The thermal SZ effect generated by a gas of electrons at temperature Te leads to a spectral
distortion of the CMB emission law. The difference between the distorted CMB photon
distribution Iν and the original CMB blackbody spectrum Bν (TCMB )

∆Iν = Iν − Bν (TCMB ) (18.1)

can be calculated to be:


\Delta I_\nu = y \, \frac{x e^x}{(e^x - 1)} \left[ \frac{x (e^x + 1)}{(e^x - 1)} - 4 \right] B_\nu(T_{\rm CMB})   (18.2)

where x = hν/kT_{\rm CMB} . The dimensionless parameter y, called the Compton-y parameter, is


proportional to the integral of the electron pressure along the line of sight:
y = \int_{\rm los} \frac{k T_e}{m_e c^2} \, n_e \, \sigma_{\rm thomson} \, dl

where Te is the electron temperature, me the electron mass, c the speed of light, ne the electron
density, and σthomson the Thomson cross section. A multi-frequency CMB detector can measure
the Compton-y map, which is caused by the tSZ effect.

18.2 Matched filter and tSZ stacking


Roughly 80% of the baryons in a cluster are not contained within galaxies, but rather exist as
a cloud of gas bound within the gravitational potential well created by a dark matter halo that
carries the vast majority of the mass of the cluster. Within this well, the dilute gas becomes
ionized and heated to temperatures of millions of Kelvin. Detailed calculations show that the
tSZ effect leads to a decrement of power at frequencies below 220 GHz and extra power at
higher frequencies. This result is redshift independent. A nice illustration of the tSZ sources can
be found in the CMB S4 summer school notebooks.
Finding tSZ sources is a typical application of another generally important method in cosmol-
ogy, the matched filter method. The matched filter is the optimal way to detect a localized
object (e.g. a theoretical template of the cluster profile) that is (linearly) added to noisy data.
It is given by a convolution between the signal profile and the CMB map. In harmonic space, for
a spherically symmetric profile, it can be written as
\psi(\hat{n}) = \sum_{\ell m} \frac{\Theta_{\ell m} S_\ell}{C_\ell + N_\ell} \, Y_{\ell m}(\hat{n}).   (18.3)

where Sℓ is the spherical harmonics transform of the radial profile of the signal S(r) (e.g. the tSZ
profile), Θℓm is the CMB map, and Cℓ +Nℓ is the CMB power spectrum plus the instrumental noise
power spectrum. The output of the matched filter is a “heat map” of detection probabilities,

which has its maxima where a tSZ source exists. A matched filter usually comes with some
parameter to scan over, e.g. the radius of the profile.
Some details on the matched filter method can be found e.g here 2106.03718. An application
of the matched filter for a completely different problem (finding primordial particle production),
and a discussion of why it is optimal, can be found e.g. here 1910.00596 (Sec. 3B).
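A flat-sky sketch of the matched filter, the Fourier-space analogue of Eq. (18.3) (the 2D profile transform Sl2d and the total spectrum Cltot2d per Fourier mode are assumed given):

```python
import numpy as np

def matched_filter_map(map2d, Sl2d, Cltot2d):
    """Heat map of detection significance: filter the data by S_l / (C_l + N_l)."""
    fk = np.fft.fftn(map2d)
    return np.fft.ifftn(fk * Sl2d / Cltot2d).real
```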
For tSZ sources, one often wants to understand signals at the low mass and therefore low signal
to noise end, where the matched filter may not be able to pick up the signal. With an external
catalogue of galaxy clusters, one can co-add the signals from objects in the external catalogue
to boost the signal to noise. This is called tSZ stacking. From the stack, one can then infer
parameters of cluster physics, such as the radial profile of the gas temperature. Stacking local
sources with an external catalogue to enhance SNR is also a generally important technique. For
example, it was recently used to detect a 21cm intensity signal with CHIME (2202.01242).

18.3 Kinetic SZ effect


The kSZ effect is not temperature dependent, and leads to a blackbody contribution to the CMB
in the same way as lensing. At high ℓ ≳ 4000 the kSZ is the dominant contribution to CMB
temperature, as shown in Fig. 18. The kSZ can be used both to probe the gas distribution of
clusters and galaxies, as well as for cosmology. The kSZ temperature is given by the line-of-sight
integral
\Theta_{\rm kSZ}(\hat{n}) \propto \sigma_T \int dr \, n_e(r, \hat{n}) \, v_r(r, \hat{n})   (18.4)

where ne is the electron density and vr is the radial velocity of the structure that contains the
electron (not the velocity caused by temperature). A nice application, which I developed with
my collaborators, is to use this signal to reconstruct the velocity field, by making a template
for ne using a survey of the galaxy density δg . One can then write a quadratic estimator (as in
the case of lensing, but here I chose to work in spherical harmonics) for the velocities which is
schematically
\hat{v}_r(L, M) = N \sum_{\ell, m, \ell', m'} g(L, M, \ell, m, \ell', m') \, \Theta_{\ell m} \, \delta_g(\ell', m', z)   (18.5)

where again we can find the weights g(L, M, ℓ, m, ℓ′ , m′ ) that deliver an unbiased minimum vari-
ance estimator. Here z is the redshift of the galaxy bin. The reconstructed velocity map has
similar cosmological applications as the lensing potential map. This method will be promising
for Simons Observatory. More details can be found in (1707.08129, 1810.13423). The quadratic
estimator is not the only way to do cosmology with the kSZ, a review of methods can be found
in 1810.13423.

19 Foregrounds and foreground cleaning


Reviews of CMB foregrounds, focused on the physics rather than algorithms, are 1606.03606
(on which my foreground discussion is based on) and “CMB foreground: A concise review (by
Kiyotomo Ichiki)”.


Figure 18. The CMB power spectrum ClT T from primary CMB, gravitational lensing, late-time kSZ
(z < 6) and reionization kSZ. We have only shown contributions with blackbody frequency dependence.
Non-blackbody contributions (CIB, tSZ) can be mostly removed using multifrequency analysis. Note that
the kSZ from both late times and reionization is not known very precisely, the curves come from different
theoretical models or simulations. Plot from 1810.13423.

A good review of foreground cleaning and component separation methods is astro-ph/0702198v2.


The Planck papers 1303.5072 (appendices), 1502.01588 and 1807.06208 also review these methods
and show nice illustrations of the raw data and the component separated maps. I can only give
you a glimpse of these methods here.

19.1 Galactic foregrounds of the CMB


Before discussing algorithms, I will briefly list the galactic foregrounds of the CMB, which are
the dominant contribution. There are also extragalactic effects that one can consider a foreground,
in particular the SZ effect, the Cosmic Infrared Background (CIB) (from unresolved infrared
sources) and generally any point sources from astrophysical objects such as radio galaxies,
infrared galaxies, quasars, which are not re-solved by the instruments used in CMB observations.
The foreground situation is different for temperature and polarization. While in temperature,
foregrounds are under good control, in polarization (especially for B-modes at low ℓ) they are
still a formidable obstacle, as evidenced by the incorrect claim for a primordial gravitational wave
detection by BICEP2.
The most important foregrounds are:

• Synchrotron radiation which is emitted by relativistic cosmic ray (CR) electrons, which
are accelerated by the Galactic magnetic field.

• Free-free radiation, or thermal bremsstrahlung, is emitted by free electrons interacting


with ions, in ionised gas.

• Thermal dust radiation is blackbody emission from interstellar dust grains with typical
temperatures T ∼ 20K.

Figure 19. CMB foreground components in temperature (left) and polarization (right). Plot from Planck:
1502.01588, see there for more details.

• Spinning dust radiation is emitted by the smallest interstellar dust grains and molecules,
which can rotate at GHz frequencies.

All of these have different spectral characteristics which is essential for foreground cleaning. While
in temperature, the CMB is of similar amplitude as the foregrounds depending on frequency, in
polarization the foregrounds dominate at all frequencies. A plot of the various components
compared to the primary CMB is shown in Fig.19.

19.2 The ILC algorithm


The most basic foreground cleaning algorithm for the CMB, which is used in several variants
in practice, is the internal linear components (ILC) method. Assume that the measured
temperature anisotropy Ti in a frequency channel i is a sum

Ti = ai s + fi + ni (19.1)

where s is the common signal that we want to estimate (such as the CMB), fi are foregrounds
in channel i and ni is the noise in this channel. The coefficient ai is the frequency dependence or
spectral energy distribution (SED) of the signal. This is the only physical input required
for the ILC. In the case of the CMB this is the known black body spectrum. We also need to
assume that the signal is statistically independent from the noise and foreground. Note that the
signal does not have to be the CMB, it could also be e.g. the tSZ temperature.
This equation above is basis independent. In the real space ILC we work in pixel space so
that
Ti (n̂) = ai s(n̂) + fi (n̂) + ni (n̂) (19.2)
and for the harmonic space ILC we use spherical harmonics
T^i_{\ell m} = a_i s_{\ell m} + f^i_{\ell m} + n^i_{\ell m}   (19.3)

The harmonic space version is optimal if the fields are statistically isotropic, however galactic
foregrounds are not isotropic. The real space ILC on the other hand can deal with statistical

anisotropy but is not suited for scale-dependent behavior. Both advantages can be combined
in the wavelet basis, which is local in position space and harmonic space at the same time,
which results in the Needlet ILC (NILC). NILC is one of Planck’s four component separation
methods.
The ILC is a linear combination of the input maps
\hat{s} = \sum_i w_i T_i   (19.4)

weighted with weights wi , so that ŝ is unbiased and minimum variance. This can be done
with a constrained optimization using a Lagrangian multiplier. The result is

w = \frac{A^T \mathbf{C}^{-1}}{A^T \mathbf{C}^{-1} A}   (19.5)
where A is the vector of the SED coefficients ai . The covariance matrix is estimated from the
data, for example in harmonic space we have
C_{ij} = \frac{1}{2\ell+1} \sum_m T^i_{\ell m} T^{j*}_{\ell m} .   (19.6)

The CMB S4 summer school notebooks have an example of the ILC.
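A minimal numpy sketch of the ILC weights of Eq. (19.5), written in the equivalent column-vector form (the channel covariance C and the SED vector a are assumed given, e.g. estimated via Eq. (19.6)):

```python
import numpy as np

def ilc_weights(C, a):
    """ILC weights w = C^-1 a / (a^T C^-1 a), for covariance C and SED vector a."""
    Cinv_a = np.linalg.solve(C, a)     # solve C x = a instead of inverting C
    return Cinv_a / (a @ Cinv_a)

# the cleaned map is then s_hat = sum_i w[i] * T_i
```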


In the above we have assumed almost nothing about the foregrounds. However, if the SED
of one or more foreground signals are known, we can deproject them from our final
map. This is called constrained ILC. The constrained ILC can sometimes significantly reduce
foreground biases, with only a small increase in variance. Such advanced versions of the ILC
algorithm are discussed e.g. in 2307.01043 (sec. III).

19.3 Component separation


Above we have discussed how to foreground clean the maps to obtain a single “signal” ŝ. More
generally, one wants to split the total temperature anisotropy into several components. Consider
a linear mixture model in pixel space

T(n̂) = As(n̂) + n(n̂) (19.7)

Here T is the vector of observed frequency channels, A is the mixing matrix (describing how
each signal, such as the tSZ, projects into each frequency) and s is the vector of components
that we want to determine. In principle we simply want to invert this equation to obtain the
components s. Because of the noise and the fact that the matrix is in general non-invertible (not
even a square matrix), there is quite a range of possible solutions, reviewed in astro-ph/0702198v2,
depending on various prior assumptions that one can make. Again, there are different possible
bases to work in, such as real space and harmonic space. Solving for s(n̂) can also be done with
an optimizer or using MCMC sampling (e.g. Planck’s Commander pipeline). One can also
include external data for the various signal components, to make useful templates.
Interestingly, it is even possible to determine the components if the mixing matrix is not
known, if the components of the linear mixture can be assumed to be statistically independent.
This is possible because statistical independence is a strong mathematical property and often
a physically plausible one. This direction is called blind separation or independent com-
ponent analysis (ICA). ICA ideas are used in Planck’s SMICA pipeline (Spectral Matching
Independent Component Analysis).

Part IV
Large-Scale Structure
We now move on to 3-dimensional probes of the large-scale structure (LSS) of the universe such
as galaxy surveys. We have access to such 3-dimensional data at a much later time (redshift
z ≲ 10) than the CMB (redshift z ≃ 1100). A major complication compared to the CMB is that
matter evolves non-linearly at these later times, both due to gravitation and due to “baryonic”
physics. Further, most of the matter density δm in the universe is not directly observable. A
large part is contained in dark matter, and even most baryonic matter is contained in dilute gas
rather than luminous stars. To probe most cosmological parameters, ideally we’d like to measure
the matter power spectrum Pm (k), but we can only measure the power spectrum of tracers of
large-scale structures, such as different galaxy populations. We thus need to learn how these
tracers relate to the matter density, which can be done on large enough scales with the
bias expansion. Further, we need to take into account that in cosmology we can only measure
the redshift of galaxies, but not their absolute distance (unless they contain a standard candle).
We thus need to study the topic of redshift space distortions.
In this unit we primarily learn to analyze galaxy survey data (but other 3-dimensional probes
of the universe work almost the same). Galaxy survey data comes in two broad classes: Spec-
troscopic galaxy surveys take a spectrum of each galaxy, to obtain a precise redshift. Pho-
tometric galaxy surveys take pictures of the sky in several wavelengths, which allows a rough
determination of the redshift. The latter is much easier to do experimentally, so the galaxy
sample sizes are much larger, but on the other hand the lack of precise distances loses a lot of
information. Both survey types have different strengths. The geometry of space (dark energy)
can best be probed with spectroscopic surveys which deliver precise BAO measurements.

Further reading
The general references of Unit 1 all contain a discussion of galaxy surveys. In addition I recom-
mend

• The classic review “Large-Scale Structure of the Universe and Cosmological Perturbation
Theory” astro-ph/0112551.

• For the connection between matter and galaxies, the galaxy bias, there is the review “Large-
Scale Galaxy Bias” 1611.09787

• Hannu Kurki-Suonio’s lecture notes “Galaxy Survey Cosmology”, https://www.mv.helsinki.fi/home/hkurkisu/

• Specifically on EFT of LSS there are lecture notes from Senatore, Baldauf, Ivanov, and
Philcox.

20 The galaxy power spectrum at linear scales
We start with a discussion of the linear galaxy power spectrum. We will introduce galaxy
bias, shot noise and redshift space distortions, but defer a more detailed discussion to later. This
section follows Dodelson-Schmidt chapter 11. Later we will extend our discussion to non-linear
scales.

20.1 Linear galaxy bias


On large scales, it turns out that the density perturbation of the galaxies, δg , is related by
a constant linear galaxy bias to the matter density perturbation δm :

δg (x, τ ) = b1 (τ )δm (x, τ ) (20.1)

Here b1 means that this is the first order bias. The bias depends on conformal time τ or
equivalently redshift z. We will briefly discuss the derivation of this result, as well as higher
order biases, later. The bias depends sensitively on the galaxy sample considered and is in
general redshift dependent. A typical galaxy bias for a survey like DESI could be b1 ∼ 2, i.e.
the overdensities of galaxies are twice as large as those of matter.

20.2 Shot noise


In addition to having a bias, a further difference between the matter field and the galaxy field is
that the latter is a point cloud rather than a continuous field. An approximate way to think
about this is that the galaxy field is a Poisson sampling where the mean in each volume element
of space is modulated by the underlying biased matter density. This leads on large scales to a
galaxy field

δg (k) = bδm (k) + n(k) (20.2)

where n is white noise (i.e. pixels have uncorrelated noise). In terms of the power spectrum we
get

Pg (k) = b2 Pm (k) + N (k) (20.3)

where the (shot-) noise is approximately inverse to the comoving galaxy density
N(k) = 1/n̄g    (20.4)

Note in particular that shot noise is flat in k. While the 1/n̄g approximation is not very precise,
especially at high halo density (where halos may not form independently of each other), the fact
that the noise is flat on large scales holds to good approximation.
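To get a feeling for the numbers, here is a short back-of-the-envelope sketch in Python (the number density and power amplitude are illustrative assumptions, not survey specifications):

    # Shot noise vs. clustering power, order-of-magnitude sketch.
    nbar = 3e-4               # assumed comoving galaxy density in (h/Mpc)^3
    N = 1.0 / nbar            # shot noise ~ 3300 (Mpc/h)^3, flat in k
    b1 = 2.0                  # linear bias, cf. Sec. 20.1
    Pm = 1e4                  # assumed matter power near k ~ 0.05 h/Mpc, in (Mpc/h)^3
    print(N, b1**2 * Pm / N)  # clustering-to-shot-noise ratio per mode ~ 12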

20.3 Velocity field on large scales
We also need to know the velocity perturbations on large scales, which are the source of redshift
space distortions. On linear scales the matter velocity and matter density perturbations are related by
um (k, τ) = f aH (ik/k²) δ(k, τ)    (20.5)
We will derive this result in Sec. 21.2.2. The factor f is called the linear growth rate. The
growth rate is close to unity for a ΛCDM universe and exactly 1 for a flat matter-dominated
cosmology. Notice that the velocity in Fourier space points along the direction of the wavevector k.

20.4 Redshift space


A galaxy survey measures the sky angles (θ, ϕ) and the redshift z of galaxies. The position of a
galaxy in configuration space (position space) is

x(z, θ, ϕ) = χtrue n̂(θ, ϕ) (20.6)

where χtrue is the true (not measurable) comoving distance of the galaxy. We define the 3-
dimensional position of the galaxy in redshift space as

xobs (z, θ, ϕ) = χ(zobs )n̂(θ, ϕ) (20.7)

The distance χ(z) is the comoving distance at redshift z (if z were only due to the Hubble
expansion). The function χ(z) depends on cosmological parameters, and we evaluate it at some
fiducial cosmological parameters. The fact that these parameters are not exactly known is
also important, and leads to the Alcock-Paczynski effect that we will discuss below. For now
assume that the cosmological parameters are known.
In reality, galaxies do move with respect to the background frame and their redshift is given
by the Hubble flow and their peculiar velocity u as
1 + z = (1/aem ) (1 + ug · n̂) = (1/aem ) (1 + u∥ )    (20.8)
where aem is the scale factor at which the light from the galaxy was emitted (the above is a
non-relativistic approximation, galaxies don’t move faster than ∼ 1% of the speed of light). The
observed position of the galaxy in redshift space xobs is thus given by a correction ∆xRSD to
the true position x of the galaxy as

xobs = x + ∆xRSD    (20.9)
     = x + (u∥ (x)/(aH)) n̂    (20.10)
where RSD means redshift space distortion. Of course, we don’t know u∥ of a given galaxy.
Redshift space distortions are not only a problem: because they are sensitive to velocities,
they can also be used to constrain cosmology, in particular through the growth rate.

20.5 Redshift space distortions of the density field
To measure cosmological parameters, we need to be able to calculate the observed galaxy power
spectrum with RSD included. On linear scales, this effect was derived by Kaiser in 1987 and
leads to the Kaiser redshift space term. We will only summarize the calculation, see Dodelson for
more details. The starting point is the observation that, since RSD neither creates nor destroys
galaxies, the densities in redshift space and configuration space must be related by

ng,obs (xobs )d3 xobs = ng (x)d3 x (20.11)

We can write the volume element in spherical coordinates as d³x = x² dx dΩ and d³xobs =
x²obs dxobs dΩ, where dΩ is the same in both coordinates. Therefore the densities are related by a
Jacobian J as

ng,obs (xobs ) = J ng (x) (20.12)

with
J ≡ d³x/d³xobs = (dx/dxobs ) (x²/x²obs )    (20.13)

The Jacobian can be calculated and simplified (on large scales) to

J ≈ 1 − (1/(aH)) ∂u∥ /∂x    (20.14)
Writing the galaxy density as ng = n̄g (1 + δg ), it follows (to first order in perturbations) that

1 + δg,obs (xobs ) = 1 + δg (x[xobs ]) − (1/(aH)) ∂u∥ /∂x (x[xobs ])    (20.15)

We now have the building blocks that we need to calculate the galaxy power spectrum. We
first note that in the above equation we can set xobs = x at lowest order in the perturbations.
This is because expanding the arguments of δg and u would lead to higher order terms that
would be small. We also use linear galaxy bias to express δg in terms of δm . We can also set
the galaxy velocity ug equal to the matter velocity um . Physically this is because the velocities are
sourced by the attraction of all the matter in the universe, not just that of galaxies. With these
approximations we get
δg,RSD (x) = b1 δm (x) − ∂/∂x [ (um (x) · x̂)/(aH) ]    (20.16)
Next we introduce the distant observer approximation, also called the plane parallel
approximation. The idea is to take the line of sight x̂ to agree with the z-axis and treat it as
fixed, neglecting changes from galaxy to galaxy. This is justified for galaxies that are relatively
nearby on the sky. We can then replace um (x) · x̂ → um (x) · êz . Using the distant observer
approximation we can evaluate the Fourier transform δg,RSD (k) as follows:

δg,RSD (k) = ∫ d³x e^{−ik·x} [ b1 δm (x) − ∂/∂x ( (um (x) · êz )/(aH) ) ]    (20.17)

which can be evaluated, using Eq. 20.5 for the velocities, to give

δg,RSD (k) = [b1 + f µ2k ]δm (k) (20.18)

where µk = êz · k̂ is the cosine of the angle between the line of sight and the wavevector of the
perturbation. This is called the Kaiser redshift space distortion. The apparent overdensity in
redshift space is thus larger than in configuration space (except for transverse perturbations, where µk = 0).

20.6 Redshift space distortions of the galaxy power spectrum


By squaring the RSD galaxy density contrast, and reintroducing shot noise, we find that the
linear power spectrum is given by
Pg,RSD (k, µk , z) = PL (k, z) [b1 + f µ²k ]² + PN    (20.19)

The redshift dependent power spectrum is usually expanded as

P^(l)_g,RSD (k) = ((2l + 1)/2) ∫_{−1}^{1} dµk Pl (µk ) Pg,RSD (k, µk )    (20.20)

using Legendre polynomials (as appropriate for an azimuthally symmetric function). The power
spectrum is then
Pg,obs (k, µk ) = Σ_l Pl (µk ) P^(l)_g,obs (k)    (20.21)

By plotting the monopole (l = 0), quadrupole (l = 2) and hexadecapole (l = 4) (the other ones
are negligibly small), one can avoid plotting the µ dependence. Most of the signal-to-noise is in
the monopole.
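The multipole integrals in Eq. (20.20) are easy to check numerically against the well-known closed-form Kaiser results; a Python sketch (the values of b1 and f are purely illustrative):

    import numpy as np
    from numpy.polynomial.legendre import Legendre
    from scipy.integrate import quad

    b1, f = 2.0, 0.8   # illustrative values

    def kaiser_multipole(l, b1, f):
        # P_l / P_L from Eq. (20.20), with the shot noise P_N set to zero
        Pl = Legendre.basis(l)
        integrand = lambda mu: (2*l + 1) / 2 * Pl(mu) * (b1 + f * mu**2)**2
        return quad(integrand, -1, 1)[0]

    print(kaiser_multipole(0, b1, f), b1**2 + 2*b1*f/3 + f**2/5)   # monopole
    print(kaiser_multipole(2, b1, f), 4*b1*f/3 + 4*f**2/7)         # quadrupole
    print(kaiser_multipole(4, b1, f), 8*f**2/35)                   # hexadecapole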

20.7 Alcock–Paczynski effect


An additional distortion to the observed power spectrum comes from the fact that the cosmo-
logical parameters are not precisely known (in fact, we want to measure them from the power
spectrum). Therefore our fiducial relation between χ and z has some error

χfid (z) = χ(z) + δχ(z) (20.22)

One can again propagate this error through the Jacobian as we did in our derivation of RSD.
This leads to an additional anisotropy of the measured power spectrum. The derivation can be
found in Dodelson-Schmidt 11.1.3.

20.8 Redshift-binned angular correlation functions


For photometric surveys, we don’t have precise individual redshifts for galaxies. Instead, the
photometry can only be used to split galaxies roughly into redshift bins. For each redshift bin,
the redshift density of sources is given by a window function of the form
W(χ) = (1/Ng ) dNg /dχ    (20.23)

where W is normalized to unity and drops to zero outside of the bin.
The angular galaxy density in the bin is then given by
∆g (n̂) = ∫₀^∞ dχ W(χ) δg,obs (x = n̂χ, τ = τ0 − χ)    (20.24)

Going to multipole space we get


∆g,lm = 4π i^l ∫ d³k/(2π)³ Y*lm (k̂) ∫₀^∞ dχ W(χ) jl (kχ) δg,obs (k, τ(χ))    (20.25)

The power spectrum

⟨∆g,lm ∆∗g,l′ m′ ⟩ = δll′ δmm′ Cg (l) (20.26)

is given by
Cg (l) = (2/π) ∫ k² dk ∫₀^∞ dχ W(χ) jl (kχ) ∫₀^∞ dχ′ W(χ′ ) jl (kχ′ ) Pg,obs (k, τ(χ), τ(χ′ )).    (20.27)

Note that this includes a non-equal time power spectrum, which takes into account the different
times probed due to the light-cone. In the so-called Limber approximation this becomes
 
Cg (l) = ∫ (dχ/χ²) W²(χ) Pg,obs (k = (l + 1/2)/χ, τ(χ))    (20.28)
This approximation avoids evaluating the Bessel function integrals and is used in many cosmology
papers. The Limber approximation is valid if the radial extent of the bin is much larger than the
physical scale corresponding to the multipole l under consideration. More about the accuracy of the
Limber approximation can be found in 0809.5112.
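A minimal numerical implementation of the Limber integral in Eq. (20.28) is sketched below; the window, distance grid and the constant power spectrum are stand-ins for survey-specific inputs:

    import numpy as np
    from scipy.integrate import trapezoid

    def limber_cl(ells, chi, W, P):
        # Eq. (20.28): C(l) = int dchi / chi^2  W^2(chi)  P(k = (l+1/2)/chi, chi)
        cls = []
        for l in ells:
            integrand = W**2 / chi**2 * P((l + 0.5) / chi, chi)
            cls.append(trapezoid(integrand, chi))
        return np.array(cls)

    chi = np.linspace(500.0, 1500.0, 512)           # toy comoving-distance bin in Mpc/h
    W = np.exp(-0.5 * ((chi - 1000.0) / 100.0)**2)
    W /= trapezoid(W, chi)                          # normalize to unity
    cls = limber_cl(np.arange(10, 500, 50), chi, W, lambda k, chi: 1e4)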

21 Overview of LSS Perturbation Theory


In the next sections we study structure formation, i.e. how the universe grows from small ini-
tial inhomogeneities to the cosmic web we observe today. This process can be treated to good
approximation using Newtonian physics on a flat expanding spacetime. Newtonian gravity is
indeed used in the vast majority of research about structure formation, with “relativistic correc-
tions” being an active topic of research. Both perturbative calculations in large-scale structure
and N-body simulations can be done to high precision while neglecting relativistic effects (but
taking into account the expansion of spacetime of course). We start with a discussion of analytic
perturbation theory. Perturbation theory, in the modern EFT version, is still the state of the art
in extracting cosmological parameters from large-scale structure surveys.

21.1 Fluid approximation


In Newtonian perturbation theory our goal is to calculate how the matter density ρ(x) evolves in
time. We will make the approximation that matter is a collisionless fluid, i.e. that it consists of a
continuous matter density that interacts only gravitationally. Clearly, the observed matter in the
universe looks very different, it contains different forms of matter and clumps into galaxies which


Figure 20. The standard deviation of the density field, Eq. (21.3), when smoothed over different scales R,
where R is the width of the smoothing filter in position space, at redshift z = 0. The value at R = 8h−1 Mpc
is the definition of the common cosmological parameter σ8 .

have complicated non-gravitational physics. Nevertheless, the collisionless fluid approximation
works on large enough scales and can be systematically improved on intermediate scales to include
complicated small-scale physics in the effective field theory of large-scale structure, which we will
outline below.
Let’s first see why perturbation theory is possible and on what scales. To do so, we define
the filtered density field δW (x),
Let’s first see why perturbation theory is possible and on what scales. To do so, we define
the filtered density field δW (x),
δW (x) = ∫ d³y W(|x − y|) δm (y),    (21.1)

where W (x) is the filtering kernel that we can take to be isotropic. This filtering corresponds to
a multiplication in Fourier space:

δW (k) = W (k)δm (k), (21.2)

where W (k) is the Fourier transform of the isotropic filtering kernel, such as a real-space tophat.
The variance of the filtered field is
σ²W ≡ ⟨δ²W (x)⟩ = ∫ d³k/(2π)³ ∫ d³k′/(2π)³ ⟨δW (k) δ*W (k′ )⟩ e^{i(k−k′)·x}    (21.3)
               = ∫ d³k/(2π)³ PL (k) |W(k)|²    (21.4)
               = (1/2π²) ∫ d ln k k³ PL (k) |W(k)|².    (21.5)

which is plotted in Fig. 20 as a function of the smoothing scale. We see at which scales the
perturbations become smaller than unity, indicating that a perturbative expansion in δk is possible.
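Given a tabulated linear power spectrum (e.g. from CLASS, see Sec. 9.4), the curve in Fig. 20 can be reproduced with a few lines; a sketch assuming arrays k and Pk on the same grid:

    import numpy as np
    from scipy.integrate import trapezoid

    def sigma_R(R, k, Pk):
        # Eq. (21.5) with a real-space tophat window W(kR) = 3 (sin x - x cos x) / x^3
        x = k * R
        W = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
        integrand = k**3 * Pk * W**2 / (2.0 * np.pi**2)
        return np.sqrt(trapezoid(integrand, np.log(k)))

    # e.g. sigma8 = sigma_R(8.0, k, Pk) for the z = 0 linear spectrum, with k in h/Mpc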

21.2 Standard (Eulerian) Perturbation Theory


We now briefly review cosmological perturbation theory. My discussion follows the introduction
by Philcox. The standard review is astro-ph/0112551.

21.2.1 Equations of motion
To describe the universe as a fluid we need the following variables

• δ(x, τ ): Overdensity of matter, related to the density ρ(x, τ ) by δ(x, τ ) = ρ(x, τ )/ρ̄(τ ) − 1

• v(x, τ ): Fluid velocity. Note that in the fluid approximation we cannot describe a situation
where matter clumps of different velocity pass through each other.

• ϕ(x, τ ): Peculiar gravitational potential (corrected for the background expansion)

• σij (x, τ ): Viscous stress tensor. σij = 0 for a perfect fluid, which we consider here, but it
becomes important in the EFTofLSS.

In the collisionless fluid approximation we consider the total matter distribution, dark matter
and baryons, together.
The equations of motion, which can be derived from the collisionless Boltzmann equation,
in the Newtonian limit, are

• The Continuity equation

δ̇(x, τ ) + ∇ · [(1 + δ(x, τ ))v(x, τ )] = 0 (21.6)

Here and below dots indicate derivatives with respect to conformal time.

• The Euler equation

v̇(x, τ ) + [v(x, τ ) · ∇]v(x, τ ) = −H(τ )v(x, τ ) − ∇ϕ(x, τ ) (21.7)

where H = aH is the comoving Hubble parameter. The Euler equation is the equivalent to
F = ma for a fluid element. The left hand side is the “convective time derivative” and the
right hand side has a force term due to the gravitational potential and a term due to the
Hubble expansion.

• The Poisson equation


∇²ϕ(x, τ) = 4πG a²(τ) ρ̄(τ) δ(x, τ) = (3/2) H²(τ) Ωm (τ) δ(x, τ)    (21.8)

One can solve these equations perturbatively on scales where the perturbations are small,
so that the perturbative expansion converges.

21.2.2 Linear solutions


To find the linear solution that describes the universe on large scales, we linearize the fluid
equations, i.e., we drop any terms of second or higher order in {δ, v, ϕ}. Introducing the velocity
divergence
θ(x, τ) = ∇ · v(x, τ)    (21.9)

this yields the following equations for the first-order fields, δ1 , v1 :

θ1 (x, τ) = −δ̇1 (x, τ)    (21.10)

δ̈1 (x, τ) + H(τ) δ̇1 (x, τ) − (3/2) H²(τ) Ωm (τ) δ1 (x, τ) = 0.    (21.11)
where we eliminated the peculiar potential. These are solved by a separable solution, such that

δ1 (x, τ ) = D(τ )δL (x) (21.12)


θ1 (x, τ ) = −H(τ )f (τ )D(τ )δL (x), (21.13)

where δL (x) is the linear density field set by inflation (and k-dependent transfer functions that
take into account mode evolution in the early universe, see Sec. 9.4.1). This drops a “decaying
mode”. The growth factor is given by the integral solution
D(τ) = D0 H(τ) ∫₀^{a(τ)} da′ / H³(a′),    (21.14)

where D0 ensures the normalization condition D(a = 1) = 1 today. For an Einstein-de-Sitter
universe (with Ωm = 1), D(τ) is simply the scale factor a(τ). For the velocity, we introduced
the (velocity) growth rate

f(τ) ≡ d log D(τ) / d log a    (21.15)

We see that densities evolve according to D(τ ) while velocities are enhanced by a factor of
H(τ )f (τ ).
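Both D and f are straightforward to evaluate numerically; below is a minimal Python sketch of Eqs. (21.14) and (21.15) for a flat ΛCDM background (the density parameters are illustrative, and the normalization absorbs constant prefactors):

    import numpy as np
    from scipy.integrate import quad

    Om, OL = 0.31, 0.69                      # flat LCDM, illustrative

    def E(a):                                # H(a)/H0
        return np.sqrt(Om / a**3 + OL)

    def D_unnorm(a):                         # Eq. (21.14) up to normalization
        integral, _ = quad(lambda ap: 1.0 / (ap * E(ap))**3, 1e-8, a)
        return E(a) * integral

    D0 = D_unnorm(1.0)
    D = lambda a: D_unnorm(a) / D0           # D(a = 1) = 1

    def f(a, eps=1e-4):                      # growth rate, Eq. (21.15)
        return (np.log(D(a*(1+eps))) - np.log(D(a*(1-eps)))) / (2*eps)

    print(D(0.5), f(1.0))                    # f(1) is close to Om**0.55 ~ 0.52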
Switching to Fourier-space, we obtain:

δ1 (k, τ ) = D(τ )δL (k) (21.16)


θ1 (k, τ ) = −H(τ )f (τ )D(τ )δL (k), (21.17)

and

v(k, τ) = i (k/k²) θ(k, τ).    (21.18)

It follows that the linear-order matter power spectrum is given by


P^SPT_linear (k, τ) = ⟨δ1 (k, τ) δ1 (−k, τ)⟩′ = D²(τ) PL (k),    (21.19)

where PL (k) is the power spectrum of the initial conditions. We have dropped a momentum-
conserving Dirac delta function (indicated by the prime in the expectation value ⟨⟩′ , as is often
done).

21.2.3 General perturbative solution


The general perturbative solution works by first rewriting the fluid equations in Fourier space,
and then making a series solution, expanding the equations order-by-order in the (assumed small)

parameters δ and θ. Explicitly, we begin with the series solutions

δ(k, τ) = Σ_{n=1}^∞ Dⁿ(τ) δ^(n)(k)    (21.20)

θ(k, τ) = −H(τ) f(τ) Σ_{n=1}^∞ Dⁿ(τ) θ^(n)(k),    (21.21)

where the n-th order solution contains n copies of the linear solution, δ (1) (k) = δL (k). We
have assumed separability in time and space which is an excellent approximation (and exact
for Einstein de-Sitter universes), though deviations can occur at high order. The n-th order
contribution takes the form:
δ^(n)(k) = ∫_{p1...pn} Fn (p1 , . . . , pn ) δ^(1)(p1 ) . . . δ^(1)(pn ) (2π)³ δD (p1 + . . . + pn − k),    (21.22)

θ^(n)(k) = ∫_{p1...pn} Gn (p1 , . . . , pn ) δ^(1)(p1 ) . . . δ^(1)(pn ) (2π)³ δD (p1 + . . . + pn − k).    (21.23)

This is the convolution of n linear density fields with a kernel, Fn or Gn . The kernels up to
second order are given by:
F1 (p) = 1,  F2 (p1 , p2 ) = (5/7) α(p1 , p2 ) + (2/7) β(p1 , p2 ),    (21.24)
G1 (p) = 1,  G2 (p1 , p2 ) = (3/7) α(p1 , p2 ) + (4/7) β(p1 , p2 )    (21.25)
where
α(p1 , p2 ) = (p1 · k)/p1² ;  β(p1 , p2 ) = k² (p1 · p2 )/(2 p1² p2²) ;  k = p1 + p2 .    (21.26)
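As written, F2 in Eq. (21.24) is not symmetric under p1 ↔ p2; the sketch below symmetrizes it explicitly and checks the result against the familiar closed form 5/7 + (µ/2)(p1/p2 + p2/p1) + (2/7)µ²:

    import numpy as np

    def alpha(p1, p2):
        k = p1 + p2
        return np.dot(p1, k) / np.dot(p1, p1)

    def beta(p1, p2):
        k = p1 + p2
        return np.dot(k, k) * np.dot(p1, p2) / (2 * np.dot(p1, p1) * np.dot(p2, p2))

    def F2(p1, p2):
        # symmetrized second-order density kernel, Eq. (21.24)
        return 0.5 * (5/7 * alpha(p1, p2) + 2/7 * beta(p1, p2)
                      + 5/7 * alpha(p2, p1) + 2/7 * beta(p2, p1))

    rng = np.random.default_rng(1)
    p1, p2 = rng.normal(size=3), rng.normal(size=3)
    r1, r2 = np.sqrt(p1 @ p1), np.sqrt(p2 @ p2)
    mu = (p1 @ p2) / (r1 * r2)
    print(F2(p1, p2), 5/7 + mu/2 * (r1/r2 + r2/r1) + 2/7 * mu**2)   # should agree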

These integrals in general have to be evaluated numerically. The integrals in principle go up to
infinite momenta, where perturbations are not small and physics is not perturbative. This is an
inconsistency in SPT, that is fixed in the effective field theory of large-scale structure,
which starts from a smoothing of the underlying fields. We’ll get back to this issue in Sec. 22.
The above equations in principle allow us to compute density (and velocity) field statistics at
arbitrary order. The most important and basic one is the equal-time power spectrum, P (k, τ ) =
⟨δ(k, τ )δ(−k, τ )⟩ which can be written in terms of δ n correlators as
P^SPT (k, τ) = D²(τ) P^(11)(k) + D⁴(τ) [ P^(13)(k) + P^(22)(k) ] + . . .    (21.27)

where P (ij) (k) = ⟨δ (i) (k)δ (j) (−k)⟩, and we have assumed Gaussian initial conditions, such that
any correlator involving an odd number of linear density fields vanishes. In the same way we can
compute higher-order correlators. The next most important one is the three-point function, or
bispectrum, which at lowest order is given by

B(k1 , k2 , k3 , τ ) = ⟨δ(k1 , τ )δ(k2 , τ )δ(k3 , τ )⟩′ = D4 (τ )B (211) (k1 , k2 , k3 ) + . . . (21.28)

with higher-order contributions containing loop integrals over the linear power spectrum.

21.3 Lagrangian Perturbation theory (LPT)
There is a second important way to perform perturbation theory, in a different set of variables.
This is the Lagrangian formulation. Let’s briefly outline this approach. One can describe a fluid
in two ways:

• In the Eulerian picture described above, we describe the matter density ρ(x, t) and the
velocity field v(x, t) as a function of a fixed spatial coordinate x.

• In the Lagrangian picture, instead of working with densities, we describe the movement of
particles (or fluid elements) from their initial comoving coordinate q to their later comoving
Eulerian coordinate x by defining the displacement field Ψ(q, τ ) so that

x(τ ) = q + Ψ(q, τ ).

All coordinates are comoving, so the expansion of the Universe does not change them. Note
that Ψ = 0 initially so that q is the same as the usual comoving coordinate at initial time,
τ = 0. Once we have calculated the displacement field, using Lagrangian perturbation
theory, we can estimate the observable density field ρ(x, t) from it.

Lagrangian perturbation theory looks similar to SPT, i.e. we can calculate a series solution
of form

Ψ(q, τ) = Σ_{n=1}^∞ Dⁿ(τ) Ψ^(n)(q),    (21.29)

As in the Eulerian case, the n-th order solution can be written as a convolution over n copies of
the linear density field δL :
Ψ^(n)(k) = (i/n!) ∫_{p1...pn} Ln (p1 , . . . , pn ) δL (p1 ) . . . δL (pn ) (2π)³ δD (p1 + . . . + pn − k),    (21.30)

However, the integrals over the kernels are in general harder to evaluate than those of SPT.
Some comments on the relation of Eulerian and Lagrangian PT:

• The first order LPT solution called the Zeldovich approximation and its second order
extension called 2-LPT are remarkably good at reproducing the full non-linear density
field at intermediate scales. They outperform the first and second order SPT solutions
substantially. However, by including so called IR resummation, one can improve SPT and
ultimately both Eulerian and Lagrangian perturbation theory give equivalent results (see
e.g. Senatore’s EFTofLSS lecture notes).

• The Zeldovich approximation (1-LPT) and 2-LPT are used to set up initial conditions for
N-body simulations. N-body simulations track particles, so the use of Lagrangian particle
displacements makes intuitive sense. The reason why N-body simulations need perturbation
theory is to set up initial particle displacements (of equal mass particles) that incorporate
the initial inhomogeneities from inflation, as well as to speed up computation time by
treating small density fluctuations analytically until they grow sufficiently to require N-
body simulation.

Figure 21. Plots showing a comparison of N-Body data (black boxes) with theoretical SPT power spectra
at tree level (dotted), one loop (solid red), and two loop (dashed blue) orders. The left and right plots
show the comparison at redshifts 0 and 1 respectively. Each curve has been divided by the no-wiggle
(broadband) power spectrum for clarity of range. The plots are taken from 0905.0479.

• Observations only provide us with Eulerian densities, since we cannot look back in time to
observe the movement of a chunk of matter to its initial position. Observations are thus
closer to Eulerian theory. However N-body simulations readily provide both displacement
fields and densities.

The dual description of structure formation in the Eulerian and Lagrangian picture continues
to be important even for machine learning based methods. For example, a neural network struc-
ture formation emulator can either be trained to output Eulerian density fields ρ(x) or to output
the displacement field ψ of particles, and indeed both have been tried.
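As a simple illustration of the Lagrangian picture, the sketch below computes the first-order displacement Ψ(k) = (ik/k²) δL(k) on a periodic grid, i.e. a toy version of what a Zeldovich initial-conditions generator does (not production code):

    import numpy as np

    def zeldovich_displacement(delta_L, boxsize):
        # 1-LPT displacement Psi(k) = (i k / k^2) delta_L(k) on a periodic grid
        n = delta_L.shape[0]
        k1d = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
        kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        k2[0, 0, 0] = 1.0                  # avoid 0/0; the k = 0 mode carries no displacement
        dk = np.fft.fftn(delta_L)
        psi = []
        for ki in (kx, ky, kz):
            psik = 1j * ki / k2 * dk
            psik[0, 0, 0] = 0.0
            psi.append(np.real(np.fft.ifftn(psik)))
        return np.stack(psi)               # shape (3, n, n, n)

    # particles on the initial grid q are then moved to x = q + D(tau) Psi(q)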

22 Effective Field Theory of Large-Scale Structure*


*This section was developed and taught by Sai Chaitanya Tadepalli.

22.1 Problems with SPT


In previous sections, we discussed the Standard Perturbation Theory (SPT) of the matter over-
density in an expanding universe during the matter-domination era. The derivation of SPT
inherently assumes that the distribution of matter on large scales can be treated as a pressureless
and collisionless fluid. Clearly this assumption has certain drawbacks and fails to accurately
predict the matter power spectrum even on large scales where it is supposed to perform very well
(i.e. on scales where the matter overdensity variance is much less than unity and hence certainly
perturbative).
To visualize the performance of SPT, consider the plot shown in Fig. 21 where we show the
SPT matter power spectrum fitting at linear, one, and two-loop order to the data obtained from
numerical N-Body simulations. At the outset, we observe that the SPT performs well at very
large scales (k ∼ O(10)H0 /c ≈ 0.003h/Mpc) where the residual is sub-percent. On these scales,

Figure 22. Top panel: z = 0 matter overdensity power spectrum in a 1D CDM-like model calculated
analytically using linear theory, the Zeldovich approximation (LPT at any order), and SPT to the specified
order in the overdensity. Note that even a 20 loop order SPT does not perform any better than a two loop
SPT. Figure is taken from 1502.07389.

the universe is close to a perfect fluid and traces the initial conditions very well (which implies
minimal mode mixing). However, linear SPT begins to fail at scales ∼ O(0.01) h/Mpc at z = 0.
One might consider adding the next order terms in PT, the one-loop terms P (13) and P (22) , to
our theoretical fit to improve the range of SPT. This is shown by the solid red curve in Fig. 21.
Clearly, the one-loop SPT does not improve our fit better than the linear theory. Here, we may
be tempted to add higher loop order terms such as second and third to improve the fit. This
is shown in the dashed blue curve where we show the performance of SPT up to two loops.
Interestingly, adding higher-order loops does not improve our fit. The fitted curve appears to
oscillate around the true data points. This exercise when carried up to as large as 10 loop orders
reveals a similar pattern, as illustrated in Fig. 22. Hence, we deduce that SPT fails to fit the
nonlinear matter power spectrum already on scales k ≪ 0.3 h/Mpc ≡ kNL (z = 0). Therefore, SPT needs
to be improved.
When deriving SPT, the solution to the nonlinear coupled equations (Euler, Poisson and con-
tinuity) of the matter overdensity contrast δ(k) in Fourier space was given in terms of corrections
to the linear solution δ (1) :

δ(k) = δ (1) (k) + δ (2) (k) + δ (3) (k) + ... (22.1)

where each nonlinear correction is given by


δ^(n)(k) = ∫ d³q1 ... d³qn δ³D (k − Σi qi ) Fn (q1 , ..., qn ) δ^(1)(q1 ) ... δ^(1)(qn )    (22.2)

with Fn (..) the symmetrized kernel of the nth-order solution.
The above expansion hinges on the assumption of perturbativity which requires that each
nth order correction must be smaller than the (n − 1)th order term. This is needed for the
perturbative solution to exist in the first place. However, as we will show below, the loop terms
inherently contain contributions from the internal momenta (or modes) where our perturbation
theory is bound to break down. This lack of a clear small expansion parameter in the SPT is the
prime reason for its failure. To this end, consider the one-loop term P (13) as given below
P^(13)(k) = ⟨δk^(1) δp^(3)⟩′ + ⟨δk^(3) δp^(1)⟩′    (22.3)
          = 6 P^(11)(k) ∫ d³q/(2π)³ F3 (k, q, −q) P^(11)(q)    (22.4)

where P^(11)(k) = ⟨δk^(1) δp^(1)⟩′ is the linear matter power spectrum and ′ denotes that we have
absorbed the Dirac delta function and the factor of (2π)³. On very large scales, i.e. in the limit
k → 0, the kernel F3 → k²/q². Hence, we find the UV (k/q → 0) limiting behavior of P^(13) is

lim_{k→0} P^(13)(k) ≈ − (61/630π²) k² P^(11)(k) ∫₀^∞ dq P^(11)(q).    (22.5)

The integral in the above expression goes over all internal momenta q and hence the integrand
P (11) (q) is evaluated on very small scales. This raises serious concerns as the linear power
spectrum P (11) is not valid on scales beyond ∼ kN L and yet we are summing over all scales down
to those of individual galaxies, stars, planets and even dust!
The concerns over the summation of internal momenta over very small scales leads to a related
issue with the SPT. Let us consider that the linear power spectrum can be approximated by a
power-law form
P (11) (k) ∝ k n . (22.6)
This is an excellent approximation on very large and quasi-linear scales where n ≈ 1 and ≈ −1.5
respectively. These values are reflective of our universe where the initial conditions are governed
dominantly by adiabatic initial (primordial) conditions. Upon substituting the power-law linear
power spectrum into the UV limiting integral for P (13) , we find that the integral diverges if
n ≥ −1. Fortunately, for adiabatic initial conditions, the spectral index n → −3 as k → ∞ and
hence the integral converges. However, in models extending beyond a pure adiabatic assumption,
such as those featuring a small fraction of primordial large blue-tilted (n > −1) isocurvature
fluctuations, the integral becomes divergent. Consequently, the SPT fails to meet the general
requirement of applicability to arbitrary initial conditions (see 1301.7182 for details).
In the next subsection, we will show how these problems can be ameliorated by the EFTofLSS
formalism.

22.2 Coarse graining and effective fluid


The primary goal of the EFTofLSS program is straightforward: to develop a consistent pertur-
bation theory for the expanding Universe which is convergent, accurate, and can be applied in
the presence of arbitrary initial conditions.

The key problem in the SPT was the evaluation of loop integrals over scales where the internal
propagator (linear power spectrum) is known to break down. Hence, a straightforward solution is
to regulate these integrals by evaluating them up to a finite cutoff scale Λ. Similar to the ‘cutoff
regularization’ in QFT, we can evaluate the UV limit of the one loop term P (13) up to a scale Λ
as
lim_{k→0} P^(13)(k, Λ) ∼ k² P^(11)(k) ∫₀^Λ dq P^(11)(q).    (22.7)

By choosing Λ ≪ kNL we are guaranteed that the integrand P (11) (q) is evaluated on perturbative
scales. This seemingly simple solution has an inherent problem. By choosing an arbitrary cutoff
scale Λ, we have made our final SPT evaluations Λ-dependent. This is easy to observe for choices
of cutoff scales Λ1 < Λ2 ≪ kNL such that
P^(13)(k, Λ2 ) = P^(13)(k, Λ1 ) + 6 P^(11)(k) ∫_{Λ1}^{Λ2} d³q/(2π)³ F3 (k, q, −q) P^(11)(q).    (22.8)

In the limit k ≪ Λ1 , we can approximate the integral in above expression using the UV limit
derived earlier. Hence we obtain
P^(13)(k, Λ2 ) = P^(13)(k, Λ1 ) − constant × k² P^(11)(k) ∫_{Λ1}^{Λ2} dq P^(11)(q),    (22.9)
              = P^(13)(k, Λ1 ) − k² P^(11)(k) [ f(Λ2 ) − f(Λ1 ) ].    (22.10)

However, our true data points either from an N-body simulation or observed samples are Λ-
independent. Hence, even though we have made a positive step in finding a resolution for the
failure of SPT, we have introduced an arbitrary scale in our theory which may be physically
motivated but is not an accurate description of the data.
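The running with the cutoff is easy to make quantitative for a power-law spectrum, where f(Λ2) − f(Λ1) = (Λ2^{n+1} − Λ1^{n+1})/(n + 1); a short numerical sketch (indices and cutoffs illustrative):

    def delta_f(n, lam1, lam2):
        # f(Lambda2) - f(Lambda1) for P11(q) = q^n, the cutoff shift in Eq. (22.10)
        return (lam2**(n + 1) - lam1**(n + 1)) / (n + 1)

    for lam2 in (1.0, 10.0, 100.0):
        # n = -0.5: grows without bound (UV divergent for n >= -1)
        # n = -2.5: saturates (UV convergent)
        print(lam2, delta_f(-0.5, 0.3, lam2), delta_f(-2.5, 0.3, lam2))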
The cutoff regularization procedure suggests that the SPT can be explicitly restricted to scales
k < Λ where Λ is some coarse-graining scale. Hence, we must look for a new ‘effective’ theory
that applies to perturbative long-wavelength modes. This reminds us of the effective field theory
(EFT) approach in QFT. In the EFT picture, we define the partition function Z of our theory as
Z = ∫ Dϕ e^{iS[ϕ]}    (22.11)

where S[ϕ] is an action and is a functional of the field ϕ. EFT hinges on the argument that
to describe a low energy regime of field configurations at k ≪ Λ, we do not need high energy
momentum field modes. To visualize this, consider a complete UV theory where we can factorize
the underlying ϕ field in terms of long and short scales ϕ = ϕl + ϕs . Thus,
Z = ∫ Dϕ e^{iS[ϕ]} = ∫ Dϕl ∫ Dϕs e^{iS[ϕl ,ϕs ]} .    (22.12)

Next, we integrate over all short-scale modes and obtain


Z = ∫ Dϕl e^{iS[ϕl ]} .    (22.13)

Since the partition function remains consistent in both descriptions, the actions, denoted as
S[ϕl ] and S[ϕ], differ. The new action S[ϕl ] yields a low-energy effective theory, capturing the
evolution of the field theory by integrating out ultraviolet (UV) modes and applying rescaling.
It is important to note that the new low-energy effective theory action, S[ϕl ] may incorporate
residual effects from small-scale modes. The feedback of these small-scale modes on the large
scale forms the essence of the success of the Effective Field Theory of Large-Scale Structure
(EFTofLSS) formalism.
Similar to the above discussion, we propose that the matter overdensity field δ can be broken
down into long and short scale (wavelength) modes where the long wavelength modes are chosen
such that they are perturbative. This is achieved in principle by smoothing the matter overdensity
field δ(x) within a smoothing radius R ∼ Λ−1 . The smoothing procedure integrates over all short
scale (x < R or k > Λ) information. Hence, we define a new EFT of LSS that consists of
smoothed field variables obtained by smoothing the δ, v and ϕ :

[δ]Λ → δl ,  [π]Λ → πl ,  [ϕ]Λ → ϕl    (22.14)

where [O]Λ is the operation of smoothing over an operator O and π = ρv is the momentum
density operator.

22.2.1 Brief derivation of EFTofLSS fluid equations


What is the EFT procedure? The starting point is the fluid EOMs. Let us begin with the
Eulerian PT which aims at solving the system of three fluid equations: Poisson, Continuity, and
Euler. Starting from an EdS cosmology, the equations can be written as
∇²ϕ − (3/2) H² ρ0 δ = 0,    (22.15)
∂τ δ + ∇ · [(1 + δ) v̄] = 0,    (22.16)
∂τ v + Hv + (v · ∇) v + ∇ϕ = 0.    (22.17)

Here, δ and v are the DM number-density fluctuation and peculiar velocity field respectively.
We can construct the equations of motion for an effective fluid by coarse-graining the fluid
equations using a smoothing window function. The smoothing guarantees that the Boltzmann
hierarchy can be truncated, leaving us with an effective fluid. We define our isotropic smoothing
window function WΛ (x̄, x̄′ ) as a function of the radial separation r and smoothing radius Λ−1 :

WΛ (x̄, x̄′ ) ≡ F(r, Λ⁻¹)    (22.18)

where r² = (x − x′ )i (x − x′ )^i . The isotropy of the window function implies

∫ d³x′ WΛ (x̄, x̄′ ) (x − x′ )^i = 0.    (22.19)

It is convenient to choose a normalized Gaussian function as our smoothing kernel:


WΛ (x̄ − x̄′ ) = (Λ/√(2π))³ e^{−(1/2) Λ² |x̄−x̄′|²} ≡ (Λ/√(2π))³ e^{−(1/2) Λ² (x−x′)i (x−x′)^i} ,    (22.20)
with its Fourier transform
WΛ (k) = e^{−k²/(2Λ²)}    (22.21)
where Λ now represents a k-space, comoving cutoff scale. We regularize our observable quantities
by smoothing them which is equivalent to taking convolution in real space with the filter (window
function), defining the effective long wavelength quantity as
Al (x̄) = ∫ d³x′ WΛ (x̄, x̄′ ) A(x̄′ ),    (22.22)

and split the fields into short and long wavelength fluctuations by defining the short wavelength
quantity as
As (x̄) = A(x̄) − Al (x̄). (22.23)
In Fourier space, this is represented as

Al (k) ≡ WΛ (k)A(k), (22.24)


As (k) ≡ (1 − WΛ (k)) A(k) (22.25)
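On a grid, this long/short split is a one-line multiplication in Fourier space; a minimal sketch for a periodic field (the grid setup is hypothetical):

    import numpy as np

    def split_long_short(field, boxsize, Lam):
        # Gaussian smoothing kernel of Eq. (22.21): W(k) = exp(-k^2 / (2 Lam^2))
        n = field.shape[0]
        k1d = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
        kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
        W = np.exp(-(kx**2 + ky**2 + kz**2) / (2 * Lam**2))
        fk = np.fft.fftn(field)
        long_part = np.real(np.fft.ifftn(W * fk))   # Eq. (22.24)
        return long_part, field - long_part         # Eqs. (22.23)/(22.25)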

Specifically, for fields δ, v and ϕ the effective long-wavelength fluctuations are defined as
δl (x̄) = ∫ d³x′ WΛ (x̄, x̄′ ) δ(x̄′ ),    (22.26)

ϕl (x̄) = ∫ d³x′ WΛ (x̄, x̄′ ) ϕ(x̄′ ),    (22.27)

(1 + δl (x̄)) v̄l (x̄) = ∫ d³x′ WΛ (x̄, x̄′ ) [1 + δ(x̄′ )] v̄(x̄′ ).    (22.28)

By applying the smoothing operation to the Euler, Poisson, and Continuity equations, and
after numerous simplifications, we obtain the following set of fluid equations (see 1206.2926):
∇²ϕl − (3/2) H² ρ0 δl = 0,    (22.29)
∂τ δl + ∇ · [(1 + δl ) v̄l ] = 0,    (22.30)
∂τ v̄l + H v̄l + (v̄l · ∇) v̄l + ∇ϕl = − (1/ρl ) ( ∂j [τi^j]_Λ + ∂j [τi^j]_{∂²} ).    (22.31)

where
ρl (x) ≡ ρ0 dl (x) = ρ0 (1 + δl (x)) , (22.32)
and

[τi^j]_Λ = [ ρ(x̄′ ) vs,i (x̄′ ) vs^j (x̄′ ) + ( 2 ∂^j ϕs (x̄′ ) ∂i ϕs (x̄′ ) − δi^j ∂^k ϕs (x̄′ ) ∂k ϕs (x̄′ ) ) / (8πG) ]_Λ    (22.33)

[τi^j]_{∂²} = ρl (x̄) ( ∂m vl,i (x̄) ∂^m vl^j (x̄) / Λ² + ( 2 ∂k ∂i ϕl (x̄) ∂^k ∂^j ϕl (x̄) − δi^j ∂k ∂^m ϕl (x̄) ∂^k ∂m ϕl (x̄) ) / (8πG Λ²) ).    (22.34)

We see that the long-wavelength fluctuations obey an Euler equation in which the stress
tensor τi^j receives contributions from two terms induced by the short-wavelength ([τi^j]_Λ) and
long-wavelength ([τi^j]_{∂²}) fluctuations respectively. The long-wavelength fluctuations are suppressed
by a 1/Λ² factor and can be neglected in the limit Λ → ∞. In the large Λ limit, the leading stress
tensor is sourced by the short-wavelengths. These residual stress terms arise since multiplication
and smoothing do not commute, i.e. [AB]Λ ̸= [A]Λ [B]Λ . Physically speaking, the intuition for
the above stress tensor is that small scale modes appearing in the fluid equations non-linearly
modify the dynamics of large scale modes.
Although we started with the EoM of a pressure-less fluid, the effective pressure of the ‘im-
perfect’ matter fluid after smoothing (in the limit Λ → ∞) is given as
p_eff = (1/3) [τk^k]_Λ    (22.35)
      = (1/3) [ ρ(x̄′ ) vs;k (x̄′ ) vs^k (x̄′ ) + ∂^k ϕs (x̄′ ) ∂k ϕs (x̄′ ) / (8πG) ]_Λ .    (22.36)

Hence, we see that the small scale fluctuations induce an effective pressure perturbation on the
long-wavelength fluid. One can also see the effect of the small scale velocity fluctuations by taking
the first term in Eq. (22.33) and writing it as
(1/ρl ) [τ^ij]_Λ ∼ (1/ρl ) [ ρ(x̄′ ) vs^i (x̄′ ) vs^j (x̄′ ) ]_Λ    (22.37)
               ∼ δl [ vs^i (x̄′ ) vs^j (x̄′ ) ]_Λ    (22.38)
               ∼ δl cs² δ^ij + O(∂k vs ).    (22.39)

The parameter c2s is the sound speed squared due to the residual pressure of the small scales. The
effective stress tensor that we have identified is thus explicitly dependent on the short wavelength
fluctuations. These are very large, strongly coupled, and therefore impossible to treat within the
effective theory. The next key step in the EFT description is the expansion of this stress tensor
in terms of powers of derivatives and δl with the expansion coefficients (such as c2s ) parameterized
instead of being computed.
Since we treat the matter as a collisionless and pressureless fluid, it is convenient to introduce
the notation of fluid dynamics to understand the various terms that arise from the smoothing
procedure, such as the induced stress-tensor as given in Eqs. (22.33) and (22.34). To this end,
consider the Navier-Stokes equation for a fluid velocity ū
     
ρ ( ∂ū/∂t + ū·∇ū ) + ρ∇ϕ = −∇p + ∇· [ η ( ∇ū + (∇ū)^T − (2/3) (∇·ū) I ) + ζ (∇·ū) I ]    (22.40)

where the coefficients ζ and η are the bulk and shear viscosity. Similarly, we can re-frame the
smoothed stress tensor τ by expanding the small-scale modes around their expectation value with
a perturbation that is modulated by long-wavelength modes. Hence we write
[τ^ij]_Λ = δ^ij pb + cs² δρl − cbv² (ρb /(aH)) δ^ij ∂k vl^k − (3 ρb csv²/(4aH)) ( ∂^j vl^i + ∂^i vl^j − (2/3) δ^ij ∂k vl^k ) + ∆τ^ij + · · ·    (22.41)

where the parameters cbv and csv are the coefficients related to the bulk and shear viscosity
respectively of the effective fluid. ∆τ is the stochastic term (due to small scale fluctuations)
uncorrelated with the smoothed field and · · · represents terms higher order in derivative and
power counting in δl . The various coefficients c2s , c2sv , c2bv encapsulate the backreaction of
‘ultraviolet (UV) physics’ of the Universe, i.e. that operating on scales beyond our
cutoff Λ, on large scale effective fluid. This seemingly simple addition from the EFTofLSS
over SPT is the most significant difference between the two PT formalisms. The free parameters
within our new theory aka EFTofLSS are obtained by fitting to the observed data or simulations.
This way, EFTofLSS captures the backreaction of small scales on large scales without making
any assumptions about small-scale physics. Some of the most complex ‘baryonic’ effects can also
be treated in this way, while remaining completely agnostic about their intricate physics (see
1412.5049 and 2010.02929).

22.3 EFTofLSS solution and renormalization


Having derived the relevant ‘smoothed’ EoMs for the effective fluid, we solve these using the
same perturbative approach implemented in the SPT formalism. Note that the only difference
between the SPT and EFT equations is the additional induced stress tensor term in the EFT
description. Hence, we write the final solution for the nonlinear matter overdensity field δl as
δl = δl^(1) + δl^(2) + δl^(3) + δl^(c) + ... + ∆τ    (22.42)

where at the one-loop order the only relevant new term is δl^(c) , which is explicitly given as

δl^(c) = c² ∇² δl    (22.43)

with c²(Λ) = cs²(Λ) + f (csv²(Λ) + cbv²(Λ)) as given in Eq. (22.41), and where we have made the
Λ dependence of these free parameters explicit. Here, f is the logarithmic growth rate given as
f = d ln D/d ln a.
Finally, the Fourier space matter power spectrum up to one loop² is given as
P^EFT_Λ (k, z) = D² P^(11)_Λ (k) + D⁴ [ P^(13)_Λ (k) + P^(22)_Λ (k) ] + D² P^ctr_Λ (k, z)    (22.44)

where P^ctr_Λ is referred to as the ‘counterterm’ contribution and D ≡ D(z) is the normalized growth
function. The counterterm contribution is expressed as

P^ctr_Λ (k, z) = −cΛ²(z) k² P^(11)_Λ (k).    (22.45)

At this order, there are two key differences between the SPT and EFTofLSS predictions: (1) the
loop integrals extend only to Λ, since we have smoothed the fields, and (2) the appearance of the
final term involving the effective sound-speed c2Λ .
² We have neglected the contribution from the stochastic term ∆τ, which will remain sub-dominant for the
cosmologies of our interest.

22.3.1 Renormalization
The EFT power spectrum as given above appears to be Λ-dependent due to the inherent depen-
dence of the long-wavelength field δl on the smoothing scale Λ. However, we will show that the
additional Λ-dependent term PΛctr is precisely what we need to make the entire one-loop spectrum
approximately Λ-independent. To this end, consider the linear power spectrum P^(11)_Λ (k):

P^(11)_Λ (k ≪ Λ) = ⟨δ^(1)_{k,Λ} δ^(1)_{p,Λ}⟩′    (22.46)
               = ⟨WΛ (k) δk^(1) WΛ (p) δp^(1)⟩′    (22.47)
               = WΛ²(k) P^(11)(k)    (22.48)
               ≈ P^(11)(k)    (22.49)

where we used δk,Λ ≡ WΛ (k)δk and in the last line we approximated the smoothing kernel
WΛ (k) ≈ 1 for k ≪ Λ. Hence, the linear power spectrum is Λ-independent for all scales of
interest that are much larger than the smoothing scale. Now, let us consider the P^(13)_Λ (k) term:

P^(13)_Λ (k) = 6 P^(11)_Λ (k) ∫ d³q/(2π)³ F3 (k, q, −q) P^(11)_Λ (q)    (22.50)

whose UV limit is given as

lim_{k→0} P^(13)_Λ (k) = − (61/630π²) k² P^(11)_Λ (k) ∫₀^∞ dq P^(11)_Λ (q)    (22.51)
                     = − (61/630π²) k² WΛ²(k) P^(11)(k) ∫₀^∞ dq WΛ²(q) P^(11)(q)    (22.52)
                     ≈ − (61/630π²) k² P^(11)(k) ∫₀^Λ dq P^(11)(q)    (22.53)
                     = k² P^(11)_Λ (k) f(Λ).    (22.54)

Hence, we find that the P^(13)_Λ term has an explicit Λ-dependence due to the smoothing proce-
dure. This Λ-dependence is similar to the one we derived for the corresponding P^(13) term in
SPT and hence leads to similar problems, since the complete one-loop power spectrum must
be inherently Λ-independent. However, unlike SPT, EFTofLSS contains an additional term at
one loop order, the counterterm contribution. This contribution has the exact spectral shape
P^ctr_Λ (k, z) = −cΛ²(z) k² P^(11)_Λ (k) to cancel the apparent Λ-dependence of P^(13)_Λ . To see this, con-
sider the sum of P^(13)_Λ and P^ctr_Λ for all scales k ≪ Λ, and we will use the approximation that
P^(11)_Λ (k) ≈ P^(11)(k) for all Λ such that k ≪ Λ. Hence,
D² P^(13)_{Λ2}(k) + P^ctr_{Λ2}(k, z) = D² P^(13)_{Λ1}(k) + D² k² P^(11)_{Λ1}(k) [f(Λ2 ) − f(Λ1 )] − c²_{Λ2}(z) k² P^(11)_{Λ2}(k)    (22.55)
                                = D² P^(13)_{Λ1}(k) − k² P^(11)(k) [ c²_{Λ2}(z) − D² f(Λ2 ) + D² f(Λ1 ) ]    (22.56)
                                = D² P^(13)_{Λ1}(k) − c²_{Λ1}(z) k² P^(11)_{Λ1}(k)    (22.57)

Therefore, we find that the counterterm in EFTofLSS ‘renormalizes’ the P (13) one-loop term such
that the apparent Λ-dependence vanishes. Hence, we observe that the microphysical c2 (z) changes
as we vary Λ and the variation of c2 occurs in precisely the manner to cancel any change in P (13)
term. For this reason, c² is also known as the ‘ultraviolet counterterm’. In other words, although the
individual loop integrals and counterterms are Λ-dependent, their sum isn’t: therefore as desired
the overall theory is independent of any cutoff scale Λ.
So far we have only considered the P^(13) loop term. However, the above argument can be
applied to any loop term. Specifically, we note that the apparent Λ-dependence of the P^(22) term
scales as k⁴. This k⁴ dependence is exactly canceled or absorbed by the lowest order stochastic
term ∆τ in our EFT expansion. However, since k ≪ Λ, the k⁴ dependence is sub-dominant
compared to k² P^(11)(k) for our scales of interest. Therefore, the Λ-dependence of the P^(22) term is
usually neglected along with any contribution from ⟨∆τk ∆τp ⟩′.
Based on the above discussion, we write the full EFT power spectrum at one loop order as first
derived in 1206.2926:

P^EFT (k, z) = D²(z) P^(11)(k) + D⁴(z) P^(22)(k) + D⁴(z) P^(13)_Λ (k) − D²(z) cΛ²(z) k² P^(11)_Λ (k).    (22.58)

where we remind the reader that the LHS is Λ-independent even though the individual terms
P^(13)_Λ (k) and P^ctr_Λ can vary with Λ. Note that c² > 0 implies a positive residual pressure and
hence the power is reduced on quasi-linear scales. However, note that c² is a coefficient of an EFT
operator consistent with symmetries and power counting, and we did not make assumptions about
the positivity of this coefficient.
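Since the counterterm enters Eq. (22.58) linearly in c², fitting it to a measured power spectrum is a one-parameter linear least-squares problem. A minimal sketch, assuming precomputed arrays P11, P13, P22 (evaluated at the chosen Λ) and a measured Pnl on the same k grid:

    import numpy as np

    def fit_c2(k, P11, P13, P22, Pnl, D=1.0, kmax=0.15):
        # least-squares estimate of c2 in Eq. (22.58) from simulation data
        m = k < kmax                               # use quasi-linear scales only
        model0 = D**2 * P11 + D**4 * (P13 + P22)   # c2-independent part
        template = -D**2 * k**2 * P11              # term multiplying c2
        resid = Pnl - model0
        return np.sum(template[m] * resid[m]) / np.sum(template[m]**2)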

22.3.2 Physical implication of the counterterm c²


Until now, our exploration of the counterterm within the Effective Field Theory of Large-Scale
Structure (EFTofLSS) has predominantly centered on its effectiveness in removing Λ dependence
and guaranteeing the renormalizability of loop terms. The parameter c2 associated with the
counterterm serves as a free parameter representing the effective sound speed squared of the
effective fluid, acquired through a smoothing process. Although our initial set of fluid equations
involved a pressureless fluid, the presence of residual pressure on large scales, stemming from
gravitational clustering at smaller scales, was encapsulated in the residual stress tensor term. This
residual pressure, parameterized by c2 , becomes measurable through N-Body simulation data.
Consequently, the inferred value of the counterterm parameter c2 obtained from fitting N-Body
data holds pertinent insights into the impact of small-scale feedback on large scales. Analogously,
the viscosity of a fluid cannot be deduced solely from an effective low-energy fluid description
but is experimentally measured before being incorporated into the fluid equations, such as the
Navier-Stokes equation, for predictive purposes. The success of EFT over other Perturbation
Theory (PT) formalisms largely stems from the incorporation of such free parameters within the
theory. These parameters not only serve to renormalize loop integrals but also furnish additional
information about small-scale clustering and its influence on large scales. Hence, one can factorize
c2Λ as
c2Λ (z) = c̃2Λ (z) + c2phy (z) (22.59)
where c̃²Λ is the Λ-dependent term that absorbs the cutoff dependence of the P^(13) term, and c²phy is a
‘physical’ cutoff independent term that contains nontrivial information regarding the UV effects

of small scales on large scale modes. Hence, we can rewrite the full one-loop EFT matter power
spectrum as
 
P^EFT (k, z) = D²(z) P^(11)(k) + D⁴(z) P^(22)(k) + D⁴(z) P^(13)_Λ (k) − 2 D²(z) c̃²Λ (z) k² P^(11)
             − 2 D²(z) c²phy (z) k² P^(11) .    (22.60)
Since the c̃²Λ (z) term must cancel the Λ-dependence of P^(13)_Λ at all redshifts, it should vary with
redshift exactly like D²(z). Hence,

P^EFT (k, z) = D²(z) P^(11)(k) + D⁴(z) P^(22)(k) + D⁴(z) [ P^(13)_Λ (k) − 2 c̃²Λ (0) k² P^(11) ]
             − 2 D²(z) c²phy (z) k² P^(11)    (22.61)

where c̃²Λ (0) is the value of the counterterm at z = 0. Note that c²phy (z) can have an arbitrary
redshift dependence, contingent upon the evolution of the residual pressure induced by gravitational
clustering. More importantly, we note that the only IR-surviving quantity inherited from the UV
effects is the renormalized parameter c²phy (z). When analyzing cosmologies with different initial
conditions and cosmological parameters, a comparison between c²phy (z) can act as an additional
distinguishing feature. For instance, refer to Fig. 3 in 2306.09456 where the authors show the
variation of c²phy (z) as a function of a cosmological parameter that alters the small-scale power.
From the structure of the counterterm contribution, we expect c²phy (z) ∼ 1/k²NL (z). This
gives a value of c²phy (z = 0) ≈ O(10) for kNL (z = 0) ≈ 0.3 h/Mpc using the renormalization scheme
mentioned in 2306.09456. Note that this value differs from the usual O(1) value typically quoted
for the bare c²(z).

22.4 EFTofLSS matter power spectrum result


In Fig. 23 we compare the matter power spectra obtained from the SPT and EFT formalisms against
N-Body data. We observe that the EFT curves perform far better than SPT in matching the N-
Body data points. Here, we note that at the one-loop order in EFT, we have only one counterterm,
c², whereas at the two-loop order we require 3 counterterms. Despite the increase in the number
of free parameters (counterterms), the EFT curve at the two-loop order performs better than one
loop, and matches the simulation data points up to k ≈ 0.2 h/Mpc. This is also shown in
Fig. 24 where we show yet another comparison of the EFT curves with their SPT counterparts
along with relevant cosmic and theory errors. The shaded blue region enveloping the two loop
EFT curve in Fig. 24 indicates our estimate of the theoretical error due to the higher order
loop terms. The dashed curve represents an estimate of the cosmic variance error ∝ k −3 which
diminishes at small scales.

22.5 Application to isocurvature perturbations


Through the above two plots, we have highlighted that the EFT formalism remedies the issues
in SPT, namely integration over non-perturbative small scales in loop terms and apparent Λ-
dependence. These are ameliorated by introducing a free counterterm parameter. The SPT
also suffers from not being applicable to any arbitrary or general set of cosmological initial
conditions. As discussed previously, if we consider a power law form for the linear power spectrum

Figure 23. Plot showing comparison of matter power spectrum as obtained from SPT and EFTofLSS
against data from N-Body simulations. Each curve has been divided by the linear power spectrum. The
fully nonlinear N-Body power spectrum is plotted in black boxes. The red and blue dashed curves show
one and two-loop results from SPT whereas similar order curves from EFTofLSS are shown in solid colors.
The above figure is taken from O. Philcox’s presentation, 2020.

Figure 24. Similar to Fig. 23. Here, the curves are divided by the nonlinear (NL) matter power spectrum
as obtained from NBody simulations. Taken from 1507.05326.

P (11) (k) ∝ k n , then some of the loop terms diverge for n > −1. Such conditions can arise naturally
if we consider mixed primordial initial conditions consisting of adiabatic fluctuations with a small
fraction of CDM blue-tilted isocurvature power as shown in Fig. 25. This is an example from our
own research work published in 2306.09456. We briefly mentioned isocurvature in Sec. 6.6.3.
For such cosmologies, one is often forced to choose a particularly small value of Λ to avoid
large spurious contributions from small scales. In the EFTofLSS, however, there are no such
divergences. This occurs since the domain of integration is bounded; since the internal momenta
q are limited by Λ and the integrand is analytic, the loop integrals are guaranteed to be finite.
If there are divergences lurking in the high-q regime, they are themselves absorbed within the
counterterms. In Fig. 26 we plot EFT curves for the pure adiabatic and mixed cosmologies. For
both of the cases we choose an arbitrarily large value of the cutoff scale Λ ≈ 100 h/Mpc. For


Figure 25. Plot showing a comparison between the linear matter power spectra at a redshift of z = 2 for
the pure adiabatic and mixed initial conditions. For the mixed case, we show two examples in which the
power deviates from the adiabatic scenario on small scales with the spectral indices n = −0.25 (dashed)
and n = −1 (dotted) respectively. Taken from 2306.09456.

the pure adiabatic case, the counterterm tends to an asymptotic value as Λ → ∞ and is an O(1)
number as shown in the figure. However, due to the diverging structure of the P^(13)_Λ term for the
mixed scenario with lim_{k→∞} P^(11)(k) ∝ k^{−0.25} , the counterterm runs with the variation in Λ. For
our choice of Λ = 100 h/Mpc, we find that cΛ²(z = 1) ≈ −6.23 (Mpc/h)². While a negative value
of c² compared to the adiabatic case may seem alarming, note that the only IR-surviving quantity
inherited from the UV effects is the physically relevant parameter c²phy . For the pure adiabatic and
mixed cases, the physical parameters c²phy are nearly identical in magnitude. A small difference
in their magnitude is due to a larger power on smaller scales within the mixed scenario. On
the other hand, given that the bare cΛ² can become negative for Λ ≳ O(3) h/Mpc, there is an unclear
interpretation of this parameter, which leaves room for more intricate UV dynamics being at play
here, such as for the mixed (isocurvature) scenario. A nonlinear UV model exploration of this
issue may be useful to further elucidate the difference.

22.6 From dark matter to galaxies: The bias expansion


Up to this point, we have considered only the statistics of dark matter fluctuations in the Universe.
In practice, most observational probes measure either the integrated mass distribution (weak
lensing) or the galaxy distribution (photometric or spectroscopic surveys). For this reason, we
will now briefly discuss biasing, the associated calculation of galaxy power spectra, and the
EFTofLSS approach to biasing.
In most nonlinear biasing models (see 1611.09787 for a review of galaxy bias), we use the one-
loop galaxy power spectrum obtained through a bias expansion for the galaxy density contrast
by including all operators allowed by Galilean symmetry up to cubic order in magnitude of the

[Figure 26 plot: non-linear matter power spectrum at z = 1.0; $k\,P(k)$ in $(h^{-1}{\rm Mpc})^2$ versus $k$ in $h\,{\rm Mpc}^{-1}$, showing FastPM AD and MX measurements with the corresponding one-loop EFT fits ($c^2 = 0.4$ for AD, $c^2 = -6.23$ for MX).]

Figure 26. In this figure we highlight the fit of the one-loop EFT power spectrum to the N-body data.
Note that we plot the scaled power spectrum, $k \times P(k)$, on the y-axis for clarity. For the mixed case, we
use the fiducial value $n = -0.25$. The value of the bare $c^2_\Lambda$ (at cutoff $\Lambda = 100\,h/{\rm Mpc}$) one-loop EFT
parameter is given in the label for the EFT curves. Note that the value of $c^2_\Lambda$ for the mixed case is
negative. The one-loop EFT curves are accurate up to $k \approx 0.5\,h/{\rm Mpc}$ at redshift $z = 1$. We also plot
the approximate theoretical error band expected from two-loop contributions.

coarse-grained linear overdensity $\delta_l^{(1)}$,

$\delta_g(x) = \sum_O \left(b_O + \epsilon_O(x)\right) O(x) + b_\epsilon\, \epsilon(x)$  (22.62)

$= b_1 \delta(x) + b_\epsilon\, \epsilon(x) + \frac{b_2}{2}\delta^2(x) + b_{\mathcal{G}_2}\mathcal{G}_2(x) + \epsilon_\delta(x)\delta(x) + b_{\delta\mathcal{G}_2}\delta(x)\mathcal{G}_2(x) + \frac{b_3}{6}\delta^3(x) + b_{\mathcal{G}_3}\mathcal{G}_3(x) + b_{\Gamma_3}\Gamma_3(x) + \epsilon_{\delta^2}(x)\delta^2(x) + \epsilon_{\mathcal{G}_2}(x)\mathcal{G}_2(x) + b_{\nabla^2\delta}\nabla^2\delta(x) + b_{\nabla^2\epsilon}\nabla^2\epsilon(x)$  (22.63)

where all the operators $O$ in the above expression are considered to be coarse-grained and the
subscripts $l$ or $\Lambda$ have been dropped for brevity. In Fourier space the Laplacian takes the form
$\nabla^2 \to (k/k_*)^2$, where $k_*$ is some characteristic scale of clustering for biased tracers and we
restrict to scales $k/k_* \ll 1$. Hence every insertion of a Laplacian is equivalent to a second-order
correction to an operator $O$, and the derivative operators in the last line of Eq. (22.63) are
counted approximately as cubic order in the bias expansion. Therefore, Eq. (22.63) is a double
expansion in density fluctuations and their derivatives. The remaining operator sets $\{\delta^2, \mathcal{G}_2, \epsilon_\delta\delta\}$
and $\{\mathcal{G}_2\delta, \delta^3, \mathcal{G}_3, \Gamma_3, \epsilon_{\delta^2}\delta^2, \epsilon_{\mathcal{G}_2}\mathcal{G}_2, \nabla^2\delta, \nabla^2\epsilon\}$ are second and third order respectively, and we
refer the reader to 1611.09787 for the definitions and details regarding these operators. Notably, the
operators non-local in $\delta$, such as $\mathcal{G}_2 = (\nabla_i\nabla_j\Phi)^2 - (\nabla^2\Phi)^2$, arise naturally from gravitational
evolution and renormalization requirements. This was first shown in 1402.5916.
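To make the quadratic operators concrete, here is a minimal numpy sketch (an illustration, not part of any particular code) that builds $\delta^2$ and the tidal operator $\mathcal{G}_2$ from a gridded density field via FFTs, using the normalization $\nabla^2\Phi = \delta$; a real pipeline would first smooth the field at the cutoff $\Lambda$:

```python
import numpy as np

def quadratic_bias_operators(delta, boxsize):
    """Build delta^2 and G2 = (d_i d_j Phi)^2 - (lap Phi)^2 from a density grid,
    with the potential normalized so that lap Phi = delta."""
    n = delta.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid 0/0; the k=0 mode vanishes anyway
    dk = np.fft.fftn(delta)
    g2 = -delta**2                         # subtract (lap Phi)^2 = delta^2
    for ki in (kx, ky, kz):
        for kj in (kx, ky, kz):
            # d_i d_j Phi in Fourier space is (k_i k_j / k^2) delta(k)
            g2 += np.fft.ifftn(ki * kj / k2 * dk).real ** 2
    return delta**2, g2
```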

22.7 Application of the EFTofLSS to simulations and real data
Finally, within the EFTofLSS, we model the perturbative galaxy-galaxy power spectrum $P_{gg}$ at
one-loop level as the sum of deterministic, stochastic and counterterm parts:

$P_{gg} = P_{gg}^{\rm det} + P_{gg}^{\rm sto} + P_{gg}^{\rm ctr}$.  (22.64)

During cosmological parameter inference from simulation or observational data, we fit the afore-
mentioned theoretical power spectrum with the relevant number of bias and counterterm param-
eters. The theoretical spectrum can be obtained from a few existing codes, such as CLASS-PT
(2004.10607) for EPT, and velocileptors (2012.04636) for a more complicated LPT implementation.
CLASS-PT is an adaptation of the CLASS code designed to compute the non-linear
power spectra of dark matter and biased tracers using one-loop cosmological perturbation theory
in Eulerian coordinates. It handles both Gaussian and non-Gaussian initial conditions, and it is an
easy-to-use and convenient code for LSS analysis. Now consider the simplest case,
where we fit the one-loop spectrum to data in real space. In this case there is only one
free counterterm parameter $c^2$, which is often absorbed into the Laplacian bias coefficient. In redshift space,
discussed in Sec. 20.6, we often consider only the first three multipoles $\ell = 0, 2, 4$ and attach
independent counterterms to each multipole spectrum. In Fig. 27 we show the results of a blinded
challenge that was performed and reported in 2003.08277 using the EFTofLSS in redshift space for
the first two multipoles.
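To illustrate this fitting step, here is a minimal sketch (with hypothetical input arrays) of how the single real-space counterterm can be fit by least squares. Since the counterterm enters the model linearly, e.g. in the common convention $P_{\rm EFT}(k) = P_{\rm lin}(k) + P_{\rm 1\text{-}loop}(k) - 2c^2 k^2 P_{\rm lin}(k)$ (sign and normalization conventions vary between codes), the best-fit $c^2$ is available in closed form:

```python
import numpy as np

def fit_counterterm(k, p_lin, p_1loop, p_nbody, sigma_p, kmax=0.3):
    """Weighted least-squares fit of the counterterm amplitude c2 in
    P_eft = p_lin + p_1loop - 2 c2 k^2 p_lin.
    All inputs are 1d arrays over k-bins; sigma_p are the measurement errors."""
    m = k < kmax                                  # restrict to quasi-linear scales
    t = -2.0 * k[m] ** 2 * p_lin[m]               # template: dP_eft / dc2
    r = p_nbody[m] - (p_lin[m] + p_1loop[m])      # data minus c2-independent part
    w = 1.0 / sigma_p[m] ** 2
    c2 = np.sum(w * t * r) / np.sum(w * t * t)    # chi^2 is quadratic in c2
    c2_err = np.sum(w * t * t) ** -0.5
    return c2, c2_err
```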
In Fig. 28 we show the results from a recent cosmological parameter inference performed using
four independent Baryon Oscillation Spectroscopic Survey (BOSS) datasets across two redshift
bins ($z_{\rm eff} = 0.38, 0.61$) in flat $\Lambda$CDM, marginalizing over 7 nuisance parameters for each dataset
(28 in total) and varying 5 cosmological parameters ($\omega_b$, $\omega_{\rm cdm}$, $H_0$, $A_s$, $\sum m_\nu$). The theory model
includes a complete perturbation theory description that properly takes into account the non-
linear effects of dark matter clustering, short-scale physics, galaxy bias, redshift-space distortions,
and large-scale bulk flows. The constraints on $H_0$ and $\Omega_m$ obtained from the EFT analysis
of BOSS data are already competitive with the CMB measurements of Planck for the same
cosmological model with varied neutrino masses. This highlights the success of the EFTofLSS and
sets the stage for precision cosmology from future surveys.

23 N-body simulations
The fluid approximation breaks down on small scales. For example, the velocity field is no longer
single-valued at a point in space once shell crossing happens (i.e. clouds of mass pass through
each other).
Next to perturbation theory of fluids, the second main way to evaluate the dynamics of the
universe are N-body simulations of (dark) matter. N-body simulations are not intrinsically
perturbative, and can thus in principle extend our reach to non-perturbative scales to extract
cosmological parameters with more sensitivity. On the other hand, N-body simulations are
computationally costly and it is difficult to simulate the survey volume of a galaxy survey with the
required resolution. In addition, dark matter N-body simulations are only valid on scales where
baryonic feedback is unimportant. To go to smaller scales, one needs even more computationally

Figure 27. (Taken from 2003.08277) The upper panel shows a comparison of the data for the monopole
and the quadrupole with the best-fit EFT model, along with the residuals for the best-fit model (right
panel). Note that the quadrupole data points are slightly shifted for better visibility. In the lower panel
we show the different contributions to the monopole (left panel) and quadrupole (right panel) power
spectra. The data errors and the two-loop estimate are also displayed. We plot absolute values; some
terms are negative. Here, $k^4$-ctr is the contribution due to the Finger-of-God effect.

expensive (magneto-)hydrodynamic simulations. Improving simulations is a very active
field of research. Using modern auto-differentiation techniques, imported from machine learning,
there are even differentiable simulations, which we will briefly get back to in Sec. 28.3.
A good review on N-body simulations, though not very recent, is https://ned.ipac.caltech.
edu/level5/March03/Bertschinger/paper.pdf. A nice and more recent review is https:
//arxiv.org/abs/2112.05165. This section is based in part on Dodelson-Schmidt. There are
many large sets of simulations that you can download, such as Quijote (1909.05273), CAMELS
(2010.00619), IllustrisTNG (1812.05609) and Abacus (2110.11398). These require millions of
CPU hours, and sometimes are used in hundreds of publications. For most research projects you
will not need to run your own simulations.

23.1 Equations for particles


N-body simulations are typically performed in a cubic volume with periodic boundary conditions,
so that particles exiting the volume on one side re-enter on the other side. We discretize the

Figure 28. (Taken from 1909.05277) Left panel: The posterior distribution for the late-Universe parame-
ters H0 , Ωm and σ8 obtained with priors on ωb from Planck (gray contours) and BBN (blue contours). For
comparison we also show the Planck 2018 posterior (red contours) for the same model (flat ΛCDM with
massive neutrinos). Right panel: The monopole (black dots) and quadrupole (blue dots) power spectra
moments of the BOSS data for high-z (upper panel) and low-z (lower panel) north galactic cap (NGC)
samples, along with the best-fit theoretical model curves. The corresponding best-fit theoretical spectra
are plotted in solid black and blue.

dynamics by tracking $N = N_{\rm side}^3$ particles from their initial (almost uniform) positions to their
late-time positions as a function of time. The equations of motion are simply Newtonian gravity
in an expanding space time:

$\frac{dx^i}{dt} = \frac{p^i}{ma}$  (23.1)

$\frac{dp^i}{dt} = -H p^i - \frac{m}{a}\frac{\partial\phi}{\partial x^i}$  (23.2)
Introducing the superconformal momentum pc = ap, which is conserved in the absence of
perturbations, this can be re-written as

$\frac{dx^i}{dt} = \frac{p_c^i}{m a^2}$  (23.3)

$\frac{dp_c^i}{dt} = -m\,\frac{\partial\phi}{\partial x^i}$  (23.4)
Solving these equations numerically, for a large number of particles such as 10003 , leads to a
beautiful and physically accurate matter distribution.

132
A computationally efficient and widely used method to solve these equations is the leapfrog
scheme, where positions and momenta are evaluated with an offset of half a time step:

$x^{(i)}(t)$ and $p_c^{(i)}(t - \Delta t/2)$  (23.5)
After generating the initial conditions (usually using LPT as discussed in Sec. 21.3), the
algorithm proceeds as follows:
1. Compute the gravitational potential generated by the collection of particles, and take its
gradient to obtain ∇ϕ(x, t).

2. Change each particle’s momentum (”kick”) by


$p_c^{(i)}(t + \Delta t/2) = p_c^{(i)}(t - \Delta t/2) - m\,\nabla\phi^{(i)}(x, t)\,\Delta t$.  (23.6)

3. Move each particle position (”drift”) by


$x^{(i)}(t + \Delta t) = x^{(i)}(t) + \frac{p_c^{(i)}(t + \Delta t/2)}{m\,a^2(t + \Delta t/2)}\,\Delta t$.  (23.7)
4. Repeat.
In general, the more time steps, the more accurate the results. There are ways to optimize the
time steps.
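A minimal sketch of one such kick-drift step (assuming a user-supplied function grad_phi that returns the interpolated potential gradient at the particle positions, e.g. from the PM solver sketched in the next subsection):

```python
import numpy as np

def leapfrog_step(x, pc, a_half, dt, grad_phi, m=1.0, boxsize=1.0):
    """One kick-drift step, Eqs. (23.6)-(23.7). x and pc have shape (N, 3);
    a_half is the scale factor at the drift half-step t + dt/2."""
    pc = pc - m * grad_phi(x) * dt                  # kick: superconformal momenta
    x = (x + pc / (m * a_half**2) * dt) % boxsize   # drift, with periodic wrapping
    return x, pc
```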
From the particles, one usually proceeds to evaluate the matter density on a regular grid using
a mass assignment scheme (MAS). A particle does not only contribute to the mass at the
nearest grid point, but can contribute to the surrounding nodes (usually 8 in 3d). The most
commonly used way to do this is the Cloud-In-Cell (CIC) scheme.
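A simple (deliberately loop-based, hence slow) sketch of CIC assignment of unit-mass particles to a periodic grid:

```python
import numpy as np

def cic_assign(pos, ngrid, boxsize):
    """Distribute unit-mass particles to the 8 surrounding nodes with trilinear
    weights. pos has shape (N, 3) with coordinates in [0, boxsize)."""
    rho = np.zeros((ngrid, ngrid, ngrid))
    h = boxsize / ngrid
    for p in pos:
        f = p / h - 0.5                    # position in cell units, node-centered
        i0 = np.floor(f).astype(int)
        d = f - i0                         # fractional distance to the lower node
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    w = (d[0] if dx else 1 - d[0]) * \
                        (d[1] if dy else 1 - d[1]) * \
                        (d[2] if dz else 1 - d[2])
                    rho[(i0[0] + dx) % ngrid,
                        (i0[1] + dy) % ngrid,
                        (i0[2] + dz) % ngrid] += w
    return rho
```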

23.2 Evaluating the potential


The computational bottleneck is to evaluate the gravitational potential efficiently. In principle,
calculating the gravitational potential for every particle by direct summation requires of order $N^2$ operations, where
$N$ is the number of particles. This is not computationally tractable. Instead one uses a particle
mesh (PM) algorithm, where densities are interpolated to a regular grid (using the MAS), and
one can then solve the Poisson equation in Fourier space using a 3d FFT.
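A sketch of this PM force step, with units and the $4\pi G \bar\rho a^2$ prefactor of the Poisson equation absorbed into a single constant:

```python
import numpy as np

def pm_gradient(delta, boxsize, prefac=1.0):
    """Solve lap phi = prefac * delta on a periodic grid and return grad phi.
    In Fourier space: phi(k) = -prefac delta(k)/k^2, grad phi(k) = i k phi(k)."""
    n = delta.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0
    phik = -prefac * np.fft.fftn(delta) / k2
    phik[0, 0, 0] = 0.0                          # drop the unphysical zero mode
    return np.stack([np.fft.ifftn(1j * kv * phik).real for kv in (kx, ky, kz)],
                    axis=-1)                     # shape (n, n, n, 3)
```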
A problem with the PM method is that it does not scale well at very high resolution, i.e. one
would need a very high resolution grid to take into account local pairwise interactions between
nearby particles. On these small local scales, one thus uses a different method called the tree
algorithm. The tree algorithm generates a hierarchical tree of meta-particles. Seen from far
enough away, a collection of nearby particles can be replaced by a single meta particle which
combines their mass. For better accuracy one can also carry along multipoles of the mass dis-
tribution in the meta particles. The tree algorithm has the complexity N log N . Modern code
usually combine the PM method on large scales with the tree method on small scales. Perhaps
the most widely used code is GADGET (version 2 to 4).
One additional subtle point is that, because we sample the density with a finite number of
points, if points come too close there would be in principle infinite attraction between them, as
an artifact of the point sampling. To avoid this one smooths the density field using a force
softening kernel.

23.3 Baryonic simulations
To take into account baryonic forces, one uses magneto-hydrodynamic (MHD) simulations. These
can be implemented using Smoothed-particle hydrodynamics (SPH) simulations (i.e. still
using particles, but with additional forces), or with a (moving) mesh. Unfortunately it is
not possible to simulate these forces from first principles (e.g. how an AGN blows out gas), so
one needs to approximate them with a so-called subgrid model. There are different subgrid
models that lead to different answers. For example, in the CAMELS simulations, the same initial
conditions but different subgrid models can change the galaxy density by 30% or so. So while
dark matter simulations, given enough resolution, are in principle arbitrarily accurate, the same
is not true once we include baryonic physics. This is a key difficulty in simulation-based inference
on small scales.

24 Halos and Galaxies


Perturbation theory is only valid if δ ≪ 1, which is only true on the largest scales over the entire
history of the universe. Perturbation theory can never describe the formation of galaxies or galaxy
clusters, which form when matter collapses to a small region in space with δ ≫ 1. Fortunately
there are nevertheless analytic methods that help us to understand this domain. The methods we
briefly discuss now are used in practice in cosmology to analyze data and forecast experiments.
Halo formation is a rich subject that includes concepts like subhalos, merger trees and
the halo occupation distribution, and we can only give a brief outline. Halo/galaxy formation
is sensitive to cosmological parameters, astrophysical parameters and properties of dark matter.
In addition, by patching together halo statistics and perturbation theory of the matter field in
the so-called halo model, one can arrive at a theoretical description that covers all scales of
cosmology. The halo model allows us to model observables and forecast measurements on very
non-linear scales. Its predictions agree with simulations and data rather well.

24.1 Halos and Halo mass profile


Halos are structures of (dark) matter that are gravitationally bound, which formed by gravita-
tional collapse of over-densities. Such over-densities will ultimately virialize. Galaxies are hosted
inside much larger dark matter halos. There are different ways to precisely define halos, given
a dark matter distribution. The most widely used method is the friends-of-friends algorithm
and its refinement called ROCKSTAR algorithm. These algorithms are also called halo find-
ers. The result is a halo catalogue with various masses, and various other properties, such as
center-of-mass position and velocity. Every particle is assumed to be inside only one halo.
It turns out that to good approximation the spherically averaged mass density profile of a
dark matter halo is described by the Navarro-Frenk-White (NFW) profile, given by
$\rho(r|m, z) = \frac{\rho_s}{(r/r_s)(1 + r/r_s)^2}$  (24.1)
It is a function of halo mass $m$ and redshift (or time). The scale radius $r_s$ and the density $\rho_s$ can
be expressed in terms of the halo mass. Note that this profile needs to be cut off at some radius for
the mass integral to be finite. The NFW profile is well-known enough that there is a plot on Wikipedia
(https://en.wikipedia.org/wiki/Navarro%E2%80%93Frenk%E2%80%93White_profile).

Figure 29. Halo formation. Halos form where the smoothed density field crosses the critical density. For
illustration, we plot a single large-scale mode (dashed) and a few small scale modes. Figure adapted from
Baumann’s Cosmology book.

24.2 Halo mass function


The main statistical property of halos is their abundance. It is described by the halo mass
function n(m, z) which gives the differential number density of halos with respect to mass at a
given mass m and red shift z. It is possible to calculate the halo mass function approximately
using a method called the Press-Schechter formalism. The main ideas are the following:

• Matter perturbations on large scales are Gaussian and grow with the growth factor D(z).

• We can smooth the density field on various scales R.

• In spots where the smoothed density field crosses the critical density $\delta_c$, a halo will form.
Because perturbations grow, new halos will form over time. It turns out that the critical
density is independent of the halo mass or smoothing scale and is about $\delta_c \approx 1.686$, which
can be derived from Newtonian (spherical collapse) gravity. This is illustrated in Fig. 29.

• Since this picture depends on the smoothing scale R, in principle smaller halos can be
contained in larger ones. This is handled more carefully in the extended Press-Schechter
formalism.

We don’t have time to derive the mathematical results, but I want to show you the widely
used result. The halo mass function can be expressed as

$n(m, z) = \frac{\rho_m}{m^2}\, f(\sigma, z)\, \frac{d\ln\sigma(m, z)}{d\ln m}$,  (24.2)
where ρm is the mean matter density. The quantity σ 2 (m, z) is the variance of mass within a
sphere of radius R(m) defined as
$\sigma^2(m, z) = \frac{1}{2\pi^2}\int_0^\infty dk\, k^2\, P^{\rm lin}(k, z)\, W^2(kR)$  (24.3)

Figure 30. Sheth-Tormen mass function at different redshifts (from 2108.04279).

Here, R = R(m) and the window function in Fourier space is

$W(kR) = \frac{3\left[\sin(kR) - kR\cos(kR)\right]}{(kR)^3}$  (24.4)

where m and R are related by the mean density as

$m = \frac{4\pi}{3}\,\rho_m R^3$.  (24.5)

R can be interpreted as the radius we need to collect primordial mass from to form the halo. The
term f (σ, z) is called the halo multiplicity and one often assumes the Sheth-Tormen halo
multiplicity function:
$f(\sigma, z) = A\,\sqrt{\frac{2a}{\pi}}\left[1 + \left(\frac{\sigma^2}{a\,\delta_c^2}\right)^p\right]\frac{\delta_c}{\sigma}\,\exp\left(-\frac{a\,\delta_c^2}{2\sigma^2}\right)$  (24.6)

with A = 0.3222, a = 0.75, p = 0.3, and δc = 1.686. The resulting mass function is plotted in
Fig. 30.
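To make Eqs. (24.2)-(24.6) concrete, here is a minimal numpy sketch; the linear power spectrum arrays are assumed to be supplied (e.g. from CAMB/CLASS) in consistent comoving units:

```python
import numpy as np

def sigma_of_R(R, k, plin):
    """sqrt of Eq. (24.3): top-hat smoothed variance of the linear field."""
    x = np.outer(R, k)
    W = 3 * (np.sin(x) - x * np.cos(x)) / x**3
    return np.sqrt(np.trapz(k**2 * plin * W**2, k, axis=1) / (2 * np.pi**2))

def f_sheth_tormen(sigma, A=0.3222, a=0.75, p=0.3, dc=1.686):
    """Sheth-Tormen multiplicity, Eq. (24.6), written with nu = dc/sigma."""
    nu = dc / sigma
    return (A * np.sqrt(2 * a / np.pi) * (1 + (a * nu**2) ** -p)
            * nu * np.exp(-a * nu**2 / 2))

def dn_dlnm(m, k, plin, rho_m):
    """dn/dln m = m n(m), from Eq. (24.2), with R(m) from Eq. (24.5)."""
    R = (3 * m / (4 * np.pi * rho_m)) ** (1.0 / 3.0)
    sig = sigma_of_R(R, k, plin)
    dlns_dlnm = np.gradient(np.log(sig), np.log(m))
    return rho_m / m * f_sheth_tormen(sig) * np.abs(dlns_dlnm)
```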
The halo mass function, as a function of cosmological parameters, can also be “learned” from
simulations. This is done for example in 1804.05866, 2003.12116. By measuring the HMF from the
data and comparing it to the theoretical expectation from simulations one can then in principle
measure cosmological parameters. This is called cluster abundance or cluster counting. While
small halos may be very sensitive to unknown baryonic physics, the largest halos are dominated
by gravity and might provide reliable measurements.

24.3 Halo bias


The Press-Schechter formalism can also be used to calculate the halo bias as a function of mass. This
can be done using the peak-background split argument. The basic idea is to split perturbations
into long modes (background) δb and short modes (peaks) δh as

δ = δh + δb (24.7)

Figure 31. Example of contributions to the 1-halo and 2-halo power spectra.

The short modes will eventually form halos. The long modes can be interpreted as locally
shifting the required critical density for the short modes to form halos. This is illustrated in Fig.
29 (dotted line is the long mode). By expanding the mass function to linear order in δb one can
derive the linear halo bias. This leads to:
$b_h(m, z) = 1 + \frac{1}{\delta_c}\,\frac{d\log f}{d\log\sigma}$  (24.8)
Note that the halo bias satisfies a consistency relation:
$\int_{-\infty}^{\infty} d\ln m\; m\,n(m, z)\,\frac{m}{\rho_m(z)}\,b_h(m, z) = 1$,  (24.9)

i.e. the total matter field comprised of all halos is unbiased. Note that bias can be smaller than
one (and even negative, for voids, which preferentially form in underdense regions). The bias of
typical galaxies in a survey is larger than one.

24.4 Halo model


The halo model is the standard tool to forecast (and sometimes analyze) observables in the non-
linear regime. Despite being a phenomenological description, it agrees rather well with numerical
results in many cases. In the halo model, one makes the fundamental assumption that all the
dark and baryonic matter is bound up in halos with varying mass and density profiles. The
correlation function for density fluctuations then receives two contributions: a "two-halo term"
which arises from the clustering properties of distinct halos, and a "one-halo term" which arises
from the correlation in density between two points in the same halo. This is illustrated in Fig. 31.
A review of the halo model can be found in astro-ph/0206508. This section is based on appendix
A of 1810.13423.

137
24.4.1 Dark matter
In Fourier space, the dark matter power spectrum is given by
$P_{mm}(k, z) = P_{mm}^{1h}(k, z) + P_{mm}^{2h}(k, z)$  (24.10)

$P_{mm}^{1h}(k, z) = \int_{-\infty}^{\infty} d\ln m\; m\,n(m, z)\left(\frac{m}{\rho_m}\right)^2 |u(k|m, z)|^2$  (24.11)

$P_{mm}^{2h}(k, z) = P^{\rm lin}(k, z)\left[\int_{-\infty}^{\infty} d\ln m\; m\,n(m, z)\,\frac{m}{\rho_m}\,b_h(m, z)\,u(k|m, z)\right]^2$  (24.12)

In these expressions, m is the halo mass, ρm is the present day cosmological matter density,
n(m, z) is the halo mass function (i.e. the differential number density of halos with respect
to mass), u(k|m, z) is the normalized fourier transform of the halo profile, P lin (k) is the linear
matter power spectrum, and bh (m, z) is the linear halo bias. The one halo term is the shot noise
convolved with the profile.
We need u(k|m, z), the Fourier transform of the dark matter halo density profile, which for
spherically symmetric profiles is defined as
$u(k|m, z) = \int_0^{r_{\rm vir}} dr\, 4\pi r^2\, \frac{\sin(kr)}{kr}\, \frac{\rho(r|m, z)}{m}$.  (24.13)

We assume that halos are truncated at the virial radius, and have mass
$m = \int_0^{r_{\rm vir}} dr\, 4\pi r^2\, \rho(r|m, z)$  (24.14)

Note that with this definition of mass, $u(k|m, z) \to 1$ as $k \to 0$. Returning to the two-halo
term and using the consistency relation in Eq. (24.9), this property of $u(k|m, z)$ ensures that
$P_{mm}^{2h}(k, z) \simeq P^{\rm lin}(k, z)$ in the limit $k \to 0$, as it should.
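For concreteness, a small sketch evaluating Eq. (24.13) for a truncated NFW profile by direct radial integration (closed-form expressions in terms of sine and cosine integrals also exist):

```python
import numpy as np

def u_nfw(kvals, rs, rvir, rho_s=1.0):
    """Normalized Fourier transform of a truncated NFW profile, Eqs. (24.13)-(24.14).
    Note u -> 1 as k -> 0 by construction."""
    r = np.linspace(1e-4 * rvir, rvir, 4096)
    rho = rho_s / ((r / rs) * (1 + r / rs) ** 2)
    mass = np.trapz(4 * np.pi * r**2 * rho, r)
    # np.sinc(x) = sin(pi x)/(pi x), so sin(kr)/(kr) = np.sinc(k r / pi)
    return np.array([np.trapz(4 * np.pi * r**2 * np.sinc(kk * r / np.pi) * rho, r)
                     for kk in np.atleast_1d(kvals)]) / mass
```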

24.4.2 Baryons, Galaxies and other observables


It is straightforward to generalize the halo model to fields other than dark matter. For
example, the baryonic gas distribution in the halo model is modelled by assuming gas is bound
within dark matter halos, having density profiles ρgas (m, z) which we assume to be a function
of the host halo mass and redshift only. The gas power spectrum is given by Eq. 24.10 with
u(k|m, z) calculated through Eq. 24.13 by replacing ρ(m, z) with ρgas (m, z). The halo model can
thus be used to calculate small scale observables such as kSZ, tSZ and gravitational lensing, as
well as their cross-correlation.
There are also extensions of the halo model that can be used to calculate the distribution of
galaxies. The complication for galaxies is that we often observe them in groups of smaller galaxies
surrounding a larger galaxy. The standard treatment of galaxies in the halo model assumes that
a dark matter halo is filled up with galaxies according to the halo occupation distribution
(HOD). This HOD often assigns a central galaxy to the center of the halo, and a distribution
of satellite galaxies around them, where more massive halos have more satellite galaxies. The
HOD can be calibrated with observations. The HOD is also used to populate dark matter
simulations with galaxies.

Halo model power spectra can be calculated with various codes, such as
https://github.com/borisbolliet/class_sz. The halo model can also be used to calculate higher N-point
functions such as the bispectrum. While the halo model is powerful, remember however that
the assumption of a set of spherical halos that includes all matter is not a very realistic one.

25 Analyzing a Galaxy Survey Power Spectrum


To measure cosmological parameters from the two point function for galaxy surveys, both the
correlation function in position space and the power spectrum in harmonic space are frequently
used as a basis for the likelihood. Under sufficient conditions, both analyses should give the same
result. The harmonic space analysis is more directly related to the perturbation theory, and this
is the approach we are discussing here.
There is also a difference in the coordinate basis between photometric surveys and spectroscopic
surveys. Photometric galaxy samples are usually binned into a small number of 2d maps,
as discussed in Sec. 20.8. The analysis then works similarly to that of the CMB (e.g. one can
use the PyMaster code to take the power spectrum in each bin). On the other hand, spectroscopic
surveys are done in 3d redshift space. We are focussing on the spectroscopic case here.
The goal of power spectrum analysis is of course to fit a theoretical parameterization of the
power spectrum, such as the EFT model discussed in Sec 22.6, to a measurement of the power
spectrum, using some likelihood. We already discussed the likelihood step in Sec. 10.3.1 for
N-body data, and for real galaxy surveys it works conceptually the same.

25.1 Power spectrum estimator


As we have seen in our N-body analysis, if we could measure the universe uniformly without a
mask or noise or RSD, power spectrum estimation would just be taking the Fourier transform
and squaring the modes. For a real galaxy survey, the analysis is more complicated due to the
mask and noise properties of a real galaxy survey.
The most widely used power spectrum estimator is the so-called FKP estimator (Feldman-
Kaiser-Peacock). It is described in the original paper astro-ph/9304022, and a modern version is
discussed for example in 1505.05341, 1704.02357. The FKP estimator is computationally tractable
and near optimal for current surveys. As is the case for the CMB, one can also define an optimal
quadratic estimator (quasi maximum likelihood QML estimator), which is computationally more
involved. A summary of power spectrum estimation, and a discussion of the optimal quadratic
estimator, is given in Oliver Philcox’ PhD thesis http://arks.princeton.edu/ark:/88435/
dsp01v692t9422.
Let’s sketch the FKP estimator. To take into account the mask of a galaxy survey, the common
procedure is to generate a random catalog (also called synthetic catalog) of galaxy positions
from the mask of the survey. The random catalog accounts for the angular mask and radial
selection function of the survey. It is generated by throwing random galaxies into the survey
volume in a Poisson process, which is not modulated by the cosmological power spectrum. I.e. it
gives us galaxy positions as we would observe them if galaxies were unclustered. You can often
download such random catalogs in addition to the true data from a galaxy survey collaboration.
We begin by defining the weighted galaxy density field in redshift space, $F(r)$,

$F(r) = \frac{w(r)}{I^{1/2}}\left[n(r) - \alpha\, n_s(r)\right]$,  (25.1)
where $n$ and $n_s$ are the observed number density fields of the galaxy catalog and the synthetic catalog
of random objects, respectively. Here we have assigned the galaxies to a regular grid using some
mass assignment scheme such as CIC. The factor $w(r)$ is a general weight factor which we discuss
shortly. The factor $\alpha$ normalizes the synthetic catalog to the number density of the galaxies, so
that $\langle F \rangle = 0$. The field $F(r)$ is normalized by the factor $I \equiv \int dr\, w^2(r)\, \bar{n}^2(r)$.

The estimator for the multipole moments (recall that we are in redshift space) of the power
spectrum is

$\hat{P}_\ell(k) = \frac{2\ell + 1}{I}\int \frac{d\Omega_k}{4\pi}\left[\int dr_1 \int dr_2\, F(r_1)F(r_2)\, e^{i k\cdot(r_1 - r_2)}\, L_\ell(\hat{k}\cdot\hat{r}_h) - P_\ell^{\rm noise}(k)\right]$,  (25.2)

where $\Omega_k$ represents the solid angle in Fourier space, $r_h \equiv (r_1 + r_2)/2$ is the line of sight to the
mid-point of the pair of objects, and $L_\ell$ is the Legendre polynomial of order $\ell$. The shot noise
$P_\ell^{\rm noise}$ is

$P_\ell^{\rm noise}(k) = (1 + \alpha)\int dr\, \bar{n}(r)\, w^2(r)\, L_\ell(\hat{k}\cdot\hat{r})$,  (25.3)

This expression can be simplified further to obtain the Yamamoto estimator.


FKP showed that a good power spectrum estimator can be obtained with the FKP weight

$w(r) = \frac{1}{1 + \bar{n}(r)\,P(k)}$  (25.4)

where $\bar{n}$ is the unclustered mean number density and $P(k)$ a fiducial power spectrum. This
weight thus down-weights regions that we observed deeply. The FKP estimator $\hat{P}_\ell(k)$ estimates
the power spectrum convolved with $w(r)$. So when we compare the measured power spectrum to
the theory, we also need to convolve the theory prediction with $w(r)$.
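As a toy version of this pipeline, the following sketch builds $F(r)$ on a grid from hypothetical, pre-gridded data and random number densities, and measures the monopole assuming a single global line of sight; the shot-noise subtraction of Eq. (25.3) and the window convolution of the theory are omitted:

```python
import numpy as np

def fkp_monopole(n_gal, n_ran, alpha, w, nbar, boxsize, nbins=30):
    """FKP monopole sketch: F(r) from Eq. (25.1), then binned |F(k)|^2.
    n_gal, n_ran, w, nbar are gridded fields of shape (n, n, n)."""
    n = n_gal.shape[0]
    vcell = (boxsize / n) ** 3
    I = np.sum(w**2 * nbar**2) * vcell            # I = int dr w^2 nbar^2
    F = w * (n_gal - alpha * n_ran) / np.sqrt(I)
    Fk = np.fft.fftn(F) * vcell                   # discrete approx. of int dr F e^{ikr}
    kf = 2 * np.pi / boxsize
    k1d = kf * np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    edges = np.linspace(kf, kmag.max() / np.sqrt(3), nbins + 1)
    pk = np.zeros(nbins)
    for i in range(nbins):
        sel = (kmag >= edges[i]) & (kmag < edges[i + 1])
        pk[i] = np.mean(np.abs(Fk[sel]) ** 2)     # minus P_noise in a real analysis
    return 0.5 * (edges[1:] + edges[:-1]), pk
```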

25.2 Covariance matrix estimation


To extract cosmological parameters for the estimated P̂ℓ (k) we lack one last crucial ingredient,
the covariance matrix of P̂ℓ (k). Once we have that, we can make a Gaussian likelihood as we did
in Sec. 10.3.1 for N-body data, and fit our theory model to the estimates.
Calculating the covariance matrix analytically is in general not possible, although approximations
exist. Not only do we need to take into account the survey geometry, but unlike the CMB,
the observed galaxy modes are also correlated due to non-linear evolution. The covariance
matrix is thus estimated from simulations, sometimes called mock catalogs. These simulations
Making realistic simulations of galaxy survey volumes is computationally intense and one uses
simplified simulations rather than full N-body dynamics. The required number of simulations is
order 1000 for a typical power spectrum pipeline.
The covariance matrix can be extracted from the mocks as
$C_{ij}^{(\ell\ell')} = \frac{1}{N_m - 1}\sum_{n=1}^{N_m}\left[P_{\ell,n}(k_i) - \bar{P}_\ell(k_i)\right]\left[P_{\ell',n}(k_j) - \bar{P}_{\ell'}(k_j)\right]$,  (25.5)

where Nm is the number of mock catalogs and P̄ℓ (k) is the mean power spectrum,
$\bar{P}_\ell(k) = \frac{1}{N_m}\sum_{n=1}^{N_m} P_{\ell,n}(k)$.  (25.6)

This is done at a fiducial cosmology. There are some subtleties with covariance matrix estimation,
see in particular the Hartlap factor correction which affects the inverse covariance matrix at
the level of a few percent.
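A compact sketch of Eqs. (25.5)-(25.6) together with the Hartlap correction, the factor $(N_m - N_b - 2)/(N_m - 1)$ that debiases the inverse of a noisy sample covariance with $N_b$ bins:

```python
import numpy as np

def mock_covariance(pk_mocks):
    """Eq. (25.5). pk_mocks has shape (N_m, N_b): one row of stacked multipole
    band powers per mock catalog."""
    nm = pk_mocks.shape[0]
    dev = pk_mocks - pk_mocks.mean(axis=0)        # Eq. (25.6) is the mean
    return dev.T @ dev / (nm - 1)

def hartlap_precision(cov, nm):
    """Debiased inverse covariance for use in a Gaussian likelihood."""
    nb = cov.shape[0]
    return (nm - nb - 2) / (nm - 1) * np.linalg.inv(cov)
```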

26 Non-Gaussianity
Let’s briefly discuss going beyond the power spectrum. Here we are concerned not primarily with
primordial non-Gaussianity (see Sec. 16 in the CMB unit), but rather with gravitational and
baryonic interaction.

26.1 Tightening measurements of cosmological parameters


In galaxy surveys, unlike the CMB, higher N-point functions are non-zero even in the absence of
primordial non-Gaussianity. This is of course because of non-linear coupling. Two fields that have
the same power spectrum can look very different, because the difference can be encoded in higher
order correlation functions. It is now quite common to measure also the galaxy bispectrum

Bg (k1 , k2 , k3 ) ∼ ⟨δg (k1 )δg (k2 )δg (k3 )⟩ (26.1)

and extract cosmological parameters from it, together with the power spectrum. Note that the
bispectrum and power spectrum estimators have a covariance, they are not independent, due to
mode coupling. At perturbative scales which we can use for cosmological analysis, including the
bispectrum improves cosmological parameters by 10 to 30% (2206.08327). Bispectrum parameter
estimation works the same as power spectrum parameter estimation, i.e. we need a bispectrum
estimator, a theoretical model of the bispectrum and a likelihood with covariance.
Of course there are even higher point correlation functions. The next is the galaxy trispec-
trum

Tg (k1 , k2 , k3 , k4 ) ∼ ⟨δg (k1 )δg (k2 )δg (k3 )δg (k4 )⟩ (26.2)

The trispectrum is not yet normally used for galaxy survey analysis, but should squeeze some more
signal-to-noise out of cosmological parameter constraints (in particular by breaking degeneracies
with biases). Higher N-point functions become progressively more difficult to model theoretically
and more computationally difficult to estimate in the data. In the perturbative regime, higher
N-point functions have progressively less signal-to-noise, since they are higher order in the small
initial perturbations. So there is no point in continuing this to ever higher order correlators. On
non-perturbative scales, it is likely that N-point functions are not the right thing to do, as we
discuss below.

26.2 Primordial non-Gaussianity
Higher N-point functions are also a way to measure primordial non-Gaussianity (e.g. review
1412.4671). As in the CMB, in general the most promising observable is the bispectrum. The
problem with non-Gaussianity estimation is to tell apart the signal coming from non-linear evolu-
tion and that of primordial origin. The degeneracy of the two signals severely degrades constraints
on primordial non-Gaussianity from galaxy surveys. Even next generation galaxy surveys can
only about equal (2211.14899) existing constraints from Planck for equilateral and orthogonal
non-Gaussianity. However in the far future, we hope that intensity mapping of the dark ages can
improve constraints by orders of magnitude (1610.06559).
The situation is better for local non-Gaussianity, or any signal that peaks in the squeezed
limit. Interestingly, in that case there is an observable signal in the galaxy power spectrum
called scale-dependent bias. Scale-dependent bias is likely to improve the constraint on $f_{NL}^{\rm local}$
by a factor of 10 or so over Planck within the next 10 years. It leads to a
characteristic kink of the galaxy power spectrum on large scales. I have spent a lot of time
with this signal in my own research and hope to add a discussion here later.

27 Galaxy Weak Lensing


So far we have focussed our discussion of large-scale structure on the galaxy density. There is
a second probe of the matter distribution using galaxies, which is weak lensing of galaxies.
Reviews of galaxy weak lensing include 1612.06535, 1710.03235, 2007.00506.
The key to weak lensing is measuring the subtle distortions in the shapes of galaxies
caused by gravitational lensing due to the matter distribution between the source galaxy
and us. These distortions are typically small (a few percent or less). The lensing effect is thus
weak and not detectable in individual galaxies but becomes apparent when analyzing the shapes of
many galaxies statistically. As is the case for CMB lensing, one can use the measured distortions
to reconstruct the underlying mass map. Of course, measuring galaxy shapes is difficult and
many systematic errors have to be overcome, such as atmospheric distortion, imperfections in the
telescope optics, and the intrinsic shapes of galaxies and their intrinsic alignment with each other.
The result of these statistical shape measurements is the galaxy lensing convergence field
κg . Once measured, its power spectrum can be used to constrain cosmological parameters.
Lensing has an advantage over galaxy clustering in that it is mostly sensitive to dark matter, and
thus much less sensitive to baryonic physics than galaxy clustering. For that reason one can use
lensing to probe somewhat higher k scales reliably than is possible with galaxy clustering.
Galaxy weak lensing is often used in cross-correlation with galaxy clustering and even CMB
lensing. Let’s now discuss these cross-correlations. We use capital Roman subscripts to denote
observables, A, B ∈ {δg , κg , κCMB }, where δg denotes the density contrast of lens galaxies, κg
the lensing convergence of source galaxies, and κCMB the CMB lensing convergence. Using the
galaxy data only we get the so-called 3x2 analysis which includes
• galaxy clustering ($C_\ell^{g_i g_j}$)

• galaxy-galaxy lensing ($C_\ell^{g_i \kappa_{{\rm gal},j}}$)

• cosmic shear tomography ($C_\ell^{\kappa_{{\rm gal},i}\,\kappa_{{\rm gal},j}}$).

The indices $i$ and $j$ indicate redshift bins. Adding the CMB lensing convergence field, we can
extend the data vector with 3 more two-point functions:

• galaxy-CMB lensing ($C_\ell^{g_i \kappa_{\rm CMB}}$)

• CMB lensing-galaxy lensing ($C_\ell^{\kappa_{\rm CMB}\,\kappa_{{\rm gal},j}}$)

• CMB lensing power spectrum ($C_\ell^{\kappa_{\rm CMB}\,\kappa_{\rm CMB}}$)

This can be called a 6x2 analysis. The angular power spectrum between redshift bin $i$ of observable
$A$ and redshift bin $j$ of observable $B$ at multipole $\ell$ (using the Limber approximation)
is given by

$C_{AB}^{ij}(\ell) = \int d\chi\, \frac{W_A^i(\chi)\, W_B^j(\chi)}{\chi^2}\, P_m\!\left(\frac{\ell + 1/2}{\chi},\, z(\chi)\right)$,  (27.1)
where $\chi$ is the comoving distance, $P_m(k, z)$ is the matter power spectrum, and $W_A^i(\chi)$, $W_B^j(\chi)$
are weight functions of the observables $A$, $B$ given by

$W_{\delta_g}^i(\chi) = b_g^i\, \frac{n_{\rm lens}^i(z(\chi))}{\bar{n}_{\rm lens}^i}\, \frac{dz}{d\chi}$,  (27.2)

$W_{\kappa_g}^i(\chi) = \frac{3H_0^2\Omega_m}{2c^2}\, \frac{\chi}{a(\chi)}\int_{\chi^i_{\rm min}}^{\chi^i_{\rm max}} d\chi'\, \frac{n_{\rm source}^i(z(\chi'))}{\bar{n}_{\rm source}^i}\, \frac{dz}{d\chi'}\, \frac{\chi' - \chi}{\chi'}$,  (27.3)

$W_{\kappa_{\rm CMB}}(\chi) = \frac{3H_0^2\Omega_m}{2c^2}\, \frac{\chi}{a(\chi)}\, \frac{\chi_* - \chi}{\chi_*}$,  (27.4)

where χimin/max are the minimum and maximum comoving distance of the redshift bin i. Here
a(χ) is the scale factor, Ωm the matter density fraction at present, H0 the Hubble constant, big
is the galaxy bias in bin i, and χ∗ the comoving distance to the surface of last scattering. Note
that the weight function of κCMB does not depend on redshift bins. The galaxy density and
CMB convergence weight functions we have encountered before in these lectures. The galaxy
lensing weight function integrates the lensing effect over the source density in a bin i. Details
on cross-correlation analyses are given in 1607.01761 and 2108.00658. Of course, one can also
consider bispectra involving the three signals.
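To show how Eq. (27.1) is used in practice, here is a minimal sketch of the Limber integral with the CMB lensing kernel of Eq. (27.4); the matter power spectrum interpolator pk(k, z) and the background functions are assumed to be supplied (e.g. from CAMB/CLASS):

```python
import numpy as np

def w_kappa_cmb(chi, a_of_chi, chi_star, H0, Om, c=299792.458):
    """CMB lensing kernel, Eq. (27.4); H0 in km/s/Mpc, chi in Mpc."""
    return 1.5 * Om * (H0 / c) ** 2 * chi / a_of_chi * (chi_star - chi) / chi_star

def limber_cl(ells, chi, wa, wb, z_of_chi, pk):
    """Limber integral, Eq. (27.1), for kernels wa, wb tabulated on a chi grid."""
    cl = []
    for ell in ells:
        k = (ell + 0.5) / chi
        cl.append(np.trapz(wa * wb / chi**2 * pk(k, z_of_chi), chi))
    return np.array(cl)
```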

28 Modern Inference Methods


In this section we discuss modern inference techniques that leverage machine learning and/or
auto-differentiation. Machine learning in cosmology is reviewed for example here: 2210.01813. I
will cite some references here, with preference to papers I know (i.e. the list is not ordered by
precedence or importance).

28.1 Overview
In recent years, a lot of effort has been made in the community to go beyond power spectra, bispectra and
Gaussian likelihood approximations. The hope of course is to extract more sensitive parameter
constraints from the data. The broad tools we use for this include simulations, optimization
(auto-differentiation), and the many forms of machine learning. I will try to give you a broad
overview with suitable references to study more. Despite massive effort, it is still somewhat
debated whether these methods really allow us to get better parameter constraints from real
experiments. This is because the methods need to be robust with respect to non-linear small-
scale physics, which is difficult to achieve. Currently most state-of-the-art constraints still come
from a more traditional analysis. See 2405.02252 for a recent quantitative comparison of some of
these methods with traditional approaches.
Modern methods can be broadly classified into two different categories, which we discuss in
more detail below:

• Simulation-based inference (SBI), also called Likelihood-free inference (LFI) or


implicit inference uses simulations to learn how a summary statistic depends on cos-
mological parameters. The summary statistic can itself be learned. For example, we can
train a neural network on simulations of galaxy catalogs to infer cosmological parameters
from them.

• Probabilistic forward modelling, also called explicit inference or Bayesian Hierar-


chical Modelling. This means that we keep track of all latent variables of the simulator.
In practice it means that we jointly reconstruct the initial conditions of the universe together
with cosmological parameters, as we will see.

In both cases we need a forward model, which can be a simulator, a neural network, or
even analytic perturbation theory. The forward model maps cosmological parameters
and initial conditions to observable data. Of course a crucial aspect of the forward model
is that it is accurate at the scales we are interested in and that it is computationally tractable to
evaluate (as often as required by the chosen parameter inference method). These conditions are
not easy to meet.

28.2 Simulation-based Inference


For a review of SBI see 1911.01429. A code implementation of SBI is https://sbi-dev.github.
io/sbi/. SBI can also be implemented using more general probabilistic machine learning pack-
ages such as https://docs.pyro.ai/. More details on SBI can also be found in my ML in
Physics lecture slides here https://ai.physics.wisc.edu/teaching/.

28.2.1 Summary Statistics


The first step of SBI involves compressing the data d into summary statistics x. The max-
imum possible compression is a summary statistic of the same dimensionality as the parameters
Θ we want to learn (1712.00012). We want summary statistics to be as good as possible, ideally
they would be optimal which means they lose no information.
Here are some summary statistics that are useful in cosmology:

• Power spectrum and bispectrum of course.

• Cluster or Galaxy number density and mass distribution. Cluster counting, espe-
cially of very massive clusters which are less sensitive to baryonic physics, can be used to
constrain cosmological parameters.

• Moments of the matter distribution $\langle\delta_m^N(x)\rangle$ as a function of smoothing scale.

• Wavelet scattering transform. Uses the distribution of wavelet transform coefficients


for various scales.

• Topological data analysis. Aims to use the distribution of topological features (“sim-
plices”) in the data.

• Minkowski functionals. Some other morphological characterization.


This list is not complete but probably covers the most important ones.

28.2.2 Learned Summary Statistics


A learned summary statistic, e.g. a 3d convolutional neural network, or a graph neu-
ral network applied to the galaxy field and trained to measure cosmological parameters, is in
principle optimal assuming that
• It has enough capacity (it can represent the required function with its weights).

• There was enough training data to learn the required function.

• The optimizer did find the global minimum.

• The simulation can be trusted on the scales that the neural network gets to see (e.g. we
can filter out small scales first to make it more robust, but losing sensitivity).
These conditions don't necessarily hold in practice, so there is still some interest in coming up
with new “hand made” summary statistics. A neural network is usually trained to directly give
estimates of the cosmological parameters, while for other summary statistics there is a second
step involved in mapping them to cosmological parameters. A neural network can also be trained
to estimate error bars and covariances for its measurements. However, a more robust approach
is to learn these error bars after training in a second step, which we discuss now.

28.2.3 Neural Density Estimation (NDE)


After having obtained a measurement of the summary statistics $x_i$, either learned or hand-crafted or a
combination thereof, the second step is to infer parameters Θ. Sometimes it is a good approx-
imation to assume that the likelihood is Gaussian with a fiducial covariance matrix, which we
can estimate from simulations as discussed for the power spectrum above. More generally, the
relation has to be learned probabilistically using a (neural) density estimator.
To do so, we first need to create a dataset of parameters and simulated summaries

$\{(\theta_n, x_n)\}_{n=1}^{N_{\rm sim}}$  (28.1)

Now we want to learn either of the following functions:

• In Neural Likelihood Estimation (NLE) we learn the likelihood p(x|θ), i.e. the con-
ditional probability of the summary x given the parameters θ. If we have learned the full
distribution, we can do fast amortized inference, which means that we do not need to run
new simulations to get the posterior for a new set of observations. On the other hand, it can
be too expensive to run enough such simulations, in which case one can use a Sequential
Neural Likelihood Estimator (SNLE) which focuses on learning the likelihood near
the observed data. Once the neural likelihood is learned, we multiply by a prior and run
the ususal MCMC.

• In Neural Posterior Estimation one learns directly p(θ|x). One thus does not have to
run an MCMC anymore.

• In Neural Ratio Estimation we learn the likelihood-to-evidence ratio using a classifier.

To learn either of the likelihood or posterior we need a parametric density model, that
is some function that we can fit to the data set Eq. (28.1) by adjusting its parameters. The
state-of-the-art to do this is to use so-called normalizing flows, for example the masked au-
toregressive flow (MAF). A normalizing flow is a neural network that transforms a simple
base distribution (usually a Gaussian) into a complicated target distribtion by learning a series
of diffeomorphisms (i.e. an invertible and differentiable change of coordinates). Fitting/Training
the normalizing flows works by adjusting its weights using auto-differentiation, in the same way
as ordinary neural networks are trained (though with a different loss function). By now SBI
methods have been fairly well established and, as in the case of MCMC, you don't necessarily
have to understand the methods in great detail to use them. A key challenge is to check that
the learned likelihood or posterior is correct (especially that it is not overconfident). One
approach to do so is called simulation-based calibration. My lecture slides on AI in Physics
https://ai.physics.wisc.edu/teaching/ contain more details of the above methods, for ex-
ample the required neural network training objectives.
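As an illustration, a minimal neural posterior estimation workflow with the sbi package might look as follows (API details vary between sbi versions; the training arrays theta_train, x_train and the observed summary x_obs are hypothetical):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Flat priors on, say, 5 cosmological parameters (hypothetical ranges).
prior = BoxUniform(low=torch.zeros(5), high=torch.ones(5))

theta = torch.as_tensor(theta_train, dtype=torch.float32)  # shape (N_sim, 5)
x = torch.as_tensor(x_train, dtype=torch.float32)          # shape (N_sim, N_summary)

inference = SNPE(prior=prior)                 # neural posterior estimation
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

# Amortized inference: draw posterior samples at the observed summary statistic.
samples = posterior.sample((10000,),
                           x=torch.as_tensor(x_obs, dtype=torch.float32))
```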

28.2.4 SBI results in cosmology


One large project in cosmology that uses various summary statistics and SBI for parameter infer-
ence is the SimBig project. Recently this project analyzed BOSS data in 2211.00723, 2310.15246.
The simulation training data comes from the ∼20,000 Quijote dark matter simulations, which
are forward modeled to include galaxies (through an HOD), survey geometry and observational
systematics. The result is that they can tighten parameter constraints by a moderate factor over
the power spectrum analysis. This is a large and encouraging effort. However it is still somewhat
unclear how it compares to an EFT based analysis. The improvement factor depends on the
non-linear cutoff scale, and in simulations it is not possible to marginalize over biases in the same
way as we can in the perturbative analysis. So as soon as we tighten constraints over the EFT
(including power spectrum, bispectrum and perhaps trispectrum) one has to be careful about
the robustness of the measurement. Further, the Quijote simulations have limited volumes, span
a limited model and parameter space, and do not include any hydrodynamics. In the foreseeable
future we will use both EFT and SBI analyses and see how well they agree.

28.2.5 Theory emulators
An approach that is related to SBI and getting popular in cosmology is the generation of neural
network based emulators of summary statistics, in particular of the power spectrum. Even for
linear physics, running CAMB or CLASS at each point in the Monte Carlo chain is annoyingly
slow. To speed this up, cosmologists have trained neural networks to emulate the Boltzmann
solver, i.e. provide the power spectrum P theo,lin (k, Θ) as a function of Θ. An example of this is
Cosmopower 2106.03846. Using an emulator, the MCMC will run much faster than with the full
Boltzmann solver.
Using simulations with different cosmological parameters, one can also make a power spectrum
emulator of the non-linear matter power spectrum. This was done for example here 2207.12345.
It is also possible to combine dark matter simulations with a biasing model (1910.07097) to obtain
a bias dependent emulator of the halo (or galaxy) power spectrum. Emulators can be paired with
SBI by learning their likelihood with NDEs.
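Schematically, such an emulator is just a regression network. A minimal PyTorch sketch (all training tensors hypothetical, e.g. log P(k) computed with CLASS on a Latin hypercube of parameter values):

```python
import torch
import torch.nn as nn

class PkEmulator(nn.Module):
    """MLP mapping cosmological parameters to log P(k) on a fixed k-grid."""
    def __init__(self, n_params=5, n_kbins=200, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, n_kbins))

    def forward(self, theta):
        return self.net(theta)          # predicted log P(k) on the fixed k-grid

model = PkEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(1000):               # theta_train, logpk_train: hypothetical
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(theta_train), logpk_train)
    loss.backward()
    opt.step()
```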

28.3 Probabilistic Forward Modeling at Field Level


A second approach to cosmological parameter estimation removes summary statistics and instead
works with the entire PDF of the data. This approach has the advantage that there are no black
boxes (such as learned summary statistics) and systematics are easier to model since everything
is treated probabilistically. The downside is that it takes considerably more, often intractable
amounts of, computational resources. Let’s see how this works.
Let's first assume that we have a deterministic simulator $f(s, \Theta)$, our forward model. It maps
the (near-)Gaussian initial conditions $s$ (the primordial density perturbations, usually including
the BAO feature) to the observable smoothed galaxy density field $d$, assuming cosmological
parameters $\Theta$ (e.g. $\Omega_m$, $H_0$). For example, $f$ can be a dark matter simulation with a bias model for
galaxies, and $s$ would be the $N_{\rm side}^3$ initial displacements. Second, we assume that our observed
galaxy density $d^{\rm obs}$ is equal to the true density $d$ plus some uncorrelated Gaussian noise $n$ (e.g.
shot noise):

$d^{\rm obs} = d + n$  (28.2)

where the noise has covariance $N$. The likelihood of observing $d^{\rm obs}$ given $s$ is thus given by

$\log \mathcal{L}(d|s, \Theta) = -\frac{1}{2}\left(f(s, \Theta) - d^{\rm obs}\right)^T N^{-1}\left(f(s, \Theta) - d^{\rm obs}\right) + \text{const.}$  (28.3)
We want to turn this around and get the posterior P(s, Θ|d). This will give us the joint PDF
of the initial conditions of the universe and the cosmological parameters, and thus measure both
of them. We thus need a prior on s, which is that it is a Gaussian field with some primordial
parameters Θ′ (e.g. As , ns ) that define the primordial power spectrum. The prior is

log P(s, Θ′ ) = log P(s|Θ′ ) + log P(Θ′ ) (28.4)


1 1
= − sT S(Θ′ )−1 s − log |S(Θ′ )| + log P(Θ′ ) (28.5)
2 2
(28.6)

Using Bayes' theorem we get the posterior

$\log P(s, \Theta, \Theta'|d) = -\frac{1}{2}\left(f(s, \Theta) - d^{\rm obs}\right)^T N^{-1}\left(f(s, \Theta) - d^{\rm obs}\right) - \frac{1}{2} s^T S^{-1}(\Theta') s$  (28.7)

$\qquad - \frac{1}{2}\log|S(\Theta')| + \log \mathcal{P}(\Theta') + \log \mathcal{P}(\Theta) + \text{const.}$  (28.8)
Usually we don’t care about the initial conditions (the “phases”) of the perturbations and only
want to know the cosmological parameters. In this case we need to marginalize to get
$P(\Theta, \Theta'|d) = \int ds\, P(s, \Theta, \Theta'|d)$  (28.9)

The posterior Eq. (28.7) is not so different conceptually from the posterior of the power spectrum,
but here our variables are the entire cosmological field. The reason it is even possible to write
down a posterior PDF for the field is that we know the PDF of the initial conditions, and we
can forward model the field to late times. Assuming that our forward model is correct, this
analysis will be statistically optimal, i.e. it includes all available information. The problem is
to handle such a huge computational problem. The data $d$ and initial conditions $s$ can easily be
$1000^3$-dimensional. A normal MCMC would never converge. Fortunately there are techniques
for extremely high dimensional inference. Before discussing inference let’s get an overview of the
forward models. Forward modelling does not have to be done with the galaxy density, of course;
weak lensing or intensity mapping are also appealing targets.
There are different types of forward models that are being used, in particular:

• Perturbation theory at field level plus a bias expansion (1808.02002). This case is the
most tractable, but of course cannot go beyond the regime of perturbation theory. It is
still somewhat unclear to what extent a PT forward model can outperform the traditional
analysis (2307.04706).

• Differentiable simulations (2010.11847,2211.09958), which include both structure for-


mation and some approximation of halo formation.

• Neural network emulators of structure and halo formation (1811.06533,2206.04594).

• Hybrid EFT (1910.07097), a combination of dark matter simulations and a bias expansion
in Lagrangian space.

Recall however that to draw a single sample of the posterior, we need to call the forward model.
So our forward model should be fast enough to be evaluated millions of times. Above we have
not written out nuisance parameters of the forward model and one can also include stochastic
sampling to model galaxy formation, but our discussion captures the essential features. As a side
note, these same forward models (minus the requirement for differentiability) can also be used to
get the SBI training data in Eq. (28.1).
Next to the forward model we need an inference algorithm to approximate the posterior
and/or sample from it. All of these require that the forward model is differentiable with respect to
all parameters, including the initial conditions s. This is where auto-differentiation, for example
Jax or pytorch, comes in. Inference algorithms that have been proposed include:

• Finding the MAP and making a Gaussian approximation around it for error bars (1706.06645).
Finding the MAP by gradient descent is faster than sampling, but it is hard to get reliable
error bars.

• Hamiltonian Monte Carlo (HMC), (1203.3639). This is the most reliable but also most
computationally intense approach. Recently a different variant of Monte Carlo was used
that is also promising: 2307.09504.

• Variational inference or Variational Self Boosting with normalizing flows (2206.15433).

In all of these cases it is difficult to deal with a multimodal posterior which is expected at
small scales. Even without that problem, it is difficult to generate enough independent samples
and be sure the posterior surface is well covered. Also, not all parameters are created equal
and it is hard for example to sample from band powers of the initial power spectrum. While
appealing in principle, it is still very hard to use this approach in practice on data, especially
with a non-perturbative forward model. On the other hand, forward modeling can in principle
strongly improve constraints by breaking parameter degeneracies present in N-point function
analysis (2112.14645) and is the only provably optimal approach.
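To see why differentiability is the crucial ingredient, here is a schematic JAX version of the log-posterior of Eq. (28.7) at fixed cosmological parameters, with diagonal prior and noise covariances for simplicity (in practice the prior term is diagonal in Fourier space, given by the primordial power spectrum). Auto-differentiation then supplies the gradient with respect to all initial amplitudes at roughly the cost of one extra forward-model evaluation, which is exactly what MAP optimization and HMC need:

```python
import jax
import jax.numpy as jnp

def log_posterior(s, d_obs, forward, S_diag, N_diag):
    """Sketch of Eq. (28.7) with fixed Theta: Gaussian likelihood plus
    Gaussian prior on the initial conditions s."""
    r = forward(s) - d_obs                      # forward: a differentiable simulator
    return -0.5 * jnp.sum(r**2 / N_diag) - 0.5 * jnp.sum(s**2 / S_diag)

# Gradient with respect to the ~N^3 initial-condition amplitudes s.
grad_log_posterior = jax.grad(log_posterior, argnums=0)
```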

28.3.1 Reconstruction of initial conditions


I want to briefly look at the problem above from the point of view of solving an inverse
problem. Assume that we want to reconstruct the unobservable initial perturbations s from
the observed noisy data dobs , for known fiducial cosmology parameters. This kind of problem
is called an inverse problem in statistics and computer science and many algorithms exist to
solve such problems, making various approximations. Note that the problem is ill-conditioned.
Because we don’t observe the transverse velocities of galaxies, and for other reasons, we cannot
simply simulate the observed field backwards in time. In our forward reconstruction of initial
conditions above, this problem is solved by having a prior on the initial conditions, that makes the
problem probabilistically invertible. There is a second class of algorithms to reconstruct the initial
conditions, called backwards reconstruction. This method starts from the observed field and
uses a sort of backwards perturbation theory to reconstruct the initial conditions. The simplest
form of this algorithm is called standard (BAO) reconstruction. It was devised to sharpen
the BAO feature which is being somewhat “washed out” by large-scale “bulk movements” of
matter. By first undoing the Zeldovich displacement (1-LPT), one makes the feature sharper and
can thus improve cosmological parameters from a power spectrum measurement. This method is
widely used in galaxy survey analysis. There are also more advanced iterative initial condition
reconstruction methods (1704.06634). In addition to probing BAO, one can imagine running
a bispectrum estimator on the reconstructed initial conditions to search for primordial non-
Gaussianity.
Forward reconstruction is more powerful in principle because it is clear how to include sys-
tematics (by adding them to the forward model). Backwards reconstruction on the other hand
is far less computationally demanding. Neural networks can also be trained to perform back-
wards reconstruction, see e.g. 2104.12864,2305.07018. One can also learn a probabilistic initial
conditions reconstruction with a diffusion model from N-body simulations (2304.03788) .

28.4 Generative Machine Learning at field or point cloud level
I want to briefly mention one more major area of machine learning research in cosmology, that
of generative modeling at the field level. A generative model can emulate a simulation and be
potentially much faster than the original simulation that it was trained on. Making simulations
(or emulations if you prefer) is also possible without generative (probabilistic) modelling. For
example one can train a deterministic U-net to go from initial conditions, which are very fast to
generate, to the late time matter distribution. Generative modeling on the other hand includes
a step of random sampling, i.e. every time we run the machine learning model we get a different
result.
The main machine learning models that can generate cosmological distributions such as δm (x)
at high resolution are

• GANs and WGANs (2001.05519)

• Diffusion Models (2311.05217)

• Normalizing flows (2105.12024, 2202.05282)

and all of them have been used in cosmology. One can model either density fields or displacement
fields of particles. Very recently, people also work with point cloud models (2311.17141), that
don’t generate a field (image) but a set of points (galaxies). All three generative methods can
also work with point clouds.
A main use for generative models is to speed up simulations. For example, one may be
able to upgrade a low-resolution dark matter simulation to an emulated high resolution hydro
simulation. Or one may populate a low resolution dark matter simulation with realistic galaxies.
Of course, to train all of these models one needs to have some high-resolution training simulations.
The generation process can also be conditioned on cosmological parameters. Such conditional
generative models can also be run in inverse mode to estimate cosmological parameters.
There is a rapidly growing literature in this field and I have only cited a small subset thereof.
