8.04: Quantum Physics I (OCW) : Lecturer: Professor Barton Zwiebach
IAP 2022
Fact 1
These notes were taken for my own preparation for the 8.04 Advanced Standing Exam, and they are transcribed
from video lectures found on the Spring 2016 OCW page. As a result, some content may be less complete than
in an ordinary course (because recitations aren’t available on OCW). The main reason for these notes’ existence
is to provide more continuity into my 8.05 and 8.06 notes from previous semesters.
Fact 2
Quantum mechanics should be thought of as a framework to do physics which has replaced classical mechanics
as the “correct” description at the fundamental level. While classical mechanics provides a good approximation at
everyday scales, we know that it breaks down at some point, and thus the way things really work is conceptually
very different from what our intuition tells us.
What this course (and the ones after this) will do is take the principles and the framework of quantum mechanics and
apply them to different physical phenomena, like quantum electrodynamics, quantum chromodynamics, quantum optics,
and quantum gravity. In today’s lecture and a bit of the next, we’ll cover five introductory ideas: linearity of quantum
mechanics, the necessity of complex numbers, the laws of determinism, interesting features of superposition, and
the concept of entanglement.
The first idea, linearity, is important whenever we’re considering a physical theory. Such theories have dynamical
variables that we’d like to measure (because they are connected with the values of physical observables), and these variables are tied to certain equations of motion.
Example 3
Maxwell’s theory of electromagnetism is a linear theory, meaning that if we have two different solutions to Maxwell’s
equations (such as two plane waves traveling in different directions), their sum (two plane waves propagating
simultaneously) is also a valid solution.
In other words, solutions don’t “affect” each other, and in practice this is useful because (for example) the air
around us is constantly filled with electromagnetic waves. But linearity tells us that a single phone cable can transmit
many phone calls at once, and it also tells us that if we’re doing an electromagnetism experiment, it’s okay to have
other electromagnetic waves around us in the air.
Let’s rigorize this discussion a bit more: in Maxwell’s theory, we have the variables (E⃗, B⃗, ρ, J⃗), which must satisfy Maxwell’s equations. What linearity tells us is that (αE⃗, αB⃗, αρ, αJ⃗) (for any real number α) will then also be a solution to the equations of motion, and if (E⃗₁, B⃗₁, ρ₁, J⃗₁) and (E⃗₂, B⃗₂, ρ₂, J⃗₂) are both solutions, then so is their sum (E⃗₁ + E⃗₂, B⃗₁ + B⃗₂, ρ₁ + ρ₂, J⃗₁ + J⃗₂). We haven’t written out Maxwell’s equations explicitly, but we’ll mention instead
that the fundamental concept necessary for linearity is to write an equation schematically as
Lu = 0,
where u is some unknown and L is a linear operator. More generally, we can have multiple equations of this form,
and u can be a vector of unknowns instead, but what’s really important is that L satisfies the two properties of a linear
operator:
L(au) = aLu,    L(u₁ + u₂) = Lu₁ + Lu₂.
In particular, if u₁, u₂ are solutions, then L(αu₁ + βu₂) = αLu₁ + βLu₂ = α · 0 + β · 0 = 0, and thus any linear combination of solutions is also a solution when we have a linear theory.
Example 4
The differential equation du/dt + (1/τ)u = 0 can be written in the form Lu = 0 by taking L to be the linear operator Lu = du/dt + (1/τ)u (which we will also write in the form L = d/dt + 1/τ).
We can indeed check that this operator L above is linear by verifying the two properties of linearity: indeed,
L(au) = d(au)/dt + (1/τ)(au) = a du/dt + (a/τ)u = aLu,
and the other property follows in a similar manner.
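As a quick check, here is a minimal sympy sketch (not part of the original notes) verifying the first linearity property for this operator symbolically:

    import sympy as sp

    t, tau, a = sp.symbols('t tau a', positive=True)
    u = sp.Function('u')(t)

    L = lambda f: sp.diff(f, t) + f / tau    # the operator L = d/dt + 1/tau
    print(sp.simplify(L(a * u) - a * L(u)))  # prints 0, confirming L(au) = aL(u)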
Fact 5
Linear theories are generally much simpler than nonlinear theories (for example, Maxwell’s theory is linear, while
Einstein’s theory of general relativity is very nonlinear and very complicated).
However, classical mechanics is actually very nonlinear (for example, Newton could solve the two-body problem
but not the three-body one), and we can see this through the following example:
Example 6
Consider the classical motion of a one-dimensional particle under the influence of a potential V (x); the equation
of motion for the dynamical variable is then Newton’s second law,
m d²x(t)/dt² = −V′(x(t)).
(Note that V ′ will always denote the derivative of V with respect to its argument.) The issue is that the left-hand
side has a linear operation (taking two derivatives), but the right-hand side may not be linear because V(x) can take an arbitrary form (for example, if V(x) ∝ x³, then V′(x) ∝ x² is not linear). And this finally leads us to our discussion
point for this class: quantum mechanics is a linear theory, in which the dynamical variable is a wavefunction Ψ
(which can depend on time t and other variables), describing the dynamics of a quantum system over time. The
equation that governs Ψ is then the linear Schrodinger equation
iℏ ∂Ψ/∂t = ĤΨ,

where Ĥ is the Hamiltonian (which is a linear operator). In other words, we can write LΨ = iℏ ∂Ψ/∂t − ĤΨ = 0 (where, because derivatives are linear operations and Ĥ is linear, L is also linear). We then have the advantage (over classical
mechanics) that whenever we have solutions in quantum mechanics, we can scale and add them together. And as we
go through this course, we’ll see that the quantum mechanics solutions are indeed simpler and more elegant than the
classical mechanics ones.
Fact 7
The ℏ on the left-hand side of the Schrodinger equation is the reduced Planck’s constant (pronounced “h-bar”),
originating from when Planck tried to fit the blackbody spectrum and needed a constant. And the Hamiltonian Ĥ
is for us to invent or discover (it depends on the physical characteristics of our system).
The physical interpretation of the wavefunction Ψ was not obvious for those who first invented quantum mechanics;
Max Born later found that it had to do with probability, but we’ll discuss this soon.
For now, we’ll first turn to our second idea, which is the role of complex numbers in quantum mechanics.
Schrodinger’s equation above includes the constant i = √−1 – such complex numbers were originally invented because they were necessary for solving some equations (like x² + 1 = 0), and it turns out that “no more numbers need to be
invented” after that to solve polynomial equations.
Definition 8
A complex number is of the form z = a + ib, where a, b ∈ ℝ are real numbers and i = √−1. We define the real part of z to be Re(z) = a, the imaginary part of z to be Im(z) = b, the complex conjugate of z to be z* = a − ib, and the norm of z to be |z| = √(a² + b²). The set of complex numbers is denoted ℂ.
We often represent complex numbers on the complex plane, identifying the point a + ib with (a, b) in the xy-plane (so that the horizontal axis represents the real part, and the vertical axis the imaginary part). The norm of z is then given by the ordinary Euclidean distance formula, and we have

|z|² = a² + b² = zz*,
which will be pretty important for us as we progress through this course. Notice that by trigonometry, if we want to
find the complex number of norm 1 at an angle θ (counterclockwise from the x-axis), we have
z = cos θ + i sin θ
(because the real and imaginary parts are just the horizontal and vertical projections). What’s important and nontrivial
is that we also have Euler’s formula
cos θ + i sin θ = e^{iθ},
which we’ll see come up as we work more with the equations of motion.
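Here is a tiny Python sketch (values chosen arbitrarily, not from the notes) confirming |z|² = zz* and Euler’s formula numerically:

    import cmath

    z = 3 + 4j                                     # an arbitrary complex number
    assert abs(abs(z)**2 - (z * z.conjugate()).real) < 1e-12  # |z|^2 = z z*

    theta = 0.7                                    # an arbitrary angle in radians
    lhs = cmath.cos(theta) + 1j * cmath.sin(theta)
    assert abs(lhs - cmath.exp(1j * theta)) < 1e-12           # Euler's formula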
We may have seen complex numbers being used in classical mechanics and in electromagnetism, but they have
always been used in auxiliary manners (because positions, velocities, electric fields, and other dynamical variables are
always real-valued). But in quantum mechanics, Ψ must be complex-valued because the Schrodinger equation has an
i in it (after all, if Ψ were real-valued, the left-hand side would be imaginary and the right-hand side would be real,
which could only occur if both sides were 0). And yet, we can never measure complex numbers – any measurement we
perform in a lab gives us a real number, and now we come to the point where we explain that |Ψ|² is the physically relevant quantity, which is proportional to the probability of our system being in a given configuration.
Remark 9. Many physicists disliked or didn’t believe this Born interpretation – Schrodinger’s cat was actually a thought experiment designed to argue that this interpretation was ridiculous. And there were even papers at the time
by famous physicists, like the EPR paper, which were later proven wrong. But when very good physicists are wrong,
there’s still a lot we can learn from them and a lot of interesting physics that results from the discussion.
With that, we’ll turn to a discussion of determinism which must come up when we talk about probabilities. Einstein
was the one who (reluctantly) came up with the idea that light is quantized as photons (so not only can we observe
light as an electromagnetic wave, we can also see it as a particle through the photoelectric effect). But the difference
between a Newtonian particle and a quantum particle like the photon is that the former is a point-like object of zero size carrying a precise energy, position, and momentum, while the latter is some indivisible amount of propagating energy and momentum. So a photon is a
quantum mechanical particle in the sense that it is a packet that cannot be decomposed further into smaller packets.
What Einstein found was that for any photon, we have the relation
E = hν,
where ν is the frequency of the light (of which the photon is a part) and h (= 2πℏ) is Planck’s constant.
Fact 10
Here, we should remember that c = νλ for any light wave, and because h is very small, individual photons have
very low energy. But our eyes are very good at detecting light – if we’re in a completely dark room, we are perhaps
able to detect as few as five photons.
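To get a feel for the scales involved, here is a rough back-of-the-envelope sketch in Python (the 500 nm wavelength is an arbitrary choice):

    h = 6.626e-34    # Planck's constant in J*s
    c = 3.0e8        # speed of light in m/s
    eV = 1.602e-19   # joules per electron-volt

    lam = 500e-9     # a green photon at 500 nm (assumed)
    E = h * c / lam  # E = h*nu, with nu = c/lambda
    print(E / eV)    # ~2.5 eV: a tiny amount of energy per photon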
Example 11
Consider what happens when a beam of light hits a polarizer, which is a material with a preferential direction
(which we’ll set to be the x-axis). Recall that this means that light linearly polarized along the x-axis will pass
through, but light linearly polarized along the y -axis will all be absorbed.
In fact, that linearly polarized light in the x-direction that passes through the polarizer looks identical when it enters
and when it exits (same frequency and wavelength, and thus same energy). But now suppose we send light into the
polarizer which is polarized at some angle:

E⃗α = E₀ cos α x̂ + E₀ sin α ŷ.

From our study of electromagnetism, we know that the component along the y-direction is absorbed, so after going through the polarizer, we just have E⃗α = E₀ cos α x̂. But we know that the energy in an electric field is proportional to |E⃗|², so this tells us that (because the initial electric field has magnitude E₀, and the final electric field has magnitude E₀ cos α) the fraction of energy that passes through the polarizer is cos²α. (This checks out for α = 0, π/2.) But
from the point of view of the photon, what this means is that for individual (identical) photons that hit a polarizer,
we must see a fraction cos²α of the photons pass through and a fraction 1 − cos²α of them get absorbed (since
we can’t have “half a photon” pass through, or else that would change the frequency of the light, which we know
doesn’t happen). Classical physics doesn’t like the fact that sending identical particles into a physical situation may
yield varying outcomes, but that is indeed what we must have in quantum physics: identically prepared experiments
may give different results, and thus we lose predictability as long as photons exist.
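As an illustration, here is a minimal Monte Carlo sketch (angle and sample size assumed) of this photon-by-photon randomness:

    import math, random

    alpha = math.pi / 6           # assumed polarization angle (30 degrees)
    p_pass = math.cos(alpha)**2   # quantum prediction: cos^2(alpha) = 0.75

    n = 100_000
    passed = sum(random.random() < p_pass for _ in range(n))
    print(passed / n, p_pass)     # empirical fraction fluctuates around 0.75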
There are a few ways that we could try to get around this debacle: for example, we could hypothesize that photons
will or will not pass through the polarizer based on the interatomic structure of the interaction, but this was found
to not be true after repeated experiments. Another suggestion was that there are in fact hidden variables that are
unknown to us (which distinguish the photons but make them still look identical), and those hidden variables affect
whether or not the photon passes through the polarizer. (In other words, quantum theory is not complete.) It sounds
philosophical, to the point where it seems like we wouldn’t be able to refute it without measuring these hidden variables,
but in fact the Bell inequality proved that even with them, quantum mechanics could not be made deterministic.
Fact 12
The Bell experiment will be discussed more in 8.05, but the main idea is that we can design an experiment in
which the hidden variables (under classical mechanics) would imply a certain inequality. Then this inequality was
shown experimentally not to hold once the technology was good enough to run the experiment!
So in summary, determinism is lost: photons must either go through the polarizer or not, and we can only predict
the probability with which they go through. To write down the wavefunction of a photon, we can think of the
photon as being in one of several states (we will sometimes also use “vectors” or “wavefunctions” for this same term).
The motivation is that we can scale or add vectors together, and that’s also what we can do with solutions to our
quantum equations. Dirac invented the notation |x⟩ to represent a vector (where x is just some label that describes
the particular state): for example, for our polarizer, we have the two possible states
|photon; x⟩ ,    |photon; y⟩ .
Then linearity tells us that we can always create the linear combination of states

|photon; α⟩ = cos α |photon; x⟩ + sin α |photon; y⟩ ,

which is the quantum mechanical description of a photon polarized in the α-direction. Notice that this is similar to the classical E⃗α = E₀ cos α x̂ + E₀ sin α ŷ in that we have the same cos α and sin α terms, but we no longer have the E₀ coefficient because we’re now representing wavefunctions at the individual photon level. And once the photon makes it through the polarizer, its wavefunction is now just |photon; x⟩.
That leads us to our next discussion of superposition. In classical physics, superposition tells us (for example)
that we can add electric fields or force vectors, but in quantum physics we get something much stranger.
Example 13
Consider a Mach-Zehnder interferometer, which is a device with two beam splitters, two mirrors, and two detectors, as shown below.

[Figure: Mach-Zehnder interferometer – incoming light hits beam splitter 1, the two resulting beams bounce off mirrors and recombine at beam splitter 2, whose outputs go to detectors 0 and 1.]
This device works in the following way: when incoming light hits each beam splitter, half of the light is reflected
and half is transmitted. Those resulting light beams are reflected off of two mirrors to recombine at a second beam
splitter, and after interference each detector will pick up some signal (in fact, we can adjust the system so that any
fraction of the light, between 0 and 1 inclusive, goes to detector 0). Such a device was invented in the 1890s, and it
is interesting to think about (given our discussion above) because we know that there must be something probabilistic
going on at each beam splitter. In particular, if we have a superposition of different states, “some photons can go
both up and down.” And in fact, when we say that there is an interference pattern being formed, it can’t be that
different photons are interfering with each other (because energy can’t be created or destroyed); instead, each photon
interferes with itself, and we can verify that this is what causes constructive or destructive interference by sending
one photon into the device at a time.
More mathematically, we can say that each photon is in a superposition between the upper and lower beams in
our diagram (where “upper beam” and “lower beam” are now our fundamental states). Remembering that we’re often
associating the words “state,” “vector,” and “wavefunction” with each other, we’ll now explain how to associate physical
properties with these superpositions. Suppose that we have two states |A⟩ and |B⟩, and suppose that we’re trying to
measure some particular property (such as energy, spin, or position), so that measuring on |A⟩ always gives some
value a, and measuring on |B⟩ always gives some value b. Then we can look at a quantum mechanical state of the
form
|Ψ⟩ = α |A⟩ + β |B⟩
for complex numbers α, β ∈ C, which is a linear combination (superposition) of the states |A⟩ and |B⟩. If we measure
that same property in |Ψ⟩, what quantum mechanics tells us is that we will always get either the value a or the value b,
but never something intermediate! This is very different from the situation we might have had in classical mechanics –
instead of taking a weighted average (as we might be tempted to do), what |Ψ⟩ really represents is that the probability
of measuring a is proportional to |α|², and the probability of measuring b is proportional to |β|², and those are the
only two possibilities.
Example 14
Recalling the example state |photon; α⟩ = cos α |photon; x⟩ + sin α |photon; y ⟩ we had above, we can think of
the polarizer as measuring the direction of polarization: it will either measure that the photon is polarized in
the x-direction (with probability cos²α) or that it is polarized in the y-direction (with probability sin²α).
And after we measure the state α |A⟩ + β |B⟩ once, that state will either collapse to the state |A⟩ or the state
|B⟩, and future measurements will always return the same value – this is the measurement postulate of quantum
mechanics. (So if we wanted to know the original value of |Ψ⟩ and didn’t know α and β, we would need to prepare
many identical copies of our setup and do our experiment many different times to assess the relevant probabilities).
One more important point is that states form equivalence classes, where we can put any nonzero coefficient in front of the fundamental state |A⟩ and still have essentially the same
object. So it makes sense to pick a convenient representative out of all of these, and typically that is the normalized
state (which satisfies properties having to do with the norm of the state). We’ll talk about this more later, but the
reason we bring this up is to connect it more to our discussion of states of light from last time. If we think about our
photon from last lecture with a superposition of x- and y -direction polarization, we can write it most generally as
α |photon; x⟩ + β |photon; y ⟩ ,
where α, β are complex numbers and thus we have four real parameters to dictate the polarization direction. But we
already know (and will review in a second) that there should only be two parameters to dictate the polarization of a
particle, and our discussion of normalization helps us out here: if the overall constant doesn’t matter, we can multiply
the whole state by 1/α to get a state of the (generic) form |photon; x⟩ + (β/α) |photon; y⟩, and thus all of the physics of a polarization state is contained in the ratio γ = β/α (which is a single complex parameter, corresponding to two real
parameters). And indeed, the most general polarization of an electromagnetic wave is elliptical and dictated by two
real parameters, the angle of its major axis and its eccentricity (since the overall size of the ellipse doesn’t matter).
This act of normalizing will come up again and again as we get farther into the course.
We’ll emphasize one more important concept of superposition using spins.
Definition 15
Spin is a fundamental property of elementary particles, dictating their internal (intrinsic) angular momentum.
While no model of an elementary particle has ever been constructed where there is actually something inside
spinning, there is indeed some angular momentum (even for a point particle), and spin is a very quantum mechanical
property. Since angular momentum is a vector, we must decide the direction in which spin is pointing.
Example 16
Suppose we measure spins along the z-direction. Then if our particle is of spin 1/2 (such as protons, neutrons,
and electrons), we will either measure the particle in the spin-up or the spin-down state (corresponding to the
respective direction of angular momentum).
Curiously, what we’re saying is that whenever we measure the spin of a spin 1/2 particle, it is always either spin up
(+1/2) or spin down (-1/2) with full magnitude. We’ll denote the spin up and spin down states as |↑; z⟩ and |↓; z⟩,
respectively, and like in our earlier discussions, if those two are valid quantum mechanical states, so are states arising
from superpositions such as
|Ψ⟩ = |↑; z⟩ + |↓; z⟩ .
(We won’t talk about normalizing these states at the current moment; it’s not too important.) So if we have a bunch
of electrons which are in the state |Ψ⟩ above, and we try to measure each of their spins in the z-direction, then we
expect about half of the electrons to be spin up and about half to be spin down, due to the relative magnitude of the
|↑; z⟩ and |↓; z⟩ terms.
But Einstein might ask the question of how we really know that the particles we measured were in the states |Ψ⟩
– after all, our current setup does not help us distinguish between a system of all electrons in |Ψ⟩, versus an ensemble
of electrons where about half of them were in the state |↑; z⟩ to start with and the other half were in the state |↓; z⟩.
This is what Einstein meant by realism: if we measure a particle to be spin-up, he would claim that this means the
particle was spin-up to begin with. And we won’t resolve this paradox until we learn more about spins (as we will later
in the course). Instead, we’ll just point out that if we had an ensemble where we had half spin-up and half spin-down
particles, we could take that ensemble and measure its spin along the x-direction instead. It turns out that the
particles in the |Ψ⟩ state will all come out + along the x-direction, while the particles in the mixed ensemble will come
out half-and-half + and -. So there is indeed an experiment that can tell whether these quantum states exist, and in
fact they do.
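Here is a small numpy sketch of that distinguishing experiment (basis conventions assumed), using the Born rule and the x-basis state |↑; x⟩ = (|↑; z⟩ + |↓; z⟩)/√2:

    import numpy as np

    up_z = np.array([1, 0], dtype=complex)
    down_z = np.array([0, 1], dtype=complex)
    up_x = (up_z + down_z) / np.sqrt(2)       # spin-up along x, in the z-basis

    def prob_up_x(state):
        # Born rule: probability of measuring + along x is |<up_x|state>|^2
        return abs(np.vdot(up_x, state))**2

    psi = (up_z + down_z) / np.sqrt(2)        # every particle in the superposition
    print(prob_up_x(psi))                     # ~1.0: all come out + along x
    print(0.5 * prob_up_x(up_z) + 0.5 * prob_up_x(down_z))  # 0.5 for the mixture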
This now prepares us to discuss the last of the fundamental introductory ideas for 8.04, which is entanglement. It
turns out that two particles can become entangled without having strong interactions between them – they can even
be non-interacting.
Example 17
Suppose particle 1 can be in one of the states |u1 ⟩ and |u2 ⟩, while particle 2 can be in the states |v1 ⟩ and |v2 ⟩.
We wish to describe the states of the combined system of two particles.
To describe the overall system’s state, it seems reasonable to specify each particle’s state – for example, if particle
1 is in state |u1 ⟩ and particle 2 is in state |v1 ⟩, then we represent the whole system by the tensor product |u1 ⟩ ⊗ |v1 ⟩.
In such a product, we always list the first particle’s state on the left and the second particle’s state on the right, so
more generally we may have something that looks like

(α₁ |u₁⟩ + α₂ |u₂⟩) ⊗ (β₁ |v₁⟩ + β₂ |v₂⟩).

Tensor multiplication then looks similar to regular multiplication, but we never move the states across the product: expanding and moving all of the constants out gives us

α₁β₁ |u₁⟩ ⊗ |v₁⟩ + α₁β₂ |u₁⟩ ⊗ |v₂⟩ + α₂β₁ |u₂⟩ ⊗ |v₁⟩ + α₂β₂ |u₂⟩ ⊗ |v₂⟩.

Now, by linearity we can also consider the state

|u₁⟩ ⊗ |v₁⟩ + |u₂⟩ ⊗ |v₂⟩,

which is a superposition of the states |u₁⟩ ⊗ |v₁⟩ and |u₂⟩ ⊗ |v₂⟩. But this time, we can’t “factor” this state as a state
of particle 1, tensored with a state of particle 2 – after all, matching coefficients with the general multiplication we
did above would force α1 β1 = α2 β2 = 1 but α1 β2 = α2 β1 = 0, which is a contradiction (since α1 β1 α2 β2 would then
have to be both 1 and 0). So we have an unfactorizable state, in which we cannot just describe the configuration
as independently analyzing the first and second particle. In other words, we have an entangled state – knowing
something about the first particle tells us about the second and vice versa – even though no interactions have occurred
between them!
Example 18
If we consider two spin 1/2 particles, then we can consider (for example) the entangled state

|↑; z⟩₁ ⊗ |↓; z⟩₂ + |↓; z⟩₁ ⊗ |↑; z⟩₂.
People often speak of “Alice and Bob” each having one of these two particles (arranged in their entangled state),
with the idea being that Alice and Bob are very far away (perhaps one on Earth and the other on the moon). Then
Alice and Bob share an entangled pair, and very interesting experiments can be done (in real labs, this can be done
with two photons that are hundreds of kilometers apart), coming from the fact that the two particles’ properties are
correlated in strange ways. For example, if Bob measures his particle and finds that it is spin-down, then the whole
state must collapse into the |↑; z⟩1 ⊗ |↓; z⟩2 state, meaning that if Alice also simultaneously measures her particle,
she must find that it is spin-up (even before light has had time to travel from one of the entangled particles to the
other). It turns out that this does not contradict special relativity – we cannot actually send any information through
this process – but the collapse is still instantaneous.
Einstein would again object to this concept of entanglement, suggesting instead that some entangled pairs are
really in the |↑; z⟩1 ⊗ |↓; z⟩2 state while others are in the |↓; z⟩1 ⊗ |↑; z⟩2 state. But that’s where Bell’s inequality
comes in – if Alice and Bob can measure in three different directions, then the correlations that result are impossible
to explain with classical physics. So what’s going on is really subtle and violates classical mechanics in a peculiar way.
Example 19
We’ll now return to the Mach-Zehnder interferometer from Example 13 and perform some further analysis.
We’ll describe the photon’s state inside the interferometer by a column vector of two probability amplitudes,

\begin{pmatrix} \alpha \\ \beta \end{pmatrix},

where α is the amplitude to be in the upper beam and β is the amplitude to be in the lower beam, so that our generic state is indeed a superposition of the “upper beam” and “lower beam” states. If we now think about our beam splitter, we know that an incoming photon from the top would produce some reflection amplitude s (to the top path) and some transmission amplitude t (to the bottom path), meaning that the beam splitter takes (1, 0)ᵀ to (s, t)ᵀ. (The values of s, t depend on the particular design of the beam splitter.) Clearly, we must have |s|² + |t|² = 1. Similarly, a photon coming from the bottom results in a change from (0, 1)ᵀ to (u, v)ᵀ, where |u|² + |v|² = 1. In other words, we need four numbers to characterize this beam splitter – indeed, for an arbitrary state (α, β)ᵀ passing through the beam splitter, by linearity we have that

\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \alpha \begin{pmatrix} s \\ t \end{pmatrix} + \beta \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \alpha s + \beta u \\ \alpha t + \beta v \end{pmatrix} = \begin{pmatrix} s & u \\ t & v \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.
In other words, we can think of the beam splitter’s action as multiplying by a certain 2 × 2 matrix! This perspective
will be very useful for us going forward for a variety of reasons. For our purposes, we’ll assume we have a balanced
beam splitter where half of the light goes through and the other half is reflected, so that |s|² = |t|² = |u|² = |v|² = 1/2. But that doesn’t allow us to determine exactly what s, t, u, v are (because of the phase), so let’s first make the guess

\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.

As long as this matrix satisfies probability conservation, there should theoretically exist a beam splitter that can perform that operation. But this doesn’t work – after all, notice that

\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
" #
√1 √1
and the norm of our column vector has changed. Instead, it turns out one answer is 2 2 ; indeed,
√1 − √12
2
" #" # " #
√1 √1 α 1 α+β
2 2 =√ ,
√1 − √12 β 2 α−β
2
and this final vector indeed has the same norm as the initial one, because

\left| \frac{1}{\sqrt{2}} (\alpha + \beta) \right|^2 + \left| \frac{1}{\sqrt{2}} (\alpha - \beta) \right|^2 = \frac{1}{2} (\alpha + \beta)(\alpha^* + \beta^*) + \frac{1}{2} (\alpha - \beta)(\alpha^* - \beta^*),

at which point the cross-terms vanish and we’re left with ½(2|α|² + 2|β|²) = |α|² + |β|², which is 1 by assumption. So this beam splitter matrix will do the job for us – it’s not the only solution, and we’ll consider the two beam splitter matrices

BS_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad BS_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
corresponding to the two beam splitters in the diagram below. We may also suppose that along the bottom path
(hitting the bottom mirror), we place a piece of glass which shifts the probability amplitude by a phase:
[Figure: Mach-Zehnder interferometer with input amplitudes α (upper beam) and β (lower beam), two mirrors, a phase shifter δ on the lower path, beam splitters 1 and 2, and detectors 0 and 1.]
In other words, a probability amplitude of β will turn into a probability amplitude of βe^{iδ}, where δ ∈ ℝ. Since |β| = |βe^{iδ}|, this preserves the norm of the probability amplitude. (We won’t use this phase shifter in today’s lecture,
but it’s still a useful concept that will come up in the future.)
Problem 20
Consider the setup in the diagram above without the phase shifter. What do we find at the detectors if we have an incoming state (α, β)ᵀ?
We’ll assume that the mirrors do not do anything to the column vector – in fact they each multiply a component by −1, so overall that’s just a constant sign which does nothing physically – meaning that the detector state comes from applying first the beam splitter matrix BS1, then the beam splitter matrix BS2. In other words, we have

\text{output} = (BS_2)(BS_1) \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ -\alpha \end{pmatrix}.

Thus, if we send in a photon from the bottom (meaning that we start with the column vector (0, 1)ᵀ), the output will be the column vector (1, 0)ᵀ. In particular, this means that even after the photon split at beam splitter 1, there was
an interesting interference at beam splitter 2 which caused the amplitudes to cancel out along detector 1 and combine
along detector 0, such that every photon will be detected at the top detector! (And indeed, this interference pattern
was detected in experiments.)
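A quick numpy sketch (not in the original notes) confirming this matrix computation:

    import numpy as np

    BS1 = np.array([[-1, 1], [1, 1]]) / np.sqrt(2)
    BS2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    state_in = np.array([0.0, 1.0])     # photon entering along the lower beam
    state_out = BS2 @ BS1 @ state_in    # apply beam splitter 1, then beam splitter 2
    print(state_out)                    # [1. 0.]: detector 0 fires with probability 1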
Problem 21
Next, consider the setup in the diagram above, but along the bottom path we place a wall of concrete (at the location of the phase shifter) so that light cannot pass through the bottom path. What do we find at the detectors if we have an incoming state (α, β)ᵀ?
We can no longer use our matrix multiplication formula from above; instead, we’ll look at the calculation step by step (again sending a photon in from the bottom). After passing through the first beam splitter, we are in the state

(BS_1) \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}.

However, along the bottom path, we lose the probability amplitude of 1/√2 due to the concrete wall – in other words, the input into the second beam splitter is (1/√2, 0)ᵀ (nothing reaches the splitter from below). Thus, the final detected column vector is

(BS_2) \begin{pmatrix} 1/\sqrt{2} \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.
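Extending the same numpy sketch to the blocked bottom path (the in-place zeroing is our model of the wall):

    import numpy as np

    BS1 = np.array([[-1, 1], [1, 1]]) / np.sqrt(2)
    BS2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    mid = BS1 @ np.array([0.0, 1.0])   # state after beam splitter 1
    mid[1] = 0.0                       # concrete wall removes the lower amplitude
    out = BS2 @ mid
    print(np.abs(out)**2)              # [0.25 0.25]; the remaining 0.5 hit the wall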
Interestingly, this means that by blocking some photons, we’ve created a signal at detector 1, which may seem
somewhat counterintuitive. (And in fact we’ll see by the end of this lecture that this is even more unintuitive than we
might initially think!) Remembering that the squared magnitudes of the amplitudes correspond to the probabilities, we can summarize
the results of Problem 20 and Problem 21 as follows:
Outcome (open paths)                 Probability
Photon is detected at detector 0     1² = 1
Photon is detected at detector 1     0² = 0

Outcome (one blocked path)           Probability
Photon ends at concrete block        (1/√2)² = 1/2
Photon is detected at detector 0     (1/2)² = 1/4
Photon is detected at detector 1     (1/2)² = 1/4
We can take these values and put them into the following thought experiment:
Problem 22
Suppose we have a bomb with a trigger so sensitive that a single photon hitting it will set it off, but some of our bombs are defective (photons pass through their triggers with no effect). Can we determine that a given bomb works without setting it off?
In classical physics, this is clearly impossible – if we do the measurement to see if the bomb explodes, then either
the bomb is defective or we have no more bomb left. But with the Mach-Zehnder interferometer system that we’ve
just described, we can indeed find a working bomb without having it explode!
Solution. Place the bomb where our phase shifter is placed in our diagram above, such that a photon passing through
the bottom path would set off the bomb. Then if the bomb is defective, it’s as if both paths are open (see left table
above), and thus photons must be detected at detector 0 (with probability 1). But if the bomb is working, then
we’ve essentially put a block of concrete along the bottom path (with the extra effect that the bomb explodes in that
situation). Thus (see right table above), half of the time the bomb will explode, but there is a 1/4 chance that the bomb does not explode and the photon is detected at detector 0, and likewise also a 1/4 chance that the bomb does
not explode and the photon is detected at detector 1. Thus, whenever we see a photon at detector 1, we must have
a working bomb which has still not detonated.
Fact 23
It turns out that adjusting the experiment (placing the bomb in a resonant cavity, for example) can give us
arbitrarily high probabilities that the bomb will not go off before we find out that it works! And thus quantum
mechanics indeed allows us to perform very surprising measurements.
Fact 24
The photoelectric effect can be traced back to Hertz’s experiment in 1887, in which high-energy beams of light
were shined on polished metal plates and electrons (called photoelectrons because they are created in this way)
were released, creating a photoelectric current.
The primary interesting feature that was observed was that there was always a certain threshold critical frequency
ν0 , above which a current is measured but below which nothing happens. And this threshold depended on the metal
being irradiated, as well as the roughness and crystalline nature of the surface – it was later discovered that there
are free electrons roaming around this crystalline structure, and the critical frequency ν0 thus depends on how much
energy is required to free an electron from the metal. Additionally, it was found that the intensity of the light governed
the magnitude of the photoelectric current, but that this intensity does not affect the energy of the photoelectrons
themselves! So the incoming energy beam affects the number of excited photoelectrons but not their energy – in fact,
the energy of the photoelectrons has a linear relation with the frequency ν of the incoming light.
These properties are difficult to explain if we treat light solely as a wave, and Einstein’s answer in 1905 was (as
previously mentioned) that light must come in quantized bundles of energy (which were later named photons by
Gilbert Lewis in the 1920s), with energy E = hν. The picture that we should have in mind is the following:

[Figure: potential energy well holding the electrons in the metal; the energy needed to escape to the vacuum is the work function W.]
Essentially, this graphs the potential energy function for our electrons – while they are in the metal, they are stuck
to it unless we can provide some potential energy to get them out of the well. Once we do this, the electron will be
able to fly freely and not be affected by the metal.
Definition 25
For any surface, the work function (denoted W ) is the energy needed to release an electron to the vacuum around
the surface.
If Einstein’s proposition about photons is correct, then that implies that the energy of a photoelectron satisfies
½mv² ≈ E_e⁻ = E_γ − W = hν − W

(the energy of the photon is transferred to kinetic energy of the photoelectron, minus the work function W needed to free it from the metal). It wasn’t
until 1915 that this was verified experimentally by Millikan, and in fact that experiment gave an estimate of h within 1
percent of the accepted value today (which was the best up until that point). But even then, the photoelectric effect
experiment wasn’t enough on its own to really convince people that photons existed, because Maxwell’s theory was
very successful and accepting photons meant accepting loss of determinism and all of the other features of quantum
physics we’ve been discussing.
Let’s see an example computation to see what kind of setting we’re dealing with:
Problem 26
Consider ultraviolet light of λ = 290nm shined on a metal with work function W = 4.05 eV. What are the speed
and energy of the emitted photoelectrons?
Solution. The idea is that we should be able to do this problem without needing to remind ourselves what “eV” means,
or searching up the value of ℏ, and in general perform back-of-the-envelope estimates and calculations.
First of all, the energy of a photon is
E_γ = hν = hc/λ = 2πℏc/λ

(using the definition of the reduced Planck’s constant), and it’s useful to remember that ℏc ≈ 200 MeV · fm (the actual number is 197.33), where 1 fm = 10⁻¹⁵ m (fm stands for “fermi” or “femtometer”). Thus, we can simplify to

E_γ = (2π · 197.33 MeV · fm)/(290 nm) ≈ 4.28 eV,

so the kinetic energy of the photoelectron is E_γ − W ≈ 4.28 eV − 4.05 eV = 0.23 eV, and this is essentially a nonrelativistic electron because the rest mass of an electron is about 511 keV:

0.23 eV = ½mv² = ½ m_e c² (v/c)² = ½ · (511000 eV) · (v/c)²,

which gives v/c ≈ 9.5 × 10⁻⁴, or v ≈ 2.8 × 10⁵ m/s.
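The same estimate as a short Python sketch (constants rounded):

    import math

    hbar_c = 197.33        # eV*nm (numerically the same as MeV*fm)
    lam, W = 290.0, 4.05   # wavelength in nm, work function in eV
    me_c2 = 511_000.0      # electron rest energy in eV

    E_gamma = 2 * math.pi * hbar_c / lam   # photon energy, ~4.28 eV
    K = E_gamma - W                        # kinetic energy, ~0.23 eV
    v = math.sqrt(2 * K / me_c2) * 3e8     # from K = (1/2) me c^2 (v/c)^2
    print(E_gamma, K, v)                   # v is roughly 2.8e5 m/s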
We’ll talk a bit more about Planck’s constant directly: because it shows up in the equation Eγ = hν, it must have
units
[h] = [E]/[ν] = ([M][L]²/[T]²)/(1/[T]) = [M][L]²/[T]
(where we’re working with units characterized by mass M, length L, and time T – that is all we need). We’ll rearrange
this expression slightly to match with another physical quantity:
[M][L]²/[T] = [L] · ([M][L]/[T]) = [r][p] = [L⃗];
in other words, Planck’s constant has the same units as angular momentum! This is something for us to keep in
mind, and that’s why saying that we have a “particle of spin 1/2” actually means that “the particle has intrinsic angular momentum ℏ/2.” (Unfortunately, as we go forward, we’ll need to be careful about the 2π factor between h and ℏ –
we’ll use whichever one looks nicer in the present situation.) And the idea is that once we have a new constant
of nature, like h or c or G, we can get inspired and create new quantities with that constant. Since [h] = [r ][p],
we can invent an associated length for each particle depending on its momentum p (which we’ll do shortly), or even
more simply do the following:
Definition 27
The Compton wavelength of a particle of mass m is λ_c = h/(mc).
Example 28
The electron’s Compton wavelength is
λ_c(e) = h/(m_e c) = 2πℏc/(m_e c²) = (2π · 197.33 MeV · fm)/(0.511 MeV) ≈ 2426 fm = 2.426 pm.
For comparison, this is smaller than the Bohr radius of an electron (about 50 pm), but much larger than the size
of a nucleus.
To understand the significance of this length that we’ve just defined, we’ll see its use in both an experiment and a
thought experiment:
Example 29
Suppose we have a particle of mass m. Then its associated rest energy is mc², so a photon with the same energy as the particle has frequency ν = mc²/h and thus wavelength c/ν = h/(mc) = λ_c.
This actually has some experimental implications in high-energy particle physics: for example, if a photon whose wavelength equals the electron’s Compton wavelength hits an electron, then the comparable energy scales can
create new particles. Thus, it is difficult to isolate a particle to length scales shorter than its Compton wavelength
without causing damage to it!
Around the time that Einstein discovered general relativity, he was also thinking about photons again – throughout
his life, he was very suspicious about the quantum theory and spent much of his time thinking about it. In 1916, he
proposed that not only do these photons (which weren’t yet called photons at the time) have energy, but they also
serve as quanta of momentum. Specifically, there is a relativistic relation
E² − p²c² = m²c⁴

between the energy and the momentum of a particle (this comes from the two formulas E = mc²/√(1 − v²/c²) and p⃗ = m v⃗/√(1 − v²/c²), which reduce to E ≈ mc² + ½mv² and p⃗ = m v⃗ in the low-velocity limit). Thus, if we know the momentum and the energy of a particle, we also know its mass. Since photons have zero mass, we must have

m_γ = 0 ⟹ E_γ = p_γ c,

p_γ = E_γ/c = hν/c = h/λ_γ.
This looks similar to the Compton wavelength relation, but it was still not strong enough evidence until Compton
scattering was studied:
Example 30
Consider x-rays shining on atoms, where the incoming photons are comparably energetic (100 eV to 100 keV,
while binding energies of atoms are on the order of 10 eV).
If these photons are high enough in energy, then the resulting excited electrons are almost as if they started off
being free. In such a situation, we have a violation of the classical Thomson scattering, which was what caused physicists to finally accept the particle-like nature of photons – those scattering phenomena could be calculated just like scattering phenomena of other particles.
In the classical case, we think of light as a wave, in which the magnetic field doesn’t do very much (because we
have a low-energy electron) and the electric field shakes the electron: the resulting cross-section is
dσ/dΩ = (e²/(mc²))² · ½(1 + cos²θ)
as a function of the angle θ between the incident direction of the wave and the direction in which the light is scattered.
If we haven’t seen this kind of formula before, the units are area per solid angle, which is the same as just area; what
this cross-section measures is, for a given solid-angle region dΩ, the surface area dσ from the incoming beam that
scatters into that solid-angle. That area then corresponds to an energy (because a certain amount of energy from the
electromagnetic wave goes into each area), and thus with all of this we get a formula for the intensity of radiation
as a function of the solid angle. But another property of this classical case is that the frequency of the outgoing wave
(after hitting the electron) is the same as the frequency of the incoming wave, because the electron is being driven at
the same frequency as the light wave.
It turns out that at high energies, the result actually looks very different! If we treat a photon as a particle, so
that we have a particle-particle collision, then both particles have some energy and momentum and we should use
conservation laws. Before the collision, we have a photon of energy E and momentum p and an electron at rest,
and after the collision both particles are moving. It’s an exercise to show that the photon can’t simply be absorbed by the electron (that would violate those conservation laws), and because the electron now has kinetic energy, the photon must lose some energy, meaning that
λf > λi (longer wavelength means lower energy), and in fact we have
λ_f = λ_i + (h/(m_e c))(1 − cos θ)

where h/(m_e c) is the Compton wavelength λ_c(e) for the electron. In other words, if θ = 0, the photon keeps going and
basically doesn’t kick the electron, resulting in no change. However, if the photon bounces totally backwards, then
θ = π and the wavelength increases by 2λc .
Fact 31
This Compton scattering experiment was in fact performed, and let’s describe briefly how it was set up. Molybdenum X-rays with λ = 0.0709 nm (corresponding to an energy of E_γ = 17.49 keV) hit a piece of carbon foil.
Then the relation between intensity and outgoing λ turns out to have two peaks: the smaller one occurs around
λi = 0.0709 nm, while the larger peak occurs at λf = 0.0731 nm.
Noticing that the shift in wavelength is 0.0022 nm, this is pretty close to the Compton wavelength calculated in
Example 28, meaning that the larger peak indeed corresponds to what we expect (a shift on the order of λ_c).
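A quick numerical sketch comparing the Compton formula with these measured peaks (the 90° scattering angle is an assumption for illustration):

    import math

    lambda_c = 2.426e-3    # electron Compton wavelength in nm, from Example 28
    lam_i = 0.0709         # nm, incoming Molybdenum X-ray wavelength
    theta = math.pi / 2    # 90-degree scattering angle (assumed)

    lam_f = lam_i + lambda_c * (1 - math.cos(theta))
    print(lam_f)           # ~0.0733 nm, in the vicinity of the measured 0.0731 nm peak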
But the smaller peak needs slightly more explanation, and this is where Louis de Broglie’s work of 1924 comes into
consideration.
Treating light as both a particle and a wave means that it will have characteristics of both (light exhibits interference,
but it also comes in packets), and de Broglie inferred that this must be a more general property – he took the
fundamental step of claiming that all matter particles behave as waves, not just light. This is particularly interesting,
because in his description, the wave that’s being associated to light is not the electromagnetic wave – it’s the probability
amplitude wave (the numbers that we were tracking with the interferometer last lecture)! And that wavefunction was not yet known, so the question left unanswered in the following conjecture was “a wave of what?” (that is, what is the wave made of).
Conjecture 32 (de Broglie (1924))
The wave-particle duality is universal for all matter: for each particle of momentum p, we have an associated
plane wave of wavelength λ = h/p, known as the de Broglie wavelength.
At the time, there was no experimental evidence for the claim, but experiments came a few years later – it was
seen that electrons could be diffracted, scattering off crystal lattices like waves! (We can read about the
Davisson-Germer experiment, for example.) Over time, the two-slit diffraction experiment was performed with larger
and larger particles – in fact, in the last few years, molecules of weight around 10000 atomic mass units have exhibited
interference patterns. And we should remember that these interference patterns come from a particle’s wavefunction
interfering with itself! We’ll see next lecture how this leads us to the Schrodinger equation and the wave-like nature
of matter.
Example 33
For simplicity, consider non-relativistic (Galilean) physics. Suppose that from our perspective, we have a particle
of momentum p, and there is a “boosted observer” moving at constant velocity v relative to us. We wish to
consider the differences in observed wavefunctions between the two observers.
Because p = h/λ = (h/2π)(2π/λ), we can rewrite the momentum as p = ℏk, where ℏ is the usual reduced Planck’s constant and k = 2π/λ is the wavenumber (the number of radians of phase per unit distance). Call our frame S, and call the other frame S′; fix
the coordinates so that S ′ is moving with velocity v in the x-direction. Then after t seconds, the two frames are v t
units apart; say our particle of mass m has velocities w and w′ in frames S and S′ respectively (writing w to avoid confusion with the frame velocity v), and suppose that it has momenta p and p′ respectively as well. We can now relate all of our coordinates with Galilean transformations (which are accurate enough for low velocities):

x′ = x − vt,    t′ = t.

Taking a time-derivative, we have dx′/dt′ = dx/dt − v, so w′ = w − v, which we expect from our usual concept of relative motion. Therefore,

p′ = mw′ = mw − mv = p − mv,
so the “moving observer” will see a different de Broglie wavelength of
λ′ = h/p′ = h/(p − mv) ≠ h/p = λ.
But if this were a familiar propagating wave like a sound or water wave, the relation between λ and λ′ would see
a Doppler shift for the frequency, but the wavelength would not change! Indeed, consider an ordinary wave under
Galilean transformations, and consider its phase kx − ωt (where k is the wavenumber and ω is the angular frequency).
Waves can then be expressed as sines, cosines, or exponentials of this phase, and what’s important is that the phase
of a wave is a Galilean invariant (if two people look at the same point on a wave, then they will always agree on the
value of that phase – for example, both observers will agree that the wave has a maximum or minimum at a particular
point). In particular, we can write
φ = k(x − (ω/k)t) = (2π/λ)(x − Vt) = 2πx/λ − (2πV/λ)t,

where we use the fact that ω/k = 2πf/(2π/λ) = λf = V is the speed of the wave (not the same as the observer’s relative
velocity v ). Since this quantity is Galilean invariant, observers at S and S ′ should see the same phase for the same
point at the same time, meaning that
φ = (2π/λ)(x − Vt) = (2π/λ)(x′ + vt − Vt) = (2π/λ)x′ − (2π/λ)(V − v)t′,

which further simplifies to

φ = (2π/λ)x′ − (2π/λ)V(1 − v/V)t′,

and this should be equal to the phase φ′ = (2π/λ′)x′ − ω′t′ seen by the moving observer. In particular, the wavenumber and angular frequency satisfy k′ = 2π/λ′ = 2π/λ and ω′ = (2π/λ)V(1 − v/V), so that

ω′ = ω(1 − v/V),    k′ = k ⟹ λ′ = λ
for an ordinary wave moving in a medium. So the takeaway is that our de Broglie matter wave does not behave like
an ordinary sound wave – instead, two observers may observe different values for the wavefunction, meaning Ψ
is not directly measurable. (In fact, we’ve already seen some hints of this – for example, we mentioned that we can
multiply our wavefunction by constants, including phases, and that doesn’t change the physical meaning.) Additionally,
wavefunctions are not Galilean invariant, but (as an exercise left for us) the values of the wavefunction between
two observers are still related.
Notice that while we’ve talked a lot about the wavelengths of the de Broglie waves, we haven’t discussed their
frequencies yet. de Broglie did in fact answer that as well: since p = ℏk, we’ll also set E = ℏω, and thus the frequency
of a matter wave is ω = E/ℏ, where E is the particle’s energy. This is one of the postulates of quantum mechanics, and
we’ll try to explain why it makes sense now.
Definition 34
The phase velocity of a wave with phase kx − ωt is
v_phase = ω/k.
The phase velocity is the velocity with which the nodes and extrema of the wave will move. If we’re working
non-relativistically, our de Broglie waves satisfy
v_phase = ω/k = E/p = (½mv²)/(mv) = ½v,
and something may seem to be weird because the de Broglie wave looks like it’s moving at half the speed of the particle.
But that’s not unexpected – plane waves themselves don’t carry any signal, and instead we should be representing
particles with a wave packet. So phase velocity is not a very meaningful physical quantity, and instead we should use
the following:
Definition 35
The group velocity of a wave with phase kx − ωt is
v_group = dω/dk,

evaluated at the dominant wavenumber k₀ of the wave packet.

For our non-relativistic de Broglie waves, this gives

v_group = dω/dk = dE/dp = d(p²/2m)/dp = p/m = v,

which is exactly the particle’s velocity.
Fact 36
If we look at the relativistic version of energy and momentum and do all of these calculations again, we’ll again
find that the group velocity v_group lines up with the velocity v of the particle. And in fact, there is some motivation from relativity here – in special relativity, (E/c, p⃗) form a four-vector, and similarly (ω/c, k⃗) also form a four-vector,
so it makes sense to set the two four-vectors proportional to each other through this constant ℏ.
(In our equations above, like p = ℏk, we really have vector identities in multiple dimensions and p, k represent the
corresponding vector magnitudes.) The above argument, combined with Einstein’s claim of E = hν = ℏω for photons,
provides even more justification for why these de Broglie waves indeed have angular frequency ω = E/ℏ.
We’ll now return more to our discussion of group velocity. There are certain waves where given the wavenumber
k, we can write down the corresponding ω = ω(k): for example, for light waves we have ω = kc, but for other waves
the relation is more complicated (for example, ω ∝ k 2 in mechanics, because E ∝ p 2 ). Then in general, the group
velocity represents the velocity of a wave packet constructed via superposition, taking the form
Ψ(x, t) = ∫ Φ(k) e^{i(kx−ω(k)t)} dk,
where we integrate as a continuous sum, where e^{i(kx−ω(k)t)} represents a wave at wavenumber k and angular frequency
ω(k), and Φ(k) is the amplitude of the wave at wavenumber k. Specifically, being able to talk about a group velocity
means that we consider the case where Φ(k) is peaked at some value k0 , and we want to see how the superposition
moves in time. We can answer that question quickly or more rigorously:
Quick explanation of group velocity. We can use the principle of stationary phase, which is essentially just mathe-
matical intuition. If we look at an expression like f (x) sin x, where f is some positive-valued function, we’ll have a
function which is positive half the time and negative half the time, so it will contribute very little to the integral if f is
slowly varying relative to sin x (because adjacent parts will cancel).
The principle of stationary phase then basically says that integrating something like f (x) sin x only gives us a
contribution where the phase varies slowly (and thus the “phase is stationary”). So in the integral above, we can just integrate from k₀ − δ to k₀ + δ, since the only place where the peaked Φ(k) contributes is around k = k₀, and then we require that the phase φ(k) = kx − ω(k)t is stationary with respect to k there. And that means our wavefunction is only significantly nonzero at the positions x where

0 = dφ(k)/dk |_{k₀} = x − (dω(k)/dk)|_{k₀} t,
meaning that x = (dω/dk)|_{k₀} t, and the group velocity is indeed the speed at which the wave packet is propagating.
We’ll now actually evaluate the integral and show that the shape of the wave does move with velocity vgroup :
More rigorous derivation of group velocity. Start with the above equation for the wavefunction
Ψ(x, t) = ∫ Φ(k) e^{i(kx−ω(k)t)} dk.

Evaluating at t = 0, we have

Ψ(x, 0) = ∫ Φ(k) e^{ikx} dk,

which we’ll come back to later. Turning back to the original integral, we do a Taylor expansion
ω(k) = ω(k₀) + (k − k₀)(dω/dk)|_{k=k₀} + O((k − k₀)²),
since the values of k that matter are those near k0 anyway. (And with this we already see the group velocity popping
up, which is a good sign.) Splitting up the exponential, we find
Ψ(x, t) = ∫ dk Φ(k) e^{ikx} e^{−iω(k₀)t} e^{−ik(dω/dk)|_{k₀} t} e^{ik₀(dω/dk)|_{k₀} t} · (negligible)
(we note here that the negligible term is important if we care about the distortion of the wave pattern, which we
won’t talk about for a few lectures). This integral may look difficult, but the e^{−iω(k₀)t} factor can come out, and so can the e^{ik₀(dω/dk)|_{k₀} t} factor, meaning that we’re left with

Ψ(x, t) = e^{−iω(k₀)t} e^{ik₀(dω/dk)|_{k₀} t} ∫ dk Φ(k) e^{ik(x − (dω/dk)|_{k₀} t)}.
But now the first two terms in front of the integral are pure phases, and the integral resembles the wavefunction at
t = 0 (no ω(k) factor), just evaluated at x − (dω/dk)|_{k₀} t, so if we take magnitudes on both sides, we arrive at

|Ψ(x, t)| = |Ψ(x − (dω/dk)|_{k₀} t, 0)|.
In other words, the norm of the wave at time t looks like the norm of the wave at time 0, but with an additional
displacement distance of vgroup t. So a peak that started at x = 0 at t = 0 will move to x = vgroup t at time t, indeed
showing that the shape of the wave packet moves at the group velocity.
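To make this concrete, here is a small numerical sketch (Gaussian profile and units with ℏ = m = 1 assumed) that builds such a packet and checks that its peak moves at the group velocity ℏk₀/m:

    import numpy as np

    hbar, m = 1.0, 1.0
    k0, sigma = 5.0, 0.5                          # packet centered at k0, width sigma
    k = np.linspace(k0 - 4, k0 + 4, 1001)
    phi = np.exp(-(k - k0)**2 / (2 * sigma**2))   # peaked amplitude Phi(k)
    dk = k[1] - k[0]

    x = np.linspace(-5, 15, 1501)

    def psi(t):
        omega = hbar * k**2 / (2 * m)             # free-particle dispersion
        # superposition integral approximated as a discrete sum over k
        return (phi * np.exp(1j * (np.outer(x, k) - omega * t))).sum(axis=1) * dk

    t = 1.0
    peak = x[np.argmax(np.abs(psi(t)))]
    print(peak, hbar * k0 / m * t)                # both ~5.0: the peak moves at v_group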
With all of that discussion done, we’re finally ready to write down the equation of a matter wave. If we have a
particle of energy E and momentum p, we know that E = ℏω and p = ℏk, and we’re going to make an argument
based on superposition and probability to find the shape of the wave. If we want a plane wave in the +x direction, we might have a wave of the form sin(kx − ωt), cos(kx − ωt), e^{i(kx−ωt)}, or e^{−i(kx−ωt)} (all of these are periodic functions
that depend on the phase kx − ωt). But suppose we want a wavefunction of a particle with equal probability of
traveling to the right or to the left. Then using sin requires us to write down a wavefunction like

Ψ(x, t) = sin(kx − ωt) + sin(kx + ωt)

(since having equal coefficients gives us “equal probability” of moving in each direction), but expanding this gives us

Ψ(x, t) = 2 sin(kx) cos(ωt),

which is not acceptable because this wavefunction is identically zero at ωt = π/2, 3π/2, · · · , meaning that the particle has completely vanished. So that kind of wave (and similarly trying to produce a wavefunction out of cosines, yielding 2 cos kx cos ωt) won’t work for us, and instead we’ll want a wavefunction of the form

Ψ(x, t) = e^{i(kx−ωt)} + e^{i(−kx−ωt)}.

This wavefunction now never vanishes for any value of t, since the t-dependence is always a phase, so there’s nothing problematic that occurs here! Similarly, we can also use a +iωt in the exponential and write

Ψ(x, t) = e^{−i(kx−ωt)} + e^{−i(−kx−ωt)},
and again we have not run into trouble yet. But we can’t use both of these kinds of matter waves at once – indeed,
if both of them were correct, then we could superimpose e^{i(kx−ωt)} and e^{−i(kx−ωt)} (both representing a particle moving
to the right), and if both represent the same state then the superposition should also give us a particle moving to the
right. But instead, we get
e^{i(kx−ωt)} + e^{−i(kx−ωt)} = 2 cos(kx − ωt),
and we’ve already seen that we can’t have that be a wavefunction that represents a right-moving particle. All of this
is to say that we must choose one of the exponentials to be the wavefunction for a de Broglie matter wave, and by
convention the following is decided:
Ψ(x, t) = e^{i(kx−ωt)} .
Next lecture, we’ll determine the wave equation that this matter wave satisfies, and that will lead us to the
Schrodinger equation.
Returning to the free-particle matter wave Ψ(x, t) = e^{i(kx−ωt)} from last time, we can take an x-derivative to get out a factor of k:

(ℏ/i) ∂Ψ(x, t)/∂x = ℏk e^{i(kx−ωt)} = pΨ(x, t).
In other words, acting with the differential operator (ℏ/i) ∂/∂x = −iℏ ∂/∂x on the wavefunction gives us the momentum (a number) times that same wavefunction – remember that operators take in functions and give us back functions, so this is the closest we’ll get to extracting a pure quantity out of the wavefunction.
Definition 38
The momentum operator, denoted p̂, is the differential operator
p̂ = (ℏ/i) ∂/∂x.
For our free particle, we then get the equation p̂Ψ(x, t) = pΨ(x, t) , and when we have an equation of this form
(operator on function equals number times function), the function Ψ(x, t) is called an eigenstate of the operator p̂
with eigenvalue p. (This language is motivated by linear algebra, in which we have eigenvalues and eigenvectors of a
matrix M when there are equations of the form Mv = λv .) Not all wavefunctions are eigenstates, just like most vectors
are not eigenvectors of a given matrix (the matrix will usually rotate the vector in some way), but the eigenstates will
be important for us in describing physical characteristics. In particular, what we’re saying is that Ψ(x, t) is a state of
definite momentum: if we measure the momentum of the state, we will always find p (with no uncertainty). And of
course, that’s what we want, because this is supposed to be the wavefunction of a free particle of momentum p.
We’ll now consider another aspect of this wavefunction: since we can extract the momentum p, it makes sense to
also extract the energy E of the particle. Notice that (because Ψ = e^{ikx} e^{−iωt}) taking a time-derivative yields

iℏ ∂Ψ/∂t = (iℏ)(−iω)Ψ = ℏωΨ = EΨ.

The result iℏ ∂Ψ/∂t = EΨ is another eigenvalue equation, telling us how a wavefunction of energy E evolves over time.
But this time, we can extract even more physics: remember that for a nonrelativistic particle, we have E = p²/2m, and we want to capture this dependence in some way. Specifically, we’ll try to write down another operator O satisfying OΨ = EΨ, and because E = p²/2m and pΨ = p̂Ψ, we have
EΨ = (p/2m)(pΨ) = (p/2m) (ℏ/i) ∂Ψ/∂x
(where we’ve replaced one copy of p with a p̂), and because the other p is still a constant we can move it inside the derivatives and other constants to get
= (1/2m) (ℏ/i) ∂/∂x (pΨ) = (1/2m) (ℏ/i) ∂/∂x (ℏ/i) ∂Ψ/∂x,
and now setting the first and last expressions equal gives us the equation
−(ℏ²/2m) ∂²Ψ/∂x² = EΨ,
motivating the following definition:
Definition 39
The energy operator for a free particle, denoted Ê, is the differential operator
Ê = −(ℏ²/2m) ∂²/∂x².
In particular, Ψ is an energy eigenstate (of energy E), or equivalently Ψ is a state of definite energy, and from the way we defined the energy operator we also have Ê = (1/2m) p̂². And if we now look at the two eigenvalue equations involving E, we arrive at the equation
−(ℏ²/2m) ∂²Ψ/∂x² = iℏ ∂Ψ/∂t.
This is the free Schrodinger equation, and it carries a lot of information. For example, it tells us the relation between k and ω for a de Broglie matter wave, because trying the solution Ψ = e^{ikx−iωt} gives us
iℏ(−iω)Ψ = −(ℏ²/2m)(ik)²Ψ =⇒ ℏω = ℏ²k²/2m ⇐⇒ E = p²/2m.
So the differential equation admits plane waves as solutions but constrains the momentum and energy of our particles,
and thus we really are making progress towards describing our general wavefunctions. In particular, notice that this
free Schrodinger equation is linear, so superpositions of solutions Ψ are also solutions, and linearity tells us that sums
of plane waves are also solutions to the free Schrodinger equation. Fourier theory then tells us that given plane
waves, we can construct whatever wave packet we want, and then we know how to evolve the wavefunction over time
because we’ve already described how to evolve each plane wave over time!
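Here is a matching sympy sketch (an illustrative addition): imposing the dispersion relation ω = ℏk²/2m, the plane wave satisfies the free Schrodinger equation identically:

import sympy as sp

x, t, k = sp.symbols('x t k', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
omega = hbar*k**2/(2*m)                       # dispersion relation from the text
Psi = sp.exp(sp.I*(k*x - omega*t))

lhs = -hbar**2/(2*m) * sp.diff(Psi, x, 2)     # -(hbar^2/2m) d^2 Psi / dx^2
rhs = sp.I*hbar * sp.diff(Psi, t)             # i hbar dPsi/dt
assert sp.simplify(lhs - rhs) == 0            # free Schrodinger equation holds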
Remark 40. This kind of logic is common in physics: we take little pieces of evidence and put them together, and
even if they don’t completely rigorously lead to a given conclusion, we will have good reason to believe the end result
(in this case, the Schrodinger equation).
Rephrasing the paragraph above, our most general solution Ψ(x, t) to the free Schrodinger equation is (integrating
over waves of different wavenumbers)
Ψ(x, t) = ∫_{−∞}^{∞} Φ(k) e^{ikx−iω(k)t} dk.
As previously discussed, such a wave moves with a group velocity dω/dk|_{k0} = dE/dp = p/m = v if Φ(k) is localized at k0 (and if it is not localized then the group velocity does not make sense). We can also make a few more observations:
• The “full wavefunction” Ψ cannot be purely real: if it were, the left-hand side of the free Schrodinger equation (as written above) would be purely real while the right-hand side would be purely imaginary, forcing both sides to vanish – but Ψ cannot be identically zero. (Later on, we’ll talk about time-independent wavefunctions, and those will be allowed to be purely real.)
• Even though we have a wave moving at some speed v, the free Schrodinger equation is not like the usual wave equation ∂²φ/∂x² − (1/v²) ∂²φ/∂t² = 0, both because of the number of time derivatives and because there is no i in the ordinary wave equation.
Taking another look at the energy operator Ê = p̂²/2m (which looks like our usual “kinetic energy” in classical mechanics), the next natural step is to add in a potential V (x, t), so that our total energy is E = p²/2m + V. It then makes sense to modify our energy operator definition, and there is a different naming convention in this general case:
Definition 41
The Hamiltonian operator, denoted Ĥ, is the differential operator
Ĥ = p̂²/2m + V (x, t).
The Hamiltonian basically represents the energy in terms of position x and momentum p, and thus we’ll need to talk about the position operator as well. But first, this allows us to write down the general Schrodinger equation:
Theorem 42
The general Schrodinger equation is
iℏ ∂Ψ/∂t = ( −(ℏ²/2m) ∂²/∂x² + V (x, t) ) Ψ.
One thing that we may be surprised about is that we multiply our potential V by the wavefunction Ψ, but that’s
the simplest way that we can keep the differential equation linear – the Hamiltonian must still be a linear operator. In
particular, we should think of V not just as a function but also as an operator (which just multiplies a function f (x, t)
by V (x, t) to get f (x, t)V (x, t)).
Fact 43
Quantum mechanics is about inventing energy operators, solving the resulting Schrodinger equation and
finding the valid wavefunctions. In particular, we can consider different potentials V , and the method for solving
the resulting Schrodinger equation will vary and give us very different solutions.
As 8.04 goes on, we’ll see many of these methods employed, but now that x has showed up we’ll spend some time
talking about its role in all of this.
Definition 44
The position operator, denoted x̂, multiplies functions by x:
(x̂f)(x, t) = x f (x, t).
The reason we need to be careful with how we write this out is that it illuminates a relation between the position
(x̂) and momentum (p̂) operators. One important property of operators (which we may remember from linear algebra
in the context of matrices) is that the order in which they are applied or multiplied matters. Looking at operators
is how Heisenberg arrived at quantum mechanics, and that’s discussed more in 8.05, but right now we’re interested in
looking at whether two operators commute (that is, whether the order in which they are applied matters).
Example 45
We determine whether x̂ and p̂ commute by computing the difference between x̂ p̂φ and p̂x̂φ (for some function
φ(x, t)).
This computation is straightforward, but we need to be careful with it: we wish to evaluate the expression
x̂ p̂φ − p̂x̂φ,
but when we write down an expression like ÂB̂φ for operators Â, B̂, we really mean Â(B̂φ) (because we’re first applying
B̂ to φ, and then we apply  to that result). Thus, we are evaluating
x̂(p̂φ) − p̂(x̂φ) = x̂ ( (ℏ/i) ∂φ/∂x ) − p̂(xφ),
and now both terms in parentheses are functions of x and t, so we can apply the definitions of x̂ and p̂ again to get
= x (ℏ/i) ∂φ/∂x − (ℏ/i) ∂/∂x (xφ).
We now use the product rule on the second term, and one of the two parts will cancel with the first term:
= x (ℏ/i) ∂φ/∂x − (ℏ/i) φ − x (ℏ/i) ∂φ/∂x = iℏφ.
Equating the first and last expressions, and using the fact that (Â + B̂)φ = Âφ + B̂φ, we find that
(x̂ p̂ − p̂x̂)φ = i ℏφ
for any function φ, meaning that the operators acting on φ must also be identical. We’ll use the following notation
throughout the rest of the course:
Definition 46
The commutator of two operators  and B̂, denoted [Â, B̂], is the operator ÂB̂ − B̂ Â.
Proposition 47
We have the equality of operators
[x̂, p̂] = x̂ p̂ − p̂x̂ = i ℏ.
This nonzero commutator really encodes the fact that the two operators interact nontrivially, and that will be the
basis of the uncertainty principle later on, as well as the matrix formulation of quantum mechanics. (Remember
that operators, wavefunctions, and eigenstates correspond to matrices, vectors, and eigenvectors, respectively.)
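The computation in Example 45 is easy to reproduce symbolically; here is a minimal sympy sketch (an illustrative addition) acting on an arbitrary test function φ(x):

import sympy as sp

x = sp.symbols('x', real=True)
hbar = sp.symbols('hbar', positive=True)
phi = sp.Function('phi')(x)                      # arbitrary test function

def p_hat(f):                                    # momentum operator (hbar/i) d/dx
    return (hbar/sp.I) * sp.diff(f, x)

def x_hat(f):                                    # position operator: multiply by x
    return x * f

comm = x_hat(p_hat(phi)) - p_hat(x_hat(phi))     # [x, p] acting on phi
assert sp.simplify(comm - sp.I*hbar*phi) == 0    # equals i*hbar*phi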
Example 48
A set of famous matrix commutators comes from the Pauli matrices (which we’ll talk about in 8.05, encoding
the spin of spin 1/2 particles)
" # " # " #
0 1 0 −i 1 0
σ1 = , σ2 = , σ3 = .
1 0 i 0 0 −1
⃗ = ℏ⃗
Specifically, these matrices lead us to the spin operator S 2 σ.
We won’t talk much more about spin here in 8.04, but we’ll compute the matrix commutators: we can verify that
" # " #
i 0 −i 0
σ1 σ2 = , σ2 σ1 = ,
0 −i 0 i
25
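A two-line numpy check (an illustrative addition) confirms these products and the commutator:

import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

assert np.allclose(s1 @ s2, 1j*s3)               # sigma_1 sigma_2 = i sigma_3
assert np.allclose(s1 @ s2 - s2 @ s1, 2j*s3)     # [sigma_1, sigma_2] = 2i sigma_3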
And even though the commutator of Pauli matrices gives us another Pauli matrix, rather than a number like in the
[x̂, p̂] = i ℏ case, we’re going to see many more complications with the position and momentum operators – it’s much
harder to understand fully what that commutator means! In particular, in 8.05, we’ll talk more about linear algebra
and explain how we can write operators like x̂ and p̂ using matrix representations, but those matrices will need to be
infinite-dimensional. (This is a mathematical exercise – there are no two finite-dimensional matrices whose commutator
is a nonzero multiple of the identity, since the trace of a commutator always vanishes while the trace of the identity does not.) So something strange is going on, but we’ll still get more familiar with position and
momentum and see where it leads us.
We can generalize the one-dimensional Schrodinger equation from Theorem 42 (to avoid focusing on a simple case, and because we actually live in three dimensions): instead of having p̂ = (ℏ/i) ∂/∂x, we now have p̂x = (ℏ/i) ∂/∂x (for momentum along the x-direction) and similarly p̂y = (ℏ/i) ∂/∂y and p̂z = (ℏ/i) ∂/∂z. (This corresponds to the fact that we can write a de Broglie wave as e^{i k⃗·x⃗ −iωt}, and then we can expand the dot product out across the three components.) To make the notation easier, we will sometimes replace x, y, z with x1, x2, x3, so that we can just write p̂k = (ℏ/i) ∂/∂xk (where k ranges from 1 to 3). Thus, our momentum operator is now a vector of operators, and we have
p̂⃗ = (ℏ/i) ∇.
(Indeed, we can check that p̂⃗ e^{i k⃗·x⃗ −iωt} = ℏk⃗ e^{i k⃗·x⃗ −iωt}, and thus we still have a valid eigenvalue equation.) The Schrodinger equation does not become much more complicated, because the Hamiltonian operator is now
Ĥ = p̂⃗ ² / 2m + V (x⃗, t),
and the squared vector operator really means that we have
p̂⃗ ² = (ℏ/i) ∇ · (ℏ/i) ∇ = −ℏ² ∇²,
where ∇2 is the Laplacian operator. This gives us the following general three-dimensional Schrodinger equation:
iℏ ∂Ψ/∂t = ( −(ℏ²/2m) ∇² + V (x⃗, t) ) Ψ.
This three-dimensional equation follows pretty naturally from the one-dimensional equation, but we’re introducing
it now so we keep it in mind throughout all of our work solving differential equations in this class. And the commutation
relation [x̂, p̂] = i ℏ now still holds for x̂k and p̂k for each k, but operators like p̂x and ŷ do commute (because y is a
constant with respect to the x-derivative). We can summarize the nine different commutators between position and
momentum operators in the concise statement
[x̂i, p̂j] = iℏ δij,
where δij is the Kronecker delta symbol, which returns 1 if i = j and 0 otherwise.
Our final task in this lecture will be to return to the wavefunction and understand its interpretation. We’ve
taken the de Broglie matter wave and used it to obtain a differential equation, but we still don’t know what Ψ actually
means. Schrodinger initially thought that the spread of the wavefunction (under the Schrodinger equation) represented
disintegrating particles, so that more Ψ represents more of the particle. But physicists found that when a particle
comes in and hits a Coulomb potential, the wavefunction falls away as 1/r, but that is in fact not what happens –
experimentally, the particle chooses a direction to go in and stays localized.
So Born’s interpretation of the wavefunction in terms of probability was the one that made sense – Schrodinger, Einstein, and many others hated it, but over time physicists had to agree that it was right. The statement is essentially that |Ψ(x, t)|² gives us the probability density of finding the particle at position x at time t; more rigorously, this means that if we look at an infinitesimally small box d³x around a point x⃗, then the probability of finding the particle inside that box is
dP (x⃗, t) = |Ψ(x⃗, t)|² d³x.
In particular, for this to be a valid probabilistic interpretation, the probability of finding the particle anywhere in space
must be 100% (we must measure it to be somewhere), and thus
1 = ∫_{R³} |Ψ(x⃗, t)|² d³x.
But already this interpretation adds a few complications. In particular, because the Schrodinger equation tells us how
to evolve our wavefunction forward in time, knowing Ψ(x, t0 ) for all x allows us to know Ψ(x, t) for all x and all t > t0 ,
thus specifying the wavefunction for all future times. But for the Born interpretation to make sense, we must ensure
that as long as the boxed equation holds at some time t0 , it will still hold at all future times t > t0 (in other words,
the normalization of our wavefunction preserves probability). We’ll do that discussion next time!
For the normalization integral to converge, we will require that Ψ(x, t) → 0 as x → ±∞.
Remark 50. While there are technically mathematical examples where Ψ(x, t) does not have a limit as x → ∞, and
the integral of |Ψ|2 still converges, those examples do not pop up in any physical contexts and thus we are (mostly)
safe to ignore them.
In addition, we’ll require that the function Ψ does not oscillate too quickly: we also ask that
∂Ψ(x, t)/∂x is bounded as x → ±∞.
(Since the Schrodinger equation has derivatives in it, we should not be surprised that we need some regularity in Ψ of
this form.)
We should now turn back to a statement we made a few lectures ago, in which we stated that Ψ and cΨ always represent the same state. Multiplying Ψ by a constant c multiplies the integral ∫_{−∞}^{∞} |Ψ|² dx by a factor of |c|², so it may seem weird that the normalization condition holds for one but not the other if they’re the same state. That can be clarified through the following statement:
Definition 51
For a wavefunction Ψ(x, t) with ∫_{−∞}^{∞} |Ψ(x, t)|² dx equal to some finite number N, we say that Ψ is normalizable, and we let Ψ′ = Ψ/√N be the associated normalized wavefunction.
In particular, we have
∫_{−∞}^{∞} |Ψ′(x, t)|² dx = ∫_{−∞}^{∞} (|Ψ(x, t)|²/N) dx = N/N = 1,
so Ψ′ is indeed normalized and is equivalent to the original wavefunction Ψ. And whenever we work with probabilities and probabilistic statements, we’re pre-normalizing our wavefunctions Ψ – in principle, we can always work with normalizable wavefunctions and divide by √N only when needed. But we’ll be flexible and do whatever is more convenient for us, because multiplying by a constant does not change whether a wavefunction is normalizable or not.
We’re now ready to do the check of conservation of probability under the Schrodinger equation. We start with a normalized wavefunction at time t0, so that (noting that |Ψ|² = ΨΨ∗)
∫ Ψ∗(x, t0)Ψ(x, t0) dx = 1.
Here, we’ll let Ψ∗(x, t)Ψ(x, t) be called the probability density (whose interpretation we’ve already discussed), and we denote it by ρ(x, t). Then we can define N(t) = ∫ ρ(x, t) dx, and our goal is to show that N(t) = 1 for all t > t0, given that N(t0) = 1. Rephrasing the question, we thus wish to show that the Schrodinger equation guarantees dN/dt = 0.
To check this, we write out the definition of N in the derivative: by differentiation under the integral sign and the
product rule, we have
dN/dt = ∫ ∂ρ(x, t)/∂t dx = ∫ ( (∂Ψ∗/∂t) Ψ + Ψ∗ (∂Ψ/∂t) ) dx.
We now have a time-derivative of Ψ, so here is where we can use the Schrodinger equation
iℏ ∂Ψ/∂t = ĤΨ ⇐⇒ ∂Ψ/∂t = −(i/ℏ) ĤΨ.
But we also have a time-derivative of the complex conjugate Ψ∗ , so we want to take a complex conjugate of the
equation above, which gives us
−iℏ ∂Ψ∗/∂t = (ĤΨ)∗ ⇐⇒ ∂Ψ∗/∂t = (i/ℏ) (ĤΨ)∗,
because the complex conjugate of the partial derivative is also the partial derivative of the complex conjugate. Plugging
those boxed expressions back into our calculation above yields
dN/dt = (i/ℏ) ∫ ( (ĤΨ)∗ Ψ − Ψ∗ (ĤΨ) ) dx,
and in order for this to be zero, it’s equivalent to ask for the identity
∫ (ĤΨ)∗ Ψ dx = ∫ Ψ∗ (ĤΨ) dx
to hold, where on one side the conjugate is with the Ĥ term, and on the other it’s not. So conservation of probability
has been rephrased into a condition on the Hamiltonian Ĥ, and it turns out a situation where we do have this condition
is when Ĥ is a Hermitian operator. More generally, Hermitian operators should satisfy
∫ (ĤΨ1)∗ Ψ2 dx = ∫ Ψ∗1 (ĤΨ2) dx
for all wavefunctions Ψ1, Ψ2 – applying this to Ψ1 = Ψ2 = Ψ gives us dN/dt = 0 above. But the actual definition is more general:
Definition 52
For any (linear) operator T , the Hermitian conjugate of T , denoted T ∗ or T † , is the linear operator that satisfies
∫ Ψ∗1 (T Ψ2) dx = ∫ (T† Ψ1)∗ Ψ2 dx.
(An operator T is then Hermitian when T† = T.)
It’s useful to think of Hermitian conjugates as similar to complex conjugates, so that Hermitian operators are similar to real numbers. We’ll talk more about this later on, especially in 8.05, but for now we’ll get back to the calculation, since we know the actual formula for our Hamiltonian. Plugging in the expression Ĥ = −(ℏ²/2m) ∂²/∂x² + V (x, t) yields
dN/dt = (i/ℏ) ∫ ( (ĤΨ)∗ Ψ − Ψ∗ (ĤΨ) ) dx
= (i/ℏ) ∫ ( ( −(ℏ²/2m) ∂²Ψ∗/∂x² + V (x, t)Ψ∗ ) Ψ + (ℏ²/2m) Ψ∗ ∂²Ψ/∂x² − Ψ∗ V (x, t)Ψ ) dx,
where we’ve used the fact that V (x, t) is always real (because it’s some energy). Since V (x, t) is just some number,
the second and fourth terms in the parentheses cancel out, and we’re left with
= −(iℏ/2m) ∫ ( (∂²Ψ∗/∂x²) Ψ − Ψ∗ (∂²Ψ/∂x²) ) dx.
This integrand is not zero, but one common way in physics to show that an integral vanishes is to show that the
integrand is actually a total derivative (and show that the boundary terms also vanish). Indeed, we can verify directly
that
(∂²Ψ∗/∂x²) Ψ − Ψ∗ (∂²Ψ/∂x²) = ∂/∂x ( (∂Ψ∗/∂x) Ψ − Ψ∗ (∂Ψ/∂x) )
by expanding out the product rule and noting that two of the terms cancel out and the other two give us the desired integrand. Plugging back in and rewriting −iℏ/2m as ℏ/2im, we have
dN/dt = −∫ ∂/∂x [ (ℏ/2im) ( Ψ∗ (∂Ψ/∂x) − Ψ (∂Ψ∗/∂x) ) ] dx,
so the final integral is just the difference of the bracketed term at x = ∞ and x = −∞; both of those are zero by our assumptions that Ψ (and therefore Ψ∗) goes to 0 at the boundaries, while ∂Ψ/∂x (and therefore ∂Ψ∗/∂x) stays bounded, so both product terms go to 0. This finishes our proof, and we’ve indeed verified that the wavefunction remains normalized.
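The total-derivative identity above is pure calculus, so we can also let sympy expand the product rule for us (an illustrative sketch, treating Ψ and Ψ∗ as independent functions f and g):

import sympy as sp

x = sp.symbols('x', real=True)
f = sp.Function('f')(x)      # plays the role of Psi
g = sp.Function('g')(x)      # plays the role of Psi*, treated as independent

lhs = sp.diff(g, x, 2)*f - g*sp.diff(f, x, 2)          # (Psi*)'' Psi - Psi* Psi''
rhs = sp.diff(sp.diff(g, x)*f - g*sp.diff(f, x), x)    # d/dx [ (Psi*)' Psi - Psi* Psi' ]
assert sp.simplify(lhs - rhs) == 0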
But along the way, we’ve found something useful for our discussion: notice that inside the total derivative, we have an expression of the form z − z∗, where z = Ψ∗ ∂Ψ/∂x. Since z = a + bi =⇒ z − z∗ = (a + bi) − (a − bi) = 2bi = 2i Im(z), what we’ve actually found is that (reminding ourselves that ρ(x, t) is the probability density)
∂ρ/∂t = −∂/∂x [ (ℏ/m) Im ( Ψ∗ ∂Ψ/∂x ) ].
Definition 53
The current density J(x, t) for the wavefunction Ψ(x, t) is given by
J(x, t) = (ℏ/m) Im ( Ψ∗ ∂Ψ/∂x ).
With this definition, the equation above becomes
∂ρ/∂t = −∂J/∂x =⇒ ∂ρ/∂t + ∂J/∂x = 0,
which is a current conservation statement similar to what we might have seen in electromagnetism – basically, we
begin with a charge density ρ(x, t) (which is really a probability density), and we arrive at a current J(x, t). We’ll
discuss units briefly here – since ∫ |Ψ|² dx = 1, and we’re integrating in one dimension, Ψ has units of 1/√[L], and thus
[Ψ∗ ∂Ψ/∂x] = 1/[L]², [ℏ] = [M][L]²/[T] =⇒ [ℏ/m] = [L]²/[T],
so J indeed has units of 1/[T] (probability per unit time).
Remark 54. This same probability conservation statement can be derived in three dimensions as well, and that’s mostly left as an exercise to us. Basically, the three-dimensional current is now vector-valued, involving a divergence:
J⃗(x⃗, t) = (ℏ/m) Im ( Ψ∗ ∇Ψ ),
so that our current conservation equation becomes
∂ρ/∂t + ∇ · J⃗ = 0.
This last equation should look familiar to us from 8.02 – it’s how Maxwell discovered the displacement current. In this case, we now have [J⃗] = 1/([L]²[T]) (so the units of current are probability per unit area per unit time).
Rewriting our one-dimensional computation in this language, we have
dN/dt = −∫_{−∞}^{∞} (∂J/∂x) dx = J(−∞, t) − J(+∞, t) = 0,
where J indeed vanishes at x → ±∞ because of our boundary conditions on the wavefunction Ψ. So we’ve checked that N is conserved and thus our wavefunction stays normalized (thanks to the existence of a probability current). And because the integral canceled out nicely enough when we used the wavefunction Ψ, it’s reasonable to suspect that the Hamiltonian Ĥ is actually Hermitian (satisfying the equation in Definition 52) – that’s an exercise that we can verify ourselves.
We’ll finish this lecture by discussing the connection of current conservation to electromagnetism in more detail,
which should help clarify some of the intuition. The probability density ρ(x, t) is directly analogous to the charge density,
and (integrating that density) the probability to find the particle in some volume is analogous to the charge contained
within some volume. Additionally, the probability current density is just like the current density in electromagnetism.
The key point is that the existence of a probability current implies local probability conservation, which is stronger than just saying that dN/dt = 0. Indeed, the differential relation ∂ρ/∂t + ∂J/∂x = 0 tells us that probability changes somewhere in space are due to some nonzero current at that point, just like a change of charge density is due to some current in electromagnetism.
In particular, the probability that a particle can be found between positions a and b will change only if there is a
difference in the currents at the endpoints – let’s see this in more detail. If we have some volume V , and we consider
the charge in that volume
QV (t) = ∫V ρ(x⃗, t) d³x,
then we can calculate the change of the charge in time via
dQV (t)/dt = ∫V (∂ρ/∂t) d³x = −∫V (∇ · J⃗) d³x
(where we’ve used current conservation), and now by Gauss’s theorem we can relate the divergence to a surface integral on the boundary S of V:
= −∮S J⃗ · dA⃗.
Indeed, the equality of boxed statements here says that charge is never created or destroyed, so current must escape through the surface in order for the charge inside V to change. And what we’re saying here is that the same holds for probability: if we look at quantum mechanics in one dimension, and we wish to look at the probability
Pa,b (t) = ∫a^b ρ(x, t) dx,
then current conservation tells us that
dPa,b (t)/dt = −∫a^b (∂J/∂x) dx = J(a, t) − J(b, t).
In other words, the probability that we find the particle in [a, b] changes only if probability flows through the edges: indeed, if the current is positive at a, then probability is moving into the box [a, b] and thus increasing Pa,b, and if the current is positive at b, then probability is escaping [a, b] and thus decreasing Pa,b.
Theorem 55 (Fourier inversion)
Suppose Ψ(x, 0) = (1/√2π) ∫ Φ(k) e^{ikx} dk. Then we can recover the coefficients via Fourier inversion:
Φ(k) = (1/√2π) ∫ Ψ(x, 0) e^{−ikx} dx.
In other words, if we’re given an initial wavefunction (like a sine function, or a Gaussian, or a localized wave packet),
we can find Φ(k) by integration, and then we can reconstruct that wavefunction as an integral of plane waves weighted
by Φ(k). We’ve already talked about a similar idea when discussing group velocity, but for now we’ll focus on how to
understand uncertainties of position and momentum through this result.
Example 56
As before, consider some Φ(k) which has a single peak at some wavenumber k = k0 , so that it has some
uncertainty ∆k which characterizes the typical width of our peak. Assume that Φ(k) is real.
We won’t make a precise definition of what uncertainty means (that will come later), but for the purpose of
intuition, we can imagine taking the full width at half maximum, which is the distance between the two points where
Φ(k) achieves half of its peak value. (We don’t need to worry about things like factors of 2 from the left and right
until future lectures when we’re more rigorous.)
Recalling our discussion of stationary phase, we can then also think about the resulting wavefunction Ψ(x); the
contribution only comes from around k = k0 because that’s the only place where Φ(k) is large. Furthermore, we need
to have a stationary phase at k0 to have a wave packet, and because Φ(k) is real it doesn’t contribute to the phase,
and we just have phase φ = kx at time t = 0 (there is no ωt term). The k-derivative of the phase is then x, meaning
that Ψ(x, 0) must be peaked around x = 0. Thus, we can also associate some uncertainty width ∆x to Ψ(x) at
time t = 0, and we wish to relate ∆x and ∆k in our subsequent discussion.
Fact 57
We’re glossing over a complication here – we claim that Ψ(x, 0) is not real-valued even though Φ(k) is, so we
have to talk about the peak of |Ψ|(x, 0) instead.
To see this, suppose Ψ(x, 0) were real, so that Ψ = Ψ∗; changing variables k → −k in the conjugated Fourier integral gives ∫ (Φ(k) − Φ(−k)∗) e^{ikx} dk = 0, and a function whose Fourier coefficients all vanish must itself vanish (if the integral is zero for all x, it must be the zero function). Therefore, we get reality of the wavefunction as long as
Φ(k) = Φ(−k)∗.
And indeed, this is not a condition that holds for our wavefunction peaked at some wavenumber k0 ̸= 0, so Ψ(x, 0)
will not be real in this case.
We’re thus ready to return to our discussion of uncertainty and relating ∆x and ∆k – for this, we write an arbitrary
wavenumber as k = k0 + k̃, so that we have
Ψ(x, 0) = (1/√2π) e^{ik0 x} ∫ Φ(k0 + k̃) e^{ik̃x} dk̃,
where k̃ is now peaked around 0, and thus the relevant range of integration is in an interval of width on the order of
∆k. As we vary k̃ across this width of ∆k, the phase in the exponent of e^{ik̃x} changes by x∆k (we can call this the
“total phase excursion”). But as long as this phase excursion is small enough – for example, if ∆k · x ≲ 1 – we’ll have
a large contribution because the phase doesn’t get a chance to wrap around the full 2π. And if ∆k · x ≫ 1, then the
phase has washed out the contributions of Φ, and thus the wavefunction will turn out to be close to 0. So our final
conclusion is basically the following:
Ψ(x, 0) is appreciable only when the phase excursion is small, i.e. for |x| ≲ 1/∆k.
This means that the uncertainty ∆x is also roughly on the order of 1/∆k, and overall we have found that ∆k ∆x ≈ 1 for this kind of localized wave packet.
This kind of relation turns out to not be restricted to quantum mechanics – it’s only when we start interpreting e^{ikx} as a state of definite momentum that we get the physical connection. Since p = ℏk for our matter waves, meaning that ∆p = ℏ∆k, multiplying both sides of the boxed equation gives us
∆p ∆x ≈ ℏ.
It turns out the more precise result is that ∆p ∆x ≥ ℏ/2 for any wavefunction, but we’ll need to define uncertainties more rigorously to explain that in full, and we won’t do that in this lecture.
Example 58
Consider a Φ(k) which is basically a rectangular pulse:
Φ(k) = 1/√∆k for −∆k/2 < k < ∆k/2, and Φ(k) = 0 otherwise.
Because this function satisfies Φ(k) = Φ(−k)∗ (the conjugate doesn’t matter because Φ is real), the resulting wavefunction will be real; indeed,
Ψ(x, 0) = (1/√2π) ∫_{−∆k/2}^{∆k/2} (1/√∆k) e^{ikx} dk = (1/√(2π∆k)) [ e^{ikx}/(ix) ]_{k=−∆k/2}^{k=∆k/2} = √(2/(π∆k)) sin(∆k x/2)/x.
The typical width of this wavefunction can then be calculated by looking at when it first hits the x-axis, which is at x = ±2π/∆k, and thus we find that ∆x ≈ 2π/∆k, and indeed ∆x ∆k ≈ 2π regardless of the uncertainty in k.
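We can reproduce this example numerically; the numpy sketch below (an illustrative addition, arbitrarily taking ∆k = 1) builds Ψ(x, 0) by quadrature and compares it with the closed form:

import numpy as np

dk = 1.0
k = np.linspace(-dk/2, dk/2, 801)
x = np.linspace(-30, 30, 1201)

Phi = np.full_like(k, 1/np.sqrt(dk))                       # rectangular pulse
integrand = Phi[None, :] * np.exp(1j*k[None, :]*x[:, None])
Psi = np.trapz(integrand, k, axis=1) / np.sqrt(2*np.pi)    # Psi(x, 0)

# closed form sqrt(2/(pi*dk)) sin(dk*x/2)/x, written with sinc to avoid 0/0
exact = np.sqrt(2/(np.pi*dk)) * (dk/2) * np.sinc(dk*x/(2*np.pi))
assert np.allclose(Psi.real, exact, atol=1e-3)             # first zeros at x = +-2*pi/dk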
We’ll now move to discussing how our wavepacket evolves, changing shape over time and causing technological
complications. Recall that we have
Ψ(x, t) = (1/√2π) ∫ Φ(k) e^{ikx} e^{−iω(k)t} dk,
and we’re still dealing with a wavepacket centered at k0 , so it still makes sense to expand ω in a Taylor series: this
time, we’ll write
ω(k) = ω(k0) + (k − k0) dω/dk|_{k0} + (1/2)(k − k0)² d²ω/dk²|_{k0} + · · · ,
including the second-derivative term. Last time, we only needed the first-derivative term to understand the group
velocity, and this time we’re going to see how the next term affects the distortion. (Recall that
dω/dk|_{k0} = dE/dp = p/m = ℏk0/m,
so the second derivative is d²ω/dk² = ℏ/m.) Over a time t, the second-order term contributes a phase of roughly (1/2)(ℏ/m)(∆k)²|t| across the width of the packet, so the shape is preserved only as long as this phase stays small:
(ℏ/m)(∆k)² |t| ≪ 1.
Also using the relation ∆p ∆x ≈ ℏ from above, we can equivalently phrase this as the condition
|t| ≪ (m/ℏ)(∆x)².
In other words, knowing the uncertainty in our particle’s momentum or position tells us the amount of time we can
wait before the wave starts to significantly distort.
Fact 59
There’s yet another way to rewrite these boxed conditions in a more intuitive way, which is
(∆p/m) |t| ≪ ℏ/∆p = ∆x.
This is understandable because it is a statement about lengths: the group velocity is not the same for all of the frequencies, because particles not having a definite momentum means there is some dispersion in the group velocity, corresponding to ∆p/m. And if we multiply that velocity spread by the time |t| that has passed, our shape is only preserved if the resulting distortion is smaller than the inherent width ∆x of the wavepacket.
Problem 60
If we have a wavepacket of size ∆x = 10−10 m (which is on the order of the size of an atom, so it corresponds to
an electron moving around), how long does that packet stay localized?
The answer is that the characteristic timescale here is
t ∼ (m/ℏ)(∆x)² = (mc²/(ℏc)) · ((∆x)²/c) ≈ 10^{−16} seconds,
and this turns out to be rather important when we think about applications like particle accelerators trying to keep
particle bunches together!
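The arithmetic is quick to reproduce (standard values for ℏ and the electron mass assumed):

hbar = 1.055e-34        # J*s
m_e = 9.109e-31         # kg (electron)
dx = 1e-10              # m, roughly an atomic size

print(m_e * dx**2 / hbar)   # about 8.6e-17 s, i.e. on the order of 1e-16 seconds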
With that discussion finished, in the last part of this lecture, we’ll finally discuss time-evolution of a free particle
wavepacket. If we know the shape of an initial wavefunction Ψ(x, 0), and we wish to calculate Ψ(x, t), we first
calculate the coefficients
Φ(k) = (1/√2π) ∫ Ψ(x, 0) e^{−ikx} dx,
so that we can rewrite our arbitrary function as a superposition
Ψ(x, 0) = (1/√2π) ∫ Φ(k) e^{ikx} dk,
and then we can evolve each of the plane waves in the way that we already know:
Ψ(x, t) = (1/√2π) ∫ Φ(k) e^{i(kx−ω(k)t)} dk,
where we should remember that we have the relation ℏω(k) = ℏ²k²/2m because E = p²/2m. The key tool that we’ve used here is linearity: each plane wave e^{i(kx−ω(k)t)} solves the Schrodinger equation, so the full integral also does, and
we’ve set up the integral in such a way that it has the correct initial condition at t = 0. So this boxed expression is
indeed our answer, and if we’d like the solution more explicitly, we can (sometimes) do the k-integral. But the point is
that with Fourier inversion, we’ve found a way to propagate any wavefunction without needing to solve the differential
equation directly!
Example 61
If we consider an initial condition which is a Gaussian of uncertainty on the order of a,
Ψa (x, 0) = (2π)^{−1/4} (1/√a) exp( −x²/(4a²) ),
we should use the procedure above for time-evolution, finding Φ(k) and evolving plane waves.
This is an exercise for us – we’ll find that Φ is also a Gaussian, the wavefunction spreads out over time, and that the relevant timescale for which this change in width occurs is τ = 2ma²/ℏ.
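Here is a numpy sketch of that exercise (an illustrative addition, in units with ℏ = m = 1, using the FFT to attach the phase e^{−iω(k)t} to each plane wave): after evolving for one timescale τ, the width has grown from a to a√2, matching the standard spreading formula ∆x(t) = a√(1 + (t/τ)²):

import numpy as np

hbar, m, a = 1.0, 1.0, 1.0
tau = 2*m*a**2/hbar                           # spreading timescale

N, L = 4096, 200.0
x = (np.arange(N) - N/2) * (L/N)
psi0 = (2*np.pi)**(-0.25)/np.sqrt(a) * np.exp(-x**2/(4*a**2))

k = 2*np.pi*np.fft.fftfreq(N, d=L/N)
psi_t = np.fft.ifft(np.fft.fft(psi0) * np.exp(-1j*hbar*k**2/(2*m)*tau))

rho = np.abs(psi_t)**2
rho /= np.trapz(rho, x)
width = np.sqrt(np.trapz(x**2*rho, x))        # sqrt(<x^2>); <x> = 0 by symmetry
assert np.isclose(width, a*np.sqrt(2), rtol=1e-3)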
Last lecture, we wrote our wavefunction as a superposition Ψ(x) = (1/√2π) ∫ Φ(k) e^{ikx} dk, giving us a similar inverse transform
Φ(k) = (1/√2π) ∫ Ψ(x) e^{−ikx} dx.
In particular, these two equations tell us that knowing Φ(k) is equivalent to knowing Ψ(x), since we can always get
from one to the other. Thus, the two representations of the wavefunction carry the same amount of information, and
Φ(k) can be thought of as carrying the weight of the various plane waves in our superposition for Ψ(x). But we can
go deeper, and we’ll need a technical tool to help us manipulate integrals and functions in our subsequent discussion.
The idea is to try to substitute one of the two equations above into the other, and that will give us the equation
for a delta function. We have
Ψ(x) = (1/√2π) ∫ [ (1/√2π) ∫ Ψ(x′) e^{−ikx′} dx′ ] e^{ikx} dk,
where we had to be careful when we substituted in the bracketed term for Φ(k) because x is a dummy variable in that
integral and should not be reused. We’ll further simplify, changing the order of integration freely (if we want to be
more mathematically rigorous we need to be careful with that, but we’ll skip the technical details for now), to
Ψ(x) = ∫_{−∞}^{∞} Ψ(x′) [ (1/2π) ∫ e^{ik(x−x′)} dk ] dx′.
The bracketed term is then a function of the quantity (x − x′), and it’s what we might recognize as the delta function δ(x − x′) (because integrating it against Ψ(x′) picks out the integrand at the point where x′ − x = 0, meaning we are left with Ψ(x)). Thus, the integral is one way to represent our delta function, and we note that δ(x − x′) and δ(x′ − x) are equivalent (for example, we can do a change of coordinates k ↦ −k in the bracketed integral and preserve everything except the exponent’s sign).
Definition 62
The Dirac delta function is a “function” given by
δ(x − x′) = (1/2π) ∫_{−∞}^{∞} e^{ik(x−x′)} dk.
This delta function is very useful whenever we’re dealing with Fourier transforms, but it’s a strange integral – notice that when x = x′, the integrand is always 1 and the integral diverges, while for x ≠ x′ it’s tricky to make the integral converge to any value either. So this expression is singular in a way that forces us to be careful with it, and usually that means that we will work with the delta function inside integrals.
Fact 63
If we haven’t worked with delta functions before, it’s useful to verify using the integral representation that δ(ax) = δ(x)/|a| for any nonzero real number a.
We’ll now turn to our discussion about momentum space, starting with the question of how our normalization
condition looks when we switch to Φ(k) from Ψ(x). Recall that we start with the normalization condition
∫ Ψ∗(x)Ψ(x) dx = 1,
and we wish to write something similar for Φ(k); we’ll do that by substituting in our Fourier representation of Ψ(x),
using different variables of integration k, k′ for the two terms (but the same x):
∫ Ψ∗(x)Ψ(x) dx = ∫ [ (1/√2π) ∫ Φ∗(k) e^{−ikx} dk ] [ (1/√2π) ∫ Φ(k′) e^{ik′x} dk′ ] dx.
From here, we know that we can’t do the integrals over k in general, since those are in the most general form possible,
and thus it makes sense to try to do the x-integral. Rewriting the order of the integrals, we have
= ∫ Φ∗(k) ∫ Φ(k′) [ (1/2π) ∫ e^{i(k′−k)x} dx ] dk′ dk,
and now the inner integral is indeed the delta function δ(k ′ −k) as in Definition 62, just with different dummy variables,
so this simplifies to
= ∫ Φ∗(k) ∫ Φ(k′) δ(k′ − k) dk′ dk = ∫ Φ∗(k)Φ(k) dk.
In other words, because we get a probabilistic interpretation for the wavefunction when ∫ |Ψ(x)|² dx = 1, there is also a corresponding probabilistic interpretation for Φ(k): we have the same normalization condition ∫ |Φ(k)|² dk = 1, so it makes sense that we will also have a probability distribution in this “momentum space.”
We’ve been using k so far instead of p = ℏk, so now we’re ready to actually introduce everything in terms of momentum. Since dp = ℏ dk, we can rewrite functions of k (such as Φ) as functions of p (which we’ll denote Φ̃), and our Fourier relations can now be written in language involving x and p:
Ψ(x) = (1/√2π) ∫ Φ̃(p) e^{ipx/ℏ} (dp/ℏ),
Φ̃(p) = (1/√2π) ∫ Ψ(x) e^{−ipx/ℏ} dx.
We lose a bit of symmetry in our Fourier inversion – there’s a factor of ℏ in one equation but not the other, so now we’ll define a new function Φ(p) via Φ̃(p) = Φ(p)√ℏ (where there’s some abuse of notation – this Φ(p) does not have the same functional form as the Φ(k) above). Checking our constants, we then arrive at the two equations
Ψ(x) = (1/√(2πℏ)) ∫ Φ(p) e^{ipx/ℏ} dp,
Φ(p) = (1/√(2πℏ)) ∫ Ψ(x) e^{−ipx/ℏ} dx.
Parseval’s theorem also deserves another look: the left-hand side stays the same, and when we convert from Φ(k) to Φ̃(p) we gain no constant factors, but then we switch to Φ(p) and pick up a factor of (√ℏ)² = ℏ, canceling out with the dp/ℏ factor when switching integration variables. So the factors actually cancel out: Parseval’s identity now tells us that
∫ |Ψ(x)|² dx = ∫ |Φ(p)|² dp,
and in particular Ψ(x) is normalized if and only if Φ(p) is.
Fact 65
Much like we did for position, we will thus interpret |Φ(p)|² dp as the probability to find a particle’s momentum to
be in the range [p, p + dp] (and by conservation of probability and Parseval’s theorem, this is indeed valid under
time-evolution too).
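A small numpy sketch (an illustrative addition: a Gaussian packet, direct quadrature, units with ℏ = 1) confirms that a normalized Ψ(x) yields a normalized Φ(p):

import numpy as np

hbar = 1.0
x = np.linspace(-30, 30, 2048)
psi = np.exp(-x**2/2 + 3j*x/hbar)                       # packet centered near p = 3
psi /= np.sqrt(np.trapz(np.abs(psi)**2, x))             # normalize in position space

p = np.linspace(-10, 16, 801)
kernel = np.exp(-1j*p[:, None]*x[None, :]/hbar)
Phi = np.trapz(psi[None, :]*kernel, x, axis=1) / np.sqrt(2*np.pi*hbar)

assert np.isclose(np.trapz(np.abs(Phi)**2, p), 1.0, atol=1e-4)   # Parseval holds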
These equations also have three-dimensional versions as well: since we need to do three integrals instead of one,
it turns out that we get
Ψ(x⃗) = (1/(2πℏ)^{3/2}) ∫ Φ(p⃗) e^{i p⃗·x⃗/ℏ} d³p,
Φ(p⃗) = (1/(2πℏ)^{3/2}) ∫ Ψ(x⃗) e^{−i p⃗·x⃗/ℏ} d³x.
We also similarly have a three-dimensional delta function, also defined with an integral:
δ³(x⃗′ − x⃗) = (1/(2π)³) ∫ e^{i k⃗·(x⃗′−x⃗)} d³k.
Basically, everything is very analogous in three dimensions and we don’t have to memorize anything new!
With that, we’re finally ready to discuss expectation values of operators, which is one of the first steps towards
a full interpretation of quantum mechanics. We’ll start with the basic probabilistic definition:
Definition 66
Let Q be a random variable which can take values in a finite set {Q1 , · · · , Qn } with probabilities p1 , · · · , pn ,
respectively. Then Q has expectation or expected value
⟨Q⟩ = Σ_{i=1}^{n} pi Qi .
This is basically a weighted average which tells us the average value obtained by repeatedly taking many samples of
Q. Expectations can also be computed for continuous random variables, which is what we need in quantum mechanics
for our wavefunction: essentially, because Ψ∗ (x, t)Ψ(x, t)dx tells us the probability that our particle is in the range
[x, x + dx], we can define the expectation value of x̂, the position operator, via
⟨x̂⟩ = ∫ x Ψ∗(x, t)Ψ(x, t) dx
(where x is the value of the position operator, and Ψ∗ (x, t)Ψ(x, t)dx is the corresponding probability). What we’re
really doing here is treating our position operator as a random variable, and experimentally we’re saying the following:
suppose we have a system that is represented by the wavefunction Ψ. Then if we build many identical copies of this
system and measure the positions of the particles (at the same time), then the average value should be close to ⟨x̂⟩.
The key point to always keep in mind is that repeated measurements of the state can result in varying answers!
Similarly, because we have a probability density for the momentum p (specifically, |Φ(p)|2 dp is the probability to
find the particle’s momentum in [p, p + dp]), we should define
⟨p̂⟩ = ∫ p |Φ(p)|² dp.
But we know how Φ(p) is related to Ψ(x) through the Fourier relations, and it’s natural to express the expectation value for the momentum operator in terms of the same density as the expectation value for the position operator. So we’ll substitute:
⟨p̂⟩ = ∫ p Φ∗(p)Φ(p) dp = (1/2πℏ) ∫∫∫ Ψ∗(x′) e^{ipx′/ℏ} Ψ(x) p e^{−ipx/ℏ} dx′ dx dp,
and writing p e^{−ipx/ℏ} = −(ℏ/i) ∂/∂x e^{−ipx/ℏ}, this becomes
= −∫∫ Ψ∗(x′) Ψ(x) (ℏ/i) ∂/∂x [ (1/2πℏ) ∫ e^{ip(x′−x)/ℏ} dp ] dx′ dx.
We’re almost there – the inner integral is now the delta function δ(x − x′) (if we just do a change of variables u = p/ℏ), and then we can do integration by parts on the x-integral to have the ∂/∂x act on the other term instead. Here, we’re using the important theme that while integration by parts has boundary terms, they will vanish as long as our wavefunction goes to 0 sufficiently fast (which we will always assume). Thus, we end up with
= ∫∫ Ψ∗(x′) (ℏ/i) (∂Ψ(x)/∂x) δ(x − x′) dx dx′.
The integral over x′ will evaluate Ψ∗(x′) at x, so putting everything back together gives us
⟨p̂⟩ = ∫ Ψ∗(x) (ℏ/i) (∂Ψ/∂x) dx,
or in the most revealing form possible, we actually have
⟨p̂⟩ = ∫ Ψ∗(x) p̂ Ψ(x) dx,
where p̂ = (ℏ/i) ∂/∂x is our momentum operator. So even though the expectation value is initially defined by a probabilistic interpretation ∫ p |Φ(p, t)|² dp, the end result is that we sandwich p̂ between Ψ∗ and Ψ (and only have it act on Ψ).
Notice that if we replace the x in the integral for ⟨x̂⟩ with an x̂ operator between Ψ∗ and Ψ, that also gives us the
same answer, and this motivates the following general definition:
Definition 67
The expectation value of a quantum mechanical operator Q̂ in a state Ψ is defined as
⟨Q̂⟩Ψ = ∫ Ψ∗(x, t) (Q̂Ψ(x, t)) dx.
(If it is clear which state we are referring to, or if we are speaking generically, we will sometimes omit the Ψ subscript for brevity.) Such expectation values vary in time, and that time-dependence can be studied.
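Before the examples, here is a numerical sanity check of this definition (an illustrative numpy sketch, with ℏ = 1): for a Gaussian packet modulated by e^{ik0 x}, the sandwiched ⟨p̂⟩ should come out to ℏk0:

import numpy as np

hbar, k0 = 1.0, 2.0
x = np.linspace(-25, 25, 4001)
psi = np.exp(-x**2/4) * np.exp(1j*k0*x)
psi /= np.sqrt(np.trapz(np.abs(psi)**2, x))

dpsi = np.gradient(psi, x)                              # numerical dPsi/dx
p_avg = np.trapz(np.conj(psi) * (hbar/1j) * dpsi, x)    # (Psi, p_hat Psi)

assert np.isclose(p_avg.real, hbar*k0, atol=1e-3)       # expectation is hbar*k0
assert abs(p_avg.imag) < 1e-8                           # and it comes out real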
Example 68
Consider the kinetic energy operator T̂ = p̂²/2m – we’ll write down its expectation value in the position-space representation.
If we’re acting on the wavefunction Ψ(x), then we should think of p̂ as (ℏ/i) ∂/∂x, and thus we have
⟨T̂⟩ = ∫ Ψ∗(x, t) ( −(ℏ²/2m) ∂²/∂x² ) Ψ(x, t) dx.
But we can also write down the expectation integral in momentum space, and then p̂ acts as p on a wavefunction
Φ(p). Thus, we have
⟨T̂⟩ = ∫ (p²/2m) |Φ(p)|² dp = ∫ Φ∗(p) (p²/2m) Φ(p) dp,
where the first form of the integral comes from the probabilistic definition of expectation values, while the second is thinking of it as a “sandwiched” operator. It might seem like the momentum integral is nicer (in particular, it’s manifestly positive because the integrand is positive), but we can actually integrate the x-integral by parts to find that we also have
⟨T̂⟩ = (ℏ²/2m) ∫ |∂Ψ/∂x|² dx,
which is also clearly positive. So both integrals are valid to work with!
We can now return to the time-dependence of expectation values, which is a pretty fundamental result in quantum
mechanics. We aim to calculate
(d/dt) ⟨Q̂⟩ = (d/dt) ∫ Ψ∗(x, t) (Q̂Ψ(x, t)) dx,
and by the product rule this simplifies to
= ∫ ( (∂Ψ∗/∂t) Q̂Ψ + Ψ∗ Q̂ (∂Ψ/∂t) ) dx.
But now applying the Schrodinger equation (because that’s how we deal with time-derivatives) in the form iℏ ∂Ψ/∂t = ĤΨ, we have
= ∫ ( (i/ℏ) (ĤΨ)∗ Q̂Ψ − (i/ℏ) Ψ∗ Q̂(ĤΨ) ) dx.
Multiplying by iℏ to cancel out the constants on the right, we now have
iℏ (d/dt) ⟨Q̂⟩ = ∫ ( Ψ∗ Q̂ĤΨ − (ĤΨ)∗ Q̂Ψ ) dx,
and by Hermiticity, the Ĥ in the second term can be brought over to act on the other factor Q̂Ψ instead:
= ∫ ( Ψ∗ Q̂ĤΨ − Ψ∗ ĤQ̂Ψ ) dx = ∫ Ψ∗ (Q̂Ĥ − ĤQ̂)Ψ dx.
But now we see a commutator show up – in fact, this is one of the reasons commutators are so important in quantum
physics – and we get the final equation
iℏ (d/dt) ⟨Q̂⟩ = ∫ Ψ∗ [Q̂, Ĥ] Ψ dx = ⟨[Q̂, Ĥ]⟩,
where in the last step we’ve noticed that the commutator sandwiches Ψ∗ and Ψ and thus gives us the usual formula
for an expectation value. One way to phrase what’s happening is that we’ve actually encoded all of the dynamics in the
observables – instead of having wavefunctions changing in time and therefore having expectations change as a result,
we have the expectations evolving in terms of their commutators with the Hamiltonian. In particular, the expectation of any operator that commutes with the Hamiltonian has time-derivative zero, and thus commutation relations give rise to conservation laws. For example, since p̂ commutes with Ĥ = p̂²/2m for the free particle, a free particle’s expected momentum must be conserved. We’ll see much more of this concept in the future!
Last time, we showed that the Hamiltonian Ĥ is a Hermitian operator, satisfying
∫ (ĤΨ1)∗ Ψ2 dx = ∫ Ψ∗1 (ĤΨ2) dx
for any two sufficiently well-behaved wavefunctions Ψ1, Ψ2. We’ll sometimes use a more concise notation for these kinds of expressions:
Definition 69
Define the inner product (. , .), which takes in two wavefunctions Ψ1 , Ψ2 and outputs
(Ψ1 , Ψ2 ) = ∫ Ψ∗1(x) Ψ2(x) dx.
This inner product satisfies some important properties: for example, we can directly check that
(Ψ1 , Ψ2 )∗ = (Ψ2 , Ψ1 ).
Notice that we can rewrite the definition of Hermiticity in a nicer form now: we are asking for
(Ψ1 , Q̂Ψ2 ) = (Q̂Ψ1 , Ψ2 ).
Proposition 70
For any Hermitian operator Q̂, the expectation value ⟨Q̂⟩Ψ is real.
Proof. We take the complex conjugate of the expectation value:
⟨Q̂⟩∗Ψ = ( ∫ Ψ∗ (Q̂Ψ) dx )∗
(if we want to take the complex conjugate of an integral, we can just conjugate the integrand). The integrand is the product of Ψ∗ and Q̂Ψ, which are each complex-valued (an important note here: we are not thinking of just conjugating Q̂ on its own, because it is acting on a wavefunction), and thus we can rewrite this as
= ∫ Ψ (Q̂Ψ)∗ dx = ∫ (Q̂Ψ)∗ Ψ dx,
where we’ve used the fact that (Ψ∗ )∗ = Ψ. And now we use the Hermiticity of Q̂ to move the Q̂ from one term to
the other:
= ∫ Ψ∗ (Q̂Ψ) dx = ⟨Q̂⟩Ψ .
Thus, the expectation value of Q̂ is its own conjugate, meaning that it is real.
Proposition 71
The eigenvalues of a Hermitian operator Q̂ are always real.
(Recall that if we have a wavefunction Ψ1 that satisfies Q̂Ψ1 = q1 Ψ1 , then q1 is an eigenvalue of Q̂ and Ψ1 is
the associated eigenvector or eigenfunction.)
Proof. Let q1 be an eigenvalue of Q̂ with associated eigenvector Ψ1 . There are many ways to show this result, but
we’ll apply Proposition 70 to Q̂ in the state Ψ1 . Then we have the real number
⟨Q̂⟩Ψ1 = ∫ Ψ∗1 Q̂Ψ1 dx = ∫ Ψ∗1 q1 Ψ1 dx
(by the eigenvalue condition), and then taking q1 out of the integral gives us
= q1 ∫ Ψ∗1 Ψ1 dx.
The integrand of this integral is always a positive real number, so the integral is real. Because our left-hand side ⟨Q̂⟩Ψ1
was also real, our eigenvalue q1 is indeed real.
We may notice that Ψ1 can be arbitrarily scaled and it will still be a valid eigenvector; thus, we can always scale
so that “being in the state Ψ1 ” means that we have a normalized state. (In fact, we must do this normalization
whenever calculating expectation values of our states!) Then the calculation above simplifies to ⟨Q̂⟩Ψ1 = q1 , so the
eigenvalue of an operator in a state Ψ1 is the expectation value of Q̂ in that state, and it is always real (as
desired).
The key point now is that (informally) Hermitian operators are rich: they have as many eigenvectors as we
need to span the whole space of states. In other words, if we take any Hermitian operator and write down its set
of eigenstates, then any wavefunction is a superposition of those eigenstates! (In linear algebra terms, we can think about a finite-dimensional Hermitian matrix – what we’re saying is that its eigenvectors form a basis of the vector space on which the matrix acts.) This is called the spectral theorem, and we’ll talk about it more in
8.05 when we have more discussion of linear algebra, but we’ll still write out our claim more explicitly.
Proposition 72
For any Hermitian operator Q̂, consider its eigenvalues and eigenfunctions given by
Q̂Ψ1 = q1 Ψ1 , Q̂Ψ2 = q2 Ψ2 , · · ·
(the set of which can be finite or infinite). Then the eigenfunctions can be organized to satisfy the orthonormality
relation
∫ Ψ∗i (x) Ψj (x) dx = δij .
In particular, setting i = j tells us that each Ψi is a normalized state, so they are “good eigenvectors” in the
expectation value sense. But in addition, for any i ̸= j, we have (Ψi , Ψj ) = 0, and we can think of the usual three
basis vectors of R³ (the x-, y-, and z-direction unit vectors, all of which have unit length and are orthogonal) as an
illustrative example of what’s going on more generally! So the inner product we’ve defined should be thought of as a
dot product of wavefunctions.
Proof of a simple case. We’ll consider the case where the eigenvalues of different eigenvectors are always different (meaning that qi ≠ qj if i ≠ j). In this case, we can always first normalize our eigenfunctions by rescaling. Then we have
∫ Ψ∗i Q̂Ψj dx = ∫ Ψ∗i qj Ψj dx = qj ∫ Ψ∗i Ψj dx,
but by Hermiticity we can also compute
∫ Ψ∗i Q̂Ψj dx = ∫ (Q̂Ψi)∗ Ψj dx = qi∗ ∫ Ψ∗i Ψj dx = qi ∫ Ψ∗i Ψj dx,
where we know that qi∗ = qi by Proposition 71. Equating the two boxed expressions, because qi ≠ qj, we must have ∫ Ψ∗i Ψj dx = 0, as desired.
However, the more general case occurs when we have degeneracy, and this is a very important concept that
often comes up in quantum mechanics. Degeneracy occurs when several different eigenfunctions have the same
eigenvalue (we’ll in fact see a case where this happens later in the lecture), and in such a case it is important that
Proposition 72 includes the clause “can be organized” – we need to pick appropriate eigenfunctions in the eigenspaces,
forming linear combinations, so that we do indeed still have orthonormality, and we still do have enough eigenstates
to span the whole space. Let’s state that last claim more explicitly:
Proposition 73
The eigenfunctions of any Hermitian operator Q̂ form a set of basis functions, meaning that any (reasonable)
wavefunction Ψ can be written as a superposition
Ψ = α1 Ψ1 + α2 Ψ2 + · · · = Σi αi Ψi .
While those coefficients αi may seem mysterious, we can in fact use Proposition 72 to calculate them easily: we
have
(Ψi , Ψ) = ∫ Ψ∗i ( Σj αj Ψj ) dx
(we should make sure not to reuse indices here, which is why we sum over j), and now we can switch the sum and integral to get
= Σj αj ∫ Ψ∗i Ψj dx = Σj αj δij = αi .
Thus, we compute the coefficient αi (and thus how much of the wavefunction Ψ is “along” the state Ψi ) by integrating
Ψ∗i Ψ. Additionally, we can compute through expansion that
1 = ∫ |Ψ|² dx = ∫ Ψ∗Ψ dx = ∫ ( Σi αi Ψi )∗ ( Σj αj Ψj ) dx = Σ_{i,j} α∗i αj δij = Σi |αi|².
Phrasing this in words, if we have a state which is a superposition of orthonormal basis vectors, then the sums
of squares of coefficients gives us the normalization condition – there’s no mixing between Ψi and Ψj . And this
expansion can be done for any state and any Hermitian operator!
With that, we’ve done all of the work necessary for stating the measurement postulate which we introduced in our beginning lectures:
Fact 74 (the measurement postulate)
Suppose we measure the observable corresponding to a Hermitian operator Q̂ in a (normalized) state Ψ, expanded along the orthonormal eigenstates of Q̂ as Ψ = Σi αi Ψi . Then the possible outcomes of the measurement are the eigenvalues qi , and the probability of measuring qi is pi = |αi|².
If the outcome of the measurement is qi , then the state of the system becomes Ψi (this is called the collapse of the wavefunction).
For example, if we measure the kinetic energy of a particle, then we’ll get some eigenvalue of the kinetic operator,
and after we measure we’ll be in a state of definite kinetic energy (so that subsequent measurements also give us that
same kinetic energy). The key is that this result essentially came from Hermitian operators being rich enough for their
eigenvectors to span our space of wavefunctions – if we want to measure some quantity X, we should write our state
as a superposition of states with definite X, use the spectral theorem, and then compute the coefficient αi that is
relevant.
This measurement postulate may seem strange to us, because it essentially divides quantum mechanics into two
realms: that of the Schrodinger equation (the natural time-evolution) and that of measurement (the collapse). People
have wondered why measurement doesn’t come out of the Schrodinger equation if it’s supposed to govern quan-
tum mechanics, but nothing is sufficiently clear for us to question this framing for now, and thus we’ll take this
measurement postulate as an additional assumption of quantum mechanics.
Example 75
Suppose that we have a state Ψ = Σi αi Ψi . We’ll compute the expectation of Q̂ on Ψ in terms of the eigenvectors as a consistency check.
Direct computation with the orthonormality relation gives
⟨Q̂⟩Ψ = ∫ ( Σi αi Ψi )∗ Q̂ ( Σj αj Ψj ) dx = Σ_{i,j} α∗i αj qj ∫ Ψ∗i Ψj dx = Σi |αi|² qi .
But we’ve derived this expected result (achieving qi with probability |αi|²) with properties of our eigenvectors, rather than with the definition of an expectation value of a random variable (which is what the end result here is telling us)! Since the measurement postulate tells us that pi = |αi|², everything we’ve said so far is consistent. Let’s do a physical example to see this in action:
Example 76
Consider a free particle on a (topological) circle x ∈ [0, L], where we identify the points at x = 0 and L, and
consider the wavefunction at some fixed time
Ψ(x) = √(2/L) ( √(1/3) sin(2πx/L) + √(2/3) cos(6πx/L) ).
We’ll calculate the different values and probabilities that we can obtain when we measure the momentum of
the particle.
Remark 77. We can verify that Ψ(L) = Ψ(0), so this is a valid wavefunction on the circle. Notice that Ψ(x) is purely
real at this point in time, and this is in fact allowed, but our discussion that wavefunctions cannot be purely real (from
a previous lecture) tells us that it will not be real forever.
Since we’re trying to measure the momentum of our particle, we have to look at the momentum eigenstates, which are plane waves of the form e^{ikx}. On the real line, we could never normalize these states (because the norm squared of e^{ikx} is 1), but on a circle we’re actually okay, because there’s only a finite length we have to integrate along. Specifically, momentum eigenstates take the form
Ψm ∝ e^{2πimx/L}
for some integer m, because these are plane waves that are periodic with period L (meaning that Ψ(x + L) = Ψ(x)), so they have definite momentum and we just have to normalize them. For that, notice that the squared norm of e^{2πimx/L} is always 1, so to make the integral 1 over the length L we want
Ψm = (1/√L) e^{2πimx/L},
for any integer m, to be our normalized momentum eigenstates with corresponding momentum
p̂Ψm = (ℏ/i) ∂Ψm/∂x = (2πℏm/L) Ψm =⇒ pm = 2πℏm/L.
Thus, we’ve found the momentum eigenfunctions, and all that’s left to do is to rewrite our Ψ as a linear combination
of the Ψm s. The momenta of the different eigenfunctions are all different, so we have orthonormal eigenfunctions and
we can just use the trigonometric identities
sin x = (e^{ix} − e^{−ix})/(2i), cos x = (e^{ix} + e^{−ix})/2
from here to find that
Ψ = √(2/3) (1/2i) Ψ1 − √(2/3) (1/2i) Ψ−1 + (1/√3) Ψ3 + (1/√3) Ψ−3 .
We can now read off the probabilities and measured momenta: since Ψm has momentum 2πℏm/L, our state Ψ will have momentum 2πℏ/L with probability |√(2/3) (1/2i)|² = 1/6, and similarly we’ll get momentum −2πℏ/L with probability 1/6, momentum 6πℏ/L with probability 1/3, and momentum −6πℏ/L with probability 1/3, when measuring in the state Ψ. These probabilities indeed add to 1, and thus we’ve now seen an example of the decomposition along eigenstates that we’ve talked about abstractly above.
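We can confirm the coefficients and probabilities numerically; here is a short numpy sketch (an illustrative addition, taking L = 1):

import numpy as np

L = 1.0
x = np.linspace(0.0, L, 20001)
Psi = np.sqrt(2/L) * (np.sqrt(1/3)*np.sin(2*np.pi*x/L)
                      + np.sqrt(2/3)*np.cos(6*np.pi*x/L))

for m in (1, -1, 3, -3):
    Psi_m = np.exp(2j*np.pi*m*x/L) / np.sqrt(L)     # normalized eigenstate
    alpha = np.trapz(np.conj(Psi_m)*Psi, x)         # coefficient (Psi_m, Psi)
    print(m, abs(alpha)**2)                         # 1/6, 1/6, 1/3, 1/3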
Finally, in the same probabilistic framework, we can now properly define what uncertainty really means in a
mathematical context.
Definition 78
Let Q be a random variable which takes on values Q1, · · · , Qn with probabilities p1, · · · , pn, and suppose its expectation value is Q̄ = ⟨Q⟩ = Σi pi Qi (we use Q̄ and ⟨Q⟩ interchangeably). Then the variance of Q is
(∆Q)² = Σi pi (Qi − Q̄)²,
and the uncertainty or standard deviation of Q is ∆Q = √((∆Q)²).
In other words, the variance calculates the expected (weighted) squared deviation from the average – we can indeed see that the right-hand side is always nonnegative because it’s a sum of nonnegative terms, so ∆Q will be real. Furthermore, if ∆Q = 0, then each term on the right-hand side must be zero, meaning that Qi = Q̄ for any potential value Q takes on (and thus Q is with probability one just its average value Q̄).
We can simplify the variance with some algebraic manipulation: by expanding, we find that
(∆Q)² = Σi pi (Qi − Q̄)² = Σi pi Qi² − 2 Σi pi Qi Q̄ + Σi pi Q̄².
Now we can separate out the different terms, noting that Q̄ is a constant, and we find that this simplifies to
= ⟨Q²⟩ − 2Q̄ Σi pi Qi + Q̄² Σi pi = ⟨Q²⟩ − 2Q̄·Q̄ + Q̄² = ⟨Q²⟩ − Q̄².
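Both forms of the variance are easy to compare numerically; as a minimal illustrative sketch, take a fair six-sided die:

import numpy as np

Q = np.arange(1, 7)                     # faces of a fair die
p = np.full(6, 1/6)                     # probabilities

Qbar = np.sum(p*Q)                      # expectation, 3.5
var_def = np.sum(p*(Q - Qbar)**2)       # expected squared deviation
var_alt = np.sum(p*Q**2) - Qbar**2      # <Q^2> - <Q>^2
assert np.isclose(var_def, var_alt)     # both equal 35/12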
In particular, because the variance of a random variable is always nonnegative, we find that ⟨Q²⟩ ≥ ⟨Q⟩² for any real-valued random variable Q. And now this allows us to turn back to the quantum mechanical interpretation:
Definition 79
Let Q̂ be a Hermitian operator. We define the uncertainty of Q̂ in the state Ψ, denoted (∆Q̂)²Ψ, to be
(∆Q̂)²Ψ = ⟨Q̂²⟩Ψ − ⟨Q̂⟩²Ψ .
The point here is that Q̂ and Q̂² are Hermitian operators, so we can calculate the right-hand side because we always
know how to compute expectation values. And from here, we claim the following (which we’ll prove next lecture):
Lemma 80
We may write the uncertainty of Q̂ in the state Ψ as a single expectation
(∆Q̂)²Ψ = ⟨(Q̂ − ⟨Q̂⟩)²⟩Ψ .
(In the above, ⟨Q̂⟩ is the multiplication-by-a-constant operator.) In addition, we’ll show that analogous to the
zero-variance observation in the probabilistic context, a state Ψ has zero uncertainty for Q̂ if and only if it is an
eigenstate for Q̂.
Proof of Lemma 80. For the first claim, we do a direct computation (remembering that ⟨Q̂⟩ is thought of as the operator which multiplies any state by the constant ⟨Q̂⟩). We expand the square, noticing that ⟨Q̂⟩ and Q̂ commute (because the former is a number):
⟨(Q̂ − ⟨Q̂⟩)²⟩ = ⟨Q̂² − 2⟨Q̂⟩Q̂ + ⟨Q̂⟩²⟩.
(Notice that (A + B)² = A² + AB + BA + B² ≠ A² + 2AB + B² if A and B don’t commute.) Now the expectation of a sum is the sum of the individual expectations, so we can further simplify to
= ⟨Q̂²⟩ − 2⟨Q̂⟩⟨Q̂⟩ + ⟨Q̂⟩²
(where we’ve used the fact that we can take constants out of expectations: ⟨cQ̂⟩ = c⟨Q̂⟩). Combining the last two terms indeed gives us the definition of the uncertainty of Q̂, as desired.
For the second claim, we start with the expression ⟨(Q̂ − ⟨Q̂⟩)²⟩ and plug in the definition of an expectation value: we find that
(∆Q̂)² = ∫ Ψ∗(x) (Q̂ − ⟨Q̂⟩)² Ψ(x) dx = ∫ Ψ∗(x) (Q̂ − ⟨Q̂⟩)(Q̂ − ⟨Q̂⟩) Ψ(x) dx.
We now think of the first (Q̂ − ⟨Q̂⟩) as an operator acting on the rest of the integrand – because Q̂ is Hermitian and so is ⟨Q̂⟩ (multiplying by a real constant is equivalent no matter which part of the integrand it’s done on), the whole term (Q̂ − ⟨Q̂⟩) is Hermitian, and thus by the definition of Hermiticity we have
= ∫ ( (Q̂ − ⟨Q̂⟩)Ψ(x) )∗ (Q̂ − ⟨Q̂⟩)Ψ(x) dx = ∫ | (Q̂ − ⟨Q̂⟩)Ψ(x) |² dx,
as desired.
Notice that our second claim in fact writes the uncertainty (∆Q̂)² as an integral of a nonnegative quantity, meaning that ∆Q̂ is always real.
Corollary 81
A state Ψ is an eigenstate of a Hermitian operator Q̂ (meaning that Q̂Ψ = λΨ) if and only if (∆Q̂)² = 0. In addition, we have λ = ⟨Q̂⟩ (so that we can write Q̂Ψ = ⟨Q̂⟩Ψ Ψ).
Proof. For the second claim, if Q̂Ψ = λΨ for a normalized state Ψ, then ⟨Q̂⟩Ψ = ∫ Ψ∗ Q̂Ψ dx = λ ∫ Ψ∗Ψ dx = λ, so the eigenvalue is indeed the expectation of Q̂. So now for the first claim, Ψ is an eigenstate of Q̂ if and only if (Q̂ − ⟨Q̂⟩)Ψ = 0, if and only if (by part two of Lemma 80, since the integral is only zero if the nonnegative integrand is always zero) (∆Q̂)² = 0, if and only if ∆Q̂ = 0, as desired.
We now have several different expressions for the uncertainty of an operator Q̂, and they are each useful in different
situations. For example, if we have a Gaussian wavefunction that is centered at the origin, then ⟨x̂⟩ = 0, so the easiest
way to calculate the uncertainty in position is
(∆x)² = ⟨x²⟩ − ⟨x⟩² = ⟨x²⟩,
and that’s just a Gaussian integral for us to do. And now that we’ve defined uncertainty, we can actually state the uncertainty principle precisely: we’re saying that ∆x ∆p ≥ ℏ/2 under the mathematically rigorous definition.
We’ll now move on to our next topic, stationary states, which will keep us busy for a few weeks as we develop
intuition for solving the Schrodinger equation.
Definition 82
A stationary state is a solution to Schrodinger’s equation with factorized space and time dependence
Ψ(x, t) = g(t)ψ(x).
Note that these solutions aren’t static – they do have time-dependence, but that time-dependence is very simple.
From here on, we’ll be careful to use the capital letter Ψ for full wavefunctions with time-dependence and lowercase
ψ for the time-independent wavefunctions.
The reason these states are called “stationary states” is that time-independent observables will not have time-
dependence in expectation value if we have a state Ψ of this form. We’ll see this in a few minutes, but first let’s use
a “separation of variables” technique for solving differential equations to understand Ψ(x, t) more generally. Plugging
in this stationary state into Schrodinger’s equation, we start with
$$i\hbar\frac{\partial\Psi(x,t)}{\partial t} = \hat H\Psi(x,t) = \left(-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right)\Psi(x,t).$$
Fact 83
We’re considering here systems where the potential V (x) does not depend on time – otherwise, if the “landscape”
of our problem is changing, we can’t easily get stationary states.
Substituting Ψ(x, t) = g(t)ψ(x), we find that because the left-hand side only acts on the time-coordinate and the
Hamiltonian on the right-hand side only acts on the space-coordinate,
$$i\hbar\,\psi(x)\frac{dg(t)}{dt} = g(t)\,\hat H\psi(x),$$
where $\hat H\psi(x)$ is again just a function of x. Rearranging by dividing the whole equation by Ψ, we find that
$$i\hbar\,\frac{1}{g(t)}\frac{dg(t)}{dt} = \frac{1}{\psi(x)}\hat H\psi(x).$$
Since the left-hand side is now only a function of time, and the right-hand side is only a function of space, the only
way for these to be equal in general is that both sides are constant, and we’ll call that constant E. This E must
have units of energy (because those are the units of $\hat H$, and $\hbar$ divided by time also has units of energy), and it must be real – we'll see later that if we
try to choose E to be complex, we’ll run into problems.
But for now we’ll go ahead and solve the two (decoupled) differential equations: we have
$$i\hbar\frac{dg(t)}{dt} = E\,g(t) \implies g(t) = Ce^{-iEt/\hbar},$$
and thus the time-dependence of a stationary state is just the phase $e^{-iEt/\hbar}$. On the other hand, the other equation
becomes
$$\hat H\psi(x) = E\psi(x) \implies \left(-\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + V(x)\right)\psi(x) = E\psi(x)$$
(we can use a normal derivative now that we have just functions of a single variable). So the rest of the problem is
still complicated – we still need to solve the second-order differential equation for ψ(x), and this is called the time-
independent Schrodinger equation. What’s interesting is that when we solve this differential equation, we often
find that we are constrained to particular values of E (if we think about the analogy with matrices, the equation
Ĥψ(x) = Eψ(x) is really an eigenfunction equation, and matrices often only have a discrete set of eigenvalues).
Putting things together, we find that once we solve the time-independent Schrodinger equation, we arrive at the
full wavefunction
$$\Psi(x,t) = C\,\psi(x)\,e^{-iEt/\hbar}.$$
If we try to normalize this state, we can just normalize using ψ (meaning that we absorb C into the time-independent
solution). Thus, we are asking for
$$1 = \int \Psi^*(x,t)\Psi(x,t)\,dx = \int \psi^*(x)e^{iEt/\hbar}\,\psi(x)e^{-iEt/\hbar}\,dx.$$
But the phase terms e iEt/ℏ and e −iEt/ℏ cancel out here, so in fact the condition for normalization is just
$$\int \psi^*(x)\psi(x)\,dx = 1,$$
and we can just check normalization for ψ. In particular, the time-dependence doesn’t end up making a contribution
for normalization, and this is the key reason why we want E to be real – if the energy weren't real, the complex
conjugate of $e^{iEt/\hbar}$ would not just be $e^{-iEt/\hbar}$, so we'd have a function of t that would also need to be equal to 1 on
the left-hand side. Thus, stationary states must have real E if we want to consider them in actual physical contexts
(where we do need normalized states to say meaningful things).
We’ll now check out properties of the Hamiltonian operator Ĥ, and we’ll indeed see that this is all connected to
energies and energy eigenstates. If Ψ(x, t) is a stationary state, then
$$\langle\hat H\rangle_{\Psi(x,t)} = \int \Psi^*(x,t)\,\hat H\,\Psi(x,t)\,dx = \int \psi^*(x)e^{iEt/\hbar}\,\hat H\,e^{-iEt/\hbar}\psi(x)\,dx,$$
and again the time-dependence cancels out because $e^{-iEt/\hbar}$ is a constant from the point of view of $\hat H$. Thus, this
expectation is just
$$= \int \psi^*(x)\,\hat H\,\psi(x)\,dx = \langle\hat H\rangle_{\psi(x)};$$
that is, the expectation value of the Hamiltonian on the full stationary state is the expectation value of the
Hamiltonian on just the spatial part. And furthermore, because we have the eigenvalue equation Ĥψ(x) = Eψ(x),
we can further evaluate this to be
$$= \int \psi^*(x)\,E\,\psi(x)\,dx = E.$$
So the expectation value of the Hamiltonian is in fact equal to the energy E, and the ψ(x)s are the energy eigenstates
of $\hat H$. (And there is no uncertainty in the measured energy either, because $\langle\hat H^2\rangle = \langle\hat H\rangle^2 = E^2$.)
More generally, we can now verify the claim from above:
Proposition 84
The expectation value of any time-independent operator Q̂ in a stationary state Ψ(x, t) is time-independent.
Proof. Plugging in the stationary state, we have
$$\langle\hat Q\rangle_{\Psi(x,t)} = \int \Psi^*(x,t)\,\hat Q\,\Psi(x,t)\,dx = \int \psi^*(x)e^{iEt/\hbar}\,\hat Q\,e^{-iEt/\hbar}\psi(x)\,dx,$$
and now because of commutativity (the phase is just a number from the point of view of the time-independent $\hat Q$), we can move the $e^{-iEt/\hbar}$ term past the $\psi(x)$ and $\hat Q$ terms to cancel with the
$e^{iEt/\hbar}$ and remove the time-dependence:
$$= \int \psi^*(x)\,\hat Q\,\psi(x)\,dx = \langle\hat Q\rangle_{\psi(x)},$$
which has no time-dependence in it.
Fact 85
Note that in general, the superposition of two stationary states is not a stationary state, because we can't factor
$\Psi_1(x,t) + \Psi_2(x,t)$ if the time components $e^{-iE_1t/\hbar}$ and $e^{-iE_2t/\hbar}$ look different (that is, if the energies $E_1 \neq E_2$).
(The set of allowed energies E is called the spectrum associated to the particular Hamiltonian $\hat H$.) And remember that finding the energy eigenstates $\psi_1, \psi_2, \cdots$ and the
associated energy eigenvalues is basically the gold standard for what we want to do, because from there we can write
any arbitrary state as a superposition of the ψi s. The spectrum may be discrete, or there may be a set of continuous
allowed energies, but for any problem (for instance, any potential V (x) that we might be given), our goal is always to
find out what that spectrum is.
We’ll slightly rewrite the second-order differential equation that we’re trying to solve: we are looking for functions
ψ(x) that satisfy
$$\frac{d^2\psi}{dx^2} = \frac{2m}{\hbar^2}\big(V(x) - E\big)\psi(x).$$
The potential V (x) can take on many forms – it might be smooth, or it might have discontinuities and kinks, or it
may have delta functions or infinite jumps. We’ll accept all of these different cases in our considerations, because
potentials can be as strange as we can imagine depending on the problem that we’re trying to solve. (But there are
worse cases, such as potentials discontinuous at every point or derivatives of delta functions, and we won’t think about
those cases.) The key is that for each of the different possible kinds of behavior for V (x), we want to understand the
boundary conditions that are being imposed on our solution ψ(x). What we really care about is properties of ψ ′
and ψ, because those are what allow us to stitch together solutions at interfaces and discontinuities of V .
Proposition 86
For a potential V (x) as described above, we have the following:
• The wavefunction ψ(x) must be continuous at all x.
• The derivative ψ′(x) must be continuous at every x where V (x) does not have a delta function.
Proof. For the first point, if ψ(x) had a discontinuity at some point, then ψ ′ (x) would contain a delta function, and
thus ψ ′′ (x) would contain the derivative of the delta function. But the right-hand side (involving V (x)) can only have
a delta function at worst, so this is not allowed.
For the second point, we're equivalently saying that ψ′ is discontinuous only if V (x) has a delta function. Notice
that ψ′(x) is discontinuous only if ψ′′(x) has a delta function in it; since ψ itself is continuous (by our above argument),
the only way for $\frac{2m}{\hbar^2}(V(x) - E)\psi(x)$ to have a delta function in it is if V (x) does.
With this, we’re finally ready to move on to solve the Schrodinger equation in some particular cases. Our first
example is one that we’ve already briefly touched on:
In other words, x = 0, L, 2L, 3L, · · · all correspond to the same point on the circle, and it’s equivalent to think of
a circle as “infinitely many copies” of whatever happens between 0 and L also occurring between L and 2L, 2L and 3L,
and so on. Thus, we have the boundary condition
ψ(x + L) = ψ(x).
Because we have zero potential, our Hamiltonian is $\hat H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2}$, and the time-independent Schrodinger equation we
must solve is
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} = E\psi \iff \boxed{\frac{d^2\psi}{dx^2} = -\frac{2mE}{\hbar^2}\psi}.$$
We claim that all solutions to this equation must have E ≥ 0. To see this, notice that if we “multiply both sides by
$\psi^*(x)$ and integrate from 0 to L,” we have
$$-\frac{\hbar^2}{2m}\int_0^L \psi^*(x)\frac{d}{dx}\frac{d}{dx}\psi(x)\,dx = E\int_0^L \psi^*(x)\psi(x)\,dx.$$
Assuming that ψ is a well-normalized solution, the right-hand side simplifies to E, and now typically we will apply
integration by parts to the left-hand side. But we’ll do it slowly here: we can verify by the product rule that we can
rewrite as
$$-\frac{\hbar^2}{2m}\int_0^L \left[\frac{d}{dx}\left(\psi^*\frac{d\psi}{dx}\right) - \frac{d\psi^*}{dx}\frac{d\psi}{dx}\right]dx = E.$$
The first term in the bracket is now a total derivative, so integrating that term from 0 to L (and using that $\frac{d\psi^*}{dx}$ is the
conjugate of $\frac{d\psi}{dx}$) gives us
$$-\frac{\hbar^2}{2m}\left[\psi^*\frac{d\psi}{dx}\right]_0^L + \frac{\hbar^2}{2m}\int_0^L \left|\frac{d\psi}{dx}\right|^2 dx = E.$$
But because we’re on a circle here, the points L and 0 are the same point, and thus the boundary term vanishes!
(Importantly, we cannot make arguments about ψ going to 0 at the boundary here – we are using the geometry of the
problem.) Thus, we find that E is the integral of a nonnegative quantity, and thus the energy is always nonnegative,
as claimed.
We’ll finish solving this differential equation next time, but for now we’ll just write down a few solutions: looking
back at the boxed equation above, we'll make the definition
$$-\frac{2mE}{\hbar^2} = -k^2,\qquad k\in\mathbb{R}.$$
We know that the left-hand side is indeed nonpositive, so this is a valid definition of the constant k, and furthermore we can
rearrange to find that $E = \frac{\hbar^2k^2}{2m}$, which is $\frac{p^2}{2m}$ if $p = \hbar k$. So our notation is actually very good here – the constant k
that we've defined is actually such that $\hbar k$ is the momentum of the particle! So all that's left to solve is
$$\frac{d^2\psi}{dx^2} = -k^2\psi,$$
and this is the usual wave equation, solved by sines and cosines or by exponentials. For convenience, we’ll use solutions
of the form ψ = e ikx , and next time we’ll see which values of k work and how to normalize these wavefunctions!
Our next step is to apply the periodicity condition: since ψ(x + L) = ψ(x), we must have $e^{ik(x+L)} = e^{ikx}$, so $e^{ikL} = 1$,
which means $kL = 2\pi n$ for some integer n. We'll index the allowed values of k as $k_n = \frac{2\pi n}{L}$, and from this we find that the nth momenta and
energy levels are
$$p_n = \hbar k_n = \frac{2\pi\hbar n}{L},\qquad E_n = \frac{\hbar^2k_n^2}{2m} = \frac{\hbar^2}{2m}\cdot\frac{4\pi^2n^2}{L^2} = \frac{2\pi^2\hbar^2n^2}{mL^2}.$$
To find our energy eigenstates, we now just need to find the normalization constants for $\psi_n(x) \sim e^{ik_nx}$. And here's
where it's nice that our particle is on a circle rather than on a line: exponentials like this have $|e^{ik_nx}|^2 = 1$, so we would
not be able to normalize the integral over all real numbers, but we can normalize the integral over the range [0, L].
Explicitly, the way we do this is to write $\psi_n(x) = Ne^{ik_nx}$, and then require
$$1 = \int_0^L |\psi_n(x)|^2\,dx = N^2\int_0^L dx = LN^2 \implies N = \frac{1}{\sqrt L}.$$
(Note that we can choose N to be positive real – an overall phase doesn’t change the state.) This gives us our final
answer:
$$\psi_n(x) = \frac{1}{\sqrt L}\,e^{ik_nx} = \frac{1}{\sqrt L}\,e^{2\pi inx/L}.$$
Thus, we’ve found the energy eigenstates for the time-independent Schrodinger equation, and to obtain our stationary
states, we must add in the time phase: for each n, we have
$$\Psi_n(x,t) = \frac{1}{\sqrt L}\,e^{ik_nx}\,e^{-iE_nt/\hbar}.$$
One important point to address here is that as we've stated so far, n can be any integer (positive, negative, or zero).
Indeed, the momenta of these states, $p_n = \frac{2\pi\hbar n}{L}$, are different for all values of n, so they must be different states (and
we don't have any additional degeneracy where different values of n are actually the same state). We might also be
suspicious about n = 0, but that just corresponds to the wavefunction $\psi_0 = \frac{1}{\sqrt L}$, which has no x-dependence and thus
no energy or momentum – this is all consistent with the Schrodinger equation we are trying to solve.
However, notice that states ψn and ψ−n have the same energy En = E−n , so they are in fact degenerate energy
eigenstates. The important concept to keep in mind is the following:
Fact 88
If we have two energy eigenstates of the same energy, there must be something physical that distinguishes them,
and we must figure out what that is to get a full picture of the physics of our system.
In our case, we’ve already figured that out – states ψn and ψ−n have different momenta pn and −pn (informally
one represents the “particle moving to the right” and the other represents the “particle moving to the left”). And this
additional physical quantity is also good for us because it tells us that ψn and ψ−n are orthonormal – even though
they are degenerate in energy, they are eigenstates of the Hermitian operator p̂ of different momenta, so their inner
product is indeed zero.
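As a quick numerical sanity check of this orthonormality claim (a sketch in Python; the value of L and the range of n are arbitrary choices, not from the lecture), we can integrate products of these eigenstates over one period:

```python
import numpy as np

# Sketch: check that psi_n(x) = exp(2*pi*i*n*x/L) / sqrt(L) are orthonormal on [0, L].
L = 2.0
x = np.linspace(0.0, L, 20001)

def psi(n):
    return np.exp(2j * np.pi * n * x / L) / np.sqrt(L)

for m in range(-2, 3):
    for n in range(-2, 3):
        inner = np.trapz(np.conj(psi(m)) * psi(n), x)   # approximate inner product
        if abs(inner - (1.0 if m == n else 0.0)) > 1e-8:
            print("unexpected inner product:", m, n, inner)
print("all inner products match delta_{mn} to numerical precision")
```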
Remark 89. If we hadn’t known about the momentum, we would potentially have had to do some additional work with
each ψ−n and ψn , finding linear combinations until we get two orthonormal states of energy En . And if we had tried
to use sines and cosines instead of exponentials when we solved this problem, that’s what we would have encountered.
Putting everything together, we’ve thus found an orthonormal set of states which spans our time-independent
wavefunctions. So if we’re given any periodic ψ(x) (any wavefunction on the circle), we can write
$$\psi(x) = \sum_{n\in\mathbb Z} a_n\,\psi_n(x).$$
Furthermore, notice that ψk + ψ−k ∼ cos(kx) and ψk − ψ−k ∼ sin(kx), so we can always rewrite our wavefunction
ψ(x) as a sum of sines and cosines as well. Either way, this is secretly getting back to Fourier’s theorem and Fourier
series: any periodic function can be written as a sum of sines and cosines, or as a sum of appropriate exponentials.
We’re now ready to move on to our next problem of the day:
To solve this problem, first notice that if the potential is infinite outside the interval [0, a], then the wavefunction
must be zero there as well. Informally, this is because being in an area of infinite potential requires infinite energy – if
we want to be more rigorous with that, we can solve the problem with a finite potential outside [0, a] and take that
potential to ∞, and we will indeed see the finite square well solution later in this lecture. But for now, let’s just assume
Furthermore, we can see that it does not really matter whether the potential V is zero or infinite at the endpoints
– since we require that ψ is continuous by Proposition 86 we must also have ψ(0) = ψ(a) = 0 by continuity. So it
suffices now to solve the free Schrodinger equation in the region [0, a], and this is very similar to what we had before
for the particle on a circle: we have
$$\psi''(x) = -\frac{2mE}{\hbar^2}\psi(x) = -k^2\psi(x),$$
and this time it’s more convenient to use sines and cosines instead of exponentials:
This is because we can now impose boundary conditions – since 0 = ψ(0) = c1 , the first term goes away, and then
since 0 = ψ(a) = c2 sin(ka) and we can’t also have c2 = 0 (that would be a zero wavefunction), we must similarly
have
$$ka = n\pi \implies k_n = \frac{\pi n}{a},$$
but this time we won’t use all integers n – first of all, n = 0 makes the wavefunction vanish, which is not allowed.
(It was allowed for the exponential because n = 0 gave us a nonzero constant there, but having the wavefunction be
identically zero means we have no probability of finding the particle anywhere.) In addition, taking kn or k−n gives us
the same wavefunction up to a sign, because sin(kn x) = − sin(−kn x). Thus, we instead index our particle-in-a-box
states by integers n ≥ 1, and all that's left is to normalize. Writing $\psi_n(x) = N\sin\frac{n\pi x}{a}$, we have
$$1 = \int |\psi_n(x)|^2\,dx = N^2\int_0^a \sin^2\frac{n\pi x}{a}\,dx,$$
and we can either do the integral using double-angle identities or note that the average value of $\sin^2$ over any half-period
is $\frac12$, so that $1 = N^2\cdot\frac a2 \implies N = \sqrt{\frac2a}$. This gives us our final equation for the energy eigenstates:
$$\psi_n(x) = \sqrt{\frac2a}\,\sin\frac{n\pi x}{a},\qquad E_n = \frac{\hbar^2k_n^2}{2m} = \frac{\hbar^2\pi^2n^2}{2ma^2}.$$
This time, every energy state (for n ≥ 1) has different energy, so there are no degeneracies – in fact, the energy levels
become more spaced out as n gets larger.
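Here's a small numerical check of these results (a sketch in Python; the natural units and the value of a are illustrative choices), confirming that each ψn is normalized and that the energies spread out like n²:

```python
import numpy as np

# Sketch: verify normalization of the box eigenstates and the n^2 energy spacing.
a = 1.0
hbar = m = 1.0                       # natural units, just for illustration
x = np.linspace(0.0, a, 10001)

for n in range(1, 5):
    psi_n = np.sqrt(2.0 / a) * np.sin(n * np.pi * x / a)
    norm = np.trapz(psi_n**2, x)     # should be 1 for every n
    E_n = hbar**2 * np.pi**2 * n**2 / (2 * m * a**2)
    print(f"n={n}: integral of |psi|^2 = {norm:.6f}, E_n = {E_n:.4f}")
# The printed energies grow like n^2, so the levels spread out as n increases.
```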
We’ll now extract a few more properties of these infinite square well eigenstates, particularly looking at nodes and
symmetries. Here are some sketches of the first 4 (smallest) energy eigenstates:
Notice that each time we add 1 to n, moving up to the next energy eigenstate, the argument of the sin at x = a
increases by π, meaning that we gain another half-wavelength per energy level. We call ψ1 the ground state, because
it has the lowest energy out of any eigenstate, and we can notice that the ground state has no nodes (that is, no
points in the interior, not including the endpoints or infinity, where the wavefunction is zero). Then the next excited
state has one node, the one after that has two nodes, and so on. This behavior with nodes is actually a general result
for potentials with bound states (that is, normalizable states that decay to zero) – the number of nodes will increase
with the energy of the eigenstate. We won’t do a very rigorous proof of this result in this class, but we will see some
strong evidence of it through many of our examples.
Additionally, we can also pay attention to the symmetry of our energy eigenstates. For simplicity in setting up
notation, we set up our infinite well between x = 0 and x = a, but it would have been potentially more enlightening to
set it up between $x = -\frac a2$ and $x = \frac a2$ – in that case, we would have a potential V (x) which is symmetric around x = 0.
Correspondingly translating the four energy eigenstates above, notice that the first and third energy eigenstates are
symmetric around x = 0, while the second and fourth are antisymmetric! This is also a general fact – if we have
bound states in a symmetric potential V (satisfying V (−x) = V (x)), then all energy eigenstates are either odd or even.
(We’ll prove this in our next lecture.)
Finally, there’s one more important property about bound states in a one-dimensional potential: there are no
degeneracies for bound states as long as our potential spans the whole real line or has hard walls. (On the other
hand, this assumption did not hold for the particle on a circle because of the identification we performed, so that’s why
degeneracy is possible in that case.) Basically, the point of covering the infinite square well is for us to start seeing
these kinds of general behaviors for wavefunctions satisfying the Schrodinger equation, and we’ll go into more detail
about them in the coming lectures.
With that, we’ll turn to our final example of the lecture, where for the first time we won’t be able to write down
the solutions directly and will sometimes need to use numerical methods. This time, we’ll set it up so that we have a
symmetric potential:
Example 91
Consider the finite square well, where V (x) = −V0 for |x| ≤ a and V (x) = 0 for |x| > a (for some V0 > 0).
We will look for bound states, which are normalizable solutions that tend to remain in a particular region (definition
to come next lecture). In particular, thinking of this potential classically, if our state is at energy E < 0 and is localized
at the origin (inside the well), then it requires some additional energy to escape the well and get to an energy of 0.
So what we’ll end up finding with the wavefunction is that a bound state (with energy between 0 and −V0 ) will have
some small but nonzero probability of being found outside the well.
Remark 92. It turns out that there are never any solutions to the Schrodinger equation if the energy E of our state
is lower than any point in the potential (so in this case, there are no states of energy E < −V0 ).
Inside the well, the relevant combination E − V (x) is
$$E - (-V_0) = V_0 + E = V_0 - |E|$$
(remember that we’re taking E < 0), and because our potential V (x) is piecewise constant, we can solve the
Schrodinger equation in each region and patch the solutions together – in each case, we have an equation of the form
ψ ′′ = αψ for some constant α. Inside the well, we have α < 0, meaning that solutions will be trigonometric functions,
while outside the well, we have α > 0, meaning that solutions will be real exponentials. And when we try to stitch
those different forms together into a continuous wavefunction, we expect that we’ll get some quantization of allowed
energies, because that was indeed the case in the infinite square well as well.
We’ll use the result that symmetric potentials will always give us even or odd solutions for ψ(x), and we’ll first
look for even solutions (where ψ(x) = ψ(−x)). The key to having a nice solution here is to make good use of
unit-free numbers – we’ll make lots of definitions and it won’t look like we’re solving anything for a while, and then
suddenly we’ll see the solution pop up. For an even solution in the region −a < x < a, we’re asking for a solution to
$$\frac{d^2\psi}{dx^2} = -\frac{2m}{\hbar^2}\big(E - (-V_0)\big)\psi = -\frac{2m}{\hbar^2}\big(V_0 - |E|\big)\psi.$$
We'll define $k^2 = \frac{2m}{\hbar^2}(V_0 - |E|) > 0$, so that our differential equation becomes $\psi'' = -k^2\psi$, solved inside the well by
$\psi(x) = \cos(kx)$ – the key here being that we only have cosine (and not sine) because we assume our solutions are symmetric. We won't
normalize our solutions because they’re pretty messy (it’s not necessary for obtaining energy eigenstates anyway).
Instead, we’ll normalize by fixing the constant of cos(kx) to be 1 – this gives us the form of the wavefunction inside
the well, and now we turn to the wavefunction outside the well. Then because V (x) = 0 in this region, we have
$$\psi'' = -\frac{2mE}{\hbar^2}\psi = \frac{2m|E|}{\hbar^2}\psi = \kappa^2\psi,$$
where we define the positive constant $\kappa^2 = \frac{2m|E|}{\hbar^2}$. Our solutions are then exponentials: $\psi(x)\sim e^{\pm\kappa x}$, and thus we
must have
$$\psi(x) = Ae^{-\kappa x}\ \text{for } x > a,\qquad \psi(x) = Ae^{\kappa x}\ \text{for } x < -a$$
for some positive constant A. Furthermore, notice that we have the constraining relation
$$\kappa^2 + k^2 = \frac{2mV_0}{\hbar^2},$$
and now we can rewrite it to be unitless: multiplying through by a2 gives us
$$k^2a^2 + \kappa^2a^2 = \frac{2mV_0a^2}{\hbar^2}.$$
(We shouldn’t lose track of our goal – we want to find the allowed energies E.) We’ll now define the unit-free constants
$$\xi = \kappa a > 0,\qquad \eta = ka > 0,\qquad z_0^2 = \frac{2mV_0a^2}{\hbar^2},$$
so that we have $\eta^2 + \xi^2 = z_0^2$. It might look like all we've done is traded k, κ for unitless constants η, ξ, but it turns
out that z0 is an important quantity which encodes the physical parameters of our system (it’s large if the well is deep
or wide and small if the well is shallow or narrow) and actually is all we need to calculate the number of bound
states in our well. We’ll soon see that z0 being large gives us many bound states, while z0 being small gives us very
few of them!
Turning back to calculations, we know that our wavefunction ψ and its derivative ψ ′ must be continuous at x = a
(and also x = −a, but it's sufficient to check the former because we know our solution is even). We thus require that
$$\cos(ka) = Ae^{-\kappa a}\ \text{(continuity of }\psi\text{)},\qquad -k\sin(ka) = -\kappa Ae^{-\kappa a}\ \text{(continuity of }\psi'\text{)}.$$
Dividing the second equation by the first and then multiplying by a on both sides gives us
$$\xi = \eta\tan\eta.$$
Thus, our problem has now reduced to finding either η or ξ, from which all of the other constants in the problem will
follow (since we know z02 and a from the physical parameters). Notice that
$$\xi^2 = \kappa^2a^2 = \frac{2m|E|a^2}{\hbar^2} = \frac{2mV_0a^2}{\hbar^2}\cdot\frac{|E|}{V_0} = z_0^2\,\frac{|E|}{V_0} \implies \boxed{\frac{|E|}{V_0} = \left(\frac{\xi}{z_0}\right)^2};$$
in other words, we can get a dimensionless measure of the energy (which essentially tells us the proportionality constant
between the depth of the well and our energy) in terms of our newly defined constants! And now we’re ready for the
final answer: if we graph η and ξ in the first quadrant (since we know they must both be positive), then the condition
ξ = η tan η looks as shown below:
[Plot: the curve ξ = η tan η in the (η, ξ) plane, with branches over (0, π/2), (π, 3π/2), and so on.]
Then the other curve $\xi^2 + \eta^2 = z_0^2$ is just the circle of radius z0 centered at the origin, and we want to find the
points where these two curves intersect. For example, here's the situation for z0 = 5 (it looks like an ellipse below
because the axes are scaled differently):
[Plot: the branches of ξ = η tan η, together with the circle ξ² + η² = z0² for z0 = 5.]
And just looking at this graph, we can count the number of bound states by looking at how many points of
intersection we have, which is given by the number of multiples of π smaller than z0 (because each of those blue
branches gives us one intersection). In particular, notice that there will always be a bound state for any positive value
of z0 ! Furthermore, because z0 is fixed and we have the boxed relation between ξ and z0 above, the leftmost solution
corresponds to the most deeply bound state (and thus the state of lowest energy), and going from left to right gives
us the states in increasing order of energy.
We’ve thus solved the finite square well for the even solutions case – a similar solution method works for the odd
solutions case, but in that case we have ξ = −η cot η:
[Plot: the branches of ξ = −η cot η in the (η, ξ) plane, which begin at η = π/2, 3π/2, . . . ]
Thus, the odd solutions don’t always exist – the potential V0 must be sufficiently deep for us to have them, and
with a bit more inspection we can see that the even and odd solutions will interleave for the finite square well. The
main lesson that we should learn here is the power of unit-free constants and how they can tell us when solutions
exist!
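To see how practical this graphical method is, here is a sketch in Python of solving the same transcendental conditions numerically (scipy's brentq does the root-finding; z0 = 5 matches the example above, and the grid size and pole threshold are implementation choices), reporting the binding energies via the boxed relation |E|/V0 = (ξ/z0)²:

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: even states satisfy xi = eta*tan(eta), odd states xi = -eta*cot(eta),
# together with eta^2 + xi^2 = z0^2, i.e. xi = sqrt(z0^2 - eta^2).
z0 = 5.0

def even_eq(eta):
    return eta * np.tan(eta) - np.sqrt(z0**2 - eta**2)

def odd_eq(eta):
    return -eta / np.tan(eta) - np.sqrt(z0**2 - eta**2)

def solve_branches(f):
    """Scan eta in (0, z0) for sign changes of f and refine each root."""
    etas = np.linspace(1e-6, z0 - 1e-6, 20000)
    vals = f(etas)
    roots = []
    for i in range(len(etas) - 1):
        if vals[i] * vals[i + 1] < 0 and abs(vals[i]) < 50 and abs(vals[i + 1]) < 50:
            # the size cut skips the fake sign flips at the tan/cot poles
            roots.append(brentq(f, etas[i], etas[i + 1]))
    return roots

for label, f in [("even", even_eq), ("odd", odd_eq)]:
    for eta in solve_branches(f):
        print(f"{label} state: eta = {eta:.4f}, |E|/V0 = {1 - (eta/z0)**2:.4f}")
```

For z0 = 5 this finds two even and two odd states, interleaved in energy, matching the picture above.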
Definition 93
A bound state is a normalizable energy eigenstate over the real line (meaning that ψ → 0 as x → ±∞).
The idea is that if a wavefunction goes to 0 at infinity, then it has a bump in the middle at some point, so the
potential is “keeping the particle bound.”
Proposition 94
For one-dimensional potentials over the real line, bound states are nondegenerate (no two linearly independent bound states share the same energy).
(This is left as an exercise for us – it’s a problem in Griffiths.) The second result is related to the fact that we were
able to construct our wavefunctions to be real-valued, even though generally the full wavefunction is complex-valued.
Essentially, just like we saw in last lecture, often we can encode all of the complex-valued behavior in just the phase
e −iEt/ℏ , because the time-independent Schrodinger equation Ĥψ = Eψ has no i in it:
Proposition 95
Energy eigenstates of the time-independent Schrodinger equation, where V (x) is real, can be chosen to be real.
What this result says is that we always have the option of working with real solutions for our energy eigenstates.
Proof. Consider the Schrodinger equation Ĥψ = Eψ. If we take the complex conjugate of both sides, we obtain
Ĥψ ∗ = Eψ ∗ (because the potential V (x) is real, we can check that (Ĥψ)∗ = Ĥψ ∗ ). Thus, ψ ∗ and ψ are two energy
eigenstates of the same energy. If they are linearly independent, then taking $\psi_r = \frac12(\psi^* + \psi)$ and $\psi_i = \frac1{2i}(\psi - \psi^*)$
gives us two new real, linearly independent energy eigenstates of the same energy (which we can work with instead).
And if ψ ∗ and ψ start off linearly dependent, then we just have one unique energy eigenstate, and at least one of those
linear combinations is nonzero. Thus we can always form enough new energy eigenstates that are real as long as V (x)
is real.
Essentially, these linear combinations of ψ and ψ ∗ , which we denoted ψr and ψi , give us the real and imaginary
parts of ψ, and both of those parts turn out to be solutions to the (time-independent) Schrodinger equation.
Corollary 96
All bound states in one-dimensional potentials are real up to a phase.
Proof. Proposition 94 tells us that all bound-state energy levels are nondegenerate. Following the proof of Proposition 95, ψ and
ψ ∗ must then be proportional to avoid degeneracy. Thus the real and imaginary parts are also proportional to each
other, so the full wavefunction ψ is just a complex number times its real part (which is a real wavefunction up to a
phase).
This is a strong statement – we’re essentially forced to work with real solutions for one-dimensional potentials, and
the only complex part of the full wavefunction comes in the time-evolution term e −iEt/ℏ of our stationary states.
Our third result will be the one that we used last lecture to solve for energy eigenstates of the finite square well:
Proposition 97
If a one-dimensional potential V (x) is even (that is, V (x) = V (−x)), then the energy eigenstates can be chosen
to be either even or odd (under x 7→ −x).
Proof. Suppose ψ(x) solves the time-independent Schrodinger equation, and define φ(x) = ψ(−x). By the chain rule,
$$\frac{d\varphi}{dx}(x) = \psi'(-x)\cdot(-1) = -\psi'(-x),\qquad \frac{d^2\varphi}{dx^2}(x) = -\psi''(-x)\cdot(-1) = \psi''(-x).$$
Thus, evaluating the original time-independent Schrodinger equation at −x yields
$$\psi''(-x) + \frac{2m}{\hbar^2}\big(E - V(-x)\big)\psi(-x) = 0,$$
and now replacing ψs with φs and noting that V (−x) = V (x) yields
$$\varphi''(x) + \frac{2m}{\hbar^2}\big(E - V(x)\big)\varphi(x) = 0,$$
which is the same Schrodinger equation but for φ. Thus, both ψ(x) and φ(x) = ψ(−x) are solutions to the Schrodinger
equation with the same energy, so we can form the symmetric and antisymmetric parts of the wavefunction
$$\psi_s(x) = \frac12\big(\psi(x) + \psi(-x)\big),\qquad \psi_a(x) = \frac12\big(\psi(x) - \psi(-x)\big),$$
which are even and odd, respectively, and by superposition they are also energy eigenstates of the same energy E.
Thus we can use these symmetric and antisymmetric wavefunctions instead (and regardless of whether ψ and φ are
linearly dependent or not, doing this operation preserves that fact).
Corollary 98
All bound states in one-dimensional potentials are either even or odd.
Proof. Again, Proposition 94 tells us that all bound states are nondegenerate, so the only choice we have for our
eigenstates is multiplying by a constant (which does not change whether a function is even, odd, or neither). Thus
the result above directly applies.
More specifically, ψ(x) and ψ(−x) must be proportional, and we can assume by Corollary 96 that ψ(x) (and thus
ψ(−x)) must be real. But if ψ(−x) = cψ(x), then ψ(x) = cψ(−x) by replacing x with −x in the equation above.
Thus ψ(x) = c²ψ(x), and thus c = ±1, corresponding to even and odd wavefunctions respectively.
This explains why our search for solutions of the finite square well last lecture was divided into two cases (symmetric
and antisymmetric) – there are no other energy eigenstates that are possible. We’ll be using this result in future lectures
as well!
Now that we’ve established some general results, we’ll turn to some qualitative insights of our (now established
real) energy eigenstates. Whenever we have a problem with a potential V , we have a total energy E = K + V , and in
our classical intuition energy is conserved, so both K and V are functions of x, but E is some fixed number. (And as
V gets larger, K gets smaller, and vice versa.) Classically, if we know the kinetic energy of a particle, then we know its
momentum (because $K = \frac{p^2}{2m}$). But we know that the corresponding de Broglie wavelength of the particle is $\lambda = \frac hp$,
so given that the particle has some kinetic energy K, we should expect some kind of “wavelength” of λ to pop up in
the wavefunction for our energy eigenstate. And all of that discussion is exactly true if our potential V is constant,
because that’s exactly how we derived the Schrodinger equation in the first place.
Example 99
Suppose we have a linearly growing potential V (x) and some total energy E, so that as the particle moves to the
right, its potential energy increases and thus its kinetic energy decreases.
Having the kinetic energy K decrease means that the momentum p gets smaller and that the “wavelength λ gets
larger” (at least, we can say this if the potential is not growing too quickly, since then V is approximately
constant locally). Specifically, we can define a K(x), p(x), and λ(x) in terms of the position of our particle x, and that means
we predict that the wavefunction will oscillate more and more slowly as x increases. The exact formula for the solution
to the Schrodinger equation in a linear potential relies on Airy functions and so on, so everything we’re saying is only
approximate but still useful (and if we plot the wavefunctions that’s indeed what we’ll see).
Example 100
Consider a potential V (x) as in the diagram below – we’ll try to extract some more qualitative features of the
wavefunction, much like we’ve described above.
[Diagram: a potential well V (x) with a horizontal dashed line at total energy E; the vertical gap K(x) = E − V (x) is the kinetic energy, and E = V (x) at the two turning points.]
Classically, we know that the kinetic energy of a particle (represented by the black dashed line) cannot become
negative, so the particle cannot move past either intersection point where V (x) = E. (Thus, there are regions on the
left and on the right which are classically forbidden.) Those intersection points are called turning points, because in
a classical setting those are the positions where (if we imagine a ball rolling around in the potential) the particle would
stop and start turning back.
What we discussed above is basically that if our potential V were constant, then our wavefunction would take on
a very simple form with a fixed wavelength λ. But if V (x) varies slowly enough – that is, if the percentage change in
V (x) is small compared to the relevant distance λ(x) – then we can speak of a “local wavelength.” This is actually
getting to the discussion of the WKB approximation, which we discuss in 8.06, and the requirement is that
$$\frac{dV}{dx}\,\lambda(x) \ll V(x),$$
where $\lambda(x) = \frac{h}{p(x)} = \frac{h}{\sqrt{2mK(x)}}$ is the local de Broglie wavelength of the wavefunction at a position x. Really,
all we’re saying is that if we were to sketch the wavefunction ψ(x) corresponding to the potential V (x) above, we’d
generally have a faster oscillation where V (x) is smaller and a slower oscillation where V (x) is larger.
On the other hand, we can also discuss the amplitude of the wavefunction at these different positions, and this is
connected to the correspondence principle. Basically, if a particle spends more time in some region, then the
wavefunction should have larger magnitude there, so in the example above, we should spend more time where V (x)
is larger (because classically that corresponds to a smaller K(x) and thus a smaller velocity for the particle). This
turns out to be true, but we’ll explain it in a bit more detail.
The probability for a particle to be found in a region [x, x + dx] is $|\psi(x)|^2dx$, which is essentially proportional to
the fraction $\frac{dt}{T}$ of time spent in this interval (where T is the total time in a period, if we imagine the particle rocking
back and forth classically). Thus, we have
$$|\psi|^2\,dx \sim \frac{dt}{T} = \frac{dx}{v(x)\,T} \implies |\psi|^2 \sim \frac{1}{p(x)} = \frac{\lambda(x)}{h},$$
so $|\psi| \sim \sqrt{\lambda(x)}$ – that is, in areas where the wavelength is longer and the potential V (x) is larger, we will have a
larger amplitude for the wavefunction ψ(x). Combining this with our previous discussion, as well as the node theorem
from last lecture (telling us how many times ψ intersects zero), we should now be able to estimate the wavelength
and amplitude of a wavefunction, giving us a general sketch of ψ, without explicitly solving the Schrodinger equation.
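As a small illustration of these two estimates (a sketch in Python; the potential V(x) = x², the energy E = 4, and the units h = m = 1 are all illustrative choices, not from the lecture), we can tabulate the local wavelength and the semiclassical amplitude across the classically allowed region:

```python
import numpy as np

# Sketch: local de Broglie wavelength and amplitude estimate for V(x) = x**2, E = 4.
E = 4.0
x = np.linspace(-1.8, 1.8, 7)        # stays inside the classically allowed region
V = x**2
p = np.sqrt(2.0 * (E - V))           # local momentum p(x) = sqrt(2m K(x)), m = 1
lam = 1.0 / p                        # local wavelength lambda = h/p, with h = 1
amp = np.sqrt(lam)                   # semiclassical amplitude |psi| ~ sqrt(lambda)
for xi, li, ai in zip(x, lam, amp):
    print(f"x = {xi:+.2f}: lambda ~ {li:.3f}, |psi| ~ {ai:.3f}")
# Both lambda and the amplitude grow toward the turning points, where V -> E.
```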
We’re now ready to get the full local picture of the wavefunction: we can rewrite the time-independent Schrodinger
equation as
$$\frac{1}{\psi}\frac{d^2\psi}{dx^2} = -\frac{2m}{\hbar^2}\big(E - V(x)\big).$$
(Note here that ψ is an energy eigenstate of energy E.) We then have three cases to think about when trying to
describe what ψ looks like (and remember that we can always treat it as real-valued):
• When E − V (x) < 0 (the classically forbidden region), the right-hand side is positive, so ψ and $\frac{d^2\psi}{dx^2}$ always
have the same sign (unless ψ is zero). In other words, ψ is always convex towards the axis – if ψ is positive,
then we have a “parabola going up,” and if ψ is negative, then we have a “parabola going down.” But since we’re
in a classically forbidden region, the wavefunction should not get arbitrarily big – instead, what’s often actually
happening is that our wavefunction is asymptotically tending to zero as x → ∞ and x → −∞ (and thus the
probability of finding the particle deeper into the classically forbidden region decays to zero as well).
• When E − V (x) > 0 (the classically allowed region), the right-hand side is negative, so ψ and $\frac{d^2\psi}{dx^2}$ have different
signs. We can’t say anything asymptotically for this kind of behavior, but we should essentially imagine a sine
function in the classically allowed region (downward parabola for ψ positive and upward parabola for ψ negative);
basically, ψ is always concave towards the axis.
• When E − V (x) = 0 (one of the turning points), the right-hand side is zero, so either ψ = 0 or $\frac{d^2\psi}{dx^2} = 0$.
This corresponds to having inflection points where ψ is nonzero (though notice that when ψ is zero, we also
automatically have a point where the second derivative is zero). Thus, the key insight here is to remember that
inflection points of ψ occur at turning points and nodes.
Example 101
We’ll finish this lecture by returning to a generic smooth symmetric potential V (x) and trying to understand its
energy eigenstates.
Suppose that V (x) looks as shown below, and as always we want to find bound states with some energy E < 0.
We’ve labeled some example energies on the diagram below:
[Diagram: a smooth symmetric potential well V (x), with candidate energies E0 < E1 < E2 < E3 marked as horizontal lines.]
Because our wavefunction will always hit the classically forbidden region for large enough |x|, we know that ψ will
decay asymptotically to zero on both the left and right side. Furthermore, we can choose ψ to be real-valued, and we
can multiply it by a sign so that ψ is positive for large positive x (but decaying to 0), but depending on whether ψ is
symmetric or antisymmetric, it may be negative or positive for large negative x.
We’ll try to get an understanding for why a potential of this form naturally gives us energy quantization by
sketching the wavefunctions that would arise if we tried to form eigenstates at energy E0 , E1 , E2 , and E3 by numerically
integrating from −∞ and ∞ inward to 0. Each energy eigenstate will decay to zero at the endpoints, but between
the turning points it will be concave towards the axis. For low energies, the turning points are close to the y -axis, and
thus there isn’t enough time for ψ to turn all the way back to 0. This means we must have an even wavefunction,
but the turning must happen in such a way that the wavefunction does not have a discontinuity in the derivative
(because V is continuous). So it’s possible that in this example, the wavefunction formed by E0 will not work as an
energy eigenstate because of a kink, but the one formed by E1 will, as shown below. (Turning points are labeled in
black.)
[Sketches: the candidate wavefunction at E0 , which has a kink at x = 0, and at E1 , which matches smoothly.]
In other words, as we slowly increase the energy E, we will finally get to a point where the curve flattens at x = 0,
and that’s our first energy eigenstate. If we continue to increase our energy, the turning points will move further out,
and we will have even more time to curve back towards the x-axis. This means that we again will not have a valid
wavefunction until the ψ we obtain hits the origin – at that point, we can obtain an antisymmetric solution as shown
with the energy E3 below.
[Sketches: the candidate wavefunction at E2 , which fails to reach zero at the origin, and at E3 , which forms a valid antisymmetric eigenstate.]
So the second energy eigenstate occurs when the curve matches up to form an antisymmetric wavefunction. Further
increasing the energy will create more oscillation in the purple (classically allowed) region, in such a way that for some
higher energy E, ψ matches up with continuous derivative after two nodes. Repeating this process gives us higher and
higher energy eigenstates – this is essentially the intuition for the node theorem and also for why energy is quantized!
Example 102
We can connect this argument to one other piece of intuition through the shooting method for solving differential
equations.
If we have a symmetric potential and we’re looking for even energy eigenstates, we can do the following procedure:
pick some energy E0 , and require ψ(0) = 1, ψ ′ (0) = 0. (We don’t need to normalize our eigenstates, so we’ll use this
as the rescaling instead, and the reason for zero derivative is, just like above, to ensure that we have no kinks in ψ.)
Since we have a second-order differential equation, these boundary conditions allow us to numerically integrate and
find the wavefunction ψ for all x.
What we’ll find typically is that after a while, ψ will start to blow up to ±∞ for large x, which is bad because that
means we won’t have a normalizable energy eigenstate. But if we change E slightly, we might find that ψ now blows
up in the opposite direction ∓∞ for large x. Then what that means is that in between our two values of E, there’s a
single point at which the wavefunction will decay, and that’s the allowed energy eigenvalue for our potential V (x)!
Fact 103
Repeatedly searching in a smaller and smaller interval gives us the energy E to any arbitrary accuracy that we
want, as long as Mathematica can still integrate for us, and the only thing we need to do is clean up our differential
equation so that there are no units.
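Here is what that procedure can look like in code – a minimal sketch in Python of the shooting method on a unit-free equation ψ″ = (v(u) − ℰ)ψ, where the potential v(u) = u² is an illustrative choice (it happens to be the oscillator we'll meet shortly, whose even ground state sits at dimensionless energy 1), and scipy stands in for Mathematica's integrator:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the shooting method with even-state initial conditions
# psi(0) = 1, psi'(0) = 0, for psi'' = (v(u) - energy) * psi.
def v(u):
    return u**2

def psi_far_away(energy, u_max=6.0):
    """Integrate out to u_max and return psi there; its sign tells us in
    which direction the non-normalizable solution is blowing up."""
    def rhs(u, y):
        psi, dpsi = y
        return [dpsi, (v(u) - energy) * psi]
    sol = solve_ivp(rhs, (0.0, u_max), [1.0, 0.0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

# Repeatedly search in a smaller and smaller energy interval (bisection):
# the sign of psi(u_max) flips exactly when the energy crosses an eigenvalue.
lo, hi = 0.5, 1.5
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if psi_far_away(lo) * psi_far_away(mid) <= 0:
        hi = mid
    else:
        lo = mid
print("dimensionless ground-state energy ~", 0.5 * (lo + hi))  # ~ 1.0
```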
Lecture 13: The Delta Function Potential, Node Theorem, and
Harmonic Oscillator
We’ll start today by solving the Schrodinger equation for a new potential:
Example 104
Suppose we have a particle in one dimension placed in the potential V (x) = −αδ(x) for some α > 0 (this can be
represented with a thick arrow pointing in the downward y -axis direction).
Essentially, we should imagine that we have a potential which is “infinitely negative” at x = 0, or equivalently
the limit of a square well that gets deeper and narrower while keeping the area the same (which is in fact one way
to analytically calculate the energy levels of this potential). Just like with the other potentials, we would like to
calculate whether there are bound states (in this case, this means E < 0) and what their energy eigenvalues are for
the corresponding Schrodinger equation.
It turns out we can discover a lot without having to do explicit calculations with the differential equation, and
we do so by using our intuition and some of the discussions we’ve been having in the past few lectures. Our first
approach will be using units: notice that the three constants that are present in the problem are α, m, and ℏ. If there’s
only one way to construct a quantity with units of energy using these constants, then all energy eigenvalues must
be proportional to that quantity. When three constants have units that are not “linearly dependent,” that is indeed
possible – we build objects with units of length, mass, or time, and then from that we can do anything. Because δ(x)
has units of $\frac{1}{[L]}$ (remember that integrating the delta function over the real line gives us a unitless constant), and V
has units of energy, we must have
$$[E] = \frac{[\alpha]}{[L]} = \frac{[m][\alpha]^2}{[\hbar]^2}.$$
This means that, because the bound state energy must lie in the classically forbidden regime E < 0 (the potential
vanishes everywhere away from the origin), we therefore predict that
$$E_b = -\#\,\frac{m\alpha^2}{\hbar^2}$$
for some positive number #, and there can’t be any other possibilities just from our unit considerations! And we
should expect that this number should not be particularly large or small – for the problem to be “natural,” we should
expect # on the order of 1. But we’ll be more precise about that in a second.
Another way we can think about this potential is to consider the regularized version of the delta function, which
is basically a very deep and very narrow finite square well. Recall that because this is a symmetric potential (because
there’s nothing asymmetric about the delta function), the ground state should be even and have no nodes – refer to
the pictures under Example 101 for illustration. But as the wavefunction in the classically allowed region becomes
narrower and narrower, we expect the “curving back” in that region to occur more and more rapidly, so that in the
limit we actually get a discontinuity in the derivative of the wavefunction ψ. With this, we can now write down the
differential equation for the delta function potential (though we’ll still not solve it): for x ̸= 0, we have no potential,
so
$$-\frac{\hbar^2}{2m}\psi'' = E\psi \implies \psi'' = \kappa^2\psi \quad\text{for } x \neq 0,$$
where $\kappa^2 = -\frac{2mE}{\hbar^2} > 0$ just like in the finite square well. Such a differential equation has solutions $e^{\pm\kappa x}$ (as we already
know), or equivalently $\cosh(\kappa x)$ and $\sinh(\kappa x)$ if we prefer. But we can think now about how many bound states this
potential will have – if there’s one of them, it will be even and have no nodes, and if there’s a first excited state
after that, it must be odd and vanish at the origin (so that it has one node). That occurs only with a function like
ψ(x) ∝ sinh(κx), but that’s not good because the function is convex away from the axis even in a forbidden region.
So there can only be at most one bound state, and now we're ready to solve the problem algebraically. Because our
ground state needs to be even, we must have
$$\boxed{\psi(x) = Ae^{-\kappa|x|}}$$
so that our function decays as |x| → ∞, and this seems to be on the right track because it's what we expect from a
ground state of the form in Example 101 with a very narrow middle region. Our wavefunction is indeed continuous,
and now we just need to use the fact that the delta function is of intensity α. (It’s good that our energy scale given
in Eb has α in the numerator – indeed, as the potential gets stronger, the bound state should get deeper, so we have
some reasonable conditions.) This next part is important because we’ll do it again and again in this course: the
Schrodinger equation with the potential term is
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + V(x)\psi(x) = E\psi(x),$$
and now we integrate both sides from −ε to ε (where eventually we’ll be looking at the limit ε → 0) to get
$$-\frac{\hbar^2}{2m}\left(\frac{d\psi}{dx}\bigg|_{x=\varepsilon} - \frac{d\psi}{dx}\bigg|_{x=-\varepsilon}\right) + \int_{-\varepsilon}^{\varepsilon}\big(-\alpha\,\delta(x)\big)\psi(x)\,dx = E\int_{-\varepsilon}^{\varepsilon}\psi(x)\,dx,$$
because the integral of the second derivative is the first derivative. Now taking a limit, we have
$$-\frac{\hbar^2}{2m}\lim_{\varepsilon\to0}\left(\frac{d\psi}{dx}\bigg|_{x=\varepsilon} - \frac{d\psi}{dx}\bigg|_{x=-\varepsilon}\right) - \alpha\,\psi(0) = \lim_{\varepsilon\to0} E\int_{-\varepsilon}^{\varepsilon}\psi(x)\,dx = 0,$$
because ψ is not divergent and we're shrinking the limits of integration to a single point. Thus, rearranging this
equation gives us what we wanted – we get the difference between the derivatives at 0⁺ and 0⁻, which allows us to
see the discontinuity in ψ′, which we'll denote $\Delta_0\psi'$:
$$-\frac{\hbar^2}{2m}\Delta_0\psi' - \alpha\,\psi(0) = 0 \implies \boxed{\Delta_0\psi' = -\frac{2m\alpha}{\hbar^2}\psi(0)}.$$
2m ℏ
In other words, as long as the wavefunction doesn't vanish at the origin, its derivative will have a discontinuity there due to the delta function, and in
fact the change in derivative is proportional to the value of ψ and also to the strength of the delta function α. And
returning to our wavefunction, notice that the derivative of our boxed ψ(x) above has discontinuity
$$\Delta_0\psi' = \lim_{\varepsilon\to0}\left(-\kappa Ae^{-\kappa\varepsilon} - \kappa Ae^{-\kappa\varepsilon}\right) = -2\kappa A,$$
and setting this equal to $-\frac{2m\alpha}{\hbar^2}\psi(0) = -\frac{2m\alpha}{\hbar^2}A$, the As cancel (we should not expect A to show up because it's just a normalizing factor in a linear equation),
and we find $\kappa = \frac{m\alpha}{\hbar^2}$. And this is good, because we've now specified the energy of our bound state
$$E = -\frac{\hbar^2\kappa^2}{2m} = -\frac{\hbar^2}{2m}\cdot\frac{m^2\alpha^2}{\hbar^4} = -\frac{m\alpha^2}{2\hbar^2},$$
and indeed the number has been determined to be $\# = \frac12$. And with that, we've learned that a single delta function will
give us a single bound state – adding more delta functions will give us more bound states, as we’ll examine ourselves.
Example 105
We’ll now spend a few minutes talking more about the node theorem, which we’ve stated and used for some of
our previous arguments. We won’t do a mathematically rigorous justification, but we’ll give the main intuition for
why we should believe it to be true.
Suppose we have bound energy eigenstates of a one-dimensional potential, labeled ψ1 , ψ2 , ψ3 , · · · with respective
energies E1 , E2 , E3 , · · · in increasing order. We are claiming that ψn has (n − 1) nodes, and we’ll understand this by
thinking about continuity.
Fact 106
Note that the wavefunction can never have ψ(x0 ) = ψ′(x0 ) = 0 at any point x0 , because for a second-order differential equation
the initial conditions ψ and ψ′ determine the solution completely, and those initial conditions force ψ = 0 identically. So for any nonzero state ψ, the
derivative at a node must be nonzero.
The idea is that if we have some arbitrary potential V (x), we can truncate it to just the range [−a, a] and get the
screened potential
$$V_a(x) = \begin{cases} V(x) & |x| < a, \\ \infty & |x| > a.\end{cases}$$
In other words, we have an infinite well, but our potential is no longer flat inside the well. And as a → ∞, the bound
states of our screened potential will become the bound states of our original potential V (x), because all bound states
decay to 0 and thus the potential makes “less and less of an impact” as we move the walls outward. And as a increases,
the bound states evolve continuously, but for small a we basically have a very narrow infinite well where the potential
is essentially flat. Thus, as a → 0, we can use the states of the infinite square well, in which we have a ground state
with no nodes, a first excited state with one node, and so on, satisfying all of the assumptions of the node theorem.
Thus, all that we need to show is that as we increase the size a of the screen, the number of nodes cannot change
(even though the states themselves evolve continuously).
Indeed, consider the derivative of the nth energy eigenstate of ψa at x = a. Introducing a new node as we increase
a means that we go from approaching the boundary from below to above, or vice versa, meaning that ψa ’s derivative
at its endpoint must switch signs. Thus, there would need to be some intermediate value of a such that ψ ′ = 0 at
the endpoint. But because we have an infinite wall at that point, we also have ψ = 0, and by Fact 106 this cannot
occur for a nonzero wavefunction. Thus, no new nodes can be created. The same argument works for having a node
disappear from our wavefunction, or having a node appear in the middle of the region (which would imply a tangent
point when the node first emerges). So at least intuitively, we see that the nth energy eigenstate will continue to have
(n − 1) nodes as a gets larger, and thus in the limit we have the desired node theorem.
Our next step in this class is to look at a classical system that has a lot of deep theory, and we’ll be returning to
it in the next few lectures as well. Recall that classically, we have a system with a total potential plus kinetic energy
$$E = K + U = \frac{p^2}{2m} + \frac12m\omega^2x^2,$$
where $\omega = \sqrt{\frac km}$ for the spring constant k (so that the potential term can also be written as $\frac12kx^2$) and x represents
the amount that a spring is stretched from its equilibrium position. To invent a quantum version of this system, we
must come up with a Hamiltonian for the Schrodinger equation, and we do this in the purest possible way:
$$\hat H = \frac{\hat p^2}{2m} + \frac12m\omega^2\hat x^2,$$
where x̂ and p̂ are operators satisfying [x̂, p̂] = i ℏ.
It will turn out that the quantization of energies in this problem is pretty interesting – even though the classical
oscillator can oscillate with any amplitude, this is not the case for the quantum oscillator. This Hamiltonian corresponds
to the quadratic potential $V(x) = \frac12m\omega^2x^2$, and the reason it is so important is that it is a good approximation for any
potential around a local minimum (since by Taylor expansion, the first derivative vanishes), meaning that the logic
here can be (and in fact is) applied to many different oscillatory systems, like an electron in a magnetic field or the
motion of a diatomic molecule.
There are two ways we can find the energy eigenstates of this oscillator – notice that because V is unbounded,
all energy eigenstates are bound states for this system, which is not the case for something like the delta function
potential. Our goal is to solve the equation Ĥφn (x) = En φn (x) (where φn denotes the nth harmonic oscillator
eigenstate), and we can do this either by solving the differential equation directly or by inventing certain clever
raising and lowering operators which simplify the problem to an algebraic trick. The latter of these will be covered
more in 8.05, but we’ll get a solution to the harmonic oscillator problem with both methods. First, we write down the
equation
$$-\frac{\hbar^2}{2m}\frac{d^2\varphi}{dx^2} + \frac12m\omega^2x^2\,\varphi(x) = E\varphi(x),$$
and we’ll first remove all of the units from the equation with a simple procedure: change the variable x into one
with no units, which will simplify numerical simulation and also make the structure more illuminating. We do this by
writing x = au for a unitless u and a constant a with units of length; here, we can use m, ω, ℏ to define a via
$$[E] = \frac{[\hbar]^2}{[m][a]^2} = [m][\omega]^2[a]^2 \implies a^2 = \frac{\hbar}{m\omega}.$$
Plugging in $x = \sqrt{\frac{\hbar}{m\omega}}\,u$, we then get
$$-\frac{\hbar^2}{2ma^2}\frac{d^2\varphi}{du^2} + \frac12m\omega^2a^2u^2\varphi = E\varphi,$$
and now $m\omega^2a^2$ and $\frac{\hbar^2}{ma^2}$ both have units of energy and must turn out to be something nice when we substitute in the actual
value of $a^2$: indeed, we end up with the equation
$$-\frac12\hbar\omega\,\frac{d^2\varphi}{du^2} + \frac12\hbar\omega\,u^2\varphi = E\varphi.$$
And finally, the equation looks nicest if we multiply by $\frac{2}{\hbar\omega}$ to get
$$-\frac{d^2\varphi}{du^2} + u^2\varphi = \frac{2E}{\hbar\omega}\varphi = \mathcal{E}\varphi,$$
where $\mathcal{E} = \frac{2E}{\hbar\omega}$ is the unit-free energy. Since knowing E and $\mathcal{E}$ are equivalent, we'll find that working with $\mathcal{E}$ gives us
the nicest answer: we're essentially left with solving the second-order differential equation
$$\boxed{\frac{d^2\varphi}{du^2} = (u^2 - \mathcal{E})\varphi}.$$
du 2
Thinking back to the “shooting method” from last lecture, this equation will have solutions for all values of $\mathcal{E}$, but
most of them will diverge and not be normalizable. To understand this, we'll consider the system for large x (that is,
large u) and see what the solution must look like there. Since $\mathcal{E}$ is a constant, the limiting behavior of the differential
equation looks like $\frac{d^2\varphi}{du^2} = u^2\varphi$. The way to get this kind of solution (where we gain powers of u after differentiating)
is not by using polynomials in u but by trying expressions of the form
$$\varphi(u) = u^k\,e^{\frac{\alpha u^2}{2}}.$$
Indeed, we find that because we’re just trying to get a rough order of approximation, we have φ′ ∼ αuφ (the other
term is comparably negligible) and thus $\varphi'' = (\alpha u)^2\varphi + \text{(subleading terms)}$. Thus, setting $\alpha = \pm1$ is likely to give us
approximate solutions, because those get us back to the original differential equation. Going further, we thus expect
that our large-|u| behavior will look like
$$\varphi(u) \sim Au^k\,e^{-u^2/2} + Bu^k\,e^{u^2/2},$$
and now we can see where we might run into trouble: we can’t have the second of these two terms, so we must have
E chosen so that B = 0 is the desired solution. (And remember that everything here so far is still |u| → ∞, so we
don’t have any exact solutions, only general behavior for large u.)
So without any loss of generality, we'll write
$$\varphi(u) = h(u)\,e^{-u^2/2};$$
we can always multiply back by $e^{u^2/2}$ to get to the original function, but this should encode some of the limiting
behavior that we expect, and we hope that h(u) will not diverge too quickly (because then φ(u) will be normalizable)
– specifically, now that we've isolated the divergence, we hope to see that h(u) is actually a polynomial, because then
it is easier to see “when our solution ends.” Our next step is then to get the differential equation for h(u); it will turn
out that we must solve
$$\boxed{\frac{d^2h}{du^2} - 2u\frac{dh}{du} + (\mathcal{E}-1)h = 0}.$$
Next time, we’ll dive more into this and understand how quantization helps us get to the final answer!
We can proceed by directly plugging the power series $h(u) = \sum_j a_ju^j$ into the differential equation, but that involves reindexing sums and can be
a bit complicated. Instead, we can just consider the terms with $u^j$s: we must start with a $u^{j+2}$ in the first term, a $u^j$
in the second term, and a $u^j$ in the third term, so we find that for all j ≥ 0, we require
$$(j+2)(j+1)\,a_{j+2} + \big(\mathcal{E} - 1 - 2j\big)a_j = 0.$$
(Specifically, we're using the fact that a power series $\sum_j a_ju^j$ of u can only be identically zero if each term is zero.)
In other words, the coefficients of our power series must be related as
$$(j+2)(j+1)\,a_{j+2} = (2j+1-\mathcal{E})\,a_j \implies a_{j+2} = \frac{2j+1-\mathcal{E}}{(j+2)(j+1)}\,a_j,$$
giving us a recurrence relation for the aj s. So now we can start with a0 and successively obtain a2 , a4 , a6 , and so on,
which gives us an even solution h(u), or we can start with a1 and successively obtain a3 , a5 , a7 , and so on, giving us
an odd solution. (Notice that specifying a0 and a1 is just like specifying the value of h(u) and h′ (u) at u = 0, so it
makes sense that those two values are all we need to get the full solution of the differential equation.)
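We can watch this recurrence in action with a short sketch in Python (the number of terms and the sample energies are arbitrary choices): at the special value ℰ = 5 the even series truncates after the u² term, while at a generic energy the coefficients go on forever:

```python
import numpy as np

# Sketch of the recurrence a_{j+2} = (2j + 1 - E) a_j / ((j+2)(j+1)):
# when the dimensionless energy is E = 2n + 1, the series truncates
# and h(u) is a polynomial of degree n.
def series_coeffs(E_dimensionless, n_terms=12, even=True):
    a = np.zeros(n_terms)
    a[0 if even else 1] = 1.0        # setting a_0 (or a_1) fixes the whole series
    for j in range(n_terms - 2):
        a[j + 2] = (2 * j + 1 - E_dimensionless) * a[j] / ((j + 2) * (j + 1))
    return a

print(series_coeffs(5.0, even=True))   # E = 2*2 + 1: zero after the u^2 term
print(series_coeffs(4.0, even=True))   # generic E: coefficients never vanish
```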
But now we must ask what happens to these coefficients aj , because as written the solution may go on forever.
For large j, we have
$$\frac{a_{j+2}}{a_j} = \frac{2j+1-\mathcal{E}}{(j+2)(j+1)} \sim \frac{2j}{j^2} = \frac2j,$$
so the coefficients do decay. But they aren't decaying fast enough – notice for example that
$$e^{u^2} = \sum_{n=0}^{\infty}\frac{1}{n!}\,u^{2n}.$$
Looking at the even powers j = 2n, we have terms of the form $\frac{1}{(j/2)!}u^j$, so the coefficients $c_j = \frac{1}{(j/2)!}$ satisfy
$\frac{c_{j+2}}{c_j} = \frac{1}{\frac j2 + 1} = \frac{2}{j+2} \sim \frac2j$ (no issues with fractional factorials because j is even). So the point is that if we don't truncate the recursion
relation at some point, our wavefunction will actually be unbounded, because $h(u)\sim e^{u^2}$, meaning that $\varphi(u)\sim e^{u^2/2}$
(the “safety factor” $e^{-u^2/2}$ isn't enough to make our normalization work). So the differential equation cannot work with
arbitrary energies – the normalization requirement now quantizes the energies for us, because the only way we
can make this work is to have $\frac{a_{j+2}}{a_j} = 0$ for some j, meaning that
$$\mathcal{E} = 2j + 1,$$
and for whichever value of j that is picked, we will have $a_{j+2} = 0$ (because the numerator in the recurrence is then 0).
Even for j = 0, we'll have a nonzero $a_0$ but not $a_2$, and that gives us a nontrivial wavefunction as well. So putting
everything together, our wavefunctions are of the form
$$\varphi_n(u) = h_n(u)\,e^{-u^2/2}$$
for some nonnegative integer n, where $h_n$ is a degree-n polynomial (an even function for even n and an odd function for odd n – remember that energy
eigenstates must always be either even or odd), and those correspond to $\mathcal{E} = 2n+1$ and thus an energy of
$$E = \frac{\hbar\omega}{2}(2n+1) = \hbar\omega\left(n + \frac12\right).$$
So we've arrived at a famous fact: the energy levels of a harmonic oscillator are
evenly spaced, except there is an offset of $\frac12\hbar\omega$ so that the ground state is already a little above E = 0.
Definition 108
The Hermite polynomial $H_n(u)$ is the polynomial satisfying the Hermite differential equation
$$\frac{d^2H_n}{du^2} - 2u\frac{dH_n}{du} + 2nH_n(u) = 0,$$
with normalization chosen so that $H_n(u)$ has leading term $2^nu^n$.
(We know that the leading term is of degree n, and because the Hermite differential equation is linear, we can
always scale as we wish.) A few small cases that we can keep in mind are $H_0(u) = 1$, $H_1(u) = 2u$, and $H_2(u) = 4u^2 - 2$, and there is also a generating function
$$e^{-z^2+2zu} = \sum_{n=0}^{\infty}\frac{z^n}{n!}\,H_n(u),$$
which we can use to compute Hermite polynomials by expanding out the left-hand side and collecting terms by powers
of z. And with this formula, we can indeed see that $H_n(u)$ begins with $2^nu^n$, and showing that these z-coefficients
indeed satisfy the Hermite differential equation turns out to be pretty simple as well.
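As a quick check (a sketch in Python using sympy's built-in hermite, which matches this normalization), we can list the first few Hermite polynomials and verify that each one satisfies the Hermite differential equation:

```python
import sympy as sp

# Sketch: sympy's hermite(n, u) has leading term 2^n u^n, and we can check
# that it solves H'' - 2u H' + 2n H = 0 exactly.
u = sp.symbols('u')
for n in range(4):
    Hn = sp.hermite(n, u)
    residual = sp.expand(sp.diff(Hn, u, 2) - 2 * u * sp.diff(Hn, u) + 2 * n * Hn)
    print(f"H_{n}(u) = {Hn},  Hermite ODE residual: {residual}")   # residual is 0
```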
But turning back to our original problem, we can substitute $u = \frac xa$ back in, so that we can see the form of our
original wavefunctions: we have
$$\varphi_n(x) = N_n\,H_n\!\left(\frac xa\right)e^{-\frac{x^2}{2a^2}},\qquad a^2 = \frac{\hbar}{m\omega}$$
for some normalization constant Nn , with corresponding energies
1
En = ℏω n + .
2
We’ll now turn to an operator-based algebraic approach: if we look back at the simple harmonic oscillator Hamiltonian
again, we can write it as a sum of squares
p̂ 2 p̂ 2
1 1
Ĥ = + mω 2 x̂ 2 = mω 2 x̂ 2 + 2 2 .
2m 2 2 m ω
We’ll now try to factorize the Hamiltonian essentially as the product of two factors: we wish to write Ĥ = V̂ † V̂ + c,
where V̂ † is the Hermitian conjugate of V̂ . In particular, adding the constant c only shifts our energies without doing
anything important, and then writing as V̂ † V̂ ensures that Ĥ is still Hermitian (because (ÂB̂)† = B̂ † † , and (V̂ † )† = V̂ ).
We’ll soon see why this is a useful thing to do, but first we’ll see how to actually factor in this case: if we had a2 − b2 ,
then we could factor as (a − b)(a + b), and now because we have a2 + b2 , it makes sense to factor as (a + i b)(a − i b)
instead. Since quantum mechanics inherently needs complex numbers, this is a reasonable thing to try: we in fact
almost have
p̂ 2 ?
2 p̂ p̂
x̂ + 2 2 = x̂ − i x̂ + i ,
m ω mω mω
but this is not exactly true because we have operators instead of numbers, so commutativity matters: indeed, we
instead have
p̂ 2
p̂ p̂ i
x̂ − i x̂ + i = x̂ 2 + 2 2 + [x̂, p̂].
mω mω m ω mω
Since the commutator here is $i\hbar$, the constant term is $-\frac{\hbar}{m\omega}$, so we instead have that
$$\hat x^2 + \frac{\hat p^2}{m^2\omega^2} = \left(\hat x - \frac{i\hat p}{m\omega}\right)\left(\hat x + \frac{i\hat p}{m\omega}\right) + \frac{\hbar}{m\omega}\,\mathbb{I},$$
where $\mathbb{I}$ denotes the identity operator. But now we can call the two terms in parentheses $\hat V^\dagger$ and $\hat V$, respectively – more specifically, we can define $\hat V = \hat x + \frac{i\hat p}{m\omega}$ and notice that $\hat V^\dagger$ is indeed the other term, because $\hat x$ and $\hat p$ are each their own Hermitian conjugate and the $i$ turns into a $-i$. So putting everything together and simplifying, our Hamiltonian is now
$$\hat H = \frac{1}{2}m\omega^2\,\hat V^\dagger\hat V + \frac{1}{2}\hbar\omega.$$
However, $\hat V$ and $\hat V^\dagger$ still have units here, so we'll simplify a bit more. We can do this by computing the commutator
$$[\hat V, \hat V^\dagger] = \left[\hat x + \frac{i\hat p}{m\omega},\ \hat x - \frac{i\hat p}{m\omega}\right] = \frac{2\hbar}{m\omega},$$
which tells us how to rescale $\hat V$ into something unit-free.
Definition 109
The destruction operator (also annihilation operator) $\hat a$ and creation operator $\hat a^\dagger$ are given by
$$\hat a = \sqrt{\frac{m\omega}{2\hbar}}\,\hat V, \qquad \hat a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\,\hat V^\dagger,$$
where $\hat V = \hat x + \frac{i\hat p}{m\omega}$.
We can verify that $\hat a$ and $\hat a^\dagger$ are unit-free, because we have $[\hat a, \hat a^\dagger] = 1$. (And because $\hat a^\dagger$ is different from $\hat a$, this is not a Hermitian operator.) Since we're defining $\hat a$ and $\hat a^\dagger$ in terms of $\hat x$ and $\hat p$, we can also get the equations the other way around, which are
$$\hat x = \sqrt{\frac{\hbar}{2m\omega}}\,(\hat a + \hat a^\dagger), \qquad \hat p = i\sqrt{\frac{m\omega\hbar}{2}}\,(\hat a^\dagger - \hat a).$$
Notice that from these expressions, it's consistent that $\hat x$ and $\hat p$ are Hermitian (taking the conjugate of the right-hand sides just swaps the roles of $\hat a$ and $\hat a^\dagger$, and the $i$ in $\hat p$'s expression becomes a $-i$, giving us the original expression again).
So now we can write our Hamiltonian in terms of $\hat a$ and $\hat a^\dagger$ instead of $\hat x$ and $\hat p$, since that was the original goal: we know that
$$\hat V^\dagger\hat V = \frac{2\hbar}{m\omega}\,\hat a^\dagger\hat a \implies \hat H = \hbar\omega\left(\hat a^\dagger\hat a + \frac{1}{2}\right).$$
So we’ve now factorized Ĥ using these creation and destruction operators, and now it turns out we’ll be able to solve
the harmonic oscillator while barely needing to solve any differential equations (we just need to solve a single first-order
differential equation)! Recall that we have the inner product
$$(\varphi, \psi) = \int \varphi^*(x)\,\psi(x)\,dx,$$
so the expectation value of the Hamiltonian in a state $\psi$ is
$$\langle H\rangle_\psi = (\psi, \hat H\psi) = \left(\psi, \hbar\omega\left(\hat a^\dagger\hat a + \frac{1}{2}\right)\psi\right) = \hbar\omega\,(\psi, \hat a^\dagger\hat a\,\psi) + \frac{\hbar\omega}{2}(\psi, \psi).$$
Because $\psi$ is normalized, the second inner product is 1, and now by the definition of a Hermitian conjugate, we can move the $\hat a^\dagger$ operator to the other side:
$$\langle H\rangle_\psi = \hbar\omega\,(\hat a\psi, \hat a\psi) + \frac{\hbar\omega}{2}.$$
Now $(\hat a\psi, \hat a\psi) \ge 0$ for any state $\psi$ (because plugging this into the definition of the inner product gives us the integral of a nonnegative quantity). Thus, we find that $\langle H\rangle_\psi \ge 0 + \frac{\hbar\omega}{2}$, and this is the advantage of writing the Hamiltonian as $\hat V^\dagger\hat V$: this flipping argument tells us that we always get energies bounded from below.
So now for any energy eigenstate $\psi$, $\langle H\rangle_\psi \ge \frac{\hbar\omega}{2}$, meaning that any energy eigenvalue is at least $\frac{\hbar\omega}{2}$. We know (from our previous method) that there is in fact an energy eigenstate of energy exactly $\frac{\hbar\omega}{2}$, and the way we arrive at that fact is by noting that we get equality only if $(\hat a\psi, \hat a\psi) = 0 \implies \hat a\psi = 0$. Thus, if there is a ground state $\psi_0$, it must satisfy
$$\hat a\psi_0 = 0 \implies \sqrt{\frac{m\omega}{2\hbar}}\left(\hat x + \frac{i\hat p}{m\omega}\right)\psi_0 = 0,$$
by plugging in our definition of $\hat a$. Removing the constant and converting everything to $x$-coordinates gives us the differential equation
$$\left(x + \frac{\hbar}{m\omega}\frac{d}{dx}\right)\psi_0(x) = 0,$$
and we’ve turned our second-order differential equation into a first-order differential equation by exploiting Hermiticity!
This differential equation is much easier to solve – we have
$$\frac{d\psi_0}{dx} = -\frac{m\omega}{\hbar}\,x\,\psi_0 \implies \psi_0(x) = N_0\,e^{-\frac{m\omega}{2\hbar}x^2},$$
which is normalized when $N_0 = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4}$. So our ground state is a perfect Gaussian, and we can indeed check that
$$\hat H\psi_0 = \hbar\omega\left(\hat a^\dagger\hat a + \frac{1}{2}\right)\psi_0 = \frac{\hbar\omega}{2}\,\psi_0,$$
because $\hat a$ acting on $\psi_0$ already makes that term 0, and we recover the correct ground state energy for the harmonic oscillator.
We’ll continue this discussion next time, since we still need to find all of the excited states. But basically, it turns out
â â is an important Hermitian operator called the number operator, and then using the fact that â↠− ↠â = 1 allows
†
us to “create” new energy eigenstates with the creation operator ↠. We’ll essentially see that ψ0 , ↠ψ0 , ↠↠ψ0 , · · ·
give us the full set of excited states for the harmonic oscillator!
Last time, we factorized the harmonic oscillator Hamiltonian as $\hat H = \hbar\omega\left(\hat a^\dagger\hat a + \frac{1}{2}\right)$, which allowed us to see that any energy eigenstate has energy at least $\frac{\hbar\omega}{2}$ and also allowed us to find the unique ground state $\psi_0$ using a first-order differential equation. (Recall that $\hat a$ is a linear combination of $\hat x$ and $\hat p$, such that $[\hat a, \hat a^\dagger] = 1$.) And we do expect to see a nondegenerate ground state, because there are no degeneracies in the bound state spectrum of a one-dimensional potential.
Definition 110
The number operator $\hat N$ for the harmonic oscillator is $\hat N = \hat a^\dagger\hat a$.

We can check that $\hat N$ is Hermitian (because the conjugate of a product is the product of the conjugates, in reverse order) and that $\hat N\varphi_0 = 0$ (because $\hat N\varphi_0 = \hat a^\dagger(\hat a\varphi_0) = \hat a^\dagger\cdot 0 = 0$). In particular, $\hat N$ is basically the Hamiltonian up to a linear shift, and it's unitless because $\hat a$ and $\hat a^\dagger$ are, so it'll be a useful thing to work with (as a dimensionless energy). Specifically, eigenstates of $\hat N$ and $\hat H$ are the same, with $E = \hbar\omega\left(N + \frac{1}{2}\right)$.
Today, we’ll explore how to use this number operator to get excited states of the harmonic oscillator, completing
our solution. We’ll first understand how it interacts with the other operators we have here, and this often means
computing relevant commutators. We have that
(because the â commutes with the â), which is a simple expression, and similarly
So the commutator of N̂ with either â or ↠gives back a number times either of those two operators, and those
numbers are why â, the destruction operator, is also called the lowering operator and why ↠is similarly called the
raising operator. We can also additionally compute (if we’re confused about what’s going on here, we should try
some small values of k to have a clear picture)
because we essentially have to move an â across a string of k ↠s, and each time we move once we gain a contribution
of (↠)k−1 [â, ↠] = (↠)k−1 . This should remind us of the commutators [p̂, x̂ n ] from earlier in the course as well.
Example 111
Taking a look at the $k = 2$ case,
$$[\hat a, (\hat a^\dagger)^2] = \hat a\hat a^\dagger\hat a^\dagger - \hat a^\dagger\hat a^\dagger\hat a = \left(\hat a\hat a^\dagger\hat a^\dagger - \hat a^\dagger\hat a\hat a^\dagger\right) + \left(\hat a^\dagger\hat a\hat a^\dagger - \hat a^\dagger\hat a^\dagger\hat a\right) = [\hat a, \hat a^\dagger]\hat a^\dagger + \hat a^\dagger[\hat a, \hat a^\dagger] = 2\hat a^\dagger.$$

So using these results, we find that we can do more with the number operator: we have
$$[\hat N, (\hat a^\dagger)^k] = \hat a^\dagger[\hat a, (\hat a^\dagger)^k] = k\,(\hat a^\dagger)^k,$$
by basically taking the equation in the line above and putting an extra $\hat a^\dagger$ in the left argument of the commutator (which commutes with the right argument), and similarly
$$[\hat N, \hat a^k] = -k\,\hat a^k.$$
We’ll now see how this all comes together – define the state
φ1 = ↠φ0 .
(Remember that âφ0 = 0, so we wouldn’t get an interesting quantum state if we tried applying the other operator.)
We may ask if this is an energy eigenstate, and the easiest way to check this is to see if it's a number eigenstate. Indeed, a useful manipulation is to say that
$$\hat N\varphi_1 = \hat N\hat a^\dagger\varphi_0 = \left([\hat N, \hat a^\dagger] + \hat a^\dagger\hat N\right)\varphi_0 = [\hat N, \hat a^\dagger]\,\varphi_0,$$
since $\hat N$ kills $\varphi_0$, so the $\hat a^\dagger\hat N\varphi_0$ term is zero. And the formulas for commutators we've established above mean that it's often easier to work with them than with products – since $[\hat N, \hat a^\dagger] = \hat a^\dagger$, we find that
$$\hat N\varphi_1 = \hat a^\dagger\varphi_0 = \varphi_1,$$
meaning that $\varphi_1$ is an eigenstate of $\hat N$ (and thus $\hat H$) of eigenvalue 1 (and thus energy $\hbar\omega\left(1 + \frac{1}{2}\right) = \frac{3}{2}\hbar\omega$). And this is the reason for the name "creation operator" – acting on the vacuum (lowest energy state) with $\hat a^\dagger$ gives us a new eigenstate. And if we want a more concrete expression for the wavefunction, we can write $\hat a^\dagger$ and $\varphi_0$ in terms of $x$ or $p$ and get an explicit formula.
We defined $\varphi_0$ to be a normalized energy eigenstate, and it makes sense to ask whether $\varphi_1$ is normalized. This might seem like it is even more difficult than normalizing $\varphi_0$, but in fact most of the work has already been done:
$$(\varphi_1, \varphi_1) = (\hat a^\dagger\varphi_0, \hat a^\dagger\varphi_0) = (\varphi_0, \hat a\hat a^\dagger\varphi_0) = (\varphi_0, [\hat a, \hat a^\dagger]\varphi_0) = (\varphi_0, \varphi_0) = 1$$
by the properties of the Hermitian conjugate and a similar commutator manipulation as above. (And learning how to do these kinds of algebraic tricks just comes with practice.)
To get the next state, we’ll try to define the state
1
= 25 ℏω. This time when checking normalization, we find that
so we have an energy eigenstate of energy ℏω 2 + 2
Now the blue â would kill φ0 , so we can substitute in a commutator with the subsequent ↠s:
75
Pulling out the 2 and then using the same commutator trick as before gives us
Proposition 112
The $n$th excited state of the simple harmonic oscillator is
$$\varphi_n = \frac{1}{\sqrt{n!}}(\hat a^\dagger)^n\varphi_0.$$
This state has number eigenvalue $n$ and thus energy eigenvalue $\hbar\omega\left(n + \frac{1}{2}\right)$.
These calculations can be checked by noting (for the number operator) that
$$\hat N\varphi_n = \frac{1}{\sqrt{n!}}\hat N(\hat a^\dagger)^n\varphi_0 = \frac{1}{\sqrt{n!}}[\hat N, (\hat a^\dagger)^n]\varphi_0 = \frac{1}{\sqrt{n!}}\,n\,(\hat a^\dagger)^n\varphi_0 = n\,\varphi_n,$$
and (for the normalization) that
$$(\varphi_n, \varphi_n) = \frac{1}{n!}\left(\varphi_0, \hat a^n(\hat a^\dagger)^n\varphi_0\right) = \frac{1}{n!}\left(\varphi_0, \hat a^{n-1}[\hat a, (\hat a^\dagger)^n]\varphi_0\right) = \frac{1}{n!}\left(\varphi_0, \hat a^{n-1}\cdot n(\hat a^\dagger)^{n-1}\varphi_0\right),$$
and then inductively pulling out factors of $(n-1), (n-2), \cdots, 2, 1$ to cancel out with the $\frac{1}{n!}$.
With this structure in place, we can notice that applying $\hat a^\dagger$ to any of the energy eigenstates gets us the next higher energy eigenstate (with some constant factor). On the other hand, applying $\hat a$ to an energy eigenstate gets us the next lower energy eigenstate, because it essentially cancels out one of the $\hat a^\dagger$s. To be more precise, notice that
$$\hat a\varphi_n = \frac{1}{\sqrt{n!}}\,\hat a(\hat a^\dagger)^n\varphi_0 = \frac{1}{\sqrt{n!}}[\hat a, (\hat a^\dagger)^n]\varphi_0 = \frac{1}{\sqrt{n!}}\,n(\hat a^\dagger)^{n-1}\varphi_0 = \frac{n}{\sqrt{n!}}\sqrt{(n-1)!}\,\varphi_{n-1} = \frac{n}{\sqrt{n}}\,\varphi_{n-1} = \sqrt{n}\,\varphi_{n-1},$$
and similarly we have the relation
$$\hat a^\dagger\varphi_n = \frac{1}{\sqrt{n!}}(\hat a^\dagger)^{n+1}\varphi_0 = \frac{1}{\sqrt{n!}}\sqrt{(n+1)!}\,\varphi_{n+1} = \sqrt{n+1}\,\varphi_{n+1}.$$
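These relations make $\hat a$ and $\hat a^\dagger$ very concrete: in the basis $\{\varphi_n\}$ they are matrices with $\sqrt{n}$ entries just off the diagonal. Here is a short numpy sketch (the truncation size is an arbitrary choice of mine, and any finite truncation misbehaves in its last row and column):

```python
import numpy as np

dim = 8
# a has sqrt(n) on the superdiagonal, implementing a phi_n = sqrt(n) phi_{n-1}
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
adag = a.T                      # a† has sqrt(n+1) on the subdiagonal
N = adag @ a
print(np.diag(N))               # number eigenvalues 0, 1, 2, ...
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(dim - 1)))  # [a, a†] = 1 away from the edge
```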
This gives us all of the tools we might need to do calculations like the following:
Example 113
Suppose we want to calculate the expectation values ⟨x̂⟩φn and ⟨p̂⟩φn in our energy eigenstates.
In our conventional language, this is difficult to compute, because we would need to write $\varphi_n$ in terms of a Hermite polynomial and then take a complicated integral. But there are very quick alternative calculations to get us the answer:
• For the expectation value of $\hat x$, we are essentially calculating an expression like $\int x\,(\varphi_n(x))^2\,dx$. But because we have a symmetric potential, all of the $\varphi_n(x)$s are either even or odd, so their squares are always even. Thus the integrand is odd and the expectation is always zero.
• For the expectation value of $\hat p$, because we have a stationary state, it doesn't make sense to have a nonzero momentum because "the state doesn't move." (And this is actually a more general result: the expectation of the momentum on a bound state with a real wavefunction is always zero, by integration by parts.)
But in preparation for more complicated calculations, we can also write out these expectations in terms of raising and lowering operators:
$$\langle\hat x\rangle_{\varphi_n} = 0 = (\varphi_n, \hat x\varphi_n) = \sqrt{\frac{\hbar}{2m\omega}}\left(\varphi_n, (\hat a + \hat a^\dagger)\varphi_n\right).$$
But now $(\hat a + \hat a^\dagger)\varphi_n$ is a linear combination of $\varphi_{n-1}$ and $\varphi_{n+1}$ (specifically, it is $\sqrt{n}\,\varphi_{n-1} + \sqrt{n+1}\,\varphi_{n+1}$), and the overlap of this with $\varphi_n$ is zero by orthogonality of the energy eigenstates (because they have different energy eigenvalues).
Another way to understand this orthogonality in the harmonic oscillator case is that if we wanted to compute something like
$$(\varphi_3, \varphi_2) = (\hat a^\dagger\hat a^\dagger\hat a^\dagger\varphi_0, \hat a^\dagger\hat a^\dagger\varphi_0) = (\varphi_0, \hat a\hat a\hat a\,\hat a^\dagger\hat a^\dagger\varphi_0),$$
there are too many $\hat a$s – two of them can cancel out with the $\hat a^\dagger$s, but the last one will kill $\varphi_0$, and this will always vanish. So whenever $m \ne n$, $(\varphi_m, \varphi_n)$ will indeed be zero.
Example 114
Suppose that we now want to calculate the uncertainty $\Delta x$ in $\varphi_n$, meaning that we must calculate $(\Delta x)^2 = \langle\hat x^2\rangle_{\varphi_n} - (\langle\hat x\rangle_{\varphi_n})^2 = \langle\hat x^2\rangle_{\varphi_n}$.
This time, we will need the full power of the creation and annihilation operators (because doing the integral with Hermite polynomials would require a lot of work or creativity): we find that
$$\langle\hat x^2\rangle_{\varphi_n} = (\varphi_n, \hat x^2\varphi_n) = \frac{\hbar}{2m\omega}\left(\varphi_n, (\hat a + \hat a^\dagger)(\hat a + \hat a^\dagger)\varphi_n\right).$$
Expanding out (what we really care about is the $\varphi_n$ coefficient in the second argument), we have
$$= \frac{\hbar}{2m\omega}\left(\varphi_n, (\hat a\hat a + \hat a^\dagger\hat a^\dagger + \hat a\hat a^\dagger + \hat a^\dagger\hat a)\varphi_n\right),$$
and now only the last two of the four terms give us a nonzero contribution by orthogonality, so we can use the relations (of $\hat a$ and $\hat a^\dagger$ acting on $\varphi_n$) above to get
$$= \frac{\hbar}{2m\omega}\left(\varphi_n, \left(\sqrt{n+1}\cdot\sqrt{n+1} + \sqrt{n}\cdot\sqrt{n}\right)\varphi_n\right) = \frac{\hbar}{2m\omega}(2n+1),$$
or we can rewrite in terms of the number operator, using that $\hat a\hat a^\dagger = [\hat a, \hat a^\dagger] + \hat a^\dagger\hat a = 1 + \hat N$, to find again that the squared uncertainty is
$$\langle\hat x^2\rangle_{\varphi_n} = \frac{\hbar}{2m\omega}\left(\varphi_n, (2\hat N + 1)\varphi_n\right) = \frac{\hbar}{2m\omega}(2n+1).$$
And similarly, we can calculate the uncertainty ∆p, giving us an explicit calculation of ∆x∆p for this state – this is left
as an exercise to us.
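As a check on that exercise, we can compute $\Delta x\,\Delta p$ numerically with the same ladder-operator matrices, in units where $\hbar = m = \omega = 1$ (an assumption made only for this sketch); the product should come out to $\hbar\left(n + \frac{1}{2}\right)$:

```python
import numpy as np

dim = 12
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
ad = a.T
x = np.sqrt(1/2) * (a + ad)          # x = sqrt(hbar/2mw)(a + a†), with hbar = m = w = 1
p = 1j * np.sqrt(1/2) * (ad - a)     # p = i sqrt(mw hbar/2)(a† - a)
for n in range(4):
    dx = np.sqrt((x @ x)[n, n].real)     # <x> = 0, so (Δx)^2 = <x^2>
    dp = np.sqrt((p @ p)[n, n].real)
    print(n, dx * dp)                    # prints n + 1/2, i.e. hbar(n + 1/2)
```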
With that, we’ll now turn to a new topic:
Definition 115
Scattering states are energy eigenstates that cannot be normalized.
The motivating examples to keep in mind here are the $e^{ipx/\hbar}$ infinite plane wave solutions – even though they individually cannot be normalized, they can be put together to create (normalizable) wave packets, which can properly represent particles. We'll start with a particularly illustrative example, the step potential:
Example 116 (Step potential)
Consider a one-dimensional system given by the potential
$$V(x) = \begin{cases} 0 & x < 0, \\ V_0 & x \ge 0. \end{cases}$$
Recalling that energy eigenstates must have energy at least min V (x), we know that any eigenstate will have
positive energy, and thus there are two qualitative cases, namely E < V0 and E > V0 . But it turns out we can just
solve one of the two cases and “do the other by analytic continuation” – no matter which case we’re in, the solution
will be non-decaying for x < 0 (because E > V ), and thus energy eigenstates will never be normalizable.
We’ll first do the case where E > V0 (so that the solution is sinusoidal for both x > 0 and x < 0), and we’ll
visualize a solution as “coming in from the left:”
Ae ikx + Be −ikx x < 0,
ψ(x) =
Ce ikx x ≥ 0.
This is a wave “coming in from the left” because putting in the e −iEt/ℏ factor into Ae ikx gives us an e i(kx−ωt) traveling
wave. Analogously to the systems studied in 8.03, this wave will then have a “reflected” and a “transmitted” component,
corresponding to the Be −ikx and Ce ikx parts, respectively. And we know what k and k must be from the Schrodinger
2 2 2mE 2 2m(E − V0 )
equation: since the kinetic energy of the particle is ℏ2mk , we must have k 2 = 2
and k = . Fur-
ℏ ℏ2
thermore, the wavefunction and its derivative must be continuous at x = 0 (since there are no delta functions),
so
k
A + B = C, i kA − i kB = i kC =⇒ A − B = C.
k
So there will be one free parameter among $A$, $B$, $C$, which is okay because $A$ is the "input" strength of the incoming wave – what matters is the values $\frac{B}{A} = \frac{k - \bar{k}}{k + \bar{k}}$ and $\frac{C}{A} = \frac{2k}{k + \bar{k}}$. And we'll see next time how to turn this into a wave packet for a physical particle!

The relevant ratios for reflection and transmission in this potential turn out to be
$$\frac{B}{A} = \frac{k - \bar{k}}{k + \bar{k}}, \qquad \frac{C}{A} = \frac{2k}{k + \bar{k}}, \qquad \text{where } k^2 = \frac{2mE}{\hbar^2},\ \bar{k}^2 = \frac{2m(E - V_0)}{\hbar^2}$$
(since the particle has different kinetic energies in the two parts of the potential and thus different de Broglie wavelengths). And if we now look at the limit where $E \to V_0$, we find that $\bar{k} \to 0$ and thus $B \to A$ and $C \to 2A$, yielding
the solution
$$\psi(x) = \begin{cases} 2A\cos kx & x < 0, \\ 2A & x \ge 0. \end{cases}$$
(This isn't normalizable, but neither is the original solution, so we don't need to worry too much about that.)
We can now discuss the conservation of probability principle from earlier in the course. We can imagine a probability current flowing into the step potential from the left, and if we imagine a narrow window around the discontinuity, we must have the same amount of probability current leaving that window as entering: recall that
$$J(x) = \frac{\hbar}{m}\,\mathrm{Im}\left(\psi^*\frac{\partial\psi}{\partial x}\right),$$
so here
$$J_L(x) = \frac{\hbar k}{m}\left(|A|^2 - |B|^2\right), \qquad J_R(x) = \frac{\hbar\bar{k}}{m}|C|^2.$$
Indeed, we see that the current doesn’t depend on the value of x and is (after some computation) equal for x < 0 and
x > 0 (keep in mind that all quantities in the following line are real numbers):
! 2 !
B 2
2
ℏk 2 ℏk k −k 2 ℏk 4kk 2 ℏk 2k ℏk 2
JL = 1− |A| = 1− |A| = |A| = |A|2 = |C| = JR .
m A m k +k m (k + k) 2 m k +k m
(And in fact, because this conservation law comes from Schrodinger's equation, it makes sense that the relations between $A$, $B$, and $C$ are also encoded in that equation.) But because the solution $\psi(x)$ is made up of two different oscillating components, we can also write $J_L = J_A - J_B$, where $J_A = \frac{\hbar k}{m}|A|^2$ and $J_B = \frac{\hbar k}{m}|B|^2$ are the probability currents brought by the incoming and reflected waves alone. (We'll thus also write $J_R = J_C$.) This suggests that we should define reflection and transmission coefficients in terms of this current:
$$R = \frac{J_B}{J_A} = \left|\frac{B}{A}\right|^2, \qquad T = \frac{J_C}{J_A} = \frac{\bar{k}}{k}\left|\frac{C}{A}\right|^2.$$
Essentially, a reflection coefficient of 0.1 would indicate that if we put particles into this potential from the left, then 10 percent of them would be reflected. (We don't actually have particles yet, but this is the intuition!) And importantly, $T$ is not just $\left|\frac{C}{A}\right|^2$, as we might be tempted to write – the coefficients originate from probabilities, not amplitudes. With these definitions, we can notice that (because $J_A - J_B = J_C$)
$$R + T = \frac{J_B}{J_A} + \frac{J_C}{J_A} = 1,$$
as we should expect with reflection and transmission coefficients in general. And when we have a wavepacket instead of a pure energy eigenstate, there will be some uncertainty in position and momentum, but the reflection and transmission probabilities will be basically given by $R$ and $T$ if the packet is localized near some particular energy.
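These formulas are easy to evaluate numerically – here is a small sketch (with $\hbar = m = 1$ and $V_0 = 1$, conventions chosen just for this check) verifying that $R + T = 1$ across a range of energies:

```python
import numpy as np

V0 = 1.0
E = np.linspace(1.01, 5.0, 400)           # energies above the step, E > V0
k = np.sqrt(2 * E)                        # hbar = m = 1
kbar = np.sqrt(2 * (E - V0))

R = ((k - kbar) / (k + kbar))**2          # |B/A|^2
T = (kbar / k) * (2 * k / (k + kbar))**2  # (kbar/k)|C/A|^2
print(np.allclose(R + T, 1.0))            # True: probability is conserved
print(R[0], R[-1])                        # strong reflection near E = V0, weak at high E
```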
Example 117
We’ll now continue our study of the step potential, looking at the E < V0 case.
The idea is to trust the principle of analytic continuation here (luckily, in this case it's not quite as complicated as the mathematical words might suggest), which basically means that we can write down the same solution for $x < 0$ as before (since the energy $E$ is still positive), with $k^2 = \frac{2mE}{\hbar^2}$. And for $x > 0$, we just replace the $e^{i\bar{k}x}$ with an $e^{-\kappa x}$ (the form of the solution we know to expect). This is achieved by replacing $\bar{k}$ with $i\kappa$ everywhere, which means that $\kappa^2 = \frac{2m(V_0 - E)}{\hbar^2}$ (we just pick up a negative sign). We could have also replaced $\bar{k}$ with $-i\kappa$, but the reason for our choice here is that we now have
$$\psi(x) = \begin{cases} Ae^{ikx} + Be^{-ikx} & x < 0, \\ Ce^{-\kappa x} & x \ge 0, \end{cases}$$
and $\psi$ decays exponentially to zero as $x \to \infty$, as it should. And by doing this, we save the time of having to compute $\frac{B}{A}$ and $\frac{C}{A}$ by matching continuity of $\psi$ and $\psi'$ again; instead, we can just replace $\bar{k}$ with $i\kappa$. We thus find that
$$\frac{B}{A} = \frac{k - i\kappa}{k + i\kappa} = \frac{-i(\kappa + ik)}{i(\kappa - ik)} = -\frac{\kappa + ik}{\kappa - ik},$$
which is a ratio of two complex numbers of the same magnitude. In other words (imagining the right triangle formed by the points $0$, $\kappa$, and $\kappa + ik$ in the complex plane), we have $\frac{B}{A} = -e^{2i\delta(E)}$, where
$$\delta(E) = \tan^{-1}\frac{k}{\kappa} = \tan^{-1}\sqrt{\frac{E}{V_0 - E}}$$
is a phase shift that depends on the energy of the eigenstate. But $\frac{C}{A}$ no longer plays the same role that it originally did – because we have a purely real solution for $x > 0$ this time, the probability current there is zero (because $\psi$ decays, the particle cannot have a positive rate of moving to the right). Instead, we have $J_C = 0$ and $J_A = J_B$ (consistent with $|A|^2 = |B|^2$ as discussed above). With this, we can rewrite the solution as
$$\psi(x) = \begin{cases} Ae^{ikx} - Ae^{2i\delta(E)}e^{-ikx} & x < 0, \\ Ce^{-\kappa x} & x > 0, \end{cases}$$
so that $|\psi(x)|^2 = 4|A|^2\sin^2(kx - \delta(E))$ for $x < 0$. If we let $x_0 = \frac{\delta(E)}{k}$ be the first positive $x$-coordinate where this (extrapolated) probability density would be zero, then we have the following sketch for $|\psi|^2$:
[Sketch of $|\psi|^2$ against $x$: oscillations for $x < 0$ joining onto a decaying exponential for $x > 0$, with the extrapolated sine vanishing at $x = x_0$.]
Since $\delta(E)$ ranges from 0 to $\frac{\pi}{2}$ as $E$ ranges from 0 to $V_0$ (and in fact the derivative $\frac{d\delta}{dE} = \frac{1}{2}\frac{1}{\sqrt{E(V_0 - E)}}$ is large near $E = 0$ and $E = V_0$), we see that near $E = 0$ the probability density is close to zero at $x = 0$, but near $E = V_0$ the probability density is near its maximum.
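To make these statements concrete, we can tabulate $\delta(E)$ and its derivative; a small sketch with $V_0 = 1$ (an arbitrary choice):

```python
import numpy as np

V0 = 1.0
E = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
delta = np.arctan(np.sqrt(E / (V0 - E)))        # runs from near 0 up to near pi/2
d_delta_dE = 1 / (2 * np.sqrt(E * (V0 - E)))    # large near both E = 0 and E = V0
print(delta / np.pi)       # in units of pi: approaches 1/2 as E -> V0
print(d_delta_dE)          # the time delay 2*hbar*delta'(E) blows up at the edges
```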
Example 118
We can now connect this solution form to a physical problem by considering wavepackets – this is very similar to
past discussions with the principle of stationary phase.
We’ll start by considering E > V0 (so that we do have a transmitted wave), and we’ll set A = 1 first:
e ikx + k−k e −ikx x < 0,
k+k
ψ(x) =
2k e ikx x > 0.
k+k
We’ll now superimpose waves of this form. First, we need to add the time-dependent factor e −iEt/ℏ to our whole
solution, then we need to pick the coefficient f (k) with which this solution appears, and finally we superimpose all
such solutions by integrating over k:
R
∞ f (k) e ikx + k−k e −ikx e −iEt/ℏ dk x < 0,
0 k+k
Ψ(x, t) = R
∞ f (k) 2k e ikx e −iEt/ℏ dk x > 0.
0 k+k
The key point here is that we can choose the bounds of integration (since we’re creating our own superposition of valid
solutions to the Schrodinger equation), and we should not integrate from −∞ to ∞ because only the k > 0 solutions
correspond to an incoming packet moving to the right. Furthermore, to ensure that we have a localized packet, f (k)
should be sharply peaked around some k = k0 . (In the past, the only way to compute this kind of integral numerically
was with a supercomputer, but we can quickly do it on a laptop now.) We'll split up the integral a bit by letting the incident wave, only defined for $x < 0$, be
$$\Psi_{\text{inc}}(x, t) = \int_0^\infty f(k)\,e^{ikx}e^{-iEt/\hbar}\,dk, \qquad E = \frac{\hbar^2k^2}{2m}.$$
The principle of stationary phase then tells us where each piece of the wavefunction is localized:
• For the incident wave, the condition is
$$\frac{d}{dk}\left(kx - \frac{Et}{\hbar}\right)\bigg|_{k = k_0} = 0 \implies x = \frac{\hbar k_0}{m}\,t,$$
and this means that the incident wave is propagating to the right at speed $\frac{\hbar k_0}{m}$ for $t < 0$ (this is consistent because $\Psi_{\text{inc}}$ is only defined for negative $x$ anyway), and it "hits the barrier" at $x = 0$ at time zero – after that, $\Psi_{\text{inc}}$ will be approximately zero, since we only look at negative $x$.
• For the reflected wave, our condition is similarly that
$$\frac{d}{dk}\left(-kx - \frac{Et}{\hbar}\right)\bigg|_{k = k_0} = 0 \implies x = -\frac{\hbar k_0}{m}\,t.$$
This also makes sense in that the reflected wave propagates to the left at a speed $\frac{\hbar k_0}{m}$ for $t > 0$ and only significantly contributes to the wavefunction after the incident wave hits the barrier.
• Finally, for the transmitted wave, the calculation is more complicated because we have $\bar{k}$ instead of $k$, and taking a derivative of $\bar{k}$ with respect to $k$ is more interesting. It turns out that we will have $x = \frac{\hbar\bar{k}_0}{m}\,t$ (and because $\Psi_{\text{trans}}$ only exists for $x > 0$, this is only significant for $t > 0$), and this means that the transmitted wave propagates to the right, but at a different speed.
We will now consider wavepackets for $E < V_0$ – a similar strategy for superposition gives us the incident wave (for $x < 0$)
$$\Psi_{\text{inc}}(x, t) = \int_0^\infty f(k)\,e^{ikx}e^{-iEt/\hbar}\,dk$$
and the corresponding reflected wave (also for $x < 0$, this time referring to our discussion of the phase $\delta(E)$ from above)
$$\Psi_{\text{ref}}(x, t) = -\int_0^\infty f(k)\,e^{-ikx}e^{2i\delta(E)}e^{-iEt/\hbar}\,dk.$$
The transmitted wave is much less interesting in this case, because there really isn't a transmitted wave at all. Instead, what's happening physically here is that we send in a wavepacket, look at the reflected shape, and use that to deduce the type of potential that the particle encountered. The stationary phase approximation again gives us $x = \frac{\hbar k_0}{m}\,t$ for the incoming wave, but this time the reflected wave is more complicated:
$$\frac{d}{dk}\left(-kx + 2\delta(E) - \frac{Et}{\hbar}\right)\bigg|_{k = k_0} = 0 \implies x = -\frac{\hbar k_0}{m}\left(t - 2\hbar\,\delta'(E)\right).$$
Since the reflected wave is only defined on x < 0, we see that we get a significant contribution when t > 2ℏδ ′ (E), or
approximately when t is positive, like before. But because of this additional 2ℏδ ′ (E) term, the wave does not bounce
perfectly off the barrier at x = 0; there is a time delay of 2ℏδ ′ (E) (which is large for E near 0 or V0 , as previously
discussed), and this type of expression is what is used in scattering theory to determine the type of potential we’re
scattering off.
To finish this lecture, we'll think a bit about the forbidden region (where $x > 0$ and $E < V_0$) and what the decaying exponential wavefunction means. In particular, we may ask what a particle looks like if we find it there with energy $E < V_0$, meaning that it naively has negative kinetic energy. It would be contradictory to say both that the particle is in the forbidden region and that it has energy less than $V_0$, and quantum mechanics evades this problem (this is not a completely precise argument, but it's enough to get the intuition) by arguing about uncertainty. Specifically, because the wavefunction decays as $e^{-\kappa x}$, the length scale over which we are likely to find the particle is on the order of $\frac{1}{\kappa}$ (recalling that $\kappa^2 = \frac{2m(V_0 - E)}{\hbar^2}$). But to say that the particle is in the forbidden region, we need to measure the particle with uncertainty $\Delta x < \frac{1}{\kappa}$, and thus the particle has an uncertainty in momentum $p > \frac{\hbar}{\Delta x} = \hbar\kappa$. This uncertainty then gives us an "uncertainty" kinetic energy of
$$KE = \frac{p^2}{2m} = \frac{\hbar^2\kappa^2}{2m} = V_0 - E,$$
and thus the "negative kinetic energy" is compensated for by the uncertainty in momentum – the total energy of the particle is now $E + (V_0 - E) = V_0$, and no contradiction arises – we'll just find a normal particle in the region $x > 0$.
Lecture 17: Ramsauer-Townsend and General Scattering
We’ll continue our discussion of scattering states today – last lecture, we analyzed the behavior of reflection and
transmission under a step potential, and today we’ll explore a different effect called resonance transmission (and how
it leads to the Ramsauer-Townsend effect) before turning to more general resonance problems. First, we will do a toy
problem with the finite square well:
Example 119
Consider the one-dimensional system given by the potential
$$V(x) = \begin{cases} -V_0 & -a < x < a, \\ 0 & \text{otherwise}, \end{cases}$$
and suppose we send an incoming wave into the potential from the left (meaning that we are looking for scattering states of $E > 0$).
We are interested in understanding reflection and transmission coefficients, but we know from last lecture’s dis-
cussion that energy eigenstates tell us most of the story – if a wavepacket is localized around some energy, then it’ll
behave like an energy eigenstate of that energy. At both x = −a and x = a, the discontinuity in the potential means
there may be reflection and transmission, and physically we can imagine that the particle might “bounce back and
forth” repeatedly between those two points.
Remark 120. In quantum mechanics, reflection and transmission can occur whether the potential goes up or down –
this is different from the classical case in which we can only reflect off of a higher barrier.
Even though many reflections may occur, we will still have $\psi(x) = Ae^{ikx} + Be^{-ikx}$ in the region $x < -a$, and similarly we will have $\psi(x) = Ce^{ik_2x} + De^{-ik_2x}$ (different wavenumber because we have a different kinetic energy) in the region $-a < x < a$. In particular (by the same logic as usual), we have $k^2 = \frac{2mE}{\hbar^2}$ and $k_2^2 = \frac{2m(E + V_0)}{\hbar^2}$. Finally, we will only have $\psi(x) = Fe^{ikx}$ in the region $x > a$ (no wave moving to the left) because there is nothing to bounce off of past $x = a$. Putting everything together,
$$\psi(x) = \begin{cases} Ae^{ikx} + Be^{-ikx} & x < -a, \\ Ce^{ik_2x} + De^{-ik_2x} & -a < x < a, \\ Fe^{ikx} & x > a. \end{cases}$$
We’ll reason about these coefficients by thinking about probability current, since we’ve already seen that this is a
good way to know how to define reflection and transmission coefficients. With the step potential, we had an extra
C 2
factor beyond the A because the two sides of the barrier were at different energies, but in this case the two sides
of the square well are at the same energy. Thus, it’s reasonable to believe that we just have
2 2
B F
R= , T =
A A
in this case, and this will make sense if it turns out that R + T = 1. Indeed, following the current conservation
calculations from last time, we know that JL ∼ |A|2 − |B|2 (the current to the left of x = −a) and JR ∼ |F |2 (the
current to the right of x = a), with the same proportionality constant, so |A|2 −|B|2 = |F |2 and we do have R +T = 1.
But to actually calculate those ratios, we do need to do some work – there are five variables and four boundary
conditions (two at each discontinuity), which makes sense because there’s one free variable of overall normalization.
We’ll skip those calculations here – it turns out that
1 1 V02
=1+ sin2 (2k2 a)
T 4 E(E + V0 )
(notice that the second term on the right-hand side is nonnegative, so T is indeed between 0 and 1). Looking at some
relevant limits, if sin2 (2k2 a) → 0 or E → ∞, then we actually have T → 1, giving us complete transmission (we’ll
discuss this more soon), and if E → 0, we have T → 0, giving us no transmission. To understand this more, we can
rewrite this expression in unit-free language: notice that
$$2k_2a = 2\sqrt{\frac{2ma^2(E + V_0)}{\hbar^2}} = 2\sqrt{\frac{2ma^2V_0}{\hbar^2}\left(1 + \frac{E}{V_0}\right)} = 2\sqrt{z_0^2(1 + e)} = 2z_0\sqrt{1 + e},$$
where we define the unit-free energy $e = \frac{E}{V_0}$ (relative to the depth of the potential), and $z_0^2 = \frac{2ma^2V_0}{\hbar^2}$ should look familiar from our past discussions of the finite square well. Similarly simplifying the prefactor in terms of $e$ gives us
$$\frac{1}{T} = 1 + \frac{1}{4e(1 + e)}\,\sin^2\!\left(2z_0\sqrt{1 + e}\right).$$
√
As we mentioned above, we are now interested in the energies E for which T = 1, which means that 2z0 1 + e = nπ
√
for some integer n. (Furthermore, because 1 + e ≥ 1, we must have n ≥ 2zπ0 .) Rearranging, we find that
n2 π 2 n2 π 2 V0 n 2 π 2 V0 n2 π 2 ℏ2
4z02 (1 + en ) = n2 π 2 =⇒ en = −1 + =⇒ E = −V 0 + = −V 0 + = −V 0 + .
4z02 4z02 2ma2 V0 /ℏ2 2m(2a)2
In this form, the answer can actually make some intuitive sense: we have resonant transmission (meaning $T = 1$) if the distance between the energy $E$ and the bottom of the well $-V_0$ is $\frac{n^2\pi^2\hbar^2}{2m(2a)^2}$, which is the $n$th energy level of an infinite (rather than finite) square well of width $2a$. And this is really because, from the point of view of the wavefunction, the condition $k_2(2a) = n\pi$ (for resonant transmission) is the same as $2a = n\cdot\frac{\lambda}{2}$. In other words, the de Broglie wavelength of the particle inside the well fits a half-integer number of times inside the well of length $2a$, and the energy eigenstate passes completely through the well in this case.
Example 121
For a numerical example, suppose we have a square well where $z_0 = \frac{13\pi}{4}$, so that we only get valid energies $E$ when $n \ge \frac{2z_0}{\pi} = \frac{13}{2}$.

(In other words, we need to start from the seventh lowest energy of the infinite square well to get a positive $E$.) We can then plug in numerically to find that
$$e_7 = \frac{E_7}{V_0} \approx 0.15976, \qquad e_8 = \frac{E_8}{V_0} \approx 0.51479, \qquad e_9 = \frac{E_9}{V_0} \approx 0.91716.$$
More generally, we can plot the transmission probability $T$ as a function of $e = \frac{E}{V_0}$: the curve oscillates as $e$ increases, touching complete transmission $T = 1$ exactly at the resonant energies $e_n$ and tending to 1 as $e \to \infty$.
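We can verify those numbers, and the complete transmission at each resonance, directly from the formula for $\frac{1}{T}$ – a quick sketch:

```python
import numpy as np

z0 = 13 * np.pi / 4

def T(e):
    """Transmission from 1/T = 1 + sin^2(2 z0 sqrt(1+e)) / (4 e (1+e))."""
    return 1.0 / (1.0 + np.sin(2 * z0 * np.sqrt(1 + e))**2 / (4 * e * (1 + e)))

for n in (7, 8, 9):
    en = -1 + n**2 * np.pi**2 / (4 * z0**2)   # resonant energies e_n
    print(n, round(en, 5), round(T(en), 6))   # 0.15976, 0.51479, 0.91716, each with T = 1
```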
Example 122
We’re now ready to talk about a famous experiment performed by Ramsauer and Townsend (two physicists) in
1921, elastic scattering of electrons off of rare gas atoms.
Here, “elastic scattering” means that no particles are created or destroyed, and the gas atoms have completely full
outer shells (so that they are inert noble gases and have high ionization energies). That means that we can essentially
visualize the gas atoms as a spherical cloud around a central nucleus, and from the point of view of an electron (by
Gauss’s law) there will be no electric field until we enter the cloud. At that point, the electric field will point inward,
so the electron will curve towards the nucleus and eventually exit the cloud at a different angle. So this is kind of like
a finite spherical well, analogous to the finite square well we’ve been discussing above!
To turn this into a situation where reflection and transmission coefficients make sense, we can say that particles
are reflected if they scatter significantly and are transmitted if they pass through. (More mathematically, we could talk
about the scattering cross-section.) But what experiments found was that the reflection coefficient R (as a function
of e) also exhibited this “bouncy” behavior as in the plot above, starting off very high but reaching a local minimum
around 1 eV (which corresponds to the electron moving at around 600 kilometers per second). And this sensitivity
of the reflection coefficient to the energy of the particle comes from a very similar reasoning of resonance within the
well! But the explanation didn’t come until years later when quantum mechanics was more fully developed, and if we
want the exact numbers we’ll have to do a three-dimensional version of the calculation above.
In the rest of this lecture (and the next few as well), we’ll now expand our discussion to more general one-dimensional
scattering.
Example 123
Suppose we have a one-dimensional system where there is an infinite barrier left of $x = 0$ and a finite $R$ above which the potential is zero (called the range of the potential):
$$V(x) = \begin{cases} 0 & x > R, \\ V(x) & 0 < x < R, \\ \infty & x < 0. \end{cases}$$
The scattering is performed by having an observer “throw” incoming waves in from x = +∞, which reflect off of
the barrier. We then wish to measure the outgoing wave to deduce information about the potential, and this is indeed
what is done in particle colliders in modern physics when multiple particles interact with each other.
Remark 124. The infinite barrier left of x = 0 makes sense if we think about x as taking the place of the radial
coordinate r in three dimensions – it’s never possible to be a negative distance away from the origin. And in a future
quantum physics course, we might study scattering in three dimensions, and this will be a useful tool in that discussion.
We’ll begin with the simplest case where V (x) = 0 for 0 < x < R, and thus we are just scattering off of an infinite
wall with no additional potential. Incident and outgoing waves will then look like e −ikx and e ikx respectively (both
are energy eigenstates and thus valid solutions), and we can combine these together into a solution to the Schrodinger
equation of the form
φ(x) ∼ e ikx − e −ikx
e ix −e −ix
so that the wavefunction is zero at x = 0. Remembering that sin x = 2i , we thus get the solution φ(x) = sin(kx)
−ikx ikx
for x > 0, but we can still think of this as being a combination of an incoming (− e 2i ) and outgoing ( e2i ) plane wave.
This gives us an inspiration for what to write in the more general case with a nonzero potential: we'll have an incoming wave $-\frac{e^{-ikx}}{2i}$ just like before, but that is only valid for $x > R$ (because plane waves are not energy eigenstates where the potential is nonzero). The outgoing wave is then going to be of the form $c\,\frac{1}{2i}e^{ikx}$ for some constant $c$ (that's the only other energy eigenstate at energy $E$ besides $e^{-ikx}$), but remembering our discussion about probability current, we must actually have $|c| = 1$ by conservation of probability (there is no transmission to $x < 0$, and in a stationary state we cannot have probability accumulating or depleting inside the potential).

In summary, the potential can only influence the outgoing plane wave by a phase, and we'll write the outgoing wave as $\frac{1}{2i}e^{ikx}e^{2i\delta}$ for some phase $\delta$. (And remembering that $\delta$ depends on the energy $E$ of the plane waves we're sending in, we see that we can get $\delta$ as a function of $E$ or $k$, which can give us substantial information about $V(x)$.) The phase is only defined up to a multiple of $2\pi$, but for reasons we'll see next lecture, it turns out to be convenient to fix the phase $\delta$ at $k = 0$ and then make the phase continuous as a function of $k$ (so if it keeps going clockwise, we keep increasing it even past $2\pi$). The total solution to the Schrodinger equation, incoming plus reflected, is thus
$$\psi(x) = \frac{1}{2i}\left(e^{ikx + 2i\delta} - e^{-ikx}\right) = e^{i\delta}\,\frac{e^{i(kx + \delta)} - e^{-i(kx + \delta)}}{2i} = e^{i\delta}\sin(kx + \delta)$$
for $x > R$. (And if $\delta = 0$, we're back to the original wavefunction for the "no potential" case.)
Comparing the probability density $|\varphi(x)|^2 = \sin^2(kx)$ (no potential) with $|\psi(x)|^2 = \sin^2(kx + \delta)$ (with potential), we see that the same features of the wavefunctions (maxima, nodes, and so on) occur in both cases, but shifted by a constant distance $-\frac{\delta}{k}$. (Specifically, $kx = a_0$ occurs at $x = \frac{a_0}{k}$, but $kx + \delta = a_0$ occurs at $x = \frac{a_0}{k} - \frac{\delta}{k}$.) So when $\delta > 0$, the wavefunction is pulled into the potential (in other words, the potential is attractive), and when $\delta < 0$, the wavefunction is pushed outward (the potential is repulsive). And in preparation for next lecture, we'll make the following definition:
Definition 125
The scattered wave $\psi_s(x)$ is the extra component of the wavefunction relative to the no-additional-potential case, defined via
$$\psi(x) = \varphi(x) + \psi_s(x).$$
Remembering that $\psi$ and $\varphi$ were defined to have the same incoming wave $-\frac{e^{-ikx}}{2i}$, we see that $\psi_s$ represents how much more of an outgoing wave we get with the potential between 0 and $R$, and it must be an outgoing wave itself. And because we have the explicit expressions for the two waves, we see that
$$\psi_s(x) = \psi(x) - \varphi(x) = \frac{1}{2i}\left(e^{2i\delta} - 1\right)e^{ikx} = e^{i\delta}\sin\delta\,e^{ikx}$$
(only valid on $x > R$) when we do have a nonzero potential. As mentioned at the end of last lecture, we often write $\psi(x) = \varphi(x) + \psi_s(x)$, where the scattered wave turns out to be the outgoing wave $\psi_s(x) = e^{i\delta}\sin\delta\,e^{ikx}$ for $x > R$.
Here, $A_s = e^{i\delta}\sin\delta$ is called the scattering amplitude, and we are often interested in $|A_s|^2 = \sin^2\delta$ to quantify the effect of the nonzero potential.
Today, we’ll connect this to the concept of time delay by calculating what happens when we send in wavepackets
into this potential. (Recall that we’ve previously discussed this in the context of the step potential.) If we have an
incident wave of the form Z ∞
Ψinc (x, t) = f (k)e −ikx e −iE(k)t/ℏ dk
0
in the region x > R (here we put back the time dependence to get the time-dependent wavefunction), then we know
the reflected wave that corresponds to it will be
Z ∞
Ψref (x, t) = − f (k)e ikx e 2iδ(E) e −iE(k)t/ℏ dk
0
in the region x > R (negative sign and e 2iδ factors coming from the expression for the reflected wave). As usual,
$f(k)$ will be a real function that peaks sharply at some $k = k_0$. And now we can keep track of how the peak of the wavepacket moves by doing the usual stationary phase approximation at $k = k_0$ – we'll skip the calculations because we've done them a few times already. The incident wave turns out to satisfy $x = -\frac{\hbar k_0}{m}t = -v_gt$ for a constant group velocity $v_g$ (and this is only valid for $t < 0$), while the reflected wave satisfies $x = v_g(t - 2\hbar\delta'(E))$. So the reflected wave moves at the same group velocity as the incident wave, but it only starts to appear after a delay (which can be positive or negative) of
$$\Delta t = 2\hbar\,\delta'(E) = 2\hbar\,\frac{d\delta}{dk}\bigg|_{k = k_0}\frac{dk}{dE} = 2\hbar\,\frac{d\delta}{dk}\bigg|_{k = k_0}\frac{1}{\hbar v_g} = \frac{2}{v_g}\,\frac{d\delta}{dk}\bigg|_{k = k_0}$$
(we have total rather than partial derivatives, so there's nothing to worry about with these manipulations). This equation can be further manipulated to
$$\frac{1}{R}\frac{d\delta}{dk} = \frac{\Delta t}{2R/v_g}.$$
This is now a unitless relation, and the right-hand side compares the time delay ∆t to the time it takes to traverse
the finite-range potential and back (length 2R) at the group velocity vg ! So this ratio of the delay to the free transit
time gives us a sense of whether the potential has caused a significant delay in how the wavepacket travels.
We’ll now do an example where we can calculate everything explicitly (though the formulas may still be rather
messy and are only insightful if we plot with a computer):
Example 126
Suppose we have a potential well of length $a$, meaning that our potential is of the form (for some $V_0 > 0$)
$$V(x) = \begin{cases} \infty & x < 0, \\ -V_0 & 0 < x < a, \\ 0 & \text{otherwise}. \end{cases}$$
Since this falls under the general setup we've been using, we know already that $\psi(x) = e^{i\delta}\sin(kx + \delta)$ for $x > a$ (this is the specific combination of $e^{ikx}$ and $e^{-ikx}$ that is necessary), where $k^2 = \frac{2mE}{\hbar^2}$ as usual. And for $0 < x < a$, we must have a combination of $e^{ik'x}$ and $e^{-ik'x}$, where $k'^2 = \frac{2m(E + V_0)}{\hbar^2}$, but it is convenient to use trig functions instead – we can only have something proportional to $\sin(k'x)$ because the wavefunction must vanish at 0. This gives us the ansatz (not putting an additional normalization in the $x > a$ case just to remove a variable)
$$\psi(x) = \begin{cases} e^{i\delta}\sin(kx + \delta) & x > a, \\ A\sin(k'x) & x < a, \end{cases}$$
with the additional boundary conditions that $\psi$ and $\psi'$ must be continuous at $x = a$, meaning that
$$A\sin(k'a) = e^{i\delta}\sin(ka + \delta), \qquad Ak'\cos(k'a) = ke^{i\delta}\cos(ka + \delta).$$
Much like with the finite square well, we can divide the two equations to get
$$k'\cot(k'a) = k\cot(ka + \delta) \implies \cot(ka + \delta) = \frac{k'}{k}\cot(k'a).$$
We can now isolate $\delta$ by taking $\cot^{-1}$ of both sides, but there's a bit of trigonometric manipulation we can do now: specifically, we can use the fact that $\cot(A + B) = \frac{\cot A\cot B - 1}{\cot A + \cot B}$ and then solve for $\cot\delta$, which yields
$$\boxed{\cot\delta = \frac{\tan(ka) + \frac{k'}{k}\cot(k'a)}{1 - \frac{k'}{k}\cot(k'a)\tan(ka)}.}$$
Given any value of the energy $E$, we can then calculate $k'$ and $k$ and find the phase shift by plugging in those values. But if we want to plot this on a computer, the right variables are the unitless ones such as $ka$ and $k'a$: define
$$u^2 = (ka)^2 = \frac{2mEa^2}{\hbar^2},$$
so that
$$(k'a)^2 = \frac{2mEa^2}{\hbar^2} + \frac{2mV_0a^2}{\hbar^2} = u^2 + z_0^2,$$
with the standard unit-free length scale $z_0$ for the finite square well! So now taking the reciprocal of both sides of the boxed equation, and using the fact that $\frac{k'a}{ka} = \sqrt{1 + \frac{z_0^2}{u^2}}$, we have
$$\tan\delta = \frac{1 - \frac{k'a}{ka}\cot(k'a)\tan(ka)}{\tan(ka) + \frac{k'a}{ka}\cot(k'a)} = \frac{1 - \sqrt{1 + \frac{z_0^2}{u^2}}\,\cot\!\left(\sqrt{z_0^2 + u^2}\right)\tan u}{\tan u + \sqrt{1 + \frac{z_0^2}{u^2}}\,\cot\!\left(\sqrt{z_0^2 + u^2}\right)},$$
and now $z_0$ is a property of the potential well we're given – for any well of depth $V_0$ and width $a$, we can plug in the constant $z_0$ and plot $\tan\delta$ as a function of $u$. And notice that as $u \to 0$, the numerator approaches a constant (because the $\sqrt{1 + \frac{z_0^2}{u^2}}$ diverging and the $\tan u$ going to zero balance each other out), while the denominator goes to infinity because of its second term. So $\tan\delta \to 0$ as $u \to 0$, and we can pick the phase shift to be $\delta = 0$ at $u = 0$ (rather than $2\pi$, for example).
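Here is a minimal sketch of that computation (note that it only reports $\delta$ modulo $\pi$, since the arctangent branch needed to make $\delta$ continuous is not tracked here):

```python
import numpy as np

def tan_delta(u, z0sq):
    """tan(delta) for the half well, as a function of u = ka, given z0^2."""
    kpa = np.sqrt(u**2 + z0sq)                       # k'a
    ratio = (kpa / u) / np.tan(kpa)                  # (k'a/ka) cot(k'a)
    return (1 - ratio * np.tan(u)) / (np.tan(u) + ratio)

u = np.array([0.05, 0.5, 0.87, 1.5, 3.0])
print(np.arctan(tan_delta(u, 3.4)))   # z0^2 = 3.4; delta passes -pi/2 near u ≈ 0.87
```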
We can now study the behavior of $\delta$ versus $u = ka$ for various values of $z_0$:
• When $z_0^2 = 3.4$ (corresponding to $z_0 \approx 0.59\pi$), $\delta(u)$ starts off at 0, becomes more negative, and then stabilizes at $-\pi$ for large $u$. The scattering amplitude is strongest when $\delta = -\frac{\pi}{2}$, so $\sin^2\delta$ peaks once. Finally, the unit-free delay (the derivative $\frac{d\delta}{du}$) starts around $-4$ and asymptotically goes to zero.

  [Plots of $\delta(u)$ and $\sin^2\delta(u)$: $\delta$ descends from 0 through $-\frac{\pi}{2}$ near $u \approx 0.87$ and levels off at $-\pi$, while $\sin^2\delta$ rises to 1 at that crossing and then falls back toward 0.]
In particular, notice that a negative time delay corresponds to the packet coming back earlier than ordinarily
expected. But this makes sense, because the kinetic energy of the packet increases when the potential is lower
and thus the group velocity is larger.
As a sidenote, if we plot the amplitude A as a function of u, we get a graph similar to that in Example 121.
• The behavior remains very similar as we increase $z_0$, until a sudden change happens. In particular, by the time we are at $z_0 = 5$, $\delta(u)$ descends all the way to $-2\pi$, and $\sin^2\delta$ now peaks twice.

  [Plots of $\delta(u)$ and $\sin^2\delta(u)$ for $z_0 = 5$, with two circles indicating the points, near $u \approx 1.11$ and $u \approx 7.52$, at which $\delta$ crosses $-\frac{\pi}{2}$ and $-\frac{3\pi}{2}$.]
Exploring the limiting behavior of $\delta$ for different values of $z_0$, we see that it either goes back to 0 (for small $z_0$), or it approaches one of $-\pi, -2\pi, -3\pi$, and so on. And it turns out that there is a fundamental relation between the overall excursion in the phase $\delta$ and the number of bound states of the potential! We can check as an exercise that this "half-square well" has all of the odd bound states of the full finite square well, and we found (see calculations in Example 91) that we determine how many of these solutions we have by seeing if $z_0$ is larger than specified multiples of $\frac{\pi}{2}$.
It turns out these numbers will indeed increase at the same time, and this leads us to Levinson’s theorem. This
is the most subtle derivation we’ll do in this class, and we’ll need to use the following fact:
Fact 127
As we vary a potential, the energy eigenstates will vary, but they do not appear or disappear.
This may seem confusing given our discussion above, but when we think about the states of (for example) a finite
square well, we should remember that there are an infinite number of scattering states above the finitely many
bound states! So if we decrease the depth of a finite square well, eventually the highest-energy bound state will change
into a scattering state. And if we increase the depth, eventually the scattering states will “lend” the bound states an
extra state. This may seem like an argument that doesn’t make much sense because we have uncountably infinitely
many states, but one way to make this rigorous is to imagine putting this whole system in a very large box of length L,
making even the positive-energy states discrete. Then as we change the depth of our well, we’ll indeed see the states
of positive energy drop down and change into states of negative energy, and still nothing is created or destroyed.
Theorem 128 (Levinson’s theorem)
Suppose we are in the finite-range potential setting in Example 123. Then the number of bound states N of the
potential is predicted by the behavior of scattering via
1
N= (δ(0) − δ(∞)) .
π
Proof. Motivated by the discussion above, in order to avoid a continuum of states, we will also place an infinitely high
potential barrier for x > L, where L ≫ R. (This is sometimes called a regulator.) Then our states will be quantized,
and for large L, they will be close to each other and look a lot like scattering states (but still be bound states in reality).
And we’ll take the limit as L → ∞ – if the answer doesn’t depend on L, then we can claim the answer holds in the
limit as well.
If we think about the case with no potential $V(x)$, then all energy eigenstates have positive energy and are of the form $\varphi(x) = \sin(kx)$, where $\varphi(L) = 0$ (the boundary condition for an infinite square well). Thus, we require $kL = n\pi$ for some positive integer $n$, and now indeed $k$ can only take on integer multiples of $\frac{\pi}{L}$ (meaning that the energy states are discrete). Within a differential $dk$, we then have $dn = \frac{L}{\pi}\,dk$ positive energy states in that range.

On the other hand, in the case where we do have a potential, we can't solve the Schrodinger equation in general, but we do know that we have the universal solution $\psi(x) = e^{i\delta}\sin(kx + \delta)$ in the region $R < x < L$, and again we must demand $\psi(L) = 0$. So this time we require $kL + \delta(k) = n'\pi$ for some integer $n'$, meaning that for any interval $dk$, the number of positive energy states is given by
$$L\,dk + \frac{d\delta}{dk}\,dk = \pi\,dn' \implies dn' = \frac{L}{\pi}\,dk + \frac{1}{\pi}\frac{d\delta}{dk}\,dk.$$
If we then look over the entire range of $k$-values (in the positive real axis), we can count the difference in the number of positive-energy states. In particular, if we imagine doing a slow deformation from $V = 0$ to $V = V(x)$, the states will also slowly vary (using the logic of Fact 127), and we can count how many energy states are gained or lost in that interval:
$$\text{positive-energy states lost in } dk \text{ when potential is turned on} = dn - dn' = -\frac{1}{\pi}\frac{d\delta}{dk}\,dk,$$
and thus the total number of energy states lost overall is the integral
$$\text{positive-energy states lost} = \int_0^\infty -\frac{1}{\pi}\frac{d\delta}{dk}\,dk = \frac{1}{\pi}\left(\delta(0) - \delta(\infty)\right),$$
because the integrand is just a total derivative. But (again by Fact 127) those states cannot disappear – they must
have become bound states. Since we have zero bound states at V = 0, the number of bound states for the potential
V (x) must be the boxed quantity above.
In particular, this argument gets around the fact that we have infinitely many scattering states in both cases – by
counting the differences in number of states in infinitesimal regions, we are able to reason about the count of the
finitely many bound states.
Today, we'll talk about resonances – situations in which the time delay of a scattering process becomes very large at some particular energies $E$.
We’ll start with the same setting as last lecture, with a finite-range potential within 0 < x < R and an infinite
barrier for x < 0. Our first question will concern the time delay 2ℏ dδ(E)
dE , which we know can be positive or negative –
specifically, we want to ask if the time delay can be arbitrarily negative (meaning that the packet comes back much
earlier than if we had no potential). The answer is no - essentially, it wouldn’t make sense if the particle came back
earlier than if there was a hard (infinite) wall at x = R, because there’s nowhere for the particle to be reflected before
that point – this has to do with causality. More explicitly, because the wavepacket travels at speed vg , we have
dδ(E) 2R
time delay∆t = 2ℏ ≳−
dE vg
(≳ instead of ≥ because our argument isn’t completely rigorous here, and it does turn out there’s a bit of a correction
dE ℏ2 k
we need to make), and this rearranges to (because dk = m = ℏvg
1 dδ 1 dδ 2R dδ
2ℏ dE = 2ℏ ≳− =⇒ ≳ −R .
dk
dk ℏv g dk vg dk
On the other hand, we may ask whether we can have an arbitrarily positive time delay, and this time the answer turns
out to be yes. We know that if we have a positive potential V0 in the range 0 < x < R, that will delay particles of
energy E > V0 , but that isn’t very special because it will also advance particles with energy E < V0 (they will reflect
off of the barrier). So what we really need to do is to trap the particle in the potential:
Example 129
Consider a potential of the form
$$V(x) = \begin{cases} \infty & x < 0, \\ -V_0 & 0 < x < a, \\ V_1 & a < x < 2a, \\ 0 & x > 2a, \end{cases}$$
where $V_0, V_1 > 0$. This potential will have states of energy $0 < E < V_1$ which look like bound states but will instead act as resonances with very large time delay.

We've done problems like this before, so we'll skip some of the calculation – qualitatively, the solution is exponentially decaying in the region $a < x < 2a$ and sinusoidally oscillating in the others. The relevant constants are then
$$k'^2 = \frac{2m(E + V_0)}{\hbar^2}, \qquad \kappa^2 = \frac{2m(V_1 - E)}{\hbar^2}, \qquad k^2 = \frac{2mE}{\hbar^2}$$
in the three regions respectively, and we will try to find what values of $k$ make the amplitude $A$ inside the well (defined below) very large. Within the middle region $a < x < 2a$, we can either use $e^{\kappa x}$ and $e^{-\kappa x}$, or we can use $\cosh(\kappa x)$ and $\sinh(\kappa x)$. But a third choice is easiest for imposing boundary conditions, namely $\sinh(\kappa(x - a))$ and $\cosh(\kappa(x - a))$ (we can check that each of these is also a linear combination of the original exponentials and solves the Schrodinger equation in this region). Since $\cosh(0) = 1$ and $\sinh(0) = 0$, our solution must look like
$$\psi(x) = \begin{cases} A\sin(k'x) & 0 < x < a, \\ A\sin(k'a)\cosh(\kappa(x - a)) + B\sinh(\kappa(x - a)) & a < x < 2a, \\ e^{i\delta}\sin(kx + \delta) & x \ge 2a. \end{cases}$$
We now just need to pick $A$, $B$, and $\delta$ so that $\psi$ and $\psi'$ are continuous at $x = a$ and $x = 2a$ (we already ensured continuity of $\psi$ at $a$). It turns out that the relation between all of our variables is
$$\tan(2ka + \delta) = \frac{k}{\kappa}\cdot\frac{\sin(k'a)\cosh(\kappa a) + \frac{k'}{\kappa}\cos(k'a)\sinh(\kappa a)}{\sin(k'a)\sinh(\kappa a) + \frac{k'}{\kappa}\cos(k'a)\cosh(\kappa a)}$$
(the numerator and denominator are just slight rearrangements of each other).
Rewriting this in terms of unitless quantities, we will define
$$u = ka, \qquad z_0^2 = \frac{2mV_0a^2}{\hbar^2}, \qquad z_1^2 = \frac{2mV_1a^2}{\hbar^2},$$
so that the unit-free energy (comparing the energy of our packet to the height $V_1$ of the tall barrier) can be written as
$$e = \frac{E}{V_1} = \frac{\hbar^2k^2a^2}{2mV_1a^2} = \frac{u^2}{z_1^2}.$$
So if we’re given a fixed potential (in particular the values of a, V0 , and V1 ), the right-hand side now becomes a function
of u, because z0 and z1 just become numbers. We can then vary u to see the behavior of δ – it’s messy, but we can
do it in principle.
Example 130
For a numerical example, let's consider $z_0^2 = 1$ and $z_1^2 = 5$, and we'll plot $\delta$ as a function of $u$ (which must range between 0 and $z_1 = \sqrt{5}$ if we are indeed looking at energies $0 < E < V_1$).

The dot on the curve indicates the point where $\tan(2ka + \delta) = 0$, and $u^* \approx 1.8523$ marks the point where $\delta$ crosses $-\frac{\pi}{2}$ for the second time:

[Plot of $\delta(u)$ from $u = 0$ to $\sqrt{5}$: $\delta$ descends roughly linearly at first, then changes very rapidly near $u^*$, passing $-\frac{\pi}{2}$ on its way toward $-\pi$; the scaling on the $x$-axis is 4 times larger than on the $y$-axis.]
The slope of this graph is about $-2$ for small $u$, because for low energies the tall barrier will essentially make the packet bounce back at $x = 2a$ instead of $x = 0$ (giving $\delta \approx -2ka = -2u$), and the delay is proportional to the derivative $\frac{d\delta}{du}$. But at a certain point around $u^*$, $\delta$ jumps very quickly. If we then plot $\sin^2\delta$, the amplitude of the scattered wave $|A_s|^2$ will peak around the two values where $\delta = -\frac{\pi}{2}$, and will do so more sharply at $u^*$.
However, the first point where $\delta = -\frac{\pi}{2}$ is not very notable – since the derivative of $\delta$ is negative (and not very large) there, this corresponds to a time advance, and we've already discussed above that $\Delta t$ cannot be very negative. On the other hand, the peak of $|A_s|^2$ at $u^*$ instead corresponds to a time delay in which the particle gets trapped in the barrier – this is the resonance that we're looking for. So if we look at the overall delay, which is proportional to the derivative $\frac{d\delta}{du}$, it will be negative for small $u$, sharply peaking to a large positive value around $u^*$, and then returning to being slightly negative. And for basically the same reasons, if we plot the amplitude $|A|$ of the wavefunction inside the well, we will see a huge jump (to around 3) when $u \approx u^*$ and not anywhere else.
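Here is a sketch of how one might locate the resonance numerically from the matching relation above (the unwrapping trick for keeping $\delta$ continuous is my own choice, and the value $u^* \approx 1.8523$ quoted above is from the notes, not something this sketch is guaranteed to reproduce exactly):

```python
import numpy as np

z0sq, z1sq = 1.0, 5.0
u = np.linspace(0.01, np.sqrt(z1sq) - 0.01, 20001)
kpa = np.sqrt(u**2 + z0sq)                 # k'a
kap = np.sqrt(z1sq - u**2)                 # kappa*a (real, since 0 < E < V1)

num = np.sin(kpa)*np.cosh(kap) + (kpa/kap)*np.cos(kpa)*np.sinh(kap)
den = np.sin(kpa)*np.sinh(kap) + (kpa/kap)*np.cos(kpa)*np.cosh(kap)
delta = np.arctan((u/kap) * num/den) - 2*u  # from tan(2ka + delta) = RHS, with ka = u
delta = np.unwrap(2*delta) / 2              # remove the pi-jumps of the arctan branch

ustar = u[np.argmax(np.gradient(delta, u))] # resonance: sharp positive spike in d(delta)/du
print(ustar)                                # should land near 1.85
```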
We can now ask how we could solve for resonances like this more mathematically, and we'll do this by modeling what a resonance near some point $k = \alpha$ would look like. We claim that the graph of $\delta$ (as a function of $k$) typically looks something like
$$\tan\delta = \frac{\beta}{\alpha - k} \iff \delta = \tan^{-1}\frac{\beta}{\alpha - k}$$
around $k = \alpha$, for some $\beta > 0$. This is reasonable, because we get a large value of $\sin^2\delta$ around $\delta = \frac{\pi}{2}$, and $\frac{\beta}{\alpha - k}$ goes to $\pm\infty$ around $k = \alpha$. Specifically, as $k \to \alpha^-$, $\tan\delta \to \infty$, and we can characterize how quickly it goes to $\infty$ by noting that $\frac{\beta}{\alpha - k} = 1$ when $k = \alpha - \beta$:

[Plot of $\tan\delta = \frac{\beta}{\alpha - k}$ against $k$: the curve passes through 1 at $k = \alpha - \beta$ and diverges at $k = \alpha$.]
Similarly, $\tan\delta$ reaches $-1$ at $k = \alpha + \beta$, so most of the action in $\delta$ is happening within a narrow band of width $2\beta$. So to get sharp behavior (and strong resonance), we must set up the system to have $\beta$ small. If we then plot $\delta$ (remember that we choose the multiple of $\pi$ in the arctangent so that $\delta$ is continuous), we'll see that it is exhibiting the correct behavior as well:

[Plot of $\delta = \tan^{-1}\frac{\beta}{\alpha - k}$: the phase climbs rapidly by nearly $\pi$ over the band of width $\sim 2\beta$, crossing $\frac{\pi}{2}$ at $k = \alpha$.]
We can then calculate some relevant quantities for this modeled resonance:
$$\frac{d\delta}{dk}\bigg|_{k = \alpha} = \frac{1}{\beta}, \qquad |\psi_s|^2 = |A_s|^2 = \sin^2\delta = \frac{\beta^2}{\beta^2 + (\alpha - k)^2}.$$
The derivative here makes sense because the phase is changing by almost $\pi$ over a length scale of $\beta$, and the function for the amplitude is actually the famous (non-relativistic) Breit-Wigner (also Cauchy or Lorentzian) distribution, usually written in terms of energy rather than momentum. If we let $E_\alpha$ be the energy of a particle at $k = \alpha$, we'll make the approximation
$$E - E_\alpha = \frac{\hbar^2k^2}{2m} - \frac{\hbar^2\alpha^2}{2m} = \frac{\hbar^2}{2m}(k^2 - \alpha^2) = \frac{\hbar^2}{2m}(k - \alpha)(k + \alpha) \approx \frac{\hbar^2}{2m}(k - \alpha)\cdot 2\alpha = \frac{\hbar^2\alpha}{m}(k - \alpha).$$
Substituting this in, we get the formula (now written in the conventional notation) for the squared magnitude of the scattered wave:
$$|\psi_s|^2 = \frac{\frac{1}{4}\Gamma^2}{(E - E_\alpha)^2 + \frac{1}{4}\Gamma^2}, \qquad \Gamma = \frac{2\alpha\beta\hbar^2}{m}.$$
Here, $\Gamma$ is the "full width at half maximum" of the distribution, which is the distance between the two points on the curve where $|\psi_s|^2 = 0.5$:

[Plot of $|\psi_s|^2$ against $E$: a peak of height 1 centered at $E_\alpha$, passing through 0.5 at $E_\alpha - \frac{\Gamma}{2}$ and $E_\alpha + \frac{\Gamma}{2}$.]
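A tiny numerical check of that statement, in arbitrary units ($E_\alpha$ and $\Gamma$ below are made-up numbers, chosen only for illustration):

```python
import numpy as np

Ealpha, Gamma = 0.0, 2.0
E = np.linspace(-10, 10, 200001)
psi2 = (Gamma**2 / 4) / ((E - Ealpha)**2 + Gamma**2 / 4)

above_half = E[psi2 >= 0.5]
print(above_half.max() - above_half.min())  # ≈ Gamma: the full width at half maximum
```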
Looking more at the physical significance of $\Gamma = \frac{2\alpha\beta\hbar^2}{m}$, we notice that it has units of energy, so we can get a characteristic time
$$\tau = \frac{\hbar}{\Gamma} = \frac{m}{2\alpha\beta\hbar}.$$
If we then compare this time to the time delay we've been studying,
$$\Delta t = 2\hbar\frac{d\delta}{dE} = 2\hbar\frac{dk}{dE}\frac{d\delta}{dk},$$
and now because (at resonance $k = \alpha$) we have $\frac{d\delta}{dk} = \frac{1}{\beta}$ and $\frac{dk}{dE} = \frac{1}{dE/dk} = \frac{1}{\hbar^2\alpha/m}$, substituting in gives us
$$\Delta t = 2\hbar\,\frac{m}{\hbar^2\alpha}\,\frac{1}{\beta} = \frac{2m}{\alpha\beta\hbar} = 4\,\frac{\hbar}{\Gamma} = 4\tau.$$
In other words, the half-width of the scattering distribution gives us a characteristic energy, which is associated with
a characteristic time that is related to the physical time delay of the scattering problem! So the larger the time-delay,
the smaller our width Γ, and the more sharply peaked our distribution will be. (And this corresponds to δ changing
very rapidly around k = α.)
Fact 131
This kind of analysis is used in nuclear and particle physics – the Higgs boson is an unstable particle which decays very quickly, but it stays within the potential well during its lifetime (about $10^{-22}$ seconds). So its cross-section is governed by resonances, and that's a more accurate description than really thinking of it as a particle. We can then observe that the Higgs boson has a central energy around $E_\alpha \approx 125$ GeV and a width of $\Gamma \approx 4$ MeV (to about 5 percent accuracy). So we have a very narrow resonance, and the corresponding time $\Delta t$ indeed turns out to be around $10^{-22}$ seconds with those numbers! So the discovery of the Higgs boson had to do with observing that a resonance did occur at this particular energy.
In summary, we find that we cannot have long time advances, but that long time delays can occur in situations of resonance (rapid positive changes of $\delta$). We'll go a little further to put this topic on a more intriguing footing: we can rewrite the scattering amplitude as
$$A_s = \sin\delta\,e^{i\delta} = \frac{\sin\delta}{e^{-i\delta}} = \frac{\sin\delta}{\cos\delta - i\sin\delta} = \frac{\tan\delta}{1 - i\tan\delta}.$$
If we want this amplitude to be large, we can say that setting tan δ = −i would make As infinite. Since δ is a phase shift, tan δ is always real, so it doesn’t seem to make sense to set it equal to −i. However, we can imagine extending our formulas into the complex plane, and the idea is that a real value of k near a complex point where As blows up will still give a large amplitude. To make this more precise, let’s plug in the tan δ that we’ve been modeling:
As = (β/(α − k)) / (1 − iβ/(α − k)) = β/(α − k − iβ) = β/((α − iβ) − k).
And because we usually plot the scattering amplitude As as a function of k, it’s nice that we now have an expression
in that form, and we can now imagine plotting As in the complex k-plane. We designed our model so that the
resonance occurs at k = α, but As really becomes infinitely large when k = α − i β, rather than when k = α. (In
complex analysis, this is referred to as a pole.) So because we have a pole at α − i β, we’ll have large values of As
near that pole.
So this is how we mathematically search for resonances: if we have a formula for δ(k), then we can search for
solutions to the equation tan δ(k) = −i . The real part of that solution k will then be the resonance α, and the smaller
the imaginary part of k, the more resonant our system actually is.
This viewpoint is additionally interesting because in the formula E = ℏ²k²/(2m), we can now imagine plugging in a pure imaginary k = iκ (for some real number κ > 0). Then we now have a negative energy E = −ℏ²κ²/(2m), which can instead
represent bound states of our potential! So the complex k-plane has room for bound states, scattering states, and
resonances, and in fact bound states will correspond to poles of As as well. We can even go a bit further and invent
anti-bound states on the negative imaginary axis, in which instead of matching our solution to a purely decaying
exponential in the forbidden region, we match it to a purely increasing exponential, and this turns out to actually have
applications in nuclear physics!
Our starting point today is that there is a certain active view of observables that we haven’t explored much yet. Specifically, we’ve learned that the momentum operator p̂ = (ℏ/i) ∂/∂x can be thought of as a differential operator (an x-derivative) which tells us how a function varies. In fact, this can be rephrased to saying that the momentum operator actually generates translations and moves functions in space, and the way it generates translations (and the universal trick with Hermitian operators) is to exponentiate it. If we want to exponentiate p̂, we need to make it unitless, and we’ll do this by considering the operator e^{ip̂a/ℏ} for some length a. Having this act on a wavefunction ψ(x) then yields
e^{ip̂a/ℏ} ψ(x) = e^{a ∂/∂x} ψ(x) = Σ_{n=0}^∞ (aⁿ/n!) dⁿψ/dxⁿ = ψ(x) + a dψ/dx + (a²/2!) d²ψ/dx² + (a³/3!) d³ψ/dx³ + · · · = ψ(x + a).
So this exponential operator takes ψ(x) and translates it, and this characterization of momentum as a “generator of
translations” is in some ways more fundamental than what we’ve already discussed! And similarly, angular momentum
will then end up being a “generator of rotations” instead, but we’ll need a bit more mathematics to discuss it.
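As a sanity check (a sympy sketch on an arbitrary Gaussian, not part of the lecture), we can truncate the exponential series and watch it reproduce ψ(x + a):

```python
import sympy as sp

x, a = sp.symbols('x a')
psi = sp.exp(-x**2)   # an arbitrary smooth sample wavefunction

# Truncation of e^{a d/dx} psi = sum_n (a^n/n!) d^n psi/dx^n
N = 12
translated = sum(a**n / sp.factorial(n) * sp.diff(psi, x, n) for n in range(N))

vals = {x: 0.3, a: 0.5}
print(float(translated.subs(vals)))           # truncated series, ~0.5273
print(float(psi.subs(x, x + a).subs(vals)))   # psi(x + a) = exp(-0.64) = 0.5273
```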
There’s another story as well – we’ll see as the class progresses that angular momentum represents not just physical
rotations but also spin angular momentum (the mysterious property that particles can have angular momentum
without actually rotating). But let’s start from the beginning with the setup. As we’ve been mentioning throughout
this class, many of the objects we work with in one dimension generalize to three dimensions as well: for example, the
momentum operator becomes
p̂x = (ℏ/i) ∂/∂x,  p̂y = (ℏ/i) ∂/∂y,  p̂z = (ℏ/i) ∂/∂z  ⟹  p⃗ = (ℏ/i) ∇,
with the commutators [x̂, p̂x] = [ŷ, p̂y] = [ẑ, p̂z] = iℏ. The time-independent Schrodinger equation then becomes
−(ℏ²/2m) ∇²ψ(⃗r) + V(⃗r)ψ(⃗r) = Eψ(⃗r)
for a three-dimensional vector ⃗r, where we are trying to find an energy eigenstate ψ(⃗r) of energy E.
In this class, we’ll simplify the problem to the case where we have a central potential (in other words, spherically
symmetric and invariant under rotations) which only depends on the magnitude of r :
V (⃗r) = V (r ).
This is a strong assumption, but it is the setting of many three-dimensional problems that we will face, and the
simplification allows angular momentum to play a role (because “angular momentum generates rotations”). And
having a central potential simplifies our calculations, because the Laplacian ∇2 in spherical coordinates looks like
∇²ψ = (1/r²) ∂/∂r(r² ∂ψ/∂r) + (1/(r² sin θ)) ∂/∂θ(sin θ ∂ψ/∂θ) + (1/(r² sin²θ)) ∂²ψ/∂φ².
Our goal today will basically be to build up a structure that allows us to ignore the second term with the angular
derivatives. It turns out that we can treat the expression
−ℏ² [(1/sin θ) ∂/∂θ(sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²]
as the differential version of angular momentum (much like (ℏ/i) ∂/∂x was the differential version of momentum). And it makes sense that this expression represents squared angular momentum rather than angular momentum itself, for a variety of reasons: to make the units match up (since L⃗ should have units of ℏ, while this expression carries ℏ²), because angular momentum should be a first-order derivative (⃗r × p⃗ involves one derivative) while there is a second-order derivative in the expression above, and because angular momentum is supposed to be a vector while this expression is a scalar. This will end up being the operator L⃗².
Fact 132
The main 3-D system we’re going to be discussing in the remainder of this course is the hydrogen atom, which
is a system of two particles. We’ll justify (next lecture) that the Schrodinger equation above is relevant in a
two-body problem with potential only depending on the distance |⃗x₁ − ⃗x₂|, so we can reduce the two-body problem (in general) to a one-body problem with a spherically symmetric potential.
For now, we’ll focus on developing the theory of angular momentum operators. Since we have L⃗ = ⃗r × p⃗ classically, we can make an analogous definition quantum mechanically (which will turn out to be good):
Definition 133
The angular momentum operators are defined in the cyclic manner
L̂x = ŷ p̂z − ẑ p̂y,  L̂y = ẑ p̂x − x̂ p̂z,  L̂z = x̂ p̂y − ŷ p̂x.
(Notice that because position and momentum operators in different directions commute, it doesn’t matter whether
we write ŷ p̂z or p̂z ŷ , and so on.) These angular momentum operators are Hermitian, because
L̂x† = (ŷ p̂z − ẑ p̂y)† = p̂z† ŷ† − p̂y† ẑ†,
but all operators on the right-hand side are Hermitian so the daggers go away, and then we can switch the order
back to the original order by commutativity. So L̂†x = L̂x , and L̂x is Hermitian (similar logic works for the other two
operators). In other words, the operators L̂i s must be observables, but they’ll turn out to be not so simple in certain
ways.
Our first step once we define quantum operators is often to compute their commutators: first, let’s compute
[L̂x, L̂y] = [ŷ p̂z − ẑ p̂y, ẑ p̂x − x̂ p̂z].
We start by looking at the first term in the first argument, ŷ p̂z . Because ŷ commutes with everything in the sec-
ond argument, and p̂z only has a nonzero commutator with ẑ, the only nonzero commutator here is [ŷ p̂z , ẑ p̂x ] =
ŷ [p̂z , ẑ]p̂x = −i ℏŷ p̂x . (Here, we’re using properties that simplify expressions of the form [A, BC] and [AB, C] – we
can check these steps more carefully as an exercise.) Similarly, the only nonzero commutator coming from −z p̂y is
[−ẑ p̂y , −x̂ p̂z ] = p̂y [ẑ , p̂z ]x̂ = i ℏp̂y x̂, meaning that
[L̂x, L̂y] = iℏ(x̂ p̂y − ŷ p̂x) = iℏ L̂z.
So even though there were a lot of terms in the commutator, the end result simplifies very nicely! It turns out this
is not that miraculous – if we have symmetry transformations and take their commutator, we must get a symmetry
back. (And for some classical intuition, we know that when we rotate in different directions, the order does matter, so
this commutator should be nonzero.) Furthermore, because we defined our angular momentum operators in a cyclic
way, we must also have
[L̂y , L̂z ] = i ℏL̂x , [L̂z , L̂x ] = i ℏL̂y .
These relations give us the quantum algebra of angular momentum, and this algebra in fact appears in many different
fields of mathematics and physics – it’s related to the algebra of generators of the special unitary group SU(2), as well
as the orthogonal group of rotations O(3). So the structure is universal and deeper than the derivation – if there are
other sets of operators which satisfy these commutation relations but don’t come from x̂s and p̂s, they can still be
angular momentum operators, and that is in fact what happens with spin (in which we have [Ŝx , Ŝy ] = i ℏŜz and so
on).
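The algebra is easy to verify concretely – here is a small sympy sketch (an illustration, not from the lecture) that checks [L̂x, L̂y] = iℏL̂z by acting with the differential operators on an arbitrary function:

```python
import sympy as sp

x, y, z, hbar = sp.symbols('x y z hbar')
f = sp.Function('f')(x, y, z)   # an arbitrary test function

# Angular momentum operators as differential operators, L = (hbar/i) r x grad
Lx = lambda g: -sp.I * hbar * (y * sp.diff(g, z) - z * sp.diff(g, y))
Ly = lambda g: -sp.I * hbar * (z * sp.diff(g, x) - x * sp.diff(g, z))
Lz = lambda g: -sp.I * hbar * (x * sp.diff(g, y) - y * sp.diff(g, x))

# [Lx, Ly] f - i*hbar*Lz f should vanish identically
print(sp.expand(Lx(Ly(f)) - Ly(Lx(f)) - sp.I * hbar * Lz(f)))   # prints 0
```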
Fact 134
Mathematicians approach this topic as the study of Lie algebras, and the job there is basically to classify all
potential consistent commutation relations – the one we’ve described is the simplest nontrivial one.
Since we have observables, it now makes sense to ask whether we can actually “observe” them, in the sense that there is a state with a definite value of L̂x, L̂y, or L̂z. Recall that because position and momentum do not commute, we cannot tell simultaneously the values of both at once. And these angular momentum operators don’t commute either, so we basically cannot have simultaneous eigenstates of L̂x, L̂y, and L̂z. More rigorously, suppose that we
have a simultaneous eigenstate φ0 of L̂x and L̂y, with corresponding eigenvalues λx and λy. Then
L̂x L̂y φ0 = λy L̂x φ0 = λx λy φ0,  L̂y L̂x φ0 = λx L̂y φ0 = λx λy φ0,
so subtracting yields [L̂x, L̂y]φ0 = 0 ⟹ iℏ L̂z φ0 = 0. This means that φ0 is then also an eigenstate of L̂z with
λz = 0. But now that φ0 is an eigenstate of all three operators, applying the same logic to a different commutator
then tells us that λx = λy = 0 as well, so the only way to be a simultaneous eigenstate is if φ0 is killed by all of the
operators L̂x , L̂y , and L̂z , which is not very interesting. Thus, no state is a nontrivial eigenstate of two angular
momentum operators, and we can only ever “measure” one of the values of L̂x , L̂y , and L̂z at a given time.
But it is possible to know more about the state, and this basically comes down to thinking of another operator
that commutes with all of L̂x , L̂y , and L̂z (meaning that it must be rotationally invariant). And this is now connected
to an object we introduced earlier in the lecture:
Definition 135
The operator L⃗² is defined to be
L⃗² = L̂x L̂x + L̂y L̂y + L̂z L̂z.
It turns out that this definition is equivalent to the differential operator −ℏ²[(1/sin θ) ∂/∂θ(sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²] mentioned above, but we’ll work
with the more algebraic definition for now. In particular (just doing a direct computation),
[L̂x, L⃗²] = [L̂x, L̂x L̂x + L̂y L̂y + L̂z L̂z] = [L̂x, L̂y L̂y] + [L̂x, L̂z L̂z]
= [L̂x, L̂y]L̂y + L̂y[L̂x, L̂y] + [L̂x, L̂z]L̂z + L̂z[L̂x, L̂z]
= (iℏL̂z)L̂y + L̂y(iℏL̂z) + (−iℏL̂y)L̂z + L̂z(−iℏL̂y).
(Remember that we can move constants around but should not switch the order of the angular momentum operators.) And now the first and last terms cancel, as do the second and third, and thus L̂x and L⃗² commute, meaning that it is possible to be a simultaneous eigenstate of both operators! And similarly, L⃗² commutes with L̂y and L̂z as well.
Fact 136
We’ll see more of this in 8.05, but it turns out (by linear algebra) that for any two commuting Hermitian operators,
we can always find simultaneous eigenstates of both operators. (The reason for wanting simultaneous eigenstates is to discover more specific information about our states, and so we will generally aim for the maximal set of commuting operators.)
Returning to coordinates now, remember that in spherical coordinates we have
x = r sin θ cos φ,  y = r sin θ sin φ,  z = r cos θ,
and the special role that z plays (over x and y) means that the angular momentum operator L̂z looks the nicest in spherical coordinates, because rotations about z simply shift the value of φ (leaving r and θ unchanged). In particular, the multivariable chain rule tells us that
∂/∂φ = (∂x/∂φ) ∂/∂x + (∂y/∂φ) ∂/∂y + (∂z/∂φ) ∂/∂z.
The last term is zero because z doesn’t depend on φ, and we can compute the derivatives of x and y with respect to φ: we have ∂y/∂φ = r sin θ cos φ = x, and similarly ∂x/∂φ = −y, so
∂/∂φ = x ∂/∂y − y ∂/∂x.
Remembering that momentum operators are differential operators, we thus see that L̂z = (ℏ/i)(x ∂/∂y − y ∂/∂x), meaning that L̂z = (ℏ/i) ∂/∂φ in spherical coordinates! (The units of this expression are correct, because angles have no units and angular momentum should have units of ℏ.) And using this kind of technique (but more work), we can also get expressions for L̂x, L̂y, and L⃗², the latter of which agrees with our expression above. So the angular momentum operator does indeed relate to the angular component of the Laplacian.
Turning our attention back now to simultaneous eigenstates, we now wish to construct functions which are eigenstates of both L⃗² and one of L̂x, L̂y, and L̂z. (For such functions, the angular part of the Laplacian is just a number, which will allow us to simplify the process of solving the Schrodinger equation to just solving a differential equation for the radial component.) Because L̂z looks simple in spherical coordinates, we always choose L̂z instead of the others by convention.
We’ll denote the eigenstate ψℓ,m(θ, φ) for some numbers ℓ, m related to the eigenvalues (which are currently arbitrary). Since L̂z should have units of ℏ, it makes sense to require that
L̂z ψℓ,m = ℏm ψℓ,m,
and similarly, since L⃗² has units of ℏ²,
L⃗² ψℓ,m = ℏ²λ ψℓ,m.
We can say something about λ right away: taking inner products and using Hermiticity,
ℏ²λ (ψℓ,m, ψℓ,m) = (ψℓ,m, L⃗²ψℓ,m) = (ψℓ,m, L̂x L̂x ψℓ,m) + (ψℓ,m, L̂y L̂y ψℓ,m) + (ψℓ,m, L̂z L̂z ψℓ,m)
= (L̂x ψℓ,m, L̂x ψℓ,m) + (L̂y ψℓ,m, L̂y ψℓ,m) + (L̂z ψℓ,m, L̂z ψℓ,m).
The right-hand side is now nonnegative (because we’re calculating a squared norm), so the left-hand side must be
nonnegative as well, meaning λ ≥ 0. Anticipating this, we will actually write
λ = ℓ(ℓ + 1)
for some real number ℓ ∈ R – this exact choice will make sense very soon as we return to the differential equation, and
we’re not losing any generality here because ℓ(ℓ + 1) can take on any nonnegative real value (and even some negative
ones). In fact, because the range of ℓ(ℓ + 1) over nonnegative ℓ is [0, ∞), we can always pick ℓ to be nonnegative.
We’ll now solve the two eigenvalue equations to work towards an explicit formula for ψℓ,m. To satisfy the
first equation, we must have
(ℏ/i) ∂ψℓ,m/∂φ = ℏm ψℓ,m ⟹ ∂ψℓ,m/∂φ = im ψℓ,m ⟹ ψℓ,m(θ, φ) = e^{imφ} Pℓm(θ)
for some arbitrary function Pℓm . So the φ-dependence is not complicated if our state is an eigenstate of L̂z , but we
can actually say a bit more – we must have ψℓ,m (θ, φ + 2π) = ψℓ,m (θ, φ), because φ is a coordinate that is only defined
modulo 2π anyway. Thus we must require e^{2πim} = 1, or in other words that m must be an integer. So the eigenvalues
of L̂z actually need to be quantized – they can only be integer multiples of ℏ.
For the second equation, a bit more work needs to be done. Substituting in the expression for L⃗², we have
−ℏ² [(1/sin θ) ∂/∂θ(sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²] ψℓ,m = ℏ²ℓ(ℓ + 1) ψℓ,m.
We already know how the ∂²/∂φ² acts on ψℓ,m (each derivative contributes an im factor), and then the ℏ²s and e^{imφ}s on both sides cancel out. Further multiplying by −sin²θ, we are instead solving the differential equation
[sin θ ∂/∂θ(sin θ ∂/∂θ) − m²] Pℓm = −ℓ(ℓ + 1) sin²θ Pℓm.
Remember that sin θ ∂/∂θ (sin θ ∂/∂θ) means that we apply these operators from right to left, so applying them to Pℓm is really sin θ d/dθ (sin θ dPℓm/dθ). So moving everything to one side, and remembering that Pℓm is just a function of a single variable, we wish to find solutions to
sin θ d/dθ (sin θ dPℓm/dθ) + [ℓ(ℓ + 1) sin²θ − m²] Pℓm = 0.
This equation has been studied extensively (because it comes up when we are studying the Laplacian), and one useful substitution is the variable x = cos θ, so that d/dx = −(1/sin θ) d/dθ and sin θ d/dθ = −(1 − x²) d/dx (we should check this as an exercise). But with these facts, plus the fact that sin²θ = 1 − x², our differential equation can be rewritten to only involve polynomials: after some simplification (dividing by (1 − x²)), we arrive at
d/dx[(1 − x²) dPℓm/dx] + [ℓ(ℓ + 1) − m²/(1 − x²)] Pℓm(x) = 0,
where our (now normalized) simultaneous eigenstates are ψℓ,m = Nℓ,m e^{imφ} Pℓm(cos θ) for some numbers Nℓ,m. This
differential equation still looks rather complicated, and the way physicists approach it is usually to start with the m = 0
case (where Pℓ0 is usually just denoted Pℓ ):
d/dx[(1 − x²) dPℓ/dx] + ℓ(ℓ + 1) Pℓ(x) = 0.
We can solve this with a series solution: letting Pℓ(x) = Σ_k a_k x^k and substituting in, the x^k coefficient gives us the recursion relation
a_{k+2}/a_k = (k(k + 1) − ℓ(ℓ + 1)) / ((k + 1)(k + 2)).
And just like with the harmonic oscillator, if our series solution doesn’t terminate, we’ll get singular solutions diverging
at x = 1 or x = −1, which is bad (since those points are included in the range of the wavefunction). So the series
must terminate at some point, and that can only occur if ℓ(ℓ + 1) − k(k + 1) = 0 for some nonnegative integer k.
And now we see exactly why we chose this form for the eigenvalue of L⃗² – since we’re picking ℓ to be nonnegative (as mentioned earlier in the lecture), ℓ = k must be a nonnegative integer. So Pℓ(x) is a polynomial of degree ℓ (these polynomials are known as the Legendre polynomials), and we find quantization of the eigenvalues of L⃗²! In
other words, both the angular momentum along the z-direction and the overall magnitude of the angular momentum
are quantized. Next lecture, we’ll explore these equations some more, understanding what happens for nonzero m and
what additional constraints exist for our operators.
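As a quick check of the termination story (a sketch, with a made-up helper name), we can build Pℓ directly from the series recursion and compare against scipy:

```python
import numpy as np
from scipy.special import eval_legendre

def legendre_series(ell):
    """Coefficients of P_ell from a_{k+2} = [k(k+1) - ell(ell+1)]/[(k+1)(k+2)] a_k."""
    coeffs = np.zeros(ell + 1)
    coeffs[ell % 2] = 1.0  # seed the even or odd branch of the series
    for k in range(ell % 2, ell - 1, 2):
        coeffs[k + 2] = (k * (k + 1) - ell * (ell + 1)) / ((k + 1) * (k + 2)) * coeffs[k]
    return coeffs

x = np.linspace(-1, 1, 5)
for ell in range(4):
    c = legendre_series(ell)
    vals = sum(ck * x**k for k, ck in enumerate(c))
    vals /= sum(c)   # rescale so P_ell(1) = 1 before comparing with scipy
    print(ell, np.allclose(vals, eval_legendre(ell, x)))   # True for each ell
```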
Remark 137. In Fact 136, we mentioned that the goal is often to be simultaneous eigenstates of as many operators
as possible. One additional point here is that we have succeeded if we can uniquely characterize states using the
corresponding observables! For example, we might have multiple degenerate states at the same energy, but they may
have different values of ℓ, allowing us to tell them apart by measurement. And there might be degenerate states of the
same E and ℓ, but then they may have different values of m. This is important because we want to always be able to
tell apart two states with some physical difference, and that’s a setting we will see soon with the hydrogen atom.
We also started working towards more explicit forms of the ψℓ,m states at the end of the lecture – last time, we found that for m = 0, we have ψℓ,0 = Nℓ,0 Pℓ(cos θ), where Pℓ are the Legendre polynomials, which turn out to be given by
Pℓ(x) = (1/(2^ℓ ℓ!)) (d/dx)^ℓ (x² − 1)^ℓ.
We’ll complete this analysis today, beginning by studying nonzero m in the differential equation for Pℓm. It turns out there is a simple rule for going from the Legendre polynomials to the Legendre functions:
Pℓm(x) = (1 − x²)^{|m|/2} (d/dx)^{|m|} Pℓ(x).
Notice that the exponent of (1 − x²) can be a half-integer, but this is not actually a problem because (1 − x²) will become sin²θ when we substitute back into our coordinates. Checking that this Pℓm solves the differential equation d/dx[(1 − x²) dPℓm/dx] + [ℓ(ℓ + 1) − m²/(1 − x²)] Pℓm(x) = 0 takes some work, and we won’t check it here, but what’s important is to notice that we can only take ℓ derivatives before Pℓ(x) (a polynomial of degree ℓ) vanishes completely, and thus we must have
|m| ≤ ℓ ⟺ −ℓ ≤ m ≤ ℓ.
Fact 138
It turns out that there are no additional regular (non-divergent) solutions to the differential equation besides
the Pℓm s that we compute here. So this boxed condition is an actual constraint on our quantum numbers for
angular momentum – given any magnitude for the angular momentum, there are only certain values for the
angular momentum in the z-direction (which makes sense).
In other words, the only state with ℓ = 0 will also have m = 0, but there are three states with ℓ = 1 (with
m = −1, 0, 1 respectively), five states with ℓ = 2 (with m = −2, −1, 0, 1, 2 respectively), and so on – in general, there
are 2ℓ + 1 states at total angular momentum quantum number ℓ. This leads us back to the final normalized solution
that we want:
Theorem 139
The spherical harmonics are simultaneous normalized eigenstates of L̂z and L⃗², given by
Yℓ,m(θ, φ) = (−1)^m √[(2ℓ + 1)/(4π) · (ℓ − m)!/(ℓ + m)!] e^{imφ} Pℓm(cos θ)
when 0 ≤ m ≤ ℓ, and
Yℓ,m(θ, φ) = (−1)^m (Yℓ,−m(θ, φ))*
when −ℓ ≤ m ≤ 0.
We can find the explicit formulas for these by looking them up – they are complicated in general, but it’s useful to
remember the ones for ℓ = 0, 1:
Y0,0 = 1/√(4π),  Y1,±1 = ∓√(3/(8π)) e^{±iφ} sin θ,  Y1,0 = √(3/(4π)) cos θ.
And here, normalization means that the spherical harmonics are normalized over all θ, φ, so that (integrating over solid angle)
∫ |Yℓ,m(θ, φ)|² dΩ = ∫₀^π sin θ ∫₀^{2π} |Yℓ,m(θ, φ)|² dφ dθ = 1.
But we can recognize further that sin θ dθ = −d(cos θ), so making a change of variables means that we are requiring
∫₋₁¹ ∫₀^{2π} |Yℓ,m|² dφ d(cos θ) = 1.
(We will often represent this double integral by ∫ dΩ to simplify our notation.) And because the different Yℓ,m s are eigenfunctions of Hermitian operators with (at least one) different eigenvalues, they must be orthogonal:
∫ dΩ Y*ℓ′,m′(θ, φ) Yℓ,m(θ, φ) = δℓℓ′ δmm′.
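These relations are easy to spot-check numerically – here’s a sketch using scipy’s built-in spherical harmonics (watch out: scipy’s sph_harm names its two angle arguments oppositely to our convention):

```python
import numpy as np
from scipy.special import sph_harm   # sph_harm(m, l, azimuthal, polar)

polar = np.linspace(0, np.pi, 400)
azim = np.linspace(0, 2 * np.pi, 400)
POL, AZ = np.meshgrid(polar, azim)

def overlap(l1, m1, l2, m2):
    Y1 = sph_harm(m1, l1, AZ, POL)
    Y2 = sph_harm(m2, l2, AZ, POL)
    integrand = np.conj(Y1) * Y2 * np.sin(POL)   # dOmega = sin(polar) dpolar dazim
    return np.trapz(np.trapz(integrand, polar, axis=1), azim)

print(abs(overlap(1, 0, 1, 0)))    # ~1 (normalized)
print(abs(overlap(1, 0, 2, 0)))    # ~0 (different l)
print(abs(overlap(1, 1, 1, -1)))   # ~0 (different m)
```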
We’re now ready to return to the full Schrodinger equation in three dimensions for a central potential, which takes
the form
−(ℏ²/2m) ∇²ψ + V(r)ψ = Eψ.
Plugging in the form of the Laplacian in spherical coordinates, including the definition of L⃗², we now have
−(ℏ²/2m) [(1/r) d²/dr²(r ψ) − (1/(ℏ²r²)) L⃗²ψ] + V(r)ψ = Eψ.
(Notice that L⃗² and 1/r² commute, because the former only involves θ and φ, so we don’t have to worry about the ordering there. And if we want, we can prove that [L⃗², r̂] = 0 by direct calculation as well, but that’s more work.) So now we can make the most important simplification: try a factorized solution of the form ψ(⃗r) = RE(r) Yℓ,m(θ, φ). Then L⃗² acting on the spherical harmonic just gives ℏ²ℓ(ℓ + 1), and dividing out the Yℓ,m factor leaves
−(ℏ²/2m)(1/r) d²/dr²(r RE(r)) + (ℏ²ℓ(ℓ + 1)/(2mr²)) RE(r) + V(r) RE(r) = E RE(r).
So we now just need to solve a one-variable differential equation for RE(r), and then multiplying by a spherical harmonic will give us a solution to the Schrodinger equation with energy E, angular momentum ℓ, and z-component of angular momentum m. To make that easier, we’ll clean it up by multiplying everything by r:
−(ℏ²/2m) d²/dr²(r RE(r)) + (ℏ²ℓ(ℓ + 1)/(2mr²)) (r RE(r)) + V(r)(r RE(r)) = E (r RE(r)).
Defining U(r) = r RE(r), we find that the differential equation we actually need to solve is
−(ℏ²/2m) d²U/dr² + [V(r) + ℏ²ℓ(ℓ + 1)/(2mr²)] U(r) = E U(r).
So the radial equation turns out to be exactly like a one-dimensional Schrodinger equation, except that the radial dependence needs to be obtained by dividing through by r to get to RE(r), and we need to solve the equation repeatedly for all values of ℓ. We can basically think of this additional term ℏ²ℓ(ℓ + 1)/(2mr²) as a centrifugal barrier, making it harder to reach the origin, so we now have an effective radial potential
Veff(r) = V(r) + ℏ²ℓ(ℓ + 1)/(2mr²)
taking the usual place of V(r). And notice that even for a Coulomb potential V(r) ∝ 1/r, the second term will dominate
near r = 0 for any nonzero ℓ, meaning that a particle with any nonzero angular momentum cannot reach the origin (intuitively because it must spin faster and faster). And now that we’re in the one-dimensional setting, many of the properties and theorems we’ve discovered previously in the class (about bound states and eigenstates and so on) suddenly become useful. Our full wavefunctions are thus of the form ψ(⃗r) = (U(r)/r) Yℓ,m(θ, φ), and we can notice that U(r) depends on ℓ but does not depend on m – the m-dependence in this wavefunction is really only showing up in the spherical harmonic. (And we can write UE,ℓ(r) to represent this.)
We’ve talked about normalization for the spherical harmonics, but now we need to ask about normalization for the
full wavefunction: we have that
1 = ∫ |ψ|² d³x = ∫₀^∞ (|U(r)|²/r²) |Yℓ,m(θ, φ)|² r² dr dΩ.
But the r² factors will cancel, and the solid angle integral ∫ |Yℓ,m|² dΩ is just one by normalization of the spherical harmonics, and therefore everything simplifies to
1 = ∫₀^∞ |U(r)|² dr.
So we truly can think of this as a one-dimensional problem – the normalization condition for ψ is just the usual one-
dimensional normalization condition for U! This means that we’re successful, and we’ve reduced a spherical potential
to a one-dimensional problem.
There is an additional difference, though – because r only runs over the positive reals, we might ask if there are
boundary conditions that need to be satisfied as r → 0. Our argument here will not be completely general, but it
will be good enough – for any potential V(r) where the centrifugal barrier ℏ²ℓ(ℓ + 1)/(2mr²) dominates as r → 0 (such as the Coulomb potential), our differential equation will become
−(ℏ²/2m) d²U/dr² + (ℏ²ℓ(ℓ + 1)/(2mr²)) U ≈ 0 ⟹ d²U/dr² ≈ (ℓ(ℓ + 1)/r²) U
to leading order. This is essentially a Cauchy-Euler equation, and the solutions are proportional to r ℓ+1 or r −ℓ . But
the latter can be ruled out – it’s not normalizable for ℓ ≥ 1, and for ℓ = 0 we can’t actually have a valid exact solution
to the Schrodinger equation (though we won’t talk about this in full detail). So instead we must have
U(r ) ∼ r ℓ+1 as r → 0 ,
and in particular U(r ) always vanishes as r → 0 for any ℓ, vanishing faster and faster for higher values of ℓ. So this is
consistent with imagining an infinite barrier at r = 0.
With all of the theory set up, we’re now ready to attack the main problem we’d like to solve:
Example 140
Consider a proton (of position and momentum ⃗xp and p⃗p, respectively) and an electron (of position and momentum ⃗xe and p⃗e, respectively), in which we use canonical variables for each particle. In other words, we have [(⃗xp)i, (p⃗p)j] = iℏδij (where i and j range over 1, 2, 3 to represent the x, y, or z-coordinates), and similarly [(⃗xe)i, (p⃗e)j] = iℏδij.
Since we now have a system of two particles, our wavefunction should account for both, meaning that we now have a function Ψ(⃗xe, ⃗xp). The interpretation of this wavefunction is still that the squared wavefunction is the probability density, but specifically
dP = |Ψ(⃗xe, ⃗xp)|² d³⃗xe d³⃗xp
is the probability that we find the electron in a region d³⃗xe around ⃗xe and the proton in a region d³⃗xp around ⃗xp. (So if we just care about the position of the proton, we’d integrate this density out across all d³⃗xe.) And what’s important
to note here is that there is still only one wavefunction and one Schrodinger equation, even with multiple particles, but this time our Hamiltonian looks a bit more complicated: we have
Ĥ = p⃗p²/(2mp) + p⃗e²/(2me) + V(|⃗xe − ⃗xp|),
because each particle has a kinetic energy and the potential energy between them depends only on their distance. So in our Schrodinger equation Ĥψ = Eψ, we treat p⃗p as taking derivatives with respect to the proton’s position and p⃗e as taking derivatives with respect to the electron’s position.
But this Schrodinger equation now has a lot of variables, and our goal will be (as we’ve previously mentioned) to
simplify the equation so that it reduces to an equation for the distance between the particles. The change of variables
we must perform is now motivated by classical mechanics: because the center-of-mass moves at constant velocity
in a classical two-body problem, we should also be able to define a new quantum coordinate and momentum for the
center-of-mass. So we’ll define
P⃗ = p⃗p + p⃗e,
and now we need to define a corresponding coordinate X⃗ – motivated by classical mechanics, we try
X⃗ = (mp ⃗xp + me ⃗xe)/(me + mp).
Now we must check if these are indeed a valid coordinate-momentum pair; in other words, we must check that [X⃗i, P⃗j] = iℏδij. Indeed, this commutator is
[(mp(⃗xp)i + me(⃗xe)i)/(me + mp), (p⃗p)j + (p⃗e)j] = (mp/(me + mp)) [(⃗xp)i, (p⃗p)j] + (me/(me + mp)) [(⃗xe)i, (p⃗e)j]
(since commutators between proton and electron coordinates and momenta are zero), and this is zero unless i = j, in which case it is iℏ (mp/(me + mp) + me/(me + mp)) = iℏ. So yes, this is a valid pair of quantum mechanical canonical variables (they have the right units and commutator). To get a second one, it makes sense to use the relative coordinate
⃗x = ⃗xe − ⃗xp,
which appears in our expression for the Hamiltonian. But even when we write this down, we have to be careful –
remembering that the variables for the proton and electron commuted, we must also have that the relative and center-of-mass variables commute with each other. This does turn out to be the case for ⃗x with X⃗ and P⃗ – all the ⃗xs commute with each other (so there’s no problem there), and when taking commutators of ⃗x with P⃗, the negative sign for the proton commutator means that the iℏs cancel out. So we just need to define a corresponding combination of the momenta
p⃗ = α p⃗e − β p⃗p.
This will always commute with P⃗, but we need to choose the coefficients so that [⃗xi, p⃗j] = iℏδij (this turns out to enforce α + β = 1) and so that all components of p⃗ commute with X⃗ (this enforces αme − βmp = 0). Solving the system of two equations gives us
p⃗ = (mp/(me + mp)) p⃗e − (me/(me + mp)) p⃗p.
So we now have two new pairs of canonical variables, and to make the notation simpler, it’s useful to define the reduced mass and total mass
µ = me mp/(me + mp),  M = me + mp.
(Since we’re in a setting where the proton is much heavier than the electron, we’ll have µ ≈ me and M ≈ mp in this case.) In terms of these variables, we then have the unit-free constants α = µ/me and β = µ/mp, so that another way to write our relative momentum is
p⃗ = µ (p⃗e/me − p⃗p/mp).
The point of doing all of these calculations is that we want to write the two-body Hamiltonian in terms of our new coordinates. Since P⃗ and p⃗ are both linear combinations of p⃗e and p⃗p, we can solve to find that
p⃗p = (mp/M) P⃗ − p⃗,  p⃗e = (me/M) P⃗ + p⃗.
And now is where the effort pays off: the kinetic energy terms in the Hamiltonian are
p⃗p²/(2mp) + p⃗e²/(2me) = (1/(2mp)) ((mp/M) P⃗ − p⃗)² + (1/(2me)) ((me/M) P⃗ + p⃗)²,
and now the cross-terms vanish: the kinetic energy operator in the Hamiltonian indeed simplifies to
p⃗p²/(2mp) + p⃗e²/(2me) = P⃗²/(2M) + p⃗²/(2µ),
a center-of-mass contribution and a relative contribution. And next lecture, we’ll see that this allows us to separate
the Schrodinger equation into center-of-mass motion and relative motion, reducing to a one-body problem.
Ĥ = p⃗p²/(2mp) + p⃗e²/(2me) + V(|⃗xe − ⃗xp|) = P⃗²/(2M) + p⃗²/(2µ) + V(|⃗x|).
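The cancellation of the cross-terms is pure algebra on the coefficients, so we can let sympy confirm it (a sketch that treats the momenta as commuting scalars, which is all the bookkeeping needs):

```python
import sympy as sp

me, mp, P, p = sp.symbols('m_e m_p P p', positive=True)
M = me + mp
mu = me * mp / M

# Momenta rewritten in terms of the center-of-mass and relative variables
pp = mp / M * P - p    # proton momentum
pe = me / M * P + p    # electron momentum
T = pp**2 / (2 * mp) + pe**2 / (2 * me)

print(sp.simplify(T - (P**2 / (2 * M) + p**2 / (2 * mu))))   # prints 0
```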
In other words, this Hamiltonian describes a system in which the center-of-mass moves as a free particle of mass M = mp + me, and in which the relative position moves as a particle of reduced mass µ = me mp/(me + mp) under a central potential V only depending on the magnitude of the distance. But to actually check that, we need to solve the Schrodinger equation, and we have to be a bit careful with the two different coordinates – we have P⃗ = (ℏ/i) ∇_X⃗ and p⃗ = (ℏ/i) ∇_⃗x (gradients must be taken with respect to the corresponding canonical coordinate). So to solve the (time-independent) Schrodinger equation, we’ll write down our wavefunction ψ(X⃗, ⃗x) in terms of the new coordinates, and we’ll do separation of variables
ψ(X⃗, ⃗x) = ψCM(X⃗) ψrel(⃗x).
Substituting this into Ĥψ = Eψ (grouping the second and third terms of Ĥ together), we have
(1/(2M)) P⃗²ψCM(X⃗) ψrel(⃗x) + [(1/(2µ)) p⃗²ψrel(⃗x) + V(|⃗x|)ψrel(⃗x)] ψCM(X⃗) = E ψCM(X⃗)ψrel(⃗x)
(since P⃗ only acts on the center of mass and p⃗ only acts on the relative motion). If we now divide by the total wavefunction ψCM(X⃗)ψrel(⃗x), we have
(1/ψCM(X⃗)) (1/(2M)) P⃗²ψCM(X⃗) + (1/ψrel(⃗x)) [(1/(2µ)) p⃗²ψrel(⃗x) + V(|⃗x|)ψrel(⃗x)] = E.
The first term depends only on X⃗ and the second only on ⃗x, so each must separately be a constant – the center-of-mass piece is just a free-particle problem, and the relative piece is the one-body central-potential problem we’ve already set up.
Example 141 (Hydrogen-like atom)
Suppose that instead of a single proton, we have a nucleus of Z protons. Then the potential for the relative motion is given by
V(r) = −Ze²/r,
where r = |⃗x| is the distance between the particles and e is the charge of a proton (remembering that a proton and electron have opposite charge).
The characteristic length scale for this system is the Bohr radius, given by setting equal the units from kinetic and potential energy: since ℏ/a0 has units of momentum, we can set equal
ℏ²/(ma0²) = e²/a0 ⟹ a0 = ℏ²/(me²).
Intuitively, the e appears in the denominator here because the stronger the Coulomb force, the smaller the atom should be (because the electron is more tightly bound). To get a rough estimate of this quantity, we can write
a0 = ℏ²c²/(e² mc²) = ℏc/((e²/ℏc) · mc²),
and now we use the fact that the fine structure constant α = e²/(ℏc) is approximately 1/137, and we plug in the mass of the electron instead of the reduced mass (it makes only a small percentage difference):
a0 ≈ (197 MeV · fm)/((1/137) · 0.5 MeV) ≈ 52.9 pm,
or about 0.529 Angstroms. We then have a corresponding energy scale (plugging in a0 for r into the potential energy):
e²/a0 = e⁴m/ℏ² = (e⁴/(ℏ²c²)) mc² = α² mc².
In other words, the bound state energies in this problem are on the order of α2 of the rest energy 511 keV of the
electron, which is about 27.2 eV. Half of this quantity is the potentially-familiar 13.6 eV, which is the negative of the
true ground state energy of the hydrogen atom. But of course we wouldn’t know that at this stage, since we’re just
doing dimensional analysis! And connecting this to some calculations we did at the beginning of 8.04, note that the
Compton wavelength of the electron is a factor 2π off of
α a0 = λc/(2π) = ℏ/(mc) ≈ 400 fm,
much smaller than the Bohr radius, and then we get the classical electron radius with another factor of α:
α² a0 ≈ 2.8 fm.
(For comparison, the size of a proton is approximately 1 fm.) With this, we now have a general scale (length and
energy) for the problem we’re considering, and we’ll solve the Schrodinger equation for bound states of this potential
(meaning that E < 0) – scattering states are usually covered in more advanced courses. This is an interesting physical
problem, because we’ll get the exact forms for the energy levels of a system with many applications.
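All of these scales follow from just three numbers, as the following quick sketch confirms (using ℏc ≈ 197.3 MeV·fm, α ≈ 1/137, and me c² ≈ 0.511 MeV):

```python
# Numerical check of the hydrogen length/energy scales derived above
hbar_c = 197.327     # MeV * fm
alpha = 1 / 137.036
mc2 = 0.511          # MeV, electron rest energy

a0 = hbar_c / (alpha * mc2)   # Bohr radius, in fm
print(f"a0         = {a0/1e5:.4f} Angstrom")   # 1 Angstrom = 1e5 fm; ~0.529
print(f"alpha*a0   = {alpha*a0:.0f} fm (reduced Compton wavelength)")
print(f"alpha^2*a0 = {alpha**2*a0:.2f} fm (classical electron radius)")
print(f"e^2/(2a0)  = {alpha**2*mc2/2*1e6:.1f} eV (the 13.6 eV Rydberg)")
```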
Remark 142. There are many other effects that change the energy levels of the actual hydrogen atom, such as the
fine structure, relativistic effects, and so on. But those are higher-order effects, and we won’t discuss them for now.
Plugging in our V(r) into the radial Schrodinger equation, we are trying to solve for solutions to
−(ℏ²/2m) d²U/dr² + [ℏ²ℓ(ℓ + 1)/(2mr²) − Ze²/r] U(r) = E U(r)
with E < 0. (Remember that the solution U depends on E and ℓ, so we need to solve for the allowed energy levels for each of ℓ = 0, 1, 2, · · · – once we find the solutions to the differential equation, our full wavefunction will be ψ(r, θ, φ) = (U(r)/r) Yℓ,m(θ, φ).) Our first step will be to get rid of units and replace r by a unit-free variable – we do this by making use of the characteristic length a0, and it turns out the other constants will simplify best if we define r = (a0/(2Z)) x: skipping some of the algebra here (but using that ℏ²/m = a0e² and that d²/dr² = (4Z²/a0²) d²/dx²), we end up with
(2Z²e²/a0) [−d²/dx² + ℓ(ℓ + 1)/x² − 1/x] U(x) = E U(x) ⟹ [−d²/dx² + ℓ(ℓ + 1)/x² − 1/x] U = −κ² U,
which is now a unit-free equation in terms of the unit-free energy κ² = −E/(2Z²e²/a0) – our goal is now to know what
values of κ give us valid solutions to the differential equation. The equation now looks a lot simpler than before, but
it’s still unfortunately not that easy to solve – in particular, if we try to write down a series solution with the equation
in its current form, we will get a three-term recursion (which is more complicated than a two-term one).
A good strategy for simplifying is to look at the behavior of the solution near zero or near infinity – because the second and third terms on the left-hand side vanish as x → ∞, the differential equation is approximately −d²U/dx² = −κ²U, which is solved by U = e^{±κx} (we’ll want the decaying solution for normalization reasons). This motivates us to again make a transformation ρ = κx = (2κZ/a0) r, which turns the differential equation into
[−d²/dρ² + ℓ(ℓ + 1)/ρ² − 1/(κρ)] U = −U.
(Note that it’s important that κ still shows up somewhere in this equation, since the problem we’re trying to solve fundamentally involves fixing the allowed values of the energy.) And now if we look at the solution as ρ → 0 (which is the same as taking r or x to 0), we know that U ∼ ρ^{ℓ+1} from the discussion last lecture. So we know that the solutions to this equation are not polynomials (because they decay exponentially as e^{−κx} = e^{−ρ}), and also that they don’t start in the power series until ρ^{ℓ+1}. To encode all of that information, we’ll write down the ansatz
U(ρ) = ρ^{ℓ+1} e^{−ρ} W(ρ)
for some unknown function W(ρ), with the hope that W(ρ) will be simpler than U(ρ). Plugging into the differential equation above (which just involves a lot of algebra), we get something that initially looks like it’s worse than the original:
ρ d²W/dρ² + 2(ℓ + 1 − ρ) dW/dρ + [1/κ − 2(ℓ + 1)] W = 0.
But this is actually a better equation for a series solution, because if we plug in W = ρ^k into the left-hand side, every term ends up being proportional to either ρ^{k−1} or ρ^k, meaning that we will get a simple one-step recursion relation (given ak, we can find ak+1). So letting W = Σ_{k=0}^∞ ak ρ^k, we can get our recursion relation by setting the total ρ^k coefficient to zero, and we find
a_{k+1}/a_k = (2(k + ℓ + 1) − 1/κ) / ((k + 1)(k + 2ℓ + 2)).
Looking at the limiting behavior of the coefficients, we find that a_{k+1}/a_k ∼ 2/k for large k. But this doesn’t decay fast enough – in particular, even if a_{k+1}/a_k = 2/(k + 1) (which even decays slightly faster), we would have a_k = 2^k a_0/k!, and that would give us
Σ_k a_k ρ^k ≈ a_0 Σ_k (2ρ)^k/k! = a_0 e^{2ρ}.
And this is actually nicely consistent with what we found earlier – if the series does not terminate, then our solution is proportional to ρ^{ℓ+1} e^{−ρ} e^{2ρ}, which means the behavior near ρ → ∞ has an e^{+ρ} factor instead of an e^{−ρ} one – this still solves the differential equation, but it diverges.
But if we want normalizable bound states, our series must be truncated at some point. For the sake of characterizing our solutions, we’ll be careful with this step: if we want W to be a polynomial of degree N, then aN ≠ 0 but a_{N+1}/a_N = 0 (and then aN+1, aN+2, aN+3, · · · will all automatically be zero because we have a one-step recursion). So then we must have
1/κ = 2(N + ℓ + 1),
and because κ is related to the energy E, we see that the energy is quantized – 1/κ must be a positive even integer! And ℓ can take on any nonnegative integer value (because that’s how we set up the problem), and so can N (because it’s the degree of our polynomial W(ρ)). But that means that there is degeneracy in this problem – for any given value of N + ℓ + 1, there are many ways we can choose N and ℓ to get the correct sum. So defining the principal quantum number
n = N + ℓ + 1 = 1/(2κ),
which is always a positive integer, we now have a formula for the nth energy level of the hydrogen atom: rearranging the definition of κ,
En = −(2Z²e²/a0) κ² = −(Z²e²/(2a0)) · (1/n²),
and we’ve derived the famous 1/n² factor for the energy levels of the hydrogen atom, involving the famous constant e²/(2a0) = 13.6 eV. For visualization, if we imagine drawing all of the nonnegative integer lattice points in the (N, ℓ)-plane, then diagonally slanting lines correspond to a fixed value of n:
[Figure: lattice points in the (N, ℓ)-plane; the diagonal lines N + ℓ + 1 = n mark the levels n = 1, 2, 3, 4.]
In general, we see that there are n possible solutions where N + ℓ + 1 = n (with the constraint 0 ≤ ℓ, N ≤ n − 1),
which are (0, n − 1), (1, n − 2), · · · , (n − 1, 0), and all of these solutions are at energy level En . But that does not mean
that there is just an n-fold degeneracy, because now we have to remember that m can range over the (2ℓ + 1)
values from −ℓ to ℓ, giving us different energy eigenstates even if we have the same function U! So for n = 1, we
must have ℓ = 0, which only gives us one state. But then for n = 2, we can have ℓ = 1 or ℓ = 0, giving us 3 + 1 = 4
states. And for n = 3, the allowed values are ℓ = 2, 1, 0, corresponding to 5 + 3 + 1 = 9 overall states – this pattern
continues, and there are n2 total states at energy level En . And to summarize, each state in the hydrogen atom
comes with three quantum numbers, namely
• the principal quantum number n, which fixes the energy of the state and can be any positive integer,
• the angular momentum quantum number ℓ (this is more significant physically than the degree of the polynomial N = n − ℓ − 1, so it makes sense to use this), which is an integer between 0 and (n − 1), and
• the magnetic quantum number m, representing the z-component of angular momentum, which is an integer
between −ℓ and ℓ.
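As a sanity check on the n² counting above, here’s a short enumeration of the allowed (N, ℓ, m) triples:

```python
# Enumerate hydrogen states at level E_n: quantum numbers (N, ell, m) with
# n = N + ell + 1 and -ell <= m <= ell; the count should be n^2.
for n in range(1, 6):
    states = [(n - ell - 1, ell, m)
              for ell in range(n)
              for m in range(-ell, ell + 1)]
    print(n, len(states), n**2)   # the last two columns agree
```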
Returning to the variables we’ve defined along the way, now that we know κ = 1/(2n), we see that
ρ = (2κZ/a0) r = Zr/(na0),
so the full wavefunction solution in three dimensions is, up to constant factors,
ψn,ℓ,m = (Un,ℓ/r) Yℓ,m ∼ (ρ^{ℓ+1}/ρ) Wn,ℓ(ρ) e^{−ρ} Yℓ,m(θ, φ) = ρ^ℓ Wn,ℓ(ρ) e^{−ρ} Yℓ,m(θ, φ),
where Wn,ℓ(ρ) is a polynomial of degree N = n − ℓ − 1 (these are called the Laguerre polynomials). And converting back to the usual radial coordinate, we have
ψn,ℓ,m(r, θ, φ) = A (r/a0)^ℓ × (polynomial in r/a0 of degree (n − ℓ − 1)) × e^{−Zr/(na0)} Yℓ,m(θ, φ)
for some normalization constant A. For reference, the simplest solution ψ1,0,0 is spherically symmetric and looks like
ψ1,0,0 = (1/√(πa0³)) e^{−r/a0}
when Z = 1, so we now have an explicit formula for the ground state of the hydrogen atom!
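Since the whole construction came from a recursion relation, we can verify it symbolically end-to-end. Here’s a sympy sketch (an illustration; the helper name is ours) that builds U = ρ^{ℓ+1} e^{−ρ} W(ρ) with κ = 1/(2n) and checks that it solves the unit-free radial equation:

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)

def radial_residual(n, ell):
    """Build U = rho^(ell+1) * exp(-rho) * W(rho) from the series recursion
    with kappa = 1/(2n), then plug U into the unit-free radial equation."""
    kappa = sp.Rational(1, 2 * n)
    N = n - ell - 1                       # degree of the polynomial W
    a = [sp.Integer(1)]                   # a_0 = 1; the overall scale is free
    for k in range(N):
        a.append(a[k] * (2 * (k + ell + 1) - 1 / kappa)
                 / ((k + 1) * (k + 2 * ell + 2)))
    W = sum(a[k] * rho**k for k in range(N + 1))
    U = rho**(ell + 1) * sp.exp(-rho) * W
    # -U'' + [ell(ell+1)/rho^2 - 1/(kappa*rho)] U + U should vanish
    return sp.simplify(-sp.diff(U, rho, 2)
                       + (ell * (ell + 1) / rho**2 - 1 / (kappa * rho)) * U + U)

for n, ell in [(1, 0), (2, 0), (2, 1), (3, 1)]:
    print(n, ell, radial_residual(n, ell))   # each residual is 0
```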
To recap: the principal quantum number n fixes the energy of the state, the quantum number ℓ fixes the total angular momentum of the state, and the magnetic quantum number −ℓ ≤ m ≤ ℓ tells us the state’s angular momentum in the z-direction. And the wavefunction itself (in spherical coordinates) consists of a product of a normalization factor, a (r/a0)^ℓ term (behavior as r → 0), a Laguerre polynomial in r/a0 of degree (n − ℓ − 1), an exponential decay e^{−Zr/(na0)} (behavior as r → ∞), and a spherical harmonic Yℓ,m.
Remark 143. Only very detailed calculations will actually require us to know the specific polynomial terms, so we won’t talk much about them here. And it’s often faster to just redo the recursion relation calculation, rather than trying to look up the wavefunction and sort out discrepancies between units and conventions.
It turns out that there’s a classic way to draw the energy levels of bound states for a central potential, in which we represent energy on the y-axis and ℓ on the x-axis. We’ll use units of Z²e²/(2a0) for our system, so that the energies sit at −1/n² for positive integers n – the diagram below is not drawn exactly to scale, because the energy levels would be much more squashed near E = 0:
[Figure: energy-level diagram with E/(Z²e²/(2a0)) on the vertical axis (from −1 up to 0) and columns ℓ = 0, 1, 2, 3; the level n = 1 has one dash (ℓ = 0, N = 0), n = 2 has dashes at ℓ = 0, 1 (N = 1, 0), n = 3 at ℓ = 0, 1, 2 (N = 2, 1, 0), and n = 4 at ℓ = 0, 1, 2, 3 (N = 3, 2, 1, 0).]
Basically, each dash represents a different E and ℓ for which we have an energy eigenstate, so each column
essentially corresponds to a different differential equation (because the effective potential changes depending on the
value of ℓ). And for our system, incrementing n by one means that we have a state at energy En in a new column
(specifically ℓ = n − 1), and N always ranges from (n − 1) to 0 within any row.
But there’s more that we can say: for any given column (that is, any particular value of ℓ), it makes sense that
the lowest energy state has N = 0, and the next energy state has N = 1, and so on, because the node theorem for
a one-dimensional potential (the radial equation for any given ℓ) tells us that the ground state has no nodes, the next
state has one node, and so on, and the only term in ψn,ℓ,m that can vanish is this polynomial. (Here, the vanishing at r = 0 doesn’t count, because that’s the boundary of the wavefunction rather than an interior node.) So the incrementing
values of N should not be surprising to us, and in fact we learn that the Laguerre polynomials of degree N must have
N distinct real zeros.
On the other hand, the degeneracy across values of ℓ is extraordinary and very special to this problem! Usually,
when we solve the radial equations for ℓ = 0, 1, 2, · · · , we will get different energy levels for each of those problems,
but there is no reason a priori that the energy levels should actually line up for the hydrogen atom (and do so in
such an organized way). We do expect some degeneracy to always be present – for example, each line above in the
ℓ = 2 column really corresponds to a multiplet of five states with m = 2, 1, 0, −1, −2 – but the additional degeneracy
between different ℓs is very special and is not explained by any of the theory we’ve discussed in this class so far.
Fact 144
This fact led to developments with the Runge-Lenz vector, which is a conserved vector in planetary orbits in Newton’s theory. Precession of elliptical orbits does not occur for a 1/r potential in Newton’s theory (though it does in Einstein’s theory of relativity), so it does not occur for the hydrogen atom either, and in fact that is connected to why we have all of these hidden degeneracies.
For comparison, if we had instead solved the infinite spherical well problem (which we do in 8.05), in which the
potential is zero inside a sphere and infinite outside, we do not see the coincidences in energy levels between different
ℓs, even though the potential looks very simple in that case. The point to take away here is that this kind of hydrogen
atom behavior can only be explained by an extra symmetry!
Example 145
For some intuition, we’ll start with the Z = 1 ground state, ψ1,0,0 = (1/√(πa0³)) e^{−r/a0}, and understand how to obtain the general-Z ground state without much additional work.
Remembering that the potential V(r) goes from −e²/r to −Ze²/r when we go from 1 to Z protons, we should substitute e² with Ze² everywhere in our wavefunction. And because a0 = ℏ²/(me²), this means that we should replace a0 with a0/Z everywhere as well. (Often we write solutions in a “mixed way” involving both e and a0, so we should make sure to make both changes.) So in this case, the ground state wavefunction for a general hydrogen-like atom is
ψ1,0,0(r, θ, φ) = √(Z³/(πa0³)) e^{−Zr/a0}.
(Intuitively, the fact that the “normalization did not care about a0 appearing in the wavefunction” means that if we replace a0 with any other constant, in this case a0/Z, we will still have a normalized wavefunction.) And we’ve discussed this before in slightly different words, but the reason for the Zr/a0 factor in the exponential is that two of the terms in the radial differential equation go to zero as r → ∞, leaving just −(ℏ²/2m) d²U/dr² = EU. This should have solutions proportional to e^{±√(−2mE) r/ℏ}, and plugging in the value of E1 (or En in general) indeed recovers the correct exponential decay for the ground state.
We can now turn our attention to another aspect of the hydrogen atom:
Definition 146
A Rydberg atom is an atom where the outermost electron is in a very high principal quantum number n.
In such a setting, we can imagine having a nucleus of charge Ze, a lot of electrons around it, and then the last electron sees the charge from both the nucleus and the (Z − 1) other electrons. So essentially by Gauss’s law, the outermost electron (if it’s outside the shell of the remaining electrons) sees an overall charge of +e. (It can be checked that the outermost electron does sit further and further away from the nucleus for higher n, so this argument is basically valid.) So to a good approximation, a Rydberg atom behaves a lot like a hydrogen atom.
A good first step is for us to calculate the size of a Rydberg atom (this doesn’t really make sense precisely, but
we can ask for the expectation of the radius), and it turns out the answer is not actually a constant factor times
na0, even though the exponential factor in the wavefunction is e^{−r/(na0)}. Instead, the intuitive explanation will come from the virial theorem:
⟨T⟩ = −(1/2) ⟨V⟩,
where T is the kinetic energy and V is the potential energy.
(We won’t prove this theorem for now, but it involves some algebraic calculations with commutators.) We can picture this as saying that the kinetic energy ⟨T⟩ and the bound state energy Eb = ⟨T⟩ + ⟨V⟩ are negatives of each other, since ⟨V⟩ = −2⟨T⟩. That means that ⟨V⟩ = 2Eb, or for the Rydberg atom we’re discussing (for which Z = 1 as discussed),
−e² ⟨1/r⟩ = 2Eb = 2 (−(e²/(2a0)) (1/n²)) ⟹ ⟨1/r⟩ = 1/(n²a0).
So this suggests that the typical radius is actually on the order of n²a0 instead of na0. Everything we’ve discussed above is exact, but note that ⟨1/r⟩ = 1/(n²a0) does not imply that ⟨r⟩ = n²a0 (we can’t manipulate expectation values that way). In fact, the expectation of r actually depends on ℓ, and it can be calculated with a bit more effort to be
⟨r⟩ = n²a0 [1 + (1/2)(1 − ℓ(ℓ + 1)/n²)].
(In particular, this expectation is (3/2)n²a0 for ℓ = 0 and approximately n²a0 for ℓ = (n − 1).) And now we can look back at the wavefunction to understand where this n² factor comes from: our solution is of the form
ψn,ℓ,m = fn,ℓ(r) Yℓ,m(θ, φ),
where fn,ℓ is a polynomial of degree (n − ℓ − 1) + ℓ = n − 1 times an exponential factor. So the probability density of finding the radius to be between r and r + dr is
p(r) dr = r² dr ∫ |ψ|² dΩ = r² dr |fn,ℓ|² ∫ |Yℓ,m|² dΩ
(since we need to integrate out |ψ|² d³x along the angular coordinates). But by definition of the spherical harmonics, the integral over solid angle is 1, and then cancelling the dr s gives us the radial probability density
p(r) = r² |fn,ℓ|².
And it is the polynomial factor that is causing us trouble here – since fn,ℓ contains a degree (n − 1) polynomial, let’s just keep the leading-order term r^{n−1} and say that fn,ℓ(r) ∼ r^{n−1} e^{−r/(na0)}, so that (plugging back in)
p(r) ∼ r^{2n} e^{−2r/(na0)}.
So there is a “fight” between the exponential and the polynomial factor in the probability density – since r^{2n} is increasing and e^{−2r/(na0)} is decreasing, their product will basically be sharply peaked, and it will have a maximum where the derivative p′(r) is zero:
0 = p′(r) = [2n/r − 2/(na0)] r^{2n} e^{−2r/(na0)} ⟹ r = n²a0,
giving us the expected n² dependence for the radius!
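We can also check the exact ⟨r⟩ formula directly, since sympy ships the normalized hydrogen radial wavefunctions (a sketch with r measured in units of a0 and Z = 1):

```python
import sympy as sp
from sympy.physics.hydrogen import R_nl

r = sp.symbols('r', positive=True)

# <r> in units of a0, compared against n^2 [1 + (1 - l(l+1)/n^2)/2]
for n, l in [(1, 0), (2, 0), (2, 1), (3, 2)]:
    R = R_nl(n, l, r, Z=1)       # normalized radial wavefunction R_{n,l}(r)
    mean_r = sp.integrate(R**2 * r**3, (r, 0, sp.oo))
    formula = sp.Rational(n**2) * (1 + sp.Rational(1, 2) * (1 - sp.Rational(l * (l + 1), n**2)))
    print(n, l, mean_r, formula)   # the two columns agree
```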
Example 148
Rydberg atoms can be observed in nature in interstellar gases – during recombination, as the universe cooled down and protons captured electrons to form atoms, a proton may have captured its electron at a high quantum number.
In astrophysics, n = 350 has been measured, which means that the size of the Rydberg atom is about 350² a0 ≈ 6.5 µm. This size is on the order of the size of a red blood cell (which is about 8 µm) and just a bit smaller than the diameter of a hair (which is about 50 µm). Also, these atoms are relatively stable – while an electron in the n = 5 energy state would jump to n = 1 within about 10⁻⁷ seconds, these Rydberg atoms with very high n sometimes last a millisecond or a tenth of a second (because it takes a long time to spiral down the energy levels when the spacing between them is on the order of 1/n³).
These atoms can be created using lasers in the lab, and they can be detected using ionization (because the energy required to ionize these atoms is much smaller than that for ordinary atoms). In other words, Rydberg atoms are semiclassical, and in fact Bohr’s calculations of the energy levels of the hydrogen atom (which were correct) didn’t require quantum mechanical derivations – he just needed to assume quantization of the allowed photons emitted during energy transitions.
Example 149
We’ll finish this lecture by discussing the effect of the effective potential in more detail – specifically, we’ll consider
a large n (like 100) and see what happens to the orbit of the electron as ℓ varies.
Initially, we might think that ℓ = 0 will be the most circular out of the different states (because it’s spherically
symmetric), but in fact it will be the most elliptical (and ℓ = 99 is the most circular). The intuition is as follows: the
effective potential for the radial equation is
Veff = ℏ²ℓ(ℓ + 1)/(2mr²) − e²/r,
which looks roughly as shown below (with the 1/r term dominating for large r and the 1/r² term dominating for small r):
[Figure: Veff(r) dipping below zero, with the horizontal line at energy En crossing it at turning points r− and r+.]
If we then imagine a particle at energy En rolling around in this potential, we get a more elliptical orbit if the
distance between the two endpoints r+ and r− (where Veff (r ) = En ) is larger. (After all, if we imagine a planetary
orbit, the most elliptical orbits get very close and very far away from the sun, and the most circular ones have a
constant r throughout the whole orbit.) So when ℓ = 0, because we don’t even have the upward behavior in the Veff
graph as r → 0, the electron reaches all the way to r = 0 and we have the largest possible orbit. And when ℓ gets
ℏℓ(ℓ+1)
larger and larger, the 2mr 2 factor will push the blue curve higher and higher, making the endpoints closer and closer.
So we get the most circular orbit when Veff is almost tangent to the horizontal line at En , and that comes from the
top value of ℓ = n − 1.
For another way of thinking about this, circular orbits do in fact have a lot of angular momentum ⃗r × p⃗ (because
the two vectors are orthogonal), but elliptical orbits do not because the angle between ⃗r and p⃗ is very small when
the particle is far from the center. And next lecture, we’ll do some more calculations with r+ and r− to understand how they relate to the elliptical orbit, but the punchline is that (r+ + r−)/2 = n²a0 regardless of the value of ℓ! So in our
hydrogen atom spectrum that we drew in the beginning of this lecture, we should think of the energy states of the
same n as elliptical orbits of different eccentricity. And this connects to Kepler’s discovery that the total period of an
orbit depends only on r− + r+ , the major diameter of the ellipse, meaning that ellipses of the same major diameter
have the same energy.
To review from last time: for a general central potential V(r), solving the Schrodinger equation for the various effective potentials Veff(r) = V(r) + ℏ²ℓ(ℓ + 1)/(2mr²) involves solving a bunch of different differential equations, and there is no reason that the energy levels will line up. But in the hydrogen atom, a special symmetry happens to make the energy levels very degenerate (even beyond the (2ℓ + 1)-fold degeneracy coming from the different values of m we can have for each (n, ℓ)).
At the end of the discussion, we started analyzing the various orbits of the hydrogen atom for a fixed n (which is
where our “semi-classical” arguments are more valid). In such a situation, we can imagine that our electron is making
elliptical orbits around the proton, and the r− and r+ from the diagram below Example 149 correspond to the shortest
and longest distance to the proton, which together make up the major axis of the ellipse. (Classically, we can imagine
that the electron is oscillating between r− and r+. But quantum mechanically – and thus more accurately – we have a probability density |ψ|² which is exponentially decaying outside of the region [r−, r+], and which will mimic the time spent at various r s in an elliptical orbit.)
Example 150
We claimed last lecture that the “turning points” where Veff(r) = En satisfy (r− + r+)/2 = n²a0, and we’ll verify that fact now with an explicit calculation.
Plugging in the formulas for the energy levels and the effective potential, we have
ℏ²ℓ(ℓ + 1)/(2mr²) − e²/r = −(e²/(2a0)) (1/n²).
This is a quadratic equation in r, but we can do a transformation r = a0x to get the equation
(ℏ²/(2ma0²)) ℓ(ℓ + 1)/x² − e²/(a0x) = −(e²/(2a0)) (1/n²),
but now ℏ²/(2ma0²) = (ℏ²/(2ma0)) · (1/a0) = (ℏ²/(2ma0)) · (me²/ℏ²) = e²/(2a0), so all of the e²/(2a0) factors cancel out and we’re left with
ℓ(ℓ + 1)/x² − 2/x = −1/n² ⟹ 1/x = (1 ± √(1 − ℓ(ℓ + 1)/n²)) / ℓ(ℓ + 1)
by the quadratic formula (treating 1/x as the variable). We thus get the two solutions
x = ℓ(ℓ + 1) / (1 ± √(1 − ℓ(ℓ + 1)/n²)) = n² (1 ∓ √(1 − ℓ(ℓ + 1)/n²))
by rationalizing the denominator (multiplying numerator and denominator by the radical conjugate), and thus r+ and r− are just a0 times those two solutions. And indeed, the ± factor cancels out, and r+ + r− = 2a0n², recovering the result from last lecture. Furthermore, r+ ≈ r− when ℓ is large (close to n), because the square-root term is close to zero in that case, and r− = 0 and r+ = 2n²a0 when ℓ is zero, giving us the most elliptical orbit. (But because this is still a semi-classical argument, the idea of an “orbit” is not so accurate at the extreme values.) For a numerical example, in a typical Rydberg atom with n = 100 and ℓ = 60, we have r+ ≈ 18000 a0 and r− ≈ 2000 a0, but every other state with n = 100 will also have r+ + r− = 20000 a0.
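For convenience, the same computation as a tiny numerical sketch (just evaluating x± = n²(1 ∓ √(1 − ℓ(ℓ + 1)/n²))):

```python
import numpy as np

def turning_points(n, ell):
    """Semi-classical turning points of the hydrogen effective potential, in units of a0."""
    s = np.sqrt(1 - ell * (ell + 1) / n**2)
    return n**2 * (1 - s), n**2 * (1 + s)   # (r_minus, r_plus)

r_minus, r_plus = turning_points(100, 60)
print(round(r_minus), round(r_plus), r_minus + r_plus)   # ~2038, ~17962, 20000.0
```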
Example 151
To finish the course, we’ll turn our attention to motivating spin angular momentum and the mathematical
treatment of spin by considering the “simplest quantum system.”
We might initially say that the simplest quantum system is the particle in the box, but that system has infinitely many energy eigenstates, each of which is a function on an interval. And even systems with only one bound state, like the delta potential, may have infinitely many scattering states, which are still complicated to deal with.
So we’ll go back to the Schrodinger equation iℏ ∂Ψ/∂t = ĤΨ, for which we can solve for energy eigenstates of the form e^{−iEt/ℏ}ψ and thus only need to solve the equation Ĥψ = Eψ. Picking a quantum system then comes down to picking our Hamiltonian Ĥ, which can be any Hermitian operator with units of energy (and checking Hermiticity requires defining an inner product). We’ve been solving these kinds of equations in this class when ψ represents a particle living in one dimension (as motivated by classical mechanics), but such wavefunctions take on uncountably many values. So the key to simplifying our system is to be less attached to the physics for a while.
What we’ll do is assume that we have a particle which can only live at one of two points x1 or x2 , rather than on
the whole real line. (It’s not interesting if the particle can only be at one point, because then the probability of finding
it there is just one.) So instead of having ψ(x) be a function on the real numbers, we can just think of ψ as carrying
two pieces of information, which we'll encode in a vector:
$$\begin{bmatrix} \psi(x_1) \\ \psi(x_2) \end{bmatrix} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$
The probability of finding the particle at position x1 is then |α|² , and the probability of finding it at x2 is |β|² .
Fact 152
This setting may look familiar from our discussion with interferometers back in the second lecture of this class!
But such a vector can also encode information about other systems, such as a particle which can either be on the
left or right side of a box with a partition. And it can also encode a situation where a particle is in one of two
states, either “spin down” or “spin up” – that’s what we’re moving toward right now.
Such two-state systems turn out to be very rich mathematically – to see this, we need to start by constructing
the inner product. Instead of the usual integral $(\phi, \psi) = \int \phi^*(x)\psi(x)\,dx$, in which we're basically adding up the
products of the two functions' values at every point x, we now just sum over our two points:
$$(\psi_1, \psi_2) = \alpha_1^* \alpha_2 + \beta_1^* \beta_2.$$
In other words, we can think of taking the conjugate transpose of the vector for ψ1 and performing matrix multiplication:
$$(\psi_1, \psi_2) = \begin{bmatrix} \alpha_1^* & \beta_1^* \end{bmatrix}\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}.$$
(And as we might explore in a future class, we can also think of a wavefunction ψ(x) for a one-dimensional particle
as an infinite column vector of values · · · , ψ(−ε), ψ(0), ψ(ε), ψ(2ε), · · · .) If we now think of Ĥ as a 2 × 2 matrix
acting on these column vectors by matrix multiplication, the condition for being Hermitian is that (H T )∗ = H (we can
check this by a direct calculation, but we can think of it as coming from the fact that we are transposing and complex
conjugating ψ1 in the definition of the inner product.)
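Since the inner product is now just a finite sum, it is easy to play with numerically. Here is a minimal sketch (ours, not from the lecture) using numpy, with made-up sample states, showing both the conjugate-transpose inner product and the Hermiticity condition (Hᵀ)* = H:

```python
import numpy as np

# Two sample states of the two-point system, encoded as (alpha, beta) vectors
# (the specific numbers are made up for illustration).
psi1 = np.array([1.0, 1.0j]) / np.sqrt(2)
psi2 = np.array([1.0, 0.0], dtype=complex)

# The inner product (psi1, psi2): conjugate-transpose of psi1 times psi2.
print(np.vdot(psi1, psi2))            # alpha1* alpha2 + beta1* beta2

# The Hermiticity condition (H^T)* = H for a sample 2x2 Hamiltonian.
H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, -3.0]])
print(np.allclose(H, H.conj().T))     # True: H equals its conjugate transpose
```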
The power of this setup will now become clear as we try to classify all Hamiltonians for this two-state system. A
2 × 2 matrix equal to its conjugate transpose must have real entries on the diagonal, and the other two entries must
be complex conjugates of each other, so the most general Hamiltonian is
$$\hat{H} = \begin{bmatrix} a_0 + a_3 & a_1 - i a_2 \\ a_1 + i a_2 & a_0 - a_3 \end{bmatrix},$$
where a0 , a1 , a2 , a3 are real numbers. (We’ve chosen to represent the two diagonal entries in this way for reasons that
will become clear soon, but we can check that we can indeed pick a0 and a3 to get any real entries on the diagonal.)
We can then rewrite this matrix as
" # " # " # " #
1 0 0 1 0 −i 1 0
Ĥ = a0 + a1 + a2 + .
0 1 1 0 i 0 0 −1
The four matrices on the right-hand side are the four basic Hermitian 2 × 2 matrices – scaling them by real numbers
and adding them together still gives us a Hermitian matrix. In other words, the 2 × 2 Hermitian matrices form a
four-dimensional real vector space, spanned by these four basic matrices. They are so important that we give them names – the first
one is the 2 × 2 identity matrix, and the next three are called the Pauli matrices σ1 , σ2 , and σ3 , respectively.
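To make the span statement concrete, here is a small numpy check of ours (with arbitrary sample coefficients, and using the standard trace identity tr(σᵢσⱼ) = 2δᵢⱼ, which is not discussed in the lecture) that real combinations of these four matrices are Hermitian and that the coefficients can be recovered:

```python
import numpy as np

# The identity and the three Pauli matrices.
I  = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

# Any real coefficients (sample values below) give a Hermitian matrix.
a0, a1, a2, a3 = 1.5, 0.2, -0.7, 0.4
H = a0 * I + a1 * s1 + a2 * s2 + a3 * s3
print(np.allclose(H, H.conj().T))    # True

# Conversely, since tr(sigma_i sigma_j) = 2 delta_ij and the Paulis are
# traceless, the coefficients can be read off as a_i = tr(H sigma_i) / 2.
print(np.trace(H @ s3).real / 2)     # recovers a3 = 0.4
```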
If we want to write a Hamiltonian down for a physical system, though, we need Ĥ to have units of energy. And
there really isn’t a reason to use the identity matrix, because that’s like adding a constant to the Hamiltonian, which
doesn’t change the calculations or the states. (For example, we saw this in the harmonic oscillator, whose Hamiltonian
was $\hat{H} = \hbar\omega(\hat{N} + \frac{1}{2})$, but for which we could just work with N̂ instead.) Since ℏ times a frequency has units of energy, a natural choice is to write the Hamiltonian as
$$\hat{H} = \omega_1 \left(\frac{\hbar}{2}\sigma_1\right) + \omega_2 \left(\frac{\hbar}{2}\sigma_2\right) + \omega_3 \left(\frac{\hbar}{2}\sigma_3\right)$$
for real numbers ω1 , ω2 , ω3 with units of frequency. Then we see that the three terms in parentheses have units of angular momentum, so we may hope that there is something
to do with angular momentum in this setup! Specifically, it is reasonable to try defining the operators
$$\hat{S}_x = \frac{\hbar}{2}\sigma_1, \qquad \hat{S}_y = \frac{\hbar}{2}\sigma_2, \qquad \hat{S}_z = \frac{\hbar}{2}\sigma_3$$
as potential components of angular momentum. And in fact, the commutators here are easy to compute because we
have explicit matrices:
$$[\hat{S}_x, \hat{S}_y] = \left[\frac{\hbar}{2}\sigma_1, \frac{\hbar}{2}\sigma_2\right] = \frac{\hbar}{2}\cdot\frac{\hbar}{2}\left(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} - \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right)$$
$$= \frac{\hbar}{2}\cdot\frac{\hbar}{2}\left(\begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} - \begin{bmatrix} -i & 0 \\ 0 & i \end{bmatrix}\right) = \frac{\hbar}{2}\cdot\frac{\hbar}{2}\begin{bmatrix} 2i & 0 \\ 0 & -2i \end{bmatrix} = i\hbar\cdot\frac{\hbar}{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = i\hbar\,\frac{\hbar}{2}\sigma_3 = i\hbar\hat{S}_z.$$
So we have the same commutation relation as with the angular momentum operators L̂x , L̂y , L̂z from a few
lectures ago – not only do the Ŝ operators have the same units as angular momentum, but they also have the correct
commutators (we can check that [Ŝy , Ŝz ] = i ℏŜx and that [Ŝz , Ŝx ] = i ℏŜy as well). So angular momentum does not
actually require complicated operators involving r⃗ and p⃗ – we can construct it with just 2 × 2 matrices. And what
we’ve created is spin one-half, the angular momentum coming from having two discrete degrees of freedom – it’s
really a mathematical object, but the interpretation of “spin up” and “spin down” came from physicists.
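Since the claim is pure matrix algebra, it is easy to verify all three commutators numerically; the following is a minimal numpy sketch of ours (with ℏ set to 1 for convenience):

```python
import numpy as np

hbar = 1.0  # working in units where hbar = 1
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

# The angular momentum algebra, checked directly on the 2x2 matrices.
print(np.allclose(comm(Sx, Sy), 1j * hbar * Sz))   # [Sx, Sy] = i hbar Sz
print(np.allclose(comm(Sy, Sz), 1j * hbar * Sx))   # [Sy, Sz] = i hbar Sx
print(np.allclose(comm(Sz, Sx), 1j * hbar * Sy))   # [Sz, Sx] = i hbar Sy
```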
This system is the setting for a substantial fraction of the material of 8.05 – it turns out that the physical
interpretation takes some getting used to. Looking back at the Hamiltonian
$$\hat{H} = \omega_1 \hat{S}_x + \omega_2 \hat{S}_y + \omega_3 \hat{S}_z = \vec{\omega}\cdot\hat{\vec{S}},$$
where ω⃗ is a vector of real numbers (with units of frequency) and Ŝ = (Ŝx , Ŝy , Ŝz ), it turns out this Hamiltonian actually represents a spin
in a magnetic field, and solving the corresponding Schrodinger equation will reveal that the spin will precess in the
magnetic field – this is the origin of NMR (nuclear magnetic resonance), which is used to detect the density of fluids
in the body.
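Though the actual solution is left to 8.05, we can already preview the precession numerically. The sketch below is our own illustration (assuming, for concreteness, that ω⃗ points along z with magnitude omega, and setting ℏ = 1): evolving an equal superposition of |↑⟩ and |↓⟩ with the propagator e^{−iĤt/ℏ} makes ⟨Ŝx⟩ oscillate as cos(ωt), which is exactly precession about the z-axis.

```python
import numpy as np
from scipy.linalg import expm

hbar, omega = 1.0, 2.0    # assumed units (hbar = 1) and precession frequency
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)
H = omega * Sz            # spin Hamiltonian with omega-vector along z (assumed)

psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)   # (|up> + |down>)/sqrt(2)

# Evolve with the propagator e^{-iHt/hbar} and watch <Sx> precess.
for t in np.linspace(0.0, np.pi / omega, 5):
    psi_t = expm(-1j * H * t / hbar) @ psi0
    sx = np.vdot(psi_t, Sx @ psi_t).real
    print(f"t = {t:.3f}, <Sx> = {sx:+.3f}")   # traces (hbar/2) cos(omega t)
```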
Looking ahead to 8.05, we’ll conclude by mentioning some properties of the eigenstates of Ĥ. Recall that for
our previous angular momentum problem, we measured the eigenvalues of L⃗² and L̂z . Similarly, we'll measure the
eigenvalues of S⃗² and Ŝz here, where Ŝz represents the spin in the z-direction. Because Ŝz is the matrix
$$\hat{S}_z = \frac{\hbar}{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},$$
which is diagonal, its eigenvectors are
" # " #
1 0
|↑⟩ = , |↓⟩ =
0 1
ℏ
(here “up” and “down” arrows represent being in the upper or lower component), where the eigenvalues are 2 and − ℏ2 ,
respectively:
" #" # " # " #" # " #
ℏ 1 0 1 ℏ 1 ℏ ℏ 1 0 0 ℏ 0 ℏ
Ŝz |↑⟩ = = = |↑⟩ , Ŝz |↓⟩ = =− = − |↓⟩ .
2 0 −1 0 2 0 2 2 0 −1 1 2 1 2
And the reason for the name of “spin one-half” is that we have the factor of 2 in the eigenvalues and operators that
we’ve defined – it might look like that choice was arbitrary, but in fact it was necessary to make the angular momentum
commutators work out! In other words, two-state systems are forced to be spin one-half, and such systems must have
angular momentum measured to be either ℏ/2 or −ℏ/2. (In contrast, a photon is an example of a spin-one system, in
which the two directions of circular polarization correspond to a spin of ℏ or −ℏ.)
But now that we know what spins look like along the z-direction, we may also be curious about what spin states look
like along the x- or y -direction. It may look like we’re running out of states, because we only have a two-dimensional
space of possible states and there are three directions of spins that we’re trying to represent. And in fact, the z-spin-up
and z-spin-down states were already constructed to be orthogonal and to form a full basis of the vector space: any
state $\begin{bmatrix} a \\ b \end{bmatrix}$ is a superposition a |↑⟩ + b |↓⟩ of those two states. But that's not actually a problem: the operator $\hat{S}_x = \frac{\hbar}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
does have two different eigenvectors. Specifically, the (normalized) state
$$|{\uparrow}; x\rangle = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{2}}\left(|{\uparrow}\rangle + |{\downarrow}\rangle\right)$$
is an eigenstate of Ŝx with eigenvalue ℏ/2, formed by taking a superposition of |↑⟩ and |↓⟩! Similarly, the state
$|{\downarrow}; x\rangle = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{\sqrt{2}}(|{\uparrow}\rangle - |{\downarrow}\rangle)$ is an eigenstate of Ŝx with eigenvalue −ℏ/2.
Finally, if we turn our attention to the spin states along the y -direction, we will also be able to construct eigenvectors
of Ŝy . Specifically, the state
$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix} = \frac{1}{\sqrt{2}}\left(|{\uparrow}\rangle + i|{\downarrow}\rangle\right)$$
satisfies
$$\hat{S}_y\cdot\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix} = \frac{\hbar}{2}\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix} = \frac{\hbar}{2}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix},$$
meaning that we have an eigenstate |↑; y⟩ of Ŝy with eigenvalue ℏ/2, and a similar calculation shows that the state
$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -i \end{bmatrix} = \frac{1}{\sqrt{2}}(|{\uparrow}\rangle - i|{\downarrow}\rangle)$ is the eigenstate |↓; y⟩ with eigenvalue −ℏ/2. So complex numbers play an important role here – there would be no way
to get eigenstates along all three directions otherwise.
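All of these eigenvalues and eigenvectors can also be found in one shot with a numerical eigen-decomposition; here is a brief numpy check of ours (again with ℏ = 1):

```python
import numpy as np

hbar = 1.0
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]])

# eigh is for Hermitian matrices: real eigenvalues, orthonormal eigenvectors.
vals, vecs = np.linalg.eigh(Sy)
print(vals)         # [-0.5, 0.5]: the allowed measured values -hbar/2, +hbar/2
print(vecs[:, 1])   # eigenvector for +hbar/2: (1, i)/sqrt(2) up to overall phase
```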
Importantly, all of these eigenstates have nothing to do with our usual functions of x, r, θ, φ, and so on – spin is
its own thing with finitely many degrees of freedom and column vectors as wavefunctions. But within this system,
we have angular momentum, and this angular momentum is what we use to describe the spins of particles. And spin
systems will play a central role in 8.05 – they’ll be used to understand superposition, entanglement, Bell’s inequality,
and much more.