Modern Physics For Engineers v8
David A. B. Miller
Stanford University
Copyright © David A. B. Miller 2021
In memory of my father, Barclay Miller
Contents
Preface
Prologue
Introduction
1.1 Why modern physics for engineers?
1.2 The background to modern physics
1.3 The development of classical physics for engineering
1.4 Introduction to the main topics of this book
Epilogue
Index
Preface
The physical technologies we need for key applications of major societal impact, from solar energy to the
ways we sense, move, store and process information, are based on ideas and concepts beyond the world of
classical physics. A purely classical view gives us no model for the atoms and molecules of chemistry or
of material properties such as color or electrical conductivity. We cannot understand the simplest solar cell
or even the color and brightness of the sun. To move forward, we need core ideas from quantum mechanics
and statistical mechanics. The transistors, lasers, photodetectors, and memory devices that enable all of
information technology similarly rely on these same concepts. Without those ideas, we simply cannot
understand how this all works, and we cannot make the advances we want.
To be able to create and apply the emerging technology for these and other important societal problems,
and to understand the limits imposed by science on the various solutions, the next generation of engineers
and scientists needs an understanding and a working knowledge of ideas beyond the classical world. Unless
we commit seriously to studying physics itself, it can, however, be difficult for us to acquire that
understanding. Typically, we would need many classes just to cover the necessary range of concepts, and
we may not have the time (or inclination) for that. Even physics students may find it challenging to get an
economical yet substantive introduction to the necessary ideas.
One common approach beyond core classical ideas like mechanics and electromagnetism is to teach a
sequence including introductory modern physics, followed by deeper courses in quantum mechanics,
thermodynamics and statistical mechanics. For many engineers and for scientists outside of physics,
however, there may not be enough time available for such a comprehensive syllabus before we need to start
using these ideas.
I have had the opportunity to work in physics and engineering for many years, both inside and outside of
academia, and to teach across this spectrum to a wide range of students in engineering and science at
Stanford. This text, and the “Modern Physics for Engineers” course for which it has been developed, are
my attempt to solve this problem of introducing the core ideas of modern physics with enough rigor to give
a solid basis, but still in an efficient fashion. Indeed, I believe we can do this in a one-quarter or one-
semester course by carefully choosing the core ideas and finding the most economical yet coherent path
through them.
Obviously, we cannot go as deeply into many topics as a longer suite of courses could allow. By
teaching an introduction in our one coordinated approach, however, we have one major advantage: we can
teach the topics in the sequence that makes the most intellectual sense, saving ourselves considerable time
and effort. That sequence is different from the historical order in which these areas emerged and in which
they are sometimes taught.
Here we start with the ideas of modes from classical oscillators and waves. These ideas, which are
sometimes not covered sufficiently in introductory classical physics, are useful in themselves; more
importantly here, however, they set the intellectual ground for the concepts of quantum mechanics, with its
extensive use of the ideas of eigenstates.
Building on these mode and eigenstate concepts, we can then introduce quantum mechanics. After
presenting the key ideas of the quantum mechanical view of the world, we can proceed to atoms and
crystals. The understanding of quantum mechanics in crystals is directly useful for a wide range of
applications, and gives a first example of a quantum mechanical treatment of the properties of materials.
The emergence of states in quantum mechanics sets us up to discuss the ideas of the statistical occupation
of these states, which in turn allows us to introduce the ideas of entropy and the various statistical
distributions for occupations of the states. Then we can introduce some key ideas of thermodynamics,
though we can largely avoid having to introduce the rest of the bulk of that substantial subject at this point.
With this background in place, we can present some core applications in the physics of electronic devices.
After this, we give a first treatment of the quantum mechanics of light and its absorption and emission. This
enables a short introduction to optical and optoelectronic devices, including those used in optical
communications and solar cells. The goals in briefly introducing device applications are to give some
applied context to the basic scientific ideas, showing how those ideas impact real engineering, and to leave
students well prepared if they choose to take further device courses. These device applications also give
good examples to exercise the key ideas.
The ideas we introduce here really do allow us to understand much that a classical view simply cannot
explain, and they give us a sound basis of concepts for engineering modern technology in a wide range of
applications. Though we cannot teach everything in this text, I believe we can teach enough to give the
reader a strong basic understanding of how the modern world actually works. Indeed, we can see that there
is a relatively small set of physical processes that underlie the modern physics for a broad range of
applications. These ideas are also themselves intellectually fascinating and stimulating, and they change
the way that we think about the world.
I am pleased here to acknowledge the many contributions of Austin Brownlow, Christine Donnelly, Ching-
Ying Lu, Matthew Morea, and Marina Radulaski who worked with me in teaching this class as it was being
developed, and correcting many of my errors (though any remaining are entirely my fault!).
David Miller
Stanford, June 2021
Prologue
There are two pieces of good news about understanding the physical world around us. First, we largely
know how it works – we can write down some principles that let us describe essentially all the physical
behaviors we see, from the inside of atoms to the outside of the solar system. Second, there are not that
many principles we need to learn to do this. That might be a surprise, because there is certainly enough
science to fill many libraries. But a few concepts well understood can go a very long way.
There are real bounds to this conceit, of course. Complex systems still confound us; to take two examples,
it is hard for us to predict the weather because the calculations are too hard, and our brains still do not
understand how they themselves work. Nonetheless, we still likely presume that neither the behavior of the
weather nor the processes in our neurons are violating our physical principles. We should, of course, be
honest that there are aspects, especially of quantum mechanics, that puzzle us even though we understand
in a practical sense how to use them. And, we realize there is much we do not know when we head off to
cosmological scales or back to the big bang or down to the depths of elementary particles. Despite these
caveats, our understanding of some key physical ideas allows us to engineer much of the material world
and the energy, transportation, communication, and information that help us run it.
The challenge for understanding our physical world is that it does not actually work the way we might
think. The “classical” view of the simple mechanics of objects that we learn in elementary physics classes,
while useful in many ways, leads us to dead ends if we ask questions like why materials have chemical
properties, or why the sun is the color it is, or how to design a better transistor or solar cell. Unless you
have had the opportunity or even the luxury to study quite a lot of physics, you are unlikely to have been
able to “peer behind the curtain” to reveal what is going on. You will not have encountered many of the
principles we need, let alone see how they fit together into a coherent picture.
At this point, you may think hope is lost; you do not have the time (or perhaps the inclination) for that deep
and comprehensive study of half a dozen areas of intimidating physics. And, it is true, as it is for most
worthwhile endeavors, that some effort and persistence is going to be required here. But, there are three
other pieces of good news.
First, it turns out that much of the physics we need was discovered in the wrong order. That itself is not
good news, of course. But, by putting it together in the right order, we can save a lot of time while also
developing a coherent story.
Second, though we cannot and will not fully develop the various areas of physics we introduce, with just a
moderate amount of time and effort we can put together a coherent and compact core. We can progressively
grow the “trunk” of a tree of concepts here, even though we will only just be able to start some of the
branches that it can support.
Third, because the principles we learn are so economical yet universal, and often quite unexpected,
encountering them can truly be an exciting intellectual journey where we are drawn forward by successive
and quite remarkable revelations.
So, fortified by all this good news, we can set out on our adventure to find out how the world actually
works. I honestly believe you will find the journey fun and the destination worth the effort.
Introduction
1.1 Why modern physics for engineers?
It would be easy to imagine that engineering does not need modern physics. Is that not some esoteric
subject concerned only with elementary particles and strange aspects of quantum mechanics remote
from everyday existence? To be fair, we do need to be clear what we mean by “modern”. There certainly
is a “classical” physics of Newton’s Laws explaining mechanics and moving objects and Maxwell’s
equations describing electromagnetism, radio waves and many aspects of light, and we might add the
thermodynamics of steam engines and the like into that classical world. There is no doubt we have
constructed much useful engineering out of such classical ideas. We could draw the line between
classical and modern at roughly that point, which would put us somewhere about three-quarters of the
way through the 19th century. Is there really anything important for engineering that we would miss if
we stopped there?
Of course, the answer is quite certainly yes – we would be overlooking much of what makes our modern
world possible. Stopping at that point would mean we would understand nothing about the physical
origin of the properties of materials or of the actual processes of chemistry. We would have no model
of the physical basis of color and how light interacts with anything. We would have no idea how light
bulbs actually work, or why the sun gives off the amount of light it does, or even why the sun is the
color it is. Why some materials conduct electricity and others insulate would be a complete mystery.
Without the modern physics beyond that classical boundary at about 1870, we would have none of the
devices of the information age – no transistors, no lasers, no digital cameras, no optical fiber
communications, no televisions, no mobile phones, no computers, no display screens, no hard drives or
any other digital memory; we would not even have vacuum tubes. There would be no solar cells and no
LED (light-emitting diode) lighting to tackle our energy problems. Without the modern physics past our
boundary at ~ 1870, we would have little of what makes modern technology “modern”.
1.2 The background to modern physics
Matter
It may seem odd to us now, but there was a time before our ancestors seriously considered the idea that
everything was made of something. In that simpler view, a rock is just a rock, a tree is just a tree, water
is just water. Perhaps in those times we just had mythological explanations of how things came to be.
Now our elementary science education teaches us that all the matter we see in our everyday lives is
made up from atoms, which are themselves made from electrons, protons and neutrons. But, that is a
very modern notion, one that did not fully coalesce until several decades into the 20th century.
We can see the progression of thought towards a scientific view in the thinking of the ancient Greek
philosophers. The idea that matter is made of something dates back at least to Thales of Miletus (c. 620
BCE – c. 546 BCE). Empedocles (c. 490 – c. 430 BCE) proposed the idea of earth, air, fire and water
as being the four elemental substances that made up matter, an approach later also favored by Aristotle
(384 – 322 BCE). Though Democritus (c. 460 – c. 370 BCE) and/or his teacher Leucippus proposed that
matter was composed of indivisible “atoms”1, the “four elements” idea would remain very influential
for nearly 2000 years. (Today, we would recognize these ideas as being closer to a statement of the
different phases of matter – solid (earth), liquid (water), gas (air), and plasma (fire) – rather than matter’s
constituents.)
This “four elements” approach formed the basis of alchemy, the precursor of modern chemistry. There
were some notable additions such as the elements sulphur and mercury (or at least the philosophical
essences of dry exhalations (sulphur) and moist exhalations (mercury)); these were discussed, together
with other chemical ideas, in works attributed to Jabir2 in about the 700’s CE. The “four elements” idea
was also not universally accepted during this time, being disputed by some, such as Avicenna3 (c. 980 –
1037).
The period from about 1500 – 1660 was a transitional one from alchemical ideas towards a modern
atomist approach, and to the use of what we would call the scientific method. The ideas of the scientific
method, involving a cycle of hypotheses, experiments to test them, modified hypotheses, and so on, have
roots extending back to the deductive methods of Parmenides (c. 515 – c. 460 BCE), Leucippus and
Democritus, and through scientific experiments such as those by Alhazen (Ibn al-Haytham)4 (c. 965 –
c. 1040). What we would recognize as modern ideas of scientific method were discussed by Francis
Bacon, René Descartes and Galileo in the early 1600’s.
By the time we get to Robert Boyle (1627–1691), we find a rigorous experimental approach towards
chemistry. In 1661, in “The Sceptical Chymist” he argued towards modern ideas that we would think of
as atoms, molecules and chemical reactions. He also argued for chemistry to be regarded as what we
would call a science, distancing it from some of the philosophical aspects of alchemy.
The subsequent period through the 1700’s saw progressive advances in chemistry, with identification of
various elements, the emergence of some modern chemical nomenclature, quantitative understanding of
reactions, and the idea of conservation of mass in chemical reactions. Such ideas are found, for example,
in Antoine Lavoisier’s classic “Traité Élémentaire de Chimie”, a key chemistry text published in 1789.
With the observation that compounds are made of definite proportions of their constituents, first by
Joseph Louis Proust in 1799, and then extended by John Dalton to show integer ratios5 in different
compounds, the ground was prepared for a clear atomic theory as proposed by Dalton in the first decade
of the 1800’s. Though the atomic theory was not fully accepted for about a further 100 years, it proved
a very successful basis for chemistry. In 1869 Dmitri Mendeleev published his first periodic table of the
elements, of which 66 were known by that time, organized by their relative atomic masses and by their
similarity of properties.
1 “Atom” is from the Greek, meaning “cannot be cut”.
2 Abu Mūsā Jābir ibn Hayyān, fl. c. 721 – c. 815, also known as Geber
3 Ibn-Sīnā, full name Abū ʿAlī al-Ḥusayn ibn ʿAbd Allāh ibn Al-Hasan ibn Ali ibn Sīnā
4 Abū ʿAlī al-Ḥasan ibn al-Ḥasan ibn al-Haytham
5 We would now refer to such integer ratios as “stoichiometric” or “stoichiometric numbers”.
But we had no idea of what atoms were, what the actual differences between atoms were, or why they
had their specific chemical properties. The full understanding of this atomic view would take at least
another 50 years or so, until the creation of modern quantum mechanics.
Laws of motion
Early ideas from Aristotle’s time held that, if you did not keep pushing something, it would stop moving.
The apparent continuous movement of the “heavens” – the stars, the planets, the sun and the moon –
required some different approach, such as saying they lay on perfect celestial spheres centered round
the earth. Ptolemy (c. 90 – c. 168 CE) put together a model on such a basis, and was able to calculate
astronomical tables. At that time, and even well into the early modern era, because astrology was
regarded with some seriousness for predicting events, such tables were considered important. Astrology
and astronomy were not really separate subjects, and arguably astrology essentially funded accurate
astronomical measurements. Of course, astronomy based on an earth-centered view becomes quite
complicated, especially for predicting the paths of planets.
Nicolaus Copernicus (1473 – 1543) realized that the whole subject became much simpler if we viewed
the earth and the planets as orbiting the sun, and he published this rather heretical “heliocentric” view
just before his death. Copernicus’s realization is often cited as the start of modern science, based on
observation and hypothesis that should agree with one another. Other astronomers, notably Tycho Brahe
(1546 –1601), followed with further accurate measurements. Using Brahe’s data, Johannes Kepler (1571
–1630) deduced rules (Kepler’s Laws of Planetary Motion, published in 1609 and 1619) for the elliptical
orbits of planets (an improvement on Copernicus’s circular orbits) round the sun, which is then
positioned at one of the foci of the ellipses. The dynamics of earth-bound motion meanwhile was
receiving more attention. Galileo made his famous observations that a pendulum swung with essentially
the same period regardless of the amplitude of its oscillation, argued that the velocity of falling bodies was
not proportional to their weight, and advocated a law of inertia whereby bodies kept on moving even if
they were not explicitly pushed. (This law, with some refinements, essentially later turns into Newton’s
First Law of Motion.)
Then Isaac Newton (1642 –1726/7) made the proposal in 1687 of his three Laws of Motion and his
gravitational theory, which together were able to explain Kepler’s Laws. (Along the way, he also created
calculus to construct his theory.) This was a remarkable success, of course, that arguably set up modern
science as a search for rigorous, predictive laws. It can also be argued that it represents a break point
from the previous approaches to explanations of the natural world. Prior to Newton’s proposal, it was
common to argue that there must be some harmonious plan behind nature, and the behavior we see
should follow from that plan6. It would be common at that time to regard physical theories with suspicion
if they did not correspond to such philosophical plans. Possibly we should even be able to deduce the
plan on the basis of pure logic, without making measurements. Newton’s attitude was apparently that it
was not necessary to have such a plan for the theory to be correct.
Newton’s mechanics remained the foundation for the dynamics of motion for over 200 years. It is still
the method we turn to first for most such calculations in engineering or science today, at least for objects
large enough to see with the naked eye or through microscopes and traveling at velocities much less
than the speed of light. More mathematically sophisticated versions were developed in the 1700’s and
1800’s, notably the approach by Joseph-Louis Lagrange (1736–1813) in 1788, and extended by William
Hamilton (1805–1865) in 1833; with such approaches, we came to view dynamics more in terms of
conservation laws, like conservation of energy and of momentum, but the basis of our dynamics
remained Newton’s Laws at least till the 20th century.
6 Even Kepler, for example, initially considered the fact that there are only five “platonic solids” (tetrahedron, cube, octahedron, icosahedron, and dodecahedron) to be compatible with the fact that there were (apparently at his time) only six planets, with those solids occupying the spaces between the planets, and he continued to look for “harmonious relations” in planetary behaviors throughout his career.
The theories of relativity introduced by Albert Einstein (the special theory, introduced in 1905 and the
general theory, introduced in 1916, which is also the modern theory of gravitation) formally superseded
Newton’s models of classical dynamics, though Newton’s approaches remain practically useful and
valid in most everyday engineering situations for macroscopic objects.
7 Such an approach does, however, have difficulty explaining why we cannot see in the dark!
8 Abu Saʿd al-ʿAlaʾ ibn Sahl
9 Refraction is the process of the “bending” or change of direction of light rays when they enter materials of different refractive index, and refractive index is the factor by which light travels more slowly in a material compared to its speed in a vacuum.
10 Diffraction is the “breaking up” of light that we now understand in terms of interference effects of light waves as they encounter edges, small apertures, periodic structures like gratings, or structures that vary on a scale comparable to the wavelength of the light, such as holograms.
11 Objects can luminesce, giving off light, for example when they are hot, and objects can also fluoresce, in which shining light on them at one wavelength (usually a shorter one) leads to light being emitted at another wavelength (usually a longer one). Though fluorescent effects are quite common in certain modern man-made materials, such as some dyes, strong fluorescent effects are relatively rare in nature. Modern non-linear optical techniques can also generate new wavelengths of light from materials, but those typically require the high intensities of lasers to demonstrate them.
12 Huygens’ principle states that we can find the location of the next wave front by considering a set of sources of spherical waves on the current wave front. With some amendments, mostly to get rid of the backwards waves that would result from a literal use of this approach, this becomes quite a good model for wave propagation, including diffraction effects. For a modern version, see D. A. B. Miller, “Huygens’s wave propagation principle corrected” Optics Letters 16, 1370-1372 (1991).
13 This approach also allows us to measure the wavelength without having any measuring device on a scale as small as the wavelength.
Light
Augustin-Jean Fresnel developed his wave theory of light and diffraction (1821), and with François Arago he determined that light was a
“transverse” wave, with polarizations (directions of the electric field vector that are perpendicular to the
direction of propagation). Hippolyte Fizeau made the first time-of-flight measurements of the velocity
of light in 1849, obtaining an answer within 5% of the modern accepted value. At this point in the mid
1800’s, though, there was still no clear connection between light and electromagnetism.
Electromagnetism
The existence of electrostatics and magnetism has been known for a long time; both may have been
known to Thales in the 6th century BCE, for example. The ancient Greeks were aware that rubbing amber
with fur could lead to attraction between the two, which we now understand as electrostatic attraction.
Lodestone (naturally occurring magnetite) was also known to attract iron objects, as apparently
recorded in the 4th century BCE writings of Wang Xu. The Chinese scientist Shen Kuo (1031 – 1095) later
wrote about the magnetic needle compass; writings around 1111 – 1117 by Zhu Yu document its use
for navigation. In 1600, William Gilbert concluded that the Earth was magnetic, and that this was the
source of the action of magnetic compasses.
The invention of the Leyden jar capacitor (apparently independently by both Pieter van Musschenbroek
of Leyden and by Ewald von Kleist in 1745) allowed electrical charge, for example from frictional
generation, to be accumulated and stored. Benjamin Franklin flew a kite into a thunderstorm in 1752,
and captured the resulting charge from the lightning in a Leyden jar, establishing the link between
lightning and electricity. The inverse square law of electrostatics was introduced by Charles Coulomb
in 1785. A major discovery and substantial practical step was the creation, by Alessandro Volta in 1799,
of the voltaic cell and the battery, using copper or silver discs separated from zinc discs in brine (salt
water). This allowed much more convenient experiments using electrical currents and voltages.
Up to this point, electrostatics and magnetism were largely separate phenomena scientifically, but Hans
Christian Ørsted in 1820 noticed that a compass needle could be deflected by passing a current through
a wire, noting too that the direction of deflection depended on the direction of the current. These
observations essentially started the field of electromagnetism. Following this observation, in the same
year of 1820, André-Marie Ampère showed that passing a current through a coil of wire caused it to
behave like a magnet, and he developed the theory of the magnetic attraction and repulsion of current-
carrying wires.
Michael Faraday observed electromagnetic induction in 183114. Turning on and off a current in one coil
of wire can induce a current in another coil, and moving a magnet through a coil of wire can also
similarly induce a current. Hence this Law of Induction shows that changing magnetic fields can create
electric fields.
Maxwell’s equations
In 1865, James Clerk Maxwell took one more theoretical step, proposing that just as changing magnetic
fields could produce electric fields, so also changing electric fields would produce magnetic ones. (The
effect of these changing electric fields can be characterized as an effective current, the “displacement
current”, which can then be regarded as generating the magnetic field.) With this final step, he was able
to synthesize essentially all of electromagnetism in his equations. With changing magnetic fields giving
electric fields and changing electric fields giving magnetic ones, these equations predict wave motion.
The velocity of that wave motion as calculated from Maxwell’s equations essentially agreed with the
measured velocity of light, leading to the proposal that light was electromagnetic radiation. Maxwell’s
equations stand to this day as one of the most remarkable achievements of theoretical physics.
14 Some aspects of this may have been anticipated also by Francesco Zantedeschi in 1829 and 1830.
Thermodynamics
Fire has been used since prehistoric times for cooking, heating, and possibly also for clearing land for
agriculture. Hero of Alexandria (c. 10 – c. 70 CE) described a simple steam engine, the aeolipile, based
on jets of steam from water inside a heated sphere, making this simple steam turbine possibly the first
“heat engine”, and he and others may have understood the expansion of air when it is heated. By the
early 1600’s several scientists15 were apparently exploiting the idea of a tube closed at one end and with
its other end in water as a thermometer. The tube is partially filled with air, and the level of water in the
tube varies as the air expands and contracts with changing temperature, an idea that may have been
known to Philo of Byzantium (ca. 280 – ca. 220 BCE). Simple steam turbine engines based on steam
pushing vanes of some kind were described in 1551 by Taqi al-Din16, in 1629 by Giovanni Branca, and
by John Wilkins in 1648.
The growth of mining required a method to drain the mines of water. Thomas Savery invented the first
commercially-used steam engine in 1698, a steam-powered pump. Thomas Newcomen’s engine of 1712
employed a piston in a cylinder. The cylinder was filled with steam (not at high pressure), then sealed
off and the steam condensed by cooling it with a spray of cold water, so atmospheric pressure would
push the piston (from the outside) to give the engine action. James Watt in 1774 invented the separate
condenser steam engine, which greatly improved the efficiency. The importance of efficient steam
engines for powering the industrial revolution gave a strong practical need for a deeper understanding
of thermodynamics.
A reliable temperature scale and thermometer based on the expansion of mercury was developed by
Daniel Fahrenheit in 1724. Joseph Black introduced the idea of latent heat17 – the heat required to change
the phase (e.g., from solid to liquid or from liquid to gas) – in 176218. The ice-calorimeter used by
Lavoisier in 1782-83 to measure the heat generated from various chemical reactions was based on
Black’s idea of latent heat. With meaningful thermometers and calorimeters, thermodynamics could
start to become a quantitative science.
Sadi Carnot19 in 1824 considered the question of the efficiency of heat engines like the steam engine,
and introduced the idea of an ideal engine and its (“Carnot”) cycle of operation, an engine presumed to
be otherwise as good as possible, i.e., having no friction and no conduction of heat other than through
the working “fluid” of the device. He proposed that only the difference in the temperature of the “hot”
and “cold” reservoirs mattered in such an ideal engine, not the specific “fluid”. At that time, the nature
of heat was still not clear, being typically viewed in terms of “caloric” – a supposed fluid that flowed
from hot objects to cold objects; this “caloric” was not yet closely identified with energy as we
understand it.
A major challenge to the caloric theory was the observation by Benjamin Thompson (Count Rumford)
in 1797 that heat could be generated by friction; specifically, by immersing a cannon barrel in water and
then attempting to bore it out with a blunt tool, he could boil the water. Hence, mechanical energy could
be converted to heat. Using a falling weight to drive a paddle immersed in water, in 1845 James Joule
deduced a consistent specific heat for water (the amount of energy required to heat a known mass of
water by a known temperature)20. This led to the proposal that in fact heat was energy, and the energy
was conserved overall in thermal phenomena (which becomes the First Law of Thermodynamics).
15 Possibly Cornelis Drebbel, Robert Fludd, Galileo and Santorio Santorio.
16 Taqi ad-Din Muhammad ibn Ma'ruf ash-Shami al-Asadi (1526–1585)
17 Since this process occurs at a constant temperature, this heat is hidden – latent – from a thermometer.
18 Joseph Black, who also discovered carbon dioxide, knew James Watt well, and invested in his company. Both men were very well acquainted with James Hutton, arguably the first modern geologist, and Adam Smith, the progenitor of modern economics. Black, Hutton and Smith were founding members (Fellows) of the Royal Society of Edinburgh in 1783, and Hutton successfully proposed Watt for Fellowship in 1784.
19 Nicolas Léonard Sadi Carnot (1796-1832), often known just as Sadi Carnot
Two other key ideas that emerge from these advances in thermodynamics are entropy and the Second
Law of Thermodynamics. The idea of entropy is usually credited to Rudolf Clausius from his work in
the 1850’s and 1860’s. Both Clausius (in 1854) and William Thomson (Lord Kelvin) (in 1851) gave
statements of the Second Law. The entropy change ∆S resulting from a flow of heat (energy) ∆Q into a
system at temperature T in thermodynamics can at least be defined through the simple equation

∆S = ∆Q/T  (1.1)
Here T is expressed relative to absolute zero temperature (a notion introduced by Kelvin in 1848). There
are many statements of the Second Law of Thermodynamics, but the key notion, stated somewhat
informally, is that, in some isolated system, total entropy cannot decrease. This principle puts a limit on
the efficiency of heat engines, for example.
The idea of entropy is one of the subtler ones in physics. Just what entropy means and where the core
ideas come from becomes much clearer once we make the change to thinking about kinetic theory – the
idea that heat is really the energy stored in the random motions of atoms or other entities. The branch of
physics in which we can talk clearly about these ideas is statistical mechanics, which we can regard as
being started by James Clerk Maxwell in 1871 and by a particularly important paper by Ludwig Boltzmann
(1844 – 1906) in 1875, with major contributions by Josiah Gibbs (1839 – 1903).
20 Similar ideas were being proposed at the same time by Julius von Mayer and Ludwig Colding, though Joule’s experiments may have been more precise.
to very different concepts of modern physics, ones that enabled completely new technologies that
transformed our world yet again.
21 Photons are the quantum mechanical particles that make up light.
1 We should state up front, though, that quantum mechanics cannot be “derived” from classical mechanics; quantum mechanics simply has to be postulated. Its ultimate justification is that it agrees with experiments, and you can be sure that it does agree very well with those experiments and that it indeed explains many things we did not previously understand.
Mass is what we call a scalar – it is described simply by a number, here with “dimensions” of kilograms
(kg) in the usual SI units. Velocity is a vector quantity that has both a magnitude2 – a number, usually
taken to be positive, and here in units of meters per second (m/s) – and a direction. As is common, we
indicate vector quantities with a bold, non-italic font. Because momentum is a product of a number and
a vector, it is a vector quantity as well.
The kinetic energy of a mass is the energy associated with its motion. From elementary classical
mechanics, kinetic energy can be written

K.E. = (1/2) mv²  (2.2)

where v is the magnitude of the velocity. In more advanced treatments of classical mechanics, and as
we make the transition towards quantum mechanics, it can be useful to write this in the equivalent form

K.E. = p²/2m  (2.3)

where by p² we understand the vector dot product of p with itself, that is

p² = p ⋅ p  (2.4)

We can understand v² in the same way as being v ⋅ v.
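As a quick numerical check (a sketch with invented numbers), Eqs. (2.2) – (2.4) give the same kinetic energy whether we start from the velocity vector or from the momentum vector p = mv:

```python
import numpy as np

m = 2.0                               # mass (kg), an invented value
v = np.array([3.0, -1.0, 2.0])        # velocity vector (m/s), invented
p = m * v                             # momentum vector p = mv

ke_from_v = 0.5 * m * np.dot(v, v)    # Eq. (2.2) with v^2 = v . v
ke_from_p = np.dot(p, p) / (2.0 * m)  # Eq. (2.3) with p^2 = p . p, Eq. (2.4)

print(ke_from_v, ke_from_p)           # both 14.0 J: the two forms agree
```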
Potential energy
Potential energy is defined as “energy due to position”. It is often denoted by the letter V. This choice
of notation is particularly common in quantum mechanics. This V should not be confused with voltage,
even if the potential energies we are working with are often electrostatic in origin. Often we call such
potential energies just “potentials” for short. So this potential energy has units of joules, just like any
other energy. (Voltage is the (electrostatic) potential energy per coulomb of charge, and so has units of
joules per coulomb.)
A key idea in potential energy is that an object will have the same potential energy at some point in
space no matter what path it took to get there. Since we can write a position as a vector r relative to
some position origin, we can write a potential3 as V ( r ) – that is, we write potential energy as a function
of position. Fig. 2.1 shows a potential “hill” or surface. The potential energy of some object at the point
given by the dot is only a function of the position of the dot. Going round the closed path shown by the
arrows back to where we started would make no difference to the potential energy.
Associated with some potential energy is a “field”. By field here we mean a force that is in general also
a function of position, and we will discuss how to relate the two below. Classical “fields” with this
property that “the energy associated with position depends only on the position and not how we got
there” are called “conservative” or “irrotational”. Equivalently, the change in potential energy round
any closed path is zero.
2 Often, we loosely refer to the magnitude of a vector as its “length”, though that “length” of a vector in this loose sense might not be a distance in ordinary space; the “length” of a velocity vector would be a quantity in units of meters per second, not meters, for example.
3 Note that, despite the confusion it generates, it is common, especially in quantum mechanics, to use the letter V to refer to potential energy, so a number of joules, even though we also commonly use V to refer to electrical voltage, which is joules/coulomb.
Fig. 2.1. Sketch of a potential surface. Larger potential energy corresponds to a larger value on
the vertical axis.
Not all fields are conservative; for example, going round a vortex typically either requires energy to
push you round it or you get energy out of it by being pushed round by it – in either case, you have a
different energy when you get back to where you started. Various phenomena associated with changing
magnetic fields are also non-conservative – you can change energy by going round a loop back to where
you started. Gravitational and static electric fields are conservative, however.
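We can make this closed-path test concrete with a small numerical sketch (the two force fields below are invented examples): summing F ⋅ dl around a closed loop gives essentially zero for a conservative field, but not for a vortex-like one.

```python
import numpy as np

# Sum F . dl around a closed unit-circle path for two invented 2D force fields.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
x, y = np.cos(theta), np.sin(theta)         # points on the closed path
dx, dy = np.gradient(x), np.gradient(y)     # small steps dl = (dx, dy)

# Conservative field: F = -grad V with V = x^2 + y^2, so F = (-2x, -2y)
W_conservative = np.sum(-2.0 * x * dx - 2.0 * y * dy)

# Vortex-like field: F = (-y, x), which pushes round and round the origin
W_vortex = np.sum(-y * dx + x * dy)

print(W_conservative)    # ~ 0: a conservative field does no net work round the loop
print(W_vortex)          # ~ 2 pi: a non-conservative field changes the energy
```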
Note, incidentally, that the “zero” or “origin” we use for potential energy is always arbitrary. We can
choose it to be what we want as long as we are consistent. In both classical mechanics and quantum
mechanics, there is no absolute origin for potential energy and we only really work with differences in
potential energy between one position and another. We do, however, typically make some specific
choice of origin in practice, at least to make it easy to do the algebra in the problem of interest.
Fig. 2.2. A ball on an inclined plane, being pushed by a force of magnitude Fpush x through a
distance ∆x along the direction of the plane as shown.
In the limit as we take very small displacements, that is, as ∆x → dx and ∆V → dV, we can write

Fpush x = dV/dx  (2.9)

Now this is the “uphill” force we have to exert to push the ball up the “hill”. So the force Fx being
exerted on the ball by the potential can be viewed as being a downhill force. So the relation between
force and potential in this case is

Fx = −dV/dx  (2.10)

In general, the coordinate direction x will not necessarily line up like this along the direction of steepest
change of the potential. For completeness, we can give the general relation between force and potential,
with force again as a vector, and using the “gradient operator” ∇, which we can define as we use it
below,

F = −∇V ≡ −( (∂V/∂x) i + (∂V/∂y) j + (∂V/∂z) k )  (2.11)
where i, j, and k are unit vectors in the x, y, and z directions respectively. For example, for a mass at
some position above the Earth’s surface, the gravitational force would be a vector pointing in the
direction towards the center of the Earth; that force would technically be minus the gradient of the
gravitational potential energy for that mass.
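As a short symbolic check of Eq. (2.11), using an invented example potential rather than one from the text, we can compute F = −∇V component by component:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# An invented example potential energy: a parabolic "bowl" in x and y
# plus a uniform "gravity-like" term in z (all in arbitrary units).
V = sp.Rational(1, 2) * (x**2 + y**2) + 10 * z

# Eq. (2.11): F = -grad V, taking minus the partial derivative along each axis
F = [-sp.diff(V, var) for var in (x, y, z)]
print(F)   # [-x, -y, -10]: the force points "downhill" everywhere
```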
Problems
2.2.1 An electron with the charge –e (e ≈ 1.602 × 10⁻¹⁹ C represents the elementary charge unit) is
being accelerated by the electric field that is (minus) the gradient of some electric potential U, which is
a function of position in the usual Cartesian coordinates x, y, and z. (Note: an electric potential
can be the same thing as a voltage, and can be written in units of joules/coulomb (i.e., J/C).)
a) Calculate the magnitude of the force driving an electron in the electric potential U = (1x +
3y – 1.5z) J/e, where x, y, z represent Cartesian coordinates.
[Notes: We are giving the electric potential energy in these slightly unusual units of
joules/electron to make the calculation easier for you. The corresponding potential in volts
would be (1x + 3y – 1.5z)/e V (or J/C). The potential energy in joules for an electron in
this electric potential is simply (1x + 3y – 1.5z) J. This question is an exercise in using the
gradient operator as in Eq. 2.11. The force on this electron is simply (minus) the gradient of this
potential energy expression (1x + 3y – 1.5z). That result is a vector, and the magnitude of that
vector is in newtons. Remember to use Pythagoras’s theorem to evaluate the magnitude of
a vector. So for some vector f = ai + bj + ck, the “length” or magnitude of the vector is
|f| = √(a² + b² + c²).]
b) Which electric potential would impose a stronger magnitude of force on the electron, U1 =
2x or U2 = 1x + 1y?
2.2.2 The force of gravity on a mass m is proportional to the mass. Near the surface of the Earth, it is
a vertical force. Because of this proportionality, we can write the magnitude of this force as
F = mg where, near the surface of the Earth (the “ground”), the proportionality constant
g ≈ 9.8 m s⁻² is typically referred to as the acceleration due to gravity. Here, we will consider the
upward vertical direction to be positive.
(a) Suppose we have a mass m = 2 kg at some small distance above the ground. What is the
force, in newtons, in the vertical direction on this mass due to gravity? (Be careful about
the sign of your answer, and explain briefly why you choose whatever sign you choose.)
(b) Suppose we choose the zero for potential energy to be at ground level, and that the mass m
is 2 m above the ground. (To be more precise, we could say that the center of gravity of
the mass is 2 m above the ground.) What is the potential energy of the mass m? (Justify
your answer briefly.)
Consider now the figure below of an inclined plane. The top of the plane is at a height h and the
bottom of the plane is at a zero height.
(c) Suppose a small mass M is sitting right at the top of the inclined plane (initially motionless).
Suppose also that we choose the zero for potential energy to be at ground level. Give an
expression for the potential energy of the mass M, explaining briefly how you get this
expression. [Hint: you can think of slowly lifting the mass up from the ground.]
(d) Suppose the mass now slides frictionlessly down the plane. Just before it gets to the very
bottom of the plane,
(i) what is its kinetic energy? (Justify your answer briefly.) [Hint: think about
conservation of energy.]
(ii) what is its potential energy? (Justify your answer briefly.)
(iii) what is its total energy (kinetic plus potential)?
(e) Suppose we now choose our zero of potential energy to be at the top of the inclined plane,
and we again let the mass M slide frictionlessly down the plane. Just before it gets to the
very bottom of the inclined plane
(i) what is its kinetic energy? (Justify your answer briefly.)
(ii) what is its potential energy? (Justify your answer briefly.)
(iii) what is its total energy (kinetic plus potential)?
(f) Continuing on from (e) (still choosing the top of the inclined plane as the zero of potential
energy), suppose now the mass M has reached the bottom of the inclined plane, and after
some time it comes to a stop (e.g., it might have slid along the ground somewhat, with
friction bringing it to a stop) at ground level.
(i) what is its kinetic energy? (Justify your answer briefly.)
2.3 Modes
Examples of modes
The reader might ask “What is a mode?” Despite the fact that modes are very common in physics and
engineering, it is difficult to find a broad definition of modes in textbooks. Most (and possibly all!)
scientists and engineers who would insist that they know “perfectly well” what a mode is would be hard
pressed to give a simple definition that was not rather mathematical. There is a precise mathematical
answer, but it gives little direct physical insight, so we postpone it. Here we will discuss a few examples
to get across the concepts of modes first.
We are quite used to the idea of modes in things that oscillate. In this case, the concept of a mode is
essentially the same as that of a “resonance”, a preference to oscillate at a specific “resonant” frequency.
Many such oscillators have only one way or mode in which they can oscillate (at least for small
amplitudes of oscillation), and that way of oscillating is associated with only one frequency. Examples
of this kind include a pendulum (at least if it is constrained to move in only one direction, as it usually
is in a clock), and a mass on a spring (again if it is constrained to move in only one direction). You may
have played a note on a bottle by blowing across the top; in this case, the oscillation is basically the air
in the neck of the bottle behaving like a mass that is bouncing up and down on a spring given by the
compression of the volume of air in the body of the bottle. This kind of acoustic resonator is known
as a Helmholtz resonator. The body of a guitar together with the hole in the front of it also forms a
Helmholtz resonator4, and the Helmholtz resonator concept is extensively used in loudspeaker design,
especially for boosting the bass response of loudspeakers. (It allows low frequency resonances even
with resonator volumes that are much smaller than the wavelength of sound at such frequencies.)
4 In the case of the guitar, an effective “plug” of air in and around the hole behaves as the mass that bounces in and out against the “spring” that is the air inside the body of the guitar.
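To put rough numbers on this mass-and-spring picture, here is a small numerical sketch using the standard Helmholtz resonator frequency formula, f = (c/2π)√(A/(VL)), quoted here without derivation and with invented bottle dimensions; c is the speed of sound, A and L are the cross-sectional area and effective length of the neck, and V is the volume of air in the body.

```python
import math

# Rough sketch of a bottle as a Helmholtz resonator; dimensions are invented.
c = 343.0                  # speed of sound in air (m/s)
r = 0.01                   # neck radius (m)
A = math.pi * r**2         # neck cross-sectional area (m^2)
L = 0.05                   # effective neck length (m)
V = 0.75e-3                # air volume in the body (m^3), i.e., 750 ml

f = (c / (2.0 * math.pi)) * math.sqrt(A / (V * L))
print(f)                   # about 158 Hz: a low note from a small bottle
```

The corresponding wavelength c/f is over two meters, far larger than the bottle itself, which is exactly the point made above.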
Some other oscillating systems have many different resonances or modes. A guitar string has a
fundamental mode, but also has harmonics that can be excited, at twice the fundamental frequency,
three times the fundamental frequency, and so on. The same is true for wind instruments such as the
flute. Bodies that are essentially more two dimensional, such as a gong or a cymbal, have a much more
complicated spectrum of resonances (which are in general not in integer ratios of frequencies), and three
dimensional structures, such as a bell or a girder bridge, can have yet more complicated sets of possible
resonances or modes. There are also electromagnetic resonators, basically metal cavities, that have
various resonant modes, and there are optical resonant cavities of various kinds, formed with mirrors,
that also similarly have multiple resonant modes.
In all these cases of oscillating modes or resonances, there are at least two common features. One is that
each resonance or mode corresponds to quite a distinct way in which the object oscillates; the pattern of
oscillation of the object is quite unique to a given mode, being quite different from the pattern of any
other mode. As we will see later, the way in which the patterns of oscillation are different from one
another can be quite specific mathematically; in a broad range of situations, the different patterns of
oscillation are orthogonal, and we will be introducing that concept as we describe modes.
A second feature of oscillating modes is that, for each such resonance or mode, there is one well-defined
numerical quantity, usually the frequency, associated with it. A third important point about all such
modes is that, at least if we take a “loss-less” and “small amplitude” idealization of the mode, once
excited, the oscillation would stay of exactly the same form forever, and everything that was oscillating
would be oscillating at the same frequency in that mode.
Another kind of mode, called a “propagating mode”, arises in the propagation of waves, especially in
waveguides. The essential notion is that the wave stays the same shape as it moves; every point on the
wave then propagates with the same “phase velocity”.
To proceed further in understanding modes, we need to analyze a few cases. That will begin to uncover
the important mathematical properties and lead to a tighter definition of what a mode is.
5 Though quantities like force, velocity, and acceleration are all generally vectors, because we are only dealing with motion in one direction – here the y direction – we can treat them effectively like scalars, assuming the y vector direction in all cases.
2.4 Analysis of systems with modes
Fig. 2.3. A mass on a spring, without friction, oscillating up and down, here in the y direction.
For the mass on the spring shown in Fig. 2.3, the first equation we need describes the spring itself: for a
displacement y of the mass from its equilibrium position, the spring pulls back with a restoring force

F = −Ky  (2.12)

where K is the spring constant. The second equation that we need here comes from Newton’s Second Law, which gives us
F = Ma (2.13)
where a is the acceleration of the mass in the y direction. The velocity v in the y direction is, by definition
v = dy/dt  (2.14)
which is the rate of change of position, y, with respect to time, t. Acceleration is the rate of change of
velocity, so here
a ≡ dv/dt ≡ (d/dt)(dy/dt) ≡ d²y/dt²  (2.15)
So, substituting in Eq. (2.13) gives
F = Ma = M d²y/dt²  (2.16)
Substituting using Eq. (2.12) for F gives
F = Ma = M d²y/dt² = −Ky  (2.17)
Defining
ω² = K/M  (2.18)
or, equivalently,
ω = √(K/M)  (2.19)
where we take the positive square root, and rearranging Eq. (2.17) gives
d²y/dt² = −(K/M) y = −ω² y  (2.20)
Mathematically, one solution to an equation like this is
y = sin ωt (2.21)
which is easy to check by substitution. Explicitly
(d/dt) sin ωt = ω cos ωt  (2.22)
and
(d²/dt²) sin ωt = (d/dt)(ω cos ωt) = ω (d/dt) cos ωt = ω(−ω sin ωt) = −ω² sin ωt  (2.23)
This sine solution shows that this equation describes oscillations. The sine function is periodic in its
argument, repeating every 2π. So for some amount of time ∆t such that
ω∆t = 2π  (2.24)
the sine function will be back to where it started. Hence ∆t is what we can call the “period” of the
oscillation. Quite generally, the frequency of an oscillation, in cycles per second or hertz (Hz), is the
inverse of the period; that is, this oscillation has a frequency
f = 1/∆t  (2.25)
so the relation between the frequency f and ω is
ω = 2π f (2.26)
In physics, it is very common to use this quantity ω, which is called the “angular frequency” rather than
the (conventional) frequency f, mostly just because it saves writing down “2π ” so often. We can also
think of angular frequency as the number of radians per second. There are 2π radians in one “circle” or
cycle here, so the angular frequency is 2π times larger than the (conventional) frequency.
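As a small worked example with invented numbers, Eqs. (2.19), (2.25) and (2.26) connect the spring constant and the mass to the angular frequency, the conventional frequency, and the period:

```python
import math

K = 200.0                    # spring constant (N/m), an invented value
M = 0.5                      # mass (kg), an invented value

omega = math.sqrt(K / M)     # Eq. (2.19): angular frequency in radians per second
f = omega / (2.0 * math.pi)  # from Eq. (2.26): conventional frequency in Hz
period = 1.0 / f             # from Eq. (2.25): the oscillation period in seconds

print(omega, f, period)      # 20.0 rad/s, about 3.18 Hz, about 0.314 s
```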
For Eq. (2.20), we can easily check that y = A sin (ωt ) would be a solution, where A is any constant,
which just says that, at least in our idealized mathematical model here, any amplitude (here A) of
oscillation is possible. Also y = B cos ωt would be a solution for any constant B. We could write the
most general solution of Eq. (2.20) as
y = A sin ωt + B cos ωt  (2.27)
As is typical for such “second order” equations (that is, in which the highest order of derivative involved
is the second derivative), we have two arbitrary constants, here A and B, in this general solution. Just
what the values of those are in any particular situation depends on the starting conditions – basically,
when we started the oscillation and how big the amplitude of it was.
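To see how the starting conditions fix A and B (a sketch with invented initial values): setting t = 0 in Eq. (2.27) gives y(0) = B, and differentiating once gives dy/dt(0) = ωA, so B is the initial position and A is the initial velocity divided by ω. We can check this against a direct numerical integration of Eq. (2.20):

```python
import numpy as np
from scipy.integrate import solve_ivp

omega = 2.0              # angular frequency (rad/s), invented
y0, v0 = 1.0, 3.0        # initial position and velocity, invented

# From Eq. (2.27): y(0) = B and dy/dt(0) = omega * A
A, B = v0 / omega, y0

t = np.linspace(0.0, 10.0, 201)
y_formula = A * np.sin(omega * t) + B * np.cos(omega * t)

# Integrate Eq. (2.20), d^2y/dt^2 = -omega^2 y, as two first-order equations
sol = solve_ivp(lambda t, s: [s[1], -omega**2 * s[0]],
                (0.0, 10.0), [y0, v0], t_eval=t, rtol=1e-10, atol=1e-10)

print(np.max(np.abs(sol.y[0] - y_formula)))  # tiny: the same motion either way
```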
The only real differences between the solution in Eq. (2.21) and the general solution Eq. (2.27) are that
the general solution of Eq. (2.27) allows for any amplitude and any “phase” of oscillation. (The sine and
cosine functions are 90 degrees out of phase with one another, so combinations of them can represent
an oscillation of any phase.) No matter which solution we choose here, the mass oscillates up and down
at angular frequency ω.
Other than having different possible amplitudes and phases of oscillation, there is only one way in which
this simple harmonic oscillator can oscillate. It has only one mode of oscillation.
The concept of modes is most useful for linear systems, that is, essentially, ones whose behavior is the
same regardless of the amplitude of the oscillation. If some function y(t) is a solution of Eq. (2.20), so
also is Cy(t) where C is any constant. This is obvious from the general solution as in Eq. (2.27) –
multiplying such a solution by C just corresponds to replacing A and B with new constant numbers – and it is
easily verified directly by substituting in Eq. (2.20).
Eigen equations
Note that the equation Eq. (2.20) of our linear oscillator is of the form
Operator operating on the function y(t) = a constant times the function y(t)  (2.28)

For our oscillator, Eq. (2.20) is of exactly this form: the operator d²/dt² acting on y(t) gives the constant −K/M times y(t), i.e.,

(d²/dt²) y(t) = −(K/M) y(t)  (2.29)

More generally, for some operator6 A acting on a function y, we can write such an equation as

Ay = by  (2.30)

where A represents the operator, y represents the function, and b represents a constant.
The form of equation given by relation (2.28) (or (2.30)) is called an eigen equation7. Any constant b
for which such an equation holds is called an eigenvalue, and a function that is a solution for that
eigenvalue is called an eigenfunction associated with that eigenvalue8. The particular eigen equation Eq.
(2.20) for our physical problem of a mass on a spring, for a given K and M, physically is only allowed
to have one eigenvalue ( − K / M ) and one eigenfunction (at least if we allow different amplitudes and
phases). As we stated above, this particular oscillator is an example of an oscillator that physically only
has one mode9.
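We can verify this eigen-equation behavior symbolically (a minimal sketch): applying the operator d²/dt² to sin ωt or to cos ωt returns the same function multiplied by the constant −ω², which is exactly the eigenvalue −K/M noted above.

```python
import sympy as sp

t, omega = sp.symbols('t omega', positive=True)

for y in (sp.sin(omega * t), sp.cos(omega * t)):
    # Apply the operator d^2/dt^2, then divide by the original function
    eigenvalue = sp.simplify(sp.diff(y, t, 2) / y)
    print(y, '->', eigenvalue)   # both give -omega**2: the same eigenvalue
```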
Linear operators
An important point about this mathematics we are discussing here is that the operator d 2 / dt 2 is what is
called a linear operator. The definition of a linear operator is that it should obey the relations
(i) Operator operating on (b times y) = b times (Operator operating on y) (2.31)
for an arbitrary constant b and a function y, and
(ii) Operator operating on (y1 + y2) =
(Operator operating on y1) + (Operator operating on y2) (2.32)
for any two functions y1 and y2.
By this definition, it is straightforward to see that d 2 / dt 2 is linear.
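Both criteria are also easy to test symbolically; in the sketch below, the nonlinear operator y → y³ is our own counterexample, chosen for contrast and not taken from the text.

```python
import sympy as sp

t, b = sp.symbols('t b')
y1, y2 = sp.Function('y1')(t), sp.Function('y2')(t)

D2 = lambda y: sp.diff(y, t, 2)    # the operator d^2/dt^2
cube = lambda y: y**3              # an invented nonlinear operator, for contrast

for op in (D2, cube):
    scaling = sp.simplify(op(b * y1) - b * op(y1))           # criterion (2.31)
    additivity = sp.simplify(op(y1 + y2) - op(y1) - op(y2))  # criterion (2.32)
    print(scaling == 0, additivity == 0)
# d^2/dt^2 prints "True True" (linear); y -> y^3 prints "False False" (not linear)
```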
In our more abstract operator algebra, incidentally, we could write Eq. (2.31) as
A(bx) = b(Ax)  (2.33)
and Eq. (2.32) as
A(x1 + x2) = Ax1 + Ax2  (2.34)
6 An “operator” is something that turns one function into another function.
7 “eigen” is a German word with several possible translations into English depending on the context; one translation is “own” as in the sense of “one’s own”.
8 It is possible in general to have more than one eigenfunction associated with a given eigenvalue, in which case the eigenvalue is said to be degenerate. Incidentally, for the equations associated with linear physical problems, there can only be a finite number of such eigenfunctions associated with a given eigenvalue; that number is called the degeneracy of the eigenvalue.
9 One subtlety here is whether we regard the sine and cosine version of this oscillation as being different modes. It is possible to take that approach of considering these as different modes, but it is more common to say that there is one mode with two attributes associated with it, one being amplitude and the other being phase.
If the notation of Eqs. (2.29), (2.30), (2.33) and (2.34) seems somewhat abstract, note that for such
linear operators it is exactly the same as matrix and vector notation. We can think of A as a matrix10,
and each of the x, y, and z objects as being column vectors11.
By now the reader can see that we are moving towards defining modes as the eigenfunctions of an
operator, especially a linear operator, and that is indeed the case.
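In that matrix-vector picture, finding the modes becomes a standard matrix eigenvalue problem of the form Ax = bx, as in this minimal numerical sketch (the symmetric 2×2 matrix is an arbitrary example):

```python
import numpy as np

# An arbitrary symmetric 2x2 matrix standing in for a linear operator A
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

for b, x in zip(eigenvalues, eigenvectors.T):
    # each column vector x satisfies the eigen equation A x = b x
    print(b, np.allclose(A @ x, b * x))   # prints 3.0 True and 1.0 True
```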
Problems
2.4.1 For a harmonic oscillator consisting of a mass M on a spring of spring constant K, so that the
angular oscillation frequency is ω = √(K/M), what are the “dimensions” of the spring constant
K?
By “dimensions” we mean what unit or combination of units from the SI system of units
describes the units in which we write K. SI base units include kg (kilograms), m (meters), s
(seconds), and A (amperes); other units are derived from these. Express your answer in two forms:
(i) using only “base” units (i.e., kg, m, s, A)
(ii) using N (newtons) (the “derived unit” of force) as one of the units in your answer.
2.4.2 If we have a simple harmonic oscillator with a mass of 1 g (i.e., one gram) that will oscillate at
a frequency of 1 kHz (note – this is the conventional frequency, not the angular frequency), what
is the value of the spring constant K expressed in the appropriate SI units?
2.4.3 For the simple harmonic oscillator of question 2.4.2, if we start the oscillator out by pulling the
mass by a distance of 1 mm along the oscillating direction (e.g., the y direction), thereby
stretching the spring by 1 mm, how much potential energy do we initially put into the system?
(Hint: remember that we can deduce the change in potential energy by integrating the product of
force times distance. The distance starts out at y = 0 (which is the “equilibrium” position the mass
will sit at if it is not oscillating) and ends up at y = 1 mm, and the force is proportional to the
amount by which the spring is stretched relative to zero.)
2.4.4 For this same simple harmonic oscillator as in questions 2.4.2 and 2.4.3, once we have released
it from its starting position as in question 2.4.3 above, how much kinetic energy does the mass
have when it reaches y = 0? (Hint: if you think about it, you don’t have to calculate anything
here!)
2.4.5 A box of 2 kg is attached to a spring with constant K = 10 N/m and moves in the y direction,
stretching or compressing the spring as it does so. If the box is displaced to a point y = 2 m
relative to its equilibrium point and then released, how long does it take to travel back to y = 1
m? What then is the speed of the mass at this point y = 1 m?
2.4.6 Suppose we consider a simple harmonic oscillator whose oscillation we have written down as
$$y(t) = D\cos(\omega t + \phi)$$
where D, ω and φ are constants. (This is often quite a convenient form. We could then call D
the amplitude of the oscillation, and φ the phase.) Show that we can also write this in the form
10 Quite generally, linear operators can be represented by matrices, though we will postpone any general discussion of that equivalence.
11 Here we mean vectors in the general matrix-vector sense of vectors as columns of numbers. These are not necessarily geometrical vectors, though they could be if we were considering column vectors with three elements. Here x, y, and z are just arbitrary mathematical vectors; we are not presuming they have any specific relation to x, y, and z directions in Cartesian geometrical space, nor even that they are vectors in three or any other specific number of dimensions.
$$y(t) = A\sin\omega t + B\cos\omega t$$
2.4.9 An operator A acting on a variable y, which we will write here using the notation Ay , is linear
if and only if it meets the two criteria for linear operators (Eqs. (2.31) and (2.32)) for all possible
values of y (and we take y to be any real number). For the operators listed below, identify which
are linear operators, explaining your answers in terms of these two criteria. (You may think of
variables x and y as being different position variables corresponding to different coordinate axes,
and variable t as corresponding to time, though these correspondences are not actually necessary
to address this question.)
(a) $\mathsf{A}y \equiv y^2$
(b) $\mathsf{A}y \equiv \dfrac{dy}{dx} - e^x y$
(c) $\mathsf{A}y \equiv \dfrac{dy}{dt} + \sin(y)$
(d) $\mathsf{A}y \equiv \dfrac{d^2 y}{dt^2} - y - 2$
12 We have to draw them with some finite length, of course, in the figure.
The vertical displacements and angles are exaggerated in Fig. 2.4, and we presume that all the angles
are actually very small. As a result, the sine can be approximated by the tangent, so
$$\sin\theta_j \approx \frac{y_{j+1}-y_j}{\Delta z} \quad\text{and}\quad \sin\theta_{j-1} \approx \frac{y_j - y_{j-1}}{\Delta z} \qquad (2.36)$$
So Eq. (2.35) becomes, approximately
$$F_j \approx T\left(\frac{y_{j+1}-y_j}{\Delta z} - \frac{y_j - y_{j-1}}{\Delta z}\right) = T\,\frac{y_{j+1} - 2y_j + y_{j-1}}{\Delta z} \qquad (2.37)$$
We can multiply the top and bottom of the fraction in Eq. (2.37) by ∆z to obtain
$$F_j = T\,\Delta z\left[\frac{y_{j+1} - 2y_j + y_{j-1}}{(\Delta z)^2}\right] \qquad (2.38)$$
But the expression in square brackets, in the limit of small ∆z, is just the second derivative of the
coordinate y with respect to the horizontal coordinate z. So we can write, in the limit of small ∆z, that
the force on the mass j is
$$F = T\,\Delta z\,\frac{\partial^2 y}{\partial z^2} \qquad (2.39)$$
(We are writing the second partial derivative with respect to position z here because we are only
interested in the result at a specific time, so technically this is a second derivative in space at a particular
time. We can also regard this partial derivative as the second derivative in space of a “snapshot” of the
string at some specific time.)
Note that, with Eq. (2.39), we are saying that the vertical force F on the mass at a given point is
proportional to the curvature of the “string” of masses. There is no net vertical force if the masses are in
a straight line, even if that line is “tilted” at an angle. We have to have a difference in the slopes or
“tilting” of the pieces of string on either side of a mass before there is any net force on the mass.
Now we can think of the masses in Fig. 2.4 as the amount of mass per unit length in the z direction, that
is, the linear mass density ρ (that is, kilograms per meter), times ∆z, which we can write as a mass
m = ρ∆z. We can also write from Newton’s Second Law for that element of mass that
$$F = m\,\frac{\partial^2 y}{\partial t^2} = \rho\,\Delta z\,\frac{\partial^2 y}{\partial t^2} \qquad (2.40)$$
(Now we are taking the second partial derivative with respect to time here because we are interested in
the behavior in time of an element of mass at a particular point in space.) Putting Eq. (2.39) and Eq.
(2.40) together and cancelling the ∆z on both sides gives
$$\frac{\partial^2 y}{\partial z^2} = \frac{\rho}{T}\,\frac{\partial^2 y}{\partial t^2} \qquad (2.41)$$
Rewriting Eq. (2.41) with
$$v^2 = T/\rho \qquad (2.42)$$
gives
$$\frac{\partial^2 y}{\partial z^2} - \frac{1}{v^2}\,\frac{\partial^2 y}{\partial t^2} = 0 \qquad (2.43)$$
Though we have not yet justified the meaning of v, it will correspond to the wave propagation velocity, and Eq. (2.43) is a wave equation for a wave with velocity v. We can write
$$v = \sqrt{T/\rho} \qquad (2.44)$$
where we take the positive square root because we like to talk conventionally about a positive wave
velocity13. Hence we have constructed a wave equation, Eq. (2.43) for waves on a string.
We can check by direct substitution into the wave equation Eq. (2.43) that any function of the form $f(z - vt)$ is a possible solution of such an equation. This corresponds to a “pulse” propagating to the “right”. Similarly, any function of the form $g(z + vt)$ is also a possible solution, corresponding to a “pulse” propagating to the “left”. Furthermore, any sum of such solutions is also a solution.
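This substitution is easy to perform symbolically. The following sketch, using the sympy library, checks that an arbitrary sum $f(z - vt) + g(z + vt)$ leaves the wave equation Eq. (2.43) with zero residual:

```python
# A symbolic check, using the sympy library, that y = f(z - v t) + g(z + v t)
# satisfies Eq. (2.43) for arbitrary functions f and g.
import sympy as sp

z, t, v = sp.symbols('z t v', positive=True)
f, g = sp.Function('f'), sp.Function('g')
y = f(z - v*t) + g(z + v*t)

residual = sp.diff(y, z, 2) - sp.diff(y, t, 2) / v**2
print(sp.simplify(residual))  # prints 0: the wave equation is satisfied
```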
Problems
2.5.1 Is the following statement true or false? (Justify your answer briefly.)
“In the classical wave equation $\partial^2 y/\partial z^2 - (1/v^2)\,\partial^2 y/\partial t^2 = 0$ for a nominally horizontal string stretched in
the z direction, where y is the displacement of the string in the vertical direction, the acceleration
of the string in the vertical direction at a given point is proportional to the curvature of the string
at that point (as given by the second derivative of the string displacement) (with a positive
proportionality constant).”
2.5.2 Consider an infinitely long string for which the wave propagation velocity is v. (The infinite
length means that there are no reflections off any “ends” of the string, and any wave that satisfies
the general solution of the wave equation for the string is a valid solution). For each of the
following proposed functions, which represent the y displacement of the string, state whether it
can be a solution of the wave equation for this string. (The string is presumed to be along the
“horizontal” z direction, and y corresponds to a “vertical” displacement of the string.) In the
following functions we presume A and B are constants, and, where both k and ω are present in
the expression, we presume that k = ω / v . In all cases, k is a constant. [Note: In most cases, it
should be possible to deduce the answers here without substituting directly into the wave
equation to check if these are solutions, though you can always check these that way also.]
(i) $A\sin(kz - \omega t)$
(ii) $A\sin(kz - \omega t) + B\cos(kz + \omega t)$
(iii) $A\exp(-k^2 z^2)$
(vi) $A\exp\left[-(kz - \omega t)^2\right]$
(vii) $A\sin(kz - \omega t) + B\sin(2kz - 2\omega t)$
13 In this equation we could also take the negative square root, and in fact both the positive and negative results have meaning – they correspond to waves propagating in positive (e.g., to the “right”) and negative (e.g., to the “left”) directions. Conventionally, though, we use the positive value for v, and use positive or negative signs in front of it to describe waves in the two directions. Of course, velocity is more correctly a vector, but typically nonetheless we refer to this positive magnitude of the velocity as just the wave propagation velocity, with “magnitude” often just being understood by context.
14 “Monochromatic” literally means “one color”.
15 Again, not surprisingly given the name “wavevector”, this “wavevector” is really more correctly a vector, not a scalar. However, there is no other good word for this k, and hence it is typically just called the wavevector (or wavevector magnitude), with the understanding that we will write it more correctly as a vector when it matters. For waves in one direction, we just write it and name it in this scalar fashion.
just discuss waves in one direction for simplicity, it may seem odd to refer to this quantity k as the
magnitude of some vector. More generally, and very usefully when we are working with waves in three
dimensions, we can write the vector quantity k, which has the magnitude k and the vector direction
corresponding to the direction in which the wave is propagating.
A wave equation in the form of Eq. (2.46) is called a Helmholtz wave equation, and it is essentially the
simplest wave equation we can write down that describes the behavior in space of a monochromatic
wave. Mathematically, we recognize that the Helmholtz equation Eq. (2.47) is also in the form of an
eigen equation, just like Eq. (2.20) for the harmonic oscillator. The eigenvalue now is $-k^2$, though the
Helmholtz equation is a differential equation in z, not t. Just as before for the simple harmonic oscillator,
this equation can be written in an abstract mathematical form as
$$\mathsf{A}y = -k^2 y \qquad (2.50)$$
where now y represents the entire function y(z) – for our string example, the “snapshot” of the string at
some time when there is a wave on it oscillating at an angular frequency ω – and A corresponds now to
the linear operator $d^2/dz^2$.
When we were considering the simple harmonic oscillator, we had an equation in the same overall
mathematical form, that is, Eq. (2.20), but the eigenvalue was fixed for us by the choice of the spring
constant and the mass. In the case of the Helmholtz equation for a string, for example, we may have
chosen a definite density ρ and a definite tension T, but those only set the magnitude v of the wave
velocity. On its own, as we see from Eqs. (2.42) and (2.48), this wave velocity v only determines the
ratio of $\omega^2$ and $k^2$, that is,
$$v^2 = \frac{T}{\rho} = \frac{\omega^2}{k^2} \qquad (2.51)$$
For this eigen equation Eq. (2.50), unless we impose some other constraints on the problem, any eigenvalue $-k^2$ is possible, which in turn means any frequency is possible for our monochromatic wave.
This should not surprise us physically; we can imagine a wave of any particular frequency moving down
a string.
Finally here, we note that we can generalize the Helmholtz equation from the “one-dimensional” form
of Eq. (2.46) to a “three-dimensional form”, which is
$$\nabla^2\psi(\mathbf{r}) + k^2\psi(\mathbf{r}) = 0 \qquad (2.52)$$
where
$$\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \qquad (2.53)$$
Here we are using the Greek letter ψ to represent the wave amplitude to avoid any confusion with the coordinate directions x, y, and z. We have written the wave amplitude as a function of the vector
position r = xi + yj + zk where i, j, and k are unit vectors in the x, y, and z directions16.
In Eq. (2.52), we need to be using partial derivatives because we have three different spatial directions.
So $\partial^2/\partial x^2$ refers to the second derivative with respect to x at specific values of the y and z coordinates. Of
course, we are also still imagining that this whole Helmholtz equation refers to a “snapshot” in time of
a wave that is oscillating at (angular) frequency ω.
16 The different uses of the letter “k”, as a wavevector magnitude and, in bold form “k”, as both the unit vector in the z direction and as a wavevector, are so common in the physics of waves that we will also use “k” in these different forms here. The reader will unfortunately just have to get used to this confusing notation both here and in the rest of the physics literature, though we will try to be clear here whenever there is an actual possibility of confusion.
The three-dimensional Helmholtz equation Eq. (2.52) is useful, for example, for modeling acoustic
waves in air. We will also be using it later as we start quantum mechanics, though we will mostly be
using the simpler one-dimensional version, Eq. (2.46).
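As a small check of the three-dimensional form, the sketch below (again using sympy) substitutes a plane wave $\exp[i(k_x x + k_y y + k_z z)]$ into Eq. (2.52); the wavevector components $k_x$, $k_y$, $k_z$ are arbitrary symbols here, with $k^2 = k_x^2 + k_y^2 + k_z^2$:

```python
# A symbolic check that a plane wave satisfies the three-dimensional
# Helmholtz equation (2.52); kx, ky, kz are arbitrary real symbols.
import sympy as sp

x, y, z = sp.symbols('x y z')
kx, ky, kz = sp.symbols('k_x k_y k_z', real=True)
psi = sp.exp(sp.I * (kx*x + ky*y + kz*z))   # plane wave exp(i k.r)

laplacian = sp.diff(psi, x, 2) + sp.diff(psi, y, 2) + sp.diff(psi, z, 2)
k2 = kx**2 + ky**2 + kz**2                  # the squared wavevector magnitude
print(sp.simplify(laplacian + k2 * psi))    # prints 0
```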
Problems
2.6.1 Is the following statement true or false? “The Helmholtz equation for a wave on a string tells us
that, for a monochromatic wave, the curvature of the string at a given point (as given by the
second derivative of the string displacement) is always proportional to the displacement of the
string (with a constant but negative proportionality constant) at that point.”
2.6.2 For the functions listed below, identify which are solutions to the Helmholtz wave equation for
some specific magnitude of the wave velocity in each case. (We are presuming a “string” with
no particular boundary conditions at the end, which we could view as an infinitely long string,
i.e., there are no boundary conditions to constrain possible solutions here, though we are
presuming some specific tension and linear mass density for the string in each case.)
(a) $y(z,t) = \sin\dfrac{z - 3t}{2} + \sin\dfrac{z + 3t}{4}$
(b) $y(z,t) = \sin\dfrac{z}{2}\sin(2t) + \sin\dfrac{z}{4}\sin(2t)$
(c) $y(z,t) = \sin\dfrac{z}{2}\sin(2t) + \cos\dfrac{z}{2}\sin(4t)$
Fig. 2.5. Standing waves on a string between two walls a distance L apart.
$$y_n(z) = A\sin\frac{n\pi z}{L} \qquad (2.58)$$
The case of n = 0 is a trivial one, because the wave would then be zero everywhere on the line. Also,
solutions for negative n are really the same solutions as for positive n. (Since $\sin\theta = -\sin(-\theta)$, changing
from n to − n just corresponds to replacing A with − A ; since A is an arbitrary number anyway, there
is no real difference between the solutions for positive and negative values of n.) Hence, we only need
to consider n = 1, 2, 3, … .
These solutions, Eq. (2.58), are simple standing waves on the string. These different standing waves are
the (eigen) modes of this problem. The first three of these are illustrated in Fig. 2.5.
For a given string with a given density and tension, we know there is a specific relation, Eq. (2.48),
between the wavevector magnitude k and the (angular) frequency ω. Since we can choose both of those
to be positive numbers, we can rewrite Eq. (2.48) as
ω = vk (2.59)
So, because we have concluded there are only specific values of k allowed given our boundary
conditions here – explicitly those given by Eq.(2.57) – then we similarly only have a corresponding set
of allowed values of the (angular) frequency
$$\omega_n = v k_n = \frac{n\pi v}{L} \qquad (2.60)$$
So, with a given string stretched with a given tension (which together lead to a specific wave velocity
magnitude v), and anchored at two posts or walls, there are only specific frequencies of oscillation
possible. In other words, insisting that we are looking only for solutions that correspond to
monochromatic waves, in such a situation with the string tied between two rigid posts or walls, such
solutions or “oscillating modes” only exist for specific (eigen) frequencies.
For this case of a string stretched between two posts or walls, this set of frequencies forms a harmonic
series with integer ratios between these various allowed frequencies. If we were to extend this kind of
analysis to vibrating objects in two dimensions, such as cymbals or drum heads, or three dimensions,
such as bells or bars that support twisting and bending modes, the resulting eigen frequencies are not
necessarily in integer ratios, though such systems will generally still have eigen frequencies and
corresponding eigenmodes.
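Numerically, Eq. (2.60) is easy to evaluate. Here is a brief sketch with assumed, guitar-like values for the tension, linear density, and length (these numbers are illustrative only, not taken from any particular instrument):

```python
# A numerical sketch of Eqs. (2.44) and (2.60). The tension, linear mass
# density, and length below are assumed values chosen purely for illustration.
import numpy as np

T = 50.0      # tension (N), assumed
rho = 1.5e-3  # linear mass density (kg/m), assumed
L = 0.65      # string length (m), assumed

v = np.sqrt(T / rho)                 # wave velocity, Eq. (2.44)
for n in (1, 2, 3):
    omega_n = n * np.pi * v / L      # angular eigenfrequency, Eq. (2.60)
    print(f"n = {n}: f = {omega_n / (2 * np.pi):.1f} Hz")
```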
Another way of looking at standing waves on a string is to view them as sums of forward and backward
propagating waves. For example, the wave A sin(kz − ωt ) is a solution of the full wave equation Eq.
17 The actual eigenvalues of Eq. (2.47) or Eq. (2.50) are technically the numbers $-k_n^2$, which obviously we can deduce from these values $k_n$.
(2.43) (with k and ω related through Eq. (2.59)); this corresponds to a wave propagating to the “right”.
The wave A sin(kz + ωt ) is also a solution, corresponding to a wave propagating to the “left”. Because
this wave equation is linear, the sum or “superposition” of these two waves is also a solution. From
elementary trigonometric manipulations, we can write that total wave as
$$\Psi(z,t) = A\sin(kz - \omega t) + A\sin(kz + \omega t) = 2A\cos\omega t\,\sin(kz) \qquad (2.61)$$
So our standing wave of the form sin kz that is oscillating at frequency ω can be viewed as the sum of
“right” propagating and “left” propagating waves. We can also think of these waves in terms of
reflections; the right-propagating wave reflects off the right barrier to give the left-propagating wave,
and the left-propagating wave reflects off the left barrier to give the right-propagating wave. Note that,
in this “reflecting wave” picture, when a wave reflects off a “hard wall” barrier, the reflected wave
amplitude is minus the incident wave amplitude so that the sum of the two waves always equals zero at
the position of the barrier, as required for the net wave to be zero at the barrier. Equivalently, we can
say that on reflection off a “hard wall”, a wave has its phase changed by 180°.18
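The “elementary trigonometric manipulations” behind Eq. (2.61) can also be checked symbolically; this short sketch (using sympy) confirms that the sum of the two counter-propagating waves is exactly the standing wave:

```python
# A symbolic check of Eq. (2.61): the sum of equal-amplitude right- and
# left-propagating waves is exactly a standing wave.
import sympy as sp

A, k, z, w, t = sp.symbols('A k z omega t', real=True)
total = A * sp.sin(k*z - w*t) + A * sp.sin(k*z + w*t)
print(sp.simplify(total - 2*A*sp.cos(w*t)*sp.sin(k*z)))  # prints 0
```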
Problems
2.7.1 Suppose we have a guitar string 0.5 mm in diameter, and made from steel with a density of 7.86
g/cm3.
(i) What is the linear density of this guitar string (in kg/m)?
Now, we tension the string with a tension of 50 N.
(ii) What is the wave velocity on the string (in m/s)?
Now we choose the string to be 50 cm long and attach it at both ends to rigid supports (such as the neck and the bridge of a guitar), with the tension in the string still being 50 N.
(iii) What is the frequency (in Hz) of the lowest-frequency standing-wave resonance on this
string?
2.7.2 A guitar string is 60 cm long and has a mass per unit length of kg/m. It is held between
the two ends (i.e., at the “bridge” and the “neck”) with a tension of 80 N.
(a) What are the frequencies of the fundamental (i.e., lowest frequency) mode and the second
harmonic (i.e., the second-lowest frequency) mode?
(b) What happens to the pitch (i.e., frequency) of a given mode if I loosen the string (i.e., reduce the tension) by adjusting the tuning peg?
2.7.3 Consider a string of length L stretched between two rigid posts. The wave velocity on the string
is v. (The string is presumed to be along the “horizontal” z direction, and y corresponds to a
“vertical” displacement of the string.) For each of the following functions, which represent the y
displacement of the string, state whether or not the function is a possible solution of the
Helmholtz equation for this string, and if so, give an expression for the angular frequency of the
oscillation in time that is associated with this solution. Presume A and B are constants.
(i) $A\sin\dfrac{\pi z}{L}$
(ii) $A\sin\dfrac{2.5\pi z}{L}$
18 If we were to reflect a wave off the open end of a string, incidentally, the phase change on reflection would be zero degrees – that is, the amplitude of the reflected wave would equal that of the incident wave. Then, instead of having a zero at the end of the string, we would have a maximum.
(iii) $A\sin\dfrac{9\pi z}{L}$
(iv) $A\sin\dfrac{3\pi z}{L} + B\sin\dfrac{5\pi z}{L}$
2.7.4 Consider a uniform string of length L stretched with some specific tension in the z direction
between two rigid posts. Write down the spatial forms (i.e., the functions y ( z ) ) for each of the
three standing wave eigenfunctions with the lowest frequencies. [In other words, find the
solutions of the Helmholtz equation corresponding to the lowest three eigenvalues of this
problem. Note that a function that is zero everywhere will not be counted as an eigensolution.]
(Your answer can be arbitrary within an overall multiplying constant; in such a problem, if some
function f ( z ) is an eigenfunction corresponding to a given eigenvalue, then so also is Bf ( z )
where B is any constant.)
19 Actually, it makes no real difference to this problem if the springs are already compressed or stretched at their equilibrium positions here, but the algebra is slightly simpler to write down initially if we presume no stretching or compression.
20 Of course, this force might have a negative value, which is the same as a positive force acting to move the mass to the left.
Fig. 2.6. Coupled oscillator with two identical masses and three identical springs.
We now can apply Newton’s second law to the mass on the left, giving
$$m\,\frac{d^2 x_1}{dt^2} = -Kx_1 + K(x_2 - x_1) = -2Kx_1 + Kx_2 \qquad (2.62)$$
We can similarly analyze the force in the positive x direction on the mass on the right. Putting that into
Newton’s second law for the mass on the right gives
$$m\,\frac{d^2 x_2}{dt^2} = -K(x_2 - x_1) - Kx_2 = -2Kx_2 + Kx_1 \qquad (2.63)$$
Now we see that we have two simultaneous differential equations, Eqs. (2.62) and (2.63). We want to
solve these equations to show us the possible oscillating modes of this system.
$$\lambda = \frac{\omega^2 m}{K} \qquad (2.69)$$
Using this, we can rewrite Eq. (2.68) as
$$\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (2.70)$$
Then, we can rewrite Eq. (2.70) in the abstract mathematical notation introduced earlier as in, for
example, Eq. (2.30), as
$$\mathsf{A}\mathbf{x} = \lambda\mathbf{x} \qquad (2.71)$$
with
$$\mathsf{A} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \qquad (2.72)$$
and
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (2.73)$$
We see that we have reduced this problem to an eigen problem, with eigenvalue λ. When written in the
form of Eq. (2.71), it is in exactly the same general form as we encountered in dealing with the simple,
one-mass, oscillator (Eq.(2.30)). Here, as before, the operator A is linear (in fact, matrices with constant
elements only ever represent linear operators). Now x is a vector, with its two elements representing the
positions of the two masses.
To solve the problem, we can now take the standard approach for finding the eigenvalues and
eigenvectors of a matrix. Rewriting Eq.(2.70), we have
$$\begin{pmatrix} 2-\lambda & -1 \\ -1 & 2-\lambda \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0 \qquad (2.74)$$
For Eq. (2.74) to hold, a standard result of linear algebra is that the determinant of the matrix must be
zero, that is,
$$(2-\lambda)(2-\lambda) - 1 = 0 \qquad (2.75)$$
So
$$\lambda^2 - 4\lambda + 3 = 0 \qquad (2.76)$$
Hence
$$\lambda = 2\left(1 \pm \tfrac{1}{2}\right) \qquad (2.77)$$
or, equivalently
$$\omega^2 = \frac{2K}{m}\left(1 \pm \tfrac{1}{2}\right) \qquad (2.78)$$
From Eq.(2.74), substituting the allowed values of λ, one by one, we can deduce the form of the
associated eigenvectors, which are, respectively,
$$\lambda = 1:\quad x_1 = x_2,\ \text{i.e., } \mathbf{x} \propto \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad (2.79)$$
which corresponds to the two masses oscillating in the same direction, with equal amplitude, with frequency $\omega = \sqrt{K/m}$, and
$$\lambda = 3:\quad x_1 = -x_2,\ \text{i.e., } \mathbf{x} \propto \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (2.80)$$
which corresponds to the two masses oscillating in opposite directions, with equal (but opposite) amplitudes, with frequency $\omega = \sqrt{3K/m}$. Perhaps not surprisingly, then, now that we have two masses,
we have discovered two different modes, a symmetric, lower frequency one, with both masses going in
the same direction, and an antisymmetric, higher frequency one with them going in opposite directions.
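For readers who prefer to see this numerically, the following sketch diagonalizes the matrix of Eq. (2.72) and recovers the eigenvalues 1 and 3 and the symmetric and antisymmetric eigenvectors of Eqs. (2.79) and (2.80):

```python
# A numerical sketch of the eigenproblem of Eqs. (2.71)-(2.73), recovering
# the eigenvalues 1 and 3 and the two mode shapes.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
evals, evecs = np.linalg.eigh(A)  # eigh is for Hermitian (here real symmetric) matrices
print(evals)   # [1. 3.]
print(evecs)   # columns proportional to (1, 1) and (1, -1), up to sign

# The two eigenvector columns are orthogonal, anticipating Eq. (2.84):
print(np.isclose(evecs[:, 0] @ evecs[:, 1], 0.0))  # True
```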
Problems
2.8.1 Consider a coupled oscillator similar to the one shown in Fig. 2.6. The two masses are identical, each with mass m, and they slide horizontally without friction. The left and right
springs have the same spring constant k1, but the middle spring now has a possibly different
spring constant k2. We can take the three springs all to have no net extension or compression at
the equilibrium position. The displacements of the masses in the x direction relative to their
equilibrium positions are x1 and x2, respectively, as shown in Fig. 2.6.
Derive an expression or expressions for the possible resonant (angular) frequencies (i.e.,
(angular) eigen frequencies) of this coupled oscillator system.
2.8.2 For the coupled oscillator in the preceding problem, calculate the eigenvectors of the matrix
corresponding to this system, and describe or draw how the masses are moving in time relative
to one another in each corresponding mode of oscillation.
We also remember that it is possible to multiply two vectors written out in matrix-vector notation
provided we take some care with matching the “dimensions” of the vectors being multiplied.
Specifically, we can multiply a column vector on the “right” and a row vector on the “left” if the column
and the row are the same “length” – that is, if they have the same number of elements. We could choose
to write out the geometrical vector dot product in this format, making a a row vector and b a column
vector, so we would have
$$\mathbf{a}\cdot\mathbf{b} = \begin{bmatrix} 2 & 3 \end{bmatrix}\begin{bmatrix} -5 \\ 4 \end{bmatrix} = 2\times(-5) + 3\times 4 = 2 \qquad (2.82)$$
If we choose two geometrical vectors that are at right angles to one another, such as $\mathbf{c} = 2\mathbf{i} + 3\mathbf{j}$ and $\mathbf{d} = -3\mathbf{i} + 2\mathbf{j}$, then their dot product is zero, as we can see:
$$\mathbf{c}\cdot\mathbf{d} = 2\times(-3) + 3\times 2 = 0 \qquad (2.83)$$
Of course, these ideas of products of vectors are valid even when the vectors are not ones that are also
vectors in ordinary geometrical space. In that case, we can call products of row and column vectors as
in Eq. (2.82) “inner products” as a generalization of the idea of the geometrical vector dot product. If
the inner product of two non-zero vectors is zero, we say the vectors are “orthogonal” as a generalization
of the idea of the vectors being at right angles.
Now, if we look at the two eigenvectors we found from analyzing the coupled oscillator problem above
– that is, the eigenvectors in Eqs. (2.79) and (2.80) – and we take the inner product between them, we
find that inner product is zero; they are mathematically orthogonal. Explicitly,
$$\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0 \qquad (2.84)$$
Fig. 2.7. Plot of the eigenvectors of the coupled oscillator: the $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ directions, together with the $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ (or “x1”) and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ (or “x2”) directions.
If we had drawn these two vectors on a piece of paper as geometrical vectors, as in Fig. 2.7, we would
find they are at right angles to one another.
This orthogonality property of these modes is no accident; in fact, for a very broad class of the linear
physical problems of interest to us, the eigenfunctions or (eigen) modes are orthogonal in this sense. In
particular, this is true in a large and important range of cases in the physical descriptions of quantum
mechanics, and we will discuss this below.
Functions as vectors
Note now that we are starting to think of a function as a vector in some kind of mathematical space.
Indeed, we could represent any particular pair of positions of the two masses in the coupled oscillator
as a vector on the diagram in Fig. 2.7. The function here is a very simple one, and can be written simply
as a list of two numbers, corresponding to the position of mass 1 and the position of mass 2.
38 Chapter 2 Oscillations and waves
Any list of numbers can be written as a vector. We can represent any function as simply a list of numbers,
representing the results of the function when mapping from a known list of arguments21. For functions
of continuous variables that list would be infinitely long, but the concept remains usable with some care
even in that case. In general, though, we can if we want represent any function as a vector. We will give
more examples of this idea below.
Completeness and basis sets
Another very important point about these eigenfunctions or eigenmodes is that they can be used to
describe any particular position of the two masses. This rather important property that sets of modes can
have is called “completeness”. A particular pair of positions of the two masses, namely x1 = f and x2 = g
can rather obviously be represented as
$$\begin{pmatrix} f \\ g \end{pmatrix} = f\begin{pmatrix} 1 \\ 0 \end{pmatrix} + g\begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad (2.85)$$
with f and g just being the “components” along the “x1” and “x2” directions respectively in Fig. 2.7. We
could, however, equally well write this same pair of positions as the components along the directions
corresponding to the two eigenvectors; that is, we could write
$$\begin{pmatrix} f \\ g \end{pmatrix} = \frac{(f+g)}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \frac{(f-g)}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (2.86)$$
So, we could say that the set of eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ is complete for representing any possible pair of positions of the two masses.
If we have such a set of vectors that can be used to make up any vector in the “space” of interest (here
a 2-dimensional “plane”), we can call that set a “basis set” of vectors. If the vectors in the basis set are
all orthogonal to one another, then we can call the set an “orthogonal basis set” (or often just an
“orthogonal basis”).
It is obvious if we look at Fig. 2.7 that we could equally well represent any point in the plane in terms of its coordinates along the $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ directions, or in terms of its coordinates along the $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ directions. Either one of these “orthogonal basis sets” is equally usable. Another way of stating
this same property is that the modes form a complete set for “spanning” the space22 that describes the
positions of the masses.
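A short numerical sketch of this completeness property follows; the positions f and g are just example numbers, and the expansion coefficients are obtained from inner products with the (normalized) eigenvectors, as in Eq. (2.86):

```python
# A numerical sketch of the expansion in Eq. (2.86); f and g are just
# example numbers for the positions of the two masses.
import numpy as np

f, g = 0.7, -0.3
u1 = np.array([1.0, 1.0]) / np.sqrt(2)    # normalized symmetric eigenvector
u2 = np.array([1.0, -1.0]) / np.sqrt(2)   # normalized antisymmetric eigenvector

x = np.array([f, g])
c1, c2 = u1 @ x, u2 @ x                   # components found by inner products
print(np.allclose(c1 * u1 + c2 * u2, x))  # True: the eigenvectors span the space
```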
21 The list of values of the argument of the function is usually implicit, though ideally we should be more explicit about it. In the present case, we could say we are mapping from an “index” that has two values – “1” for the mass on the left and “2” for the mass on the right – to the corresponding positions of the masses, x1 being the position of mass “1” and being the first element of the vector, and x2 being the position of mass “2” and being the second element in the vector. It is a weakness of the vector notation for specifying functions that the corresponding values of the argument are not written out explicitly, and we simply have to deal with that by some other notational means whenever there is any confusion.
22 Incidentally, the “space” we are speaking about here is not usually an actual geometrical space, but rather a function space, i.e., one in which we can represent entire functions. (Here our example functions are rather simple, since they can be described entirely by specifying only two numbers, and so we can represent them in a two-dimensional space.) Spaces of this class that are of most interest to us are known as Hilbert spaces. They are mathematically like ordinary geometrical spaces in many ways, but are different, first, in that they may have any number of dimensions, including an infinite number, because that many values may be required to represent a function, and, second, that the coordinates in such a space may be complex numbers rather than only real numbers.
Hermitian adjoints
If we look at the operator A in this problem, as given by Eq. (2.72), we can see that the matrix is
symmetric if we reflect it along its leading diagonal (the diagonal with the elements of value 2). In the
general mathematics of modes, it is usual to allow for the possibility of complex quantities, and then, in
an extension of the idea of a symmetric matrix, we can define a Hermitian matrix.
First, we need formally to define the idea of a Hermitian adjoint of a matrix. The operation of taking a
Hermitian adjoint involves reflecting a matrix along its leading diagonal (i.e., taking the transpose of
the matrix), and taking the complex conjugate of the elements when we do so. The Hermitian adjoint is
often denoted with the “dagger” symbol “ † ”. For example, for a 2 × 2 matrix, we have
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^\dagger \equiv \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix} \qquad (2.87)$$
We can also have Hermitian adjoints of matrices that are not square, with the most common example
being a vector. In the case of a 2-element vector, for example, we have
$$\begin{pmatrix} a \\ b \end{pmatrix}^\dagger \equiv \begin{pmatrix} a^* & b^* \end{pmatrix} \qquad (2.88)$$
Again, we can think of this operation as reflecting about a “45°” diagonal line (from top left to bottom
right) and taking the complex conjugate of the elements.
We can think of the Hermitian adjoint as being like the idea of a complex conjugate but now generalized
to matrices or operators. A simple scalar number can be thought of as a 1 × 1 matrix, and the Hermitian
adjoint of that “matrix” is simply the complex conjugate of the number.
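In matrix-numerics terms, taking a Hermitian adjoint is just a conjugate transpose. Here is a minimal sketch with an illustrative matrix (chosen, as it happens, to be Hermitian):

```python
# A minimal sketch of the Hermitian adjoint as a conjugate transpose,
# for an illustrative 2 x 2 complex matrix.
import numpy as np

M = np.array([[2.0, 1 + 1j],
              [1 - 1j, 3.0]])
M_dagger = M.conj().T             # the "dagger": transpose plus complex conjugation
print(np.allclose(M_dagger, M))   # True: this particular example is Hermitian
```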
Given that we now have the possibility of complex numbers in our mathematics, we need to make one
other minor change. We need to extend the “inner product” to being one in which we use the Hermitian
adjoint vector as the one on the left23, i.e., the inner product of $\begin{pmatrix} a \\ b \end{pmatrix}$ and $\begin{pmatrix} c \\ d \end{pmatrix}$ is given by
$$\begin{pmatrix} a^* & b^* \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} = a^*c + b^*d \qquad (2.89)$$
23 The reader might be asking why do we need to use this Hermitian adjoint, with its complex conjugates, when taking this inner product? The reason is that we need to have a simple measure of “length” (for a vector) that gives a real number, even if the elements of the vector are not real. In the language of the mathematics of these Hilbert spaces, we need a “norm” that is real. It is very useful (and in fact a requirement for a Hilbert space) if that norm is constructed from the inner product of the vector with its own adjoint (in fact, the norm is defined as the square root of that inner product, just as it would be for vectors in an ordinary geometrical space). Taking the complex conjugate in constructing the adjoint guarantees the result for the “norm” or “length” is a real number, just as taking the product of a complex number with its complex conjugate always results in a real number.
24 Paul Dirac (1902 – 1984) was a British theoretical physicist and one of the most outstanding theoretical physicists of the 20th century. He made several major contributions to quantum mechanics in particular, including a relativistic version of quantum mechanics that effectively explained spin and introduced antiparticles, and the quantum mechanics of light and its interaction with matter. Incidentally, his first university degree was in electrical engineering.
our purposes, a column vector of numbers, which we can write as $\mathbf{x}$ in one of our abstract notations, is written $|x\rangle$ in Dirac notation and is then called a “ket” vector. This is the notation on the far right here
$$\mathbf{x} \equiv \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \equiv |x\rangle \qquad (2.90)$$
The Hermitian adjoint of the vector $\mathbf{x}$, which could be written as $\mathbf{x}^\dagger$ in one of our notations, is, in the Dirac notation, the “bra” vector $\langle x|$, which is the notation on the far right here
$$\mathbf{x}^\dagger \equiv \begin{pmatrix} x_1^* & x_2^* \end{pmatrix} \equiv \langle x| \qquad (2.91)$$
and
$$\langle x|^\dagger = |x\rangle \quad\text{and}\quad |x\rangle^\dagger = \langle x| \qquad (2.92)$$
which is obvious if we think of these as column or row vectors as appropriate. The real reason for having
and using the Dirac notation is that it gives a clear distinction between what are essentially column
vectors and what are essentially row vectors. Writing a vector just as $\mathbf{x}$ does not make this explicitly
clear.
With the Dirac notation, we now have a definite and clear way of writing the inner product between two
(possibly complex) vectors $|x\rangle$ and $|y\rangle$, which we can write as
$$\langle x | y\rangle \equiv \begin{pmatrix} x_1^* & x_2^* \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \qquad (2.93)$$
Note that, though we could write this inner product as $\langle x||y\rangle$, which would be correct and would cause no problems, conventionally we drop the second vertical line and write $\langle x|y\rangle$. Incidentally, we can note
that, quite generally for such inner products
$$\langle x|y\rangle = \langle y|x\rangle^* \qquad (2.94)$$
which is easy to check based on viewing the vectors as column and row vectors as in Eq. (2.93). Note
that the result of such an inner product here is just generally a complex number, and the superscript “*”
just refers to the complex conjugation of that number as usual. Eq. (2.94) also helps us write down a key
property we require for such inner products with complex functions or vectors, which is that “flipping”
the order of the vector in the inner product results in having to take the complex conjugate.
This Dirac notation version of writing the inner product is particularly clear and intuitive25. The inner
product here even has a kind of “inner” look to it with the angled brackets “enclosing” the expression.
25 There is another very unfortunate notation often used in mathematical texts for the inner product, which is to write for the inner product $\langle x|y\rangle$ instead the notation $(y, x)$. Note that this is the “wrong” way round. Now the associative law as in matrix-vector products no longer is visually obvious. In Dirac notation, we can write, for example, $\langle x|\mathsf{A}|y\rangle \equiv (\langle x|\mathsf{A})|y\rangle \equiv \langle x|(\mathsf{A}|y\rangle)$, which is just using the normal associative property of matrix-vector multiplications that allows us to group them in any convenient way as long as we do not change the order. Trying to write the same equivalence in the mathematical “$(y, x)$” notation is extremely confusing and far from obvious ($(y, \mathsf{A}x) \equiv (\mathsf{A}y, x)$).
We can also continue to use Dirac notation when we include operators in our expressions. So we can
rewrite Eq. (2.71) ($\mathsf{A}\mathbf{x} = \lambda\mathbf{x}$), for example, as
$$\mathsf{A}\,|x\rangle = \lambda\,|x\rangle \qquad (2.95)$$
Dirac notation is widely used in more advanced treatments of quantum mechanics.
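As a bridge back to the matrix-vector picture, here is a small numerical sketch of bras, kets, and the inner product of Eq. (2.93), with example complex values, verifying the “flipping” property of Eq. (2.94):

```python
# A sketch of kets, bras, and the inner product of Eq. (2.93) in
# matrix-vector terms, with example complex values.
import numpy as np

ket_x = np.array([1 + 1j, 2 - 1j])   # |x> as a column of complex numbers
ket_y = np.array([3j, 1 + 2j])       # |y>

braket_xy = np.vdot(ket_x, ket_y)    # <x|y>: vdot conjugates its first argument
braket_yx = np.vdot(ket_y, ket_x)    # <y|x>
print(np.isclose(braket_xy, np.conj(braket_yx)))  # True: Eq. (2.94)
```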
26 Technically, any “compact” Hermitian operator can certainly be represented by a matrix to any desired degree of accuracy. In practice, this typically includes all bounded operators, that is, ones that when they operate on a bounded function lead to a bounded function as a result.
27 As the reader will have noticed, the operator $d^2/dt^2$ is Hermitian, and the rather trivial operation of multiplying a function by a real constant is also Hermitian. Somewhat surprisingly, the operator $d/dt$ is not Hermitian, though the operator $i\,d/dt$ is. Simple frictional systems can have terms in the differential equation that involve the operator $d/dt$ because frictional damping force is often proportional to velocity, at least in simple systems. Physical systems that have such frictional damping are therefore not represented by Hermitian operators.
and can be represented to any degree of accuracy we wish by a sufficiently large Hermitian matrix28. In
particular, a large number of the operators we deal with in quantum mechanics are of this type29.
By extension from the matrix case, the eigenfunctions of Hermitian operators that can be represented by
matrices are always orthogonal, with real eigenvalues, and form a complete set of basis functions for
the relevant space.
Hence, not only are eigenfunctions and eigenmodes interesting for the physical behaviors they describe,
such as the distinct ways that something can oscillate, but also, for Hermitian operators representing
physical systems, these eigenfunctions or eigenmodes can have rather remarkable and useful properties,
specifically, orthogonality and completeness; as a result, we can use them mathematically to describe
all sorts of behaviors of physical systems, in ways that often turn out to be very convenient.
The reader may be aware of Fourier series. There are a few different forms of Fourier series, but
essentially Fourier series can be used to represent arbitrary functions as sums of sine waves. To represent
an arbitrary function f (z) between 0 and L (at least one that is zero at 0 and L) we could use a Fourier
series of the form
$$f(z) = \sum_{n=1}^{\infty} a_n \sin\frac{n\pi z}{L} \qquad (2.97)$$
where the an are appropriate numbers (coefficients). The set of coefficients an is just as good a way of
describing the function as is the list of values of the function for all values of z of interest.
Our modes here for standing waves on a string are actually identical to the sine waves used in the Fourier
series. Hence, since we already know that Fourier series can represent arbitrary functions in this way,
so also can our set of modes. In other words, from our knowledge of Fourier series, we already know
that this particular set of functions forms a complete set, a property we have claimed that sets of modes
possess. Alternatively, we could state that the Fourier basis functions (the sine waves) form a complete
set because they happen to be the eigenfunctions of a Hermitian operator.
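The following sketch makes this explicit for an example function (chosen, hypothetically, as $f(z) = z(L - z)$, which is zero at both ends): the coefficients $a_n$ are computed from the orthogonality of the sine modes, and the mode sum is then compared with the original function:

```python
# A numerical sketch of Eq. (2.97) for a hypothetical example function
# f(z) = z(L - z), which is zero at z = 0 and z = L. The coefficients a_n
# follow from the orthogonality of the sine modes:
# a_n = (2/L) * integral from 0 to L of f(z) sin(n pi z / L) dz.
import numpy as np

L = 1.0
z = np.linspace(0.0, L, 2001)
f = z * (L - z)

a = [2.0 / L * np.trapz(f * np.sin(n * np.pi * z / L), z)
     for n in range(1, 26)]

f_sum = sum(a_n * np.sin(n * np.pi * z / L)
            for n, a_n in enumerate(a, start=1))
print(np.max(np.abs(f - f_sum)))   # small: the mode sum reproduces f(z)
```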
28 Operators that can be represented to any desired degree of accuracy by a sufficiently large matrix are essentially what are called “compact” operators, though the actual definition of compact operators is somewhat technical mathematically, and requires ideas from the subject known as functional analysis. Functional analysis also gives us the kinds of results that allow us to generalize the ideas here to infinite dimensional spaces and from finite matrices to differential equations. This subject can be rather forbidding, unfortunately, though see D. A. B. Miller, “An introduction to functional analysis for science and engineering,” arXiv:1904.02539 for a relatively readable introduction.
29 In particular, all the operators associated with what are called measurable quantities in quantum mechanics are Hermitian.
could have a string whose density varied along its length. The modes of such a string would not be the
simple sine waves we have here. Nevertheless, at any given time, we could if we wanted still represent
the shape of the string as a sum of our sine wave modes. In such a case, we are using the mathematical
properties of the set of eigenfunctions of an operator even though that operator is not necessarily the one
that corresponds to the current physical system’s behavior.
There is actually an infinite number of possible basis sets of functions to describe functions in any given
space. This mathematical fact can be very useful; in any given problem, there is often a very convenient
basis to choose that makes the problem simple to solve. This notion of multiple different possible sets
of basis functions is also basic to the mathematical foundation of quantum mechanics.
30 There are some operators that are not Hermitian that lead to real eigenvalues (in the problem of two unequal masses with three springs, the resulting matrix is not Hermitian, though the eigenvalues are real).
31 A broader mathematical definition of modes of linear systems would be to say that they are the solutions of the generalized eigenvalue problem. Such a problem can be written in the form $\mathsf{A}\mathbf{x} = \lambda\mathsf{B}\mathbf{x}$ where A is a linear Hermitian operator and B is a linear operator also.
32 This definition is slightly imprecise in that not necessarily all Hermitian operators can be represented to any degree of accuracy by finite matrices, though all “compact” ones can. In practice, however, we can typically reduce linear problems to ones that we can solve with matrices, and in that sense we can regard this definition as being broad enough for a discussion of such Hermitian problems.
Problems
2.9.1 Consider the following matrices and state whether they are Hermitian.
(i) $\begin{pmatrix} 1+i & 1 \\ 1 & 1-i \end{pmatrix}$
(ii) $\begin{pmatrix} 7.5 & 3-i \\ i-3 & 20 \end{pmatrix}$
(iii) $\begin{pmatrix} 14 & 1+7i \\ 1-7i & 5 \end{pmatrix}$
(iv) $\begin{pmatrix} z & \exp(-i\theta) & i \\ \exp(i\theta) & -z & 2-3i \\ -i & 2+3i & 0 \end{pmatrix}$ where z is a complex number and θ is a real number.
2.9.2 Consider the following pairs of vectors and state whether they are orthogonal. (Use the form of
the inner product appropriate for complex-valued vectors.)
(i) $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} -4 \\ 2 \end{pmatrix}$
(ii) $\begin{pmatrix} 1 \\ i \end{pmatrix}$ and $\begin{pmatrix} i \\ 1 \end{pmatrix}$
(iii) $\begin{pmatrix} 1+i \\ 3 \end{pmatrix}$ and $\begin{pmatrix} -3 \\ 1+i \end{pmatrix}$
2.9.3 For the matrices listed below, identify which are Hermitian. For any Hermitian matrices, verify
that their eigenvalues are real and their eigenvectors are orthogonal to each other.
(a) $\begin{pmatrix} 2 & 2+i & 3 \\ 2+i & 1 & i \\ 3 & i & 2 \end{pmatrix}$
(b) $\begin{pmatrix} 2i & 1 \\ 1 & -2i \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 2i \\ -2i & -1 \end{pmatrix}$
2.9.4 Consider three masses and four springs as in the figure. All the masses are identical, with mass
m. All the springs are identical, with spring constant K. The masses are presumed to be free to
move in the x direction, with no friction, but they cannot move in any other direction. The
positions of the masses with respect to their equilibrium positions are x1, x2, and x3 respectively.
At the equilibrium positions (which correspond, respectively, to x1 = 0 , x2 = 0 , and x3 = 0 ),
we can take all the springs to have no extension or compression.
For any given set of positions, we can if we like write these in the form of a mathematical vector
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$
(a) By considering the total force on each mass, write down the three differential equations,
one for each mass, that follow from Newton’s second law and the extension of the springs,
in a similar fashion to the two-mass problem in the text.
(b) Now presume we are looking for solutions that correspond to only one (angular) frequency
ω of oscillation for everything that is oscillating. Write down the resulting three equations,
one for each mass, in this case.
(c) Now write these three equations from (b) in the form of a matrix eigen equation – that is,
in the form Ax = λ x . Do this in such a way that the elements of the matrix A are
dimensionless (that is, they are just numbers) and λ is a positive (and also dimensionless)
quantity.
(d) For each of the following vectors, state whether it is an eigenvector of A, and, if so, what
is the associated eigenvalue λ and a corresponding expression for the eigen (angular)
frequency ω. [Note: do not formally solve for the eigenvectors of the matrix, either
analytically or using any computer program here – justify your answers by directly showing
the vectors are solutions of the matrix eigen equation for some value of λ.] [Hint: At least
two of these are eigenvectors.]
(i) $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$; (ii) $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$; (iii) $\begin{pmatrix} -1/2 \\ 1/2 \\ -1/2 \end{pmatrix}$; (iv) $\begin{pmatrix} 1 \\ -\sqrt{2} \\ 1 \end{pmatrix}$; (v) $\begin{pmatrix} 1 \\ \sqrt{2} \\ 1 \end{pmatrix}$
(e)
(i) Find another eigenvector that is orthogonal to all eigenvectors in (d) above. Do this
without solving for the eigenvectors of the matrix by any mathematical technique or
program. You should be able to make an intelligent guess here and justify that your
guess is indeed such an eigenvector. [Hint: In this eigenmode, one of the masses does
not move at all.]
(ii) Give the eigenvalue λ and the corresponding expression for the eigen (angular)
frequency ω.
(iii) Explicitly show the orthogonality of this vector and the other eigenvectors found in
(d).
2.9.5 Suppose we have two masses and three springs, in an arrangement similar to Fig. 2.6 with the
masses allowed to move without friction in one direction. As in that figure, all the spring
constants are the same, with some value K, but now the mass, $m_L$, on the left will be different from that on the right, $m_R$. We can usefully define the ratio of these masses as
$$\rho = \frac{m_L}{m_R}$$
We will construct the eigen problem that would allow us to solve for the (angular)
eigenfrequencies ω and the associated eigenvectors that give the relative amplitudes of the
motions x1 of mass mL and x2 of mass mR .
(i) Show that the associated eigen problem can be written as
$$\mathsf{M}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
where the matrix M and the quantity λ are given by
$$\mathsf{M} = \begin{pmatrix} 2 & -1 \\ -\rho & 2\rho \end{pmatrix} \quad\text{and}\quad \lambda = \frac{m_L\,\omega^2}{K}$$
(ii) State whether this matrix is Hermitian, briefly justifying your answer.
(iii) Find an expression for the eigenvalues λ of this system in terms of ρ.
(iv) For ρ = 2 (so physically, mL = 2mR ), write out the resulting eigenvalues λ. (The resulting
expression should only involve numbers, and you may leave square roots in your
expression – you need not evaluate the square roots of any numbers.)
(v) For each of the eigenvalues, solve for the eigenvectors. Write all the eigenvectors in the form $\begin{pmatrix} 1 \\ a \end{pmatrix}$ where a is some number or numerical expression (which can include square roots that you need not evaluate, and which will generally be different for the different eigenvectors). [Note: you may find the identity $b^2 - c^2 \equiv (b-c)(b+c)$ useful if you are simplifying expressions.]
(vi) State whether the resulting eigenvectors are orthogonal, justifying your answer briefly.
2.10 Conclusions
In this chapter, we have introduced many ideas that are well known from classical physics. The idea of
eigenmodes is in particular quite widely used there, and it is also relatively well known that eigenmodes
of Hermitian operators have these almost magical mathematical properties of orthogonality and
completeness.
In quantum mechanics, these ideas of linear Hermitian operators, and the facts that their eigenvalues are
real and that their eigenfunctions are orthogonal and form complete basis sets, are quite central to the
way that we think about the whole subject and how we write it down. Though there is much that is very
different about quantum mechanics compared to classical mechanics, these mathematical ideas of modes
and eigenfunctions carry over very well. Understanding them in the classical context prepares us to start
thinking about quantum mechanics.
The quantum view of the world
3.1 The beginning of quantum mechanics
In the period leading up to 1900, the list of phenomena for which we had no good “classical” explanation
was growing steadily.
By 1870, we had the first drafts of the periodic table of the elements, but we had no explanation for them
or their chemical or physical properties, and indeed no underlying theory of why materials had particular
properties.
The ideas of what we now call statistical mechanics were emerging in the 1870s, starting with James
Clerk Maxwell’s and Ludwig Boltzmann’s work on understanding the ideas of thermal distributions.
These could give some explanations of properties of gasses and, increasingly, other phenomena, such
as specific heat. From Boltzmann’s work in particular, we started to have a meaningful interpretation of
the difficult concept of entropy through the statistics of the occupation of possible states. One underlying
presumption in such theories is the idea that there may be discrete states for a system; the counting of these leads to the statistical ideas of entropy. But in this early work in the 1870s, there was no
physical basis for arguing that there should be such discrete states. Indeed the ideas of atoms and
molecules themselves as discrete entities were still not universally accepted.
Fig. 3.1. Hydrogen atom spectral lines (top) in the visible spectrum (the Balmer series), together
with their conventional names and wavelengths, and the modern identification as transitions
between energy levels in the hydrogen atom.
Work on the spectroscopy of atoms such as hydrogen led to the observation of remarkable but puzzling
structure in the colors of light emitted by excited hydrogen. Johann Balmer in 1885 noticed that a simple
expression could predict the wavelengths of the spectral lines of hydrogen, as measured by Anders
Ångström, for example, in what is now called the Balmer series of lines as shown in Fig. 3.1.
Another puzzling observation from this period was what we now call the photoelectric effect, observed by Heinrich Hertz in 1887. When ultraviolet light shines on a metal, the metal emits electrons more easily (as we now understand the effect). Though classical physics might reasonably explain why absorbing energy in the metal might lead to more emitted current, it does not explain why the wavelength of the light matters.
But what in the end led to the first key conceptual breakthrough in setting up quantum mechanics was
an everyday problem – lighting. Prior to the 19th century, the only practical man-made source of light
was from flames. It was, of course, well known that hot objects emit light even if they themselves were
not burning. Blacksmiths could effectively gauge temperature by the color of the heated metal with
which they were working, for example.
There had been experiments through much of the 19th century on incandescent light emitters (that is, ones that emit light just because they are hot). Electric lighting had become commercially feasible in
about 1880 with carbon filament light bulbs, such as those marketed by Thomas Edison and others (Fig.
3.2). But there was no good model for absorption and emission of light, and so it was not clear how to
make lamps that were better emitters1.
The idea that a body that was a very good absorber of light (a “black body”) would also be a very good
emitter of such thermal radiation was well understood by that time, through the work of Gustav
Kirchhoff and others around 1860. Kirchhoff’s Law of Thermal Radiation tells us that a body that
perfectly absorbs light will be the best possible thermal emitter of light. But there was no good physical
model for the spectrum of light emitted by such a black body.
Fig. 3.2. A carbon filament light bulb at the Fire Station in Livermore, California, where it has
been running essentially continuously since 1901.
In 1887, the German government, stimulated in part by Werner von Siemens, founded the Physikalisch-
Technische Reichsanstalt (PTR) – the Imperial Institute of Physics and Technology – near Berlin. One
priority for the PTR was to understand light emission from hot bodies for better light bulbs. By the late
1890’s, very precise measurements were being made there of the emission spectrum of a black body.
1 At that time, improving light bulbs would largely have been a practical issue of being able to achieve very high temperatures in a filament while still allowing a usably long lifetime for the light bulb; it was likely quite clear empirically that hotter bodies emitted more light that was also “whiter”. But, still, there was no underlying theory of light emission to guide progress.
Though there were some empirical formulas for portions of the spectrum, there was none that was
perfect, and none of those formulas had a good physical model underlying them.
The spectrum of emission by a black body is shown in Fig. 3.3 for two different temperatures, together
with a comparison to a classical model derived by Lord Rayleigh in 1900 and extended with James Jeans
in 1905. First, we see that the classical model does not work at short wavelengths. This classical model
predicts an ever-increasing amount of emitted light at shorter wavelengths. This failure of the classical
model is therefore sometimes known as the “ultraviolet catastrophe”. Second, we see from Fig. 3.3 and
Fig. 3.4 that, since the black body is the best possible thermal emitter of light, there is a practical problem
of making efficient thermal light emitters for the visible part of the spectrum (about 400 nm – 700 nm
wavelength). To get the peak of the emitted spectrum to coincide with the visible range, we would need
a very hot material, essentially as hot as the sun, and if we cannot make a material as hot as the sun, not only does the wavelength of peak emission move out of the visible range, but the magnitude of the emission also decreases overall quite rapidly as the temperature falls.
Fig. 3.3. The emission spectrum of a black body at 5800 K (approximately the temperature of the
sun) and at 3000 K (the temperature of a very hot light bulb filament), and the classical Rayleigh-
Jeans model.
Fig. 3.4. Expanded view of the black-body spectrum at 5800 K and at 3000 K in the visible range
of the spectrum. The dotted line is the 3000 K spectrum magnified by 25 times. An approximate
color map of the visible spectrum is shown at the top of the figure.
Max Planck, then a professor in Berlin, in 1900 proposed a formula and then a derivation for the black-
body spectrum using one “ad hoc” assumption: Suppose that light is emitted in quanta of energy
E = hf (3.1)
where f is the frequency (in Hz or cycles per second) of the electromagnetic radiation. The quantity h is
what has become known as Planck’s constant
$$h \simeq 6.62606957\times10^{-34}\ \mathrm{J\,s} \qquad (3.2)$$
Commonly, this constant is also used in the form known as “h bar”, $\hbar$, which is Planck’s constant divided by 2π
$$\hbar = \frac{h}{2\pi} \simeq 1.054571726\times10^{-34}\ \mathrm{J\,s} \qquad (3.3)$$
in which case we can write the energy of these quanta as
$$E = \hbar\omega\ \left(= hf\right) \qquad (3.4)$$
where ω is the corresponding angular frequency (i.e., ω = 2π f ). (It is also common to use the notation
E = hν , with the Greek letter ν (“nu”) instead of f for ordinary frequency.) When Planck’s constant
appears in some physical formula, we can be fairly sure we are working in the quantum world, one quite
different from the classical ideas that preceded it.
Planck’s formula agrees very well with observed spectra. Note, though, that Planck did not propose the
photon. That would apparently have contradicted the very successful wave theory of electromagnetism
and light, based on Maxwell’s equations. He just proposed the light was emitted in quanta by materials.
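For a numerical comparison, the sketch below evaluates the standard Planck spectral radiance formula, $B_\lambda = (2hc^2/\lambda^5)/[\exp(hc/\lambda k_B T) - 1]$ (quoted here from standard references rather than derived at this point in the text), against the classical Rayleigh-Jeans form $2ck_BT/\lambda^4$:

```python
# A sketch comparing the Planck black-body spectrum with the classical
# Rayleigh-Jeans result. The Planck spectral radiance formula used here,
# B = (2 h c^2 / lam^5) / (exp(h c / (lam kB T)) - 1), is the standard one,
# quoted rather than derived at this point in the text.
import numpy as np

h = 6.62607015e-34   # Planck's constant (J s)
c = 2.99792458e8     # speed of light (m/s)
kB = 1.380649e-23    # Boltzmann's constant (J/K)

def planck(lam, T):
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

def rayleigh_jeans(lam, T):
    return 2 * c * kB * T / lam**4

for lam_nm in (100, 400, 700, 2000):
    lam = lam_nm * 1e-9
    print(lam_nm, planck(lam, 5800), rayleigh_jeans(lam, 5800))
# At short wavelengths the classical value keeps growing while the Planck
# spectrum falls off -- the "ultraviolet catastrophe".
```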
Albert Einstein in 1905 proposed what we now call the photon to explain the photoelectric effect2. Eq.
(3.4) then gives the energy of the photon at some frequency f (or angular frequency ω).
Einstein later went on to extend the quantum theory of light in 1917, devising a particularly elegant
derivation of Planck’s black-body spectrum while at the same time explaining the processes of emission
and absorption of light by atoms. In particular, he deduced a key relation between absorption,
spontaneous emission (the most common light emission, as in light bulbs) and the then-unknown process
of stimulated emission (the process seen in lasers), a process he introduced for this argument. This 1917
work is known as “Einstein’s A and B coefficient argument”. We will return to the quantum mechanics
of light and the black-body spectrum later.
Einstein’s proposal that light was actually made of quanta started the concept of “wave-particle” duality.
This notion – that some entity could simultaneously have attributes of waves and of particles – seems to
fly in the face of normal logic, and was not accepted or understood for some time; indeed, it remains
one of the major conceptual challenges when starting to learn quantum mechanics3. It is, however,
arguably not a problem in quantum mechanics if we avoid bringing along all the ideas of classical
particles and classical waves. It is worth noting that wave-particle duality is a routine, everyday
phenomenon in optical communications: We send light down optical fibers, treating it as a wave, but
we generate and detect light using processes that are based on the ideas of photons, so we verify this
idea literally trillions of times a day.
Problem
3.1.1 I need to generate as much electromagnetic radiation as possible in the (infrared) wavelength
range beyond 1000nm (1 micron), but I only have a limited amount of electrical power to drive
any light-bulb filaments. Should I drive a small number of light bulb filaments each using large
2 It was actually for this proposal that Einstein was awarded the Nobel Prize in 1921.
3 And it is quite possible that it remains a conceptual challenge even after that!
fractions of my power and running at high temperature (e.g., 5800 K), or should I use a large
number of filaments each using small fractions of my power and running at a lower temperature
(for example, 3000 K)? (You can presume that the filaments are black-body radiators, and that
the only way the light bulb filament emits power is through electromagnetic “black-body”
radiation – no heat is conducted away from the filaments by any other mechanism.) Justify your
answer briefly.
3.2 The early quantum mechanics of atoms
Fig. 3.5. Sketch of Bohr's model of the hydrogen atom, with a point-like electron orbiting the
proton like a moon round a planet, and restricted to orbits with angular momenta in units of ħ.
Thus Bohr was taking the radical step of using Planck’s constant, which had been introduced on a purely
“ad hoc” basis by Planck to explain the black-body spectrum, and introducing it into the theory of matter.
This model is successfully able to explain the basic energies of the hydrogen atom states, giving the
formula
E_H(n) = −Ry/n² (3.5)
Here, n is an integer, starting at 1, and able to take any of the values 1, 2, 3, … and so on. (n is now
known as the “principal quantum number”.) Ry is a constant known as the Rydberg energy (or just “the
Rydberg” for short). The “zero” of the energy here corresponds to the energy at which the hydrogen
atom is just about to be ionized, which classically is equivalent to the “escape” energy, as in the
minimum “escape velocity” of a space ship if it is to be able to escape the Earth’s gravitational pull.
It is relatively straightforward given Bohr’s assumptions to derive Eq. (3.5), coming up with the formula
for the Rydberg energy in terms of fundamental constants, including the masses of the electron and the
single proton that forms the nucleus of the hydrogen atom and, of course, Planck’s constant. We will
return to that full expression later. For the moment, we note that the Rydberg’s value is
Ry ≈ 13.6 eV (3.6)
Note that we have chosen to express the Rydberg energy in the energy unit of “electron-volts” or “eV”
for short. The electron-volt is technically the amount of energy involved in moving an electron through
an electrostatic potential difference of 1 volt (1 V). The relation between energy in electron-volts and
energy in joules is
1 eV ≡ e J ≈ 1.602 176 565 × 10⁻¹⁹ J (3.7)
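A small numerical sketch of Eqs. (3.5) to (3.7) (ours; the helper names are made up for this example and standard constant values are assumed), reproducing, for instance, the red Balmer line of hydrogen:

```python
h = 6.62607015e-34    # Planck's constant, J s (standard value, assumed)
c = 2.99792458e8      # speed of light, m/s
e = 1.602176634e-19   # joules per electron-volt, cf. Eq. (3.7)
Ry_eV = 13.6          # Rydberg energy in eV, Eq. (3.6)

def E_H(n):
    """Hydrogen level energy in eV from Eq. (3.5); zero is the ionization point."""
    return -Ry_eV / n**2

def emission_wavelength_nm(n_upper, n_lower):
    """Wavelength of the photon emitted in an n_upper -> n_lower transition."""
    dE_joules = (E_H(n_upper) - E_H(n_lower)) * e   # photon energy in joules
    return h * c / dE_joules * 1e9                  # lambda = hc/E, in nm

print(emission_wavelength_nm(3, 2))   # ~656 nm, the red Balmer line
```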
4 The meter is now defined as the length of the path traveled by light in vacuum during the time interval of
1/299 792 458 of a second. The underlying reason for this way of approaching fundamental constants is that now
we have a very precise frequency standard (and hence the time standard is defined as a specific number of cycles
of our frequency standard) and so we use that time or frequency standard and this choice of the velocity of light to
define the unit of length.
The unit "nm" is nanometers (10⁻⁹ m), the unit "pm" is picometers (10⁻¹² m), and the unit "Å" is
angstroms5 (10⁻¹⁰ m)6. In practice, the angstrom version may be the most widely used (picometers are
comparatively rarely used in this context), and it allows the convenient order-of-magnitude
idea that a hydrogen atom is approximately 1 Å in diameter. Though the modern model of the
hydrogen atom is different from Bohr’s approach here, this Bohr radius remains a useful characteristic
size in discussing the hydrogen atom.
We should, however, be clear, without detracting from Bohr’s remarkable advance here, that this model
is not correct in several ways7. It was, nonetheless, a critical step in the advance of quantum mechanics,
setting up the next phase of understanding of the quantum mechanics of the atom.
Perhaps the first puzzle of the Bohr model is that, because it suggests an orbiting electron, in a classical
model of electromagnetism, it should be radiating electromagnetic waves all the time. In classical
electromagnetism, accelerating a charge leads to radiated waves. We might not think the charge is
accelerating in one sense here – in such a hypothetical circular orbit, the speed of the electron is not
changing – but we must remember that acceleration is the rate of change of a vector quantity, the
velocity. Changing the direction of the velocity vector in time is certainly also an acceleration. So there
is at least an unresolved question here as to why these orbits are not radiating.
In fact, once we construct the correct model of the hydrogen atom using the Schrödinger equation, we
find several other things “wrong” with the Bohr model. Though Bohr correctly proposed that angular
momentum was quantized in units of ℏ, the lowest energy state actually corresponds to an angular
momentum of zero, not one, unit of ℏ. And, though higher energy states can indeed have larger angular
momentum in units of ℏ, not all of them actually do, a point that again becomes clarified with the full
Schrödinger equation solution.
Fig. 3.6. Electron charge density, or equivalently the probability density of finding an electron at
some position relative to the nucleus, in a hydrogen atom, shown here in cross-sections passing
through the center of the atom for two different possible states.
More importantly, though, the main point that is not conceptually correct about the Bohr model, and one
that could not really have been anticipated by Bohr at that time, is that the electron is not some point-
5 This unit is named after Anders Ångström (1814 – 1874), a pioneering Swedish spectroscopist. When the unit is
written as a letter, it is always written with the ring diacritic, the small circle on top of the A, correctly
corresponding to the first (Swedish) letter in his name. When written out as a unit, both the ring diacritic and the
umlaut accent on the “o” are likely to be dropped completely, though it would be more correct to write ångström.
6 The angstrom is not technically one of the standard units in the SI system of units, but it is widely used because
of its convenience in the quantum mechanics of atomic systems and for historical reasons.
7 Unfortunately, it has become a picture that non-scientists often view as the structure of the atom, with point-like
electrons in circular orbits round a very small nucleus, a picture that is profoundly wrong. Bohr should not in any
way be blamed for this sloppy but persistent misrepresentation, however! Bohr quite certainly accepted the later
improved models, and continued to make major contributions to quantum mechanics.
like particle8 – some “dot” orbiting in these classical orbits. This is a deep and important point in
quantum mechanics. When we are able later to construct the correct model using the Schrödinger
equation, we find probability distributions or electron charge densities like those shown in Fig. 3.6.
These are all “fuzzy clouds” – they are nothing like circular orbits of a point particle. For the lowest,
n = 1 state (also known as the 1s state), the distribution peaks in the middle, is spherically symmetric,
and falls off smoothly. For the other state shown in Fig. 3.6, the distribution is more like a kind of
“dumbbell” shape9; this one corresponds to one of the 2p orbitals of the hydrogen atom.
De Broglie’s hypothesis
The next key step in setting up the quantum mechanics of matter was the proposal by Louis de Broglie
in 1924 that, just as there was a wavelength associated with light and light was also describable in terms
of particles, then for particles of matter there should also be a wavelength associated with them. He
proposed that wavelength λ was related to the magnitude of the momentum p of the particle.
Specifically, he proposed
λ = h/p (3.12)
an idea that can be fitted in with Bohr’s orbital idea by requiring integer numbers of wavelengths round
an orbital. This proposal of matter as waves – a completely counter-intuitive one at that time – then
suggests that there must be some wave equation describing these waves, which sets us up for
Schrödinger’s wave equation.
Incidentally, Eq. (3.12) also turns out to be correct for the relation between the momentum of the
photon10 and the corresponding wavelength λ of the electromagnetic wave, though we should be clear
that the quantum mechanical wave for a particle like an electron is another kind of wave entirely.
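A quick numerical sketch of Eq. (3.12) (ours; the function name and the sample masses and speeds are made up for illustration) shows why matter waves are evident for electrons but utterly negligible for everyday objects:

```python
h = 6.62607015e-34   # Planck's constant, J s (standard value, assumed)

def de_broglie_wavelength_m(mass_kg, speed_m_per_s):
    """de Broglie wavelength lambda = h / p, from Eq. (3.12)."""
    return h / (mass_kg * speed_m_per_s)

m_e = 9.1093837e-31                          # electron mass, kg
print(de_broglie_wavelength_m(m_e, 1e6))     # ~7.3e-10 m, about atomic size
print(de_broglie_wavelength_m(0.43, 25.0))   # soccer ball: ~6e-35 m, unobservably small
```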
Over the next 8 years or so, the core of the quantum mechanics that we use to understand ordinary matter
and light was constructed by a collection of physicists and mathematicians, including Max Born, Paul
Dirac, Werner Heisenberg, John von Neumann, Wolfgang Pauli, and Erwin Schrödinger.
Almost simultaneously, Heisenberg and Schrödinger proposed mathematical models, subsequently
shown by Schrödinger to be equivalent, that could explain the hydrogen atom. Heisenberg’s approach
(in 1925) is more abstract, being based on matrix algebra. On the face of it, Schrödinger’s wave equation
(in 1926) is more tangible, giving a “wave” that we can visualize11, and we will start below by taking
this approach. In practice, we use both the matrix and wave approaches routinely today, moving
smoothly between them. Actual calculations in quantum mechanics are quite likely to use matrices for
example, even if we are thinking about wave functions.
8 Getting the idea of the electron as some sort of tiny point particle out of our heads is one of the major challenges
in learning quantum mechanics!
9 To specify these other states of the hydrogen atom, we need to specify two other numbers, l – the orbital angular
momentum quantum number, and m – the magnetic quantum number.
10 It might seem very strange that the photon should have momentum because it does not have any mass. However,
we can actually deduce from ordinary classical electromagnetism that a light beam has momentum. The
phenomenon of “radiation pressure” is well known – a light beam can indeed “push” an object quite measurably.
If we consider the photons to have an energy hf for frequency f, then for a beam of a given intensity to have the
radiation pressure that we get from classical electromagnetism, the photons would indeed have to have a
momentum p = h / λ .
11 In fact, as we will see, the "wave" in Schrödinger's equation is quite a subtle concept, possibly without the
“reality” that we normally ascribe to entities like water waves or electromagnetic waves.
Born, among other things, contributed the idea of how we interpret Schrödinger’s waves, and Pauli
introduced the important idea of electron “spin” that is an additional sophistication we need to add on
top of Schrödinger’s equation to model atoms with more than one electron. Dirac and von Neumann
largely sorted out the mathematics of quantum mechanics, and Dirac also made other key contributions,
such as a deeper understanding of the quantum mechanics of light, and the explanation of the origin of
spin.
Problems
3.2.1 Calculate the wavelengths in nanometers of the spectral lines of the hydrogen atom
corresponding to the transitions to the n = 3 energy level from each of the n = 4, 5, and 6 energy
levels.
3.2.2 What would be the wavelength of the electromagnetic radiation whose photon energy is equal to
1 Rydberg (1 Ry)? (Such a photon energy would be required to raise an electron from the lowest
level in the hydrogen atom to the point where it is just “ionized” – analogous to reaching escape
velocity when launching a spacecraft from the earth).
3.2.3 An electron is accelerated from rest using an electrostatic potential of 3 V.
(i) What is the wavelength of the quantum mechanical wave corresponding to an electron with
this much kinetic energy?
(ii) What is the (electromagnetic) wavelength of a photon with the same energy?
3.2.4 What is the de Broglie wavelength of each of the following objects, presuming in each case that
we can treat them as a quantum mechanical particle of the given mass:
a) A fullerene molecule (C60, also called a “buckyball” because of its shape) of mass
m ≈ 1.2 × 10⁻²⁴ kg at a speed v = 200 m/s?
b) A soccer ball of mass m = 0.43 kg moving at 25 m/s?
c) The Voyager space probe, m ≈ 722 kg, moving at speed 17,260 m/s?
3.2.5 A red photon of wavelength 633 nm is reflecting directly backwards off a mirror in its path,
changing its (photon) velocity from c to –c in that direction. What is the momentum transferred
to the mirror? What would this value be if there were 1010 red photons instead of one? (Note:
The de Broglie formula also works for the momentum of a photon if we use the wavelength of
the light as the wavelength λ in the formula. I.e., the magnitude of the momentum of a photon is
also p = h / λ )
k = 2π/λ (3.14)
as defined before in the discussion of classical waves. As we know, at least before we start imposing
boundary conditions, functions like sin kz and cos kz are solutions of such an equation (3.13), and any
linear combination of them (that is, any sum of them with constant coefficients) is also.
In classical mechanics, quantities like wave amplitudes have to be real. Sometimes in dealing with
classical systems we use complex numbers for mathematical convenience; complex exponentials12 like
exp(ikz) or, in time-dependent problems, exp(iωt), are often used as intermediate algebraic forms,
but in the end we have to return to real wave amplitudes (typically, by adding in the complex conjugate
at the end of the problem). Certainly from a mathematical point of view, the function exp(ikz) is a
solution of Eq. (3.13) and, of course, from Euler’s formula it is anyway a linear combination of sin kz
and cos kz
exp(ikz) = cos kz + i sin kz (3.15)
which would also qualify it as a solution to the linear differential equation (3.13).
In quantum mechanics, we make more use of the complex form, like Eq. (3.15). So our possible set of
solutions to Eq. (3.13) includes all of sin kz, cos kz, and exp(ikz), as well as sin(−kz), cos(−kz),
and exp(−ikz).
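We can confirm this symbolically (a minimal sketch of ours using the sympy library; we take Eq. (3.13), as in the surrounding discussion, to be d²ψ/dz² = −k²ψ):

```python
import sympy as sp

z, k = sp.symbols('z k', real=True)

# Candidate solutions of the 1D Helmholtz equation d^2(psi)/dz^2 = -k^2 psi:
candidates = [sp.sin(k*z), sp.cos(k*z), sp.exp(sp.I*k*z), sp.exp(-sp.I*k*z)]
for psi in candidates:
    residual = sp.simplify(sp.diff(psi, z, 2) + k**2 * psi)
    print(psi, '->', residual)   # each residual simplifies to 0
```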
For the more general three-dimensional Helmholtz equation, we would now have
∇²ψ ≡ ∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = −k²ψ (3.16)
where again we will be considering sin(k·r), cos(k·r), and exp(ik·r), as well as sin(−k·r),
cos(−k·r), and exp(−ik·r).
Suppose, then, we now try to construct a wave equation for quantum mechanics, based on a Helmholtz
equation, Eq. (3.13) and de Broglie’s hypothesis, Eq. (3.12), which we now rewrite using Eq. (3.14) in
terms of k as
k = 2π/λ = 2πp/h = p/ℏ (3.17)
Incidentally, this gives another very common form for writing de Broglie’s hypothesis, which is13
p = ℏk (3.18)
Hence we can rewrite our Helmholtz equation, Eq. (3.13), as
∇²ψ = −(p²/ℏ²)ψ (3.19)
or
12 We will use both of the notations exp(iθ) ≡ e^(iθ) here, with a preference for the former because it reduces the use
of small superscripts. Note, incidentally, that we will use the physics notation of i ≡ √(−1) instead of j, which is
more common in engineering (possibly to avoid confusion with the use of i for current in engineering). Also,
because in quantum mechanics we almost always write a time-oscillation in the form exp(−iωt ) rather than the
form exp( jωt ) that is more common in engineering, sometimes we can think of i being − j , though this is not
reliable.
13 This form p = ℏk is also correct for the photon momentum, where in that case k = 2π/λ is the wavevector
magnitude of the electromagnetic wave.
−ℏ²∇²ψ = p²ψ (3.20)
We presume we are interested in some particle of mass m (which we could presume for definiteness to
be an electron, for example). So, we can divide both sides by the particle mass m to obtain
−(ℏ²/2m)∇²ψ = (p²/2m)ψ (3.21)
Now, we know from classical mechanics that
p²/2m ≡ the kinetic energy of the particle (3.22)
In general, in classical mechanics, we presume
Total energy (E) = Kinetic energy + Potential energy (V(r)) (3.23)
So therefore we presume
Kinetic energy = p²/2m = Total energy (E) − Potential energy (V(r)) (3.24)
Therefore, we can propose to rewrite the Helmholtz equation, Eq. (3.21), as
−(ℏ²/2m)∇²ψ = (E − V(r))ψ (3.25)
or, with a minor rearrangement,
[−(ℏ²/2m)∇² + V(r)]ψ = Eψ (3.26)
Eq. (3.26) is the Schrödinger equation (technically, the time-independent Schrödinger equation) for a
particle of mass m.
Note, as we said at the beginning of this section, that we have not “derived” Schrödinger’s equation. We
are merely suggesting it. We cannot derive it from first principles. The only justification for making this
postulation of Schrödinger’s equation is that it works.
And it does work. The first major triumph of Schrödinger’s equation14 is that it can explain the hydrogen
atom.
Problems
3.3.1 In a region of space, a particle with mass m and with zero energy has a time-independent
wavefunction in the z direction
ψ(z) = A z exp(−z²/L²)
where A and L are constants. Determine the potential energy V ( z ) of the particle.
14 The alternate and equivalent approach of the Heisenberg matrix model was also successfully used around the
same time by Pauli to solve this hydrogen atom problem.
3.3.2 Consider the Schrödinger equation for a particle of mass m with some specific total energy E in
a uniform potential (i.e., the potential energy does not vary in space) of some specific value V.
For simplicity here we consider the equation in just one dimension, i.e.,
−(ℏ²/2m) d²ψ/dz² + Vψ = Eψ
For such a situation, the wavefunction that solves this equation can take the form
ψ(z) = A exp(ikz)
for some constant A and a wavevector magnitude k = 2π / λ where λ is the corresponding
wavelength of the electron wavefunction.
(a) Find an expression for k in terms of E, V, m and fundamental constants.
(b) If E = 1.5 eV , V = 1 eV and the particle is an electron, calculate the values of k and λ.
The wavefunction itself, then, is not a probability or a probability density. It is, however, an example of
what in quantum mechanics we can call a “probability amplitude” or a “quantum mechanical
amplitude”. This concept of a probability amplitude ψ, as in Eq. (3.27), where we have to take the
squared modulus to get a probability, is one that has no precedent in ordinary probability theory; it is
one of the new mathematical ideas we have to introduce in quantum mechanics to have it explain the
world around us.
With this additional postulation by Born, we have the quantum mechanical and mathematical basis for
analyzing the hydrogen atom. We will postpone the solution of the hydrogen atom itself because it takes
some time and effort to write down the necessary mathematics and solve the resulting equations, but we
now have all of the physical postulates we need to do that.
15 And possibly even something of a relief! How would we conceptualize a measurable yet complex entity?
16 Strictly, we should think of this as the probability of finding the electron within a small volume round about the
point r, divided by the magnitude of that small volume.
17 We express this formula with a "proportional to" sign ("∝") because we have not yet dealt with the issue of
making sure that all the probabilities add up to 1, a process called "normalization" that we can postpone for the
moment.
3.5 The "particle in a box" model
Fig. 3.7. Solutions for a particle in a one-dimensional box of width Lz and with infinitely high
walls. Solutions are shown for the first three eigenenergies and the associated eigenfunctions,
with the zeros for plotting the eigenfunctions chosen to be the dashed lines at each corresponding
eigenenergy E1, E2, and E3 here.
Formally, then, we can choose the potential energy V to be zero inside the box18. So, because V = 0
inside the potential well or box, the Schrödinger equation Eq. (3.26) becomes
−(ℏ²/2m) d²ψ(z)/dz² = Eψ(z) (3.28)
or equivalently
d²ψ(z)/dz² = −(2mE/ℏ²)ψ(z) ≡ −k²ψ(z) (3.29)
where
k² = 2mE/ℏ² (3.30)
18 It never matters where we choose the zero of energy to be – we can choose it anywhere we want. We simply
have to be consistent in using whatever choice we make. It is often simplest just to choose the energy to be zero at
some convenient point in our energy range of interest.
We are presuming here that the potential energy rises to be arbitrarily (or infinitely) high at z = 0 and
at z = Lz . Because we presume that any solutions will be for some finite energy E, we presume there is
no possibility of finding the electron outside the box (that is, for z < 0 or z > Lz ), and so we propose
that the wavefunction ψ should be zero both at z = 0 and at z = Lz ; that is, we have boundary conditions
ψ ( 0) = 0 (3.31)
and
ψ ( Lz ) = 0 (3.32)
Just as for the wave on a string, the general solution to Eq. (3.28) or Eq. (3.29) is
ψ(z) = A sin(kz) + B cos(kz) (3.33)
The first boundary condition, Eq. (3.31), means that B = 0, so only the sine term survives. The second
boundary condition, Eq. (3.32), then means that kLz = nπ for some integer n, so
k = nπ/Lz (3.36)
But rewriting Eq. (3.34) gives
E = ℏ²k²/2m (3.37)
so the solutions of Schrödinger’s equation are, for the wavefunctions (eigenfunctions)
ψ_n(z) = A_n sin(nπz/Lz) (3.38)
and for the eigenvalues (eigenenergies)
E_n = (ℏ²/2m)(nπ/Lz)² (3.39)
where, for the same reasons as in the case of waves on a string (no purely-zero function, and only counting
distinct functions once), we take only positive, non-zero values for n, that is, n = 1, 2, 3, … . These
solutions are sketched in Fig. 3.7.
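A numerical sketch of the eigenenergies in Eq. (3.39) (ours, with standard constant values assumed), for an electron in a box 1 nm wide:

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J s (standard value, assumed)
m_e = 9.1093837e-31      # electron mass, kg
e = 1.602176634e-19      # joules per electron-volt

def E_n_eV(n, Lz_m, m=m_e):
    """Eigenenergy from Eq. (3.39) for an infinitely deep 1D box, in eV."""
    return (hbar**2 / (2.0 * m)) * (n * math.pi / Lz_m)**2 / e

for n in (1, 2, 3):
    print(f"E_{n} = {E_n_eV(n, 1e-9):.3f} eV")   # ~0.376, 1.504, 3.385 eV
```

Note how the energies grow as n² and as 1/Lz², so halving the box width quadruples every level.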
Using the wavefunction as written in Eq. (3.38), we can integrate the modulus squared over the width
of the potential well – which is the only range of positions for which the wavefunction is non-zero – so
we can add up all of these probability densities. Specifically, then, we would have19
∫_0^Lz |ψ_n(z)|² dz = A_n² ∫_0^Lz sin²(nπz/Lz) dz = A_n² Lz/2 (3.40)
So, if we choose
A_n = √(2/Lz) (3.41)
then the integral in Eq. (3.40) is equal to 1, and the wavefunction ψ_n(z) = √(2/Lz) sin(nπz/Lz) is
normalized.
Fig. 3.8. Plot of the modulus squared of the wavefunctions for the first three levels of the particle
in a box, with the corresponding horizontal dashed lines being the zeros for each different plot.
19 This particular integral is particularly simple because it is over complete periods of the sine squared function,
and the average of the sine squared function is ½ over any complete period.
20 We do not have to choose a real coefficient. We can multiply any eigenfunction by a complex number of unit
magnitude, such as exp(iθ) for any real θ, and it remains a valid normalized solution.
We can plot out the modulus squared of the wavefunctions, as shown in Fig. 3.8. When we use the
normalized version of the wavefunctions, the area under each one of these plots, relative to its
corresponding zero line, is 1.
To evaluate the probability of finding a particle in a given region of space, then we formally need to
integrate the probability density over that region. Suppose, for example, that we knew that the particle
(for example, an electron) was in state n = 1 in such a box, and we wanted to know the probability Ps
of finding the electron between some point z1 and some other point z2. Then, using the normalized
wavefunction from Eq. (3.43), we simply need to integrate the modulus squared of the normalized
wavefunction over these limits. The answer23 would be
Ps = ∫_z1^z2 (2/Lz) sin²(πz/Lz) dz = (2/Lz) ∫_z1^z2 (1/2)[1 − cos(2πz/Lz)] dz
   = (1/Lz) ∫_z1^z2 dz − (1/Lz) ∫_z1^z2 cos(2πz/Lz) dz
   = (1/Lz)[z]_z1^z2 − (1/Lz)(Lz/2π)[sin(2πz/Lz)]_z1^z2
   = (z2 − z1)/Lz − (1/2π)[sin(2πz2/Lz) − sin(2πz1/Lz)] (3.44)
Ps is then the shaded area in Fig. 3.9. Note that this result in Eq. (3.44) is dimensionless – it is just a
number, as we would expect for a probability. Note also that, if we choose z1 = 0 and z2 = Lz then this
expression, Eq. (3.44), correctly gives us the answer 1 – the particle must be somewhere inside the box.
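As a cross-check of Eq. (3.44), the following sketch (ours) compares the analytic result with a direct numerical integration of the probability density:

```python
import math
import numpy as np

Lz = 1.0   # box width; Ps is dimensionless, so the choice of unit is irrelevant

def Ps_analytic(z1, z2):
    """Probability of finding the n = 1 particle between z1 and z2, Eq. (3.44)."""
    return ((z2 - z1) / Lz
            - (1.0 / (2.0 * math.pi)) * (math.sin(2.0 * math.pi * z2 / Lz)
                                         - math.sin(2.0 * math.pi * z1 / Lz)))

def Ps_numeric(z1, z2, npts=20001):
    """Trapezoidal integration of |psi_1(z)|^2 = (2/Lz) sin^2(pi z / Lz)."""
    z = np.linspace(z1, z2, npts)
    psi2 = (2.0 / Lz) * np.sin(math.pi * z / Lz)**2
    dz = z[1] - z[0]
    return float((psi2[0] / 2 + psi2[1:-1].sum() + psi2[-1] / 2) * dz)

print(Ps_analytic(0.0, Lz), Ps_numeric(0.0, Lz))   # both 1.0: the particle is in the box
print(Ps_analytic(0.25 * Lz, 0.75 * Lz),
      Ps_numeric(0.25 * Lz, 0.75 * Lz))            # ~0.818 for the middle half
```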
Fig. 3.9. Illustration of the area, Ps, under the probability density curve for the first state of a
particle in a box, corresponding to the probability of finding the particle between z1 and z2.
23 We are using the trigonometric identity sin²α = (1/2)[1 − cos(2α)] here.
conclusion that there are only very specific, discrete values of that energy that are possible24. That
behavior is different from prior classical models of matter. We will also refer to such solutions as energy
eigenstates; these are solutions that are telling us possible states of matter (here, our particle) for this
problem. These discrete states can also be indexed with a “quantum number” – here the integer n. We
regard the “state” here as having these properties of an eigenenergy, an eigenfunction, and an appropriate
quantum number25 to index it.
Parity of eigenfunctions
One mathematical property of the eigenfunction solutions to this problem is what is called “parity”.
Because this problem is rather symmetric (the potential energy is symmetric about the center of the
well), the solutions also are influenced by that. In particular, with respect to the center of the well, the
eigenfunctions are either (i) “symmetric” or have what is called “even parity”, which means the value
of the function some distance to the right of the center is the same as the value of the function the same
distance to the left of the center, or (ii) they are “antisymmetric” or have “odd” parity, which means the
value of the function some distance to the right of the center is minus the value of the function the same
distance to the left of the center. Often for brevity, we call functions with “even” parity just “even”
functions, and similarly those with “odd” parity just “odd” functions.
For the eigenfunctions here, relative to the center of the potential well, as we can see from Fig. 3.7, the
solutions with n = 1, 3, 5, … have "even" parity, and those with n = 2, 4, 6, … have "odd" parity. (Note
that the parity of the functions does not happen to correspond with whether the quantum numbers
themselves here are even or odd, so the odd-ness or even-ness of the quantum numbers is not itself a
good guide to the parity of the functions.)
Not all quantum mechanical problems will have solutions with these “definite” parities (that is, either
odd or even). Potentials that are not symmetric, such as a “tilted” potential well in which the bottom of
the well is sloping in one direction or the other, can have solutions that are neither symmetric nor
antisymmetric, and so those solutions can be said not to have definite parity.
Problems such as the hydrogen atom, which is a problem that is symmetric in many ways, often do have
solutions with quite definite parities, as well as having other symmetries, such as rotational symmetries
in which rotating by some angle or angles around some axis leaves the eigenfunctions unchanged.
Symmetries generally can be very useful in simplifying solutions of these and other wave problems, and
parity is one of the simplest examples of symmetries that such eigenfunctions may possess.
Another obvious feature of the successive eigenfunctions in this particle-in-a-box problem is that each
one has one more “zero” in it than the preceding one. For example, the n = 1 solution has no zeros (other
than the ones at the walls), the n = 2 solution has a zero in the middle of the well, the n = 3 solution has
yet one more zero, and so on. This also is common mathematical behavior and helps to ensure the
orthogonality of different eigenfunction solutions.
Quantum confinement
This quantum mechanical analysis of a particle in a box is showing behaviors that are quite different
from our notions of what would happen classically to a particle in a box in at least three different ways,
all of which we can regard as being consequences of what we call “quantum confinement” in the box;
by quantum confinement we mean that there is a set of quantum phenomena that emerge if we confine
some system (here, a particle) on a small scale.
24 In quantum mechanics in problems without confining boundaries, or for energies above the "height" of bounding
potentials, it is also possible to have continuous ranges of eigenenergies that are not restricted to discrete values, but for
problems where we can at least imagine the solutions are bounded by some box, we will have discrete energy
eigenvalues.
25 More generally, in more complicated problems, such as the hydrogen atom, we may end up with a set of quantum
numbers to index the state.
Hence we see that even this very simple model is beginning to give us the scales of actual quantum
mechanical phenomena. Generally, we expect quantum mechanical effects for matter to become stronger
as we make things smaller. As a general sense of scale, the properties of materials are roughly
independent of the size of the piece of material for scales of microns or larger. If we look carefully at
the properties of objects on a scale of about 100 nm or so, we can begin to see quantum confinement
effects. For objects at a size scale of about 10 nm, quantum confinement effects are quite clear. Indeed,
they are used routinely with semiconductor materials, for example, to tune their optical properties solely
based on the physical dimensions.
By the time we get to the single nanometer scale, quantum effects are dominating the properties of a
piece of material, drastically changing its properties compared to normal “bulk” properties. Quantum
confinement from sizes on the single nanometer scale can be used to change the color of
microcrystallites of semiconductor materials, for example. Such nanometer size scales are now routine
for various nanotechnologies, including the fabrication methods used for transistors in silicon chips, for
example, and the layered semiconductor structures used for modern optoelectronic devices like
semiconductor lasers and light-emitting diodes.
Simply put, we cannot begin to understand the properties and operating principles of modern devices,
especially those based on nanotechnology, without this physics of quantum confinement.
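These size scales can be checked directly against Eq. (3.39). The sketch below (ours, with standard constant values assumed) compares the ground-state confinement energy of an electron with the room-temperature thermal energy scale of roughly 0.025 eV:

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J s
m_e = 9.1093837e-31      # electron mass, kg
e = 1.602176634e-19      # joules per electron-volt

def E1_eV(Lz_m):
    """Ground-state (n = 1) energy of Eq. (3.39) for an electron, in eV."""
    return (hbar * math.pi / Lz_m)**2 / (2.0 * m_e) / e

for Lz_nm in (1000, 100, 10, 1):
    print(f"Lz = {Lz_nm:4d} nm: E1 = {E1_eV(Lz_nm * 1e-9):.2e} eV")
# E1 approaches the ~2.5e-2 eV thermal scale only at sizes of a few nanometers,
# and exceeds it decisively at the single-nanometer scale.
```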
Problems
3.5.1 Consider an electron in a one-dimensional potential well with infinitely high potential barriers
on each side and of thickness Lz = 1.5 nm.
(i) What is the zero-point energy of the electron in this potential well, relative to the energy
of the bottom of the well? (Give your answer in electron-volts)
(ii) What is the energy separation of the first and second energy eigenstates for the electron in
this well? (Give your answer in electron-volts)
3.5.2 Suppose we have an electron initially in free space, with no confining barriers of any kind. Now
suppose we confine it in one direction with a pair of parallel “walls” that are effectively infinitely
high (i.e., correspond to a potential energy that is effectively infinitely large). How far apart
should the walls be if we want to give the electron a zero-point energy (i.e., a lowest possible
energy, compared to the “free” electron’s minimum energy) of each of the following amounts?
(a) 25 meV (which, incidentally, is approximately the magnitude of the thermal energy at room
temperature)
(b) the same energy as a photon that corresponds to the electromagnetic wavelength used in
long-distance optical fibers (which is about 1.5 µm in air or vacuum)
(c) the same energy as the separation between the n = 2 and n = 3 levels of the hydrogen atom
3.5.3 An electron is confined in an infinite square well of width L.
(i) Calculate the energy separation in eV of the following states.
(a) L = 10 nm, n = 3 → n = 1
(b) L = 5 nm, n = 5 → n = 3
(ii) Suppose in each case that the system could emit a photon by the electron “falling” from
the upper state of the pair to the lower one, with a photon energy corresponding to the
separation between the upper and lower states. In each case, what would be the wavelength
of the emitted photon? (Note: the energy of a photon is E = ℏω = hf, where ω is the
angular frequency and f is the frequency.)
3.5.4 For each of the following proposed functions, state whether it is an energy eigenfunction solution
for Schrödinger's time-independent equation for a particle-in-a-box problem for a box of
thickness Lz, presuming infinitely high “walls” on either side. Presume z = 0 at the “left” wall of
the box and z = Lz at the “right” wall of the box. [Note: any such solutions are not required to be
normalized.]
(i) cos(πz/Lz)
(ii) sin(πz/(2Lz))
(iii) 16 sin(5πz/Lz)
(iv) 2 sin(πz/Lz) + 3 sin(3πz/Lz)
(v) 2 exp(2iπz/Lz) − 2 exp(−2iπz/Lz)
3.5.5 In the semiconductor GaAs, electrons behave as if they have a mass that is approximately 0.07
times the mass of the free electron. Suppose we have a thin layer of GaAs, of thickness Lz to be
determined, that we will treat as if it was a one-dimensional potential well with infinitely high
walls on either side. We want to make a device that emits light at an infrared wavelength of 10
μm by having electrons transition from the n = 2 level in this well to the n = 1 level. How thick
should we make the layer (that is, what value should we choose for Lz)?
3.5.6 Find the normalization factor A for the following functions such that ∫_−∞^∞ |ψ(z)|² dz = 1.
(a) ψ(z) = A exp(−ikz) for −L < z < L (and zero elsewhere).
(b) ψ(z) = A exp(−κ|z|) for −∞ < z < ∞. (Presume κ > 0.)
(c) ψ(z) = A cos(2πz/L) for −L/4 < z < L/4 (and zero elsewhere).
3.5.7 For a particle that has each of the following normalized wavefunctions ψ ( z ) in the z direction,
calculate the probability of finding a particle in the given regions.
(a) ψ(z) = √(2/L) sin(3πz/L), for the region L/3 < z < 2L/3.
(b) ψ(z) = √(2/L) sin(2πz/L), for the region 0.8L < z < L.
3.5.8 Consider an electron in an infinitely deep potential well in the second energy eigenstate of that
well (i.e., n = 2).
(i) Presuming the well has a thickness Lz, give an expression (with any integrations completed)
for the probability of finding the electron between a position z1 and a position z2 in the well.
Presume now that Lz = 8 nm. For this electron, give the probability that we would find the electron
in each of the following regions of the well.
(ii) Between 0 and 4 nm from the left of the well
(iii) Between 1 and 2 nm from the left of the well
(iv) Between 3 and 5 nm from the left of the well (i.e., in the middle 2 nm of the well)
(v) Between 3.9 and 4.1 nm from the left of the well
3.6 Waves and measurement
An uncertainty principle
One well-known phenomenon seen with classical waves is diffraction from an aperture. This is
illustrated in Fig. 3.10, which shows calculated wave patterns resulting from plane waves on the left, of
wavelength λ, being incident on different apertures in a mask. When we pass waves through a small
aperture, we see quite clearly at large distances that the resulting waves spread out. If we choose smaller
apertures, the spreading angle is larger.
For an aperture of size d, the full angle between the minimum in the wave pattern at the top and at the
bottom is approximately λ / d . A more characteristic angle given some rough half-width of this
angular spread would be significantly smaller than this, and it will be convenient for our algebra here27
to take this characteristic half-width angle to be approximately
θc ≈ λ/(2πd) (3.48)
Note in this behavior that the better we know the position of the “source” of the waves in the aperture,
as given by the size of the aperture, the less well we know the angle of the waves, or at least there is a
larger spread in the angle of the waves. Relations like these occur routinely with waves in classical physics,
and they represent classical “uncertainty principles”, where we have a reciprocal relation between the
uncertainty of one quantity and the uncertainty in another. The relation between the uncertainty in the
angular direction of the wave, as given by the diffraction angle, and the uncertainty in the position of
the wave, which is at least the distance corresponding to the width of the aperture, is a good classical
example of an uncertainty principle.
Another common kind of relation is one encountered with signals in time. If we have a very short “pulse”
of a signal of width ~ ∆t, then we know the timing of the signal well because we can know with this
degree of precision at what time the pulse was emitted; but we will find that it is not meaningful to ask
for the precise “frequency” of the pulse because it would necessarily be made up out of a range of
frequencies; we could write that uncertainty in the frequency in angular frequency terms as ∆ω.
26 George Thomson was the son of "J. J." Thomson, who had proposed the electron as a particle in 1897. Both
father and son separately won Nobel Prizes, George Thomson’s being shared with Clinton Davisson for this work.
27 The overall form of this diffraction angle formula is quite valid, at least for small angles. Our choice here of a
factor of 2π in the bottom line is a reasonable number to choose to give a characteristic diffraction angle, but,
honestly, this specific choice is just so we get to the “right” answer at the end! That right answer can be justified
with more rigorous definitions and mathematics, but we are not giving either of those here.
that is not initially expanding. We can note immediately the similarity between these two “classical”
uncertainty principles, Eqs. (3.49) and (3.50).
At least for electrons28, from de Broglie's hypothesis we know that the momentum p = ℏk, so we take
the uncertainty ∆px in the momentum in the x direction to correspond with the uncertainty ∆kx we have
in the component of the wavevector in the x direction – that is, we interpret ∆px ≡ ℏ∆kx. Hence, from
Eq. (3.50), we have
∆px ∆x ≥ ℏ/2 (3.51)
where we have interpreted our calculations here to give a minimum estimate of the divergence of the
beam or, equivalently, the uncertainty in the momentum in the x direction. It is certainly possible to have
larger divergence angles than those shown here, for example if the incident wave in the aperture is not
a plane wave or has some structure to it.
Eq. (3.51) represents an uncertainty principle based on this rather informal derivation from diffraction
angles. In fact, because of our choices of numbers here (which are admittedly somewhat arbitrary), this
result corresponds to Heisenberg’s uncertainty principle for position and momentum that can be derived
with more precise definitions and mathematics.
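To get a feel for the magnitudes in Eq. (3.51), here is a small sketch (ours; it mirrors the spirit of Problem 3.6.1 below, with standard mass values assumed) of the minimum velocity uncertainty ∆vx = ∆px/m for a 10 nm position uncertainty:

```python
hbar = 1.054571817e-34   # reduced Planck constant, J s

def min_delta_vx(mass_kg, delta_x_m):
    """Minimum velocity uncertainty from Eq. (3.51): dvx >= hbar / (2 m dx)."""
    return hbar / (2.0 * mass_kg * delta_x_m)

m_e = 9.1093837e-31      # electron mass, kg
m_p = 1.67262192e-27     # proton mass, kg
for name, m in (("electron", m_e), ("proton", m_p)):
    print(f"{name}: minimum delta_vx ~ {min_delta_vx(m, 10e-9):.3g} m/s")
# electron ~5.79e3 m/s, proton ~3.15 m/s; for macroscopic masses it is negligible.
```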
One important point to make here is that there really is nothing surprising about the existence of
uncertainty principles for waves or for signals in time. They are a routine part of the classical world. For
a classical wave, it is meaningless to ask to know simultaneously both the “position” of a wave and its
direction. If it has a precise direction, then it is a plane wave, and it is arbitrarily wide. If it has a precise
position, then it is a wave at some point source, and it will spread out over a wide angle. With careful
mathematical definitions and some more sophisticated mathematical proofs (which are formally the
same in both classical and quantum mechanical worlds), we get uncertainty principles that are
mathematically identical in form to the quantum mechanical one of Eq. (3.51).
28 In fact, this same relation also is valid for photon momentum.
as we have been discussing it29. That issue is known in quantum mechanics as the “measurement
problem” and it is associated with the “collapse of the wavefunction”.
Fig. 3.11. Geometry of Young’s slits experiment (not to scale), with narrow slits a small distance
s apart at a distance zo from a screen.
The distance d1 from the upper slit to this point x on the screen will be, by Pythagoras’ theorem,
29 These two issues of the uncertainty principle and the measurement problem are linked if we view the ∆px and
∆x as being statistical standard deviations we deduce from repeated experiments measuring these quantities, but
they are not linked if we think of these quantities as just being attributes of wave motion.
d₁ = √[(x − s/2)² + zo²] = zo √[1 + ((x − s/2)/zo)²] (3.52)
Since we are presuming x and s are much smaller than zo, we can approximate the square root by its
lowest-order power series expansion – that is, for small a, √(1 + a) ≈ 1 + a/2 – so we can write,
approximately,
d₁ ≈ zo + (x − s/2)²/2zo = zo + x²/2zo + s²/8zo − sx/2zo (3.53)
Similarly, the distance d2 from the lower slit to the point x on the screen will be
d₂ = √[(x + s/2)² + zo²] ≈ zo + x²/2zo + s²/8zo + sx/2zo (3.54)
Because the distance zo is large compared to the other distances in the problem, we can expect the waves
from the two slits to be essentially equally strong by the time they reach the screen, or at least we can
use this as a simplifying approximation in our analysis here. For mathematical convenience, and also
because quantum mechanical waves may well actually have this form, we use the complex exponential
form of the solutions to the relevant wave equation here; that is, for a wavelength λ and hence a
wavevector magnitude k = 2π / λ , we take the wave coming from a specific slit to be of the form
exp(ikr) for some distance r from a source.
So the sum of the two waves from the two different slits, at the point x on the screen, will be
approximately
ψ ∝ exp ( ikd1 ) + exp ( ikd 2 ) (3.55)
Writing α = k(zo + x²/2zo + s²/8zo) for the phase part that is common to both terms, we will have
ψ(x) ∝ exp[i(α − ksx/2zo)] + exp[i(α + ksx/2zo)] = exp(iα)[exp(−iksx/2zo) + exp(iksx/2zo)]
     = 2 exp(iα) cos(ksx/2zo) = 2 exp(iα) cos(πsx/λzo) (3.57)
Taking the "intensity" of the beam to be ∝ |ψ|² (which would be essentially the probability density for
a quantum mechanical wave), we would have a "brightness" on the screen of
|ψs(x)|² ∝ cos²(πsx/λzo) = (1/2)[1 + cos(2πsx/λzo)] (3.58)
This corresponds to a pattern of interference “fringes” on the screen, with a period in the x direction of
ds = λzo/s (3.59)
We see, incidentally, that these interference fringes are not spaced by the wavelength, but by an amount
that is zo / s larger. Hence Young’s slits experiment allows us to measure wavelength without having
any measuring device on the scale of the wavelength. The interference phenomena and the fringe spacing
are shown in Fig. 3.12.
Fig. 3.12. Calculated interference pattern and resulting bright and dark “intensity” fringes spaced
by a distance ds on the screen on the right for Young’s slits when the slits are illuminated by a
plane wave from the left.
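A numerical sketch of Eqs. (3.58) and (3.59) (ours; the 633 nm wavelength, 10 µm slit spacing, and 1 m screen distance are made-up example values):

```python
import numpy as np

lam = 633e-9   # wavelength, m (example value)
s = 10e-6      # slit separation, m
zo = 1.0       # mask-to-screen distance, m

ds = lam * zo / s                     # fringe period, Eq. (3.59)
print(f"fringe spacing ds = {ds * 1e3:.1f} mm")    # ~63.3 mm

x = np.linspace(-2 * ds, 2 * ds, 9)   # sample points across four fringe periods
intensity = np.cos(np.pi * s * x / (lam * zo))**2  # Eq. (3.58), scaled to peak at 1
print(np.round(intensity, 3))         # 1 at x = 0, +/-ds, +/-2ds; 0 halfway between
```

The fringe spacing here is some five orders of magnitude larger than the wavelength itself, which is the point made above about measuring a wavelength without any wavelength-scale apparatus.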
Young’s slits and quantum mechanics
Considered as a wave phenomenon, the behavior of Young’s slits is straightforward to understand. The
apparent conceptual problems start when we want to think in terms of particles. Quantum mechanics is
introducing for us the idea that entities such as the photon and the electron have aspects of both wave
and particle character – what we are referring to as “wave-particle duality”.
On the face of it, there might seem to be no problem if we are thinking of large numbers of photons, for
example, because we might reasonably expect them to behave as waves, as we see every day in the
normal “classical” behavior of radio waves and light beams from lasers. But what happens when we
presume that we are dealing with very small numbers of photons or electrons? Specifically, what
happens when we turn down the light beam intensity or the electron current so much that we can be
quite sure that there is never more than one photon or electron in the apparatus?
We can readily calculate what happens even at high intensities or currents if we block one or other of
the slits, as illustrated in Fig. 3.13 and Fig. 3.14. In either case, we would still see the wave diffracting
as it emerges from the remaining aperture, but we would lose the interference “fringe” pattern on the
screen.
Our classical intuition might therefore suggest to us that, when there is only one particle in the apparatus
at a time, it will go through one or other of the slits, and hence we will lose the interference fringes. But,
that is not what happens. In both the electron and the photon case, we still see the interference fringes.
Now, at these low intensities or currents, for any one photon or electron, we will not see such a pattern.
For a photographic plate for a photon or for a phosphorescent screen for an electron, for any one photon
or electron, we will just see one dot. The remarkable experimental conclusion30, however, is that, if we
keep repeating such two-slit experiments with one photon or electron at a time, the set of dots we get
will progressively build up the interference pattern.
But how can this be? Surely, we might say, the electron or photon has to go through one slit or the other,
so it cannot form an interference pattern. We remember, though, that when discussing the Bohr model
of the atom, we made the point that the electron is not a “point particle”. Now we can extend that
discussion, using this two-slit experiment to clarify how we view what is going on here. To do so, we
need to introduce an idea that more typically only comes up in philosophical discussions but that will
be helpful to us here.
30
An excellent example of such an experiment performed with electrons can be seen at
http://www.hitachi.com/rd/portal/research/em/doubleslit.html .
Fig. 3.13. Calculated diffraction pattern when the top slit is blocked.
Fig. 3.14. Calculated diffraction pattern when the bottom slit is blocked.
The ontology of quantum mechanical particles
If we look up the word “ontology” in a dictionary, we will find a definition like “the nature of being”,
which may not be very illuminating even though it is correct. More pragmatically, we can think of the
ontology of something as being the set of attributes it possesses. A ghost can walk through walls. I
cannot walk through walls. This distinction between me and a ghost is an ontological one. The ghost
has the attribute of being able to walk through walls, and I do not have that attribute.
When we were making the transition between a classical view and a quantum mechanical one, we quite
naturally used classical words to describe the new entities we were observing. So, for example, we called
the electron a “particle”, and similarly for the photon. The mistake here is that we therefore also brought
along all the ontological attributes of a classical particle when we did that, even though in fact we had
no justification for doing so.
A classical particle has attributes like charge, mass, position, size, shape, and momentum. But just
because we call an electron or a photon a particle does not mean that it has all of these attributes, and it
may have others as well (spin is an example of such an additional attribute). With the benefit of
hindsight, it might have been better to introduce another word to describe entities such as the electron
and the photon; for example, maybe we should have referred to them as “qwarticles” or some other such
new name. By doing so, we might have been able to avoid automatically bringing along all the classical
attributes and hence causing ourselves a great deal of self-inflicted confusion.
How should we think about the attributes of these “qwarticles”? We can find experimentally that it is
useful to give them the attributes of charge and mass (both of which are zero for the photon, but that
“zero” value is still quite meaningful). We will need to add the attribute of spin, which the classical
particles do not have. Charge, mass, and spin are indeed intrinsic attributes of electrons and photons.
But we need to be much more careful about position, size, shape and momentum. We could ascribe
some intertwined version of these attributes as allowed by the uncertainty principle. It is likely more
useful and meaningful to say that the state that the “qwarticle” is in can have these properties in some
sense. So, the hydrogen atom orbitals can have some kind of size and shape, at least in some average
sense appropriate for a “fuzzy” object, and some concepts of position and momentum, consistent with
the uncertainty principle. Those orbitals also possess quite specific angular momentum values. So we
can say the electron is in such a state (a hydrogen atom orbital) that has these attributes; these attributes
are really of the state, however, not of the idea of an electron itself.
This idea of distinguishing the ontology of the particle and the ontology of the state it is in is one we can
use in the classical world as well; we can ascribe the position and momentum to the “state” of a classical
particle, like a brick, for example, rather than having those be an intrinsic property of the brick itself. In
the classical case, though, we would probably consider the shape to be an intrinsic property, whereas
for the “qwarticle” the shape is better considered to be an attribute of the state the qwarticle is in.
For the case of a photon, we can think of a “mode” of the electromagnetic field, such as some standing
wave mode of a resonator, or the propagating “mode” of a laser beam. A photon can “occupy” that mode
in the same way that an electron can occupy one of the eigenstates of the particle in a box or one of the
orbitals of the hydrogen atom. As we have been discussing, these ideas of modes, eigenstates and orbitals
are really all the same mathematical idea. We could simply use the word “mode” for all of them, though
historically these different areas of physics have grown up somewhat independently so we do not
typically use the word “mode” for all these applications in practice.
When we look at the diffraction pattern for Young’s slits, the pattern we calculate and plot is the “state”
that is being occupied by the electron or the photon. It is not meaningful to ask where the electron or the
photon is, or what slit it passed through31. These are not point particles that have to have a position. They
are occupying a state that has a particular form and shape in space, and that size and shape encompasses
both slits. Asking which slit the electron came through is asking for the value of an attribute that the
electron does not intrinsically possess and for which the state it is in also obviously does not contain an
answer. The question is meaningless. It is like asking someone to tell you where exactly a computer
program is, to a precision of, say, 1 micron. The computer program is a state of some substantial part of
the memory of your computer, likely much larger than 1 micron in size. Beyond that, it is meaningless
to ask where it is.
The ontology of quantum mechanical waves
A slightly more subtle point, and one that is not normally discussed much in this context, is that the
“wave” here does not have all the attributes of a classical wave either, and maybe we should actually
refer to these also by another name we could make up, such as “qwave”32. Classical waves appear to be
quite continuous objects, capable of any of a continuous range of amplitudes, for example, and those
amplitudes are also real, measurable quantities. Our “qwaves” do have the attributes of propagation,
linear superposition, and interference, just like many classical waves. But, quantum mechanical waves
are not necessarily real, and they may not themselves be measurable entities. Furthermore, perhaps the
attribute we might find most difficult to understand with quantum mechanical waves is how they can
correspond to countable particles; classical waves do not appear to have any such attribute.
We have already seen with quantum mechanical waves that to get a direct quantitative meaning, such
as probability density, we should “normalize” the wave, so that the probabilities add up to one. If we
decide always to perform such normalization, then the wave becomes consistent in the sense that the
31 Incidentally, any attempt to figure out experimentally "what slit" the "particle" went through ends up by
destroying the interference pattern.
32 The reader should be clear that these terms "qwarticle" and "qwave" are just words we are making up here for
the purposes of this argument. These are not words that are in use in quantum mechanics as far as the author is
aware.
particle always has to be somewhere when its position is measured33. That is not an attribute we would
require of a classical wave.
We have also seen that there may be an issue with quantum mechanical waves as to whether their
amplitudes themselves are actually measurable quantities; furthermore, the amplitudes may not even be
represented by real numbers. So for our quantum mechanical wave, it may not have the attribute that the
wave can take any amplitude (it should perhaps always be normalized), and it may not have the attributes
that it is even measurable or real. Even if we do not give the quantum mechanical wave (or “qwave”)
such attributes, it is apparently still a useful entity as part of our quantum mechanical description of the
world, however.
With the revised ontologies we are giving to our “qwarticles” and “qwaves”, we can now quite
reasonably assert that there are no contradictions in “qwave-qwarticle” duality, and the ones we thought
were there in “wave-particle” duality were because of our unjustified carrying over of the entire classical
ontologies of waves and particles. While this is a quite reasonable and defensible attitude in quantum
mechanics, and we recommend that the student takes this view, we have to be honest that we have also
avoided discussing so far another area in which there are genuine difficulties in quantum mechanics,
which is the issue of the measurement problem.
The measurement problem
The problem we have not dealt with so far lies in the observation that, in any such experiment with only
one electron or one photon in the apparatus, the result of the experiment is to leave a dot on the screen,
not an interference pattern (even though the pattern will build up if we keep repeating the experiment).
Why does the experiment leave a dot at quite a definite position on the screen if we are asserting that the
electron or photon does not have a definite position inside the apparatus? Indeed, this is a perfectly fair
question, and we would usually consider it strong evidence that the particle must have had a definite
position when it hit the screen.
Essentially, we are asking for an explanation of Born’s postulate that the modulus squared of the
wavefunction gives us the probability of finding the particle at a point. We could “explain” this by saying
it is because of the “collapse of the wavefunction”. That is about as good an explanation as saying that
something is hot because it has a high temperature – it categorizes the problem to some degree in
scientific terms, but offers no explanation of any process.
The idea of the “collapse of the wavefunction” is saying that, though the wavefunction was distributed
through some region of space before the measurement, the act of measuring the position causes the
wavefunction to “collapse” to being concentrated about some point. A more generalized statement of
the same idea is to say that
the act of measuring some quantity causes the system to collapse into an eigenstate of the quantity
being measured
with probabilities given by Born’s rule or a generalized version of it34. Here, since we are measuring
position, we note that we are “collapsing the wavefunction” into one with a definite position.
To be clear, we are not explaining Born’s rule or the supposed mechanism of how this collapse takes
place. The fact that we are not at all clear on how to do either of these things is what is known as the
“measurement problem”.
33 Incidentally, how we handle quantum mechanical states for more than one particle is not that we simply increase
the amplitude of the wave. In fact, to handle just two particles in three dimensional space, we need a wavefunction
of 6 coordinates, three for each particle, and so on. Somewhat confusingly, that two-particle wavefunction would
also be normalized so the probabilities add up to one, not two. The probabilities it describes, though, are the
probability of finding the first particle at one position and the second particle at a second position.
34 The generalized statement would be “with probabilities given by the modulus squared of the expansion coefficients of the state on the eigenfunctions of the quantity being measured”.
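As a concrete illustration of the generalized statement in footnote 34 – our own sketch in matrix language, not anything from the text – we can expand a state on the eigenvectors of the quantity being measured and take the modulus squared of the expansion coefficients. The two-level observable and state here are made-up examples:

import numpy as np

# A two-level illustration of the generalized Born rule of footnote 34.
# The observable and the state here are made-up for illustration only.
observable = np.array([[0.0, 1.0],
                       [1.0, 0.0]])        # a Hermitian "quantity being measured"
psi = np.array([1.0, 0.0], dtype=complex)  # a normalized state vector

eigenvalues, eigenvectors = np.linalg.eigh(observable)

# Expansion coefficients of the state on the eigenfunctions (eigenvectors),
# then modulus squared to get the probability of each measurement outcome
coefficients = eigenvectors.conj().T @ psi
probabilities = np.abs(coefficients) ** 2
for value, p in zip(eigenvalues, probabilities):
    print(f"outcome {value:+.0f}: probability {p:.2f}")  # 0.50 each here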
This measurement problem is quite a deep one in quantum mechanics. We can actually prove that a
simple application of quantum mechanical rules does not allow us to describe any machine that can
cause the collapse of the wavefunction; in other words, not only are we saying that we cannot describe
the details of this process, we can actually prove that, according to quantum mechanics as we usually
state it, the collapse of the wavefunction cannot actually happen!
Of course, this collapse does happen, so we are left with a problem here. Not surprisingly, there are
various proposed approaches to getting us out of this difficulty. A short list of the names of these various
approaches includes the Copenhagen interpretation, Bohm’s pilot wave, nonlinearity in quantum
mechanics, and the “many-worlds” hypothesis. Though each of these approaches (and various others
also) have their advocates, it would be accurate to say that there is no one approach that is universally
accepted and that none of these is easy to explain convincingly.
As a practical matter, what essentially everyone has to do is merely accept Born’s rule as being
something that works for all actual calculations and prediction, even if this leaves us in the unsatisfactory
condition of not really understanding why it works. The good news is that, apparently, it works very
well if we merely presume that measurement is something we do with large apparatuses on small objects.
Indeed, if we had an experiment that could show that this postulate did not work in some set of
circumstances, we could perhaps figure out how to resolve this difficulty. In the meantime, we may just
have to give this practical approach its technical name, widely accepted across the physics community,
which is “shut up and calculate”.
Problems
3.6.1 Suppose we have a particle whose position we can know within an uncertainty of ±∆x where
∆x = 10 nm . We presume the particle has a mass m. If it has an uncertainty in its momentum
∆px in the x direction, we will presume that corresponds to an uncertainty in its velocity in the x direction of ∆vx = ∆px/m. Presuming these quantities ∆x and ∆px are appropriately defined
to correspond to the usual statement of Heisenberg’s uncertainty principle, what is the minimum
uncertainty ∆vx we should expect in the velocity in the x direction of such a particle in each
of the following cases? (Give answers to three significant figures, in m/s.)
(i) If the particle is an electron.
(ii) If the particle is a proton.
(iii) If the particle is a cube of side 1 micron in length of a material of density 10 g/cm³.
3.6.2 A neutron moving in the x direction is passing through a slit of width 10 µm in the y direction. Which
of the following is the closest approximate estimate of the minimum uncertainty of its momentum
in the y-direction given that it passes through the slit?
a) ∆py ≈ 10⁻⁴³ kg m/s,
b) ∆py ≈ 10⁻²⁹ kg m/s,
c) ∆py ≈ 10⁻¹⁴ kg m/s,
d) ∆py ≈ 1 kg m/s.
3.6.3 Consider a two-slit interference experiment, as in Young’s slits. So we have a mask, oriented
parallel to the x axis, that is opaque except for two parallel slits. Presume the slits are very narrow,
and are separated in the x direction by a distance of 10 microns. A screen, parallel to the mask,
is positioned away from the mask in the z direction, separated from the mask by a distance
zo = 1 m. In an experiment where the slits are illuminated by a plane wave traveling along the z direction onto the back of the mask, we expect to see interference fringes on the screen.
What will be the spacing between the centers of adjacent dark fringes on the screen in the
following situations?
(i) In an optical version of the experiment, where the incident beam corresponds to photons
with an energy of 1 eV.
(ii) In a version of the experiment with monoenergetic (i.e., monochromatic) electrons with a
kinetic energy of 1 eV.
3.6.4 Consider a “Young’s slits” experiment for an electron. We are going to generate an appropriate
electron wave for our experiment by first starting off with some electrons that we somehow
manage to generate with essentially zero kinetic energy (for example by managing to emit them
from a metal “cathode” by the photoelectric effect at a carefully chosen optical wavelength,
though the details of this do not matter for our question). Then we are going to accelerate them
towards a “mask” (which would also be the “anode” in this experiment) using an appropriate
electrostatic potential (voltage) Va. (For example, the metal from which we emit the electrons
might be at 0V, and the mask itself might be a metal held at a voltage Va.) The mask has two
very narrow, parallel slits cut in it, separated by 10 µm, and is otherwise opaque to the electrons
(i.e., electrons can only pass through the slits). (The whole apparatus here is in a vacuum.)
We want to form an interference pattern on a screen, parallel to the mask plane, and at a distance
1 m from the mask (on the other side from the electron source). What voltage Va should we apply
if the separation between the “bright” fringe peaks on the screen is to be 0.5 mm?
3.7 Tunneling
In the classical world, we can imagine a ball that we start off with some kinetic energy, and that then
encounters a hill that slopes upwards. We know that, as the ball starts to roll up the hill, its speed
gradually reduces as some of its kinetic energy gets turned into potential energy. The ball will only be
able to get to the other side of the hill if its initial kinetic energy exceeds the potential energy it would
have at the top of the hill³⁵.
In quantum mechanics, however, it is possible for a particle to get to the other side of a barrier that is
“too high” in this sense. Even if the energy of the particle is lower than the height of the barrier, it is
possible for the particle to get to the other side. This is a process that happens with some probability,
and the higher and thicker the barrier is, the lower is the probability.
This process by which the quantum mechanical particle gets to the other side of a barrier that is
nominally too high for it is called tunneling – loosely, it is as if the particle has somehow managed to
tunnel through the hill – though it is important to emphasize that this process is not at all like the classical
idea of digging a tunnel.
This process of tunneling is quite common in the quantum mechanical world, and is found routinely in
modern electronic devices. In so-called “flash” memories that work by storing charge semi-permanently
on small capacitors, the charge is put on and taken off the relevant capacitor plate by “tunneling”
electrons through nominally insulating dielectrics. In modern MOS (metal-oxide-semiconductor)
transistors, the tunneling of electrons through the nominally-insulating gate dielectric is an undesired
nuisance. Gate dielectrics have become progressively thinner as transistors have become progressively
smaller. Tunneling of electrons through those gate dielectrics has become a substantial problem. This
has led to a major change to different gate dielectric materials, such as hafnium dioxide, as an alternative
to the previous silicon dioxide. These allow the use of thicker gate dielectrics, thereby reducing the
tunneling, while retaining the necessary capacitance between the gate and the channel that allows the
transistor to be turned on and off by changing the voltage on the gate.
Though tunneling might seem a very strange process, it is actually quite straightforward to describe it
using quantum mechanical waves and Schrödinger’s wave equation. As we have seen before, quantum
35 Or, at least, that of the highest point it will encounter on its path up and over the hill.
mechanical waves show many of the behaviors we see for classical waves. The process of tunneling also
exists for classical waves, with the best known examples being in what are called “evanescent fields” in
optics.
Such evanescent fields are seen near to a surface when an incident wave arrives at an angle past the so-
called critical angle for total internal reflection. We can see such total internal reflection as we try to
pass from a material of high refractive index to one of low refractive index. If you can swim underwater
with your eyes open, you are likely quite used to the fact that there is only a moderate range of angles
at which you can see “through” the surface of the water to the world above. Outside that range of angles,
the surface of the water appears to be totally reflecting, like a mirror. If you were to shine a flashlight
from below the surface of the water at an angle such that it was totally reflected in this way back into
the water, there would be an evanescent field just above the surface of the water, a field that decays very
quickly with distance. If, however, you were to put a piece of glass just above the surface of the water,
but not touching it, within the short range of these evanescent fields, the light from the flashlight would
be able to “tunnel” into the glass, forming a light beam there in the glass above the water surface, even
though the glass was not touching the water surface. This phenomenon – known in optics as “frustrated
total internal reflection” – mathematically corresponds to the light “tunneling” through the air gap
between the water and the glass, and it is used to make some kinds of partially reflecting mirrors, often
known as “beam splitters”, with two closely spaced pieces of glass, for example.
To understand tunneling in quantum mechanics, we can start by looking at what happens when a
quantum mechanical wave arrives at a potential barrier of finite height.
Fig. 3.15. A potential barrier of finite height, Vo, at position z = 0. The positive z direction runs
to the right.
Nature of solutions for a finite barrier
Quite generally, in the region z < 0, we can write the solution to Schrödinger’s time-independent wave
equation in the form
ψ_left(z) = A exp(ikz) + B exp(−ikz)     (3.60)
where A and B are arbitrary constants (which may be complex numbers) that might be fixed later once
we apply some boundary conditions or other constraints to the problem of interest.
In quantum mechanics, we interpret a solution of the form exp ( ikz ) to correspond to a wave propagating
in the positive z direction, and, similarly, a wave of the form exp ( −ikz ) corresponds to one propagating
3.7 Tunneling 79
in the negative z direction36. So, in Eq. (3.60) A exp ( ikz ) corresponds to a wave propagating to the right
with amplitude A, and B exp ( −ikz ) is a wave propagating to the left with amplitude B.
The barrier on the right gives boundary conditions that restrict the relation between A and B. In this
problem, we do not presume any barrier on the left. As a result, there is no constraint on the allowed
energy E, at least if we choose it to be positive. So, we simply have
k = √(2mE/ℏ²)     (3.61)
Now we need to consider the effect of this finite barrier on the solutions. To understand this, we look
first at the nature of the solutions to Schrödinger’s wave equation for z > 0, that is, inside the barrier, on
the right in Fig. 3.15. Mathematically, in that region, Schrödinger’s equation becomes
−(ℏ²/2m) d²ψ(z)/dz² + Vo ψ(z) = E ψ(z)     (3.62)
which we can rewrite as
d²ψ(z)/dz² = (2m/ℏ²)(Vo − E) ψ(z)     (3.63)
Suppose now we presume that the energy associated with the state – that is, E – is below the height of
the top of the barrier (i.e., E < Vo ), so the quantity Vo − E > 0 . We can define a real quantity κ (the
Greek letter “kappa”)
κ = √(2m(Vo − E)/ℏ²)     (3.64)
which we can choose to be positive, and which has dimensions of inverse distance. With these
choices, we can write, for this region inside the barrier, to the right,
d²ψ(z)/dz² = κ² ψ(z)     (3.65)
The general solution to this equation is of the form
ψ_right(z) = C exp(−κz) + D exp(κz)     (3.66)
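As a quick check – our own verification rather than part of the original derivation – we can differentiate this form symbolically and confirm that it satisfies Eq. (3.65):

import sympy as sp

# Verify that C exp(-kappa z) + D exp(kappa z) satisfies psi'' = kappa^2 psi
z, kappa, C, D = sp.symbols("z kappa C D")
psi = C * sp.exp(-kappa * z) + D * sp.exp(kappa * z)
residual = sp.diff(psi, z, 2) - kappa**2 * psi
print(sp.simplify(residual))  # prints 0, confirming the general solution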
For the situation as in Fig. 3.15, where we presume that the barrier goes on forever on the right – that is,
it continues for arbitrarily large z – we can decide that we should choose D = 0 ; otherwise the solution
36 This choice comes from a choice made in considering time-dependent behavior, and specifically, Schrödinger’s time-dependent wave equation, which leads to the time behavior being written in the form exp(−iωt) with positive ω. The reader might also reasonably ask why we write the solution here in the form ψ_left(z) = A exp(ikz) + B exp(−ikz) rather than the form we had in Eq. (3.33) – a form like ψ_left(z) = F cos(kz) + G sin(kz). Both of these forms are general solutions, and both are correct. Because of Euler’s formula exp(iθ) = cos θ + i sin θ, F ≡ A + B and G ≡ i(A − B), and the two forms are equivalent. The cosine and sine form, however, is describing the wave in terms of a sum of two standing waves, whereas the complex exponential form is describing the wave in terms of the sum of two “running” waves, one in each direction. It may be surprising mathematically that both forms are capable of describing either running or standing waves or any combination of the two, but, because the time behavior is a multiplicative factor exp(−iωt), this turns out to be correct. For a forward propagating wave, exp[−i(ωt − kz)] ≡ exp(−iωt) exp(ikz) ≡ exp(−iωt)(cos kz + i sin kz). The “running wave” form is more convenient for us here, so we use it for the moment.
will grow arbitrarily large with increasing z, which we could regard as being unphysical³⁷. Hence we
presume we can have a solution of the form
ψ_right(z) = C exp(−κz)     (3.67)
37 If we consider a situation with a barrier of finite thickness and presume there is no wave incident from the right, in the limit of a thick barrier, we can also justify presuming D becomes arbitrarily small.
38 We do, however, also see such penetration effects with classical waves, such as the evanescent optical waves discussed above.
Fig. 3.16. Calculated probability density for the potential structure shown. The potential barrier is of height Vo = 2 eV and an electron incident from the left has energy E = 1.5 eV. The calculated magnitude of the wavevector on the left is k ≈ 6.275 × 10⁹ m⁻¹ ≡ 6.275 nm⁻¹, which corresponds to a wavelength of λ ≈ 1.001 nm, so half a wavelength, which corresponds to the separation of the peaks or of the minima in the standing wave on the left, is 0.50 nm. The exponential decay constant for the wave on the right is κ ≈ 3.623 × 10⁹ m⁻¹ ≡ 3.623 nm⁻¹, so the 1/e decay length of the probability density on the right is 1/2κ ≈ 0.138 nm.
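The numbers quoted in this caption follow directly from Eqs. (3.61) and (3.64). Here is a short sketch of that arithmetic (the physical parameters come from the caption; the code itself is our own illustration):

import numpy as np

# Reproduce the values in the caption of Fig. 3.16 from
# k = sqrt(2mE/hbar^2) and kappa = sqrt(2m(Vo - E)/hbar^2)
hbar = 1.054_571_8e-34  # J s
m_e = 9.109_382_91e-31  # kg, electron mass
eV = 1.602_176_6e-19    # J per electron-volt
E, Vo = 1.5 * eV, 2.0 * eV

k = np.sqrt(2 * m_e * E) / hbar             # ~6.275e9 m^-1
kappa = np.sqrt(2 * m_e * (Vo - E)) / hbar  # ~3.623e9 m^-1
print(f"k = {k:.4g} m^-1, wavelength = {2 * np.pi / k * 1e9:.4g} nm")
print(f"kappa = {kappa:.4g} m^-1, 1/(2 kappa) = {1 / (2 * kappa) * 1e9:.4g} nm")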
Note that we see the exponential decay of the probability density into the barrier on the right. We also
see a “standing wave” on the left. An infinitely thick barrier such as this has 100% reflection for incident
energies lower than the height of the barrier, so we see the incident wave is completely reflected even
though there is some penetration into the barrier. This is just like the situation of shining a flashlight
inside the water up to the surface of the water; beyond the critical angle for total internal reflection, all
the light is reflected, even though the light penetrates a very small distance, in a similar exponential
fashion, into the air above. Because the reflection is 100% in magnitude, the standing waves we see on
the left have minima of zero because the forward and backward waves, being of equal magnitude, can
exactly cancel one another there by interference. The separation between the minima is half the
wavelength on the left. Note also that the phase change on reflection is neither 180°, which would lead
to a zero in the wave at a reflecting boundary, nor 0°, which would lead to a maximum at a reflecting
boundary. The phase change on reflection here is some other specific value; such a different
mathematical value is required so that we match both the amplitude of the wave and its derivative as we
join up with the decaying exponential wave in the barrier.
One way to analyze such a structure is to work backwards from the right. We could choose any value
we like for the amplitude F of the forward propagating wave on the right for the moment, and work out
everything else in terms of it.
We can apply the boundary conditions on the right interface to establish relations between the
amplitudes C and D of the “forward” and “backward” exponentially decaying waves inside the barrier
and the chosen F. Then at the left interface, we can use the boundary conditions to establish the relations
between the forward and backward propagating wave amplitudes, A and B respectively, on the left and
the amplitudes C and D. In this way we can relate all of the constants A, B, C, and D to the arbitrarily
chosen F. Again, we are not normalizing the wave here because we have not chosen what happens on
the far left or the far right, and we are not for the moment interested in those regions.
Fig. 3.17. A finite barrier structure and the forms of the waves inside each region when the
incident electron energy E is below the height Vo of the barrier.
Note also, because we are interested in tunneling from the left to the right, we presume that there is only
a forward-propagating wave on the right; we presume there is no electron wave incident from the right.
The probability density in the incident forward wave is proportional to |A|². The probability density in the “transmitted” forward wave is proportional to |F|². The situations are otherwise similar on the left and the right – for example, the potential energy is the same. Consequently, we can think of the fraction of the incident “current” that is transmitted by the barrier, or equivalently, the probability that a particle incident with this energy E on the barrier will be transmitted through the barrier, as |F|²/|A|².
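This back-to-front matching procedure is also easy to carry out numerically. The following minimal sketch – our own illustration, using the parameters of Fig. 3.18 just below – chooses F = 1, applies continuity of the wave and its derivative at each interface, and evaluates the transmission:

import numpy as np

# Choose F = 1 on the right, then apply continuity of psi and dpsi/dz at
# z = Lb and at z = 0 to recover C, D and A. Transmission T = |F|^2/|A|^2.
hbar, m_e, eV = 1.054_571_8e-34, 9.109_382_91e-31, 1.602_176_6e-19
E, Vo, Lb = 1.5 * eV, 2.0 * eV, 0.15e-9  # parameters of Fig. 3.18

k = np.sqrt(2 * m_e * E) / hbar
kappa = np.sqrt(2 * m_e * (Vo - E)) / hbar
F = 1.0  # arbitrary amplitude of the transmitted wave

# Right interface: C e^(-kappa Lb) + D e^(kappa Lb) = F,
#                  -kappa C e^(-kappa Lb) + kappa D e^(kappa Lb) = i k F
C = 0.5 * F * (1 - 1j * k / kappa) * np.exp(kappa * Lb)
D = 0.5 * F * (1 + 1j * k / kappa) * np.exp(-kappa * Lb)

# Left interface: A + B = C + D and i k (A - B) = kappa (D - C)
A = 0.5 * ((C + D) + kappa * (D - C) / (1j * k))

print(f"T = {abs(F)**2 / abs(A)**2:.3f}")  # about 0.70 for these parameters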
Fig. 3.18. Calculated probability density for the potential structure shown, with an electron wave
incident from the left. The parameters and scales are the same as in Fig. 3.16 above except that
the barrier is Lb = 0.15 nm thick.
The algebra for solving this particular problem is left as an exercise for the reader. In Fig. 3.18, we show
the results of such a calculation. The parameters here are the same as in Fig. 3.16, except we have chosen
the barrier to have a thickness Lb = 0.15 nm.
Note several aspects of the resulting behavior in Fig. 3.18. First, we see now that there is a finite
probability density on the right – the wave is indeed tunneling through this barrier. The behavior inside
the barrier is still predominantly an exponential decay, though now the backwards exponential decay is
not negligible, so the shape is actually the sum of forward and backward exponential decays (with a
smaller amplitude on the backward one).
On the left, we are still seeing “standing wave” behavior in the probability density, with the same period.
Note now that the minima in the standing wave do not go down to zero. This is because the reflection
off the barrier is now not 100%; there is partial transmission through the barrier. So the backward
propagating wave on the left has a smaller amplitude than the forward propagating wave. Hence the
interference between these two waves does not lead to complete cancellation at the minima. The standing
wave maxima in Fig. 3.18 are also lower than those in Fig. 3.16 for the same reason; now the sum of
the amplitudes of the forward and backward propagating waves is not as large at its peaks. Note also
that since the probability density is the modulus squared of the total wave, even a small drop in the
amplitude of the backward propagating wave leads to a significant drop in the maxima of the probability
density.
39 Fresnel’s equations would tell us the reflection and transmission of the fields, and the Poynting vector would allow us to calculate the power in the reflected and transmitted beams.
Problems
3.7.1 An electron of kinetic energy in the z direction of 1 eV is incident on a potential barrier of
arbitrarily large thickness and of height 1.2 eV.
(i) What is the exponential decay length of the electron’s probability density (note, not of the
quantum mechanical wave amplitude) inside the barrier? (That is, what is the distance over
which the probability density decays to 1/e of its initial value, where e is the base of the
natural logarithms?)
(ii) Compare the probability P1 of finding the electron within the first 1 nm of depth into the
barrier with the probability P2 of finding the electron within the second 1 nm of depth (i.e.,
between 1 nm and 2 nm into the barrier). What is the ratio P2 / P1 ?
3.7.2 Consider a potential barrier of arbitrary thickness and of height Vo at a position z = 0 as in the
figure below.
We presume an electron with kinetic energy E less than the height of the barrier corresponds to
a wave on the left of the barrier of the form
ψ_left(z) = A exp(ikz) + B exp(−ikz)
and a reflection coefficient
R = B/A = exp(−iθ)
where θ is a real number that we can calculate based on the energy E and the barrier height Vo .
(Note that this means we have concluded, correctly, that the magnitude of this reflection is unity.)
θ is then a measure of the “phase change on reflection” by the wave. Specifically, different values
of θ would lead to different positions of the first maximum in the standing wave pattern for the
probability density to the left of the barrier.
We are interested in the position z1 of the first such maximum in the standing wave pattern for
the probability density on the left of the barrier. (Note that z1 will be less than zero because it is
on the left of the barrier.) We can presume that 0 < θ < π .
where A, B, C, D, and F are constants that may be complex. Note we presume no backwards propagating wave on the right of the barrier. (Note the choice of the form F exp[ik(z − Lb)] on the right, which may make your algebra slightly easier. We are always free to make such a choice because F is an arbitrary constant to be determined, and exp(−ikLb) is also a constant. Equivalently, F exp[ik(z − Lb)] ≡ G exp(ikz) for some other arbitrary constant G = F exp(−ikLb).)
(i) Write down expressions for k and κ in terms of E, Vo and fundamental constants.
(ii) Derive an expression for the ratio T = |F|²/|A|², which we can call the transmission of this
barrier, in terms of k, κ and Lb. (Hint: work from the right of this problem back towards the
left, solving first for the relations of C and D to F at the right interface using the appropriate
boundary conditions.)
(iii) For a barrier height Vo = 2 eV , an incident energy E = 1.5 eV and a barrier thickness
Lb = 0.25 nm , calculate the transmission T of this barrier.
3.7.6 Consider a particle of mass m in a semi-infinite one-dimensional well, as shown here:
The potential in area III is infinite. In area I it is zero. In area II, it has a constant value of
V0 = 18ℏ²/(mL²). The particle has energy E = (3/4)V0. (Note: we have verified for you that this is an
energy eigenvalue for this problem, so you do not have to derive or prove this is an eigenvalue.)
(a) Write the mathematical form of the wavefunction in each of the three areas, specifying
expressions for the magnitude “k” of the wavevector in any sine or cosine terms, and for
the exponential decay constant “κ” in any exponentially decaying terms.
(b) Use boundary conditions as necessary to write down a form of the wavefunction for all
three areas with only one arbitrary “amplitude” constant for the entire wavefunction. Note:
in considering boundary conditions, continuity of wavefunction will suffice - you will not
need also to consider the continuity of the first derivative (though, incidentally, because
this is an energy eigenfunction of the problem, you would find that condition also to be
satisfied).
(c) Normalize this wavefunction.
3.8 Conclusions
In this chapter, we have progressively introduced many of the core features of the quantum mechanical
view of the world. We started from the early history and continued through the introduction of
apparently strange and counter-intuitive ideas, such as wave-particle duality, and the proposal that the
electron was also a wave. We were forced to these ideas because otherwise we could not construct
models of light and matter that agreed with observations. With the proposal of Schrödinger’s wave
equation and the equivalent Heisenberg matrix form, these and other pioneers of quantum mechanics
were able to construct a model of the hydrogen atom that agrees remarkably with experiments and that
forms the basis of modern chemistry. We will look more at the quantum mechanics of atoms below.
We have seen that a simple example problem, the particle in a box, introduces a wide range of the
phenomena of quantum mechanics, including discrete energy levels. The analysis of the behavior of
waves has helped us introduce an uncertainty principle. The double-slit experiment exposed many more
unusual ideas in quantum mechanics, including the ontological nature of quantum mechanical
“particles” and “waves”, and the difficult issue of the measurement problem in quantum mechanics.
Finally, we were able to introduce a simple model of the phenomenon of tunneling, which is an effect
seen routinely in modern electronic devices.
With this background we are now ready to examine some of the consequences of quantum mechanics
in atoms and materials, and to add a few more quantum mechanical concepts.
Particles, atoms, and crystals
4.1 The hydrogen atom
The hydrogen atom is the simplest of all atoms. It consists of only one electron and only one proton.
Despite its simplicity, many of the features we find there are retained, at least approximately, as we
work with more complex, multi-electron atoms. So, understanding the hydrogen atom gives us a very
good start on understanding atoms, chemistry, and matter generally.
Fortunately, the hydrogen atom is also a quantum mechanical problem that can be solved essentially
exactly and completely using Schrödinger’s wave equation. There is no particular mathematical
difficulty in that solution; it is a straightforward application of standard mathematics of partial
differential equations in three dimensions. It does, however, take a fair amount of algebra to go through
that solution completely from first principles, so we will not do that here¹.
Even if we do not formally complete the derivation, we can usefully introduce some of the core ideas
that allow us to simplify the problem and separate it into different parts. We can also understand both
the nature of these solutions and the overall behavior of the results. That understanding will give us a
useful conceptual basis for seeing just how all the physical material round about us is put together.
Center-of-mass behavior
Because the hydrogen atom consists of both an electron and a proton, the first difference in the quantum
mechanics compared to a simple particle in a box is that we have to deal with two particles. Fortunately,
there is a way of recasting this problem to give two separate quantum mechanical problems, each of which
is like the quantum mechanics of just a single particle again. One of those problems involves considering
the combination of the electron and the proton as one particle – essentially the atom considered as a
whole – that has a mass M given by the sum of the electron and proton masses. This “entire atom” just
behaves like a free particle with that total mass M in its own Schrödinger equation. That equation
describes how the whole atom behaves quantum mechanically as a single particle, just as the
Schrödinger equation for an electron describes how the electron behaves. We will not discuss that
equation further here, in part because we have already dealt with such behavior.
The other quantum mechanical problem is the one that gives the “internal states” of the atom.
Classically, for a single planet orbiting a sun, it is not quite correct to say that the center of the orbit is
the center of the sun. Actually, the orbit is around the “center of mass” of the combination of the planet
1 See, for example, Chapter 9 and Chapter 10 of Quantum Mechanics for Scientists and Engineers by D. A. B. Miller (Cambridge 2008).
and the sun2. If the sun is very much more massive than the planet, then we might not notice this effect
because the center of mass will be close to the center of the sun, so the radius of the orbit of the sun will
be small, likely smaller than the sun itself. The radius of the planet’s orbit will likely be very much larger
because it is very much lighter. But, to be correct, both the planet and the sun are orbiting around this
“center of mass”.
We can work through the mathematics of this “center of mass” behavior, either classically or quantum
mechanically; the final equations we write down, technically for the relative motion of the planet and
the sun or the electron and the proton, look like those for a planet (or an electron) orbiting round the
center of the sun (or the proton), but in which we use a “reduced mass” for the planet (or the electron).
In the standard formula we get from this “center-of-mass” analysis, the reduced mass, µ, is given by
1/µ = 1/me + 1/mp     (4.1)
or equivalently
µ = me mp / (me + mp)     (4.2)
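Numerically, because the proton is nearly 2000 times heavier than the electron, µ is only very slightly smaller than me. A quick sketch of the arithmetic (the proton mass here is the standard value, not quoted from the text):

# Reduced mass of the electron-proton system from Eq. (4.2)
m_e = 9.109_382_91e-31  # kg, electron mass (as used in the text)
m_p = 1.672_621_8e-27   # kg, proton mass (standard value)
mu = m_e * m_p / (m_e + m_p)
print(f"mu = {mu:.6e} kg = {mu / m_e:.6f} m_e")  # ~0.999456 m_e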
Using the electron mass me ≈ 9.109 382 91 × 10⁻³¹ kg and the proton mass
2 The position R of the center of mass for two objects of mass me and mp that are at positions re and rp respectively is R = (me re + mp rp)/(me + mp), which is a kind of “weighted average” position. If we were thinking about masses on the two ends of a light bar or a seesaw, the center of mass position would be the position along the bar at which it would be balanced, or equivalently the point where we could put the pivot or fulcrum of the seesaw to get it to balance.
where
εo = 1/(4π × 10⁻⁷ c²) = 8.854 187 817... × 10⁻¹² F/m     (4.6)
This potential energy, sketched as the solid curve in Fig. 4.1, is a negative number because, since the
electron and the proton are attracting one another, we would need to put energy in to pull them apart.
This energy value is relative to the potential energy they would have if they were arbitrarily far apart
(so V ( r → ∞ ) → 0 ); this chosen “zero” of energy is shown as the dashed line nearest to the top of Fig.
4.1.
Fig. 4.1. Sketch of the Coulomb potential, V ( r ) , and the corresponding positions of the energies
corresponding to values of the principal quantum number n = 1, 2, and 3 from the solutions of
the hydrogen atom. The chosen zero for energy is shown as the top, dashed line. All the shown
energies are negative, corresponding to the attractive nature of the Coulomb potential for the
electron and the proton and to the “bound” nature of the states for the various values of n.
Using the reduced mass and this potential energy, we are ready to write the Schrödinger equation for
the “electron” orbitals, that is, for the relative motion wavefunction, which we will call U ( r ) ,
−(ℏ²/2µ) ∇²U(r) − (e²/4πεo r) U(r) = E U(r)     (4.7)
Note this wavefunction is a function of the vector relative position, r, of the electron and the proton (that
is, r is the electron position vector relative to the proton).
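Though we will not carry out the full analytic solution, Eq. (4.7) can also be solved numerically. The following rough finite-difference sketch – our own illustration, restricted to the spherically symmetric states and written in atomic units (lengths in Bohr radii, energies in hartrees, with 1 hartree ≈ 27.211 eV, and the reduced mass set to 1) – reproduces the pattern of bound-state energies we will meet below:

import numpy as np
from scipy.linalg import eigh_tridiagonal

# Radial form of Eq. (4.7) for the spherically symmetric (l = 0) states.
# Writing u(r) = r R(r), in atomic units the equation becomes -u''/2 - u/r = E u.
N, r_max = 4000, 80.0
r = np.linspace(r_max / N, r_max, N)  # radial grid, excluding r = 0
h = r[1] - r[0]

diagonal = 1.0 / h**2 - 1.0 / r             # kinetic + Coulomb terms
off_diagonal = np.full(N - 1, -0.5 / h**2)  # finite-difference kinetic coupling

energies = eigh_tridiagonal(diagonal, off_diagonal, eigvals_only=True,
                            select="i", select_range=(0, 2))
print(energies * 27.211)  # about -13.6, -3.40, -1.51 eV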
When we examine this equation in more detail, we find that the solutions of it can be written as the
product of two functions, one of which, R ( r ) , only depends on the separation of the electron and the
proton, and the other of which, Y (θ , φ ) , only depends on the angles, relative to some chosen directions.
That is, we can write
U ( r ) = R ( r ) Y (θ ,φ ) (4.8)
The coordinate system we use here is like one we could use to describe the position of a satellite, relative
to the center of the earth. There would be a distance, r, from the earth’s center, and two angles³, one
3 φ corresponds with longitude if we take the angles going East from the meridian (which is a circle centered on the origin and touching the z and x axes) round in a complete circle (that is, to 360°). Latitude is conventionally expressed relative to the equator, but the angle θ is expressed relative to the “North” pole (which is on the positive z axis), so the “North” pole is at 0°, the equator is at 90°, and the “South” pole is at 180°.
related to the “latitude”, which is the angle θ, and the other related to the “longitude”, which is the angle
φ. This is the “spherical polar” coordinate system, and the formal definitions⁴ can be seen from Fig. 4.2.
Fig. 4.2. Spherical polar coordinates, r, θ, and φ related to the conventional Cartesian x, y, and z
coordinate directions.
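For concreteness, here is a small sketch (our own, not from the text) of the conversion from Cartesian coordinates to the spherical polar coordinates of Fig. 4.2 in this physics convention:

import numpy as np

# Cartesian -> spherical polar, physics convention of Fig. 4.2:
# theta is measured from the positive z axis, phi from the x axis
def to_spherical(x, y, z):
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(z / r)  # 0 at the "North" pole, pi at the "South" pole
    phi = np.arctan2(y, x)    # longitude-like angle in the x-y plane
    return r, theta, phi

print(to_spherical(0.0, 0.0, 1.0))  # on the z axis: theta = 0
print(to_spherical(1.0, 0.0, 0.0))  # on the "equator": theta = pi/2, phi = 0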
An expression such as Eq. (4.8), where we can rewrite a function of many variables as a product of
functions each of fewer variables, is called a “separation” of the solution or a “separation of variables”⁵.
The fact that we can “separate” the solution in this way is because the potential in Eq. (4.5) that we use
in our Schrödinger equation (4.7) does not depend on angle at all. Such a potential is called a “central
potential” – it only depends on the distance from the center. This is important because, even for more
complicated atoms, which may have many protons in the nucleus of the atom and correspondingly many
electrons, the potential seen by a given electron can still be approximately independent of angle. Hence
those solutions for more complicated atoms still have similar angular shapes as the solutions to the
hydrogen atom, at least approximately. This is behind the whole description even of complicated atoms
in terms of “orbitals” that use the same nomenclature as we have for the hydrogen atom.
Spherical harmonics
The solutions for the angular part in Eq. (4.8), that is for the function Y (θ , φ ) , are called “spherical
harmonics”. These functions can look somewhat intimidating at first glance. Fortunately, we can
understand what these functions are like without performing the full solution here.
Mathematics of spherical harmonics
The formula for these functions can be written out completely. Explicitly, in quantum mechanics we use
the form
Ylm (θ ,φ ) ∝ Pl m ( cosθ ) exp ( imφ ) (4.9)
where we are indexing the different possible spherical harmonic functions with the integers l and m.
Here, the functions Pl m ( x ) are the associated Legendre functions, whose detailed definitions and
4 The choice of θ to represent the “polar” angle (i.e., relative to the North pole or the z axis) and φ to represent the “azimuthal” angle (i.e., in the equatorial plane, relative to the x axis), as shown here, is the convention commonly used in physics. In mathematics, it is also common to use the opposite convention, with φ being the polar angle and θ being the azimuthal angle.
5 Sometimes this kind of product form is called an “ansatz”, especially when we initially propose it as a kind of educated guess before verifying it does actually work.
behavior can be found in standard mathematical reference works and in some mathematics programs.
These functions are real, so the only complex aspects of the spherical harmonics come from the
exp ( imφ ) part of the functions. The precise details of the associated Legendre functions are not
particularly important here, so we omit them, though we can understand their important qualitative
behavior.
The most important aspect of these spherical harmonics concerns the integers l and m associated with
them. When we are considering the angular behavior of the possible solutions of the hydrogen atom,
there is one important underlying principle: essentially, if we go round in a circle, we have to get back
to where we started. Equivalently, any function Y (θ , φ ) that is to be a solution for the spatial
wavefunction of the hydrogen atom must be periodic in each of the angles θ and φ ; that is, we require
Y(θ, φ) = Y(θ + 2π, φ) = Y(θ, φ + 2π)     (4.10)
Otherwise, the function Y (θ ,φ ) would not be “single-valued” as a function of the physical angles. For
any function whose argument is cos θ , we are automatically guaranteed that the function is periodic in
this way for the angle θ because cos θ is periodic in this way, so Pl m ( cos θ ) will be periodic in this
way, no matter what the functional form of the associated Legendre functions happens to be.
For the function exp ( imφ ) to be periodic in this way with angle φ, m must be an integer. Explicitly, we
would require
exp(imφ) = exp[im(φ + 2π)] = exp(2πim) exp(imφ)     (4.11)
(equatorial) plane, so we only need to understand the form of one of these sets to understand the entire
set of spherical harmonic functions.
In the classical case, instead of letting m run from −l to +l , we simply choose it to be zero or positive,
and use both the cosine and sine forms. This gives us the same total number of functions⁶. Incidentally,
when spherical harmonic functions are plotted, even in discussions of quantum mechanical problems
such as the hydrogen atom, it is typically these real forms that are shown. For m > 0 , the sine and cosine
forms have the same shapes (with the sine versions just being rotated around the “vertical” axis
compared to the cosine versions), so we only need to plot one form to see the shapes.
Fig. 4.3. Plot of the nodal circles for the vibration modes of a spherical shell, together with the
numbers l and m corresponding to them. Note that l gives the total number of nodal circles, and
m gives the number that pass through the “North” and “South” poles of the spherical shell. (The
“poles” are tilted and rotated slightly in some of the cases for greater clarity in the drawings.)
The different modes of vibration of a spherical shell can be understood from Fig. 4.3. As with other
oscillating or vibration modes, everything that is vibrating in such a mode is vibrating at the same
6 When m = 0, there is no variation with angle φ, and the “sine” spherical harmonic would be zero everywhere, so we do not need to count it; there is thus only one function associated with m = 0 in this “classical” or “real” case also.
frequency. The spherical harmonics here are going to tell us the amplitudes of those vibrations, in the
same way that a function like sin(nπ z / L) would tell us the amplitude of the oscillation of a wave on a
string tied between two posts a distance L apart in the z direction.
In the case of the wave on a string, for (integer) n larger than 1, there are zeros in the oscillation at
specific points. For example, for n = 2 , there is a zero in the middle of the string. Such zeros for
oscillations can also be called nodes. In the case of the vibrations of a spherical shell, instead of just
specific points where there is no vibration, we have complete circles round the sphere where there is no
vibration, even though the rest of the spherical shell is vibrating. We can call these specific circles “nodal
circles”.
Just as the “lowest” mode of a wave on a string has no node along the length of the string (other than at the ends), so for the spherical shell, there is one mode, called the “breathing mode”,
that has no nodal circles. This mode is as if we were inflating and deflating the balloon slightly. So the
angular amplitude of the oscillation is the same in all directions. This “breathing” mode is indicated by
the sphere shown for l = 0 , m = 0 in the top left of Fig. 4.3. The next set of modes have one nodal circle.
For example, for the one in the top right of Fig. 4.3, indexed with l = 1 and m = 0 , the “equator” of the
sphere is a ring on which there is no vibration at all. Just as would be the case on either side of a node
for the case of waves on a string, the oscillation is in the opposite sense on one side of the nodal circle
compared to the other. So for this sphere with one nodal circle round the equator, it is as if the top
hemisphere is inflating while the bottom hemisphere is deflating during one half of the oscillation cycle
and the opposite way round during the other half of the oscillation cycle. The situation is similar for the
l = 1 , m = 1 case in the middle-left of Fig. 4.3, except it is the “left” and “right” hemispheres that are
inflating and deflating oppositely.
As we proceed to the remaining cases in Fig. 4.3, we have more nodal circles (two for each of the
remaining cases shown). The l = 2 , m = 0 case is an oscillation that looks like we are alternately
“squashing” and “stretching” the sphere along the polar direction; when we squash at the poles, the
sphere expands around the equator, and when we stretch along the poles, the sphere shrinks round the
equator. The remaining two cases in Fig. 4.3 correspond to adjacent “quarters” of the sphere “inflating”
and “deflating” oppositely. Note that, in all these oscillating modes, the corresponding nodal circles,
which we could think of as lines drawn on the surface of the sphere, do not move at all.
As we can see directly from Fig. 4.3, there is a simple behavior emerging for the nodal circles and the
quantum numbers l and m
l is the total number of nodal circles (4.15)
m is the number of nodal circles through the poles (4.16)
This remains true for all the higher modes of oscillation of the sphere, and is a very useful relation for
thinking about spherical harmonics.
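We can even check these rules numerically. The following sketch – our own check, using SciPy’s associated Legendre function, which fixes the θ dependence of the spherical harmonics – counts the sign changes of Pl^m(cos θ) between the poles; these mark the nodal circles that do not pass through the poles, of which rules (4.15) and (4.16) predict l − m:

import numpy as np
from scipy.special import lpmv

# Count sign changes of the associated Legendre function P_l^m(cos(theta))
# for -1 < cos(theta) < 1: these are the nodal circles parallel to the equator
x = np.linspace(-0.999, 0.999, 20000)  # cos(theta), poles excluded
for l in range(4):
    for m in range(l + 1):
        values = lpmv(m, l, x)
        crossings = np.sum(values[:-1] * values[1:] < 0)
        print(f"l={l}, m={m}: {crossings} such nodal circles (expect {l - m})")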
We can also plot these spherical harmonic functions in another way. They are functions of the two
angles, θ and φ, so we could plot a surface whose distance from the center was the (real) amplitude of
the spherical harmonic. The only subtlety here is that the amplitude of the spherical harmonic function
can change from positive to negative as a function of angle, but a distance from the center can only
really be positive. So we plot the magnitude of the (real) spherical harmonic function as the distance
from center as a function of angle, and mark the different resulting regions as being either positive or
negative. The results for four representative cases are shown in Fig. 4.4.
Note that there is a relatively simple conceptual way of getting from the “nodal circles” picture of Fig.
4.3 to the “surface” plots of Fig. 4.4, at least qualitatively. Suppose that each of the nodal circles in Fig.
4.3 is an elastic band on the surface of the sphere, and suppose we now shrink those elastic bands to the
point where they completely “pinch” the spherical “balloon”. We need to give the elastic bands one
unusual property, which is that they always pinch down to the center of the sphere (we need this, for
example, for the l = 2 , m = 0 case).
Fig. 4.4. Plots of four of the (real) spherical harmonics, with distance from the center in each case
representing the magnitude of the spherical harmonic. The “+” and “-” signs indicate the relative
sign of the spherical harmonic for different segments. The plots are not to any specific scales,
being meant to represent the form of the functions only.
Hence, we can now qualitatively visualize any of the spherical harmonic functions, without any
calculations. First we draw the nodal circles on a sphere, following the rule (4.15) for the total number
of nodal circles (i.e., l) and rule (4.16) for the number through the “poles” (i.e., m). We then mark the
sphere with opposite signs on opposite sides of each nodal circle. Then we consider those nodal circles
to be bands that we then use to pinch the spherical “balloon”, with the one special property that the
bands always pinch down to the center of the balloon. With this approach, we can qualitatively construct
all of the plots as in Fig. 4.4.
Spherical harmonics and hydrogen atom orbital notation
Historically, the spectral lines observed in atoms were collected into groups, which were given what we
could now regard as relatively arbitrary names. Specifically, these groups were called
“sharp” (s)
“principal” (p)
“diffuse” (d), and
“fundamental” (f)
Now, we associate these terms directly with the l quantum number, as in Table 4.1.
loop of wire and we applied a magnetic field in the direction perpendicular to the loop, then there would
be an energy associated with this that was proportional both to the size of the magnetic field we are
applying and to the current in the wire. We can understand this because we know that having a current
going in a loop itself generates a magnetic field; equivalently, we can think of that loop carrying a current
as behaving like a magnet. Not surprisingly, having a magnet in a magnetic field results in some energy.
For example, we know the magnet “wants” to orient itself in a specific direction relative to the magnetic
field, and we can extract mechanical energy from the system by allowing it to do that – this is how
typical electric motors work⁷. If we have in our head the idea of a “Bohr” classical orbital of a point
charge going round in a circle, then it is easy to connect the idea of the angular momentum of the electron
going round in a circle with the current in this classical loop, and hence we would indeed expect the
energy of such a state of a specific angular momentum to be proportional to the angular momentum and
to the magnetic field.
Though this classical “Bohr” model of the hydrogen atom is not correct, when we solve the quantum
mechanical version of the problem of such orbitals in the presence of a magnetic field, we do indeed get a
splitting of the energy levels that corresponds to what we would expect from a “Bohr”-like model, based
on orbitals having angular momentum mℏ. The splitting of energy levels of atoms with magnetic fields is called the Zeeman effect⁸. Incidentally, one reason we use the complex exponential version of the
spherical harmonics, as in Eq. (4.9), with positive and negative values of m, rather than the sine and
cosine versions with only non-negative m, is to provide a more convenient description of these magnetic
effects. In a magnetic field, the negative and positive m values correspond to energy levels with opposite
variations with magnetic field, i.e., there is a change in energy with magnetic field that is proportional
to this “negative or positive” m.
A key difference in the results of the Schrödinger equation compared to the “Bohr” model is that the
quantum number in the angular momentum is m not n, and it is also quite possible for m to be zero; some
of the possible orbitals of the hydrogen atom indeed have zero angular momentum in this sense. To
understand the relations among the orbital angular momentum quantum number l, the magnetic quantum number m, and the principal quantum number n, we need now to look at the solutions for the
radial part of the wavefunctions.
7 We can have motors that work on the basis of electrostatic forces only, and a few actuators do work this way, especially very small micromechanical ones, but most so-called electric motors work on the basis of the forces generated by magnetic fields.
8 There are also additional splittings of the energy levels with magnetic field that are associated with the spin of the electron, a phenomenon not expected or predicted in the “Bohr” model.
Fig. 4.5. Plots of the radial functions for the hydrogen atom with distance in units of the Bohr
radius ( ρ = r / ao where r is the distance from the center and ao is the Bohr radius).
The functions L_p^j(s) are the associated Laguerre polynomials, whose precise definitions can be found
9 Note this expression for the hydrogen atom energies is identical to the form originally deduced by Bohr from his earlier semiclassical model.
Ry = ℏ²/(2µao²) = (µ/2ℏ²)(e²/4πεo)² = 13.6 eV     (4.22)
Note one particularly important point: These energies of the hydrogen atom states do not depend on
either l or m, but only on n, the principal quantum number. Given the different shapes of the orbitals for
different n, l, and m, this is a surprising result. (Actually, hydrogen is unique among the atoms in reliably
having this property; for other atoms, the eigenenergies can depend on all the quantum numbers.)
The energies corresponding to n = 1, 2, and 3 are sketched in Fig. 4.1.
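Assuming the standard result that these energies take the Bohr form En = −Ry/n² (the form mentioned in footnote 9, with Ry ≈ 13.6 eV from Eq. (4.22)), the levels sketched in Fig. 4.1 are quick to tabulate:

# Hydrogen bound-state energies, assuming E_n = -Ry / n^2 with Ry = 13.6 eV
Ry = 13.6  # eV, from Eq. (4.22)
for n in (1, 2, 3):
    print(f"E_{n} = {-Ry / n**2:.3f} eV")  # -13.600, -3.400, -1.511 eV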
Again, though the expression Eq. (4.17) may look somewhat daunting, we can understand relatively
simply what these functions mean. These radial solutions and the explicit formulas for them are plotted
in Fig. 4.5. Note the following attributes of these solutions:
The overall “size” of these wavefunctions becomes larger with larger n.
The wavefunctions generally have an overall exponential decay of the form exp (− r / nao ) ,
so this exponential decay is slower with r for larger n.
These radial functions have n − l − 1 zeros (not including any zero at the center (r = 0), or any “asymptotic” zero as r → ∞); these zeros are from the roots of the corresponding polynomial functions.
This completes our mathematical solution of the hydrogen atom problem.
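The zero-counting rule in the list above is also easy to confirm numerically. Here is a short sketch – ours, assuming the standard association of the polynomial part of the radial function with the associated Laguerre polynomial of degree n − l − 1:

import numpy as np
from scipy.special import genlaguerre

# In the standard solution, the polynomial part of R_nl is the associated
# Laguerre polynomial of degree n - l - 1; its positive real roots give the
# radial zeros counted in the text.
for n in (1, 2, 3, 4):
    for l in range(n):
        poly = genlaguerre(n - l - 1, 2 * l + 1)
        roots = np.roots(poly.coeffs)
        count = int(np.sum(np.isreal(roots) & (roots.real > 0)))
        print(f"n={n}, l={l}: {count} radial zeros (expect {n - l - 1})")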
10 So far, we have discussed spherical harmonics only with the z direction being vertical. But, the reader might
ask, what sets that vertical direction? Is one direction in space different from the others, and if so why? The answer
here is that there is no such special direction (at least unless we do something like applying an electric or magnetic
field along some direction). It does not actually matter what direction we choose for z here, though it appears here
that we have to choose one so we can solve the problem. The solution to this paradox lies in the fact that this is
what is known as a “degenerate” eigen problem. A degenerate eigenvalue problem is one for which there are
multiple different possible solutions for the same eigenvalue. Such problems typically arise in situations that have
some symmetry. Here, for a given value of l, there is an infinite number of different ways we could write down
solutions, each corresponding to a different direction for z. The key point, though, is that if we choose a particular
“first” z direction, the solutions for any other choice of z direction can be written down as orthogonal linear
combinations of our “first” set of solutions (and vice versa). To solve this problem and write down this set of
solution functions, we need to choose a direction for z, but we can easily (at least in a fundamental mathematical
sense) write down the solutions we would have found with any other choice of direction for z in terms of the
specific solutions we devised for our specific “first” choice of the direction for z. Though we need to choose a
direction to solve the problem for the first time, we have essentially solved it for any direction of z, and any such
choice of z is an equally “good” choice as far as the problem is concerned.
Fig. 4.6. Cross-sections in the x – z plane of the probability density for various hydrogen atom
orbitals, shown as “brightness”. To make the behavior clearer for various orbitals, some of these
are plotted on a logarithmic “brightness” scale.
Problems
4.1.1 In the Schrödinger equation for the hydrogen atom, the potential energy is
V(r) = e²/(4πεo r)
where the various fundamental constants have their usual meanings, and r is the quantity
representing the distance between the electron and the proton. True or false?
4.1.2 The hydrogen atom is formed from an electron and a proton. It is also possible to form a
“positronium atom”, which is made from an electron and a positron. The positron is the
“antiparticle” to the electron; it has the same mass, but its charge is equal but opposite to that of
the electron (i.e., it is +e where the electron charge is –e). The behavior of the positronium atom
can be calculated in exactly the same way as that of the hydrogen atom, but the resulting size
and energies are different because of the different mass of the positron compared to the proton.
(i) Write down an expression in terms of fundamental constants and a principal quantum
number n for the energies of the “internal states” of the positronium atom.
(ii) Evaluate the magnitude of the “binding energy” of the positronium atom (which is the same
as the magnitude of the energy of the n = 1 level), and give your answer in electron-volts.
(iii) Write down an expression for the “Bohr radius” of this positronium atom.
(iv) Evaluate this “Bohr radius” for the positronium atom, and give your answer in nanometers.
4.1.3 State for each of the following proposed combinations of quantum numbers (n,l,m) whether it
corresponds to a “legal” possible set of such quantum numbers to represent a possible energy
eigenstate of the hydrogen atom (technically the “internal states” of the hydrogen atom)? (Here
we use the standard notation for the principal quantum number (n), the orbital angular
momentum quantum number (l), and the magnetic quantum number (m)).
(i) (1,0,0)
(ii) (2,1,2)
(iii) (1,2,2)
(iv) (2,1,0)
(v) (3,2,-3)
(vi) (4,3,2)
(vii) (2,0,-1)
(viii) (-2,-1,0)
4.1.4 Suppose we consider hydrogen atom orbitals oriented with respect to a particular polar axis,
which we can also call the z axis. We can call the resulting x-y plane the “equatorial” plane. We
will call a hydrogen atom orbital “symmetric” with respect to the equatorial plane if the
wavefunction has the same value at some point at a height zo above the equatorial plane as it does
at the corresponding point below the equatorial plane (that is, with the same “x” and “y”
coordinates, but with a z coordinate of -zo). If the orbital has the opposite sign for the
corresponding point below the equatorial plane, we will call it “antisymmetric” with respect to
the equatorial plane. For each of the following possible orbitals of a hydrogen atom, state whether
it is symmetric or antisymmetric with respect to the equatorial plane. (We are using the same
notation as for question 4.1.3 above.)
(i) (4,2,0)
(ii) (2,1,-1)
(iii) (2,1,1)
(iv) (3,1,0)
(v) (3,2,-2)
4.1.5 For each of the following labels in the “spdf” notation, state the corresponding orbital angular
momentum quantum number l.
(i) d
(ii) g
(iii) j
4.1.6 (Note: for this question, consider that electrons can have two different spins, so such electrons
of different spins have to be counted when counting total numbers of electrons. Presume, though,
that different “orbitals” correspond just to different n, l, and m quantum numbers.)
(a) How many possible orbitals are there for n = 5?
(b) How many electrons can occupy all of the n = 5 orbitals?
(c) How many electrons in an atom can have the quantum numbers n = 4, l = 2?
(d) Which of the following combinations of quantum numbers is (are) not allowed for an
electron in a hydrogen atom?
n l m ms
(i) 2 0 0 -1/2
(ii) 3 -2 -2 1/2
(iii) 4 3 1 -1/2
(iv) 4 4 -1 1/2
(e) For n = 3, l = 0, at what radius or radii would the radial wavefunction become zero? (Give
any numerical answers in angstroms.)
4.1.7 What is the energy separation, in electron-volts, between the hydrogen atom states corresponding
to the following pairs of orbitals? (Note: here we are using a notation “n a” where the number n
is the principal quantum number, and the letter a corresponds to the “s, p, d, f, …” notation, so
5p would correspond to n = 5 and l = 1 .)
(i) 3d and 2s
(ii) 6g and 4f
(iii) 2p and 2s
4.1.8 Calculate the energy in eV and the wavelength of the photon emitted if an electron in hydrogen
makes a transition from the initial orbital to the final orbital listed below by emitting the photon.
(Presume the photon energy has to equal the energy separation between the energies of the
orbitals.)
(a) Initial: n = 6, l = 2, m = 1. Final: n = 2, l = 1, m = 0.
(b) Initial: n = 3, l = 2, m = -2. Final: n = 1, l = 1, m = -1.
(c) Initial: n = 5, l = 3, m = 2. Final: n = 3, l = 2, m = 1.
The spin of the electron does not, however, correspond quantitatively with any such classical spinning
body, and it has some surprising properties quantum-mechanically. One key property is that electron
spin is associated with an angular momentum of magnitude ℏ/2. So, instead of an orbital angular
momentum quantum number l that takes integer values, we have a spin quantum number s for the
electron where
s = 1/2     (4.23)
Though this approach does work, it becomes clear at this point that the electron spin is not associated
with a spatial wavefunction corresponding to an orbit, because this non-integer value means any such
spatial wavefunction would not get back to where it started on going round in a circle. Spin is therefore
a more abstract property that we cannot completely visualize in a spatial way.
We can, however, define a magnetic spin quantum number, ms, analogously to the (orbital) magnetic
quantum number m and its relation to l, that is,

−s ≤ ms ≤ s (4.24)

which means that we can have

ms = 1/2, known as “spin-up” (4.25)

and

ms = −1/2, known as “spin-down” (4.26)
With this definition, we can still write the angular momentum associated with these spin states as being
msℏ, however, and this appears to work in explaining many behaviors.
At the time when Pauli proposed it, spin and the Pauli exclusion principle were introduced for purely
empirical reasons in trying to understand the occupation of electron orbitals and without any deeper
justification. In 1928, Paul Dirac constructed an equation for the electron that was correct within special
relativity. (The Schrödinger equation is not relativistically correct, being based on non-relativistic
classical mechanics.) When he did that, the property of spin emerged as a necessary requirement of his
equation11. The “explanation” of why such “spin ½” particles12 like the electron must obey the Pauli
exclusion principle comes from the “spin-statistics” theorem, which is based on relativistic quantum
mechanics.
Because the Pauli exclusion principle prevents multiple electrons from occupying the same states, it has
a major role in determining the volume occupied by matter and in giving solid materials the mechanical
properties they have.
Many-electron atoms
Hydrogen is the simplest of all the atoms. With only one proton, and hence only one positive charge on
its nucleus, it is said to have atomic number 1. Atomic number, Z, is the number of protons in the
nucleus, which is the same as the number of electrons in an atom that is not ionized in any way. Armed
11 Dirac’s equation also simultaneously predicted the existence of the positron, the “anti-particle” of the electron, a point later clarified by Dirac in 1931. At that time, there was no reason to consider any such “anti-particles”, which can annihilate one another, turning into energy of some kind (such as photons). Now anti-particles are a core part of elementary particle physics. The positron was conclusively discovered in 1932 by Carl Anderson.
12 The proton and the neutron also are spin-1/2 particles, like the electron, and also obey the Pauli exclusion principle.
with the Pauli exclusion principle, we can start to try to understand many-electron atoms by looking at
the progressive filling of orbitals by electrons.
Such many-electron atoms can be complicated because the electrons all interact with one another
through their Coulomb repulsion. A useful approach is to presume that each electron sees an average
background potential energy from all the other electrons, together with the potential from the nucleus13.
This is an approximation – it is neglecting much possible detail of interactions between individual
particles. If we set such details aside, however, we can understand roughly how such many-electron
atoms behave.
Such approaches using average background potentials can generally work well enough, and simplify the
problem so much, that they are widely used, not just for many-electron atoms but also for many other
complicated systems, like molecules or solid materials. In these approaches, the effective average
background potential typically has to be determined iteratively; for example, we could guess a net
potential energy from all the electrons and the nucleus, calculate the resulting charge density from all
the electron states presuming that potential, and then recalculate the net potential energy using the new
estimate for the charge density, and so on. Often such methods compare the results of the calculations
with observations, such as measured energy levels in the actual atom or material, and adjust parameters
in the proposed potential to try to get the model calculations to agree with the experimental results. It
might seem on the face of it that there is no point to such a method – why would we set up a model just
to give us answers such as energy levels we already know? Such a model, however, can then be used to
estimate other aspects, such as wavefunctions, that can usefully predict other phenomena, like chemical
bonding or optical properties.
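The structure of this iterative procedure can be made concrete with a small toy calculation. The sketch below is entirely our own construction: one electron in a 1 eV square well, with a made-up local “mean field” g × density added back into the potential; the contact interaction and the parameter values are purely illustrative, chosen only to show the guess-solve-recompute-mix loop described above.

import numpy as np

# Toy self-consistent loop (illustrative only, not a real atomic calculation).
HBAR2_OVER_2M = 0.0381                 # hbar^2/(2 m_e) in eV.nm^2 (approx.)
dz = 0.01
z = np.arange(-1.5, 1.5, dz)
v_ext = np.where(np.abs(z) < 0.5, 0.0, 1.0)   # fixed "external" potential (eV)
g = 0.2                                       # strength of the toy mean field

def ground_state(v):
    # finite-difference Hamiltonian on the grid, hard walls at the grid edges
    t = HBAR2_OVER_2M / dz**2
    h = np.diag(v + 2 * t) - t * np.eye(z.size, k=1) - t * np.eye(z.size, k=-1)
    vals, vecs = np.linalg.eigh(h)
    psi = vecs[:, 0] / np.sqrt(dz)            # normalized: sum |psi|^2 dz = 1
    return vals[0], psi

v = v_ext.copy()
for i in range(200):
    energy, psi = ground_state(v)
    v_new = v_ext + g * np.abs(psi)**2        # recompute potential from density
    if np.max(np.abs(v_new - v)) < 1e-6:      # converged: v reproduces itself
        break
    v = 0.5 * v + 0.5 * v_new                 # damped ("mixed") update
print(i, energy)

In a real calculation the “recompute the potential” step would involve the actual electrostatics of the charge density (and comparison with measured energy levels), but the loop has exactly this shape.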
One other major simplifying approximation we can make for atoms is to presume that this average
background potential is approximately a “central” one – that is, approximately, it does not depend on
angle; that allows us to continue to use spherical harmonics to describe the angular behavior. The
problem still “separates” into one with the spherical harmonic angular behavior, and some radial
function. The radial function will now be different from that of the hydrogen atom because the potential
seen by one electron is now the sum of the potential from the charge of the nucleus and from the charge
of the other electrons in the atom. We expect, however, that there will be a set of radial functions that
we can still index by some quantum number n, and that those functions will have some behaviors that
are broadly similar to those of the hydrogen atom radial functions, for example, in the number of “zeros”
and in the relation between the allowed values of l and m and the quantum number n, as in Eq. (4.20).
For an atom with a large atomic number Z, we can guess that there will be two electrons (one for each
spin) in the two 1s-like states (of opposite spin) that are very tightly bound near the nucleus because
they will see the very strong attraction from the nucleus with its large number Z of protons. This pair of
electrons in that state will form a kind of full “shell”. An electron in a state that is mostly outside that
shell (which will have to be one of higher n because the inner shell is “filled”) will see a lower effective
attractive charge because the positive charge of the nucleus will be partly “screened” by those inner two
electrons partially cancelling the charge of the nucleus. Then we might expect to consider 2s-like and
2p-like states for the next electrons we add into this n = 2 “shell”; because of this screening by the 1s
shell, electrons in the states in the second “shell” will see a somewhat lower effective positive charge
attracting them towards the center. The 2p-like states are in general further from the nucleus than the
2s-like states, so they will see more screening of the positive charge of the nucleus by the 1s-like
electrons, so they will be slightly less tightly bound than the 2s-like states.
As we keep adding electrons, we fill the 2s-like and 2p-like states, completing another shell. Since there
are three different p orbitals, and we are positing two spin states for each, it will take a total of 8 electrons
to fill this “shell” – 2 to fill the two 2s-like states (one for each spin), and 6 to fill the six 2p-like states
(3 for each spin). On this simple model, the next “shell” would have two 3s electrons, six 3p electrons,
and ten 3d electrons (there are 5 d orbitals, corresponding to m = −2, −1, 0, 1, 2), for a total of 18. As we
fill the various “shells”, the additional electrons see progressively weaker attraction to the nucleus
because its positive charge is being partially screened by more and more electrons.

13 An approach like this is sometimes called a Hartree-Fock method.
We can imagine continuing at least approximately with a process like this. So, generally, in these many-
electron atoms, we expect orbitals of higher n to be progressively less tightly bound14. We also expect
orbitals of higher l for a given n to be less tightly bound, because they are generally somewhat farther
from the nucleus and hence will see a nucleus that is more screened by the closer electrons. As we noted,
the “inner” orbitals with smaller n will be very strongly bound, even more so than in the hydrogen atom,
because they are seeing a larger net positive charge on the nucleus. Hence in these many-electron atoms,
the “core” electrons of smaller n are very tightly bound; as a result, they mostly do not participate in
chemical reactions, which instead primarily involve the relatively weakly bound “outer” electrons at
higher n and l.
Fig. 4.7 (a) Ordering of the energies of orbitals according to Madelung’s approximate rule,
grouped by the sum n + l . The short arrows indicate electrons of spin “up” and spin “down”. An
example is shown for the titanium atom, in which 22 electrons fill the orbitals according to this
rule. The energy is not to scale, and indicates only the ordering of the orbital energies, not actual
energies. Light colored short arrows indicate unoccupied orbitals. (b) A simplified diagram
indicating the order of filling the orbitals according to Madelung’s rule. The filling proceeds first
down the highest arrow, in this case just the 1s orbital, then to the next arrow, so the 2s orbital,
then to the next arrow, so the 2p orbitals first, then the 3s orbitals, and so on.
In general, then, we expect the orbitals to “fill up” in some way in which the states of lower n are filled
first, because they are more tightly bound, and to some extent also we expect the states of lower l to
have lower energy than those of higher l and so to be filled earlier as well. There is an empirical rule,
known as Madelung’s rule, which is more sophisticated than our simple “shell” approach above. It is
approximately (but not perfectly15) followed by most atoms, and mostly gives the correct order of the
filling of the orbitals.
Madelung’s rule states that the orbitals are filled up starting from smaller values of n + l and proceeding
to larger values of n + l, with the states of lower n being filled first for each particular value of the sum
n + l.

14 Of course, in the hydrogen atom, states of higher n are already less tightly bound, but the decrease in binding strength with increasing n is magnified further here by the increased “screening” of the charge of the nucleus by the more tightly bound electrons in the inner shells.

15 See T. L. Meek and L. C. Allen, “Configuration irregularities: deviations from the Madelung rule and inversion of orbital energy levels,” Chem. Phys. Lett. 362, 362-364 (2002), doi:10.1016/S0009-2614(02)00919-3, for a list of exceptions. Common elements that do not obey the rule include chromium, copper, silver, platinum, and gold.
This approximate rule is an example of the general principle16 that the states will be filled up in order of
their energy, starting from the lowest (most tightly bound) states and proceeding to higher energy states.
Madelung’s rule is illustrated in Fig. 4.7. An explicit example is also given for the titanium atom, which
has 22 electrons. In this case, all the orbitals up to and including n + l = 4 are fully occupied, and two
electrons are in the 3d orbitals. This figure also illustrates some common notations. The chemical symbol
for titanium is Ti. The number of protons in the titanium nucleus, the atomic number, is 22, which is
also the number of electrons in the neutral titanium atom (that is, when it is not ionized in any way). The
atomic number can also be indicated explicitly together with the chemical symbol as the subscript 22,
as in 22Ti. The number of electrons in each of the sets of orbitals is indicated using the superscripts in
the atomic configuration, here 1s22s22p63s23p64s23d2.
In writing out such configurations, there is a commonly used “shorthand” way of writing them, based
on the configurations of the various noble gases. So we can write

[He] ≡ 1s2
[Ne] ≡ 1s2 2s2 2p6 ≡ [He] 2s2 2p6
[Ar] ≡ 1s2 2s2 2p6 3s2 3p6 ≡ [Ne] 3s2 3p6 (4.27)
[Kr] ≡ 1s2 2s2 2p6 3s2 3p6 3d10 4s2 4p6 ≡ [Ar] 3d10 4s2 4p6
[Xe] ≡ 1s2 2s2 2p6 3s2 3p6 3d10 4s2 4p6 4d10 5s2 5p6 ≡ [Kr] 4d10 5s2 5p6
With this shorthand, for example, the titanium configuration can be written
1s22s22p63s23p64s23d2 ≡ [Ar] 4s23d2 (4.28)
Note that the order of writing the configuration does not matter. The above version, (4.28), for titanium
is written in a “Madelung’s rule” order. It is also common to write configurations first in order of
increasing n, and then in order of increasing l within each group with the same n, as in
1s22s22p63s23p63d24s2 ≡ [Ar] 3d24s2 (4.29)
The above configurations, (4.27), for the noble gases have been written this way.
This kind of approach to many-electron atoms largely explains the electronic configuration of atoms,
and also explains why it is the electrons in the last “shell”17 that participate in most of the chemical
reactions, those being the least tightly bound electrons in the atom.
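Madelung’s rule is mechanical enough to automate. A short sketch of our own that fills orbitals in order of increasing n + l (smaller n first for ties) and prints the configuration in a spaced version of the flattened notation used above; for Z = 22 it reproduces the titanium configuration of Eq. (4.28):

# Sketch: fill orbitals by Madelung's rule (increasing n + l,
# smaller n first for equal n + l) and print the configuration.
SPDF = "spdfghik"   # letters for l = 0, 1, 2, ... ("j" is customarily skipped)

def configuration(z, n_max=8):
    orbitals = [(n, l) for n in range(1, n_max + 1) for l in range(n)]
    orbitals.sort(key=lambda nl: (nl[0] + nl[1], nl[0]))   # Madelung order
    parts, remaining = [], z
    for n, l in orbitals:
        if remaining == 0:
            break
        fill = min(remaining, 2 * (2 * l + 1))   # 2 spins x (2l + 1) m values
        parts.append(f"{n}{SPDF[l]}{fill}")
        remaining -= fill
    return " ".join(parts)

print(configuration(22))   # titanium: 1s2 2s2 2p6 3s2 3p6 4s2 3d2

Of course, this mechanical filling reproduces only the Madelung prediction; the exceptional elements listed in footnote 15 would come out “wrong” by design.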
16 This is known as the Aufbau principle.

17 The electrons in the last partially occupied shell are known as the “valence” electrons in chemistry.

18 Named after the Italian physicist Enrico Fermi.

19 Named after the Indian physicist Satyendra Nath Bose.
Identical particles
Another subtle point about these quantum mechanical particles is that particles of a given type are
absolutely identical to one another, with no way whatsoever of telling them apart. This means that the
counting of possible states of multiple particles is different from that of classical particles.
One way of understanding this difference is to compare dollar bills and dollars in bank accounts. Dollars
in bank accounts are identical in the sense that quantum mechanical particles of a given kind are
identical. If we have two dollar bills and two boxes, there are two different ways of having one dollar
bill in each box – bill 1 in box A and bill 2 in box B, or bill 2 in box A and bill 1 in box B. But if we
think of two bank accounts, there is only one way of having one dollar in bank account A and one in
bank account B. This “identicality” of quantum mechanical particles changes the way that they distribute
themselves among available states when we think about thermal distributions.22

20 Helium can also exist in the helium-3 form, with a nucleus that only has one neutron together with the two protons. Though relatively rare compared to helium-4, it is thought to be stable – that is, not radioactive. Helium-3 atoms behave as fermions.

21 It is possible for two photons to interact in the vacuum to produce an electron-positron pair, for example.
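The dollar-bill counting above generalizes easily. For distinguishable “bills”, each of n objects independently chooses one of k containers, giving k^n arrangements; for identical “dollars”, only the totals in each container matter, giving the standard “stars and bars” count C(n + k − 1, k − 1). A small sketch (our own illustration of the counting, not anything from the text):

# Sketch: counting arrangements of distinguishable "dollar bills"
# versus identical "dollars in bank accounts".
from math import comb

def ways_bills(n_bills, n_boxes):
    # distinguishable objects: each bill independently picks a box
    return n_boxes ** n_bills

def ways_dollars(n_dollars, n_accounts):
    # identical objects: only the totals matter ("stars and bars" count)
    return comb(n_dollars + n_accounts - 1, n_accounts - 1)

print(ways_bills(2, 2), ways_dollars(2, 2))   # 4 versus 3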
Problems
4.2.1 For an electron, there is only one possible value for the magnetic spin quantum number, ms,
which is ½. True or False?
4.2.2 Suppose I have an atom in which all of the 1s, 2s, 2p, and 3s orbitals are occupied, and the total
number of electrons in the atom is equal to the atomic number of the atom. What is the atomic
number of the atom? What element is the atom?
4.2.3 The Madelung rule states that the orbitals are populated by electrons in the increasing order of
the sum of quantum numbers n + l, with priority to the orbitals with smaller n.
Applying this rule to the potassium atom (which has atomic number 19 and the symbol K, and
can be written using a standard notation as 19K), we see that its n = 4, l = 0 set of orbitals is
partially occupied, while the n = 3, l = 2 orbitals are empty. This is the first atom in the periodic
table of the elements to fill a shell with a higher n before exhausting all orbitals with lower n
(though it is still obeying the Madelung rule).
(a) Write out the quantum numbers n, l of the occupied orbitals in the ground state of 19K and
point out how many electrons are associated with each.
(b) Presuming the Madelung rule works, what is the minimum number of electrons needed to
occupy orbitals in such a way that an orbital with quantum number n = q (for some integer
q) is filled before all orbitals with n = q – 2 are completely filled? Which atom is the first
one to exhibit this property? (Expressed in similar terms, the above potassium atom is the
first one, for example, for which an orbital with n = j is filled before all orbitals with
n = j – 1 are completely filled, with, in that case, j = 4.) (Once you get your answer to the
number of electrons, you may need to look at the periodic table of the elements.
Incidentally, the resulting atom does obey Madelung’s rule.)
4.2.4 For each electron configuration given below, all of which are believed to be the electron
configurations for actual atoms, state whether it corresponds to Madelung’s rule, explaining your
answer briefly if you believe it does not correspond to Madelung’s rule.
In each case, also state what element is being represented by this configuration [Hint: count the
total number of electrons, and hence deduce the element from a periodic table or otherwise].
(i) 1s22s22p63s1 ≡ [Ne] 3s1
(ii) 1s22s22p63s23p1 ≡ [Ne] 3s23p1
(iii) 1s22s22p63s23p63d54s1 ≡ [Ar] 3d54s1
(iv) 1s22s22p63s23p63d64s2 ≡ [Ar] 3d64s2
(v) 1s22s22p63s23p63d104s1 ≡ [Ar] 3d104s1
(vi) 1s22s22p63s23p63d104s24p64d104f145s25p65d46s2 ≡ [Xe] 4f145d46s2
(vii) 1s22s22p63s23p63d104s24p64d104f145s25p65d106s1 ≡ [Xe] 4f145d106s1
22 We generally think of classical particles as being different from one another in the way that, say, bricks are similar but slightly different from one another. For such classical particles that are similar but not identical to other such particles, when the particles on the average have some thermal energy, those classical particles will distribute themselves among the various possible states according to the Maxwell-Boltzmann distribution. Identical fermions will instead obey the Fermi-Dirac distribution, and identical bosons the Bose-Einstein distribution.
4.2.5 (i) Suppose you have 3 dollar bills (all with different serial numbers, of course!). How many
different ways are there of distributing these dollar bills between two boxes A and B (including
the possibility of having no dollars in a given box)?
(ii) Suppose you have 3 dollars and two bank accounts, one with bank A and one with bank B.
How many different ways are there of distributing the 3 dollars (allowing only integer numbers
of dollars) between the two bank accounts (including the possibility of having no dollars in a
given bank account)?
4.2.6 For each of the following particles, state whether it should obey the Pauli exclusion principle.
(i) An electron
(ii) A photon
(iii) A neutron
(iv) A Higgs boson
4.3 Coupled systems

Calculating the eigenstates of two such potential wells coupled through a finite barrier23 shows
that the solutions of this coupled-well problem are not the same as those of two isolated wells; these are
always coupled solutions; each solution describes a state of the coupled pair of wells.

23 We could perform such calculations using the methods we have been developing, including the boundary conditions and appropriate forward and backward waves in each region. The case of zero barrier thickness corresponds to a “finite potential well”, which can be solved relatively straightforwardly, though the solutions for the eigenenergies are not in closed form – we have an expression to calculate them, but this is not a simple formula, requiring us to find zeros of a function numerically. For the more complicated coupled-well problem, we can in principle take a similar approach. In practice we use other matrix methods to help us handle the large number of simultaneous equations. The calculations here for the coupled wells were performed using one such matrix approach, the “transfer matrix” method. See, e.g., Quantum Mechanics for Scientists and Engineers by D. A. B. Miller (Cambridge, 2008), Chapter 11.
Fig. 4.8. The first two energy levels and eigenfunctions of a coupled potential well for an electron,
shown for various values of the width or thickness of the barrier between the wells. The height of
the potential energy on both sides and in the barrier is 1 eV relative to the bottom of the wells.
Each of the two wells is 0.6 nm thick. The orange horizontal line (generally, the one in the middle
of the set of horizontal lines) shows the energy of the first level (at ~ 0.362 eV) in an isolated
potential well of 0.6 nm width, with the same 1 eV barrier height. The red and blue horizontal
lines show the energies of the corresponding first (lower energy) and second (higher energy)
states of the coupled well for each barrier width, and these red and blue lines are used also for the
vertical origins for plotting the corresponding eigenfunctions.
In the limit of a thick barrier, we end up with two eigenfunction solutions: one solution is essentially the
“symmetric” sum of the individual “isolated-well” wavefunctions – that is, in this case, a single “bump”
in the right well plus a single “bump” in the left well; the other solution is essentially the “antisymmetric”
sum of those “isolated-well” wavefunctions – that is, in this case, a single “bump” in the right well minus
a single “bump” in the left well24. For a thick barrier, the energies of these solutions are approximately
equally spaced below (for the “symmetric” solutions) and above (for the “antisymmetric” solutions) the
energy of the first state of an isolated well25.
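The energies in Fig. 4.8 were computed with a transfer-matrix method (see footnote 23). As a rough independent check, one can instead diagonalize a finite-difference Hamiltonian on a grid. The sketch below is our own construction, with hard walls at the grid edges, so the numbers are only approximate, but it shows the same qualitative splitting below and above the isolated-well level:

# Sketch: lowest two levels of the coupled well of Fig. 4.8 by finite
# differences (the text itself uses a transfer-matrix method instead).
import numpy as np

HBAR2_OVER_2M = 0.0381   # hbar^2/(2 m_e) in eV.nm^2 (approximate)

def double_well_levels(well=0.6, barrier=0.25, v0=1.0, pad=1.0, dz=0.005):
    half = pad + well + barrier / 2
    z = np.arange(-half, half, dz)
    v = np.full_like(z, v0)                      # v0 everywhere...
    left = (z > -(barrier / 2 + well)) & (z < -barrier / 2)
    right = (z > barrier / 2) & (z < barrier / 2 + well)
    v[left | right] = 0.0                        # ...except inside the two wells
    t = HBAR2_OVER_2M / dz**2
    h = np.diag(v + 2 * t) - t * np.eye(z.size, k=1) - t * np.eye(z.size, k=-1)
    return np.linalg.eigvalsh(h)[:2]             # two lowest energies, in eV

print(double_well_levels())   # two levels, split below and above ~0.362 eV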
One might think that, especially as we consider thick barriers, those “isolated well” solutions would
become valid again. In fact, though, they are not energy eigenfunctions of this coupled problem, and
they would not be stable in time. This situation is like that of two coupled “masses-on-springs” problems,
or of two pendulums coupled by a weak spring. If we set one of the masses oscillating, it will
gradually transfer its oscillation over to the other one; subsequently, that oscillation will transfer back
over again to the first mass, and so on. If we “started” the electron in only one of the wells, the same
oscillatory behavior would result here also, with the electron oscillating back and forward between the
two wells26. We could think of this, loosely, as the electron tunneling from one well to another and then
tunneling back again, and so on.
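Quantitatively (see footnote 26 below), the splitting between the two coupled levels sets the rate of this sloshing: the electron oscillates between the wells with angular frequency ω21 = (E2 − E1)/ℏ, that is, with a period T = 2π/ω21 = h/(E2 − E1). A one-line sketch:

# Sketch: tunneling-oscillation period from the coupled-well splitting.
H_EV_S = 4.135667e-15           # Planck constant h in eV.s

def oscillation_period(delta_e_ev):
    return H_EV_S / delta_e_ev   # T = 2*pi/omega_21 = h/(E2 - E1)

print(oscillation_period(0.05))  # a 50 meV splitting gives T ~ 8.3e-14 s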
In both the classical coupled oscillators (or masses on springs) and our case here, for the eigen solutions
for this symmetric problem (identical masses and springs for the classical problem27 and identical wells
for our quantum problem) this kind of symmetric and antisymmetric behavior for the eigen states is
retained even as the coupling becomes strong.
Another very important point about this coupled solution is that the coupling here reduces the energy of
the lower state compared to the energy of the isolated well. This phenomenon is at the core of the idea
of covalent bonding of atoms in molecules. It is the reason why two identical atoms can bond with one
another – their energy is reduced in one of the coupled states. Of course, there are many more details in
the actual analysis of such bonding, but this notion – that sharing of an electron between two potentials
can reduce the energy of the state – is very important. We see a growth of the probability density of
finding the electron in the region “between” the two wells; generally the electron is on the average
somewhat nearer to the center of the coupled system in this state.
We can see from Fig. 4.8 why the energy of the lower coupled state is reduced compared to the isolated
well state; this is particularly clear as we look at the cases of strong coupling (thinner barriers). For
example, for a barrier thickness of 0.25 nm, we can see quite clearly that the lower energy wavefunction
is less strongly curved overall within a given well than the “isolated well” wavefunctions (which are
very similar to the wavefunction within a given well in the 1 nm barrier case). That lower curvature
corresponds to lower kinetic energy for the electron in the Schrödinger wave equation (a smaller
magnitude of the second derivative of the wavefunction). Symmetric wavefunction states like this are
sometimes called “bonding” states.
The second, “antisymmetric” state corresponds to higher energy than the isolated-well state. This
wavefunction requires larger curvature because it must go through zero in the barrier. That larger
curvature corresponds to larger kinetic energy (a larger magnitude of the second derivative of the
wavefunction). Because of its higher energy, this kind of antisymmetric state is sometimes called an
“antibonding” state.

24 Whether this antisymmetric solution is “right” minus “left” or “left” minus “right” is of no particular significance or meaning. Indeed, quite generally, we can multiply any such eigenfunctions by any unit complex factor (e.g., exp(iθ)) and they will still be valid eigenfunctions (as long as we multiply the entire eigenfunction by this same constant factor). What matters for the antisymmetric solution is that what is on the “left” is minus the value of its mirror image point on the “right”.

25 One simple approximate model for the case of thick barriers is the so-called “tight-binding” model. In that tight-binding model, which does predict the limiting behavior for thick barriers, the two levels are indeed split to be equally below and above the “isolated well” level, and the wavefunctions are these symmetric and antisymmetric sums, respectively, of the two different “isolated well” wavefunctions.

26 In quantum mechanics, when we extend to the time-dependent Schrödinger equation, we find an angular frequency ω associated with an energy E through E = ℏω. The electron would oscillate back and forward between the wells with an angular frequency ω21 = (E2 − E1)/ℏ.

27 The spring providing the coupling between the classical masses need not be identical to the other two springs.
Problems
4.3.1 Consider a coupled potential well with potential barrier heights V0 = 1 eV, well thickness
(Lz) = 0.6 nm and barrier thickness = 0.25 nm. The first two eigenfunctions (ψ1 and ψ2)
are shown below. These wavefunctions are plotted on the same scale, and relative to the
same vertical zero, which is given by the horizontal line. Both of these wavefunctions are
normalized on the same vertical scale, so the relative amplitudes of the two wavefunctions
are meaningful. The “first” eigenfunction ψ1 is positive everywhere on this graph, and the
“second” eigenfunction ψ2 is positive and negative. The form of the potential is also
sketched on the same horizontal scale.
(a) (i) In which of the two eigenstates is an electron more likely to be found in the middle
barrier region (-0.125 nm < z < 0.125 nm)?
(ii) In which of the two eigenstates is an electron more likely to be found in the outer barrier
region (|z| > 0.725 nm)?
(b) We will say that an electron in the first eigenstate (the lower energy one) has the probability
of P1 to be found in the middle barrier region and an electron in second eigenstate has the
probability of P2 to be found in the middle barrier region. If the middle barrier thickness
reduces from 0.25 nm to 0.05 nm, would you expect the ratio between P1 and P2 to increase
or decrease? Why?
4.3.2 Consider the calculated energy levels E1 and E2 as given in Fig. 4.8 for each of the different
thicknesses d of the barrier, from 0 to 1 nm, and specifically the energy separation between the
two, ∆E = E2 − E1 . By intelligent guesswork and empirical comparison, deduce a simple
expression for ∆E that fits the calculated values well (e.g., to better than a few meV).
[Hints: Graph up the values of ∆E as a function of the barrier width, look at the results, guess a
form (you may even have some physical intuition here as to what the form “ought” to be), plot
that form on the same graph and adjust the parameters empirically until you get a good fit. This
is likely easiest done with some kind of computer program or software. You may also want to
create a table of the differences between the given values of ∆E and your model values. You
could use that table when fine-tuning your parameters. You should not need to use any formal
“curve fitting” to get your result, and you should be able to come up with a very good model
with only two parameters, one of which can be chosen very simply and obviously.]
4.4 Crystals
The combination of atoms into larger structures is, of course, very important in a wide range of
applications. A very large part of chemistry is based on such combinations of atoms into molecules. One
particularly important category of structures for applications in engineering is crystals, which are
essentially large, regular structures of many atoms. A large fraction of the devices we use to process and
communicate information, such as diodes, transistors, photodetectors, and semiconductor lasers, and for
other applications, such as solar cells, use so-called “semiconductor” materials. Most of these
semiconductor materials are crystals. Here we will introduce some of the key features and processes in
these crystal structures that we will need for a basic understanding of how many such devices work.
Many simple materials will form themselves in crystalline structures of atoms. Crystals have several
advantages as materials. If we are careful about the environment and manner in which we “grow”
crystals, they come out essentially exactly the same every time, so they are very predictable materials.
The regularity of their structure means that, even though there may be a vast number of atoms in a piece
of crystal, we can have relatively simple models by which we can understand their properties, such as
their electronic or optical behavior.
With semiconductors in particular, it is also often the case in practice that introducing small amounts of
controlled impurities or “dopants” in the structure enables us to get some very precise and important
kinds of control over the detailed electronic properties of the material.
In a mathematical sense, a crystal is a periodic arrangement of atoms: in two dimensions, for example,
there is a lattice of points

R = n1a1 + n2a2 (4.30)

for all integers n1 and n2, where a1 and a2 are fixed, non-parallel “lattice vectors”, with an identical
group of one or more atoms attached to every lattice point.
Three-dimensional crystals are slightly more difficult to visualize than two-dimensional ones, but they
follow the same rules. Many semiconductor electronic and optoelectronic devices are based on one of
the two closely related lattices shown in Fig. 4.10. The diamond lattice is the crystal lattice for the
semiconductor materials silicon and germanium. The “zinc blende”28 lattice is the lattice for many
of the so-called “compound” semiconductor materials based on “III-V” materials.29
semiconductors are materials made from a combination of elements from Group III of the periodic table
(e.g., gallium, aluminum, indium) and Group V of the periodic table (e.g., arsenic, phosphorus,
antimony, nitrogen). Typical III-V semiconductors include gallium arsenide and indium phosphide.
Fig. 4.10. (a) Diamond lattice. (b) “Zinc blende” lattice.
Both the diamond and zinc blende lattices are based on atoms forming four chemical bonds, equally
spaced in angle, with their neighbors. Silicon and germanium are both Group IV elements, which means
four of their outer eight electron orbitals are filled. This fact makes them particularly want to bond with
four other atoms. Carbon is also a Group IV element, and when it crystallizes in this form, it is called
diamond (hence the name of the lattice structure).30 In these lattices, the four atoms connected to a given
atom form the corners of what is called a (regular) tetrahedron – if we joined these four atoms with lines
(which in this case would in a sense correspond to chemical bonds), they would form a structure with
four equilateral triangle faces, all of equal size.
A close look at the diamond lattice shows that, somewhat surprisingly, a cubic structure emerges. The
cube is indicated by the dashed lines (and in this case those dashed lines are not representing chemical
bonds). If we look carefully, we see that there is an atom on each corner of the cube, and one in the
middle of each face. A structure consisting of sets of atoms arranged like that is called a “face-centered
cubic” lattice.
There are also other atoms that do not lie on the corners or the faces. If we were to continue this lattice
by adding more atoms in their crystalline positions in adjacent unit cells, we would see that these other
atoms also lie on a face-centered cubic lattice that is simply displaced by one chemical bond in space.
Thus the diamond lattice in this sense is actually made from two interlocking face-centered cubic lattices
(or sometimes described as interlocking sublattices).

28 Zinc blende is zinc sulfide in a cubic crystalline form.

29 One exception is gallium nitride, and some of its alloys, which can crystallize in a so-called “wurtzite” structure. The wurtzite lattice is like the zinc blende lattice, but in one direction the alternate planes are rotated by 60 degrees compared to those of the zinc blende lattice. Gallium nitride is useful in short wavelength (e.g., blue) light-emitting diodes, for example.

30 Carbon can also crystallize in another completely different form, graphite, which is most common as the “lead” in a pencil. The mechanical properties of graphite are completely different from diamond; graphite is quite soft, and is a good lubricant, whereas diamond is extremely hard, and is a good abrasive! Graphite is made of loosely bound layers of carbon atoms. These layers can be detached relatively easily, in which case the layers are called graphene. If we “roll up” graphene sheets, we can make so-called carbon nanotubes. Related structures can also be formed into more spherical structures called “bucky balls”.
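To make this “two interlocking face-centered cubic sublattices” picture concrete, here is a sketch of our own that lists the atom positions in one conventional cubic cell, in units of the cube edge; the (1/4, 1/4, 1/4) displacement is the standard quarter-body-diagonal offset of the diamond and zinc blende structures:

# Sketch: the two interlocking FCC sublattices of the diamond / zinc blende
# structure, positions in units of the conventional cube edge.
FCC = [(0, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)]

def diamond_cell():
    sublattice_a = list(FCC)
    # second FCC sublattice, displaced by a quarter of the cube's body diagonal;
    # in zinc blende, the two sublattices carry different elements
    sublattice_b = [(x + 0.25, y + 0.25, z + 0.25) for (x, y, z) in FCC]
    return sublattice_a, sublattice_b

a_sites, b_sites = diamond_cell()
print(len(a_sites) + len(b_sites))   # 8 atoms per conventional cubic cell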
In the zinc blende form of lattice, one of the kinds of atoms (e.g., arsenic, represented by the larger balls
in Fig. 4.10) lies on one of these face-centered cubic sublattices, and the other (e.g., gallium, represented
by the smaller balls in Fig. 4.10) lies on the other face-centered cubic sublattice. Though the Group III
element only has 3 electrons in its outer shell, and the Group V has 5, they share these between the
atoms to make four bonds for each atom. The fact that, in the zinc blende lattice, one kind of atom sits
on one of the face-centered cubic sublattices and another kind lies on the other face-centered cubic
sublattice helps us clarify that we indeed have two interlocking face-centered cubic lattices in both the
zinc blende and diamond lattices31.
An important additional set of materials, especially for optoelectronic devices, is the set of III-V alloy
materials. These are materials such as indium gallium arsenide, or aluminum gallium arsenide. Materials
with three elements in them are known as “ternary” alloys. In these example cases, the Group III face-
centered cubic sublattice has a usually random distribution of, e.g., indium and gallium atoms on it. It is
also possible to extend this to choices of two atoms for each of the sublattices, as in indium gallium
arsenide phosphide (which, because it has four components, is called a “quaternary” alloy), or three
different choices for one lattice, as in indium gallium aluminum arsenide (also a “quaternary” alloy).
Though the random distribution of elements on a given sublattice means these alloys are technically not
crystals, in practice we can mostly get away with treating them as crystals with some kind of average of
the properties of the perfect two-component “binary compound” crystals (such as gallium arsenide and
indium arsenide). These alloys give a great deal of flexibility in the design of materials with exactly the
desired properties, and they are used extensively in optoelectronic devices. Changing the alloy
composition can change the wavelength or color of light emitted by a light-emitting diode or
semiconductor laser, for example.
Of course, not all the materials we use in making devices are crystalline. Often, we take care to make
the semiconductors crystalline because we need the best, most predictable, and precisely controlled
performance from them. Other materials also used in making devices, such as metals for conductors and
oxides for insulators, do not have to have precise crystalline forms to operate successfully, and we can
be more relaxed in the way we make them32. Materials where we are not too concerned about a precise
crystalline structure can often be made by simple, flexible techniques, such as evaporation onto the
surface of interest. Growth of crystals usually requires carefully controlled temperatures; also, crystals
can typically only be grown uniformly and precisely if they are being grown on some substrate or seed
that itself is crystalline with the same separation between the atoms; such growth is called “epitaxial”
growth, meaning that the growing surface takes on the order of what lies beneath it33. Materials with
essentially no crystalline structure are called “amorphous”; amorphous silicon is relatively easy to make
and can be used in solar cells, for example.

31 Now, we might ask, for both the diamond and zinc blende cases, what is the mathematical crystal lattice that corresponds to these? The answer to that question is that the crystal lattice in the mathematical sense is actually one face-centered cubic lattice. But, for each such mathematical point in the lattice, there are two adjacent atoms associated with that point. For the zinc blende case, that pair of atoms corresponds to one of each kind (e.g., one Group III atom and one Group V atom). So in that sense the zinc blende lattice is actually a lattice of pairs of atoms (one atom from each “sublattice”). The actual unit cell for such a crystal will generally have more than one atom in it. There are often many ways we can draw a unit cell for a given lattice, all of which may be valid (that is, stacking them regularly side by side fills all space), and different unit cells may have different numbers of atoms. So-called “primitive” unit cells are ones with one lattice point in them, and one well-known class of such primitive unit cells is the Wigner-Seitz unit cell, which can be formed for any lattice. For this zinc blende crystal, however, even a primitive unit cell will have at least two atoms in it (one of each kind). This emphasizes for us again that we should not necessarily associate each lattice point with an atom, nor should we necessarily associate the lines between the crystal lattice points with chemical bonds.

32 Elemental metals can also be made as perfect crystalline materials, and this is sometimes done for specific purposes, including scientific studies. Silicon oxides, used extensively as insulators in electronic integrated circuits, are usually not crystalline, though a crystalline form of silicon dioxide, known as crystalline quartz, can be made and is used for various applications. Glass, which is also essentially silicon dioxide, is actually best classed as a super-cooled liquid, and is quite amorphous.
Problems
4.4.1 If the following forms of matter were infinite in size, would they be considered as crystals and
why or why not? (Note: To avoid any confusion, use the specific mathematical definition of a
crystal given above in (4.30).) (You may want to consult other reference materials if you need
to understand more about the various forms of matter below.)
(a) Liquid water
(b) An ice cube (at least, an ideal one)
(c) Water vapor
(d) Window glass
(e) A snowflake
(f) Diamond
(g) Graphite
(h) A silicon wafer
(i) Wax
(j) Graphene
33 In “epitaxial”, “epi” means “surface” (as in “epidermis” for the outer layer of skin), and the “tax...” part means an “ordering” (as in “taxonomy” for classification).

34 These calculations are also performed using the transfer matrix technique.

35 We are using different barrier thicknesses between the wells in these different figures, but the well widths and barrier heights do correspond.

36 In one of the simpler approximate approaches, the “tight-binding” model, for example, the energy width of the band limits to twice the separation of the two states in a coupled well.

4.5 Emergence of bands

As we couple together more and more identical wells, as in Fig. 4.11, each level of the isolated well
splits into a “band” of states of different energies, with the separation between the lowest and highest
energies in a band – the energy width of the band – tending towards a limit as we increase the number
of wells. Further increases beyond the 8 wells shown here would lead to relatively little increase in this
energy separation between the lowest and highest energy states in the band.
Fig. 4.11. Illustration of the coupling of multiple identical wells. The calculations are for an
electron with a potential barrier height of 1 eV, well thickness of 0.6 nm, and barrier thickness
between the wells of 0.15 nm, for 2 coupled wells (top), 4 coupled wells (middle), and 8 coupled
wells (bottom). The wavefunctions of just the lowest and highest energy states are shown for the
case of the 8 coupled wells.
If we increase the width of the wells being coupled, the energy of the first level will decrease and we
can usefully look also at what happens to the second level. This is shown in Fig. 4.13 where we have
doubled the thickness of the individual wells37. The details of the wavefunctions for the lower energy
band in Fig. 4.13 are shown in Fig. 4.14, and similarly the wavefunctions for the upper energy band in
Fig. 4.13 are shown in Fig. 4.15. If we look at these sets of wavefunctions for the different levels within
a band, we can start to see a pattern emerging. These wavefunctions are a product of (i) a “unit cell”
function that is essentially the “isolated well” function or is closely related to it, and (ii) an overall
standing-wave “envelope” function. For example, for the lowest state in the band in Fig. 4.11 and Fig.
4.12 for the case of 8 wells, we can see an overall standing-wave “envelope” pattern corresponding to
one half-wave of a sine wave, just like a wave on a string or a particle in a box, that is “modulated” by
one “bump” per well.

37 The parameters for this wider individual well here are the same as in the bottom panel of Fig. 4.8.
Fig. 4.12. Expanded detailed view of the wavefunctions for the case of eight coupled wells as in
Fig. 4.11.
For the highest state in this band, we see that the wavefunction is changing sign between adjacent wells.
This is because the “unit cell” or isolated-well function is multiplied by an envelope that has 4 full waves
(or 8 half-waves) between the “walls” at either end. For the other levels between these two, if we look
closely we can discern that the “envelope” function for each successive higher energy state has one more
half-wave in it. We see the same behavior for the lower energy band for the eight coupled wells in the
wider well case shown in Fig. 4.13 and Fig. 4.14.
Fig. 4.13. Coupling of multiple wells, with well width 1.2 nm, barrier width 0.15 nm, and barrier
height 1 eV, showing two bands of states, each corresponding to a different state in the isolated
well. (See the bottom panel in Fig. 4.8 for the wavefunctions and energies of such an isolated
well.) The wavefunctions for the lowest and highest states in each band are shown.
The upper band in Fig. 4.13 and Fig. 4.15 is showing similar behavior, with a “unit cell” or “isolated
well” wave function multiplied by a similar envelope. We notice two differences of detail here compared
to the lower band: (i) the “unit cell” or “isolated well” wavefunction itself passes through zero
somewhere in the middle of each well, just like the second state in the isolated well; (ii) the order of the
envelope functions is reversed, with the “single half-wave” one being at the top of the band, and the “8
half-waves” one being at the bottom of the band – this different order is, however, consistent with the
amount of “curvature” in the overall waves being larger in the higher energy states.
Fig. 4.14. Expanded detailed view of the wavefunctions for the lower energy band for eight
coupled wells as in Fig. 4.13.
Fig. 4.15. Expanded detailed view of the wavefunctions for the upper energy band for eight
coupled wells as in Fig. 4.13.
We can understand why the energy width of the band saturates. We can see that, at one extreme of
the band energies, the envelope function will tend towards a very flat form as we increase the number
of “periods” in our finite structures; then, nearly all the curvature, and hence the “kinetic energy”
associated with the wavefunction, comes from the “unit cell” part, so this “kinetic energy”, and hence
the overall energy of the state, will saturate out to some value as we increase the number of periods.
Similarly, at the other extreme of the band energies, the envelope function tends towards a sinusoid that
changes sign between each adjacent period, and again the overall “curvature” and hence the “kinetic
energy” will saturate out to some value. Hence, as we increase the number of periods in these structures,
we expect the band energy width to saturate out to some value that becomes independent of the number
of periods.
Though shown here for a finite number of periods (or “atoms” or wells), the kinds of behaviors discussed
in this section continue as we increase the number of periods and hence as we consider crystals more
generally. If we allow ourselves one mathematical simplification, we can express this behavior
particularly elegantly through what is called the Bloch theorem.
One-electron approximation
When we were discussing many-electron atoms above, we already introduced a key idea that we will
extend here. This is the idea that, in a complicated system with very many electrons and nuclei, we may
be able to pretend, at least as a useful approximation, that an electron sees an average potential from all
the other electrons and nuclei38.
In the case of a crystal, which is a structure that is periodic, we can also pretend that the average potential
VP(r) is itself periodic, with the same periodicities and symmetries as the crystal itself. For
example, for a one-dimensional system with a repeat length of a, we would have
VP(z + a) = VP(z) (4.31)

We can keep on extending this for subsequent periods; that is,

VP(z + 2a) = VP(z + a) = VP(z) (4.32)

or, generally,

VP(z + ma) = VP(z) (4.33)

where m is an integer. We could also appropriately generalize this for two-dimensional or three-
dimensional periodic structures and corresponding periodic potentials. Hence we presume that, at least
as a first approximation, for any given electron, we can write a Schrödinger wave equation for just that
one electron in the form

−(ℏ²/2mo)∇²ψ(r) + VP(r)ψ(r) = Eψ(r) (4.34)
where VP(r) is periodic in three dimensions39, just like the crystal. This is what we call the one-electron
approximation for our crystalline structure.
This is certainly an approximation. There are many phenomena that are not covered by such an approach,
such as the scattering of electrons off one another or from the nuclei (or vibrations of them), both of
which are significant causes of electrical resistance in materials, for example. Often, though, we can
model such phenomena, if they are not too strong, using this one-electron model as a starting point and
adding in these other effects as what we call “perturbations”.

38 Again, an approach like this is sometimes called a Hartree-Fock approximation.

39 If we were dealing with a two-dimensional “sheet”-like structure, then we could have a similar equation where r was a vector in two dimensions.
One of the most important consequences of this model is that it allows a simple way of looking at
crystals, through the Bloch theorem.
4.6 The Bloch theorem

Suppose, then, that we consider a finite chain of N unit cells, each of repeat length a, joined end to end
in a ring, so that the potential still satisfies

VP(z + ma) = VP(z) (4.35)

where m is an integer40. This expression for the “periodic potential” VP in Eq. (4.35) is just like the one
for the infinite crystal in Eq. (4.33). If this chain is very long, we can argue that its internal properties
will not be substantially different from those of an infinitely long chain, so this is a good model that
gives us a finite system while keeping it periodic.
There are some quite real problems for which such an approach is rigorously correct; an example would
be a “benzene ring”, which is exactly such a ring of six carbon atoms41. In the physics of larger crystals,
though, this presumption that we can pretend the ends of the crystal are joined round onto one another
is just to make the mathematics simpler.
The rationalization as to why we can get away with this for crystals is that, if the chain is sufficiently
long, we are arguing that its properties cannot really depend much on whether the ends of the chain are
joined together. Of course, simultaneously joining each opposite pair of surfaces of a three-dimensional
object, like a “cube” of some physical crystalline material, to one another is geometrically absurd.
Nonetheless, we do this mathematically anyway for our models because it makes the mathematics so
much simpler. This approach is called using “periodic boundary conditions”. It does work in practice
very well for crystalline structures that are at least moderately large – for example, perhaps at least
hundreds of atoms in each direction42, and we may be able to use it for somewhat smaller structures also.

40 m can even be larger than N and this will still work.

41 Each carbon atom also has one hydrogen atom attached to it in a benzene ring.
In periodic boundary conditions, then, we imagine we have a chain of N unit cells that are joined in a
ring. If we presume the length of the unit cells is a, then the length of the entire chain is Na. We do want
the wavefunction to be single-valued; otherwise how could we differentiate it or evaluate its squared
modulus, for example? The periodic boundary conditions then require that, if we go all the way round
the ring, the wavefunction solutions must get back to where they started. So for some wavefunction
solution ψ ( z ) , where z is the distance along the chain, this requirement that the wavefunction gets back
to where it started when we go all the way round the ring (the “periodic boundary conditions”) implies
that, for any z,
ψ(z + Na) = ψ(z) (4.36)
For the case of quantum mechanical waves, we also want any measurable quantity, such as the electron
density, to be the same in every unit cell, so we want
|ψ(z + a)|² = |ψ(z)|² (4.37)
which means that the electron density in one unit cell has the same form as the one in the next unit cell,
and so on all the way round the chain. We can satisfy both of these conditions, Eqs. (4.36) and (4.37),
if we propose wavefunction solutions of the form
ψ(z) = u(z) exp(ikz) (4.38)
where u(z), the “unit cell function”, is the same in every unit cell – that is

u(z + a) = u(z) (4.39)
and where k takes on any of a set of N values that are spaced by an amount 2π/Na, a set that is
conventionally written in a “symmetric” form43 as

k = 2nπ/Na, n = 0, ±1, ±2, …, ±N/2 (4.40)
The assertion that we can write the solutions for such periodic boundary conditions in the form of Eq.
(4.38) with such restrictions on k is known as the Bloch theorem. It is straightforward to check that this
form does indeed satisfy both conditions Eqs. (4.36) and (4.37). In this form, the “exp(ikz)” function is
an example of an “envelope” function that multiplies the unit cell function, u(z).
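It is also easy to verify Eq. (4.40) numerically: each allowed k satisfies exp(ikNa) = exp(2nπi) = 1, so the Bloch form automatically meets the periodic boundary condition Eq. (4.36). A sketch of our own, keeping exactly N values by leaving off −N/2, as footnote 43 below suggests:

# Sketch: the allowed Bloch k values of Eq. (4.40) for N unit cells of length a,
# and a check that each satisfies exp(i k N a) = 1.
import numpy as np

def allowed_k(n_cells, a):
    n = np.arange(-n_cells // 2 + 1, n_cells // 2 + 1)   # exactly N integers
    return 2 * np.pi * n / (n_cells * a)

k = allowed_k(8, 1.35)                                # e.g., N = 8, a = 1.35 nm
print(np.allclose(np.exp(1j * k * 8 * 1.35), 1.0))    # True: exp(ikNa) = 1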
The main mathematical differences between this Bloch form solution, Eq. (4.38), and the “standing
wave” solutions for a finite length of crystal are that (i) the Bloch form corresponds to traveling, not
standing, waves, (ii) the finite crystal with “ends” to it has standing wave “envelopes” that correspond
to fitting integer numbers of half-waves between the ends, whereas these Bloch form solutions fit integer
numbers of whole waves round the ring, but the propagating waves in opposite directions count as
different solutions, so we get to the same number of different solutions overall44,45, and (iii) such
solutions in this Bloch form will in general be complex.

42 If we have to analyze a small structure, then we will have to be careful, and we may have to stop using periodic boundary conditions, reverting to actual “hard wall” boundary conditions at the ends of some finite structure.

43 Note that writing these allowed values this way (i) presumes N is even (though that is not a necessary restriction for using periodic boundary conditions), and (ii) leads to N + 1 elements in the list. These “details” are customarily ignored because N is presumed large enough that no-one cares much about the difference between N and N + 1 or whether it is even or odd. To get the right number of elements, we can leave off either −N/2 or +N/2. An alternative list that avoids both of these minor issues is to choose a “single-sided” form n = 0, 1, 2, …, N − 1, though, by convention, we typically do not use this list, mostly because it is not symmetric about zero. Formally, we are mathematically writing out the N different “Nth roots of unity” – the N distinct numbers that, when raised to the power N, give the result 1.
We can formally return to our sets of potential wells and solve for the states of them now using the
periodic boundary conditions46. We can choose the “left” of our unit cell at some position zL (for
example, in the middle of the barrier on the left) and the right of the unit cell therefore at some position
zR = zL + a, where a is the repeat length (so zR would then be in the middle of the barrier on the right).
Now, from Eq. (4.38) with Eq. (4.39), we have47

ψ(z + a) = u(z + a) exp[ik(z + a)] = u(z) exp(ikz) exp(ika) = exp(ika)ψ(z) (4.41)
Then, in this Bloch form approach, presuming we know the periodic potential, VP(r) (which becomes
just our “rectangular well” potential in the z direction in our example here), our mathematical task
formally becomes solving the Schrödinger equation, Eq. (4.34), subject to the boundary conditions

ψq,k(zR) = exp(ika)ψq,k(zL) (4.42)
Now we are being explicit in our notation for the wavefunction ψq,k(z) that we are solving for a
wavefunction in a specific “band” q and at a specific one of the allowed values of k.

For each k we expect to get a set of eigen solutions, indexed by some integer q that we could call the
“band index”, with eigenfunctions

ψq,k(z) = uq,k(z) exp(ikz) (4.43)

and some corresponding set of eigenenergies Eq,k. Effectively, this approach means that for each
allowed k we are really solving for the unit cell function, uq,k(z), and the associated eigenenergies.
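The Bloch relation Eq. (4.41) is also easy to check numerically for any periodic unit cell function. In the sketch below, u(z) is just a made-up function with period a, not a real solution of the Schrödinger equation; the point is only that any such product form satisfies ψ(z + a) = exp(ika)ψ(z):

# Sketch: numerical check of the Bloch relation (4.41) for a toy unit cell
# function with period a (not a real Schrodinger solution).
import numpy as np

a = 1.35                          # repeat length in nm (as in Fig. 4.17)
k = 0.25 * np.pi / a              # the k value used in Fig. 4.17

def u(z):
    return 1.0 + 0.3 * np.cos(2 * np.pi * z / a)   # any period-a function

def psi(z):
    return u(z) * np.exp(1j * k * z)               # Bloch form, Eq. (4.38)

z = np.linspace(0.0, 5.0, 7)
print(np.allclose(psi(z + a), np.exp(1j * k * a) * psi(z)))   # True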
The results of one such calculation for our periodic set of “rectangular” potential wells are shown in Fig.
4.17.
We see from the top trace in Fig. 4.17 that the unit cell function for this particular k value (or for this “k
state” in the band) is indeed periodic with the periodicity of the “crystal” – that is, it is the same in each
unit cell48. The unit cell function for the state in this lower band has no zeros inside each well, as we
might expect since it is formed from the lower state in the isolated potential well.
The middle trace shows the (real part of) the propagating wave “envelope” function. For this particular
chosen value of k, the “envelope” propagating wave goes through one full cycle from the left to the right
of the figure. The bottom trace shows the (real part of) the total wavefunction, which is the product of
the unit cell function and the propagating wave “envelope” function, as we expect for such a Bloch form
of solution. The total wave in Fig. 4.17 corresponds here approximately with the second state from the
bottom in Fig. 4.1449.

44 Except, of course, that, with k values as stated in Eq. (4.40), we have one too many in the Bloch form case.

45 One more substantial “error” in the Bloch form is that it allows a k = 0 solution that is not quite physical for a crystal of actual finite length; that solution would be zero everywhere in the “standing wave” case if the wavefunction has to hit zero at the walls.

46 By using the Bloch form of the solutions, this case of “rectangular” potential wells solved with such periodic boundary conditions can be completed analytically (though the solutions are not in closed form). That particular analytic approach is known as the Kronig-Penney model.

47 This form here, ψ(z + a) = exp(ika)ψ(z), is just as good a statement of the Bloch theorem result as are Eqs. (4.38) and (4.39), and is often used mathematically instead.

48 Or, at least the real part, as shown, though in fact the complete complex unit cell function is also the same in each unit cell.
Fig. 4.17. Calculation of the wavefunctions for a periodic set of potential wells, each like those
in Fig. 4.13 (well width 1.2 nm, barrier width 0.15 nm, giving a repeat length
a = 1.2 + 0.15 = 1.35 nm, and barrier height 1 eV), and for a k value of 0.25π/a ≈ 0.582 nm−1.
The vertical lines are drawn at the positions of the middles of the barriers between adjacent wells.
The real parts of the complex wavefunctions are shown. The calculation is for one state (the
second lowest) in the lower energy band, which corresponds to coupling the “lower” states in
each well.
Essentially, what the Bloch theorem approach has done is transformed a relatively intractable problem
of solving for all the states of a large system of many atoms into a problem that is roughly like solving
for the states of just one “unit cell”, which is more like solving for the states of just one atom. This
problem is somewhat more complicated than the hydrogen atom problem because (i) we have to guess
what the potential energy is, possibly iterating to get some form so the calculated energies agree with
measured energy separations in the material, and (ii) we have to solve the problem again for each value
of k (though just with the same potential). Such “band structure” calculations are a major branch of the
subject of solid-state physics.
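As a concrete illustration of this procedure, the following short numerical sketch (our own illustration here, not taken from the text) solves for the bands of the one-dimensional "rectangular" well crystal of Fig. 4.17 by discretizing a single unit cell with finite differences and imposing the Bloch boundary condition of Eq. (4.42), diagonalizing once for each chosen k; the grid size and the value of ℏ²/2mo in convenient units are the only numerical choices made here.

import numpy as np

# Bloch-periodic finite-difference sketch for the structure of Fig. 4.17:
# well width 1.2 nm, barrier width 0.15 nm (repeat length a = 1.35 nm),
# barrier height 1 eV.
hbar2_2m = 0.0381          # hbar^2/(2 m_o) in eV nm^2 (free electron mass)
a = 1.35                   # repeat length in nm
n = 270                    # grid points in one unit cell
dz = a / n
z = np.arange(n) * dz
V = np.where(z < 1.2, 0.0, 1.0)   # potential: 0 in the well, 1 eV in barrier
t = hbar2_2m / dz**2

def bands(k, nbands=2):
    """Lowest eigenenergies E_{q,k} (in eV) for one Bloch wavevector k (nm^-1)."""
    H = np.zeros((n, n), dtype=complex)
    for j in range(n):
        H[j, j] = 2 * t + V[j]
        H[j, (j + 1) % n] = -t
        H[j, (j - 1) % n] = -t
    # The wrap-around elements carry the Bloch phase exp(ika)
    H[n - 1, 0] *= np.exp(1j * k * a)
    H[0, n - 1] *= np.exp(-1j * k * a)
    return np.linalg.eigvalsh(H)[:nbands]

for k in np.linspace(0, np.pi / a, 5):
    print(f"k = {k:.3f} nm^-1:", bands(k))

Sweeping k across the Brillouin zone and keeping the lowest few eigenvalues traces out bands like those we will see in Fig. 4.18.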
Problems
4.6.1 Consider a ring of 10 atoms spaced by a distance a = 5 nm round the ring
(a) Write out all 10th roots of unity, rn, such that rn¹⁰ = 1. (Choose the way you write these so
that the case of n = 1 corresponds to the very simple root “1”, and try to proceed
sequentially with n as you write down the subsequent roots. You should actually be able to
write a simple formula for these roots as a function of n from 1 to 10, and you can leave
the result in some exponential form, for example.)
49 The propagating wave envelope function in Fig. 4.17 has a different phase compared to the standing wave in Fig. 4.14, so the total wave is shifted laterally in Fig. 4.17, though that phase is arbitrary, so there is no particular meaning to this lateral shift.
(b) Considering the wavefunction of an electron as a function of the distance s round the ring,
write down the periodic boundary condition for the wavefunction as a function of s for this
10-element ring (using units of nm for the distance). Write down the corresponding relation
between the modulus squared of the wavefunction at a point so and a point so + 5 nm.
(c) By considering the Bloch form of the solutions of such a problem (that is, a product of a
unit cell function and a propagating wave “envelope” function with appropriate k values),
calculate all the distinct possible values of k vectors that are allowed for the Bloch
“envelope” functions. (Note: use a “single-sided” list of allowed values of k here, not the
more common “symmetric” list of Eq. (4.40), so corresponding to running from 0 to N − 1
rather than running from − N / 2 to + N / 2 .)
(d) If the electron is in the state with the lowest allowed k vector magnitude, what is the
probability of finding the electron in the first unit cell (the first unit of length a round the
ring, starting from the top of the diagram and going to the right)? What is the probability
of finding it in the sixth unit cell (the one just to the left of the bottom of the diagram)?
How do your answers change if you consider the electron in the state with the fourth lowest
k vector magnitude?
[Note: if you conclude that some aspects of this question are actually trick questions, you
might be right!]
4.6.2 For a wavefunction ψ(z) that is a solution of Schrödinger's time-independent equation for an
electron wave in a periodic potential with period a in the z direction, we must have that
ψ(z + a) = ψ(z). True or false? (Justify your answer briefly.)
4.6.3 Consider a system with some specific periodic potential where VP(z + a) = VP(z). Suppose we
know that an eigenstate of such a system is
ψ(z) = exp[−i((2π/a) − k)z] − exp[i((2π/a) + k)z]
Rewrite the eigenstate in the Bloch form ψ(z) = u(z) exp(ikz), where u(z) is a function that is
the same in every unit cell. [Hint: think about how trigonometric functions can be written using
complex exponential functions.]
4.6.4 Suppose that, in a periodic potential of period a in one dimension, a function
ψ1(z) = u1(z) exp(ik1z), where u1(z) = u1(z + a), is known to be a solution of the one-electron
time-independent Schrödinger equation under periodic boundary conditions, with some specific
eigen energy E1. Consider now the function ψ2(z) = u2(z) exp(ik2z) where k2 = k1 + 2π/a.
Give an expression for u2(z) that will guarantee that ψ2(z) is also a solution of the same
Schrödinger equation with the same eigen energy E1.
4.7 Band structures
Fig. 4.18. Band structure (eigenenergies as a function of wavevector k) for a periodic structure of
rectangular wells (well width 1.2 nm, barrier width 0.15 nm, so a repeat length
a = 1.2 + 0.15 = 1.35 nm, and barrier height 1 eV). The dark-colored dots are for N = 8, the
medium-colored dots show the additional results for N = 16, and the light dots show the
additional results for N = 32. The solid lines show the limiting behavior for very large numbers
of periods.
First, note that the k values for these dots are equally spaced, by an amount 2π / Na , and, consistent
with Eq. (4.40), these run from −π / a to +π / a . To get the correct number (8) of distinct states, we
could just leave out either one of the “end points” – that is, either the −π / a or the +π / a value for k.
We can see that, as we have “pushed” the “atoms” together to make this “crystal”, we can keep the same
counting of states. That is, some given atomic level in each of N atoms has turned into N energy levels
in a band in the combined system, with the same such counting behavior for each atomic level.
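To make this counting explicit, here is a tiny sketch (our own illustration) that lists the N equally spaced k values for N = 8 periods with the repeat length of Fig. 4.18, dropping one end point so that exactly N distinct states remain:

import numpy as np

# Allowed Bloch k values: spaced by 2*pi/(N*a), spanning one Brillouin zone.
a = 1.35   # repeat length in nm, as in Fig. 4.18
N = 8
n = np.arange(-N // 2 + 1, N // 2 + 1)   # -3, ..., +4: exactly N integers
k = 2 * np.pi * n / (N * a)
print(k)   # in nm^-1; runs up to +pi/a, with the -pi/a end point left out

Doubling N to 16 or 32 simply interleaves new k values between the old ones, just as the dots fill in between one another in Fig. 4.18.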
Second, note that, for each value of k, within the energy range from 0 to 1 eV for which we performed
calculations here, there are two possible energy eigenvalues, as given by the “red” dots for the lower
band and the “blue” dots for the upper band.
Third, as we increase N, here by doubling N first to 16, and then to 32, we end up merely “filling in” the
spaces between the existing “dots”, retaining all the previous solutions. The solid lines show the limiting
cases for large N. Note in particular that the energy width of the bands does not increase as we keep on
increasing N to arbitrarily large numbers. We end up with energy bands of definite width, as shown for
the lower and higher energy bands 1 and 2 in Fig. 4.18. Note also that these bands are smooth and
continuous once we increase to large N, and all the allowed energy solutions lie on these curves.
Fourth, we see in this case that there is a range of energies between the two bands for which there are
no energy eigenstates. Such energy ranges are called “band gaps”, with a band gap energy EG. The
existence or otherwise of band gaps in a band structure can be very important in influencing both the
electrical and optical properties of a material.
Here we have only calculated for the allowed set of N different k values. Mathematically, we could keep
on performing calculations for further values of k spaced by further amounts 2π / Na . Essentially, we
would just be continuing the band structure of Fig. 4.18 horizontally. If we did that, what we would find
is that we would just keep repeating the same energy solutions; we could just take the curves in Fig.
4.18 and copy them sideways, shifting them over by an amount 2π / a each time. The total
wavefunctions for these solutions would also just be the same as for the corresponding solution in the
diagram in Fig. 4.18. Any one such 2π / a range of k values can be called a “Brillouin zone”.
Conventionally, the diagram centered round k = 0 , like the one in Fig. 4.18, is referred to as the first
Brillouin zone, and the point k = 0 is referred to as “zone center”. We only ever need to calculate the
results for the first Brillouin zone because it contains all the distinct solutions.
Crystal momentum
For a particle in a uniform potential, we know that the eigen solutions take the form of plane waves. For
example, in one dimension, for a particle of mass mb with a given energy E, solving Schrödinger's
equation would give a wavefunction ∝ exp(ikz) for a "forward-going" wave, with corresponding eigen
energy E = ℏ²k²/2mb. We also know from de Broglie's hypothesis that the magnitude of the
momentum for such a situation is p = ℏk.
Now, in the case of these solutions in Bloch form, we also have as the “envelope” part of the solution a
propagating wave of the form exp ( ikz ) . We can if we want define a similar concept here that we call
the "crystal momentum" pC, which we write down with a formula that looks essentially the same as de
Broglie's hypothesis,
pC = ℏk    (4.44)
where the k here is the k in the propagating "envelope" function.
This quantity turns out to have many effective properties that behave like actual momentum for an
electron in a crystal. In many ways, it can appear as if an electron in the crystal has such a momentum.
We should be clear, though, that, because we are only considering the “envelope” wavefunction of the
form exp ( ikz ) here rather than the full wavefunction of the electron, pC is not actually the momentum
of an electron in the crystal; we could call this effective momentum a “pseudo-momentum”.
It can, for example, appear that an electron accelerated by a force in the crystal acquires just such an
effective momentum, and in processes like optical absorption, it can appear that “momentum” overall is
conserved when we use this pseudo-momentum as the effective momentum of the electron. In all these
cases, in fact we deduce this effective conservation of this pseudo-momentum from deeper analysis of
the actual problem of interest – the effective “conservation” of this pseudo-momentum is something that
comes out of that analysis, not a principle we put into the analysis. Nonetheless, this concept turns out
to be so useful in sufficiently many situations that it is common in discussions of electrons in crystals to
refer, loosely, to this crystal momentum as the electron momentum. One particular example of this
occurs when we consider effective mass.
Effective mass
We have seen already in our example calculations for the case of a periodic structure of “rectangular”
wells that we get a band structure that has minima and maxima at particular points in the Brillouin zone
(that is, for particular values of k). Near some minimum or maximum, at least for some small range of
k, we could fit the energy as a function of k with a parabola, as sketched in Fig. 4.19.
For such a parabola centered round k = 0, we could write down an approximate formula for the electron
energy, relative to the energy at the minimum or the maximum, Eo, that, by definition of a parabola,
would have the form E − Eo ∝ k². One convenient way of writing down the proportionality constant is
to write this relation in the form
E − Eo = ℏ²k²/2meff    (4.45)
This parameter meff , which we call the “effective mass”, is then just a parameter with dimensions of
mass that characterizes how shallow or steep this parabolic variation of energy is near to k = 0 . A large
value of meff – a large “effective mass” – corresponds to a shallow parabola in Fig. 4.19, and a small
value – a small “effective mass” – corresponds to a steep parabola.
Fig. 4.19. Sketches of hypothetical band structures with a maximum in a lower “valence” band
and a minimum or minima in an upper conduction band, together with parabolas (dashed lines)
fitted to the maxima and the minima. The sketch on the left shows a “direct band gap” material,
with the conduction band minimum and the valence band maximum at the same k. The sketch on
the right shows an “indirect band gap” material, with the conduction band minima and the valence
band maximum at different k.
If we then treat pC = ℏk as an effective momentum, then we get a relation between the energy E − Eo
and this effective momentum that is of the form
E − Eo = pC²/2meff    (4.46)
which is consistent with this effective momentum behaving in many ways like momentum for a particle
of an effective mass meff. (For minima or maxima centered round some different value ko of k, as in the
conduction band case on the right of Fig. 4.19, we would replace k with k − ko in Eq. (4.45).)
This idea of effective mass is very useful, especially when discussing the behaviors of electronic and
optoelectronic semiconductor devices. It is important to remember that it is only an effective concept,
albeit a very useful one. We are not in fact changing the mass of the electron. But the band structures
we get can usefully make it appear as if the electron mass is different.
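As a small worked example of Eq. (4.45) (the numbers here are invented purely for illustration, not taken from any particular material), we can extract an effective mass from a single sampled energy near a parabolic band minimum:

# Inverting Eq. (4.45): m_eff = hbar^2 k^2 / [2 (E - Eo)], in SI units.
hbar = 1.0546e-34         # J s
m_o = 9.109e-31           # kg, free electron mass
dE = 0.025 * 1.602e-19    # assumed E - Eo = 25 meV, converted to joules
k = 0.3e9                 # assumed k = 0.3 nm^-1, converted to m^-1
m_eff = hbar**2 * k**2 / (2 * dE)
print(m_eff / m_o)        # about 0.14, i.e. m_eff is roughly 0.14 m_o here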
In two or three dimensions, the Bloch form generalizes to ψq,k(r) = uq,k(r) exp(ik·r),
where the unit cell function is the same in every unit cell of the crystal (in all two or three directions),
and the components of the wavevector k all obey relations similar to Eq. (4.40) with the appropriate
"repeat lengths" a for each direction (which will be different for each direction if the unit cell has
different lengths in each direction).
We can construct band structures for two- and three-dimensional crystals in similar ways to those
discussed here for one dimension. Of course, such crystals can be more complicated than those in one
dimension, with potentially much more complicated forms of unit cells; to plot out the band structure,
we would need a four-dimensional diagram, with three dimensions to correspond to the components of
a vector k in the Bloch form, and a fourth dimension to indicate the corresponding energies. Of course,
we cannot readily do that graphically. As a compromise, band structures for such two- or three-
dimensional crystals are usually just plotted along very specific directions, allowing a band structure
diagram that is not very different at first sight from those we have plotted for one dimension. The
horizontal axis will now correspond to the magnitude of k along a specific direction. It is also common
to plot the band structure for several different specific directions in one overall graph.
To understand the basic meaning of band structures for three-dimensional crystals, for example, we need
first to understand the Brillouin zone in three dimensions. Now, rather than being just a one-dimensional
range of one k, it needs to be a three-dimensional construction that shows the range of each of the three
components of k. Different crystal structures have different forms of such three-dimensional Brillouin
zones. The one corresponding to diamond or zinc blende lattices, in the way it is usually drawn, is shown
in Fig. 4.20. Note that this Brillouin zone has a three-dimensional shape.
We might want to plot out the band structure along the x, y, and z directions, for example. Actually, for
a crystal with such underlying cubic symmetry, we only need to do this along one of these three
directions. The band structure along the x direction would look the same as that along the y direction or
along the z direction.
Fig. 4.20. The Brillouin zone for the face-centered cubic lattice, showing the ranges for the x, y,
and z components – kx, ky, and kz – of the k vector. The Γ (gamma) point is at k = 0 , that is, in
the center. The X direction is along a cube edge direction. The L direction is along the space
diagonal direction of the cube.
Therefore, we could plot the band structures from the so-called Γ (gamma) point (the point at k = 0 )
out to the point X along the x direction, as shown in Fig. 4.20. Another interesting direction might be
along the “space-diagonal” direction in the cube. In Fig. 4.20, that corresponds to the direction out to
the L point. One other simplification is that band structures themselves are generally symmetric about
the center of the Brillouin zone (the Γ point)50. We see that symmetry already in the simple band
structures we calculated for the one-dimensional lattice of rectangular wells, as in Fig. 4.18. As a result,
we only need to plot in the “positive” half of the Brillouin zone.
When band structures are plotted, therefore, we can choose to use the “right” half of the plot for the
band structure along one direction, such as the band structure out to the X edge point on the Brillouin
50 This symmetry is known as Kramers degeneracy. For each point on one side of the Γ point, there is one with the same energy symmetrically on the other side; technically, these are associated with opposite spin values, though states of opposite spin often have the same energy anyway, so band structures are often symmetric even for the same spin.
zone, and the “left” half of the plot for the band structures along some other direction, such as out to the
L edge point (that is, in the “space diagonal” direction).
Fig. 4.21. Sketch of the upper valence and lower conduction bands for silicon, after calculations
by K. S. Sieh and P. V. Smith, Phys. Status Solidi (b) 129, 259 (1985).
Fig. 4.22. Sketch of the upper valence and lower conduction bands for gallium arsenide, after
calculations by M. Rohlfing, P. Krüger and J. Pollmann, Phys. Rev. B 48, 17791 (1993).
Example forms of the band structures for silicon and for gallium arsenide are plotted in Fig. 4.21 and
Fig. 4.22. In general, for band structures, we typically only plot some of the bands, usually including
just the highest completely full bands (usually, valence bands) and the lowest completely or partially
empty bands (usually, conduction bands).
Each band is formed from some combination of atomic states. The combinations associated with “core”
electrons in the atom, while they do form bands, are usually of little technological importance, being
still very tightly bound to the nucleus, and they may contribute relatively little to the bonding into a
crystal form. So we typically do not plot those bands. We are most interested in the electrons in the
outer, highest energy “valence” set of orbitals, just as we are generally in chemistry also.
For Group IV materials like carbon, silicon and germanium, in the outer, highest energy orbitals, we are
dealing with the combination of 2 “s” orbitals and 6 “p” orbitals (where we include both spin states),
which gives 8 orbitals that together hold 4 electrons. Putting atoms of one such type together to form a
crystal leads to the formation of two sets of bands; at least loosely, one of these sets of bands is based
on “bonding” versions of the combined orbitals of the different atoms and the other based on
“antibonding” versions of those same orbitals. The bonding versions can have lower energy than the
antibonding versions.
If those “bonding” and “antibonding” sets of bands do not overlap in energy, the bonding versions will
be essentially fully occupied, to give what are called the “valence” bands, and the antibonding versions
will be essentially unoccupied, to give what are called the “conduction” bands51. Such non-overlapping
band behavior, with a band gap energy between them, is characteristic of a semiconductor or an
insulator52. In practical terms, the GaAs material and its various other III-V cousins behave essentially
like effective Group IV materials, with similar behavior of the band structures.
We will return to discuss band structures and their consequences for physical properties and devices
later. For the moment, we note a few aspects of these band structures for Si and GaAs. First we see that
both of these band structures do show a band gap between the highest valence bands, which are all
nominally fully occupied, and the lowest conduction bands, which are all nominally empty; this finite
separation by a band gap energy is a necessary requirement for “semiconductor” behavior, which we
will discuss in more detail later. By convention, the energy origin is taken at the top of the highest
occupied valence band; this is just an arbitrary choice – the actual zero of energy there has no absolute
significance.
In contrast to our simple one-dimensional coupled well cases, we have multiple bands here that do
overlap in energy, and in some cases actually coincide with one another at specific points in the Brillouin
zone. It is still the case, though, for all such band structures that, for N atoms in the crystal, each band
has N states in it (multiplied by 2 if electrons of different spins also have the same energy), and all bands
have the same set of equally spaced k values associated with them.
The shapes of these bands are also more complicated than those of the simple one-dimensional
“rectangular well” case. If the lowest point in the conduction band does not lie at the same point in the
Brillouin zone (that is, at the same k value) as the highest point in the valence band, the material is said
to have an “indirect” band gap (see also the right-hand diagram in Fig. 4.19), which is the case for
silicon. By contrast, GaAs has a “direct” band gap, with the minimum in the conduction band lining up
with the maximum in the valence band53 (see also the left-hand diagram in Fig. 4.19). Whether a material
has a direct or an indirect band gap greatly influences its optical properties, as we will see later.
This whole notion of band structures, which is mathematically enabled by the Bloch theorem (Eqs.
(4.38), (4.39), and (4.40)), is very widely used in discussing a broad range of electrical and optical
properties of crystalline solids, and therefore forms the scientific basis of much of modern electronic
and optoelectronic device technology.
51 For molecules, the terminology HOMO, for highest occupied molecular orbital, and LUMO, for lowest unoccupied molecular orbital, is often used.
52 We will return to discuss the difference between semiconductors and insulators, which is really a quantitative one based on the number of thermally excited electrons we have at normal temperatures; that number in turn depends on the band gap energy.
53 In this particular case, the minimum and maximum are both at k = 0, though that is not a necessary requirement for a direct band gap.
Problems
4.7.1 For a direct band gap semiconductor, if the lowest energy in the conduction band lies at k = 0 ,
there must also be a minimum in the electron energy in the highest valence band at k = 0 . True
or false? (Justify your answer briefly.)
4.7.2 In a band structure, one of the bands has a minimum at zone center. At a point in the Brillouin
zone near that minimum at which the k vector has a magnitude of 0.5 nm⁻¹, the energy of an
electron in this band is +100 meV relative to the energy in that band at zone center. Presuming
we can approximate this band minimum by a parabolic variation of energy with k, what value of
effective mass does this minimum correspond to, expressed in units of the free electron mass mo
(that is, if you think the effective mass is b × mo , your answer would be b)?
4.7.3 The band structures for three-dimensional crystals are usually just plotted along very specific
directions because it requires a four-dimensional diagram to plot the energy E as a function of
kx, ky, and kz all on one diagram. Therefore, typically the band structures are plotted from the
center Γ point out to the point X along the x direction, and to the point L along the “space
diagonal” direction, for the first Brillouin zone (Fig. 4.19 – Fig. 4.22) in a crystal with an
underlying cubic kind of symmetry of the crystal structure.
(a) By looking at the band diagram of silicon (Fig. 4.21), determine how many equivalent
conduction band minima silicon has in its lowest conduction band. That is, how many
distinct “valleys” at or near the ∆ (delta) point(s) are there when we think about the crystal
and the Brillouin zone in all three dimensions?
(b) By looking at the band diagram of GaAs (Fig. 4.22), determine how many equivalent
conduction band minima GaAs has in its lowest conduction band (considering only the
lowest energy minimum or minima). In other words, at how many different points in the
GaAs Brillouin zone in three dimensions will we find the same (lowest) minimum energy?
(c) In the band diagram of GaAs (Fig. 4.22), we can see that the second lowest minimum in
the conduction band as drawn here lies at the “L” point, at one edge of the Brillouin zone.
How many equivalent such “second” conduction band minima are there for GaAs? [Hint:
Think about what happens if a minimum “valley” is shared by two Brillouin zones.]
4.8 Electrons in bands
A key distinction is between insulators and semiconductors on the one hand, and metals on the other hand. In insulators and semiconductors, the counting of the available
valence electrons is such that, at least at low temperature, ideal perfect semiconductors and insulators
have bands that are either completely full or completely empty. The highest occupied bands are called
the valence bands, and the lowest unoccupied bands are called the conduction bands. Both
semiconductors and insulators have a finite band gap energy EG separating the highest valence band and
the lowest conduction band.
Fig. 4.23. Schematic illustration, in a one-dimensional case, of band structures (i) on the left, of
insulators and semiconductors (at least at low temperature), and (ii) on the right, of metals.
In metals, the counting of electrons and band states is such that there is a band that is partially filled with
electrons, even at low temperature. Conventionally, that band is also called the conduction band54.
Metals may or may not have band gaps between various bands. Band gaps are relatively unimportant
for metals because the electrical properties of metals are largely determined by the electrons in the
partially filled conduction band. Band gaps are, however, crucial for the properties of
semiconductors and insulators.
In the case of a metal, with no applied electric field, of course we expect no current flowing. Indeed, for
the kind of symmetric filling of the metal conduction band as shown in Fig. 4.23, there is no current.
With a metal, if we apply an electric field, we will tend to “skew” the distribution of the electrons to one
side or the other in our band structure diagram drawn as a function of k; that “skewing” does correspond
to a flowing current in one direction or the other55. Of course, we can only skew the distribution if there
are empty states into which to “skew” the electrons.
In the case of a semiconductor or an insulator, rather obviously if a given band, such as a “conduction”
band, is empty, it cannot contribute to the current. Somewhat less obvious, however, is that a full band
cannot conduct current either. Since all the states are full, we can make no changes in what states are
54 Indeed, this idea of a conduction band only makes literal sense for metals. The "conduction" band does not necessarily "conduct" electrical current for semiconductors or insulators.
55 We can think of this as unbalancing the momentum distribution of electrons, giving more with momentum in one direction than in the other. The actual calculation of currents from band structures is quite subtle, however, requiring the concept of group velocity to calculate it correctly. It is, though, still the case that this "skewing" of the electron distribution does correspond to current flow.
occupied56; we cannot “skew” the distribution of electrons, for example. So, with bands that are either
completely full or completely empty, there is no current flow; hence an insulator can insulate.
Fig. 4.24. Schematic illustration of bands in real space, in one direction (i) on the left, of insulators
and semiconductors (at least at low temperature), and (ii) on the right, of metals.
Fig. 4.24 shows the bands plotted another way, still with increasing electron energy on the vertical axis,
but with ordinary position on the horizontal axis. This kind of diagram does not show the k-states or the
band structure as a function of k, but it is often used when discussing electronic and optoelectronic
devices because those are typically made of multiple different materials joined together, and we need to
see the behavior of band edges and carrier distributions especially near the junctions between the
materials. On this diagram it is also clear that the insulator and the (low-temperature) semiconductor do
not conduct electricity. Also in real space there is nowhere for the electrons to go in the valence band –
all the states are already full – and obviously we cannot conduct in the “conduction” band because there
are no electrons to give the conduction. For the metal, there are certainly available positions for electrons
to move to, so conduction is possible.
To understand why a semiconductor can conduct electricity, and how we can achieve quite substantial
control of that conduction to make transistors and other electronic devices, we need to understand first
how finite temperature affects the precise distribution of electrons. Simply put, once we consider finite
temperature, there can be small populations of electrons thermally excited across the band gap energy,
and one result of that thermal excitation is that we can now conduct electricity, at least to some degree,
in a semiconductor. The difference between semiconductors and insulators is essentially just the size of
the band gap energy, which is relatively small for semiconductors, and relatively large for insulators.
With a small band gap energy, that thermal excitation across the band gap is a stronger process, and
hence such small band gap energies result in materials that partially conduct electricity (even when the
materials are pure).
56 Applying normal electric fields to semiconductors or insulators does not typically cause electrons to move from one band to another, though, with very high fields under some conditions, that kind of "interband" process is possible. This is the kind of mechanism that can then allow electrical breakdown of insulators.
57 If you have dug a hole in the ground, but then decide it is in the wrong place, and it needs to be moved to the right, then to do so, you end up moving soil to the left.
Fig. 4.25. Sketch of n-type doping (left) with added donor atoms giving donor levels just below
the conduction band and leading to electrons ionized into the conduction band, and of p-type
doping (right) with added acceptor atoms giving acceptor levels just above the valence band
leading to “holes” ionized into the valence band.
Fig. 4.26. Schematic illustration of bands in real space, in one direction, for an n-doped and a p-
doped semiconductor.
Under normal conditions of room temperatures and moderate levels of doping, the simple model of one
added electron in the conduction band for each added donor atom or one added hole in the valence band
for each added acceptor atom is roughly correct in typical semiconductors. It is not, however, quite
exactly correct, and that simple picture does not explain what happens as we change temperature or if
we use a high doping density (a large number of dopant atoms per unit volume); it also does not explain
why adding dopants to insulators, or even in some cases to wide band gap semiconductors, often has
little or even sometimes no influence on their conductivity.
A more sophisticated model of doping is to say that adding in, say, a donor atom adds in an electron and
also a “donor” energy level that the electron can occupy, as sketched in the left of Fig. 4.25. Under the
normal conditions (e.g., room temperature or above) at which we operate semiconductor devices, most
of the electrons associated with such donor atoms are effectively “ionized” by the available thermal
energy to become electrons in the conduction band. Similarly, for acceptors and holes, most of the
holes associated with the acceptor atoms are effectively "ionized" to become holes in the valence band.
Analogously to Fig. 4.24, we can also show these n-doped and p-doped semiconductors with ordinary
position on the horizontal axis (Fig. 4.26). Again, this kind of picture is common when discussing device
structures that are made with layers of different materials.
In insulators, for a variety of reasons, adding impurities may not lead to them conducting electricity.
One reason can be that the energies of donor and acceptor levels can be too far from the corresponding
band edges, and the thermal ionization of those “dopant” atoms does not occur to a sufficient degree.
Problems
4.8.1 At very low temperature in a metal, there will be no electrons in the conduction band. True or
false? (Justify your answer briefly.)
4.8.2 A perfectly pure semiconductor material will have equal numbers of electrons in the conduction
band and holes in the valence band, and so it will not conduct electricity because the electron
and hole currents will exactly cancel. True or false? (Justify your answer briefly.)
4.9 Conclusions
Here we have seen that, starting from the Schrödinger equation, and with the addition of the key ideas
of the Pauli exclusion principle and spin, we have been able to build up a deep framework for
understanding atoms and materials, especially crystals.
The solution of the hydrogen atom problem lays the basis for understanding the orbitals of atoms. The
Pauli exclusion principle, together with the approximate notion of each electron seeing an effective
average potential from all the other electrons and atomic nuclei, lets us understand more complicated
atoms. A simple picture of coupled systems gives us the basis for understanding core aspects of chemical
bonding, such as bonding and antibonding states. Extending that picture into periodic systems, especially
with the simplifying mathematical approach of the Bloch theorem, lets us see the emergence of bands
of states in crystals. This band picture gives us a solid basis for understanding many of the electronic
and optical properties of important materials for devices.
Though the details of all of these phenomena can become quite involved and difficult to calculate in
practice, we now have the basic quantum mechanical ideas and vocabulary we need for use in a broad
range of applications.
Thermal distributions
5.1 Statistics and physics
Until the last 30 years of the 19th century, heat and the phenomena connected with it were generally
regarded as being based on continuous variables. For example, energy was not presumed in any way to
be discretized into “chunks”, nor were atoms or other particles considered only to occupy specific
countable states. The concept of entropy was emerging from this background as scientists and engineers
were trying to understand the limits to efficiency in steam engines. Entropy in such purely
thermodynamic discussions is a very abstract idea and is quite difficult to understand. In turn, that makes
the Second Law of Thermodynamics – which in one form states, essentially, that entropy always
increases – even more difficult to comprehend.
The idea that thermodynamics, and entropy in particular, had anything to do with statistical distributions
was initially an unusual concept. Indeed, there was at that time in the latter part of the 19th century no
basis for saying that any kind of discrete states were involved in describing matter. Such ideas did not
emerge in their own right until the advent of quantum mechanics in the 20th century. Despite this absence
of the ideas of countable states for matter, these ideas of statistical distributions and a resulting firm
underpinning for all of thermodynamics emerged from the work of James Clerk Maxwell, Josiah Gibbs,
and, especially, Ludwig Boltzmann, primarily in the last part of the 19th century. These ideas remained
very controversial throughout Boltzmann’s lifetime, only finally being vindicated convincingly in the
early 20th century.
Arguably, the statistical basis behind all of thermodynamics makes it much more directly
comprehensible. In particular, entropy then has a relatively straightforward and tangible meaning1.
1 The ideas of entropy are also closely tied in with the modern concept of information, though we will not be able to discuss that link in any detail here. "Bits" are, however, actually a measure or unit of entropy in information.
5.2 Tossing coins
Suppose we toss a coin some number of times, N, recording the specific sequence of "heads" and "tails" we get; each such specific sequence is a "microstate". We can group these microstates into "macrostates", where a macrostate contains all the microstates with a particular number of "heads" and "tails". The number of microstates in a macrostate is called the "multiplicity" of the macrostate.
Fig. 5.1. Different possible results (“microstates”) for tossing a coin 4 times, grouped into
“macrostates” depending on the number of ways of getting given numbers of “heads” and “tails”,
with “multiplicities” representing the number of microstates in a macrostate. The “excess”, 2se,
is the number of “heads” minus the number of “tails” in a given macrostate.
Fig. 5.2 shows these multiplicities for this case of N = 4 coin tossings (top left graph), and for larger N,
plotted against the “excess”, 2se, which is the difference between the number of “heads” and the number
of “tails”2. se itself is the number by which the number of “heads” exceeds the average number of
“heads”, that average being ( N / 2) = 2 here for N = 4 .
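The grouping of Fig. 5.1 is easy to reproduce by brute force. This short sketch (our own illustration) enumerates all 2⁴ microstates for N = 4 tosses and tallies the multiplicity of each macrostate:

from itertools import product
from collections import Counter

# Enumerate every sequence ("microstate") of 4 coin tosses and group by
# the number of heads (the "macrostate"); multiplicities should be 1,4,6,4,1.
N = 4
counts = Counter(seq.count("H") for seq in product("HT", repeat=N))
for heads in sorted(counts):
    print(f"heads = {heads}, excess 2se = {2*heads - N:+d}, "
          f"multiplicity = {counts[heads]}")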
We can see various behaviors in Fig. 5.2. First, we note that the single most likely outcome is that we
will have equal numbers of heads and tails, with 6 microstates corresponding to that macrostate – that
is, with a multiplicity of 6. Outcomes or macrostates with differing numbers of heads and tails are less
likely – that is, they have smaller multiplicities. We see that as we increase N, the number of times we
toss the coin, this behavior persists. In all cases, as the difference between the number of heads and tails
becomes larger – that is, as we move further from the center in any of the graphs in Fig. 5.2 – the
multiplicity drops.
Second, we see that the multiplicity at the peak – that is, at se = 0 – becomes very large as we increase
N. For large N, the number of sequences of coin tossings in which the number of heads and the number
of tails is the same or similar becomes extremely large, and starts to dominate over all other possibilities3.
2 Our notation here anticipates calling 2se the "spin excess" – the difference between the numbers of "spin-up" and "spin-down" in a set of spins.
3 This phenomenon is the source of the error in the "gambler's fallacy". The gambler thinks that just because the coin has come up "heads" several times in a row, it is more likely on the next coin toss that it will come up "tails". The gambler believes this because he knows that, on the average, the fraction of heads and tails does come out to about 50% each, and so he thinks there must be some mechanism, which he sometimes calls "the law of averages" (a non-existent statistical "law"), that somehow balances everything out. In fact, all that is going on is that there are just many more possible sequences in which there are approximately equal numbers of heads (H) and tails (T), and any sequence is presumed to be equally likely – HHHHHHHH is just as likely as HTTTHTHH. This phenomenon of the dominance of sequences with near equal numbers of heads and tails is correctly viewed as an example of the "law of large numbers", a statistical law that does exist.
Third, with increasing N, the distribution is getting wider – for N = 128 we can see by eye that the width
of the distribution with se is maybe something like ~ 20 or so, which is obviously wider than in any of
the other graphs in Fig. 5.2 – but, importantly, compared to the number N of coin tossings, it is becoming
a smaller fraction.
Fig. 5.2. Multiplicities (bars) for different N (number of coin tosses or number of spins) as a
function of 2se, the excess number of heads over tails or “spin-up” over “spin-down” values. The
curves are the Gaussian approximation to the multiplicity for large N.
The formula that gives these bar graphs is quite straightforward, and is an example of the binomial
distribution. To understand this counting, we can consider that we have N coins in a row; we can think of
them as all being "heads" to start with if we like, though it does not matter in the end for this argument4.
We take some number k of them, and flip them over (to "tails", if we like). We thereby create two sets,
the flipped-over set with k coins and the un-flipped-over set with N − k. There are N ways we can
choose the first coin to flip over, N − 1 ways we can choose the second, and so on, down to N − k + 1 for
our kth choice; multiplying these together gives us
N(N − 1)(N − 2)⋯(N − k + 1) = N!/(N − k)!    (5.1)
There are also k ! different orderings in which we could have chosen which coins to flip over while still
leaving us the same sets of flipped-over and un-flipped-over coins in the row, so we divide by k ! , giving
our final expression for the total number of different-looking rows of coins in which k of them are flipped
over, that is
4 Rather than talking about flipping coins over, we can just talk about choosing to set some coins to tails, and setting all the others to heads.
g = N!/[(N − k)! k!]    (5.2)
which is therefore our counting of the multiplicity of a macrostate corresponding to k "heads" and
N − k "tails"5.
For our arguments here, we prefer to write this expression in terms of se, which would give us
(N/2) + se for the number of "heads" and (N/2) − se for the number of "tails"6. In that case, we will
write for the multiplicity in this problem
g(N, se) = N!/{[(N/2) + se]! [(N/2) − se]!}    (5.3)
It is easy to check these formulas Eqs. (5.2) and (5.3) against the results in Fig. 5.1 and Fig. 5.2. For
example, for k = 1 "heads" out of N = 4 coin tossings, we would have, from Eq. (5.2)
g = (4 × 3 × 2 × 1)/[(3 × 2 × 1) × 1] = 4    (5.4)
and for se = 2, an "excess" 2se = 4, which corresponds to 4 more "heads" than "tails", we would have
from Eq. (5.3)
g(4, 2) = 4!/[(2 + 2)! 0!] = 4!/(4! 0!) = 1    (5.5)
where we remember that 0! = 1 .
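Such checks are also easy to automate. A short sketch (illustrative only) evaluating Eqs. (5.2) and (5.3) directly:

from math import comb, factorial

def g_k(N, k):    # Eq. (5.2): multiplicity for k "heads" out of N tosses
    return comb(N, k)

def g_se(N, se):  # Eq. (5.3): the same multiplicity in terms of the excess se
    return factorial(N) // (factorial(N // 2 + se) * factorial(N // 2 - se))

print(g_k(4, 1))                            # 4, as in Eq. (5.4)
print(g_se(4, 2))                           # 1, as in Eq. (5.5)
print([g_se(4, s) for s in range(-2, 3)])   # [1, 4, 6, 4, 1], as in Fig. 5.1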
We want to be able to draw conclusions about large physical systems, with numbers of atoms that may
be 10²⁰ or more. To look at what happens as we increase to even larger N, we can use an approximation
to this binomial distribution, which in turn is based on an approximation for the factorial n! for large n.
This approximation
log(n!) ≈ (1/2) log(2πn) + n log n − n    (5.6)
is known as Stirling's formula7,8.
Note, incidentally, that in fundamental mathematical and physical expressions such as Eq. (5.6),
logarithms are usually implicitly or explicitly to the base e ≈ 2.718281828459. Sometimes such
"natural" logarithms are written as "loge" or as "ln", which is more explicit than just writing "log". The
"ln" (i.e., "el en") notation is both explicit and compact, but unfortunately it is easily confused in most
5 The binomial coefficient g = N!/[(N − k)! k!] we write here is a standard result in combinatorics, sometimes known as "N choose k", and is also written in notations like C(N, k), NCk and the "stacked" form with N written above k in parentheses.
6 We are implicitly assuming here for simplicity that N is even. This is not a necessary restriction for dealing with this problem, but it makes the algebra somewhat easier, and since N will be large in the following argument, this will not matter in the end.
7 Equivalently, n! ≈ √(2πn)(n/e)ⁿ where e here is the base of the natural logarithms. Often, Stirling's approximation is written dropping the first term as being relatively insignificant at large n, giving log(n!) ≈ n log n − n, though we need the fuller form here for one further result.
8 James Stirling (1692-1770) was a Scottish mathematician.
typefaces with “In” or even “one n”. Because nearly all the logarithms we use will be such natural
logarithms, we will use the notation “log” to refer to the natural logarithms; we will explicitly use
notations of the form “log2” or “log10” to refer to logarithms to the base 2 or to the base 10 if we need
them.
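As a quick numerical sketch (illustrative) of how accurate Eq. (5.6) is, we can compare it with the exact log(n!), computed here via the log-gamma function:

import math

def stirling(n):   # Eq. (5.6)
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)   # log(n!) exactly
    print(n, exact, stirling(n))
# Even at n = 10 the two agree to better than 0.1%.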
Using this approximation, Eq. (5.6), we can derive a convenient result. For large N, we can
approximately rewrite the expression in Eq. (5.3) as
g(N, se) ≈ g(N, 0) exp(−2se²/N)    (5.7)
where
g(N, 0) ≈ √[2/(πN)] 2ᴺ    (5.8)
The proof of this result is relatively straightforward, starting from Eq. (5.3) and from Stirling’s formula,
Eq. (5.6), and is left as an exercise for the reader9. A function of this type as in Eq. (5.7) is called a
Gaussian, and it gives a characteristic “bell-shaped” curve.
Fig. 5.3. Multiplicity relative to the maximum using the Gaussian approximation (i.e., plotting
exp(−2se²/N) for large N) as a function of se/N, the relative difference of the number of
"heads" or "spin-up" states compared to the average.
The expressions Eqs. (5.7) and (5.8) are used to draw the limiting curves in Fig. 5.2. We can see by eye
that, even for these relatively small values of N, this approximation is a good one. We can now use Eq.
(5.7) to examine the multiplicity for very large N. The relative multiplicity10 is plotted in Fig. 5.3 for N
of 10³, 10⁴, 10⁵ and 10⁶. As N becomes large, the absolute width does continue to grow, but the relative
width becomes smaller, as we see in Fig. 5.3.
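Here is a small sketch (illustrative) comparing the exact multiplicity of Eq. (5.3) with the Gaussian form of Eqs. (5.7) and (5.8), for N = 128 as in the last panel of Fig. 5.2:

import math

N = 128
def g_exact(se):
    return math.comb(N, N // 2 + se)            # Eq. (5.3)

def g_gauss(se):                                # Eqs. (5.7) and (5.8)
    return math.sqrt(2 / (math.pi * N)) * 2.0**N * math.exp(-2 * se**2 / N)

for se in (0, 4, 8, 16):
    print(se, g_exact(se), f"{g_gauss(se):.4g}")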
We can see directly from Eq. (5.7) that the value of se for which the distribution falls to 1/e² of its peak
value is
sec = √N    (5.9)
9 Other than standard manipulations of logarithms, the only additional step required is approximating log(1 + x) ≈ x − x²/2 for small x.
10 By relative multiplicity we mean the multiplicity relative to that at se = 0.
which is therefore a characteristic width of the distribution. So, though the width of the distribution
grows as √N, for the relative width we have
sec/N = 1/√N    (5.10)
which expresses the fact that the relative width of the distribution decreases inversely with the square
root of N as N increases. So, for example, if we tossed a coin N = 10²⁰ times, the average number of
heads or tails would be 0.5 within roughly 1 part in 10¹⁰.
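In numbers, Eq. (5.10) gives (a trivial sketch):

import math

# Relative width sec/N = 1/sqrt(N), Eq. (5.10)
for N in (10**3, 10**6, 10**12, 10**20):
    print(f"N = {N:.0e}: relative width = {1 / math.sqrt(N):.1e}")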
Note that nearly all the microstates are found in or very close (in terms of se / N ) to the most likely
macrostate, especially for large N. This is a quite general behavior for such statistical systems with large
numbers of elements like this.
Problems
5.2.1 Following Fig. 5.1, draw all the possible “microstates” for tossing a coin 6 times. Then group
these microstates into “macrostates” that contain all the microstates with the same numbers of
“heads” and “tails”, with “multiplicities” representing the number of microstates in a macrostate.
Which macrostate is the most probable state?
5.2.2 Consider N = 10, 20, 30, and 40 coin tossings. For N = 10 and 20, calculate any binomial
coefficients by hand. For N = 30 and 40, you may have automatic calculators or some other
resources do the job for you.
(a) For each N, calculate the multiplicity of the most probable macrostate using the binomial
distribution formula and then again with Stirling’s approximation. Calculate the percentage
error of using Stirling’s approximation in each case. How do the percentage errors change
as N gets larger?
(b) For each N, find the excess number of “heads” (2se) where the multiplicity of this
macrostate falls to 1/e of the maximum multiplicity. How does the ratio se/N for this se
change when N gets larger? [You may use Stirling’s formula for this part of the question.]
5.2.3 If I toss a coin 100 times,
(i) calculate how many possible sequences of results there are in which only two of the results
are “heads” and the others are all “tails”, and
(ii) estimate approximately how many possible sequences there are for which the number of
"heads" and the number of "tails" are equal.
5.2.4 The stadium at McRowe State University seats 50,420 and has an attendance of 43,340 at a game.
(a) Assuming that half of the attendees are McRowe State fans and can sit only in the north
part of the stadium (which contains half of the seats), while the other half of the attendees
can only sit in the south part, write out an expression for the number of different ways the
seats can be occupied. (All fans are presumed created equal, so it does not matter who is
sitting at an exact spot within their appropriate half of the stadium, only which seats are
occupied). Note: because of the large numbers, use Stirling’s formula to approximate any
factorials in your expression.
(b) Now assume there is no segregation between the two halves of the stadium, so any fan can
occupy any seat. By approximately what factor is the number of possible seating
arrangements smaller or larger than in the first case?
5.2.5 Using Stirling’s approximation and the power series expansion of log (1 + x ) (up to second order
in x) show that the binomial coefficient
g(N, se) = N!/{[(N/2) + se]! [(N/2) − se]!}
for large N can be written approximately as
g(N, se) ≈ g(N, 0) exp(−2se²/N)  where  g(N, 0) ≈ √[2/(πN)] 2ᴺ
11 We want this "negligible interaction" so we can talk meaningfully about individual spins without having to consider coupled spin systems whose energies might be influenced by that coupling.
12 Essentially here, we are dividing possible microstates into ones that are "accessible" and ones that are not. For many physical situations, this is a very good approximation. We can imagine situations in which it is not, however. If some barrier between two gases is very slightly permeable, then we cannot quite neglect the possibility that an atom or molecule from one side will get to the other side. Then, such a microstate is a possibility, but it is less likely than many other possibilities, at least if we only wait for some finite time to declare "equilibrium". We can extend statistical mechanics with probabilistic weightings to cover such situations, though we will not do that here. That extension is, however, quite common in the idea of entropy as used in information theory. For example, some letters in an alphabet are more likely to occur than others in a written message. "E" is a particularly common letter in English, for example, whereas "q" is not, so entropy in information theory when applied to written messages has to account for these probabilities.
5.3 Two-state spin systems
Note, incidentally, that we are presuming that there is some physical process that allows the spins to
“flip” from one state to the other (i.e., from “spin-down” to “spin-up” or vice versa) – possibly as a
result of some collisions, for example – so that all possible microstates of spin-up and spin-down for
each spin are “accessible” in this sense.
To model physical systems more realistically, we need to allow the microstates to have energies that
may not all be the same. For our spin system, we can imagine that we apply a magnetic field of
magnitude B along the “up-down” axis of the spins. For a small magnet of magnetic moment (magnet
“strength”) µs, the energy of a “spin-up” “magnet” in the magnetic field can be written as
Eµ = −µsB    (5.11)
A spin with “spin-down” will have exactly the opposite energy in the magnetic field. To the extent that
some spins are “up” and others are “down”, the energies of equal numbers of “up” and “down” spins
will cancel out when we work out the total energy, U. Hence, the total energy only depends on the “spin
excess” 2 se , specifically being13
U(se) = −2µsBse    (5.12)
We are interested for the moment in some closed physical system in which this total energy U is fixed
for some reason, which means we are presuming conservation of energy for this set of spins in what we
will call a “closed” system.
Fig. 5.4. Two systems containing spins, (a) initially separate (top), and (b) later joined through a
thermally conducting wall (bottom).
We presume that the total energy in the combined system is to be conserved, even when we join the
systems through some thermally conducting wall. So, at all times, the sum
13 Remember that 2se is the number by which we have more "spin-up" than "spin-down"; it should not be confused with the spin quantum number s.
se = se1 + se2    (5.13)
which will give the total energy of the system through Eq. (5.12), will be fixed or “conserved” even
after we join the systems. (This thermally conducting wall conducts heat, but does not allow spins to
move through it, so the numbers N1 and N2 remain fixed throughout.)
Our goal now is to deduce what will be the most likely possible macrostate for splitting the total se
between the two systems when we have reached thermal equilibrium after joining the two systems
through the thermally conducting wall. Since we know that, for large systems, the most likely macrostate
and those close to it dominate the possible microstates, if we know this most likely macrostate and its
properties, we will essentially know the properties of the system at thermal equilibrium.
Presuming N1 and N2 are both large, for a given se1 and N1 in system 1, we know the multiplicity of the
macrostate for system 1 from Eq. (5.7), which is
g1(N1, se1) ≈ g(N1, 0) exp(−2se1²/N1)    (5.14)
and similarly for the multiplicity for system 2, g2(N2, se2).
The multiplicity of the macrostate of the entire system is simply the product of the multiplicities of the
parts; in given macrostates of systems 1 and 2, if there are g2 microstates of system 2 possible for each
microstate of system 1, and there are g1 possible microstates of system 1, then the total number of
possible microstates for the entire system is g1g2. So, if there are g(N1, se1) possible microstates for
system 1 and g(N2, se2) for system 2, then for the entire system, there are
gtot ≈ g(N1, 0) g(N2, 0) exp(−2se1²/N1) exp(−2se2²/N2)    (5.15)
possible microstates for these choices of se1 and se2 that characterize the macrostate of systems 1 and 2,
respectively. Our task now is to find out how we should split up se between se1 and se2 so as to maximize
the multiplicity gtot.
Since se itself is fixed, we can rewrite Eq. (5.15) in terms only of a fixed se and a variable se1
gtot(se1) ≈ g(N1, 0) g(N2, 0) exp(−2se1²/N1) exp[−2(se − se1)²/N2]    (5.16)
Now we can differentiate this with respect to se1, setting the result to zero to find the choice of se1 that
maximizes gtot. It is slightly easier to do this if we take the logarithm and find the maximum of the
logarithm (if a positive function is at a maximum, so also is its logarithm). Taking the logarithm of both
sides of Eq. (5.16) gives
log gtot(se1) ≈ log[g(N1, 0) g(N2, 0)] − 2se1²/N1 − 2(se − se1)²/N2    (5.17)
Differentiating with respect to se1 and setting the result to zero gives
−4se1/N1 + 4(se − se1)/N2 = 0    (5.18)
and so, since se2 = se − se1, the multiplicity is maximized when
se1/N1 = se2/N2    (5.19)
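If a symbolic algebra package such as sympy is available, this maximization is easy to check (a sketch; the constant term in Eq. (5.17) is dropped since it does not affect the derivative):

import sympy as sp

se, se1, N1, N2 = sp.symbols("se se1 N1 N2", positive=True)
log_gtot = -2 * se1**2 / N1 - 2 * (se - se1)**2 / N2   # Eq. (5.17), less const.
print(sp.solve(sp.diff(log_gtot, se1), se1))
# [N1*se/(N1 + N2)], i.e. se1/N1 = se/N, consistent with Eq. (5.19)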
Since in a given magnetic field the energy of a given one of the systems is just correspondingly
proportional to se1 or se2, that is, from Eq. (5.12)
U1(se1) = −2µsBse1  and  U2(se2) = −2µsBse2    (5.20)
the statement Eq. (5.19) is equivalent to the statement that the energy per spin in the two systems is the
same – that is, from Eqs. (5.19) and (5.20)
U1/N1 = U2/N2    (5.21)
Of course, if the energy per spin is the same on each side, then the energy per spin in the entire system
is also the same. That is, with
U = U1 + U2    (5.22)
and
N = N1 + N2    (5.23)
then
U1/N1 = U2/N2 = U/N    (5.24)
and similarly14
se2/N2 = se1/N1 = se/N    (5.25)
So, by example here with this simple system, we have uncovered a relatively straightforward result: the
average energy for each similar microscopic system – here the spins – is the same for systems at thermal
equilibrium. To understand the deeper meaning of this, we need to generalize some more, and introduce
two concepts that are new to our discussion here – entropy and temperature.
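We can also verify this conclusion by brute force for a small example (a sketch with made-up numbers, maximizing the exact product multiplicity rather than its Gaussian approximation):

from math import comb

# Two spin systems sharing a fixed total spin excess se; find the split se1
# that maximizes g1*g2, and compare with the proportional split of Eq. (5.25).
N1, N2 = 20, 80
se = 10
def g(N, s):
    return comb(N, N // 2 + s) if abs(s) <= N // 2 else 0

best = max(range(-N1 // 2, N1 // 2 + 1),
           key=lambda s1: g(N1, s1) * g(N2, se - s1))
print(best, se * N1 // (N1 + N2))   # both give 2, i.e. se1/N1 = se/N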
Problems
5.3.1 Consider a system of 6 spins in the presence of a magnetic field B. The energy of a macrostate
with spin excess 2se is given by U ( se ) = −2 µ s Bse . The system is at equilibrium. Presume for the
purposes of this question that each possible microstate is equally likely, independent of its
energy.
(a) What is the probability of the system having energy U = 4µsB?
(b) Derive an expression for the expected value (i.e., the average value) of the energy. The
expected value can be calculated by the formula ⟨X⟩ = Σn P(Xn) Xn, where P(Xn)
represents the probability that the variable X takes value Xn.
5.3.2 We have two sets of “spins” (such as electrons), one on the “left” and one on the “right” that are
initially in separate boxes. A magnetic field (the same in each box) is applied so that spin-up
spins have a specific amount more energy than spin-down spins. The number of spins is fixed,
14 Though this result is perhaps obvious from the physical discussion of equal energy per spin, we can prove it algebraically as well. From Eq. (5.19), we have se2 = N2se1/N1, so se = se1 + se2 = se1[1 + (N2/N1)] = se1[(N1 + N2)/N1] = se1N/N1, hence se/N = se1/N1.
with 20 spins in the “left” box, and 80 spins in the “right” box. Initially, in the box on the left,
15 spins are spin-up and 5 are spin-down, and in the box on the right, 25 are spin-up and 55 are
spin-down. We then join the boxes through a thermally conducting wall that can pass energy (but
not spins) between the boxes, but otherwise both boxes are isolated from the environment at all
times. After we let the combined system come to thermal equilibrium, what is the most likely
configuration? In other words, what is the most likely number of spins being spin-up in each
box?
5.4 Entropy and temperature
Consider now two systems, 1 and 2, with N1 and N2 particles respectively, in thermal contact so that they can exchange energy, with a fixed total energy U15. The multiplicity of the combined system is the sum, over the possible ways of splitting up the energy, of the products of the individual multiplicities:
g(N, U) = ∑ over U1 of g1(N1, U1) g2(N2, U − U1)   (5.26)
where we are summing over all the possible values of the energy U1 (for example, from 0 to U). Because U1 and U2 must add up to the total energy U, as given by Eq. (5.22), we have substituted U − U1 for U2.
Formally, to evaluate the multiplicity we should perform this summation. We presume, however, that
the multiplicity overall will be dominated by the term in the sum for which the product
g1 ( N1 ,U1 ) g 2 ( N 2 ,U − U1 ) is largest, or at least a group of terms close to that, so we will consider just
such terms, finding the conditions under which they are maximized; we presume that doing so will
effectively also maximize the sum as in Eq. (5.26).
In general, if we make small (actually infinitesimal) changes in quantities such as U1 and U2, we will
make small (actually infinitesimal) changes in the multiplicity g that depends on U1 and U2. This small
change is called the differential. We note, first, that, since g1 does not depend on U2 and g2 does not
depend on U1 (at least when we are regarding U1 and U2 as independent variables for the moment), that
∂(g1g2)/∂U1 = g2 ∂g1/∂U1 and ∂(g1g2)/∂U2 = g1 ∂g2/∂U2   (5.27)
15 The restriction to absolutely specific values of energies is unrealistic and somewhat problematic – there might be only one possible microstate of the system associated with one “exact” energy. We can, however, imagine we are considering some small range of energies δU about any specific value U in considering our macrostates, hence avoiding this formal problem.
Hence the differential dg can be written for this specific case as16
dg = (∂g1/∂U1)N1 g2 dU1 + g1 (∂g2/∂U2)N2 dU2   (5.28)
The idea of this differential is that it would formally tell us the small change in the multiplicity g that
would result if we made small changes in U1 and U2, presuming that we somehow knew the partial
derivatives in Eq. (5.28).
In our particular situation here, where total energy is conserved, we know that if U1 increases by some
amount then U2 must decrease by an equal amount, so
dU1 = −dU 2 (5.29)
If the system is to be in the macrostate with the largest multiplicity, then its multiplicity g = g1 g 2 should
be at a maximum as far as the choice of U1 is concerned. Therefore, if we were to make an infinitesimally
small change in U1, there should be no change in g1 g 2 . Hence we can write that the differential of
g = g1 g 2 should be zero, and using Eq. (5.29) we have
dg = (∂g1/∂U1)N1 g2 dU1 − g1 (∂g2/∂U2)N2 dU1 = 0   (5.30)
Dividing through by g1g2 dU1 gives
(1/g1)(∂g1/∂U1)N1 = (1/g2)(∂g2/∂U2)N2   (5.31)
or, equivalently,
(∂ log g1/∂U1)N1 = (∂ log g2/∂U2)N2   (5.32)
So, we are concluding that, in thermal equilibrium, these two partial derivatives should be equal. If these
two are equal, we will have found the conditions for the product g1 g 2 to be maximized, and we are
arguing that, since such macrostates dominate the sum of the multiplicities over all possible macrostates
(i.e., Eq. (5.26)), we will have found the conditions for maximizing that multiplicity overall.
Usefully now we can define a quantity, which we call the “entropy”17
16 Remember that the notation (∂g1/∂U1)N1 for the partial derivative means the derivative with respect to U1 with N1 held constant, and is formally defined as the limit of [g1(N1, U1 + ∆U1) − g1(N1, U1)]/∆U1 as ∆U1 tends to zero. With partial derivatives generally, we sometimes do not specify explicitly which variables are being held constant if that is presumed obvious. So, for this derivative here, we do not explicitly mention that U2 is being held constant by listing it as a subscript beside N1, though we could do that if we felt it was important for clarity.
17 We can think of this particular version as being a “statistical” definition of entropy, sometimes called “fundamental entropy”. Different ways of stating entropy can use logarithms to different bases (information theory uses logarithms to the base 2) or have constants multiplying them (thermodynamics typically multiplies by Boltzmann’s constant). Such minor differences of definition make no difference to the underlying concept of entropy; all share the core idea of the logarithm of the multiplicity.
σ(N, U) ≡ log g(N, U)   (5.33)
The key idea of entropy is that, for some given macrostate, it is the log of the multiplicity.
Note that, though the multiplicity of some combined system is the product of the multiplicities of the
individual systems, because of the use of the logarithm, the entropy of the combined system is the sum
of the entropies18. So, if we have two systems, 1 and 2, in macrostates with multiplicities g1 and g2
respectively, and hence with entropies
σ1 = log g1 and σ2 = log g2   (5.34)
then the total entropy of the joint system will be
σtot = log(g1g2) = log g1 + log g2 = σ1 + σ2   (5.35)
Continuing with our argument here, we have, from Eq. (5.32) and the definition Eq. (5.33), for the
macrostate with the largest multiplicity (the most probable configuration)
(∂σ1/∂U1)N1 = (∂σ2/∂U2)N2   (5.36)
This is the condition for thermal equilibrium for two systems in thermal contact. We could restate this
in words as “the rate of change of entropy with energy is the same for all systems in thermal equilibrium
with each other” (at least for the case here of fixed numbers of particles in each system).
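We can illustrate this condition numerically (an illustrative sketch of my own, under an assumed model, not anything from the text): take a model multiplicity g(N, U) ∝ U^(3N/2) (an ideal gas turns out to behave like this) and find the split of a fixed total energy U that maximizes g1g2. The maximum lands where U1/N1 = U2/N2, which is exactly where the two ∂σ/∂U derivatives are equal:
```python
import numpy as np

N1, N2, U = 20, 30, 10.0     # particle numbers and total energy (arbitrary units)

U1 = np.linspace(0.01, U - 0.01, 100_000)
# model multiplicity g(N, U) ~ U**(3N/2), so log g = (3N/2) log U, and the
# total log-multiplicity is log g1 + log g2
log_g_tot = 1.5 * N1 * np.log(U1) + 1.5 * N2 * np.log(U - U1)

U1_best = U1[np.argmax(log_g_tot)]
print(U1_best / N1, (U - U1_best) / N2)   # both ~0.2 = U/(N1 + N2)
```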
We are used to the idea that, when systems are in thermal equilibrium with one another, it is the
temperature that is the same. That is, for two such systems, with temperatures T1 and T2, respectively,
in thermal equilibrium, we expect them to have the same temperature
T1 = T2 (5.37)
Hence we can now identify the quantities in Eq. (5.36) as being some function of temperature. To accord
with our conventional understanding of temperature and its prior use in thermodynamics, we use the
reciprocal of these quantities as the temperature, and also use a constant to get the temperature into the
unit with which we are familiar, that is
1/T = kB (∂σ/∂U)N   (5.38)
where kB is Boltzmann’s constant
kB = 1.380 6488 × 10⁻²³ J K⁻¹   (5.39)
The Boltzmann constant is only there because of our system of units. It can be more convenient to work
with “fundamental temperature”, τ, which we can define as
1/τ = (∂σ/∂U)N   (5.40)
or equivalently
τ = k BT (5.41)
18 Remember log xy = log x + log y.
Note that the fundamental temperature has dimensions of energy, which is the real unit of temperature.
In that sense, even the kelvin unit19 is redundant, though it can be practically more convenient20. Various other temperature scales are also in use, based either on the kelvin-sized degree (Celsius) or on the Fahrenheit-sized degree (Fahrenheit, Rankine)21.
In thermodynamics, we write
1/T = (∂S/∂U)N   (5.42)
where the (thermodynamic) entropy S corresponds with our statistical “fundamental” entropy through
S = k Bσ (5.43)
Again, units for entropy are technically unnecessary since it is really a pure number, being the logarithm
of a number, but for historical reasons relating to the emergence of entropy first as a concept in
thermodynamics, that thermodynamic statement of entropy has units of J K⁻¹.
We can therefore write the thermodynamic version of entropy explicitly in terms of the logarithm of the
multiplicity g, formally using Eqs. (5.33) and (5.43), giving
S = kB log g   (5.44)
This is perhaps the most important equation in the whole field of statistical physics22, and expresses the
core meaning of the idea of entropy. It finally gives a tangible meaning to what was otherwise a very
difficult abstract concept. In this statistical sense, entropy is a logarithmic measure of the uncertainty of
which microstate the system is in23.
19 Temperatures in the modern SI units are given in kelvin (not technically “degrees kelvin”, so there is no need for the “°” symbol for degrees when using kelvin, though in practice this form is often used, either in words or with the degree symbol), and indicated by the letter K. The Kelvin scale is zero at the absolute zero of temperature. Temperature in kelvin is temperature in degrees Celsius plus 273.15. Equivalently, absolute zero is −273.15 °C.
20 A temperature of 300 K, often taken as a rough simple number to describe “room temperature”, is 4.14 × 10⁻²¹ J, not a particularly convenient number for everyday use!
21 The Celsius scale is defined with degrees of the same magnitude as 1 K, but with a different starting point. Until 1948, this scale was known as “centigrade” before being renamed to honor its creator, Anders Celsius (though interestingly, he actually had defined it the other way round, with 0 as the boiling point of water and 100 as the freezing point). The modern definition of the Celsius scale is that the zero is at 273.15 K. In loose terms, the Celsius scale runs from the freezing point of water at 0 °C to the boiling point at 100 °C, though the formal definition is as given here in terms of kelvin. The Rankine scale is a thermodynamic temperature scale starting at absolute zero, just like kelvin, though the degrees are the same magnitude as those in the Fahrenheit scale. Absolute zero is at −459.67 °F. The zero in the Fahrenheit scale may come from the lowest temperature to which Daniel Fahrenheit (who invented the scale and made one of the first good thermometers) could cool brine. Loosely, the Fahrenheit scale is 32 °F at the freezing point of water (32 °F is defined to be the same temperature as 0 °C) and 212 °F at the boiling point (though with the modern definitions, the actual boiling point of water is not quite exactly that). That separation between the freezing and boiling points might seem arbitrary, but note that it is 180°. The notion of 180° in a scale was already known from the idea of 180° in the measurement of geometrical angles, and 180 is one of those numbers that easily divides by many other integers (e.g., 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 30, 45, 60, 90), which was convenient in the era before modern calculators.
22 The version S = k log W is engraved on Boltzmann’s tombstone.
23 We could also say, therefore, that entropy is a measure of ignorance! This is a confusing point in physics. The system itself is not confused about what microstate it is in – it is in some specific microstate. But in our example here, we seem to be assigning two different entropies to the same microstate (log 4 and log 6). However, it is not the microstate that has entropy, and in this sense entropy is not a property of the physical system. (Note, incidentally, that this is an interpretation that some physicists might dispute.)
Returning to the idea of our ignorance, if we know nothing about the system’s specific state other than the total energy, and there are 6 possible such microstates of the system, then the entropy is log 6. If we can distinguish different macrostates, say by looking at the magnetization in each part of the system, and if we find it in the macrostate corresponding to zero magnetization in each part of the system (so one spin up and one spin down in each part), then there are 4 such microstates, and, armed with this information, the entropy is log 4. Hence, we could say that entropy is actually a measure of our ignorance of what microstate the system is in, and that ignorance has been reduced, from log 6 to log 4, by our measurement of the magnetization in each part of the system.
We often in practice regard entropy as being a property of the system because, for large systems, we can say with a very high degree of likelihood that the system will be in a microstate that is a member of the most probable macrostate or one very similar to it. Hence we can predict – in principle statistically, but in practice with a very high degree of certainty for even moderately large systems – what the properties of a system will be, for example after some exchange of heat. Because of this very high certainty for even moderately large systems, it can appear that entropy is a property of the system because it is a predictable outcome of some interaction, such as exchange of heat. So, in thermodynamics (which is essentially something we use just for large systems in this sense), entropy is treated as if it is a property of the system, and, at least in a practical sense, this works quite well.
For a large system, our brains are not large enough to hold the details of the actual microstate of some large number of atoms or molecules, so our ignorance in such situations is unavoidable, so we should not feel too bad about this!
Fig. 5.5. Microstates and entropies for two systems, each consisting of two spins, before (left)
and after (right) joining them.
We can look at an example of bringing two systems into thermal contact, in which energy can be
exchanged between the two systems. Fig. 5.5 shows a very simple case. We have two systems, system
1 and system 2, with energies U1 and U2 respectively. Each one has two spins in it. Initially, system 1
has both spins “up”, and system 2 has both spins “down”. We presume that the same magnetic field B
is applied to both systems, so the starting energies associated with these systems are, from Eq. (5.12),
for system 1,
U1 = −2µsB   (5.45)
because we have two “spin-up” in system 1, and for system 2, which has two “spin-down”,
U2 = +2µsB   (5.46)
There is only one starting microstate of each of systems 1 and 2 that corresponds to these two specific
energies. For that reason, the starting entropy of each of systems 1 and 2 is log 1 = 0, and the total entropy
σtot of the two systems is the sum of these, which also is mathematically 0.
After we bring the two systems together and allow them to exchange energy, there are 6 accessible
microstates all corresponding to the same total energy. There are 4 of these in one macrostate, which is
the most probable one; all the microstates in this macrostate have the same energy in each of the (sub)
systems 1 and 2. Note that, even for this very simple example, 2/3 of the microstates are in the same
macrostate. Nearly all the entropy of the combined system, σtot = log 6, is associated with that most probable macrostate, for which the entropy is σmp = log 4; explicitly, (log 4)/(log 6) ≈ 0.77 – that is, ~ 77%.
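This tiny example is small enough to check by brute-force enumeration (a sketch of my own; the spin labeling is an arbitrary choice):
```python
from itertools import product
from math import log

# four spins, each +1 (up) or -1 (down); the first two are system 1,
# the last two are system 2; fixed total energy means total spin excess 0
microstates = [s for s in product((+1, -1), repeat=4) if sum(s) == 0]
print(len(microstates))      # 6 accessible microstates, so sigma_tot = log 6

# most probable macrostate: zero net spin in each subsystem
mp = [s for s in microstates if sum(s[:2]) == 0 and sum(s[2:]) == 0]
print(len(mp))               # 4 microstates, so sigma_mp = log 4
print(log(len(mp)) / log(len(microstates)))   # ~0.77 of the total entropy
```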
Consider now, more generally, any two bodies, 1 and 2, in thermal contact. For small changes ∆U1 and ∆U2 in their energies, the change in the total entropy is
∆σ = (∂σ1/∂U1)N1 ∆U1 + (∂σ2/∂U2)N2 ∆U2   (5.47)
Here we have formally written in these derivatives that the number of particles in each system is being held constant, which we presume is the case because we are only considering possible changes in energy here, not movement of particles from one body to the other.
Suppose we allow a small amount of heat ∆U to flow from body 1 to body 2. Now, since we are
discussing the transfer of heat from one body to the other, if body 1 loses energy ∆U (so ∆U1 = −∆U )
and body 2 gains energy ∆U (so ∆U 2 = ∆U ), then Eq. (5.47) becomes
∆σ = (∂σ1/∂U1)N1 (−∆U) + (∂σ2/∂U2)N2 (∆U) = (−1/τ1 + 1/τ2) ∆U   (5.48)
where we have used the definition Eq. (5.40) for temperature. In conventional thermodynamic notation,
this result becomes, on multiplying both sides by Boltzmann’s constant,
∆S = (−1/T1 + 1/T2) ∆U   (5.49)
So, if T1 > T2 , so (1/ T2 ) > (1/ T1 ) , then transfer of a positive amount of energy or “heat” ∆U from body
1 to body 2 leads to an increase of entropy. The increase in entropy of the colder body is greater than
the decrease in entropy of the hotter body. Explicitly, we could rewrite Eq. (5.49) as
∆S = ∆S1 + ∆S 2 (5.50)
where ∆S1 and ∆S 2 are the respective changes in entropy of the two bodies, given by
∆S1 = −∆U/T1 and ∆S2 = ∆U/T2   (5.51)
Here, then, the entropy of “hotter” body 1 has decreased with the flow of heat energy out of it, while the
entropy of “colder” body 2 has increased with the flow of heat energy into it, with the direction of heat
flow being such as to increase the entropy overall.
It is instructive to understand the magnitude of what is happening here for a simple macroscopic
example. Suppose, as illustrated in Fig. 5.6, that we have a hot cup of coffee (body 1) at a temperature of,
say, 67° C (so T1 = 340.15 K) and a counter-top (body 2) at room temperature of, say, 20° C (so
T2 = 293.15 K). Imagine that we transfer just 0.01 J of energy from the cup of coffee to the counter-top
by briefly laying the cup of coffee on the counter-top. For the purposes of this calculation, we presume
that both the cup of coffee and the counter-top are sufficiently large that this small transfer of energy
does not appreciably change the temperature of either of them.
Fig. 5.6. Illustration of the transfer of heat from a hot cup of coffee to a room-temperature counter-
top, showing the entropy changes for a transfer of 0.01 J of heat.
So, we have
∆S1 = −0.01/340.15 ≈ −2.94 × 10⁻⁵ J K⁻¹ and ∆S2 = 0.01/293.15 ≈ 3.41 × 10⁻⁵ J K⁻¹
So the total change in entropy, summing these two, is
∆S = (3.41 − 2.94) × 10⁻⁵ ≈ 4.7 × 10⁻⁶ J K⁻¹
Numerically this verifies that entropy has increased as heat flows from a hotter to a colder body. We can convert to the more fundamental or “statistical” version of entropy by dividing by Boltzmann’s constant, obtaining an increase of entropy of
∆σ = ∆S/kB ≈ 4.7 × 10⁻⁶ / 1.38 × 10⁻²³ ≈ 3.4 × 10¹⁷
So, if this heat flows from the hotter to the colder body, the number of microstates available to the combined system has increased by a factor of24
exp(∆σ) ≈ exp(3.4 × 10¹⁷)
24 We remember, by definition of logarithms to the base a, that b = a^(log_a b), and for powers that (a^b)^c = a^(b×c).
Remember that tossing a coin N times leads to 2^N possible outcomes or microstates. For 2^N = exp(3.4 × 10¹⁷), then
N = log₂[exp(3.4 × 10¹⁷)] = 3.4 × 10¹⁷ × log₂e = 3.4 × 10¹⁷ / log 2 ≈ 4.9 × 10¹⁷
So, expecting this energy to flow from “cold” to “hot” is like asking for a specific sequence of heads and tails, such as “all heads”, when tossing a coin roughly 4.9 × 10¹⁷ times. This will not happen very often. Since the universe is ~ 4.4 × 10¹⁷ seconds old25, this is like tossing a coin once a second since the Big Bang and asking for it to come up heads every time!
To be clear here, we are not saying that we have to successfully randomly manage to generate one specific desired sequence of heads and tails from ~ 4.9 × 10¹⁷ possibilities, which would be hard enough. Rather there are 2^(4.9 × 10¹⁷) ≈ 10^(1.5 × 10¹⁷) possibilities to choose from!
This calculation illustrates by example the “reason” why heat flows from hot to cold. If there is some
mechanism by which energy can flow from one body to another, there are massively more accessible
microstates if the energy flows from the hot body to the cold body. To flow in the opposite direction
would correspond to the number of accessible microstates decreasing by an equally large factor, the
equivalent statistically of succeeding when randomly trying to choose only one specific state from a
very, very large number of possibilities.
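The coffee-cup numbers above are easy to reproduce (a sketch, using Boltzmann’s constant and the temperatures from the example):
```python
import math

k_B = 1.3806488e-23       # Boltzmann's constant, J/K, Eq. (5.39)
T1, T2 = 340.15, 293.15   # coffee and counter-top temperatures, K
dU = 0.01                 # heat transferred, J

dS1 = -dU / T1            # entropy change of the hotter body
dS2 = +dU / T2            # entropy change of the colder body
dS = dS1 + dS2            # net (thermodynamic) entropy change
d_sigma = dS / k_B        # fundamental (dimensionless) entropy change

print(f"dS1 = {dS1:.3g} J/K, dS2 = {dS2:.3g} J/K, dS = {dS:.2g} J/K")
print(f"d_sigma = {d_sigma:.2g}")                              # ~3.4e17
print(f"equivalent coin tosses: {d_sigma / math.log(2):.2g}")  # ~4.9e17
```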
25 At the time of writing this, the best estimate for the age of the universe (the time since the Big Bang) is ~ 13.8 billion years, or ~ 4.35 × 10¹⁷ seconds.
26 You might be worried that somehow the act of you tidying your room, or performing any other such organizing
actions that lead to apparently reduced entropy in the world around us, is somehow a violation of the second law
of thermodynamics. You can be sure, however, that the increase in the world’s entropy from the energy you expend
in the process, which nearly all turns into heat, will vastly exceed any such decrease in entropy in a tidy room! The
resulting argument that by not tidying your room, and hence not increasing entropy overall, you are doing the
world a favor is, however, not likely to be very convincing on a personal level with your room-mate, and may
unfortunately reinforce the entirely inaccurate but widely held belief that scientists and engineers are annoyingly
lacking in ordinary social skills.
This is, in essence, the second law of thermodynamics: the total entropy of an isolated system does not decrease26. The second law is a statistical principle. It is quite possible in small systems for the entropy to decrease
sometimes, but those systems need to be really small for there to be much chance of this happening to
any significant degree. With the kind of statistical view we are taking here, it is also possible to calculate
the probability of various such random fluctuations in the system away from the “equilibrium” values,
and such calculations are an important part of the larger field of statistical mechanics.
Problems
5.4.1 Suppose we have a set of 100 spins, 55 of which are spin-up, and 45 of which are spin-down,
and that is all we know about this particular “macrostate” of the system.
(i) What is the multiplicity of this macrostate (at least approximately)?
(ii) What is the corresponding entropy of this macrostate?
(a) in “fundamental” units?
(b) in J K⁻¹?
5.4.2 In thermodynamics, if an amount of heat energy U passes into a body at temperature T, then the
change of entropy of the body is ∆S = U / T . For U = 5 J and a temperature of 20° C,
(i) what is the change of entropy of the body, in thermodynamic units?
(ii) by what factor has the multiplicity of the accessible microstates changed?
5.4.3 Consider 1 kg of water, which in this question may be in solid (ice), liquid (liquid water), or gas
(water vapor) form. Find the change in entropy (in thermodynamic units) of the water in each of
the following cases.
(a) 1 kg of ice melts at a temperature of T = 0°C. (Take the latent heat of fusion (the heat
energy involved in melting or freezing a kilogram) to have a magnitude 3.34 × 105 J kg-1
under the conditions for this experiment. Note that heat has to be added to ice to melt it.)
(b) 1 kg of water vapor at T = 100°C condenses into liquid water. (Take the latent heat of
vaporization (the energy involved in boiling or condensing a kilogram) to have a magnitude
2.26 × 106 J kg-1 under the conditions for this experiment. Note that heat has to be added
to liquid water to boil (vaporize) it.)
5.4.4 An ideal gas is defined as one in which all collisions between atoms or molecules are perfectly
elastic and in which there are no intermolecular attractive forces. It is like the kind of “gas” that
would correspond to a large collection of very small “billiard balls” bouncing elastically off one
another inside a box. Such an ideal gas is a useful first model for considering the behavior of
gases. The kinetic energy of such an ideal gas at a given temperature T is U = (3/2) N kB T,
where N is the (fixed) number of gas particles. For the purposes of this question, we can consider
this U to be the “heat” energy of the gas.
We want to understand, essentially, what is the increase in entropy ∆S of such a gas if, starting
at some energy U i in the gas, we slowly add some total amount ∆U of heat to it. Note that, as
we slowly add this heat, we are also slowly going to change the temperature of the gas, which
complicates the solution somewhat. Now, we know the (thermodynamic) relation between small
changes δ S in entropy and changes δ U in (heat) energy at a given temperature T, for example,
as the derivative 1/T = (∂S/∂U)N in Eq. (5.42). So, we are going to have to integrate this derivative
to get the total corresponding change in entropy ∆S as we add this heat energy ∆U .
Now, because of the very simple relation U = (3/2) N kB T in this ideal gas, or equivalently, T = 2U/(3NkB), we can rewrite this derivative in this case as (∂S/∂U)N = 3NkB/2U.
So now, starting from some energy U i and ending at some energy U i + ∆U , we could now
deduce an expression for the corresponding change in entropy by integrating this derivative.
For this question, the answer we want is not to state this resulting entropy change itself, but
instead to deduce by what factor g ( ∆U ) the multiplicity of the gas changes as a function of
this amount of (heat) energy ∆U added to the gas, with a given number of particles N.
5.5 Carnot efficiency limit for heat engines
Consider an idealized heat engine, sketched in Fig. 5.7, that takes heat energy UH in from a hot reservoir at temperature TH, performs work of energy W, and dumps the remaining heat energy UC into a cold reservoir at temperature TC.
Fig. 5.7. An idealized heat engine that takes heat energy UH in from a hot reservoir at temperature TH, performs work of energy W, and dumps heat energy UC into a cold reservoir at temperature TC.
The Stirling engine is an entirely sealed piston heat engine, requiring only external hot and cold
reservoirs27. Jet engines are gas turbine (heat) engines. Most electricity generation from fuels like coal,
gas or nuclear energy is by steam turbines, which are also heat engines. A turbine that generates
electrical power is obviously a heat engine since the electrical power is generated based on the
mechanical work that the engine creates to power a dynamo of some kind. Generally, any engine that
generates electrical power from heat is also a heat engine, however, because electrical power can be
converted to mechanical work with 100% efficiency in principle. So, solar cells are also heat engines,
with the hot reservoir being the sun, even though they do not directly generate mechanical motion.
27 Stirling engines can work with only small temperature differences. Toy Stirling engines can run when you put
them on top of your hot cup of coffee or tea, and are fun to play with.
With two simple principles – conservation of energy (which is also essentially the First Law of
Thermodynamics), and the requirement that entropy cannot decrease overall (which is essentially the
Second Law of Thermodynamics), we can calculate a limit to the energy efficiency of any such heat
engine; this limit is known as the Carnot limit28.
We want to calculate a limit to how efficient such a heat engine could possibly be, so we presume no
energy is wasted in any other processes, such as friction or heat loss to any other heat sink. So, by
conservation of energy,
UC = UH − W   (5.52)
The entropy change of the hot reservoir is
∆SH = −UH/TH   (5.53)
This is a decrease in entropy of the hot reservoir because heat energy is flowing out of the hot reservoir,
hence the minus sign. The entropy change of the cold reservoir is
∆SC = UC/TC = (UH − W)/TC   (5.54)
This is an increase in entropy of the cold reservoir because heat energy is flowing into the cold reservoir.
By presumption, entropy overall cannot decrease, so
∆S H + ∆SC ≥ 0 (5.55)
So
−UH/TH + (UH − W)/TC ≥ 0   (5.56)
Rearranging gives
UH (1/TC − 1/TH) ≥ W/TC   (5.57)
Defining the engine’s efficiency ηengine as the work energy out per unit heat energy in (presuming we
have to keep replenishing the hot reservoir to make up for the heat extracted from it)
28 Carnot devised this limit in 1824. At that time, the concept of entropy was not yet developed, so Carnot used a
different and ingenious argument. He proposed a hypothetical engine operating in a cycle based on expanding gas
in a cylinder pushing a piston, with appropriate “isothermal” (i.e., same temperature) and “adiabatic” (i.e., with no
heat flow) steps. The approach involves the cylinder connected to a hot reservoir, then detached from all reservoirs,
and then connected to a cold reservoir, with similar steps then repeated in reverse to get us back to where we
started, completing a cycle in which some work has been done by the cylinder’s piston, with heat flow out of the
hot reservoir and heat flow into the cold reservoir. Such an approach can be made entirely reversible, allowing
either a heat engine that converts heat flow from hot to cold, with some efficiency, into work, or by putting in
work, to refrigerate the cold reservoir and put heat into the hot reservoir. We will not give the details of this
argument here, though it is an easy one to find. Carnot’s approach starts, then, by constructing such a hypothetical
and completely reversible cycle. In modern terms, such reversibility is associated with having no change in
entropy. In Carnot’s argument, to conclude that the efficiency of this cycle is the best possible in a heat engine,
one only has to believe that no machine or process could allow the flow of heat from cold to hot, with no other net
change in the system. If we had a heat engine more efficient than a Carnot one, then we could use it to drive a
Carnot engine in reverse (so running the Carnot engine as a refrigerator), and the net result would be some net
transfer of heat from cold to hot by the time we had completed the full cycle of both machines. Hence by reductio
ad absurdum, no such more efficient engine can be constructed.
ηengine ≡ W/UH ≤ 1 − TC/TH   (5.58)
which is the Carnot efficiency limit for the heat engine. For given temperatures, nothing we can do can
make a heat engine more efficient than this. To do so would require violating either the First or Second
Laws of Thermodynamics.
This is a very serious and real limit. We can never convert heat energy continuously to work at 100%
efficiency. To get even reasonably efficient engines, we need a very hot “hot” reservoir and/or a very
cold “cold” reservoir.
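A two-line function makes this limit concrete (the temperatures below are illustrative choices of my own, not from the text):
```python
def carnot_efficiency(T_hot, T_cold):
    # upper bound on work out per unit heat in, Eq. (5.58); temperatures in K
    return 1.0 - T_cold / T_hot

# a modern steam turbine: ~850 K steam against a ~300 K environment
print(carnot_efficiency(850.0, 300.0))   # ~0.65 at the very best
# a low-grade heat source barely above room temperature
print(carnot_efficiency(330.0, 300.0))   # only ~0.09
```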
Problems
5.5.1 What is the maximum possible efficiency of a solar cell at converting the sun’s energy to
electricity, presuming the sun has a temperature of 5800 K, and the earth is the “cold” reservoir
at a temperature of 300 K?
5.5.2 Perform a similar analysis to that for the heat engine, but for a refrigerator that is to extract heat
energy from the cold reservoir at temperature TC by performing work and delivering all the
energy to the hot reservoir at temperature TH.
(i) Defining the efficiency η refrigerator as the ratio of the magnitude UC of the heat energy
extracted from the cold reservoir divided by the magnitude W of the work energy put in to
the “refrigerator” engine, derive an expression for the maximum efficiency of the
refrigerator in terms of TC and TH.
(ii) What is the maximum efficiency if the cold reservoir is to be at 0° C (so, 273.15 K) and
the hot reservoir is to be at 30° C (so 303.15 K)?
(iii) If the refrigerator is sitting in the middle of your room, can you continuously cool the room
by opening the refrigerator door? (Justify your answer.)
(iv) What is the maximum efficiency if the cold reservoir is to be at 4.2 K (the boiling
temperature of liquid helium, and a common cryogenic temperature) and the hot reservoir
is to be at 30° C (so 303.15 K)?
(v) You notice that the best logic device you have available at room temperature takes 40 aJ
to switch states. You want to save energy in information processing so you look for devices
that take less energy. You notice that, if you cool this device down to 4.2 K, it can switch
with only 4 aJ of energy. You pitch investors to back your proposal to develop this cooled
technology to save energy in information processing. Presuming that the investors have
taken thermal physics courses (and therefore must be both very intelligent and highly
educated), should they invest in your proposal? Justify your answer (scientifically!).
5.5.3 Perform a similar analysis to that for the heat engine, but for a “heat pump”. A heat pump is a
device in which we put in work W to heat our house by adding heat energy UH to the inside of
our house at temperature TH (which is then the “hot” reservoir) while simultaneously cooling (or
extracting heat from) a cold reservoir TC (which is the environment outside our house). Its
efficiency can be defined as the ratio η heatpump of the heat energy UH delivered to the house at
temperature TH to the work energy W put into the heat pump.
(i) Derive an expression for the maximum efficiency of the heat pump in terms of TC and TH.
(ii) What is the maximum efficiency of the heat pump, defined as η heatpump = U H / W , if the
outside temperature is 0° C (273.15 K) and the inside of the house is at 25° C (298.15 K)?
5.6 The Boltzmann factor
Consider a small system M in thermal contact with a much larger system – a reservoir R – with which it can exchange energy, as sketched in Fig. 5.8.
Fig. 5.8. A total system with energy Uo, divided into a large reservoir R, with energy Uo − ε, and a system M with energy ε. The system M can exchange energy with the reservoir through its thermally conducting walls.
Our goal here is to establish the probability that the system M is in some particular quantum state (which
therefore will be a microstate of M ) given that we have a particular temperature τ (≡ k BT ) for the total
system. Consider, then, two possible quantum states of the system M, state 1 and state 2, with associated
energies ε 1 and ε 2. The probability P1 that the system M is in state 1 is proportional to the multiplicity
of the reservoir R when it has energy U o − ε1 ; this multiplicity is simply the number of ways the total
system can exist in which system M is in state 1. So, with a similar argument for state 2 and the
corresponding probability P2 of the system M being in state 2, we have
P1/P2 = (Multiplicity of R at energy Uo − ε1) / (Multiplicity of R at energy Uo − ε2)   (5.59)
Note here that we are not discussing the probabilities that the system M has energy ε 1 or ε 2; there could
possibly be many different quantum states (and hence microstates) that have the same energy ε 1 or ε 2;
we are discussing the probabilities that the system is in a specific quantum state with energy ε 1 or in
another specific quantum state with energy ε 2.
Now, with σR(U) as the entropy of the reservoir at some energy U, we know that
Multiplicity of R at energy Uo − ε1 = exp[σR(Uo − ε1)]   (5.60)
and similarly
Multiplicity of R at energy Uo − ε2 = exp[σR(Uo − ε2)]   (5.61)
So
P1/P2 = exp[σR(Uo − ε1)] / exp[σR(Uo − ε2)] = exp[σR(Uo − ε1) − σR(Uo − ε2)]   (5.62)
Now, we are presuming that, compared to the energy of the very large reservoir, the energy ε of the
specific state of the system M is very small. So, we can usefully expand the entropy of the reservoir in
a Taylor series about the point U o . Hence we obtain, formally,
σR(Uo − ε) = σR(Uo) − ε (∂σR/∂U)U=Uo + (ε²/2) (∂²σR/∂U²)U=Uo + …   (5.63)
Note, implicitly, all these derivatives are being taken at constant numbers of particles in both the
reservoir and the system M, and we are presuming other possible variables, like the volumes of the
systems, are being held constant also. Given these variables being held constant, we know from Eq.
(5.40) that here we have
(∂σR/∂U)U=Uo = 1/τ   (5.64)
where τ (≡ k BT ) is the temperature of the reservoir.
The reservoir is taken to be very large by definition. Changing the reservoir’s energy by any small amount therefore makes negligible difference to its temperature. If we do find that changing the reservoir’s energy by some amount ε of interest to us does change the temperature measurably, then we simply make the reservoir bigger until that temperature change is negligible. So, we presume that the temperature of the reservoir does not change appreciably for any energy change of interest to us, and hence the higher derivative terms in the expansion Eq. (5.63) can all be made to vanish by choosing the reservoir sufficiently large29. Hence, retaining only the first-order term in ε, we can write
σR(Uo − ε) = σR(Uo) − ε/τ   (5.65)
Using this expression in Eq. (5.62), we therefore have
P1/P2 = exp(−ε1/τ) / exp(−ε2/τ) = exp[−(ε1 − ε2)/τ] ≡ exp[−(ε1 − ε2)/kBT]   (5.66)
So, for a system M in thermal contact with a large reservoir at temperature τ (≡ k BT ) , the relative
probability, under conditions of constant numbers of particles, of the system M being in a specific
quantum state of energy ε1 rather than one of energy ε 2 is given by Eq. (5.66). This factor that expresses
this relative probability of occupation of states separated by energy ε in thermal equilibrium, is called
the Boltzmann factor, exp(−ε/kBT)
where we have written it in conventional units.
The Boltzmann factor and others closely related to it appear in a wide range of situations, and form the
basis for the distributions of particles among quantum mechanical states in systems in thermal
equilibrium.
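As a numerical illustration (my own sketch; the 0.1 eV splitting is an arbitrary example energy, not one from the text), here is the Boltzmann factor for two states separated by 0.1 eV at two temperatures:
```python
import math

k_B = 1.3806488e-23   # Boltzmann's constant, J/K
eV = 1.602e-19        # joules per electron-volt

def boltzmann_ratio(dE_eV, T):
    # relative occupation of two states separated by dE (in eV), Eq. (5.66)
    return math.exp(-dE_eV * eV / (k_B * T))

print(boltzmann_ratio(0.1, 300.0))    # ~0.02 at room temperature
print(boltzmann_ratio(0.1, 5800.0))   # ~0.82 at the sun's surface temperature
```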
29 Note that ∂²σR/∂U² is the rate of change of 1/(the reservoir’s temperature) with energy, which we can make arbitrarily small by choosing the reservoir sufficiently large.
Problems
5.6.1 Consider a hydrogen atom that is in thermal equilibrium with a large heat reservoir. What is the
relative probability of finding the electron in a specific n = 6 state compared to finding it in a
specific n = 5 state at each of the following temperatures?
(a) Room temperature T = 293 K
(b) Sun temperature T = 5780 K
5.6.2 Consider a large number of hydrogen atoms in thermal equilibrium at a temperature of 5800 K
(approximately, the temperature of the sun). N1 atoms will be in a 1s state, and N2 atoms will be
in a 2s state. What value would we expect for the ratio N 2 / N1 ? [For simplicity, you can consider
atoms of just one specific electron spin and one specific proton spin, though including all possible
spin states actually makes no difference to this problem.]
5.7 Chemical potential and the Gibbs factor
Chemical potential
Consider now two systems, 1 and 2, that can exchange particles (as well as energy) with one another; such systems are said to be in “diffusive” contact. Because we are interested in the change in entropy that arises just from the transfer of particles, we will presume for the moment that this movement of particles does not itself change the energy of either system. In our example spin system, this would be the case if we turned off the magnetic field, for
example, since then every spin would have zero energy anyway. So we can examine a relation like the
one we put together for the change in entropy with changes in energies at constant particle number, Eq.
(5.47), but now thinking of the change of entropy with changes of particle numbers N1 and N2 at constant
energies U1 and U2 for systems 1 and 2, that is
∆σ = (∂σ1/∂N1)U1 ∆N1 + (∂σ2/∂N2)U2 ∆N2   (5.67)
Here, we presume conservation of particle number overall, so if a small number ∆N of particles moves
from system 1 to system 2, then ∆N1 = −∆N and ∆N 2 = ∆N . Presuming we are at diffusive equilibrium,
which we take to be the situation of maximum entropy, then any such small transfer should result in
vanishing change in entropy, so, at diffusive equilibrium we should have
0 = −(∂σ1/∂N1)U1 ∆N + (∂σ2/∂N2)U2 ∆N   (5.68)
that is,
(∂σ1/∂N1)U1 = (∂σ2/∂N2)U2   (5.69)
So the quantity represented by such a derivative for a system is the quantity that is equalized under diffusion of particles from one system to the other.
Conventionally in thermodynamics, we use the quantity called the chemical potential, which is defined
as
µC = −τ (∂σ/∂N)U   (5.70)
which is simply (minus) the (fundamental) temperature times this derivative30. Given that we are
operating at some temperature, then we can say that diffusive equilibrium for some kind or “species” of
particle between two systems or bodies corresponds to both systems or bodies having the same chemical
potential for that species. Here, of course, so far we have only been considering one kind or species of
particle. We can generalize this discussion to multiple different kinds of particles, in which case each
kind of particle can have its own chemical potential.
An equivalent alternative definition of the chemical potential is
µC = (∂U/∂N)S   (5.71)
That is, the chemical potential is the rate of change of the energy of the system per added particle of the
given species, when we hold the entropy constant during this addition. This is the definition of chemical
potential as originally introduced by Gibbs. He proposed the chemical potential to be the increase in
energy per unit quantity of substance added, under conditions of constant entropy. This concept of an
amount of energy as a function of the amount of some chemical substance justifies the idea of calling it
a “chemical potential”.
It is, however, by no means obvious physically why Gibbs’ definition, Eq. (5.71), would be equivalent
to the chemical potential definition in Eq. (5.70). We can now prove the equivalence of Eqs. (5.70) and
(5.71) algebraically; like many relations involving partial derivatives in thermodynamics, however, it
can be quite difficult to understand the equivalence of such expressions in a direct physical way.
To see the relation between the two definitions, consider the differential of entropy σ for a system with
energy U and particle number N. Then, choosing to consider a situation of constant entropy, we would
have
dσ = (∂σ/∂U)N dU + (∂σ/∂N)U dN = 0   (5.72)
Now, we are interested in finding a partial derivative at constant entropy, as in Eq. (5.71). So indeed the
fact that we set dσ = 0 in Eq. (5.72) means that, if we essentially31 divide both sides by dN , the
30 Typically, chemical potential is just written as µ, without the subscript “C” we have used here, but we introduced the subscript to avoid confusion with other uses of µ in this text.
31 The idea of “dividing by dN” to form a derivative like this is technically what is sometimes called an “abuse of notation”. This Leibniz notation of writing derivatives as ratios of infinitesimal quantities (as in dy/dx) tempts us to do this, and we can often get away with it, for example in a “chain rule” where we write (dy/dz)(dz/dx) = dy/dx, and just “cancel” the dz on the top and bottom lines. (Here z had to be a function of just x, i.e., z(x).) For expressions like Eq. (5.72), we can be more formally correct in our proof if we rewrite first of all in terms of actual small changes ∆σ, ∆U, and ∆N, which gives ∆σ = (∂σ/∂U)N ∆U + (∂σ/∂N)U ∆N = 0. Then we can quite legally divide by ∆N, giving (∂σ/∂U)N (∆U/∆N)S + (∂σ/∂N)U = 0. We then formally take the limit as ∆N and the resulting ∆U tend to zero, giving (∂σ/∂U)N (∂U/∂N)S + (∂σ/∂N)U = 0, or, equivalently, Eq. (5.73).
resulting ratio of dU over dN is one taken at constant entropy (which we can view as either σ or S, since they are just related by a constant). So technically that ratio is the partial derivative (∂U/∂N)S. Hence
we have, after rearrangement
(∂σ/∂U)N (∂U/∂N)S = −(∂σ/∂N)U   (5.73)
so, using Eq. (5.40) for the temperature,
(∂U/∂N)S = −τ (∂σ/∂N)U   (5.74)
which, comparing with Eq. (5.70), is just the chemical potential µC, proving the equivalence of the two definitions.
As we mentioned, this relation will be useful to us later when thinking about semiconductor devices. If
we take some piece of material containing electrons and change the voltage applied to it by some amount
V, we correspondingly are just adding the same amount of energy (of magnitude e × V , where e is the
magnitude of the electron charge) to every electron without making any other changes. So any derivative like (∂U/∂N)S just increases by that same additive amount. Hence changing voltage just changes the
chemical potential by an amount of magnitude e × V . The chemical potential for electrons in
semiconductors is known more commonly as the “Fermi level”, so this tells us that the Fermi level just
moves with the applied voltage. We use that idea very frequently in discussing electronic and
optoelectronic devices, and we will return to this point later.
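A small numerical sketch (my own illustration, with made-up energies, using the Fermi-Dirac distribution of Eq. (5.87) that appears in the next section) shows how shifting the Fermi level by an amount of magnitude e × V changes occupation:
```python
import math

k_B_eV = 8.617e-5   # Boltzmann's constant in eV/K

def f_FD(E, E_F, T):
    # Fermi-Dirac occupation probability, Eq. (5.87); energies in eV
    return 1.0 / (math.exp((E - E_F) / (k_B_eV * T)) + 1.0)

E, E_F, T = 0.30, 0.25, 300.0   # a state 50 meV above the Fermi level
print(f_FD(E, E_F, T))          # sparsely occupied, ~0.13

# a 0.1 V change in applied voltage shifts every electron energy, and hence
# the Fermi level relative to this state, by 0.1 eV in magnitude
print(f_FD(E, E_F + 0.10, T))   # the same state is now well occupied, ~0.87
```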
32 There are various different equivalent ways of deriving and of defining chemical potential. Some of these discuss
derivatives of other quantities, sometimes known as thermodynamic potentials, such as Helmholtz free energy,
Gibbs free energy, and enthalpy. Though those thermodynamic potentials are very convenient for particular
important classes of problems, they are not as fundamental as energy and entropy; thermodynamics can in principle
be understood without them. We restricted ourselves here to using just energy and entropy, and we performed our
derivation of chemical potential without using those other quantities. As a result, our derivation above may look
different from many others – for example, it is common to use the Helmholtz free energy in the proof of Eq. (5.74)
– and our (somewhat original) derivation here is arguably more compact than most.
Consider now a system M in thermal and diffusive contact with a large reservoir R. Now, in addition to
being able to transfer energy between the system and the reservoir, we can also transfer particles. The
entire closed system of reservoir R and system M has No identical particles, and has energy Uo. When
the system M has N particles, then the reservoir has N o − N particles, and when the system M has
energy ε M , the reservoir has energy U o − ε M .
Just as for the derivation of the Boltzmann factor, we consider the system M to be in a particular state, in which it has energy εM and N particles. We consider two specific microstates: state 1, with N1 particles and energy ε1, and state 2, with N2 particles and energy ε2. By the same argument as before, the ratio of the probabilities of finding M in these two states is the ratio of the corresponding multiplicities33
P(N1, ε1)/P(N2, ε2) = g(No − N1, Uo − ε1) / g(No − N2, Uo − ε2)   (5.75)
= exp(∆σ)   (5.76)
where
∆σ = σ(No − N1, Uo − ε1) − σ(No − N2, Uo − ε2)   (5.77)
Just as we did for the Boltzmann factor derivation, we expand σ ( N o − N ,U o − ε ) in a Taylor series,
now about the values No and Uo, with the difference that this time we need to expand in two variables,
N and ε, to obtain
σ(No − N, Uo − ε) = σ(No, Uo) − N (∂σ/∂No)Uo − ε (∂σ/∂Uo)No + …   (5.78)
Assuming that the reservoir is very large, we may neglect all higher order terms in the expansion, just
as we did in the Boltzmann factor derivation, to obtain, in the case of a very large reservoir
33 Each g here might look as if it is only the multiplicity of the reservoir, but since the system M is in a specific microstate, and hence has multiplicity 1, these are the multiplicities of the combined system.
∆σ = −(N1 − N2) (∂σ/∂No)Uo − (ε1 − ε2) (∂σ/∂Uo)No   (5.79)
So, using our definitions of temperature, Eq. (5.40), and chemical potential Eq. (5.70), we have
∆σ = (N1 − N2) µC/τ − (ε1 − ε2)/τ   (5.80)
Finally, from Eq. (5.76)
P(N1, ε1)/P(N2, ε2) = exp[(N1µC − ε1)/τ] / exp[(N2µC − ε2)/τ]   (5.81)
The factor that expresses this relative probability of occupation of states separated by energy ε and by
population difference N is called
the Gibbs factor, exp[(NµC − ε)/τ]
which is the un-normalized probability35 that system M is in a state of energy ε and particle number N.
34 The absolute energies of these states do not matter since we will only be considering energy differences, so the choice of this energy to be zero is of no particular significance as long as the energy difference here is ε.
35 Suppose, for example, that when I buy shirts, for each 4 white shirts I buy, I always buy 3 blue shirts and one red shirt. I have bought so many shirts, however, that I can’t remember how many I have. I want to know the probability that, when I select a shirt at random, it is a blue shirt. The “unnormalized” or “relative” probabilities here are obviously: white – 4; blue – 3; red – 1. Obviously also, these relative probabilities do not add up to one (in fact, in this case, they are each 1 or larger). I can find the actual (“normalized”) probability by dividing by the sum of all the “unnormalized” or “relative” probabilities. So, the probability of choosing a blue shirt at random is 3/(4 + 3 + 1) = 3/8.
5.8 Distribution functions
Fermi-Dirac distribution
To derive the thermal distribution for fermions, such as electrons, consider a single quantum state of energy ε that, because of Pauli exclusion, can contain at most one fermion. There are then only two possibilities: the state is empty (N = 0), which we take to have energy 034, or it is occupied (N = 1), with energy ε.
The Gibbs factor for energy 0 and N = 0 is exp[(0 × µC − 0)/τ] = 1, and that for energy ε and N = 1 is exp[(1 × µC − ε)/τ], so the sum of these relative probabilities, the partition function Z, is
Z = 1 + exp[(µC − ε)/τ]   (5.82)
To get the absolute probability for occupation of this state of energy ε, then, we divide the relative probability (the Gibbs factor) for the occupation of this state of energy ε – that is, exp[(µC − ε)/τ] – by the sum of these two relative probabilities – that is, the partition function Z above. So, the (absolute)
probability that this state is occupied by a fermion (such as an electron) is
P(N = 1, ε) ≡ f(ε) = exp[(µC − ε)/τ] / (1 + exp[(µC − ε)/τ])   (5.83)
which is called the Fermi-Dirac distribution, as plotted in Fig. 5.9 and Fig. 5.10.
Fig. 5.9. Fermi-Dirac (FD), Maxwell-Boltzmann (MB), and Bose-Einstein (BE) distributions,
plotted as a function of energy E relative to the chemical potential µC (or Fermi energy for the
Fermi-Dirac distribution).
Conventionally, we use the normal definition of temperature, and the chemical potential for electrons is
nearly always referred to as the Fermi energy (or Fermi level)36, notated as EF. We should note that the
Fermi energy (or Fermi level) and the chemical potential are exactly the same concept for electrons, that
is
EF ≡ µC   (5.86)
36 The terms “Fermi level” and “Fermi energy” are used interchangeably; they have the same meaning, and they both also mean the chemical potential for fermions.
With these conventions, the Fermi-Dirac distribution for electrons and an electron state of energy E is
commonly written as
fFD(E) = 1 / {exp[(E − EF)/kBT] + 1}   (5.87)
This relation is used extensively in the physics of semiconductors, for example, forming the basis for
the calculations for many electronic and optical devices.
Fig. 5.10. Fermi-Dirac function for temperatures of 0 K, 30 K, 100 K, and 300 K, plotted against
the energy relative to the Fermi energy, in meV.
We see in Fig. 5.9 and Fig. 5.10 several characteristic features of the Fermi-Dirac distribution. For states
of energies significantly below the Fermi level, the probability of occupation of any such state
approaches 1. (It cannot exceed 1, of course, because we cannot have more than one electron in a state
by Pauli exclusion.) Conversely, for states of energies much larger than the Fermi level, the probability
of occupation approaches zero. States of energies within a few kBT of the Fermi level have intermediate
probabilities of being occupied. For a state of energy exactly equal to the Fermi level, the probability of
occupation is ½, which is sometimes used as a practical definition of the Fermi level.
Note from Fig. 5.10 that, at or very close to absolute zero, any states below the Fermi level are essentially
all occupied and any states above the Fermi level are essentially all empty37. With increasing
temperature, the curve “softens”, acquiring a transition region round about the Fermi energy or chemical
potential, but always going through the point ½ at the energy corresponding to the Fermi energy or
chemical potential. In general, we expect some moderate probability of finding electrons in states within
a few kBT above the Fermi level, and we similarly expect that within a few kBT below the Fermi level,
there is some moderate probability that such states are not occupied by electrons.
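These features are quick to verify numerically (a sketch of my own; the energies and temperatures are arbitrary illustrative values):
```python
import math

k_B_meV = 8.617e-2   # Boltzmann's constant in meV/K

def f_FD(dE_meV, T):
    # occupation vs energy measured from the Fermi level, in meV, Eq. (5.87)
    return 1.0 / (math.exp(dE_meV / (k_B_meV * T)) + 1.0)

for T in (30.0, 100.0, 300.0):
    # 100 meV below, exactly at, and 100 meV above the Fermi level
    print(T, [round(f_FD(dE, T), 4) for dE in (-100.0, 0.0, +100.0)])
```
At 30 K the function is essentially a step (1, ½, 0), while at 300 K it has visibly “softened”, always passing through ½ at the Fermi level itself.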
37 It is not quite correct to define the Fermi level as that energy such that, at zero temperature, all states below it are occupied and all states above it are empty; though that is correct if the temperature is indeed zero or close to it, as the temperature changes, even for the same total number of particles in the system, the Fermi level will in general change at least slightly with temperature. This subtlety arises if the distribution of possible states is not uniform in energy. For example, suppose there is a very large number of states just above the energy of the Fermi level at zero temperature. Those states would all be unoccupied at zero temperature, of course. But as the temperature is increased, they would start to become occupied, and the Fermi level would tend to move down as required to keep the total number of particles in the system constant.
Bose-Einstein distribution
We could follow a similar argument, starting with the Gibbs factor, to derive the thermal distribution
for bosons. In this case, the partition function – the summation over the relative probabilities for the
various possible states – is more complicated because we are not restricted to just N = 0 or N = 1; we
can have any number of bosons in a given boson “mode”. We will not give that derivation here (though
it is not particularly hard to derive it). The result, which is known as the Bose-Einstein distribution, is,
however, simple to state. The expected number of bosons in a boson mode with energy E per boson is
fBE(E) = 1 / {exp[(E − µC)/kBT] − 1}   (5.88)
Unlike the use of the term “Fermi energy” for the chemical potential for fermions, there is no particular
other name for the chemical potential in the case of bosons. This distribution is also plotted in Fig. 5.9.
Note that, though we can think of the Fermi-Dirac distribution as giving the probability that a given state is occupied, for the Bose-Einstein distribution, because fBE can certainly be greater than 1, we have
to think of fBE as being the expected number of bosons in the boson mode rather than the probability the
mode is occupied38.
Planck distribution
Compared to the Fermi-Dirac distribution, the Bose-Einstein distribution is used relatively infrequently
in its full form as stated here with the chemical potential. That is because most of the bosons we have to
deal with in applications, especially in devices, are photons – the quanta of light – or phonons – the
quanta of vibrations – or other such quanta associated with oscillations of some kind.
Such bosons are particularly simple in that there is only one possible state for a given such boson in a
given “mode”; unlike, say, an atom, there are no excited states of these simple bosons. As a result, the
exchange of energy between the system and the reservoir and the exchange of particles between the
system and the reservoir are the same physical process – if we know how many particles in a given mode
we have exchanged, we know what energy we have exchanged, and vice versa. A consequence of that
identicality of processes is that there is no need for two separate parameters, temperature and chemical
potential, to be matched to reach equilibrium; we only need one of them, and by convention we use
temperature.
Particles like photons and phonons therefore obey a simpler version of the Bose-Einstein distribution
called the Planck distribution. We usually write the energy per boson for these simple bosons in the form
E = ℏω   (5.89)
where ω is the angular frequency associated with the boson mode of interest, such as a mode of the
electromagnetic field. Indeed, Eq. (5.89) is the standard expression for photon energy. The Planck
distribution can then be stated as
fP(ℏω) = 1 / [exp(ℏω/kBT) − 1]   (5.90)
and can be considered as giving the number of such bosons per mode.
38 Also, the Bose-Einstein distribution has no meaning for energy less than the chemical potential, since that would
give a negative number for the result.
It is relatively straightforward to derive Planck’s distribution, and we can do that now. Planck’s original
proposal was that matter emitted light of frequency ν in quanta of size hν. Subsequently, Einstein
proposed that light of a given frequency existed only in such quanta. At the time that was a radical
proposal because light was believed to be an electromagnetic wave, and such quantization would seem
to be in conflict with the apparent continuous nature of waves.
Part of the resolution of this conflict lies in the notion that the photons exist in modes of the light field,
and those modes have the same functional forms as the modes we would calculate classically for
electromagnetic waves in resonators or in propagating wave modes. Our modern way of looking at this
is to say that a given mode of the electromagnetic field can exist in different levels of excitation, with those levels separated by an energy E = hν ≡ ℏω.
Fig. 5.11. Illustration of the possible energy levels for a mode of the electromagnetic field for which the photon energy is ℏω.
The possible energy levels for an electromagnetic mode are sketched in Fig. 5.11 for the first few such
levels. We can identify the integer q with the number of photons in the mode if we like, and the energy
of such a mode can be written as
εq = qhν ≡ qℏω   (5.91)
where q is zero or a positive integer. It actually makes no difference if we regard q as being the “level
of excitation” of the mode or as being the number of photons in the mode; these are just different forms
of words for the same underlying physics.
We could add a constant to all the energies εq if we wanted39; that would make no difference again to
the physics here because we will only be concerned with energy differences. For simplicity, we use the
form in Eq. (5.91) here.
We presume that, in equilibrium with a reservoir at temperature T, the relative occupation probability
of a state of energy εq compared to the relative occupation probability of a state of energy 0 is given by the Boltzmann factor exp(−εq/kBT). To get the absolute probability of finding the oscillator in the state of energy εq = qℏω, we need to normalize the relative probability by dividing by the sum (the partition function) Z of the relative probabilities of all the possible states – that is, we divide by the sum
Z = ∑ (q = 0 to ∞) exp(−qℏω/kBT)   (5.92)
39 In part because we can derive the formal quantum mechanics of light by considering each mode of the electromagnetic field as being a kind of harmonic oscillator, it is common to add ℏω/2 to the energies in Eq. (5.91) because then the energy zero corresponds with the zero of “potential energy” in the oscillator, but whether or not we add this constant or any other makes no difference here to the final result.
This sum in Eq. (5.92) is just a geometric series – that is, a sum in which each term differs from the
previous one by a constant factor, here exp(−ℏω/k_BT). The sum of such a series is a standard result,
giving
Z = 1 / [1 − exp(−ℏω/k_BT)]    (5.93)
The average photon number ⟨q⟩ is then the sum of q times its occupation probability, that is,
⟨q⟩ = (1/Z) ∑_{q=0}^∞ q exp(−qℏω/k_BT). Using the standard result ∑_{q=0}^∞ q x^q = x/(1 − x)², with
x = exp(−ℏω/k_BT), we have

⟨q⟩ = (1/Z) exp(−ℏω/k_BT) / [1 − exp(−ℏω/k_BT)]² = exp(−ℏω/k_BT) / [1 − exp(−ℏω/k_BT)]    (5.97)
So, rearranging Eq. (5.97), the average number of photons per mode in thermal equilibrium at
temperature T is the Planck distribution
⟨q⟩ = 1 / [exp(ℏω/k_BT) − 1]    (5.98)
which we see is exactly the same as Eq. (5.90). We see, as proposed, that the Planck distribution is a
Bose-Einstein distribution with chemical potential of zero. Note incidentally that ⟨q⟩ need not be an
integer since it is an average number.
This expression Eq. (5.98) is not yet the spectrum of the emitted light from an object, but we have
completed a major step towards that result, and we return to that topic in Chapter 7. The Planck
distribution is quite widely useful for many different sorts of vibrating systems, including also vibrations
in solids, where the associated quanta are called phonons.
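For readers who want a quick numerical feel for Eq. (5.98), the short Python sketch below evaluates the average photon number per mode; it is an illustration only, and the function name and example numbers are ours rather than from the text:

    import math

    hbar = 1.054571817e-34  # reduced Planck constant, J s
    kB = 1.380649e-23       # Boltzmann constant, J/K
    c = 2.99792458e8        # speed of light, m/s

    def planck_mean_photons(omega, T):
        """Average boson number per mode, Eq. (5.98): 1/(exp(hbar*omega/(kB*T)) - 1)."""
        return 1.0 / math.expm1(hbar * omega / (kB * T))

    # Example: a mode of wavelength 10 um at room temperature (300 K)
    omega = 2 * math.pi * c / 10e-6
    print(planck_mean_photons(omega, 300.0))  # ~0.008 photons per mode on average

For visible-light modes at room temperature, the same function gives numbers that are fantastically small, which is part of why warm objects do not glow visibly.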
For energies sufficiently far above the chemical potential that exp[(E − µ)/k_BT] ≫ 1, both the
Fermi-Dirac and Bose-Einstein distributions tend to the same simple exponential form

f_MB(E) = A exp(−E/k_BT)    (5.100)

where

A = exp(µ_C/k_BT)    (5.101)
This distribution, which is also plotted in Fig. 5.9, was originally derived for classical particles, which
are presumed to be non-identical40, and is known as the Maxwell-Boltzmann distribution41. In practice
with the Maxwell-Boltzmann distribution, A is just regarded as a number that is chosen so as to give the
correct total number of particles in the system, though it can formally be related to the chemical potential
for either fermions or bosons as given by this expression.
We tend to use this Maxwell-Boltzmann limit wherever we can reasonably get away with it because it
leads to much simpler mathematics. With the full Fermi-Dirac distribution it is often much harder to
integrate over energy – for example, when we are trying to count the total number of particles in the
system.
At high energies, the Fermi-Dirac and Bose-Einstein distributions tend to this Maxwell-Boltzmann
distribution for non-identical particles because this limit applies when the probability of finding such a
high-energy state occupied is very small. In that case, because the probability of occupation is much
less than 1, the issue of whether we can or cannot have multiple different particles in a state practically
does not arise. So, the differences between the counting of fermion states (with their Pauli exclusion
principle), boson states, and the states of non-identical particles make a negligible difference to the end
result.
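A few lines of code make this convergence concrete; this is a minimal sketch of our own, in units where k_BT = 1 and with the chemical potential at zero:

    import numpy as np

    # Occupation probability of a state at energy E above the chemical
    # potential, in units where kB*T = 1
    def fermi_dirac(E):
        return 1.0 / (np.exp(E) + 1.0)

    def bose_einstein(E):
        return 1.0 / np.expm1(E)

    def maxwell_boltzmann(E):
        return np.exp(-E)

    for E in (1.0, 3.0, 10.0):
        print(E, fermi_dirac(E), bose_einstein(E), maxwell_boltzmann(E))
    # At E = 10 kB*T the three agree to within about five parts in 10^5;
    # at E = 1 kB*T they are still very different.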
If we compare the Fermi-Dirac, Maxwell-Boltzmann, and Bose-Einstein distributions in Fig. 5.9, we
see that the Bose-Einstein distribution lies above the Maxwell-Boltzmann distribution. We can
rationalize this by noting that identical bosons are more likely to be found in the same mode than are
classical or non-identical particles; this agrees with the counting we found when comparing (i) the
number of possible states of two dollar bills in two boxes, where 2 out of 4 possible states had the two
bills in the same box, and (ii) the number of possible states of two dollars in two bank accounts, where
2 out of 3 states had the two dollars in the same bank account (there being only one way of having one
dollar in one bank account and one in the other). Of course, both the Maxwell-Boltzmann and the Bose-
Einstein distributions lie above the Fermi-Dirac distribution because for that distribution there is no
possibility of finding two fermions in the one state.
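This counting is small enough to check by brute force; the sketch below (our own illustration) simply enumerates the states:

    from itertools import product

    # Two distinguishable "dollar bills", each placed in box 1 or box 2
    bills = list(product((1, 2), repeat=2))      # (box of bill A, box of bill B)
    print(sum(b[0] == b[1] for b in bills), "of", len(bills))  # 2 of 4 together

    # Two identical "dollars" in two accounts: only occupation numbers matter
    dollars = [(n, 2 - n) for n in range(3)]     # (account 1, account 2)
    print(sum(2 in d for d in dollars), "of", len(dollars))    # 2 of 3 together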
Problems
5.8.1 A system can be in one of four states. These are
(1) No particles, zero energy
(2) One particle, energy ε1
(3) One particle, energy ε2
(4) Two particles, energy ε1 + ε2 + I
where ε2 > ε1 > 0 and I < 0. (State (4) might correspond, for example, to a bound state of the two
particles.) The system is in thermal and diffusive contact with a large reservoir. All the particles
here are the same kind of particle. The temperature of the reservoir is T and its chemical potential
for this kind of particle is µ.
40
It is actually quite hard work to derive it because of this presumption of non-identicality, and it is arguably easier
to write it down as a limit of the Fermi-Dirac and Bose-Einstein distributions that are based on identical particles.
We could even take the view that there are no “non-identical” particles out there anyway, just different species of
identical particles, so maybe the Fermi-Dirac and Bose-Einstein distributions are the only correct ones.
41
We can see an obvious simple apparent relation between this Maxwell-Boltzmann distribution and the
Boltzmann factor, though the simplicity of this end result is somewhat deceptive since the full derivation of the
distribution is more complicated than this relation might suggest.
(a) Write down the partition function Z of the system. [Hint: the partition function is a sum of
the un-normalized or relative probabilities of the various states, where those un-normalized
or relative probabilities are given by some appropriate factor. Given that we are working
with a system that can exchange both energy and particles with a reservoir, what factor is
appropriate here?]
(b) Give an expression for N(T, µ), the expected number of particles in the system. [Hint: we
can work out actual probabilities Pj for various states j of the system by dividing relative
probabilities by the sum of all the relative probabilities. We can work out the average or
“expected” value of some quantity by multiplying each possible value of the quantity by
its probability, and adding up these results.]
5.8.2 In a semiconductor, we find that the probability that the lowest state in the conduction band is
occupied by an electron is 10⁻² at room temperature (300 K). Where is the Fermi level relative to
the bottom of the conduction band? Express your answer in meV, and be explicit about the sign
of the energy of this Fermi level, stating whether this Fermi level lies in the band gap region
(presuming the band gap energy is large) or above the bottom of the conduction band.
5.8.3 Consider a metal with Fermi energy E_F = 5.0 eV at room temperature T = 293 K.
(a) Based on the Fermi-Dirac distribution, what is the probability of finding an electron in a
specific state with each of the following energies:
(i) E1 = 5.025 eV
(ii) E2 = 5.050 eV
(iii) E3 = 5.075 eV
(iv) E4 = 5.0 eV
(v) E5 = 4.975 eV
(vi) E6 = 4.950 eV
(b) Repeat the above calculations, but for a temperature of T = 77 K (the temperature of boiling
liquid nitrogen). (You may presume the Fermi level still has the same value).
(c) What is the percentage error when using the Maxwell-Boltzmann distribution instead of the
Fermi-Dirac distribution for (i), (ii) and (iii) in the 293 K case?
(d) Repeat (c) for the 77 K case.
(e) Can we use the Maxwell-Boltzmann distribution as an approximation for cases (v) and (vi),
for example, for 77 K? Justify your answer.
5.8.4 Suppose we have an optical or electromagnetic resonator, which we can consider to be some box
with highly reflecting walls. Suppose this resonator is in thermal equilibrium with some heat
reservoir. Consider various resonant modes of this box as stated below, and state the average
number of photons in the mode at the given reservoir temperatures T in each case.
(a) A mode of wavelength 400 nm (which is blue/violet light) at T = 5800 K (the temperature
of the Sun)
(b) A mode of wavelength 10 µm at T = 5800 K
(c) A mode of wavelength 10 µm at T = 300 K (room temperature)
(d) A mode of frequency 1 GHz (a typical radio frequency) at T = 5800 K
(e) A mode of frequency 1 GHz at T = 300 K
(f) A mode of frequency 1 GHz at T = 2.725 K (the background temperature of the universe)
5.8.5 In solids, there are many different vibrational modes possible for the crystal. These various
modes, known as phonon modes, can be viewed as obeying the same kind of statistics for the
particles we call phonons as photons do in electromagnetic modes. One particular type of
vibrational mode is called an “optical phonon” mode. If the optical phonon corresponding to a specific
optical phonon vibrational mode has an energy of 36 meV, in thermal equilibrium at room
temperature, how many phonons are there on average in this mode?
[Notes for interest only (not needed for the question!): Understanding the energy stored in such
modes can be useful for many purposes, including calculating the specific heat of materials –
essentially how much energy it takes to heat them up. This “optical” name has relatively little to
do with optics, in fact. In such modes, adjacent atoms vibrate in opposite directions and some
such modes can be excited by shining light on materials in which the adjacent atoms are of
different kinds because the different amounts of charge on different adjacent atoms lead to them
being pulled in opposite directions.]
5.9 Conclusions
In this chapter, we started out with the simple example of tossing coins. That example illustrates that
nearly all the possible sequences we will encounter share a common characteristic: they have almost
exactly equal numbers of heads and tails. We introduced the concepts of a microstate – the equivalent
of a particular sequence of heads and tails – and a macrostate – the group of microstates that all have
the same number of heads and tails. We saw, especially with the convenience of a Gaussian
approximation for evaluating the numbers of microstates, that as the numbers became large, this
tendency to find nearly all microstates within a narrow range of “most-likely” macrostates became
overwhelming. Even when we constrained the system to the more physical situation of requiring the
microstates to have a specific energy, we found this concept was retained.
We can usefully define the concept of entropy as the logarithm of the multiplicity of the system – the number
of “accessible” microstates. When we allow systems to share energy, we find the dominant macrostates
for the two systems correspond to ones in which the rate of change of entropy with energy is the same
in both systems, and we identify that quantity with 1/temperature. We see that heat flow from hot to
cold corresponds to an increase in entropy, and that in practical cases, it is essentially impossible for
heat to flow from cold to hot simply because there are massively more accessible outcomes if it flows
from hot to cold. The second law of thermodynamics is seen to be essentially the law of increase of
entropy, which follows from the observation that systems will tend towards those macrostates of largest
multiplicity.
We introduced various useful results for analyzing physical systems, including the Boltzmann and Gibbs
factors, and the resulting thermal distributions – Fermi-Dirac, Bose-Einstein (and its simpler Planck
version), and Maxwell-Boltzmann. We can now start to look at some of the consequences of these
distributions for the behavior of the practical world around us.
Bands and electronic devices
6.1 Introduction
Modern electronics exploits the different capabilities of three broad classes of materials – metals,
semiconductors, and insulators. Simply put, the metals conduct the currents to where we want them to
go, the insulators stop the currents from going where we do not want them to go, and the semiconductors
control the currents, turning them on and off. We discussed the basic band structures and concepts of
metals, insulators and semiconductors in Chapter 4, as well as the idea of “holes” – the absences of
electrons in the valence band – which is a particularly important concept for semiconductor devices.
Now we expand on these ideas to set up the background concepts for understanding electronic and
optoelectronic devices.
When used in electronic devices, metals and insulators are often not crystalline materials (though they
can be), but semiconductors mostly are. To get the precise control we need with semiconductors, we
have to use very pure and perfect starting materials. Impurities – small amounts of other elements – or
imperfections (or “defects”) in the crystal lattice can make significant changes in how semiconductor
materials conduct electricity. Controlled “doping” (see Chapter 4 for a first discussion), which adds
small amounts of such other elements to semiconductors that are otherwise very pure, allows us in
practice to add electrons and/or holes into the semiconductor. Such doping is a major part of
semiconductor device design. Such impurities or defects have, however, less effect on electrical
conduction in metals or in the basic ability of insulators to insulate.
The simple models we use for metals and insulators typically treat them as if they were at least
approximately crystalline, though, so we can describe their properties also in terms of bands of states,
at least as a first approximation. Though a deeper understanding of the behavior of insulating and metallic
materials and structures is important in getting the very best performance out of technologies like those
used for semiconductor integrated circuits, the details of the band structures themselves are less critical.
So here we concentrate in some greater depth on key ideas in semiconductors. The combination of band
structure concepts together with thermal distributions – especially the Fermi-Dirac distribution that
applies to electrons and holes – can give us a very strong basis for understanding semiconductor
electronic and optoelectronic devices and the important electrical and optical properties of the materials
we use to make them.
6.2 Transport of carriers

Near a minimum (or maximum) in a band, we can often approximate the band as varying quadratically
with k (a parabolic band structure). Such a model is a good approximation for many bands near minima
or maxima1. Taking the energy at the minimum to be zero for simplicity, such a relation corresponds to
E = ℏ²k² / 2m_eff    (6.1)
(which is the same as Eq. (4.45)), where the effective mass m_eff parametrizes the curvature of the
parabola. This kind of approach works for both electrons and holes2.
Transport of electrons
If we think of an electron in some parabolic minimum (or maximum) in the band as having an energy
relative to the minimum (or maximum) as in Eq. (6.1) with effective mass m_eff, we can rewrite Eq. (6.1)
in terms of crystal momentum of magnitude p_C = ℏk (see Chapter 4) as

E = p_C² / 2m_eff    (6.2)

By analogy with a classical particle, we can then associate a velocity v_g with this crystal momentum
through

p_C = m_eff v_g    (6.4)
Technically, the particular velocity v_g we are working with here is called the “group velocity”3.
1
This isotropic parabolic band approach is a good first approximation near zone center for the valence bands in
many semiconductors, including Si, Ge, and the III-V materials that crystallize in zinc blende form, and for the
lowest conduction band in zinc blende direct gap III-V materials like GaAs or InP. For the lowest conduction
bands in Si and Ge, in part because their lowest points do not lie at zone center, the band minima, though
approximately parabolic in each direction, have one effective mass in one direction and a different one in the other
two. The approach we are taking here can be extended to such cases, though we will not do that here.
2
Electrons in conduction band minima have positive effective masses, often considerably lighter than the actual
electron rest mass – ~10% of the electron rest mass is a common order of magnitude for the electron effective mass
in many direct gap materials, for example. Electrons near maxima in valence bands do actually have negative
effective masses – the parabola is curved downwards, not upwards; however, there we typically think in terms of
holes, and, for a variety of reasons that we will discuss later, those negative electron masses can be considered as
positive hole masses. Hole effective masses can be quite light, though more commonly the highest valence band
has a relatively heavy hole effective mass – ~ 40% of the actual electron rest mass, for example.
3
We are performing a rather informal derivation here, though the result is correct. Normally when we work with
waves such as light waves or sound waves in free space, we think that the wave is propagating at some velocity
v = fλ = ω/k, called the “phase velocity”. Here f is the frequency and λ is the wavelength; correspondingly ω is
the angular frequency and k = 2π/λ is the wavevector magnitude. As long as the phase velocity does not depend
on frequency (or equivalently on wavelength or wavevector), such a picture works, and some pulse we might make
would propagate with this (phase) velocity also. If, however, the phase velocity is different for different
frequencies, which happens at least to a small extent for most wave propagation inside matter, the actual
propagation velocity of a pulse is better described by the “group velocity”, which is v_g = dω/dk. In quantum
mechanics, once we consider time dependence, we can generally associate an energy E with an angular frequency
ω through E = ℏω. In that case, we can more rigorously derive Eq. (6.5) and Eq. (6.9) without even assuming we
are at some parabolic minimum. The actual movement of an electron is better considered in terms of some “pulse”
or “wave packet”, which has to be made up out of some range of k values, and that wave packet does indeed move
at this group velocity.
Suppose now we apply an “external” force F to the moving “particle”, such as the force resulting from
an applied electric field. Then the work done in applying the force through a distance dx is
dE = F dx    (6.6)
The distance dx is equal to the group velocity v_g times the time, dt, for which the force is applied, so
we have
dE = F dx = F v_g dt    (6.7)
Hence
F = (1/v_g) dE/dt = (1/v_g) (dE/dp_C) (dp_C/dt)    (6.8)
Since dE/dp_C = v_g here, Eq. (6.8) reduces to F = dp_C/dt: the external force changes the crystal
momentum just as it would change the momentum of a classical particle. In a simple picture, an electron
then accelerates freely under the force from an applied field E for some average time t_s between
scattering events, reaching a peak velocity (e t_s/m_eff)E. The average velocity of the electron is half
this peak value, since we presume it starts on the average from zero velocity and it accelerates linearly
to this peak value on the average. So on this model the average value of the electron velocity in the
presence of this accelerating field E is

v_av = (e t_s / 2m_eff) E    (6.11)
We see in this simple model that the average velocity of the electron is proportional to the applied
electric field E. Such transport of an electron is called “drift transport”, and v_av is called the “drift
velocity”. Such transport is often written as

v_av = µ_e E    (6.12)

where

µ_e = e t_s / 2m_eff    (6.13)
is called the “mobility”. This kind of “drift” transport, where the electron average velocity is
proportional to the applied electric field, will give rise to behavior like Ohm’s law, in which the current
(which is proportional to electron velocity) through some “resistor” is proportional to the voltage (and
hence the electric field) applied across it.
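To put rough numbers on Eqs. (6.12) and (6.13), here is a small sketch; the effective mass, scattering time, and field strength are illustrative assumptions of ours, not values from the text:

    E_CHARGE = 1.602176634e-19   # electron charge magnitude, C
    M0 = 9.1093837015e-31        # electron rest mass, kg

    # Assumed values: effective mass ~0.07 m0 (of the order found in some
    # direct-gap conduction bands) and a scattering time of 0.1 ps
    m_eff = 0.07 * M0
    t_s = 1e-13                  # s

    mu_e = E_CHARGE * t_s / (2 * m_eff)   # mobility, Eq. (6.13), in m^2/(V s)
    E_field = 1e5                         # assumed applied field, V/m
    v_drift = mu_e * E_field              # drift velocity, Eq. (6.12)

    print(mu_e * 1e4, "cm^2/(V s)")       # ~1.3e3 cm^2/(V s)
    print(v_drift, "m/s")                 # ~1.3e4 m/s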
Transport of holes
At a maximum at the top of the valence band, we see that necessarily the effective mass we would
calculate would be negative. This leads to the apparently counter-intuitive conclusion that the electrons
would go backwards if we pushed them4. Though counter-intuitive, this conclusion is in fact correct.
The group velocity of an electron near such a maximum can indeed be “backwards” from what we would
expect. In some applied electric field, the electrons are going in whatever direction the group velocity
says they should be going in response to the force from an electric field, even if, counter-intuitively, that
is the opposite of the direction we would naively expect.
In terms of current, however, a negative charge going backwards is the same as a positive charge going
forwards. So one way to look at this is to pretend that the absence of an electron in some state in such a
“negative electron effective mass” maximum is behaving like a positive charge that is indeed going in
the same direction as such a positive charge would go in the same electric field. So, though it might
seem obvious that the hole, having an effective positive charge, should go in such a direction, this is the
result of the cancellation of two negative signs; the actual motion is of negatively charged electrons with
negative effective mass going “backwards”. The “absence of an electron” travels in the same direction
as the electrons occupying the states round about it in the band structure.
If this seems confusing5, the reader can take heart that this model, with its cancelling minus signs, works
so well that in practice for devices we can essentially always pretend that the holes behave as positive
charges with positive effective masses. We can also think of the hole kinetic energy as being positive if
we look at the band diagram upside down, just as we will be able similarly to think of positive hole
energies in the hole Fermi-Dirac distribution when looking at the band diagram upside down.
We can similarly analyze hole transport with ballistic and drift models, just like the electron case, and
the “positive hole mass and positive hole charge” effective model continues to work here also.
4
Remember that when we say “electrons” here we are speaking loosely; these are actually effective “quasi-
particles”, with negative charge, that arise from the solution of the Schrödinger equation in a periodic potential;
we are not changing the actual mass of “real” electrons or making it negative.
5
It seems confusing because it is confusing! Device texts are liable to gloss over this annoying point!
Problems
6.2.1 (a) If a piece of GaAs is doped with beryllium (Be), with the Be atoms taking the places of some
of the Ga atoms, is it p-type or n-type? (Justify your answer briefly.)
(b) If silicon atoms are added to a piece of GaAs and they occupy gallium sites, do silicon atoms
work as acceptors or donors? (Justify your answer briefly.)
6
Even if we inject or remove electrons or holes in a semiconductor, whatever population is there of either of them
essentially instantaneously (e.g., in picoseconds or less) settles into a thermal distribution, at least locally. The
reason is that the scattering of electrons or holes off one another is a very strong process (because of the Coulomb
interaction) that very quickly randomizes the energies of the electrons or holes into thermal distributions. The
temperature of that distribution could be different in different parts of the semiconductor. Those “carrier”
temperatures can be different from the background crystal (or “lattice”) temperature. The Fermi levels for electrons
and holes need not be the same, and those Fermi levels can also vary between different parts of a device structure,
but the local distributions will still very likely be thermal distributions, at some temperatures and Fermi levels,
because of this strong scattering.
6.3 Fermi levels and doping
Fig. 6.1. Band diagrams showing electron and hole populations as a function of doping and the
resulting Fermi energy for a hypothetical semiconductor, here shown for k space in one direction.
(Not to scale.) The middle panels presume relatively light n or p doping, so the Fermi level is
below the donor levels or above the acceptor levels. The bottom panel shows heavy “degenerate”
n doping, with the Fermi level significantly inside the conduction band.
Fig. 6.2. Band diagrams showing electron and hole populations as a function of doping and the
resulting Fermi energy for a hypothetical semiconductor, here shown for one direction in real
space. (Not to scale.)
In a pure semiconductor – so an “undoped” semiconductor with no dopant atoms (also known as an
“intrinsic” semiconductor) – we know that the number of electrons in the conduction band must be
equal to the number of holes in the valence band. Otherwise, we would not have electrical neutrality. As
a result, the Fermi energy has to be near the middle of the band gap7.
Semiconductor band gap energies of common semiconductors used for electronic and optoelectronic
devices at room temperature or below are mostly in the range from ~ 0.5 eV to 3.5 eV. Even at room
temperature, which corresponds roughly to k_BT ≈ 25 meV, the band edges for both valence and
conduction bands are therefore > 10 k_BT away from the Fermi level, so the Fermi-Dirac factor is a
small number for valence and conduction band states. At room temperature, though, this can still
correspond to non-negligible overall electron and hole concentrations in such an intrinsic
semiconductor, especially for semiconductors with small band gap energies (“narrow-gap”
semiconductors). At cryogenic temperatures, though, even narrow-gap intrinsic semiconductors will
have very low overall electron and hole concentrations.
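As a very rough numerical illustration of this point, the sketch below evaluates only the Boltzmann-like factor exp(−E_g/2k_BT) that dominates how intrinsic carrier concentrations scale with band gap and temperature; the density-of-states prefactor is deliberately omitted, so these are scaling factors of our own, not carrier densities:

    import math

    KB_EV = 8.617333262e-5  # Boltzmann constant, eV/K

    def suppression(Eg_eV, T):
        """Rough exp(-Eg/(2 kB T)) factor governing how intrinsic carrier
        concentrations scale (density-of-states prefactor omitted)."""
        return math.exp(-Eg_eV / (2 * KB_EV * T))

    for Eg in (0.5, 1.1, 3.5):      # narrow-gap, Si-like, and wide-gap (eV)
        for T in (300.0, 77.0):     # room temperature and liquid nitrogen
            print(f"Eg = {Eg} eV, T = {T} K: {suppression(Eg, T):.2e}")

The factor collapses by many orders of magnitude on cooling, especially for wide-gap materials, consistent with the behavior described above.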
If we dope the semiconductor n-type, we add in “donor” energy levels near the conduction band, and
also potentially a free electron in the conduction band, and we shift the Fermi level somewhere near to
the conduction band edge. Note that the donor energy levels themselves have an occupation probability
given by a Fermi-Dirac factor also, and this can be used, together with the known density of electron
states in the conduction band, to deduce the probability of the donor atoms being “ionized” to give their
electron to the conduction band. Since the separation of the donor levels from the conduction band edge
is typically quite a small number, such as 10 meV, then at low doping densities, the probability of this
ionization is quite large. With increasing n-type doping density, the Fermi level will move progressively
from near the middle of the band gap up towards the conduction band edge.
When the Fermi level is still several k_BT or more below the conduction band edge, the Fermi-Dirac
factor and distribution can be approximated with their mathematically simpler Maxwell-Boltzmann
limit. With further increasing doping density, the Fermi level can also move up past the energy of the
donor levels, and even into the conduction band. If the Fermi level has moved a few k_BT or higher into
the conduction band, we call this situation “degenerate” doping, or a “degenerate” semiconductor. In all
such cases where the Fermi level is closer than a few k_BT from the band edge or when it has moved
above the band edge into the fully “degenerate” situation, the full Fermi-Dirac factor or distribution has
to be used to describe the statistics accurately.
7
If the electron and hole effective masses, and hence densities of states, are not equal, it will not be exactly at the
middle, but it will be near the middle within some amount of energy of the scale of k_BT.
Similar phenomena happen with increasing p-doping, but in the opposite sense. Adding p-doping to a
pure (intrinsic) semiconductor will progressively move the Fermi level from near the middle of the band
gap energy down towards the valence band8.
As a practical matter, when we are deliberately doping a semiconductor material either n-type or p-type,
the Fermi level will essentially always end up within a few k_BT of the edge of the corresponding band
(the conduction band for n-type doping, and the valence band for p-type doping). On the electron-volt-
sized scale on which we draw energy band diagrams, this means that graphically the Fermi energy will
essentially always be very close to the corresponding band edge for such doped materials.
6.4 Diodes
So far, we have discussed isolated semiconductor materials, considering their Fermi levels separately
for each such material. Much of the practical usefulness of semiconductor materials comes when we
join them together. One of the simplest and most useful such structures is a p-n diode, where we join9
an n-type material and a p-type material. We will look first at what happens when we join the n-doped
and p-doped regions, but without applying any voltage to the diode.
8
Degenerate p-type doping is relatively hard to achieve because of the heavy effective mass typically found in the
highest valence band. That means that a very large number of holes would be required to move the Fermi level
substantially into the valence band.
9
In practice, we essentially never just take a piece of n-type material and a piece of p-type material and then
somehow stick them together to make such a diode. Usually such structures are grown either with these regions of
different doping built in during the growth of the crystal itself by varying the added doping during the progressive
growth of the crystal, or the dopants are diffused or implanted into the material after the crystal growth.
10
Note that, before joining, there is no requirement that the number of ionized donors in the n-type material is the
same as the number of ionized acceptors in the p-type material, even if the figure is drawn this way for graphic
simplicity. After joining, the numbers of ionized acceptors and ionized donors in the depletion region of the diode
will, however, be equal, preserving charge neutrality overall.
11
There is in fact a very small number of electrons in the conduction band of the p-type material, and similarly a
very small number of holes in the valence band of the n-type material; we will return to this point when we discuss
diode currents below. These numbers are so small that they do not affect the “electrostatic” argument about charges
and voltages, but they are important in understanding especially “reverse” leakage current in diodes.
When the materials are first joined, the free electrons in the n-type material will start to spread into the
p-type material by random-walk diffusion. We can view the same random walk diffusion process as
taking place for the free holes. So electrons will initially tend to diffuse from right to left and holes will
tend to diffuse from left to right.
Fig. 6.3. Sketch of p and n doped regions before (left) and after (right) joining.
As the electrons and holes diffuse, they upset the charge balance in the regions near the junction of the
two materials. We can view the electrons and the holes in a given region as essentially canceling one
another out; electrons will tend to fill in any “holes” they find, and similarly holes will tend to be filled
in by any electrons they “find”. So in a region near the junction, there will be relatively few electrons
and holes – a region called the depletion region (so-called because it is “depleted” of mobile charges).
The fixed charges from the ionized acceptors and donors are still present in this depletion region, and
these separated positive and negative charges give rise to an electric field, a field that acts in such a
direction as to discourage any further diffusion of the mobile charges by pushing them in the other
direction.
This mechanistic picture is correct, but we can also understand what is happening based on chemical
potentials (here, Fermi levels), which leads to a simple and precise characterization of the result of
joining these materials. From our discussion of equilibration12 when we are able to exchange particles,
we concluded that the chemical potential must be the same in such connected regions when we are in
equilibrium. So, on joining these two materials and allowing them to reach diffusive equilibrium (at the
same temperature), then we know that the chemical potential, or here, equivalently, the Fermi energy,
must be the same throughout the combined structure.
In the n-type material, we know that, at least far from the junction, the Fermi energy will be somewhere
near the conduction band edge in some moderately n-doped material, and in the p-type material it will
be somewhere near the valence band edge in some moderately p-doped material, as sketched in the top
part of Fig. 6.4. Once we join the materials and allow the electrons and holes to diffuse, then we know
that the Fermi level must be the same throughout the structure after we have achieved equilibrium, as
sketched in the bottom part of Fig. 6.4.
To a good approximation, in the “depletion region” in the middle of the combined structure, there is
little or no free charge; only at the very edges of this region do we get to a slightly ambiguous region
where we start to have some amount of free charge in the form of free electrons and holes; that region
is generally quite small in electronic device structures that are at least moderately large, so a simple
approximation is just to assume that this depletion region ends essentially abruptly at both sides. Under
that “depletion approximation” it is relatively easy to solve for the full behavior of the structure,
12
By “equilibration” we simply mean “coming to equilibrium”.
including the precise form of the “bending” of the band edges inside the depletion region, though we
will not do that here.
Fig. 6.4. Sketch of the band structure as a function of position before (top) and after (bottom)
joining p- and n-type semiconductors.
Applying voltage
To understand from our statistical physics what happens when we apply voltage to a diode, we need the
other equivalent definition of chemical potential that we introduced previously in Eq. (5.71) and which
is proved there. In this definition, the chemical potential also corresponds to the amount of energy added
per particle as we add particles, provided that in doing so we keep the entropy constant; that is, restating
this relation here,
µ_C ≡ (∂U/∂N)_σ = −τ (∂σ/∂N)_U    (6.15)
So, with this definition, we see that, if we raise the electrostatic potential of a set of charges by raising
the voltage, then the chemical potential associated with adding any such particles rises accordingly.
Merely changing the electrostatic potential seen by a whole collection of charges in a system by
effectively “lifting the system up” in an electrostatic sense does not change the entropy of the system.
So, changing the energy per particle of a set of charges by raising the voltage merely corresponds to
raising the chemical potential of that set of charges by exactly the same amount of energy per particle.
As a result, if we change the voltage in volts by some amount, we merely change the chemical potential
by the particle’s charge times that voltage13. If we choose to write energies in electron-volts, then the
change in energy of an electron when raising the voltage by V volts is simply
∆E = −e × V J ≡ −V eV (6.16)
So, in diagrams like Fig. 6.4 or Fig. 6.5, applying a voltage V to some conducting region is equivalent
to moving the band diagram and the chemical potential for that conducting region by an amount −e × V
joules or an amount numerically equal to −V electron-volts14.
If we apply a “forward” bias to a diode, as illustrated in Fig. 6.5, various things happen. First, we should
note that, because of the “drive” from the voltage source, which is continually now putting energy into
the system, such a situation is no longer one of thermal or diffusive equilibrium15. Outside the depletion
region, but within the n-doped region on the right, however, the electron distribution will still locally be
one in which the material is approximately in thermal equilibrium, and it is meaningful still to define a
Fermi energy for the electrons. The same is true outside the depletion region but within the p-doped
region on the left. These Fermi levels are, however, now different, with the one on the left having been
lowered by the corresponding reduction in electrostatic potential energy for the electrons on the left
compared to those on the right because of the applied voltage V.
Given that the Fermi level for the electrons is lower on the left than it is on the right, electrons will try
to diffuse from the right to the left within the conduction band. Similarly, holes will try to diffuse from the
left to the right because they see a “lower” Fermi level for holes on the right (remembering we have to
“stand on our heads” to visualize what happens for holes). Both of these diffusions added together are
what gives the “forward” current in a diode16.
Fig. 6.5. Band diagram for a diode under forward bias with a voltage V from a battery.
13
Remember that voltage is joules/coulomb. The potential energy change of a particle of charge Q coulombs on
raising its electrostatic potential by an amount V volts is simply QV.
14
Since we are applying a positive voltage to the p material in Fig. 6.5, because of the “−” sign here we are therefore
lowering the band diagram and the Fermi level of the p-doped region in the diagram.
Once some electrons manage to get from the right to the left, they will then essentially equilibrate to
form part of the thermal distribution of electrons on the left, with the corresponding Fermi level on the
left. This equilibration process has to involve what are known as “recombination” processes, by which
an electron can fall from the conduction band to “fill in” a hole in the valence band. Similar processes
happen on the right for holes that manage to diffuse from the left to the right.
The situation for a reverse-biased diode is sketched in Fig. 6.6. Here we see that the barrier for the
previous diffusion of electrons from the n-doped material into the p-doped material has become very
high, and similarly for the diffusion of holes from the p-doped material into the n-doped material, so
this “forward” diffusion current becomes exponentially small. Hence, we obtain the basic “rectifying”
behavior of a diode, which is that it passes current relatively well in one direction and relatively poorly
in the other.
There is, however, a small “reverse” current that flows in the “reverse-biased” diode. In the p-doped
material, there is still a relatively small but non-zero electron concentration in the conduction band.
Electrons in the conduction band in a p-doped material are known as “minority carriers”; by contrast,
holes in a p-doped material are “majority carriers”. There are, of course, very many more majority
carriers than there are minority carriers. Our Fermi-Dirac statistics enables us to calculate just how large
this number of minority carriers is.
We have indicated this small number of (minority carrier) electrons in the conduction band in the p-
doped region just at the edge of the depletion region in Fig. 6.6. (This density of minority carriers extends
through the whole p-doped region but, for clarity, we are only sketching them near the edge of the region
here.) These electrons see a lower Fermi level to their right in the n-doped material, and so they will
tend to diffuse over there17.
Fig. 6.6. Band diagram for a diode under reverse bias with a voltage V from a battery.
A similar process exists for the small number of holes in the valence band in the n-doped material; these
holes in the n-doped material are the minority carriers there (with the electrons in n-doped material being
the majority carriers there). These minority carrier holes will tend to diffuse over to the p-doped material.
17
To be more complete, we could also say that the minority carriers in such a reverse-biased diode diffuse into the
depletion region, and are then “swept” by the strong electric field there, in what may then be more like “drift”
transport, into the other side of the diode. The amount of this current is, however, controlled by the initial diffusion
process, so that process sets its magnitude.
This small current under reverse bias is known as “reverse leakage”, and the sum of these two “minority
carrier diffusion currents”, which is the magnitude of this reverse leakage current, we call the “saturation
current” IS. Note that we cannot completely get rid of this reverse current in diodes; diodes cannot be
absolutely perfect rectifiers.
Fig. 6.7. Sketch of balancing diffusion currents between the two sides of a diode at zero bias.
Minority and majority carriers are sketched explicitly near to the edges of the depletion region,
though these densities extend through the rest of the p-doped region on the left and through the
rest of the n-doped region on the right.
Majority carrier electrons diffuse from the n-doped material into the p-doped material and minority
carrier electrons diffuse from the p-doped material into the n-doped material. Similarly majority carrier
holes diffuse from the p-doped material into the n-doped material and minority carrier holes diffuse
from the n-doped material into the p-doped material. At zero bias, all these diffusion currents balance
out and we have diffusive equilibrium, consistent with the constant Fermi level throughout the structure,
as shown in Fig. 6.4 and Fig. 6.7. So, at zero applied bias (as in Fig. 6.7), the diffusion current from the
majority carriers on each side has to balance that from the minority carriers diffusing in the opposite
direction, so this majority carrier diffusion will lead to a “forward” current of magnitude IS also, in a
direction opposite to the “reverse leakage” minority carrier diffusion.
We will not give a full derivation of the diode current-voltage characteristic here; to do so would require
that we treat the diffusion and recombination processes more quantitatively than we have space to do
here. It is, however, now straightforward to write down the overall form of that characteristic in a simple
model.
Once we forward-bias the diode, as shown in Fig. 6.8, presuming a Maxwell-Boltzmann approximation
for occupation probabilities of electron states, we have increased the occupation probability of all the
electron states in the conduction band in the n-doped semiconductor by a factor exp(eV/k_BT),
because we have raised the Fermi level by an amount of magnitude eV; hence, the majority carrier
diffusion current of electrons from the n-doped material increases by this factor. A similar phenomenon
happens for the majority hole diffusion from the p-doped material. Since the forward diffusion current
for V = 0 was of magnitude I_S (to balance the “reverse leakage” minority carrier diffusion), the forward
diffusion current is now, after multiplying both electron and hole forward diffusion currents by the same
exponential factor, a total of I_S exp(eV/k_BT). To get the net forward current, we merely now need to
subtract the “reverse leakage” minority carrier diffusion current to obtain
I = I_S [exp(eV/k_BT) − 1]    (6.17)
which is the current-voltage characteristic of an “ideal” diode.
Fig. 6.8. Sketch of the diffusion currents under forward bias. The majority carrier diffusion of
both holes from the left and electrons from the right has increased, giving net “forward” current.
In Eq. (6.17), the “−1” term is the “reverse” diffusion of the minority carriers, which gives the reverse
leakage current, and the exp(eV/k_BT) term is the “forward” majority carrier diffusion. In summary,
with no applied voltage, the majority carrier diffusion is balanced by minority carrier diffusion, but with
increasing forward bias, the majority carrier diffusion increases exponentially to give this classic,
approximately exponential curve, as sketched in Fig. 6.9.
Fig. 6.9. Current (I) vs. voltage (V) characteristic of a diode, with current in units of I_S, and
voltage in units of k_BT/e.
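Eq. (6.17) is easy to explore numerically; the sketch below is our own illustration, with an assumed saturation current of 1 nA and k_BT/e = 25 mV:

    import math

    KB_T_EV = 0.025  # thermal voltage kB*T/e at room temperature, in volts
    I_S = 1e-9       # saturation current; 1 nA is an assumed illustrative value

    def diode_current(V):
        """Ideal diode characteristic, Eq. (6.17): I = I_S (exp(eV/kBT) - 1)."""
        return I_S * math.expm1(V / KB_T_EV)

    for V in (-0.1, -0.025, 0.0, 0.025, 0.1, 0.5):
        print(f"V = {V:+.3f} V  ->  I = {diode_current(V):+.3e} A")

Under large reverse bias the current saturates at −I_S, while each additional ln(10) × k_BT/e ≈ 58 mV of forward bias multiplies the forward current by about a factor of 10.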
Problems
6.4.1 In a diode with no bias applied to it, the total number of electrons in the conduction band of the
n-type material has to be the same as the total number of holes in the valence band of the p-type
material. True or false? (Justify your answer briefly.)
6.4.2 Electrons are the minority carriers in a p-doped material. True or false? (Justify your answer
briefly.)
6.4.3 In a diode at zero bias at equilibrium, the Fermi level used to describe the electron distribution
in the p-doped material is at the same energy as the Fermi level used to describe the electron
distribution in the n-type material. True or false? (Justify your answer briefly.)
6.4.4 Consider a specific ideal diode at room temperature (which for this question we can take to
correspond to k_BT = 25 meV). At a large reverse bias (so at some reverse voltage of magnitude
V ≫ 25 mV), the current flowing in this diode has a magnitude of 1 nA.
Suppose now we apply various other voltages to this diode. For each of the following voltages,
what is the magnitude of the current flowing through the diode? For any non-zero answers, state
also whether this current is in the “forward” direction or the “backward” direction (where the
“backward” direction corresponds to the direction of current flow under large reverse bias).
[Note: a positive voltage here corresponds to a forward-bias voltage.]
(i) -25 mV
(ii) 0 V
(iii) 25 mV
(iv) 100 mV
(v) 500 mV
6.4.5 We want to use a semiconductor diode as a temperature sensor. We presume the diode can be
regarded as an “ideal” diode. The current through the diode at large reverse bias is of some
magnitude I_S. We then forward-bias the diode in such a way that, regardless of the temperature
or the resulting voltage across the diode, the (forward) current through the diode is always the
same, at some value I_B. (Such “constant current” biasing could be accomplished, for example,
by biasing through a very large series resistor from a very large (forward) power supply voltage,
or by various other known circuits.)
(i) Derive an expression for the voltage across the diode as a function of temperature T (in
kelvin).
(ii) Suppose for this ideal diode that I_S = 1 nA and that the (forward) bias current is set at
I_B = 1 µA. Presume the temperature changes from room temperature of 20 °C (293.15 K)
to 30 °C. How much does the voltage across the diode change?
6.5 Conclusions
In this chapter, we have seen how the concepts of bands of states and of thermal distributions allow us
to understand much of the basic operation of crystalline semiconductor devices. There are, of course,
many other aspects about the physics of such devices that we have not discussed here, but these core
ideas we have introduced give us many of the concepts and vocabulary that allow us to understand a
broad range of mechanisms and devices in electronics and optoelectronics.
Light and quantum mechanics
7.1 Introduction
The history of the quantum mechanics of light goes back to the very start of quantum mechanics itself,
with Planck’s proposal that light is emitted in quanta of size E = hν ≡ ℏω. Famously, this proposal led
to the correct explanation of the spectrum of light emitted by hot bodies, which in its ideal form is known
as the black-body spectrum. Right from the beginning, then, light brings together the two main topics
of this book – quantum mechanics and the thermal physics of statistical mechanics and thermal distributions.
We now have enough understanding of both the quantum and thermal aspects to derive and explain the
main behaviors of light and how it interacts with matter. We can explain light emission and absorption,
and the relation between them, in everyday bodies; we can understand key practical issues such as the
limitations of conventional light bulbs and the light from the sun; and we can introduce key modern
concepts such as the process of stimulated emission that makes lasers work.
We already derived the thermal distribution – Planck’s distribution – that lies behind the statistical
mechanics and thermodynamics of light in Chapter 5. We will go on to derive key concepts in the
thermodynamics of light, including black-body spectra and Kirchhoff’s law of radiation, and we will
derive Einstein’s famous “A and B coefficient” argument that relates emission and absorption processes,
including stimulated emission. We will start with the photoelectric effect, whose explanation by Einstein
in 1905 was a key step in the development of quantum mechanics in general and of light in particular.
7.2 The photoelectric effect

In the photoelectric effect, light shining on a metal plate can cause electrons to be emitted from it; if we
apply a large enough “stopping” voltage between that plate and a collecting electrode, even the most
energetic emitted electrons, with some maximum kinetic energy, are turned back before reaching the
collecting electrode, and no current can flow. (That kinetic energy in electron-volts would be numerically
equal to the magnitude of the voltage between the plates.)
Fig. 7.2. Explanation of the photoelectric effect. Light is composed of photons of energy hν,
where ν is the frequency. The kinetic energy of the emitted electron is given by the difference
between the photon energy and the work function, φ.
The explanation of the photoelectric effect proposed by Einstein is as shown in Fig. 7.2. We presume
that electrons in a metal sit in some sort of “pool”, which we now understand to be a Fermi-Dirac
distribution, filled up approximately to the Fermi level. There is an energy barrier, called the work
function, φ, here, that electrons have to overcome to get out of the metal. A photon of energy hν,
exceeding the height of the barrier, will correspondingly raise the energy of the electron, allowing it to
be extracted from the metal. The excess energy the electron has from the photon will appear as kinetic
energy of the electron.
In the worst case of the electron traveling directly away from the metal plate towards the collecting
electrode, all of that kinetic energy will correspond to motion in the direction towards the collecting
electrode, and we will need the full “stopping voltage” to slow that electron down to zero velocity so it
does not quite get to the other electrode. In such an experiment, we find the stopping voltage increases
linearly as we increase the light frequency (and hence the photon energy).
The explanation of this effect by Einstein requires not only that light is emitted in quanta by objects,
which was Planck’s proposal, but that it actually also consists of such quanta – the photons of energy
hν.
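The relation between photon energy, work function, and stopping voltage is simple enough to capture in a few lines; this sketch is our own illustration, with an assumed work function and wavelength:

    H_EV = 4.135667696e-15  # Planck constant, eV s
    C = 2.99792458e8        # speed of light, m/s

    def stopping_voltage(wavelength_m, work_function_eV):
        """Magnitude of the stopping voltage in volts: the maximum electron
        kinetic energy in eV is h*nu - phi (no emission if h*nu < phi)."""
        photon_energy_eV = H_EV * C / wavelength_m
        return max(photon_energy_eV - work_function_eV, 0.0)

    # Assumed illustrative values: a 2.3 eV work function and 400 nm light
    print(stopping_voltage(400e-9, 2.3))  # ~0.80 V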
Problems
7.2.1 Suppose that the work function of some metal is 2 eV, and that we illuminate it with blue photons
at a wavelength of 425 nm. What minimum stopping voltage should be required to stop the
emitted electrons from reaching the collecting plate? Specify also the sign of the stopping
voltage, with a positive voltage on the stopping plate relative to the emitting metal being a
positive number. [Hints: you do not need to use the mass of the electron nor do you need to know
the distance from the emitting metal plate to the collecting plate. How much kinetic energy in
electron-volts do you need to “climb uphill” against a stopping voltage of magnitude 1 V?]
7.3 Light and modes

Fig. 7.3. A cubic box or “cavity” of side L, with perfectly reflecting internal walls. Also sketched
are sine waves in one direction that fit within the box.
Because we presume perfectly reflecting walls, we argue that we can take the wave to be zero at these
walls. Of course, in one direction we can have waves of the form sin(k_x x), where k_x would be chosen to
fit integer numbers of half-wavelengths within the box, hitting zero at the two walls in that direction.
The possible values of k_x would then be
k_x = π/L, 2π/L, 3π/L, …, n_xπ/L, …    (7.1)
where n_x is a positive integer.
For three dimensions, we need to generalize, and we propose that the possible modes are of the form
sin(k_x x) sin(k_y y) sin(k_z z), in which integer numbers of half waves fit inside the box in each of the three
dimensions, so k_y and k_z each have sets of values also like those in Eq. (7.1).
Note that, with expressions like Eq. (7.1) for each of k_x, k_y, and k_z, the allowed values for each component
are spaced by π/L. So, if we are interested in some range of k_x of size Δk_x, we should expect to find
expect to find
∆k x L
= ∆k x (7.2)
(π / L ) π
different possible k_x values in that range. We can argue similarly for k_y and k_z. So, for ranges Δk_x, Δk_y,
and Δk_z, we should expect to find
(L/π)(L/π)(L/π) Δk_x Δk_y Δk_z = (L³/π³) Δk_x Δk_y Δk_z ≡ g_k Δk_x Δk_y Δk_z    (7.3)
different possible values of the wave vector
k = k_x x̂ + k_y ŷ + k_z ẑ    (7.4)
1
For large volumes, this idea of a density of states per unit volume does seem to work, but the argument we have
used here (and it is a common one for this problem) is not very satisfactory. It relies on the assertion of some kind
of resonator that does not actually exist (the internally reflecting cube). The reason for the use of the resonator is
that we can formally solve eigenproblems for resonators, which lead to orthogonal sets of resonator modes, and
we need that orthogonality for a meaningful counting. So, since historically we have not had another good way of
getting that orthogonality, as physicists we just think (or hope) we are going to get away with this.
There is another way of approaching the problem which does not require assertion of non-existent resonators, and
gives orthogonal modes in any volume. Arguably, it gives the correct way out of this problem. See D. A. B. Miller,
"Waves, modes, communications, and optics: a tutorial," Adv. Opt. Photon. 11, 679-825 (2019)
https://doi.org/10.1364/AOP.11.000679
7.4 Thermal radiation
where we have summed over the two polarizations possible for each mode of an electromagnetic field
(e.g., horizontal polarization and vertical polarization) to get the factor of 2 here.
We presume the volume or “box” we are considering is large, so the allowed values of the components
k_x, k_y, and k_z are very closely spaced. Hence, we can approximate the sum by an integral where we allow
each component k_x, k_y, and k_z to range from approximately zero to infinity, and we use the density of k-
states g_k = V/π³ of Eq. (7.5). So the total energy per unit volume for all these modes is
U = (2/V) ∑_{k_x} ∑_{k_y} ∑_{k_z} ⟨q⟩ℏω ≈ (2/V) ∫₀^∞ ∫₀^∞ ∫₀^∞ ⟨q⟩ℏω g_k dk_x dk_y dk_z    (7.7)
Nothing in the integral in Eq. (7.7) depends on the direction of the wave vector k, so we can change to
spherical polar coordinates and then perform our “volume” integral using spherical shells of radius k,
surface area 4πk², and thickness dk; we only need one “octant”, that is 1/8, of that spherical shell
because we are only considering positive values of k in each of the three directions. So, in using these
spherical shells we need to divide by 8. Hence we have
U = (2/8V) ∫₀^∞ ⟨q⟩ℏω g_k 4πk² dk = (1/π²) ∫₀^∞ ⟨q⟩ℏω k² dk    (7.8)
For light in free space, for angular frequency ω, we have2 k = ω/c, so changing variables from k to ω,
we have
U = (ℏ/π²c³) ∫₀^∞ ω³ ⟨q⟩ dω    (7.9)
We can think of this integral, Eq. (7.9), as an integral over a quantity u_ω, that is
U = ∫₀^∞ u_ω dω    (7.10)
2
This is equivalent to the relation f = c / λ between the frequency f, wavelength λ, and velocity of light c.
Then u_ω is the energy density per unit (angular) frequency for light in thermal equilibrium at a
temperature T. We can write this explicitly (substituting for ⟨q⟩ from Eq. (5.98)) as
u_ω = (ℏ/π²c³) ω³ / [exp(ℏω/k_BT) − 1]    (7.11)
We can call this expression Planck’s radiation law3. This actually gives the form of the “black-body”
spectrum, though we have a few more conceptual steps to complete that connection. For the moment, it
is still just the energy per unit volume per unit (angular) frequency in a light field in thermal equilibrium
at a given temperature. This function is plotted in Fig. 7.4.
Fig. 7.4. Plot of u_ω, the energy per unit volume per unit angular frequency, in units of
k_B³T³/(π²ℏ²c³), as a function of the photon energy, in units of k_BT.
Note one important point about this Planck spectrum: the exponential function on the bottom line of Eq.
(7.11) grows faster at high frequencies or photon energies than the ω³ term on the top line, which causes
this distribution to fall off at high frequencies. In turn this is because the Boltzmann factor exponentially
decreases the probability of the existence of photons of large photon energies, and hence reduces the
energy per mode. This is the reason why the quantization of light into photons solves the problem of the
“ultraviolet catastrophe” of classical physics, in which the energy in the light would grow without bound
as we considered higher and higher frequencies.
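The peak of this spectrum is easy to locate numerically; the sketch below (our own illustration) works in the dimensionless variable x = ℏω/k_BT, in which Eq. (7.11) is proportional to x³/(eˣ − 1):

    import numpy as np

    def u_x(x):
        """Planck energy density of Eq. (7.11) in the dimensionless variable
        x = hbar*omega/(kB*T), in units of kB^3 T^3/(pi^2 hbar^2 c^3)."""
        return x**3 / np.expm1(x)

    x = np.linspace(1e-6, 20.0, 200001)
    print(x[np.argmax(u_x(x))])  # ~2.821: the peak sits at a photon energy
                                 # of about 2.821 kB*T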
We can also consider this energy density in terms of wavelength. Obviously, whether we integrate
over (angular) frequency ω as in Eq. (7.10), or change the variable to integrate correctly over
wavelength instead, we should get the same total energy density in the electromagnetic field. So, we
could define an energy density per unit wavelength, u_λ, so that
U = ∫₀^∞ u_λ dλ    (7.12)
Formally, we can change the variable in the integration in Eq. (7.10) from ω to λ. Noting that
ω = 2πf = 2πc/λ    (7.13)
3
The form we are giving here is just for the energy density per unit volume. Often, Planck’s radiation law is quoted
as a radiated intensity per unit solid angle (or “spectral radiance”). In that case, one divides this expression by 4π
to account for the 4π steradians of solid angle, and then also presumes that this energy propagates in the direction(s)
of interest at a velocity c (the velocity of light), so we have to multiply by c to get the resulting intensity
and hence the spectral radiance – so, overall, multiplying by c/4π compared to our formula here.
we see first that when ω = 0, λ = ∞, and vice versa. So, noting also that dω/dλ = −2πc/λ², we have

U = ∫₀^∞ u_ω dω = ∫_∞^0 u_ω (dω/dλ) dλ = ∫₀^∞ (2πc/λ²) u_ω dλ    (7.14)
So, therefore, we can identify the quantity u_λ ≡ (2πc/λ²) u_ω. All that remains is to rewrite u_ω in terms of
wavelength λ rather than angular frequency ω. So, from Eq. (7.11) and substituting from Eq. (7.13), we
have
$$u_\lambda = \frac{2\pi c}{\lambda^2}\,\frac{\hbar(2\pi c)^3}{\pi^2 c^3 \lambda^3}\,\frac{1}{\exp(hc/\lambda k_B T) - 1} = \frac{8\pi h c}{\lambda^5}\,\frac{1}{\exp(hc/\lambda k_B T) - 1} \qquad (7.15)$$
Integrating over the spectrum gives the total energy density in the thermal light field. From Eqs. (7.10) and (7.11),

$$U = \frac{\hbar}{\pi^2 c^3}\int_0^\infty \frac{\omega^3}{\exp(\hbar\omega/k_B T) - 1}\, d\omega = \frac{k_B^4 T^4}{\pi^2 \hbar^3 c^3}\int_0^\infty \frac{x^3}{\exp(x) - 1}\, dx \qquad (7.16)$$

In the last step on the right in Eq. (7.16), we changed variables to x = ℏω/kBT. Using the result

$$\int_0^\infty \frac{x^3}{\exp(x) - 1}\, dx = \frac{\pi^4}{15} \qquad (7.17)$$

we obtain

$$U = \frac{\pi^2 k_B^4}{15\, \hbar^3 c^3}\, T^4$$

so the total energy density in thermal radiation grows as the fourth power of the temperature, a behavior closely related to the Stefan⁴-Boltzmann law for the total power radiated by a hot body.
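As a numerical cross-check (a sketch of ours, assuming SciPy is available), we can verify the integral of Eq. (7.17) and evaluate this T⁴ energy density at an illustrative temperature:

```python
import numpy as np
from scipy.integrate import quad

hbar, c, kB = 1.054571726e-34, 299792458.0, 1.3806488e-23

# Check the integral in Eq. (7.17); an upper limit of 100 is effectively infinity here
val, _ = quad(lambda x: x**3 / np.expm1(x), 0.0, 100.0)
print(val, np.pi**4 / 15.0)      # both ~6.4939

# Total thermal energy density U = (pi^2 kB^4 / 15 hbar^3 c^3) T^4, here at T = 300 K
T = 300.0
U = (np.pi**2 * kB**4 / (15.0 * hbar**3 * c**3)) * T**4
print(f"U(300 K) ~ {U:.1e} J/m^3")   # ~6e-6 J/m^3
```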
Problems
7.4.1 Suppose I have a 1 m³ box, which we can think of as having perfectly reflecting walls. I imagine that this box is near the surface of the sun, so it is at the same temperature as the sun, which we take to be 5800 K. How much energy is stored in the electromagnetic radiation inside the box?
7.4.2 Presume my room has a size 3 × 4 × 2.5 m³, and that the electromagnetic field inside the room is in thermal equilibrium with the walls at a temperature of 25°C (and the room is otherwise completely dark). How much energy, in joules, is there in the electromagnetic field in my room?
7.4.3 We can see from Fig. 7.4 that the energy per unit frequency (or angular frequency) (Eq. (7.11)) in a radiation field at thermal equilibrium at a given temperature peaks at just below 3kBT. In fact, after some work (the calculus does not have a particularly simple form), it is possible to show that this photon energy of peak energy per unit frequency is at 2.821 kBT. Calculate this peak photon energy in electron-volts for the following cases:
(i) the temperature of the sun (~ 5800 K)
⁴ Josef Stefan (1835–1893)
Fig. 7.5. A cavity with a small hole, (a) considering light emitted through the hole, (b) considering
light shining into the hole (to be absorbed), and (c) covering the hole with a black (perfectly
absorbing) surface.
Suppose now we close off this hole with a “black” (perfectly absorbing) surface, as shown in Fig. 7.5(c).
In thermal equilibrium, the radiation emitted by this black surface into the resonator must equal the
radiation landing on the black surface from the inside of the resonator. Hence the thermal radiation
emitted from a black surface is equal to the thermal radiation from a small hole in a cavity (at the same
temperature). Hence the Stefan-Boltzmann and Planck radiation laws can allow us to calculate the
thermal radiation emitted from a black surface.
One key point about "black bodies" is that they set a limit on the emission of light by hot bodies in thermal equilibrium. Suppose there were some body that, at thermal equilibrium, could
emit more electromagnetic radiation than a black body. If there were such a body, we could heat up the
cavity, which starts at the same temperature as the body, solely with the radiated thermal energy from
that body. Presuming we could do this to some useful degree, then the cavity would become hotter and
so we would have heat flowing from a cooler body to a hotter body in a closed system, which would
violate the second law of thermodynamics.
These results from the thermal physics of light are fundamentally bad news for incandescent light bulbs.
There is a limit to how much light we can get out of something just by heating it up, and there is nothing
we can do about this limit. If we want more light from a given size of incandescent bulb, we have to
make it hotter, and materials can only be made so hot before they evaporate.
⁵ For a recent discussion and extension of the ideas of this law, see also D. A. B. Miller, L. Zhu, and S. Fan, "Universal modal radiation laws for all thermal emitters," PNAS 114, 4336-4341 (2017), doi:10.1073/pnas.1701606114.
Here, we start with Planck’s radiation law, Eq. (7.11), which we wrote as the energy per unit volume
per unit (angular) frequency. For this argument, we will use the version of that law using ordinary
frequency ν rather than angular frequency, so we substitute using ω = 2πν in Eq. (7.11), and since we
want energy per unit frequency rather than per unit angular frequency, we also multiply the previous
formula, Eq. (7.11), by 2π .
Hence we have the energy per unit volume per unit frequency in the electromagnetic field at thermal
equilibrium, which we write as
$$\rho(\nu) = \frac{8\pi h \nu^3}{c^3}\,\frac{1}{\exp(h\nu/k_B T) - 1} \qquad (7.19)$$
Now, to construct this argument, we presume we have some atoms, all the same kind, and that they have
two possible states, state 1 of energy E1 and state 2 of (greater) energy E2, as sketched in Fig. 7.6(a).
In this collection of atoms, we have N1 (per unit volume) in state 1 and N2 (per unit volume) in state 2,
with everything – atoms and the electromagnetic radiation – in thermal equilibrium at temperature T.
We also expect in thermal equilibrium that the ratio between N2 and N1 should be given by the Boltzmann
factor, so that
$$\frac{N_2}{N_1} = \exp\left(-\frac{E_2 - E_1}{k_B T}\right) \qquad (7.20)$$
We presume the only way atoms change states from state 2 to state 1 is by emitting a photon of energy
hν where
$$h\nu = E_2 - E_1 \qquad (7.21)$$
and similarly the only way the atoms change states from state 1 to state 2 is by absorbing a photon of
energy hν .
Because of Eq. (7.21), we can rewrite the Boltzmann factor as
$$\frac{N_2}{N_1} = \exp\left(-\frac{h\nu}{k_B T}\right) \qquad (7.22)$$
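As a quick illustration (ours, with an illustrative photon energy), this Boltzmann factor is extraordinarily small for visible-light transitions at ordinary temperatures:

```python
import numpy as np

kB_eV = 8.617e-5       # Boltzmann constant in eV/K

h_nu = 2.5             # illustrative photon energy in eV (~500 nm light)
for T in (300.0, 5800.0):
    # Eq. (7.22): relative population of the upper state
    print(f"T = {T:6.0f} K: N2/N1 = {np.exp(-h_nu / (kB_eV * T)):.1e}")
```

At room temperature essentially no atoms sit in the upper state of a visible transition; even at the sun's surface temperature the fraction is below 1%.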
We expect there is an emission process – spontaneous emission – by which an atom in state 2 will emit
a photon and fall to state 1. We expect this process happens independent of the amount of light also
present. We propose that the number of such spontaneous emission transitions is simply proportional to the number of atoms in state 2, N2, and we propose a constant coefficient A21 – an intrinsic property of the atom, not dependent on temperature – giving a spontaneous emission rate A21N2, as sketched in Fig.
7.6(b).
We propose also that there is an absorption process resulting from absorbing photons from the
electromagnetic field, and we propose an absorption rate from the N1 atoms in state 1 for taking them to state 2 that is proportional to the amount of electromagnetic radiation per unit frequency at frequencies near
to ν – in other words, this absorption rate should be proportional to ρ (ν ) in thermal equilibrium – and
we presume some constant coefficient B12 for this process that is an intrinsic temperature-independent
property of the atoms. So we have a total absorption rate B12 N1 ρ (ν ) , as sketched in Fig. 7.6(c).
These proposals so far are perfectly reasonable. If atoms are in their excited state, then it is reasonable
to say they will fall to their lower state, emitting light, even in the dark – we see that all the time with
hot bodies – and it is quite reasonable to say that this (spontaneous) emission process still happens in
the same way even if we shine a light on the material. We might change the populations of the states by
shining the light on the system, but still we can reasonably expect that the strength of this process itself
is not changed by doing so. Similarly, there is some absorption process for light that has a strength
determined by intrinsic properties of the atoms.
Fig. 7.6. (a) Illustration of atoms with two levels, separated by energy hν . (b) The process of
spontaneous emission of photons. (c) The process of absorption of photons. (d) The process of
stimulated emission of photons.
Now, a basic presumption of statistical mechanics is that we have “detailed balance”: in equilibrium,
we must have exactly equal rates for processes in both directions. In equilibrium, if there is some process
that excites systems, then there must be another process that de-excites them, and the rates must balance
overall, otherwise we would not be in equilibrium.
But this detailed balance gives a problem if we stop our model here. Asking for detailed balance so far
means we would be equating the spontaneous emission and the absorption for our thermal equilibrium
system of atoms and radiation, which would give
A21 N 2 = B12 N1 ρ (ν ) (7.23)
By presumption, A21 and B12 are intrinsic coefficients for the atoms, and they do not depend on
temperature, but the right hand side of this equation does depend on temperature, so there is an apparent
contradiction.
Einstein realized a way out of this difficulty. He proposed another process – stimulated emission – a
process just like absorption, but from state 2 to state 1. He proposed another coefficient B21, also
presumed to be an intrinsic temperature-independent atomic property. Hence we obtain an additional
emission rate B21 N 2 ρ (ν ) , as sketched in Fig. 7.6(d). Now detailed balance gives
A21 N 2 + B21 N 2 ρ (ν ) =
B12 N1 ρ (ν ) (7.25)
where the emission terms are on the left, and the absorption terms are on the right. Rewriting Eq. (7.25) gives

$$\rho(\nu) = \frac{A_{21}}{(N_1/N_2)\, B_{12} - B_{21}} = \frac{A_{21}}{B_{12} \exp(h\nu/k_B T) - B_{21}} \qquad (7.26)$$

Comparing this with Planck's law in the form of Eq. (7.19), we see that if we choose

$$B_{21} = B_{12} \qquad (7.27)$$
then we can have detailed balance independent of temperature, and we also obtain the relation
$$A_{21} = \frac{8\pi h \nu^3}{c^3}\, B_{12} \qquad (7.28)$$
So, in summary, if we presume intrinsic microscopic processes for spontaneous emission, with rate
A21 N 2 , and absorption, with rate B12 N1 ρ (ν ) , then to achieve detailed balance we need to propose
stimulated emission, with rate B21 N 2 ρ (ν ) , and we find the results Eqs. (7.27) and (7.28).
Since Eq. (7.28) tells us that the coefficient A21 for spontaneous emission just involves fundamental
constants, the frequency ν, and the coefficient B12, the same microscopic atomic coefficient underlies
all three processes of absorption, spontaneous emission, and stimulated emission. This is a remarkable
result that ties together all the absorption and emission processes.
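We can verify this balance numerically; the following sketch (ours) picks an arbitrary B12, constructs B21 and A21 from Eqs. (7.27) and (7.28), and confirms that Eq. (7.25) then holds at any temperature (the value of B12 cancels):

```python
import numpy as np

h, c, kB = 6.62606957e-34, 299792458.0, 1.3806488e-23

nu  = c / 500e-9      # frequency for ~500 nm light
T   = 5800.0          # any temperature works; the balance is temperature-independent
B12 = 1.0             # arbitrary value; it cancels in the balance

B21 = B12                                      # Eq. (7.27)
A21 = (8.0 * np.pi * h * nu**3 / c**3) * B12   # Eq. (7.28)

rho        = (8.0 * np.pi * h * nu**3 / c**3) / np.expm1(h * nu / (kB * T))  # Eq. (7.19)
N2_over_N1 = np.exp(-h * nu / (kB * T))                                      # Eq. (7.22)

emission   = (A21 + B21 * rho) * N2_over_N1    # emission rate per atom in state 1
absorption = B12 * rho                         # absorption rate per atom in state 1
print(emission, absorption)                    # equal, as Eq. (7.25) requires
```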
We can derive one other simple result here, which is the ratio of spontaneous and stimulated emission.
We note we can write the energy per unit volume per unit frequency, as in Eq. (7.19), as
$$\rho(\nu) = \frac{8\pi h \nu^3}{c^3}\, n_{ph} \qquad (7.29)$$
where nph, which we now think of as the number of photons per mode for a mode of frequency ν, is just the same as the quantity q from the Planck distribution, Eq. (5.98) (q = 1/[exp(ℏω/kBT) − 1]).
Explicitly now, using frequency rather than angular frequency, Eq. (5.98) becomes
$$n_{ph} = \frac{1}{\exp(h\nu/k_B T) - 1} \qquad (7.30)$$
The total spontaneous emission rate per unit volume per unit frequency is, therefore, using Eq. (7.28)
$$R_{sp} = A_{21} N_2 = \frac{8\pi h \nu^3}{c^3}\, B_{12} N_2 \qquad (7.31)$$
and the total stimulated emission rate per unit volume per unit frequency is
$$R_{stim} = B_{12} N_2 \rho(\nu) = \frac{8\pi h \nu^3}{c^3}\, B_{12} N_2\, n_{ph} \qquad (7.32)$$
So

$$\frac{R_{stim}}{R_{sp}} = \frac{(8\pi h \nu^3 / c^3)\, B_{12} N_2\, n_{ph}}{(8\pi h \nu^3 / c^3)\, B_{12} N_2} = n_{ph} \qquad (7.33)$$
where we have also used Eq. (7.27). So we come to the simple conclusion that the ratio of stimulated
to spontaneous emission in a given mode is the number of photons in the mode.
Under normal circumstances, as in the light from the sun, there is not much stimulated emission present,
though it is not quite negligible. For visible radiation, such as for a photon energy ~ 2.5 eV,
corresponding to a wavelength of ~ 500 nm, the number of photons per mode at 5800 K (the approximate
temperature of the sun) directly from the sun is
$$\frac{1}{\exp(h\nu/k_B T) - 1} = \frac{1}{\exp(2.5e/k_B T) - 1} \simeq \exp\left(-\frac{2.5 \times 1.602\times 10^{-19}}{1.38\times 10^{-23} \times 5800}\right) \simeq \exp(-5) \simeq 0.007 \qquad (7.34)$$
so about 0.7% of the sun’s photons at 500 nm are from stimulated emission.
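A one-line version of this estimate (our sketch of Eqs. (7.30) and (7.34)):

```python
import numpy as np

kB_eV = 8.617e-5      # Boltzmann constant in eV/K

def n_ph(h_nu_eV, T):
    """Photons per mode, Eq. (7.30), for photon energy in eV and temperature in K."""
    return 1.0 / np.expm1(h_nu_eV / (kB_eV * T))

print(n_ph(2.5, 5800.0))   # ~0.007, the ~0.7% stimulated fraction of Eq. (7.34)
```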
For light from the sun, we might not regard such an amount of stimulated emission as being very
important, even if it is necessary for the statistical mechanics of light to work out correctly. A key point,
however, is that, though we derived Eq. (7.33) so as to make detailed balance work at thermal
equilibrium, we expect this process of stimulated emission and the result Eq. (7.33) to remain valid even
when we are not at thermal equilibrium. In the case of a laser, we will have very large numbers of
photons in the “lasing” mode, which means the laser emits almost entirely through stimulated emission,
putting yet more photons into the same mode, and giving us laser beams.
Problem
7.6.1 Suppose we have a gas of atoms, each of which has two possible energy states, E1 and E2, with E2 > E1. Suppose that all the atoms are in their upper (E2) state. I come into this gas with n photons, all of an appropriate energy to cause transitions between these levels. All these n photons are in one specific mode. Then, the total rate of emission of photons by the gas into this mode is (on the average) which one of the following possibilities (justify your answer briefly)?
(a) independent of n
(b) proportional to n
(c) proportional to n + 1
[Hint: Look very closely at Eqs. (7.31) to (7.33), and remember we are looking for the total
emission rate into this mode.]
7.7 Conclusions
In this chapter, we have seen that, with the basic presumption that light exists as photons of energy E = hν ≡ ℏω, we can put this together with our general understanding of thermal distributions to
construct a powerful model of light. We can explain the “black-body” spectrum that was such a mystery
at the end of the 19th century, and we can generally explain radiation by hot bodies and the basic relation
between emission and absorption. With Einstein’s A and B coefficient argument, we can even explain
the process of stimulated emission, the core process behind all lasers. With these principles, we are now
in a position to understand a wide range of optical and optoelectronic devices and applications.
Semiconductor optoelectronics
8.1 Introduction
Our ability to work with light has been transformed by devices that enable us quickly and efficiently to
convert between electrical and optical energy and signals. The devices that enable such conversions are
called optoelectronic devices. Viewed broadly, that category includes conventional incandescent and
fluorescent light bulbs, vacuum-based devices like cathode ray tubes and plasma displays and some
specialized photodetectors1, and liquid crystal display technology. A large fraction of all modern
optoelectronic devices are, however, based on semiconductors, and these include a broad range of
photodetectors (including solar cells), optical modulators, light-emitting diodes, and lasers. Here, we
will introduce the physics behind such semiconductor optoelectronics, building on the principles from
previous chapters.
Many of these devices are built using diodes. In electronics, diodes are used first for their rectifying
properties. In semiconductor optoelectronics, however, diodes are used for quite different reasons. First,
they can collect the current of electrons and holes that are generated by absorbing photons inside the
diodes; we use this both for solar cells and for detecting signals on light beams. Second, forward biasing
the diodes injects electrons and holes into the junction region of the diodes, and those electrons and
holes can recombine by emitting photons; we use this process in the light-emitting diodes that give us
efficient sources of illumination, and in semiconductor lasers that we use both for generating the light
for powering optical fiber communications and as an efficient primary source for high power lasers
generally. Third, reverse biasing diodes provides an efficient way of applying large electric fields inside
devices without much current; we use this for several kinds of light modulators to turn light beams on
and off in optical communications – these large electric fields can change the material’s optical
properties, such as its optical absorption strength or its refractive index. Here we will introduce
especially the use of diode structures in detection and light emission.
8.2 Photodetectors and solar cells
Photoconductors
In a semiconductor, absorbing a photon whose photon energy exceeds the band gap energy can take an
electron from the valence band and put it in the conduction band, leaving a hole in the valence band. We
can call this process generation of an “electron-hole pair”. The electrons and holes generated by such
¹ Photomultiplier tubes and related devices like image intensifiers can be in this vacuum photodetector category.
absorption of a photon “across” the band gap energy are sometimes called “photocarriers” and we refer
to them as being “photogenerated”.
Fig. 8.1. Absorbing a photon of energy greater than the band gap energy can create an electron in
the conduction band and a hole in the valence band, which can conduct electricity in the presence
of an electric field until they recombine.
It can take some time2 for the electron and the hole to “recombine”, with the electron falling back into
the valence band. Before they recombine, they can conduct electricity if we apply an electric field, such
as by applying a voltage between one end of the semiconductor and the other. In a piece of
semiconductor, this leads to what is called photoconduction, as illustrated in Fig. 8.1. This process is
used for some photodetectors.
Photodiodes
If we absorb a photon to generate an electron-hole pair inside the depletion region of a diode, as shown
in Fig. 8.2, the electron will drift or diffuse “downhill” into the n-doped region, and the hole will
similarly drift or diffuse “downhill” (in its terms3) into the p-doped region.
Fig. 8.2. Absorbing a photon across the band gap inside the depletion region leads to the
photogenerated electron and the hole moving towards the n- and p-doped regions, respectively.
² Recombination lifetimes for electrons and holes in semiconductors can be as long as milliseconds or as short as picoseconds. There are many mechanisms for such recombination, including radiative recombination, in which a photon is emitted in the recombination process, and recombination through "trap" states (often called "non-radiative" recombination).
³ "Downhill" for holes is "uphill" on the electron energy diagram. We can view hole energies by looking at these electron energy diagrams "upside down".
The charge separation from the electron and hole moving in opposite directions creates an electrostatic
potential (or equivalently an electric field) in such a direction as to resist the charge separation; positive
charges build up in the p-doped semiconductor and negative charges build up in the n-doped
semiconductor, each of which acts to repel the movement of the corresponding photocarrier, at least
partly.
This charge separation and the resulting electric potential therefore create an external voltage, which
can drive a current through a resistor, as sketched in Fig. 8.3. When we generate voltage from a diode
like this, we call this the “photovoltaic” operation of a diode. This process is the basic mechanism in a
semiconductor solar cell, generating power by absorbing photons to generate both voltage and current.
Such photovoltaic diodes can also be used as photodetectors, generating voltage and current in response
to some input light signal.
Fig. 8.3. Shining light into the depletion region of a photodiode generates a voltage that can drive
current through an external resistor.
Fig. 8.4. A reverse-biased diode efficiently and rapidly sweeps photocarriers out of the depletion
region, allowing a signal voltage across the resistor that is proportional to the incident light.
It is also common to use reverse biased photodiodes as photodetectors, as sketched in Fig. 8.4. Such
reverse-biased devices do not generate power in the same way as the photovoltaic diodes do. With
reverse bias, however, the electrons and holes are swept particularly quickly out of the depletion region
by the large electric field in there, giving high-speed devices. This rapid sweep-out also means that we
get nearly exactly an electron of current round the circuit for every photon absorbed, with very few
recombining with holes inside the depletion region before being swept out.
One other benefit of the reverse-biased operation is that the signal voltage across the resistor can be
proportional to the amount of light (number of incident photons per second) shining on the diode. In the
photovoltaic case, by contrast, the output voltage tends to be approximately the logarithm of the incident
light power because of the underlying exponential behavior, and hence the output voltage is not
proportional to the input optical power.
Fig. 8.5. Current vs. voltage curves for a photodiode in the “dark” without incident light (upper
curve) and with incident light (lower curve).
We can now understand the current vs. voltage characteristic of a photodiode, as sketched in Fig. 8.5.
In the simplest model, the “photocurrent” – the current from the movement of the photogenerated
electrons and holes – behaves like a reverse current; photogenerated electrons and holes are moving
“downhill” into their respective n- and p-doped regions, which is the direction opposite to the forward
diffusion current in a diode.
So, presuming the photogenerated electrons and holes do get to their respective n- and p-doped regions
without recombining, we expect simply to subtract the magnitude of this “photocurrent” IPC from the
“dark” diffusion current, shifting the curve down by the photocurrent IPC, and giving a net current
$$I = I_S \left[\exp\left(\frac{eV}{k_B T}\right) - 1\right] - I_{PC} \qquad (8.1)$$
Note that a solar cell would operate in the lower right quadrant of the current vs. voltage characteristic
of Fig. 8.5. This is the quadrant in which the diode would generate a power of magnitude V × I in an
external resistor in a circuit like that in Fig. 8.3.
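The following sketch (ours, with made-up device values, deliberately different from those in Problem 8.2.2 below) traces out Eq. (8.1) and locates the maximum power point of such a solar cell numerically:

```python
import numpy as np

e, kB = 1.602176565e-19, 1.3806488e-23
T  = 300.0
Vt = kB * T / e               # thermal voltage kB*T/e, ~25.9 mV at 300 K

I_S  = 1e-9                   # illustrative saturation current, 1 nA
I_PC = 1e-3                   # illustrative photocurrent, 1 mA

V = np.linspace(0.0, 0.45, 4501)
I = I_S * np.expm1(V / Vt) - I_PC    # Eq. (8.1); I < 0 where the diode generates
P = -V * I                           # power delivered to the external load

i = np.argmax(P)
print(f"max power ~ {P[i]*1e3:.2f} mW at V ~ {V[i]:.3f} V")    # ~0.27 mW near 0.29 V
print(f"matching load resistor R ~ {V[i] / -I[i]:.0f} ohms")   # ~320 ohms
```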
Problems
8.2.1 At equilibrium, with zero applied bias voltage in a semiconductor diode, and in the absence of
light, the electron flux from the p region to the n region is the same as the electron flux from the
n region to the p region, and similarly for holes. Thus, the total net electrical current is zero under
such conditions.
(a) Presume that this diode is at zero bias (i.e., VA = 0). Now we shine light, with photon energy
greater than the band-gap energy in the material used to make this diode, into the depletion
region, where some of it will be absorbed, creating (photogenerated) electrons and holes.
For the electrons and holes created in the depletion region, draw the movements of
electrons and holes on the band diagram. Is the photogenerated electrical current positive
or negative? (Note: consider positive current to be current in the same direction as that in
a forward-biased diode.) What is the driving force for any movements of photogenerated
electrons and holes in the depletion region?
(b) Based on your answer from part (a), should we apply a positive or a negative voltage to
obtain zero current? (Note: positive voltage corresponds to the sign of voltage that would
forward-bias the diode.) Explain your answer. (Note: the voltage applied to reach zero
current is called the “open-circuit” voltage.)
8.2.2 Suppose that we have an “ideal” diode with a saturation current I S = 100 nA . We presume that
we can shine light into it in such a way that any light that is absorbed is absorbed in the depletion
region of the diode. We presume that the semiconductor material of the diode has a bandgap
energy of Eg = 1 eV (i.e., the minimum separation between the valence and conduction bands
is 1 eV), and that any photons of energy hν ≥ Eg are absorbed. [Any photons of energy hν < Eg
are not absorbed, though we will not use this fact in this question.] Each photon that is absorbed
is presumed to give rise to the equivalent of one electron of “photocurrent” in the circuit. We are
working at a “room” temperature of 293 K.
[Note: for this problem, you may find it useful to use some computer program to graph the
appropriate expressions for you. The optimum conditions requested can be deduced just from
looking at the graphs, if necessary on some “zoomed-in” portion of them; algebraic proofs of
those conditions are not required.]
Suppose that we are shining sufficient light on this diode that we are generating a photocurrent
of 10 mA. [Note: this is not necessarily the total current in the diode – there will also be the usual
diffusion current in the diode - though this photocurrent would be close to the total current in the
diode if we were running the diode at a large reverse bias.] We presume that we have connected
the two ends of the diode (i.e., the “outsides” of the p and n regions) together using only a resistor.
We will later deduce what value, R, we want to choose for this resistor.
Our ultimate goal in this question is to find the conditions of operation of this photodiode so that
it is most efficient as a “solar cell” in converting the incident light to electrical power, which we
will then dissipate in the resistor. Until we make the final choice of R, we can presume
experimentally that we can adjust its value as needed.
(a) Graph the current-voltage characteristic of this diode when there is this much photocurrent
(10 mA). [Graphing from 0 to 0.28 V should be a suitable range.]
(b) Graph the power dissipated in the load resistor as a function of the voltage that would be
across it. [Again, a range from 0 to 0.28 V should be suitable.] [Note, incidentally, that this
power should be positive. The power in the diode – a product of the current through it and
the voltage across it – would give a negative number, because it is generating power, not
dissipating it. The power dissipated in the resistor should be -1 times this.]
(c) By graphing over an appropriate smaller range (or by some other method if you prefer),
find (to at least two significant figures)
(i) the maximum power, Pmax , that could be dissipated in the resistor
(ii) the voltage, Vmax , across the diode when the maximum power is being dissipated in
the resistor.
(d) What would be the value of the resistor R to achieve this maximum power, Pmax ? (Two
significant figures is sufficient here.) [Note: you can deduce this from the voltage Vmax
and a knowledge of the dissipated power and/or the corresponding total current at that
voltage.]
(e) Supposing that this photocurrent is being generated by a light beam with photons of photon
energy 1 eV (all of which are being absorbed in the depletion region of the diode).
(i) What is the power in the light beam?
(ii) What is the efficiency of this solar cell in converting light power to electrical power
in the resistor (when operating at the maximum efficiency conditions derived above)?
(f) Suppose instead the light beam has photon energy 1.5 eV , though it is still generating the
same photocurrent.
(i) What is the power in this light beam?
(ii) What now is the efficiency of converting light power to electrical power in the
resistor?
(g) If you could redesign the diode in such a way as to reduce I S = 100 nA to I S = 10 nA ,
would the efficiency of the diode in converting light to electrical power increase or
decrease? [Note: you can prove and/or justify this just by appropriate calculations –
algebraic proofs are not required.]
8.3 Light-emitting diodes and lasers
When we analyze the microscopic processes of light absorption and emission in crystalline materials,
we find that the strong processes involve electrons and holes in the same (or very nearly the same) k
states in different bands. That is, the absorption process that would take an electron in a given k state in
the valence band and put it into essentially the same k state in the conduction band (what we can call a
“vertical” transition) is a particularly strong one; from Einstein’s A and B coefficient argument, we
understand also that therefore the emission process that would take an electron in a given k state in the
conduction band and emit a photon to “fill in” a hole at essentially the same k state in the valence band
is a similarly strong process.
One common rationalization of this so-called “direct” absorption process is to argue that the photon has
a very small momentum compared to that of the scale of crystal momenta when we look at states in a
Brillouin zone, and so to “conserve momentum”, optical transitions between bands have to be essentially
“vertical” on such a diagram. If we perform a more detailed analysis of such absorption processes, we
do indeed find that this effective momentum can be viewed as being conserved in such a process, though
that is a conclusion that comes out of the full analysis, not a principle we put into it.
When we are considering absorption, such “direct” or “vertical” transitions between the valence and
conduction band states are quite possible also in indirect band gap materials like silicon or germanium.
In absorption, we start with essentially full valence bands and essentially empty conduction bands, so
for each full valence band state there is an empty conduction band state “above” it, and so such “direct”
absorption transitions are quite possible. Such indirect materials can be good absorbers of light, and
silicon is very widely used as a photodetector.
When, however, we inject electrons into the conduction band and holes into the valence band, in silicon
and germanium, holes collect in one part of the Brillouin zone and electrons collect in other parts, as
illustrated on the left of Fig. 8.6; the holes collect in the valence band maxima that are at the center of
the zone (at k = 0 ), but the electrons collect in minima that are at or near the edge of the Brillouin zone
in these particular “indirect gap” materials. So where there are electrons in states in the conduction band,
there are no holes in the corresponding k states in the valence band, and where there are holes in states
in the valence band, there are no electrons in the corresponding k states in the conduction band.
Fig. 8.6. An indirect band gap material (left) showing weak indirect transitions, and a direct band
gap material (right) showing strong direct transitions that can emit photons efficiently, both
shown in a simple “one-dimensional” Brillouin zone.
III-V materials like GaAs and InP and many others have “direct” band gaps in which the minimum in
the conduction band lines up above the maximum in the valence band. Hence when we inject electrons
into the conduction band and holes into the valence band, they are in the same regions of the Brillouin
zone, as shown in the right of Fig. 8.6. Now if we find an electron in a given k state in the conduction
band, we may well find a hole in the corresponding k state in the valence band, and so we have strong
possibilities for emission of a photon as the electron “falls” down into that k state in the valence band.
It is possible to make “indirect” transitions that emit a photon and that do not conserve crystal
momentum for the electrons, but these are weak4. The electron may be more likely to recombine with a
hole through some other, “non-radiative” process before these weak radiative processes have much of a
chance to happen, so such “indirect” light emission is generally an inefficient process.
Fig. 8.7. Forward biasing a diode injects both electrons and holes into the junction region, where
some of them may recombine to emit photons.
Under such forward bias, electrons and holes near the junction can effectively have their own Fermi
levels, different from each other. These are known as “quasi Fermi levels”. The electrons can be in
approximate thermal and diffusive equilibrium with each other, so it is meaningful to define a (quasi)
Fermi level for that population of electrons. Similarly, the holes can be in approximate thermal and
diffusive equilibrium with each other, with their own (quasi) Fermi level. Hence we can have useful
models for their distributions. This situation is illustrated in Fig. 8.8.
Such a situation is like one we might encounter in chemistry, with two different “species” – here the
electrons and the holes – each having their own chemical potentials. These two species can then “react”
to generate a new species, which in the present case is the photons, with the “reaction” being the process
we call “radiative recombination”.
For light-emitting diodes to begin to work, it is enough just to get some electrons in the conduction band
and some holes in the valence band so that the radiative recombination can happen. Indeed, forward
biasing a diode made from direct gap materials will quite generally result in light emission by this process. When we design such devices well, such light-emitting diodes can be very efficient sources of
light. Generally, the emission of light is at photon energies close to, but above, the bandgap energy of
⁴ Such transitions typically have to involve some other interaction as well, with indirect transitions usually involving a crystal vibration or a "phonon" that effectively contributes the necessary crystal momentum to allow the transition to happen.
the semiconductor because the holes gather near to the top of the valence band and the electrons gather
near to the bottom of the conduction band.
Fig. 8.8. Illustration of the quasi Fermi levels for the electrons and for the holes inside the junction
region in a forward-biased diode.
Lasers
Lasers are light sources in which the dominant emission of light is by stimulated emission. The word
“laser” is an acronym for Light Amplification by Stimulated Emission of Radiation, and the acronym is
a variant on the earlier “maser” that generated microwave radiation.
There are many different kinds of lasers. The first, demonstrated in 1960, used specific transitions in ruby. Various approaches for lasers use transitions (a) associated with specific impurities in glasses or crystals, (b) between atomic or molecular levels in gases, (c) in dyes in liquid form, and (d) in semiconductor diodes.
Following the first demonstration in 1970 of semiconductor diode lasers that could operate continuously,
the use of semiconductor diode lasers grew particularly strongly. Now, especially because of their
extensive use in optical communications, diode lasers dominate over all other approaches in terms of
the number in use.
Fig. 8.9. Left: Sketch of a laser, consisting of a gain medium and mirrors (possibly partially
transmitting) to reflect many of the photons back into the gain medium. Right: Sketch of the
threshold behavior of the laser light output as a function of the “pumping” of the laser gain
medium.
A laser generally consists of a “gain medium” and some sort of resonator with mirrors, as sketched on
the left in Fig. 8.9. The concept here is first that, because of stimulated emission, more photons come
out of the gain medium in a given electromagnetic mode than went in. Then mirrors, forming a “cavity”,
reflect the photons in that mode back and forth through the gain medium to generate more such photons
by more stimulated emission, and so on.
We may allow some photons to be transmitted through one or both mirrors so we get some optical output
power. We design the device, however, so that the gain of photons from the stimulated emission in the
gain medium exceeds the loss of photons both by transmission through the mirror(s) and by other
absorption or scattering processes. Then the “lasing” process can grow regeneratively to give a large
number of photons circulating (and emitted) in one mode, which is the laser action.
Generally, lasers have a “threshold” level of drive or “pumping” below which they will not lase. This is
typically because there will be some background fraction of photons that will always be lost due to
mirror transmission and other losses, and the gain per pass through the cavity has to rise to a level above
that loss before the regenerative “lasing” process can take over. Below that threshold, there may still be
light emission, predominantly through conventional spontaneous emission, known as “luminescence”;
that spontaneous emission may happen into all the modes of the electromagnetic field at appropriate
frequencies corresponding to emission transitions in the gain medium. Above the threshold for lasing,
the emission into the lasing mode will tend to take over as the dominant light emission. This “threshold”
emission behavior is sketched on the right of Fig. 8.9.
The laser will tend to lase in the “strongest” mode – that is, the mode with the largest positive difference
between gain and loss per “round trip” in the cavity. This lasing mode can correspond to a very specific
wavelength, which may correspond with the wavelength with the largest “gain” in the gain medium.
The shape of the beam can also be quite a narrow beam in a very specific direction, with the precise
shape of the laser beam being set by the form of the specific “strongest” mode of the resonator formed
by the mirrors. (Typically, the “strongest” mode is the one with the lowest loss, and, more generally, the
one with the greatest positive difference between gain and loss.)
Population inversion
A necessary condition for a laser to work is that the probability of stimulated emission in a given
transition from a high energy level 2 of some quantum system to a low energy level 1 of that quantum
system has to exceed the probability of absorption, which is the transition from the low energy state 1
to the high energy state 2. Otherwise, photons coming in are more likely to be absorbed than to stimulate
emission, and there will be no “gain” – that is we will have fewer photons coming out of the gain medium
than went in.
Now, we know from Einstein’s A and B coefficient argument that, for N2 such quantum systems in the
upper state 2, the stimulated emission rate is B12 N 2 ρ (ν ) and, for N1 such quantum systems in the lower
state, the absorption rate is B12 N1 ρ (ν ) ; note that we have used Einstein’s conclusion that the same
microscopic coefficient (Einstein’s B12 coefficient) governs both processes. So the condition for there
to be a greater probability of stimulated emission than absorption is that
$$N_2 > N_1 \qquad (8.2)$$
This requirement is called an “inverted population”. One reason why we do not see much stimulated
emission in everyday life is that no system in thermal equilibrium will have the population of the higher
energy state of the system more likely than the population of the lower energy state of the system; we
can see this from Boltzmann’s factor, for example. So no thermal light source will ever give us
population inversion, and so, for any given photon entering such a system at thermal equilibrium, it is
always more likely to be absorbed than it is to stimulate the emission of another photon5.
Suppose, however, that by some scheme of pumping atoms or quantum systems into their upper states (so increasing N2), and, if necessary, clearing out the "lower" states (so reducing N1), we can "invert" the population. Then, instead of the set of "atoms" or the material having net loss for photons, it can have net
gain. We could express this gain as a factor G, the ratio of photons out to photons in, or as the gain γ per
unit length, relating the two by
$$G = \exp(\gamma L) \qquad (8.3)$$
for a gain region of length L. For example, we commonly express γ in "per centimeter", cm⁻¹, or "inverse centimeters".
So, for example, as illustrated in the left of Fig. 8.10, we could “pump” a system somehow from a level
1 to a level 3, then have it “relax” to level 2. If the “pumping” and the relaxation are fast enough, then
we may be able to make it more likely that the system is in level 2 than it is in level 1, and so we could
invert the population of such “atoms”.
There are various ways of accomplishing the “pumping”, including electrical discharge to excite the
atoms or molecules, or optical pumping, such as with a flash-lamp or even another laser. Typically, the
“relaxation” will be some non-radiative process, such as collisions or some other scattering process.
Radiative processes can be problematic for the relaxation since the emitted photons could then excite
other “atoms” from level 2 back to level 3, which is undesirable when we are trying to maximize the
population of level 2.
Fig. 8.10. Three-level (left) and four-level (right) laser level schemes.
Such a three-level system can work for lasers. It has the challenge, though, that we have to make sure
we excite a majority of all the “atoms” in this way; otherwise, we cannot invert the population – any
“atoms” left in their ground state, level 1, are able to absorb light. The four-level system as shown in
Fig. 8.10 on the right helps avoid this problem. Now we are only trying to invert the population between
levels 2 and 3. If some of the “atoms” are left in level 1 because they are not excited at all, this does not
matter for the population inversion of levels 2 and 3. Hence, we are not required to excite a majority of
the “atoms”, which opens up more possibilities for viable lasers. In the four-level scheme, it is important
that the relaxation from level 2 to level 1 is relatively rapid, so the lower “lasing level”, level 2, is kept
as nearly empty as possible; this relaxation should ideally be non-radiative also.
⁵ Note that we are not saying that the population has to be inverted before we see stimulated emission. Stimulated emission can and will certainly occur if we have some systems in their upper state and have incoming photons of the right photon energy to match the energy difference between the states. However, if the population is not inverted, we are more likely to see absorption from the lower state to the higher state rather than stimulated emission from the higher state to the lower state.
Semiconductor lasers
Semiconductor lasers are effectively four-level lasers. Using a semiconductor diode in forward bias, just
as discussed for light-emitting diodes in general, we pump electrons and holes into the junction region.
If we pump hard enough, we may manage to get an inverted population. A simple version of such an
inverted population would be for the (quasi) Fermi levels of both electrons and holes to move into the
corresponding bands, as sketched in Fig. 8.11, which would give “pools” of electrons in the conduction
band and holes in the valence band. Such a situation would certainly be inverted, because at a k state
where we find an electron in the conduction band, we find a hole in the valence band.
We can see why such lasers are effectively four-level schemes: the conduction electron population at
lower energies inside the pool is continually repopulated by relaxation from higher-energy states, with
the whole population being replenished by the forward current. Similarly, electrons that fall into states
in the top of the valence band after emitting a photon then tend to be “cleared out” of such states by
holes relaxing “upwards” into these states.
Fig. 8.11. Sketch of an inverted population in semiconductor bands, with a “pool” of electrons in
the conduction band and a “pool” of holes in the valence band, giving an inverted population.
The situation need not be quite as extreme as shown in Fig. 8.11 to achieve effective inverted populations
in semiconductors6. Indeed, because of their larger effective mass the holes may well not have the Fermi
level in the valence band anyway, and in practice the “pools” do not usually have such well-defined
“surfaces”, having instead widths of the order of a few kBT because of the Fermi-Dirac distribution.
Semiconductor lasers can have particularly large “gain”, which makes them relatively easy to make.
They can also be very compact, with size scales as small as microns to hundreds of microns in many
types in current use. Among lasers that are directly driven electrically, they are also particularly efficient.
Problem
8.3.1 I have two different light-emitting diodes (LEDs). One is designed to emit blue light, and the
other is designed to emit red light. For the purposes of this question, you can presume that the
LEDs all emit light with photon energy very close to the bandgap energy of the semiconductor
⁶ In a semiconductor, for an electron state in the conduction band and another one at essentially the same k in the valence band, the probability of a photon causing a radiative (absorption or emission) transition from an electron state in one band to an electron state in the other is proportional to the probability fe that the starting state is occupied by an electron (or the probability 1 − fh that it is "empty" of a hole) and the probability 1 − fe that the ending state is empty of an electron (or the probability fh that it is "full" with a hole). Here we use the Fermi-Dirac distribution fe for electrons in the conduction band, with the electron quasi Fermi level, and the Fermi-Dirac distribution fh for holes in the valence band, with the hole quasi Fermi level. Hence, the probability of absorption on this transition is ∝ (1 − fe)(1 − fh) and the probability of (stimulated) emission on this transition is ∝ fe fh. The difference between these two probabilities is therefore ∝ fe fh − (1 − fe)(1 − fh) = (fe + fh) − 1. So stimulated emission is more likely than absorption for fe + fh > 1, which becomes the effective condition for an inverted population for a given k state in a semiconductor.
material from which they are made (and, therefore, we are using different semiconductors for
the two different colors of LEDs).
In this experiment, we will be shining light from one LED into another, and looking at
consequences for the LED into which I am shining the light. Note also that LEDs, since they are
diodes, can also function as photodiode detectors if we shine appropriate light into them.
(i) Suppose first of all, that I shine light into one of the LEDs, and I see current in a current
meter connected to the LED. What is the sign of the electric current that I see? Specifically,
if I connect the p end of the LED to the "+" terminal of the current meter and the n end of the LED to the "−" terminal of the current meter, will it show a positive or negative current? (A positive electrical current flowing into the "+" terminal of the meter will be a positive current in this sense.)
(ii) I connect a power supply to the blue LED, forward biasing it, and as a result it emits blue
light. I shine this light into the red LED. The red LED is connected just to a current meter.
Will I expect to see significant current in the current meter?
(iii) As in (ii) above, I bias the blue LED so it is emitting blue light, and shine that into the red
LED. This time the red LED is connected just to a volt meter. Will I expect to see
significant voltage on the volt meter?
(iv) I flip round the experiment from part (ii), connecting a power supply to the red LED to
forward bias it, as a result of which it emits red light. I shine this red light into the blue
LED. The blue LED is connected just to a current meter. Will I expect to see significant
current in the current meter?
(v) As in (ii) above, I shine light from the blue LED into the red LED, but I do not connect the
red LED terminals to anything at all. Will I be able to see red light emitted from the red
LED if I look carefully?
8.4 Conclusions
The use of semiconductors, and especially semiconductor diode structures, opens up a wide range of
useful optoelectronic devices, with many useful applications. Such structures allow us to generate solar
power. The use of light-emitting diodes for illumination may well also save substantial amounts of
power compared to previous methods. Semiconductor lasers and other devices like semiconductor
optical modulators transmit nearly all of our data over optical fibers, and semiconductor photodetectors
receive it at the other end. Semiconductor detectors now record nearly all of our pictures in digital
cameras. Despite this wide range of devices and applications, the underlying principles of all these
devices are largely covered by the physics we have introduced here, and that physics lays a solid
foundation for the engineering of optoelectronic devices and systems.
Epilogue
Now that we have reached the end of this story, we can look back at how we progressed through a linked
set of ideas. We started with modes and eigenstates in the classical world of oscillations and waves,
ideas that are quite useful there in themselves. These ideas prepared us for the quantum mechanical
view: eigenstates become the way we look even at states of matter, and form a surprising and different
basis for light and electromagnetism. Armed with these ideas of states of matter and light, we introduced
the statistical view of the occupation of these states. That statistical view gives us the idea of entropy.
Entropy then forms the core of the whole fields of statistical mechanics and thermodynamics. Those
fields then explain much of the macroscopic behavior of the world around us while also linking back to
the microscopic quantum mechanics.
If you came to this material just with a background such as basic classical mechanics and
electromagnetism, I think you now understand that background on its own does not come close to
explaining the world around us; it also does not give us many of the tools that are essential for
constructing the technology that enables much of our modern lives. But I also hope you now understand
that there is a relatively small number of key ideas that do allow us to explain how the everyday world
around us actually works.
We have seen that often this explanation is counter to our intuition. The quantum mechanical view
necessarily has many concepts in it that do not agree with our everyday “common sense”, but that
nonetheless do give a coherent and practically useful basis for explaining and engineering the world.
We do also have to be honest that there are a few aspects of the basics of quantum mechanics that still
puzzle us, especially the idea of quantum mechanical measurement. We learn to live with those,
however, as we use the practical techniques of quantum mechanics to design real materials, devices and
technology.
The ideas of statistical mechanics explain much about complex systems and why, because of
overwhelming statistical probabilities, much of what is random can have apparently coherent and
predictable behavior. If we understand that the larger system is always, in a statistical sense, going to
proceed whenever possible to the macrostate with the largest multiplicity or entropy, then we have the
key principle that guides many of the processes we see around us.
In the space available here, we could only just introduce many of the core concepts, often with simple
“toy” models, though sometimes with quite practical extensions through to the principles of modern
devices and to behaviors we see in the macroscopic world every day. Of course, there is always more to
learn about the topics we introduced in the preceding chapters. But you can be sure that the material and
ideas we have covered here are key physical ideas that do largely explain the everyday modern world
around us; they give us both the concepts and the vocabulary to proceed to the next level, both in science
and engineering.
If this is where you have to leave this story, at least for the present, then you do so much better informed
about a physical world that both enables and bounds much of what we can do. If you are heading into a
deeper study or use of physical science or technology, then you can have the confidence that you now
understand a core set of concepts that is quite finite, reliable and universally useful, and that will give a
solid foundation for all that you are about to learn there. Whichever path you are taking, I hope that you
have found this story here at least informative, possibly interesting, and hopefully even fun!
A.1 Fundamental constants
Constant Name Symbol Numerical Value ∆ Units
Bohr magneton µB 9.274 009 68 × 10⁻²⁴ 20 J T⁻¹
Boltzmann constant kB 1.380 6488 × 10⁻²³ 13 J K⁻¹
Electric constant εo 8.854 187 817… × 10⁻¹² - F m⁻¹
Electron g factor ge 2.002 319 304 361 53 53 -
Electron mass me 9.109 382 91 × 10⁻³¹ 40 kg
Elementary charge e 1.602 176 565 × 10⁻¹⁹ 35 C
Fine structure constant α 7.297 352 5698 × 10⁻³ 24 -
Magnetic constant µo 4π × 10⁻⁷ - H m⁻¹
Planck's constant h 6.626 069 57 × 10⁻³⁴ 29 J s
Planck's constant/2π ℏ 1.054 571 726 × 10⁻³⁴ 47 J s
Proton mass mp 1.672 621 777 × 10⁻²⁷ 74 kg
Proton-electron mass ratio mp/me 1 836.152 672 45 75 -
Rydberg constant R∞ 10 973 731.568 539 55 m⁻¹
R∞hc/e 13.605 692 53 30 eV
Speed of light in vacuum c 299 792 458 - m s⁻¹
The “∆” quoted is the absolute value of the uncertainty in the last two digits of the quoted numerical
value corresponding to one standard deviation from the numerical value given. So, for example, the
possible values of Planck’s constant within one standard deviation of the best estimate lie between 6.626
069 28 and 6.626 069 86 J s.
The speed of light in vacuum has been chosen to have the exact value shown because the meter is now
defined as the length of the path traveled by light in vacuum during the time interval of 1/299 792 458
of a second. The magnetic constant (also known as the permeability of free space) is chosen to have the
value shown because it is an arbitrary constant that arises from the choice of the system of units and the
electric constant (also known as the permittivity of free space) then follows from it and the (chosen)
velocity of light because, by definition, c = 1/√(εoµo), so all three of these quantities have no uncertainty by definition. The Bohr magneton is µB = eℏ/2me. The fine structure constant is α = e²/4πεoℏc.
These values are the CODATA Internationally recommended values as of 2010. Reference
http://physics.nist.gov/cuu/Constants/index.html .
A.2 SI units
We list here most of the major SI base and derived units. For a full list, see
http://physics.nist.gov/cuu/Units/units.html .
SI base units
Base quantity Name Symbol
length meter m
mass kilogram kg
time second s
electric current ampere A
thermodynamic temperature kelvin K
SI derived units
Derived quantity Name Symbol In terms of other SI units In terms of SI base units
frequency hertz Hz - s⁻¹
force newton N - m·kg·s⁻²
pressure, stress pascal Pa N/m² m⁻¹·kg·s⁻²
energy, work, quantity of heat joule J N·m m²·kg·s⁻²
power, radiant flux watt W J/s m²·kg·s⁻³
electric charge, quantity of electricity coulomb C - s·A
electric potential difference, electromotive force volt V W/A m²·kg·s⁻³·A⁻¹
capacitance farad F C/V m⁻²·kg⁻¹·s⁴·A²
electric resistance ohm Ω V/A m²·kg·s⁻³·A⁻²
electric conductance siemens S A/V m⁻²·kg⁻¹·s³·A²
magnetic flux weber Wb V·s m²·kg·s⁻²·A⁻¹
magnetic flux density tesla T Wb/m² kg·s⁻²·A⁻¹
inductance henry H Wb/A m²·kg·s⁻²·A⁻²
SI Prefixes
Factor Name Symbol Factor Name Symbol
10²⁴ yotta Y 10⁻¹ deci d
10²¹ zetta Z 10⁻² centi c
10¹⁸ exa E 10⁻³ milli m
10¹⁵ peta P 10⁻⁶ micro µ
10¹² tera T 10⁻⁹ nano n
10⁹ giga G 10⁻¹² pico p
10⁶ mega M 10⁻¹⁵ femto f
10³ kilo k 10⁻¹⁸ atto a
10² hecto h 10⁻²¹ zepto z
10¹ deka da 10⁻²⁴ yocto y
Upper Case Lower Case Name
Α α alpha
Β β beta
Γ γ gamma
∆ δ delta
Ε ε epsilon
Ζ ζ zeta
Η η eta
Θ θ theta
Ι ι iota
Κ κ kappa
Λ λ lambda
Μ µ mu
Ν ν nu
Ξ ξ xi
Ο ο omicron
Π π pi
Ρ ρ rho
Σ σ sigma
Τ τ tau
Υ υ upsilon
Φ φ phi
Χ χ chi
Ψ ψ psi
Ω ω omega
C.1 Introduction
In this Appendix, we extend the qualitative discussion in Chapter 6 of Fermi levels, doping, and electron
and hole populations in semiconductors, showing how simple quantitative relations can be established.
The main additional concept we need to introduce is the idea of the density of states, which lets us model
how many electron (or hole) states there are within given energy ranges. Together with the Fermi-Dirac
distribution, we can then connect carrier (electron and/or hole) populations quantitatively to Fermi
levels.
Fig. C.1. Sketch of equally spaced points in kx and ky and spheres of constant energy for the case
of an isotropic parabolic band.
C.2 Density of states
Now, we know from the Bloch theorem that the allowed values of k are separated by an amount 2π/Na for a one-dimensional "crystal" with N unit cells of size a. Equivalently, if we said the crystal was of length L = Na, then the allowed values of k are separated by 2π/L. Now we generalize to three dimensions. For simplicity, we will presume the crystal is a cube of side L. So now we presume for a wavevector k that each of its components – kx, ky, and kz in the x, y, and z directions respectively – can take on values similarly spaced by δk where
$$\delta k = \frac{2\pi}{L} \qquad (C.1)$$
We could sketch out all of the allowed values of k as a set of points in a three dimensional diagram
plotted as a function of the components kx, ky, and kz. This would just be a regular set of points equally
spaced in all three directions. In Fig. C.1, we have sketched this set of points, though we are showing
only a two-dimensional cross-section in the kx and ky plane at k z = 0 .
Though we know how the various possible states are spaced in k, we need to know how they are spaced
in energy E. To do so, we need some model – a band structure – that relates E and k. We saw that, near
to a band minimum or maximum, we expect parabolic relations between E and components of k. The
simplest situation to consider is where this parabolic relation is the same in all three directions, which is
what we would call an isotropic1 parabolic “band” (or region of the band structure). Such a model is a
good approximation for many bands near minima or maxima2. Taking the energy at the minimum to be
zero for simplicity, such a relation corresponds to
E = ℏ²k²/2meff (C.2)
where the effective mass meff parametrizes the curvature of the parabola. So, for some specific energy
E, we could draw a corresponding sphere in our diagram Fig. C.1, of radius
k = √(2meffE)/ℏ (C.3)
For some slightly larger energy E + ∆E, we could draw a sphere of a slightly larger radius k + ∆k.
From our relation Eq. (C.2), we know how to relate ∆k and ∆E:
k + ∆k = √(2meff(E + ∆E))/ℏ = (√(2meffE)/ℏ)√(1 + ∆E/E) ≅ (√(2meffE)/ℏ)(1 + ∆E/2E) (C.4)
where we used the expansion, for small x, √(1 + x) ≅ 1 + x/2. So, subtracting Eq. (C.3) from Eq. (C.4)
gives
∆k ≅ (√(2meffE)/ℏ) ∆E/2E (C.5)
Now, the number of allowed k states that lie between energy E and energy E + ∆E is just the number of
“dots” in Fig. C.1, extended into three dimensions, that lie between the sphere for E and the sphere for
E + ∆E . On this figure, the “volume” of that space is just the “volume” of a spherical shell of (inner)
radius k and thickness ∆k . The surface “area” of a sphere of radius k is 4π k 2 . So the “volume” of a
thin spherical shell with thickness ∆k is (for small ∆k ) 4π k 2 ∆k . We can think of each “dot” in Fig.
C.1 as being surrounded by a cubic “volume” of magnitude (2π / L)3 , so the number of such “dots” that
lie in this spherical shell is, using Eqs. (C.2), (C.3) and (C.5),
1 “isotropic” means the same in all directions (from the Greek “iso” meaning “the same” and “tropos” meaning a
turning, way or manner).
2 This isotropic parabolic band approach is a good first approximation near zone center for the valence bands in
many semiconductors, including Si, Ge, and the III-V materials that crystallize in zinc blende form, and for the
lowest conduction band in zinc blende direct gap III-V materials like GaAs or InP. For the lowest conduction
bands in Si and Ge, in part because their lowest points do not lie at zone center, the band minima, though
approximately parabolic in each direction, have one effective mass in one direction and a different one in the other
two. The approach we are taking here can be extended to such cases, though we will not do that here.
4πk²∆k/(2π/L)³ = (L³/2π²)k²∆k = (L³/2π²)(2meffE/ℏ²)(√(2meffE)/ℏ)(∆E/2E) = (L³/4π²)(2meff/ℏ²)^(3/2) E^(1/2) ∆E (C.6)
This is the number of k states lying between energy E and energy E + ∆E . To get the total number of
electron states in this energy range (presuming we have no magnetic field to give different energies to
different spins), we multiply by 2 for the 2 different spins. We note that L3 is just the volume of the
crystal. So we can also divide by L3 and write that the total number of electron states between E and
E + ∆E , per unit volume of crystal, is
g(E)∆E = (1/2π²)(2meff/ℏ²)^(3/2) E^(1/2) ∆E (C.7)
This quantity
g(E) = (1/2π²)(2meff/ℏ²)^(3/2) E^(1/2) (C.8)
is known as the density of states in energy (per unit volume of crystal) for an isotropic parabolic band.
With this quantity and the Fermi-Dirac distribution, we can start to calculate useful distributions of
electrons in such materials.
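If the reader wants a quick numerical feel for Eq. (C.8), the following short Python sketch evaluates it directly (the function name, constant values, and the GaAs-like effective mass of 0.07 me are our illustrative choices, not part of the text):

import math

hbar = 1.054571817e-34  # reduced Planck constant (J s)
m_e = 9.1093837015e-31  # free electron mass (kg)
q = 1.602176634e-19     # joules per electron-volt

def density_of_states(E_joules, m_eff):
    """g(E) per unit volume for an isotropic parabolic band, Eq. (C.8),
    with energy E measured from the band edge; returns states per joule per m^3."""
    if E_joules <= 0.0:
        return 0.0
    return (1.0 / (2.0 * math.pi ** 2)) * (2.0 * m_eff / hbar ** 2) ** 1.5 * math.sqrt(E_joules)

# 25 meV above the band edge, for an assumed GaAs-like effective mass of 0.07 m_e
g = density_of_states(0.025 * q, 0.07 * m_e)
print(f"g(E) ~ {g * q * 1e-6:.2e} states per eV per cm^3")  # ~2e19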
There are, however, at least two useful limiting cases, known as “degenerate”3 and “non-degenerate”,
in which we can establish simple relations, and these also correspond to common physical situations, so
we will examine them below.
We can easily invert the expression Eq. (C.11) to obtain the Fermi energy as a function of the total
number (density) Ne of electrons in the band, giving
EF − EC = (ℏ²/2me)(3π²Ne)^(2/3) (C.12)
Incidentally, a useful “order-of-magnitude” number to have in mind when considering the statistics of
electrons in materials (and for other purposes as well) is the size of kBT at room temperature, in electron-
volts. For a typical room temperature of, say, 20 °C (so 293.15 K), taking kBT in electron-volts as kBT
in joules divided by the magnitude e of the electronic charge, we have4
kBT at 20 °C ≅ 25 meV (C.13)
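This is a one-line numerical check (a minimal Python sketch using current recommended constant values):

k_B = 1.380649e-23   # Boltzmann constant (J/K)
q = 1.602176634e-19  # magnitude of the electronic charge (C)

T = 293.15  # 20 degrees C in kelvin
print(f"k_BT = {1000 * k_B * T / q:.4f} meV")  # prints ~25.26 meV, cf. Eq. (C.13)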
Fig. C.2. Plot of g(E), the density of states in energy for a parabolic band, and g(E)fe(E), the
product of this density of states and the Fermi-Dirac function fe, for the case where the Fermi
energy EF = 10kBT. The electron energy E is plotted in units of kBT from the bottom of the band.
The shaded area represents the total number of electrons, Ne, which is the integral of g(E)fe(E)
over all energies.
3 This use of “degenerate” is, unfortunately, quite different from the use of the term “degeneracy” to refer to the
number of quantum mechanical states that have the same energy in some system (such as the 2s and 2p states in
hydrogen, which all have the same energy in the simplest model of hydrogen), or more generally to describe the
situation of multiple different eigenfunctions for the same eigenvalue. There is no specific relation between these
two different uses of the same word.
4 A more exact value for 20 °C would be 25.2617 meV.
Metals typically have Fermi levels that are of the order of electron-volts relative to the bottom of their
conduction band, so the “zero-temperature” approximation is very good since kBT is a very small
fraction of the Fermi level energy relative to the bottom of the band.
A more intermediate situation is sketched in Fig. C.2, in which the Fermi level is 10kBT above the
bottom of the band. This figure also illustrates the form of the density of states for a parabolic band, and
the product of the density of states and the Fermi-Dirac distribution – that is, the function g ( E ) f e ( E )
– which is the form of the integrand in Eq. (C.9). The shaded area in Fig. C.2 represents the result of the
integral – that is, the total number (density) Ne of electrons in the band – in Eq. (C.9).
A situation where the Fermi level is many kBT above the bottom of that band – as it is in a metal or in
the intermediate situation of Fig. C.2 – is referred to as “degenerate” or corresponding to “degenerate”
statistics, and the formula Eq. (C.12) can often be used to deduce the Fermi level.
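Eq. (C.12) is easy to evaluate numerically. In this minimal Python sketch, the chosen densities (a metal-like 10²² cm⁻³, and a heavily doped semiconductor at 10¹⁹ cm⁻³ with a light effective mass) are our illustrative values, not from the text:

import math

hbar = 1.054571817e-34  # J s
m_e = 9.1093837015e-31  # kg
q = 1.602176634e-19     # J per eV

def fermi_level_degenerate(N_per_cm3, m=m_e):
    """E_F - E_C in eV from Eq. (C.12), the zero-temperature (degenerate) limit."""
    N = N_per_cm3 * 1e6  # convert cm^-3 to m^-3
    return (hbar ** 2 / (2.0 * m)) * (3.0 * math.pi ** 2 * N) ** (2.0 / 3.0) / q

print(f"metal-like, 1e22 cm^-3: {fermi_level_degenerate(1e22):.2f} eV")  # ~1.7 eV
print(f"doped, 1e19 cm^-3, 0.07 m_e: "
      f"{1000 * fermi_level_degenerate(1e19, 0.07 * m_e):.0f} meV")      # ~240 meV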
Fig. C.3. Conduction and valence bands in a semiconductor or insulator showing finite thermal
excitation of electrons from the valence band to the conduction band, leaving “holes” in the
valence band.
Now, we know quite generally that, in equilibrium, the occupation of the electron states obeys the
Fermi-Dirac distribution fe as given by Eq. (C.10). This distribution applies to all the quantum
states, both in the valence band and the conduction band. Since fe is the probability that a given quantum
state of a given energy is occupied by an electron, then
fh(E) = 1 − fe(E) (C.14)
is the probability that the quantum state is not occupied or is “empty”. We could also say that this is the
probability that it is “occupied” or “full” with a “hole”, if we regard a hole as an absence of an electron.
We can therefore, if we want, write down a thermal distribution for such “holes”. From Eq. (C.14) we
have
fh(E) = 1 − 1/[exp((E − EF)/kBT) + 1] = [exp((E − EF)/kBT) + 1 − 1]/[exp((E − EF)/kBT) + 1]
= exp((E − EF)/kBT)/[exp((E − EF)/kBT) + 1] (C.15)
Multiplying the top and the bottom of Eq. (C.15) by exp[−(E − EF)/kBT] gives
fh(E) = 1/[exp(−(E − EF)/kBT) + 1] (C.16)
This is the same form as the Fermi-Dirac distribution, but with energy E and the Fermi energy EF
regarded as minus their values in the electron distribution. What this means is that, if we “stand on our
heads” and look at a figure like Fig. C.3 upside-down, we will see Fermi-Dirac distributions for holes;
we simply regard all the energies for holes as being minus the corresponding energies for electrons, and
we get the same form of distribution5.
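The algebra from Eq. (C.14) to Eq. (C.16) is simple to confirm numerically (a minimal Python sketch; energies are in units of kBT, with the Fermi energy at 10 kBT as in Fig. C.2):

import math

def f_e(E, E_F):
    """Fermi-Dirac occupation for electrons; E and E_F in units of k_BT."""
    return 1.0 / (math.exp(E - E_F) + 1.0)

def f_h(E, E_F):
    """Hole occupation in the form of Eq. (C.16): E and E_F effectively negated."""
    return 1.0 / (math.exp(-(E - E_F)) + 1.0)

for E in (5.0, 10.0, 15.0):
    assert abs(f_h(E, 10.0) - (1.0 - f_e(E, 10.0))) < 1e-12  # Eq. (C.14)
print("f_h(E) = 1 - f_e(E) checked at several energies")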
As discussed in Chapter 4, we can view holes as being like bubbles. A liquid, whose drops are like the
electrons in our analogy, settles down towards the lowest energy by filling a bucket from the bottom up.
Bubbles in the liquid, which are absences of drops of liquid, rise to the top of the liquid; their energies
are upside down compared to those of drops of liquid – they want to fall “up”.
Incidentally, since electrons and holes can both be viewed as being able to carry electrical current, as
we will discuss below, the term “carriers” is often used to cover both types when we need a generic term
for that idea.
5 We can also view the conduction band as being mostly “filled” with holes.
When the Fermi level lies many kBT below the conduction band edge EC, the Fermi-Dirac distribution
for the conduction band states is well approximated by its Maxwell-Boltzmann form
fe(E) ≅ exp((EF − EC)/kBT) exp(−(E − EC)/kBT) (C.17)
So, using Eq. (C.9), with this approximation for fe, and using the form of the electron density of states
from Eq. (C.8) with the conduction band edge at EC, and with electron effective mass meffe, we have
Ne = (1/2π²)(2meffe/ℏ²)^(3/2) exp((EF − EC)/kBT) ∫_{EC}^{∞} (E − EC)^(1/2) exp(−(E − EC)/kBT) dE
= (1/2π²)(2meffe kBT/ℏ²)^(3/2) exp((EF − EC)/kBT) ∫_{0}^{∞} x^(1/2) exp(−x) dx (C.18)
where we made the substitution x = (E − EC)/kBT.
Since
∫_{0}^{∞} x^(1/2) exp(−x) dx = √π/2 (C.19)
then Eq. (C.18) becomes, for the total number of electrons per unit volume in the conduction band,
Ne = NC exp((EF − EC)/kBT) (C.20)
where the quantity NC, which is sometimes called the “effective density of states at the conduction
band edge”, is given by6
NC = 2(meffe kBT/2πℏ²)^(3/2) ≅ 2.510×10¹⁹ (meffe/me)^(3/2) (T/300)^(3/2) cm⁻³ (C.21)
where me is the mass of a free electron.
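We can verify the numerical prefactor in Eq. (C.21) directly (a minimal Python sketch; the GaAs-like effective mass of 0.07 me is our illustrative value):

import math

hbar = 1.054571817e-34  # J s
k_B = 1.380649e-23      # J/K
m_e = 9.1093837015e-31  # kg

def N_C(m_eff, T):
    """Effective density of states at a band edge, Eq. (C.21), in cm^-3."""
    return 2.0 * (m_eff * k_B * T / (2.0 * math.pi * hbar ** 2)) ** 1.5 * 1e-6

print(f"m_eff = m_e, T = 300 K: {N_C(m_e, 300.0):.3e} cm^-3")          # ~2.510e19
print(f"m_eff = 0.07 m_e (GaAs): {N_C(0.07 * m_e, 300.0):.2e} cm^-3")  # ~4.6e17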
We can then repeat the same derivation, but now for the valence band. Now we choose the energy at the
“top” of the valence band to be EV. Hence we can write
fh(E) ≅ exp(−(EF − EV)/kBT) exp((E − EV)/kBT)
as the Maxwell-Boltzmann approximation to the “hole” Fermi-Dirac function as in Eq. (C.16), and
gV(E) = (1/2π²)(2meffh/ℏ²)^(3/2) (EV − E)^(1/2) (C.22)
where the effective mass for the holes is meffh. This hole effective mass is treated as a positive number
here for a maximum at the top of the valence band. The energy E here is formally the electron energy.
In performing the integral here, formally we would be integrating from -∞ to EV, though mathematically
we can change that to an integral from EV to ∞ to obtain a similar integral to that of Eq. (C.18), with the
result for the total number of holes (per unit volume) in the valence band being
Nh = NV exp((EV − EF)/kBT) (C.23)
6 Note that we are quoting this number in cm⁻³, not in m⁻³. In practice, the use of “per cubic centimeter” for
specifying electron and hole densities is almost universal in semiconductor physics and devices. In m⁻³, the
numbers for densities are 10⁶ larger.
where
NV = 2(meffh kBT/2πℏ²)^(3/2) ≅ 2.510×10¹⁹ (meffh/me)^(3/2) (T/300)^(3/2) cm⁻³ (C.24)
is called the “effective density of states at the valence band edge”.
Now we can multiply these two results, Eqs. (C.20) and (C.23), to obtain
NeNh = NCNV exp(−EG/kBT) (C.25)
where EG is the band gap energy, which by definition is
EG = EC − EV (C.26)
Eq. (C.25) is known as the “Law of Mass Action”7. It tells us that, in equilibrium, at least if the Fermi
energy is far from both the conduction and valence band edges, there is a reciprocal relationship between
the number of electrons (in the conduction band) and the number of holes (in the valence band) – if we
somehow increase the number of electrons by some factor, we will correspondingly decrease the number
of holes by the same factor. In practice in the use of semiconductors, we make such changes by doping
the semiconductor, and we will discuss doping and Fermi levels below.
In a pure semiconductor or insulator material, we know that, however many electrons are in the
conduction band, that number must be balanced by the number of holes in the valence band; otherwise
the material would not be electrically neutral. Hence, in the pure material, we must have
Ne = Nh = Ni (C.27)
where Ni is called the intrinsic8 carrier concentration. From the Law of Mass Action as in Eq. (C.25),
we can therefore write an expression for Ni, giving
Ni = √(NCNV) exp(−EG/2kBT) (C.28)
This is the electron and hole concentration we expect in a pure semiconductor or insulator. The factors
NC and NV do increase with temperature, and they do vary somewhat from material to material because
of the different effective masses. However, the dominant factor in determining whether a pure material
is a good insulator at a given temperature, or whether it has some significant conductivity from the
“intrinsic” presence of electrons and holes (as in a “semiconductor” material), is the band gap energy
EG in relation to kBT.
For example, at about room temperature, with kBT ≅ 25 meV, we could consider two materials, the
second with a band gap energy 1 eV larger than the first. This 1 eV difference will lead to a decrease of
the exponential factor in Eq. (C.28) by exp(−20) ≅ 2×10⁻⁹ in the second material compared to the
first. Though we should consider the factors NC and NV and also consider the mobilities of the electrons
and holes to get a precise answer, overall, we therefore expect the electrical conductivities of the two
materials to differ by a factor of roughly this order of magnitude. So, if the first material was a moderate
conductor, we would expect the second material, with the 1 eV larger band gap, to be a reasonably good
insulator.
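To see the size of this effect numerically (a minimal Python sketch; the comparison assumes the factors NC and NV are the same for the two materials):

import math

kT = 0.025  # k_BT in eV, around room temperature

def n_i_ratio(delta_EG):
    """Ratio of intrinsic concentrations, Eq. (C.28), for a band gap
    larger by delta_EG (in eV), with N_C and N_V held fixed."""
    return math.exp(-delta_EG / (2.0 * kT))

print(f"1 eV larger gap: factor of {n_i_ratio(1.0):.1e}")  # exp(-20) ~ 2e-9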
7 This name comes originally from chemistry, though the statement of this law in chemistry looks somewhat
different from this form, even though the underlying principles are the same.
8 The word “intrinsic” is used to refer to the properties of materials without added impurities or “dopants”. Such
a pure material could also be described as an “intrinsic” material.
This approach also enables us to calculate where the Fermi level is in such pure semiconductors and
insulators. Since the numbers of electrons and holes must be equal in such pure materials (Eq. (C.27)),
we must have, from Eqs. (C.20) and (C.23)
NC exp((EF − EC)/kBT) = NV exp((EV − EF)/kBT) (C.29)
or equivalently, using Eqs. (C.21) and (C.24)
NC/NV = (meffe/meffh)^(3/2) = exp[(EV − EF − EF + EC)/kBT] (C.30)
Taking the logarithm of both sides and multiplying both sides by kBT and using Eq. (C.26) gives
(3kBT/2) log(meffe/meffh) = EV + EC − 2EF = 2EV + EG − 2EF (C.31)
So
EF = EV + EG/2 − (3kBT/4) log(meffe/meffh) (C.32)
Effective masses can differ substantially between the valence and conduction bands – for example, for
GaAs, meffe ≅ 0.07me and meffh ≅ 0.4me. Since only the logarithm of the ratio enters into Eq. (C.32),
however, we can still say that the Fermi level in such pure semiconductors or insulators lies within a
few kBT of the middle of the band gap. Since the band gap energy is typically some large number of kBT
even at room temperature in semiconductors and insulators, this result also justifies the use of the
Maxwell-Boltzmann approximation for calculating the behavior of carriers in such pure or “intrinsic”
materials.
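As a worked example of Eq. (C.32), consider GaAs (a minimal Python sketch; the band gap value of 1.42 eV is our assumed illustrative number, not from the text):

import math

kT = 0.025  # k_BT in eV near room temperature
E_G = 1.42  # assumed GaAs band gap (eV)

# offset of the intrinsic Fermi level from the middle of the gap, Eq. (C.32)
offset = -(3.0 * kT / 4.0) * math.log(0.07 / 0.4)  # meffe/meffh for GaAs, as in the text
print(f"E_F - E_V = {E_G / 2.0 + offset:.3f} eV; midgap is {E_G / 2.0:.3f} eV")
# the intrinsic Fermi level sits only ~33 meV above midgap, i.e., "within a few kBT"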
9 If we pretend we can treat the added electron “orbiting” the donor nucleus, with its net single positive charge, in
the same way we modeled the hydrogen atom, but with a dielectric constant of εrεo and with an effective mass
meff, then our Rydberg formula would be Ryeff ≈ (meff/2ℏ²)(e²/4πεrεo)². With an effective mass meff of, say, 1/10 of
the free electron mass, and a relative dielectric constant of, say, εr ≅ 10, then the effective Rydberg would be ~
1000 times smaller than the hydrogen atom Rydberg, at more like 13.6 meV than 13.6 eV. This kind of hydrogenic
approach to donors (and acceptors) is actually quite a good first model for such systems and their energy levels.
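The scaling in this footnote is simple enough to check directly (a minimal Python sketch using the footnote's illustrative numbers):

Ry = 13.606  # hydrogen Rydberg energy (eV)

def effective_rydberg(m_ratio, eps_r):
    """Hydrogenic donor binding energy: Ry scaled by m_eff/m_e and by 1/eps_r^2."""
    return Ry * m_ratio / eps_r ** 2

print(f"{1000 * effective_rydberg(0.1, 10.0):.1f} meV")  # ~13.6 meV, vs 13.6 eV for hydrogen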
So, adding this donor atom adds an energy level that is just below the conduction band, by a few to a
few tens of meV, and an electron that can either occupy this level or be thermally excited into the
conduction band. Whether or not it is thermally excited is determined by the Fermi level and Fermi-
Dirac statistics.
Adding in such donors moves the Fermi level towards or even into the conduction band, which has to
happen if there are electrons occupying either the donor level itself or the conduction band states10. So,
a better picture of n-type doping is that we are adding in electrons and also energy levels, for electrons,
just below the conduction band. If we analyze such a picture with the Fermi-Dirac distribution, we will
explain the basic phenomena of doping, including being able to calculate just how many electrons are
added to the conduction band (or “ionized” into the conduction band) at given doping densities and
temperatures. We will not do that analysis here, though it is a straightforward extension of the models
we have constructed.
For p-type dopants we end up with a similar model, but viewed “upside-down”. Such acceptor dopants
add energy levels for holes just above the valence band, those holes may or may not be thermally excited
into the valence band, and the addition of these p dopants moves the Fermi level nearer to or even into
the valence band.
Incidentally, such an approach can also explain why adding impurities to wide band gap materials like
insulators or wide-band-gap semiconductors does not always lead to conductivity. The key parameter is
just where the dopant energy levels lie relative to the band edges. In such materials, those added energy
levels are often quite far from the band edges, so the electrons or holes they add do not always get
thermally excited into the corresponding band. Many such impurities only add what are called “deep
levels” that are far from either band edge.
At low doping levels in semiconductors, we can often continue to use the “non-degenerate” Maxwell-
Boltzmann approximation because the Fermi level can be quite far from the band edges11. With high
doping, however, we can also get into the “degenerate” regime, as illustrated in Fig. C.2.
10 For example, at near zero temperature, the donor level would have to be “full” because all electron states below
that are already filled, so at low temperatures the Fermi level would have to be essentially at the energy of the
donor level or just above it.
11 Even though the dopant energy levels are close to the band, and even though at low temperature the Fermi level
would lie very close to the dopant energy level, at higher temperatures and low to moderate amounts of doping,
the Fermi energy can lie quite far from the corresponding band edge, significantly “below” the doping energy
levels. Incidentally, the term “dopant level” could refer to the energy level of the dopant, or it could refer to the
density of dopant atoms added to the material, though “doping level” would be a better term for this latter idea.
Index
A
absolute zero, 10, 150
absorptivity, 197
acceptor atom. See atom, acceptor
adiabatic, 157
aeolipile, 9
age of the universe, 154
alchemy, 5
al-Din, Taqi, 9
Alhazen, 5, 7
alloy, 114
  quaternary, 114
  ternary, 114
amorphous material, 115
Ampère, André-Marie, 8
Ångström, Anders, 48
angular frequency, 21
angular momentum, 51, 53, 54, 74, 92, 95–96, 101, 102
angular momentum quantum number. See orbital quantum number
antibonding state, 111, 130, 136
anti-particle, 102
Arago, François, 8
Aristotle, 5, 6
associated Laguerre polynomials, 97
associated Legendre functions, 91, 92
atom, 5, 18, 47, 48, 50, 64, 67, 87, 105, 110, 129, 136, 168
  absorption and emission of photons, 198–200, 207, 212
  acceptor, 135
  Bohr model, 51–54, 72, 92
  donor, 135, 228, 229
  helium. See helium
  hydrogen. See hydrogen
  many-electron, 55, 105, 102–5, 119
atomic configuration, 105
atomic number, 102, 103, 105
Aufbau principle, 105
Avicenna, 5
azimuthal angle, 91
azimuthal plane, 92
B
Bacon, Francis, 5
ballistic transport, 176, 177
Balmer series, 47, 48, 52
Balmer, Johann, 48
band
  emergence, 115–18, 136
  isotropic parabolic, 174, 175
  parabolic, 127
band gap, 125, 130, 132, 133, 135, 202, 203, 224, 225, 227, 228, 229
  direct, 127, 130, 207–9
  indirect, 127, 130, 208
band structure, 126, 127, 131, 132, 175, 177, 178, 183, 221, 222
basis
  function, 18, 41–43
  set, 38, 42–43
  state, 18
binary compound, 114
binomial
  coefficient, 140
  distribution, 139, 140
black body, 48–50, 195–97
  radiation, 195–97
  spectrum, 48–51, 51, 189, 193–95, 201
Black, Joseph, 9
Bloch theorem, 119, 120–23, 127, 130, 136, 220
Bohr magneton, 217
Bohr model. See atom, Bohr model
Bohr radius, 52, 53, 96, 97
Bohr, Niels, 51, 53, 95, 97
Boltzmann factor, 159–61, 163, 164, 169, 171, 173, 194, 198, 224
Boltzmann, Ludwig, 10, 47, 137, 150
Boltzmann’s constant, 149, 152, 153, 217
bonding state, 110, 130, 136
Born postulate, 58, 75, 76
Born rule. See Born postulate
Born, Max, 54, 55, 58
Bose, Satyendra Nath, 105
Bose-Einstein distribution, 107, 166, 168–70, 170, 171
boson, 105–6, 107, 171
boundary conditions
  barrier of finite height, 79–80, 83
  hard wall, 30, 60, 121
  infinitely high walls, 60
  periodic, 120–22
Boyle, Robert, 5
Brahe, Tycho, 6
bra-ket notation. See Dirac notation
Branca, Giovanni, 9
breathing mode, 94
Brillouin zone, 126, 128–29, 130, 176, 208
bucky ball, 113
C
caloric, 9
carbon nanotube, 113
Carnot cycle, 9, 157
Carnot limit, 157, 158
Carnot, Nicolas Léonard Sadi, 9, 157
carrier, 225, 226
carrier concentration, 222, 226
  intrinsic, 227
Celsius, Anders, 150
center-of-mass coordinates, 88–89
central potential, 91, 103
chemical potential, 161–65, 162, 163, 168, 170, 171, 182, 183, 184, 209
  Fermi level. See Fermi level
  Gibbs’ definition, 162
chemical symbol, 105
classical wave equation, 23–27, 31
Clausius, Rudolf, 10
completeness, 38, 41, 42, 43, 46
conduction band, 127, 129, 130, 174–78, 175, 181, 182, 184, 185, 186, 202, 203, 207–9, 209, 213, 212–13, 213, 221, 224, 226, 225
Copernicus, Nicolaus, 6
corpuscular theory of light, 7
Coulomb attraction, 51, 89, 228
Coulomb potential, 90
Coulomb repulsion, 103
Coulomb, Charles, 8
coupled oscillator, 32–36
coupled well, 108–11
coupled wells
  multiple, 115–18
covalent bonding, 110
crystal, 111–15
  lattice, 112
crystal momentum. See momentum, crystal
D
Dalton, John, 5
de Broglie, Louis, 54
de Broglie’s hypothesis, 54–55, 56, 67, 69, 126
degenerate doping, 180
degenerate statistics, 223, 229
Democritus, 5
density of states, 178, 192, 221, 222, 226
  effective. See effective density of states
depletion region, 203–7
Descartes, René, 5
diamond lattice, 113, 114, 128
diesel engine, 156
diffraction, 7, 8, 67–69
  angle, 67
  two-slit. See two-slit experiment
diffusion, 181–83, 203
diffusion current, 185–88, 205
diffusive equilibrium. See equilibrium, diffusive
diode, 112, 178–88, 202, 214
  current-voltage characteristic, 186, 187
  laser. See laser diode
  light-emitting. See light-emitting diode
  photo. See photodiode
  photovoltaic, 204
Dirac notation, 39
Dirac, Paul, 39, 54, 55, 102
Dirac’s equation, 102
direct transition. See vertical transition
donor atom. See atom, donor
donor binding energy, 228
doping, 134, 178, 227, 228, 174–78, 229
  n-type, 134–35
  p-type, 134–35
drift transport, 176–77, 177, 203
drift velocity, 177
E
Edison, Thomas, 48
effective density of states
  at the conduction band edge, 226
  at the valence band edge, 227
effective mass, 119, 126–27, 175, 177, 213, 221, 222, 226, 227, 228
eigen equation, 21–22
eigenfunction, 13, 18, 21–23, 31, 37, 38, 41–46, 59–64, 75, 120, 122, 223
eigenstate, 18, 62–63, 64, 74, 75, 106, 125, 215
eigenvalue, 22, 21–22, 29, 31, 35, 41, 42, 43, 46
  degenerate, 22, 98, 128, 223
  energy, 62–63, 108, 125
Einstein, Albert, 7, 50, 166, 169, 189, 190, 191, 197, 199, 201
Einstein's A and B coefficient argument, 50, 197–201, 207, 208, 211
electric constant, 217
electrical breakdown, 133
electron, 51–54, 67, 72, 87, 88, 133
  center of mass behavior, 88–89
  charge, 52, 89, 217
  core, 104, 129
  diffusion, 182, 186
  emission, 189–91
  Fermi level, 209
  Fermi-Dirac distribution, 165–67, 213, 222
  g factor, 217
  injection, 202, 207, 209
  mass, 64
  microscope, 67
  orbital, 90
  rest mass, 89, 217
  screening, 103, 104
  shells, 103–4, 134
  spin, 55, 102, 106
  sweep-out, 204
  transport, 174–77
  valence, 132
electron-hole pair, 202, 203
electron-positron pair, 106
electron-volt, 52
elementary charge. See electron, charge
emissivity, 197
Empedocles, 5, 7
enthalpy, 163
entropy, 10, 47, 137, 148–58, 159, 160, 164, 173, 183, 197, 215, 224
  fundamental, 148, 150, 153
  thermodynamic, 10, 150
epitaxial growth, 115
equilibrium, 143, 155, 168, 169, 182, 224, 227
  diffusive, 161, 162, 182, 184, 186, 209
  thermal, 145, 146, 148, 149, 160, 184, 193–99, 201, 207, 209, 211, 212, 222
Euclid, 7
tension, 25, 29
Thales of Miletus, 5, 8
thermodynamics, 4, 9, 10, 11, 137, 149, 150, 162, 163, 189, 215
  first law of, 10, 157
  second law of, 10, 11, 137, 154–58, 157, 173, 197
Thompson, Benjamin, 9
Thomson, William, 10
transistor, 4, 65, 77, 112, 133
tunneling, 77–87, 87, 108, 110
two-slit experiment, 7, 70–73
U
ultraviolet catastrophe, 49, 194
unit cell, 112, 113, 116, 117, 120, 121, 122, 123, 124, 127, 128, 220
  primitive, 114
  Wigner-Seitz, 114
V
valence band, 127, 130, 174, 175, 177, 182, 185, 202, 203, 207–9, 209, 213, 221, 225, 226, 174–78
valence electron, 105
van Musschenbroek, Pieter, 8
velocity of light, 8, 52, 193, 217
vertical transition, 208
Volta, Alessandro, 8
voltaic cell, 8
von Kleist, Ewald, 8
von Neumann, John, 54, 55
W
Watt, James, 9
wave
  monochromatic, 28–30, 55, 68
wave equation
  classical. See classical wave equation
  Helmholtz. See Helmholtz wave equation
  Schrödinger. See Schrödinger’s wave equation
wave packet, 176
wave-particle duality, 50, 70, 72, 87
wavevector, 28–29
Wilkins, John, 9
wurtzite lattice, 113
X
Xu, Wang, 8
Y
Young, Thomas, 7
Young’s slits. See two-slit experiment
Yu, Zhu, 8
Z
Zeeman effect, 96
zero-point energy, 64
zinc blende lattice, 113, 114, 128, 175, 221
zinc sulfide. See zinc blende lattice