BScthesis On MD
Bachelor Thesis:
Basic Principles
in
Molecular Modeling
1 Introduction
2 Classical Mechanics
2.1 Newton’s Second Law
2.2 Integration Algorithms
2.2.1 Verlet Algorithm
2.2.2 Leap-Frog Algorithm
2.2.3 Velocity Verlet Algorithm
3 Statistical Mechanics
3.1 Definitions
3.2 Ensemble Averages and Time Averages
4 CHARMM
4.1 Force Fields
4.1.1 General Features of Molecular Mechanics Force Fields
4.1.2 Bond Stretching
4.1.3 Angle Bending
4.1.4 Torsional Terms
4.1.5 Improper Torsions / Out-of-Plane Bending
4.1.6 Electrostatic Interactions
4.1.7 Van der Waals Interactions
4.1.8 The CHARMM Force Field – A Simple Molecular Mechanics Force Field
4.2 Data Structures
4.2.1 Residue Topology File (RTF)
4.2.2 Parameter File (PARAM)
4.2.3 Protein Structure File (PSF)
4.2.4 Coordinate File (CRD)
5 Energy Minimization
5.1 Energy Minimization: Statement of the Problem
5.2 Derivative Minimization Methods
5.2.1 First-Order Minimization Methods
5.2.2 A Second-Order Minimization Method - The Newton-Raphson Method (NR)
6 Molecular Dynamics (MD) and Normal Mode Analysis (NMA)
6.1 Molecular Dynamics - Running A Molecular Dynamics Simulation
6.1.1 Starting Structure
6.1.2 Modification of the Starting Structure
6.1.3 Energy Minimization
6.1.4 Heating Dynamics
6.1.5 Equilibration and Rescaling Velocities
6.1.6 Production Dynamics
6.2 Normal Mode Analysis
A Acknowledgements
List of Figures
6.11 As a molecule consisting of three atoms, water has three normal
modes which are presented here. Experimental (and calculated)
frequencies are shown. The first one has a significantly lower
frequency ν than the others. Quantity ν is proportional to $\sqrt{k/\mu}$
with force constant k and reduced mass µ
6.12 The motion of a molecule around an energy minimum can be
approximately described by a parabolic energy profile. This is the
reason why one has to generate the energy-minimized structure
(green ball), which is located around a minimum of the energy
surface, before starting a normal mode calculation
6.13 Vibrations in 2-dimensional space. In reality one more dimension
comes into play
6.14 A linear triatomic molecule like CO2. The vectors (here: scalars)
x1, x2, x3 define displacements of the corresponding atoms.
6.15 Results of normal mode calculation for a linear triatomic molecule.
λi, ηi and xi describe eigenvalues, eigenvectors and amplitudes.
6.16 Protein molecules are the most examined type of molecule with
respect to vibrational motion. Obtaining the normal modes of
motion, one can notice a difference between the motional frequency
of bigger (global) parts and smaller (local) parts of the
molecule, e.g. a whole domain or even just an atomic link between
two distinct atoms. Global motions of a protein are often
specific to it and can be related to its function.
Chapter 1
Introduction
Chapter 2
Classical Mechanics
$$F_i = m_i \cdot a_i = m_i \cdot \frac{dv_i}{dt} = m_i \cdot \frac{d^2 x_i}{dt^2}. \tag{2.1}$$
It describes the motion of a particle of mass mi along the coordinate xi with
Fi being the force on mi in that direction. This is used to calculate the motion
of a finite number of atoms or molecules, respectively, under the influence of
a force field that describes the interactions inside the system with a potential
energy function, $V(\vec{x})$, where $\vec{x}$ corresponds to the coordinates of all atoms in
the system. The relationship of the potential energy function and Newton’s
second law is given by
$$\frac{dV(\vec{x})}{dx_i} = -m_i \cdot \frac{d^2 x_i}{dt^2}, \tag{2.3}$$
which relates the derivative of the potential energy to the changes of the
atomic coordinates in time. As the potential energy is a complex multidimen-
sional function this equation can only be solved numerically with some approx-
imations.
With the acceleration being $a = -\frac{1}{m} \cdot \frac{dV}{dx}$ we can then calculate the changes
of the system in time by just knowing (i) the potential energy $V(\vec{x})$, (ii) initial
coordinates $x_{i,0}$ and (iii) an initial distribution of velocities, $v_{i,0}$. Thus, this
method is deterministic, meaning we can predict the state of the system at any
point of time in the future or the past.
The initial distribution of velocities is usually randomly chosen from a Gaussian
or Maxwell-Boltzmann distribution [3], which gives the probability of atom i
having the velocity in the direction of x at the temperature T by:
$$p(v_{i,x}) = \left( \frac{m_i}{2 \pi k_b T} \right)^{1/2} \cdot \exp\left( -\frac{1}{2} \, \frac{m_i v_{i,x}^2}{k_b T} \right). \tag{2.4}$$
Velocities are then corrected so that the overall momentum of the system
equals a zero vector:
$$\vec{P} = \sum_{i=1}^{N} m_i \vec{v}_i = \vec{0}. \tag{2.5}$$
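The two steps above (drawing velocities from Eq. (2.4) and enforcing Eq. (2.5)) can be sketched as follows. This is a minimal illustration rather than the procedure of any particular MD package: the function names, the use of reduced units (`k_b = 1.0` by default) and the seeded random generator are assumptions made for the example.

```python
import math
import random


def initial_velocities(masses, temperature, rng, k_b=1.0):
    """Draw every Cartesian velocity component from the Gaussian of Eq. (2.4)
    and then remove the net momentum so that Eq. (2.5) holds.

    k_b = 1.0 means reduced units (an assumption for this sketch)."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(k_b * temperature / m)  # width of the Gaussian for mass m
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])

    total_mass = sum(masses)
    for axis in range(3):
        momentum = sum(m * v[axis] for m, v in zip(masses, velocities))
        v_com = momentum / total_mass  # centre-of-mass velocity along this axis
        for v in velocities:
            v[axis] -= v_com
    return velocities


def total_momentum(masses, velocities):
    """Components of P = sum_i m_i * v_i."""
    return [sum(m * v[axis] for m, v in zip(masses, velocities)) for axis in range(3)]
```

Subtracting the centre-of-mass velocity from every atom is one simple way to obtain zero total momentum; it changes the individual velocities only slightly for a large system.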
Thus, it uses the positions $\vec{x}(t)$ and accelerations $\vec{a}(t)$ at time t and the
positions from the previous step, $\vec{x}(t - \delta t)$, to calculate new positions $\vec{x}(t + \delta t)$.
In this algorithm velocities are not explicitly calculated but can be obtained in
several ways. One is to calculate mean velocities between the positions $\vec{x}(t + \delta t)$
and $\vec{x}(t - \delta t)$:
$$\vec{v}(t) = \frac{1}{2 \delta t} \cdot [\vec{x}(t + \delta t) - \vec{x}(t - \delta t)]. \tag{2.12}$$
The advantages of this algorithm are that it is straightforward and has modest
storage requirements, comprising only two sets of positions [$\vec{x}(t)$ and $\vec{x}(t - \delta t)$]
and the accelerations $\vec{a}(t)$. The disadvantage, however, is its moderate precision,
because the positions are obtained by adding a small term [$\delta t^2 \cdot \vec{a}(t)$] to the
difference of two much larger terms [$2\vec{x}(t) - \vec{x}(t - \delta t)$]. This results in rounding
errors due to the finite numerical precision of the computer.
Furthermore, this is not a self-starting algorithm. New positions
$\vec{x}(t + \delta t)$ are obtained from the current positions $\vec{x}(t)$ and the positions at
the previous step $\vec{x}(t - \delta t)$. So at t = 0 there are no positions for $t - \delta t$ and
therefore it is necessary to provide another way to calculate them. One way is
to use the Taylor expansion truncated after the first term:
$$\vec{x}(t + \delta t) = \vec{x}(t) + \delta t \cdot \vec{v}(t) + \frac{1}{2} \delta t^2 \cdot \vec{a}(t) + \dots \tag{2.13}$$
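A minimal sketch of the Verlet scheme for a single coordinate may make the bookkeeping clearer. The update rule is the one implied by the terms quoted above (new position = twice the current position minus the previous position plus the acceleration term). The function names are illustrative, and the start-up step uses a backward Taylor expansion kept up to the acceleration term, which is an assumption of this sketch rather than the thesis' prescription.

```python
import math


def verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Propagate one coordinate with the position Verlet scheme.

    accel is a function returning the acceleration for a given position."""
    # Start-up: estimate x(-dt) from a backward Taylor expansion
    # (assumption: kept up to the acceleration term).
    x_prev = x0 - dt * v0 + 0.5 * dt * dt * accel(x0)
    x_curr = x0
    positions = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x_curr - x_prev + dt * dt * accel(x_curr)
        x_prev, x_curr = x_curr, x_next
        positions.append(x_curr)
    return positions


def central_velocity(positions, step, dt):
    """Eq. (2.12): velocity at an interior step from the two neighbouring positions."""
    return (positions[step + 1] - positions[step - 1]) / (2.0 * dt)
```

For a harmonic oscillator with `accel = lambda x: -x`, `x0 = 1` and `v0 = 0`, the stored positions follow cos(t) closely for small `dt`, and `central_velocity` approximates -sin(t).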
2.2.3 Velocity Verlet Algorithm
The velocity Verlet algorithm (Swope et al. 1982)[2, 3] yields positions, velocities
and accelerations at time t and does not compromise precision:
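$$\vec{x}(t + \delta t) = \vec{x}(t) + \delta t \cdot \vec{v}(t) + \frac{1}{2} \delta t^2 \vec{a}(t) \tag{2.18}$$
$$\vec{v}(t + \delta t) = \vec{v}(t) + \frac{1}{2} \delta t [\vec{a}(t) + \vec{a}(t + \delta t)]. \tag{2.19}$$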
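For this algorithm more than two calculations have to be done for a single
time step. This is due to the fact that calculation of the velocities $\vec{v}(t + \delta t)$
requires acceleration values at $t$ and $t + \delta t$. So first the positions at $t + \delta t$
are calculated and the velocities are advanced by half a step,
$\vec{v}(t + \frac{1}{2}\delta t) = \vec{v}(t) + \frac{1}{2}\delta t \cdot \vec{a}(t)$; then, once the accelerations
$\vec{a}(t + \delta t)$ have been evaluated at the new positions, the velocities at time $t + \delta t$ are computed using
$$\vec{v}(t + \delta t) = \vec{v}(t + \tfrac{1}{2}\delta t) + \frac{1}{2} \delta t \cdot \vec{a}(t + \delta t). \tag{2.20}$$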
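The three stages of one velocity Verlet step can be written out for a single coordinate as below. This is only a sketch under simple assumptions: one degree of freedom, an acceleration that depends on position alone, and illustrative function names.

```python
import math


def velocity_verlet(x, v, accel, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt.

    accel is a function returning the acceleration for a given position."""
    a = accel(x)
    for _ in range(n_steps):
        x = x + dt * v + 0.5 * dt * dt * a  # Eq. (2.18): new position
        v_half = v + 0.5 * dt * a           # half-step velocity from a(t)
        a = accel(x)                        # a(t + dt) at the new position
        v = v_half + 0.5 * dt * a           # Eq. (2.20): full-step velocity
    return x, v
```

Combining the half-step update with Eq. (2.20) reproduces Eq. (2.19), so only one force evaluation per time step is needed.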
Summary
We have now learned four examples of integration algorithms. But what could
make us prefer one over another? As for any other computer algorithm the ideal
method should be fast, which means computationally efficient, require as little
memory as possible and be easy to program. These, however, are not the main
features to examine. They are of rather secondary interest for most
MD simulations because most algorithms do not demand a significant amount of
storage, and the calculations for the integration are rather fast in comparison to
other calculations in a simulation, such as the calculation of the force acting on
every single atom in the system. Thus, other features are considered first: The
algorithm should conserve overall momentum and energy, be time-reversible and
permit a long time step without great loss of precision.
The choice of the integration step size, in fact, is very important. One must
weigh the increased accuracy of using a small step size against the longer real
time that can be simulated when a larger step size is used.
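The trade-off between step size and accuracy can be made visible with a small numerical experiment. The sketch below integrates a one-dimensional harmonic oscillator with the velocity Verlet scheme and reports the largest deviation of the total energy from its initial value; the test system, the parameter defaults and the function name are assumptions chosen purely for illustration.

```python
def max_energy_error(dt, total_time=20.0, k=1.0, m=1.0):
    """Largest deviation of the total energy from its initial value for a
    harmonic oscillator (force constant k, mass m) integrated with step dt."""
    x, v = 1.0, 0.0
    a = -k * x / m
    e0 = 0.5 * m * v * v + 0.5 * k * x * x
    worst = 0.0
    for _ in range(int(round(total_time / dt))):
        x += dt * v + 0.5 * dt * dt * a
        v_half = v + 0.5 * dt * a
        a = -k * x / m
        v = v_half + 0.5 * dt * a
        energy = 0.5 * m * v * v + 0.5 * k * x * x
        worst = max(worst, abs(energy - e0))
    return worst
```

Calling the function with step sizes of 0.005, 0.05 and 0.5 shows the energy error growing with the step size, while the number of force evaluations needed to cover the same simulated time shrinks accordingly.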
Chapter 3
Statistical Mechanics
3.1 Definitions
The mechanical or microscopic state of a system is defined by the atomic
positions $x_i$ and the momenta $p_i = m_i \cdot v_i$. Together they span a
multidimensional space with 6N coordinates, to which positions and momenta each
contribute 3N coordinates. This space is called phase space.
an ensemble is the complete collection of microscopic systems and a macroscopic
sample can only consist of a finite number of systems. A sufficiently big sample,
however, can be seen as a good approximation to an ensemble. That is why
statistical mechanics defines averages corresponding to experimentally measured
thermodynamic properties as ensemble averages [3, 5].
The ensemble average is given by:
$$\langle A \rangle_{ensemble} = \iint d\vec{p}^N \, d\vec{x}^N \, A(\vec{p}^N, \vec{x}^N) \, \rho(\vec{p}^N, \vec{x}^N), \tag{3.1}$$
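$$\langle A \rangle_{time} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_{t=0}^{\tau} A(\vec{p}^N(t), \vec{x}^N(t)) \, dt = \frac{1}{M} \sum_{i=1}^{M} A(\vec{p}^N_i, \vec{x}^N_i), \tag{3.3}$$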
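In practice the right-hand side of Eq. (3.3) is simply the mean of the observable over the M stored frames of a trajectory. A minimal sketch, assuming that each frame is a `(p, x)` tuple and that the observable is an ordinary Python function (both are assumptions for this example):

```python
import math


def time_average(observable, trajectory):
    """Discrete time average of Eq. (3.3): mean of A over all stored frames."""
    return sum(observable(p, x) for p, x in trajectory) / len(trajectory)


def oscillator_frames(n_frames, n_periods):
    """Frames (p, x) of a unit harmonic oscillator, x = cos(t), p = -sin(t),
    sampled uniformly over a whole number of periods."""
    dt = 2.0 * math.pi * n_periods / n_frames
    return [(-math.sin(i * dt), math.cos(i * dt)) for i in range(n_frames)]
```

For example, `time_average(lambda p, x: x * x, oscillator_frames(10000, 10))` gives the mean square displacement of the oscillator, which is one half of the squared amplitude.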
Chapter 4
CHARMM
Several types of force fields exist. Two of those may use an identical functional
form yet have very different parameters and thus bring about different energies
for the same system. Moreover, force fields with the same functional form but
different parameters, and force fields with different functional forms, may give
similar results. A force field should be considered as a single entity; it is not
necessarily correct to divide the energy into its individual components or even to take
some of the parameters from one force field and mix them with parameters from
another one.
An important point that one shouldn’t forget is that no ‘correct’ form for a force
field exists. If one functional form performs better than another, that form will
be favored. Most of the force fields commonly used do have a very similar form –
we will discuss this particular form in more detail later on – but it should always
be kept in mind that there may be better functional forms, particularly when
developing a force field for new classes of molecules. Molecular mechanics force
fields are often a compromise between accuracy and computational efficiency;
the most accurate ones may often be unsatisfactory for efficient computation.
As the performance of computers increases, it becomes possible to incorporate
more sophisticated models.
A concept that is common to most force fields is that of an atom type. For a
quantum mechanics calculation it is usually necessary to specify the charge of
the nuclei, together with the geometry of the system and the overall charge and
spin multiplicity. For a force field calculation, however, the overall charge and
spin multiplicity are not explicitly required, but it is usually necessary to assign
an atom type to each atom in the system. This contains information about
its hybridization state and sometimes the local environment. For example, it is
necessary to distinguish between carbon atoms which adopt a tetrahedral geometry
(sp3-hybridized), carbons which are trigonal (sp2-hybridized) and carbons which are
linear (sp-hybridized). The corresponding parameters are expressed in terms of
these atom types, so that the reference angle Θ0 for a tetrahedral carbon atom
would be about 109.5° and that for a trigonal carbon near 120°. For example,
the MM2 [7], MM3 [8, 9, 10] and MM4 [11, 12, 13, 14, 15] force fields of Allinger
and co-workers, which are widely used for calculations on "small" molecules,
distinguish the following types of carbon atoms: sp3, sp2, sp, carbonyl, cyclopropane,
radical, cyclopropene and carbonium ion. The value of the potential
energy V is calculated as a sum of internal or bonded terms, which describe
the bonds, angles and bond rotations in a molecule, and a sum of external or
non-bonded terms, which account for interactions between non-bonded atoms
or atoms separated by three or more covalent bonds. So it is:
Figure 4.1: Variation in bond energy with interatomic separation
Figure 4.2: Comparison of the simple harmonic potential (Hooke’s Law) with
the Morse curve.
Figure 4.3: Bond Angle θ
constants are smaller here. As with the bond-stretching terms, the accuracy of
the force field can be improved by the incorporation of higher-order terms.
These two terms, the bond-stretching and angle-bending, describe the deviation
from an ideal geometry; effectively, they are penalty functions and the sum of
them should be close to zero in a perfectly optimized structure. These two terms
are often regarded as hard degrees of freedom, in that quite substantial ener-
gies are required to cause significant deformations from their reference values.
Most of the variation in structure and relative energies is due to torsional and
non-bonded contributions.
Figure 4.4: Variation in energy with rotation of the carbon-carbon bond in
ethane.
Figure 4.5: A torsion angle (dihedral angle) A-B-C-D is defined as the angle
Φ between the planes (ABC) and (BCD). A torsion angle can vary through
360 degrees.
Figure 4.6: The improper dihedral term is designed to maintain planarity about
certain atoms. The potential is described by a harmonic function. α is the angle
between the plane formed by the central atom and two peripheral atoms and
the plane formed by the peripheral atoms only.
dard terms we already know, the equilibrium structure would have the oxygen
atom located out of the plane formed by the adjoining carbon atom and the two
carbon atoms bonded to it. The simplest way to achieve the desired geometry
is to use an out-of-plane bending term. One approach is to treat the four atoms
as an improper torsion angle i.e., a torsion angle in which the four atoms are
not bonded in the sequence A-B-C-D. Another way involves a calculation of the
angle between a bond from the central atom and the plane defined by the central
atom and the other two atoms (Fig.4.6). A value of 0 ◦ corresponds to all four
atoms being planar. With these definitions the deviation of the out-of-plane
coordinate can be modeled using a harmonic potential of the form:
$$\nu(\alpha) = \frac{1}{2} k_\alpha \cdot \alpha^2. \tag{4.6}$$
The improper torsion or improper dihedral definition is more widely used as
it can then be easily included with the ‘proper’ torsional terms in the force field.
However, the other form may be better to implement out-of-plane bending in
the force field.
After learning the most important bonded terms of the energy function we are
now going to have a look at the non-bonded terms which consist at least of the
electrostatic and the van der Waals interactions in the system.
$$V_E(i, j) = \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}}. \tag{4.7}$$
$r_{ij}$ is the distance between $q_i$ and $q_j$, the electric charges in coulombs carried
by particles i and j respectively, and $\epsilon_0$ is the electrical permittivity of free space.
Alternative approaches to the calculation of electrostatic interactions, e.g. the
central multipole expansion which is based on the electric moments, may provide
more exact solutions to the electrostatic interactions [17, 18].
Figure 4.7: The Lennard-Jones potential. The collision parameter, σ, is shown
along with the well depth, ε. $r_0$ is the point of minimum energy. The
dashed curves represent Pauli repulsion and van der Waals attraction.
case, the number increases as the square of the number of atoms for a pair-wise
model. To speed up the computation two approaches are applied. In the first
approach the interactions between two atoms separated by a distance greater
than a pre-defined distance, the cutoff distance, are ignored. The interactions
are simply set to zero for interatomic distances greater than the cutoff distance
(Truncation-Method) or the entire potential energy surface is modified such
that at the cutoff distance the interaction potential is zero (Shift-Method). The
other way is to reduce the number of interaction sites. The simplest way to do
this is to subsume some or all of the atoms (usually just the hydrogen atoms)
into the atoms to which they are bonded (United-Atom-Method). Considerable
computational savings are possible; for example, if butane is modeled as a four-site
model rather than one with all fourteen atoms, the van der Waals interaction
between all the sites involves the calculation of six pair terms rather than 91.
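The truncation and shift ideas described above can be illustrated with the Lennard-Jones pair potential. The sketch below is hedged in two ways: the shift shown is the simplest possible one (subtracting the value of the potential at the cutoff), which is not necessarily the shifting function implemented in CHARMM, and all function names are illustrative. `pair_count` merely counts all unordered pairs of interaction sites, which is the quantity that grows with the square of the number of atoms.

```python
import math


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6 pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_truncated(r, epsilon, sigma, r_cut):
    """Truncation: the interaction is set to zero beyond the cutoff."""
    return lennard_jones(r, epsilon, sigma) if r < r_cut else 0.0


def lj_shifted(r, epsilon, sigma, r_cut):
    """Shift (simplest variant, an assumption): subtract the energy at r_cut
    so that the potential reaches zero continuously at the cutoff."""
    if r >= r_cut:
        return 0.0
    return lennard_jones(r, epsilon, sigma) - lennard_jones(r_cut, epsilon, sigma)


def pair_count(n_sites):
    """Number of unordered pairs among n_sites interaction sites."""
    return math.comb(n_sites, 2)
```

The truncated form has a small jump in energy at the cutoff, whereas the shifted form is continuous there at the price of slightly changing the energy at all shorter distances.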
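$$\begin{aligned}
V_{tot} ={} & \frac{1}{2} \sum_{bonds} k_l \cdot (l - l_0)^2 + \frac{1}{2} \sum_{angles} k_\Theta \cdot (\Theta - \Theta_0)^2 \\
& + \frac{1}{2} \sum_{torsions} k_\Phi \cdot [1 + \cos(n\Phi - \delta)] + \frac{1}{2} \sum_{impropers} k_\alpha \cdot \alpha^2 \\
& + \sum_{i,j} \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} + \sum_{i,j} 4 \epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right].
\end{aligned} \tag{4.9}$$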
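Each term of Eq. (4.9) is a simple function of one internal coordinate or one interatomic distance, so the total energy is just a sum over lists of such terms. The following sketch shows this structure only; the tuple layouts, the function names and the approximate value used for the Coulomb prefactor (in kcal·Å/(mol·e²)) are assumptions of the example and not CHARMM's internal conventions.

```python
import math

COULOMB_PREFACTOR = 332.06  # approximate 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); assumed units


def bond_energy(k_l, l, l0):
    return 0.5 * k_l * (l - l0) ** 2


def angle_energy(k_theta, theta, theta0):
    return 0.5 * k_theta * (theta - theta0) ** 2


def torsion_energy(k_phi, n, phi, delta):
    return 0.5 * k_phi * (1.0 + math.cos(n * phi - delta))


def improper_energy(k_alpha, alpha):
    return 0.5 * k_alpha * alpha ** 2


def coulomb_energy(q_i, q_j, r_ij):
    return COULOMB_PREFACTOR * q_i * q_j / r_ij


def lj_energy(epsilon_ij, sigma_ij, r_ij):
    sr6 = (sigma_ij / r_ij) ** 6
    return 4.0 * epsilon_ij * (sr6 * sr6 - sr6)


def total_energy(bonds, angles, torsions, impropers, charge_pairs, lj_pairs):
    """Sum of all terms; every argument is a list of parameter tuples
    matching the signature of the corresponding term function."""
    return (sum(bond_energy(*t) for t in bonds)
            + sum(angle_energy(*t) for t in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(improper_energy(*t) for t in impropers)
            + sum(coulomb_energy(*t) for t in charge_pairs)
            + sum(lj_energy(*t) for t in lj_pairs))
```

Angles are expected in radians here, and which atom pairs enter the two non-bonded lists is decided by the caller.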
Let us consider how this simple force field would be used to calculate the
energy of a conformation of a simple molecule. Propane is a popular choice
for this task [2]: it has more terms than ethane, for example, yet it is
not as complicated as butane, which has far more non-bonded and torsional
energy terms. As explained by Leach, propane has ten bonds: two C-C
and eight C-H bonds. The C-C bonds are symmetrically equivalent but the
C-H bonds fall into two classes, one group corresponding to the two hydrogens
bonded to the central methylene (CH2 ) carbon and one group corresponding to
the six hydrogens bonded to the methyl carbons. In some sophisticated force
fields different parameters would be used for these two different types of C-H
bond, but in the CHARMM force field the same bonding parameters (i.e. kl
and l0 ) would be used for each of the eight C-H bonds. This is a good example
for transferability since the same parameters can be used for a wide variety of
molecules. There are 18 different valence angles in propane, comprising one
C-C-C angle, ten C-C-H angles and seven H-C-H angles. Note that all angles
are included in the CHARMM force field even though some of them may not
be independent of the others. There are 18 torsional terms: twelve H-C-C-H
torsions and six H-C-C-C torsions. They are modeled with a cosine series expansion
having minima at the trans and gauche conformations. The improper
dihedral term is omitted for propane. Finally, there are 27 non-bonded
terms to calculate, comprising 21 H-H interactions and six H-C interactions. A
sizeable number of terms are thus included in the CHARMM model, even for
a molecule as simple as propane. Even so, the number of terms, namely 73, is
much smaller than the number of integrals that would be involved in an equivalent
quantum mechanical calculation.
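The term counts quoted for propane can be reproduced from the bond list alone. The sketch below is an illustration with a hand-written atom numbering (atoms 0 to 2 are the carbons, 3 to 10 the hydrogens, an assumption of the example); it counts angles as pairs of neighbours around each atom, torsions around each bond, and non-bonded pairs as atoms separated by three or more bonds.

```python
import math
from collections import deque
from itertools import combinations


def propane_bonds():
    """Bond list of propane; atoms 0-2 are C1, C2, C3 and atoms 3-10 are hydrogens."""
    bonds = [(0, 1), (1, 2)]
    bonds += [(0, h) for h in (3, 4, 5)]    # methyl hydrogens on C1
    bonds += [(1, h) for h in (6, 7)]       # methylene hydrogens on C2
    bonds += [(2, h) for h in (8, 9, 10)]   # methyl hydrogens on C3
    return bonds


def bond_separation(neighbours, start, goal):
    """Number of bonds on the shortest path between two atoms (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return distance[atom]
        for nxt in neighbours[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    raise ValueError("atoms are not connected")


def count_terms(n_atoms, bonds):
    """Count bonded and non-bonded terms of an acyclic molecule."""
    neighbours = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    angles = sum(math.comb(len(nb), 2) for nb in neighbours.values())
    torsions = sum((len(neighbours[j]) - 1) * (len(neighbours[k]) - 1) for j, k in bonds)
    nonbonded = sum(1 for i, j in combinations(range(n_atoms), 2)
                    if bond_separation(neighbours, i, j) >= 3)
    return {"bonds": len(bonds), "angles": angles,
            "torsions": torsions, "nonbonded": nonbonded}
```

`count_terms(11, propane_bonds())` yields 10 bonds, 18 angles, 18 torsions and 27 non-bonded pairs, which add up to the 73 terms mentioned in the text.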
The force field equation given above is only one variant of the many CHARMM
force fields. There are other potential energy functions which contain more
terms for a more precise calculation, implemented in the CHARMM program,
e.g. the extended electrostatics model [20] or the fast multipole method [21]
for treating long-range electrostatic interactions with a multipole approxima-
tion. Our calculations on BPTI, however, were performed with the force field
presented above.
Figure 4.8: Example of the RTF for the Alanine residue.
For a specific molecule, the necessary data is stored in the Protein Structure
File and the Coordinate File, respectively.
4.2.3 Protein Structure File (PSF)
The PSF is the most fundamental data structure used in CHARMM. It is gener-
ated for a specific molecule or molecules and contains the detailed composition
and connectivity of the molecule(s). It describes how molecules are divided
into residues and molecular entities (segments), which can range from a single
macromolecular chain to multiple chains solvated by explicit water molecules.
The PSF must be specified before any calculations can be performed on the
molecule. The PSF constitutes the molecular topology but does not contain
information regarding the bond lengths, angles, etc., so it is necessary to read
in the parameter file to add the missing information.
Chapter 5
Energy Minimization
Figure 5.1: Quadratic Approximation at the Minimum.
$$V(x) = V(x_k) + (x - x_k) \cdot V'(x_k) + \frac{1}{2} (x - x_k)^2 \cdot V''(x_k) + \dots, \tag{5.2}$$
where $V'$ is the first derivative and $V''$ the second derivative of the energy
function V.
In the case of a multidimensional function the variable x corresponds to a vector
$\vec{x}$ and the derivatives are replaced by matrices: For a system with N atoms $V(\vec{x})$
is a function of 3N coordinates [2]. So $\vec{x}$ has 3N components and the gradient,
$\vec{g} = V'(\vec{x})^T$, accordingly is a vector with 3N dimensions as well, with each
element being the partial derivative of V with respect to a single coordinate,
$\partial V / \partial x_i$. The second derivative $V''(\vec{x})$ is a (3N × 3N) matrix. Every element (i, j)
corresponds to the second partial derivative of V with respect to the coordinates
$x_i$ and $x_j$, $\partial^2 V / \partial x_i \partial x_j$. This is a symmetric matrix which is called the Hessian matrix.
Thus the multidimensional Taylor expansion is written as follows [22]:
$$V(\vec{x}) = V(\vec{x}_k) + V'(\vec{x}_k) \cdot (\vec{x} - \vec{x}_k) + \frac{1}{2} (\vec{x} - \vec{x}_k)^T \cdot V''(\vec{x}_k) \cdot (\vec{x} - \vec{x}_k) + \dots \tag{5.3}$$
Truncated after the second-order term, this is a quadratic function and thus can
only be seen as an approximation for an energy function. However, the area close
to a minimum is well approximated by this Taylor expansion, as can be seen in Fig. 5.1.
In the following we will discuss the most important and most frequently used
methods for energy minimization in molecular modelling. They can be classified
according to the highest order derivative used.
Figure 5.2: Steepest Descent
Figure 5.3: Line Search
Figure 5.4: SD on a narrow valley
$$\gamma_k = \frac{\vec{g}_k \cdot \vec{g}_k}{\vec{g}_{k-1} \cdot \vec{g}_{k-1}}. \tag{5.8}$$
Thus, new directions are linear combinations of the current gradient $\vec{g}_k$ and
the previous direction $\vec{\nu}_{k-1}$. As there is no previous direction for the first step,
the starting direction of the conjugate gradient method is the same as for SD.
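How the scalar of Eq. (5.8) enters the search can be seen on a small quadratic test function. The sketch below assumes the common update in which the new direction is the negative current gradient plus γ times the previous direction, and it uses an exact line search that is only valid for the quadratic form V(x) = ½ xᵀAx − bᵀx; both choices, like the function names, are assumptions made for this illustration.

```python
def conjugate_gradient_quadratic(A, b, x, n_steps):
    """Minimise V(x) = 1/2 x^T A x - b^T x for a symmetric positive definite A."""

    def matvec(matrix, vec):
        return [sum(m * c for m, c in zip(row, vec)) for row in matrix]

    def dot(u, w):
        return sum(p * q for p, q in zip(u, w))

    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient of V at x
    v = [-gi for gi in g]                             # first direction: steepest descent
    for _ in range(n_steps):
        gg = dot(g, g)
        if gg < 1e-30:                                # gradient vanished: minimum reached
            break
        Av = matvec(A, v)
        step = -dot(g, v) / dot(v, Av)                # exact line search (quadratic only)
        x = [xi + step * vi for xi, vi in zip(x, v)]
        g_new = [gi + step * avi for gi, avi in zip(g, Av)]
        gamma = dot(g_new, g_new) / gg                # Eq. (5.8)
        v = [-gn + gamma * vi for gn, vi in zip(g_new, v)]
        g = g_new
    return x
```

For a quadratic function of two variables the minimum is reached after two steps, whereas steepest descent would keep zig-zagging in a narrow valley.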
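$$V(x) = V(x_k) + (x - x_k) \cdot V'(x_k) + \frac{1}{2} (x - x_k)^2 \cdot V''(x_k) + \dots \tag{5.9}$$
The first derivative of this function, truncated after the quadratic term, is
$$V'(x) = V'(x_k) + (x - x_k) \cdot V''(x_k). \tag{5.10}$$
At the minimum $x_{Min}$ the first derivative vanishes, $V'(x_{Min}) = 0$, which gives
$$x_{Min} = x_k - \frac{V'(x_k)}{V''(x_k)}. \tag{5.11}$$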
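Applied repeatedly, Eq. (5.11) gives the one-dimensional Newton-Raphson iteration. A minimal sketch, assuming that the first and second derivatives are available as Python functions (the names, the tolerance and the step limit are illustrative):

```python
def newton_raphson_1d(dV, d2V, x, n_steps=20, tol=1e-10):
    """Iterate x <- x - V'(x) / V''(x) until the step is smaller than tol."""
    for _ in range(n_steps):
        step = dV(x) / d2V(x)
        x -= step  # Eq. (5.11)
        if abs(step) < tol:
            break
    return x
```

For a purely quadratic function a single step lands exactly on the minimum; for other functions the iteration converges quickly only when the starting point is already close to the minimum and the second derivative is positive there.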
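The expression for a multidimensional function is
$$\vec{x}_{Min} = \vec{x}_k - V''(\vec{x}_k)^{-1} \cdot \vec{g}_k,$$
where $\vec{g}_k = V'(\vec{x}_k)^T$ is the gradient and $V''(\vec{x}_k)^{-1}$ the inverse of the Hessian matrix at $\vec{x}_k$.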
Minimization   1. Method      2. Method      Final Energy / kcal·mol−1
1.             600 Steps SD   -              -875
2.             600 Steps NR   -              -1142
3.             100 Steps NR   500 Steps SD   -1069
4.             100 Steps SD   500 Steps NR   -1189
Chapter 6
Figure 6.1: Aligned BPTI Structures shown in Cartoon Representation
(Blue: Before Minimization; Red: After Minimization)
Figure 6.2: Temperature vs. Time.
Figure 6.3: Total Energy vs. Time.
of motion are integrated to propagate the system in time. This is done for a
certain period of time to let the system equilibrate in the new thermodynamic
state, giving the energy time to evenly distribute throughout the system. In the
next step the velocities are scaled to values corresponding to a slightly higher
temperature and another equilibration phase is carried out. You can reach the
desired temperature by simply repeating this process.
Typical steps for raising the temperature are about 5 K and the short equilibration
period lasts about 0.3-1.0 ps depending on the size of the simulated system.
So a heating process with 5 K every 0.3 ps, for example, would raise the temperature
from 0 K to 300 K in 60 steps, i.e. in roughly 20 ps (as shown for BPTI in Fig. 6.2),
which is in fact very quick. An even more rapid heating, though, would result in high-energy
motions and interactions that are physically not feasible. Energy density, for
example, would be increased locally to a level at which bonds break or, if quantum
mechanics were neglected, elongate far beyond natural dissociation
lengths. So every molecule with a complex three-dimensional structure held together
by hydrogen bonds or other non-bonded interactions would denature and thus
be useless for any further simulation.
Figures 6.2-6.5 show how some properties of the system vary in time
during a heating process. As you can see the kinetic energy and the temperature
show exactly the same behavior. This is because they are related to each other
by the following equation:
$$E_{kin} = \frac{1}{2} m \langle \vec{v}^2 \rangle = \frac{3}{2} N k T, \tag{6.1}$$
where N is the number of atoms, k the Boltzmann constant and T the temperature.
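Eq. (6.1) can be inverted to obtain an instantaneous temperature from the velocities, and scaling all velocities by the square root of the ratio of target to current temperature moves the system to the next heating stage. The sketch below assumes reduced units (`k_b = 1.0`), takes the total kinetic energy as the sum over all atoms, and uses illustrative function names; it is not the heating routine of any specific program.

```python
import math


def kinetic_energy(masses, velocities):
    """Total kinetic energy, summed over all atoms."""
    return 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))


def temperature(masses, velocities, k_b=1.0):
    """Invert Eq. (6.1): T = 2 * E_kin / (3 * N * k)."""
    return 2.0 * kinetic_energy(masses, velocities) / (3.0 * len(masses) * k_b)


def rescale_velocities(masses, velocities, target_temperature, k_b=1.0):
    """Scale all velocities so that the instantaneous temperature equals the target."""
    factor = math.sqrt(target_temperature / temperature(masses, velocities, k_b))
    return [[factor * c for c in v] for v in velocities]


def heating_schedule(t_start, t_end, increment):
    """Target temperatures of the successive heating stages."""
    n_stages = int(round((t_end - t_start) / increment))
    return [t_start + increment * (i + 1) for i in range(n_stages)]
```

`heating_schedule(0.0, 300.0, 5.0)` contains 60 target temperatures; with 0.3 ps of equilibration per stage this corresponds to 18 ps of heating.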
Having raised the total energy of the system in a heating step, this amount of
energy is evenly distributed throughout the system as already stated before. On
the one hand this distribution occurs in space, meaning very high and very low
velocities of single atoms get closer to the mean value. On the other hand the
distribution occurs in terms of different energies: kinetic and potential energy.
As you can see the total energy is the only property that is constant during
equilibration. Kinetic and potential energy, however, fluctuate, showing
that the total energy is continually exchanged between them in an oscillating manner.
Figure 6.4: Kinetic Energy vs. Time
Figure 6.5: Potential Energy vs. Time
Figure 6.7: C-S-S Angle of Disulfide Bridge between CYS 5 and CYS 14
Figure 6.8: C-S-S-C Dihedral Angle of Disulfide Bridge between CYS 5 and CYS 14
conservation may be violated for several reasons: The force field may neglect
some critical effects (for example due to a badly selected cutoff) or the
numerical integration method may not be precise enough because of the finite
precision with which the computer represents numbers.
As disulfide bridges are tertiary structure elements and thus play a role in
maintaining the three-dimensional structure, they should be conserved during
simulation. A flip between totally different levels of angles would indicate a
conformational change of the disulfide bridge, which was not the case
during our simulation.
Another important time dependent property of dynamic systems is the root
mean square deviation (RMSD). RMSD indicates how much two structures vary
in terms of differences between the coordinates of the structures and is calculated
with
Figure 6.9: RMSD during Production Run for different Parts of the Protein
$$D_{RMS} = \sqrt{\langle (\vec{x}_i^{\alpha} - \vec{x}_i^{\beta})^2 \rangle} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\vec{x}_i^{\alpha} - \vec{x}_i^{\beta})^2} \tag{6.2}$$
where $\vec{x}_i$ is the coordinate of atom i and α and β correspond to the different
structures. Calculation of RMSD for a time series of coordinate sets compared
to a reference structure yields graphs like those shown in Fig. 6.9, where RMSD values
of BPTI are displayed for (i) the whole protein, (ii) the side chains and (iii) the
backbone of the protein compared to the BPTI crystal structure.
As you can see, the side-chain RMSD is higher than that of the backbone,
which means that the side chains are more flexible, whereas the backbone seems rather
rigid. The whole-protein RMSD has values between these two, because
for this calculation both atom selections (side chain and backbone) are taken
into account at once.
A special case of RMSD is the root mean square fluctuation (RMSF) where the
reference structure is an average structure over the whole trajectory:
$$F_{RMS} = \sqrt{\langle (\vec{x}_i^{\alpha} - \vec{x}_i^{Average})^2 \rangle} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\vec{x}_i^{\alpha} - \vec{x}_i^{Average})^2}. \tag{6.3}$$
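Both quantities reduce to the same small computation once the coordinate sets are available as lists of Cartesian triples. The sketch below follows Eqs. (6.2) and (6.3) literally and assumes that the structures have already been superimposed; no alignment is performed, and the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Eq. (6.2): root mean square deviation between two coordinate sets
    given as lists of [x, y, z] triples (no superposition is done)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    total = 0.0
    for xa, xb in zip(coords_a, coords_b):
        total += sum((p - q) ** 2 for p, q in zip(xa, xb))
    return math.sqrt(total / len(coords_a))


def average_structure(trajectory):
    """Mean position of every atom over all frames of the trajectory."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    return [[sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
            for i in range(n_atoms)]


def rmsf_per_frame(trajectory):
    """Eq. (6.3): deviation of every frame from the trajectory average."""
    reference = average_structure(trajectory)
    return [rmsd(frame, reference) for frame in trajectory]
```

Restricting the atom lists to backbone or side-chain atoms before calling `rmsd` gives the separate curves discussed for Fig. 6.9.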
Figure 6.10: Summary of the procedure of a molecular dynamics simulation
Figure 6.11: As a molecule consisting of three atoms, water has three normal
modes which are presented here. Experimental (and calculated) frequencies are
shown. The first one has a significantly lower frequency ν than the others.
Quantity ν is proportional to $\sqrt{k/\mu}$ with force constant k and reduced mass µ
$$m \frac{d^2 x}{dt^2} = F = -kx. \tag{6.6}$$
Since series expansions are useful for approximating functions (4) we can
rewrite the energy function in one dimension like this:
$$V(x) = V(0) + V'(0) \cdot x + \frac{1}{2} V''(0) \cdot x^2 + \dots \tag{6.7}$$
To make the function vary in a quadratic fashion, the series is truncated after
the second derivative. If zero corresponds to a minimum of the energy landscape,
the first derivative vanishes there, $V'(0) = 0$, and it is
Figure 6.12: The motion of a molecule around an energy minimum can be ap-
proximately described by a parabolic energy profile. This is the reason why
one has to generate the energy-minimized structure (green ball), which is lo-
cated around a minimum of the energy surface, before starting a normal mode
calculation
$$V(x) = V(0) + \frac{1}{2} V''(0) \cdot x^2, \tag{6.8}$$
and with
$$m \frac{d^2 x}{dt^2} = F = -V'(x) \tag{6.9}$$
we obtain
$$F = -V''(0) \cdot x. \tag{6.10}$$
The solution of the second-order differential equation
$$\frac{d^2 x}{dt^2} = -\frac{V''(0)}{m} x \tag{6.11}$$
is $x = A \cdot \exp(i \omega t)$ with the angular frequency $\omega = \sqrt{k/m}$, where $k = V''(0)$, and the amplitude A.
Fig.6.11 actually illustrates the vibrational motion of a particle in one dimension
(red path). Fig.6.13 shows this kind of motion in two dimensions. To picture it
in three dimensions is more difficult.
In three dimensions the frequencies of the normal modes together with the
displacements of the individual atoms may be calculated from a molecular mechanics
force field using the Hessian matrix of second derivatives ($\mathbf{V}''$). The
Hessian must first be converted to the equivalent force-constant matrix in mass-weighted
coordinates ($\mathbf{F}$), as follows:
$$\mathbf{F} = \mathbf{M}^{-1/2} \, \mathbf{V}'' \, \mathbf{M}^{-1/2}. \tag{6.12}$$
Figure 6.13: Vibrations in 2-dimensional space. In reality one more dimension
comes into play
Figure 6.14: A linear triatomic molecule like CO2 . The vectors (here: scalars)
x1 , x2 , x3 define displacements of the corresponding atoms.
Figure 6.15: Results of normal mode calculation for a linear triatomic molecule.
λi , ηi and xi describe eigenvalues, eigenvectors and amplitudes.
Figure 6.16: Protein molecules are the most examined type of molecule with
respect to vibrational motion. Obtaining the normal modes of motion, one can
notice a difference between the motional frequency of bigger (global) parts and
smaller (local) parts of the molecule, e.g. a whole domain or even just an atomic
link between two distinct atoms. Global motions of a protein are often specific
to it and can be related to its function.
$$\lambda_1 = 0, \quad \lambda_2 = \frac{k}{m} \quad \text{and} \quad \lambda_3 = 3 \frac{k}{m} \tag{6.23}$$
Now each of the three frequencies can be obtained as shown above. They
correspond to modes of a translation, a symmetric stretch and an asymmetric
stretch respectively (Fig. 6.15). In this triatomic example the three-dimensional
eigenvector of each mode determines the displacement of each atom.
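The eigenvalues of Eq. (6.23) can be checked numerically for a chain of three equal masses m joined by two springs of force constant k, which is the situation these eigenvalues correspond to. The sketch below builds the Hessian of that model, mass-weights it as in Eq. (6.12) and tests candidate eigenpairs directly; the model Hessian, the function names and the hand-picked eigenvectors are assumptions of this illustration.

```python
import math


def triatomic_hessian(k):
    """Second derivatives of V = k/2 * [(x2 - x1)^2 + (x3 - x2)^2]."""
    return [[k, -k, 0.0],
            [-k, 2.0 * k, -k],
            [0.0, -k, k]]


def mass_weighted_hessian(hessian, masses):
    """Eq. (6.12) for a diagonal mass matrix: F_ij = V''_ij / sqrt(m_i * m_j)."""
    n = len(masses)
    return [[hessian[i][j] / math.sqrt(masses[i] * masses[j]) for j in range(n)]
            for i in range(n)]


def is_eigenpair(matrix, eigenvalue, vector, tol=1e-12):
    """Check that matrix * vector equals eigenvalue * vector."""
    product = [sum(m * c for m, c in zip(row, vector)) for row in matrix]
    return all(abs(p - eigenvalue * c) <= tol for p, c in zip(product, vector))


def angular_frequency(eigenvalue):
    """Angular frequency of a mode from its eigenvalue."""
    return math.sqrt(eigenvalue)
```

With k = 2 and m = 4, the vectors (1, 1, 1), (1, 0, -1) and (1, -2, 1) are eigenvectors with eigenvalues 0, k/m and 3k/m: a uniform translation, a symmetric stretch with the central atom at rest, and an asymmetric stretch.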
The motions of larger segments of an analyzed protein, for example, are adequate
for studying the molecule's function. These motions are represented by
low-frequency modes. High-frequency modes correspond to motions of smaller
parts of the molecule (Fig. 6.16).
The harmonic approximation to the energy surface is found to be appropriate
for well-defined energy minima such as the intramolecular degrees of freedom of
small molecules. For larger systems the harmonic approximation breaks down.
Such systems also have an extraordinarily large number of minima on the energy
surface. In these cases it is not possible to calculate thermodynamic
properties accurately using normal mode analysis. Rather, molecular dynamics
simulations or other methods must be used to sample the energy surface from which
properties can be derived.
Chapter 7
Other analyses exploring the energy surface focus on determining reaction path-
ways or transition structures of molecules. Since the minimum points of the
energy landscape may correspond to the reactants or products of a chemical re-
action or two important conformations of a molecule (Fig.7.1), the path between
those two minima (the ‘reaction path’ or ‘pathway’) might be of interest. The
transition structure is the point of highest potential energy along the reaction
pathway.
As one can imagine, many methods have been worked out for elucidating reaction
pathways and finding transition structures. These structures correspond
to saddle points with one negative eigenvalue of the Hessian matrix, where the
energy passes through a maximum for movement along the reaction path.
Computational methods for locating transition structures do so by searching along
the reaction pathway, e.g. by using minimization algorithms when provided with
an initial structure close to the desired one. Conversely, methods for finding
the reaction pathway start from the transition structure and move downwards
using minimization. Yet other methods determine both simultaneously from
the two minima bordering the reaction path. In particular, the conjugate peak
refinement [26] is a good method for locating transition structures for systems
with many atoms, where a number of such states may exist between two
conformations.
Besides the molecular dynamics simulation, there is another widely used computer
simulation technique we would like to mention here: the Monte Carlo
(MC) method, which differs in some ways from the MD method. The Monte
Carlo simulation method also provides a picture of the system in different
conformations. However, it does not show how the system switches between these,
since it provides no information about how the atomic and molecular system
evolves in time. A Monte Carlo simulation generates configurations
of a system by randomly changing the positions of the atoms present and so the
outcome of each calculation depends only on its preceding one. Furthermore, the
total energy is determined directly from the potential energy function. These
two points are in contrast to a molecular dynamics simulation, where Newton's
equations of motion are the basis. Nonetheless, thermodynamic quantities can
be derived using appropriate statistical mechanics formulae.
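A minimal sketch of such a simulation for one particle in a harmonic potential is given below. It uses the Metropolis acceptance rule, which is one common way to decide whether a random move is kept; the text above does not name a specific rule, so this choice, the test potential and all parameter names are assumptions of the example.

```python
import math
import random


def metropolis_harmonic(n_steps, k_spring, k_b_t, max_move, rng):
    """Sample positions of a particle in V(x) = 1/2 * k_spring * x^2.

    Each new configuration is generated by a random displacement of the
    previous one and accepted with the Metropolis criterion."""
    x = 0.0
    energy = 0.5 * k_spring * x * x
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)
        trial_energy = 0.5 * k_spring * trial * trial  # only the potential energy is needed
        delta = trial_energy - energy
        if delta <= 0.0 or rng.random() < math.exp(-delta / k_b_t):
            x, energy = trial, trial_energy            # accept the move
        samples.append(x)                              # a rejected move repeats the old state
    return samples
```

No velocities or time step appear anywhere in the loop, yet averages over the samples still give thermodynamic quantities; the mean of x² approaches k_b_t / k_spring for a long run.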
Appendix A
Acknowledgements
Bibliography
[13] Nevins N., Chen K., Allinger N.L., Molecular Mechanics (MM4) Calculations on Alkenes, Journal of Computational Chemistry, 1996, 17: 669
[14] Nevins N., Chen K., Allinger N.L., Molecular Mechanics (MM4) Calculations on Conjugated Hydrocarbons, Journal of Computational Chemistry, 1996, 17: 695
[15] Nevins N., Chen K., Allinger N.L., Molecular Mechanics (MM4) Vibrational Frequency Calculations for Alkenes and Conjugated Hydrocarbons, Journal of Computational Chemistry, 1996, 17: 730
[16] Vollhardt K.P.C., Schore N.E., Organic Chemistry, Freeman, 1999, Chap: 2.5-2.7
[17] Cox S.R., Williams D.E., Representation of Molecular Electrostatic Potential by a New Atomic Charge Model, Journal of Computational Chemistry, 1981, 2: 304
[18] Fowler P.W., Buckingham A.D., Central or Distributed Multipole Moments? Electrostatic Models of Aromatic Dimers, Chemical Physics Letters, 1991, 176: 11
[19] Halgren T.A., Representation of van der Waals (vdW) Interactions in Molecular Mechanics Force Fields: Potential Form, Combination Rules, and vdW Parameters, Journal of the American Chemical Society, 1992, 114: 7827