Molecular Dynamics simulations in Python

Graduate course lecture, University of Toronto Missisauga, Department of Chemical and Physical Sciences, 2019

The course on which the project focused is PHY426H5 Computational Modeling in Physics (SCI) in the Spring semester of 2019 with the instructor Dr. Sarah Rauscher.

This lecture is created for CPS Teaching Fellowship where we introduce a novel approach to study advanced scientific programming. The goal of today’s lecture is to present Molecular Dynamics (MD) simulations of macromolecules and how to run them using Python programmming language. We will use a lot of numpy functions and a few of new modules, such as openmm for MD simulations. Lecture on github.


Lecture 7: Molecular Dynamics Simulations in Python


1. Introduction

This lecture was created as part of a CPS Teaching Fellowship. We are introducing a novel approach to study advanced scientific programming. The goal of today’s lecture is to present Molecular Dynamics (MD) simulations of macromolecules. We will learn how to run these simulations using the Python programmming language. We will use many numpy functions and a few new modules, such as openmm for MD simulations. These are the important concepts that we will cover:

2. Newton’s Laws of Motion

Newton’s 2nd law connects the kinematics (movements) of a body with its mechanics (total force acting on it) and defines the dynamic evolution of its position:

\[m\frac{d^2r(t)}{dt^2} = F = - \nabla{U(r)},\]

where $m$ is the mass, $r$ is the position, $F$ is the force and $U(r)$ is the potential energy, which depends only on the position of the body. If one knows the forces acting upon the body, one can find the position of the body at any moment $r(t)$, i.e. predict its dynamics. This can be done by solving Newton’s equation of motion. It is a second order ODE that can be solved analytically for a few simple cases: constant force, harmonic oscillator, periodic force, drag force, etc. However, a more general approach is to use computers in order to solve the ODE numerically.

3. Simulation of Dynamics of Particles

There are many methods for solving ODEs. The second order ODE is transformed to the system of two first order ODEs as follows:

\[\frac{dr(t)}{dt} = v(t)\] \[m\frac{dv(t)}{dt} = F(t)\]

We use a finite difference approximation that comes to a simple forward Euler Algorithm:

\[v_{n+1} = v_n + \frac{F_n}{m} dt\] \[r_{n+1} = r_n + v_{n+1} dt\]

Here we discretize time t with time step $dt$, so $t_{n+1} = t_n + dt$, and $r_{n} = r(t_n)$, $v_{n} = v(t_n)$, where $n$ is the timestep number. Using this method, computing dynamics is straightforward.

Example 3.1. Simulation of a projectile on Earth.


We want to know the dynamics of a green apple ($m = 0.3$ kg) tossed horizontally at 10 cm/s speed from the top of the Toronto CN Tower (553 m) for the first 10 seconds.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

# Setup the figure and axes...
fig, ax = plt.subplots(figsize=(8,8))

## Adjust axes limits according to your problem. Here we don't need more than a couple of meters left or right, and 600 meters up
ax.set(xlim=(-2, 2), ylim=(0, 600), xlabel='Position, meters', ylabel='Height, meters', title='Apple falling from CN tower')

# parameters of the problem
T = 10. #s
m = 0.3 #kg
g = 9.8 #m/s^2
v0x = -0.1 #m/s
H = 553. #m

# setting a timestep to be 50 ms
dt = 0.05 #s
N = int(T / dt)

# Allocating arrays for 2D problem
v = np.zeros((N+1, 2))
r = np.zeros((N+1, 2))
f = np.zeros((N+1, 2))

# initial conditions:
r[0] = np.array([0., H])
v[0] = np.array([-v0x, 0.])

# the only force is gravity
f[:] = np.array([0., -m * g])

## Run dynamics:
for n in range(N):
    v[n+1] = v[n] + f[n]/m * dt
    r[n+1] = r[n] + v[n+1] * dt

## drawing the first data point  
scat = ax.scatter(r[0,0], r[0,1], marker='o', c='g', s=200)

## animating 
def animate(i):
    scat.set_offsets(r[i])

ani = animation.FuncAnimation(fig, func=animate, frames=N)
## this function will create a lot of *.png files in a folder 'CNtower_frames'
## and create an HTML page with a simulation
ani.save('CNtower.html', writer=animation.HTMLWriter(fps= 1//dt))
plt.close()
#ani.save('CNtower.mp4', fps= 1//dt)

Let’s visualize the dynamics using embedded HTML. It’s interactive and you can play a movie step by step:

from IPython.display import HTML
HTML('CNtower.html')


</input>
Once </input> Loop </input> Reflect </input>

When a closed system of particles are interacting through pairwise potentials, the force on each particle $i$ depends on its position with respect to every other particle $j$:

\[m_i\frac{d^2r_i(t)}{dt^2} = \sum_jF_{ij}(t) = -\sum_j\nabla_i{U(|r_{ij}(t)|)}\]

where $r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$ is the distance between particle $i$ and $j$, and $i,j \in (1,N)$.

Example 3.2. Simulation of 3-body problem with Hooke’s law:

We want to know the dynamics of 3 particles $m = 1$ kg connected to each other with invisible springs with $K_s = 5$ N/m, and $r_0 = 1$ m initially located at (0, 2), (2, 0) and (-1, 0) on the 2D plane for the first 10 seconds of their motion.

Hint: The pairwise potential is (Hooke’s Law): \(U(r_{ij}) = \frac{K_s}{2}(r_{ij} - r_0)^2\)

The negative gradient of the potential is a force from $j$-th upon $i$-th:

\[\mathbf{F_{ij}} = - \nabla_i{U(r_{ij})} = - K_s (r_{ij} - r_0) \nabla_i r_{ij} = - K_s (r_{ij} - r_0) \mathbf{r_{ij}} / r_{ij}\]
# Setup the figure and axes...
fig, ax = plt.subplots(figsize=(6,6))
ax.set(xlim=(-3.5, 3.5), ylim=(-3.5, 3.5), ylabel='meters', xlabel='meters', title='3-Body problem')

# parameters of the problem
T = 10. #s
m = 1.0 #kg
ks = 5 #N/m
r0 = 1. #m

# setting a timestep to be 50 ms
dt = 0.05 #s
N = int(T / dt)

# Allocating arrays for 2D problem: first axis - time. second axis - particle's number. third - coordinate
v = np.zeros((N+1, 3, 2))
r = np.zeros((N+1, 3, 2))
f = np.zeros((N+1, 3, 2))

# initial conditions for 3 particles:
r[0,0] = np.array([0., 2.])
r[0,1] = np.array([2., 0.])
r[0,2] = np.array([-1., 0.])

def compute_forces(n):
    '''The function computes forces on each pearticle at time step n'''
    for i in range(3):
        for j in range(3):
            if i != j:
                rij = r[n,i] - r[n,j]
                rij_abs = np.linalg.norm(rij)
                f[n, i] -= ks * (rij_abs - r0) * rij / rij_abs 
## Run dynamics:
for n in range(N):
    compute_forces(n)
    v[n+1] = v[n] + f[n]/m * dt
    r[n+1] = r[n] + v[n+1] * dt

## drawing and animating 
scat = ax.scatter(r[0,:,0], r[0,:,1], marker='o', c=['b', 'k', 'r'], s=1000)

def animate(i):
    scat.set_offsets(r[i])

ani = animation.FuncAnimation(fig, animate, frames=N)
plt.close()
## this function will create a lot of *.png files in a folder '3Body_frames'
ani.save('3body.html', writer=animation.HTMLWriter(fps= 1//dt))

Again, looking at the trajectory representation in real time:

HTML('3body.html')


</input>
Once </input> Loop </input> Reflect </input>

While we now have a basic knowledge of the purpose and methodology of simulations, we still need to understand what proteins are and why they are important.


4. Proteins, structure and functions


Protein structure is the three-dimensional arrangement of atoms in a protein, which is a chain of amino acids. Proteins are polymers – specifically polypeptides – formed from sequences of 20 types of amino acids, the monomers of the polymer. A single amino acid monomer may also be called a residue, indicating a repeating unit of a polymer. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions such as:

  • hydrogen bonding
  • ionic interactions
  • Van der Waals forces
  • hydrophobic packing

To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure using techniques such as X-ray crystallography, NMR spectroscopy, and others.

4.1 Levels of structure:

Primary structure of a protein refers to the sequence of amino acids in the polypeptide chain.

Secondary structure refers to highly regular local sub-structures of the actual polypeptide backbone chain. There are two main types of secondary structure: the α-helix and the β-strand or β-sheets.

Tertiary structure refers to the three-dimensional structure of monomeric and multimeric protein molecules. The α-helixes and β-sheets are folded into a compact globular structure.

Quaternary structure is the three-dimensional structure consisting of two or more individual polypeptide chains (subunits) that operate as a single functional unit (multimer).

4.2 Functions:

  • Antibodies - bind to specific foreign particles, ex: IgG
  • Enzymes - speed up chemical reactions, ex: Lysozyme
  • Messengers - transmit signals, ex: Growth hormone
  • Structural components - support for cells, ex: Tubulin
  • Transport/storage - bind and carry small molecules, ex: Hemoglobin

Lysozyme is a protein-enzyme (found in tears, saliva, mucus and egg white) that is a part of the innate immune system with antimicrobial activity characterized by the ability to damage the cell wall of bacteria. Bacteria have polysaccharides (sugars) in their cell wall, that bind to the groove, and lysozyme cuts the bond and destroys bacteria.

Protein Sequence in DNAProtein StructureProtein Strucure with Sugar
SequenceStructureFunction

Figure credit: C.Ing and wikipedia


5. Molecular Mechanics


Since we now know what proteins are and why these molecular machines are important, we consider the method to model them. The basic idea is to create the same kind of approach as we used in the 3-body simulation. Now, our system consists of thousands particles (atoms of the protein plus atoms of surrounding water) and they all are connected via a complex potential energy function.

An all-atom potential energy function $V$ is usually given by the sum of the bonded terms ($V_b$) and non-bonded terms ($V_{nb}$), i.e.

\[V = V_{b} + V_{nb},\]

where the bonded potential includes the harmonic (covalent) bond part, the harmonic angle and the two types of torsion (dihedral) angles: proper and improper. As it can be seen, these functions are mostly harmonic potentials

\[V_{b} = \sum_{bonds}\frac{1}{2}K_b(b-b_0)^2 + \sum_{angles}K_{\theta}(\theta-\theta_0)^2 + \sum_{dihedrals}K_{\phi}(1-cos(n\phi - \phi_0)) + \sum_{impropers}K_{\psi}(\psi-\psi_0)^2\]

For example, $b$ and $\theta$ represent the distance between two atoms and the angle between two adjacent bonds; $\phi$ and $\psi$ are dihedral (torsion) angles. These can be evaluated for all the atoms from their current positions. Also, $K_b$, $K_\theta$, $K_\phi$, and $K_\psi$ are the spring constants, associated with bond vibrations, bending of bond angles, and conformational fluctuations in dihedral and improper angles around some equilibrium values $b_0$, $\theta_0$, $\phi_0$, and $\psi_0$, respectively.

The non-bonded part of the potential energy function is represented by the electrostatic and van der Waals potentials, i.e.

\[V_{nb} = \sum_{i,j}\left(\frac{q_{i}q_{j}}{4\pi\varepsilon_{0}\varepsilon r_{ij}} + \varepsilon_{ij}\left[\left(\frac{\sigma^{min}_{ij}}{r_{ij}}\right)^{12}-2\left(\frac{\sigma^{min}_{ij}}{r_{ij}}\right)^{6}\right]\right)\]

where $r_{ij}$ is the distance between two interacting atoms, $q_i$ and $q_j$ are their electric charges; $\varepsilon$ and $\varepsilon_0$ are electric and dielectric constant; $\varepsilon_{ij} = \sqrt{\varepsilon_i\varepsilon_j}$ and $\sigma_{ij} = \frac{\sigma_i + \sigma_j}{2}$ are van der Waals parameters for atoms $i$ and $j$.

Importantly, each force field has its own set of parameters, which are different for different types of atoms.

6. Molecular dynamics of proteins


Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules, i.e. their dynamical evolution.

In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton’s equations of motion for a system of interacting particles, where forces between the particles and their potential energies are often calculated using molecular mechanics force fields.

Now with all that intellectual equipment, we can start running legit Molecular Dynamics simulations. All we need is an initial structure of the protein and software that computes its dynamics efficiently.

Procedure:

  1. Load initial coordinates of protein atoms (from *.pdb file)
  2. Choose force field parameters (in potential function V from section 5).
  3. Choose parameters of the experiment: temperature, pressure, box size, solvation, boundary conditions
  4. Choose integrator, i.e. algorithm for solving equation of motion
  5. Run simulation, save coordinates time to time (to *.dcd file).
  6. Visualize the trajectory
  7. Perform the analysis

These are the Python libraries we are going to need today:

  1. nglview - module to visualize molecules
  2. mdanalysis - module to analyze MD trajectory
  3. openmm - module to run MD simulation
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
import MDAnalysis as md
import nglview as ng
from sys import stdout

These files are already preloaded to your data folder.

pdb0_file = 'data/villin_water.pdb'
pdb1_file = 'data/polyALA.pdb'
pdb2_file = 'data/polyGLY.pdb'
pdb3_file = 'data/polyGV.pdb'

PDB files contain 3D structures of proteins that were resolved by experimental techniques. They can be downloaded from ProteinDataBank. Here we can see what a .pdb file looks like:

file0 = open(pdb0_file, 'r')
counter = 0
for line in file0:
    if counter < 10:
        print(line)
    counter += 1
REMARK    GENERATED BY TRJCONV

HEADER    Villin N68H in explicit water

REMARK    THIS IS A SIMULATION BOX

CRYST1   49.163   45.981   38.869  90.00  90.00  90.00 P 1           1

MODEL        0

ATOM      1  N   LEU     1      25.160  14.160  19.440  1.00  0.00

ATOM      2  H1  LEU     1      24.350  13.730  19.870  1.00  0.00

ATOM      3  H2  LEU     1      25.980  13.680  19.760  1.00  0.00

ATOM      4  H3  LEU     1      25.180  15.100  19.810  1.00  0.00

ATOM      5  CA  LEU     1      25.090  13.920  17.980  1.00  0.00

We can look at the protein via nglview:

u = md.Universe(pdb0_file)
ng.show_mdanalysis(u, gui=True)
NGLWidget()



Tab(children=(Box(children=(Box(children=(Box(children=(Label(value='step'), IntSlider(value=1, min=-100)), la…

Example 6.1. MD simulation of protein folding into alpha-helix


Run a simulation of fully extended polyalanine polyALA.pdb for 400 picoseconds in a vacuo environment with T=300 K and see if it can fold to any secondary structure:

### 1.loading initial coordinates
pdb = PDBFile(pdb3_file) 

### 2.choosing a forcefield parameters
ff = ForceField('amber10.xml')  
system = ff.createSystem(pdb.topology, nonbondedMethod=CutoffNonPeriodic)

### 3. Choose parameters of the experiment: temperature, pressure, box size, solvation, boundary conditions, etc
temperature = 300*kelvin
frictionCoeff = 1/picosecond
time_step = 0.002*picoseconds
total_steps = 400*picoseconds / time_step

### 4. Choose an algorithm (integrator)
integrator = LangevinIntegrator(temperature, frictionCoeff, time_step)

### 5. Run simulation, saving coordinates time to time:

### 5a. Create a simulation object
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

### 5b. Minimize energy
simulation.minimizeEnergy()

### 5c. Save coordinates to dcd file and energies to a standard output console:
simulation.reporters.append(DCDReporter('data/polyALA_traj.dcd', 1000))
simulation.reporters.append(StateDataReporter(stdout, 5000, step=True, potentialEnergy=True,\
                                              temperature=True, progress=True, totalSteps = total_steps))

### 5d. Run!
simulation.step(total_steps)
#"Progress (%)","Step","Potential Energy (kJ/mole)","Temperature (K)"
2.5%,5000,5262.389194066251,309.1792062025364
5.0%,10000,5286.58883453618,307.68666956890166
7.5%,15000,5233.48495931716,291.1269163957427
10.0%,20000,5337.896016009641,290.01442478439674
12.5%,25000,5171.862252076715,318.66193295472954
15.0%,30000,5201.603393606135,262.32020987556973
17.5%,35000,5231.681957878747,314.39652514053125
20.0%,40000,5292.29828864241,285.4574236183474
22.5%,45000,5148.230342539477,299.3980459342863
25.0%,50000,5148.786159232427,295.14914161812186
27.5%,55000,4995.014402727151,296.76517635534435
30.0%,60000,5028.376083186233,272.94226694677195
32.5%,65000,5021.9512945252745,284.8756601023911
35.0%,70000,5049.3857193702115,297.30906373931
37.5%,75000,5021.124801600323,311.25426481263173
40.0%,80000,4924.619271596075,331.74254400962667
42.5%,85000,5147.154491443674,295.44894692159687
45.0%,90000,4959.246793658329,291.969497315438
47.5%,95000,4959.680609775365,303.6875919151087
50.0%,100000,4932.005582835091,283.415591716552
52.5%,105000,5020.748161466297,296.4382832661531
55.0%,110000,4873.502870975827,307.4766967102292
57.5%,115000,4863.058383271943,297.75470992654186
60.0%,120000,4825.485808612784,303.50074320258307
62.5%,125000,4750.454764763368,292.39496276415167
65.0%,130000,4933.319523532833,310.3032255817655
67.5%,135000,4747.073838160885,321.80496153589326
70.0%,140000,4686.6943264175825,332.9691352262142
72.5%,145000,4806.42124800881,295.29413113461646
75.0%,150000,4721.839489529408,310.6623637286667
77.5%,155000,4579.65793023612,305.30641996535087
80.0%,160000,4762.214371734177,305.4936940229858
82.5%,165000,4827.56596936731,281.96096292249734
85.0%,170000,4780.854328289395,302.85494719336236
87.5%,175000,4708.553977746668,303.56771945505017
90.0%,180000,4688.801022310174,295.7320921217744
92.5%,185000,4729.911180116662,301.0325222063754
95.0%,190000,4640.376658420844,314.1819097565552
97.5%,195000,4641.535714300734,263.1082790924476
100.0%,200000,4611.624250980559,280.6128232555509

Visualization

Let’s look at the trajectory:

### 6. Visualization
sys = md.Universe(pdb3_file, 'data/polyALA_traj.dcd')
ng.show_mdanalysis(sys, gui=True)
NGLWidget(count=200)



Tab(children=(Box(children=(Box(children=(Box(children=(Label(value='step'), IntSlider(value=1, min=-100)), la…

Example 6.2. Analysis of MD trajectory.


End-to-end distance:

### analysis of end-to-end distance

## choose terminal atoms 
N_terminus = sys.select_atoms('resid 1 and name N')
C_terminus = sys.select_atoms('resid 25 and name C')

## go through the whole trajectory and compute distance between them for every frame
dist = []
for frame in sys.trajectory:
    dist.append(np.linalg.norm(N_terminus.positions - C_terminus.positions))

## the result is in the dist array    
dist = np.array(dist) 

Number of hydrogen bonds:

from MDAnalysis.analysis import hbonds ## module for analysis of hydrogen bonds

## compute information about hbonds and write it to the 'hb.timeseries'
hb = hbonds.hbond_analysis.HydrogenBondAnalysis(sys)
hb.run()

## print information for the first 10 frames
for frame in hb.timeseries[:10]:
    print(frame)
[]
[]
[]
[]
[]
[]
[[2, 17, 'ALA1:H2', 'ALA2:O', 1.9231754847132658, 132.07458988125043]]
[[3, 17, 'ALA1:H3', 'ALA2:O', 2.355419232373549, 134.467785651158]]
[[3, 17, 'ALA1:H3', 'ALA2:O', 1.7133716830514205, 159.6524110745892], [63, 47, 'ALA7:H', 'ALA5:O', 2.1682246615078897, 130.5838945361935]]
[[63, 47, 'ALA7:H', 'ALA5:O', 2.217855178663783, 127.49586853002494]]
## go through the 'hb.timeseries' file and calculate number of bonds for each time frame 
## (it's the length of array frame)
hb_number = []
for frame in hb.timeseries:
    hb_number.append(len(frame))
    
## the result is in the number array     
hb_number = np.array(hb_number)

We can plot end-to-end distance and number of hydrogen bonds vs time:

plt.figure(figsize=(15,5))

plt.subplot(121)
plt.plot( dist, '-k' )
plt.xlabel('timesteps')
plt.ylabel('end-to-end distance, A')

plt.subplot(122)
plt.plot(hb_number, 'g-')
plt.ylabel('# of hydrogen bonds')
plt.xlabel('timesteps')

plt.show()

png

Ramachandran plot:

from MDAnalysis.analysis import dihedrals  ## module for dihedrals analysis

ram1 = dihedrals.Ramachandran(sys).run(0,30) ## analyse for first 30 steps (black color)
ram2 = dihedrals.Ramachandran(sys).run(170,200) ## analyse for last 30 steps (blue color)

## ramachandran plot
fig, ax = plt.subplots(figsize=(8,8))
ram1.plot(ax=ax, color='k', marker='.')
ram2.plot(ax=ax, color='b', marker='.')
ax.arrow(20, 20, -40, -40, width=2, head_width=8, head_length=12, fc='b', ec='b')
ax.text(30, 20, 'alpha region', color='blue', fontsize=20)
ax.arrow(20, 150, -40, 0, width=2, head_width=8, head_length=12, fc='k', ec='k')
ax.text(30, 150, 'beta region', fontsize=20)
/home/klyshko/anaconda3/lib/python3.6/site-packages/MDAnalysis/analysis/dihedrals.py:286: UserWarning: Cannot determine phi and psi angles for the first or last residues
  warnings.warn("Cannot determine phi and psi angles for the first "





Text(30, 150, 'beta region')

png


Exercise 1. Simulation of a projectile.

Read through sections 2-3 and example 3.1 of the lecture. Write a program that simulates the 2-s motion of the NFL ball ($m = 15$ ounces) thrown by quaterback T.Brady from height $H=8$ ft at speed $v=60$ MPH which is 30 degrees to the horizontal. Save the simulation as brady.html.

Hint. Be careful with units. You need SI.

Use the cell below.

## GO PATS!

Exercise 2. Simulation of an N-body problem.

Read through sections 2-3 and example 3.2 of the lecture. Write a program that simulates the 10-s dynamics of 4 particles, each with $m = 0.5$ kg, connected to each other with invisible springs with $K_s = 8$ N/m, and $r_0 = 2$ m initially located at (0,0), (0, 2), (2, 0) and (2, 2) on the 2D plane. Save the simulation as 4body.html.

Use the cell below.

Exercise 3. Your first MD simulation.

Read through section 6 and example 6.1-6.2 of the lecture. Run 3 simulations of fully extended polyglycine data/polyGLY.pdb for 1 nanosecond in vacuum (no water) with $T_1=100 K$, $T_2=300 K$, and $T_3=500 K$ and visually compare how extended the final structure is at each temperature.

Exercise 4. MD simulation analysis.

Perform a quantitative analysis of the trajectories you obtained in Exercise 3. Use, for example, the end-to-end distance or the function radius_of_gyration() from the MDAnalysis module, which returns the radius of gyration of the protein.