This repository contains the scripts to perform Molecular Dynamics (MD) simulations of the polymer models discussed in the paper:
To improve reproducibility, the present repository is provided with the following DOI: 10.5281/zenodo.6726064.
- Structure of the folder and input files
- Python scripts:
- Sub-folders:
- Installing software
- Run simulations
- Demo simulation
- Expected output of simulations
- Research projects
The current folder contains six Python scripts (.py) and three sub-folders.
The main script of MD simulations is sbs-le.py. It takes as input three dependencies, i.e., the scripts saw.py (that prepares the initial state of simulation), sbs_class.py (that prepares the context for SBS simulations), le4hoomd.py (that prepares the context for LE simulations, is taken from its own repository on https://codeberg.org/ehsan/LE4hoomd). The script cont-dist-map-sbs-le.py and cont-dist-map-sbs-le2HDF.py are model python scripts to produce, respectively in .txt and HDF formats, contact and distance maps from the output simulations (see below).
HCT116_chr21_30kb: input files for LE and SBS simulations for the studied locus in HCT116 and corresponding python scripts to set model parameters. For model parameters see also the Methods sections of the paper.IMR90_chr21_30kb: input files for LE and SBS simulations for the studied locus in IMR90 and corresponding python scripts to set model parameters. For model parameters see also the Methods sections of the paper.demo_sim: folder containing, as an example, a very short demo simulation of the eLE model in IMR90.
The HOOMD-blue Python package can be installed on Linux-64 or OSx-64 systems by the following command line:
conda install -c conda-forge hoomd
Please note that the scripts are written for HOOM version 2.9.x. The installation time requires few minutes. All details on prerequisites (e.g., software packages and libraries) can be found at the official website http://glotzerlab.engin.umich.edu/hoomd-blue/.
To run one MD polymer simulation, use in the current folder the following command line:
python3 sbs-le.py --settings $LOCUS_chr21_30kb.settings_$MODEL_$LOCUS_30kb.py --sim-id test$MODEL --hoomd cpu
-
$LOCUSidentifies the cell line to model: set it toIMR90orHCT116. -
$MODELidentifies the model to simulate: set it toLEoreLEorSBSorLESBS.
Thus, for example, to run one simulation of LE+SBS model in IMR90:
python3 sbs-le.py --settings IMR90_chr21_30kb.settings_LESBS_IMR90_30kb.py --sim-id testLESBS --hoomd cpu
Roughly 10 minutes occur to generate a demo simulation up to 2e6 timesteps on normal PC. To have fully equilibrated conformations, our simulations run up to 2e8 timesteps, so the running time of one simulation is tens of hours. To produce an ensemble of polymer conformations, simply run multiple copies (e.g., 1000) of the above scripts.
We provide as an example a demo run simulation of eLE model in IMR90 with very few timesteps (1.3e6 MD timesteps). To run the simulation, use in the current folder the following command line:
python3 sbs-le.py --settings demo_sim.settings_eLE_IMR90_30kb.py --sim-id testeLE --hoomd cpu
Note: the simulations we performed in the paper ran for thousands of MD timesteps to have fully equilibrated conformations (up to 2e8 MD timesteps); this demo is an example.
The main output of a simulation is a gsd file named test$MODEL-dump.gsd, which contains all details of system dynamics at each simulated timestep (such as x-y-z coordinates, particle types and bond types, as standard in MD simulations). $MODEL is, again, one of the models mentioned before. Hence, in the case of an eLE simulation, the output will be named: testeLE-dump.gsd.
The output gsd file can be easily analyzed by using standard python packages. As an example, we report the python script cont-dist-map-sbs-le.py, which computes distance and contact maps (in txt format) from the gsd output of the simulation. To run this python script, use the line:
python3 cont-dist-map-sbs-le.py -i test$MODEL-dump.gsd -o heatmap --particles $P --t1 0.7 --t2 1.0 --step 0.003 --contact-thr 5.0 --dist TRUE~
As before, set $MODEl to LE or eLE or SBS or LESBS. Set $P to "750" in IMR90 and "930" in HCT116.
Similar command line holds for the python script cont-dist-map-sbs-le2HDF.py, which simply provides distance and contact map in HDF format.
Other outputs printed during the simulation:
test$MODEL-dump-ctcf.txt(in LE-based simulations): txt file with simulated CTCF tracks.test$MODEL-restart.gsd: gsd file to resume MD run in case the simulation breaks.
Simulations in the following works are performed by the code presented here: