A code for Mechanistic Dataset generation

A python code for mechanistic dataset generation.

Step 0 - Installation

git clone https://github.com/jfjoung/Mechanistic_dataset.git
cd Mechanistic_dataset
conda env create --file environment.yml
conda activate mechdata

Step 1 - Data preparation

In order to generate mechanistic dataset from the overall reactions, reaction SMILES and its reaction class name as in NameRXN are required. The reaction SMILES and its reaction class name should be separated by a space as shown below. It doesn't require atom mapping, a code will automatically assign all the atom mapping number.

BrCCCOC1=CC(OCO2)=C2C=C1.OC3C(CCN(C3)C(OC(C)(C)C)=O)C4=CC=C(O)C=C4.[OH-]>>OC1C(CCN(C1)C(OC(C)(C)C)=O)C2=CC=C(OCCCOC3=CC(OCO4)=C4C=C3)C=C2 Williamson ether synthesis

Step 3 - Set configuation parameters

This section outlines the configuration parameters used in the script, detailing their purpose and functionality.

num_combination: Limits the number of reactant combinations per template to prevent combinatorial explosion.
uni_rxn: Adds a unimolecular version of a bimolecular reaction template while retaining the original.
proton: Ensures proton balance by adjusting templates on-the-fly based on the presence of acids or bases.
max_num_temp: Limits the number of possible templates when modifications like uni_rxn or proton are applied.
stoichiometry: Allows duplicated reactants to be added if stoichiometry errors are detected.
do_not_pruning: Controls whether to prune the reaction network to retain only those that lead to the final product.
num_cycles: Sets a limit on the number of catalytic cycles to avoid long computation times during conversion to SMARTS.
num_reaction_node: Limits the number of reactions in the reaction network to prevent excessive processing time.
byproduct: Adds byproducts from previous reactions when converting to elementary reaction SMARTS.
spectator: Includes all chemical species in the flask at the current step when converting to SMARTS.
full: Determines whether to return overall reaction SMARTS instead of elementary reaction SMARTS.
end: Adds reactions that generate a product from an already existing product.
plain: Generates reaction SMARTS without atom mapping numbers.
explicit_H: Explicitly represents hydrogens in the reaction SMARTS.
reagent: Positions non-reacting species between '>>' in Reaction SMARTS (Reactants > Reagents > Products).
remapping: Remaps atom mapping in SMARTS to start from 1.
data: Path to the file containing overall reactions.
save: Path to the file where results will be saved.
debug: Logs reactions with errors for debugging purposes.
all_info: Saves all reactions, including the reaction network.
stat: Generates statistics for the reactions.
rxn_class: Specifies a particular reaction class to extract elementary reactions from.
process: Number of processors to use in multiprocessing.
verbosity: Sets the logging level.

Step 3 - Run a code to generate mechanistic data

Mechanistic data can be generated with either of the two methods:

with bash script
generate_mechanism_data.sh can be modified with the arguments.

bash run/generate_mechanism_data.sh

with python script

python ver2.py

Step 4 - Visualize reaction templates

If you want to interactively browse and visualize the reaction templates (reaction class, mechanism, SMARTS for each step),
you can open the Colab notebook below:

This notebook allows you to:

Select a reaction class and mechanism from dropdown menus
View allowed and excluded reaction SMARTS for each mechanism
Visualize each elementary reaction template
Choose how atom features (like H count or charge) are shown in the molecule view

Name		Name	Last commit message	Last commit date
Latest commit History 251 Commits
data		data
run		run
scripts		scripts
templates		templates
utils		utils
.gitignore		.gitignore
MechData_visualization.ipynb		MechData_visualization.ipynb
Mechanistic_dataset_visualization.ipynb		Mechanistic_dataset_visualization.ipynb
README.md		README.md
analysis.py		analysis.py
environment.yml		environment.yml
post_processing.py		post_processing.py
run.py		run.py
test.py		test.py
ver2.py		ver2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

A code for Mechanistic Dataset generation

Step 0 - Installation

Step 1 - Data preparation

Step 3 - Set configuation parameters

Step 3 - Run a code to generate mechanistic data

Step 4 - Visualize reaction templates

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

jfjoung/Mechanistic_dataset

Folders and files

Latest commit

History

Repository files navigation

A code for Mechanistic Dataset generation

Step 0 - Installation

Step 1 - Data preparation

Step 3 - Set configuation parameters

Step 3 - Run a code to generate mechanistic data

Step 4 - Visualize reaction templates

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages