SpaceGA is a tool for accelerated searching of chemical spaces using molecular docking. It combines the similarity search tool SpaceLight by BioSolveIT with a graph-based genetic algorithm (GB-GA) to efficiently navigate large combinatorial spaces.
SpaceGA explores chemical spaces by using the same crossover module as the GB-GA by Jensen but replaces the mutation module with a similarity search using SpaceLight. This allows SpaceGA to identify "available mutations" within a given chemical space, ensuring that all generated molecules exist within the specified library.
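The idea of an "available mutation" can be illustrated with a toy example. The sketch below is not SpaceGA code. It assumes that molecules are represented as sets of fingerprint bit indices and brute-forces the nearest neighbours in a small, explicitly enumerated library, whereas SpaceLight searches combinatorial spaces without enumerating them. The names `tanimoto`, `available_mutations` and the `mol_*` entries are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def available_mutations(query: set[int], space: dict[str, set[int]], k: int) -> list[str]:
    """Return the k library members most similar to the query.

    Every returned name is, by construction, a member of `space`: this is the
    property gained by replacing random mutation with a similarity search.
    """
    ranked = sorted(space, key=lambda name: tanimoto(query, space[name]), reverse=True)
    return ranked[:k]


# Toy "chemical space": molecule names mapped to hypothetical fingerprint bits.
space = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {1, 2, 9},
    "mol_C": {7, 8, 9},
}
child = {1, 2, 3}  # e.g. the fingerprint of a crossover child
print(available_mutations(child, space, k=2))  # ['mol_A', 'mol_B']
```

Because the candidates are drawn from the library itself, the offspring never leave the searchable space, unlike atom- or bond-level mutations that can produce molecules absent from it.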
For reproducing the results from the SpaceGA paper, refer to the ./SpaceGApaper directory. The most recent version of SpaceGA is available in the ./SpaceGA directory.
From the repository root, install SpaceGA using pip:

```bash
pip install -e .
```

Requirements:

- SpaceLight: A valid license is required to use SpaceGA.
- Lilly Medchem Rules (optional): Used for molecular filtering.
- Additional dependencies are listed in `requirements.txt`.
You can run SpaceGA directly from the command line with a .json input file:

```bash
python3 /path/to/SpaceGA/main.py /path/to/inputs.json
```

Alternatively, you can use the provided Jupyter notebooks:

- `SpaceGAmanual.ipynb`
- `SpaceGAautomated.ipynb`

These notebooks also serve as examples of how to use SpaceGA from a Python script.
SpaceGA accepts the following arguments, which can be specified either in the input .json file or when initializing SpaceGA.
| Argument | Type | Description | Default |
|---|---|---|---|
| `o` | str | Path to output directory | `'output'` |
| `i` | str | Path to .smi file with input molecules | `'input.smi'` |
| `space` | str | Path to the desired BioSolveIT space | `'space'` |
| `spacelight` | str | Path to the SpaceLight installation | `'spacelight'` |

| Argument | Type | Description | Default |
|---|---|---|---|
| `O` | bool | Allow overwriting of the output directory | `False` |
| `iterations` | int | Number of iterations | `10` |
| `p_size` | int | Population size | `100` |
| `children` | int | Number of molecules evaluated per iteration, expressed as a multiple of `p_size` | `100` |
| `crossover_rate` | float | Crossover rate | `0.2` |
| `cpu` | int | Number of CPUs available | `1` |
| `sl_cpu` | int | Number of CPUs used for SpaceLight processing | Same as `cpu` |
| `al` | bool | Use active-learning-based ML filtering of offspring | `False` |
| `sim_cutoff` | float | Similarity cutoff applied after each iteration (`1.00` means no filter) | `1.00` |
| `f_comp` | int | Retain the top `f_comp * children` similar molecules | `100` |
| `fp_type` | str | Fingerprint type for SpaceLight | `'ECFP4'` |
| `model_name` | str | Machine learning model name (defined in `SpaceGA/ml/models.py`) | `'NN1'` |
| `scoring_tool` | dict | Dictionary specifying the scoring function | `{...}` |
| `scoring_inputs` | dict | Inputs for the scoring function | `{}` |
| `filtering_inputs` | dict | Inputs for filtering molecules | `{}` |
SpaceGA includes two built-in scoring functions, which can be selected as follows:

```python
scoring_tool = {"module": "SpaceGA.scoring", "tool": scoringtool}
```

where `scoringtool` is one of:

- FPSearch: Maximizes Tanimoto similarity to a query molecule.
- LogPSearch: Maximizes logP.
FPSearch takes the following argument:

| Argument | Type | Description |
|---|---|---|
| `smiles` | str | Query SMILES string for similarity maximization |
No additional arguments are required for LogPSearch.
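The two built-in scorers might then be configured as below. This is a sketch under two assumptions that the text above does not spell out: that `"tool"` takes the scorer's class name as a string, and that the FPSearch query is passed through `scoring_inputs`. The query SMILES (aspirin) is only an example.

```python
import json

# FPSearch: maximize Tanimoto similarity to a query molecule (aspirin here).
fp_search = {
    "scoring_tool": {"module": "SpaceGA.scoring", "tool": "FPSearch"},
    "scoring_inputs": {"smiles": "CC(=O)Oc1ccccc1C(=O)O"},
}

# LogPSearch: maximize logP; no scoring inputs are needed.
logp_search = {
    "scoring_tool": {"module": "SpaceGA.scoring", "tool": "LogPSearch"},
    "scoring_inputs": {},
}

# These keys can be merged into the dictionary written to inputs.json.
print(json.dumps(fp_search, indent=2))
```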
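Users can define and import custom scoring functions following this format:

```python
from abc import ABC, abstractmethod


class Scorer(ABC):
    @abstractmethod
    def __init__(self, arguments):
        # arguments: dictionary of SpaceGA settings
        pass

    def score(self, smi_lst, name_lst):
        # smi_lst: SMILES strings to score; name_lst: the corresponding molecule names
        pass
```

Here, `arguments` is a dictionary containing the SpaceGA settings, so a custom scorer has access to the same configuration as the rest of the run.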
SpaceGA supports filtering based on molecular properties, configured through a dictionary (`filtering_inputs`). The available filtering options are:
| Argument | Type | Description | Default |
|---|---|---|---|
| `minlogP` | float | Minimum logP value | `None` |
| `maxlogP` | float | Maximum logP value | `None` |
| `minMw` | float | Minimum molecular weight | `None` |
| `maxMw` | float | Maximum molecular weight | `None` |
| `minHBA` | int | Minimum number of hydrogen bond acceptors | `None` |
| `maxHBA` | int | Maximum number of hydrogen bond acceptors | `None` |
| `minHBD` | int | Minimum number of hydrogen bond donors | `None` |
| `maxHBD` | int | Maximum number of hydrogen bond donors | `None` |
| `minRings` | int | Minimum number of rings | `None` |
| `maxRings` | int | Maximum number of rings | `None` |
| `minRotB` | int | Minimum number of rotatable bonds | `None` |
| `maxRotB` | int | Maximum number of rotatable bonds | `None` |
| `BRENK` | bool | Apply BRENK filter | `False` |
| `PAINS` | bool | Apply PAINS filter | `False` |
| `Lilly` | str | Path to `Lilly_Medchem_Rules.rb` | `None` |
| `Substructure` | str | SMARTS string of required substructure | `None` |
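To make the semantics of the min/max options concrete, the sketch below checks precomputed properties against a filtering dictionary, treating `None` (the default) as "no bound". It is not SpaceGA's implementation: the property keys (`logP`, `Mw`, ...) simply mirror the option names, the property values are hypothetical (in practice they would come from a cheminformatics toolkit), and the BRENK, PAINS, Lilly and Substructure options are not covered. `passes_filters` is a hypothetical helper.

```python
from typing import Optional

# Properties that have a min/max pair in the filtering table.
RANGE_PROPS = ["logP", "Mw", "HBA", "HBD", "Rings", "RotB"]


def passes_filters(props: dict[str, float],
                   filtering_inputs: dict[str, Optional[float]]) -> bool:
    """Return True if every bounded property lies within [min, max]."""
    for prop in RANGE_PROPS:
        lo = filtering_inputs.get(f"min{prop}")  # e.g. "minlogP"
        hi = filtering_inputs.get(f"max{prop}")  # e.g. "maxlogP"
        if lo is None and hi is None:
            continue  # no bound requested for this property
        value = props[prop]
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
    return True


# Lipinski-style upper bounds on logP, molecular weight, HBA and HBD.
filtering_inputs = {"maxlogP": 5.0, "maxMw": 500.0, "maxHBA": 10, "maxHBD": 5}
# Hypothetical property values for a single candidate molecule.
candidate = {"logP": 2.1, "Mw": 320.4, "HBA": 4, "HBD": 1, "Rings": 2, "RotB": 5}
print(passes_filters(candidate, filtering_inputs))  # True
```

Skipping unbounded properties means a molecule only needs the properties that are actually constrained, which matches the behaviour implied by the `None` defaults.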