Metis is a GUI for collecting accurate and detailed feedback on small molecules. At its core, it is built around Esben Bjerrum's rdEditor using PySide2.
You can find the preprint at ChemRxiv.
Download the repository and navigate to the download location. You can install Metis by running pip install . from the repository root. Make sure the environment you want to install into is activated and has Python >= 3.9, < 3.11 installed.
If you wish to use REINVENT 3 in the backend, also install REINVENT 3 on a remote machine.
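The Python version constraint above can be checked before installing. The following is a minimal sketch; the helper name is_supported_python is hypothetical and not part of Metis.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the version satisfies python >= 3.9, < 3.11."""
    # Hypothetical helper for illustration only; not part of Metis.
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return (3, 9) <= major_minor < (3, 11)


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "unsupported"
    print(f"Python {sys.version.split()[0]} is {status}")
```

Running the snippet inside the activated environment reports whether that interpreter falls into the supported range.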
Some notes on the dependencies.
PySide2
Getting the environment set up with PySide2 can be somewhat challenging. A move to PySide6 is planned, and a branch for it already exists, which you can try out. It works but has not yet been fully tested.
scikit-learn
The scikit-learn version constraints are only set to make sure that the examples given here work.
In theory, you could use any scikit-learn version. If you want to use Reinvent in the backend, make sure that the scikit-learn version Reinvent uses on the remote machine
matches the version in the local installation used by Metis.
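One way to compare the local and remote scikit-learn versions is to compare their version strings. The sketch below assumes you have already obtained both strings; the helper names release_tuple and versions_match are hypothetical and not part of Metis or scikit-learn.

```python
from itertools import takewhile


def release_tuple(version):
    """Turn a version string such as '1.2.2' into (1, 2, 2)."""
    numbers = []
    for piece in version.split("."):
        # Keep only the leading digits, so '0rc1' contributes 0.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def versions_match(local_version, remote_version):
    """Return True if both version strings name the same release."""
    # Hypothetical check; pre-release suffixes are ignored by design.
    return release_tuple(local_version) == release_tuple(remote_version)
```

Locally, the string can be read from sklearn.__version__. How you obtain the remote string (for example by printing the same attribute in the Reinvent environment over SSH) is left to your setup.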
cairosvg
Depending on your OS, installing cairosvg through pip can cause issues because cairo is not found. On macOS you can solve this by installing cairo using Homebrew, or you can install cairosvg using conda.
It is assumed you have a working version of Reinvent on a Server instance that is running Slurm and ssh.
- Change the SSH settings in the example_project/de_novo_files/ssh_settings.yml file.
  - ssh_login: your SSH login, e.g. username@remote_server. You should be able to access your remote server without a password, for example, using an RSA key.
  - path_remote_folder: path on the remote machine from where Reinvent files will be loaded and stored.
  - de_novo_json: specify which default reinvent.json file to use.
  - default_slurm: specify which default Slurm job to use.
- Copy metis_reinvent.zip to the remote machine and unzip it. Make sure that the path_remote_folder in the ssh_settings.yml file matches the folder location, and that the same path is set in initial_reinvent.json.
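Before starting Metis, it can help to confirm that the ssh_login value really works without a password. The sketch below is an assumption about how one might test this, not something Metis does itself; the function names are hypothetical. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```python
import subprocess


def build_ssh_check_command(ssh_login):
    """Build an ssh command that fails instead of asking for a password."""
    # BatchMode=yes disables interactive prompts, so only key-based login succeeds.
    # ConnectTimeout=5 keeps the check short if the server is unreachable.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", ssh_login, "true"]


def can_login_without_password(ssh_login):
    """Return True if key-based login to ssh_login succeeds."""
    try:
        result = subprocess.run(build_ssh_check_command(ssh_login), capture_output=True)
    except FileNotFoundError:
        # No ssh client is installed on this machine.
        return False
    return result.returncode == 0
```

Calling can_login_without_password("username@remote_server") with your own login should return True once the key is set up.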
After installation simply run:
metis -f path/to/settings.yml --output /path/where/to/save/
This will start the GUI. Examples can be found below.
In the simplest example, only the GUI is started to collect feedback. No models are trained and no de novo run is started.
- If you show the atom contributions to the predictions (the model explanation) by setting show_atom_contributions: render: True, you will experience heavy slowdowns when switching to a new molecule.
- The only solution at the moment is not to show them: set show_atom_contributions: render: False. This yields a much smoother experience.

cd example_project
metis -f settings_ui.yml --output results/
Here, next to collecting feedback, a reward model is also trained on the feedback. For this, we provided a QSAR model and Oracle model for JNK3 activity.
With the setting use_oracle_score: False, the human feedback is used as the target variable to be predicted. If the setting is True, the molecules liked by the chemist are scored by the oracle, and these scores are then used as the target variable for the reward model. This can be thought of as an active learning setting, where the chemist decides which molecules are "biologically validated".
cd example_project
metis -f settings_reward_model.yml --output results/
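The effect of use_oracle_score described above can be sketched as follows. This is an illustration under stated assumptions, not the Metis implementation: the function name, the dictionary layout of the feedback, and the encoding of liked/not liked as 1.0/0.0 are all hypothetical.

```python
def build_training_targets(feedback, use_oracle_score, oracle=None):
    """Return (smiles, target) pairs for training a reward model.

    feedback maps a SMILES string to True (liked) or False (not liked).
    oracle is any callable that maps a SMILES string to a score.
    """
    if not use_oracle_score:
        # The human feedback itself is the target variable.
        return [(smiles, float(liked)) for smiles, liked in feedback.items()]
    if oracle is None:
        raise ValueError("use_oracle_score=True requires an oracle")
    # Only molecules the chemist liked are scored by the oracle.
    return [(smiles, oracle(smiles)) for smiles, liked in feedback.items() if liked]
```

With use_oracle_score set to True, disliked molecules contribute no training targets in this sketch, which mirrors the idea that the chemist selects what gets "validated".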
With these settings, a REINVENT de novo run can be started directly using Metis on a remote machine.
The remote machine needs:
- a working installation of REINVENT 3
- scikit-learn in the REINVENT environment updated to >1.0.0
- Slurm
- access through SSH with a key
- the unzipped example_project/metis_reinvent.zip folder
Once copied and unzipped, the paths and settings in the de_novo_files folder need to be adapted to match your paths on the remote machine.
cd example_project
metis -f settings_denovo.yml --output results/
Here is a brief overview of all settings.
| Name | Type | Required | Default |
|---|---|---|---|
| seed | Union[int, None] | False | |
| tutorial | bool | False | False |
| debug | bool | False | False |
| max_iterations | int | True | ... |
| innerloop_iterations | Union[int, None] | False | None |
| activity_label | str | True | ... |
| introText | str | True | ... |
| propertyLabels | Dict | True | ... |
| data | DataConfig | True | ... |
| ui | UIConfig | True | ... |
| de_novo_model | Union[DeNovoConfig, None] | False | None |
| reward_model | Union[RewardModelConfig, None] | False | None |
- debug: if True, existing results folders will be overwritten
- max_iterations: defines how often molecules are sampled, feedback is collected, and the model is updated
- innerloop_iterations: how often molecules are resampled from the same scaffold memory before the model is sent to the remote machine
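The Required column of the table above can be turned into a quick sanity check on a settings file. The sketch below assumes the settings have already been loaded into a dictionary (for example with a YAML parser); the names REQUIRED_KEYS and missing_required_keys are hypothetical and not part of Metis.

```python
# Top-level keys marked Required == True in the table above.
REQUIRED_KEYS = (
    "max_iterations",
    "activity_label",
    "introText",
    "propertyLabels",
    "data",
    "ui",
)


def missing_required_keys(settings):
    """Return the required top-level keys that are absent from settings."""
    return [key for key in REQUIRED_KEYS if key not in settings]
```

An empty list means every required top-level key is present; nested sections such as data and ui have their own required keys, which this sketch does not check.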
| Name | Type | Required | Default |
|---|---|---|---|
| initial_path | str | True | ... |
| path | str | True | ... |
| selection_strategy | str | True | ... |
| num_molecules | int | True | ... |
| run_name | str | True | ... |
- initial_path: path to the initial dataset, i.e. the molecules that shall be evaluated first
- path: path to subsequent datasets; these come from the server, are generated by Reinvent, and should end in scaffold_memory.csv
- selection_strategy: how to pick which molecules to show
- num_molecules: how many molecules to show
- run_name: the name of the run; the results will be stored under this name
| Name | Type | Required | Default |
|---|---|---|---|
| show_atom_contributions | AdditionalWindowsConfig | False | {'render': False, 'path': None, 'ECFP': None} |
| show_reference_molecules | AdditionalWindowsConfig | False | {'render': False, 'path': None, 'ECFP': None} |
| tab | TabConfig | True | ... |
| navigationbar | NavigationbarConfig | True | ... |
| general | GeneralConfig | True | ... |
| substructures | SubstructureConfig | True | ... |
| global_properties | GlobalPropertiesConfig | True | ... |
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| path | Union[str, None] | False | |
| ECFP | Union[ECFPConfig, None] | False | |
| Name | Type | Required | Default |
|---|---|---|---|
| bitSize | int | True | ... |
| radius | int | True | ... |
| useCounts | bool | False | False |
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | True | ... |
| tab_names | List | True | ... |
- render: if False, the additional tabs will not be rendered
| Name | Type | Required | Default |
|---|---|---|---|
| sendButton | NavButtonConfig | True | ... |
| editButton | NavButtonConfig | True | ... |
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | True |
| slider | bool | False | False |
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| liabilities | Dict | True | ... |
Liabilities control which properties you can select substructures for:
- Keys such as ugly or tox are only used internally by the script.
- name defines the label of the button.
- color defines the color of the button as well as the color of the atom highlight.
liabilities:
  ugly:
    name: "Mutagenicity"
    color: "#ff7f7f"
  tox:
    name: "Toxicity"
    color: "#51d67e"
  stability:
    name: "Stability"
    color: "#eed358"
  like:
    name: "Good"
    color: "#9542f5"
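To illustrate how such a liabilities mapping might be consumed, the sketch below mirrors the configuration as a Python dictionary and converts each hex color into an RGB tuple of floats in [0, 1], a format commonly expected by drawing libraries for atom highlights. Whether Metis converts colors this way is an assumption; the names LIABILITIES and hex_to_rgb are hypothetical.

```python
# Python mirror of the liabilities configuration shown above.
LIABILITIES = {
    "ugly": {"name": "Mutagenicity", "color": "#ff7f7f"},
    "tox": {"name": "Toxicity", "color": "#51d67e"},
    "stability": {"name": "Stability", "color": "#eed358"},
    "like": {"name": "Good", "color": "#9542f5"},
}


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError(f"expected '#rrggbb', got {color!r}")
    # Each channel is two hex digits; scale 0..255 down to 0..1.
    return tuple(int(color[i:i + 2], 16) / 255 for i in (1, 3, 5))


# Button label and highlight color per liability key.
BUTTONS = {
    key: (entry["name"], hex_to_rgb(entry["color"]))
    for key, entry in LIABILITIES.items()
}
```

The keys (ugly, tox, ...) stay internal, while name and color drive what the user sees.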
| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| liabilities | List | True | ... |
| Name | Type | Required | Default |
|---|---|---|---|
| ssh_settings | str | True | ... |
| use_human_scoring_func | bool | False | False |
| use_reward_model | bool | False | False |
| Name | Type | Required | Default |
|---|---|---|---|
| use_oracle_score | bool | False | True |
| weight | Union[str, None] | False | None |
| oracle_path | Union[str, None] | False | None |
| qsar_model_path | str | True | ... |
| training_data_path | str | True | ... |
| ECFP | ECFPConfig | True | ... |
- use_oracle_score: instead of using the feedback directly to train the reward model, one can use the oracle model to score the molecules liked by the user. The reward model is then trained on the predictions of the oracle rather than on the direct feedback. This mimics an active learning scenario in which the chemist chooses which molecules to validate biologically.