forked from txie-93/cgcnn

Crystal graph convolutional neural networks for predicting material properties.

vickie02736/cgcnn

 
 


Crystal Graph Convolutional Neural Networks

This software package implements the Crystal Graph Convolutional Neural Network (CGCNN), which takes an arbitrary crystal structure as input and predicts material properties.

The package provides two major functions:

  • Train a CGCNN model with a customized dataset.
  • Predict material properties of new crystals with a pre-trained CGCNN model.

The following paper describes the details of the CGCNN framework:

Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties

This revision addresses a few minor issues:

  1. It is easy for the crystal graph cache to be dumped, which significantly slows down data processing, as mentioned here. This version automatically writes a .pkl file for each structure so the features do not have to be regenerated on the fly.
  2. The predict.py script was fixed, and several minor changes were made to the log file.
  3. A new atom_init.json file was made, in part to address the issue raised here. It makes no difference though in the end.
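
The per-structure .pkl caching described in item 1 can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function name load_or_compute, the cache_dir argument, and the compute_fn callback are all hypothetical.

```python
import os
import pickle


def load_or_compute(cif_id, compute_fn, cache_dir):
    """Return cached features for cif_id, computing and saving them on a miss.

    compute_fn is a hypothetical callback that builds the features
    for one structure; it is only called when no .pkl file exists yet.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, f"{cif_id}.pkl")
    if os.path.exists(cache_path):
        # Cache hit: read the stored features instead of regenerating them.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: compute once, then persist for later runs.
    features = compute_fn(cif_id)
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features
```

Because the cache lives on disk rather than in memory, a second run over the same structures would read the .pkl files instead of rebuilding the features.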

How to cite

Please cite the following work if you want to use CGCNN.

@article{PhysRevLett.120.145301,
  title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
  author = {Xie, Tian and Grossman, Jeffrey C.},
  journal = {Phys. Rev. Lett.},
  volume = {120},
  issue = {14},
  pages = {145301},
  numpages = {6},
  year = {2018},
  month = {Apr},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevLett.120.145301},
  url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}

Prerequisites (for the UCL Myriad cluster)

Cluster information (example): NVIDIA-SMI 535.54.03 - Driver Version: 535.54.03, CUDA Version: 12.2

  • GPU 0: Tesla V100-PCIE-32GB
  • Persistence-M: Off
  • Bus-Id: 00000000:D8:00.0
  • Disp.A: Off
  • Volatile Uncorr. ECC: 0
  • Fan: N/A
  • Temp: 31C
  • Perf: P0
  • Pwr:Usage/Cap: 36W / 250W
  • Memory-Usage: 0MiB / 32768MiB
  • GPU-Util: 1%
  • Compute M.: Default
  • MIG M.: N/A

This package requires:

If you are new to Python, the easiest way of installing the prerequisites is via conda and pip. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

The environment requirements are saved in the file environment.yml. See the conda guide on Creating an environment from an environment.yml file for how to use it.

Usage

Define a customized dataset

To input crystal structures to CGCNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  2. ID.cif: a CIF file that records the crystal structure, where ID is the unique ID (filename) for the crystal.

  3. id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.) data/split_csv.ipynb can be used to make id_prop.csv for the QMOF dataset.
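
For prediction, where the second column only needs a placeholder, id_prop.csv can be generated from the CIF files already in root_dir. The helper below is a hypothetical sketch (it is not part of this repository) and assumes every file ending in .cif should be listed, with 0.0 as the dummy target.

```python
import csv
import os


def write_placeholder_id_prop(root_dir, placeholder=0.0):
    """Write id_prop.csv listing every ID.cif in root_dir with a dummy target."""
    # The ID is the CIF filename without its extension.
    ids = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(root_dir)
        if name.endswith(".cif")
    )
    with open(os.path.join(root_dir, "id_prop.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        for cif_id in ids:
            # Column 1: unique ID, column 2: placeholder target value.
            writer.writerow([cif_id, placeholder])
    return ids
```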

The structure of the root_dir should be:

root_dir
├── id_prop.csv
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There are two examples of customized datasets in the repository: data/sample-regression for regression and data/sample-classification for classification.

For advanced PyTorch users

The above method of creating a customized dataset uses the CIFData class in cgcnn.data. If you want a more flexible way to input crystal structures, PyTorch has a great Tutorial for writing your own dataset class.

Train a CGCNN model

Before training a new CGCNN model, you will need to define a customized dataset at root_dir, as described above.

Then, in directory cgcnn, you can train a CGCNN model for your customized dataset by:

python main.py root_dir

You can set the number of training, validation, and test data points with the flags --train-size, --val-size, and --test-size. Alternatively, you may use the ratio flags --train-ratio, --val-ratio, and --test-ratio. Note that the ratio flags cannot be combined with the size flags. For instance, data/sample-regression has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/sample-regression

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/sample-regression
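
The two commands above describe the same 6/2/2 split. The arithmetic can be sketched as below; this is an illustration of how ratios map to counts, not necessarily how main.py computes the split internally, and ratios_to_sizes is a hypothetical helper that assumes the three ratios sum to 1.

```python
def ratios_to_sizes(total, train_ratio, val_ratio, test_ratio):
    """Convert split ratios into integer train/val/test sizes."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-8:
        raise ValueError("ratios must sum to 1")
    train = round(train_ratio * total)
    val = round(val_ratio * total)
    # Give the remainder to the test set so the sizes always add up to total.
    test = total - train - val
    return train, val, test


# data/sample-regression has 10 data points in total.
print(ratios_to_sizes(10, 0.6, 0.2, 0.2))  # (6, 2, 2)
```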

You can also train a classification model with the flag --task classification. For instance, you can use data/sample-classification by:

python main.py --task classification --train-size 5 --val-size 2 --test-size 3 data/sample-classification

After training, you will get three files in the cgcnn directory.

  • model_best.pth.tar: stores the CGCNN model with the best validation accuracy.
  • checkpoint.pth.tar: stores the CGCNN model at the last epoch.
  • test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set.
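
For a regression run, test_results.csv can be summarized with a few lines of Python. The sketch below assumes the file has exactly the three columns listed above (ID, target value, predicted value) and no header row; the function name is hypothetical.

```python
import csv


def mean_absolute_error(path):
    """Compute the MAE between the target and predicted columns of a results file."""
    errors = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # ignore blank lines
            _cif_id, target, predicted = row[:3]
            errors.append(abs(float(target) - float(predicted)))
    if not errors:
        raise ValueError("no rows found in " + path)
    return sum(errors) / len(errors)
```

If your copy of test_results.csv does start with a header row, skip the first row before converting the values.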

Predict material properties with a pre-trained CGCNN model

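Before predicting the material properties, you will need to define a customized dataset at root_dir for the crystals whose properties you want to predict, and have a pre-trained CGCNN model (pre-trained.pth.tar in the command below).

Then, in directory cgcnn, you can predict the properties of the crystals in root_dir: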

python predict.py pre-trained.pth.tar root_dir

For instance, you can predict the formation energies of the crystals in data/sample-regression:

python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression

You can also predict whether the crystals in data/sample-classification are metals (1) or semiconductors (0):

python predict.py pre-trained/semi-metal-classification.pth.tar data/sample-classification

Note that for classification, each predicted value in test_results.csv is a probability between 0 and 1 that the crystal can be classified as 1 (metal in the above example).
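
If hard class labels are needed, the probabilities can be thresholded. The cutoff of 0.5 used below is an assumption chosen for illustration, not a value prescribed by the package, and probabilities_to_labels is a hypothetical helper.

```python
def probabilities_to_labels(probabilities, threshold=0.5):
    """Map each probability of class 1 to a hard label (1 or 0)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# With the metal/semiconductor example: 1 = metal, 0 = semiconductor.
print(probabilities_to_labels([0.9, 0.2, 0.5]))  # [1, 0, 1]
```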

After predicting, you will get one file in the cgcnn directory:

  • test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is just the number that you set in id_prop.csv while defining the dataset, so it carries no meaning.

Here are the full bash commands for the modified version:

python main.py ./data \
    --epochs 600 \
    --target band_gap

This version adds the --target argument, which names the target directory (the one that includes train.csv).

python ../predict.py \
    --modelpath ../output/model_best.pth.tar \
    --cifpath ./data \
    --target band_gap

Authors

This software was originally written by Tian Xie and Prof. Jeffrey Grossman. This slightly modified version was made by Andrew S. Rosen.

Modified by Kewei Zhu to run on the UCL Myriad cluster.

License

CGCNN is released under the MIT License.
