ph451 Final Project 2
April 2024
Abstract
With the rising popularity of Perovskite compounds in technology sectors, particularly photovoltaics,
experimental research into Perovskites has increasing appeal. To conduct this research,
Perovskites must be created, but not all Perovskite structures are energetically favorable enough
to be reliably grown in a lab setting. This stability is based on the distance of a compound from
the convex hull. This distance can be calculated by hand, but it is a time-consuming process. We
program. By training three models we find that we can isolate important factors in stability
prediction, estimate a compound's convex hull distance, and produce a prediction of stable or
unstable based on whether a compound's convex hull distance is above or below 40 meV per atom.
These tasks are performed by a random forest, neural network regressor, and neural network
1 Introduction
The side-by-side advancement of modern material physics and accessible computational power has
made possible the data-mining and analysis of vast theoretical and experimental material properties
material physics has generated complex and robust prediction schemes which have informed materials
discovery and experimental research for the last two decades. In this project, we seek to address
key questions in the field of ceramics and ceramic oxides through a combination of domain-specific
knowledge and neural network modelling. As a result of our work, we have developed two neural
networks capable of predicting the formation energy and convex-hull-distance of novel Perovskite
also utilized a simple Random-Forest model to provide physical insight into which features are most
predictive in regard to Perovskite stability/instability. We hope our work can function as a
"pre-screening" for Perovskite researchers seeking to isolate compounds worthy of further experimental
Novel materials have driven technological revolutions in nearly all sectors of society, including health,
energy, scientific research, and beyond. In the medical sector, the discovery of silicon-based semi-
conducting devices provided for magnetic-resonance imaging and improved patient implants (such
as heart stents) [1]. In the industrial setting, materials science has been aimed towards problems
of energy storage and energy collection devices, tunable materials, and structurally robust material
frameworks [2].
Over the last decade, interest in a new material class known as "Perovskites" has grown due to
promising initial results detailing their wide range of tunable properties. For example, Perovskite-based
solar cells were found to have an abnormally high solar power conversion efficiency [2] compared
to standard silicon-based devices [3], primarily due to their large photoabsorption coefficient [4].
Certain Perovskite materials, particularly Tin-Iodide based Perovskites, have also been shown to
control of both their electronic and crystallographic properties [2]. In the biochemical research sphere,
Perovskites have shown tremendous performance in chemical sensing tasks, with the ability to detect
Given the complex nature of these materials, a brief description of what constitutes a Perovskite
is warranted.
Perovskite compounds are crystalline materials with a chemical formula of ABX3, where A and B
are taken to be cations, and X is taken to be an anion (typically oxygen). These materials typically
form in a cubic structure, with the B-type ions sitting at the center and the X ions sitting at the cube
faces, forming an X-ion octahedron which surrounds the cationic center. The A-type cations are set
Figure 1: Unit-Cell of Perovskite material (Figure is from Ref. 5).
Currently, companies and research labs across the globe are looking for new ways to synthesize
Perovskite materials and realize these exciting technologies at an industrial level [6]. Despite this
excitement, serious roadblocks have emerged in the field. In order to realize these novel material
properties, synthesized Perovskites must be phase-stable, stoichiometrically pure, and stable against
environmental factors. In general, it is known for many Perovskites that there exist materials of
similar stoichiometry which may be more thermodynamically favored to form than the Perovskite
of interest. In laboratory settings, reports have shown this effect manifesting as phase-segregation,
rapid oxidation, degradation due to light exposure, temperature-induced material degradation, etc.
[2,3,6]. The onset of these phase-instabilities can completely disrupt the desirable properties of the
intended Perovskite.
Given the existing challenges of Perovskite phase-stability, a natural question arises: Can we predict
which Perovskites will be the most stable? A predictive model which can classify Perovskites as more
or less "experimentally stable" would function to pre-screen the Perovskite material space, allowing
researchers and companies to isolate the most promising candidates for Perovskites with properties
which are robust against potential phase-instability and environmental degradation. One common
metric of material stability is the Convex-Hull distance (CHD) (for more information, please see
here.). In a sense, the CHD measures the difference in formation energy of a material from the
closest known stable-phase compound with similar crystal structure and stoichiometry. A CHD of 0
implies the material is stable while large positive values of CHD indicate increasing phase-instability.
Therefore, we seek to develop a set of models (in particular, Machine-Learning models) capable
of predicting a material's CHD, formation energy, and general stability given information about its
constituent elements.
Machine-Learning and related theoretical and/or computational approaches have shown great suc-
cess in predicting the thermodynamic properties of a wide variety of compounds. It has been
shown that high-throughput Density-Functional Theory (DFT) predictions are highly accurate in
predicting formation energies and CHD values, but are computationally lengthy and expensive
Support-Vector Machines, Neural-Networks) have been shown to provide low MSE predictions of
formation energies for ternary compounds similar to the Perovskite material class [7]. More specifically,
research performed by Wei Li et al in 2018 found that a neural-network approach to classifying
Perovskites as above or below a 40 meV/atom CHD threshold was successful with a test accuracy, F1-score,
and AUC-score of 0.93, 0.88, and 0.976 respectively [8,9]. A natural materials-discovery pipeline
therefore presents itself. Existing experimental and DFT-based Perovskite data can be used to train
complex Machine-Learning models which can predict highly-stable Perovskites which are viable for
3 Research Goals
In our project, we focus on a particular portion of this materials-discovery pipeline, namely, the
energy values and classifying Perovskites as stable vs. unstable according to a particular CHD
threshold. Throughout the course of this project and the development of the model(s), we aim to
2. Which elemental features provide the most predictive power in determining Perovskite stability?
The dataset that is to be used for this model can be found at figshare.com and was collected under
NSF grant 1148011 in collaboration with the Wisconsin Education Innovation Committee [11]. It
contains 65 columns of data (features and labels), including the atomic radii of the composite atoms
for each material, the unit cell volume, the electron affinity, and the first ionization energy. The data
is categorized by three key labels: convex hull distance, formation energy, and "stable vs. unstable" based
upon a 40 meV/atom CHD threshold. In all there are 1929 instances split between the training, validation,
and testing sets. To give a sense of some of the (potentially) relevant features, we plot below the
covalent radius vs. the formation energy and CHD values respectively.
Figure 2: Plot of Convex Hull Distance (meV/atom) vs. Covalent Radius (Å).
Figure 3: Plot of Formation Energy (eV/atom) vs. Covalent Radius (Å).
From these graphs it is hard to draw any firm conclusions as they are quite scattered, though
there appear to be some vertical linear groupings in the covalent radius vs. formation energy graph.
We also show scatter plots for electron affinity vs convex hull distance and formation energy:
Figure 4: Plot of Convex Hull Distance (meV/atom) vs. Electron Affinity (kJ/mol).
Figure 5: Plot of Formation Energy (eV/atom) vs. average AB site Electron Affinity (kJ/mol).
Here, linear groupings can be seen between the electron affinity and the formation
energy; these are more distinct and occur at higher energies. As mentioned in the background, the Perovskite
structure is typically of the ABX3 form. However, in many Perovskites there is not a single A or
B site element. Instead, the A and B site elements vary from unit cell to unit cell between a few
elements of interest. Each Perovskite must be composed of at least three distinct
elements. To understand the underlying element distribution of our dataset, we provide a bar-chart
Figure 6: Histogram depicting the number of elements in the various Perovskite instances.
It can be seen that there are significantly more 4- and 5-element materials than materials with any other number of elements.
Lastly, in order to contextualize our later results we must discuss the underlying label distribution
of our dataset. Namely, we find that the energies follow a multimodal distribution with an underlying
Gaussian background centered at ∼ −1.8 eV/atom. The largest peaks appear around −2 eV/atom and
−1.5 eV/atom. In regard to stability, we find that only 29.4% of Perovskites in the dataset are stable
(with respect to the chosen CHD threshold) with 70.6% deemed unstable. This ratio will be important to
To begin, the data was read from a local Excel file into a pandas dataframe within the attached lab
notebook. The data was subsequently converted to a NumPy array and split into training, validation,
and test data using the scikit-learn train_test_split function [12]. Three models were trained in
Figure 7: Bar-Chart of formation energies (eV/atom) for the 1929 instances in our dataset.
Figure 8: Pie chart depicting the percentage of stable and unstable Perovskites within the dataset.
order to address the various research goals. The first model was a random-forest ensemble [13],
whereas the second and third models were a neural network regressor and classifier respectively. The
train-test split was first applied including only the two numerical labels (Convex Hull Distance and
formation energy). This first data-split was used to train the neural network regressor. A second
train-test split was performed on a copy of the original dataset, this time including only the categorical
label (stable vs. unstable). This split dataset was used for both the random-forest model and neural
network classifier. The stability labels were generated by comparing the CHD for an instance to
a 40 meV/atom threshold, similar to other works in this field [8]. Instances with a CHD above 40 meV/atom were
labeled unstable (0), while those with a CHD below 40 meV/atom were labeled stable (1). For both split data
sets, 80%/10%/10% train/validation/test split percentages were used. Finally, these data were read
into separate PyTorch dataloaders so that they could be fed to our models during training.
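The labeling rule and split proportions described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Python standard library rather than the scikit-learn function used in the project, the names `stability_label` and `split_indices` are hypothetical, and an instance sitting exactly at 40 meV/atom is assumed to be labeled unstable, since the text does not specify that case.

```python
import random

CHD_THRESHOLD_MEV = 40.0  # stability cutoff in meV/atom, as stated in the text


def stability_label(chd_mev_per_atom):
    """Return 1 (stable) if the CHD is below the threshold, else 0 (unstable).

    Assumption: a CHD of exactly 40 meV/atom is treated as unstable.
    """
    return 1 if chd_mev_per_atom < CHD_THRESHOLD_MEV else 0


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    # Whatever remains after the train and validation cuts goes to the test set.
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(1929)
print(len(train_idx), len(val_idx), len(test_idx))
print(stability_label(12.5), stability_label(85.0))
```

Because the split sizes are rounded down, the test set absorbs the few leftover instances, so the three sets always cover all 1929 instances exactly once.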
Our primary model architectures for both the regression and classification tasks were chosen to be
neural networks due to their versatility, ability to capture complex relations, and the particular
skillsets of our group. However, two critical drawbacks of the neural network approach are apparent.
Firstly, it is often difficult to accurately interpret the weights of a complex neural network, making it
challenging to determine which combination(s) of features are the strongest-predictors of our desired
label/outcome. Additionally, neural networks are often computationally expensive and data-hungry
which can make them unwieldy, especially when compared to simpler architectures. For these reasons,
we decided to first train a Random-Forest model on the classification task before moving onto the
neural networks. The Random-Forest will also serve as a useful benchmark for the success of our
more complex models, allowing us to gauge if the increased sophistication produces (or is worth)
Our Random-Forest was composed of 100 decision trees with a Gini-Impurity training criterion
[14]. The min_samples_leaf and min_samples_split were both set to 5, with no max_depth restriction
The model performed with an 88.1% (82.3%) validation (test) classification accuracy, which (as
we will see later) under-performed the neural network classifier. However, the interpretability of
the model provides some useful physical insights. In particular, the trained Random-Forest model
returns a normalized Gini-importance index (GII) for each feature, with a higher GII denoting a
higher predictive power associated with that feature. For more information on the GII, see [14]. We
The four most predictive features in order were "Asite BCCenergy pa max", "Asite BCCenergydiff min",
"Bsite Second Ionization Potential (V) weighted avg", and "Bsite At.#weighted avg", with normalized
GII values of approximately 0.050, 0.049, 0.045, and 0.043 respectively. The first feature corresponds
to the formation energy of the A-site element (per atom) in the BCC crystal phase.
If multiple A-site elements are present, the max of their respective BCC formation energies is taken.
Figure 9: Gini Importance Index of the 61 Perovskite features.
Likewise, the second feature corresponds to the minimum pairwise difference in BCC formation ener-
gies between any two A-site elements. If only one A-site element is present then this feature is set to
0. The third feature is the weighted average of the second ionization potential (V) across the B-site
elements. The last feature is the weighted average of the B-site atomic number (essentially encoding
the B-site element choice). In general, these features encode three concepts: the crystallography of
the A-site element(s), the chemical stability of the B-site element(s), and the B-site element choice
more generically. Information surrounding these three key concepts seems to provide the most
predictive power in determining the stability of the Perovskite in question. The most important feature
(Asite BCCenergy pa max) is plotted below against the associated CHD values. We observe that the
data in this plot fall into distinct columns, with certain columns having a much higher or lower mean
CHD value, making them highly informative to split on in the random-forest context. A similar
trend is observed for the other three high GII features. We also note that one categorical feature,
"Is Pnictide", was never split on and therefore has a GII of zero. This feature encodes whether
a Perovskite contains a group 15 element, suggesting that such information is largely uncorrelated with
Perovskite stability.
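To make the link between "columns" in the data and a high GII more concrete, the sketch below computes the Gini impurity decrease of a single split, which is the quantity a random forest accumulates per feature (and then normalizes) to obtain the Gini importance. The data are made up for illustration and the function names are hypothetical; this is not the project's code.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(feature, labels, threshold):
    """Weighted drop in Gini impurity when splitting at feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    child = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - child


# Toy data (made up): a feature whose values fall into two "columns",
# one of which contains only unstable (0) instances.
feature = [-1.9, -1.9, -1.9, -0.4, -0.4, -0.4]
labels = [1, 1, 0, 0, 0, 0]
print(impurity_decrease(feature, labels, threshold=-1.0))
```

A feature that is never chosen for a split, such as "Is Pnictide" here, contributes no impurity decrease in any tree and so ends up with an importance of zero.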
After testing the Random-Forest model, two independent neural network models were trained and
tested. The first model is a regression-oriented PyTorch-based [15] neural network. The model begins
with a batch-normalization of the input data [16], in order to standardize the variety of inputs and
Figure 10: Asite BCCenergy pa max vs. CHD values for all 1929 Perovskite instances.
learn their general scale and offset. We then implement a dense layer with 256 output neurons,
another batch normalization layer, and finally a Parametric-ReLU (PReLU) activation layer [17]. This 3-layer
sequence is repeated 3 more times, with the number of output neurons in the dense layer decreasing
approximately 2-fold each time. Additionally, dropout layers [18] are added to the final two sequences
in order to combat overfitting (which may arise from the PReLU activation, among other sources).
The dropout probabilities were held fixed during training at values between roughly 0.5 and 0.6. The final layer has 2 output
neurons, as we are attempting to predict two labels (convex hull distance and formation energy).
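The regressor architecture described above can be sketched in PyTorch as follows. Several details are assumptions rather than facts from the project code: the layer widths are taken to halve exactly (256, 128, 64, 32), the dropout probability is set to 0.5, dropout is placed after the activation, and the input size of 61 is taken from the feature count quoted for the Random-Forest. The class and function names are hypothetical.

```python
import torch
from torch import nn


def make_block(n_in, n_out, dropout=None):
    """Dense -> batch norm -> PReLU, optionally followed by dropout."""
    layers = [nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.PReLU()]
    if dropout is not None:
        layers.append(nn.Dropout(p=dropout))
    return layers


class PerovskiteRegressor(nn.Module):
    def __init__(self, n_features=61, n_outputs=2):
        super().__init__()
        widths = [256, 128, 64, 32]        # assumed: exact 2-fold reduction
        dropouts = [None, None, 0.5, 0.5]  # assumed: dropout in the last two blocks
        layers = [nn.BatchNorm1d(n_features)]  # input batch normalization
        n_in = n_features
        for width, p in zip(widths, dropouts):
            layers += make_block(n_in, width, p)
            n_in = width
        # Two outputs: convex hull distance and formation energy.
        layers.append(nn.Linear(n_in, n_outputs))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = PerovskiteRegressor()
model.eval()  # use running batch-norm statistics and disable dropout
with torch.no_grad():
    out = model(torch.randn(8, 61))
print(out.shape)
```

The forward pass on a random batch only checks that the layer sizes chain together; it says nothing about trained performance.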
The second model is a PyTorch-based neural network classifier, and is identical to the first model
with three exceptions. Firstly, the final linear layer has only one output neuron as we are only
predicting a single label (stable vs. unstable). Secondly, a sigmoid activation is called after the final
linear layer, in order to produce a valid class probability. Finally, the parametric-ReLU activations
were replaced with Leaky-ReLU activations [17] with a fixed slope parameter of 0.01. This final
For both models, an Adam optimizer [19] was utilized with an initial learning rate of 0.3 (0.6) for
the regressor (classifier), and a weight-decay [20] of 1e-6 for both models. The Adam optimizer was
used in order to converge to a loss minimum more quickly than standard SGD approaches. The learning rate
which will be discussed further in the following section. The T_0 and eta_min values for the scheduler
were set to 300 epochs and 0.002 respectively. The learning rate vs. the number of epochs is shown
below for the training-loop of the regression model. As we can see, the learning rate initially drops
following a cosine curve, reaching near zero after 300 epochs. The learning rate is then artificially
spiked back to the initial learning rate, before the cosine trend is repeated with a period of 300
epochs. Both neural networks were trained for a total of 2500 epochs. For further detail, please see
the attached pdf which contains the relevant code discussed above.
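The learning-rate behaviour described above has the form of cosine annealing with warm restarts [21]. As a minimal sketch, assuming a fixed restart period and using the regressor's quoted values (initial rate 0.3, eta_min of 0.002, T_0 of 300 epochs), the schedule can be written in closed form. The function name is hypothetical, and the project itself relied on a PyTorch scheduler rather than this hand-written version.

```python
import math


def cosine_warm_restart_lr(epoch, lr_max=0.3, eta_min=0.002, t_0=300):
    """Learning rate at a given epoch, with a restart every t_0 epochs."""
    t_cur = epoch % t_0  # epochs elapsed since the last restart
    return eta_min + 0.5 * (lr_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_0))


for epoch in (0, 150, 299, 300):
    print(epoch, cosine_warm_restart_lr(epoch))
```

Within each period the rate falls from the initial value toward eta_min, and at every multiple of T_0 it jumps back to the initial value.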
Figure 11: Plot of regression model learning rate vs. number of training epochs.
The particular metrics and loss-functions utilized will be discussed further in the analysis section.
Standard training and testing loops were utilized, similar to those seen in previous hackathons
and hands-on-activities (Hands-On 6 for example). We note that these loops were modified in order
to return the best model from training (according to the validation metric) as opposed to the final
model, so as to suppress any chance irregularities which may occur towards the end of the training
cycle.
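The "return the best model" modification can be illustrated with a small, framework-free sketch. Here `run_epoch` is a hypothetical callback standing in for one epoch of training plus validation, and the model is represented by a plain dictionary; in a PyTorch loop the snapshot would typically be a copy of the model's state instead.

```python
import copy


def train_keep_best(model_state, run_epoch, n_epochs, higher_is_better=True):
    """Run training and return the best state seen, judged by the validation metric.

    run_epoch(state, epoch) is a hypothetical callback that updates `state`
    in place for one epoch and returns the validation metric.
    """
    best_metric, best_state, best_epoch = None, None, None
    for epoch in range(n_epochs):
        metric = run_epoch(model_state, epoch)
        if best_metric is None:
            improved = True
        elif higher_is_better:
            improved = metric > best_metric
        else:
            improved = metric < best_metric
        if improved:
            best_metric, best_epoch = metric, epoch
            best_state = copy.deepcopy(model_state)  # snapshot, not a reference
    return best_state, best_metric, best_epoch


# Toy stand-in for a training loop: validation accuracy peaks at epoch 2.
fake_val_accuracy = [0.70, 0.85, 0.91, 0.88, 0.86]


def fake_epoch(state, epoch):
    state["epoch"] = epoch
    return fake_val_accuracy[epoch]


best_state, best_metric, best_epoch = train_keep_best({"epoch": None}, fake_epoch, 5)
print(best_epoch, best_metric)
```

Taking a deep copy matters: storing a reference to the live state would silently return the final model rather than the best one.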
Given the two-fold nature of our project (regression and classification) and our
group's particular skill-sets, a neural network approach was deemed best suited for this task. The
batch-normalization layers were introduced in order to improve information propagation through the
network, and to allow the model to learn the general scale of each of the input features. A ReLU-like
activation was chosen in order to avoid the problem of vanishing gradients. PReLU activation in
particular was chosen to combat the “dying ReLU” problem during training and to allow our model
to find the best slope parameters during training. On the other hand, these extra degrees of freedom
(the PReLU slopes) could lead to overfitting if not properly handled. While this overfitting was not
observed for the regressor, the classifier suffered significant overfitting during our initial tests. To
combat this model failure, two regularization techniques were chosen. Firstly, the dropout layers
were added to regularize the model and prevent severe overfitting of the training set. Secondly, the
aforementioned weight-decay was utilized in order to prevent exploding neuron weights. For the
Adam optimizer, weight-decay acts similarly to the L2 regularization seen in other models, penalizing
Finally we address the choice of learning-rate scheduler. Our initial attempts utilized a performance-based
scheduler (ReduceLROnPlateau from PyTorch) with a patience of 5 epochs and a reduction
factor of 0.95. However, we noticed the learning rates for both the regressor and classifier decayed
exponentially toward zero, largely due to unforeseen instabilities during training. This led to a "freeze"
in learning after approximately 300 epochs. In order to combat this "dying-learning-rate" problem,
the learning rate via a cosine function of the number of epochs, reaching a learning-rate minimum
(eta_min) at a specified number of epochs (T_0). The learning rate is then artificially spiked back
to the initial learning rate, repeating the same cosine pattern with a period of 300 epochs (as was
previously mentioned).
6 Analysis
To analyze our results on the regression model we decided to use the mean-squared error as a loss
function and the L1 loss as our metric, seeking to minimize each [22]. The L1 loss was chosen as a
performance metric primarily due to outliers that were noted in the initial data visualization. Both
the loss and metric dropped quickly over the first five or so epochs before leveling out and decreasing
more incrementally. We also observed large spikes over certain epoch ranges, which can be explained
By these metrics, the best model we achieved was at epoch 2304 with a test MSE loss of 0.0628
and a test L1 loss of 0.1738. Although the regressor was not trained for the stable/unstable classification
task, we can also calculate an accuracy score for the regressor by comparing the model CHD
predictions against the 40 meV/atom threshold and generating associated class predictions. Under this schema, the
Figure 12: Plot of regression model MSE and L1 loss vs. number of training epochs.
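The reason for reporting the L1 loss alongside the MSE can be seen in a small numerical sketch. The numbers below are made up for illustration and are not taken from our dataset; the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean-squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1(y_true, y_pred):
    """Mean absolute (L1) error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up numbers: four good predictions and one outlier.
y_true = [0.0, 0.0, 0.0, 0.0, 0.0]
y_pred = [0.1, -0.1, 0.1, -0.1, 2.0]

print("with outlier:   ", mse(y_true, y_pred), l1(y_true, y_pred))
print("without outlier:", mse(y_true[:4], y_pred[:4]), l1(y_true[:4], y_pred[:4]))
```

In this toy case the single outlier raises the MSE by roughly a factor of 80 but the L1 error by less than a factor of 5, which is why the L1 metric gives a steadier picture of typical prediction error when outliers are present.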
To train our classifier model we used binary cross entropy loss as the loss function and a standard
accuracy metric. For accuracy we divided the number of correct classifications by the total number
of classifications, thus trying to minimize the BCE loss criterion and maximize the accuracy metric.
The loss function dropped quickly while the accuracy metric took a while to grow and was somewhat
jumpy between epochs (as seen in Fig. 13). The best model (achieved on epoch 2133) had a test BCE
loss of 0.2953 and an accuracy of 92.5%. We managed to achieve an ROC-AUC of 0.932 with an F1
score of 0.761. The ROC-Curve and confusion matrix for this model are shown below. We note that
this classification accuracy is on par with other models on similar datasets [8], and was notably more
The model was able to very accurately determine the unstable materials, with a lower but still
reasonably high accuracy for classifying the stable compounds. This is likely in part due to the
number of unstable materials greatly outnumbering the stable ones in our dataset and in nature.
The numbers of type 1 and type 2 errors are also quite close. Despite this
similarity, we conclude that the model has a slight false positive bias (predicted stable when the Per-
ovskite is actually unstable), due to the asymmetry in the underlying stable/unstable distribution.
In other words, since there are approximately 3 times as many unstable compounds in the dataset,
we would expect approximately three times more false negatives than false positives.
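To show how the accuracy and F1 score relate to the confusion matrix under class imbalance, the sketch below computes both from a small set of made-up labels with roughly the 3:1 unstable-to-stable ratio of our dataset. Stable (1) is taken as the positive class, matching the labeling used above; the function names are hypothetical and the numbers are not our results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with stable = 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score of the positive (stable) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Made-up labels: 3 stable and 9 unstable instances.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(confusion_counts(y_true, y_pred))
print(accuracy_and_f1(y_true, y_pred))
```

With one false positive and one false negative, the toy accuracy stays above 80% while the F1 score is only about 0.67, because the many correctly classified unstable instances raise the accuracy but do not enter the F1 score of the stable class. The same effect is consistent with a 92.5% accuracy appearing alongside an F1 score of 0.761.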
Figure 13: Plot of classifier model BCE loss and accuracy vs. number of training epochs.
Figure 14: Receiver-Operating Curve (ROC) for the Perovskite stability classifier.
Although a more thorough analysis may be required (see model considerations), initial estimates
of the most important features for Perovskite stability have been determined from the Random-
Figure 15: Confusion Matrix for Perovskite stability classifier.
Forest model. Three types of features have been identified as key estimators of Perovskite stability:
formation energy), and more generally the B-site element choice. For example, the second ionization
energy of the B-site element seems to heavily dictate the stability of the Perovskite compound.
This result fits well with a chemical view of Perovskite stability, as the electronic environments of
the constituent components dictate whether or not a stable bond(s) will be formed. Secondly, the
thermodynamic/crystallographic properties of the A-site element, such as the BCC formation energy,
also act as key predictors. This information seems to capture the idea of thermodynamic competition
between the desired Perovskite phase and elemental or binary phases. It may also imply a similarity
in local atomic environment for the A-site element between the BCC and Perovskite phases, so that
high/low A-site BCC stability correlates with improved/diminished Perovskite stability. We note
that Wei Li et al [8] also found similar features were important in predicting stability in more general
In regard to the neural network models, we found that both the regressor and classifier were
highly successful in their respective tasks, achieving accuracies/losses which were comparable to other
models seen in the literature [8,10]. Notably, the neural network classifier achieved a 4% increase in
test accuracy compared to the random-forest model, justifying the more sophisticated architecture
choice. Despite this jump in complexity our approach still represents a notable reduction in the
feature-space, model size, and training time compared to other similar works [8,10].
Although the random forest model provides a certain degree of information regarding relative feature
importance (for the classification task), more advanced methods may provide a complete picture of the
approach may provide further insights into the predictive power of our features and allow us to fur-
ther reduce the feature space our neural networks have to explore. Additionally, when comparing the
neural network classifier training and validation accuracies, we can see there is still slight overfitting
occurring. This may be due to the choice of Leaky ReLU activation. Therefore, further experimenta-
tion with the classifier activation functions and our regularization methods is required. Despite these
considerations, our first attempt is highly promising, as we were able to produce a classifier with a
test accuracy of ∼ 92.5%. It is possible that, with further hyperparameter tuning of the dropout and
initial learning rates, these accuracies could be further improved. However, our current models
The methods developed in this paper have the potential to improve the lives of Perovskite
researchers by equipping them to make fast assessments with limited data regarding Perovskite stability.
Nonetheless, further hyperparameter tuning and a larger dataset may yet improve the models'
capabilities. In the authors' opinion, further investigation into improving the
models is worthwhile.
8 References
1. Saddow SE. Silicon Carbide Technology for Advanced Human Healthcare Applications. Micromachines
(Basel). 2022 Feb 22;13(3):346. doi: 10.3390/mi13030346. PMID: 35334637; PMCID: PMC8949526.
2. Jung, Hyun Suk, and Nam-Gyu Park. ”Perovskite Solar Cells: From Materials to Devices.” Small 11.1
3. Meng, Lei, Jingbi You, and Yang Yang. ”Addressing the Stability Issue of Perovskite Solar Cells for
4. K. Rao, Maithili, et al. ”Review on Persistent Challenges of Perovskite Solar Cells’ Stability.” Solar
5. Shellaiah, Muthaiah, and Kien Wen Sun. ”Review on Sensing Applications of Perovskite Nanomateri-
6. Rong, Yaoguang, et al. ”Challenges for Commercializing Perovskite Solar Cells.” Science 361.6408
7. Peterson, Gordon G. C., and Jakoah Brgoch. ”Materials Discovery through Machine-Learning Forma-
8. Li, Wei, Ryan Jacobs, and Dane Morgan. ”Predicting the Thermodynamic Stability of Perovskite
Oxides Using Machine-Learning Models.” Computational Materials Science 150 (2018): 454-63. Print.
9. Wu, Yabi, et al. ”First Principles High Throughput Screening of Oxynitrides for Water-Splitting
10. Saal, James E., Anton O. Oliynyk, and Bryce Meredig. ”Machine-Learning in Materials Discovery:
Confirmed Predictions and Their Underlying Approaches.” Annual Review of Materials Research 50.1
11. ”Machine Learning Materials Datasets”, (2018), NSF grant 1148011 and the Wisconsin Education
Innovation Committee
12. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier
Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre
Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-
learn: Machine Learning in Python. J. Mach. Learn. Res. 12, (2/1/2011), 2825–2830.
13. Breiman, L. Random Forests. Machine Learning 45, 5–32 (2001). https://doi.org/10.1023/A:1010933404324
14. Stefano Nembrini, Inke R König, Marvin N Wright, The revival of the Gini importance?, Bioinformatics,
15. Paszke, Adam et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.”
16. Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: accelerating deep network training by
reducing internal covariate shift. In Proceedings of the 32nd International Conference on International
17. T. Jiang and J. Cheng, ”Target Recognition Based on CNN with LeakyReLU and PReLU Activation
Functions,” 2019 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC),
18. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014.
Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1
(January 2014)
19. Kingma, Diederik P. and Jimmy Ba. “Adam: A Method for Stochastic Optimization.” CoRR abs/1412.6980
(2014)
20. Zhang, Guodong et al. “Three Mechanisms of Weight Decay Regularization.” ArXiv abs/1810.12281
(2018)
21. Loshchilov, Ilya and Frank Hutter. “SGDR: Stochastic Gradient Descent with Warm Restarts.” arXiv:
Learning (2016)
22. Wang, Q., Ma, Y., Zhao, K. et al. A Comprehensive Survey of Loss Functions in Machine Learning.