Lecture 1 Foundations of Materials Informatics

Materials Informatics applies data science to materials science, leveraging machine learning to uncover complex trends and facilitate breakthroughs in material development. The field, still relatively new, emphasizes the importance of data-driven approaches to overcome traditional research limitations and improve material discovery. Various machine learning techniques are utilized to analyze small, heterogeneous datasets, enabling the identification of new materials and properties through innovative computational methods.


Materials Informatics

Taylor D. Sparks
University of Utah, Materials Science and Engineering
Many people and agencies make this work possible!
My full materials informatics course is available on YouTube/Github!
Materials Informatics is data science applied to materials science

Structure

Characterization
Performance
Property

Processing
Machine learning is capable of extracting highly complex trends from data
Machine learning has already made huge impacts on science and engineering!

Al 7A77
New alloy!
Materials Informatics is only a few decades old!
In the early days of materials informatics, nobody knew what they were doing
We initially only used “big data” to write analytical reviews

Sparks et al. (2016) Script. Mat.


Gaultois et al. (2013) Chem Mat.
Ghadbeigi et al. (2016) Energy Environ. Sci.
Gaultois et al. (2016) APL Mat.
Why do we need data-driven materials science?
Most materials research is incremental; breakthroughs are the exception!
Useful information can be mined from large datasets without domain knowledge!
Are there materials “genes” responsible for their properties and performance?

• Equip the next-generation workforce
• Enable a paradigm shift in materials development
• Integrate experiments, computation, and theory
• Facilitate access to materials data
New tools of discovery are needed to explore chemical whitespace
Is materials informatics a passing fad or here to stay?
Machine learning is at the top of the Hype Cycle
Some discoveries are “1% inspiration, 99% perspiration”
Some discoveries are being in the right place at the right time
Some inspiration can be taken from nature
Some discoveries require a keen eye for detail
Some discoveries require a keen ear for detail
Some discoveries result from totally unsafe lab practices!
Some discoveries require mistakes
Materials Informatics differs substantially from traditional machine learning
Materials Informatics datasets are typically pretty small compared to traditional ML
Materials Informatics usually focuses on identifying unusual outliers rather than averages
Materials research suffers from heterogeneous data with many modalities
Why do we need data-driven materials discovery?
Uncertainty quantification is extremely important when each new data point is expensive

Facilities and physical lab space required $$

Materials need to be purchased $$


Why do we need data-driven materials discovery?
Uncertainty quantification is extremely important when each new data point is expensive

Samples need to be synthesized

High chance of failure

Diverse reasons for failure


Why do we need data-driven materials discovery?
Uncertainty quantification is extremely important when each new data point is expensive

Samples need to be characterized $$

Characterization requires equipment $$

Characterization requires expertise $$

Characterization takes time


Why do we need data-driven materials discovery?
Compositional design space is enormous!

Total unique inorganic compounds ≈ 10¹²

Four-component systems (AaBbCcDd), ignoring dopants (a, b, c, and d > 0.03).

https://htwins.net/scale2/
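To get a feel for that number, here is a back-of-the-envelope sketch in Python; the ~80-element pool and the 1% stoichiometry grid are assumptions for illustration, not the slide's exact counting.

from math import comb

n_elements = 80                        # assumed pool of practically usable elements
element_choices = comb(n_elements, 4)  # ways to choose the set {A, B, C, D}

# Stars-and-bars: positive integer solutions to a + b + c + d = 100,
# i.e. compositions AaBbCcDd on an assumed 1% stoichiometry grid
grid_points = comb(100 - 1, 4 - 1)

total = element_choices * grid_points
print(f"{element_choices:,} systems x {grid_points:,} compositions = {total:.1e}")
# ~2.5e11 quaternary candidates, within an order of magnitude of the slide's ~10^12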
Traditional ML has moved away from feature engineering to deep learning
Feature engineering can significantly improve predictions with <10³ data instances

Anton Oliynyk!
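At this data scale, feature engineering usually means composition-based descriptors. A minimal sketch of the idea follows; the tiny element-property table and the formula parser are illustrative stand-ins for full descriptor sets such as Magpie.

import re
import numpy as np

# Illustrative elemental property table: (atomic number, Pauling electronegativity)
ELEMENT_PROPS = {"W": (74, 2.36), "B": (5, 2.04), "C": (6, 2.55), "Re": (75, 1.9)}

def parse_formula(formula):
    """'WB4' -> {'W': 1.0, 'B': 4.0} (simple formulas only, no parentheses)."""
    counts = {}
    for el, num in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[el] = counts.get(el, 0.0) + (float(num) if num else 1.0)
    return counts

def featurize(formula):
    """Fraction-weighted mean plus max of each elemental property."""
    counts = parse_formula(formula)
    fracs = np.array(list(counts.values())) / sum(counts.values())
    props = np.array([ELEMENT_PROPS[el] for el in counts])
    return np.concatenate([fracs @ props, props.max(axis=0)])

print(featurize("WB4"))  # -> [mean Z, mean electronegativity, max Z, max EN]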
Structure-property-processing linkages depend on interpretable relationships

Structure

Characterization
Performance

Property

Processing
Machine learning is learning from patterns in data

AI: the theory and development of


computer systems able to perform
tasks that normally require human
intelligence, such as visual perception,
speech recognition, decision-making,
and translation between languages.
Machine learning is learning from patterns in data

ML: the use and development of computer


systems that are able to learn and adapt
without following explicit instructions, by
using algorithms and statistical models to
analyze and draw inferences from patterns
in data.
Machine learning is learning from patterns in data

Deep learning: part of a broader family of


machine learning methods that imitates
the workings of the human brain in
processing data and creating patterns for
use in decision making.
Materials scientists have long noticed patterns in data
Empirical relationships are correlations that may not be supported by theory
Materials scientists are using nearly all types of machine learning
There are multiple types of machine learning algorithms available

Ensemble Techniques:
• Random forest
• Gradient boosted
• Adaboost
• Extra Trees

Bayesian:
• Kriging or GP
• Gaussian RF
• Bayesian NN

Neural Networks:
• ANN
• GAN
• CNN

Support Vector Machine:
• SVR
• Linear SVR

Linear Models:
• Lasso
• Ridge
• K nearest neighbors
There are multiple types of machine learning algorithms available

Ensemble Techniques:
• Fast learners
• Efficient (parallelization)
• Non-linear
• Problem with extrapolation
• Feature weights

Bayesian:
• Works well with small data
• Includes uncertainty
• "Physics informed" as priors utilized

Neural Networks:
• Fast (GPU)
• Feature-free
• GANs
• High accuracy
• Blackest box
• Overfitting

Support Vector Machine:
• Kernel selection
• Good metrics
• Hinge loss
• Scales poorly

Linear Models:
• Interpretable
• Fast
• Not suitable for many problems (linear vs non-linear)
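A minimal sketch comparing one model from several of these families on the same regression task with scikit-learn; the synthetic data and hyperparameters are illustrative placeholders for a real featurized materials dataset.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))                    # stand-in feature matrix
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=200)

models = {
    "random forest":    RandomForestRegressor(n_estimators=200, random_state=0),
    "gaussian process": GaussianProcessRegressor(),
    "SVR":              SVR(kernel="rbf"),
    "ridge":            Ridge(alpha=1.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>16}: R2 = {scores.mean():.2f} +/- {scores.std():.2f}")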
Case study: Superhard materials
Superhard materials have important commercial applications
Diamond is the ultimate superhard material
High hardness materials can be classified in two groups
(i) Containing only light elements like B, C, N, O, Si
Examples: c-BN, BC2N, B6O. Preparation of these materials requires extreme pressures of ≈15 GPa and temperatures >2100 °C.

Wentorf J. Chem. Phys. 1961, 34, 809.


He et al. Appl. Phys. Lett. 2002, 81, 643.

Solozhenko et al. Appl. Phys. Lett. 2001, 78, 1385.

(ii) Containing a transition metal and main-group elements (B, N, C)
Examples: ReB2, WB4. Synthesis tends to require only elevated temperatures between 1500 °C and 2000 °C.

Kaner and co-workers, Science 2007, 316, 436.
Kaner and co-workers, PNAS 2011, 108, 10958.

Example 1: Using a machine learning model to identify new materials
Task: Predict a material property using regression
Machine learning can screen for new materials!
Hardness measurement databases don't exist, so we need a proxy

Mansouri et al. (2017) Int. Mat. Man. Innov.


Bulk and shear modulus should serve as a good proxy

[Figure: hardness correlates with the combined proxy G³/B₀² (GPa), built from bulk modulus B₀ (GPa) and shear modulus G (GPa)]

Chen et al. (2011) Intermetallics
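The Chen et al. (2011) model cited above is what connects these moduli to hardness, and explains why G³/B₀² works as a proxy. A minimal sketch, using approximate literature moduli for diamond as a sanity check:

def vickers_hardness_chen(bulk_gpa, shear_gpa):
    """Chen et al. (2011): Hv = 2(k^2 G)^0.585 - 3, with Pugh's ratio k = G/B.
    Note k^2 G = G^3/B^2, the combined proxy on this slide."""
    k = shear_gpa / bulk_gpa
    return 2.0 * (k ** 2 * shear_gpa) ** 0.585 - 3.0

# Diamond: B ~ 443 GPa, G ~ 535 GPa (approximate literature values)
print(f"{vickers_hardness_chen(443, 535):.0f} GPa")  # ~95 GPa, i.e. superhard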


Bulk and shear modulus should be good proxies for hardness

https://materialsproject.org/

Mansouri et al. (2018) JACS


Feature selection is critical for predictions

Mansouri et al. (2018) JACS


Features are either composition-based or structure-based

Composition-based features vs Structure-based features


Not all features are equally important
Many algorithms and hyperparameters possible for training
A model must first be validated prior to use for predictions
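A minimal cross-validation sketch producing the same style of RMSE_CV and R²_CV metrics reported on the next slide; the data and model here are stand-ins, not the paper's pipeline.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 10))                               # stand-in features
y = 200 * X[:, 0] + 50 * X[:, 1] + 5 * rng.normal(size=500)   # stand-in moduli (GPa)

cv = cross_validate(
    GradientBoostingRegressor(random_state=0), X, y, cv=10,
    scoring=("neg_root_mean_squared_error", "r2"),
)
print(f"RMSE_CV = {-cv['test_neg_root_mean_squared_error'].mean():.2f} GPa")
print(f"R2_CV   = {cv['test_r2'].mean():.2f}")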
Bulk modulus predictions perform better than shear modulus

Bulk modulus: RMSE_CV = 17.21 GPa, R²_CV = 0.94
Shear modulus: RMSE_CV = 16.35 GPa, R²_CV = 0.84

Mansouri et al. (2018) JACS


Bulk and shear modulus predicted for entire Pearson database

Prediction of B and G for 15,770 binaries, 56,266 ternaries, and 46,251 quaternaries from Pearson's Crystal Database (PCD)

118,287 compounds predicted in less than 30 seconds!

Using an Intel® Core™ i5-4690K CPU @ 3.50 GHz (Windows 10 PC)

Mansouri et al. (2018) JACS
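A sketch of why the screening step is so fast: once validated, the fitted models score the whole candidate table in one vectorized pass. The feature table and freshly fitted stand-in models below are hypothetical placeholders for the paper's trained models.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X_train = rng.uniform(size=(500, 10))        # stand-in featurized training set
bulk_model = GradientBoostingRegressor().fit(X_train, rng.uniform(50, 400, 500))
shear_model = GradientBoostingRegressor().fit(X_train, rng.uniform(20, 300, 500))

candidates = rng.uniform(size=(118_287, 10)) # one feature row per PCD entry
predictions = pd.DataFrame({
    "B_pred_GPa": bulk_model.predict(candidates),   # a single vectorized pass
    "G_pred_GPa": shear_model.predict(candidates),
})
print(len(predictions), "compounds scored")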


Superhard materials need to be both incompressible and rigid

Mansouri et al. (2018) JACS


Known superhard materials show up in the top right quadrant

[Scatter plot of predicted G vs. B: known superhard materials BN, WB4, TaC, WC, ReB2, and OsB2 all fall in the top-right quadrant]

Mansouri et al. (2018) JACS
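The quadrant selection itself is a simple joint filter on the two predictions. A minimal sketch; the B values echo the slide's predictions, while the G values and cutoffs are invented for illustration, not the paper's thresholds.

import pandas as pd

predictions = pd.DataFrame({
    "formula":    ["ReWC", "MoWBC", "NaCl"],
    "B_pred_GPa": [370, 398, 25],
    "G_pred_GPa": [240, 260, 15],
})

B_MIN, G_MIN = 300, 200   # GPa; assumed "top-right quadrant" cutoffs
mask = (predictions["B_pred_GPa"] > B_MIN) & (predictions["G_pred_GPa"] > G_MIN)
print(predictions[mask])  # ReWC and MoWBC survive; NaCl does not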


New alternative superhard candidates identified

Re-W-C system

Mo-W-B-C system

Mansouri et al. (2018) JACS


Materials could be synthesized easily at ambient pressure

Mansouri et al. (2018) JACS


Rapid discovery of two new superhard materials at low load

Mo-W-B-C Re-W-C

Mansouri et al. (2018) JACS


High pressure synchrotron measurements confirmed bulk modulus

Mansouri et al. (2018) JACS


We can track volume as pressure is increased

Mansouri et al. (2018) JACS


Bulk modulus determined by fitting volume as function of pressure
Third-order Birch-Murnaghan Equation of State
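The third-order Birch-Murnaghan EOS expresses pressure as a function of compression: P(V) = (3B₀/2)[(V₀/V)^(7/3) - (V₀/V)^(5/3)] × {1 + (3/4)(B₀' - 4)[(V₀/V)^(2/3) - 1]}. A sketch of the fitting step with scipy follows; the data here is synthetic, whereas the slide's values came from the synchrotron measurements.

import numpy as np
from scipy.optimize import curve_fit

def birch_murnaghan_3(V, V0, B0, Bp):
    """Pressure (GPa) at volume V, given V0, B0 (GPa), and Bp = B0'."""
    eta = (V0 / V) ** (1.0 / 3.0)
    return 1.5 * B0 * (eta ** 7 - eta ** 5) * (1 + 0.75 * (Bp - 4) * (eta ** 2 - 1))

# Synthetic "measurement" of a 372 GPa material (like ReWC below)
V = np.linspace(9.0, 10.0, 15)                         # unit-cell volumes (A^3)
P = birch_murnaghan_3(V, 10.0, 372.0, 4.0)
P += np.random.default_rng(5).normal(0, 0.5, P.size)   # measurement noise

(V0, B0, Bp), _ = curve_fit(birch_murnaghan_3, V, P, p0=(10.0, 300.0, 4.0))
print(f"B0 = {B0:.0f} GPa, B0' = {Bp:.2f}")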

ReWC: ML predicted 370 GPa, experimental 372 ± 3.6 GPa
MoWBC: ML predicted 398 GPa, experimental 380 ± 8.1 GPa

Mansouri et al. (2018) JACS


There are even more types!

https://machinelearningmastery.com/types-of-learning-in-machine-learning/
Unsupervised learning includes clustering, density estimation, visualization, and projection
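A minimal unsupervised-learning sketch: k-means clustering on a stand-in feature matrix, with PCA as the projection step; the two synthetic groups are illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(4, 1, (50, 8))])  # two groups

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)   # projection for visualization
print(np.bincount(labels))                    # cluster sizes, e.g. [50 50]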
Reinforcement learning differs from supervised learning

An agent operates in an environment and must learn to act using feedback


- No fixed dataset and the feedback may be delayed or noisy
Hybrid learning blurs lines between learning types

https://machinelearningmastery.com/types-of-learning-in-machine-learning/
Semi-supervised learning requires making the most of only partially labeled data

Algorithms are used to learn relationships between labeled and unlabeled data so that all of the data can be used
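A minimal semi-supervised sketch with scikit-learn's LabelSpreading: unlabeled points are marked -1 and labels propagate through the data. The two synthetic clusters are illustrative.

import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
y = np.full(60, -1)       # -1 marks a point as unlabeled
y[0], y[30] = 0, 1        # only two ground-truth labels

model = LabelSpreading().fit(X, y)
print(model.transduction_[:5], model.transduction_[30:35])  # inferred labels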
Self-supervised learning is unsupervised learning framed as supervised learning problem
Supervised learning algorithms are used to solve an alternate or pretext task, the result of which is a model or representation that can be used to solve the original (actual) modeling problem.

Example pretext tasks: colorization, inpainting
Multi-instance learning is supervised learning where “bags” of samples are labeled

Members of the “Perovskite” bag all contain some shared attributes along with some non-shared attributes.

Q: which attributes are essential to the "Perovskite" bag?


Inference refers to reaching an outcome or making a decision

https://machinelearningmastery.com/types-of-learning-in-machine-learning/
Inductive vs deductive learning are opposites

Inductive learning is learning general rules from specific examples.


Deductive learning is learning specific examples from general rules.
Transductive learning is predicting specific examples from specific examples.

Inductive learning:
- Model learns the general rules
- Draw general conclusions about future from past examples
- Fitting the ML model

Deductive learning:
- Top-down reasoning requiring all premises to be met before reaching a conclusion
- Using the ML model for inference

Transductive learning:
- Better predictions with few labeled points
- No predictive model is built; each new prediction requires a full calculation again
Inference refers to reaching an outcome or making a decision

https://machinelearningmastery.com/types-of-learning-in-machine-learning/
Multi-task learning is fitting a model to one dataset with multiple related problems

Training models together is more than just efficient; it should improve overall performance!
- Useful when a dataset has abundant labeled data for one task but much less for another
- This lets us "borrow statistical strength" from tasks with lots of data and share it with tasks with little data
- Improves model generalizability (see the sketch below)
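A minimal multi-task sketch: MultiTaskLasso fits several related targets (e.g., two related material properties) with a shared sparsity pattern, borrowing strength across tasks; the synthetic data is illustrative.

import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 20))
W = np.zeros((20, 2))
W[:5] = rng.normal(size=(5, 2))              # only 5 features matter, shared by both tasks
Y = X @ W + 0.1 * rng.normal(size=(100, 2))  # two related target properties

model = MultiTaskLasso(alpha=0.05).fit(X, Y)
print((model.coef_ != 0).sum(axis=1))        # nonzero coefficients found per task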
Active learning allows for very efficient learning when new data points are expensive

Active learning is a technique where the model is able to query an oracle during the learning process in order to resolve ambiguity.
- Well-suited to small datasets where new data is expensive to generate or label
- Very efficient learner, since the model can ignore examples it already understands well
- Similar to semi-supervised learning, except new ground-truth labels are generated instead of relying on models to label the unlabeled data
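One common query heuristic, sketched minimally below: use the spread of per-tree predictions in a random forest as an uncertainty estimate and query the most uncertain candidate next (a query-by-committee-style choice, assumed here for illustration).

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
X_train = rng.uniform(-1, 1, (20, 3))                  # the few labeled experiments
y_train = X_train[:, 0] ** 2 + 0.05 * rng.normal(size=20)
X_pool = rng.uniform(-1, 1, (500, 3))                  # unlabeled candidate pool

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
per_tree = np.stack([tree.predict(X_pool) for tree in forest.estimators_])
query_idx = per_tree.std(axis=0).argmax()              # most disagreement = query next
print("next experiment:", X_pool[query_idx])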
Online learning involves continually updating the model after each data point is acquired
Online learning involves using the data available and updating the model directly before
a prediction is required or after the last observation was made.
- Well-suited to sequential datasets where new data could be changing over time
(consider shoe sales as a fad comes and goes)
- Possibly subject to catastrophic interference (catastrophic forgetting)
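A minimal online-learning sketch: scikit-learn's SGDRegressor updates its weights with partial_fit as each new observation streams in; the stream here is synthetic.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(10)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

for t in range(1000):                      # observations arrive one at a time
    x = rng.normal(size=(1, 3))
    y = np.array([2 * x[0, 0] - x[0, 1]])  # the underlying relationship
    model.partial_fit(x, y)                # update in place; no retraining from scratch

print(model.coef_.round(2))                # drifts toward [2, -1, 0]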
Transfer learning is when a trained model can be applied to another related task

In transfer learning, the learner must perform two or more different tasks, but we assume
that many of the factors that explain the variations in task 1 are relevant to the variations
that need to be captured for learning subsequent tasks.
- Well-suited for instances when first task has extensive data, but subsequent tasks have
only limited data.
- Differs from multi-task learning by sequentially learning the different tasks
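One simple flavor of transfer, chosen here for illustration (not the only approach): learn a representation on the data-rich task and reuse it as the input features for the data-poor task.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(11)
X_big = rng.normal(size=(5000, 50))     # task 1: plenty of data
X_small = rng.normal(size=(30, 50))     # task 2: only 30 labeled examples
y_small = X_small[:, 0] + 0.1 * rng.normal(size=30)

representation = PCA(n_components=10).fit(X_big)                 # learned on task 1
model = Ridge().fit(representation.transform(X_small), y_small)  # reused on task 2
print(model.score(representation.transform(X_small), y_small))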
Ensemble learning is when multiple models are trained on data and the results are combined
The objective of ensemble learning is to achieve better performance with the ensemble
of models as compared to any individual model. This involves both deciding how to
create models used in the ensemble and how to best combine the predictions from the
ensemble members.
- Takes advantage of pros/cons of each algorithm or model type
- Can provide additional measure of uncertainty
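A minimal ensemble sketch: average the predictions of different model types and use their spread as a rough uncertainty measure; the simple mean combiner and synthetic data are illustrative choices.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(12)
X = rng.uniform(size=(200, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.05 * rng.normal(size=200)

members = [RandomForestRegressor(random_state=0), Ridge(), SVR()]
member_preds = np.stack([m.fit(X, y).predict(X) for m in members])

ensemble_mean = member_preds.mean(axis=0)   # the combined prediction
ensemble_std = member_preds.std(axis=0)     # disagreement as a rough uncertainty
print(ensemble_mean[:3].round(2), ensemble_std[:3].round(2))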
