Cammd Aims Lecture1


Materials Informatics

What is Machine Learning (ML)?


Rohit Batra

LECTURE 1 (Forenoon, June 7, 2024)


Machine learning and Artificial Intelligence in Materials Science

ML aims to
• develop models based on available data
• make predictions for new situations, and
• continually improve as new data becomes available
Let's start with a survey

How many of you


1. have trained an ML model to solve a materials-related problem?
2. have trained an ML model?
3. have taken some course on ML (online or in-person)?
4. have some vague idea about ML (e.g., through YouTube)?
5. do not know anything about ML?
6. belong to an experimental research group?

Rohit Batra, Materials Informatics Lab, MME, IIT Madras 2


ChatGPT – How does it work?

• Uses past data to learn the co-occurrence of words
• Uses features to understand the user query
• Predicts the next word

Source: https://chat.openai.com/
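The "learn co-occurrence, then predict the next word" idea can be illustrated with a toy bigram model: count which word follows which in past text, then predict the most frequent follower. This is only a sketch under a drastic simplification; ChatGPT itself uses a large neural network rather than raw word counts. The corpus and the function names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if `word` is unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus standing in for "past data".
corpus = (
    "machine learning models learn from data . "
    "machine learning models make predictions . "
    "materials data help machine learning ."
)
model = train_bigrams(corpus)
print(predict_next(model, "machine"))   # learning
print(predict_next(model, "learning"))  # models
```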



Movie Recommendation – The Netflix Problem

Use data on past user activity to identify user preferences

Courtesy: Learning from Data, Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin
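A common way to learn preferences from past ratings is matrix factorization: each user and each movie gets a short vector of latent factors, and a predicted rating is their dot product. The sketch below is one minimal version, not the method from the cited book. The 4 × 4 rating matrix is hypothetical (0 marks a movie the user has not rated), and the number of factors, learning rate, regularization and epoch count are untuned assumptions.

```python
import numpy as np


def factorize(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors U and movie factors V so that U @ V.T approximates R.

    Only entries with R > 0 are treated as observed ratings (assumption).
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_movies, k))
    rated = np.argwhere(R > 0)
    for _ in range(epochs):
        for u, m in rated:
            err = R[u, m] - U[u] @ V[m]              # error on one known rating
            u_old = U[u].copy()
            U[u] += lr * (err * V[m] - reg * U[u])   # stochastic gradient step
            V[m] += lr * (err * u_old - reg * V[m])
    return U, V


# Hypothetical ratings (1-5 stars); rows are users, columns are movies.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

U, V = factorize(R)
pred = U @ V.T
mask = R > 0
print("train RMSE:", np.sqrt(np.mean((R - pred)[mask] ** 2)))
print("guess for user 0, movie 2:", pred[0, 2])
```

The entries of `pred` where `R` is 0 are the model's guesses for movies a user has not rated yet.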



Another Example – Loan Approval

How large a loan should the bank approve?

Courtesy: Learning from Data, Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin



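Loan Approval as an ML problem
Convert “learning” into a function-finding problem.

Historical data (Persons 1 to N) and a new customer application (Person N+1):

| | x1 (age) | x2 (gender) | x3 (salary) | x4 (debt) | ... | y = f(x1, x2, ...) |
|---|---|---|---|---|---|---|
| Person 1 | ... | ... | ... | ... | ... | ... |
| Person 2 | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
| Person N | ... | ... | ... | ... | ... | ... |
| Person N+1 | 32 | male | 40,000 | 26,000 | ... | ? |

Model (hypothesis): loan amount for Person N+1 = $y_{N+1} = f(x_{N+1,1}, x_{N+1,2}, \ldots)$

Output (prediction): $y_{N+1}$. Input (customer application): the features $x_{N+1,1}, x_{N+1,2}, \ldots$ of Person N+1.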
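To make "learning as function finding" concrete, the sketch below picks one simple hypothesis family, k-nearest neighbours: the predicted loan amount for a new applicant is the average amount granted to the k most similar past applicants. This choice is an assumption, not the lecture's model. The historical records are synthetic, generated from an assumed rule so the example runs, gender is encoded as 0/1, and features are standardized so that salary does not dominate the distance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "historical data": columns are age, gender (1 = male, 0 = female),
# salary and debt. The loan rule below is an assumption used only to create targets.
N = 200
age = rng.uniform(21, 65, N)
gender = rng.integers(0, 2, N)
salary = rng.uniform(20_000, 120_000, N)
debt = rng.uniform(0, 50_000, N)
X_hist = np.column_stack([age, gender, salary, debt])
y_hist = np.clip(0.5 * salary - 0.8 * debt, 0, None)  # hypothetical ground truth


def knn_predict(X, y, x_new, k=5):
    """Hypothesis f: average target of the k nearest historical records."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma               # standardize each feature
    z_new = (x_new - mu) / sigma
    dist = np.linalg.norm(Z - z_new, axis=1)
    nearest = np.argsort(dist)[:k]
    return y[nearest].mean()


# Person N+1 from the slide: age 32, male, salary 40,000, debt 26,000.
x_new = np.array([32, 1, 40_000, 26_000])
print(knn_predict(X_hist, y_hist, x_new))
```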



Machine Learning in Materials Science

Example dataset:

| Material | Property value |
|---|---|
| Material 1 | P1 |
| Material 2 | P2 |
| ... | ... |
| Material N | PN |

Fingerprinting (descriptors) and learning:

| Material | Fingerprint | Property value |
|---|---|---|
| Material 1 | x11, x12, …, x1M | P1 |
| Material 2 | x21, x22, …, x2M | P2 |
| ... | ... | ... |
| Material N | xN1, xN2, …, xNM | PN |

The learning problem: for a new Material X, the property value is unknown (?) and must be predicted.

Fingerprinting → Learning: $f(x_{i1}, x_{i2}, \ldots, x_{iM}) = P_i$

• ML model inputs: fingerprints/features/descriptors
• ML model: prediction model (NN, Bayesian (GPR), RF, etc.)
• ML model output: prediction

Useful for quick and efficient materials discovery.
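A minimal sketch of the fingerprinting and learning steps, assuming a deliberately simple fingerprint: the atomic fraction of each element from a fixed, hypothetical element list, so each material i becomes a row $(x_{i1}, \ldots, x_{iM})$. A linear least-squares model stands in for the NN/GPR/RF models named above. The formulas are real compounds, but the property values are synthetic, generated from assumed per-element contributions.

```python
import re

import numpy as np

# Hypothetical element list; its length is the fingerprint length M.
ELEMENTS = ["Li", "P", "S", "O", "Ge"]


def fingerprint(formula):
    """Atomic-fraction fingerprint, e.g. 'Li2O' -> [2/3, 0, 0, 1/3, 0]."""
    counts = dict.fromkeys(ELEMENTS, 0.0)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0  # KeyError if not in ELEMENTS
    x = np.array([counts[e] for e in ELEMENTS])
    return x / x.sum()


# Example dataset: real formulas, synthetic property values P_i.
materials = ["Li2O", "Li2S", "Li3PS4", "Li3PO4", "Li10GeP2S12", "GeO2"]
w_true = np.array([1.0, -0.5, 2.0, 0.3, 1.5])      # assumed per-element contributions
X = np.array([fingerprint(m) for m in materials])   # N x M fingerprint matrix
P = X @ w_true

# Learning: fit f(x_i1, ..., x_iM) = P_i with linear least squares.
w_fit, *_ = np.linalg.lstsq(X, P, rcond=None)

# Prediction for a new "Material X".
print(fingerprint("Li7P3S11") @ w_fit)
```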



Types of ML problems



Supervised Learning

[Figure: new data point]

Model(Feature) = Prediction: $f(x_{i1}, x_{i2}, \ldots, x_{iM}) = P_i$
Unsupervised Learning

[Figure: data plotted along Dimension 1 and Dimension 2]

Instead of Model(Feature) = Prediction, we have Model(Feature) = NewFeature: $f(x_{i1}, x_{i2}, \ldots, x_{iM}) = x'_i$
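Principal component analysis (PCA) is one common instance of Model(Feature) = NewFeature: it maps each M-dimensional fingerprint to a few new coordinates (such as "Dimension 1" and "Dimension 2") along the directions of largest variance, without using any property labels. A minimal NumPy sketch via the singular value decomposition, on synthetic data that is assumed to vary mainly along two hidden directions:

```python
import numpy as np


def pca(X, n_components=2):
    """Project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # directions of largest variance
    X_new = Xc @ components.T                    # new features x'_i
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return X_new, explained


rng = np.random.default_rng(0)
# Synthetic data: 100 samples with M = 5 features driven by 2 hidden directions.
latent = rng.standard_normal((100, 2))
mixing = rng.standard_normal((2, 5))
X = latent @ mixing + 0.01 * rng.standard_normal((100, 5))

X_new, explained = pca(X)
print(X_new.shape)       # (100, 2)
print(explained.sum())   # fraction of variance kept by the two new dimensions
```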
A Real Example – Solid-state Li conductors
Solid-state Li-ion conductors
Y. Zhang et al., Nature Communications 10, 5260 (2019)

Machine learning methods are a great tool for accelerating materials discovery.


A Real Example – Polymer Discovery
Polymers for high-temperature, high-energy capacitors
R. Batra et al., Chemistry of Materials 32, 24, 10489 (2020)

[Figure panels: Known dataset; ML predictions; Example designed polymers. Design goal: high Tg (in K) and Eg (in eV). Values within brackets indicate 1σ deviation.]

Machine learning methods are a great tool for accelerating materials discovery.



A Real Example: Self-driving Labs
Autonomous closed-loop experiments (manuscript under review)

Experimental space (optimization parameters):

| Input feature | Type | # of possibilities |
|---|---|---|
| SA: Solvent DMSO | Continuous | 6 (discretization) |
| SB: Solvent EtG | Continuous | 6 (discretization) |
| TC: Coating temp. | Continuous | 9 (discretization) |
| V: Coating speed | Continuous | 10 (discretization) |
| SP: Post-treatment sol. | Categorical | 13 |
| ST: Post-treatment temp. | Continuous | 16 (discretization) |
| SV: Post-treatment speed | Continuous | 4 (discretization) |

ML algorithm:
• Gaussian process regression
• Bayesian optimization (expected improvement)
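The discretized space in the table has 6 × 6 × 9 × 10 × 13 × 16 × 4 = 2,695,680 candidate recipes, far too many to test exhaustively, which is why an acquisition function is used to choose the next experiment. The sketch below runs a closed loop of Gaussian process regression plus expected improvement on a 1-D toy problem. The RBF kernel, its length scale, the noise level and the objective function are all assumptions for illustration, not the settings used in the self-driving lab.

```python
import math

import numpy as np

# Size of the discretized experimental space in the table above.
n_candidates = math.prod([6, 6, 9, 10, 13, 16, 4])
print(n_candidates)  # 2695680


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed length scale)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """GP regression posterior mean and standard deviation (zero prior mean)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # k(x, x) = 1 for RBF
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization: E[max(f - best - xi, 0)] under the GP posterior."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def objective(x):
    """Hypothetical stand-in for a noise-free experiment to be maximized."""
    return np.sin(6 * x) * x


grid = np.linspace(0, 1, 201)
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = objective(x_obs)
for _ in range(10):  # closed loop: fit GP, pick max-EI point, "run" the experiment
    mean, std = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mean, std, y_obs.max()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print(x_obs[np.argmax(y_obs)], y_obs.max())
```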



A Real Example: Alloys Data Extraction
Input to the model (context), a paragraph from a journal article:

The decomposition of an equiatomic AlCoCrCuFeNi high-entropy alloy produced by splat quenching and casting was investigated by the analytical high-resolution methods: transmission electron microscopy and three-dimensional atom probe. It could be shown that splat-quenched alloy consisted of an imperfectly ordered body-centred cubic phase with a domain-like structure, whereas normally cast alloy formed several phases of cubic crystal structure. The cast alloy decomposed into both dendrites and interdendrites. A detailed local compositional analysis carried out by atom probe within the dendrites revealed that the alloying elements in the Ni–Al-rich plates and Cr–Fe-rich interplates are not randomly distributed, but segregate and form areas with pronounced compositional fluctuations.

Questions posed to a large language model (GPT4, Llama3), with its answers:
Q1. Extract the alloy compositions. → Al2CoCrCuFeNi
Q2. Extract the crystal structure of alloy Al2CoCrCuFeNi. → BCC
Q3. Extract the processing conditions of alloy Al2CoCrCuFeNi. → Splat quenching & casting

LLM Based Extraction Pipeline

[Figure: journal abstracts (e.g., one reporting the crystal structure, microstructure, density and Vickers hardness of the alloys NbTiVZr, NbTiV2Zr, CrNbTiZr and CrNbTiVZr) are passed to a large language model (GPT4, Llama3), which fills an output alloy dataset]

Output alloy dataset:

| Alloy | Structure | Processing |
|---|---|---|
| Al2CoCrCuFeNi | BCC | Splat quenching |
| NbTiVZr | BCC | Arc melting |
| … | … | … |
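The question-answering loop of the extraction pipeline can be sketched as follows. `ask_llm` is a hypothetical placeholder for a call to a model such as GPT4 or Llama3; no real API is used here. The stub `fake_llm` returns canned answers copied from the Q1 to Q3 example so the sketch runs offline, and the assumption that compositions come back as a comma-separated list is also illustrative.

```python
from typing import Callable, Dict, List

# Follow-up questions asked once per extracted alloy (wording mirrors Q2 and Q3).
QUESTIONS = {
    "structure": "Extract the crystal structure of alloy {alloy}.",
    "processing": "Extract the processing conditions of alloy {alloy}.",
}


def extract_alloys(paragraph: str, ask_llm: Callable[[str, str], str]) -> List[Dict[str, str]]:
    """Build one dataset row per alloy mentioned in a paragraph."""
    answer = ask_llm(paragraph, "Extract the alloy compositions.")
    alloys = [a.strip() for a in answer.split(",") if a.strip()]  # assumed answer format
    rows = []
    for alloy in alloys:
        row = {"alloy": alloy}
        for field, template in QUESTIONS.items():
            row[field] = ask_llm(paragraph, template.format(alloy=alloy))
        rows.append(row)
    return rows


def fake_llm(context: str, question: str) -> str:
    """Hypothetical LLM stand-in; answers are copied from the example above."""
    canned = {
        "Extract the alloy compositions.": "Al2CoCrCuFeNi",
        "Extract the crystal structure of alloy Al2CoCrCuFeNi.": "BCC",
        "Extract the processing conditions of alloy Al2CoCrCuFeNi.": "Splat quenching & casting",
    }
    return canned.get(question, "unknown")


paragraph = "The decomposition of an equiatomic AlCoCrCuFeNi high-entropy alloy ..."
print(extract_alloys(paragraph, fake_llm))
```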
Is Learning Always Feasible?

Often, many hypotheses (models f) can fit the same data while giving different predictions for new inputs.
Having more data MAY solve this…
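A tiny worked example: with the three data points (0, 0), (1, 1) and (2, 2), both f1(x) = x and f2(x) = x + x(x − 1)(x − 2) fit the data exactly, because the added cubic term vanishes at x = 0, 1 and 2. Yet they disagree at x = 3. The functions and data are illustrative only.

```python
def f1(x):
    return x


def f2(x):
    # The extra term is zero at x = 0, 1, 2, so the fit to the data is unchanged.
    return x + x * (x - 1) * (x - 2)


data = [(0, 0), (1, 1), (2, 2)]
assert all(f1(x) == y and f2(x) == y for x, y in data)  # both fit perfectly
print(f1(3), f2(3))  # 3 9
```

Observing a fourth point such as (3, 3) would rule out f2, but infinitely many other hypotheses would still fit, which is why more data only "MAY" resolve the ambiguity.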



Machine Learning Pipeline
Example dataset:

| Material | Property value |
|---|---|
| Material 1 | P1 |
| Material 2 | P2 |
| ... | ... |
| Material N | PN |

Fingerprinting (descriptors) and learning:

| Material | Fingerprint | Property value |
|---|---|---|
| Material 1 | x11, x12, …, x1M | P1 |
| Material 2 | x21, x22, …, x2M | P2 |
| ... | ... | ... |
| Material N | xN1, xN2, …, xNM | PN |

ML pipeline in practice:

1. Collect data → 2. Fingerprint materials → 3. Train model → 4. Make prediction



Supervised Learning – Linear Regression

We assume that a linear model can explain the data.

1-Dimensional example: $\hat{y} = f(\mathbf{x}) = w_0 + w_1 x$

where
$y$ = known target property value
$x$ = known feature value
$w_1$ = slope
$w_0$ = intercept

[Figure: best-fit linear model through data in the x–y plane, with a new data point]

How do we find the parameters $\hat{\mathbf{w}}$ of the line that best fits the data?

See derivation for 1-D linear regression model
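A compact version of the 1-D derivation, assuming the mean square error is the quantity being minimized and writing $\bar{x}$, $\bar{y}$ for the sample means: set both partial derivatives to zero and solve.

```latex
\begin{aligned}
E(w_0, w_1) &= \frac{1}{N}\sum_{i=1}^{N}\left(y_i - w_0 - w_1 x_i\right)^2 \\
\frac{\partial E}{\partial w_0} &= -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - w_0 - w_1 x_i\right) = 0
  \;\Rightarrow\; w_0 = \bar{y} - w_1 \bar{x} \\
\frac{\partial E}{\partial w_1} &= -\frac{2}{N}\sum_{i=1}^{N} x_i\left(y_i - w_0 - w_1 x_i\right) = 0
  \;\Rightarrow\; w_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}
\end{aligned}
```

The same result follows from the general normal equations with the feature vector $\mathbf{x} = (1, x)$.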



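Error Metrics (Loss/Cost Function)

Linear model with d features:
$\hat{y} = f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_d x_d = \sum_{j=0}^{d} w_j x_j = \mathbf{w}^T \mathbf{x}$

where $\mathbf{x} = (x_0, x_1, x_2, x_3, \ldots, x_d)$ with $x_0 = 1$, and $\mathbf{w} = (w_0, w_1, w_2, w_3, \ldots, w_d)$.

[Figure: 1-dimensional example, errors of the data points from the fitted line in the x–y plane]

Error: $e_i = y_i - \hat{y}_i$

Mean square error: $\frac{1}{N}\sum_{i=1}^{N} (e_i)^2$

Solution that minimizes the training error (normal equations), where $X$ is the $N \times (d+1)$ matrix whose $i$-th row is $\mathbf{x}_i^T$ and $\mathbf{y} = (y_1, \ldots, y_N)$:
$X^T X \hat{\mathbf{w}} = X^T \mathbf{y}$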
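A minimal NumPy sketch of the normal-equation solution: prepend a column of ones to the features so that $w_0$ acts as the intercept, then solve $X^T X \hat{\mathbf{w}} = X^T \mathbf{y}$ and report the mean square error. The data are synthetic, generated from an assumed line $y = 2 + 3x$ plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x = rng.uniform(0, 10, N)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, N)      # synthetic data from an assumed line

X = np.column_stack([np.ones(N), x])            # rows are (x_0 = 1, x_1)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
y_pred = X @ w_hat
mse = np.mean((y - y_pred) ** 2)                # mean square (training) error

print(w_hat)   # approximately [2, 3]
print(mse)     # roughly the noise variance (1.0)
```

Forming $X^T X$ explicitly is fine for a small, well-conditioned problem like this; `np.linalg.lstsq(X, y, rcond=None)` gives the same answer and is numerically safer in general.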
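Other Useful Error Metrics

[Figure: 1-dimensional example, data points and fitted line in the x–y plane]

Error: $e_i = y_i - \hat{y}_i$

Mean square error: $\frac{1}{N}\sum_{i=1}^{N} (e_i)^2$

Coefficient of determination: $R^2 = 1 - \frac{\sum_{i=1}^{N} (e_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of $\mathbf{y}$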



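r² vs R²
1-Dimensional example

Correlation coefficient: $r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}}$, and $r^2$ is its square

Coefficient of determination: $R^2 = 1 - \frac{\sum_{i=1}^{N} (e_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of $\mathbf{y}$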

Source: https://towardsdatascience.com/r²-or-r²-when-to-use-what-4968eee68ed3
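r² only measures how strongly two quantities are linearly correlated, while R² measures how well predictions match the targets. For a least-squares line with intercept fitted to the same data the two coincide; for other predictions they can differ, and R² can even be negative. A small check with synthetic data, where the constant offset added to the fitted values is an assumption chosen to make the difference visible:

```python
import numpy as np


def r2_score(y, y_pred):
    """Coefficient of determination R^2 = 1 - sum(e_i^2) / sum((y_i - mean(y))^2)."""
    return 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


def pearson_r2(a, b):
    """Squared Pearson correlation coefficient between a and b."""
    da, db = a - a.mean(), b - b.mean()
    return np.sum(da * db) ** 2 / (np.sum(da**2) * np.sum(db**2))


rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 2.0, 100)     # synthetic data

w1, w0 = np.polyfit(x, y, 1)                    # least-squares line (slope, intercept)
y_fit = w0 + w1 * x
y_biased = y_fit + 5.0                          # same trend, shifted by a constant

print(pearson_r2(x, y), r2_score(y, y_fit))            # equal for the least-squares fit
print(pearson_r2(y, y_biased), r2_score(y, y_biased))  # r^2 unchanged, R^2 drops
```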



Cross-Validation (CV)
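Example: four-fold CV on all training data. In each CV iteration, one fold is held out for validation and the other three are used for CV training:

| CV iteration | Fold 1 | Fold 2 | Fold 3 | Fold 4 |
|---|---|---|---|---|
| 1 | Validation | CV training | CV training | CV training |
| 2 | CV training | Validation | CV training | CV training |
| 3 | CV training | CV training | Validation | CV training |
| 4 | CV training | CV training | CV training | Validation |

Benefits:
• Estimate model performance
• Hyper-parameter optimization

The final model is trained using the hyper-parameters selected by CV.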
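A minimal sketch of four-fold CV used for hyper-parameter selection. Here the hyper-parameter is assumed to be the degree of a polynomial fitted with `np.polyfit`, and the data are synthetic (a quadratic plus noise). Each fold serves once as the validation set, the validation MSE is averaged over the four iterations, and the final model is retrained on all training data with the best degree.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def cv_mse(x, y, degree, k=4, seed=0):
    """Average validation MSE of a polynomial fit of the given degree over k folds."""
    folds = kfold_indices(len(x), k, np.random.default_rng(seed))
    errors = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)   # CV training
        pred = np.polyval(coeffs, x[val])                  # validation
        errors.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 60)
y = x**2 + rng.normal(0, 0.5, 60)                          # synthetic data

degrees = [1, 2, 3, 5, 8]
scores = {d: cv_mse(x, y, d) for d in degrees}
best = min(scores, key=scores.get)
final_model = np.polyfit(x, y, best)                       # retrain on all training data
print(scores, best)
```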



When (not) to use ML?
To use when:
• fundamental laws do not exist (e.g., social networks)

• laws exist but are too complex to solve (Chess/Go; weather, quantum behavior)

• the goal is to find simple relations in data (structure-property relations)

• running self-driving computational and experimental labs (autonomous, overcomes bias)

Not to use when:

• experiments can be performed readily

• the design space is already well explored

• there is no data (or the data are extremely noisy)



Suggested Reading
