Transparent Models
with Machine-Learning
2022
Biography
Guillaume Béraud-Sudreau
Chief Actuary & Co-Founder of Akur8

Guillaume is the Chief Actuary and Co-Founder of Akur8. He has both a data science and an actuarial background. Guillaume started researching the potential of AI for insurance pricing as Head of Pricing R&D at AXA Global Direct, before being incubated at Kamet Ventures and founding Akur8.

Guillaume is a Fellow of the French Institute of Actuaries and holds Master's degrees in Actuarial Science, Cognitive Science and Engineering from Institut des Actuaires, École normale supérieure, and Télécom Paris.
Actuarial Modeling
Actuarial Modeling: Capturing Non-Linearities
What GLMs Offer…
Generalized Linear Models ("GLMs") are, by definition, linear. They are easy to fit, as only one parameter has to be found for every variable.

…What we want
We want to capture the non-linear relations between the explanatory and predicted variables. Such models are hard to fit because, for every variable, a large number of parameters has to be found.
GLMs and Additive Models equivalence
[Diagram: linear models fit one coefficient per variable; non-linear models expand each variable into many transformations - e.g. Driver Age becomes the indicators Driver Age=16, Driver Age=17, …, Driver Age=26 - and fit one coefficient per transformation.]
GLMs and Additive Models are equivalent: coefficients are built for different values of the explanatory variables.
However, creating a non-linear model requires building overfitting control into the fitting process. This can be done by either:
● controlling the transformations created, or
● leveraging credibility in the fitting process.
Creating a GLM to capture non-linear relationships
All regression models are built around the same main principle:
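The slide's formula is not reproduced in this extract; presumably it is the standard maximum-likelihood objective:

$$\hat{\beta} \;=\; \operatorname*{argmax}_{\beta}\; \log \mathcal{L}(y \mid X, \beta)$$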
However, maximizing the likelihood on hundreds of parameters would lead to overfitting, which needs to be controlled.
Two main approaches are used by the actuarial community:
Manage the number of parameters by carefully selecting which transformations are used:
● Polynomials
● Groupings
● …

Integrate priors on the coefficients into the model creation:
● The priors will be directly included in the likelihood optimization.
● They will reduce the complexity of the models created.
Modeling with variable transformations
Original Variables → Heavy Data-Preparation → Transformed Variables → GLM Modeling → Coefficients → Aggregation into a GAM → Functional Effects

Transformed Variable   Coefficient
Driver Age             -2.50
Driver Age²             0.10
Driver Age³            -0.02
Annual Mileage          1.20
Annual Mileage²         0.30
Annual Mileage³        -0.01
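A minimal sketch of this pipeline on synthetic data (the column names and toy target are hypothetical; the deck's actual implementation is not shown):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical portfolio: driver age and annual mileage with a noisy target.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "driver_age": rng.integers(18, 80, size=1_000).astype(float),
    "annual_mileage": rng.uniform(1.0, 30.0, size=1_000),
})
y = 0.3 - 0.01 * raw["driver_age"] + 0.02 * raw["annual_mileage"] \
    + rng.normal(0.0, 0.05, size=1_000)

# Heavy data preparation: one polynomial transformation per variable and degree.
transformed = pd.DataFrame({
    f"{col}^{deg}": raw[col] ** deg
    for col in raw.columns for deg in (1, 2, 3)
})

# GLM modeling: a Gaussian GLM with identity link is ordinary least squares.
glm = LinearRegression().fit(transformed, y)
coefs = pd.Series(glm.coef_, index=transformed.columns)

# Aggregation into a GAM: collapse each variable's terms into one function.
ages = np.arange(18, 80, dtype=float)
f_driver_age = sum(coefs[f"driver_age^{d}"] * ages ** d for d in (1, 2, 3))
```

The last line collapses the Driver Age terms into a single functional effect f(Driver Age), which is the GAM view discussed on the next slide.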
Creating a GLM, visualizing an Additive Model
The models created are GLMs:
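The formula is not reproduced in this extract; in the notation of the previous slides, the fitted model can presumably be written as a GLM on the transformed variables, which regroups into one function per original variable:

$$g\big(\mathbb{E}[y]\big) \;=\; \beta_0 + \sum_{k} \beta_k\, T_k(x) \;=\; \beta_0 + \sum_{j} f_j(x_j)$$

where the $T_k$ are the transformations (polynomials, indicators, …) and each $f_j$ collects the terms built from variable $x_j$.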
These models can be visualized either as tables (containing all the coefficients, which can be useful for instance to put the model into production), or as GAMs (Generalized Additive Models).
The GAM visualization is convenient for model review and modification as it displays one function per variable.
ŷ = f(Driver Age) + f(Driving Experience) + f(Vehicle Speed) + f(Contract Mileage) + f(Vehicle Age) + 5 other variables + …
Leveraging Credibility
Automatic Modeling with Credibility
In order to remove the heavy and time-consuming data-preparation step, a large number of indicator functions are created: these functions equal one if a variable equals a given value, and zero otherwise. A model is then fitted leveraging credibility, which ensures the coherence between the different coefficients created.
Original Variables → Indicator Encoding → Indicator Functions → GLM with Credibility → Coefficients → Aggregation into a GAM → Functional Effects

Driver Age:
  Driver Age=16          +3.0
  Driver Age=17          +2.9
  Driver Age=18          +2.8
  Driver Age=19          +2.6
  Driver Age=20          +2.4
  Driver Age=21          +2.2
  Driver Age=22          +2.1
  Driver Age=23          +2.0
  Driver Age=24          +1.9
  Driver Age=25          +1.8
  Driver Age=26          +1.7
  …                      …

Nb. of Past Claims:
  Nb. of Past Claims=1   -0.5
  Nb. of Past Claims=2   +1.3
  Nb. of Past Claims=3   +3.2
  Nb. of Past Claims=4   +3.2
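A minimal sketch of the indicator-encoding step, using pandas and a hypothetical driver_age column (an illustration, not the deck's implementation):

```python
import pandas as pd

# Hypothetical raw variable: every observed value becomes its own indicator.
raw = pd.DataFrame({"driver_age": [16, 17, 17, 18, 25, 26]})

# Indicator encoding: one column per level, equal to one when the variable
# takes that value and zero otherwise.
indicators = pd.get_dummies(raw["driver_age"], prefix="driver_age").astype(int)

print(indicators.columns.tolist())
# ['driver_age_16', 'driver_age_17', 'driver_age_18', 'driver_age_25', 'driver_age_26']
print(indicators.iloc[0].tolist())  # first row: [1, 0, 0, 0, 0]
```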
Quick Reminder… What is credibility? ☺

"Credibility, simply put, is the weighting together of different estimates to come up with a combined estimate."
— Foundations of Casualty Actuarial Science

Bühlmann credibility is the best-known approach. It is equivalent to a simple Bayesian framework, where prior "knowledge" based on a model is updated based on observations.

Usually (after equations involving conditional probabilities), the output of a credibility approach is that the model predictions are a weighted average between the observations and the initial assumption. The weight will depend on:
➔ the quantity of data (the larger the data, the higher the weight)
➔ the strength of the prior assumptions (a very reliable assumption with small variance will have a large weight)
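The formula is not on the slide, but the classical Bühlmann estimator matches the weighting described above:

$$\hat{\mu} \;=\; Z\,\bar{X} + (1 - Z)\,\mu_0, \qquad Z = \frac{n}{n + k}$$

where $\bar{X}$ is the observed average, $\mu_0$ the prior estimate, $n$ the volume of data, and $k$ grows with the reliability (small variance) of the prior, which reproduces both effects listed above.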
Prior and Credibility
A credibility framework is defined by the prior assumptions the modeller has about the model. These assumptions represent a prior probability distribution for the model's coefficients.

For instance, "simpler" models are usually assumed to be "more likely". A classic prior assumption is: "The coefficients follow a Gaussian distribution, centered on 0."

The maximum-likelihood approach directly integrates the prior; taking the log of this formula provides an "easy-to-optimize" log-likelihood function:
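The two formulas are omitted in this extract; in standard Bayesian notation they presumably read:

$$\hat{\beta} \;=\; \operatorname*{argmax}_{\beta}\; \mathcal{L}(y \mid \beta)\; P(\beta)$$

$$\hat{\beta} \;=\; \operatorname*{argmax}_{\beta}\; \big[\, \log \mathcal{L}(y \mid \beta) + \log P(\beta) \,\big]$$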
Prior ⇔ Penalized Regressions
Some examples in the Linear Regression case
Prior assumptions are at the center of penalized-regression methods used to control high-dimensional or correlated
data, such as Lasso or Ridge Regression. Controlling the distribution (through the λ parameter) allows for controlling
the overfitting of the models.
Gaussian Hypothesis ⇔ Coefficients follow a Normal distribution N(0, 1/(2λ)) ⇔ Prior: P(β) ∝ exp(−λ Σᵢ βᵢ²) ⇔ Log-likelihood (incl. prior): log L(y|β) − λ Σᵢ βᵢ² ⇔ Ridge Regression

Laplace Hypothesis ⇔ Coefficients follow a Laplace distribution L(0, 1/λ) ⇔ Prior: P(β) ∝ exp(−λ Σᵢ |βᵢ|) ⇔ Log-likelihood (incl. prior): log L(y|β) − λ Σᵢ |βᵢ| ⇔ Lasso Regression
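A minimal sketch of both penalized regressions on synthetic data, using scikit-learn (where λ is called alpha); an illustration, not the deck's implementation:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic design: 5 informative coefficients out of 20, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
true_beta = np.zeros(20)
true_beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_beta + rng.normal(scale=0.5, size=500)

# Gaussian prior on the coefficients <=> L2 penalty <=> Ridge regression.
ridge = Ridge(alpha=1.0).fit(X, y)

# Laplace prior on the coefficients <=> L1 penalty <=> Lasso regression.
lasso = Lasso(alpha=0.1).fit(X, y)

# The Laplace prior's peak at zero makes the Lasso solution sparse.
print((np.abs(ridge.coef_) < 1e-6).sum())  # typically no exactly-zero coefficients
print((np.abs(lasso.coef_) < 1e-6).sum())  # typically ~15 exactly-zero coefficients
```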
Lasso and Hypothesis testing
Lasso is especially popular as it is a good tool for variable selection: models created with the Lasso framework are sparse; all the non-relevant coefficients equal zero. The Laplace distribution that underlies the Lasso has a maximum at zero.

When used on binary explanatory variables, it is also equivalent to hypothesis testing, with the null hypothesis H₀: "The coefficient is not significantly different from zero."
● If the null hypothesis is not rejected, the coefficient value is zero.
● If the null hypothesis is rejected, the coefficient has a non-zero value.
Back to the original problem…
We want to use a GLM leveraging credibility to fit a large number of coefficients and create a model:
Original Variables → Indicator Encoding → Indicator Functions → GLM with Credibility → Coefficients → Aggregation into a GAM → Functional Effects (with the indicator coefficients shown on the previous slides)
Lasso can be used in actuarial modelling…
Lasso can be used to capture the signal on categorical variables: coefficients are created for each level of the data.

The result is coherent with a credibility approach: predictions lie between their "pure GLM" values and the grand mean of the observations. Non-significant levels are grouped together, with null coefficients.
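A minimal sketch of this behaviour on synthetic categorical data (the levels and effects are hypothetical, chosen for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical categorical variable: most levels have no real effect.
rng = np.random.default_rng(0)
level = rng.choice(list("ABCDE"), size=2_000)
effect = {"A": 0.8, "B": -0.6, "C": 0.0, "D": 0.0, "E": 0.0}
y = np.vectorize(effect.get)(level) + rng.normal(scale=0.5, size=2_000)

# One indicator per level, then an L1-penalized fit.
X = pd.get_dummies(pd.Series(level), prefix="level").astype(float)
lasso = Lasso(alpha=0.02).fit(X, y)

# Non-significant levels typically end up grouped with null coefficients,
# while the remaining levels are shrunk toward the grand mean.
print(pd.Series(lasso.coef_, index=X.columns).round(2))
```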
…but Lasso does not capture non-linear effects!
While it is very powerful and well documented, the Lasso can't be directly applied to an indicator representation of the data to create a non-linear model:
● All non-significant coefficients would be grouped at zero, which makes no sense.
● A key piece of information, the order of the levels, would be lost in the process.

No information in the data = the most likely coefficients are at zero.
Credibility on Ordered
Variables
Creating new Priors and Penalties
New priors have to be considered to take into account the structure of the models created. In particular, for ordinal variables, two consecutive coefficients should:
● be more likely to be close than far apart if they are significantly different,
● or be equal if they are not significantly different.
This concept generalizes the Lasso penalty to continuous functions, providing the high level of flexibility and stability necessary to create GAM models.
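This penalty on differences of consecutive coefficients is known in the statistics literature as the fused lasso. A minimal sketch on synthetic data, assuming the cvxpy convex-optimization library and a Gaussian likelihood for simplicity:

```python
import cvxpy as cp
import numpy as np

# Synthetic observed averages per driver age: a smooth curve plus noise.
rng = np.random.default_rng(0)
ages = np.arange(18, 81)
y = 0.25 + 0.5 * np.exp(-(ages - 18) / 10.0) + rng.normal(scale=0.03, size=ages.size)

# One coefficient per age level; the penalty acts on consecutive differences,
# so adjacent coefficients end up equal (grouped) or close, never erratic.
beta = cp.Variable(ages.size)
lam = 1.0
fit = cp.sum_squares(y - beta)            # Gaussian log-likelihood term
penalty = lam * cp.norm1(cp.diff(beta))   # Laplace prior on the differences
cp.Problem(cp.Minimize(fit + penalty)).solve()

print(np.round(beta.value, 3))  # a piecewise-constant, smoothed age curve
```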
Creating new Priors and Penalties
This means that the derivative of the coefficient function follows a Laplace distribution. As the values of the coefficients are discrete, the derivative can be written as a difference of consecutive coefficients. This probability distribution is used as a prior when maximizing the likelihood to fit a model.
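The three formulas are omitted in this extract; assuming the fused-lasso form implied by the text, they presumably read:

$$P(f) \;\propto\; \exp\Big( -\lambda \int \big| f'(x) \big| \, dx \Big)$$

$$f'(x_j) \;\approx\; \beta_{j+1} - \beta_j \quad\Longrightarrow\quad P(\beta) \;\propto\; \exp\Big( -\lambda \sum_{j} \big| \beta_{j+1} - \beta_j \big| \Big)$$

$$\hat{\beta} \;=\; \operatorname*{argmax}_{\beta}\; \Big[ \log \mathcal{L}(y \mid \beta) \;-\; \lambda \sum_{j} \big| \beta_{j+1} - \beta_j \big| \Big]$$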
Controlling the Prior distribution
The prior follows a distribution whose variance is controlled by λ. The coefficients should maximize the penalized log-likelihood above.

Large λ ⇔ the prior distribution has a small variance ⇔ strong a-priori knowledge on the model ⇔ a large weight is given to the smoothness term ⇔ a smooth model is created.

Small λ ⇔ the prior distribution has a large variance ⇔ weak a-priori knowledge on the model ⇔ a large weight is given to the observations term ⇔ a noisy model is created.
Weak Prior ⇔ Strong reliance on the observation
The prior has a very limited impact on the final model
Stronger Prior ⇔ Weaker reliance on the observation
The final model is an average between the most likely coefficients according to the prior and the observations
Strong Prior ⇔ Very weak reliance on the observation
The weight of the observations in the model is weaker than that of the prior
Very Strong Prior ⇔ Full reliance on the prior
The observations can’t disprove such a strong prior - more data would be needed
This is equivalent to failing a significance test against the null hypothesis: "the first two coefficients are equal".
A stronger effect - or more exposure - would be necessary to disprove it, and split the coefficients.
Like for a Lasso, this is equivalent to a test!
The behavior is similar to a hypothesis-testing approach:
A priori, we suppose the null hypothesis that two consecutive coefficients are equal (H₀: βⱼ₊₁ = βⱼ). This null hypothesis is tested with the data, and potentially rejected:
● If it is not rejected by the data, then the coefficient function is locally constant.
● If it is rejected by the data, then the coefficient function is not constant.
Leveraging the prior on a full model scale
A more balanced prior (with a medium variance) leads to more sensitive models.
Leveraging the prior on a full model scale
Data used to create the models are naturally noisy. [Figure: observed values by age]
Leveraging the prior on a full model scale
A very strong prior (with a small variance) leads to robust models. [Figure: fitted curve by age]
Leveraging the prior on a full model scale
A very weak prior (with a large variance) leads to noisy models. [Figure: fitted curve by age]
Machine-Learning = GLM and Credibility
From a user’s point of view, the creation of the models is fully automated and provides a unified machine-learning
algorithm. As with all machine-learning techniques, the one presented today relies on a solid statistical basis.
A similar framework can be leveraged to achieve variable selection.
Original Variables → Indicator Encoding → Indicator Functions → GAM Modeling with Credibility → Smoothness Tuning → Coefficients → Aggregation into a GAM → Functional Effects (producing the indicator coefficients shown on the previous slides)
Thank You!
Our new white paper "Credibility and Penalized Regression" is available now at www.Akur8.com under "Resources": https://akur8.com/resources

Guillaume Béraud-Sudreau
Chief Actuary & Co-Founder of Akur8
[email protected]