Python for Finance
Regressions, Interpolation & Optimisation
Andras Niedermayer
Outline
1 Regressions in pandas
2 Function approximation
Regression
Interpolation
3 Convex optimization
Pivot tables
We want to create a table with opening prices, using index names as
columns.
We can use the pivot function in Pandas.
PivotOpen = IndicesA.pivot(index='Date', columns='Index', values='Open')
Running pivot without the values option creates a DataFrame with
hierarchical columns, one block per value column. To select 'Open':
PivotTable = IndicesA.pivot(index='Date', columns='Index')
PivotOpen = PivotTable['Open']
Stacking and unstacking
First, let us index the data by date and index name:
IxNew.set_index(['Date', 'Index'], inplace=True)
To collapse the columns of a DataFrame into a single (data) Series:
IxNewStack = IxNew.stack()
To restore the indices as columns (sometimes useful):

IxNewReset = IxNew.stack().reset_index()
To restore the original database:
IxNewUnstack = IxNewStack.unstack()
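A minimal self-contained sketch of the round trip, with hypothetical numbers standing in for the IxNew data:

import pandas as pd

# Hypothetical stand-in for IxNew: two dates, two index names
IxNew = pd.DataFrame({
    'Date':  ['2019-01-02', '2019-01-02', '2019-01-03', '2019-01-03'],
    'Index': ['CAC40', 'DAX', 'CAC40', 'DAX'],
    'Open':  [4690.4, 10580.2, 4712.1, 10610.5],
    'Close': [4701.0, 10600.0, 4725.3, 10638.9]})
IxNew.set_index(['Date', 'Index'], inplace=True)

IxNewStack = IxNew.stack()            # Series with a (Date, Index, variable) MultiIndex
IxNewUnstack = IxNewStack.unstack()   # back to wide form: one column per variable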
OLS regression (statsmodels.api)
Suppose we want to regress index returns (just computed) on index
daily volatility (high-low range).
Return_{it} = α + β (High_{it} / Low_{it}) + ε_{it}
The simplest OLS model reads (plug in variables for X and Y):
import statsmodels.api as sm
model = sm.OLS(Y, X)   # X must already contain a constant column (sm.add_constant)
results = model.fit()
results.summary()
We can access the results as:
1 Coefficient estimates: results.params
2 Estimator covariance matrix (heteroskedasticity-robust): results.cov_HC0
3 p-values: results.pvalues; R-squared: results.rsquared
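A minimal end-to-end sketch with simulated data; the variable names and numbers below are made up for illustration, mirroring the Return on High/Low regression above:

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
hl = 1 + 0.02*np.random.rand(1000)                  # hypothetical High/Low ratios
ret = 0.18 - 0.18*hl + 0.01*np.random.randn(1000)   # hypothetical returns

X = sm.add_constant(hl)       # OLS does not add an intercept automatically
results = sm.OLS(ret, X).fit()
print(results.params)         # [alpha, beta]
print(results.rsquared)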
OLS regression output
Dep. Variable: Return R-squared: 0.018
Model: OLS Adj. R-squared: 0.018
Method: Least Squares F-statistic: 115.6
Date: Tue, 13 Feb 2018 Prob (F-statistic): 1.00e-26
Time: 23:59:34 Log-Likelihood: 19253.
No. Observations: 6345 AIC: -3.850e+04
Df Residuals: 6343 BIC: -3.849e+04
Df Model: 1
coef std err t P>|t| [0.025 0.975]
const 0.1841 0.017 10.765 0.000 0.151 0.218
0 -0.1813 0.017 -10.751 0.000 -0.214 -0.148
Omnibus: 680.608 Durbin-Watson: 1.987
Prob(Omnibus): 0.000 Jarque-Bera (JB): 3744.882
Skew: 0.367 Prob(JB): 0.00
Kurtosis: 6.691 Cond. No. 234.
Useful regression output
To see all available regression output, not just the summary, use tab
completion on the results object: results.<Tab>
Applications
Starting from the IxNew data:
1 On how many days was the return larger for the CAC40 than for the DAX?
2 Create a Series object indexed by Date that contains the name of the index with the highest return.
Hint: Use the idxmax method:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html
Applications - (One) Solution
1.
PivotReturn = IxNew.reset_index().pivot(index='Date', columns='Index', values='Return')
days_a = (PivotReturn['CAC40'] > PivotReturn['DAX']).sum()
2.
PivotReturn.apply(lambda x: x.idxmax(), axis=1)  # explicit apply
PivotReturn.idxmax(axis=1)                       # equivalent built-in
Outline
1 Regressions in pandas
2 Function approximation
Regression
Interpolation
3 Convex optimization
Motivation
1 Most of the time in finance, we do not know the DGP (Data
Generating Process).
2 Many applications in finance involve “reverse engineering”
patterns from data.
3 This is useful, for example, to make predictions about the future
dynamics of financial variables.
4 Two main techniques:
1 Regression
2 Interpolation
First, define a function (the DGP)...
We specifically choose a non-polynomial function (more difficult).
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return np.sin(x) + 0.5*x
Next, generate data from the DGP
We generate 50 data points from the DGP: (x, f(x)).
• The function x = np.linspace(a, b, N) returns an array of N equally
spaced numbers from a to b.
• What does f (x) return?
x = np.linspace(-2*np.pi, 2*np.pi, 50)

plt.plot(x, f(x), 'b')
plt.grid()
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.show()
Regression
Theoretical framework:
1 You are given N points in a 2-D (can be 3-D, 4-D, ...) space: (x_j, y_j).
2 You choose K basis functions of x_j, i.e., b_i(x_j), such that you
believe y_j can be written as a linear combination of these functions.
3 You select the coefficients α_i of this linear combination by
minimizing the squared difference from the actual data:

min_{α_i} (1/N) Σ_{j=1}^{N} ( y_j − Σ_{i=1}^{K} α_i b_i(x_j) )²   (1)
Polynomial regression
A simple case is to approximate y_j by a polynomial function of x_j.
That is, choose b_0 = 1, b_1 = x, b_2 = x², ..., b_k = x^k.
Easy to implement in Python with polyfit (polynomial fit):
1 First, get the coefficient list using polyfit.
2 Next, get the fitted values from the coefficient list using
polyval.
reg = np.polyfit(x, f(x), deg=k)
y_fit = np.polyval(reg, x)
What happens if we vary the polynomial degree?
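One way to check, sketched here by reusing x and f from the previous slides and comparing mean squared errors across degrees:

for k in [1, 3, 5, 7, 9]:
    reg = np.polyfit(x, f(x), deg=k)
    y_fit = np.polyval(reg, x)
    print('degree {:d}: MSE = {:.2e}'.format(k, np.mean((y_fit - f(x))**2)))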
Polynomial regression
[Figure: the function f(x) ("Function") and the fitted polynomial ("Regression"), plotted over x ∈ [−2π, 2π].]
Application: Beyond polynomials
• The mean squared error of our fit is not zero; it is about
1.77 × 10⁻³.
• Not surprising, since the original function is not a polynomial.
• How can we approximate it using other basis functions, e.g.,
trigonometric ones?
1 Say we know (from prior theoretical work) that our function is a
combination of a second-order polynomial and sin/cos functions.
2 Let us define a matrix with columns 1, x, x², sin(x), cos(x).
Application: Formalization of the problem
Stacking the N observations gives y = M α + u:

y_j = α_1 + α_2 x_j + α_3 x_j² + α_4 sin(x_j) + α_5 cos(x_j) + u_j,   j = 1, ..., N,

where row j of the matrix M is (1, x_j, x_j², sin(x_j), cos(x_j)), α = (α_1, ..., α_5)′ is the coefficient vector, and u = (u_1, ..., u_N)′ collects the errors.
Application: Solving the problem in Python (1/2)
• Initialize the matrix M:
M = np.zeros((len(x), 5))
• Fill in each column with a variable:
M[:, 0] = 1
M[:, 1] = x
M[:, 2] = x**2
M[:, 3] = np.sin(x)
M[:, 4] = np.cos(x)
Application: Solving the problem in Python (2/2)
We use numpy.linalg.lstsq to minimise the sum of squared
residuals.
The least-squares coefficients are given by:

reg = np.linalg.lstsq(M, f(x), rcond=None)[0]
The fitted values are computed as a dot product between the
coefficient vector (reg) and the matrix M:

y_fit2 = np.dot(reg, M.T)   # we need to transpose the matrix
1 What are the coefficients in reg?
2 What is the MSE?
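A sketch that answers both questions numerically, reusing M, x and f from above:

reg = np.linalg.lstsq(M, f(x), rcond=None)[0]
y_fit2 = np.dot(reg, M.T)
print(reg)                          # coefficients on 1, x, x**2, sin(x), cos(x)
print(np.mean((y_fit2 - f(x))**2))  # mean squared error of the fit

Since f(x) = sin(x) + 0.5x lies in the span of these basis functions, the coefficients should come out close to (0, 0.5, 0, 1, 0) and the MSE close to zero.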
Application: Output
[Figure: the function f(x) ("Function") together with the non-polynomial regression fit ("Regression (non-polynomial)").]
General idea
1 With regression, one tries to identify a unique function g (x)
that is as close as possible to the “true”, unknown function
f(x), i.e., min_g Σ_x (g(x) − f(x))²
2 With interpolation, one fits multiple (generally polynomial)
functions, one between each pair of consecutive points.
• The fit is perfect, i.e., ∀i, g_i(x_i) = f(x_i).
• The result is not a single function, which makes it mathematically
more involved.
• The pieces are constrained to be continuous: g_i(x_i) = g_{i+1}(x_i).
• Some additional constraint is needed, e.g., continuous second
derivatives.
3 One needs ordered data in interpolation (unlike regression).
4 The procedure takes more time and is less parsimonious (more
coefficients in the end), but it is generally more accurate.
Implementation
The interpolation package is in the Scientific Python library (scipy).
The parameter k defines the degree of the polynomial (k = 1 is a
linear spline, k = 3 a cubic spline, ...).
import scipy.interpolate as spi
interp = spi.splrep(x, f(x), k=1)
y_interp = spi.splev(x, interp)
1 What type of object is interp relative to reg? Why?
2 How good is the linear interpolation?
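Since a spline is exact at the data points by construction, quality should be judged between them; a sketch on a finer grid, reusing x, f, np and spi from above:

xd = np.linspace(-2*np.pi, 2*np.pi, 500)   # finer grid between the original knots
for k in [1, 3]:
    tck = spi.splrep(x, f(x), k=k)
    err = spi.splev(xd, tck) - f(xd)
    print('k = {:d}: max abs error = {:.2e}'.format(k, np.max(np.abs(err))))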
Interpolation output
[Figure: the function f(x) ("Function") together with the linear interpolation ("Linear interpolation").]
Outline
1 Regressions in pandas
2 Function approximation
Regression
Interpolation
3 Convex optimization
Main idea
We want to minimize a function f(x_1, x_2, ..., x_n):

min_{x_1,...,x_n} f(x_1, x_2, ..., x_n)   (2)
All local extrema satisfy
∂f/∂x_i = 0,   ∀i ∈ {1, 2, ..., n}.   (3)
The global minimum/maximum (if it exists) is attained either at one of
the local extrema or at one of the domain end-points (see whiteboard).
More?
The Weierstrass (extreme value) theorem guarantees the existence of
a maximum and minimum on closed and bounded intervals.
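As a one-dimensional sketch, scipy can minimize directly over a closed, bounded interval, exactly the setting where the theorem guarantees a minimum exists (the function below is the per-coordinate piece of the two-dimensional example that follows):

import numpy as np
import scipy.optimize as spo

res = spo.minimize_scalar(lambda x: np.sin(x) + x**2/20.0,
                          bounds=(-10, 10), method='bounded')
print(res.x, res.fun)   # location and value of the minimum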
A two-dimensional function
First, we define a function to minimize:
def fm(xy):
    # xy[0] is x, xy[1] is y
    return (np.sin(xy[0]) + 1/20.0*xy[0]**2
            + np.sin(xy[1]) + 1/20.0*xy[1]**2)
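The surface on the next slide can be reproduced along these lines; a minimal sketch, assuming numpy and matplotlib are imported as on the earlier slides:

from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.meshgrid(np.linspace(-10, 10, 50), np.linspace(-10, 10, 50))
Z = fm((X, Y))                     # fm works element-wise on the grid
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='coolwarm')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
plt.show()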
A three-dimensional graphic
[Figure: 3-D surface plot of f(x, y) over x, y ∈ [−10, 10].]
Brute force optimization (the “caveman” approach)
import scipy.optimize as spo
Define a range and step size in which to search for the minimum:

search_area = (-10, 10.01, 5)   # (start, stop, step)
Change the function to print every evaluated point and its value:
def fm(xy):
    z = (np.sin(xy[0]) + 1/20.0*xy[0]**2
         + np.sin(xy[1]) + 1/20.0*xy[1]**2)
    print('{:8.4f} {:8.4f} {:8.4f}'.format(xy[0], xy[1], z))
    return z
Run the function brute (force) to find the minimum:
min_1 = spo.brute(fm, (search_area, search_area), finish=None)
Brute force optimization (the “caveman” approach)
1 What is the minimum found by this method?
2 How can we improve the accuracy? What is the drawback?
The brute force method, while limited, can provide starting values for
more sophisticated algorithms.
One such local routine is fmin, which uses the derivative-free
Nelder-Mead simplex algorithm.
Optimization with fmin
General structure:
xopt = spo.fmin(function, start_values, xtol=..., ftol=..., maxiter=..., maxfun=...)
# pass full_output=True to also receive fopt, the function value at the optimum
1 xtol: Relative error in the argument acceptable for convergence.
2 ftol: Relative error in the function value acceptable for convergence.
3 maxiter: Maximum number of iterations to perform.
4 maxfun: Maximum number of function evaluations to make.
We can use the global optimization results as starting values:
min_2 = spo.fmin(fm, min_1, xtol=0.001, ftol=0.001)
Caveats
• Local optimization routines can get stuck in local extrema...
• ... or they may never converge.
• It is a good idea to perform a global optimization first to
pinpoint the neighborhood of the global minimum.
• What happens if we start fmin with (2, 2) as starting values?
Constrained optimization
Most of the time, we look for optimal values of a function under a
set of constraints.
Problem
There are two securities, A and B; both cost 10 today. Tomorrow
there are two equally likely states of the world: g or b. In state g,
A = 15 and B = 5. In state b, A = 5 and B = 12. Assume an
investor has 100 units of cash today and utility u(w) = √w. What is
his optimal investment?
Application: Formalization of the problem
max_{a,b} E u(w_1) = max_{a,b} (1/2)√(15a + 5b) + (1/2)√(5a + 12b),   (4)
subject to:
10a + 10b ≤ 100. (5)
Application: Python implementation (Solution)
First, define the objective function. Note: we want to maximize rather
than minimize expected utility! That is, we minimize negative expected utility.
def exp_u(ab):
    return -(0.5*np.sqrt(ab[0]*15 + ab[1]*5)
             + 0.5*np.sqrt(ab[0]*5 + ab[1]*12))
Second, define the constraint as a dict variable with an implicit
function. The inequality sign is always implicitly "≥ 0".

cons = ({'type': 'ineq',
         'fun': lambda ab: 100 - ab[0]*10 - ab[1]*10},)
Third, choose starting values:
startval = [5, 5]
Fourth, run the minimize function from the optimization package:
result = spo.minimize(exp_u, startval,
                      method='SLSQP', constraints=cons)
Notes
• method selects the optimization algorithm. SLSQP (Sequential
Least SQuares Programming) allows one to introduce
constraints.
• One can specify the Jacobian (jac) or the Hessian matrix (hess)
directly.
• In addition, bounds for the arguments can be specified via
bounds.
Output methods:
1 result.fun returns the optimal function value.
2 result.x returns the arguments corresponding to the optimum.
3 result.success returns True if the optimization terminated successfully.
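For the portfolio problem above, a typical inspection of the output would be (note the sign flip, since we minimized negative expected utility):

print(result.x)         # optimal holdings (a, b)
print(-result.fun)      # maximized expected utility
print(result.success)   # True if the optimizer converged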
Comment
• spo.fmin works reasonably well for problems of low dimension
• for higher-dimensional problems, it does not work properly
• if you have a high-dimensional problem for which you know that
the objective function and the constraints are convex, use the
package cvxopt
• several thousand dimensions are not a problem for cvxopt
• see https://cvxopt.org
• however, this is outside the scope of this lecture
Numerical integration
Numerical integration is done via the scipy.integrate package.
import scipy.integrate as integr
There are several methods to numerically integrate a function (say
f(x) = sin x + x/2): fixed Gaussian quadrature, adaptive quadrature,
Romberg integration, ...
All are approximations of the same thing, though...
integr.fixed_quad(f, lmin, lmax)[0]
integr.quad(f, lmin, lmax)[0]
integr.romberg(f, lmin, lmax)   # romberg returns the value directly
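A self-contained sketch using the running example f(x) = sin x + x/2, with hypothetical integration limits:

import numpy as np
import scipy.integrate as integr

f = lambda x: np.sin(x) + 0.5*x
lmin, lmax = 0.5, 9.5        # hypothetical limits

print(integr.fixed_quad(f, lmin, lmax)[0])   # fixed Gaussian quadrature
print(integr.quad(f, lmin, lmax)[0])         # adaptive quadrature
print(integr.romberg(f, lmin, lmax))         # Romberg integration

Since ∫(sin x + x/2) dx = −cos x + x²/4, the three results can be checked against the closed form.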