
Python for Finance

Regressions, Interpolation & Optimisation

Andras Niedermayer
Outline

1 Regressions in pandas

2 Function approximation
    Regression
    Interpolation

3 Convex optimization

Pivot tables

We want to create a table with opening prices, using index names as columns.
We can use the pivot function in pandas:

    PivotOpen = IndicesA.pivot(index='Date', columns='Index', values='Open')

Running the pivot function without the values option creates a collection of tables. To select 'Open':

    PivotTable = IndicesA.pivot(index='Date', columns='Index')
    PivotOpen = PivotTable['Open']
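To illustrate, a minimal self-contained sketch; the small data frame below is made up for illustration and only mimics the structure of the IndicesA data used in the lecture:

    import pandas as pd

    # hypothetical long-format data: one row per (date, index) pair
    IndicesA = pd.DataFrame({
        'Date':  ['2019-02-01', '2019-02-01', '2019-02-04', '2019-02-04'],
        'Index': ['CAC40', 'DAX', 'CAC40', 'DAX'],
        'Open':  [4990.0, 11180.0, 5010.0, 11210.0],
    })

    # wide format: one column per index name, one row per date
    PivotOpen = IndicesA.pivot(index='Date', columns='Index', values='Open')
    print(PivotOpen)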

Stacking and unstacking

First, let us index the data by date and index name:

    IxNew.set_index(['Date', 'Index'], inplace=True)

To collapse the columns of the DataFrame into a single (data) Series:

    IxNewStack = IxNew.stack()

To restore the indices as columns (sometimes useful):

    IxNewStack = IxNew.stack().reset_index()

To restore the original DataFrame:

    IxNewUnstack = IxNewStack.unstack()
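For concreteness, a small self-contained sketch of the stack/unstack round trip; IxNew here is a made-up two-column frame that only mimics the lecture data:

    import pandas as pd

    # hypothetical data with Date/Index identifiers and two value columns
    IxNew = pd.DataFrame({
        'Date':  ['2019-02-01', '2019-02-01'],
        'Index': ['CAC40', 'DAX'],
        'Open':  [4990.0, 11180.0],
        'Close': [5000.0, 11200.0],
    })
    IxNew.set_index(['Date', 'Index'], inplace=True)

    stacked = IxNew.stack()              # Series indexed by (Date, Index, column name)
    restored = stacked.unstack()         # back to a wide DataFrame (column order may differ)
    flat = IxNew.stack().reset_index()   # identifiers back as ordinary columns
    print(restored)
    print(flat)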

OLS regression (statsmodels.api)
Suppose we want to regress index returns (just computed) on index
daily volatility (high-low range).

    Return_it = α + β · (High_it / Low_it) + ε_it

The simplest OLS model reads (plug in variables for X and Y):

    import statsmodels.api as sm
    model = sm.OLS(Y, X)
    results = model.fit()
    results.summary()

We can access the results as follows (a short end-to-end sketch is given below the list):

1 Coefficient estimates: results.params
2 Estimator covariance matrix (HC0 heteroskedasticity-robust): results.cov_HC0
3 p-values: results.pvalues; R-squared: results.rsquared
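A minimal end-to-end sketch with simulated data. The names Y, X and the simulated series are assumptions made here for illustration; in the lecture the regressor would be the High/Low ratio and the dependent variable the index return. Note the use of sm.add_constant, which adds the intercept column that appears as 'const' in the summary:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    hl_ratio = 1.0 + 0.02 * rng.random(n)                          # stand-in for High/Low
    ret = 0.18 - 0.18 * hl_ratio + 0.01 * rng.standard_normal(n)   # stand-in for Return

    X = sm.add_constant(hl_ratio)   # adds the intercept ('const') column
    Y = ret

    results = sm.OLS(Y, X).fit()
    print(results.summary())
    print(results.params)           # coefficient estimates
    print(results.rsquared)         # R-squared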
OLS regression output

Dep. Variable:      Return             R-squared:           0.018
Model:              OLS                Adj. R-squared:      0.018
Method:             Least Squares      F-statistic:         115.6
Date:               Tue, 13 Feb 2018   Prob (F-statistic):  1.00e-26
Time:               23:59:34           Log-Likelihood:      19253.
No. Observations:   6345               AIC:                 -3.850e+04
Df Residuals:       6343               BIC:                 -3.849e+04
Df Model:           1

            coef    std err          t      P>|t|     [0.025     0.975]
const     0.1841      0.017     10.765      0.000      0.151      0.218
0        -0.1813      0.017    -10.751      0.000     -0.214     -0.148

Omnibus:            680.608   Durbin-Watson:      1.987
Prob(Omnibus):      0.000     Jarque-Bera (JB):   3744.882
Skew:               0.367     Prob(JB):           0.00
Kurtosis:           6.691     Cond. No.           234.
Useful regression output

To see all available regression data (not just the summary), type model. and press <Tab> to list the available attributes via tab completion.

Applications

Starting from the IxNew data:

1 On how many days was the return larger for CAC40 than for DAX?
2 Create a Series object indexed by Date that contains the name of the index with the highest return.
  Hint: use the idxmax method:
  http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html

Applications - (One) Solution

1.

    PivotReturn = IxNew.reset_index().pivot(index='Date',
                                            columns='Index', values='Return')
    days_a = (PivotReturn['CAC40'] > PivotReturn['DAX']).sum()

2.

    PivotReturn.apply(lambda x: x.idxmax(), axis=1)
    PivotReturn.idxmax(axis=1)

Outline

1 Regressions in pandas

2 Function approximation
    Regression
    Interpolation

3 Convex optimization

Motivation

1 Most of the time in finance, we do not know the DGP (Data Generating Process).
2 Many applications in finance involve "reverse engineering" patterns from data.
3 This is useful, for example, to make predictions about the future dynamics of financial variables.
4 Two main techniques:
  1 Regression
  2 Interpolation

First, define a function (the DGP)...

We specifically choose a non-polynomial function (more difficult):

    import numpy as np
    import matplotlib.pyplot as plt

    def f(x):
        return np.sin(x) + 0.5 * x

Next, generate data from the DGP

We generate 50 data points from the DGP: (x, f(x)).

• The function x = np.linspace(a, b, N) returns an array of N equally spaced numbers from a to b.
• What does f(x) return?

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

    plt.plot(x, f(x), 'b')
    plt.grid()
    plt.xlabel('x', fontsize=18)
    plt.ylabel('y', fontsize=18)
    plt.show()

Regression

Theoretical framework:
1 You are given N points in a 2-D (can be 3-D, 4-D, ...) space: (x_j, y_j).
2 You choose K (base) functions of x_j, i.e., b_i(x_j), such that you believe y_j can be written as a linear combination of these functions.
3 You select the coefficients α_i of this linear combination by minimizing the mean squared difference from the actual data:

    min_{α_i}  (1/N) Σ_{j=1}^{N} [ y_j − Σ_{i=1}^{K} α_i b_i(x_j) ]²        (1)

Polynomial regression

A simple case is to approximate y_j as a polynomial function of x_j.
That is, choose as base functions: b_1 = 1, b_2 = x, b_3 = x², ..., b_{k+1} = x^k.
This is easy to implement in Python with polyfit (polynomial fit):
1 First, get the coefficient list using polyfit.
2 Next, get the fitted values from the coefficient list using polyval.

    reg = np.polyfit(x, f(x), deg=k)
    y_fit = np.polyval(reg, x)

What happens if we vary the polynomial degree?
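A minimal sketch of what varying the degree does; the degrees below are arbitrary choices, and the mean squared error (MSE) is computed just to compare the fits:

    import numpy as np

    def f(x):
        return np.sin(x) + 0.5 * x

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

    for k in (1, 3, 5, 7, 9):
        reg = np.polyfit(x, f(x), deg=k)
        y_fit = np.polyval(reg, x)
        mse = np.mean((y_fit - f(x)) ** 2)
        print(f"degree {k}: MSE = {mse:.6f}")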

Polynomial regression

[Figure: the original function f(x) and the polynomial regression fit plotted against x.]
Application: Beyond polynomials

• The mean squared error of our fit is not zero; it is roughly 1.77 × 10⁻³.
• Not surprising, since the original function was not a polynomial.
• How can we approximate it using other base functions, e.g., trigonometric ones?

1 Say we know (from prior theoretical work) that our function is a combination of a second-order polynomial and sin/cos functions.
2 Let us define a matrix with values for 1, x, x², sin(x), cos(x).

Application: Formalization of the problem

    y_j = α_1 · 1 + α_2 x_j + α_3 x_j² + α_4 sin(x_j) + α_5 cos(x_j) + u_j,    j = 1, ..., N

or, in matrix notation, y = M α + u, where α = (α_1, ..., α_5)′ is the coefficient vector and row j of the N×5 matrix M contains the base-function values (1, x_j, x_j², sin(x_j), cos(x_j)).

Application: Solving the problem in Python (1/2)

• Initialize the matrix M:

    matrix = np.zeros((len(x), 5))

• Fill in each column with a variable:

    matrix[:, 0] = 1
    matrix[:, 1] = x
    matrix[:, 2] = x ** 2
    matrix[:, 3] = np.sin(x)
    matrix[:, 4] = np.cos(x)

Application: Solving the problem in Python (2/2)

We use numpy.linalg.lstsq to minimise the sum of squared residuals.
The least-squares coefficients are given by:

    reg = np.linalg.lstsq(matrix, f(x), rcond=None)[0]

The fitted values are computed as a dot product between the coefficient vector (reg) and the transposed matrix:

    y_fit2 = np.dot(reg, matrix.T)
    # we need to transpose the matrix

1 What are the coefficients in reg?
2 What is the MSE? (See the sketch below.)
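A self-contained sketch that answers both questions; it simply restates the steps above and adds the MSE computation:

    import numpy as np

    def f(x):
        return np.sin(x) + 0.5 * x

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

    matrix = np.zeros((len(x), 5))
    matrix[:, 0] = 1
    matrix[:, 1] = x
    matrix[:, 2] = x ** 2
    matrix[:, 3] = np.sin(x)
    matrix[:, 4] = np.cos(x)

    reg = np.linalg.lstsq(matrix, f(x), rcond=None)[0]
    y_fit2 = np.dot(reg, matrix.T)

    print(reg)                            # the five coefficients alpha_1, ..., alpha_5
    print(np.mean((y_fit2 - f(x)) ** 2))  # the MSE of the fit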

Application: Output

[Figure: the original function f(x) and the regression fit based on the non-polynomial basis (1, x, x², sin x, cos x) plotted against x.]
General idea

1 With regression, one tries to identify a single function g(x) that is as close as possible to the "true", unknown function f(x), i.e., that minimizes Σ_j (g(x_j) − f(x_j))².
2 With interpolation, one fits several (generally polynomial) functions, one between each pair of consecutive points.
  • The fit is exact at the data points, i.e., ∀i, g_i(x_i) = f(x_i).
  • The result is not one single function, which makes the maths more involved.
  • The pieces are constrained to join continuously, g_i(x_i) = g_{i+1}(x_i).
  • Some additional constraint is needed, e.g., continuous second derivatives.
3 One needs ordered data for interpolation (unlike regression).
4 The procedure takes more time and is less parsimonious (more coefficients in the end), but it is generally more accurate.

Implementation

The interpolation functions are in the Scientific Python library (scipy).
The parameter k defines the degree of the polynomial (k = 1 is a linear spline, k = 3 a cubic spline, ...):

    import scipy.interpolate as spi
    interp = spi.splrep(x, f(x), k=1)
    y_interp = spi.splev(x, interp)

1 What type of object is interp relative to reg? Why?
2 How good is the linear interpolation? (See the sketch below.)
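A sketch for the second question. At the sample points the interpolant matches f(x) exactly, so the interesting comparison is on a denser grid between the knots; the grid size and the degrees compared are arbitrary choices:

    import numpy as np
    import scipy.interpolate as spi

    def f(x):
        return np.sin(x) + 0.5 * x

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)
    x_dense = np.linspace(-2 * np.pi, 2 * np.pi, 500)   # points between the knots

    for k in (1, 3):   # linear vs cubic spline
        tck = spi.splrep(x, f(x), k=k)
        y_dense = spi.splev(x_dense, tck)
        mse = np.mean((y_dense - f(x_dense)) ** 2)
        print(f"spline degree {k}: MSE on the dense grid = {mse:.2e}")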

Interpolation output

[Figure: the original function f(x) and the linear spline interpolation plotted against x.]
Outline

1 Regressions in pandas

2 Function approximation
    Regression
    Interpolation

3 Convex optimization

Main idea

We want to minimize a function f(x_1, x_2, x_3, ..., x_n):

    min_{x_1, ..., x_n} f(x_1, x_2, x_3, ..., x_n)        (2)

All local extrema satisfy

    ∂f/∂x_i = 0,    ∀ i ∈ {1, 2, ..., n}.        (3)

The global minimum/maximum (if it exists and/or is unique) is either one of the local extrema or one of the domain end-points (see whiteboard).

More?
The Weierstrass (extreme value) theorem guarantees that a continuous function attains a maximum and a minimum on a closed and bounded interval.

A two dimensional function

First, we define a function to minimize:

    def fm(xy):
        # xy[0] is x, xy[1] is y
        return np.sin(xy[0]) + 1 / 20.0 * xy[0] ** 2 \
            + np.sin(xy[1]) + 1 / 20.0 * xy[1] ** 2
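The surface shown on the next slide can be produced with matplotlib's 3-D toolkit; a minimal sketch, continuing from fm above (the grid range and resolution are assumptions chosen to match the picture):

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D   # enables the '3d' projection on older matplotlib

    x = np.linspace(-10, 10, 50)
    y = np.linspace(-10, 10, 50)
    X, Y = np.meshgrid(x, y)
    Z = fm((X, Y))   # fm works element-wise on the grid

    fig = plt.figure(figsize=(8, 5))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z, cmap='coolwarm', linewidth=0.5)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('f(x, y)')
    plt.show()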

A three dimensional graphic

[Figure: 3-D surface plot of f(x, y) = sin(x) + x²/20 + sin(y) + y²/20 for x and y between -10 and 10.]

Brute force optimization (the “caveman” approach)
    import scipy.optimize as spo

Define a range and step to search for the minimum:

    search_area = (-10, 10.01, 5)

Change the function to print all iterations and output:

    def fm(xy):
        z = np.sin(xy[0]) + 1 / 20.0 * xy[0] ** 2 \
            + np.sin(xy[1]) + 1 / 20.0 * xy[1] ** 2
        print("{:8.4f} {:8.4f} {:8.4f}".format(xy[0], xy[1], z))
        return z

Run the function brute (force) to find the minimum:

    min_1 = spo.brute(fm, (search_area, search_area),
                      finish=None)
Brute force optimization (the “caveman” approach)

1 What is the minimum found by this method?
2 How can we improve the accuracy? What is the drawback? (See the sketch below.)
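One answer to the second question is a finer grid; a self-contained sketch (the finer step size of 0.1 is an arbitrary choice, and the objective is restated here without the print so the output stays readable):

    import numpy as np
    import scipy.optimize as spo

    def fm(xy):
        return np.sin(xy[0]) + 1 / 20.0 * xy[0] ** 2 \
            + np.sin(xy[1]) + 1 / 20.0 * xy[1] ** 2

    coarse = (-10, 10.01, 5)     # grid from the slide
    fine = (-10, 10.01, 0.1)     # finer grid: higher accuracy, but many more evaluations

    min_coarse = spo.brute(fm, (coarse, coarse), finish=None)
    min_fine = spo.brute(fm, (fine, fine), finish=None)
    print(min_coarse, fm(min_coarse))
    print(min_fine, fm(min_fine))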

The brute force method, while limited, can serve to provide starting values for more sophisticated algorithms.
One such local optimizer is fmin, which uses the derivative-free Nelder-Mead downhill simplex algorithm.

Optimization with fmin

General structure:

    xopt = spo.fmin(function, start_values,
                    xtol=..., ftol=..., maxiter=..., maxfun=...)
    # with full_output=True, fmin also returns fopt (the function value at the optimum)

1 xtol: relative error in the argument acceptable for convergence.
2 ftol: relative error in the function value acceptable for convergence.
3 maxiter: maximum number of iterations to perform.
4 maxfun: maximum number of function evaluations to make.

We can use the global (brute-force) optimization result as starting values:

    min_2 = spo.fmin(fm, min_1, xtol=0.001, ftol=0.001)

Caveats

• Local optimization routines can get stuck in local extrema...
• ... or they may never converge.
• It is a good idea to perform a global optimization first to pinpoint the neighborhood of the global minimum.
• What happens if we start fmin with (2, 2) as starting values? (Try the sketch below.)
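A sketch to try it out (fm is restated so the snippet runs on its own); starting near (2, 2), fmin typically converges to a nearby local minimum rather than the global one found on the brute-force grid:

    import numpy as np
    import scipy.optimize as spo

    def fm(xy):
        return np.sin(xy[0]) + 1 / 20.0 * xy[0] ** 2 \
            + np.sin(xy[1]) + 1 / 20.0 * xy[1] ** 2

    # started at (2, 2), the simplex walks downhill into the closest basin
    min_local = spo.fmin(fm, (2.0, 2.0), xtol=0.001, ftol=0.001)
    print(min_local, fm(min_local))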

Constrained optimization

Most of the time, we look for optimal values of a function under a set of constraints.

Problem
There are two securities, A and B; both cost 10 today. Tomorrow there are two equally likely states of the world: g or b. In state g, A = 15 and B = 5. In state b, A = 5 and B = 12. Assume an investor has 100 units of cash today and utility u(w) = √w. What is his optimal investment?

Application: Formalization of the problem

    max_{a,b} E u(w_1) = max_{a,b}  (1/2) √(15a + 5b) + (1/2) √(5a + 12b),        (4)

subject to:

    10a + 10b ≤ 100.        (5)

Application: Python implementation (Solution)
First, define the function. Note: we want to maximize rather than minimize expected utility! That is, we minimize the negative of the expected utility:

    def exp_u(ab):
        return -(0.5 * np.sqrt(ab[0] * 15 + ab[1] * 5)
                 + 0.5 * np.sqrt(ab[0] * 5 + ab[1] * 12))

Second, define the constraint as a dict variable with an implicit (lambda) function. The inequality sign is always implicitly "≥ 0":

    cons = ({'type': 'ineq', 'fun':
             lambda ab: 100 - ab[0] * 10 - ab[1] * 10},)

Third, choose starting values:

    startval = [5, 5]

Fourth, run the minimize function from the optimization package:

    result = spo.minimize(exp_u, startval,
                          method='SLSQP', constraints=cons)
Notes

• method stands for the optimization algorithm. SLSQP (Sequential Least SQuares Programming) allows one to introduce constraints.
• One can specify the Jacobian (jac) or the Hessian matrix (hess) directly.
• In addition, bounds for the arguments can be specified via bounds.

Output attributes (a usage sketch follows the list):
1 result.fun returns the optimal function value.
2 result.x returns the arguments corresponding to the optimum.
3 result.success returns True if the optimization completed successfully.
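Putting the pieces together, a self-contained sketch of how the output might be inspected; the budget-slack check at the end is an extra sanity check added here, not part of the original slides:

    import numpy as np
    import scipy.optimize as spo

    def exp_u(ab):
        return -(0.5 * np.sqrt(ab[0] * 15 + ab[1] * 5)
                 + 0.5 * np.sqrt(ab[0] * 5 + ab[1] * 12))

    cons = ({'type': 'ineq', 'fun': lambda ab: 100 - ab[0] * 10 - ab[1] * 10},)
    result = spo.minimize(exp_u, [5, 5], method='SLSQP', constraints=cons)

    print(result.x)                    # optimal holdings (a, b)
    print(-result.fun)                 # maximized expected utility (sign flipped back)
    print(result.success)              # True if the optimizer converged
    print(100 - 10 * result.x.sum())   # budget slack; close to 0 if the constraint binds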

Comment

• spo.fmin works reasonably well for problems of low dimension.
• For higher-dimensional problems, it will not work properly.
• If you have a high-dimensional problem for which you know that the objective function and the constraints are convex, use the package cvxopt.
• Several thousand dimensions are not a problem for cvxopt.
• See https://cvxopt.org
• However, this is outside the scope of this lecture.

Numerical integration

Numerical integration is done via the scipy.integrate package:

    import scipy.integrate as integr

There are several methods to numerically integrate a function (say f(x) = sin(x) + x/2): fixed-order Gaussian quadrature, adaptive quadrature, Romberg integration, ...
All are approximations of the same quantity, though:

    integr.fixed_quad(f, lmin, lmax)[0]
    integr.quad(f, lmin, lmax)[0]
    integr.romberg(f, lmin, lmax)   # romberg returns the value directly, not a tuple
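A self-contained sketch; the integration limits 0 and 2π are assumptions chosen for illustration, since lmin and lmax are not specified above (note that romberg is deprecated in recent SciPy releases):

    import numpy as np
    import scipy.integrate as integr

    def f(x):
        return np.sin(x) + 0.5 * x

    lmin, lmax = 0.0, 2 * np.pi   # assumed bounds, for illustration only

    print(integr.fixed_quad(f, lmin, lmax)[0])   # fixed-order Gaussian quadrature
    print(integr.quad(f, lmin, lmax)[0])         # adaptive quadrature
    print(integr.romberg(f, lmin, lmax))         # Romberg integration
    # exact value: -cos(2*pi) + cos(0) + 0.25 * (2*pi)**2 = (2*pi)**2 / 4 ≈ 9.8696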
