Data Analysis With Python
Arnaud Beck
Laboratoire Leprince-Ringuet, École Polytechnique, CNRS/IN2P3
Space Science Training Week
Outline
1 Introduction
2 Using Python
3 Basic Python
4 Scipy
5 Data I/O
6 Visualization
Why come to Python?
Should I use a low-level, compiled language or an interpreted language?
Commercial or open source?

                                   C/C++   Matlab   Python
Easy and flexible                            X        X
Performance                          X
Free and available on any system     X                X
Why stick to Python?
Python is distinguished by its large and active scientific computing community.
There are people developing libraries for virtually anything.
Glue to other languages
Libraries to interface other languages (C/C++/Fortran)...
...with essentially the same performance, because the critical parts of the code are written in a lower-level language.
Parallelization
MPI
OpenMP
GPU
Data management and visualization
Data I/O in any format (HDF5, VTK, ...)
Dedicated data management libraries (scipy, pandas)
Direct visualization or interfaces with other software (Paraview, Mayavi)
Getting Python for data analysis
Basic Python distribution
Available on any Linux or Mac OS system.
Critical for data analysis
Modules: Scipy, Matplotlib
Application specific
Modules: mpi4py, VTK, PyTables, etc.
You can install a fully pre-built scientific Python environment: the Enthought Python Distribution, or Python(x,y) for Windows.
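Before starting, it helps to check which of the modules above are already installed. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical. The list contains import names: tables is the import name of PyTables, and vtk is the import name of the VTK bindings.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical wish list: the modules mentioned on this slide.
wanted = ["numpy", "scipy", "matplotlib", "mpi4py", "vtk", "tables"]
for name in missing_modules(wanted):
    print(f"not installed: {name}")
```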
Running Python
Interactive mode in a Python shell
Use of a script
Turn your Python script into a Unix script
You can compile scripts into binary .pyc (bytecode) files. Python already does this automatically for imported modules, so this is mostly useful for developers.
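To turn a script into a Unix script, make the shebang line the first line of the file and make the file executable. The following is a minimal sketch. The file name analyse.py and its content are hypothetical.

```python
#!/usr/bin/env python3
"""analyse.py: hypothetical example script that prints the mean of its arguments."""
import sys


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def main(argv):
    """Convert the command-line arguments to floats and print their mean."""
    if not argv:
        print("usage: analyse.py x1 x2 ...")
        return 1
    print(mean([float(a) for a in argv]))
    return 0


if __name__ == "__main__":
    # The return value could be passed to sys.exit() to set the exit status.
    main(sys.argv[1:])
```

After `chmod +x analyse.py`, running `./analyse.py 1 2 3` prints 2.0. Running `python -m py_compile analyse.py` writes the compiled bytecode into a `__pycache__` directory.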
IPython: a convenient and comfortable Python shell
Interesting features
Command history
Any shell (xterm) command accessible via !
Command auto-completion
Quick help through the use of ?
Inline and interactive graphics
Timing and profiling tools
Many, many more...
The best tool for exploring, debugging or working interactively. Have a look!
IPython example
Python is an object-oriented language
In Python, we do things with stuff!
things = operations
stuff = objects

Type       Example
Numbers    128, 3.14, 4+5j
Strings    'Rony', "Giovannis"
Lists      [1,"string",2.45]
Tuples     (1,"string",2.45)

Strings, Lists and Tuples are sequences.
Strings and Tuples are immutable.
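The table can be checked directly in the interpreter. The following minimal sketch builds one object of each type and shows that a list can be modified in place, while item assignment on a tuple or a string raises TypeError. The helper is_immutable is hypothetical and exists only for illustration.

```python
# One literal of each built-in type from the table.
examples = {
    "int": 128,
    "float": 3.14,
    "complex": 4 + 5j,
    "str": 'Rony',
    "list": [1, "string", 2.45],
    "tuple": (1, "string", 2.45),
}

for name, obj in examples.items():
    print(f"{obj!r:>22} is of type {type(obj).__name__}")

# Lists are mutable: an item can be replaced in place.
items = [1, "string", 2.45]
items[0] = 42


def is_immutable(seq):
    """Hypothetical helper: True if assigning to seq[0] raises TypeError."""
    try:
        seq[0] = seq[0]
    except TypeError:
        return True
    return False
```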
Numbers
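The original example for this slide was not preserved. The following sketch, which is not the author's example, illustrates the main behaviours of Python numbers: arbitrary-precision integers, true versus floor division, floating-point comparison, and complex numbers.

```python
import math

# Integers have arbitrary precision: no overflow.
big = 2 ** 100

# "/" always returns a float, "//" is floor division and "%" the remainder.
ratio = 7 / 2              # 3.5
floor_div = 7 // 2         # 3
remainder = 7 % 2          # 1
negative_floor = -7 // 2   # -4: floor division rounds toward minus infinity

# Floats are double precision: compare them with a tolerance.
close = math.isclose(0.1 + 0.2, 0.3)   # True, whereas 0.1 + 0.2 == 0.3 is False

# Complex numbers use j for the imaginary unit.
z = 4 + 5j
modulus = abs(3 + 4j)      # 5.0
conjugate = z.conjugate()  # (4-5j)

# Explicit conversions between numeric types.
truncated = int(3.99)      # 3 (truncation toward zero)
parsed = float("2.5")      # 2.5
```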
Strings
Ordered collection (or sequence) of characters
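Because a string is a sequence, it supports indexing, `len` and the `in` test. Operators always build new strings. The following is a minimal sketch; the sample text is arbitrary.

```python
s = "Space Science"

first = s[0]          # 'S': indexing starts at 0
last = s[-1]          # 'e': negative indices count from the end
length = len(s)       # 13
chars = list("abc")   # ['a', 'b', 'c']: a string is a sequence of characters

# Operators build new strings; the original string is never modified.
joined = s + " Training Week"
repeated = "ab" * 3             # 'ababab'
contains = "Science" in s       # True

# Strings are immutable: s[0] = "s" would raise TypeError.
```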
String Methods
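The original list of methods on this slide was lost. The following sketch uses a few standard `str` methods that are common when parsing text data. The input line and its key=value layout are hypothetical.

```python
line = "  x=1.5, y=2.0, z=-3.25  \n"

clean = line.strip()                    # remove surrounding whitespace and newline
fields = clean.split(", ")              # ['x=1.5', 'y=2.0', 'z=-3.25']
pairs = [f.split("=") for f in fields]  # [['x', '1.5'], ...]
values = {k: float(v) for k, v in pairs}

header = ",".join(values.keys())        # 'x,y,z'
renamed = clean.replace("=", " = ")     # every occurrence is replaced
found = clean.find("y")                 # index of the first 'y' (-1 if absent)
shouted = clean.upper()
starts = clean.startswith("x")
```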
Lists
Sequence of any objects
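Unlike strings and tuples, lists are mutable, and they can hold any object, including other lists. The following minimal sketch shows the most common operations.

```python
data = [1, "string", 2.45]
data.append([10, 20])      # any object, even another list
data[0] = 100              # lists are mutable
data.extend((7, 8))        # add several items at once
last = data.pop()          # removes and returns the last item (8)
nested_item = data[3][1]   # 20: index into the inner list
position = data.index("string")

numbers = [3, 1, 2]
numbers.sort()                        # in-place sort
squares = [n ** 2 for n in numbers]   # list comprehension
```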
Slices
Manipulating sequences
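A slice `seq[start:stop:step]` includes `start` and excludes `stop`. Slices work on every sequence type. The following minimal sketch shows the usual forms.

```python
seq = list(range(10))      # [0, 1, ..., 9]

middle = seq[2:5]          # [2, 3, 4]
head = seq[:3]             # [0, 1, 2]
tail = seq[7:]             # [7, 8, 9]
last_two = seq[-2:]        # [8, 9]
evens = seq[::2]           # [0, 2, 4, 6, 8]
backwards = seq[::-1]      # reversed copy
copy = seq[:]              # shallow copy of the whole list

word = "Python"[1:4]               # 'yth': slicing a string
tup = (1, "string", 2.45)[1:]      # ('string', 2.45): slicing a tuple

# Lists are mutable, so a slice can also be assigned.
seq[0:2] = ["a", "b"]
```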
Importing modules
Modules define new object types and operations.
The large and growing community of Python users provides an increasing number of modules that already do what you need.
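The following minimal sketch shows the usual `import` forms. It uses standard-library modules, plus NumPy under the conventional alias `np`.

```python
import math                      # whole module: use math.<name>
from math import pi, sqrt        # selected names, used without prefix
import numpy as np               # module under a shorter alias
from collections import Counter  # a new object type defined by a module

area = pi * 2.0 ** 2
root = sqrt(16.0)                # 4.0
cosine = math.cos(0.0)           # 1.0
vector = np.array([1.0, 2.0, 3.0])
counts = Counter("mississippi")  # counts occurrences of each character
```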
The Scipy module
Scipy is a collection of powerful, high-level functions for mathematics and data management. It is based on the numpy.ndarray object type and on vectorized operations. These operations are optimized and coded in C to deliver high performance.
If you are using a for loop over array elements, you are probably doing something wrong!
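To see why loops are discouraged, you can compare an explicit Python loop with the equivalent vectorized NumPy call. The following is a minimal sketch. The array size and function names are arbitrary. The timings depend on the machine, so no figures are quoted here.

```python
import timeit

import numpy as np

x = np.linspace(0.0, 1.0, 200_001)


def sum_of_squares_loop(values):
    """Explicit Python loop: every iteration goes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    return float(np.sum(values * values))


t_loop = timeit.timeit(lambda: sum_of_squares_loop(x), number=3)
t_vec = timeit.timeit(lambda: sum_of_squares_vectorized(x), number=3)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")

# Element-wise expressions are vectorized too: no loop is needed.
y = np.sin(2.0 * np.pi * x) * np.exp(-x)
```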
Creating an ndarray
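The slide's original example was lost. The following sketch lists the standard NumPy constructors. The random generator call `default_rng` requires NumPy 1.17 or later.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # from nested lists, shape (2, 3)
f = np.array([1, 2, 3], dtype=float)   # explicit element type (float64)
z = np.zeros((3, 4))                   # 3 x 4 array filled with 0.0
o = np.ones(5, dtype=np.int32)         # 5 ones stored as 32-bit integers
r = np.arange(0, 10, 2)                # like range(): [0 2 4 6 8]
lin = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced points, both ends included
e = np.empty((2, 2))                   # uninitialized memory: fill before use

rng = np.random.default_rng(seed=0)
u = rng.random((100, 100))             # uniform random numbers in [0, 1)

print(a.shape, a.ndim, a.size, a.dtype)
```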
Manipulating ndarrays
Slicing is still the basis of array manipulation.
Reshape → Change the number and size of the dimensions of the array.
Sort → Quite self-explanatory.
Delete, insert, append → Remove or add parts of the array.
Squeeze, flatten, ravel → More ways to control the dimensionality of the array.
Transpose, swapaxes, rollaxis → More ways to arrange the dimensions as you want.
These functions are important because well-arranged data is quickly processed data.
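The following minimal sketch applies each operation in the list to small arrays. Note that `delete`, `insert` and `append` return new arrays and do not modify the original in place. Also, `ravel` returns a view when it can, whereas `flatten` always copies. Recent NumPy documentation recommends `np.moveaxis` over `np.rollaxis`.

```python
import numpy as np

a = np.arange(12)                  # [0 1 ... 11], shape (12,)
m = a.reshape(3, 4)                # same data seen as 3 rows x 4 columns
col = m[:, 1]                      # slicing: second column -> [1 5 9]

s = np.sort(np.array([3, 1, 2]))   # sorted copy
d = np.delete(a, [0, 1])           # new array without the first two items
i = np.insert(a, 0, -1)            # new array with -1 prepended
p = np.append(a, [12, 13])         # new array with two extra items

sq = np.zeros((1, 3, 1)).squeeze() # drop length-1 axes -> shape (3,)
f = m.flatten()                    # 1-D copy
r = m.ravel()                      # 1-D view when possible (no copy)

t = m.T                            # transpose -> shape (4, 3)
cube = np.zeros((2, 3, 4))
sw = np.swapaxes(cube, 0, 2)       # exchange axes 0 and 2 -> shape (4, 3, 2)
ro = np.rollaxis(cube, 2)          # move axis 2 to the front -> shape (4, 2, 3)
mv = np.moveaxis(cube, 2, 0)       # modern equivalent of the line above
```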
Extracting information from your data
Intersection (convenient for filtering)
Histograms (perfect for distribution functions)
Convolution
Integration
Interpolation
Name it...
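The following sketch shows one NumPy call per item in the list. The data are synthetic. The trapezoidal rule is written out by hand so that the example does not depend on a specific library version. `scipy.integrate` and `scipy.interpolate` offer more complete alternatives for integration and interpolation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Intersection: keep the (hypothetical) particle IDs present in both selections.
ids_a = np.array([1, 4, 7, 9, 12])
ids_b = np.array([2, 4, 9, 12, 15])
common = np.intersect1d(ids_a, ids_b)

# Histogram: an estimate of a distribution function.
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
counts, edges = np.histogram(samples, bins=50, range=(-5.0, 5.0))

# Convolution: smooth a noisy signal with a 5-point moving average.
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 200)) + 0.1 * rng.normal(size=200)
smooth = np.convolve(signal, np.ones(5) / 5, mode="same")

# Integration: trapezoidal rule for the integral of sin(x) over [0, pi].
x = np.linspace(0.0, np.pi, 1001)
y = np.sin(x)
integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # close to 2

# Interpolation: linear interpolation at new abscissas.
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([0.0, 10.0, 20.0])
interp = np.interp([0.5, 1.5], xp, fp)
```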
Reading data
The whole game is to fit your data into an ndarray.
data = numpy.fromfile("file", dtype=numpy.float32, count=-1, sep=" ")
Works with raw binary files (sep="") and ASCII files (a separator such as sep=" "), but is not very flexible.
data = numpy.loadtxt("file", skiprows=0, delimiter=",")
More flexible, but works only with text files.
Both are NumPy functions. Older SciPy versions also exposed them as scipy.fromfile and scipy.loadtxt.
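The following minimal sketch runs both readers on small files that it writes itself. The file names and contents are hypothetical.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()

# Comma-separated text file with a one-line header.
csv_path = os.path.join(tmpdir, "measurements.csv")
with open(csv_path, "w") as fid:
    fid.write("time,density\n0.0,1.5\n0.1,1.7\n0.2,1.6\n")

table = np.loadtxt(csv_path, skiprows=1, delimiter=",")   # shape (3, 2)
t, density = table[:, 0], table[:, 1]

# Whitespace-separated ASCII values read with fromfile and a separator.
txt_path = os.path.join(tmpdir, "values.txt")
with open(txt_path, "w") as fid:
    fid.write("1.0 2.0 3.0")
ascii_values = np.fromfile(txt_path, dtype=np.float64, sep=" ")

# Raw binary float32 values: the default sep="" means binary.
raw_path = os.path.join(tmpdir, "field.bin")
np.arange(6, dtype=np.float32).tofile(raw_path)
field = np.fromfile(raw_path, dtype=np.float32, count=-1)
```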
The file object
The file object is a basic Python type. It is created by
fid = open("filename","r")
"r" for read, "w" for write.
fid.readline() → reads one line into a string
fid.readlines() → reads all lines into a list of strings
fid.tell() → returns the file's current position (in bytes)
fid.seek(n) → goes to position n
fid.read() → reads the whole file into a string
fid.close() → closes the file
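The following minimal sketch applies the methods above to a small temporary file. It uses the `with` statement, which closes the file automatically. The file content is hypothetical. In text mode, a value returned by `tell()` should only be passed back to `seek()`, or replaced by 0.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" creates the file, or truncates it if it already exists.
with open(path, "w") as fid:
    fid.write("first line\nsecond line\nthird line\n")

# The with statement calls fid.close() automatically, even after an error.
with open(path, "r") as fid:
    first = fid.readline()     # 'first line\n'
    mark = fid.tell()          # position just after the first line
    rest = fid.readlines()     # ['second line\n', 'third line\n']
    fid.seek(mark)             # jump back to the remembered position
    again = fid.readline()     # 'second line\n'
    fid.seek(0)                # back to the beginning
    everything = fid.read()    # the whole file as one string
```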
Manipulating a file
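The slide's original example was not preserved. The following hypothetical illustration shows a common pattern: read a text file line by line, skip comments and blank lines, and collect the columns into an ndarray. The file layout (a step number and an energy per line) is an assumption.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "run.log")
with open(path, "w") as fid:
    fid.write("# step energy\n1 0.50\n2 0.75\n\n3 0.90\n")

steps, energies = [], []
with open(path) as fid:
    for line in fid:                    # iterate lazily, one line at a time
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # skip blank lines and comments
        step, energy = line.split()     # whitespace-separated columns
        steps.append(int(step))
        energies.append(float(energy))

energies = np.array(energies)           # fit the result into an ndarray
```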
Quick words about reading HDF5 files
Reading HDF5 files is module-dependent. You can use, for instance, either tables (PyTables) or h5py.
These modules coexist well with Scipy and load data directly into an ndarray.
tables example
Writing data
numpy.save("file", ndarray) and numpy.load("file.npy") store arrays in NumPy's binary .npy format. Older SciPy versions exposed these as scipy.save and scipy.load.
ndarray.tofile("file") stores an array in a text file or a raw binary file.
fileobject.write("any_string") writes a string to a text file.
The h5py and tables modules are used to write HDF5 files.
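The following minimal round trip covers the three non-HDF5 options. The file names are hypothetical. Only the `.npy` format stores the dtype and shape, so the files written by `tofile` must be read back with the right dtype.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()
arr = np.linspace(0.0, 1.0, 5)

# .npy: binary format that keeps dtype and shape; save() adds the .npy suffix.
np.save(os.path.join(tmpdir, "result"), arr)
back = np.load(os.path.join(tmpdir, "result.npy"))

# tofile(): raw binary by default, text when a separator is given.
arr.tofile(os.path.join(tmpdir, "result.bin"))           # no header at all
arr.tofile(os.path.join(tmpdir, "result.txt"), sep=" ")  # space-separated text

# Plain text through a file object.
with open(os.path.join(tmpdir, "summary.txt"), "w") as fid:
    fid.write(f"mean = {arr.mean():.3f}\n")
```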
VTK script
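The original VTK script was not preserved. The following sketch takes a different approach from the vtk module: it writes the legacy ASCII VTK format by hand, which ParaView and VisIt can open. The function name, the field name and the sample data are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_vtk_structured_points(path, scalars, name="density", spacing=(1.0, 1.0, 1.0)):
    """Write a 3-D array as a legacy ASCII VTK file (STRUCTURED_POINTS dataset).

    scalars[i, j, k] holds the value at grid point (x_i, y_j, z_k). VTK expects
    x to vary fastest, hence the Fortran-order ravel below.
    """
    nx, ny, nz = scalars.shape
    values = scalars.ravel(order="F")
    with open(path, "w") as fid:
        fid.write("# vtk DataFile Version 3.0\n")
        fid.write("Data written from Python\n")
        fid.write("ASCII\n")
        fid.write("DATASET STRUCTURED_POINTS\n")
        fid.write(f"DIMENSIONS {nx} {ny} {nz}\n")
        fid.write("ORIGIN 0 0 0\n")
        fid.write("SPACING {} {} {}\n".format(*spacing))
        fid.write(f"POINT_DATA {values.size}\n")
        fid.write(f"SCALARS {name} float 1\n")
        fid.write("LOOKUP_TABLE default\n")
        fid.write("\n".join(f"{v:.6g}" for v in values) + "\n")


# Hypothetical 3-D Gaussian sampled on an 8 x 6 x 4 grid.
x, y, z = np.meshgrid(
    np.linspace(-1.0, 1.0, 8),
    np.linspace(-1.0, 1.0, 6),
    np.linspace(-1.0, 1.0, 4),
    indexing="ij",
)
path = os.path.join(tempfile.mkdtemp(), "field.vtk")
write_vtk_structured_points(path, np.exp(-(x**2 + y**2 + z**2)))
```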
Visualization workflow
Raw data → (Python) → Postprocessed data → (Python) → Formatted data file → Visualization software (Paraview, VisIt, Mayavi...) → Visualization
Postprocessed data → (Python, matplotlib) → Plot
Matplotlib: the figure object
fig = figure([options])
Options include:
Size in inches (figsize)
Dpi (dpi)
Face and edge colors (facecolor, edgecolor)
Frame layout
Operations include (pyplot functions acting on the current figure):
Title and axis labels
xlabel("string")
Axis ticks and extent
xticks(ndarray)
Display a colorbar
colorbar()
Display a legend
legend()
Save figure (png or eps)
savefig("figure.png")
[Example figure: Injected Charge, Bubble Size and Laser Amplitude, normalized to maximum, versus propagation length [mm]]
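The same operations can also be written in Matplotlib's object-oriented style, with methods such as `ax.set_xlabel` and `ax.set_xticks` on an Axes object. The following is a minimal sketch. The curves are hypothetical stand-ins, not the data of the figure shown on the slide. The Agg backend renders without a display, and the PNG is written to an in-memory buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")              # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

length = np.linspace(0.0, 15.0, 100)      # propagation length [mm]
charge = 1.0 - np.exp(-length / 5.0)      # hypothetical curves
amplitude = np.exp(-length / 10.0)

fig = plt.figure(figsize=(6, 4), dpi=100, facecolor="white", edgecolor="black")
ax = fig.add_subplot(1, 1, 1)
ax.plot(length, charge / charge.max(), label="Injected charge")
ax.plot(length, amplitude / amplitude.max(), label="Laser amplitude")
ax.set_title("Normalized quantities")
ax.set_xlabel("Propagation length [mm]")
ax.set_ylabel("Normalized to maximum")
ax.set_xticks(np.arange(0, 16, 5))
ax.set_xlim(0.0, 15.0)
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")         # or fig.savefig("figure.eps")
plt.close(fig)
```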
Matplotlib: Simple plots
plot(x,y,[options])
If x is omitted, default is x=range(len(y)).
All typical options are here: lines (style, color, width...), markers (size, shape, colors...), labels for legend, antialiasing, transparency, many more...
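The following minimal sketch shows the default x values and some of the styling keywords listed above. The data are arbitrary, and the keyword names are standard `Line2D` properties.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

fig, ax = plt.subplots()
# x omitted: the points are plotted against 0, 1, ..., len(y) - 1.
(line_default,) = ax.plot(y)

# Explicit x with typical styling options.
x = np.arange(len(y)) * 0.5
(line_styled,) = ax.plot(
    x, y,
    linestyle="--", color="tab:red", linewidth=2.0,      # line style
    marker="o", markersize=6, markerfacecolor="white",   # markers
    label="samples", antialiased=True, alpha=0.7,        # legend, rendering
)
ax.legend()
plt.close(fig)
```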
Matplotlib 2D plots: imshow and pcolor
ar2d = rand(100, 100)
imshow(ar2d, [options])
pcolor(ar2d, [options])
[Figure: the same random 100 × 100 array displayed with imshow (left) and pcolor (right), each with a colorbar from 0.0 to 1.0]
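Python names cannot start with a digit, and the pylab function `rand` takes the dimensions as separate arguments (it is `numpy.random.rand`). The following sketch uses NumPy's newer random generator instead. The two panels differ in two ways. `imshow` draws the array as an image of equal pixels, with the origin at the upper left by default. `pcolor` draws one polygon per cell, with the origin at the lower left. `pcolor` also accepts non-uniform X and Y coordinates, but it is much slower on large arrays.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
ar2d = rng.random((100, 100))          # values in [0, 1)

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

im = left.imshow(ar2d)                 # row 0 drawn at the top
fig.colorbar(im, ax=left)
left.set_title("imshow")

pc = right.pcolor(ar2d)                # row 0 drawn at the bottom
fig.colorbar(pc, ax=right)
right.set_title("pcolor")

plt.close(fig)
```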
2D plots with a little bit of tuning
[Figure: tuned 2D color map with axes x - ct [m] and y [m], and a colorbar]
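The following sketch shows typical tuning options: physical axes through `extent`, `origin="lower"`, a fixed color range, a chosen colormap and a labelled colorbar. The field is a hypothetical Gaussian on an (x - ct, y) grid, not the data of the original figure. The axis ranges are also assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical field: xi = x - ct in [-60, 0] m, y in [-60, 60] m.
xi = np.linspace(-60.0, 0.0, 241)
y = np.linspace(-60.0, 60.0, 481)
XI, Y = np.meshgrid(xi, y)                    # shape (len(y), len(xi))
field = 3.0 * np.exp(-((XI + 30.0) ** 2 + Y ** 2) / 200.0)

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(
    field,
    origin="lower",                           # row 0 at the bottom
    extent=(xi[0], xi[-1], y[0], y[-1]),      # physical axes instead of indices
    aspect="equal",
    cmap="viridis",
    vmin=0.0, vmax=3.0,                       # fixed color range
    interpolation="nearest",
)
cbar = fig.colorbar(im, ax=ax)
cbar.set_label("field [arb. units]")
ax.set_xlabel("x - ct [m]")
ax.set_ylabel("y [m]")
ax.set_xticks([-60, -40, -20, 0])
plt.close(fig)
```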
Other features of matplotlib
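Matplotlib has native LaTeX rendering
label = r"$\alpha_i + \beta^2$"
Matplotlib's built-in mathtext engine renders any TeX math written between dollar signs, without a LaTeX installation. Setting rcParams["text.usetex"] = True switches to a full LaTeX installation when one is available.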
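The following minimal sketch uses mathtext in a legend entry and in axis labels. Raw strings (`r"..."`) keep the backslashes of TeX commands intact. Drawing the canvas forces every label to be parsed, so an unsupported command would raise an error at that point. The plotted function is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(t, np.sin(t) ** 2, label=r"$\sin^2(\omega t)$")
ax.set_xlabel(r"$\omega t$ [rad]")
ax.set_ylabel(r"$\alpha_{\mathrm{norm}}$")
ax.legend()
fig.canvas.draw()      # renders every label with mathtext
plt.close(fig)
```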
The future of visualization in Python
It is an extremely vast, active and changing domain.
New modules are emerging: Chaco, MayaVi, Bokeh, stressing interactivity and dynamic data visualization in web browsers and in 3D.
What you saw today is extremely basic and is only a tiny part of what Python is capable of.