Introduction to other file types
INTRODUCTION TO IMPORTING DATA IN PYTHON
Hugo Bowne-Anderson
Data Scientist at DataCamp
Other file types
Excel spreadsheets
MATLAB files
SAS files
Stata files
HDF5 files
Pickled files
File type native to Python
Motivation: many datatypes for which it isn’t obvious how to store them
Pickled files are serialized
Serialize = convert object to bytestream
Pickled files
import pickle
with open('pickled_fruit.pkl', 'rb') as file:
    data = pickle.load(file)
print(data)
{'peaches': 13, 'apples': 4, 'oranges': 11}
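The idea of serializing an object to a bytestream and reading it back can be sketched as a round trip. This is a minimal illustration, assuming a made-up `inventory` dictionary and a temporary file named `inventory.pkl`; neither comes from the slides.

```python
import os
import pickle
import tempfile

# Hypothetical nested object with mixed types: no obvious flat-file layout.
inventory = {
    'fruit': {'peaches': 13, 'apples': 4, 'oranges': 11},
    'tags': ('fresh', 'local'),
    'ratio': 0.75,
}

# Serialize in memory: the result is a bytes object (the "bytestream").
payload = pickle.dumps(inventory)
print(type(payload))

# Serialize to disk and read it back; note the binary modes 'wb' and 'rb'.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'inventory.pkl')  # hypothetical file name
    with open(path, 'wb') as file:
        pickle.dump(inventory, file)
    with open(path, 'rb') as file:
        restored = pickle.load(file)

# The restored object compares equal to the original, tuple included.
print(restored == inventory)
```

Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.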
Importing Excel spreadsheets
import pandas as pd
file = 'urbanpop.xlsx'
data = pd.ExcelFile(file)
print(data.sheet_names)
['1960-1966', '1967-1974', '1975-2011']
df1 = data.parse('1960-1966') # sheet name, as a string
df2 = data.parse(0) # sheet index, as an integer
You’ll learn:
How to customize your import
Skip rows
Import certain columns
Change column names
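The three customizations listed above might look like the following sketch. It assumes the `openpyxl` engine is installed, and it builds a small throwaway workbook with made-up rows so the example is self-contained; the file name, data, and new column names are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up sheet contents: a note row, a header row, then two data rows.
rows = [
    ['Source: made-up numbers', None, None],
    ['country', 'code', 'pop_1960'],
    ['Aland', 'AL', 10.5],
    ['Borduria', 'BO', 20.25],
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.xlsx')  # hypothetical file name
    pd.DataFrame(rows).to_excel(
        path, sheet_name='1960-1966', header=False, index=False
    )

    with pd.ExcelFile(path) as xls:
        df = xls.parse(
            '1960-1966',
            skiprows=1,       # skip the note row at the top
            usecols=[0, 2],   # import only the first and third columns
            names=['Country', 'Urban population 1960'],  # change column names
        )

print(df)
```

Here `skiprows`, `usecols`, and `names` are passed to `parse()`; the same keyword arguments are accepted by `pd.read_excel()`.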
Let's practice!
Importing SAS/Stata files using pandas
SAS and Stata files
SAS: Statistical Analysis System
Stata: “Statistics” + “data”
SAS: business analytics and biostatistics
Stata: academic social sciences research
SAS files
Used for:
Advanced analytics
Multivariate analysis
Business intelligence
Data management
Predictive analytics
Standard for computational analysis
Importing SAS files
import pandas as pd
from sas7bdat import SAS7BDAT
with SAS7BDAT('urbanpop.sas7bdat') as file:
    df_sas = file.to_data_frame()
Importing Stata files
import pandas as pd
data = pd.read_stata('urbanpop.dta')
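To try `pd.read_stata()` without a real dataset, one option is to write a small DataFrame to a `.dta` file first and read it back. The following is a sketch under that assumption; the column names, values, and file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up numeric data standing in for a real Stata dataset.
original = pd.DataFrame({
    'year': [1960, 1961],
    'urban_pop': [1.5, 2.5],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.dta')  # hypothetical file name
    original.to_stata(path, write_index=False)  # write a Stata file
    restored = pd.read_stata(path)              # read it back

print(restored)
```

The values survive the round trip, although integer columns may come back with a different integer width than they started with.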
Let's practice!
Importing HDF5 files
HDF5 files
Hierarchical Data Format version 5
Standard for storing large quantities of numerical data
Datasets can be hundreds of gigabytes or terabytes
HDF5 can scale to exabytes
Importing HDF5 files
import h5py
filename = 'H-H1_LOSC_4_V1-815411200-4096.hdf5'
data = h5py.File(filename, 'r') # 'r' is to read
print(type(data))
<class 'h5py._hl.files.File'>
The structure of HDF5 files
for key in data.keys():
    print(key)
meta
quality
strain
print(type(data['meta']))
<class 'h5py._hl.group.Group'>
The structure of HDF5 files
for key in data['meta'].keys():
    print(key)
Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
print(data['meta']['Description'][()], data['meta']['Detector'][()]) # [()] reads the whole dataset; .value was removed in h5py 3.0
b'Strain data time series from LIGO' b'H1'
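The group/dataset hierarchy shown above can be explored on a small file built for the purpose. This sketch assumes `h5py` and NumPy are installed; the file name and stored values are made up and only loosely mimic the layout of the LIGO file.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.hdf5')  # hypothetical file name

    # Build a tiny file: two groups, each holding datasets.
    with h5py.File(path, 'w') as f:
        meta = f.create_group('meta')
        meta.create_dataset('Detector', data=b'H1')
        meta.create_dataset('Duration', data=4096)
        strain = f.create_group('strain')
        strain.create_dataset('Strain', data=np.arange(8, dtype='float64'))

    # Read it back: groups behave like dictionaries of groups and datasets.
    with h5py.File(path, 'r') as f:
        top_level = sorted(f.keys())
        meta_is_group = isinstance(f['meta'], h5py.Group)
        detector = f['meta']['Detector'][()]   # [()] reads a scalar dataset
        duration = f['meta']['Duration'][()]
        samples = f['strain/Strain'][:]        # [:] reads the array into memory

print(top_level, detector, duration, samples.shape)
```

Values are copied out inside the `with` block because datasets can no longer be read once the file is closed.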
The HDF Project
Actively maintained by the HDF Group
Based in Champaign, Illinois
Let's practice!
Importing MATLAB files
MATLAB
“Matrix Laboratory”
Industry standard in engineering and science
Data saved as .mat files
SciPy to the rescue!
scipy.io.loadmat() - read .mat files
scipy.io.savemat() - write .mat files
What is a .mat file?
Importing a .mat file
import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
print(type(mat))
<class 'dict'>
keys = MATLAB variable names
values = objects assigned to variables
print(type(mat['x']))
<class 'numpy.ndarray'>
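The key/value mapping described above can be checked with a round trip through `scipy.io.savemat()` and `scipy.io.loadmat()`. This is a minimal sketch, assuming SciPy and NumPy are installed; the variable names `x` and `y` and the file name are made up.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical "workspace": variable names mapped to arrays.
workspace = {
    'x': np.array([[1.0, 2.0], [3.0, 4.0]]),
    'y': np.array([1, 2, 3]),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'workspace_demo.mat')  # hypothetical file name
    scipy.io.savemat(path, workspace)  # write a .mat file
    mat = scipy.io.loadmat(path)       # read it back as a dict

# loadmat also adds bookkeeping keys such as '__header__'; filter them out.
user_keys = sorted(key for key in mat if not key.startswith('__'))
print(user_keys)
print(mat['x'].shape, mat['y'].shape)
```

With the default settings, the one-dimensional `y` comes back as a two-dimensional row, because MATLAB arrays have at least two dimensions.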
Let's practice!