JUPYTER (IPYTHON)
NOTEBOOK CHEATSHEET
About Jupyter Notebooks
The Jupyter Notebook is a web application that allows you to create and
share documents that contain executable code, equations, visualizations
and explanatory text.
This guide walks you through the basics of using Jupyter Notebooks locally
(running Python 3, Pandas, matplotlib and Pandas Treasure Data Connector)
as a data analysis and visualization control panel with Treasure Data as your
data backend. (To run Jupyter notebooks remotely, just omit the setup
steps and surf directly to your online service in your browser.)
About Treasure Data
Treasure Data is a data collection, data storage and data analytics cloud
service, which integrates easily with Jupyter Notebooks. As this guide
shows, you can use Jupyter Notebooks as a flexible control panel for your
data analytics running on Treasure Data.
This Guide’s Audience
1. Newcomers to Jupyter: This guide shows how to get your first Jupyter
notebook up and running.
2. Data scientists trying to analyze larger datasets: If you are maxing out on
memory/disk on your local Jupyter notebook, integrating Treasure Data with
your Jupyter notebook can help you to scale.
3. Educators: Jupyter is an excellent teaching tool for data analysis, and this
guide can be used as a teaching aid.
Pre-Setup
To use Treasure Data with Jupyter Notebooks, you should sign up for Treasure
Data and get your master API key from Treasure Data Console.
You can also install Conda to manage your environment and install
dependencies.
Next, run the following at the command line to set up your virtual Python
environment “analysis” and switch to it:
$ export MASTER_TD_API_KEY="<YOUR_TREASUREDATA_MASTER_API_KEY>"
$ conda create -n analysis python=3
$ source activate analysis
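Before launching a notebook, it can help to confirm that the key is visible to Python. The following is a minimal sketch, assuming the variable name MASTER_TD_API_KEY from the export step above; the helper name get_td_api_key is hypothetical and is not part of any Treasure Data library.

```python
import os


def get_td_api_key(env=None, name="MASTER_TD_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    get_td_api_key is a hypothetical helper used only for illustration.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key or key.startswith("<"):
        # Empty, or still the unedited "<YOUR_...>" placeholder from the guide.
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key
```

Calling get_td_api_key() with no arguments reads the real environment; passing a dict makes the check easy to try out without touching your shell.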
Conda Setup
Install dependencies and launch your Jupyter notebook
# install dependencies
(analysis)$ conda install pandas
(analysis)$ conda install matplotlib
(analysis)$ conda install -c https://conda.anaconda.org/anaconda seaborn
(analysis)$ conda install ipython-notebook
(analysis)$ pip install pandas-td
#launch notebook
(analysis)$ ipython notebook
Cleanup and basic administration
These are the steps you’d take to clean up or manage your virtual environment:
#deactivate environment
(analysis)$ source deactivate
#remove environment
$ conda remove --name analysis --all
#list environments
$ conda info --envs
or
$ conda env list
Jupyter Notebooks
Creating a new notebook
1. Launch Jupyter Notebook by typing ipython notebook at the console.
2. In the top right corner, click New -> Python 3 (or Python 2).
Executing Commands
To execute a command (and embed its code), enter the Python code into the
current cell and hit Shift + Return, or click the play button in the toolbar.
Embedding Text
Text is embedded by selecting Markdown in the toolbar dropdown list,
typing text into the cell, and hitting Shift + Return.
Embedding Raw Text
Raw text is embedded by selecting Raw NBConvert in the toolbar dropdown
list, typing text into the cell, and hitting Shift + Return.
Embedding Links
Links are created by selecting Markdown in the toolbar dropdown list,
typing text into the cell as shown below, and hitting Shift + Return.
Text                                                                  Yields
https://www.treasuredata.com                                          https://www.treasuredata.com
[click this link](https://www.treasuredata.com)                       click this link
[click this link](https://www.treasuredata.com "Treasure Data Home")  click this link (shows Treasure Data Home on mouseover)
Embedding Formulae & Equations as LaTeX
IPython notebook uses MathJax to render LaTeX inside HTML/Markdown.
To render equations as LaTeX:
1. Refer to the MathJax Tutorial for syntax help.
2. Place cursor in the cell where you want to type the equation.
3. Select Markdown in the toolbar dropdown list.
4. Type out your formula. You should wrap your formula string in ‘$$’.
5. Hit Shift + Return.
Formula (each yields the rendered equation)
$$c = \sqrt{a^2 + b^2}$$
$$e^{i\pi} + 1 = 0$$
$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$
$$F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k} dx$$
$$\begin{pmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^n \\
1 & a_2 & a_2^2 & \cdots & a_2^n \\
\vdots & \vdots& \vdots & \ddots & \vdots \\
1 & a_m & a_m^2 & \cdots & a_m^n
\end{pmatrix}$$
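The formulas above are all display equations. As a small additional sketch, MathJax-based Markdown cells generally also accept inline math wrapped in single '$' signs, so both forms can be mixed in one cell. The quadratic formula here is only an illustrative choice:

```latex
The roots of $ax^2 + bx + c = 0$ are given by

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```

The first line renders the equation inside the sentence; the second renders it centered on its own line.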
IPython help and reference commands
IPython is an interactive Python environment in its own right, and a complete
tutorial is out of the scope of this guide. Its kernel is supported in the Jupyter
Notebook.
For a complete introduction to IPython, go here: http://IPython.readthedocs.org/en/stable/
Below are some commands supported in Jupyter Notebooks to get you started:
Command Description
%quickref IPython quick reference.
help() Python’s help utility.
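help() prints an object's documentation interactively. When you want the same docstring as a value you can search or log, the standard library's inspect module can fetch it. This is a minimal sketch for illustration and makes no claim about how IPython implements its own help features:

```python
import inspect

# Fetch the cleaned-up docstring of a built-in method as a plain string.
doc = inspect.getdoc(str.startswith)
print(doc)
```

In a notebook cell, help(str.startswith) shows similar text directly in the cell output.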
Keyboard Shortcuts
The following are the most used keyboard shortcuts for a Jupyter Notebook
running the Python kernel. This list changes over time; check
Help -> Keyboard Shortcuts in your notebook for the latest.
Command                                 Description
Enter                                   enter edit mode
Command + a; Command + c; Command + v   select all; copy; paste
Command + z; Command + y                undo; redo
Command + s                             save and checkpoint
Command + b; Command + a                insert cell below; insert cell above
Shift + Enter                           run cell, select below
Shift + m                               merge cells
Command + ]; Command + [                indent; dedent
Ctrl + Enter                            run cell
Option + Return                         run cell, insert cell below
Escape                                  enter command mode
Escape + d + d                          delete selected cell
Escape + y                              change cell to code
Escape + m                              change cell to markdown
Escape + r                              change cell to raw
Escape + 1                              change cell to heading 1
Escape + n                              change cell to heading n
Escape + b                              insert cell below
Escape + a                              insert cell above
Magic Commands
Here are some commonly used magic functions.
Statement             Explanation                                           Example
%magic                Comprehensively lists and explains magic functions.   %magic
%automagic            When active, enables you to call magic functions      %automagic
                      without the '%'.
%quickref             Launches the IPython quick reference.                 %quickref
%time                 Times a single statement.                             In [561]: %time method = [a for a in data if a.startswith('http')]
  returns             CPU times: user 772 µs, sys: 9 µs, total: 781 µs
                      Wall time: 784 µs
%pastebin             Pastebins lines from your current session.            %pastebin 3 18-20 ~1/1-5
  returns             Lines 3 and lines 18-20 from current session, and lines 1 - 5 from the
                      previous session to gist.github.com pastebin link.
%debug                Enters the interactive debugger.                      %debug
%hist                 Print command input (and output) history.             %hist
%pdb                  Automatically enter python debugger after any         %pdb
                      exception.
%cpaste               Opens up a special prompt for manually pasting        %cpaste
                      Python code for execution.
%reset                Delete all variables and names defined in the         %reset
                      current namespace.
%run                  Run a python script inside a notebook.                %run script.py
%who, %who_ls, %whos  Display variables defined in the interactive          %who, %who_ls, %whos
                      namespace, with varying levels of verbosity.
%xdel                 Delete a variable in the local namespace. Clear any   %xdel variable
                      references to that variable.
Timing Code (%time and %timeit)
IPython has two magic functions, %time and %timeit, that automate the
process of timing statement execution.
Given:
data = ['http://www.google.com', 'http://www.treasure-data.com',
        'ftp://myuploads.net', 'John Hammink'] * 1000
Statement  Explanation                       Example input                          Example output
%time      Times a single statement.         In [561]: %time method = [a for a      CPU times: user 772 µs, sys: 9 µs,
                                             in data if a.startswith('http')]       total: 781 µs
                                                                                    Wall time: 784 µs
%timeit    Runs a statement multiple times   %timeit [a for a in data if            1000 loops, best of 3: 713 µs
           to get an average runtime.        a.startswith('http')]                  per loop
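Outside IPython, a comparable measurement can be taken with the standard library's timeit module. The sketch below is an approximation of the "best of 3" idea, not a description of how %timeit works internally. The loop count of 100 and the helper name filter_http are assumptions chosen for illustration.

```python
import timeit

data = ['http://www.google.com', 'http://www.treasure-data.com',
        'ftp://myuploads.net', 'John Hammink'] * 1000


def filter_http(items):
    """Keep only the entries that start with 'http'."""
    return [a for a in items if a.startswith('http')]


loops = 100  # assumed loop count; %timeit picks its own automatically

# Take 3 measurements of `loops` calls each and keep the fastest one.
runs = timeit.repeat(lambda: filter_http(data), number=loops, repeat=3)
best_per_loop = min(runs) / loops
print(f"{loops} loops, best of 3: {best_per_loop * 1e6:.0f} µs per loop")
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to be running during a measurement.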
Profiling (%prun and %run -p)
Statement Explanation
%prun my_function() Runs a function (or code) within the python profiler.
%run -p my_script.py Runs a script under the profiler.
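In a plain Python script, a similar report can be produced with the standard library's cProfile and pstats modules. This is a minimal sketch; my_function below is a hypothetical placeholder workload, not something defined elsewhere in this guide.

```python
import cProfile
import io
import pstats


def my_function():
    """Hypothetical workload standing in for the code being profiled."""
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
result = profiler.runcall(my_function)  # run the function under the profiler

# Write the five most expensive entries, sorted by cumulative time, to a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost expensive calls first, which is usually the easiest place to start reading a profile.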
Matplotlib
Matplotlib is the default library for plotting in both IPython and Jupyter
Notebooks. Here’s how you would plot a graph from data you have stored in
Treasure Data.
%matplotlib inline
import os
import pandas as pd
import pandas_td as td
#Initialize our connection to Treasure Data
con = td.connect(apikey=os.environ['MASTER_TD_API_KEY'],
                 endpoint='https://api.treasuredata.com')
#Query engine for Presto
engine = con.query_engine(database='sample_datasets', type='presto')
#Get all rows from 2010-01-01 to 2010-02-01
df = td.read_td_table('nasdaq', engine, limit=None,
                      time_range=('2010-01-01', '2010-02-01'),
                      index_col='time', parse_dates={'time': 's'})
#Plot the sum of volumes, grouped by time (index level = 0)
df.groupby(level=0).volume.sum().plot()
This returns a line plot of the summed volume over time.
Seaborn
Based on Matplotlib, Seaborn has a strong focus on visualizing statistical
results such as univariate and bivariate linear regression, data matrices, time
series and more. Seaborn also offers better aesthetics by default with built-in
themes and color palettes.
You can read more about Seaborn here: http://stanford.edu/~mwaskom/software/seaborn/
#Create a violin plot
import seaborn as sns
sns.violinplot(df.groupby(level=0).volume.sum())
Pandas Treasure Data connector
Pandas-td is an open source tool that lets you connect to Treasure Data
on the backend and use Pandas from within Jupyter Notebooks. Its source is
available here: https://github.com/treasure-data/pandas-td
Connecting to Treasure Data
#Initialize our connection to Treasure Data
con = td.connect(apikey=os.environ['MASTER_TD_API_KEY'],
                 endpoint='https://api.treasuredata.com')
Listing tables
#list all tables from Treasure Data database 'sample_datasets'
con.tables('sample_datasets')
Reading a table
#set the engine to Presto
engine = con.query_engine(database='sample_datasets', type='presto')
#Get 3 lines, converting time to DateTimeIndex
td.read_td_table('nasdaq', engine, limit=3,
                 index_col='time', parse_dates={'time': 's'})
Sending queries
td.read_td_query('''
select count(*) from nasdaq where symbol='AAPL'
''', engine)
td.read_td_query('''
select time, close from nasdaq where symbol = 'AAPL'
''', engine, index_col='time', parse_dates={'time': 's'})
Sampling data from a table
#1-percent sample, with 100k rows
td.read_td_table('nasdaq', engine, limit=100000,
                 sample=0.01,
                 index_col='time', parse_dates={'time': 's'})
Dataframes
The DataFrame is a core data structure in Pandas, a popular Python data analysis
framework. Treasure Data’s Pandas connector maps DataFrames to Treasure
Data’s tables and vice versa.
Getting Data into a dataframe
#Get all rows from 2010-01-01 to 2010-02-01
df = td.read_td_table('nasdaq', engine, limit=None,
                      time_range=('2010-01-01', '2010-02-01'),
                      index_col='time', parse_dates={'time': 's'})
Grouping Data in a dataframe
#Plot the sum of volumes, grouped by time (index level = 0)
df.groupby(level=0).volume.sum().plot()
Getting number of rows in a dataframe
len(df)
Sampling data from a large set into a dataframe
#1-percent sample, with 100k rows
df = td.read_td_table('nasdaq', engine, limit=100000,
                      sample=0.01,
                      index_col='time', parse_dates={'time': 's'})
System Commands
To run any command at the system shell, simply prefix it with !, e.g.:
!ping www.google.com
!ls -l
...
Statement Explanation
!cmd Run cmd in the system shell.
b? Displays the variable’s type with a description.
b?? Displays the source code of a function, b.
output = !cmd args Run cmd and store stdout in the variable output.
%alias alias_name command Create an alias for a given command.
%bookmark Bookmark a directory from within IPython.
%cd directory Change the working directory.
%pwd Get the current working directory.
%env Get the system environment variables as a dict.
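The output = !cmd args form only works inside IPython. In an ordinary Python script, a roughly comparable result can be obtained with the standard library's subprocess module. This is a minimal sketch; the child command here simply runs the current Python interpreter so that the example does not depend on any particular shell utility.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# check=True raises CalledProcessError if the command exits with an error.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from the shell')"],
    capture_output=True,
    text=True,
    check=True,
)

# Split into lines, similar in spirit to the list of lines IPython stores.
output = completed.stdout.splitlines()
print(output)
```

Passing the command as a list avoids shell quoting problems when arguments contain spaces.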
To work with the current directory stack:
Statement         Explanation
%pushd directory  Place the current directory on the stack and change to the specified target directory.
%popd             Change to the directory popped from the stack.
%dirs             List the current directory stack.
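The stack behaviour of %pushd and %popd can be pictured with a plain Python list. The sketch below is only an illustration of the idea, not IPython's implementation; dir_stack, pushd and popd are hypothetical names.

```python
import os
import tempfile

dir_stack = []  # hypothetical stand-in for the directory stack


def pushd(target):
    """Remember the current directory, then change to target."""
    dir_stack.append(os.getcwd())
    os.chdir(target)


def popd():
    """Change back to the most recently remembered directory."""
    os.chdir(dir_stack.pop())


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    pushd(tmp)           # now working inside the temporary directory
    inside = os.getcwd()
    popd()               # back where we started, before tmp is removed
back = os.getcwd()
```

Because directories are popped in reverse order of pushing, nested pushd calls unwind cleanly back to the starting directory.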
Command history
Statement   Explanation                                   Example
%hist       Print command input (and output) history.     %hist
%dhist      Get the visited directory history.            %dhist
  returns   Directory history (kept in _dh)
            0: /Users/you/Documents/notebooks/
            1: /Users/you/Documents
_           Returns the output from the previous line.
_33         Returns the output from line 33.
_i33        Returns the input from line 33.
%pastebin   Pastebins lines from your current session.    %pastebin 3 18-20 ~1/1-5
  returns   Lines 3 and lines 18-20 from current session, and lines 1 - 5 from the
            previous session to gist.github.com pastebin link.
%logstart   Initiates logging for a session.              %logstart
IPython Debugger
The following magic commands support debugging.
Statement  Explanation                                               Example
%pdb       Automatically enter python debugger after any exception.  %pdb
%run -t    Run and time execution of a script.                       %run -t my_script.py
%run -d    Run script under the Python debugger.                     %run -d my_script.py
%run -p    Run a script under the profiler.                          %run -p my_script.py
%run -i    Run a script, giving it access to the variables from      %run -i my_script.py
           the current IPython namespace.
%debug     Enters the interactive debugger.                          %debug
Next Steps
We believe that, regardless of your experience level, it’s easy to get
started analyzing larger datasets straight from within Jupyter Notebooks.
1. Sign up for Treasure Data and get your master API key from Treasure Data
Console.
2. Get Conda as an environment manager, and install dependencies.
3. Install Treasure Data Pandas Connector and launch your first notebook.
You can also request a demo from Treasure Data. We’re here to help you get
started!
+1.866.899.5386 - [email protected]
2565 Leghorn Street, Mountain View, CA 94040 V. 1.0