Data Science Jupyter Notebook Environment

A modular, multi-target Docker environment for data science with Python 3.13. Build only what you need, from a lightweight base image to the complete full environment.

Package Manager: uv, a fast Python package manager from Astral

Quick Start

Use Prebuilt Images

Prebuilt images are published to GitHub Container Registry on every push to main:

docker run -p 8888:8888 \
  -v $(pwd)/notebooks:/home/jupyter/notebooks \
  -v $(pwd)/data:/home/jupyter/data \
  ghcr.io/wlame/jupyter-docker:scientific

Replace scientific with any other target name. See Available Targets for the full list.

Build from Source

# Build only what you need
docker build --target base -t ds-base .
docker build --target scientific -t ds-scientific .
docker build --target ml -t ds-ml .
docker build --target full -t ds-full .

Run the Container

docker run -p 8888:8888 \
  -v $(pwd)/notebooks:/home/jupyter/notebooks \
  -v $(pwd)/data:/home/jupyter/data \
  ds-scientific

Access Jupyter Lab at: http://localhost:8888
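If you launch containers from a script rather than a shell, the run command above can be assembled as an argument list. The following is a minimal sketch under stated assumptions: the function name `docker_run_argv` and its parameters are hypothetical and not part of this repository; only the port, mount paths, and image names come from this README.

```python
import shlex
from pathlib import Path

# Registry path taken from the "Use Prebuilt Images" section.
PREBUILT_REPO = "ghcr.io/wlame/jupyter-docker"


def docker_run_argv(image, workdir, port=8888, mounts=("notebooks", "data")):
    """Return the argv list for `docker run` with the documented volume mounts.

    Hypothetical helper: `image` is a local tag such as "ds-scientific" or a
    prebuilt reference such as f"{PREBUILT_REPO}:scientific".
    """
    root = Path(workdir).resolve()
    argv = ["docker", "run", "-p", f"{port}:8888"]
    for name in mounts:
        # Absolute host path on the left, container path on the right.
        argv += ["-v", f"{root / name}:/home/jupyter/{name}"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    cmd = docker_run_argv(f"{PREBUILT_REPO}:scientific", ".")
    # Print the command for inspection; subprocess.run(cmd, check=True) would start it.
    print(shlex.join(cmd))
```

Building a list instead of a shell string avoids quoting problems when the working directory contains spaces.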

Available Targets

Target	Size	Description	Inherits From
`base`	~500MB	Common utilities (JSON, HTTP, dates)	Ubuntu 24.04
`scientific`	~1.2GB	NumPy, SciPy, Pandas, Statsmodels	base
`visualization`	~900MB	Matplotlib, Plotly, Bokeh, Altair	base
`dataio`	~800MB	Parquet, HDF5, Excel, SQL	base
`ml`	~2GB	Scikit-learn, XGBoost, LightGBM	scientific
`deeplearn`	~6GB	PyTorch, TensorFlow, Keras	ml
`vision`	~2GB	OpenCV, Pillow, YOLO	base
`audio`	~3GB	Librosa, TorchAudio, soundfile	base
`geospatial`	~2GB	Cartopy, GeoPandas, Folium	scientific
`timeseries`	~2GB	tsfresh, sktime, Prophet	scientific
`nlp`	~4GB	spaCy, Transformers, NLTK	base
`speech`	~4GB	Whisper, gTTS, SpeechBrain	base
`face`	~7GB	DeepFace, dlib, face-alignment	base
`full`	~14GB	Everything combined	standalone

Target Inheritance Tree

base
├── scientific
│   ├── ml
│   │   └── deeplearn
│   ├── geospatial
│   └── timeseries
├── visualization
├── dataio
├── vision
├── audio
├── nlp
├── speech
└── face

full (standalone - includes all)
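To see which stages a given target builds on, the tree above can be written as a parent map and walked upward. This is an illustrative sketch only: the `PARENT` dictionary is transcribed from the tree, and `build_chain` is a hypothetical helper, not something the Dockerfile or build script provides.

```python
# Parent of each target, transcribed from the inheritance tree in this README.
PARENT = {
    "base": None,
    "scientific": "base",
    "visualization": "base",
    "dataio": "base",
    "vision": "base",
    "audio": "base",
    "nlp": "base",
    "speech": "base",
    "face": "base",
    "ml": "scientific",
    "geospatial": "scientific",
    "timeseries": "scientific",
    "deeplearn": "ml",
    "full": None,  # standalone: does not inherit from another target
}


def build_chain(target):
    """Return the stages from the root down to `target`, in build order."""
    if target not in PARENT:
        raise KeyError(f"unknown target: {target}")
    chain = []
    while target is not None:
        chain.append(target)
        target = PARENT[target]
    return chain[::-1]


print(build_chain("deeplearn"))  # ['base', 'scientific', 'ml', 'deeplearn']
```

A longer chain generally means more layers are shared with other targets, so building ml after scientific can reuse the cached scientific stage.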

Build All Targets

Use the build script to build and test all targets:

# Build and test all targets
./build-all.sh

# Build only (no tests)
./build-all.sh --build-only

# Test existing images
./build-all.sh --test-only

# Build specific targets
./build-all.sh base scientific ml

Target Details

Base (Common Utilities)

Essential utilities included in all specialized targets.

Library	Description
IPython, Jupyter, JupyterLab	Interactive computing
orjson, ujson, simplejson	JSON processing
lxml, xmltodict, BeautifulSoup4	XML/HTML parsing
PyYAML	YAML processing
requests, httpx, aiohttp	HTTP clients
Pydantic	Data validation
tqdm, loguru	Progress bars, logging
python-dateutil, pytz, pendulum	Date/time utilities
joblib, toolz, more-itertools	Utilities

Scientific (Numerical Computing)

Core libraries for numerical and statistical computing.

Library	Description
NumPy	N-dimensional arrays
SciPy	Scientific algorithms
Pandas	DataFrames
Statsmodels	Statistical models
SymPy	Symbolic mathematics

Visualization (Charts & Dashboards)

Interactive and static visualization libraries.

Library	Description
Matplotlib	2D/3D plotting
Seaborn	Statistical visualization
Plotly	Interactive charts
Bokeh	Web-based visualization
HoloViews, hvPlot	Declarative visualization
Panel	Dashboards
Altair	Declarative statistical viz

DataIO (Data Formats & Databases)

Read and write various data formats.

Library	Description
PyArrow	Apache Arrow columnar data
fastparquet	Parquet format
h5py, PyTables	HDF5 format
openpyxl, xlrd	Excel files
SQLAlchemy	SQL toolkit and ORM

ML (Machine Learning)

Classical machine learning algorithms.

Library	Description
Scikit-learn	ML algorithms
XGBoost	Gradient boosting
LightGBM	Fast gradient boosting
imbalanced-learn	Imbalanced datasets
Optuna	Hyperparameter optimization

DeepLearn (Neural Networks)

Deep learning frameworks.

Library	Description
PyTorch	Dynamic neural networks
TorchVision	Computer vision for PyTorch
TensorFlow	ML platform
Keras	High-level neural network API

Vision (Image Processing)

Computer vision and image manipulation.

Library	Description
Pillow	Image processing
OpenCV (headless)	Computer vision
scikit-image	Image algorithms
imageio	Image I/O
Ultralytics	YOLOv8 object detection

Audio (Audio Processing)

Audio analysis and manipulation.

Library	Description
TorchAudio	Audio for PyTorch
librosa	Music/audio analysis
soundfile	Audio file I/O
pydub	Audio manipulation
audioread	Audio decoding

Geospatial (Maps & GIS)

Geographic data processing and visualization.

Library	Description
Cartopy	Map projections
GeoPandas	Geospatial DataFrames
Shapely	Geometric operations
PyProj	Coordinate transformations
Folium	Interactive maps
GeoViews	Geographic visualization

TimeSeries (Time Series Analysis)

Time series modeling and forecasting.

Library	Description
tsfresh	Feature extraction
sktime	Time series ML
pmdarima	Auto-ARIMA
Prophet	Forecasting

NLP (Natural Language Processing)

Text processing and language models.

Library	Description
spaCy	Industrial NLP
NLTK	Classic NLP toolkit
Transformers	Hugging Face models
sentence-transformers	Sentence embeddings
tokenizers	Fast tokenization

Speech (Speech Recognition & TTS)

Speech-to-text and text-to-speech.

Library	Description
openai-whisper	General-purpose ASR (OpenAI Whisper)
faster-whisper	Up to 4x faster Whisper inference (CTranslate2)
SpeechRecognition	Lightweight ASR API wrapper
coqui-tts	Mature TTS engine
gTTS	Google Text-to-Speech
piper-tts	ONNX-based CPU-friendly TTS
pyannote-audio	Speaker diarization
speechbrain	All-in-one speech toolkit

Face (Face Detection & Recognition)

Face detection, recognition, analysis, and generation.

Library	Description
DeepFace	Recognition + attribute analysis
dlib	Face detection, 68-point landmarks
MTCNN	TensorFlow face detection
RetinaFace	Face detection with landmarks
face-alignment	2D/3D face landmarks (PyTorch)
diffusers	Face generation (Stable Diffusion)

Full (Complete Environment)

All libraries from all targets combined. Use when you need everything.

Docker Commands Reference

Build Commands

# Build specific target
docker build --target scientific -t ds-scientific .

# Build with no cache
docker build --no-cache --target ml -t ds-ml .

# Build full environment
docker build --target full -t ds-full .

Run Commands

# Run with volume mounts
docker run -p 8888:8888 \
  -v $(pwd)/notebooks:/home/jupyter/notebooks \
  -v $(pwd)/data:/home/jupyter/data \
  ds-scientific

# Run in background
docker run -d -p 8888:8888 --name jupyter \
  -v $(pwd)/notebooks:/home/jupyter/notebooks \
  ds-ml

# Verify imports
docker run --rm ds-scientific uv run python /home/jupyter/scripts/verify_scientific.py

# Run IPython
docker run --rm -it ds-scientific uv run ipython
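The contents of the verification scripts are not shown in this README. As a rough idea of what an import check can look like, here is a minimal sketch using only the standard library; `check_imports` is a hypothetical name, and the real scripts may work differently.

```python
import importlib


def check_imports(modules):
    """Try to import each module and return {name: error} for the failures."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Demonstration with one module that exists and one that does not.
print(check_imports(["json", "no_such_module_xyz"]))
```

Inside the scientific image, a call such as `check_imports(["numpy", "scipy", "pandas", "statsmodels", "sympy"])` should return an empty dictionary if every library listed for that target is importable.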

Volume Mounts

Local Directory	Container Path	Purpose
`./notebooks`	`/home/jupyter/notebooks`	Your notebooks
`./data`	`/home/jupyter/data`	Data files
`./examples`	`/home/jupyter/examples`	Example scripts

Example Files

The examples/ directory contains Python scripts and Jupyter notebooks:

Example	Description
`01_numpy_scipy_basics`	NumPy arrays, SciPy statistics
`02_pandas_data_analysis`	DataFrame operations
`03_matplotlib_seaborn_viz`	Static visualizations
`04_plotly_interactive`	Interactive charts
`05_bokeh_holoviews`	Bokeh and HoloViews
`06_geospatial`	Maps with Cartopy, GeoPandas, Folium
`07_timeseries_analysis`	Time series, ARIMA, forecasting
`08_data_io_serialization`	JSON, XML, Parquet, HDF5
`09_machine_learning`	Classification, regression
`10_deep_learning_pytorch`	PyTorch neural networks
`11_deep_learning_tensorflow`	TensorFlow and Keras
`12_image_processing`	PIL, OpenCV, scikit-image
`13_object_detection_yolo`	YOLOv8 object detection
`14_nlp_text_analysis`	spaCy, NLTK, sentence-transformers
`15_audio_analysis`	librosa, torchaudio features
`16_altair_panel_viz`	Altair, hvPlot, Panel dashboards
`17_scipy_signal_processing`	FFT, filters, spectrograms
`18_sqlalchemy_database`	SQLAlchemy ORM, Parquet, HDF5
`19_speech_processing`	Whisper ASR, gTTS, torchaudio
`20_face_analysis`	dlib, DeepFace, face-alignment

Choosing the Right Target

Use Case	Recommended Target
Data analysis with Pandas	`scientific`
Creating charts/dashboards	`visualization`
Machine learning models	`ml`
Deep learning/neural networks	`deeplearn`
Image processing	`vision`
Audio processing	`audio`
Geographic data/maps	`geospatial`
Time series forecasting	`timeseries`
Text/NLP work	`nlp`
Speech recognition/synthesis	`speech`
Face detection/recognition	`face`
Need everything	`full`

Container Details

Base Image: Ubuntu 24.04
Python: 3.13 (via deadsnakes PPA)
Package Manager: uv
User: jupyter (non-root)
Working Directory: /home/jupyter
Exposed Port: 8888

Security Note

The default configuration disables authentication, which is acceptable only for local development. For production, set an access token:

docker run -p 8888:8888 \
  -e JUPYTER_TOKEN=your-secret-token \
  ds-scientific \
  uv run jupyter lab --ip=0.0.0.0 --IdentityProvider.token='your-secret-token'
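Rather than typing a token by hand, you can generate a random one and substitute it for the placeholder in the command above. The sketch below assumes a hypothetical helper named `secured_run_argv`; the flags and environment variable are the ones shown in this section, and the token comes from Python's standard `secrets` module.

```python
import secrets
import shlex


def secured_run_argv(image, token=None, port=8888):
    """Return (token, argv) for a token-protected Jupyter Lab container.

    Hypothetical helper: generates a URL-safe random token when none is given.
    """
    token = token or secrets.token_urlsafe(32)
    argv = [
        "docker", "run", "-p", f"{port}:8888",
        "-e", f"JUPYTER_TOKEN={token}",
        image,
        "uv", "run", "jupyter", "lab", "--ip=0.0.0.0",
        f"--IdentityProvider.token={token}",
    ]
    return token, argv


if __name__ == "__main__":
    token, cmd = secured_run_argv("ds-scientific")
    print(shlex.join(cmd))
    print(f"Open http://localhost:8888/?token={token}")
```

Keep the generated token out of version control and shell history where possible, since anyone who holds it can execute code in the container.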

GPU Support

For NVIDIA GPU support:

docker run --gpus all -p 8888:8888 ds-deeplearn

Note: This requires the NVIDIA Container Toolkit on the host.
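Once the container is running, a quick check from a notebook cell can confirm whether PyTorch sees the GPU. This is a sketch, not part of the repository: `gpu_status` is a hypothetical name, and whether the deeplearn image ships a CUDA-enabled PyTorch build is an assumption this README does not confirm.

```python
def gpu_status():
    """Describe GPU visibility from inside the container as a short string."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this image"
    if not torch.cuda.is_available():
        return "PyTorch found no CUDA device"
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return "CUDA devices: " + ", ".join(names)


print(gpu_status())
```

If the second message appears even with --gpus all, the usual suspects are a missing NVIDIA Container Toolkit on the host or a CPU-only PyTorch build in the image.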

Project Structure

.
├── Dockerfile              # Multi-stage Dockerfile
├── build-all.sh            # Build and test all targets
├── README.md
├── targets/
│   ├── base/
│   │   ├── pyproject.toml
│   │   └── verify_imports.py
│   ├── scientific/
│   ├── visualization/
│   ├── dataio/
│   ├── ml/
│   ├── deeplearn/
│   ├── vision/
│   ├── audio/
│   ├── geospatial/
│   ├── timeseries/
│   ├── nlp/
│   ├── speech/
│   ├── face/
│   └── full/
├── examples/
│   ├── *.py                # Python scripts
│   └── *.ipynb             # Jupyter notebooks
├── notebooks/              # Your notebooks (mounted)
└── data/                   # Your data (mounted)
