Kalbe Workshop 2025: Advanced Data Analytics with Python
General Overview
Welcome to the Kalbe Python Workshop 2025! This project focuses on Exploratory Data Analysis (EDA) using
the Online Retail II dataset. The workshop is designed to help participants solve real-world data analysis
problems using Python, so make sure Python is installed before you begin.
Installation Guide
Choose your distribution
You can run the notebooks using one of the following environments:
• Anaconda
• VS Code + Python extension
• Miniforge (lightweight Conda alternative)
Anaconda:
1. Go to the Anaconda Distribution download page.
2. Select the installer matching your OS (Windows/macOS/Linux) and download it.
3. Install it and launch the app (Anaconda Navigator).
VS Code:
1. Download and install Visual Studio Code for your OS.
2. Open VS Code → Extensions sidebar → install the Python extension (by Microsoft).
3. (Windows only, optional) Install Windows Terminal for a nicer shell.
4. Start VS Code, press Ctrl+Shift+P (Cmd+Shift+P on macOS) → Python: Select Interpreter → choose your
conda environment or base Python.
Miniforge:
Miniforge provides a lightweight Conda-compatible environment.
1. Go to the Miniforge download page.
2. Select the installer matching your OS (Windows/macOS/Linux) and download it.
3. Install it and launch Miniforge Prompt (on macOS/Linux, open a terminal instead).
Environment Setup
We recommend creating a separate environment for each project so that library versions and dependencies
from different projects do not conflict (for example, one environment for general EDA, one for deep learning,
one for optimization, and so on).
On Anaconda:
1. This is straightforward: open Anaconda Navigator, go to the Environments tab, and create a new environment.
On VS Code:
On Miniforge:
1. In Miniforge Prompt, create an environment with: conda create --name my_env_name python=3.11
2. Once installation finishes, activate it with conda activate my_env_name (the environment name shown at
the start of the prompt changes, e.g. from (base) to (my_env_name)).
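To confirm that the activated environment is the one actually running your code, a minimal sketch like the one below can be saved as a script (a name such as check_env.py is only a suggestion) and run with python inside the environment. It assumes the Python 3.11 version from the command above and reads CONDA_DEFAULT_ENV, which conda sets when an environment is activated; check_python_version is a hypothetical helper, not part of the workshop code.

```python
import os
import sys


def check_python_version(expected=(3, 11)):
    """Return True if the running interpreter matches the expected (major, minor)."""
    return tuple(sys.version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside conda.
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda env active>")
    print(f"Active conda environment: {env_name}")
    print(f"Interpreter path: {sys.executable}")
    print(f"Python version: {sys.version.split()[0]}")
    if not check_python_version():
        # Assumption: the environment was created with python=3.11 as shown above.
        print("Warning: this is not Python 3.11; did you activate my_env_name?")
```

If the interpreter path does not point inside your environment folder, the environment is probably not activated in that shell (or, in VS Code, a different interpreter is selected).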
Install JupyterLab:
On Anaconda:
It comes preinstalled in the base environment, so just launch it from Anaconda Navigator.
On Miniforge (with your environment activated):
pip install --upgrade pip
pip install jupyterlab
jupyter lab # launches in your default browser
Cloning the Repo
1. Navigate to the folder you prepared: cd /path/to/your/project
2. Clone this repo: git clone https://github.com/faathirchikal/kalbe-python-workshop-2025.git
3. Navigate into the project folder: cd kalbe-python-workshop-2025
4. Install the libraries: pip install -r requirements.txt -U
5. Done! You can now open and run the notebooks in whatever order you like.
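To check that step 4 actually installed everything, a small sketch like the following reads requirements.txt and looks up each package's installed version with the standard library's importlib.metadata. It is an illustration under simple assumptions: it only understands plain "name" or "name<specifier>" lines, skips pip options such as -r or -e (so nested requirement files are not followed), and the helper names parse_requirement_names and check_requirements are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_requirement_names(path):
    """Extract bare package names from a requirements file.

    Skips blank lines, comments and pip options (lines starting with '-'),
    and drops version specifiers, extras and environment markers.
    """
    names = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue
        # The name ends at the first specifier, extra, marker, '@' or whitespace.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def check_requirements(path="requirements.txt"):
    """Map each required package to its installed version, or None if missing."""
    status = {}
    for name in parse_requirement_names(path):
        try:
            status[name] = version(name)
        except PackageNotFoundError:
            status[name] = None
    return status


if __name__ == "__main__":
    req = Path("requirements.txt")
    if not req.exists():
        print("requirements.txt not found; run this from the project folder.")
    else:
        for name, installed in check_requirements(req).items():
            print(f"{name:<25} {installed or 'MISSING'}")
```

Run it from the project folder with the workshop environment activated; any package reported as MISSING was not installed into that environment.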
Data Overview
This project uses transactional retail sales data: each row is one product line on an invoice, with its
quantity, unit price, invoice date, customer, and country. You can download it from Online Retail II (UCI
Machine Learning Repository) and put the file in the data/raw/ folder.
Data definition:
| Column | Type | Description |
|---|---|---|
| InvoiceNo | object | Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation. |
| StockCode | object | Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product. |
| Description | object | Product (item) name. Nominal. |
| Quantity | int64 | The quantities of each product (item) per transaction. Numeric. |
| InvoiceDate | datetime | Invoice date and time. Numeric. The day and time when a transaction was generated. |
| UnitPrice | float64 | Unit price. Numeric. Product price per unit in sterling (£). |
| CustomerID | object | Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer. |
| Country | object | Country name. Nominal. The name of the country where a customer resides. |
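The data definition above can be turned into a small loading step. The sketch below is one possible approach, not necessarily what 01_data_preprocessing.py does. It assumes the raw file is an Excel workbook in data/raw/ whose columns already carry the names used in the table (rename them first if your download uses different headers), reads every sheet into one DataFrame (reading .xlsx needs the openpyxl package), casts the nominal columns to pandas' string dtype rather than plain object, parses InvoiceDate, and flags cancellations from the InvoiceNo prefix. load_raw and apply_data_definition are hypothetical helper names.

```python
import pandas as pd

# Nominal columns from the data definition table: labels, not quantities.
NOMINAL_COLUMNS = ["InvoiceNo", "StockCode", "Description", "CustomerID", "Country"]


def load_raw(path):
    """Read every sheet of the raw Excel workbook into a single DataFrame.

    Reading .xlsx files requires the openpyxl package.
    """
    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame
    return pd.concat(sheets.values(), ignore_index=True)


def apply_data_definition(df):
    """Cast columns to the types in the data definition and flag cancellations."""
    out = df.copy()
    # CustomerID may be read as float (e.g. 17850.0) when it has missing values;
    # go through nullable Int64 so the text form has no trailing ".0".
    if pd.api.types.is_float_dtype(out["CustomerID"]):
        out["CustomerID"] = out["CustomerID"].astype("Int64")
    for col in NOMINAL_COLUMNS:
        out[col] = out[col].astype("string")  # missing values become <NA>
    out["Quantity"] = out["Quantity"].astype("int64")
    out["UnitPrice"] = out["UnitPrice"].astype("float64")
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    # Cancelled invoices start with the letter 'c'; compare case-insensitively.
    out["IsCancelled"] = out["InvoiceNo"].str.upper().str.startswith("C")
    return out
```

Usage would look like df = apply_data_definition(load_raw("data/raw/<your-file>.xlsx")), with the placeholder replaced by the name of the file you downloaded. Keeping IDs and codes as text prevents them from being summed or averaged by accident during EDA.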
Project Structure
Make sure your project folder looks like this:
├── data/
│ ├── preprocessed/ # preprocessed data folder
│ └── raw/ # raw data folder
├── 01_data_preprocessing.py # Data Cleaning
├── 02_eda.py # General EDA
├── 03A_price_elasticity.py # Product Price Elasticity
├── 03B_other_analysis.py # Other Analysis
├── 04_forecast_preprocessing.py # Preprocess for forecast
├── 05_forecast.py # Forecast code
├── requirements.txt # Libraries needed
├── streamlit_app.py # Streamlit app