Aadhaar Data Analysis

This repository contains scripts and data for analysing Aadhaar enrolment, biometric, and demographic data.

Setup

Prerequisites

Python 3.8+
UV (recommended)
Typst (for report compilation)

Tip

To install UV, follow the instructions in the official documentation. To install Typst, see typst.app.

Clone the repository

git clone https://github.com/arnav-kr/aadhaar-stats.git
cd aadhaar-stats

Sync Environment & Install Dependencies

uv sync

Usage

Run Full Pipeline

The main script runs all analysis stages and compiles the final report:

uv run main.py

Options

uv run main.py --skip-analysis  # only compile report
uv run main.py --skip-report    # only run analysis scripts
uv run main.py --assistant      # launch AI assistant
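How the three flags might map onto pipeline stages can be sketched as follows. This is an illustration only, not the actual contents of main.py: the function names `build_parser` and `plan_stages` are hypothetical, and it assumes that `--assistant` launches the assistant instead of running the pipeline.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--skip-analysis", action="store_true",
                        help="only compile the report")
    parser.add_argument("--skip-report", action="store_true",
                        help="only run the analysis scripts")
    parser.add_argument("--assistant", action="store_true",
                        help="launch the AI assistant")
    return parser


def plan_stages(argv: List[str]) -> List[str]:
    """Return the stages that would run for a given argument list."""
    args = build_parser().parse_args(argv)
    # Assumption: the assistant replaces the pipeline rather than following it.
    if args.assistant:
        return ["assistant"]
    stages = []
    if not args.skip_analysis:
        stages.append("analysis")
    if not args.skip_report:
        stages.append("report")
    return stages
```

With no flags, `plan_stages([])` yields both stages, which matches the "Run Full Pipeline" behaviour described above.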

AI Assistant

The project includes an AI-powered assistant for exploring the analysis data interactively.

Setup

Create a .env file in the project root with your API key:

AI_API_KEY=your-api-key-here
AI_MODEL=gemini-3-flash-preview

Get an API key from Google AI Studio.
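For readers curious how the two settings could be read, here is a minimal sketch using only the standard library. It is not the assistant's actual loading code: `parse_env` and `load_settings` are hypothetical helpers, and the choice to let real environment variables override the file is an assumption.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(text: str) -> Dict[str, str]:
    """Return AI_API_KEY and AI_MODEL; environment variables take priority."""
    file_values = parse_env(text)
    settings = {}
    for name in ("AI_API_KEY", "AI_MODEL"):
        value = os.environ.get(name, file_values.get(name, ""))
        if not value:
            raise KeyError(f"{name} is not set")
        settings[name] = value
    return settings
```

Calling `load_settings(open(".env").read())` from the project root would then return both values, or raise `KeyError` if either one is missing.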

Usage

uv run main.py --assistant

The assistant can answer questions about:

Enrolment statistics and trends
State and district comparisons
Migration patterns
Data quality metrics
Anomaly detection results
And more...

Run Individual Scripts

uv run scripts/preprocess.py
uv run scripts/univariate.py
# etc.

Project Structure

├── main.py                 # Main pipeline script
├── assistant/              # AI-powered data exploration assistant
│   ├── __init__.py
│   ├── chat.py             # Chat interface using Gemini
│   └── data_provider.py    # Local data context provider
├── data/
│   ├── raw/                # Raw Aadhaar CSV files
│   │   ├── enrolment/      # New enrolment records
│   │   ├── demographic/    # Demographic update records
│   │   └── biometric/      # Biometric update records
│   ├── processed/          # Cleaned and normalized data
│   ├── intermediate/       # Intermediate processing artifacts
│   └── maps/               # Geographic boundary files (shapefiles, geojson)
├── scripts/
│   ├── preprocess.py       # Data cleaning and normalization
│   ├── univariate.py       # Single-variable analysis
│   ├── bivariate.py        # Two-variable relationship analysis
│   ├── trivariate.py       # Three-variable interaction analysis
│   ├── data_quality.py     # Data quality assessment
│   ├── advanced.py         # Advanced insights and forecasting
│   ├── spatial.py          # Geographic visualizations
│   └── utils/              # Shared utilities and constants
├── plots/
│   ├── univariate/         # Single-variable plots
│   ├── bivariate/          # Two-variable plots
│   ├── trivariate/         # Three-variable plots
│   ├── data_quality/       # Data quality visualizations
│   └── advanced/           # Advanced analysis plots
├── analysis/               # JSON outputs from analysis scripts
├── descriptions/           # YAML descriptions for plots and analysis
└── report/
    ├── main.typ            # Typst source document
    └── report.pdf          # Compiled PDF report (generated)

Scripts

Script	Description
`preprocess.py`	Loads raw CSVs, normalizes state/district names, validates pincodes, parses dates
`univariate.py`	State-wise distribution, age groups, temporal trends, activity patterns
`bivariate.py`	Correlation analysis, state-age relationships, migration patterns
`trivariate.py`	State-time-enrolment clustering, age-time dynamics, anomaly detection
`data_quality.py`	Spelling variations, naming inconsistencies, data entry issues
`advanced.py`	Demand forecasting, migration corridors, fraud indicators, resource allocation
`spatial.py`	Geographic map visualizations using shapefiles
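As an illustration of the kind of checks the `preprocess.py` row describes, the sketch below validates pincodes and builds a comparison key for state or district names. It is not taken from the repository: `is_valid_pincode` and `name_key` are hypothetical names, and the normalization rules (collapsing whitespace, treating "&" as "and", ignoring case) are assumptions about what such a step might do.

```python
import re

# Indian PIN codes are six digits and never start with 0.
PINCODE_RE = re.compile(r"[1-9][0-9]{5}")


def is_valid_pincode(value) -> bool:
    """Accept ints or strings; surrounding whitespace is ignored."""
    text = str(value).strip()
    return PINCODE_RE.fullmatch(text) is not None


def name_key(name: str) -> str:
    """Build a case-insensitive key so spelling variants compare equal."""
    spaced = name.replace("&", " and ")
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    return collapsed.casefold()
```

Two raw spellings such as "Jammu & Kashmir" and "JAMMU AND KASHMIR" produce the same key, so they can be grouped before a canonical name is chosen.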

Output

67 plots across 5 analysis categories
JSON analysis files with computed statistics
71-page PDF report with findings and recommendations
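The JSON analysis files can be inspected outside the pipeline. The sketch below assumes they sit directly inside the analysis/ directory with a .json extension; the individual file names are not documented here, so the helper `load_analysis` (a hypothetical name) simply loads whatever it finds.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_analysis(directory: str = "analysis") -> Dict[str, Any]:
    """Map each JSON file's stem to its parsed contents."""
    results = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.stem] = json.load(handle)
    return results
```

Running `load_analysis()` from the project root after the analysis stage returns a dictionary keyed by file name without the extension.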

License

This project is licensed under the AGPL-3.0 License. See the LICENSE file for details.
