Aadhaar Data Analysis

This repository contains scripts and data for analysing Aadhaar enrolment, biometric, and demographic data.

Setup

Prerequisites

  • Python 3.8+
  • UV (recommended)
  • Typst (for report compilation)

Tip

To install UV, follow the instructions in the official documentation. To install Typst, see typst.app.

Clone the repository

git clone https://github.com/arnav-kr/aadhaar-stats.git
cd aadhaar-stats

Sync Environment & Install Dependencies

uv sync

Usage

Run Full Pipeline

The main script runs all analysis stages and compiles the final report:

uv run main.py

Options

uv run main.py --skip-analysis  # only compile report
uv run main.py --skip-report    # only run analysis scripts
uv run main.py --assistant      # launch AI assistant
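The internals of main.py are not shown in this README, so the following is only a minimal sketch of how three such flags could be wired up with argparse. The function names `build_parser` and `plan_stages` and the stage labels are hypothetical, and the choice to let `--assistant` bypass the pipeline is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three flags listed above (hypothetical wiring)."""
    parser = argparse.ArgumentParser(description="Aadhaar analysis pipeline (sketch)")
    parser.add_argument("--skip-analysis", action="store_true",
                        help="only compile the report")
    parser.add_argument("--skip-report", action="store_true",
                        help="only run the analysis scripts")
    parser.add_argument("--assistant", action="store_true",
                        help="launch the AI assistant")
    return parser


def plan_stages(args: argparse.Namespace) -> list:
    """Return the stages to run, in order, for the parsed flags."""
    if args.assistant:
        # Assumption: the assistant replaces the pipeline run entirely.
        return ["assistant"]
    stages = []
    if not args.skip_analysis:
        stages.append("analysis")
    if not args.skip_report:
        stages.append("report")
    return stages
```

Calling `plan_stages(build_parser().parse_args(["--skip-report"]))` would select only the analysis stage under these assumptions.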

AI Assistant

The project includes an AI-powered assistant for exploring the analysis data interactively.

Setup

  1. Get an API key from Google AI Studio.

  2. Create a .env file in the project root with your API key and model name:

    AI_API_KEY=your-api-key-here
    AI_MODEL=gemini-3-flash-preview
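This README does not say how the assistant reads the .env file; it may well rely on an existing library. As an illustration of the KEY=VALUE format only, here is a minimal standard-library sketch. The helper names `parse_env_text` and `load_env_file` are hypothetical and are not part of the project.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict:
    """Read a .env file if it exists and copy its entries into os.environ."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)  # keep variables that are already set
    return values
```

With the file shown above, `load_env_file()` would return a dictionary containing `AI_API_KEY` and `AI_MODEL`, and variables already set in the shell would take precedence.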

Usage

uv run main.py --assistant

The assistant can answer questions about:

  • Enrolment statistics and trends
  • State and district comparisons
  • Migration patterns
  • Data quality metrics
  • Anomaly detection results
  • And more...
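How the assistant gathers its data is not described here. One plausible approach, given that the analysis scripts write JSON outputs to the analysis/ directory, is to load those files into a single dictionary that can be passed to the model as context. The sketch below assumes that approach; `load_analysis_context` is a hypothetical name and does not describe data_provider.py.

```python
import json
from pathlib import Path


def load_analysis_context(directory: str = "analysis") -> dict:
    """Load every readable JSON file in a directory, keyed by file stem."""
    context = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            context[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
    return context
```

Keying by file stem keeps each script's output addressable by name, so a question about one analysis area can be answered from the matching entry.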

Run Individual Scripts

uv run scripts/preprocess.py
uv run scripts/univariate.py
# etc.
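To run several scripts back to back without the main pipeline, a small wrapper around `uv run` is one option. The sketch below is an assumption-laden example: the stage order follows the listing of scripts in this README, which may differ from the order main.py uses, and `build_command` and `run_stages` are hypothetical helpers.

```python
import subprocess

# Assumption: this order mirrors the script listing in this README;
# the order main.py actually uses is not documented here.
STAGES = ["preprocess", "univariate", "bivariate", "trivariate",
          "data_quality", "advanced", "spatial"]


def build_command(stage: str) -> list:
    """Return the `uv run` command for one analysis script."""
    return ["uv", "run", f"scripts/{stage}.py"]


def run_stages(stages=STAGES, dry_run: bool = False) -> list:
    """Run each stage in order, stopping at the first failure."""
    commands = [build_command(stage) for stage in stages]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

Passing `dry_run=True` returns the commands without executing them, which is handy for checking the order before a long run.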

Project Structure

├── main.py                 # Main pipeline script
├── assistant/              # AI-powered data exploration assistant
│   ├── __init__.py
│   ├── chat.py             # Chat interface using Gemini
│   └── data_provider.py    # Local data context provider
├── data/
│   ├── raw/                # Raw Aadhaar CSV files
│   │   ├── enrolment/      # New enrolment records
│   │   ├── demographic/    # Demographic update records
│   │   └── biometric/      # Biometric update records
│   ├── processed/          # Cleaned and normalized data
│   ├── intermediate/       # Intermediate processing artifacts
│   └── maps/               # Geographic boundary files (shapefiles, geojson)
├── scripts/
│   ├── preprocess.py       # Data cleaning and normalization
│   ├── univariate.py       # Single-variable analysis
│   ├── bivariate.py        # Two-variable relationship analysis
│   ├── trivariate.py       # Three-variable interaction analysis
│   ├── data_quality.py     # Data quality assessment
│   ├── advanced.py         # Advanced insights and forecasting
│   ├── spatial.py          # Geographic visualizations
│   └── utils/              # Shared utilities and constants
├── plots/
│   ├── univariate/         # Single-variable plots
│   ├── bivariate/          # Two-variable plots
│   ├── trivariate/         # Three-variable plots
│   ├── data_quality/       # Data quality visualizations
│   └── advanced/           # Advanced analysis plots
├── analysis/               # JSON outputs from analysis scripts
├── descriptions/           # YAML descriptions for plots and analysis
└── report/
    ├── main.typ            # Typst source document
    └── report.pdf          # Compiled PDF report (generated)

Scripts

| Script | Description |
| --- | --- |
| preprocess.py | Loads raw CSVs, normalizes state/district names, validates pincodes, parses dates |
| univariate.py | State-wise distribution, age groups, temporal trends, activity patterns |
| bivariate.py | Correlation analysis, state-age relationships, migration patterns |
| trivariate.py | State-time-enrolment clustering, age-time dynamics, anomaly detection |
| data_quality.py | Spelling variations, naming inconsistencies, data entry issues |
| advanced.py | Demand forecasting, migration corridors, fraud indicators, resource allocation |
| spatial.py | Geographic map visualizations using shapefiles |
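The cleaning steps attributed to preprocess.py (name normalization, pincode validation, date parsing) can be sketched with the standard library alone. This is not the project's code: the function names are hypothetical, the default date format `%d-%m-%Y` is an assumption about the raw CSVs, and title-casing is only one possible normalization rule. The pincode check relies on Indian PIN codes being six digits with a non-zero first digit.

```python
import re
from datetime import date, datetime
from typing import Optional

PINCODE_RE = re.compile(r"[1-9][0-9]{5}")  # six digits, first digit non-zero


def normalize_name(name: str) -> str:
    """Collapse whitespace and apply title case to a state or district name."""
    return " ".join(name.split()).title()


def is_valid_pincode(value) -> bool:
    """Check that a value looks like an Indian PIN code."""
    return PINCODE_RE.fullmatch(str(value).strip()) is not None


def parse_date(text: str, fmt: str = "%d-%m-%Y") -> Optional[date]:
    """Parse a date string, returning None when it does not match the format."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None
```

Returning `None` for unparseable dates, instead of raising, lets a caller count and report bad rows, which fits the kind of data entry issues that data_quality.py is described as examining.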

Output

  • 67 plots across 5 analysis categories
  • JSON analysis files with computed statistics
  • 71-page PDF report with findings and recommendations

License

This project is licensed under the AGPL-3.0 License. See the LICENSE file for details.
