XGBoost · SHAP Explainability · Apple-Health Inspired UI
A full end-to-end ML system for ASD screening using AQ-10 questionnaire data. Includes an XGBoost classifier, SHAP interpretability, an Apple-Health-style UI, and clinical-grade PDF reporting. Built as a production-quality portfolio project demonstrating ML engineering, data pipelines, explainability, UI engineering, and deployment.
- Problem Statement
- Why Autism Detection Matters
- Dataset Description
- Project Pipeline (ML Workflow)
- System Architecture
- Model Performance & Comparison
- Explainability (SHAP)
- Streamlit App UI Preview
- Project Features
- How to Run Locally
- Folder Structure
- Clinical Disclaimer
- Limitations
- Contact
Millions of individuals remain undiagnosed or diagnosed late for Autism Spectrum Disorder (ASD) due to:
- Limited access to clinical specialists
- Long waiting periods for assessments
- Lack of awareness or hesitation to seek help
- Resource constraints in low-income regions
The challenge:
How can we build a fast, transparent, accessible tool to support early ASD screening — without replacing clinical evaluation?
This project answers that by building a responsible, explainable ML-based support system using the AQ-10 screening questionnaire.
- ASD affects approximately 1 in 100 people globally
- Early identification improves communication, social, and learning outcomes
- Screening gaps exist in rural and low-resource healthcare systems
- Digital tools can help triage cases early
- Machine learning can support clinicians — not replace them
This project demonstrates how XAI + ML can enhance early screening accessibility.
Source:
UCI / Kaggle — Autism Screening Adults & Children Dataset
Dataset Type:
Questionnaire-based binary classification (ASD vs Non-ASD)
Contents:
- AQ-10 questionnaire (10 binary questions)
- Demographics: age, gender, ethnicity, country of residence
- Medical factors: jaundice at birth
- Social factors: relation (parent/self), used autism app before
- Target: class_asd
Size: ~700 samples
Features: 19
Label distribution: Balanced enough for supervised learning
📌 Note:
The dataset is small and highly separable because AQ-10 questions are directly diagnostic.
This explains the unusually high performance of ML models.
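As a quick sanity check, a minimal sketch for loading the data and inspecting the label balance; the path and the class_asd column follow the repo's own conventions, but the raw CSVs may differ:

```python
# Minimal sketch: load the combined dataset and check its shape and
# label balance. Column names are assumed from the repo's docs.
import pandas as pd

df = pd.read_csv("data/processed/autism_combined.csv")

print(df.shape)  # expected: roughly 700 rows
print(df["class_asd"].value_counts(normalize=True))  # label distribution
```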
```mermaid
flowchart LR
    subgraph PREP[Data Preparation]
        A1[Raw Data]
        A2[Cleaning]
        A3[Feature Engineering]
        A4[Encoding]
        A5[Train-Test Split]
        A1 --> A2 --> A3 --> A4 --> A5
    end
    subgraph MODEL[Modeling]
        B1[XGBoost Training]
        B2[Evaluation Metrics]
        B3[SHAP Explainability]
        B1 --> B2 --> B3
    end
    subgraph APP[Application Layer]
        C1[Streamlit App]
        C2[PDF Report Generator]
    end
    A5 --> B1
    B3 --> C1
    C1 --> C2
```
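The Data Preparation stage can be sketched as follows; the real logic lives in src/preprocess.py, and the categorical columns listed here are assumptions drawn from the dataset description:

```python
# Hedged sketch of the PREP stage: encode categoricals, then split.
# Column names are illustrative assumptions, not the exact schema.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data/processed/autism_combined.csv")

# Encode categorical demographics (illustrative subset of columns)
encoders = {}
for col in ["gender", "ethnicity", "country_of_res"]:
    if col in df.columns:
        encoders[col] = LabelEncoder()
        df[col] = encoders[col].fit_transform(df[col].astype(str))

X = df.drop(columns=["class_asd"])
y = df["class_asd"]

# Stratified split preserves the ASD/non-ASD ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```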
```mermaid
flowchart LR
    subgraph UI[Streamlit Frontend]
        A[User Inputs & Dashboard]
    end
    subgraph BE[Backend Model Layer]
        B[Preprocessing → XGBoost Model → SHAP Explainability]
    end
    subgraph AR[Artifacts & Reports]
        C[Models • Processed Data • PDF Reports]
    end
    A --> B --> C
```
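In code, the backend layer reduces to loading the serialized artifacts once and scoring each request. A minimal sketch (file names match the /models/ folder; the predict helper is a hypothetical simplification of the app's real flow):

```python
# Sketch of the Backend Model Layer inside the Streamlit app:
# load artifacts once (cached), then score each submitted input.
import pickle

import pandas as pd
import streamlit as st

@st.cache_resource
def load_artifacts():
    with open("models/best_model.pkl", "rb") as f:
        model = pickle.load(f)
    with open("models/label_encoders.pkl", "rb") as f:
        encoders = pickle.load(f)
    return model, encoders

model, encoders = load_artifacts()

def predict_asd_probability(features: pd.DataFrame) -> float:
    """Return P(ASD positive) for a single encoded input row."""
    return float(model.predict_proba(features)[0, 1])
```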
| Model | Accuracy | F1 Score | Recall | AUC |
|---|---|---|---|---|
| Logistic Regression | 1.00 | 1.00 | 1.00 | 0.99 |
| Random Forest | 0.94 | 0.89 | 0.84 | 0.996 |
| XGBoost (Chosen) | 0.986 | 0.974 | 0.974 | 0.9995 |
| Neural Network (MLP) | 1.00 | 1.00 | 1.00 | 1.00 |
- Best trade-off between accuracy, stability, and interpretability
- Works extremely well on small structured datasets
- Fully compatible with TreeSHAP for transparent explainability
- Fast, robust, and highly generalizable
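For reference, a minimal training-and-scoring sketch that reproduces the comparison-table metrics from the saved split; the hyperparameters shown are illustrative assumptions, not the tuned values from notebooks/03_Model_Training.ipynb:

```python
# Minimal sketch: train XGBoost on the saved split and report the
# comparison-table metrics. Hyperparameters are illustrative only.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from xgboost import XGBClassifier

X_train = pd.read_csv("data/processed/X_train.csv")
X_test = pd.read_csv("data/processed/X_test.csv")
y_train = pd.read_csv("data/processed/y_train.csv").squeeze("columns")
y_test = pd.read_csv("data/processed/y_test.csv").squeeze("columns")

model = XGBClassifier(n_estimators=200, max_depth=4,
                      learning_rate=0.1, eval_metric="logloss")
model.fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print(f"Accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"F1:       {f1_score(y_test, pred):.3f}")
print(f"Recall:   {recall_score(y_test, pred):.3f}")
print(f"AUC:      {roc_auc_score(y_test, proba):.4f}")
```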
This project uses SHAP (SHapley Additive Explanations) to provide transparent, interpretable insights into why the model predicts ASD Positive or Negative.
The local explanation shows how each feature contributed to an individual prediction.
Example:
| Feature | SHAP Value |
|---|---|
| a9_score | -1.2110 |
| a6_score | -1.0042 |
| a5_score | -0.8661 |
| a7_score | -0.8091 |
| a3_score | -0.7645 |
| a4_score | -0.7368 |
- Negative SHAP → pushes toward ASD Negative
- Positive SHAP → pushes toward ASD Positive
- Larger magnitude → stronger influence
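A sketch of how a local explanation like the table above can be produced with TreeSHAP (artifact and data paths follow the repo's folder layout; src/explainability.py may structure this differently):

```python
# Sketch: compute signed per-feature SHAP contributions for one
# patient using TreeSHAP on the serialized XGBoost model.
import pickle

import pandas as pd
import shap

with open("models/best_model.pkl", "rb") as f:
    model = pickle.load(f)
X_test = pd.read_csv("data/processed/X_test.csv")

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # shape: (n_samples, n_features)

# Rank one sample's contributions by magnitude; the sign tells you
# which class the feature pushed toward (negative -> ASD Negative).
sample = pd.Series(shap_values[0], index=X_test.columns)
print(sample.reindex(sample.abs().sort_values(ascending=False).index).head(6))
```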
📸 UI & SHAP Gallery
- 🏠 Home Dashboard
- 🔍 Prediction View (Model Output)
- 📝 Generated PDF Report
- 🧠 SHAP Explainability
  - 📌 Local Feature Impact (Bar Plot)
  - 📌 Global Beeswarm Plot
  - 📌 Waterfall (Single Sample)
- 📈 Evaluation Metrics
  - ✔️ Confusion Matrix
  - ✔️ ROC Curve
  - ✔️ Calibration Curve
✨ Feature Highlights
- ASD Risk Prediction using an optimized XGBoost classifier
- Apple-Health Inspired UI with clean, clinical-style cards
- Real-time Probability Ring that visualizes ASD+ likelihood
- AQ-10 Questionnaire Input (10 binary symptom questions)
- Demographic Inputs with encoded categorical features
- Dynamic Risk Scoring based on total AQ-10 score + age
- SHAP Explainability (local + global):
  - Local SHAP force explanation (per patient)
  - Global beeswarm + bar importance plots
  - Waterfall plot for individual predictions
  - Top 6 contributing features displayed in-dashboard
- Exports a clinical-style PDF report containing:
  - Prediction
  - Probability
  - AQ-10 score
  - Risk level
  - Recommendation
  - Top SHAP contributions

Great for portfolio + recruiters.
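The report export can be approximated with a short sketch; it uses reportlab as an assumed backend, and the repo's actual layout and field names may differ:

```python
# Hypothetical sketch of the clinical-style PDF export using reportlab.
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

def build_report(path, prediction, probability, aq10, risk, recommendation):
    c = canvas.Canvas(path, pagesize=A4)
    _, height = A4
    c.setFont("Helvetica-Bold", 16)
    c.drawString(50, height - 60, "ASD Screening Report (educational use only)")
    c.setFont("Helvetica", 11)
    lines = [
        f"Prediction: {prediction}",
        f"Probability (ASD+): {probability:.1%}",
        f"AQ-10 score: {aq10}/10",
        f"Risk level: {risk}",
        f"Recommendation: {recommendation}",
    ]
    # Simple top-down text layout with 18pt line spacing
    for i, line in enumerate(lines):
        c.drawString(50, height - 100 - 18 * i, line)
    c.save()

build_report("reports/sample_report.pdf", "ASD Negative", 0.12, 3, "Low",
             "No further screening indicated; consult a clinician if concerned.")
```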
- Preprocessing: encoding + feature engineering
- Train/test split
- XGBoost model training
- ROC, AUC, confusion matrix, calibration
- Serialized model artifacts saved in /models/
- Confusion Matrix
- ROC Curve
- Calibration Curve
- Model Comparison Table
- SHAP-based auditing
- Clear separation of concerns (src/, app/, models/, notebooks/)
- Production-like artifact loading in Streamlit
- Modular risk scoring function (see the sketch after this list)
- Explainability integrated into UI
- Fully packaged Streamlit app
- GitHub-friendly structure
- Works locally or on cloud platforms (Streamlit Cloud)
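A hypothetical sketch of the modular risk-scoring idea from src/risk_scoring.py; the thresholds and age adjustment below are illustrative assumptions, not the repo's actual rules:

```python
# Illustrative risk scoring: combine the total AQ-10 score with a
# simple age adjustment, then map to a coarse risk band. Thresholds
# are assumptions for demonstration, not clinically validated values.
def risk_level(aq10_total: int, age: float) -> str:
    """Map an AQ-10 total (0-10) plus an age adjustment to a risk band."""
    score = aq10_total + (1 if age < 18 else 0)  # assumed adjustment
    if score >= 7:
        return "High"
    if score >= 4:
        return "Moderate"
    return "Low"

print(risk_level(aq10_total=8, age=25))  # -> "High"
```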
Follow the steps below to run the Autism Diagnostic Support System on your machine.

- Clone the Repository

```bash
git clone https://github.com/aparnaworkspace/autism-diagnostic-support-tool
cd autism-diagnostic-support-tool
```

- Create and Activate a Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate
```

(Windows users: `venv\Scripts\activate`)

- Install Dependencies

```bash
pip install -r requirements.txt
```

- Run the Streamlit Application

```bash
streamlit run app/streamlit_app.py
```

- Regenerate SHAP Explainability Visuals

If you want fresh SHAP plots (bar, beeswarm, waterfall):

```bash
python notebooks/04_Model_Evaluation.py
```

- Jupyter Notebook Workflow

To explore EDA or model training:

```bash
jupyter lab
```

A well-structured, production-style codebase:
```
autism-diagnostic-support-tool/
│
├── app/
│ └── streamlit_app.py
│
├── assets/
│ ├── home.png
│ ├── prediction.png
│ ├── pdf_report.png
│ ├── shap_bar.png
│ ├── shap_beeswarm.png
│ ├── shap_waterfall_sample_0.png
│ ├── confusion_matrix.png
│ ├── calibration_curve.png
│ └── roc_curve.png
│
├── data/
│ ├── raw/
│ │ ├── autism_screening.csv
│ │ └── Autism-Child-Data.csv
│ └── processed/
│ ├── autism_combined.csv
│ ├── X_train.csv
│ ├── X_test.csv
│ ├── y_train.csv
│ └── y_test.csv
│
├── models/
│ ├── best_model.pkl
│ ├── scaler.pkl
│ ├── label_encoders.pkl
│ └── shap_explainer_and_values.pkl
│
├── notebooks/
│ ├── 01_EDA.ipynb
│ ├── 02_Feature_Engineering.ipynb
│ ├── 03_Model_Training.ipynb
│ └── 04_Model_Evaluation.ipynb
│
├── reports/
│ ├── confusion_matrix.png
│ ├── roc_curve.png
│ ├── calibration_curve.png
│ └── *.pdf
│
├── src/
│ ├── preprocess.py
│ ├── train_model.py
│ ├── risk_scoring.py
│ └── explainability.py
│
├── docs/
│ ├── MODEL_CARD.md
│ ├── DATA_CARD.md
│ ├── MODEL_COMPARISON.md
│ ├── SYSTEM_ARCHITECTURE.md
│ └── ETHICS_CARD.md
│
├── requirements.txt
└── README.md
```
This is an educational machine-learning project designed for:
- research demonstration
- explainability exploration (SHAP)
- portfolio and skill showcasing
Autism Spectrum Disorder (ASD) diagnosis requires trained clinicians and involves:
- behavioural observation
- developmental history
- structured clinical interviews
- neuropsychological assessments
- multi-disciplinary evaluation
- genetic & neurological analysis
No machine-learning model, screening questionnaire, or digital tool can replace professional evaluation. This project should not be used for medical, clinical, or therapeutic decision-making.
Despite strong performance, this project has important limitations that recruiters and reviewers should be aware of:

Dataset Limitations
- Small dataset (≈700 samples)
- Questionnaire-based (AQ-10) → inherently diagnostic
- Limited feature variety (binary responses, demographics)
- May contain cultural or demographic biases
- Does not include real-world behavioural, video, audio, MRI, or genetic data

Model Limitations
- High accuracy partly due to dataset separability
- May not generalize to unseen populations or clinical settings
- No temporal, behavioural, or contextual signals
- Risk of overfitting due to small sample size

Application Limitations
- UI is for demonstration only (not medically approved)
- SHAP helps explain decisions but does not guarantee model fairness
- PDF reports are educational summaries, not clinical documents
If you’d like to connect or discuss this project: 📧 [email protected]
🔗 LinkedIn: www.linkedin.com/in/aparnasajeevan1610