NIW-NP-RAG

Retrieval-Augmented Generation (RAG) pipeline for analyzing USCIS Administrative Appeals Office (AAO) case PDFs, with a focus on National Interest Waiver (NIW) decisions. The repository contains tools to extract and preprocess text, build semantic embeddings, index them with FAISS, and expose a RAG query service.
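The query path described above (embed, search the index, build a prompt) can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not the repository's implementation: chunk embeddings are assumed to be precomputed, exact inner-product search over L2-normalized vectors stands in for the FAISS index, and `normalize`, `top_k_chunks`, `build_prompt`, and the prompt wording are hypothetical.

```python
import math
from typing import Sequence


def normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_chunks(
    query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Exact inner-product search over normalized vectors (stand-in for a FAISS index).

    Returns (chunk index, score) pairs, best match first.
    """
    scored = [
        (i, sum(q * c for q, c in zip(query_vec, vec)))
        for i, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, chunks: Sequence[str], hits: list[tuple[int, float]]) -> str:
    """Place the retrieved chunks, numbered by rank, ahead of the question (hypothetical wording)."""
    context = "\n\n".join(
        f"[{rank}] {chunks[i]}" for rank, (i, _score) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the AAO decision excerpts below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy example: 2-d vectors stand in for real embedding-model output.
chunks = [
    "Excerpt discussing the national importance of the proposed endeavor.",
    "Excerpt discussing procedural history.",
    "Excerpt discussing the petitioner's qualifications.",
]
chunk_vecs = [normalize(v) for v in ([0.9, 0.1], [0.2, 0.8], [0.5, 0.5])]
query_vec = normalize([1.0, 0.0])
hits = top_k_chunks(query_vec, chunk_vecs, k=2)
prompt = build_prompt("Was national importance established?", chunks, hits)
print(prompt)
```

In a full pipeline the prompt would then be sent to the configured LLM; that call is left out because the provider client is not specified here.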

Architecture

Features

PDF Crawling & Download: Automated crawler with polite throttling, deduplication, and pause/stop controls
Text Extraction: PDF text parsing with OCR fallback for scanned documents
Semantic Chunking: Document splitting at embedding-based semantic boundaries
Vector Embeddings: Semantic embeddings from sentence-transformers models or cloud embedding providers
FAISS Indexing: Fast approximate nearest-neighbor search for semantic retrieval
RAG Pipeline: Retrieve relevant context and generate answers using LLM prompts
Web UI: Streamlit interface for interactive queries and document exploration
REST API: FastAPI service for programmatic access and integration
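One common way to place embedding-based chunk boundaries is to start a new chunk wherever the cosine similarity between consecutive sentence embeddings falls below a threshold. The sketch below assumes that rule (the repository may use a different one); embeddings are passed in precomputed, so any model, such as a sentence-transformers model, could supply them. `semantic_chunks` and the default `threshold` of 0.75 are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunks(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.75,
) -> list[str]:
    """Group consecutive sentences; open a new chunk when adjacent similarity < threshold."""
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return []
    groups: list[list[str]] = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            groups.append([])  # similarity dropped: treat as a topic shift
        groups[-1].append(sentences[i])
    return [" ".join(group) for group in groups]


# Toy 2-d "embeddings": the first two sentences point the same way, the third does not.
sentences = [
    "The petitioner proposes to develop new diagnostic software.",
    "The software would be deployed in rural clinics.",
    "The appeal is dismissed.",
]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
print(semantic_chunks(sentences, embeddings))
```

The threshold trades chunk size against coherence: a higher value splits more often and yields shorter, more focused chunks.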

Retrieval Performance

Evaluated on a held-out test set of NIW case queries:

Metric                                           Score
Recall@K                                         72.00%
Precision@K                                      20.80%
Mean Reciprocal Rank (MRR)                       0.70
Normalized Discounted Cumulative Gain (nDCG@K)   0.6957
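For reference, the four metrics in the table can be computed per query and then averaged over the test set. This sketch uses the standard binary-relevance definitions with reciprocal rank taken within the top K; the cutoff `k`, the document IDs, and the function names are illustrative assumptions, since the K used and the relevance labeling are not stated here.

```python
import math
from typing import Sequence, Set


def _hits(retrieved: Sequence[str], relevant: Set[str], k: int) -> int:
    """Number of relevant documents among the top-k results (IDs assumed unique)."""
    return sum(1 for doc in retrieved[:k] if doc in relevant)


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    return _hits(retrieved, relevant, k) / len(relevant) if relevant else 0.0


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the k retrieved slots that hold a relevant document."""
    return _hits(retrieved, relevant, k) / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """1/rank of the first relevant document within the top k, or 0.0 if none appears."""
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0


def evaluate(runs: list[tuple[Sequence[str], Set[str]]], k: int) -> dict[str, float]:
    """Mean of each metric over queries; runs holds (ranked doc IDs, relevant ID set) pairs."""
    metrics = {
        "recall@k": recall_at_k,
        "precision@k": precision_at_k,
        "mrr": reciprocal_rank,
        "ndcg@k": ndcg_at_k,
    }
    return {
        name: sum(fn(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
        for name, fn in metrics.items()
    }


# Toy run with made-up IDs: query 1 finds its relevant case at rank 2, query 2 misses.
runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1"}),
    (["d4", "d5", "d6", "d8", "d0"], {"d2"}),
]
scores = evaluate(runs, k=5)
print(scores)
```

When a query has fewer relevant documents than K, Precision@K cannot exceed |relevant| / K, so a low precision figure alongside high recall is expected when each query has only a few relevant cases.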

Quick Start

Prerequisites

Python 3.10+ (a virtual environment is recommended)
Git

Setup

Clone the repository and navigate to the project folder.
Create and activate a virtual environment.
Install the required dependencies:

pip install -r requirements_upd.txt

Note: Obtain the AAO case data and place it in the data/ folder. Contact me for access if needed.

Run the API and UI

# API
uvicorn niw_np_rag.app.main:app --host 0.0.0.0 --port 8000

# Streamlit UI (in another terminal)
streamlit run streamlit_app.py
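FastAPI apps publish an OpenAPI schema at `/openapi.json` by default (with interactive docs at `/docs`), so the endpoints the API exposes can be listed without reading the source. This sketch assumes the server started above is reachable at `http://localhost:8000` and that the default `openapi_url` has not been disabled; `fetch_openapi` and `list_endpoints` are hypothetical helper names.

```python
import json
import urllib.request

# HTTP methods allowed as keys of an OpenAPI path item; other keys
# (e.g. "parameters", "summary") describe the path rather than an operation.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema that FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema's "paths" object."""
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS
    )
```

With the API running, `list_endpoints(fetch_openapi())` returns the available routes; the same information is browsable at `http://localhost:8000/docs`.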

Folder structure

NIW-NP-RAG/                         # Project root directory
│
├── assets/                         # Static files, images, or supporting assets
│
├── niw_np_rag/                     # Main Python package
│   ├── app/                        # Application modules (e.g., RAG pipeline, API)
│   ├── config/                     # Configuration files and environment settings
│   └── scripts/                    # Utility or preprocessing scripts
│
├── data/                           # (Optional) Raw and processed data storage
│
├── test/                           # Test scripts and small development environment
│
├── streamlit_app.py                # Streamlit UI entry point
│
└── README.md                       # Project documentation

Configuration

Save credentials or API keys (LLM, embedding providers) in environment variables or a config file not checked into source control.
Example environment variables:
- GOOGLE_API_KEY
- OPENAI_API_KEY
- VECTOR_STORE_PATH

Place provider-specific configuration under niw_np_rag/config/ and do not commit secrets.
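A minimal sketch of reading the variables listed above at startup, assuming they are supplied through the environment. The `Settings` dataclass and `load_settings` function are hypothetical, not the package's actual configuration module; marking the key fields `repr=False` keeps secrets out of logs that print the settings object.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # repr=False hides API keys when the object is printed or logged.
    google_api_key: Optional[str] = field(default=None, repr=False)
    openai_api_key: Optional[str] = field(default=None, repr=False)
    vector_store_path: Optional[Path] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; unset or blank values become None."""

    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    store = get("VECTOR_STORE_PATH")
    return Settings(
        google_api_key=get("GOOGLE_API_KEY"),
        openai_api_key=get("OPENAI_API_KEY"),
        vector_store_path=Path(store) if store else None,
    )
```

Passing a plain dict as `env` makes the loader easy to test without touching the real environment.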

Contributing

Fork the repo and create a feature branch
Keep changes small and unit-tested
Open a pull request with a description and testing steps
Use black/ruff for style (if configured)

License & attribution

See the LICENSE file in the repository root for the project license. Acknowledge any third-party tools, libraries, and data sources used.

Contact

For questions, open an issue or contact the maintainer via the GitHub repository.

zamanmiraz/NIW-NP-RAG
