SmartResume is a layout-aware resume parsing system. It ingests resumes in PDF, image, and common Office formats, extracts clean text (OCR + PDF metadata), reconstructs reading order with layout detection, and uses LLMs to convert the content into structured fields such as basic info, education, and work experience.
- Python >= 3.9
- CUDA >= 11.0 (optional, for GPU acceleration)
- Memory >= 8GB
- Storage >= 10GB
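
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library. `check_requirements` is a hypothetical helper, not part of SmartResume, and it covers only the Python version and free disk space, because the standard library has no portable call for total memory or CUDA availability.

```python
import shutil
import sys

# Thresholds taken from the requirements list above.
MIN_PYTHON = (3, 9)
MIN_FREE_DISK_GB = 10


def check_requirements(path="."):
    """Return a dict mapping each checked requirement to True or False.

    `path` is the directory whose filesystem is checked for free space.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python>=3.9": sys.version_info >= MIN_PYTHON,
        "storage>=10GB": free_gb >= MIN_FREE_DISK_GB,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'NOT met'}")
```
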
- Clone the repository
git clone https://github.com/alibaba/SmartResume.git
cd SmartResume
- Create conda environment
conda create -n resume_parsing python=3.9
conda activate resume_parsing
- Install dependencies
pip install -e .
- Configure environment
# Copy configuration template
cp configs/config.yaml.example configs/config.yaml
# Edit configuration file and add API keys
vim configs/config.yaml

# Parse single resume file
python scripts/start.py --file resume.pdf
# Specify extraction types
python scripts/start.py --file resume.pdf --extract_types basic_info work_experience education

from smartresume import ResumeAnalyzer
# Initialize analyzer
analyzer = ResumeAnalyzer(init_ocr=True, init_llm=True)
# Parse resume
result = analyzer.pipeline(
cv_path="resume.pdf",
resume_id="resume_001",
extract_types=["basic_info", "work_experience", "education"]
)
print(result)

SmartResume now supports local model deployment using vLLM, reducing dependency on external APIs:
# Download Qwen-0.6B-resume model
python scripts/download_models.py
# Deploy model
bash scripts/start_vllm.sh

For a detailed local model deployment guide, see LOCAL_MODELS.
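
Whether the LLM runs locally or behind an external API, the `analyzer.pipeline(...)` call from the Python example can be wrapped to process a whole folder of resumes. The sketch below is one way to do that, not a SmartResume feature. `parse_directory` and `RESUME_SUFFIXES` are hypothetical names, the suffix list is illustrative, and the code assumes the returned result can be serialized to JSON (with `default=str` as a fallback for any value that cannot).

```python
import json
from pathlib import Path

# File extensions to pick up; an assumption for this sketch, adjust as needed.
RESUME_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".docx"}


def parse_directory(analyzer, input_dir, output_dir,
                    extract_types=("basic_info", "work_experience", "education")):
    """Run analyzer.pipeline on each resume in input_dir, writing one JSON file per resume.

    `analyzer` is any object exposing
    pipeline(cv_path=..., resume_id=..., extract_types=...).
    Returns a dict mapping resume_id to the error message for files that failed.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = {}
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in RESUME_SUFFIXES:
            continue
        # The file stem doubles as the resume ID; files sharing a stem
        # (a.pdf and a.png) would overwrite each other's output.
        resume_id = path.stem
        try:
            result = analyzer.pipeline(
                cv_path=str(path),
                resume_id=resume_id,
                extract_types=list(extract_types),
            )
        except Exception as exc:  # keep going if a single file fails
            failures[resume_id] = str(exc)
            continue
        (out / f"{resume_id}.json").write_text(
            json.dumps(result, ensure_ascii=False, indent=2, default=str),
            encoding="utf-8",
        )
    return failures
```

With an analyzer created as in the Python example, `parse_directory(analyzer, "resumes/", "parsed/")` would write `parsed/<name>.json` for each supported file and return the failures for inspection. Collecting errors instead of raising keeps one unreadable resume from stopping the whole batch.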
| Metric Category | Specific Metric | Value | Description |
|---|---|---|---|
| Layout Detection | mAP@0.5 | 92.1% | High layout detection accuracy |
| Information Extraction | Overall Accuracy | 93.1% | High accuracy |
| Processing Speed | Single Page Time | 1.22s | High performance |
| Language Support | Supported Languages | many | Covering major global languages |
For detailed benchmark results, see Benchmark Results.
For detailed configuration options, see the Configuration Guide.
This project is licensed under the terms of the LICENSE file.
Some models in this project were trained with third-party detectors. We plan to explore replacing them with models under more permissive licenses to give users more flexibility.
@article{Zhu2025SmartResume,
title={Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation},
author={Fanwei Zhu and Jinke Yu and Zulong Chen and Ying Zhou and Junhao Ji and Zhibo Yang and Yuxue Zhang and Haoyuan Hu and Zhenghao Liu},
journal={arXiv preprint arXiv:2510.09722},
year={2025},
url={https://arxiv.org/abs/2510.09722}
}

Note: Please ensure compliance with relevant laws, regulations, and privacy policies.
