Thanks to visit codestin.com
Credit goes to github.com

Skip to content

alibaba/SmartResume

Repository files navigation

SmartResume - Intelligent Resume Parsing System

SmartResume Logo

💻 Code   |   🤗 Model   |   🤖 Demo   |   📑 Technical Report

English | 中文

Project Introduction

SmartResume is an layout‑aware resume parsing system. It ingests resumes in PDF, image and common Office formats, extracts clean text (OCR + PDF metadata), reconstructs reading order with layout detection, and leverages LLMs to convert content into structured fields such as basic info, education, and work experience.

demo.mp4

Quick Start

Requirements

  • Python >= 3.9
  • CUDA >= 11.0 (optional, for GPU acceleration)
  • Memory >= 8GB
  • Storage >= 10GB

Installation

  1. Clone the repository
git clone https://github.com/alibaba/SmartResume.git
cd SmartResume
  1. Create conda environment
conda create -n resume_parsing python=3.9
conda activate resume_parsing
  1. Install dependencies
pip install -e .
  1. Configure environment
# Copy configuration template
cp configs/config.yaml.example configs/config.yaml
# Edit configuration file and add API keys
vim configs/config.yaml

Basic Usage

Method 1: Command Line Interface (Recommended)

# Parse single resume file
python scripts/start.py --file resume.pdf

# Specify extraction types
python scripts/start.py --file resume.pdf --extract_types basic_info work_experience education

Method 2: Python API

from smartresume import ResumeAnalyzer

# Initialize analyzer
analyzer = ResumeAnalyzer(init_ocr=True, init_llm=True)

# Parse resume
result = analyzer.pipeline(
    cv_path="resume.pdf",
    resume_id="resume_001",
    extract_types=["basic_info", "work_experience", "education"]
)

print(result)

Local Model Deployment

SmartResume now supports local model deployment using vLLM, reducing dependency on external APIs:

# Download Qwen-0.6B-resume model
python scripts/download_models.py

# Deploy model
bash scripts/start_vllm.sh

For detailed local model deployment guide, see LOCAL_MODELS.

Key Features

Metric Category Specific Metric Value Description
Layout Detection [email protected] 92.1% High layout detection accuracy
Information Extraction Overall Accuracy 93.1% High accuracy
Processing Speed Single Page Time 1.22s High performance
Language Support Supported Languages many Covering major global languages

Benchmark Results

For detailed benchmark results, see Benchmark Results.

Configuration

For detailed configuration options, see the Configuration Guide.

License Information

This project is licensed under the LICENSE.

Currently, some models in this project were previously trained with third-party detectors. We plan to explore and replace them with models under more permissive licenses to enhance user-friendliness and flexibility.

Acknowledgments

Citation

@article{Zhu2025SmartResume,
  title={Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation},
  author={Fanwei Zhu and Jinke Yu and Zulong Chen and Ying Zhou and Junhao Ji and Zhibo Yang and Yuxue Zhang and Haoyuan Hu and Zhenghao Liu},
  journal={arXiv preprint arXiv:2510.09722},
  year={2025},
  url={https://arxiv.org/abs/2510.09722}
}

Note: Please ensure compliance with relevant laws and regulations and privacy policies.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages