A tool that extracts Multiple Choice Questions (MCQs) from plain text, DOCX, and PDF files and converts them into an organized Excel spreadsheet.
- Install dependencies:
  pip install -r requirements_etl.txt
- Run the web app:
  streamlit run app.py
- Use the tool:
  - Paste MCQ text directly
  - Upload DOCX or PDF files
  - Edit extracted data
  - Download as Excel file
Features:
- Multi-format Support: Text, DOCX, and PDF files
- Smart Parsing: Extracts questions, options, and solutions
- Interactive Editing: Edit data before export
- Excel Export: Download organized data
- Page/Paragraph Selection: Process specific sections of large documents
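As an illustration of the page/paragraph selection feature, here is a minimal sketch of picking a 1-based, inclusive range out of a list of already-extracted pages or paragraphs. The function name `select_range` and its parameters are hypothetical and are not taken from `extract_mcqs.py`; the real tool may select sections differently.

```python
from typing import List, Sequence


def select_range(items: Sequence[str], start: int, end: int) -> List[str]:
    """Return items start..end (1-based, inclusive).

    Hypothetical helper: `items` stands for pages or paragraphs that
    have already been extracted from a document. An `end` beyond the
    last item is clamped by the slice rather than raising an error.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return list(items[start - 1:end])


paragraphs = ["intro", "Q1 ...", "Q2 ...", "Q3 ...", "appendix"]
selected = select_range(paragraphs, 2, 4)  # keeps only the three question paragraphs
```

Processing only the selected slice keeps parsing time and noise down when a large document contains MCQs in just a few sections.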
Project files:
- app.py: Main web application
- extract_mcqs.py: Core extraction logic
- test_extraction.py: Testing script
- PROJECT_DOCUMENTATION.md: Detailed documentation
The tool generates Excel files with:
- Question text
- 4 multiple choice options
- Solution/answer
- Serial numbers
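The output layout listed above can be sketched as one row per question. The column names in `COLUMNS` and the helper `to_rows` are assumptions made for illustration; the actual headers written by the tool may differ.

```python
from typing import Dict, List

# Assumed column names, chosen for illustration only.
COLUMNS = ["S.No", "Question", "Option 1", "Option 2",
           "Option 3", "Option 4", "Solution"]


def to_rows(mcqs: List[dict]) -> List[Dict[str, object]]:
    """Turn parsed MCQs into row dicts with a serial number and four option cells."""
    rows = []
    for serial, mcq in enumerate(mcqs, start=1):
        # Pad with empty strings (or trim) so every row has exactly four options.
        options = (list(mcq["options"]) + [""] * 4)[:4]
        values = [serial, mcq["question"], *options, mcq["solution"]]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = to_rows([
    {"question": "Capital of France?",
     "options": ["Berlin", "Madrid", "Paris", "Rome"],
     "solution": "Paris"},
    {"question": "2 + 2 = ?", "options": ["3", "4"], "solution": "4"},
])

# With pandas and openpyxl installed, rows like these could be written with:
#   pandas.DataFrame(rows, columns=COLUMNS).to_excel("mcqs.xlsx", index=False)
```

Padding short option lists keeps the spreadsheet rectangular, so a question with fewer than four detected options still lines up with the others and can be fixed during editing.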
Supported MCQ formats:
- Numbered options: (1), 1), ( 1 )
- Lettered options: (a), A), (A)
- Various solution formats: Sol., Solution, Answer, Ans.
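The option and solution markers listed above could be recognized with regular expressions along the following lines. This is a sketch under stated assumptions, not the logic in `extract_mcqs.py`: the patterns `OPTION_RE` and `SOLUTION_RE` and the function `parse_mcq` are hypothetical, and the sketch assumes one option per line and a question that does not itself start with an option-style marker.

```python
import re

# Hypothetical patterns. An option marker is an optional "(", a digit 1-4
# or a letter a-d (either case), and a closing ")", with optional spaces.
OPTION_RE = re.compile(r"^\s*\(?\s*([1-4]|[a-dA-D])\s*\)\s*(.+)$")
# A solution line starts with "Sol.", "Ans.", "Solution", or "Answer".
SOLUTION_RE = re.compile(
    r"^\s*(?:Sol\.|Ans\.|Solution\b|Answer\b)\s*[:\-]?\s*(.+)$",
    re.IGNORECASE,
)


def parse_mcq(block: str) -> dict:
    """Split one MCQ text block into question, options, and solution."""
    question_parts, options, solution = [], [], None
    for line in block.splitlines():
        if not line.strip():
            continue
        sol = SOLUTION_RE.match(line)
        opt = OPTION_RE.match(line)
        if sol:
            solution = sol.group(1).strip()
        elif opt:
            options.append(opt.group(2).strip())
        elif not options and solution is None:
            # Lines before the first option are treated as question text.
            question_parts.append(line.strip())
    return {"question": " ".join(question_parts),
            "options": options,
            "solution": solution}


sample = """What is the capital of France?
(1) Berlin
2) Madrid
( 3 ) Paris
(4) Rome
Sol. (3)"""
parsed = parse_mcq(sample)
```

Checking the solution pattern before the option pattern means a line such as `Ans. b` is never mistaken for an option, which matters when answers are themselves written as option markers.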
For detailed information, see PROJECT_DOCUMENTATION.md.