A smart application for full document extraction, reorganization, and Q&A interaction — powered by OpenAI and Streamlit.
- Upload Files: Supports
.pdf,.txt,.jpg,.png. - Smart Extraction:
- Attempts structured extraction via MarkItDown.
- Falls back to OCR (Tesseract) if necessary.
- Content Reorganization:
- Reorganizes messy extracted text into clean Markdown via OpenAI model (
gpt-4.1-mini).
- Reorganizes messy extracted text into clean Markdown via OpenAI model (
- Interactive Q&A:
- Ask direct questions about the uploaded content.
- Receives precise and concise answers instantly.
- Download Reorganized Files:
- Save the cleaned-up Markdown as a
.txtfile.
- Save the cleaned-up Markdown as a
- User-Friendly Interface:
- Built with Streamlit for ease of use and beautiful layout.
- Sidebar contains author links and branding.
- PDF documents (
.pdf) - Text files (
.txt) - Image files (
.jpg,.jpeg,.png,.bmp,.tiff)
- Upload a file.
- Start extraction and display the extracted text.
- Reorganize the text into structured Markdown.
- Ask questions about the content.
- Download the reorganized content.
Install the dependencies:
pip install streamlit openai tiktoken markitdown pytesseract pymupdf pillow✨ Enjoy smart document interaction!