Insights: docling-project/docling
Overview
6 Pull requests merged by 6 people
-
fix: mime error in document streams
#1523 merged
May 6, 2025 -
fix: usage of hashlib for FIPS
#1512 merged
May 2, 2025 -
chore: format JSON test files to enable comparison
#1511 merged
May 2, 2025 -
chore: restore typing hint for self.script_readers
#1500 merged
Apr 30, 2025 -
fix: Guard against attribute errors in TesseractOcrModel __del__
#1494 merged
Apr 30, 2025 -
fix: enable cuda_use_flash_attention2 for PictureDescriptionVlmModel
#1496 merged
Apr 30, 2025
5 Pull requests opened by 5 people
-
Added support for xlsm files
#1520 opened
May 5, 2025 -
docs(enrichment): add enrichments for tables and figures
#1525 opened
May 6, 2025 -
fix: Fix issue withe detecting docx files, and files with upper case extentions
#1528 opened
May 6, 2025 -
fix(HTML): handle row spans in header rows
#1536 opened
May 6, 2025 -
feat: add textbox content extraction in msword_backend
#1538 opened
May 7, 2025
5 Issues closed by 3 people
-
Bounding Boxes of Docling Document children inconsistently wrong
#1398 closed
May 6, 2025 -
Option to include page number in document conversion output
#1506 closed
May 2, 2025 -
TesseractOcrModel can throw an exception during garbage collection
#1493 closed
Apr 30, 2025
21 Issues opened by 19 people
-
using SmolVLM-256M-Instruct
#1537 opened
May 7, 2025 -
docling.exceptions.ConversionError: File format not allowed
#1535 opened
May 6, 2025 -
processing docx, pptx,... will fail if the zip-container has a folder named "[trash]"
#1534 opened
May 6, 2025 -
HTML document conversion fails to extract any content
#1533 opened
May 6, 2025 -
Upgrading from 2.28.4 to 2.29.0 breaks docx to markdown pipeline
#1532 opened
May 6, 2025 -
Conversion fails with munmap_chunk(): invalid pointer
#1531 opened
May 6, 2025 -
Add Textbox content extraction in DOCX backend
#1529 opened
May 6, 2025 -
Improve parallelization for remote services
#1522 opened
May 5, 2025 -
Performance Issue. 28.1% less inference time on demo case with a simple change.
#1521 opened
May 5, 2025 -
Docs for hardware specs
#1517 opened
May 4, 2025 -
Smoldocling generation output gets stuck in a loop for certain images.
#1515 opened
May 3, 2025 -
export functions not working for smoldocling vlm pipelines
#1513 opened
May 2, 2025 -
Cache intermediate results with diskcache
#1509 opened
May 2, 2025 -
SmolDocling on Windows (was a 245KB file takes forever in the UI)
#1510 opened
May 2, 2025 -
Documentation Improvement: Add Descriptions and Context for Features
#1505 opened
May 1, 2025 -
Feature Request: Support for .docx and .pptx similar to pdfPipeline
#1504 opened
May 1, 2025 -
Include page images as REFERENCED when saving a docling document as json
#1503 opened
May 1, 2025 -
Custom OCR
#1502 opened
May 1, 2025 -
Usage of force_full_page_ocr breaks with larger documents
#1499 opened
Apr 30, 2025 -
Issue about two images are placed side by side in a DOCX file
#1498 opened
Apr 30, 2025
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Stuck when using with Celery
#1277 commented on
Apr 30, 2025 • 0 new comments -
Got Error: Javascript is off when trying to process some PDF files
#1491 commented on
Apr 30, 2025 • 0 new comments -
raises_on_error = False should catch all exceptions, including HTTP errors
#770 commented on
Apr 30, 2025 • 0 new comments -
Converter seems to get stuck on very large pdfs
#1283 commented on
May 5, 2025 • 0 new comments -
Make easyocr optional
#1462 commented on
May 5, 2025 • 0 new comments -
How can I extract the content of headers and footers from a DOCX file?
#1394 commented on
May 5, 2025 • 0 new comments -
Export to markdown only contains H2 headers
#1023 commented on
May 6, 2025 • 0 new comments -
Unable to parse this docx file
#1403 commented on
May 6, 2025 • 0 new comments -
HTML Pipeline IndexError: list index out of range
#1309 commented on
May 6, 2025 • 0 new comments -
TableStructureModel initialization fails: "Cannot copy out of meta tensor" when using CPU device
#1447 commented on
May 7, 2025 • 0 new comments -
feat: Establish confidence estimation for document and pages
#1313 commented on
May 6, 2025 • 0 new comments -
fix: incorrect force_backend_text behaviour for VLM DocTag pipelines
#1371 commented on
May 1, 2025 • 0 new comments -
feat(html): add anchor tag support in HTML conversion
#1402 commented on
May 4, 2025 • 0 new comments -
feat: new HTML backend that handles styled html as well as images
#1411 commented on
May 6, 2025 • 0 new comments