Docling

Technology, Information and Internet

Get your documents ready for gen AI

About us

Docling unlocks the information trapped in your PDFs, Office files, images, and more, so you can automate document processing and build AI applications with ease and speed.

Website
https://docling.ai
Industry
Technology, Information and Internet
Company size
2-10 employees
Type
Nonprofit

Updates

  • Docling reposted this

    Sunday Coffee & Code: RAG improvements, and a few detours

    Most of my recent Coffee & Code time has gone into one very specific thing: improving the RAG layer behind my Microsoft Agent Framework–based RFP multi-agent solution. Here’s the path so far.

    Step 1: Chroma + Docling (via MCP)
    ● This was the first “real” implementation: simple vector search over RFP input, fed by Docling.
    ● It worked well enough to support one live RFP response, which is my MVP bar for whether something is more than an experiment.

    Step 2: LightRAG
    ● I then went looking for ways to improve retrieval quality and came across LightRAG (https://lnkd.in/gCftxBBr), which combines vector search, a knowledge graph, and re-ranking.
    ● Technically, it was impressive. I got it running, built the knowledge graph, and the architecture made a lot of sense.
    ● But it also did something unsettling: the knowledge graph started containing things that were never in the source documents (for example, “Noah Carter – World Athletics Championship”). Whether or not this was a hallucinated artifact, it was a reminder to take a closer look.

    Step 3: OpenRAG
    Last week I came across IBM's OpenRAG stack (https://www.openr.ag/), which combines Langflow, Docling, and OpenSearch, and it checked a lot of boxes for me:
    ● Langflow (where I built my first real agent back in 2024)
    ● Docling (already central to my document pipeline)
    ● OpenSearch (scalable, production-grade search)
    ● Langfuse support (observability; I've used it before and it provides valuable insights)

    The install was quick (Docker-based), though I did hit a startup issue on my Amazon Web Services (AWS) EC2 instance that took some fiddling to resolve (GitHub issue: https://lnkd.in/g38qb8QK, fixed with my PR: https://lnkd.in/g594VTQ6). Once it was running, though, the stack proved genuinely interesting: configurable models, Langfuse tracing, and full access to the Langflow flows sitting right under the RAG layer.

    This feels much closer to something I could operate, tune, and trust over time (and get support for).

    Next steps: now it’s about stabilising the OpenRAG deployment and wiring it cleanly into my RAG Manager so the agent can start using it for real RFP work.

    As always, this is about exploring the latest tools as well as answering a simple consulting question: can I use this in a real production workflow and trust it? More to come.
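The stray entity in Step 2 points to a cheap sanity check for any LLM-built knowledge graph: flag entity names that never appear in the source text. Below is a minimal Python sketch, assuming the entities are plain strings and the source documents are already available as text (for example after parsing with Docling). `ungrounded_entities` is a hypothetical helper, not part of LightRAG or OpenRAG. Because it only does exact (case- and whitespace-insensitive) matching, it will also flag paraphrased or abbreviated entities, so a hit is a prompt for review rather than proof of hallucination.

```python
import re


def _normalise(text):
    # Collapse runs of whitespace and lower-case, so matching ignores spacing and case.
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_entities(entities, source_texts):
    """Return the entity names that never occur verbatim in any source text (hypothetical helper)."""
    # Join documents with "\n" (absent from normalised names) so a match cannot span two documents.
    corpus = "\n".join(_normalise(t) for t in source_texts)
    return [name for name in entities if _normalise(name) and _normalise(name) not in corpus]


if __name__ == "__main__":
    sources = [
        "Section 3: The bidder shall describe its data migration approach.",
        "Appendix A lists the evaluation criteria for this RFP.",
    ]
    graph_entities = ["data migration", "evaluation criteria", "Noah Carter"]
    print(ungrounded_entities(graph_entities, sources))  # ['Noah Carter']
```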

  • Docling reposted this

    I have built a production-grade, multimodal RAG platform. No SaaS, all local. Full data sovereignty.

    After rigorous design and iteration, I have deployed a system that ingests, reasons over, and answers questions across documents, images, audio, and web content in a fully controlled, auditable environment.

    The Architecture:
    * Orchestration: n8n as the event-driven workflow engine and API surface.
    * Ingestion: Gotenberg (web/HTML), Docling (Office/PDF), and GPU-accelerated WhisperX (audio/video with diarisation).
    * Security: Custom FastAPI service for PII redaction before embedding generation.
    * Storage: Qdrant (metadata-rich) + Ollama (nomic-embed-text).
    * Retrieval: Two-stage process (vector search + LLM reranking) with strict citation tracing.
    * Unified identity: UUIDs per document with full traceability from chunk back to source.
    * Predictable: Deterministic routing and explicit error paths.
    * Private: Designed for on-prem or air-gapped deployment.

    You own the data, the models, and the hardware. This isn't just a chatbot; it is compliance-ready infrastructure.

    If you would like this designed, deployed, or tailored for your environment, drop me a message or comment below.

    I learn. I build. I ship.
    Brett
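The two-stage retrieval with chunk-to-source traceability described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: an in-memory list stands in for Qdrant, embeddings are assumed to be precomputed (e.g. by nomic-embed-text), and `rerank_score` is a hypothetical term-overlap stand-in for the LLM reranking step. All names are hypothetical.

```python
import math
import uuid
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str          # UUID assigned to the source document at ingestion
    chunk_index: int     # position of this chunk within its document
    text: str
    vector: list[float]  # embedding, assumed to be precomputed


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, text):
    """Hypothetical stand-in for the LLM reranker: share of query terms found in the chunk."""
    query_terms = set(query.lower().split())
    chunk_terms = set(text.lower().split())
    return len(query_terms & chunk_terms) / len(query_terms) if query_terms else 0.0


def retrieve(query, query_vector, chunks, k_candidates=3, k_final=1):
    # Stage 1: vector search keeps the k_candidates chunks closest to the query embedding.
    candidates = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)[:k_candidates]
    # Stage 2: rerank only those candidates and keep the best k_final.
    best = sorted(candidates, key=lambda c: rerank_score(query, c.text), reverse=True)[:k_final]
    # Every hit carries a citation "<doc UUID>#<chunk index>" back to its source.
    return [(c.text, f"{c.doc_id}#{c.chunk_index}") for c in best]


if __name__ == "__main__":
    doc_a, doc_b = str(uuid.uuid4()), str(uuid.uuid4())
    chunks = [
        Chunk(doc_a, 0, "invoices are retained for seven years", [1.0, 0.0]),
        Chunk(doc_a, 1, "audio files are transcribed with diarisation", [0.0, 1.0]),
        Chunk(doc_b, 0, "retention policy for invoices and receipts", [0.9, 0.1]),
    ]
    for text, citation in retrieve("invoices retained years", [1.0, 0.0], chunks, k_candidates=2):
        print(text, "->", citation)
```

Keeping the document UUID and chunk index on every chunk is what makes the final citation possible: the reranker can reorder candidates freely, and each answer still points back to its source.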

  • Docling reposted this

    🔔 Docling is now on PyRadar

    𝗗𝗼𝗰𝗹𝗶𝗻𝗴 has been added to the document parsing category on 𝗣𝘆𝗥𝗮𝗱𝗮𝗿.

    𝗗𝗼𝗰𝗹𝗶𝗻𝗴 is an open-source Python library designed to simplify document processing by parsing a wide range of formats (including PDFs, DOCX, PPTX, HTML, images, and more) into a unified structured representation. That is a very real need if you’re building 𝗥𝗔𝗚 pipelines or document-heavy AI workflows and don’t want to fight layouts, tables, and random formatting all day.

    You can now check out how 𝗗𝗼𝗰𝗹𝗶𝗻𝗴 compares with other document parsing frameworks on 𝗣𝘆𝗥𝗮𝗱𝗮𝗿 👀
    👉 https://pyradar.com

    Sometimes the most useful tools aren’t new; they’re just finally easy to compare. 😜

    #Python #PyRadar #OpenSource #AI #RAG #DocumentParsing #DevTools

  • Docling reposted this

    Want to build an AI assistant that can understand and answer questions from documents? In my latest IBM Developer tutorial, I show you how to:
    - Ingest and parse documents with Docling
    - Build a Retrieval-Augmented Generation (RAG) pipeline with Quarkus & LangChain4j
    - Deploy a working REST API you can extend

    Try it out and level up your AI dev skills: https://lnkd.in/gaCDPQjT

    #AI #Java #SoftwareEngineering #Quarkus Docling IBM IBM Developer Esau Betancourt Michele Dolfi, PhD
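The tutorial itself is written in Java; as a language-neutral illustration of the step between parsing and retrieval, here is a Python sketch that splits a converted document into heading-scoped chunks before embedding. It assumes the document has already been exported to Markdown (Docling can export converted documents as Markdown). `chunk_markdown` and its `max_chars` default are hypothetical; Docling also ships its own chunkers, which are likely a better fit in practice.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")  # ATX-style Markdown heading, e.g. "## Tables"


def chunk_markdown(markdown, max_chars=500):
    """Split Markdown at headings; each chunk is prefixed with the heading it belongs to."""
    chunks = []
    heading, body_lines = "", []

    def flush():
        body = "\n".join(body_lines).strip()
        # Long sections are cut into max_chars pieces (naively, possibly mid-word).
        # An empty body produces no chunk.
        for start in range(0, len(body), max_chars):
            piece = body[start:start + max_chars]
            chunks.append(f"{heading}\n{piece}" if heading else piece)

    for line in markdown.splitlines():
        if HEADING.match(line):
            flush()
            heading, body_lines = line.strip(), []
        else:
            body_lines.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    md = "# Intro\nDocling parses PDFs.\n\n## Tables\nRow one.\nRow two."
    for chunk in chunk_markdown(md):
        print(repr(chunk))
```

Prefixing each piece with its heading keeps some context in every embedded chunk; the naive character split and the heading regex (which would also match `#` lines inside fenced code) are the obvious places to improve.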

  • New Video: Unlock Better RAG & AI Agents with Docling

    A new YouTube video hosted by Cedric Clyburn and Mingxuan Z. introduces Docling and shows how to unlock more powerful RAG pipelines and AI agents by treating documents as first-class citizens in the GenAI stack. The discussion also covers how Docling integrates with other popular frameworks, making it easy to plug into existing GenAI workflows.

    📦 Docling is your toolbox for documents in the GenAI era, helping you parse, structure, and prepare documents so your models and agents can reason better and go further.

    👉 In the video, you’ll learn how to:
    - Transform unstructured files like PDFs, images, and tables into structured data
    - Build smarter RAG workflows with optimized, document-aware pipelines
    - Enable scalable AI agents using structured and multimodal data
    - Apply cutting-edge techniques for multimodal RAG
    - Integrate Docling seamlessly with other GenAI frameworks

    Watch it at: https://lnkd.in/dEbyJdZg

    🔗 Learn more & get involved:
    - GitHub: https://lnkd.in/g63qJWM9
    - Join our Discord: https://docling.ai/discord
    - Join one of our upcoming office hours

    If you’re building with documents and GenAI, Docling can help. 🚀

    #Docling #GenAI #RAG #AIAgents #OpenSource #DeveloperTools

  • Docling reposted this

    Docling has been exceeding my expectations lately, and in many real-world cases it’s performing better than Unstructured for document parsing and extraction.

    I’m currently building a Knowledge Graph pipeline using a mixed corpus of PDFs, images, and XLSX files, and Docling has become my first-level parser and processing backbone.

    While working on 𝗣𝗗𝗙𝘀𝘁𝗿𝗮𝗰𝘁, I already saw how powerful Docling was for PDF extraction. But recently, I started stress-testing it with large XLSX files, and it proved itself again.

    After extraction, I’m using LLMGraphTransformer + LangGraph (LangChain) to convert the parsed data into an agent-driven Knowledge Graph, and the results have been very promising. I hope they continue investing in the LLMGraphTransformer library.

    If you’re also building Knowledge Graphs at scale, I’d love to connect and learn how you are doing it. I’m planning to bring some of these capabilities into PDFstract soon to support the open-source community.

    Finally, #AI is generating a lot of business value.

    #LangChain #Docling #AIEngineering #Neo4j #RAG #DataEngineering #KnowledgeGraphs #OpenSource
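LLMGraphTransformer uses an LLM to extract nodes and relationships from text. For spreadsheet data like the XLSX files mentioned above, a deterministic conversion can be a useful baseline to compare the LLM-built graph against. A minimal Python sketch, assuming each sheet has already been parsed into a header row plus data rows and that the first column identifies the entity; `rows_to_triples` is hypothetical and is not part of LangChain or Docling.

```python
def rows_to_triples(header, rows):
    """Convert a parsed table into (subject, predicate, object) triples (hypothetical helper).

    Assumes the first column identifies the entity; every other non-empty cell
    becomes an edge labelled with its column header.
    """
    _, *attribute_columns = header
    triples = []
    for row in rows:
        subject, *values = row
        for predicate, value in zip(attribute_columns, values):
            if value not in (None, ""):  # skip empty cells instead of creating empty nodes
                triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    header = ["Supplier", "Country", "Category"]
    rows = [["Acme", "DE", "Packaging"], ["Globex", "", "Logistics"]]
    for triple in rows_to_triples(header, rows):
        print(triple)
```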

  • Docling reposted this

    Local document processing for AI has never been faster.

    We talk a lot about models, agents, and MCP, but there are many innovations happening in the "boring" part of the stack.

    For a long time, running document pipelines locally was a bottleneck. Parsing PDFs, extracting tables, and normalizing formats were all CPU-bound. That’s changing fast.

    At CES 2026, NVIDIA announced an agentic AI toolkit that includes Docling, now optimized for RTX GPU acceleration. If you have a computer with an NVIDIA GeForce RTX GPU, you can build the fastest document processing pipeline for:
    - local RAG setups
    - agent workflows
    - real-time document ingestion
    - privacy-sensitive, on-device use cases

    I'm incredibly proud to be working closely with the Docling team at IBM and to see how quickly they are not only making Docling the best document processing solution but also making it work with the entire AI ecosystem.

    Curious to see what you'll be building with Docling!

  • Docling reposted this

    🚀 Exciting Collaboration Announcement at CES: NVIDIA × Docling 🚀

    We’re thrilled to share that NVIDIA and Docling are collaborating to deliver best-in-class performance for high-quality document conversion. This collaboration has already led to significant optimizations in Docling, leveraging batch processing to achieve up to a 4× speed-up compared to CPU-only workloads.

    Thanks to the close partnership with the NVIDIA RTX AI team, we were able to develop, test, and benchmark across multiple NVIDIA systems, including the powerful RTX 5090, pushing the limits of document conversion performance on RTX PCs.

    Learn more about the NVIDIA RTX optimizations: https://lnkd.in/dpsHq7af
    Get started with Docling on RTX: https://lnkd.in/dsN4KHwt

    We’re excited about what’s ahead and look forward to enabling faster, higher-quality document processing for the community. Stay tuned for more Docling news!

    Many thanks to our awesome collaborators at NVIDIA: Aslı Sabancı Demiröz, Douglas O'Flaherty, Sean Sodha, and many more!

    cc: Abdel Labbi, Sriram Raghavan, Shankar Ramaswamy, Ed Anuff, Milan (Marquez) Higginbotham, Alejandro Cantarero

    #CES #IBM #NVIDIA #Docling #RTXAI #OpenSource #DocumentAI #Performance #AIInfrastructure

  • Docling reposted this

    I just published one of the most practical tutorials I’ve written this year.

    Over the last few months I’ve been experimenting with bringing agentic systems closer to real enterprise data. The missing link has always been reliable document ingestion: PDFs, tables, multi-column layouts, the messy things real companies use every day.

    Thanks to the great work from Eric Deandrea, Thomas Vitale, Alex Soto, and the amazing folks at Docling, we now have first-class Java support and a Quarkus extension that makes high-fidelity ingestion effortless.

    So I built a full Enterprise RAG pipeline around it:
    • Docling for layout-aware PDF parsing
    • pgvector + PostgreSQL for vector search
    • LangChain4j + Ollama for local LLM reasoning
    • Guardrails to keep responses safe and grounded
    • All running inside Quarkus

    If you're exploring RAG, agentic systems, or AI-powered workflows in Java, this walkthrough gives you a complete, reproducible baseline.

    Full tutorial here: https://lnkd.in/dFD-zt_Z

    #Java #Quarkus #AI #Docling #RAG #LLM #SoftwareEngineering #OpenSource #EnterpriseAI
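To make "grounded" concrete, here is one crude way a grounding guardrail could work: flag answer sentences whose content words are mostly absent from every retrieved chunk. This Python sketch is a hypothetical illustration, not the guardrail mechanism used in the tutorial (which runs on Quarkus and LangChain4j); the stopword list, `ungrounded_sentences`, and the 0.5 threshold are all assumptions. Lexical overlap misses paraphrases, so it is only a first line of defence.

```python
import re

# Small illustrative stopword list (an assumption); a real guardrail would need something more robust.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in", "for", "on", "with"}


def content_words(text):
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def ungrounded_sentences(answer, retrieved_chunks, min_overlap=0.5):
    """Return answer sentences whose content words mostly do not appear in any retrieved chunk."""
    chunk_words = [content_words(chunk) for chunk in retrieved_chunks]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        # Best coverage of this sentence's words by any single chunk (0.0 if nothing was retrieved).
        coverage = max((len(words & cw) / len(words) for cw in chunk_words), default=0.0)
        if coverage < min_overlap:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    context = ["Invoices are retained for seven years in the archive."]
    answer = "Invoices are retained for seven years. The CEO approved this in 2019."
    print(ungrounded_sentences(answer, context))  # ['The CEO approved this in 2019.']
```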

