We’re sharing the first two tools we’ve begun building at the DIGIT-X Lab. Both are open-source, fully on-premise, and still early, but they represent the direction we’re quietly working toward.

MOSAICX: A healthcare text-structuring engine that takes unstructured clinical text (reports, notes, PDFs) and turns it into structured, machine-readable data. It’s a first step toward the digital foundations needed for reliable, AI-driven medicine.

AnnotateX: A companion tool for creating high-quality textual annotations (“gold standards”), with optional AI assistance. These annotations help evaluate and refine tools like MOSAICX and support the creation of robust clinical datasets.

Both tools run entirely on your own infrastructure, so your data stays with you. And both will grow through collaboration, feedback, and careful iteration.

DIGITX | LMU Radiology
Quietly ambitious about hard things.
[Phase 01] Our attempt at structuring unstructured medical data, to enable the conversion of real-world data into real-world evidence.

We are officially open-sourcing MosaicX, the first tool we’re building at the DIGITX lab. MosaicX is an early-stage medical text-structuring engine designed to transform unstructured clinical text into structured, machine-readable data.

Alongside it, we’re releasing AnnotateX, our second tool, for creating high-quality structured text annotations. We built it so that we can validate MosaicX. Together, these tools form the starting point of our effort to build the digital foundations needed for AI-driven precision medicine.

Coming from a background in molecular hybrid imaging and image analysis (with my work in ENHANCE.PET), moving into LLMs and clinical text feels like starting my PhD all over again. But one thing is clear: to make AI-driven precision medicine a reality, we must ensure clinical data is truly computable, including the “dark matter” of healthcare such as PDFs and free-text reports.

Both MosaicX and AnnotateX are still in their infancy and there is a lot of work to be done, but we’re releasing them openly to learn, iterate, and grow together with the community. We are currently looking for clinical text datasets across radiology and other (non-imaging) specialties to guide the next stages of development and validation.

The cool thing is that both tools run on-premise and can use an Ollama backend. Your data stays with you.

If you’re working on structured clinical data or are interested in collaborating on a large-scale validation, feel free to reach out.

DIGITX repository with DOIs for both tools: https://lnkd.in/dRJpSMiV

Shout-out to Canva AI for the cool audio generation.