
Define a policy around how MMDA will:

Represent Images in Document
Serialize/Load Images
Integrate with other vision libraries like LayoutParser. For this point, aim for two options:
(1) Indirect integration, where the user runs their vision models outside of MMDA, formats their image data in a manner compatible with MMDA, then loads it in to manipulate within MMDA. This suits libraries like LayoutParser that depend on detectron2 and may have environments incompatible with the rest of MMDA.
(2) Direct integration, where the user runs vision models directly in the same environment as other MMDA.Predictors. This suits libraries like Hugging Face, which are adding vision models.
Include Image fields in MMDA, such as Tables/Figures and their associated Captions
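
To make the serialize/load and indirect-integration points above concrete, here is a minimal sketch using only the Python standard library. It is not MMDA's actual API: `PageImage`, `Region`, `dump_document`, and `load_document` are hypothetical names, and the JSON layout (base64-encoded image bytes plus a list of labeled boxes) is an assumption chosen for illustration. The idea is that a vision model run in a separate environment writes its results in this form, and MMDA loads them later.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageImage:
    """Hypothetical container for one page image (not an MMDA class)."""
    page: int
    width: int
    height: int
    data: bytes  # encoded image bytes, e.g. the contents of a PNG file

    def to_json(self) -> dict:
        # JSON cannot hold raw bytes, so store them as base64 text.
        return {
            "page": self.page,
            "width": self.width,
            "height": self.height,
            "data": base64.b64encode(self.data).decode("ascii"),
        }

    @classmethod
    def from_json(cls, d: dict) -> "PageImage":
        return cls(d["page"], d["width"], d["height"], base64.b64decode(d["data"]))


@dataclass
class Region:
    """Hypothetical vision-model output: one labeled box on a page."""
    page: int
    label: str        # e.g. "Figure", "Table", "Caption"
    box: List[float]  # [x0, y0, x1, y1] in pixels (assumed convention)


def dump_document(images: List[PageImage], regions: List[Region]) -> str:
    """Serialize images and regions to one JSON string."""
    return json.dumps({
        "images": [img.to_json() for img in images],
        "regions": [vars(r) for r in regions],
    })


def load_document(text: str) -> Tuple[List[PageImage], List[Region]]:
    """Load what dump_document wrote, e.g. output produced outside MMDA."""
    raw = json.loads(text)
    images = [PageImage.from_json(d) for d in raw["images"]]
    regions = [Region(**d) for d in raw["regions"]]
    return images, regions
```

Under option (1), the external environment would call something like `dump_document` after running its model, and the MMDA side would only need the loader. Under option (2), the same `Region` objects could be produced in-process by a Predictor, so both options can share one representation.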
