Define a policy around how MMDA will:
- Represent Images in Document
- Serialize/Load Images
- Integrate with other vision libraries like LayoutParser. For this point in particular, aim to support 2 options:
  - (1) Indirect integration, where the user is expected to run their vision models outside of MMDA, format the resulting image data in a manner compatible with MMDA, then load it in to manipulate within MMDA. This is suitable for libraries like LayoutParser that depend on detectron2 and may have environments incompatible with the rest of MMDA.
  - (2) Direct integration, where the user can run vision models directly in the same environment as other MMDA.Predictors. This is suitable for libraries like Hugging Face, which are adding vision models.
- Include Image fields, such as Tables/Figures and their associated Captions
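
To make the "Serialize/Load Images" and indirect-integration points concrete, here is a minimal sketch of one possible interchange format, using only the Python standard library. Everything named here (`ImageRegion`, `encode_image`, `decode_image`, `dump_regions`, `load_regions`, and the box layout) is a hypothetical illustration, not an existing MMDA API. It assumes that page images travel as base64 text and that regions detected by an external vision model travel as plain JSON records.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ImageRegion:
    """Hypothetical record for one detected region (not an MMDA class)."""
    page: int          # zero-based page index
    label: str         # e.g. "Table", "Figure", "Caption"
    box: List[float]   # assumed layout: [left, top, width, height]


def encode_image(raw: bytes) -> str:
    """Encode raw image bytes (e.g. a PNG) as JSON-safe base64 text."""
    return base64.b64encode(raw).decode("ascii")


def decode_image(text: str) -> bytes:
    """Invert encode_image, returning the original image bytes."""
    return base64.b64decode(text.encode("ascii"))


def dump_regions(regions: List[ImageRegion]) -> str:
    """Serialize regions produced outside of MMDA into a JSON string."""
    return json.dumps([asdict(region) for region in regions])


def load_regions(text: str) -> List[ImageRegion]:
    """Rebuild region records from the JSON string written by dump_regions."""
    return [ImageRegion(**item) for item in json.loads(text)]
```

Under these assumptions, a LayoutParser user would call something like `dump_regions` in their detectron2 environment and `load_regions` in the MMDA environment, so that neither environment needs to import the other's dependencies.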