Milestones

MMDA canonical handling of Image-type data
Define a policy for how MMDA will:
- Represent Images in a Document
- Serialize and load Images
- Integrate with other vision libraries such as LayoutParser. For this point in particular, aim for two options:
  (1) Indirect integration, where the user is expected to run vision models outside of MMDA, format the image data in an MMDA-compatible manner, then load it into MMDA for manipulation. This suits libraries like LayoutParser that depend on detectron2 and may have environments incompatible with the rest of MMDA.
  (2) Direct integration, where the user can run vision models directly in the same environment as other MMDA.Predictors. This suits libraries like Hugging Face, which are adding vision models.
- Include MMDA Image fields, such as Tables/Figures and their associated Captions
No due date
0% complete, 0 open, 0 closed
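As a rough illustration of the indirect-integration option in the image-handling milestone, the sketch below round-trips raw image bytes through a JSON string using base64. The function names `serialize_image` and `load_image`, and the payload keys, are hypothetical and are not part of MMDA; the actual policy is still to be defined.

```python
import base64
import json
from typing import Tuple


def serialize_image(image_bytes: bytes, page: int) -> str:
    """Encode raw image bytes (e.g. the contents of a PNG file) as a JSON string.

    Hypothetical helper: the payload layout is an assumption, not MMDA's format.
    """
    payload = {
        "page": page,
        "encoding": "base64",
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def load_image(serialized: str) -> Tuple[int, bytes]:
    """Decode a JSON string produced by serialize_image back into (page, bytes)."""
    payload = json.loads(serialized)
    if payload["encoding"] != "base64":
        raise ValueError(f"unsupported encoding: {payload['encoding']}")
    return payload["page"], base64.b64decode(payload["data"])
```

With a layout like this, a vision model could run in a separate environment and hand its output to MMDA as plain JSON, at the cost of base64's size overhead of about one third.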
MMDA efficiency refactor
MMDA has so far been developed without much consideration for efficiency. Some major refactors could boost performance:
- Switch to a more efficient serialization format than JSON
- Switch to a more efficient indexing data structure than Interval Trees
No due date
0% complete, 0 open, 0 closed
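The efficiency milestone does not say which indexing structure would replace Interval Trees. One possible candidate, sketched here purely as an assumption, is a pair of sorted arrays searched with binary search. This only works when the indexed spans do not overlap one another; the class name `SortedSpanIndex` is hypothetical and not part of MMDA.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)


class SortedSpanIndex:
    """Overlap lookup over non-overlapping spans using binary search.

    Hypothetical sketch; assumes the spans within one field never overlap.
    """

    def __init__(self, spans: List[Span]):
        self._spans = sorted(spans)
        # Reject input that violates the non-overlap assumption.
        for (_, prev_end), (next_start, _) in zip(self._spans, self._spans[1:]):
            if next_start < prev_end:
                raise ValueError("spans must not overlap")
        self._starts = [start for start, _ in self._spans]
        self._ends = [end for _, end in self._spans]

    def find_overlapping(self, start: int, end: int) -> List[Span]:
        """Return all indexed spans that overlap the query range [start, end)."""
        lo = bisect_right(self._ends, start)  # first span ending after query start
        hi = bisect_left(self._starts, end)   # first span starting at or after query end
        return self._spans[lo:hi]
```

Each lookup costs two binary searches plus the size of the result, and the arrays are flat lists that serialize trivially. The trade-off is that overlapping spans, which an interval tree handles, are rejected.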
MMDA expansion to handle Relations
MMDA currently segments Documents into SpanGroups (e.g. entities), but has no natively supported way of representing relations between those units. For now, relational information is stored explicitly as metadata within the Source and Target units, which is unintuitive and costly.
No due date
0% complete, 0 open, 0 closed
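To make the Relations milestone concrete, here is a minimal sketch of relations stored as first-class objects rather than as metadata duplicated on the Source and Target units. The names `Relation` and `RelationStore`, and the use of string ids to refer to units, are assumptions for illustration only and do not describe MMDA's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed, labeled link between two units, referenced by id (hypothetical)."""
    source_id: str
    target_id: str
    label: str


@dataclass
class RelationStore:
    """Keeps relations outside the units themselves and indexes them by source."""
    relations: List[Relation] = field(default_factory=list)
    _by_source: Dict[Tuple[str, str], List[Relation]] = field(default_factory=dict)

    def add(self, source_id: str, target_id: str, label: str) -> Relation:
        relation = Relation(source_id, target_id, label)
        self.relations.append(relation)
        self._by_source.setdefault((source_id, label), []).append(relation)
        return relation

    def targets_of(self, source_id: str, label: str) -> List[str]:
        """Return ids of all targets linked from source_id under the given label."""
        return [r.target_id for r in self._by_source.get((source_id, label), [])]
```

Because each relation is stored once, neither endpoint needs to carry a copy of the other in its metadata, and lookups by source do not require scanning every unit.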
MMDA Quality of Life refactor
MMDA needs a fairly major refactor that will break some existing usage. The changes are:
1. A way of managing namespaces of different fields to allow for overloading (`bib.title` vs `doc.title`)
2. A way to call `.annotate()` on a `span_group` rather than only at the Document level, for example, adding titles to bib entries
3. Making annotation of `BoxGroup` from `SpanGroup` explicit, and defining explicit conversions from one to the other
No due date
1/3 issues closed
33% complete, 2 open, 1 closed
MMDA reaches CORD19 parity
MMDA includes HTML-ified tables, which are only found in CORD19
No due date
0% complete, 0 open, 0 closed
MMDA reaches S2ORC LaTeX parity
MMDA adds additional functionality:
1. Section-Section hierarchies
2. Table of contents metadata that links to associated sections
3. Identification of inline and display formulas
4. LaTeX representation of identified formulas
No due date
0/1 issues closed
0% complete, 1 open, 0 closed
MMDA v1.0 release (S2ORC parity)
MMDA contains enough functionality to reproduce S2ORC PDF Parse JSONs. Contains:
1. Citation mentions in context linked to bibliography entries that are separated from body text and parsed
2. Identification of inline references to floating elements (e.g. tables/figures/sections/footnotes) which are pulled out of the main body text
3. Identification of captions which are pulled out of main body text and associated with corresponding table/figure
4. Identification of section headings with appropriate association of body text to the section
Overdue by 3 year(s); due by September 30, 2022
0/5 issues closed
0% complete, 5 open, 0 closed