Releases: UW-COSMOS/Cosmos

v0.8.1

23 Dec 20:15
a251b0d

Adding sections endpoint to COSMOS service. No changes to core COSMOS workflow

v.0.8.0

27 Nov 20:07
314c908

Adding initial watermark removal

v0.7.1

01 May 20:53
637d6df

Includes some of the build updates, CPU fixes, and a reasonable Docker base image.

v0.7.0

30 Apr 18:13
31a48f2

  • Standalone COSMOS service
  • New method of equation detection

Change base image

31 Mar 16:02

The previous base image was deprecated. Switched to nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 as the base image.

v0.6.1 - Minor table extraction fix

07 Mar 17:59

  • Fixed a bug where empty parquet files stopped all table extraction processing.

Table extraction, HTCosmos

27 Feb 21:41

New:

  • Table extraction (enabled via the --extract-tables option of the ingest_documents script)
  • HTCosmos - run the COSMOS pipeline in a high-throughput mode on an HTCondor cluster

Table context enrichment, text normalization, and fixes

10 Aug 19:14

  • Table context enrichment during ingestion. When enabled (via the --use-table-context-enrichment option on the ingest CLI), detected tables are matched to mentions within the body text, and a context_from_text field is added to the output parquet.

  • The retrieval API has been updated to search either:

    • local_content field (default) - the text content of the table and its associated caption, if any
    • full_content field - local_content plus context_from_text
    • Any of the three fields separately (content, caption_content, context_from_text)
  • Text normalization. When enabled (via the --use-text-normalization option on the ingest CLI), basic Unicode normalization is applied to regularize ligatures and fix mojibake in the text layer.

  • ASKE-ID lookup within the retrieval API.
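
The table context enrichment and retrieval fields above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each detected table has a caption starting with a "Table N" label and that body-text mentions are found with a simple regular expression and a naive sentence split. The TableRecord class and the enrich_tables and search helpers are illustrative names, not the COSMOS implementation; only the field names (content, caption_content, context_from_text, local_content, full_content) come from these notes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Hypothetical record shape: only the field names come from the release notes.
@dataclass
class TableRecord:
    content: str
    caption_content: str = ""
    context_from_text: str = ""

    @property
    def local_content(self) -> str:
        # Table text plus its caption, if any.
        return " ".join(p for p in (self.content, self.caption_content) if p)

    @property
    def full_content(self) -> str:
        # local_content plus the body-text sentences that mention the table.
        return " ".join(p for p in (self.local_content, self.context_from_text) if p)


# Assumed matching rule: captions and body mentions share a "Table N" label.
TABLE_LABEL = re.compile(r"\bTable\s+(\d+)", re.IGNORECASE)


def enrich_tables(tables: list[TableRecord], body_text: str) -> None:
    """Fill context_from_text with body sentences that mention each table's label."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", body_text.strip())
    for table in tables:
        label = TABLE_LABEL.match(table.caption_content.strip())
        if label is None:
            continue  # no "Table N" label to match against
        number = label.group(1)
        mentions = [
            s for s in sentences
            if number in {m.group(1) for m in TABLE_LABEL.finditer(s)}
        ]
        table.context_from_text = " ".join(mentions)


def search(tables: list[TableRecord], query: str, field: str = "local_content") -> list[TableRecord]:
    """Case-insensitive substring search over one field (local_content by default)."""
    needle = query.lower()
    return [t for t in tables if needle in getattr(t, field).lower()]
```

For example, if the body text says "Table 2 summarizes drilling.", a search for "drilling" finds nothing in the default local_content field but matches the second table through full_content or context_from_text, which is the point of searching the enriched field.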
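
The text normalization option can be approximated with the Python standard library. This sketch is an assumption, not necessarily what COSMOS does: it uses NFKC normalization for ligatures and swaps in a simple, guarded heuristic for the most common mojibake pattern (UTF-8 bytes mis-decoded as cp1252). The normalize_text name and the marker list are hypothetical.

```python
import unicodedata

# Common signatures of UTF-8 text that was mis-decoded as cp1252:
# "\u00c3" is "Ã" (e.g. "Ã©" for "é"), "\u00e2\u20ac" is "â€" (e.g. "â€™" for "’").
MOJIBAKE_MARKERS = ("\u00c3", "\u00e2\u20ac")


def normalize_text(text: str) -> str:
    # Step 1 (assumed heuristic): undo UTF-8-read-as-cp1252 mojibake by
    # re-encoding to the original bytes and decoding them as UTF-8.
    if any(marker in text for marker in MOJIBAKE_MARKERS):
        try:
            text = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            pass  # not a clean round trip, so keep the text unchanged
    # Step 2: NFKC folds compatibility characters, e.g. the "\ufb01" ligature into "fi".
    # It runs second because NFKC would turn the "™" inside "â€™" into "TM".
    return unicodedata.normalize("NFKC", text)
```

NFKC also rewrites other compatibility characters (for instance superscript digits become plain digits), which can matter in scientific text, so a "basic" normalization step may reasonably limit itself to a narrower set of replacements.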

v0.4.0 - New weights; retrieval API updates

16 Feb 21:32
23a7cc5

  • New weights including a newer set of annotations
  • Added a few necessary files for training detection + postprocessing.
  • API key requirement added (though currently disabled)
  • Document-level lookups and filters
  • Filter by dataset_id
  • Store and filter on object size
  • Concatenate the contents and header_content fields into a single full_contents field and use it for retrieval
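
A rough Python sketch of how these retrieval changes fit together: a combined full_contents field, a dataset_id filter, a stored object size, and a document-level filter. The DetectedObject class, the query helper, and the pdf_name, width, and height fields are hypothetical illustrations, not the COSMOS retrieval API; only contents, header_content, full_contents, and dataset_id are named in the notes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical stored object: pdf_name, width, and height are illustrative.
@dataclass
class DetectedObject:
    pdf_name: str
    dataset_id: str
    contents: str
    header_content: str
    width: float
    height: float

    @property
    def full_contents(self) -> str:
        # contents and header_content concatenated into one retrieval field.
        return " ".join(p for p in (self.contents, self.header_content) if p)

    @property
    def area(self) -> float:
        # Object size, stored so queries can filter out tiny detections.
        return self.width * self.height


def query(
    objects: list[DetectedObject],
    term: str,
    dataset_id: Optional[str] = None,
    pdf_name: Optional[str] = None,
    min_area: float = 0.0,
) -> list[DetectedObject]:
    """Case-insensitive substring match on full_contents, with optional filters."""
    needle = term.lower()
    hits = []
    for obj in objects:
        if dataset_id is not None and obj.dataset_id != dataset_id:
            continue
        if pdf_name is not None and obj.pdf_name != pdf_name:  # document-level filter
            continue
        if obj.area < min_area:
            continue
        if needle in obj.full_contents.lower():
            hits.append(obj)
    return hits
```

Searching one concatenated field means a query term that appears only in a header (such as "Table 1") still retrieves the object, without issuing a separate query per field.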

v0.3.0 - New pipeline, entity linking and semantic context for tables

04 Dec 18:57
4aa562e

  • Modular pipeline with new workflow definitions, CLI, and Unicode (#122)
  • Initial entity linking using SciSpacy (#135)
    • Entity recognition + linking to UMLS entities
  • Initial semantic context for tables (#137)
  • (ongoing) documentation to match