Conversation

@osma (Member) commented Aug 28, 2025

This PR adds support for an optional document_id field in JSON & JSONL corpus formats. It can also be used as a special field in CSV corpus files. The document_id will be stored within the Document object and retained in JSONL output produced by the new annif index-file command.
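As a rough illustration of the behaviour described above, here is a minimal sketch of reading a JSONL corpus where `document_id` is optional. This is not the code from this PR: the `Document` dataclass, the `read_jsonl_corpus` helper, and the `text` key are stand-ins assumed for the example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Document:
    # Hypothetical stand-in for the real Document class; field names are assumptions.
    text: str
    document_id: Optional[str] = None


def read_jsonl_corpus(lines: Iterable[str]) -> Iterator[Document]:
    """Yield one Document per non-empty JSONL line (illustrative helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # document_id is optional: dict.get() returns None when the key is absent
        yield Document(
            text=record.get("text", ""),
            document_id=record.get("document_id"),
        )


sample = [
    '{"document_id": "doc-1", "text": "A study of library classification."}',
    '{"text": "A record without an identifier."}',
]
docs = list(read_jsonl_corpus(sample))
```

Records without the field simply end up with `document_id` set to `None`, so existing corpora would keep working unchanged under this sketch.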

TODO:

  • annif index-file should preserve document_id even when the --no-include-doc option is used (but only include it if not None)
  • check the implementation of the REST API method suggest-batch - can it make use of this mechanism?
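The first TODO item could be sketched roughly as follows. This is only an assumption about how the output record might be assembled: the function name `to_output_record` and the `text` and `results` keys are hypothetical and not taken from the `annif index-file` implementation.

```python
import json
from typing import Optional


def to_output_record(
    text: str,
    document_id: Optional[str],
    suggestions: list,
    include_doc: bool = True,
) -> dict:
    """Build one output record (hypothetical helper, key names are assumptions)."""
    record = {}
    if include_doc:
        # Roughly what omitting --no-include-doc would mean: keep the document text
        record["text"] = text
    if document_id is not None:
        # Preserve the identifier regardless of include_doc, but never emit a null
        record["document_id"] = document_id
    record["results"] = suggestions
    return record


with_id = to_output_record("some text", "doc-1", [], include_doc=False)
without_id = to_output_record("some text", None, [], include_doc=False)
line = json.dumps(with_id)  # one JSONL output line
```

Keeping the identifier independent of the document text lets a consumer match suggestions back to source records even when the text itself is left out of the output.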

Closes #885

@osma osma added this to the 1.4 milestone Aug 28, 2025
@osma osma self-assigned this Aug 28, 2025
codecov bot commented Aug 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.69%. Comparing base (d9137ef) to head (36fd3cf).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #886   +/-   ##
=======================================
  Coverage   99.69%   99.69%           
=======================================
  Files         103      103           
  Lines        8209     8271   +62     
=======================================
+ Hits         8184     8246   +62     
  Misses         25       25           

@osma osma marked this pull request as ready for review August 29, 2025 07:47
@osma osma requested a review from juhoinkinen August 29, 2025 07:47

@juhoinkinen (Member) left a comment

LGTM

@osma osma merged commit 187f495 into main Aug 29, 2025
15 checks passed
@osma osma deleted the issue885-json-corpus-document-id branch August 29, 2025 08:29
@osma osma mentioned this pull request Aug 29, 2025
Development

Successfully merging this pull request may close these issues.

Support document_id in JSON corpus formats

3 participants