SEAB-7226: Add endpoint to identify "missing" DOIs #6175
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## hotfix/1.18.1 #6175 +/- ##
===================================================
+ Coverage 74.07% 74.14% +0.07%
- Complexity 5724 5730 +6
===================================================
Files 397 397
Lines 20571 20581 +10
Branches 2116 2116
===================================================
+ Hits 15238 15260 +22
+ Misses 4326 4312 -14
- Partials 1007 1009 +2
The Swagger editor validator breakage is a known issue from develop (we can probably grab the one-line change).
Looks good in general.
The bigger issue is the lack of testing; I think we have time to add something simple.
I added some simple tests. Alas, |
hoverfly.simulate(ZENODO_SIMULATION_SOURCE);
WorkflowsApi workflowsApi = new WorkflowsApi(getOpenAPIWebClient(true, USER_2_USERNAME, testingPostgres));
handleGitHubRelease(workflowsApi, DockstoreTesting.WORKFLOW_DOCKSTORE_YML, "refs/tags/0.8", USER_2_USERNAME);
assertEquals(0, workflowsApi.getVersionsMissingAutomaticDoi(1000).size());
Looks good, could add some comments to make this more readable.
denis-yuen left a comment
Sonar has a couple of comments that are worth a quick clean-up.
I changed the code in question to throw a better exception, although in practice, the error condition (no Workflow for a given WorkflowVersion) should happen very rarely (if at all, I'm on the fence as to whether or not it's actually possible). The other Sonarcloud feedback recommended that we not use the deprecated method |
Description
Per the linked ticket, the Zenodo DOI process was failing, so some tagged versions that should have automatic DOIs did not get them.
This PR adds a new admin/curator-only endpoint that we can use to identify these "missing" DOIs, so that we can generate them. My experiments on some db dumps suggest that about 50% of the versions that should have an automatic DOI do not (because DOI generation failed). So, this endpoint will return a lot of versions.
A DOI is "missing" if the version and parent entry meet the automatic DOI criteria, and the version was created after February 19, 2025 (the day we launched the automatic DOI feature).
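The "missing" rule above can be read as a simple predicate. The following is only a minimal sketch, not the code in this PR: the `VersionInfo` record, its fields, and the `MissingDoiCheck` class are hypothetical, the automatic DOI criteria are collapsed into a single boolean because they are not spelled out here, and the cutoff is assumed to be midnight UTC at the start of launch day.

```java
import java.time.Instant;

/** Hypothetical, simplified view of a workflow version; not the Dockstore entity. */
record VersionInfo(long id, Instant created, boolean hasDoi, boolean meetsAutomaticDoiCriteria) {}

final class MissingDoiCheck {
    // Assumption: "after February 19, 2025" is interpreted as after 00:00 UTC on that day.
    static final Instant AUTOMATIC_DOI_LAUNCH = Instant.parse("2025-02-19T00:00:00Z");

    /** True if the version should have received an automatic DOI but has none. */
    static boolean isMissingAutomaticDoi(VersionInfo v) {
        return v.meetsAutomaticDoiCriteria()            // version and parent entry are eligible
            && !v.hasDoi()                              // no DOI was actually assigned
            && v.created().isAfter(AUTOMATIC_DOI_LAUNCH); // created after the feature launched
    }
}
```

In the PR itself this filtering happens in the database query rather than in memory; the predicate only illustrates the rule.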
The new getVersionsMissingAutomaticDoi endpoint is different from the getVersionsNeedingRetroactiveDoi endpoint. getVersionsMissingAutomaticDoi identifies all versions that should have been assigned a DOI by the webservice, and returns them in most-recent-first order. getVersionsNeedingRetroactiveDoi is intended to be used to distribute retroactive DOIs evenly to all eligible workflows, even old ones.
The new endpoint is backed by a db query, which I originally tried to adapt from one of the queries used by getVersionsNeedingRetroactiveDoi. The resulting query ran forever (longer than my patience for it), so I reordered some parts of the join, improving response time to about 60 seconds. However, this was still too slow, so I came up with a new query that essentially intersects two sets of version IDs: one set that meets the version-related criteria, and another set corresponding to workflows that meet the workflow-related criteria. The resulting query runs in a second or so.
Review Instructions
On staging, hit the getVersionsMissingAutomaticDoi endpoint, and confirm that the first few results correspond to recent tagged versions that should have an automatic DOI, but do not.
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7226
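For reviewers who want a mental model of the query described in the Description, the set-intersection idea can be sketched in memory as follows. This is an illustration under assumptions, not the actual db query: the `MissingDoiIntersection` class, the method name, the parameter names, and the use of epoch milliseconds for creation time are all hypothetical, and the `limit` parameter is only a guess at what the integer argument in the test call represents.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MissingDoiIntersection {
    /**
     * Intersects two sets of version IDs and returns the result most-recent-first.
     *
     * @param versionCriteriaIds  IDs of versions that meet the version-related criteria
     * @param workflowCriteriaIds IDs of versions whose workflow meets the workflow-related criteria
     * @param createdEpochMillis  creation time of each version ID (hypothetical lookup)
     * @param limit               maximum number of IDs to return
     */
    static List<Long> mostRecentFirst(Set<Long> versionCriteriaIds,
                                      Set<Long> workflowCriteriaIds,
                                      Map<Long, Long> createdEpochMillis,
                                      int limit) {
        // Copy first so the caller's set is not modified by retainAll.
        Set<Long> both = new HashSet<>(versionCriteriaIds);
        both.retainAll(workflowCriteriaIds);
        return both.stream()
            // Descending by creation time; IDs with no known time sort last.
            .sorted((x, y) -> Long.compare(
                createdEpochMillis.getOrDefault(y, 0L),
                createdEpochMillis.getOrDefault(x, 0L)))
            .limit(limit)
            .toList();
    }
}
```

Computing each ID set independently and intersecting them avoids evaluating every criterion across one large join, which is the intuition behind the speedup reported above.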
Security and Privacy
If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed annotation