
Conversation

@svonworl (Contributor) commented Oct 9, 2025

Description
This PR makes some improvements to the Zenodo DOI generation code that should increase the chances of successfully generating new DOIs.

Sometime in June, as far as we can tell, the DOI creation process began to fail because the Zenodo createFile and deleteFile endpoints were responding with spurious 403 errors (more often than not) and occasional 503s. The calls would still sometimes succeed, and it does not appear that we were using the endpoints incorrectly. Rather, something is going sideways on the Zenodo side.

When the DOI generation process fails, a DOI is not generated for the tagged version, which is a bummer. However, something more insidious was happening...

The failed DOI generations left draft deposits in the Zenodo system, causing all future DOI generation attempts to fail for the associated workflow.

This PR addresses the above problems by:

  1. Attempting to delete any draft deposit(s?) corresponding to the concept DOI, at the start of the DOI generation process. This should address the problem of existing drafts and "jammed" workflows, with the caveat that we find drafts via what appears to be an ElasticSearch query on the Zenodo side, which may not always provide up-to-date information (a draft may not yet be indexed, or still be in the index after it is deleted).
  2. Deleting the in-process draft deposit if there's a failure, as we exit the DOI generation code. This should keep the system free of new drafts.
  3. Retrying the createFile and deleteFile calls on failure, to increase the probability that we will succeed (a minimal retry sketch appears after this list). The code currently makes 5 attempts, each separated by a 1-second sleep. I'm tempted to increase the number of attempts, but I'm also concerned about triggering a rate limit.
  4. Adding a bit of LOG output that'll make it easier to tell how the above changes are working.
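For illustration, here's a minimal sketch of the retry behavior described in item 3. The class and helper names (ZenodoRetrySketch, callWithRetries), the Supplier-based wrapping, and the logger setup are assumptions made for the sketch, not the actual ZenodoHelper implementation; only the 5 attempts and the 1-second sleep come from the description above.

import java.util.function.Supplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ZenodoRetrySketch {
    private static final Logger LOG = LoggerFactory.getLogger(ZenodoRetrySketch.class);
    private static final int MAX_ATTEMPTS = 5;
    private static final long SLEEP_MILLISECONDS = 1000L;

    // Wrap a flaky Zenodo call (such as createFile or deleteFile) and retry it
    // a few times before giving up, sleeping briefly between attempts.
    static <T> T callWithRetries(Supplier<T> zenodoCall, String description) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return zenodoCall.get();
            } catch (RuntimeException e) {
                LOG.info("attempt {} of {} to {} failed", attempt, MAX_ATTEMPTS, description, e);
                lastFailure = e;
                if (attempt < MAX_ATTEMPTS) {
                    try {
                        Thread.sleep(SLEEP_MILLISECONDS);
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        throw lastFailure;
    }
}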

In tandem, the above changes should allow DOI generation to succeed much more frequently. However, it'll still fail on occasion.

It's very difficult to test how this code responds to various Zenodo failures, especially via automated tests. So, instead, I tested it locally by hand, tweaking the code in various spots to simulate failures (including leaving a draft in the Zenodo sandbox) and submitting requests to confirm that the code was working properly.
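As a rough illustration of that kind of local tweak (not code that ships in this PR; the class name, method name, and failure rate are made up), one can make the file-upload path fail most of the time to exercise the retry and cleanup logic:

import java.util.concurrent.ThreadLocalRandom;

class ZenodoFaultInjectionSketch {
    // Temporarily call this at the top of the createFile wrapper during local
    // testing; roughly 75% of calls will then fail with a simulated error.
    static void maybeSimulateZenodoFailure() {
        if (ThreadLocalRandom.current().nextInt(100) < 75) {
            throw new RuntimeException("simulated Zenodo 403");
        }
    }
}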

Review Instructions
On staging, push some tagged versions on an entry, and confirm that most of the DOIs have been generated correctly. Try the same thing on prod, after we deploy. After a few weeks on prod, analyze the logs and see if we need to take any more action.

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7226

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@svonworl svonworl self-assigned this Oct 9, 2025

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 60.78431% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.02%. Comparing base (cc46e6f) to head (ac2426a).
⚠️ Report is 4 commits behind head on hotfix/1.18.1.

Files with missing lines Patch % Lines
.../io/dockstore/webservice/helpers/ZenodoHelper.java 60.78% 19 Missing and 1 partial ⚠️
Additional details and impacted files
@@                 Coverage Diff                 @@
##             hotfix/1.18.1    #6174      +/-   ##
===================================================
- Coverage            74.07%   74.02%   -0.05%     
- Complexity            5724     5731       +7     
===================================================
  Files                  397      397              
  Lines                20571    20611      +40     
  Branches              2116     2117       +1     
===================================================
+ Hits                 15238    15258      +20     
- Misses                4326     4345      +19     
- Partials              1007     1008       +1     
Flag Coverage Δ
bitbuckettests 25.78% <0.00%> (-0.06%) ⬇️
hoverflytests 27.48% <60.78%> (+0.04%) ⬆️
integrationtests 55.91% <0.00%> (-0.11%) ⬇️
languageparsingtests 10.77% <0.00%> (-0.03%) ⬇️
localstacktests 21.14% <0.00%> (-0.05%) ⬇️
toolintegrationtests 29.70% <0.00%> (-0.06%) ⬇️
unit-tests_and_non-confidential-tests 26.05% <0.00%> (-0.17%) ⬇️
workflowintegrationtests 39.44% <0.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@denis-yuen (Member)

  1. Attempting to delete any draft deposit(s?) corresponding to the concept DOI, at the start of the DOI generation process.

Idle thought while reading in sequence: how do we think this will react, particularly in the Broad case where they may be deleting and re-creating multiple tags at the same time (will one webservice delete the in-progress draft deposits being created by another)?

with the caveat that we find drafts via what appears to be an ElasticSearch query on the Zenodo side, which may not always provide up-to-date information (a draft may not yet be indexed, or still be in the index after it is deleted).

On the other hand, maybe this is slow enough?

@denis-yuen denis-yuen left a comment

Tried relying on the index before and had poor results, see below

// Create a Lucene query that finds drafts corresponding to the specified concept DOI.
// Apparently, this endpoint pulls information from ElasticSearch, so the view may be stale.
// Drafts may take a while to appear, or seem to persist after they are deleted.
String query = "(conceptrecid:\"%d\") AND (submitted:\"false\")".formatted(conceptDoiId);
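For example, with an illustrative concept record id of 123456, the formatted query resolves to:

(conceptrecid:"123456") AND (submitted:"false")

which matches only unpublished (draft) deposits belonging to that concept.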
(could also try both)


private static void deleteDeposit(DepositsApi depositsApi, int depositId) {
    try {
        depositsApi.deleteDeposit(depositId);

@svonworl (Contributor, Author) commented Oct 9, 2025

  1. Attempting to delete any draft deposit(s?) corresponding to the concept DOI, at the start of the DOI generation process.

Idle thought reading in sequence. How do we think this will react, thinking particularly of the Broad case where they may be deleting and re-creating multiple tags at the same time (will one webservice delete the in-progress draft deposits being created for another one?)

I think it's ok, the reasoning is something like:

  • Currently, we serialize the push processing at the repo level, so, in theory, for a given repo, we should only be generating a single DOI at a given time.
  • To create the next version DOI, we only remove drafts associated with the particular concept DOI, so we're not going to accidentally clobber drafts of other workflows.
  • The drafts that show up in the query might be stale, and could actually be published or deleted, in which case the deleteDeposit API call fails, the exception is absorbed, and the DOI generation process continues.
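
Here's a minimal sketch of the absorb-and-continue behavior described in the last bullet. The helper name, the log messages, and the broad catch are illustrative (the Zenodo client's exact exception type isn't shown here); the depositsApi.deleteDeposit call mirrors the one quoted above.

// DepositsApi and LOG as used elsewhere in ZenodoHelper.
private static void deleteDepositQuietly(DepositsApi depositsApi, int depositId) {
    try {
        depositsApi.deleteDeposit(depositId);
        LOG.info("deleted draft deposit {}", depositId);
    } catch (Exception e) {
        // The draft may have been published or already removed; log and move on
        // so DOI generation can continue.
        LOG.info("could not delete deposit {}, continuing", depositId, e);
    }
}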

@svonworl (Contributor, Author)

Ok, so, I ran some experiments and dabbled with some test code. I made some improvements to the findDraftDeposits function that make it more reliable and ensure good performance, even if the API starts mixing published deposits into the response for whatever reason.

Here's my conclusion:

I strongly recommend we go with the current solution. Would like to get this into the hotfix, deploy, and assess how it works after a couple of weeks. Can we do that?

My reasoning:

  1. The design and behavior of the various APIs suggest that they provide different views of the same ElasticSearch resource (I could be wrong, of course). If that's true, switching the calls doesn't buy us anything, and calling both endpoints burns time.
  2. When possible, we should use well-documented endpoints, because undocumented endpoints are more likely to change or disappear.
  3. The deleteDeposits endpoint is documented to only delete unpublished deposits, so there's no danger there. As mentioned in point 1, it would not be surprising if the deleteDraftRecord endpoint used the exact same machinery on the backend.

@svonworl svonworl requested a review from denis-yuen October 10, 2025 16:25
@denis-yuen (Member) commented Oct 10, 2025

When possible, we should use well-documented endpoints, because undocumented endpoints are more likely to change or disappear.

To be clear, this is not an argument in favour of the current approach.

The "new" endpoints are documented by zenodo in openapi, but incompletely without return objects.

The "old" endpoints are purely documented by us in openapi by inspecting their textual documentation and behaviour.

I then extended the openapi description that we use (owned by us in our repository) for the "old" endpoints to cover those two "new" endpoints.

@denis-yuen (Member)

3. The deleteDeposits endpoint is documented to only delete unpublished deposits, so there's no danger there. As mentioned in point 1, would not be surprising if the deleteDraftRecord endpoint used the exact same machinery on the backend.

I'm not sure about the downside of just using the new endpoint here.

@denis-yuen denis-yuen left a comment

I'm ok with splitting the difference; how about using the old search endpoint but the new endpoint for deleting drafts?

@svonworl (Contributor, Author)

When possible, we should use well-documented endpoints, because undocumented endpoints are more likely to change or disappear.

To be clear, this is not an argument in favour of the current approach.

The "new" endpoints are documented by zenodo in openapi, but incompletely without return objects.

The "old" endpoints are purely documented by us in openapi by inspecting their textual documentation and behaviour.

It is indeed an argument in favor of the current approach; both of the "old" endpoints are documented here:
https://developers.zenodo.org/

@sonarqubecloud

Quality Gate failed

Failed conditions
54.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@svonworl (Contributor, Author)

I changed the code to use the "new" listUserRecords and deleteDraftRecord API calls.

@svonworl svonworl requested a review from denis-yuen October 14, 2025 16:47
// to mix a few published records into the response, or doesn't list the draft first.
final int maxResults = 10;
// In the Zenodo API, page numbers start at 1 (!)
return previewApi.listUserRecords(query, "newest", maxResults, 1, true, false).getHits().getHits().stream()

created https://ucsc-cgl.atlassian.net/browse/SEAB-7341
probably need to work on tags/namespace

@svonworl svonworl merged commit 60b6a3f into hotfix/1.18.1 Oct 14, 2025
19 of 23 checks passed
@svonworl svonworl deleted the feature/seab-7226/address-doi-creation-failures branch October 14, 2025 23:30