
Conversation

@svonworl
Contributor

@svonworl svonworl commented Nov 21, 2024

Description
This draft PR is a very rough proof-of-concept of a "custom DOI" scheme, in which the DOIs are generated via UC's EZID service https://ezid.cdlib.org/ and point at pages on the Dockstore site, "WorkflowHub style". Currently, this prototype generates custom DOIs for versions; a productized system would generate DOIs for entries similarly. This PR is not even close to complete/finished, so please do not spend any time critiquing code, tests, etc. There are many known defects and missing parts. The goal is to demonstrate the technique so we can determine whether to continue moving forward with it...

Please run this locally to see it in action. That way, you can inspect/clear the db, etc. Note that the new DOIs don't appear in the UI, I haven't tried to figure out why (yet). You can view them via API responses, looking at the db, side effects, etc.

There is a lot to unpack here, thus many words. Please read all of them, they will answer many of your questions...

About DOIs and EZID

A DOI has three components:

  1. a name, for example 10.5072/FK2.24231.247355. A DOI name always starts with "10.", followed by some decimal digits and a slash.
  2. a target URL, which points at the referenced object.
  3. metadata, describing the object.

An existing DOI's metadata and target url can be updated/changed at any time. The name is immutable.
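To make the three components concrete, here is a minimal sketch of a value type that models them. DoiSketch and its methods are hypothetical names made up for illustration, not classes from this PR. It checks the name against the "10.", digits, slash rule and keeps the name fixed while the target URL and metadata are replaced.

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical model of a DOI's three components; not a class from this PR. */
record DoiSketch(String name, URI targetUrl, String metadataXml) {

    // A DOI name starts with "10.", some decimal digits, and a slash.
    private static final Pattern NAME_PATTERN = Pattern.compile("^10\\.\\d+/.+$");

    DoiSketch {
        if (name == null || !NAME_PATTERN.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a valid DOI name: " + name);
        }
    }

    /** The target URL can change; the name carries over unchanged. */
    DoiSketch withTargetUrl(URI newTargetUrl) {
        return new DoiSketch(name, newTargetUrl, metadataXml);
    }

    /** The metadata can change too; again the name stays the same. */
    DoiSketch withMetadata(String newMetadataXml) {
        return new DoiSketch(name, targetUrl, newMetadataXml);
    }
}
```

There is deliberately no withName method, which mirrors the immutability of the name.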

The EZID service is a wrapper for DataCite, run by the University of California. The main advantage over using DataCite directly is that EZID is free to us because we are a UC-based project. The EZID API is also quite simple.

Very High-Level Architecture

This PR introduces a DoiHelper class containing a method createDoi which creates a custom DOI for a specified entry version. It delegates to an instance of DoiService, which currently has two implementations:

  • EzidDoiService: generates a test DOI via a single call to the EZID API.
  • DummyDoiService: logs the name, url, and metadata of the DOI to be created. It doesn't actually create a real DOI, but always pretends that it did by returning successfully.

The DoiService interface is designed with generality in mind: that is, with the appropriate code layered on top, we could theoretically use it to generate a DOI for any object on Dockstore (an organization, collection, user, etc).
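A rough sketch of what that generality could look like follows. The method signature is an assumption (the PR's actual DoiService may differ), and the Sketch-suffixed names are made up so they are not mistaken for the real classes. The interface only knows about a name, a url, and metadata, so nothing ties it to entry versions.

```java
import java.util.logging.Logger;

/** Hypothetical shape of the interface; the PR's actual signatures may differ. */
interface DoiServiceSketch {
    /** Creates a DOI with the given name, target URL, and metadata; throws on failure. */
    void createDoi(String name, String url, String metadata);
}

/** Logs the request and pretends it succeeded, like the dummy implementation described above. */
final class DummyDoiServiceSketch implements DoiServiceSketch {
    private static final Logger LOG = Logger.getLogger(DummyDoiServiceSketch.class.getName());

    // Illustrative only: lets a caller or test see how many DOIs were "created".
    private int callCount = 0;

    @Override
    public void createDoi(String name, String url, String metadata) {
        LOG.info(() -> String.format("would create DOI name=%s url=%s metadata=%s", name, url, metadata));
        callCount++;
    }

    int callCount() {
        return callCount;
    }
}
```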

Metadata XML generation is encapsulated in the helper class DataCiteHelper.

Configuring the webservice

There are new configuration options ezidUser and ezidPassword to specify the EZID account information. For this demo, we'll use the EZID test account, user apitest. I'll slack the password on devs.

If you specify the EZID account info in the config file, the webservice will use EzidDoiService, otherwise DummyDoiService.
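The selection rule can be sketched as below. EzidConfigSketch, DoiBackend, and DoiBackendSelector are hypothetical names, and treating a blank value the same as a missing one is an assumption rather than something this PR states.

```java
/** Hypothetical holder for the two new config options. */
record EzidConfigSketch(String ezidUser, String ezidPassword) {

    /** Assumption: a blank value counts as "not specified". */
    boolean isConfigured() {
        return ezidUser != null && !ezidUser.isBlank()
            && ezidPassword != null && !ezidPassword.isBlank();
    }
}

/** Which kind of service the webservice would end up using. */
enum DoiBackend { EZID, DUMMY }

final class DoiBackendSelector {
    private DoiBackendSelector() {
    }

    /** EZID when both account options are present, otherwise the dummy. */
    static DoiBackend select(EzidConfigSketch config) {
        return config.isConfigured() ? DoiBackend.EZID : DoiBackend.DUMMY;
    }
}
```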

Generating a custom DOI

There is a new generateCustomDoi Dockstore endpoint that will generate a DOI for a version that you have permission to modify. You can hit it with a command line something like:

curl -X POST -H "Authorization: Bearer $(cat /tmp/token)" http://localhost:4200/api/entries/24231/versions/247355/generateCustomDoi

It will create an EZID DOI (if the webservice is configured to do so), then create a corresponding Doi entity in the database and associate it with the version. If you query the version, you should see the DOI, and it will appear in the doi table.

Under the hood

The high-level DOI construction process is as follows:

  1. generate a DOI name from the entry version. The format is <shoulder>.<entryId>.<versionId>, something like 10.5072/FK2.24231.247355
  2. generate a target url that points at the entry version's Dockstore page.
  3. generate DOI metadata that describes the object. Currently, we generate DataCite XML: http://schema.datacite.org/meta/kernel-4.5/
  4. make an API call that instructs EZID to create a DOI with the above information.

We rewrite localhost urls to point at qa because EZID will refuse (with 403 Forbidden) to generate a DOI that's targeted at localhost, for good reason...
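Steps 1 and 2, plus the localhost rewrite, can be sketched roughly as follows. DoiTargetSketch is a hypothetical class, and QA_HOST is an assumed stand-in for the qa site; the PR's actual host, url format, and rewrite logic may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DoiTargetSketch {
    // Assumption: stand-in for the qa site; the PR's actual host may differ.
    static final String QA_HOST = "qa.dockstore.org";

    private DoiTargetSketch() {
    }

    /** Builds <shoulder>.<entryId>.<versionId>. */
    static String createName(String shoulder, long entryId, long versionId) {
        return shoulder + "." + entryId + "." + versionId;
    }

    /** Points localhost urls at the qa host; other urls pass through untouched. */
    static URI rewriteLocalhost(URI url) {
        if (!"localhost".equalsIgnoreCase(url.getHost())) {
            return url;
        }
        try {
            // Drop the local port and switch to https on the public host.
            return new URI("https", null, QA_HOST, -1, url.getPath(), url.getQuery(), url.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Could not rewrite " + url, e);
        }
    }
}
```

Passing a different shoulder, for example 10.5072/FK2foo, to createName yields a different name for the same entry version.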

Conceptually, this PR breaks the "initiator" concept, because both the DOIs generated in this PR and our existing automatic Zenodo DOIs are/will be initiated by Dockstore. For now, we introduce a new initiator type CUSTOM as a stopgap solution.

About EZID test DOIs

The EZID technical docs are good: https://ezid.cdlib.org/doc/apidoc.html
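As I read those docs, a create request carries its metadata as ANVL text: one "name: value" pair per line, with "%", newlines, and carriage returns percent-encoded (and ":" as well inside names). The sketch below builds such a body. AnvlSketch is a hypothetical helper, not code from this PR, and the element names _target and datacite reflect my reading of the docs, so please double-check them there.

```java
import java.util.Map;

final class AnvlSketch {
    private AnvlSketch() {
    }

    /** Percent-encodes the characters that would break the line-oriented format. */
    static String escape(String s, boolean isName) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '%' || c == '\n' || c == '\r' || (isName && c == ':')) {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders each entry as one "name: value" line, in the map's iteration order. */
    static String toAnvl(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            sb.append(escape(entry.getKey(), true))
              .append(": ")
              .append(escape(entry.getValue(), false))
              .append('\n');
        }
        return sb.toString();
    }
}
```

Multi-line DataCite XML ends up on a single line because its newlines are encoded as %0A.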

DOIs generated by the EZID test account are "test" DOIs, which are like real DOIs, except they:

  1. Don't propagate to the resolution services.
  2. Are deleted after 2 weeks (per the docs).

Much like other DOIs, the test DOIs can't be deleted manually.

Test DOIs use the shoulder 10.5072/FK2. When we get a real EZID account, we can request a custom shoulder that includes "dockstore", something like 10.1234/dockstore. Note that including names in DOIs is specifically discouraged, both by the DOI docs and EZID docs/personnel, because names can change over time. Zenodo and WorkflowHub both do it, though; it's great marketing.

If a particular entry version already has a custom DOI and you want to generate another, set the customDoiShoulder config file option to the test DOI shoulder with a few extra characters appended, for example 10.5072/FK2foo. This prevents the new DOI name from colliding with the existing one.

The EZID docs request that we contact them before creating a large number (10000) of test DOIs, so please try not to hammer them.

You can view the test DOIs via the EZID API.
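For example, a lookup might be sketched like this. EzidViewSketch is a hypothetical helper, the /id/doi:<name> path is my reading of the EZID docs, and the sketch assumes that viewing an identifier needs no credentials. It uses java.net.http.HttpClient as a stand-in, not the URLConnection code in this PR.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class EzidViewSketch {
    private static final String EZID_BASE = "https://ezid.cdlib.org";

    private EzidViewSketch() {
    }

    /** Builds, but does not send, a GET for the identifier's metadata. */
    static HttpRequest buildGet(String doiName) {
        return HttpRequest.newBuilder(URI.create(EZID_BASE + "/id/doi:" + doiName)).GET().build();
    }

    /** Sends the request and returns the status code and body as text. */
    static String fetch(String doiName) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(buildGet(doiName), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + "\n" + response.body();
    }
}
```

Calling EzidViewSketch.fetch("10.5072/FK2.24231.247355") would hit the live service, so the earlier request not to hammer EZID applies here too.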

EZID does not have a separate testing sandbox.

Misc

Probably needs to be rewritten to use OkHttp; URLConnection is super-fiddly.

Review Instructions
Describe if this ticket needs review and if so, how one may go about it in qa and/or staging environments.
For example, a ticket based on Security Hub, Snyk, or Dependabot may not need review since those services
will generate new warnings if the issue has not been resolved properly. On the other hand, an infrastructure
ticket that results in visible changes to the end-user will definitely require review.
Many tickets will likely be between these two extremes, so some judgement may be required.

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6712

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@svonworl svonworl marked this pull request as draft November 21, 2024 17:14
@codecov

codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 5.62500% with 151 lines in your changes missing coverage. Please review.

Project coverage is 72.77%. Comparing base (5fed68d) to head (525fc82).
Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
...ckstore/webservice/helpers/doi/DataCiteHelper.java 0.00% 66 Missing ⚠️
...ckstore/webservice/helpers/doi/EzidDoiService.java 0.00% 35 Missing ⚠️
.../dockstore/webservice/resources/EntryResource.java 5.00% 19 Missing ⚠️
...io/dockstore/webservice/helpers/doi/DoiHelper.java 12.50% 14 Missing ⚠️
...e/webservice/DockstoreWebserviceConfiguration.java 0.00% 9 Missing ⚠️
...kstore/webservice/helpers/doi/DummyDoiService.java 0.00% 4 Missing ⚠️
...rc/main/java/io/dockstore/webservice/core/Doi.java 33.33% 2 Missing ⚠️
...ore/webservice/helpers/MetadataResourceHelper.java 66.66% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #6038      +/-   ##
=============================================
- Coverage      74.45%   72.77%   -1.69%     
+ Complexity      5495     5407      -88     
=============================================
  Files            381      385       +4     
  Lines          19786    19934     +148     
  Branches        2043     2051       +8     
=============================================
- Hits           14731    14506     -225     
- Misses          4075     4429     +354     
- Partials         980      999      +19     
Flag Coverage Δ
bitbuckettests 26.48% <3.12%> (-0.18%) ⬇️
hoverflytests 27.77% <5.00%> (-0.18%) ⬇️
integrationtests 56.30% <5.62%> (-0.41%) ⬇️
languageparsingtests 10.99% <3.12%> (-0.07%) ⬇️
localstacktests 21.43% <3.12%> (-0.15%) ⬇️
toolintegrationtests 29.83% <3.12%> (-0.21%) ⬇️
unit-tests_and_non-confidential-tests 25.61% <3.12%> (-0.18%) ⬇️
workflowintegrationtests 34.01% <3.12%> (-4.05%) ⬇️


@david4096
Contributor

david4096 commented Nov 21, 2024

This presents a lot of work, good job! In principle, I like the idea of making better use of UC resources. Also, I understand there are concerns about the Zenodo integration/implementation and this is a good fail-safe. It sounds like the path of least resistance is to try out Zenodo for publishing the rest of the DOIs and then come back to this if there are issues.

@denis-yuen
Member

denis-yuen commented Nov 21, 2024

Looks good!

Note for other reviewers, if you have a custom hostname, replace localhost in EzidDoiService with it.
There's no error handling (as explained), so a workflow with a blank description failed for me. Something else failed for some other reason.

I got a nextflow workflow to work

{
  "id": 538,
  "initiator": "CUSTOM",
  "name": "10.5072/FK2.18093.319396",
  "type": "VERSION"
}

Does not resolve as expected if I understand correctly.

Got the general idea, don't need to spelunk through the ezid api.

Member

@denis-yuen denis-yuen left a comment


Approved in principle, will need to discuss next steps when ready

Collaborator

@coverbeck coverbeck left a comment


I didn't run this, but skimmed the code and it seems reasonable.

Note that the new DOIs don't appear in the UI, I haven't tried to figure out why (yet)

WAG: I wonder if it's because you're not creating a concept DOI for the entry, and the UI may rely on a concept DOI for display?

In any case, if this gets "productized", I think there are certain assumptions in the code about concept DOIs that may need to be addressed, or at least investigated.

6 participants