SEAB-6712: Prototype "custom" Dockstore DOIs #6038
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@              Coverage Diff              @@
##           develop    #6038      +/-     ##
=============================================
- Coverage      74.45%   72.77%   -1.69%
+ Complexity      5495     5407      -88
=============================================
  Files            381      385       +4
  Lines          19786    19934     +148
  Branches        2043     2051       +8
=============================================
- Hits           14731    14506     -225
- Misses          4075     4429     +354
- Partials         980      999      +19
Flags with carried forward coverage won't be shown.
This represents a lot of work, good job! In principle, I like the idea of making better use of UC resources. Also, I understand there are concerns about the Zenodo integration/implementation, and this is a good fail-safe. It sounds like the path of least resistance is to try out Zenodo for publishing the rest of the DOIs and then come back to this if there are issues.
Looks good! Note for other reviewers: if you have a custom hostname, replace localhost in EzidDoiService with it. I got a Nextflow workflow to work. It does not resolve, which is expected if I understand correctly. I got the general idea; I don't need to spelunk through the EZID API.
denis-yuen left a comment
Approved in principle; we will need to discuss next steps when it's ready.
coverbeck left a comment
I didn't run this, but skimmed the code and it seems reasonable.
Note that the new DOIs don't appear in the UI, I haven't tried to figure out why (yet)
WAG: I wonder if it's because you're not creating a concept DOI for the entry, and the UI may rely on a concept DOI for display?
In any case, if this gets "productized", I think there are certain assumptions in the code about concept DOIs that may need to be addressed, or at least investigated.
Description
This draft PR is a very rough proof of concept of a "custom DOI" scheme, in which the DOIs are generated via UC's EZID service (https://ezid.cdlib.org/) and point at pages on the Dockstore site, "WorkflowHub style". Currently, this prototype generates custom DOIs for versions; a productized system would generate DOIs for entries similarly. This PR is not even close to complete, so please do not spend any time critiquing code, tests, etc. There are many known defects and missing parts. The goal is to demonstrate the technique so we can determine whether to continue moving forward with it.
Please run this locally to see it in action. That way, you can inspect/clear the db, etc. Note that the new DOIs don't appear in the UI; I haven't tried to figure out why (yet). You can view them via API responses, by looking at the db, through side effects, etc.
There is a lot to unpack here, thus many words. Please read all of them; they will answer many of your questions.
About DOIs and EZID
A DOI has three components: a name, metadata, and a target URL.
A name looks like 10.5072/FK2.24231.247355. A DOI name always starts with 10., some decimal digits, and a slash. An existing DOI's metadata and target URL can be updated at any time, but the name is immutable.
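To make the name rule above concrete, here is a minimal sketch that checks a string against it and splits the prefix (10.5072) from the suffix (FK2.24231.247355). The DoiName class is hypothetical and not part of this PR. It follows only the simplified rule stated here, so it rejects real prefixes that contain dotted subdivisions (such as 10.1000.10).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the simplified DOI name rule; not part of the PR. */
final class DoiName {
    // "10." + one or more decimal digits + "/" + a non-empty suffix.
    // Dotted registrant subdivisions (e.g. 10.1000.10) are deliberately not accepted.
    private static final Pattern DOI_NAME = Pattern.compile("^(10\\.\\d+)/(.+)$");

    final String prefix;
    final String suffix;

    private DoiName(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Returns the parsed name, or empty if the string doesn't follow the rule. */
    static Optional<DoiName> parse(String name) {
        Matcher m = DOI_NAME.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new DoiName(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        DoiName doi = parse("10.5072/FK2.24231.247355").orElseThrow();
        System.out.println(doi.prefix);                       // prints "10.5072"
        System.out.println(doi.suffix);                       // prints "FK2.24231.247355"
        System.out.println(parse("11.5072/FK2").isPresent()); // prints "false"
    }
}
```

Note that the shoulder 10.5072/FK2 straddles the slash: it is the prefix plus the first few characters of the suffix.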
The EZID service is a wrapper for DataCite, run by the University of California. The main advantage over using DataCite directly is that EZID is free to us because we are a UC-based project. The EZID API is also quite simple.
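As an illustration of how simple the EZID API is, the sketch below builds (without sending) a request that creates a DOI with a chosen name. It is based on our reading of the EZID API docs linked under About EZID test DOIs: an HTTP PUT to https://ezid.cdlib.org/id/doi:<name>, HTTP Basic auth, and a text/plain body in ANVL format, in which '%', CR, and LF are percent-encoded (plus ':' in element names). EzidRequests and its methods are hypothetical, it uses the JDK's java.net.http rather than URLConnection or OkHttp, and the element names (_target, _profile, datacite) should be double-checked against the EZID docs.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an EZID "create identifier" request; not the PR's EzidDoiService. */
final class EzidRequests {
    static final String EZID_BASE = "https://ezid.cdlib.org/id/";

    /** ANVL escaping: '%' first, then CR and LF; ':' is also escaped in element names. */
    static String escape(String s, boolean isName) {
        String out = s.replace("%", "%25").replace("\n", "%0A").replace("\r", "%0D");
        return isName ? out.replace(":", "%3A") : out;
    }

    /** Serializes metadata as ANVL: one "name: value" line per element. */
    static String toAnvl(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            sb.append(escape(e.getKey(), true)).append(": ")
              .append(escape(e.getValue(), false)).append('\n');
        }
        return sb.toString();
    }

    /** Builds (but does not send) a PUT asking EZID to create the named DOI. */
    static HttpRequest createDoiRequest(String doiName, String targetUrl, String dataCiteXml,
                                        String user, String password) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("_target", targetUrl);
        metadata.put("_profile", "datacite");
        metadata.put("datacite", dataCiteXml);
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(EZID_BASE + "doi:" + doiName))
            .header("Content-Type", "text/plain; charset=UTF-8")
            .header("Authorization", "Basic " + credentials)
            .PUT(HttpRequest.BodyPublishers.ofString(toAnvl(metadata), StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = createDoiRequest("10.5072/FK2.24231.247355",
            "https://qa.dockstore.org/", "<resource>\n</resource>", "apitest", "not-the-real-password");
        // prints "PUT https://ezid.cdlib.org/id/doi:10.5072/FK2.24231.247355"
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it would be a call like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). EZID replies with an ANVL body whose first line starts with success: or error:.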
Very High-Level Architecture
This PR introduces a DoiHelper class containing a method createDoi, which creates a custom DOI for a specified entry version. It delegates to an instance of DoiService, which currently has two implementations:
- EzidDoiService: generates a test DOI via a single call to the EZID API.
- DummyDoiService: logs the name, URL, and metadata of the DOI to be created. It doesn't actually create a real DOI, but always pretends that it did by returning successfully.
The DoiService interface is designed with generality in mind: with the appropriate code layered on top, we could theoretically use it to generate a DOI for any object on Dockstore (an organization, collection, user, etc.).
Metadata XML generation is encapsulated in the helper class DataCiteHelper.
Configuring the webservice
There are new configuration options ezidUser and ezidPassword to specify the EZID account information. For this demo, we'll use the EZID test account, user apitest. I'll Slack the password on devs. If you specify the EZID account info in the config file, the webservice will use EzidDoiService; otherwise, it will use DummyDoiService.
Generating a custom DOI
There is a new generateCustomDoi Dockstore endpoint that will generate a DOI for a version that you have permission to modify. You can hit it with a command line something like:
It will create an EZID DOI (if the webservice is configured to do so), then create a corresponding Doi entity in the database and associate it with the version. If you query the version, you should see the DOI, and it will appear in the doi table.
Under the hood
The high-level DOI construction process is as follows:
<shoulder>.<entryId>.<versionId>, something like 10.5072/FK2.24231.247355
We rewrite localhost URLs to point at qa, because EZID will refuse (with 403 Forbidden) to generate a DOI that's targeted at localhost, for good reason.
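The construction steps above, together with the delegation described under Very High-Level Architecture, can be sketched roughly as follows. Everything here is hypothetical: DoiServiceSketch only approximates the idea of DoiService, LoggingDoiServiceSketch mimics the described DummyDoiService behaviour (log, then report success), and the assumption that "qa" means the host qa.dockstore.org is ours.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

/** Hypothetical approximation of the DoiService idea; the PR's real interface may differ. */
interface DoiServiceSketch {
    /** Creates a DOI with the given name, target URL, and DataCite XML; returns true on success. */
    boolean createDoi(String name, String targetUrl, String metadataXml);
}

/** Mimics the described DummyDoiService: log what would be created, then report success. */
class LoggingDoiServiceSketch implements DoiServiceSketch {
    private static final Logger LOG = Logger.getLogger(LoggingDoiServiceSketch.class.getName());

    @Override
    public boolean createDoi(String name, String targetUrl, String metadataXml) {
        LOG.info(() -> "Would create DOI " + name + " -> " + targetUrl + "\n" + metadataXml);
        return true;
    }
}

final class CustomDoiSketch {
    // Assumption: "qa" refers to the QA deployment at qa.dockstore.org.
    static final String QA_HOST = "qa.dockstore.org";

    /** Builds <shoulder>.<entryId>.<versionId>, e.g. 10.5072/FK2.24231.247355. */
    static String doiName(String shoulder, long entryId, long versionId) {
        return shoulder + "." + entryId + "." + versionId;
    }

    /** EZID refuses localhost targets (403 Forbidden), so re-point them at QA. */
    static String rewriteLocalhost(String url) {
        URI uri = URI.create(url);
        if (!"localhost".equalsIgnoreCase(uri.getHost())) {
            return url; // not localhost: leave it alone
        }
        try {
            // Force https, drop the local port, keep path, query, and fragment.
            return new URI("https", null, QA_HOST, -1, uri.getPath(), uri.getQuery(), uri.getFragment())
                .toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite " + url, e);
        }
    }

    /** Stand-in for a createDoi-style helper: build the name and target, then delegate. */
    static String createVersionDoi(DoiServiceSketch service, String shoulder, long entryId,
                                   long versionId, String versionUrl, String metadataXml) {
        String name = doiName(shoulder, entryId, versionId);
        String target = rewriteLocalhost(versionUrl);
        if (!service.createDoi(name, target, metadataXml)) {
            throw new IllegalStateException("DOI creation failed for " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        String name = createVersionDoi(new LoggingDoiServiceSketch(), "10.5072/FK2", 24231, 247355,
            "http://localhost:4200/workflows/github.com/foo/bar:main", "<resource/>");
        System.out.println(name); // prints "10.5072/FK2.24231.247355"
    }
}
```

Because the shoulder is just a string parameter here, switching to a variant such as 10.5072/FK2foo changes every generated name, which is what makes the customDoiShoulder trick avoid collisions. A custom hostname (as one reviewer mentioned) is not "localhost", so this sketch would pass it through unchanged.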
Conceptually, this PR breaks the "initiator" concept, because both the DOIs generated in this PR and our existing automatic Zenodo DOIs are/will be initiated by Dockstore. For now, we introduce a new initiator type CUSTOM as a stopgap solution.
About EZID test DOIs
The EZID technical docs are good: https://ezid.cdlib.org/doc/apidoc.html
DOIs generated by the EZID test account are "test" DOIs, which are like real DOIs, except they:
Much like other DOIs, the test DOIs can't be deleted manually.
Test DOIs use the shoulder 10.5072/FK2. When we get a real EZID account, we can request a custom shoulder that includes "dockstore", something like 10.1234/dockstore. Note that including names in DOIs is specifically discouraged, both by the DOI docs and by EZID docs/personnel, because names can change over time. Zenodo and WorkflowHub both do it, though; it's great marketing.
If a particular entry version already has a custom DOI and you want to generate more, set the customDoiShoulder config file option to the test DOI shoulder with a few extra characters added, for example 10.5072/FK2foo, so that the new DOI name doesn't collide with the existing one.
The EZID docs request that we contact them before creating a large number (10000) of test DOIs, so please try not to hammer them.
You can view the test DOIs via the EZID API.
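Reading a test DOI back from EZID (for example with a GET on https://ezid.cdlib.org/id/doi:<name>) returns ANVL text whose first line is a status line such as success: doi:10.5072/FK2.24231.247355. Here is a minimal parser sketch, assuming that format. AnvlParser is hypothetical, and it only reverses single-byte percent escapes like the ones EZID documents (%25, %0A, %0D, %3A).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for an EZID ANVL response body; not part of the PR. */
final class AnvlParser {
    private static final Pattern PERCENT = Pattern.compile("%([0-9A-Fa-f]{2})");

    /** Reverses single-byte percent-encoding such as %25, %0A, %0D, and %3A. */
    static String unescape(String s) {
        Matcher m = PERCENT.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Parses "name: value" lines; the first line is EZID's status line. */
    static Map<String, String> parse(String body) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : body.split("\n")) {
            // Literal ':' in names is escaped, so the first raw ':' separates name from value.
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue; // blank or malformed line: skip it
            }
            String name = unescape(line.substring(0, sep).trim());
            String value = unescape(line.substring(sep + 1).trim()); // trim also drops a trailing CR
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("success: doi:10.5072/FK2.24231.247355\n"
            + "_target: https://qa.dockstore.org/\n");
        System.out.println(fields.get("success")); // prints "doi:10.5072/FK2.24231.247355"
    }
}
```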
EZID does not have a separate testing sandbox.
Misc
Probably needs to be rewritten to use OkHttp; URLConnection is super-fiddly.
Review Instructions
Describe if this ticket needs review and if so, how one may go about it in qa and/or staging environments.
For example, a ticket based on Security Hub, Snyk, or Dependabot may not need review since those services
will generate new warnings if the issue has not been resolved properly. On the other hand, an infrastructure
ticket that results in visible changes to the end-user will definitely require review.
Many tickets will likely be between these two extremes, so some judgement may be required.
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6712
Security and Privacy
If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed annotation