Conversation

@svonworl
Contributor

@svonworl svonworl commented Nov 19, 2024

Description
This PR changes the webservice to improve the quality of messages that are generated when an exception is thrown.
Rather than patch the particular problem noted in the issue, I took the opportunity to refactor the exception message generation code...

The core change is a new ExceptionHelper class, which generates some information (currently, a recommended message and an HTTP status code) for a given exception. The basic concept is that, down the line, we can put much of our exception handling logic into ExceptionHelper and use it everywhere, and, as a start to that process, in this PR, we wire it into the push processing code and our custom ExceptionMappers.

Eventually, ExceptionHelper could do other things, like help us to log information about the exception, etc.

Yes, it's called ExceptionHelper even though it handles all Throwables. This is common practice; see Apache's ExceptionUtils for an example...

Currently, ExceptionHelper handles throwables as follows:

  1. If the throwable is from the java or javax packages, use the throwable's message and status code 500.
  2. If the throwable or one of its causes is a ConstraintViolationException, map the constraint name to a message if possible, status code 409.
  3. If the throwable or one of its causes is a PersistenceException, use a generic "update failed"-type message, status code 500.
  4. Otherwise, return Throwable.getMessage() and status code 400.
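To make the ordering of these rules concrete, here is a minimal sketch of how they could chain together. It is not the code in this PR: `ConstraintViolationException` and `PersistenceException` are hypothetical stand-ins for the Hibernate/JPA classes, and `AppException`, `Info`, `describe`, `findInChain` and `MAX_DEPTH` are names invented for illustration.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical stand-ins so the sketch compiles without Hibernate or JPA on the classpath.
class ConstraintViolationException extends RuntimeException {
    final String constraintName;

    ConstraintViolationException(String constraintName) {
        super("constraint violated: " + constraintName);
        this.constraintName = constraintName;
    }
}

class PersistenceException extends RuntimeException {
    PersistenceException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical application exception, i.e. one that is not from java.* or javax.*.
class AppException extends RuntimeException {
    AppException(String message) {
        super(message);
    }
}

class ExceptionRulesSketch {
    // Arbitrary bound on how far the cause chain is followed; it also guards against cause cycles.
    static final int MAX_DEPTH = 1_000;

    // Recommended message plus HTTP status code.
    record Info(String message, int status) { }

    // The rules are tried in order; the first one that produces a value wins.
    static Info describe(Throwable t) {
        return javaThrowable(t)
            .or(() -> constraintViolation(t))
            .or(() -> persistenceFailure(t))
            .orElseGet(() -> new Info(t.getMessage(), 400));
    }

    // Rule 1: throwables from the java or javax packages keep their message, status 500.
    static Optional<Info> javaThrowable(Throwable t) {
        String name = t.getClass().getName();
        return Optional.of(t)
            .filter(x -> name.startsWith("java.") || name.startsWith("javax."))
            .map(x -> new Info(x.getMessage(), 500));
    }

    // Rule 2: a constraint violation anywhere in the cause chain, status 409.
    static Optional<Info> constraintViolation(Throwable t) {
        return findInChain(t, ConstraintViolationException.class)
            .map(c -> new Info("constraint '" + c.constraintName + "' was violated", 409));
    }

    // Rule 3: any other persistence failure gets a generic message, status 500.
    static Optional<Info> persistenceFailure(Throwable t) {
        return findInChain(t, PersistenceException.class)
            .map(p -> new Info("could not update the database", 500));
    }

    // Returns the first throwable of the given type in the cause chain, if any.
    static <T extends Throwable> Optional<T> findInChain(Throwable t, Class<T> type) {
        return Stream.iterate(t, Objects::nonNull, Throwable::getCause)
            .limit(MAX_DEPTH)
            .filter(type::isInstance)
            .map(type::cast)
            .findFirst();
    }
}
```

Because rule 1 is checked first, a `java.*` exception that wraps a constraint violation still maps to 500 in this sketch.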

There's a switch that maps the constraint names and handles a few constraint violations that I thought were most likely to happen, given a list of constraints (via \d <tablename>, select conname from pg_catalog.pg_constraint;, etc) and my non-comprehensive understanding of the code. If the constraint name can't be mapped to a specific message, we state the constraint's name and say that it has been violated. There are a ton of different constraints that could be violated, and it doesn't seem practical to map all of them by hand. If we were more systematic about naming them, we could handle a lot more via some relatively simple code...
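As a rough illustration of that switch, assuming made-up constraint names (`uk_workflow_name` and `uk_notebook_name` are hypothetical, not the real names in the Dockstore schema), the mapping with its generic fallback might look like:

```java
import java.util.Objects;

class ConstraintMessagesSketch {
    // Maps a constraint name to a friendlier message, falling back to naming the constraint.
    static String messageFor(String constraintName) {
        // Switching on a null String throws, so substitute a placeholder first.
        String name = Objects.requireNonNullElse(constraintName, "unknown");
        return switch (name) {
            case "uk_workflow_name" -> "a workflow with the same name exists";
            case "uk_notebook_name" -> "a notebook with the same name exists";
            default -> "constraint '" + name + "' was violated";
        };
    }
}
```

With systematic constraint naming, the `default` branch could presumably derive a message from the name itself instead of echoing it.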

Opinionated Take: Database constraints are a last line of defense to maintain db consistency. We should strive to check and fail early on conditions that would violate a constraint, because the later in a request that we fail, the more chance that we've updated other state that can't be rolled back (ES, discuss, etc).

Previously, we'd implemented three ExceptionMappers, two of which were handling subclasses of PersistenceException. We refactored these to a single mapper, and moved their logic into ExceptionHelper.

Originally, I tried to use a method from Apache's ExceptionUtils to generate a list of throwables for a given exception. I happened to look at the code, and noticed it had average-case runtime O(N^2) in the number of chained causes. So, it could be attacked by creating an exception with a very large number of chained causes. This could happen: imagine a tag parser that wrapped a thrown exception at each enclosing tag level, and a malicious input file that had tags nested millions deep.
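For reference, a minimal sketch of a linear-time alternative, assuming a `LinkedHashSet` is used to remember which throwables have been visited. The class and method names are invented here, and this is not necessarily how the PR implements it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class CauseChainSketch {
    // Returns the throwable followed by its causes, outermost first.
    static List<Throwable> causeChain(Throwable t) {
        // LinkedHashSet keeps insertion order and gives expected O(1) membership checks,
        // so the whole walk is expected O(N) rather than O(N^2).
        LinkedHashSet<Throwable> seen = new LinkedHashSet<>();
        Throwable current = t;
        // add() returns false if current was already visited, which ends the loop on a cause cycle.
        while (current != null && seen.add(current)) {
            current = current.getCause();
        }
        return new ArrayList<>(seen);
    }
}
```

The membership check relies on `Throwable` using identity-based `equals` and `hashCode` by default; a subclass that overrides them could change that behavior.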

Checkstyle had some issues with this code. It didn't like LinkedHashSet appearing in the var declaration, but it has to be there to ensure the performance problem is remedied. Also, it got confused by the switch expression.

Leaking information via exception/error messages is always a concern. In certain situations, this PR will put the name of the violated db constraint into the message. At other times, the message stored in the Exception (retrieved via Exception.getMessage) is used. The net effect isn't much different from how the code works now, but it's something to consider...

There's a unit test that does some basic checks, and I user tested the scenario in the ticket for both workflows and notebooks.

For fun, I wrote ExceptionHelper to be if-statement-free, and it ended up being fairly intelligible, maybe more so than if it had ifs...

Review Instructions
Try to reproduce the bug as detailed in the ticket, and confirm that the message in the App Logs has changed to something like "a workflow with the same name exists".

Issue
https://ucsc-cgl.atlassian.net/browse/DOCK-2582
#5997

Security and Privacy

See note above about information leakage via exception messages.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@codecov

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 68.85246% with 19 lines in your changes missing coverage. Please review.

Project coverage is 74.48%. Comparing base (57b0aee) to head (ba67ff5).
Report is 8 commits behind head on develop.

Files with missing lines Patch % Lines
.../dockstore/webservice/helpers/ExceptionHelper.java 66.07% 16 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #6037      +/-   ##
=============================================
- Coverage      74.56%   74.48%   -0.09%     
- Complexity      5501     5515      +14     
=============================================
  Files            381      380       -1     
  Lines          19786    19810      +24     
  Branches        2043     2044       +1     
=============================================
+ Hits           14754    14756       +2     
- Misses          4056     4072      +16     
- Partials         976      982       +6     
Flag Coverage Δ
bitbuckettests 26.59% <0.00%> (-0.18%) ⬇️
hoverflytests 27.88% <0.00%> (-0.07%) ⬇️
integrationtests 56.71% <52.45%> (+0.01%) ⬆️
languageparsingtests 11.00% <0.00%> (-0.05%) ⬇️
localstacktests 21.51% <0.00%> (-0.06%) ⬇️
toolintegrationtests 29.96% <0.00%> (-0.07%) ⬇️
unit-tests_and_non-confidential-tests 25.92% <59.01%> (+0.13%) ⬆️
workflowintegrationtests 38.09% <44.26%> (+0.03%) ⬆️

String message = "this is a test";
Throwable t = makeNonJavaException(message);
assertEquals(t.getMessage(), message(t));
assertEquals(400, status(t));
Member

Can use an existing enum like https://www.javadoc.io/doc/org.apache.httpcomponents/httpcore/4.4.4/org/apache/http/HttpStatus.html#SC_BAD_REQUEST to be more readable

Ditto below

Contributor Author

@svonworl svonworl Nov 20, 2024

The raw values are on purpose. If both the code under test and the code doing the testing use the same constant A, and A is accidentally changed to a wrong value, the tests continue to pass, even though the code now works incorrectly. It's less of a problem when using third-party constants, though. I changed the test to use a different class's HTTP status constants.

@denis-yuen
Member

Opinionated Take: Database constraints are a last line of defense to maintain db consistency. We should strive to check and fail early on conditions that would violate a constraint, because the later in a request that we fail, the more chance that we've updated other state that can't be rolled back (ES, discuss, etc).

This sounds fine.
i.e. we should always have database constraints as a last ditch method to maintain consistency, but earlier via annotations and the like, even better assuming the two ways of maintaining consistency ... maintain their consistency with one another

@svonworl
Contributor Author

This sounds fine. i.e. we should always have database constraints as a last ditch method to maintain consistency, but earlier via annotations and the like, even better assuming the two ways of maintaining consistency ... maintain their consistency with one another

Regarding consistency with each other, if there was some way we could express a "constraint" so that it expanded to both

  1. an early code-based check,
  2. a database constraint,

that would be great. Not sure that's doable given our current architecture, but maybe...

@svonworl
Contributor Author

Regarding consistency with each other, if there was some way we could express a "constraint" so that it expanded to both

  1. an early code-based check,
  2. a database constraint,

that would be great. Not sure that's doable given our current architecture, but maybe...

This kinda relates to another thing...

I wanted to add support for constraint "descriptions" in the liquibase XML, so that we could specify that constraint "xyz" meant "your XYZ can't be blank" (or whatever) in the "create constraint" liquibase clause. Didn't seem to be a way to do it, though...

@denis-yuen
Member

Didn't seem to be a way to do it, though...

Not sure if it's what you mean, but there's a fallback method for adding in postgres-specific stuff not supported by JPA.
#2131 has some of the details, I thought there were better docs somewhere but my google/github search-fu is failing me

@svonworl
Contributor Author

Not sure if it's what you mean, but there's a fallback method for adding in postgres-specific stuff not supported by JPA. #2131 has some of the details, I thought there were better docs somewhere but my google/github search-fu is failing me

Thanks! Was thinking that maybe a "description" could be added to the constraint's liquibase tag or annotation, and we could extract that metadata at build time to create a map of constraint names to descriptions. I suppose we could have a db table that represented that mapping, though we'd want to read it early to make sure it is available (because when we're handling a db exception, it's probably inaccessible). Maybe someday; what's in the PR now is probably sufficient...

return handleJavaThrowable()
.or(() -> handleConstraintViolationException())
.or(() -> handlePersistenceException())
.or(() -> result(HttpStatus.SC_BAD_REQUEST))
Collaborator

Shouldn't the fallback be an internal server error? We don't know what happened; it's some sort of unexpected exception -- while it could be a bad request, we don't know for sure and should probably look at it (we'd get notified via CloudWatch alerts).

Contributor Author

Can see it both ways, but your take seems reasonable, let's do it. Also, we should probably handle CustomWebApplicationException by passing those messages/codes through as is. Will make those mods.

@denis-yuen
Member

Thanks! Was thinking that maybe a "description" could be added to the constraint's liquibase tag or annotation, and we could extract that metadata at build time to create a map of constraint names to descriptions.

We actually have comments on some columns

COMMENT ON COLUMN sourcefile_verified.platformversion IS 'By default set to null';
COMMENT ON COLUMN validation.message IS 'A mapping of file path to message.';
COMMENT ON COLUMN tool.lastbuild IS 'For automated builds: When refresh is hit, the last time the tool was built gets stored here. If tool was never built on quay.io, then last build will be null. N/A for hosted/manual path tools';
COMMENT ON COLUMN tool.lastupdated IS 'For automated builds: last time tool/namespace was refreshed Dockstore, tool info updated, default version selected. For hosted tools: when you created the tool';
COMMENT ON COLUMN tool.lastmodified IS 'For automated builds: N/A. For hosted: Last time a file was updated (new version created)';
COMMENT ON COLUMN tool.dbcreatedate IS 'For automated builds and hosted: Time registered on Dockstore, either by refresh or manual register. Can be blank as this column was added in 2018.';
COMMENT ON COLUMN tool.dbupdatedate IS 'For automated builds: Last time tool/namespace was refreshed, different version is selected, checker workflow was added, or tool info updated (like path information). For hosted: Last time a file was updated (new version created), default version selected. Can be blank as this column was added in 2018. Basically anytime db entry modified';
COMMENT ON COLUMN tag.lastbuilt IS 'For automated builds: The last time the container backing this tool version was built. For hosted: N/A';
COMMENT ON COLUMN tag.dbcreatedate IS 'For automated builds and hosted/manual path: Time registered on Dockstore, either by refresh or manual register. Can be blank as this column was added in 2018.';
COMMENT ON COLUMN tag.dbupdatedate IS 'For automated builds and hosted/manual path: Time created or last time version tab was edited (under actions in version tab). Basically anytime db entry modified';
COMMENT ON COLUMN workflow.lastmodified IS 'For remote: When refresh is hit, the last time GitHub repo was changed is recorded. Hosted: Last time a new version was made.';
COMMENT ON COLUMN workflow.lastupdated IS 'For remote: When refresh all is hit for first time. Hosted: Time created.';
COMMENT ON COLUMN workflow.dbcreatedate IS 'Remote: When workflow is refreshed for first time. Hosted: Time created';
COMMENT ON COLUMN workflow.dbupdatedate IS 'For remote: When refresh all is hit for first time, update workflow info (like path information), or add checker workflow. Hosted: Time created. Basically anytime db entry modified.';
COMMENT ON COLUMN workflowversion.lastmodified IS 'Remote: Last time version on GitHub repo was changed. Hosted: time version created';
COMMENT ON COLUMN workflowversion.dbcreatedate IS 'Remote: When workflow was refreshed for first time. Hosted: time version created';
COMMENT ON COLUMN workflowversion.dbupdatedate IS 'Remote: When workflow was refreshed for the first time or last time version was edited under action column in versions tab. Hosted: time version created or last time version was edited under actions column in versions tab. Basically anytime db entry modified';

Commenting is not part of the SQL standard; it is postgres-specific (https://www.postgresql.org/docs/current/sql-comment.html), but it looks like it works on constraints.

@sonarqubecloud
Quality Gate failed

Failed conditions
50.0% Coverage on New Code (required ≥ 80%)
C Reliability Rating on New Code (required ≥ A)


Contributor

@david4096 david4096 left a comment

Great, and very thorough description

@svonworl svonworl merged commit 5d6190c into develop Nov 21, 2024
16 of 19 checks passed
@svonworl svonworl deleted the feature/dock-2582/better-constraint-violation-message branch November 21, 2024 19:26
6 participants