SEAB-7194: Add "time series" metric database entity #6127
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## develop #6127 +/- ##
=============================================
- Coverage 74.21% 74.11% -0.10%
- Complexity 5662 5663 +1
=============================================
Files 389 390 +1
Lines 20335 20362 +27
Branches 2101 2101
=============================================
+ Hits 15091 15092 +1
- Misses 4246 4272 +26
Partials 998 998
This is a question that I have, or am curious about, as well.
denis-yuen left a comment
Some general thoughts/questions; the general direction makes sense.
dockstore-webservice/src/main/java/io/dockstore/webservice/core/metrics/MetricsByStatus.java
@JdbcTypeCode(SqlTypes.JSON)
@Column(nullable = false)
@NotNull
@ArraySchema(arraySchema = @Schema(description = "List of sample values, oldest values first"), schema = @Schema(description = "Sample value"))
One wrinkle of daily samples ordered oldest first is that adding a day means we need to rewrite the whole array. On second thought, that may be unavoidable with daily samples and a fixed number of days.
Not sure if there's any way to avoid rewriting everything each new day we run the aggregator.
Given the aggregator's current "compute and overwrite the entire metric record on update" semantics, my assumption is that, at least in the near term, we'll update the entire time series (along with all of the other computed metrics) every time the execution data for a workflow version changes. I can imagine optimizing the system to make the time series metrics (and potentially other metrics) incrementally computable/updatable, but that's far from simple, especially if you support updates to existing execution information in addition to information about new executions.
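To make the trade-off in this thread concrete, here is a minimal Java sketch, assuming the sample list is a fixed-length window of daily counts stored oldest first. DailyWindowSketch and advanceOneDay are hypothetical names for illustration, not code from this PR. Advancing the window by one day drops the oldest value, appends the newest, and moves the first-sample date forward, so every remaining value shifts down one index.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not code from this PR): a fixed-length daily series
// stored oldest value first, matching the "oldest values first" schema description.
class DailyWindowSketch {

    // Advances the window by one day: drops the oldest sample and appends the newest.
    // Every remaining value moves down one position, so the whole array is rewritten.
    static List<Long> advanceOneDay(List<Long> samples, long newestValue) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("window must contain at least one sample");
        }
        List<Long> shifted = new ArrayList<>(samples.subList(1, samples.size()));
        shifted.add(newestValue);
        return shifted;
    }

    public static void main(String[] args) {
        LocalDate firstSampleDate = LocalDate.of(2024, 1, 1);
        List<Long> samples = List.of(4L, 0L, 7L);      // Jan 1, Jan 2, Jan 3
        List<Long> next = advanceOneDay(samples, 2L);  // Jan 2, Jan 3, Jan 4
        LocalDate nextFirstSampleDate = firstSampleDate.plusDays(1);
        System.out.println(nextFirstSampleDate + " " + next);
    }
}
```

Because each index is interpreted as a day offset from the first sample's date, moving that date forward changes the meaning of every position. That is why appending a day implies rewriting the full array, independent of whether the aggregator recomputes the whole record anyway.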
denis-yuen left a comment
Flagging for knowledge
Description
This PR adds a new TimeSeriesMetric entity and corresponding database table to the webservice. It also adds a dailyExecutionCounts time series property to MetricsByStatus. In this configuration, our metrics system will be able to store the daily run counts for any combination of platform (including ALL) and execution status. However, to avoid bloating the database and the metrics responses, the initial plan is to modify the aggregator to generate only a single time series per workflow version, corresponding to daily successful executions on all platforms.

We define a time series as a list of numeric values sampled at a regular time interval. Per that definition, a time series consists of a list of sample values, the date/time of the first sample, and the interval between samples (hour, day, month, etc.). Per the storytime discussion, rather than sliding bins, we'll aggregate into fixed bins that span each hour, day, etc.: for example, 10:00-10:59, 11:00-11:59, 12:00-12:59, or Monday, Tuesday, Wednesday, etc.
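The definition above can be sketched in Java as follows, assuming UTC calendar boundaries and only hourly and daily intervals. TimeSeriesSketch, Interval, TimeSeries, and countIntoFixedBins are hypothetical names for illustration; they do not reflect the actual TimeSeriesMetric entity. Each execution timestamp is truncated to the start of its hour or day, so the bins are fixed (10:00-10:59, 11:00-11:59, ...) rather than sliding, and the series records the first bin's start, the interval, and the counts oldest first.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TimeSeriesSketch {

    // Hypothetical sampling intervals; fixed bins align to UTC hour/day boundaries.
    enum Interval { HOUR, DAY }

    // A time series: start of the first bin, interval between samples, samples oldest first.
    record TimeSeries(Instant begins, Interval interval, List<Long> values) {}

    static ChronoUnit unit(Interval interval) {
        return interval == Interval.HOUR ? ChronoUnit.HOURS : ChronoUnit.DAYS;
    }

    // Counts events into binCount fixed, boundary-aligned bins starting at the bin containing 'from'.
    static TimeSeries countIntoFixedBins(List<Instant> events, Instant from, int binCount, Interval interval) {
        ChronoUnit u = unit(interval);
        Instant begins = from.truncatedTo(u); // align the first bin to an hour/day boundary
        long[] counts = new long[binCount];
        for (Instant event : events) {
            // Index of the fixed bin containing this event, relative to the first bin.
            long index = u.between(begins, event.truncatedTo(u));
            if (index >= 0 && index < binCount) {
                counts[(int) index]++;
            }
        }
        List<Long> values = new ArrayList<>();
        for (long c : counts) {
            values.add(c);
        }
        return new TimeSeries(begins, interval, Collections.unmodifiableList(values));
    }

    public static void main(String[] args) {
        // Hypothetical successful executions on all platforms for one workflow version.
        List<Instant> successfulRuns = List.of(
                Instant.parse("2024-03-01T10:15:00Z"),
                Instant.parse("2024-03-01T23:59:59Z"),
                Instant.parse("2024-03-03T00:00:00Z"));
        TimeSeries daily = countIntoFixedBins(
                successfulRuns, Instant.parse("2024-03-01T08:00:00Z"), 3, Interval.DAY);
        System.out.println(daily.begins() + " " + daily.interval() + " " + daily.values());
    }
}
```

In the example, the two runs on March 1 fall in the first bin and the run at midnight on March 3 falls in the third, giving values 2, 0, 1 for a series starting at the beginning of March 1 (UTC). Months are left out because Instant.truncatedTo only accepts units up to DAYS; month bins would need a calendar-aware type such as LocalDate or YearMonth.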
We have a choice regarding how to store the samples: either as individual rows in a samples table, which we would join to each time series, or as an array in a column of the time series table itself. We chose the latter because it's more compact and less work for the db server. If we intended to run elaborate queries on the samples themselves, or to update/add/delete individual samples piecemeal, we'd probably choose the former.

For now, I went with a jsonb representation for the list of sample values. Alternatively, we could use SQL arrays. It's not yet clear which representation is better supported by Hibernate/HQL, so this may change.
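The two storage options can be contrasted with a small Java sketch. SampleStorageSketch, SampleRow, toJsonArray, and toSampleRows are hypothetical names, and the hand-rolled JSON is only illustrative: in the PR the serialization is delegated to Hibernate via @JdbcTypeCode(SqlTypes.JSON). The sketch also assumes finite double-valued samples. A time series with n samples becomes a single JSON value in the array-in-column layout, versus n rows (plus a join) in the separate-table layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

class SampleStorageSketch {

    // One row of a hypothetical normalized "samples" table: (time series id, position, value).
    record SampleRow(long timeSeriesId, int position, double value) {}

    // Array-in-column layout: the whole list becomes a single JSON array value
    // stored on the time series row itself.
    static String toJsonArray(List<Double> values) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (double v : values) {
            if (!Double.isFinite(v)) {
                throw new IllegalArgumentException("JSON cannot represent " + v);
            }
            json.add(Double.toString(v));
        }
        return json.toString();
    }

    // Separate-table layout: one row per sample, joined back to its time series by id;
    // the position column preserves the oldest-first ordering.
    static List<SampleRow> toSampleRows(long timeSeriesId, List<Double> values) {
        List<SampleRow> rows = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            rows.add(new SampleRow(timeSeriesId, i, values.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Double> dailyCounts = List.of(4.0, 0.0, 7.0);
        System.out.println("jsonb column value: " + toJsonArray(dailyCounts));
        System.out.println("samples table rows: " + toSampleRows(42L, dailyCounts).size());
    }
}
```

The separate-table layout makes it cheap to touch a single sample (one row), which is why it would suit piecemeal updates; the array-in-column layout keeps reads to one row per time series, at the cost of rewriting the whole value on any change.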
There's intentionally not much testing at the moment. After a prototype is done, I'll survey the coverage, and add tests as necessary.
Note that the branch name contains the wrong ticket name/number.
Review Instructions
Confirm the presence of the new database table and that openapi.yaml contains a description of the new entity.
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7194
Security and Privacy
If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed annotation