add depth_in_meters field to get_acoustic_detections() #274
Conversation
Should we do a minor version bump for this change?
Nice!
- Any reasoning behind the position of the field?
- Please also include the field in datapackage.json (there is currently no test for this, will create issue for that):

etn/inst/assets/datapackage.json, lines 569 to 572 in 35e70f3:
{ "name": "signal_to_noise_ratio", "type": "integer" },

- Minor version bump sounds good to me
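For illustration, the suggested addition might look like this in datapackage.json (a sketch: the exact position and the `number` type for depth_in_meters are assumptions, not the merged change):

```json
{ "name": "signal_to_noise_ratio", "type": "integer" },
{ "name": "depth_in_meters", "type": "number" },
```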
1. Position of field
The documentation for get_acoustic_detections() refers to the field definitions (which seem out of date), which in turn refer to the following CSV file: https://github.com/inbo/etn/blob/main/inst/assets/etn_fields.csv; row 112 of that CSV contains the field. I don't have strong feelings about the position, other than that I didn't want to add it to the end. I'm fine with moving it to be grouped with the animal fields such as scientific_name.
2. datapackage
3. Minor version bump
Will do! Thank you for the review.
Thanks, I would suggest having the field immediately after
Will do
…duced frictionless datapackage
datapackage.json is in the wrong order, and is missing 5 more fields from the tags table: length, diameter, weight, floating, and archive_memory. However, I can't find any examples of these fields in use in the tables I have access to, apart from archive_memory in these 3 examples:
# A tibble: 1 × 5
length diameter weight floating archive_memory
<dbl> <dbl> <dbl> <chr> <chr>
1 NA NA NA NA 2 MB
[2] "FISHINTEL"
# A tibble: 2 × 5
length diameter weight floating archive_memory
<dbl> <dbl> <dbl> <chr> <chr>
1 NA NA NA NA 2
2 NA NA NA NA 8
[3] "2018_EC"
# A tibble: 1 × 5
length diameter weight floating archive_memory
<dbl> <dbl> <dbl> <chr> <chr>
1     NA       NA     NA NA       64 MB

It seems like archive_memory should be a string; length, diameter, and weight seem to be doubles in the returned table, but I have no values to be sure. floating seems to be a string as well. This didn't come to light earlier because there was no test for this in test-download_acoustic_dataset.R.
@PietrH these fields are directly taken from lines 129 to 133 in 35e70f3.
The data type for those fields in the database is:

DBI::dbGetQuery(con, "
SELECT
pg_typeof(archive_length) AS length,
pg_typeof(archive_diameter) AS diameter,
pg_typeof(archive_weight) AS weight,
pg_typeof(archive_floating) AS floating,
pg_typeof(device_internal_memory) AS memory
FROM
common.tag_device_limited
LIMIT 1
")
  length           diameter         weight           floating memory
1 double precision double precision double precision  boolean character varying

Looks like dplyr interpreted the boolean as a string 🤷‍♂️ So in frictionless:
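Based on those Postgres types, a plausible frictionless mapping would be (a sketch, not necessarily the committed schema; note that floating is mapped back to boolean despite dplyr returning it as character):

```json
{ "name": "length", "type": "number" },
{ "name": "diameter", "type": "number" },
{ "name": "weight", "type": "number" },
{ "name": "floating", "type": "boolean" },
{ "name": "archive_memory", "type": "string" }
```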
R/get_acoustic_detections.R (outdated)

    assertthat::assert_that(is.logical(limit), msg = "limit must be a logical: TRUE/FALSE.")
    if (limit) {
-     limit_query <- glue::glue_sql("LIMIT 100", .con = connection)
+     limit_query <- glue::glue_sql("ORDER BY det.id_pk LIMIT 100", .con = connection)
Ordering is necessary to ensure the result of the filter is always the same; if we leave out the ORDER BY, a test will fail. However, this step is very expensive, defeating the point of the LIMIT.
Consider reverting to unordered limiting, but documenting the behavior and perhaps rewriting the test.
Ordering makes the examples and testing stages too slow:

✔ checking examples (40m 42.8s)
Examples with CPU or elapsed time > 5s
                          user system  elapsed
get_animals             12.828  0.092   15.457
get_acoustic_detections 10.664  0.552 2406.708
get_tags                 6.940  0.040    8.048
Limiting actually seems to be a little bit slower than not limiting, at least in some circumstances:

Unit: seconds
                                                                         expr
 get_acoustic_detections(con, acoustic_project_code = "demer", limit = TRUE)
get_acoustic_detections(con, acoustic_project_code = "demer", limit = FALSE)
      min       lq     mean   median       uq      max neval
 151.7327 154.1687 164.0103 156.6048 170.1491 183.6935     3
 110.5260 112.2894 123.5281 114.0527 130.0292 146.0056     3
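Timings like the above can be reproduced with the microbenchmark package (a sketch: it assumes a live ETN database connection obtained via connect_to_etn() with valid credentials, so it is not runnable standalone):

```r
library(microbenchmark)
library(etn)

con <- connect_to_etn()  # assumes credentials are configured

# Compare the limited and unlimited query, 3 runs each, reported in seconds
microbenchmark(
  get_acoustic_detections(con, acoustic_project_code = "demer", limit = TRUE),
  get_acoustic_detections(con, acoustic_project_code = "demer", limit = FALSE),
  times = 3,
  unit = "s"
)
```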
  # Selection is case insensitive
  expect_equal(
    get_acoustic_detections(con, acoustic_project_code = "demer", limit = TRUE),
These two (equivalent) queries each create a 33 MB object; we should probably look for acoustic_project_code values that result in smaller objects for testing, which would speed things up.
Try 2015_HOMARUS
We are not sorting the limit after all, because the sort operation is so expensive that there would be no point in limiting anymore.
@peterdesmet ready for review
Sometimes it passes, sometimes it does not. Seems to fail when doing
The object that we test here
Reviewed, sorry it took so long! 😅
- I would move R/testthat-helpers.R to tests/testthat/helper.R (cf. camtraptor)
- Made a number of stylistic changes, but it would be good if all are tested again
- Are changes made to the citation now?
Tests pass
resolves #261
resolves #275
resolves #268
Main changes
- Switched expect_equal() to expect_identical() in tests, see "Replace expect_equal() with expect_identical() in tests where possible" #268

Changes to tests
- The test for download_acoustic_dataset() now uses the OS temp dir instead of a folder in the package root.
- Add dependency on frictionless. In test-download_acoustic_dataset.R we now use frictionless to check whether the produced datapackage can actually be read without any warnings. I've also added checks for fields that might be missing from the schema or are in the wrong order. This turned out to be the case; the missing fields were added to the datapackage.json file that is used for testing.
- I've also switched test-download_acoustic_dataset.R over to using snapshots to check the download messaging, instead of the boolean character matching from before, which often caused the test to fail for unclear reasons. The snapshot method stores a standardized, committed markdown file of the messaging that can be examined via a diff. This workflow is part of the 3rd edition of testthat, and I've switched the package over to this version. This included making some changes to other tests, namely switching expect_is() to expect_type(). The snapshot could also notify us if the result of demer_2014 changes, and this is how I implemented it originally, but at the moment I'm only checking the console output (cat) and not the actual files generated, as these should be covered by the other tests. If we want to include this data in the package, we need to make sure the rights are cleared under the repo license.
- I've silenced some of the messaging internal to the tests to clean up the console output during package checking.
- Added a test for list_values() to check the messaging.
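The snapshot workflow can be sketched as follows (illustrative: the test name and the exact call are assumptions, and a live connection `con` is required, so this is not the committed test):

```r
test_that("download_acoustic_dataset() messaging is stable", {
  # expect_snapshot() records everything printed to the console (including
  # cat() output) in a markdown file under tests/testthat/_snaps/ that is
  # committed, so any change in messaging shows up as a reviewable diff.
  expect_snapshot(
    download_acoustic_dataset(
      con,
      animal_project_code = "2014_demer",
      directory = file.path(tempdir(), "etn_test")
    )
  )
})
```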
Notes / Possible new issues
The test for get_acoustic_detections() is slow
A number of optimizations are possible; most of the time goes to multi-select queries like get_acoustic_detections(con, acoustic_tag_id = c("A69-1601-16129", "A69-1601-16130")), which can take around ten seconds to complete. I was expecting queries like this to be quicker.
Quite a bit of code is more than 80 columns wide
We could consider opening issues for these, but neither is urgent since they only impact the development workflow.
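For context, a multi-select filter like the one above typically renders to a SQL IN clause. A minimal sketch with glue_sql (illustrative: the table name is an assumption, not the package's actual query, and `connection` must be an open DBI connection since glue_sql uses it for quoting):

```r
library(glue)

acoustic_tag_id <- c("A69-1601-16129", "A69-1601-16130")

# The trailing * collapses the vector into a comma-separated, quoted list
glue_sql(
  "SELECT * FROM acoustic.detections_limited
   WHERE acoustic_tag_id IN ({vals*})",
  vals = acoustic_tag_id,
  .con = connection
)
```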
Created on 2023-02-03 with reprex v2.0.2
Slightly different locally:
Created on 2023-02-03 with reprex v2.0.2
I suggest we add a year (the year of the latest release) for ease of copy-pasting.