
Conversation

@PietrH
Member

@PietrH PietrH commented Jan 2, 2023

resolves #261
resolves #275
resolves #268

Main changes

  • Added a single line to the SQL query and adapted the test to expect the new column
  • Added the missing field to the datapackage.json schema
  • Made some small stylistic changes to other functions: removing trailing whitespace and reducing column width
  • Bumped the minor version number
  • Created NEWS.md
  • Swapped expect_equal() for expect_identical() in tests, see Replace expect_equal() with expect_identical() in tests where possible #268
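For reference, the practical difference between the two expectations (a minimal illustration, not code from the package):

```r
# expect_identical() is stricter than expect_equal(): it allows no numeric
# tolerance, so subtle changes in returned values now fail loudly.
testthat::expect_equal(1, 1 + 1e-9)  # passes: within the default tolerance
testthat::expect_identical(1L, 1L)   # passes: exactly the same value
# testthat::expect_identical(1, 1 + 1e-9) would fail
```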

Changes to tests

  • The test for download_acoustic_dataset() now uses the OS temp dir instead of a folder in the package root.

  • Added a dependency on frictionless. In test-download_acoustic_dataset.R we now use frictionless to check that the produced data package can actually be read without warnings. I've also added checks for fields that are missing from the schema or in the wrong order; this turned out to be the case, and those fields were added to the datapackage.json file used for testing.

  • I've also switched test-download_acoustic_dataset.R over to snapshots to check the download messaging, instead of the boolean character matching used before. The old approach often caused the test to fail for unclear reasons; the snapshot method stores a standardized, committed markdown file of the messaging that can be examined via a diff. This workflow is part of the 3rd edition of testthat, so I've switched the package over to that edition, which required some changes to other tests, namely replacing expect_is() with expect_type().

  • The snapshot could also notify us if the result of demer_2014 changes, and this is how I implemented it originally, but at the moment I'm only checking the console output (cat), not the actual files generated, as those should be covered by the other tests. If we want to include this data in the package, we need to make sure the rights are cleared under the repo license.

  • I've silenced some of the messaging internal to the tests to clean up the console output during package checking.

  • Added a test for list_values() to check its messaging
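As a sketch of the snapshot approach (the test name and the animal_project_code argument here are illustrative assumptions, not the actual test file):

```r
# The first run records the console output to tests/testthat/_snaps/<file>.md;
# later runs diff against that committed snapshot instead of matching strings.
# "demer_2014" is a placeholder argument for illustration only.
test_that("download_acoustic_dataset() messaging is stable", {
  expect_snapshot(
    download_acoustic_dataset(con, animal_project_code = "demer_2014")
  )
})
```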

Notes / Possible new issues

  • The test for get_acoustic_detections() is slow
    A number of optimizations are possible; most of the time goes to multi-select queries like get_acoustic_detections(con, acoustic_tag_id = c("A69-1601-16129", "A69-1601-16130")), which can take around ten seconds to complete. I was expecting queries like this to be quicker.

  • Quite a bit of code is more than 80 columns wide

We could consider opening issues for these, but neither is urgent since they only impact the development workflow.
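If we do open an issue for the line width, one way to list the offending lines (a sketch; lintr is not currently a dependency of the package):

```r
# Flag every line wider than 80 characters across the package sources.
lintr::lint_package(
  linters = lintr::linters_with_defaults(
    line_length_linter = lintr::line_length_linter(80L)
  )
)
```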

  • citation("etn") output is missing a year:
citation("etn")
#> Warning in citation("etn"): no date field in DESCRIPTION file of package 'etn'
#> Warning in citation("etn"): could not determine year for 'etn' from package
#> DESCRIPTION file
#> 
#> To cite package 'etn' in publications use:
#> 
#>   Peter Desmet, Damiano Oldoni and Stijn Van Hoey (NA). etn: Access
#>   Data from the European Tracking Network. https://github.com/inbo/etn,
#>   https://inbo.github.io/etn.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {etn: Access Data from the European Tracking Network},
#>     author = {Peter Desmet and Damiano Oldoni and Stijn {Van Hoey}},
#>     note = {https://github.com/inbo/etn, https://inbo.github.io/etn},
#>   }

Created on 2023-02-03 with reprex v2.0.2

Slightly different locally:

citation("etn")
#> Warning in citation("etn"): no date field in DESCRIPTION file of package 'etn'
#> Warning in citation("etn"): could not determine year for 'etn' from package
#> DESCRIPTION file
#> 
#> To cite package 'etn' in publications use:
#> 
#>   Desmet P, Oldoni D, Van Hoey S (????). _etn: Access Data from the
#>   European Tracking Network_. https://github.com/inbo/etn,
#>   https://inbo.github.io/etn.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {etn: Access Data from the European Tracking Network},
#>     author = {Peter Desmet and Damiano Oldoni and Stijn {Van Hoey}},
#>     note = {https://github.com/inbo/etn, https://inbo.github.io/etn},
#>   }

Created on 2023-02-03 with reprex v2.0.2

I suggest we add a year (the year of the latest release) for ease of copy-pasting.
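citation() derives the year from a Date field in DESCRIPTION, so one option is to add such a field (the date below is only an example value):

```
Date: 2023-02-03
```

Alternatively, an inst/CITATION file with an explicit year would silence the warning as well.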

@PietrH
Member Author

PietrH commented Jan 3, 2023

Should we do a minor version bump for this change?

@peterdesmet peterdesmet self-requested a review January 3, 2023 14:40
Member

@peterdesmet peterdesmet left a comment


Nice!

  1. Any reasoning behind the position of the field?
  2. Please also include the field in
    {
    "name": "signal_to_noise_ratio",
    "type": "integer"
    },
    (there is currently no test for this, will create issue for that)
  3. Minor version bump sounds good to me

@PietrH
Member Author

PietrH commented Jan 4, 2023

Nice!

1. Any reasoning behind the position of the field?

2. Please also include the field in https://github.com/inbo/etn/blob/35e70f3d284cb64494313e667f238378564d494b/inst/assets/datapackage.json#L569-L572
    (there is currently no test for this, will create issue for that)

3. Minor version bump sounds good to me

1. Position of field

The documentation for get_acoustic_detections refers to the field definitions (which seem out of date), which in turn refer to the following CSV file: https://github.com/inbo/etn/blob/main/inst/assets/etn_fields.csv. On row 112 of that CSV there is a field sensor_value_depth, which I suspect depth_in_meters is based on. I'm also reasoning that depth_in_meters is a property of the sensor, not the animal itself, so it should follow the fields to do with the sensor rather than the animal.

I don't have strong feelings about this, apart from not wanting to add it to the end. I'm fine with moving it to be grouped with the animal fields, such as scientific_name.

  • Move depth_in_meters to after deploy_longitude


Minor version bump

Will do! Thank you for the review

@peterdesmet
Member

Thanks, I would suggest having the field immediately after deploy_longitude then, as part of the "location" information.

@PietrH
Member Author

PietrH commented Jan 4, 2023

Will do

@PietrH
Member Author

PietrH commented Jan 19, 2023

The fields in datapackage.json are in the wrong order, and 5 more fields from the tags table are missing: length, diameter, weight, floating, and archive_memory.

However, I can't find any examples of these fields in use in the tables I have access to, apart from archive_memory in these 3 examples:

[1] "shad_scheldt_dst"
# A tibble: 1 × 5
  length diameter weight floating archive_memory
   <dbl>    <dbl>  <dbl> <chr>    <chr>         
1     NA       NA     NA NA       2 MB          
[2] "FISHINTEL"
# A tibble: 2 × 5
  length diameter weight floating archive_memory
   <dbl>    <dbl>  <dbl> <chr>    <chr>         
1     NA       NA     NA NA       2             
2     NA       NA     NA NA       8             
[3] "2018_EC"
# A tibble: 1 × 5
  length diameter weight floating archive_memory
   <dbl>    <dbl>  <dbl> <chr>    <chr>         
1     NA       NA     NA NA       64 MB  

It seems like archive_memory should be a string; length, diameter and weight seem to be doubles in the returned table, but I have no values to be sure. floating seems to be a string as well.

This didn't come to light earlier because there was no test for this in test-download_acoustic_dataset.R.

@peterdesmet
Member

@PietrH these fields are directly taken from:

etn/R/get_tags.R

Lines 129 to 133 in 35e70f3

tag_device.archive_length AS length,
tag_device.archive_diameter AS diameter,
tag_device.archive_weight AS weight,
tag_device.archive_floating AS floating,
tag_device.device_internal_memory AS archive_memory,

The data type for those fields in the database is:

DBI::dbGetQuery(con, "
  SELECT
    pg_typeof(archive_length) AS length,
    pg_typeof(archive_diameter) AS diameter,
    pg_typeof(archive_weight) AS weight,
    pg_typeof(archive_floating) AS floating,
    pg_typeof(device_internal_memory) AS memory
  FROM
    common.tag_device_limited
  LIMIT 1
")

            length         diameter           weight floating            memory
1 double precision double precision double precision  boolean character varying

Looks like dplyr interpreted the boolean as string 🤷‍♂️

So in frictionless:

  • length: number
  • diameter: number
  • weight: number
  • floating: boolean
  • memory: string
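So the five entries could be added to datapackage.json along these lines (field names follow the SQL aliases above; the exact position within the schema is still to be decided):

```json
{ "name": "length", "type": "number" },
{ "name": "diameter", "type": "number" },
{ "name": "weight", "type": "number" },
{ "name": "floating", "type": "boolean" },
{ "name": "archive_memory", "type": "string" }
```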

@PietrH PietrH self-assigned this Feb 2, 2023
assertthat::assert_that(is.logical(limit), msg = "limit must be a logical: TRUE/FALSE.")
if (limit) {
limit_query <- glue::glue_sql("LIMIT 100", .con = connection)
limit_query <- glue::glue_sql("ORDER BY det.id_pk LIMIT 100", .con = connection)
Member Author


Ordering is necessary to ensure the result of the filter is always the same; if we leave out the ORDER BY, a test will fail. However, this step is very expensive, defeating the point of the LIMIT.

Consider restoring the unordered limiting, but documenting the behavior and perhaps rewriting the test.

Member Author


Ordering makes the examples and testing stages too slow:

✔  checking examples (40m 42.8s)
   Examples with CPU or elapsed time > 5s
                             user system  elapsed
   get_animals             12.828  0.092   15.457
   get_acoustic_detections 10.664  0.552 2406.708
   get_tags                 6.940  0.040    8.048

Member Author


Limiting actually seems to be a little bit slower than not limiting, at least in some circumstances:

Unit: seconds
                                                                              expr
  get_acoustic_detections(con, acoustic_project_code = "demer",      limit = TRUE)
 get_acoustic_detections(con, acoustic_project_code = "demer",      limit = FALSE)
      min       lq     mean   median       uq      max neval
 151.7327 154.1687 164.0103 156.6048 170.1491 183.6935     3
 110.5260 112.2894 123.5281 114.0527 130.0292 146.0056     3


# Selection is case insensitive
expect_equal(
get_acoustic_detections(con, acoustic_project_code = "demer", limit = TRUE),
Member Author


These two (equivalent) queries each create a 33 MB object; we should probably look for acoustic_project_code values that result in smaller objects for testing, which would speed things up.

Member


Try 2015_HOMARUS

PietrH added 4 commits May 4, 2023 10:32
We are not sorting the limit after all, because the sort operation is so expensive that there would be no point in limiting anymore.
@PietrH
Member Author

PietrH commented May 4, 2023

@peterdesmet ready for review

Sometimes it passes, sometimes it does not. It seems to fail when running devtools::check() but pass in the console or when running tests for a single file. The test itself seems fine; I suspect there might be duplicate detection_ids in the acoustic detections table.

The object that we test here, df, could be different on every run, because it uses a LIMIT without an ORDER BY statement.

  • Feel free to fix style issues directly, or just let me know.

@PietrH PietrH requested a review from damianooldoni June 1, 2023 07:46
Member

@peterdesmet peterdesmet left a comment


Reviewed, sorry it took so long! 😅

  • I would move R/testthat-helpers.R to tests/testthat/helper.R (cf. camtraptor)
  • Made a number of stylistic changes, but would be good if all are tested again
  • Are changes made to the citation now?

@PietrH
Member Author

PietrH commented Nov 21, 2023

Tests pass
