Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@stephhazlitt
Copy link
Member

Changes from @ateucher from the DIP SRE.

@BonnieJRobert BonnieJRobert self-requested a review July 24, 2025 21:56
person("Sam", "Albers", , "[email protected]", role = c("aut", "cre")),
person("Andy", "Teucher", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0002-7840-692X")),
person("Bonnie", "Robert", , "[email protected]", role = "aut"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
person("Bonnie", "Robert", , "bonnie.robbie@gov.bc.ca", role = "aut"),
person("Bonnie", "Ashcroft", , "bonnie.ashcroft@gov.bc.ca", role = "aut"),

#' `list(thousands = ",", comment = "#", force_not_null = c("a", "b"), null_padding = TRUE, buffer_size = 5)`
#'
#' @return a character string of the query, to be called with [DBI::dbExecute()]
#' @export
Copy link
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an example here would be helpful - using one of the built-in datasets, perhaps?

Copy link
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @export
#' @export
#' @examples
#' \dontrun{
#' con <- DBI::dbConnect(duckdb::duckdb())
#' p <- "./inst/extdata/starwars-dict.csv"
#' spec <- dput(sniff_csv_duckdb(p)$schema)
#' csv_to_duckdb_query(con, p, "starwars", spec)
#' }

nullstr = NULL,
opts = list()
) {

Copy link
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not necessary given the users and use cases of dipr, but I wonder about guarding against SQL injection. In this case via the tbl_name parameter:

if (grepl(";", tbl_name) stop(glue::glue("possible SQL injection introduced via tablename: {tbl_name}"))

Comment on lines +74 to +77
old_column_names <- names(schema)

names(new_column_names) <- old_column_names
names(old_column_names) <- old_column_names
Copy link
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is confusing; code comments would be helpful.


check_params(con, opts)

varchar_scalar_opts <- Filter(\(x) is.character(x) & length(x) == 1, opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to extract the code that constructs the various *_opts_parsed variables into individual functions. Consider refactoring into smaller, focused functions like:
format_varchar_scalar_opts <- function(opts) { ... }
format_varchar_array_opts <- function(opts) { ... } etc.

)
}

make_select_statement <- function(schema, col_rename = NULL, dt_override = NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the final SQL CREATE or REPLACE TABLE statement is executed via DBI::dbExecute(), column names with spaces (like First Name AS first_name) will cause SQL syntax errors. The dipr functions are behaving as expected, but it could be a "nice to have" to warn against this. Here's an example:

> print(strSQL)
CREATE OR REPLACE TABLE starwars AS SELECT Name Abbr AS name_abbr, Name AS name FROM read_csv( './inst/extdata/starwars-dict.csv', types = { 'Name Abbr': 'VARCHAR', 'Name': 'VARCHAR' })
> DBI::dbExecute(con,strSQL)

This will cause dbSendQuery() to error.

@BonnieJRobert
Copy link
Member

I focused my review on the code in the csv-duckdb.R file and ran a few toy examples. The code runs as expected with no serious errors that I could reproduce. However I highlighted a couple areas of concern. Additionally, the master branch should be renamed to main and there is a stale branch from several years back with a few updates to the README.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants