Description
For some reason, some functions take far longer to run as part of the pipeline than when they are called directly.
Minimal reproducible example
_targets.R
```r
library(targets)

get_toy_data <- function() {
  library(data.table)
  # 3000 distinct random ids
  dt <- data.table(
    id = sample(1:1e8, 3000)
  )
  # One row per id and day of January 2017
  dt <- dt[
    ,
    .(date1 = seq.Date(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "day")),
    by = id
  ]
  # For each (id, date1), the 11 dates from date1 - 10 through date1
  dt <- dt[
    ,
    .(date2 = seq.Date(as.Date(date1) - 10, as.Date(date1), by = "day")),
    by = .(id, date1)
  ]
  # Flag roughly 10% of rows as available
  dt[
    ,
    avail := sample(c(rep(FALSE, 9), TRUE), size = nrow(dt), replace = TRUE)
  ]
  return(dt[])
}

f <- function(data) {
  start_time <- Sys.time()
  library(data.table)
  # Keep date2 only on available rows (NA elsewhere)
  data[avail == TRUE, date2_if_available := date2]
  # Earliest available date2 within each (id, date1) group
  data[
    ,
    date2_first_available := min(date2_if_available, na.rm = TRUE),
    by = .(id, date1)
  ]
  end_time <- Sys.time()
  print(end_time - start_time)
  return(data[])
}

list(
  tar_target(data, get_toy_data()),
  tar_target(fdata, f(data))
)
```
When I run `tar_make(fdata)`, the `fdata` target takes more than 2 minutes to complete. When I run `f(tar_read(data))` instead, it takes only about 2 seconds. The `data` target itself builds without any problem. I have tried this on both macOS and Ubuntu, inside and outside of renv, and got similar results every time. What might be causing this?
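For reference, the two timings being compared can be reproduced side by side roughly as follows. This is a sketch that assumes the `_targets.R` above sits in the current working directory. `tar_meta()` is used only to read back the runtime that targets records for `fdata`, and `tar_load_globals()` brings `get_toy_data()` and `f()` into the interactive session.

```r
library(targets)

# Build fdata inside the pipeline (tar_make() runs it in a separate R process).
tar_make(fdata)

# Runtime in seconds that targets recorded for the fdata target.
tar_meta(fdata, fields = seconds)

# Run the same function directly in this session on the stored data target.
tar_load_globals()  # sources _targets.R so that f() and get_toy_data() exist here
system.time(f(tar_read(data)))
```

`tar_make()` skips targets that are already up to date, so `tar_invalidate(fdata)` may be needed to force `fdata` to rerun before timing it again.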