Run SQL queries on Snowflake, Redshift, Postgres, SQLite, or a local DuckDB database from R using a single function: queryDB().
snowquery now also supports streaming remote query results directly into a local DuckDB file for fast, repeatable, low‑cost analysis.
# The easiest way to get snowquery
install.packages("snowquery")

# Or you can get the development version from GitHub
# install.packages("devtools")
devtools::install_github("mermelstein/snowquery")

Redshift support is currently only available in the development version of this package. See the installation instructions above.
When connecting to a Redshift DWH you might need to specify an SSL connection. You can do this by setting `sslmode: require` in the credential YAML or by passing `sslmode='require'` to the `queryDB()` function directly.
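For example, a minimal sketch that passes connection details directly along with `sslmode` (all values are placeholders mirroring the YAML keys shown further below):

```r
# A sketch: query Redshift over SSL by passing credentials directly
# (placeholder values; in normal use keep credentials in the YAML file)
results <- queryDB(
  "SELECT COUNT(*) AS n FROM analytics.events",
  db_type  = 'redshift',
  host     = 'redshift-cluster.host.aws.com',
  port     = 5439,
  database = 'analytics',
  username = 'rs_user',
  password = 'rs_pw',
  sslmode  = 'require'
)
```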
Why? Because Snowflake's driver requires a ton of fiddling to make it work from R. It sucks. A lot.
To sum up the current experience of running SQL against Snowflake from:
- python: good ✅
- R: bad ❌
That's why the snowquery package takes the Snowflake python connector and leverages it in the background to run queries from R.
For more information on using snowquery, please see the package website.
Base R dependencies are installed automatically via the DESCRIPTION. Additional runtime needs:
- DuckDB caching: `duckdb` (installed automatically when you install the package, but you may also install it manually with `install.packages("duckdb")`).
- Snowflake: a Python 3 runtime accessible to `reticulate`. The package will automatically install / upgrade `snowflake-connector-python[pandas]` the first time you run a Snowflake query. No manual `pip install` is required in normal use.
- Redshift / Postgres: handled through `RPostgres` (already imported). If you need SSL, set `sslmode` in the YAML or pass it directly.
- SQLite: provide a file path in the credential YAML (see below).
If the package cannot find a working Python, you'll receive an actionable error explaining what to install.
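If you want to point `reticulate` at a particular Python yourself, a minimal sketch (the interpreter path is a placeholder):

```r
# Tell reticulate which Python to use before running the first Snowflake query
Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python3")  # placeholder path
reticulate::py_config()  # confirm which interpreter was picked up
```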
For all db connections you will need a YAML file called snowquery_creds.yaml at ~/snowquery_creds.yaml with one or more named connections:
---
my_snowflake_dwh:
  db_type: snowflake
  account: your_account
  warehouse: your_wh
  database: your_db
  user: your_user # note: key is 'user' not 'username' for snowflake
  password: your_pw
  role: your_role
my_redshift_dwh:
  db_type: redshift
  host: redshift-cluster.host.aws.com
  port: 5439
  database: analytics
  username: rs_user
  password: rs_pw
  sslmode: require
my_postgres_db:
  db_type: postgres
  host: localhost
  port: 5432
  database: pg_db
  username: pg_user
  password: pg_pw
my_sqlite_db:
  db_type: sqlite
  database: /path/to/local.sqlite
my_duckdb_local:
  db_type: duckdb # connects to the default analytics.duckdb file in working dir
This follows a named connection format, where you can have multiple named connections in the same file. For example you might have a my_snowflake_dwh connection and a my_snowflake_admin connection, each with their own credentials.
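For example, a sketch running the same query under two named connections (`my_snowflake_admin` is a hypothetical second entry, not shown in the YAML above):

```r
# Same query, two differently-credentialed connections from the same YAML file
n_dwh   <- queryDB("SELECT COUNT(*) AS n FROM users", conn_name = "my_snowflake_dwh")
n_admin <- queryDB("SELECT COUNT(*) AS n FROM users", conn_name = "my_snowflake_admin")
```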
This package looks for the credential file at this location: ~/snowquery_creds.yaml. If it is in any other location it will not work. If the package cannot locate the file you will receive an error like: cannot open file '/expected/path/to/file/snowquery_creds.yaml': No such file or directory. You can manually pass credentials to the queryDB() function but it is recommended to use the YAML file.
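A quick sanity check in base R that the file is where the package expects it:

```r
# Should return TRUE once the credential file is in place
file.exists(path.expand("~/snowquery_creds.yaml"))
```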
You are now ready to query away!
Load this library in your R environment with library(snowquery).
queryDB("SELECT * FROM MY_AWESOME_TABLE", conn_name='my_snowflake_dwh')or
# Query Snowflake and get results as a data frame
results <- queryDB("SELECT * FROM my_large_table LIMIT 1000", conn_name = "my_snowflake_dwh")or
# You can also pass in credentials manually
results <- queryDB("SELECT * FROM my_table",
db_type='snowflake',
username='my_username',
password='my_password',
account='my_account',
database='my_database',
warehouse='my_warehouse',
role='my_role',
timeout=30)
print(results)Provide cache_table_name to stream results into the local DuckDB file analytics.duckdb:
# Cache a Snowflake query (streaming batches when possible)
queryDB(
  "SELECT * FROM MY_SCHEMA.BIG_FACT_TABLE WHERE load_date >= '2025-09-01'",
  conn_name = "my_snowflake_dwh",
  cache_table_name = "big_fact_local",
  overwrite = TRUE
)
# message: Successfully cached 1234567 rows to DuckDB table 'big_fact_local' (analytics.duckdb).

# Cache a Postgres query (uses dplyr/dbplyr lazy streaming under the hood)
queryDB(
  "SELECT id, event_ts, metric FROM events WHERE event_ts >= now() - interval '7 days'",
  conn_name = "my_postgres_db",
  cache_table_name = "recent_events",
  overwrite = TRUE
)

Key behaviors:
- The Snowflake path streams using `fetch_pandas_batches()` when available; otherwise it falls back to a single fetch.
- The Postgres / Redshift path uses a lazy dplyr table registered into DuckDB for efficient transfer.
- `overwrite = TRUE` creates/replaces the DuckDB table; set `overwrite = FALSE` to append (for Snowflake, appends happen batch by batch; for Postgres/Redshift you can modify the logic as needed). See the append sketch below this list.
- You cannot cache from a DuckDB source (`db_type == 'duckdb'`).
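For example, a minimal sketch of an incremental refresh that appends one additional day of rows to the cached table (the date filter and names reuse the caching example above):

```r
# Append another day of rows to the existing DuckDB cache table
# (the appended rows must match the schema already cached in 'big_fact_local')
queryDB(
  "SELECT * FROM MY_SCHEMA.BIG_FACT_TABLE WHERE load_date = '2025-09-08'",
  conn_name = "my_snowflake_dwh",
  cache_table_name = "big_fact_local",
  overwrite = FALSE
)
```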
Add a DuckDB connection to the YAML (see my_duckdb_local example above) and query cached tables:
local_summary <- queryDB(
  "SELECT COUNT(*) AS n_rows FROM big_fact_local",
  conn_name = "my_duckdb_local"
)
local_summary

You can also use DBI or dplyr directly:
library(DBI)
duck_con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "analytics.duckdb")
DBI::dbGetQuery(duck_con, "SELECT * FROM recent_events LIMIT 5")
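# The same cached table can also be read lazily with dplyr (a sketch; assumes
# the dplyr and dbplyr packages are installed alongside DBI and duckdb)
recent_n <- dplyr::tbl(duck_con, "recent_events") |>
  dplyr::count() |>
  dplyr::collect()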
DBI::dbDisconnect(duck_con, shutdown = TRUE)

There is one function you need: `queryDB()`. It takes a SQL query as a string parameter and runs it on the database.
For example:
library(snowquery)

query <- "SELECT * FROM MY_AWESOME_TABLE"
result_dataframe <- queryDB(query, conn_name='my_snowflake_dwh')
print(result_dataframe)

or

library(snowquery)

queryDB("SELECT * FROM MY_AWESOME_TABLE", conn_name='my_snowflake_dwh')

or

library(snowquery)

# You can also pass in credentials manually
result <- queryDB("SELECT * FROM my_table",
                  db_type='snowflake',
                  username='my_username',
                  password='my_password',
                  account='my_account',
                  database='my_database',
                  warehouse='my_warehouse',
                  role='my_role',
                  timeout=30)
print(result)

Workflow:
- Pull once from a remote DWH.
- Iterate locally against DuckDB for joins, aggregations, prototyping.
- Refresh by re-running with `overwrite = TRUE`, or append with `overwrite = FALSE` (on the Snowflake / append path, ensure schema consistency).
Why DuckDB?
- Fast analytical execution (vectorized / columnar).
- Lightweight (no server to run).
- Plays nicely with data frames and dplyr (`dbplyr` translations work out of the box for the Postgres/Redshift streaming path).
Limitations / Notes:
- Snowflake streaming depends on connector feature availability; falls back to full fetch if batch iterator missing.
- Appending from heterogeneous schemas is not validated automatically.
- No explicit indexing (internal zone maps generally sufficient).
- Caching from a DuckDB source is intentionally blocked (it is already local).
Planned (future): progress display, verbosity flag, helper to enumerate cached tables.
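Until the cached-table helper lands, a sketch that lists what is currently in the cache file using plain DBI:

```r
# Enumerate tables currently cached in analytics.duckdb
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "analytics.duckdb")
DBI::dbListTables(con)
DBI::dbDisconnect(con, shutdown = TRUE)
```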