
Sqlalchemy dev POC #30


Closed · wants to merge 31 commits
31 commits
dd3cf40
Scaffold basic file structure for a new dialect
susodapop Jul 7, 2022
e13fb3b
barebone (& non-working) DatabricksDialect implementation
overcoil Jul 13, 2022
0ee8d26
reminder about TABLE_SCHEM
overcoil Jul 13, 2022
2742a92
initial checkin with working pytest test suites.
overcoil Jul 19, 2022
c6c1322
remove secret in comment
overcoil Jul 19, 2022
a594e7c
minor corrections
overcoil Jul 19, 2022
595178b
add cleanup and notes for self
overcoil Jul 20, 2022
d7e72a8
add sample programs & prelim README
overcoil Jul 20, 2022
59bf8b7
add prelim support for string and derived types
overcoil Jul 22, 2022
44a9e32
tidy up for the week; pulled out partial interval support for the while
overcoil Jul 22, 2022
6463fef
add a block of test suites from SA
overcoil Jul 26, 2022
d3bd5d2
Trial add of Github Action for the SQLAlchemy dialect
overcoil Jul 26, 2022
8716645
yml error
overcoil Jul 26, 2022
d75710a
add self-reference to trigger a run when the action is update
overcoil Jul 26, 2022
55e4448
correct usage of env var
overcoil Jul 27, 2022
dc70f90
move to repo secrets instead
overcoil Jul 27, 2022
d6f98dd
correct drop pseudo-targets
overcoil Jul 27, 2022
8e480ad
add trigger on the top-level Makefile (convenience)
overcoil Jul 27, 2022
d68ddb5
add dbsqlcli for cleanup
overcoil Jul 27, 2022
c6a9386
add init invocation of dbsqlcli
overcoil Jul 27, 2022
01f5a16
override the return code
overcoil Jul 27, 2022
d121881
wrong conjunction!
overcoil Jul 27, 2022
55aa104
correct table name
overcoil Jul 27, 2022
6655473
restrict to the dev branch for the while
overcoil Jul 27, 2022
629a510
various minor items
overcoil Aug 4, 2022
01a4d00
Merge branch 'sqlalchemy-dev' of https://github.com/overcoil/fork-dat…
overcoil Aug 9, 2022
dd79718
Cleaned up and reformatted
overcoil Aug 10, 2022
2a8b36d
removed unneeded cruft from earlier experiments
overcoil Aug 10, 2022
1fcdfc2
missed the other Makefile
overcoil Aug 10, 2022
488b94d
revert unneeded changes; remove dead template
overcoil Aug 10, 2022
7ba1f2a
remove unneeded action; expand decimal test case to cover default and…
overcoil Aug 12, 2022
240 changes: 21 additions & 219 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions src/databricks/sqlalchemy/.gitignore
@@ -0,0 +1,2 @@
*env

130 changes: 130 additions & 0 deletions src/databricks/sqlalchemy/README.md
@@ -0,0 +1,130 @@
# Introduction

This is a work-in-progress SQLAlchemy dialect for Databricks.

The dialect is embedded within the Databricks SQL Connector.

## Connection String

Using the dialect requires the following:

1. SQL warehouse hostname
2. Endpoint HTTP path (passed as `http_path`)
3. Access token

The schema `default` is used unless an alternate is specified via _Default-schema_.

The connection string is constructed as follows:

`databricks+thrift://token:`_Access-token_`@`_SQL-warehouse-hostname_`/`_Default-schema_`?http_path=`_Endpoint_
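
For example, with purely illustrative values for the hostname, token, and HTTP path:

```Python
from sqlalchemy import create_engine

# all values below are placeholders for illustration only
engine = create_engine(
    "databricks+thrift://token:dapi0123456789abcdef"
    "@dbc-a1b2c3d4-e5f6.cloud.databricks.com/default"
    "?http_path=/sql/1.0/endpoints/0123456789abcdef"
)
```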


## Data Types

| Databricks type | SQLAlchemy type | Extra |
|:-|:-|:-|
| `smallint` | `integer` | |
| `int` | `integer` | |
| `bigint` | `integer` | |
| `float` | `float` | |
| `decimal` | `float` | |
| `boolean` | `boolean` | |
| `string` | WIP | |
| `date` | WIP | |
| `timestamp` | WIP | |



## Sample Code

The focus of this dialect is enabling SQLAlchemy Core (as opposed to the SQLAlchemy ORM).



### The Simplest Program

A program (see [`sample-app-select.py`](https://github.com/overcoil/fork-databricks-sql-python/blob/sqlalchemy-dev/src/databricks/sqlalchemy/sample-app-select.py)) to read from a Databricks table looks roughly as follows:

```Python
import os

from sqlalchemy import create_engine
from sqlalchemy import MetaData
from sqlalchemy import Table, Column, Integer, BigInteger, Float, Boolean
from sqlalchemy import select

# pick up settings from the env
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
default_schema = os.getenv("DATABRICKS_SCHEMA")

# use echo=True for verbose logging
engine = create_engine(f"databricks+thrift://token:{access_token}@{server_hostname}/{default_schema}?http_path={http_path}", echo=False, future=True)

metadata_obj = MetaData()

# NB: sample_numtypes is a pre-created/populated table
tableName = "sample_numtypes"

# declare the schema we're expecting
numtypes = Table(
    tableName,
    metadata_obj,
    Column('f_byte', Integer),
    Column('f_short', Integer),
    Column('f_int', Integer),
    Column('f_long', BigInteger),
    Column('f_float', Float),
    Column('f_decimal', Float),
    Column('f_boolean', Boolean)
)

# SELECT * FROM t WHERE f_byte = -125
stmt = select(numtypes).where(numtypes.c.f_byte == -125)
print(f"Attempting to execute: {stmt}\n")

print(f"Rows from table {tableName}")

with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row)
```


### Table definition via reflection
Reflection may be used to recover the schema of a table dynamically via [the `Table` constructor's `autoload_with` parameter](https://docs.sqlalchemy.org/en/14/core/reflection.html).

```Python
some_table = Table("some_table", metadata_obj, autoload_with=engine)
stmt = select(some_table).where(some_table.c.f_byte == -125)
...
```
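
The reflected `Table` can then be inspected like any other; for example:

```Python
# list the column names and types recovered by reflection
for column in some_table.columns:
    print(column.name, column.type)
```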

### INSERT statement
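Not yet exercised against this dialect; as a minimal sketch, assuming the `numtypes` table and `engine` from the sample program above, an INSERT via SQLAlchemy Core looks like:

```Python
from sqlalchemy import insert

# INSERT INTO sample_numtypes (f_byte, f_float) VALUES (-125, 4.5)
stmt = insert(numtypes).values(f_byte=-125, f_float=4.5)

with engine.connect() as conn:
    conn.execute(stmt)
    conn.commit()  # engines created with future=True need an explicit commit
```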

### Unmanaged table creation
```Python
# TODO: verify against Databricks
from sqlalchemy import MetaData, Table, Column, Integer, String

metadata_obj = MetaData()
user_table = Table(
    "user_account",
    metadata_obj,
    Column('id', Integer, primary_key=True),
    Column('name', String(30)),
    Column('fullname', String)
)
metadata_obj.create_all(engine)
```

### Direct access to Spark SQL
```Python
# TODO: does this work?
from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(text("VACUUM tablename"))
    print(result.all())
```
