Support transactions (and pandas.to_sql / read_sql_table) #72


Closed
C0DK opened this issue Dec 14, 2022 · 5 comments

Comments

@C0DK

C0DK commented Dec 14, 2022

It's a great project, and it truly helps me in a variety of ways. However, I am having a few issues when using it with pandas.

When reading a table, I would like to do:

    def read(
        self, catalog_name: str, schema_name: str, table_name: str
    ) -> pd.DataFrame:
        with self._get_connection(catalog_name) as connection:
            return pd.read_sql_table(
                table_name=table_name,
                con=connection,
                schema=schema_name,
            )

This should return a pandas DataFrame.

However, since these features are missing from your otherwise great connection, I am required to do the following instead:

    def read(
        self, catalog_name: str, schema_name: str, table_name: str
    ) -> pd.DataFrame:
        with self._get_connection(catalog_name) as connection:
            query = f"SELECT * FROM {catalog_name}.{schema_name}.{table_name}"
            self.logger.debug("Running query '%s'", query)
            df = pd.read_sql(query, connection)
            self.logger.debug(df)
            return df

Similarly, I cannot use to_sql on a DataFrame, where I'd like to do something like the following:

    def write(
        self,
        dataframe: pd.DataFrame,
        catalog_name: str,
        schema_name: str,
        table_name: str,
    ) -> None:
        with self._get_connection(catalog_name) as connection:
            dataframe.to_sql(name=table_name, con=connection, schema=schema_name)

Both fail with:

databricks.sql.exc.NotSupportedError: Transactions are not supported on Databricks

Versions:

python 3.10.6
databricks-sql-connector==2.2.1
pandas==1.5.2

@susodapop
Contributor

I'm not surprised you're seeing issues here: databricks-sql-connector doesn't support SQLAlchemy as of 2.2.1. We are working on a native dialect in a feature branch, and in our e2e tests with this native dialect read_sql works as you'd expect. We'll see if your issue reproduces once we merge to main. For now I'm closing this, as we would not expect it to work unless you've installed one of our dev builds.
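For illustration, here is a minimal sketch of what reading could look like once the dialect lands. The databricks:// URI layout (token auth plus http_path/catalog/schema query parameters) and the table/schema names are assumptions for the example, not the confirmed final API:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection URI for the upcoming dialect; the exact
    # parameters are placeholders, not the confirmed format.
    engine = create_engine(
        "databricks://token:<access_token>@<server_hostname>"
        "?http_path=<http_path>&catalog=<catalog>&schema=<schema>"
    )

    # With a SQLAlchemy connectable, pandas can resolve the table directly.
    df = pd.read_sql_table("my_table", con=engine, schema="my_schema")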

@susodapop
Contributor

Also note that Databricks doesn't support transactions, but transaction support isn't required for SQLAlchemy to work. In our native dialect we disable these warnings 👍
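As a general pattern (not this project's actual code), SQLAlchemy dialects for backends without transaction support typically make the commit/rollback hooks no-ops so SQLAlchemy and pandas can drive them without raising; a generic sketch:

    from sqlalchemy.engine import default

    class NoTransactionDialect(default.DefaultDialect):
        # Swallow transaction calls so SQLAlchemy's usual commit/rollback
        # flow works against a backend with no transaction support.
        def do_rollback(self, dbapi_connection):
            pass

        def do_commit(self, dbapi_connection):
            pass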

@susodapop
Contributor

susodapop commented Dec 15, 2022

Finally: you can follow the development of the internal dialect in #57. Here's an e2e test that uses read_sql_table successfully: https://github.com/databricks/databricks-sql-python/pull/57/files#diff-8390b059d76e4a7f773fb17db2682302d2fbe1768408b5dc666a1bab0e576ad1R43-R56

@C0DK
Author

C0DK commented Dec 19, 2022

Do you know how I'd be able to write a table, then? I'd rather not have to write the whole SQL statement by hand. read_sql works fine for me, but I don't have a similar approach working for writing a table.

@susodapop
Contributor

susodapop commented Dec 19, 2022

You can use the experimental dialect or wait until it merges to main (should be a few weeks; we're in late beta now). With the dialect installed you just call to_sql and it works.
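A minimal sketch of what the write path could look like with the dialect installed; the engine URI and names are placeholders, with the same caveats as the read_sql_table sketch above:

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder URI; the exact parameters are assumptions, not the final API.
    engine = create_engine(
        "databricks://token:<access_token>@<server_hostname>?http_path=<http_path>"
    )

    df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
    # to_sql creates (or appends to) the table through the SQLAlchemy engine.
    df.to_sql(name="my_table", con=engine, schema="my_schema", index=False)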
