Recognise spark connect datatype. #9954
I was able to verify the fix.

Before: [screenshot]

After: [screenshot]

Setup: Spark Connect tutorial: https://medium.com/@yssmelo/spark-connect-launch-spark-applications-anywhere-with-the-client-server-architecture-dbt-f99399c566fe

Sample app:

```python
from datetime import date, datetime

import streamlit as st
from pyspark.sql import Row, SparkSession

# Classic (non-Connect) session, kept for comparison:
# spark = SparkSession.builder.getOrCreate()

# Spark Connect session; df below is a pyspark.sql.connect.dataframe.DataFrame
spark = SparkSession.builder.remote("sc://0.0.0.0:15002").getOrCreate()

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])

# Show the concrete dataframe type, then render the dataframe as a table
st.write(type(df))
st.table(df)
```
@raethlein I noticed earlier that this PR failed on one of the linters. Would you like me to apply the changes to fix it, or are you in the middle of making more changes here? I don't have a preference either way.

@OSalama It looks like I cannot push to your branch with some formatting to get the

Update: Oh, I have just seen your comment 🙂 I wanted to help out, but I don't seem to be able to push to your branch. Would you mind trying to run the formatter?

Done!
raethlein left a comment
LGTM 🚀 Thank you!
## Describe your changes

Treats `pyspark.sql.connect.dataframe.DataFrame` as a PySpark dataframe.

## GitHub Issue Link (if applicable)

streamlit#9953

## Testing Plan

- Added mock + test to check it gets properly recognised.
- I've applied this fix in my production environment, with a Spark Connect dataframe which was previously failing to plot, and it now gets processed correctly.

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

---------

Co-authored-by: Benjamin Räthlein <[email protected]>
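
For readers unfamiliar with this kind of check, here is a minimal sketch of recognising a dataframe by its fully qualified class name, together with a mock-based test in the spirit of the testing plan. It is not the code from this PR: the helper names (`fully_qualified_name`, `is_pyspark_dataframe`, `make_mock`) and the `PYSPARK_DF_TYPE_NAMES` constant are hypothetical, and the classic class path `pyspark.sql.dataframe.DataFrame` is an assumption added for illustration. Matching on the name rather than using `isinstance` means `pyspark` does not have to be importable.

```python
from typing import Any

# Assumption: the two class paths that should count as "a PySpark dataframe".
# The Connect path comes from the PR description; the classic path is assumed.
PYSPARK_DF_TYPE_NAMES = frozenset({
    "pyspark.sql.dataframe.DataFrame",
    "pyspark.sql.connect.dataframe.DataFrame",
})


def fully_qualified_name(obj: Any) -> str:
    """Return '<module>.<qualname>' for the class of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def is_pyspark_dataframe(obj: Any) -> bool:
    """Hypothetical check: compare the class path against the known names."""
    return fully_qualified_name(obj) in PYSPARK_DF_TYPE_NAMES


def make_mock(module_name: str) -> Any:
    """Build an instance of a stand-in class named DataFrame in module_name."""
    mock_cls = type("DataFrame", (), {"__module__": module_name})
    return mock_cls()


if __name__ == "__main__":
    connect_df = make_mock("pyspark.sql.connect.dataframe")
    pandas_like = make_mock("pandas.core.frame")
    print(fully_qualified_name(connect_df), is_pyspark_dataframe(connect_df))
    print(fully_qualified_name(pandas_like), is_pyspark_dataframe(pandas_like))
```

The mock only needs the right `__module__` and class name, so a test along these lines can run without a Spark Connect server.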

