
OSalama (Contributor) commented Dec 2, 2024

Describe your changes

Treats pyspark.sql.connect.dataframe.DataFrame as a PySpark dataframe.
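
For readers unfamiliar with this kind of change, one common way to recognise a dataframe class without importing its library is to compare the object's fully qualified type name against a set of known names. The following is a minimal sketch of that idea, not the code from this PR: `PYSPARK_DF_TYPE_NAMES`, `fully_qualified_name`, and `is_pyspark_dataframe` are hypothetical names chosen for illustration, and the classic `pyspark.sql.dataframe.DataFrame` entry is an assumption about what was already supported.

```python
# Hypothetical sketch: these names are illustrative, not Streamlit internals.
PYSPARK_DF_TYPE_NAMES = frozenset(
    {
        # Classic PySpark DataFrame (assumed to be recognised already).
        "pyspark.sql.dataframe.DataFrame",
        # Spark Connect DataFrame, the type this PR is about.
        "pyspark.sql.connect.dataframe.DataFrame",
    }
)


def fully_qualified_name(obj: object) -> str:
    """Return '<module>.<qualname>' for the object's class."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def is_pyspark_dataframe(obj: object) -> bool:
    """Check the type by name, so pyspark itself never has to be imported."""
    return fully_qualified_name(obj) in PYSPARK_DF_TYPE_NAMES
```

Matching on the name rather than using `isinstance` keeps pyspark an optional dependency: the check works even when pyspark is not installed.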

GitHub Issue Link (if applicable)

#9953

Testing Plan

  • Added mock + test to check it gets properly recognised.
  • I've applied this fix in my production environment with a Spark Connect DataFrame that was previously failing to plot, and it now gets processed correctly.
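
As a rough illustration of the "mock + test" approach, a test can use a stand-in class whose module path imitates the Spark Connect DataFrame, so no Spark server or pyspark installation is needed. This is a sketch under stated assumptions, not the test added in this PR: the stand-in class, `looks_like_pyspark_dataframe`, and the test names are all hypothetical.

```python
class DataFrame:
    """Stand-in for the Spark Connect DataFrame; only its type name matters."""

    # Overriding __module__ makes type(obj).__module__ report the Connect path.
    __module__ = "pyspark.sql.connect.dataframe"


def looks_like_pyspark_dataframe(obj: object) -> bool:
    """Hypothetical detector: matches PySpark DataFrame classes by name."""
    cls = type(obj)
    return cls.__qualname__ == "DataFrame" and cls.__module__ in (
        "pyspark.sql.dataframe",
        "pyspark.sql.connect.dataframe",
    )


def test_connect_dataframe_is_recognised():
    assert looks_like_pyspark_dataframe(DataFrame())


def test_unrelated_object_is_rejected():
    assert not looks_like_pyspark_dataframe({"a": [1, 2]})
```

Functions named `test_*` like these are collected automatically by pytest.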

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

raethlein (Collaborator) commented Dec 10, 2024

I was able to verify the fix

Before:

Screenshot 2024-12-10 at 13 56 11

After:

Screenshot 2024-12-10 at 13 56 57


Spark Connect setup tutorial: https://medium.com/@yssmelo/spark-connect-launch-spark-applications-anywhere-with-the-client-server-architecture-dbt-f99399c566fe

Sample App:

from datetime import date, datetime

import streamlit as st
from pyspark.sql import Row, SparkSession

# Local (non-Connect) session, kept for comparison:
# spark = SparkSession.builder.getOrCreate()

# Connect to a Spark Connect server, so DataFrames come from pyspark.sql.connect.
spark = SparkSession.builder.remote("sc://0.0.0.0:15002").getOrCreate()

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])

# Show the concrete DataFrame class, then render the data as a table.
st.write(type(df))
st.table(df)

raethlein added labels Dec 10, 2024:

  • feature:st.dataframe (Related to the `st.dataframe` element)
  • security-assessment-completed (Security assessment has been completed for PR)
  • impact:users (PR changes affect end users)
  • change:feature (PR contains new feature or enhancement implementation)

OSalama (Contributor, Author) commented Dec 10, 2024

@raethlein I noticed earlier that this PR failed on one of the linters. Would you like me to apply the changes to fix or are you in the middle of making more changes here? I don't have a preference either way

raethlein (Collaborator) commented Dec 10, 2024

@OSalama It looks like I cannot push to your branch with some formatting to get the Enforce Pre-Commit Hooks gate to pass. Could you try to run the precommit hooks locally (or at least try to run ruff format) which seems to be the issue here?

Update: Oh I have just seen your comment 🙂 I wanted to help out but I don't seem to be able to push to your branch. Would you mind trying to run the formatter?

OSalama (Contributor, Author) commented Dec 10, 2024

> @OSalama It looks like I cannot push to your branch with some formatting to get the Enforce Pre-Commit Hooks gate to pass. Could you try to run the precommit hooks locally (or at least try to run ruff format) which seems to be the issue here?
>
> Update: Oh I have just seen your comment 🙂 I wanted to help out but I don't seem to be able to push to your branch. Would you mind trying to run the formatter?

Done!

raethlein (Collaborator) left a comment

LGTM 🚀 Thank you!

raethlein merged commit 53abad4 into streamlit:develop Dec 10, 2024 (29 checks passed)
raethlein mentioned this pull request Dec 10, 2024 (2 tasks)
edegp pushed a commit to edegp/streamlit that referenced this pull request Jan 19, 2025
## Describe your changes
Treats `pyspark.sql.connect.dataframe.DataFrame` as a PySpark dataframe.

## GitHub Issue Link (if applicable)
streamlit#9953

## Testing Plan
- Added mock + test to check it gets properly recognised.
- I've applied this fix in my production environment with a Spark
Connect DataFrame that was previously failing to plot, and it now gets
processed correctly.

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.

---------

Co-authored-by: Benjamin Räthlein <[email protected]>