Conversation
…s.txt and split out requirements-dev.txt. Version bumps.
…ney/build-upgrades
Let's move it to python/tests/test.py?
We should really split it up into parts, but I want to wait and just keep it tests.py for now. I'll do another PR soon to break it up into like tests/connected_components.py and stuff.
python/graphframes/tests.py
Outdated
    "spark.submit.pyFiles",
    os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
)
cls.sc = SparkContext(master="local[4]", appName="GraphFramesTests", conf=cls.conf)
Can we avoid any usage of the SparkContext at all and rely only on a SparkSession? That allows us to smoothly test both Spark Classic and Spark Connect. You can see how it is done in #506
@SemyonSinchenko do you mean this? Can I do the equivalent in Python?
case MethodCase.CONNECTED_COMPONENTS => {
val cc = apiMessage.getConnectedComponents
graphFrame.connectedComponents
.setAlgorithm(cc.getAlgorithm)
.setCheckpointInterval(cc.getCheckpointInterval)
.setBroadcastThreshold(cc.getBroadcastThreshold)
.run()
}
@SemyonSinchenko it looks like this code was removed, so no longer an issue?
@SemyonSinchenko sorry, what about this? I was confused - I see checkpointing happening elsewhere.
cls.spark._jsc.setCheckpointDir(cls.checkpointDir)
Can we avoid any usage of the SparkContext at all and rely only on a SparkSession? That allows us to smoothly test both Spark Classic and Spark Connect. You can see how it is done in #506
I don't think so, no. At least I do not know how to accomplish that. My attempt went badly :)
def spark_session():
    # Create a SparkSession with a smaller number of shuffle partitions.
    spark = (
        SparkSession(GraphFrameTestUtils.sc)
We can just explicitly pass all the same confs (app name, etc.) to the SparkSession.builder
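A minimal sketch of what passing the confs to the builder might look like (assuming the same master, app name, and wheel path as in the diff above; this is an illustration, not the PR's actual code, and the Spark Connect behavior is untested):

```python
# Sketch: build the session directly with SparkSession.builder instead of
# SparkConf + SparkContext. Values mirror the SparkConf setup in the diff.
import os

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[4]")
    .appName("GraphFramesTests")
    .config(
        "spark.submit.pyFiles",
        os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
    )
    .getOrCreate()
)
```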
Sorry, what do you want me to do here? :) I don't understand.
@SemyonSinchenko I want to skip this comment... it is hard to keep things working once I monkey with how this code works. I spent an hour and I'd rather leave this alone, ship pytest and then Spark Connect :)
python/graphframes/tests.py
Outdated
cls.conf = SparkConf().setAppName("GraphFramesTests")
cls.conf.set(
    "spark.submit.pyFiles",
    os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
)
Why do we need it? If the project is installed via poetry install to the same venv, like pyspark, it will be here anyway.
ranks = (
    graph.pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(Pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(Pregel.src("rank") / Pregel.src("outDegree"))
    .aggMsgs(F.sum(Pregel.msg()))
    .run()
)
Suggested change:

pregel = graph.pregel
ranks = (
    pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(pregel.src("rank") / pregel.src("outDegree"))
    .aggMsgs(F.sum(pregel.msg()))
    .run()
)
Because with support of Spark Connect there will be two possible types (Pregel classic and Pregel connect).
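For reference, the update rule the Pregel code above encodes — each vertex sends rank/outDegree along its out-edges, sums incoming messages, and sets its new rank to coalesce(msg, 0) * (1 - alpha) + alpha / numVertices — can be sketched in plain Python (illustrative names, no Spark dependency; assumes every vertex has at least one out-edge, as the division by outDegree does):

```python
def pagerank(edges, num_vertices, alpha=0.15, max_iter=5):
    """Damped PageRank, mirroring the Pregel update rule above.

    edges: list of (src, dst) pairs over vertices 0..num_vertices-1.
    """
    # outDegree per source vertex
    out_degree = {}
    for src, _dst in edges:
        out_degree[src] = out_degree.get(src, 0) + 1

    # Initial rank: 1 / numVertices for every vertex
    ranks = {v: 1.0 / num_vertices for v in range(num_vertices)}

    for _ in range(max_iter):
        # sendMsgToDst + aggMsgs: sum of rank/outDegree over incoming edges
        msgs = {v: 0.0 for v in range(num_vertices)}
        for src, dst in edges:
            msgs[dst] += ranks[src] / out_degree[src]
        # withVertexColumn: coalesce(msg, 0) * (1 - alpha) + alpha / n
        ranks = {
            v: msgs[v] * (1.0 - alpha) + alpha / num_vertices
            for v in range(num_vertices)
        }
    return ranks
```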
@SemyonSinchenko I'm starting my review of your PR now, it will take a few days... but do you mind if we get this one in first?
@SemyonSinchenko trying to finish this as quickly as possible, almost got it done... waiting to see if tests pass :)
Whoah, it builds! @SemyonSinchenko will definitely get your comments addressed this morning so we can merge and I can review your updated PR for Connect!
SemyonSinchenko
left a comment
LGTM overall! I think we should merge it because in #506 I need to change testing fixtures again and there is no clear way to avoid it. So, let's merge it?
@SemyonSinchenko thanks for the flexibility - merging!
What changes were proposed in this pull request?
I converted the unittest/nose tests to pytest.

Why are the changes needed?
This is important because nose does not support Python 3.10, which means we can't support it either and still run unit tests.
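For context, the kind of session-scoped fixture a pytest conversion enables might look like the sketch below (a hypothetical conftest.py; the fixture name, confs, and shuffle-partition setting are illustrative assumptions, not the PR's actual code, and running it requires a Spark installation):

```python
# Hypothetical conftest.py sketch for a pytest-based Spark test suite.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark_session():
    # Fewer shuffle partitions keep local test runs fast.
    spark = (
        SparkSession.builder
        .master("local[4]")
        .appName("GraphFramesTests")
        .config("spark.sql.shuffle.partitions", "4")
        .getOrCreate()
    )
    yield spark
    spark.stop()
```

Tests then take `spark_session` as a parameter instead of inheriting from a unittest base class, which is one reason the conversion simplifies supporting both Spark Classic and Spark Connect later.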