Merged
Changes from all commits
Commits (43)
f4e9cdb
Converted tests to pytest. Build a Python package. Update requirement…
rjurney Feb 16, 2025
c256244
Restore Python .gitignore
rjurney Feb 16, 2025
6c3df0b
Extra newline removed
rjurney Feb 16, 2025
b2838d2
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Feb 16, 2025
caf5091
Added VERSION file set to 0.8.5
rjurney Feb 16, 2025
7cfa2d1
isort; fixed edgesDF variable name.
rjurney Feb 16, 2025
2ca9a15
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Feb 16, 2025
a8bf0be
Back out Dockerfile changes
rjurney Feb 16, 2025
54a942d
Back out version change in build.sbt
rjurney Feb 16, 2025
8b0e346
Backout changes to config and run-tests
rjurney Feb 16, 2025
46c2b93
Back out pytest conversion
rjurney Feb 16, 2025
18b5da0
Back out version changes to make nose tests pass
rjurney Feb 16, 2025
8eca097
Remove changes to requirements
rjurney Feb 16, 2025
277c06f
Put nose back in requirements.txt
rjurney Feb 16, 2025
b55ee48
Remove version bump to version.sbt
rjurney Feb 16, 2025
f8a8fd9
Remove packages related to testing
rjurney Feb 16, 2025
bc2cb36
Remove old setup.py / setup.cfg
rjurney Feb 16, 2025
728be33
New pyproject.toml and poetry.lock
rjurney Feb 16, 2025
3cea1a8
Short README for Python package, poetry won't allow a ../README.md path
rjurney Feb 16, 2025
87cc975
Remove requirements files in favor of pyproject.toml
rjurney Feb 16, 2025
6f84a5a
Try to poetrize CI build
rjurney Feb 16, 2025
9a8eef0
pyspark min 3.4
rjurney Feb 16, 2025
75ecd99
Local python README in pyproject.toml
rjurney Feb 16, 2025
80231d0
Trying to remove the working folder to debug scala issue
rjurney Feb 16, 2025
2a9170b
Set Python working directory again
rjurney Feb 16, 2025
3de2263
Accidental newline
rjurney Feb 16, 2025
4662717
Install Python for test...
rjurney Feb 17, 2025
1b7b9f8
Run tests from python/ folder
rjurney Feb 17, 2025
58da493
Try running tests from python/
rjurney Feb 17, 2025
9f4aa24
poetry run the unit tests
rjurney Feb 17, 2025
11b2782
poetry run the tests
rjurney Feb 17, 2025
9772344
Try just using 'python' instead of a path
rjurney Feb 17, 2025
d55dbfe
poetry run the last line, graphframes.main
rjurney Feb 17, 2025
2fc4d08
Remove test/ folder from style paths, it doesn't exist
rjurney Feb 17, 2025
8297a13
Remove .vscode
rjurney Feb 17, 2025
2035d98
VERSION back to 0.8.4
rjurney Feb 17, 2025
f9f4bd7
Remove tutorials reference
rjurney Feb 17, 2025
9ddd6b2
VERSION is a Python thing, it belongs in python/
rjurney Feb 17, 2025
7065647
Include the README.md and LICENSE in the Python package
rjurney Feb 17, 2025
a6c7e91
Some classifiers for pyproject.toml
rjurney Feb 17, 2025
51e3e6d
Trying poetry install action instead of manual install
rjurney Feb 17, 2025
272be06
Removing SPARK_HOME
rjurney Feb 17, 2025
4587999
Returned SPARK_HOME settings
rjurney Feb 17, 2025
20 changes: 14 additions & 6 deletions .github/workflows/python-ci.yml
@@ -31,12 +31,20 @@ jobs:
       - uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
-      - name: Install python dependencies
+      - name: Install and configure Poetry
+        uses: snok/install-poetry@v1
+        with:
+          version: 2.1.1
+          virtualenvs-create: true
+          virtualenvs-in-project: false
+          installer-parallel: true
+      - name: Build Python package and its dependencies
+        working-directory: ./python
         run: |
-          python -m pip install --upgrade pip wheel
-          pip install -r ./python/requirements.txt
-          pip install pyspark==${{ matrix.spark-version }}
+          poetry build
+          poetry install
       - name: Test
+        working-directory: ./python
         run: |
-          export SPARK_HOME=$(python -c "import os; from importlib.util import find_spec; print(os.path.join(os.path.dirname(find_spec('pyspark').origin)))")
-          ./python/run-tests.sh
+          export SPARK_HOME=$(poetry run python -c "import os; from importlib.util import find_spec; spec = find_spec('pyspark'); print(os.path.join(os.path.dirname(spec.origin)))")
Collaborator:
Why do we need it? Tests will work even without SPARK_HOME

+          ./run-tests.sh
9 changes: 9 additions & 0 deletions .gitignore
@@ -27,3 +27,12 @@ project/plugins/project/

# Mac
*.DS_Store

# Python specific
python/build
python/dist
build/lib
python/graphframes.egg-info
python/graphframes/tutorials/data
python/docs/_build
python/docs/_site
Empty file removed: VERSION
5 changes: 5 additions & 0 deletions python/MANIFEST.in
@@ -2,3 +2,8 @@
# https://github.com/pypa/sampleproject/blob/master/MANIFEST.in
# For more details about the MANIFEST file, you may read the docs at
# https://docs.python.org/2/distutils/sourcedist.html#the-manifest-in-template
recursive-include python/graphframes *.py
recursive-exclude * __pycache__
recursive-exclude * *.pyc
include README.md
include LICENSE
17 changes: 17 additions & 0 deletions python/README.md
@@ -0,0 +1,17 @@
# GraphFrames `graphframes-py` Python Package

This is the official [graphframes-py PyPI package](https://pypi.org/project/graphframes-py/), a Python wrapper for the Scala GraphFrames library. It is maintained by the GraphFrames project and published on PyPI.

For instructions on using GraphFrames, see the project [README](../README.md). See [Installation and Quick-Start](#installation-and-quick-start) for the best way to install and use GraphFrames.

## Running `graphframes-py`

You should use GraphFrames via the `--packages` argument to `pyspark` or `spark-submit`, but this package is helpful in development environments.

```bash
# Interactive Python
$ pyspark --packages graphframes:graphframes:0.8.4-spark3.5-s_2.12

# Submit a script in Scala/Java/Python
$ spark-submit --packages graphframes:graphframes:0.8.4-spark3.5-s_2.12 script.py
```
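As a sketch of what local development with this package looks like, the example below assumes a Spark session that pulls the GraphFrames JAR via `spark.jars.packages` (matching the coordinates above); the toy vertex and edge data are purely illustrative:

```python
from pyspark.sql import SparkSession

from graphframes import GraphFrame

# Pull the GraphFrames JAR at session start; the coordinates mirror the --packages example above.
spark = (
    SparkSession.builder.appName("graphframes-dev")
    .config("spark.jars.packages", "graphframes:graphframes:0.8.4-spark3.5-s_2.12")
    .getOrCreate()
)

# Toy graph: vertices need an `id` column, edges need `src` and `dst` columns.
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()
```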
1 change: 1 addition & 0 deletions python/VERSION
@@ -0,0 +1 @@
0.8.4
360 changes: 360 additions & 0 deletions python/poetry.lock

Large diffs are not rendered by default.

48 changes: 48 additions & 0 deletions python/pyproject.toml
@@ -0,0 +1,48 @@
[tool.poetry]
name = "graphframes-py"
version = "0.8.4"
Collaborator:
Let's use something like this: https://pypi.org/project/poetry-dynamic-versioning/?
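A minimal sketch of how that could look, assuming the `poetry-dynamic-versioning` build plugin with git tags as the version source (the exact version bounds are illustrative):

```toml
[tool.poetry]
name = "graphframes-py"
# Placeholder; the actual version is derived from the latest git tag at build time.
version = "0.0.0"

[tool.poetry-dynamic-versioning]
enable = true
vcs = "git"

[build-system]
requires = ["poetry-core>=1.0.0", "poetry-dynamic-versioning>=1.0.0,<2.0.0"]
build-backend = "poetry_dynamic_versioning.backend"
```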

description = "GraphFrames: Graph Processing Framework for Apache Spark"
authors = ["GraphFrames Contributors <[email protected]>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "graphframes"}]
classifiers = [
"Development Status :: 4 - Beta",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12"
]

[tool.poetry.urls]
"Project Homepage" = "https://graphframes.github.io/graphframes"
"PyPi Homepage" = "https://pypi.org/project/graphframes-py"
"Code Repository" = "https://github.com/graphframes/graphframes"
"Bug Tracker" = "https://github.com/graphframes/graphframes/issues"

[tool.poetry.dependencies]
python = ">=3.9 <3.13"
nose = "1.3.7"
pyspark = "^3.4"
numpy = ">= 1.7"

[tool.poetry.group.dev.dependencies]
black = "^25.1.0"
flake8 = "^7.1.1"
isort = "^6.0.0"
Collaborator:
It looks like pytest is missing.
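If the pytest conversion from the earlier commits is restored, a dev-group entry along these lines would cover it (the version bound is illustrative):

```toml
[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
```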


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.black]
line-length = 100
target-version = ["py39"]
include = ["graphframes"]

[tool.isort]
profile = "black"
src_paths = ["graphframes"]
3 changes: 0 additions & 3 deletions python/requirements.txt

This file was deleted.

20 changes: 10 additions & 10 deletions python/run-tests.sh
@@ -38,7 +38,7 @@ echo $pyver

LIBS=""
for lib in "$SPARK_HOME/python/lib"/*zip ; do
LIBS=$LIBS:$lib
LIBS=$LIBS:$lib
done

# The current directory of the script.
@@ -51,7 +51,7 @@ assembly_path="$DIR/../target/scala-$scala_version_major_minor"
echo `ls $assembly_path/graphframes-assembly*.jar`
JAR_PATH=""
for assembly in $assembly_path/graphframes-assembly*.jar ; do
JAR_PATH=$assembly
JAR_PATH=$assembly
done

export PYSPARK_SUBMIT_ARGS="--driver-memory 2g --executor-memory 2g --jars $JAR_PATH pyspark-shell "
@@ -64,14 +64,14 @@ export PYTHONPATH=$PYTHONPATH:graphframes
# Run test suites

if [[ "$python_major" == "2" ]]; then

# Horrible hack for spark 1.x: we manually remove some log lines to stay below the 4MB log limit on Travis.
$PYSPARK_DRIVER_PYTHON `which nosetests` -v --all-modules -w $DIR 2>&1 | grep -vE "INFO (ParquetOutputFormat|SparkContext|ContextCleaner|ShuffleBlockFetcherIterator|MapOutputTrackerMaster|TaskSetManager|Executor|MemoryStore|CacheManager|BlockManager|DAGScheduler|PythonRDD|TaskSchedulerImpl|ZippedPartitionsRDD2)";

# Horrible hack for spark 1.x: we manually remove some log lines to stay below the 4MB log limit on Travis.
poetry run python `which nosetests` -v --all-modules -w $DIR 2>&1 | grep -vE "INFO (ParquetOutputFormat|SparkContext|ContextCleaner|ShuffleBlockFetcherIterator|MapOutputTrackerMaster|TaskSetManager|Executor|MemoryStore|CacheManager|BlockManager|DAGScheduler|PythonRDD|TaskSchedulerImpl|ZippedPartitionsRDD2)";
else

$PYSPARK_DRIVER_PYTHON -m "nose" -v --all-modules -w $DIR 2>&1 | grep -vE "INFO (ParquetOutputFormat|SparkContext|ContextCleaner|ShuffleBlockFetcherIterator|MapOutputTrackerMaster|TaskSetManager|Executor|MemoryStore|CacheManager|BlockManager|DAGScheduler|PythonRDD|TaskSchedulerImpl|ZippedPartitionsRDD2)";

poetry run python -m "nose" -v --all-modules -w $DIR 2>&1 | grep -vE "INFO (ParquetOutputFormat|SparkContext|ContextCleaner|ShuffleBlockFetcherIterator|MapOutputTrackerMaster|TaskSetManager|Executor|MemoryStore|CacheManager|BlockManager|DAGScheduler|PythonRDD|TaskSchedulerImpl|ZippedPartitionsRDD2)";
fi

# Exit immediately if the tests fail.
@@ -83,4 +83,4 @@ test ${PIPESTATUS[0]} -eq 0 || exit 1;

cd "$DIR"

$PYSPARK_PYTHON -u ./graphframes/graphframe.py "$@"
poetry run python -u ./graphframes/graphframe.py "$@"
2 changes: 0 additions & 2 deletions python/setup.cfg

This file was deleted.

2 changes: 0 additions & 2 deletions python/setup.py

This file was deleted.
