Commit 28f2d47

docs: modernize py dependencies docs and example (#32345)
* feat: update Python multifile docs

  A more common approach to packaging Python package is leveraging pyproject.toml files and having a src directory (instead of a flat directory). This change intends to update the documentation and examples to match this way of packaging Python packages.

* fix: fix juliaset package path
* cleanup: move main file outside src
* docs: address feedback #32345

  Add build-system to pyproject.toml. Improve wording on documentation. Add extra step when using custom images.

* fix: fix juliaset path
* nit: remove extra space
* lint: format setup.py
* nit: reorder entries in pyproject.toml
* update the description

---------

Co-authored-by: tvalentyn <[email protected]>
1 parent a895469 commit 28f2d47

8 files changed

Lines changed: 91 additions & 42 deletions

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[project]
+name = "juliaset"
+version = "0.0.1"
+description = "Julia set workflow package."
+
+# Configure the required packages and scripts to install.
+# Note that the Python Dataflow containers come with numpy already installed
+# so this dependency will not trigger anything to be installed unless a version
+# restriction is specified.
+dependencies = [
+  "numpy"
+]
+
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"

sdks/python/apache_beam/examples/complete/juliaset/setup.py

Lines changed: 8 additions & 18 deletions
@@ -15,14 +15,16 @@
 # limitations under the License.
 #
 
-"""Setup.py module for the workflow's worker utilities.
+"""setup.py module for the pipeline package.
 
-All the workflow related code is gathered in a package that will be built as a
-source distribution, staged in the staging area for the workflow being run and
-then installed in the workers when they start running.
+In this example, the pipeline code is gathered in a package that can be built
+as a source distribution and installed on the workers. The package is defined
+in the pyproject.toml file. You can use the setup.py file for defining
+configuration that needs to be determined programmatically, for example,
+custom commands to run when a package is installed.
 
-This behavior is triggered by specifying the --setup_file command line option
-when running the workflow for remote execution.
+You can install this package into the workers at runtime by using
+the --setup_file pipeline option.
 """
 
 # pytype: skip-file
@@ -107,19 +109,7 @@ def run(self):
       self.RunCustomCommand(command)
 
 
-# Configure the required packages and scripts to install.
-# Note that the Python Dataflow containers come with numpy already installed
-# so this dependency will not trigger anything to be installed unless a version
-# restriction is specified.
-REQUIRED_PACKAGES = [
-    'numpy',
-]
-
 setuptools.setup(
-    name='juliaset',
-    version='0.0.1',
-    description='Julia set workflow package.',
-    install_requires=REQUIRED_PACKAGES,
     packages=setuptools.find_packages(),
     cmdclass={
         # Command class instantiated and run during pip install scenarios.
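The hunk above keeps the custom-command hook in `setup.py` and shows only the call to `self.RunCustomCommand(command)`, not its body. A rough sketch of what such a helper might do is given below: it runs each command as a subprocess and fails loudly on a non-zero exit code. The names `CUSTOM_COMMANDS` and `run_custom_command` and the sample command are assumptions for illustration, not the file's actual code.

```python
import subprocess
import sys

# Hypothetical command list; a real setup.py might install system packages here.
CUSTOM_COMMANDS = [
    [sys.executable, "-c", "print('custom command ran')"],
]


def run_custom_command(command: list) -> str:
    """Run one command, returning its combined output or raising on failure."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "Command %s failed with exit code %s: %s"
            % (command, result.returncode, result.stdout))
    return result.stdout


outputs = [run_custom_command(command) for command in CUSTOM_COMMANDS]
```

In a real `setup.py`, a function like this would typically be called from the `run()` method of a `setuptools.Command` subclass registered through `cmdclass`.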

sdks/python/apache_beam/examples/complete/juliaset/juliaset/__init__.py renamed to sdks/python/apache_beam/examples/complete/juliaset/src/__init__.py

File renamed without changes.
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#

sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py renamed to sdks/python/apache_beam/examples/complete/juliaset/src/juliaset/juliaset.py

File renamed without changes.

sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test.py renamed to sdks/python/apache_beam/examples/complete/juliaset/src/juliaset/juliaset_test.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 
 import pytest
 
-from apache_beam.examples.complete.juliaset.juliaset import juliaset
+from apache_beam.examples.complete.juliaset.src.juliaset import juliaset
 from apache_beam.testing.util import open_shards
 
 
sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset_test_it.py renamed to sdks/python/apache_beam/examples/complete/juliaset/src/juliaset/juliaset_test_it.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 import pytest
 from hamcrest.core.core.allof import all_of
 
-from apache_beam.examples.complete.juliaset.juliaset import juliaset
+from apache_beam.examples.complete.juliaset.src.juliaset import juliaset
 from apache_beam.io.filesystems import FileSystems
 from apache_beam.runners.runner import PipelineState
 from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher

website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md

Lines changed: 32 additions & 22 deletions
@@ -95,43 +95,53 @@ If your pipeline uses packages that are not available publicly (e.g. packages th
 
 Often, your pipeline code spans multiple files. To run your project remotely, you must group these files as a Python package and specify the package when you run your pipeline. When the remote workers start, they will install your package. To group your files as a Python package and make it available remotely, perform the following steps:
 
-1. Create a [setup.py](https://pythonhosted.org/an_example_pypi_project/setuptools.html) file for your project. The following is a very basic `setup.py` file.
+1. Create a [pyproject.toml](https://packaging.python.org/en/latest/tutorials/packaging-projects/) file for your project. The following is a very basic `pyproject.toml` file.
 
-        import setuptools
+        [build-system]
+        requires = ["setuptools"]
+        build-backend = "setuptools.build_meta"
+
+        [project]
+        name = "PACKAGE-NAME"
+        version = "PACKAGE-VERSION"
+        dependencies = [
+          # List Python packages your pipeline depends on.
+        ]
 
-        setuptools.setup(
-           name='PACKAGE-NAME',
-           version='PACKAGE-VERSION',
-           install_requires=[
-             # List Python packages your pipeline depends on.
-           ],
-           packages=setuptools.find_packages(),
-        )
+2. If your package requires some programmatic configuration, or you need to use the `--setup_file` pipeline option, create a `setup.py` file for your project.
 
-2. Structure your project so that the root directory contains the `setup.py` file, the main workflow file, and a directory with the rest of the files, for example:
+        # Note that the package can be completely defined by pyproject.toml.
+        # This file is optional.
+        import setuptools
+        setuptools.setup()
+
+3. Structure your project so that the root directory contains the `pyproject.toml`, the `setup.py` file, and a `src/` directory with the rest of the files. For example:
 
         root_dir/
+          pyproject.toml
           setup.py
-          main.py
-          my_package/
-            my_pipeline_launcher.py
-            my_custom_dofns_and_transforms.py
-            other_utils_and_helpers.py
+          src/
+            main.py
+            my_package/
+              my_pipeline_launcher.py
+              my_custom_dofns_and_transforms.py
+              other_utils_and_helpers.py
 
     See [Juliaset](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset) for an example that follows this project structure.
 
-3. Install your package in the submission environment, for example by using the following command:
+4. Install your package in the submission environment, for example by using the following command:
 
         pip install -e .
 
-4. Run your pipeline with the following command-line option:
+5. If you use a [custom container](#custom-containers), copy and install the package in the container as well.
+
+6. Run your pipeline with the following command-line option:
 
         --setup_file /path/to/setup.py
 
-**Note:** It is not necessary to supply the `--requirements_file` [option](#pypi-dependencies) if the dependencies of your package are defined in the `install_requires` field of the `setup.py` file (see step 1).
-However unlike with the `--requirements_file` option, when you use the `--setup_file` option, Beam doesn't stage the dependent packages to the runner.
-Only the pipeline package is staged. If they aren't already provided in the runtime environment,
-the package dependencies are installed from PyPI at runtime.
+**Note:** It is not necessary to supply the `--requirements_file` [option](#pypi-dependencies) if the dependencies of your package are defined in the
+`dependencies` field of the `pyproject.toml` file (see step 1). However, unlike with the `--requirements_file` option, when you use the `--setup_file` option, Beam doesn't stage the dependent packages to the runner.
+Only the pipeline package is staged. If they aren't already provided in the runtime environment, the package dependencies are installed from PyPI at runtime.
 
 
 ## Non-Python Dependencies or PyPI Dependencies with Non-Python Dependencies {#nonpython}
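
The project layout from the updated steps can be tried out with a throwaway scaffold. This is a minimal sketch using only the standard library: the file names and the `PACKAGE-NAME` / `PACKAGE-VERSION` placeholders come from the documentation above, while the `scaffold` and `setup_file_args` helpers are hypothetical, and nesting `main.py` directly under `src/` is an assumption about the intended layout.

```python
import tempfile
from pathlib import Path

# File contents taken from steps 1 and 2 of the updated documentation.
PYPROJECT_TOML = """[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "PACKAGE-NAME"
version = "PACKAGE-VERSION"
dependencies = [
  # List Python packages your pipeline depends on.
]
"""

SETUP_PY = """# Note that the package can be completely defined by pyproject.toml.
# This file is optional.
import setuptools
setuptools.setup()
"""

# Relative path -> file content; the source files are left empty in this sketch.
LAYOUT = {
    "pyproject.toml": PYPROJECT_TOML,
    "setup.py": SETUP_PY,
    "src/main.py": "",
    "src/my_package/my_pipeline_launcher.py": "",
    "src/my_package/my_custom_dofns_and_transforms.py": "",
    "src/my_package/other_utils_and_helpers.py": "",
}


def scaffold(root: Path) -> list:
    """Hypothetical helper: create the layout under root, return sorted paths."""
    for relative_path, content in LAYOUT.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def setup_file_args(root: Path) -> list:
    """Hypothetical helper: build the --setup_file option described in step 6."""
    return ["--setup_file", str(root / "setup.py")]


with tempfile.TemporaryDirectory() as tmp_dir:
    created = scaffold(Path(tmp_dir))
    args = setup_file_args(Path(tmp_dir))

print("\n".join(created))
```

Before a real run, the placeholder name and version would need to be replaced with valid values, since `pip install -e .` would reject them as written.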
