Conversation

@kba (Member) commented Jan 17, 2024

When switching to pyproject.toml in #1065 we also started using setuptools_scm which sets the version of the ocrd/ocrd_* packages to the latest git tag or a derivative of it for intermediate versions.

That is a very neat solution and really helpful for debugging but it has two drawbacks:

  • it requires .git to be present as either a directory or a reference to the module directory of the containing repo if a submodule. For non-git checkouts (e.g. when downloading the release tarball), make install does not work
  • setuptools_scm requires setuptools>=61 which is too new for most recent Ubuntu/Debian releases (Ubuntu 20.04: 44, Ubuntu 22.04: 59, Debian 11: 52, Debian 10: 40). While we obviously strongly recommend installing to a venv, it should be possible in a 22.04 container (such as a GH Action) to clone core and run make install without a venv.

So I think the neatness of setuptools_scm is not worth the problems it brings.

Instead, in this PR, we add a file /VERSION with symlinks to that file in the package directories and configure setuptools to look for the version in that file.

This means that for every release, we need to manually update the version in /VERSION and git tag accordingly but that is basically what we've been doing for years, so it really is not much of a change.

However, I'm not sure whether my solution is ideal, so if you have any other ideas on how to have a single source of version information for multiple packages, without resorting to a full-fledged monorepo solution, I'd be happy to hear.
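
For context, the setuptools side of this approach would look roughly like the following per-package pyproject.toml fragment (a sketch; ocrd_utils stands in for each package, with the symlinked VERSION file next to its pyproject.toml):

```toml
# pyproject.toml -- read the version from the (symlinked) VERSION file
[project]
name = "ocrd_utils"
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {file = "VERSION"}
```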

@kba requested review from MehmedGIT and bertsky January 17, 2024 14:21
@bertsky (Collaborator) commented Jan 17, 2024

Sorry to hear that. Since I did not see the issues firsthand, please convince me some more:

  • it requires .git to be present as either a directory or a reference to the module directory of the containing repo if a submodule. For non-git checkouts (e.g. when downloading the release tarball), make install does not work

You mean for the Github release assets? But couldn't we just change the GH workflow to provide true sdists (and/or wheels) for each package?

Also, we could simply configure a fallback_version.

Moreover, for the GH release asset zip, we could try to bake in the envvar SETUPTOOLS_SCM_PRETEND_VERSION with the respective value.
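
Both of those are documented setuptools_scm escape hatches; roughly (the version string below is only illustrative):

```toml
# pyproject.toml -- version used when no .git metadata is available
[tool.setuptools_scm]
fallback_version = "0.0.0"
```

and for the release asset, something like SETUPTOOLS_SCM_PRETEND_VERSION=2.62.0 pip install . at build time.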

  • setuptools_scm requires setuptools>=61 which is too new for most recent Ubuntu/Debian releases (Ubuntu 20.04: 44, Ubuntu 22.04: 59, Debian 11: 52, Debian 10: 40). While we obviously strongly recommend installing to a venv, it should be possible in a 22.04 container (such as a GH Action) to clone core and run make install without a venv.

AFAIK the build requirements are never in the venv, and they are usually fetched from the remote package index during build. Could you show me the concrete error message you got because of that?

@bertsky (Collaborator) commented Jan 17, 2024

Oh, and if we do switch back to file-based, then I believe the VERSION file (or symlink) needs to be in the manifests.

@kba (Member, Author) commented Jan 17, 2024

  • setuptools_scm requires setuptools>=61 which is too new for most recent Ubuntu/Debian releases (Ubuntu 20.04: 44, Ubuntu 22.04: 59, Debian 11: 52, Debian 10: 40). While we obviously strongly recommend installing to a venv, it should be possible in a 22.04 container (such as a GH Action) to clone core and run make install without a venv.

AFAIK the build requirements are never in the venv, and they are usually fetched from the remote package index during build. Could you show me the concrete error message you got because of that?

https://github.com/tboenig/gt_structure_text/actions/runs/7450175483/job/20268478771 here the system-wide old setuptools version is used. This has since been solved with a venv.

@bertsky (Collaborator) commented Jan 17, 2024

https://github.com/tboenig/gt_structure_text/actions/runs/7450175483/job/20268478771

already gone.

here the system-wide old setuptools version is used. This has since been solved with a venv.

ok, so system pip/setuptools did not even bother to fetch the update?

perhaps we need the get_requires_for_build_wheel hook? It says:

It is also possible for a build backend to provide dynamically calculated build dependencies, using PEP 517’s get_requires_for_build_wheel hook. This hook will be called by pip, and dependencies it describes will also be installed in the build environment. For example, newer versions of setuptools expose the contents of setup_requires to pip via this hook.
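
For illustration, a minimal in-tree PEP 517 backend wrapper could use that hook to declare the newer setuptools dynamically. This is a hypothetical sketch, not anything core actually ships; the module name and the version pins are assumptions:

```python
# backend.py -- hypothetical in-tree PEP 517 backend wrapper (sketch only).
# pyproject.toml would point at it via:
#   [build-system]
#   build-backend = "backend"
#   backend-path = ["."]

def get_requires_for_build_wheel(config_settings=None):
    # pip calls this hook and installs whatever it returns into the
    # isolated build environment before building the wheel.
    return ["setuptools>=61", "setuptools_scm>=7"]

def __getattr__(name):
    # delegate all other hooks (build_wheel, build_sdist, ...) to setuptools
    from setuptools import build_meta
    return getattr(build_meta, name)
```

Whether the frontend honors this depends on build isolation, though: in the system-pip case below the system setuptools was evidently used regardless.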

@kba (Member, Author) commented Jan 18, 2024

https://github.com/tboenig/gt_structure_text/actions/runs/7450175483/job/20268478771

already gone.

I think the repo is private. @tboenig, can you make it public or invite @bertsky?

pip install ./ocrd_utils && pip install ./ocrd_models && pip install ./ocrd_modelfactory && pip install ./ocrd_validators && pip install ./ocrd_network && pip install ./ocrd && echo done
Defaulting to user installation because normal site-packages is not writeable
Processing ./ocrd_utils
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml): started
  Building wheel for UNKNOWN (pyproject.toml): finished with status 'done'
  Created wheel for UNKNOWN: filename=UNKNOWN-0.1.dev1+gb247cf4-py3-none-any.whl size=1084 sha256=853b545cd339ace52a4c0960d90c17e393c99601369e230122af5c64346bb240
  Stored in directory: /tmp/pip-ephem-wheel-cache-j7v5r6aa/wheels/c4/d3/6c/7b4472ad1c3816f1bdf503f344c175ee3b983997e2fc96f2cd
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.1.dev1+gb247cf4
Defaulting to user installation because normal site-packages is not writeable
Processing ./ocrd_models
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 3, in <module>
          from setuptools.command.build import build as orig_build
      ModuleNotFoundError: No module named 'setuptools.command.build'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
make: *** [Makefile:128: install] Error 1
Error: Process completed with exit code 2.

here the system-wide old setuptools version is used. This has since been solved with a venv.

ok, so system pip/setuptools did not even bother to fetch the update?

I think so. But to be honest, I don't trust that we can find a setup that reliably installs the right build environment for all combinations of python/pip/setuptools out there.

But the approach here with the symlink also does not work.

  • On CircleCI, installation works fine for all python versions, but the package data for ocrd_models seems to be missing. That would make sense because it is not listed in MANIFEST.in, but why does this break now?
  • For me, building locally for 3.7, the symlinks don't seem to be resolved; installation fails with invalid requirement ocrd_utils == (the version resolves to an empty string)
  • Building 3.8+ works for me
  • For @MehmedGIT, it's the opposite: 3.7 installs fine, 3.8 shows the same requirement/symlink problem.

And all of this trouble just for consistent versions across the packages :(

Some (more or less radical) ideas:

  • Add the version as __version__ to ocrd_utils/__init__.py and reference it from the other packages: does not work, the {attr = 'ocrd_utils.__version__'} construct apparently only works for packages local to the project
  • Have a dedicated release script that copies VERSION to the sub-packages (to avoid the symlink problem) and handles the tagging
  • Get rid of the package separation altogether, only publish one ocrd dist with all the code
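
For reference, the attr construct from the first idea would be (a sketch; as noted, it apparently cannot pull the version of an installed ocrd_utils into another dist):

```toml
# pyproject.toml of another package -- this does NOT work across dists
[project]
name = "ocrd_models"
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "ocrd_utils.__version__"}
```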

@bertsky (Collaborator) commented Jan 19, 2024

But the approach here with the symlink also does not work.

  • On CircleCI, installation works fine for all python versions, but the package data for ocrd_models seems to be missing. That would make sense because it is not listed in MANIFEST.in, but why does this break now?

I did mention this above.

And your local 3.7 / 3.8 failure, did that have the VERSION in the manifest already?

BTW, the setuptools_scm solution also has another pitfall: it requires you to also git-fetch the tags, which I often forget.

Some (more or less radical) ideas:

  • Have a dedicated release script that copies VERSION to the sub-packages (to avoid the symlink problem) and handles the tagging

Like ocrd_all's release.sh release-github?

Ok, but then we might as well have a git-controlled VERSION file in each subpackage, and update that in the release script.

  • Get rid of the package separation altogether, only publish one ocrd dist with all the code

Yes, it (surprisingly) is a non-standard configuration to have multiple version-synced packages in one repo. But changing the package names would be a major break – isn't it a bit late to do that now?

@kba (Member, Author) commented Jan 19, 2024

And your local 3.7 / 3.8 failure, did that have the VERSION in the manifest already?

I tried both with and without VERSION in MANIFEST.in, but 3.7 fails in either case.

BTW, the setuptools_scm solution also has another pitfall: it requires you to also git-fetch the tags, which I often forget.

Some (more or less radical) ideas:

  • Have a dedicated release script that copies VERSION to the sub-packages (to avoid the symlink problem) and handles the tagging

Like ocrd_all's release.sh release-github?

Ok, but then we might as well have a git-controlled VERSION file in each subpackage, and update that in the release script.

Exactly, that's what that script would do:

new_version=$1
for pkg in ocrd ocrd_utils ...; do
  echo "$new_version" > "$pkg/VERSION"
done
git add **/VERSION
git commit -m ":package: $new_version"
git tag "v$new_version"

something along those lines

  • Get rid of the package separation altogether, only publish one ocrd dist with all the code

Yes, it (surprisingly) is a non-standard configuration to have multiple version-synced packages in one repo. But changing the package names would be a major break – isn't it a bit late to do that now?

We would not change the package names, just combine them all into a single dist ocrd. We'd get rid of the inter-dist dependencies (ocrd_utils == ... in the dependencies), move the packages to a src dir and have all the code in one place.

It is still a breaking change of course, but considering we need to streamline OCR-D as much as possible to make life easier for future maintainers, it might be worth it. Users would not need to change any of their code, just require ocrd instead of ocrd_* in their build setup. And I suspect most people have been installing only the ocrd package anyway.
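
A rough sketch of what the combined pyproject.toml could look like (names and layout are assumptions, not the final form from the draft PR):

```toml
# single-dist pyproject.toml with src layout (sketch)
[project]
name = "ocrd"
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {file = "VERSION"}

[tool.setuptools.packages.find]
where = ["src"]
include = ["ocrd*"]
```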

I made a quick draft of how this could work in #1166.

@bertsky (Collaborator) commented Jan 19, 2024

just require ocrd instead of ocrd_* in their build setup. And I suspect most people will have been installing only the ocrd package anyway.

I just took a cursory look: so far, the following packages require other subpackages (besides or in lieu of ocrd):

  • mets-mods2tei: ocrd_utils
  • textract2page: ocrd_utils, ocrd_models

That's not just my local installation, but also what PyPI seems to see as reverse dependencies.

Obviously, I can quickly adapt them.

@kba (Member, Author) commented Jan 19, 2024

That's not just my local installation, but also what PyPI seems to see as reverse dependencies.

Obviously, I can quickly adapt them.

OK, thanks for checking. Then let's make the switch to a src-layout and a single dist, i.e. continue #1166 .

@kba closed this Jan 19, 2024
@bertsky (Collaborator) commented Jan 19, 2024

Then let's make the switch to a src-layout and a single dist, i.e. continue #1166 .

Agreed.

@kba deleted the setuptools-explicit-version branch March 13, 2024 12:32