[libclang/python] Enable packaging libclang bindings #125806

Open: nightlark wants to merge 1 commit into main

@nightlark nightlark commented Feb 5, 2025

Add files for packaging libclang Python bindings as an sdist tarball and pure Python wheel. setuptools_scm is used to derive version numbers from git tags for a future workflow that automates publishing updated versions for new LLVM releases. The .git_archival.txt file is populated with the version information needed to get accurate version numbers if the bindings are ever installed from an LLVM/clang source code archive. The .gitignore file is populated with files that may get created as part of building/testing the sdist and wheel and that should not be committed to source control.

This would be the first step for addressing #125220. Subsequent steps include a workflow for automating new releases, and sorting out the package name on PyPI.
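
To illustrate what the .git_archival.txt mechanism provides, the sketch below parses the file's `key: value` lines and reports whether git has expanded the `$Format:...$` placeholders, which happens only in `git archive` output for files marked `export-subst`. The function names `parse_git_archival` and `is_expanded` and the "archived" sample values are hypothetical; setuptools_scm has its own parser, which this does not reproduce.

```python
def parse_git_archival(text: str) -> dict:
    """Parse 'key: value' lines from the contents of a .git_archival.txt file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_expanded(fields: dict) -> bool:
    """True when no value still holds an unexpanded $Format:...$ placeholder."""
    return bool(fields) and not any("$Format:" in v for v in fields.values())


# As committed in the repository: the placeholders are still literal text.
committed = (
    "node: $Format:%H$\n"
    "node-date: $Format:%cI$\n"
    "describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$\n"
)

# Hypothetical contents after `git archive` has substituted the placeholders.
archived = (
    "node: 0123456789abcdef0123456789abcdef01234567\n"
    "node-date: 2025-02-05T12:00:00+00:00\n"
    "describe-name: llvmorg-21-init-100-g0123456789ab\n"
)

print(is_expanded(parse_git_archival(committed)))
print(is_expanded(parse_git_archival(archived)))
```

A build from a plain git checkout never needs this file, since the tags are available directly; it only matters for source archives that lack the .git directory.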

github-actions bot commented Feb 5, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot added the clang (Clang issues not falling into any other category) and clang:as-a-library (libclang and C++ API) labels Feb 5, 2025
@llvmbot
Member

llvmbot commented Feb 5, 2025

@llvm/pr-subscribers-clang

Author: Ryan Mast (nightlark)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/125806.diff

4 Files Affected:

  • (added) .git_archival.txt (+3)
  • (modified) .gitattributes (+2)
  • (added) clang/bindings/python/.gitignore (+21)
  • (added) clang/bindings/python/pyproject.toml (+34)
diff --git a/.git_archival.txt b/.git_archival.txt
new file mode 100644
index 000000000000000..7c5100942aae489
--- /dev/null
+++ b/.git_archival.txt
@@ -0,0 +1,3 @@
+node: $Format:%H$
+node-date: $Format:%cI$
+describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
diff --git a/.gitattributes b/.gitattributes
index 6b281f33f737db9..b94d65d60a8840a 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,3 +1,5 @@
+.git_archival.txt  export-subst
+
 libcxx/src/**/*.cpp     merge=libcxx-reformat
 libcxx/include/**/*.h   merge=libcxx-reformat
 
diff --git a/clang/bindings/python/.gitignore b/clang/bindings/python/.gitignore
new file mode 100644
index 000000000000000..da1a0b4b0aa60d2
--- /dev/null
+++ b/clang/bindings/python/.gitignore
@@ -0,0 +1,21 @@
+# setuptools_scm auto-generated version file
+_version.py
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+build/
+dist/
+*.egg-info/
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
\ No newline at end of file
diff --git a/clang/bindings/python/pyproject.toml b/clang/bindings/python/pyproject.toml
new file mode 100644
index 000000000000000..1097f76f7e00787
--- /dev/null
+++ b/clang/bindings/python/pyproject.toml
@@ -0,0 +1,34 @@
+[build-system]
+requires = ["setuptools>=42", "setuptools_scm"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "clang"
+description = "libclang python bindings"
+readme = {file = "README.txt", content-type = "text/plain"}
+
+license = { text = "Apache-2.0 with LLVM exception" }
+authors = [
+    { name = "LLVM" }
+]
+keywords = ["llvm", "clang", "libclang"]
+classifiers = [
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: Apache Software License",
+    "Development Status :: 5 - Production/Stable",
+    "Topic :: Software Development :: Compilers",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+]
+dynamic = ["version"]
+
+[project.urls]
+Homepage = "http://clang.llvm.org/"
+Download = "http://llvm.org/releases/download.html"
+
+[tool.setuptools_scm]
+root = "../../.."
+version_file = "clang/_version.py"
+fallback_version = "0.0.0.dev0"
+# Regex version capture group gets x.y.z with optional -rcN, -aN, -bN suffixes; -init is just consumed
+tag_regex = "^llvmorg-(?P<version>[vV]?\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
\ No newline at end of file

@nightlark
Author

Hey, I'm new to the process of contributing to LLVM, so let me know of any best practices or things I might have missed in the contributing guide.

I'm looking at improving the situation for packaging libclang bindings on PyPI and ideally would like to maintain the things needed here alongside the Python scripts that are being packaged, rather than hacking something together in a separate repository.

@LecrisUT LecrisUT left a comment

Other than missing new-line at the end of file, it looks fine.

License classifiers will soon be (or already are) deprecated, so check with setuptools on when they will support PEP 639.


[project.urls]
Homepage = "http://clang.llvm.org/"
Download = "http://llvm.org/releases/download.html"

https links please. Also consider adding relevant source, documentation, etc.

Author

Is there an official documentation site with info for the clang python bindings?

I don't know where the relevant documentation for that is, if any exists. Maybe wait for a developer to respond on this.

Contributor

> Is there an official documentation site with info for the clang python bindings?

I don't think there is, but the most promising route seems to be getting https://libclang.readthedocs.io made official and regularly updated.

Author

It looks like @sighingnow is the one who maintains that site (doc build scripts at https://github.com/sighingnow/libclang/tree/master/docs). If he's okay with it (my RTD username is rmast), I can get that documentation site updated.

Otherwise, the docs are just built with sphinx so we could upload it as a subpage to one of the documentation sites under the llvm.org domain -- if the source code for those sites is under the llvm org I can look at opening a PR to generate the libclang Python binding docs and deploying them to a website.

@nightlark nightlark force-pushed the add-python-packaging branch 2 times, most recently from 4dd3356 to a4c09ed Compare February 6, 2025 08:04
@nightlark
Author

@Endilll the Python bindings seem like part of LLVM that you have expertise in, if you also want to review this PR.

Contributor

@Endilll Endilll left a comment

Thank you for working on this.
I know too little about packaging in Python to approve, but let's try to get more eyes on this.
CC @DeinAlptraum @sighingnow

version_file = "clang/_version.py"
version_scheme = "no-guess-dev"
# Regex version capture group gets x.y.z with optional -rcN, -aN, -bN suffixes; -init is just consumed
tag_regex = "^llvmorg-(?P<version>\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
Contributor

What are -aN and -bN suffixes? I don't think we've ever used them.

Author

They are the two other pre-release version specifiers defined in PEP 440 (https://peps.python.org/pep-0440/#pre-releases), with a - separator between the segments (https://peps.python.org/pep-0440/#pre-release-separators). If LLVM isn't likely to ever use those for future pre-releases, I can remove them from the regex.
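
To make the separator point concrete: PEP 440 accepts a `-` before a pre-release segment but drops it in the normalized form, so a tag version such as `20.1.0-rc1` corresponds to `20.1.0rc1`. The sketch below handles only that one case with a regular expression; the helper name `normalize_prerelease` is hypothetical, and real tooling would rely on a full PEP 440 parser rather than this shortcut.

```python
import re

# A '-', '.' or '_' separator sitting directly before a trailing rcN/aN/bN segment.
_PRE_SEP = re.compile(r"[-._](?=(?:rc|a|b)\d+$)")


def normalize_prerelease(version: str) -> str:
    """Drop the separator before a trailing rcN/aN/bN pre-release segment."""
    return _PRE_SEP.sub("", version)


for raw in ("20.1.0-rc1", "20.1.0-a2", "19.1.0"):
    print(raw, "->", normalize_prerelease(raw))
```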

Contributor

Of all changes that people consider for our release process, doing more pre-releases is not even on the table, let alone with this specific syntax. I think you should remove those suffixes to avoid confusion. Feel free to leave a comment if you think that people with Python background would be surprised by the lack of those additional suffixes.

No need for explicit comments. The setup there is for maximum correctness, but if there is a standard that the project uses 👍 for simplifying the regex.

Suggested change
- tag_regex = "^llvmorg-(?P<version>\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
+ tag_regex = "^llvmorg-(?P<version>(?:\\.\\d+)+(?:-rc\\d+)?)$"

Author

@nightlark nightlark Feb 7, 2025

I updated it to only match an -rcN suffix as part of the captured version. I had to leave the first \\d+ to match the first number in the version component (the repeating portion requires a . to come before the number).

Since the version tags seem to typically only be on commits in release branches and the main branch only has llvmorg-N-init tags, I left the * so it can pick up the major version number from the llvmorg-N-init tag, and the (?:.*) at the end to ignore the trailing -init. This makes it so that if someone installs from the main branch, it will at least pick up a major version number rather than using the fallback_version.

The typical version number when installing from a random commit in the main branch will be something like 21.post.devN -- so developers installing from main will at least have an indication of the major version of the bindings they've installed.
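
The difference between the suggested pattern and the behaviour described in this reply can be checked with Python's `re` module. In the sketch below, `SUGGESTED` is the pattern from the suggested change, while `REVISED` is a reconstruction from the description above (leading `\d+` kept, only `-rcN` captured, trailing text ignored). The exact pattern in the pushed commit is not shown in this thread, so treat `REVISED` as an assumption; the helper `captured_version` is likewise hypothetical.

```python
import re

# Pattern from the suggested change: every number must be preceded by a dot.
SUGGESTED = re.compile(r"^llvmorg-(?P<version>(?:\.\d+)+(?:-rc\d+)?)$")

# Assumed reconstruction of the revised pattern described in the reply.
REVISED = re.compile(r"^llvmorg-(?P<version>\d+(?:\.\d+)*(?:-rc\d+)?)(?:.*)$")


def captured_version(pattern, tag):
    """Return the 'version' group for a tag, or None if the tag does not match."""
    match = pattern.match(tag)
    return match.group("version") if match else None


for tag in ("llvmorg-19.1.0", "llvmorg-20.1.0-rc1", "llvmorg-21-init"):
    print(tag, captured_version(SUGGESTED, tag), captured_version(REVISED, tag))
```

The suggested pattern matches none of these tags because the first number after `llvmorg-` has no leading dot, which is the reason given above for keeping the initial `\d+`.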

Contributor

@DeinAlptraum DeinAlptraum left a comment

Sorry, saw this but had no time to answer. Unfortunately I'm also not familiar with Python packaging so can't be of much help here

@DeinAlptraum
Contributor

@Endilll @LecrisUT what else do you think needs to be done here to get this approved?

@LecrisUT LecrisUT left a comment

My review is that it looks good to go for the scope that it serves. A follow-up change I would recommend is to align the CI and pyproject.toml across the other Python projects, but it seems that has been broken for a while, so it's a bigger issue to address.

@Endilll
Contributor

Endilll commented Mar 1, 2025

> @Endilll @LecrisUT what else do you think needs to be done here to get this approved?

I still would like to hear from @sighingnow. One of the reasons is that I think I've seen them attending one of the open meetings we have, so they might be around somewhere. Also I barely know anything about how to package this stuff correctly, and what the consequences would be if we get it wrong.

@nightlark
Author

nightlark commented Mar 5, 2025

@Endilll - do you have a way to contact them directly to tell them about this PR? I suspect they are not seeing any notifications for the mentions here, so it will require asking "in-person" at a meeting. I also have a PR (opened >1 month ago) in the repository for the libclang package they maintain, and there's no indication they've seen it or my attempt to reach out to their listed email.

Are there any other people we could ask to review as an alternative, maybe @henryiii?

@nightlark
Author

nightlark commented Mar 5, 2025

For reviewers, here's a summary of the potential issues that come to mind and a recap of what has been done so far to verify things are working as expected (to be clear, all of these potential issues have been addressed, and the sub-bullets are how to check that they are non-issues):

  • The packaging metadata (keywords, name, author/contact info, etc) don't match what we expect
    • check if the pyproject.toml has the right classifiers, keywords, package name, etc. (open question: what email to list as the author?)
  • The source distribution and wheel files built (using python -m build) have more files than necessary (these are the things that would be uploaded to PyPI and actually installed by users)
    • check that, after cloning the LLVM repo, building the sdist and wheel from the folder containing pyproject.toml works (they are created in a dist subfolder by default); then unzip the *.whl file and uncompress the *.tar.gz file to confirm that they contain only the Python bindings and some packaging metadata files (e.g. no other LLVM source code should be present)
    • check if the sdist and wheel files can also be pip installed: pip install clang-19.2.3.tar.gz or pip install clang-19.2.3-py3-none-any.whl
  • The python bindings aren't placed in the right folder to import them when the package is pip installed
    • check if after pip installing the built wheel, Python can import the module using import clang.cindex
  • The version number detected when building a wheel for tagged releases isn't correct
    • check that, after tagging a fake release and building the sdist+wheel as mentioned above, all the file names have the right version number in them (this is the key one for the CI workflow that will release the files users get)
    • check if version numbers picked up on the main branch will just be the major version number (taken from the llvmorg-N-init tag) and likely have a development version component (e.g. clang-21.post1.dev678+g42108e2298cc.tar.gz); only developers working on LLVM will ever install the package in this way and see versions like this
  • Installing directly from a git URL or source archives for a release won't work
    • check if installing from a git URL works: pip install git+https://github.com/nightlark/llvm-project@add-python-packaging#subdirectory=clang/bindings/python
    • check if installing from source archive URL works: pip install https://github.com/nightlark/llvm-project/archive/refs/heads/add-python-packaging.zip#subdirectory=clang/bindings/python
      • end users of the clang Python bindings will be installing from the published PyPI package, so these methods of installation will be rarely used

In practice, the things affecting users will be issues related to the sdist and wheel files built for a specific tagged commit; having the other installation methods work fits more into the category of nice-to-haves for LLVM devs.
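
The wheel-content check in the list above can be partly automated. The sketch below lists archive members that fall outside an allowed set of top-level prefixes. The helper name `unexpected_members` and the allowed prefixes (`clang/` for the bindings, `clang-` for the `.dist-info` directory) are assumptions for illustration, not something defined by this PR, and a small in-memory zip stands in for a real `.whl` file.

```python
import io
import zipfile


def unexpected_members(wheel_file, allowed_prefixes=("clang/", "clang-")):
    """Return archive member names that do not start with an allowed prefix."""
    with zipfile.ZipFile(wheel_file) as archive:
        return [
            name
            for name in archive.namelist()
            if not name.startswith(tuple(allowed_prefixes))
        ]


# Stand-in for a built wheel; a real check would pass the path of a file in dist/.
fake_wheel = io.BytesIO()
with zipfile.ZipFile(fake_wheel, "w") as archive:
    archive.writestr("clang/__init__.py", "")
    archive.writestr("clang/cindex.py", "")
    archive.writestr("clang-19.2.3.dist-info/METADATA", "Name: clang\n")
    # Deliberately misplaced file to show what the check reports.
    archive.writestr("llvm/lib/Support/Foo.cpp", "// should not be packaged\n")

print(unexpected_members(fake_wheel))
```

An empty result for a real wheel would suggest that only the bindings and packaging metadata were included; the sdist could be checked the same way with `tarfile` instead of `zipfile`.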

@LecrisUT

This comment was marked as resolved.

@nightlark
Author

nightlark commented Mar 5, 2025

@LecrisUT all of the situations listed work fine -- the top level bullet points are the "hypothetical issues" that someone trying to build the package could theoretically encounter, and the sub-bullet points are ways to check that the potential issue was addressed and is not actually an issue at all.

@LecrisUT

This comment was marked as resolved.

@LecrisUT

LecrisUT commented Mar 5, 2025

> Also I barely know anything about how to package this stuff correctly, and what the consequences would be if we get it wrong.

Also regarding @Endilll. Nothing can go wrong really. This is still not connected to the CD and you would need permission from clang pypi project for which @nightlark has an issue about trolldbois/python-clang#21. The current state would be helpful to integrate early on, to allow people to experiment and integrate their CI, while any remaining nitpicks would be resolved when working on repairing the python CD (see #123937 and https://github.com/llvm/llvm-project/actions/runs/13661982395).

@trolldbois

> > Also I barely know anything about how to package this stuff correctly, and what the consequences would be if we get it wrong.
>
> Also regarding @Endilll. Nothing can go wrong really. This is still not connected to the CD and you would need permission from clang pypi project for which @nightlark has an issue about trolldbois/python-clang#21. The current state would be helpful to integrate early on, to allow people to experiment and integrate their CI, while any remaining nitpicks would be resolved when working on repairing the python CD (see #123937 and https://github.com/llvm/llvm-project/actions/runs/13661982395).

Happy to integrate with the pypi releases, trusted publishing whenever you are ready.

@Endilll
Contributor

Endilll commented Mar 5, 2025

@nightlark Thank you for the write-up about things that can go wrong.

> open question: what email to list as the author?

This is a good question that I don't know the answer to. Debian packages from our APT repository seem to specify LLVM Packaging Team <[email protected]> as the maintainer and their email.

Sorry for not articulating my concerns clearly enough. I'm not too concerned with bugs, as bugs will eventually be found and fixed, but I'm concerned with bugs that we'll have a hard time fixing due to backwards compatibility concerns. One such thing I can imagine is users depending on a particular version string format, then we realize it doesn't match what the rest of LLVM does, and have a hard time fixing that.
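
One way downstream code could reduce its exposure to the exact version string format is to read only the leading numeric release segment. The sketch below is a hypothetical helper (`release_tuple` is not part of the bindings) that extracts that segment with a regular expression and ignores any pre-release, post-release, development, or local suffix; the sample strings follow the formats mentioned earlier in this thread.

```python
import re

# Leading dotted-number segment, with an optional 'v' prefix.
_RELEASE = re.compile(r"^[vV]?(\d+(?:\.\d+)*)")


def release_tuple(version: str) -> tuple:
    """Return the leading dotted-number segment of a version as a tuple of ints."""
    match = _RELEASE.match(version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


for sample in ("19.2.3", "21.post1.dev678+g42108e2298cc", "0.0.0.dev0"):
    print(sample, release_tuple(sample))
```

Code that compares only `release_tuple(v)[0]` would keep working if the suffix format changed later, which is the kind of dependency the concern above is about.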

> do you have way to contact them directly to tell them about this PR?

Apparently I mixed them up with another person who contributes to llvm-libc, so the answer is likely "no".

> Are there any other people we could ask to review as an alternative, maybe @henryiii?

Sure. They are not going to have experience with LLVM in particular, which is sad, but they will help us where we lack experience the most, i.e. Python packaging stuff.


@nightlark I guess I should make it clear that it's mostly up to me as a maintainer to find the reviewers that I think this patch needs, so you shouldn't feel that you're the one blocking progress here. But I appreciate the help explaining what can go wrong, and even suggesting reviewers. Also, the fact that I've been on leave since the beginning of this week doesn't help with making progress.

description = "libclang python bindings"
readme = {file = "README.txt", content-type = "text/plain"}

license = {text = "Apache-2.0 WITH LLVM-exception"}

It looks like you are trying to do an SPDX classifier expression here; this is supported by most backends now except setuptools (and flit-core doesn't support complex expressions yet). Personally, I'd not use setuptools here; hatchling is faster and does support SPDX expressions. Setuptools is working towards support, but it's several METADATA versions behind, and it had a pre-finalization version of support that now needs to be changed to what was accepted with PEP 639.

If you don't use a modern backend and set license "Apache-2.0 WITH LLVM-exception", then you need to use a License :: ... classifier. This license.text field isn't meaningful, it's really just a place to describe how the license differs from what's in the classifiers. That's one reason this was changed with PEP 639. :)

@@ -0,0 +1,36 @@
[build-system]
requires = ["setuptools>=42", "setuptools_scm==8.1.0"]

Is there a reason you've pinned to setuptools_scm exactly, but not setuptools? I would generally avoid this, unless there's a really good reason to pin it.

@DeinAlptraum
Contributor

@nightlark (ping) are you still looking into this?

@nightlark
Author

Yes, still looking into this -- got a bit busier through around mid-May though.
