[libclang/python] Enable packaging libclang bindings #125806
@llvm/pr-subscribers-clang

Author: Ryan Mast (nightlark)

Changes

Add files for packaging libclang Python bindings as a sdist tarball and pure Python wheel. setuptools_scm is used to derive version numbers from git tags for a future workflow that automates publishing updated versions for new LLVM releases. The .git_archival.txt file is populated with version information needed to get accurate version information if the bindings are ever installed from an LLVM/clang source code archive. The .gitignore file is populated with files that may get created as part of building/testing the sdist and wheel that should not be committed to source control.

This would be the first step for addressing #125220. Subsequent steps include a workflow for automating new releases, and sorting out the package name on PyPI.

Full diff: https://github.com/llvm/llvm-project/pull/125806.diff

4 Files Affected:
diff --git a/.git_archival.txt b/.git_archival.txt
new file mode 100644
index 000000000000000..7c5100942aae489
--- /dev/null
+++ b/.git_archival.txt
@@ -0,0 +1,3 @@
+node: $Format:%H$
+node-date: $Format:%cI$
+describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
diff --git a/.gitattributes b/.gitattributes
index 6b281f33f737db9..b94d65d60a8840a 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,3 +1,5 @@
+.git_archival.txt export-subst
+
libcxx/src/**/*.cpp merge=libcxx-reformat
libcxx/include/**/*.h merge=libcxx-reformat
diff --git a/clang/bindings/python/.gitignore b/clang/bindings/python/.gitignore
new file mode 100644
index 000000000000000..da1a0b4b0aa60d2
--- /dev/null
+++ b/clang/bindings/python/.gitignore
@@ -0,0 +1,21 @@
+# setuptools_scm auto-generated version file
+_version.py
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+build/
+dist/
+*.egg-info/
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
\ No newline at end of file
diff --git a/clang/bindings/python/pyproject.toml b/clang/bindings/python/pyproject.toml
new file mode 100644
index 000000000000000..1097f76f7e00787
--- /dev/null
+++ b/clang/bindings/python/pyproject.toml
@@ -0,0 +1,34 @@
+[build-system]
+requires = ["setuptools>=42", "setuptools_scm"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "clang"
+description = "libclang python bindings"
+readme = {file = "README.txt", content-type = "text/plain"}
+
+license = { text = "Apache-2.0 with LLVM exception" }
+authors = [
+ { name = "LLVM" }
+]
+keywords = ["llvm", "clang", "libclang"]
+classifiers = [
+ "Intended Audience :: Developers",
+ "License :: OSI Approved :: Apache Software License",
+ "Development Status :: 5 - Production/Stable",
+ "Topic :: Software Development :: Compilers",
+ "Operating System :: OS Independent",
+ "Programming Language :: Python :: 3",
+]
+dynamic = ["version"]
+
+[project.urls]
+Homepage = "http://clang.llvm.org/"
+Download = "http://llvm.org/releases/download.html"
+
+[tool.setuptools_scm]
+root = "../../.."
+version_file = "clang/_version.py"
+fallback_version = "0.0.0.dev0"
+# Regex version capture group gets x.y.z with optional -rcN, -aN, -bN suffixes; -init is just consumed
+tag_regex = "^llvmorg-(?P<version>[vV]?\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
\ No newline at end of file
|
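To make the role of .git_archival.txt more concrete, here is a minimal sketch of how a tool could tell the unexpanded template (as committed in this PR) apart from a copy that `git archive` has filled in through the export-subst attribute. This is not setuptools_scm's actual implementation; the helper names and the sample "archived" values are hypothetical and only serve as an illustration.

```python
# Template text exactly as added by the diff above.
TEMPLATE = (
    "node: $Format:%H$\n"
    "node-date: $Format:%cI$\n"
    "describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$\n"
)

# Made-up example of what the file could look like inside a source archive.
ARCHIVED = (
    "node: 0000000000000000000000000000000000000000\n"
    "node-date: 2025-01-01T00:00:00+00:00\n"
    "describe-name: llvmorg-19.1.0\n"
)


def parse_git_archival(text):
    """Split 'key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_expanded(fields):
    """True if no value still carries an unexpanded $Format:...$ placeholder."""
    return bool(fields) and not any("$Format:" in v for v in fields.values())


for name, text in (("template", TEMPLATE), ("archived", ARCHIVED)):
    fields = parse_git_archival(text)
    print(name, "expanded:", is_expanded(fields), "describe-name:", fields["describe-name"])
```

In a normal git checkout the placeholders stay unexpanded, so a tool would be expected to fall back to querying git directly; only an exported archive carries usable values in this file.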
Hey, I'm new to the process of contributing to LLVM, so let me know of any best practices or things I might have missed in the contributing guide. I'm looking at improving the situation for packaging libclang bindings on PyPI and ideally would like to maintain the things needed here alongside the Python scripts that are being packaged, rather than hacking something together in a separate repository.
Other than the missing newline at the end of the file, it looks fine.
License classifiers will soon be (or already are) deprecated, so check with setuptools on when they will support PEP 639.
clang/bindings/python/pyproject.toml
Outdated
[project.urls]
Homepage = "http://clang.llvm.org/"
Download = "http://llvm.org/releases/download.html"
HTTPS links, please. Also consider adding links to the relevant source, documentation, etc.
Is there an official documentation site with info for the clang python bindings?
I don't know where the relevant documentation for that is, if any. Maybe wait for a developer to respond on this.
> Is there an official documentation site with info for the clang python bindings?
I don't think there is, but the most promising route seems to be getting https://libclang.readthedocs.io made official and regularly updated.
It looks like @sighingnow is the one who maintains that site (doc build scripts at https://github.com/sighingnow/libclang/tree/master/docs). If he's okay with it (my RTD username is rmast), I can get that documentation site updated.
Otherwise, the docs are just built with sphinx so we could upload it as a subpage to one of the documentation sites under the llvm.org domain -- if the source code for those sites is under the llvm org I can look at opening a PR to generate the libclang Python binding docs and deploying them to a website.
4dd3356
to
a4c09ed
Compare
@Endilll the Python bindings seem like part of LLVM that you have expertise in, if you also want to review this PR. |
Thank you for working on this.
I know too little about Python packaging to approve, but let's try to get more eyes on this.
CC @DeinAlptraum @sighingnow
clang/bindings/python/pyproject.toml
Outdated
version_file = "clang/_version.py"
version_scheme = "no-guess-dev"
# Regex version capture group gets x.y.z with optional -rcN, -aN, -bN suffixes; -init is just consumed
tag_regex = "^llvmorg-(?P<version>\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
What are the -aN and -bN suffixes? I don't think we've ever used them.
They are the two other pre-release version specifiers defined in PEP 440 (https://peps.python.org/pep-0440/#pre-releases), with a - separator between the segments (https://peps.python.org/pep-0440/#pre-release-separators). If LLVM isn't likely to ever use those for future pre-releases I can remove them from the regex.
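As a rough illustration of the separator rule referenced above, the sketch below rewrites a version such as 19.1.0-rc2 into the separator-free form that PEP 440 treats as normal. `normalize_prerelease` is a hypothetical helper written for this example; it only covers the a/b/rc spellings with an optional separator before them and is not a full PEP 440 parser.

```python
import re

# Hypothetical, deliberately narrow pattern: a dotted release segment, an
# optional '.', '-' or '_' separator, then a/b/rc followed by a number.
_PRERELEASE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)[-._]?(?P<kind>a|b|rc)(?P<num>\d+)$"
)


def normalize_prerelease(version):
    """Drop the separator before a pre-release segment, if there is one."""
    m = _PRERELEASE.match(version)
    if m is None:
        # Not a pre-release in the narrow form handled here; leave untouched.
        return version
    return f"{m['release']}{m['kind']}{m['num']}"


for v in ("19.1.0-rc2", "19.1.0-a1", "19.1.0_b3", "19.1.0"):
    print(v, "->", normalize_prerelease(v))
```

Real packaging code would normally rely on an existing PEP 440 implementation rather than a hand-written pattern like this one.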
Of all changes that people consider for our release process, doing more pre-releases is not even on the table, let alone with this specific syntax. I think you should remove those suffixes to avoid confusion. Feel free to leave a comment if you think that people with Python background would be surprised by the lack of those additional suffixes.
No need for explicit comments. The setup there is for maximum correctness, but if there is a standard that the project uses 👍 for simplifying the regex.
- tag_regex = "^llvmorg-(?P<version>\\d+(?:\\.\\d+)*(?:-(?:rc|a|b)\\d+)?)(?:.*)$"
+ tag_regex = "^llvmorg-(?P<version>(?:\\.\\d+)+(?:-rc\\d+)?)$"
I updated it to only match an -rcN suffix as part of the captured version. I had to leave the first \\d+ to match the first number in the version component (the repeating portion requires a . to come before the number).
Since the version tags seem to typically only be on commits in release branches and the main branch only has llvmorg-N-init tags, I left the * so it can pick up the major version number from the llvmorg-N-init tag, and the (?:.*) at the end to ignore the trailing -init. This makes it so that if someone installs from the main branch it will at least pick up a major version number rather than using the fallback_version.
The typical version number when installing from a random commit in the main branch will be something like 21.post.devN -- so developers installing from main will at least have an indication of the major version of the bindings they've installed.
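The behaviour described in this comment can be checked with a short script. The pattern below is an assumed reconstruction based on the description (the original regex with the a/b alternatives dropped); the regex actually committed in the PR may differ. The second pattern is the reviewer's suggestion copied from the thread, included to show why the leading \d+ had to stay. The tag names are examples of the naming scheme only.

```python
import re

# Assumed reconstruction of the simplified pattern; not copied from the PR.
TAG_REGEX = re.compile(r"^llvmorg-(?P<version>\d+(?:\.\d+)*(?:-rc\d+)?)(?:.*)$")

# The reviewer's suggested pattern, as written in the suggestion above.
SUGGESTED_REGEX = re.compile(r"^llvmorg-(?P<version>(?:\.\d+)+(?:-rc\d+)?)$")


def extract_version(tag, pattern=TAG_REGEX):
    """Return the captured version string, or None if the tag does not match."""
    m = pattern.match(tag)
    return m.group("version") if m else None


for tag in ("llvmorg-19.1.0", "llvmorg-19.1.0-rc2", "llvmorg-21-init"):
    print(
        tag,
        "->", extract_version(tag),
        "| suggested:", extract_version(tag, SUGGESTED_REGEX),
    )
```

With the reconstructed pattern, a release tag yields the full x.y.z version, an -rcN tag keeps its suffix, and an llvmorg-N-init tag yields only the major number. The suggested pattern matches none of these tags, because it requires a literal dot immediately after the llvmorg- prefix.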
Sorry, saw this but had no time to answer. Unfortunately I'm also not familiar with Python packaging so can't be of much help here
a4c09ed
to
42108e2
Compare
My review is that it looks good to go for the scope that it serves. A follow-up change I would recommend is to align the CI and pyproject.toml across the other Python projects, but it seems that has been broken for a while, so it's a bigger issue to address.
I still would like to hear from @sighingnow. One of the reasons is that I think I've seen them attending one of the open meetings we have, so they might be around somewhere. Also I barely know anything about how to package this stuff correctly, and what the consequences would be if we get it wrong. |
@Endilll - do you have a way to contact them directly to tell them about this PR? I suspect they are not seeing any notifications for the mentions here, and reaching them will require asking "in-person" at a meeting. I also have a PR (opened >1 month ago) in the repository for the libclang package they maintain, and there's no indication they've seen it or my attempt to reach out to their listed email. Are there any other people we could ask to review as an alternative, maybe @henryiii?
For reviewers, here's a summary of the potential issues that come to mind and a recap of what has been done so far to verify things are working as expected (to be clear, all of these potential issues have been addressed, and the sub-bullets are how to check that they are non-issues):
In practice, the issues affecting users will be those related to the sdist and wheel files built for a specific tagged commit; getting the other installation methods working fits more into the category of nice-to-haves for LLVM devs.
@LecrisUT all of the situations listed work fine -- the top level bullet points are the "hypothetical issues" that someone trying to build the package could theoretically encounter, and the sub-bullet points are ways to check that the potential issue was addressed and is not actually an issue at all. |
Also regarding @Endilll. Nothing can go wrong really. This is still not connected to the CD and you would need permission from |
Happy to integrate with the pypi releases, trusted publishing whenever you are ready. |
@nightlark Thank you for the write-up about things that can go wrong.
This is a good question I don't know the answer to. Debian packages from our APT repository seem to specify
Sorry for not articulating my concerns clearly enough. I'm not too concerned with bugs, as bugs will eventually be found and fixed, but I'm concerned with bugs that we'll have a hard time fixing due to backwards compatibility concerns. One such thing I can imagine is users depending on a particular version string format, then we realize it doesn't match what the rest of LLVM does, and have a hard time fixing that.
Apparently I mixed them up with another person who contributes to llvm-libc, so the answer is likely "no".
Sure. They are not going to have experience with LLVM in particular, which is sad, but they will help us where we lack experience the most, i.e. Python packaging stuff. @nightlark I guess I should make it clear that it's mostly up to me as a maintainer to find the reviewers that I think this patch needs, so you shouldn't feel that you're the one blocking progress here. But I appreciate the help explaining what can go wrong, and even suggesting reviewers. Also, the fact that I've been on leave since the beginning of this week doesn't help with making progress.
description = "libclang python bindings"
readme = {file = "README.txt", content-type = "text/plain"}

license = {text = "Apache-2.0 WITH LLVM-exception"}
It looks like you are trying to do an SPDX classifier expression here; this is supported by most backends now except setuptools (and flit-core doesn't support complex expressions yet). Personally, I'd not use setuptools here; hatchling is faster and does support SPDX expressions. Setuptools is working towards support, but it's several METADATA versions behind, and it had a pre-finalization version of support that now needs to be changed to what was accepted with PEP 639.
If you don't use a modern backend and set license "Apache-2.0 WITH LLVM-exception", then you need to use a License :: ... classifier. This license.text field isn't meaningful; it's really just a place to describe how the license differs from what's in the classifiers. That's one reason this was changed with PEP 639. :)
@@ -0,0 +1,36 @@
[build-system]
requires = ["setuptools>=42", "setuptools_scm==8.1.0"]
Is there a reason you've pinned to setuptools_scm exactly, but not setuptools? I would generally avoid this, unless there's a really good reason to pin it.
@nightlark (ping) are you still looking into this? |
Yes, still looking into this -- got a bit busier through around mid-May though. |