Topic
Opening this issue as a central point for information, discussion, etc. around the CUDA 12 migration and the new style CUDA SDK packages.
What follows is background about how CUDA builds are supported now, what is changing about them (and why), and what users can do to adopt the new best practices.
Background
As a little bit of background, the conda-forge CUDA builds up to this point have relied on a few key components:
- Docker images based on nvidia/cuda (here's one template image as an example)
- The CUDA redistributable libraries combined together in the cudatoolkit package (for linkage into other packages built in conda-forge)
- The nvcc wrapper script metapackage to glue the previous two pieces together during a build

There are of course plenty more pieces, like how these components are used in the global pinnings, how CUDA migrators work, how other infrastructure around CUDA works, and how tooling configures CUDA builds (in staged-recipes and feedstocks). However, the big picture of Docker images, CUDA libraries, and a wrapper script to bind them is the gist. This has worked OK for a while, but there are needs it didn't satisfy.
Feedback
Based on user feedback on this model over the years, we have learned a few key things:
- The cudatoolkit package is bulky (for example: Is it necessary to install cudatoolkit with pyarrow 11 on Linux? arrow-cpp-feedstock#962)
- There are other components users would like access to (ptxas executable cudatoolkit-feedstock#72)
- The current model has also created integration pains at multiple points
Follow-up
Based on this feedback, we started exploring a new model for handling these packages ( conda-forge/cudatoolkit-feedstock#62 ). After a bit of iterating with various stakeholders, we felt ready to start packaging all of these components ( conda-forge/staged-recipes#21382 ). This admittedly has been a fair bit of work, and we appreciate everyone's patience while we have been working through it (as well as everyone who shared feedback). That said, it looks like we are nearing the end of package addition and are moving on to next steps.
Structure
Compiler
One particular component that has taken a fair amount of effort to structure correctly has been the compiler package. However, after some final restructuring ( conda-forge/cuda-nvcc-feedstock#12 ) and some promising test results on a few CUDA-enabled feedstocks, this now seems to be working as intended. There are a few particular points of note about how the compiler package is structured (and, correspondingly, how other CTK packages are structured).
The compiler package is designed to be used in cross-compilation builds. As a result, all headers and libraries live in a targets-based structure. This is best observed in this code from cuda-nvcc:

```bash
[[ "@cross_target_platform@" == "linux-64" ]] && targetsDir="targets/x86_64-linux"
[[ "@cross_target_platform@" == "linux-ppc64le" ]] && targetsDir="targets/ppc64le-linux"
[[ "@cross_target_platform@" == "linux-aarch64" ]] && targetsDir="targets/sbsa-linux"
```
As a result, $PREFIX/$targetsDir contains the headers, libraries, etc. needed for building for that target platform. The right paths should already be picked up by nvcc and gcc (the host compiler), so please let us know if that is not happening for some reason.
CUDA Toolkit Packages
As noted previously, cudatoolkit does not exist in this new model (starting with CUDA 12.0). Also, no libraries are pulled in beyond cudart (in particular its static library). So other libraries like libcublas, libcurand, etc. need to be added explicitly to requirements/host for CUDA 12. In particular, there are *-dev packages (like libcublas-dev) that would need to be added to host. A -dev package contains both the headers and the shared libraries. The -dev packages also add run_exports for their corresponding runtime library, so that dependency will be satisfied automatically through the usual conda-build behavior.
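As a rough sketch of what that looks like in a recipe (the particular library here is just an example), only the -dev package goes in host, and the runtime library comes along via its run_exports:

```yaml
requirements:
  host:
    ...
    # the -dev package provides the headers and shared libraries for building
    - libcublas-dev  # [(cuda_compiler_version or "").startswith("12")]
  run:
    ...
    # no explicit libcublas entry is needed here: libcublas-dev's run_exports
    # adds a pinned libcublas runtime dependency automatically
```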
Building off the compiler section above, the move to support cross-compilation means that packages supply headers, libraries, etc. under a different path than $PREFIX. Instead the paths look something like this (using $targetsDir from above):
- Headers: $PREFIX/$targetsDir/include
- Libraries: $PREFIX/$targetsDir/lib
- Stub libraries (if any): $PREFIX/$targetsDir/lib/stubs
Note that these paths shouldn't be needed anywhere, since nvcc and gcc know about them (when {{ compiler("cuda") }} is added to requirements/build). If any issues related to this do come up, please reach out.
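For instance, here is a minimal sketch of the corresponding build section (nothing here beyond what is described above): requesting the compilers is all that should be needed, with no include or library paths under $PREFIX/$targetsDir passed by hand:

```yaml
requirements:
  build:
    - {{ compiler("c") }}     # the host compiler (gcc on Linux)
    - {{ compiler("cuda") }}  # nvcc; both compilers already pick up
                              # $PREFIX/$targetsDir/include and $PREFIX/$targetsDir/lib
```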
CUDA version alignment
Previously the cudatoolkit package had served a CUDA version alignment function ( #687 ). Namely, installing cudatoolkit=11.2 would ensure all packages supported CUDA 11.2. This could be useful when trying to satisfy some hardware constraints.
Another issue the cudatoolkit package solved, which may be less obvious at the outset, is that it provided a consistent set of libraries that are all part of the same CUDA release (say 11.8, for example). This is trivially solved with one package. However, splitting these out into multiple packages (and adding more packages as well) makes this problem more complicated. The cuda-version package also comes in here, by aligning all CUDA packages to a particular CUDA version.
Finally, as there is a fissure between the CUDA 11 & 12 worlds in terms of how these CUDA packages are handled, the cuda-version package mends this gap. In particular, cuda-version constrains cudatoolkit, and cuda-version is backported to all CUDA versions conda-forge has shipped. As a result, cuda-version can be used in both CUDA 11 & 12 use cases to smooth over some of these differences and standardize CUDA version handling.
More details about the cuda-version package can be found in this README.
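As an illustrative sketch of that alignment role (the environment name and library selection here are hypothetical), pinning cuda-version pulls every CUDA package toward the same release, and on CUDA 11 the same kind of pin constrains cudatoolkit:

```yaml
# environment.yml (sketch)
name: cuda-aligned
channels:
  - conda-forge
dependencies:
  # aligns the CUDA packages below to a single CUDA release;
  # with CUDA 11, e.g. cuda-version=11.8 would constrain cudatoolkit instead
  - cuda-version=12.0
  - libcublas
  - libcurand
```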
Final Observations
It's worth noting that this new structure differs notably from the current one in conda-forge, where we have Docker images that place the full CUDA Toolkit install in /usr/local/cuda and runtime libraries are added automatically. This is done to address both use cases where only cudart or nvcc are needed (with no additional dependencies) and use cases where extra features are needed. Trying to address both of these means a greater ability to fine-tune use cases, but it also means a bit more work for maintainers. Hopefully this provides more context around why these changes are occurring and what they mean for maintainers and users.
Next Steps
Recently the CUDA 12 migrator was merged ( conda-forge/conda-forge-pinning-feedstock#4400 ). Already a handful of CUDA 12 migrator PRs have gone out.
For feedstocks that statically link cudart (the default behavior) and don't use other libraries, there may be nothing additional to do.
For feedstocks that depend on other CUDA Toolkit components (say cuBLAS and cuRAND), those can be added to host with a selector like so:
```diff
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -2,4 +2,6 @@
   host:
     - python
     - pip
     - numpy
+    - libcublas-dev  # [(cuda_compiler_version or "").startswith("12")]
+    - libcurand-dev  # [(cuda_compiler_version or "").startswith("12")]
```
Similar patterns can be followed for other dependencies. Also, please feel free to reach out for help here if needed.
On CUDA 12.x
Initially we just added CUDA 12.0.0. We are now adding CUDA 12.x releases (with x > 0). In particular, we have added CUDA 12.1 ( conda-forge/cuda-feedstock#11 ) and are working on CUDA 12.2 ( conda-forge/cuda-feedstock#13 ).
If users want to pin tightly to a specific CUDA 12.x version during the build, we would recommend adding cuda-version in the relevant requirements sections. For example, to constrain during the build (though not necessarily at runtime):
```yaml
requirements:
  build:
    ...
    - {{ compiler('cuda') }}                    # [(cuda_compiler or "None") != "None"]
  host:
    ...
    - cuda-version {{ cuda_compiler_version }}  # [(cuda_compiler_version or "None") != "None"]
  run:
    ...
```
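If a runtime constraint is desired as well, one possibility (a sketch only, not a prescribed pattern; the exact bounds depend on the package) is a run_constrained entry, which bounds the CUDA version in an environment without pulling cuda-version in unconditionally:

```yaml
requirements:
  # ... build/host/run as above ...
  run_constrained:
    # sketch: keep co-installed CUDA packages at a compatible version
    - cuda-version >={{ cuda_compiler_version }}  # [(cuda_compiler_version or "None") != "None"]
```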
Thanks all! 🙏