Using a PyTorch-core codegen API #2871

@bdhirsh

Below is a proposal for taking the codegen that pytorch/xla currently performs in-tree, and shifting that into pytorch core as a public codegen API that pytorch/xla can call. Sharing the design here for visibility! cc @ezyang @ailzhang

Background

The two resources below are commonly used when implementing operators for external backends.

  • https://pytorch.org/tutorials/advanced/extend_dispatcher.html
  • pytorch/xla's codegen, which people often copy and paste to get started registering their code. What does it do?
    • Generates registrations for all operators in the codebase
    • Generates CPU fallback registrations for operators you don’t have implemented
      • automatically performs xla→cpu→xla conversions
    • Catches schema changes to pytorch operators
    • In some cases, generates implementations of out operators in terms of their functional variants
    • Generates backend-specific autograd registrations for one or two kernels

Currently, you tell the XLA codegen which functions you’ve implemented by adding their declarations to torch_xla/csrc/aten_xla_type.h:

// In aten_xla_type.h
  static at::Tensor acos(const at::Tensor& self);
  static at::Tensor& acos_(at::Tensor& self);
  static at::Tensor add(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);
  static at::Tensor add(const at::Tensor& self, const at::Scalar& other, const at::Scalar& alpha);

Goals

  • XLA’s codegen is making up for perceived deficiencies in the APIs we provide: it takes a lot of boilerplate to write all of the registrations a backend needs, and some of them, like CPU fallbacks, can be generated programmatically. We want to close these gaps and save people from having to copy and paste XLA’s codegen to get these facilities.
  • pytorch/xla has its own parser for in-tree files like RegistrationDeclarations.yaml. PyTorch core has its own tools for this; it would be great not to duplicate them.

The Pitch

We will offer code generation in PyTorch itself, which you can use to generate this boilerplate.

First, you need to tell the system which operators you actually support. You specify them as a list in a YAML file, say xla_native_functions.yaml:

backend: XLA
cpp_namespace: torch_xla
supported:  # can omit inplace/out if functional is supported
  - acos
  - add.Tensor
  - copy_  # no functional, inplace is implemented directly
  ...
autograd:
  - max_pool2d # override autograd instead of the forward

Then, as part of your build system, you run a codegen script from PyTorch on your YAML file:

pytorch/tools/codegen/gen_backend_stubs.py xla_native_functions.yaml --output-dir /path/to/codegen/output/dir

This will generate boilerplate for you. Here are the files you will get:

// ------------------------------------------------
// XLANativeFunctions.h - stubs of operations you should implement

namespace torch_xla {

Tensor acos(const Tensor & self);
Tensor add(const Tensor & self, const Tensor & other, const Scalar & alpha=1);
...

} // namespace torch_xla

// ------------------------------------------------
// RegisterXLA.cpp

Tensor wrapper_add_Tensor(
  const Tensor & self, const Tensor & other, const Scalar & alpha
) {
  return torch_xla::add(self, other, alpha);
}

// inplace and out variants are automatically generated and call into the functional variant

Tensor & wrapper_add__Tensor(
  Tensor & self, const Tensor & other, const Scalar & alpha
) {
  return torch_xla::copy_(self, torch_xla::add(self, other, alpha));
}

Tensor & wrapper_add_out(
  const Tensor & self, const Tensor & other, const Scalar & alpha, Tensor & out
) {
  return torch_xla::copy_(out, torch_xla::add(self, other, alpha));
}

TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("add.Tensor", TORCH_FN(wrapper_add_Tensor));
  m.impl("add_.Tensor", TORCH_FN(wrapper_add__Tensor));
  m.impl("add.out", TORCH_FN(wrapper_add_out));
  ...
}
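
The mapping from operator names to wrapper names and registration lines in the listing above is mechanical. Here is a rough C++ sketch of that one step. The real script is a Python tool whose internals are not shown here, so `wrapper_name` and `registration_block` are hypothetical helpers, and the "replace `.` with `_`" rule is only inferred from the names in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: derive a wrapper name from an operator name,
// e.g. "add_.Tensor" -> "wrapper_add__Tensor" (rule inferred from the listing).
std::string wrapper_name(const std::string& op) {
  std::string name = op;
  std::replace(name.begin(), name.end(), '.', '_');
  return "wrapper_" + name;
}

// Hypothetical helper: emit a TORCH_LIBRARY_IMPL block as text,
// with one m.impl(...) line per operator.
std::string registration_block(const std::string& backend,
                               const std::vector<std::string>& ops) {
  assert(!backend.empty());  // a dispatch key name is required
  std::string out = "TORCH_LIBRARY_IMPL(aten, " + backend + ", m) {\n";
  for (const auto& op : ops) {
    out += "  m.impl(\"" + op + "\", TORCH_FN(" + wrapper_name(op) + "));\n";
  }
  out += "}\n";
  return out;
}
```

Calling `registration_block("XLA", {"add.Tensor", "add_.Tensor", "add.out"})` yields text with the same shape as the registration block in the listing.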

Notable benefits:

  • You don’t have to figure out what the correct C++ signature is; we generate those for you
  • You don’t have to write inplace/out versions of functions; they get generated for you
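
To make the second benefit concrete, here is a self-contained sketch of deriving the inplace and out variants from a single functional kernel. It uses a toy `Tensor` struct and a plain `double` for alpha instead of the real `at::Tensor` and `at::Scalar`, so everything in it is illustrative rather than actual generated output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for at::Tensor; not the real PyTorch type.
struct Tensor {
  std::vector<double> data;
};

namespace backend {
// The only kernel written by hand: the functional variant.
Tensor add(const Tensor& self, const Tensor& other, double alpha) {
  assert(self.data.size() == other.data.size());
  Tensor result;
  result.data.reserve(self.data.size());
  for (std::size_t i = 0; i < self.data.size(); ++i) {
    result.data.push_back(self.data[i] + alpha * other.data[i]);
  }
  return result;
}

// Copies src into dst and returns dst, playing the role of copy_.
Tensor& copy_(Tensor& dst, const Tensor& src) {
  dst.data = src.data;
  return dst;
}
}  // namespace backend

// Inplace variant: compute functionally, then copy the result into self.
Tensor& wrapper_add_(Tensor& self, const Tensor& other, double alpha) {
  return backend::copy_(self, backend::add(self, other, alpha));
}

// Out variant: compute functionally, then copy the result into out.
Tensor& wrapper_add_out(const Tensor& self, const Tensor& other, double alpha,
                        Tensor& out) {
  return backend::copy_(out, backend::add(self, other, alpha));
}
```

Both derived variants pay for an intermediate result plus a copy. That is the trade-off for not writing them by hand.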

How we’re getting there

  1. Nearly byte-for-byte compatible rewrite of the codegen that will live in PyTorch. None of the fancy new stuff; that will come later.
     • Included in this rewrite: a YAML file that subsumes aten_xla_type.h
  2. Start refactoring to take advantage of new features

Appendix: Backwards compatibility

Right now, the XLA codegen has logic to catch and error out when it sees BC-breaking schema changes to in-tree ops. As part of this change, BC-breaking schema changes that require fixups to external backend kernels will be caught by the compiler/linker, instead of by the codegen script. This is mostly because we’re codegen’ing the headers for each kernel for you, rather than having the backends write out the schema for the headers of each op themselves.
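
As a rough single-file illustration of catching schema drift at compile time, the sketch below compares a kernel's type against a signature that stands in for the generated header. The explicit `static_assert` is an assumption made for demonstration. The proposal itself relies on ordinary declaration/definition mismatches surfacing as compiler or linker errors. `Tensor`, `Scalar`, `add_stale`, and `matches_schema` are toy or hypothetical names.

```cpp
#include <type_traits>

// Toy stand-ins for the real PyTorch types.
struct Tensor {};
struct Scalar {};

namespace generated {
// Pretend this signature came from the generated header for add.Tensor.
using add_signature = Tensor(const Tensor&, const Tensor&, const Scalar&);
}  // namespace generated

namespace backend {
// Kernel that matches the current schema.
Tensor add(const Tensor&, const Tensor&, const Scalar&) { return Tensor{}; }

// Kernel written against a hypothetical older schema without alpha.
Tensor add_stale(const Tensor&, const Tensor&) { return Tensor{}; }
}  // namespace backend

// True when the kernel's function type equals the expected signature.
template <typename Expected, typename Actual>
constexpr bool matches_schema(Actual*) {
  return std::is_same<Expected, Actual>::value;
}

// An up-to-date kernel passes the check at compile time.
static_assert(matches_schema<generated::add_signature>(&backend::add),
              "backend::add no longer matches the generated signature");

// A stale kernel is detected without running anything.
static_assert(!matches_schema<generated::add_signature>(&backend::add_stale),
              "stale kernel unexpectedly matches the generated signature");
```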

Appendix: Fallbacks

Fallbacks to CPU are needed for ops that external backends haven’t implemented yet. Some backend kernels also aren’t implemented for all valid inputs, and need to conditionally call into a CPU fallback.

Fallbacks to CPU are currently implemented in codegen. They will eventually be handled for you via generic implementations that would be provided by PyTorch:

// DispatchKey::FallbackCPU (maybe)
Tensor fallback_add(
  const Tensor & self, const Tensor & other, const Scalar & alpha
) {
  auto args = at::list_to_cpu({self, other}); // external backends override this
  auto result_cpu = at::add(args[0], args[1], alpha);
  return result_cpu.to(self.device());
}
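
The same convert, compute, convert-back shape can be tried out without PyTorch. In the sketch below, the toy `Tensor` carries a device tag, and `to_device` and `cpu_add` are hypothetical stand-ins for the real conversion and CPU kernel. It only illustrates the control flow, not how PyTorch would implement a generic fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for at::Tensor with a device tag; not the real PyTorch type.
struct Tensor {
  std::string device;
  std::vector<double> data;
};

// Hypothetical conversion: copy the data and retag it with the target device.
Tensor to_device(const Tensor& t, const std::string& device) {
  return Tensor{device, t.data};
}

// Reference CPU kernel; it refuses inputs that are not on the CPU.
Tensor cpu_add(const Tensor& self, const Tensor& other, double alpha) {
  assert(self.device == "cpu" && other.device == "cpu");
  assert(self.data.size() == other.data.size());
  Tensor result{"cpu", {}};
  for (std::size_t i = 0; i < self.data.size(); ++i) {
    result.data.push_back(self.data[i] + alpha * other.data[i]);
  }
  return result;
}

// Fallback: move inputs to CPU, run the CPU kernel, move the result back.
Tensor fallback_add(const Tensor& self, const Tensor& other, double alpha) {
  Tensor self_cpu = to_device(self, "cpu");
  Tensor other_cpu = to_device(other, "cpu");
  Tensor result_cpu = cpu_add(self_cpu, other_cpu, alpha);
  return to_device(result_cpu, self.device);
}
```

A backend kernel that handles only some inputs could call `fallback_add` from its unsupported branch, which is the conditional case described above.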

Current Status

A WIP version of the XLA-side change can be found here: #2869
