Below is a proposal for taking the codegen that pytorch/xla currently performs in-tree, and shifting that into pytorch core as a public codegen API that pytorch/xla can call. Sharing the design here for visibility! cc @ezyang @ailzhang
Background
The below two resources are commonly used when implementing operators for external backends.
- https://pytorch.org/tutorials/advanced/extend_dispatcher.html
- People often copy paste pytorch/xla's codegen to get started registering their code. What does it do?
  - Generates registrations for all operators in the codebase
  - Generates CPU fallback registrations for operators you don’t have implemented
    - automatically performs xla→cpu→xla conversions
  - Catches schema changes to pytorch operators
  - In some cases, generates implementations of out operators in terms of their functional variants
  - Generates backend-specific autograd registrations for one or two kernels
Currently, you tell the XLA codegen which functions you’ve implemented by adding their declarations to torch_xla/csrc/aten_xla_type.h:
// In aten_xla_type.h
static at::Tensor acos(const at::Tensor& self);
static at::Tensor& acos_(at::Tensor& self);
static at::Tensor add(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);
static at::Tensor add(const at::Tensor& self, const at::Scalar& other, const at::Scalar& alpha);
Goals
- XLA’s codegen is making up for perceived deficiencies in the APIs we provide: it takes a lot of boilerplate to write all of the registrations a backend needs, and some of them (like CPU fallbacks) can be generated programmatically. We want to close these gaps so that people no longer have to copy-paste XLA’s codegen to get these facilities.
- pytorch/xla has its own parser for in-tree files like RegistrationDeclarations.yaml. PyTorch core has its own tools for this, and it would be great not to duplicate them.
The Pitch
We will offer code generation in PyTorch itself, which you can use to generate this boilerplate.
First, you need to specify to the system which operators you actually support. This is specified as a list in a YAML file, say, xla_native_functions.yaml
backend: XLA
cpp_namespace: torch_xla
supported: # can omit inplace/out if functional is supported
- acos
- add.Tensor
- copy_ # no functional, inplace is implemented directly
...
autograd:
- max_pool2d # override autograd instead of the forward
Then, as part of your build system, you run a codegen script from PyTorch on your YAML file: pytorch/tools/codegen/gen_backend_stubs.py xla_native_functions.yaml --output-dir /path/to/codegen/output/dir
This will generate boilerplate for you. Here are the files you will get:
// ------------------------------------------------
// XLANativeFunctions.h - stubs of operations you should implement
namespace torch_xla {
Tensor acos(const Tensor & self);
Tensor add(const Tensor & self, const Tensor & other, const Scalar & alpha=1);
...
} // namespace torch_xla
// ------------------------------------------------
// RegisterXLA.cpp
Tensor wrapper_add_Tensor(
const Tensor & self, const Tensor & other, const Scalar & alpha
) {
return torch_xla::add(self, other, alpha);
}
// inplace and out variants are automatically generated, call into the functional variant
Tensor & wrapper_add__Tensor(
Tensor & self, const Tensor & other, const Scalar & alpha
) {
return torch_xla::copy_(self, torch_xla::add(self, other, alpha));
}
Tensor & wrapper_add_out(
const Tensor & self, const Tensor & other, const Scalar & alpha, Tensor & out
) {
return torch_xla::copy_(out, torch_xla::add(self, other, alpha));
}
TORCH_LIBRARY_IMPL(aten, XLA, m) {
m.impl("add.Tensor", TORCH_FN(wrapper_add_Tensor));
m.impl("add_.Tensor", TORCH_FN(wrapper_add__Tensor));
m.impl("add.out", TORCH_FN(wrapper_add_out));
...
}
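To make the generation step concrete, here is a minimal Python sketch of how a script could turn the `supported` list into the wrapper and registration text shown above. It is not the actual gen_backend_stubs.py: `BACKEND_CONFIG`, `SIGNATURES`, `wrapper_name`, `gen_wrapper` and `gen_registrations` are hypothetical names, the config dict is a hand-written stand-in for the parsed YAML file, and the signature table stands in for schema information that the real tool would presumably derive from PyTorch's own operator definitions.

```python
from typing import Dict, List, Tuple

# Hand-written stand-in for the parsed xla_native_functions.yaml (assumption:
# the real tool would load this from the YAML file).
BACKEND_CONFIG = {
    "backend": "XLA",
    "cpp_namespace": "torch_xla",
    "supported": ["add.Tensor"],
}

# Hypothetical table of C++ parameters per operator. In a real generator this
# would come from the in-tree operator schemas, not be written by hand.
SIGNATURES: Dict[str, List[Tuple[str, str]]] = {
    "add.Tensor": [
        ("const Tensor &", "self"),
        ("const Tensor &", "other"),
        ("const Scalar &", "alpha"),
    ],
}


def wrapper_name(op: str) -> str:
    """Turn an operator name such as 'add.Tensor' into 'wrapper_add_Tensor'."""
    return "wrapper_" + op.replace(".", "_")


def gen_wrapper(op: str, namespace: str) -> str:
    """Emit a C++ wrapper that forwards to the backend's functional kernel."""
    base = op.split(".")[0]
    params = SIGNATURES[op]
    decl = ", ".join(f"{ty} {name}" for ty, name in params)
    call = ", ".join(name for _, name in params)
    return (
        f"Tensor {wrapper_name(op)}({decl}) {{\n"
        f"  return {namespace}::{base}({call});\n"
        f"}}\n"
    )


def gen_registrations(config: dict) -> str:
    """Emit the TORCH_LIBRARY_IMPL block registering every supported operator."""
    lines = [f"TORCH_LIBRARY_IMPL(aten, {config['backend']}, m) {{"]
    for op in config["supported"]:
        lines.append(f'  m.impl("{op}", TORCH_FN({wrapper_name(op)}));')
    lines.append("}")
    return "\n".join(lines)
```

Calling `gen_wrapper("add.Tensor", BACKEND_CONFIG["cpp_namespace"])` followed by `gen_registrations(BACKEND_CONFIG)` yields text of the same shape as the functional wrapper and registration block above. The inplace and out wrappers could be produced the same way by wrapping the forwarded call in `copy_`.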
Notable benefits:
- You don’t have to figure out what the correct C++ signature is, we generate those for you
- You don’t have to write inplace/out versions of functions, they get generated for you
How we’re getting there
- Nearly byte-for-byte compatible rewrite of the codegen that will live in PyTorch; none of the fancy new stuff, that will come later
  - Included in this rewrite: a YAML file that subsumes aten_xla_type.h
- Start refactoring to take advantage of new features
Appendix: Backwards compatibility
Right now, the XLA codegen has logic to catch and error out when it sees BC-breaking schema changes to in-tree ops. As part of this change, BC-breaking schema changes that require fixups to external backend kernels will be caught by the compiler/linker, instead of by the codegen script. This is mostly because we’re codegen’ing the headers for each kernel for you, rather than having the backends write out the schema for the headers of each op themselves.
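As a rough illustration of the kind of check the codegen script performs today, the sketch below compares signatures recorded by a backend against the in-tree ones and reports the operators that have drifted. It is a toy model, not XLA's actual logic: `find_schema_mismatches` and `_normalize` are hypothetical names, and signatures are compared as whitespace-normalized strings purely for simplicity. Under the proposal, the equivalent signal would come from the C++ compiler or linker rejecting a kernel that no longer matches its generated declaration.

```python
from typing import Dict, List


def _normalize(signature: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(signature.split())


def find_schema_mismatches(
    in_tree: Dict[str, str], backend: Dict[str, str]
) -> List[str]:
    """Return the backend operators whose signature is missing from, or
    differs from, the in-tree signature table."""
    mismatches = []
    for op, backend_sig in backend.items():
        core_sig = in_tree.get(op)
        if core_sig is None or _normalize(core_sig) != _normalize(backend_sig):
            mismatches.append(op)
    return sorted(mismatches)
```

A backend whose recorded `add.Tensor` signature still takes `Scalar alpha` by value, while the in-tree one takes `const Scalar & alpha`, would be reported here. With generated headers the same drift shows up as a build failure instead.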
Appendix: Fallbacks
Fallbacks to CPU are needed for ops that external backends haven’t implemented yet. Some backend kernels also aren’t implemented for all valid inputs, and need to conditionally call into a CPU fallback.
Fallbacks to CPU are currently implemented in codegen. They will eventually be handled for you via generic implementations that would be provided by PyTorch:
// DispatchKey::FallbackCPU (maybe)
Tensor fallback_add(
const Tensor & self, const Tensor & other, const Scalar & alpha
) {
auto args = at::list_to_cpu({self, other}); // external backends override this
auto result_cpu = at::add(args[0], args[1], alpha);
return result_cpu.to(self.device());
}
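The control flow of such a fallback can be modelled in a few lines of plain Python. This is only a toy under stated assumptions: `ToyTensor`, `KERNELS`, `register`, `cpu_fallback` and `dispatch` are hypothetical names invented for illustration, and the lookup table is not how PyTorch's dispatcher works internally. It shows the two cases described above: an operator with no backend kernel falls back automatically, and a backend kernel can call the fallback itself for inputs it does not handle.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class ToyTensor:
    """Stand-in for a tensor: a device tag plus a single value."""
    device: str
    data: float


# (operator name, backend) -> kernel
KERNELS: Dict[Tuple[str, str], Callable[..., ToyTensor]] = {}


def register(op: str, backend: str, fn: Callable[..., ToyTensor]) -> None:
    KERNELS[(op, backend)] = fn


def _to_device(arg: Any, device: str) -> Any:
    """Move toy tensors to `device`; pass scalars through unchanged."""
    return ToyTensor(device, arg.data) if isinstance(arg, ToyTensor) else arg


def cpu_fallback(op: str, backend: str, *args: Any) -> ToyTensor:
    """backend -> CPU -> backend round trip around the CPU kernel."""
    cpu_args = [_to_device(a, "CPU") for a in args]
    result_cpu = KERNELS[(op, "CPU")](*cpu_args)
    return _to_device(result_cpu, backend)


def dispatch(op: str, backend: str, *args: Any) -> ToyTensor:
    """Use the backend kernel if one is registered, else fall back to CPU."""
    kernel = KERNELS.get((op, backend))
    if kernel is None:
        return cpu_fallback(op, backend, *args)
    return kernel(*args)


# CPU reference kernel for add(self, other, alpha) = self + alpha * other.
register("add", "CPU", lambda a, b, alpha: ToyTensor("CPU", a.data + alpha * b.data))
```

With only the CPU kernel registered, `dispatch("add", "XLA", ToyTensor("XLA", 1.0), ToyTensor("XLA", 2.0), 3.0)` takes the fallback path and returns a tensor tagged with the `XLA` device. A backend kernel that supports only some inputs could call `cpu_fallback` directly for the rest.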
Current Status
A WIP version of the xla-side change can be found here: #2869