Changelog History
- v1.5.0-rc5 (April 20, 2020)
- v1.5.0-rc4 (April 18, 2020)
- v1.5.0-rc3 (April 09, 2020)
- v1.5.0-rc2 (March 28, 2020)
- v1.5.0-rc1 (March 19, 2020)
- v1.4.1 (March 23, 2020)
v1.4.0 Changes
January 16, 2020
PyTorch 1.4.0 Release Notes
- Highlights
- Backwards Incompatible Changes
- Python
- JIT
- C++
- New Features
- torch.optim
- Distributed
- RPC [Experimental]
- JIT
- Mobile
- Improvements
- Distributed
- JIT
- Mobile
- Named Tensors
- C++ API
- AMD Support
- ONNX
- Quantization
- Visualization
- Other Improvements
- Bug Fixes
- Distributed
- RPC
- C++ API
- JIT
- Quantization
- Mobile
- Other Bug Fixes
- Deprecations
- Performance
The PyTorch v1.4.0 release is now available.
The release contains over 1,500 commits and a significant amount of effort across existing areas such as JIT, ONNX, distributed training, performance, and the eager frontend, as well as improvements to experimental areas like mobile and quantization. It also contains new experimental features, including RPC-based model-parallel distributed training and Java language bindings (inference only).
PyTorch 1.4 is the last release that supports Python 2. For the C++ API, it is the last release that supports C++11: you should start migrating to Python 3 and building with C++14 to make the future transition from 1.4 to 1.5 easier.
Highlights
PyTorch Mobile - Build level customization
Following the experimental release of PyTorch Mobile in the 1.3 release, PyTorch 1.4 adds additional mobile support, including the ability to customize build scripts at a fine-grained level. This allows mobile developers to optimize library size by only including the operators used by their models and, in the process, reduce their on-device footprint significantly. Initial results show that, for example, a customized MobileNetV2 is 40% to 50% smaller than the prebuilt PyTorch mobile library. Learn more about how to create your own custom builds, and please engage with the community on the PyTorch forums to provide any feedback you have.
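As a rough illustration of the workflow, the sketch below derives the operator list for a custom build from a scripted model. This is a sketch under assumptions: the torch.jit.export_opnames helper and the SELECTED_OP_LIST build variable are taken from the mobile custom-build documentation and may differ between versions, and the model and file names are only examples.

```python
# Sketch (see assumptions above): dump the root operators used by a scripted
# MobileNetV2 into a YAML file that a custom mobile build can consume.
import torch
import torchvision
import yaml  # requires PyYAML

model = torch.jit.script(torchvision.models.mobilenet_v2().eval())
ops = torch.jit.export_opnames(model)  # e.g. ['aten::conv2d', 'aten::relu', ...]

with open("mobilenetv2.yaml", "w") as f:
    yaml.dump(ops, f)

# The YAML file is then passed to the mobile build scripts via SELECTED_OP_LIST.
```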
Distributed Model Parallel Training [Experimental]
With the scale of models, such as RoBERTa, continuing to increase into the billions of parameters, model parallel training has become ever more important to help researchers push the limits. This release provides a distributed RPC framework to support distributed model parallel training. It allows for running functions remotely and referencing remote objects without copying the real data around, and provides autograd and optimizer APIs to transparently run backwards and update parameters across RPC boundaries.
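A minimal sketch of the new API is shown below; the worker names, world size, and the assumption that MASTER_ADDR/MASTER_PORT are set in the environment are illustrative, not part of the release notes.

```python
# Sketch: run this on rank 0 while a second process initializes as "worker1".
import torch
import torch.distributed.rpc as rpc

rpc.init_rpc("worker0", rank=0, world_size=2)

# Run a builtin op on the remote worker and block for the result.
ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), torch.ones(2)))

# Keep the result remote as an RRef and only fetch it when needed.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
print(ret, rref.to_here())

rpc.shutdown()
```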
To learn more about the APIs and the design of this feature, and for the full tutorials, see the links below:
- A full RPC tutorial
- Examples using model parallel training for reinforcement learning and with an LSTM
As always, you can connect with community members and discuss more on the forums.
Java bindings [Experimental]
In addition to supporting Python and C++, this release adds experimental support for Java bindings. Based on the interface developed for Android in PyTorch Mobile, the new bindings allow you to invoke TorchScript models from any Java program. Note that the Java bindings are only available for Linux for this release, and for inference only. We expect support to expand in subsequent releases. See the code snippet below for how to use PyTorch within Java:
Learn more about how to use PyTorch from Java here, and see the full Javadocs API documentation here.
Pruning
Pruning functionalities have been added to PyTorch in the nn.utils.prune module. This provides out-of-the-box support for common magnitude-based and random pruning techniques, both structured and unstructured, both layer-wise and global, and it also enables custom pruning from user-provided masks.

To prune a tensor, first select a pruning technique among those available in nn.utils.prune (or implement your own by subclassing BasePruningMethod).

```python
from torch.nn.utils import prune
t = torch.rand(2, 5)
p = prune.L1Unstructured(amount=0.7)
pruned_tensor = p.prune(t)
```

To prune a module, select one of the pruning functions available in nn.utils.prune (or implement your own) and specify which module and which parameter within that module pruning should act on.

```python
m = nn.Conv2d(3, 1, 2)
prune.ln_structured(module=m, name='weight', amount=5, n=2, dim=1)
```

Pruning reparametrizes the module by turning weight (in the example above) from a parameter to an attribute, and replacing it with a new parameter called weight_orig (i.e. appending "_orig" to the initial parameter name) that stores the unpruned version of the tensor. The pruning mask is stored as a buffer named weight_mask (i.e. appending "_mask" to the initial parameter name). Pruning is applied prior to each forward pass by recomputing weight through a multiplication with the updated mask, using PyTorch's forward_pre_hooks.

Iterative pruning is seamlessly enabled by repeatedly calling pruning functions on the same parameter; the combination of successive masks is handled automatically by a PruningContainer under the hood.

nn.utils.prune is easily extensible to support new pruning functions by subclassing the BasePruningMethod base class and implementing the compute_mask method with the instructions to compute the mask according to the logic of the new pruning technique.
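To illustrate that extension point, here is a minimal sketch of a custom pruning method; the class name and the every-other-entry masking rule are purely illustrative.

```python
# Sketch: a custom pruning technique that zeroes every other entry of the tensor.
import torch
from torch.nn.utils import prune

class EveryOtherPruningMethod(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    def compute_mask(self, t, default_mask):
        mask = default_mask.clone()
        mask.view(-1)[::2] = 0  # zero out every other element
        return mask

m = torch.nn.Linear(4, 3)
EveryOtherPruningMethod.apply(m, name="weight")  # registers weight_orig and weight_mask
print(m.weight)
```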
Backwards Incompatible Changes
Python
- torch.optim: It is no longer supported to use Scheduler.get_lr() to obtain the last computed learning rate. To get the last computed learning rate, call Scheduler.get_last_lr() instead. (26423)

  Learning rate schedulers are now "chainable," as mentioned in the New Features section below. Scheduler.get_lr was sometimes used for monitoring purposes to obtain the current learning rate, but since Scheduler.get_lr is also used internally for computing new learning rates, it actually returns a value that is "one step ahead." To get the last computed learning rate, use Scheduler.get_last_lr instead.

  Note that optimizer.param_groups[0]['lr'] was in version 1.3.1, and remains in 1.4.0, a way of getting the current learning rate used in the optimizer.

- Tensor.unfold on a 0-dimensional Tensor now properly returns a 1-dimensional Tensor.

  Version 1.3.1:

  ```python
  >>> torch.tensor(5).unfold(dimension=0, size=1, step=1)
  tensor(5)
  ```

  Version 1.4.0:

  ```python
  >>> torch.tensor(5).unfold(dimension=0, size=1, step=1)
  tensor([5])
  ```

- torch.symeig now returns a 0-element eigenvectors tensor when eigenvectors=False (the default).

  Version 1.3.1:

  ```python
  >>> torch.symeig(torch.randn(3, 3)).eigenvectors.shape
  torch.Size([3, 3])
  ```

  Version 1.4.0:

  ```python
  >>> torch.symeig(torch.randn(3, 3)).eigenvectors.shape
  torch.Size([0])
  ```
JIT
- Make torch.jit.get_trace_graph private (it is now torch.jit._get_trace_graph) (29149)
  - This function was intended only for ONNX integration; use traced_module.graph instead, like:

    ```python
    traced_module = torch.jit.trace(my_module, example_inputs)
    traced_graph = traced_module.graph
    ```

- @property on ScriptModules has been disabled (28395)
  - Scripted @property accesses were silently broken before: we would evaluate the get function once and store the result as the attribute permanently. They properly error now; a workaround is to make your @property a regular method.
- Custom ops: torch::jit::RegisterOperators has been removed, use torch::RegisterOperators instead (28229). The usage and behavior should remain the same.
- Remove torch.jit._register_* bindings from Python (e.g. torch.jit._register_attribute). These were private functions that were not intended to be used. (29499)
C++
[C++] The distinction between Tensor and Variable has been eliminated at the C++ level. (28287)
This change simplifies our C++ API and matches previous changes we made at the Python level that merged Tensors and Variables into a single type.
This change is unlikely to affect user code; the most likely exceptions are:
- Argument-dependent lookup for torch::autograd may no longer work. This can break because Variable is now defined as an alias for Tensor (using Variable = Tensor;). In this case, you must explicitly qualify the calls to torch::autograd functions.
- Because Variable and Tensor are now the same type, code which assumes that they are different types (e.g., for the purposes of templating, or std::enable_if checks) will not work until you delete the (now) redundant overload/specialization.
- Some operators may trace differently. If this happens, please file a bug. The most likely situations are:
  - There are now more operations in your trace than before (usually, calls to aten::empty).
  - There are now fewer operations in your trace than before (e.g., the trace complains that "there is no observable dependence" with the inputs).
[C++] Arguments in torch::nn::LinearOptions are renamed to match the Python API. (27382)
- Arguments that are renamed:
  - in -> in_features
  - out -> out_features
  - with_bias -> bias
[C++] Arguments in torch::nn::Conv{1,2,3}dOptions are renamed to match the Python API. (28917) (29838)
- Arguments that are renamed:
  - input_channels -> in_channels
  - output_channels -> out_channels
  - with_bias -> bias
[C++] torch::nn::Conv{1,2,3}dOptions no longer has the transposed argument. (31005)
- Users who previously set transposed to true in torch::nn::Conv{1,2,3}dOptions should migrate their code to use torch::nn::ConvTranspose{1,2,3}d layers instead.
[C++] All Reduction enums for torch::nn layers and functionals are changed to have torch::kEnumNAME syntax. (27942, 26837)
- Example: previously, to specify "mean" as the reduction method in a torch::nn layer or functional, we would use torch::Reduction::Mean. Now, torch::Reduction::Mean has been renamed to the shorter torch::kMean.
[C++] The torch::tensor constructor is improved to match Python API behavior. (28523) (29632) (29066)
- Shape checking fixes
  - Example 1: previously, torch::tensor({{1}, {2}}) produced a tensor of sizes {2}. Now, it produces a tensor of sizes {2, 1}.
  - Example 2: previously, torch::tensor(1.1) produced a 1-dim tensor. Now it produces a 0-dim tensor.
- Type inference improvements
  - Example 1: previously, C++ torch::tensor with a double (e.g. torch::tensor(1.1)) or a (nested) braced-init-list of doubles (e.g. torch::tensor({{1.1, 2.2}})) produced a tensor with dtype torch::kDouble. Now it produces a tensor with dtype torch::get_default_dtype().
  - Example 2: previously, C++ torch::tensor with an integer type (e.g. torch::tensor(1)) or a (nested) braced-init-list of integer types (e.g. torch::tensor({{1, 2}})) produced a tensor with the same dtype. Now it always produces a tensor of dtype torch::kLong (i.e. int64_t).
  - Example 3: previously, when passed a TensorOptions without a dtype set, the torch::tensor constructor always produced a tensor of dtype torch::get_default_dtype(). Now it produces a tensor of different dtypes based on the dtype of the braced-init-list and the default dtype.
- Passing a std::initializer_list (NOT a braced-init-list) to torch::tensor will no longer compile; pass the equivalent braced-init-list to torch::tensor instead. For example, write torch::tensor({1.1, 1.2}) instead of torch::tensor(std::initializer_list<double>({1.1, 1.2})).
[C++] Some activation modules' forward functions now take Tensor instead of Tensor& as input. (28501)
- torch::nn layers affected: ELU / SELU / Hardtanh / LeakyReLU / ReLU / ReLU6 / RReLU / CELU
- This change ensures that the above layers can be used in a torch::nn::Sequential module. If your C++ model uses any of the above layers, you must recompile your C++ code with the new libtorch binary.

New Features
torch.optim
Learning rate schedulers (torch.optim.lr_scheduler) now support "chaining." This means that two schedulers can be defined and stepped one after the other to compound their effect; see the example below. Previously, the schedulers would overwrite each other.

```python
>>> import torch
>>> from torch.optim import SGD
>>> from torch.optim.lr_scheduler import ExponentialLR, StepLR
>>>
>>> model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
>>> optimizer = SGD(model, 0.1)
>>>
>>> scheduler1 = ExponentialLR(optimizer, gamma=0.9)
>>> scheduler2 = StepLR(optimizer, step_size=3, gamma=0.1)
>>>
>>> for epoch in range(5):
>>>     print(epoch, scheduler2.get_last_lr()[0])
>>>     optimizer.step()
>>>     scheduler1.step()
>>>     scheduler2.step()
0 0.1
1 0.09000000000000001
2 0.08100000000000002
3 0.00729000000000002
4 0.00656100000000002
```

Distributed
- Add allgather_coalesced API to ProcessGroup (28634, 29059)
- Add abort API in ProcessGroupGloo Send/Recv Work (29928).
- Add --no_python flag to allow using a bash script wrapper in the launch command (29144).
RPC [Experimental]
torch.distributed.rpc is a newly introduced package. It contains basic building blocks to run functions remotely in model training and inference, which is useful for scenarios like distributed model parallel training or implementing parameter server frameworks. More specifically, it contains four pillars: RPC, Remote Reference, Distributed Autograd, and Distributed Optimizer. Please refer to the documentation and the tutorial for more details.

- Add rpc_sync and rpc_async for builtin operators and Python user functions (23228, 23569, 28392).
- Add remote and RRef for builtin operators and Python user functions (25169, 25499).
- Distributed Autograd: FAST mode backward pass implementation (27022, 27576).
- Integrate remote and RRef with distributed autograd (28630, 28656).
- Add a distributed optimizer (29304, 30062).
- Add a Python API for the get_gradients() method to retrieve gradients from the distributed autograd context (28926).
- Support creating local RRefs on local values and to-self remote calls (28948, 29634).
- Support custom pickler for RPC (30185).
- Add default RPC agent options based on the backend type (30201).
- Add local shutdown to ProcessGroup agent (30330).
JIT
- script::Module: implement more of the nn.Module API (28828)
  - In particular, adds the (optionally recursive) methods that iterate over submodules, parameters, etc.
  - Adds a pybind-like attr() method to simplify attribute access.
- Add support for @staticmethod on ScriptModules (27163)
- Support Module Containers as Iterables (26465)
- Support Iterables In List Comprehensions (26768)
- Dictionaries now preserve insertion order, and OrderedDict is supported (26465)
- Add support for hasattr() (29332)
- TorchScript classes can now be callable (26743)
- Add clone_instance for ScriptModules (30168)
- Add torch.memory_format support to TorchScript (28544)
- Custom forward() is now allowed on container modules (28988)
- Calls to submodules are now preserved in the traced graph (29261)
- Add support for module containers to be used as iterables (28255)
- Make JIT serialization support arbitrary std::function<> IO (28039)
- Support layout() in script (27100)
- Methods and functions are no longer inlined in the serialized file format (26706)
Mobile
- Build level customization

Improvements
Distributed
Improvements

- Add timeout support in ProcessGroupNCCL (27224).
- Ensure that the DDP wrapped module has parameters that require gradients (25858).
- Make torch/csrc/cuda NCCL usage safe for NCCL 2.5 (29014).
- Enable test_distributed for ROCm, but only with the NCCL backend (28814).
RPC Improvements
- Separate out RPC into rpc_sync and rpc_async APIs (26570).
- Make the Python user function serialization format consistent with builtin operators (27136).
- Clean up distributed autograd context on all participants on exit (27951).
- Improve error handling for the distributed autograd engine (27940).
- Scope pybind11 functions to torch.distributed.{autograd,rpc} (27529).
- Lift rpc_timeout to RpcAgent to make it reusable for other RpcAgent implementations (29341).
- Support sending messages to self in process_group_agent (29253).
- Properly shutdown RPC even in the case of clean_shutdown=False (29148).
- Ensure the initializedContextIds_ map is cleaned up appropriately in the distributed autograd engine (29787).
- Add hash and equality operators for WorkerInfo (29958).
- Add RpcAgentOptions struct type to bundle arguments for different RpcAgents (29972).
- Mark timed-out FutureMessages and throw exceptions in ProcessGroupAgent (29601).
- Re-throw python remote exception when using remote reference to itself (29930).
- By default, ignore RRef leaks during shutdown (30217).
Documentation

- Add design doc for the Distributed Autograd Engine (29175, 30068, 29927).
- Add design doc for Remote Reference (30066).
- Add documentation page for torch.distributed.rpc (29276, 28030, 29971, 30160, 30050, 30069, 30179, 30218, 30240, 30243, 30259).
MISC
- Add known worker IDs to distributed autograd context (26324).
- Minor tweaks to RPC message API (28326).
- Rename PythonUDF{Call,Resp} (27530).
- Use std::shared_ptr for DistAutogradContext (29770).
- Mark c10d::~NCCLUtils as noexcept (29118).
JIT
- Move custom passes to last optimization step (29256)
- Represent the original Python name of a module type the same way in traced and scripted modules (29912)
- Only print original SourceRange on highlight (29708)
- Error message and ergonomic improvements:
  - Show full call stack in TorchScript exceptions even when calls were inlined (29911)
  - Reduce error context from 10 -> 3 (26765)
  - Fix error report highlight for unmatched type annotation (27195)
  - Make default string arguments in schemas human readable (27088)
  - Print which output didn't have dependence during trace checking (29047)
- Improvements to save/load and serialization performance:
  - Modules can now share JIT types if their implementation is the same, improving save/load performance (26666)
  - Improve float pickling speed (28553)
  - Pickler: convert std::stringstream cases for improved performance (29351)
  - Buffer to speed up Unpickler (27727)
  - Buffer in Pickler to improve performance (27720)
  - In torch::save(), avoid zip compressing small header records (28180)
  - String optimizations related to serialization (28230)
  - Clean up serialized source format (28129)
- API for finding a common ancestor block for a pair of nodes (28864)
- Make inserted child module names unique (27237)
- Better hashing for constant pool (27733)
- Improve error messages when a method or attribute is missing (27110)
- Display original source range in Node::print (27524)
- Always use the closure to resolve variable names (27515)
Mobile
- Improve Java API / JNI
  - Add module method to allow explicitly destructing the native part (27090).
  - Add methods to write image tensor content to buffer (27359).
  - Various improvements to Android API (27454, 27455).
  - Add support for PyTorch JNI build (29412, 42faf961c8, d22f61432d).
  - Various fixes to PyTorch JNI (29350, 29861, 30206, 30207).
- Improve support for older Android NDK
- Improve error messages, documentation, and debuggability
- Improve support for benchmarking and profiling
- Improve build / CI
  - Improve Android Gradle build and publishing (26833, 27389, 29262, 29738).
  - Misc fixes to the Android test project (27453).
  - Improve XCode build script (27358, 28996, 29002).
  - Add testing code to iOS CI jobs (27593, 27594, 27784, 30133).
  - Misc fixes to the iOS TestApp (27591, 28356, 28809, 29247, 29962, 29963).
  - Add support for host build to pytorch_android (27662, 27664).
  - Add host build Gradle publishing (29749).
  - Add mobile build CI with host toolchain (30292).
Named Tensors
- torch.addcdiv, torch.addcmul: added named tensor support (28975).
- torch.{ones,zeros,full,rand,randn}_like: added named tensor support (28981).
- torch.cdist: added named tensor support (29129).
- torch.equal: added named tensor support (29322).
- Added named tensor support for comparison ops (27162).
- Tensor.align_to: fixed error message (27221).
- Tensor.align_to: made method-only (27304).
- Tensor.align_to: accept partially named tensors (27308).
- torch.mean(Tensor, Dimname): fixed autograd support (29199).
- Tensor.unflatten: fix when dim is a negative integer (#31208) (31432).
- Fix type errors in examples about Named Tensors (27828).
C++ API
New torch::nn modules
- Convolution layers
- Pooling layers
- Loss layers
- torch::nn::HingeEmbeddingLoss / CosineEmbeddingLoss / MultiMarginLoss (27101) (27345) (27424) (27770).
- torch::nn::TripletMarginLoss / SoftMarginLoss / MultiLabelMarginLoss / MarginRankingLoss / MultiLabelSoftMarginLoss (27713, 27956) (27660) (27659) (29000) (27669).
- torch::nn::MSELoss / KLDivLoss / BCELoss / SmoothL1Loss / PoissonNLLLoss / BCEWithLogitsLoss (27156) (28806) (30146) (27661) (28755) (28783).
- torch::nn::NLLLoss / CrossEntropyLoss / CTCLoss (29812) (28654).
- Normalization Layers
- Activation Layers
- torch::nn::ELU / LeakyReLU / SELU / PReLU / ReLU / ReLU6 / RReLU / CELU / GLU (27028) (27059) (27434) (27429) (27435) (27436) (27437) (27487) (29922).
- torch::nn::Sigmoid / LogSigmoid / LogSoftmax / Softmax / Softmax2d / Softplus / Softmin / Softsign / Softshrink / Hardshrink / Hardtanh / Tanh / Threshold (27488) (27060) (27462) (27446) (27509) (27489) (27459) (27535) (27534) (27035) (27537) (27038) (27536) (27538).
- Dropout Layers
- Padding Layers
- Embedding layers
- torch::nn::Embedding / EmbeddingBag (26358).
- Linear layers
- Vision layers
New torch::nn::functional functions
- Convolution functions
- Pooling functions
- Loss functions
- torch::nn::functional::hinge_embedding_loss / multi_margin_loss / multilabel_soft_margin_loss / triplet_margin_loss / soft_margin_loss / margin_ranking_loss (27101) (27424) (27669) (27713) (27660) (29000).
- torch::nn::functional::poisson_nll_loss / nll_loss / cross_entropy / binary_cross_entropy_with_logits (28755) (29812) (28783).
- torch::nn::functional::l1_loss / kl_div / mse_loss / binary_cross_entropy / smooth_l1_loss / ctc_loss (27156) (28806) (30146) (27661) (28654).
- Normalization functions
- Activation functions
- torch::nn::functional::elu / leaky_relu / selu / prelu / relu / relu6 / rrelu / celu / glu / gelu (27028) (27059) (27434) (27429) (27435) (27436) (27437) (27487) (29922) (28433).
- torch::nn::functional::log_sigmoid / log_softmax / softmax / softplus / softmin / softsign / softshrink / hardshrink / tanhshrink / hardtanh / gumbel_softmax / threshold (27060) (27462) (27446) (27489) (27459) (27535) (27534) (27035) (27537) (27038) (28121) (27538).
- Embedding functions
- Linear functions
- Padding functions
- Vision functions
- Distance functions
- torch::nn::functional::pdist (27122).
- Utility functions
AMD Support
- New features integration
- Build/CI
ONNX
In PyTorch 1.4, we have mainly focused on expanding the coverage for ONNX Opset 11 and on enabling the export of torchvision models. Most torchvision models can be exported to ONNX (Opset 11, with fixed input size), including FasterRCNN, MaskRCNN, and KeypointRCNN. We have also enhanced export support for some tensor indexing scenarios, with more enhancements to come in the next release. In addition, 20+ new PyTorch operators are enabled in the ONNX exporter.
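As a quick sketch of the torchvision export path described above (the model choice, input size, and output file name are only examples, not part of the release notes):

```python
# Sketch: export a torchvision detection model with ONNX opset 11 and a fixed input size.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False).eval()
dummy_input = [torch.randn(3, 640, 640)]  # detection models take a list of 3D tensors

torch.onnx.export(model, dummy_input, "fasterrcnn.onnx", opset_version=11)
```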
Expanding Coverage for ONNX Opset 11
- torch.sort / torch.topk are supported in Opset 11 (25739)
- torch.size / torch.squeeze / torch.unsqueeze / torch.mm / torch.index_fill / torch.index_copy are supported in Opset 11 (27578)
- torch.masked_select / torch.masked_scatter are supported in Opset 11 (25949)
- torch.arange is supported in Opset 11 (26875)
- avg_pool, constant_pad_nd, reflection_pad, replication_pad: support enhanced in Opset 11 (28225)
- torch.hardtanh is supported in Opset 11 (30169)
- Enable ONNX constant folding for Opset 11 (29011)
Exporting More Torch Operators/Models to ONNX
- torch.remainder is enabled in exporter (24410)
- torch.unfold is enabled in exporter (24970)
- torch.slice / torch.select with negative index are enabled in exporter (25273, 26549)
- torch.ones / torch.ones_like / torch.zeros / torch.zeros_like / torch.full / torch.full_like with default dtype are enabled in exporter (27577)
- torch.unbind is enabled in exporter (27247)
- torch.nn.functional.interpolate export is enhanced (27179, 27566, 28560, 29489)
- torch.det is enabled in exporter (26958)
- torch.group_norm is enabled in exporter (27071)
- torch.meshgrid is enabled in exporter (26037)
- torch.randn / torch.randn_like are enabled in exporter (28470, 29354)
- torch.weight_norm is enabled in exporter (28618)
- torch.scalar_tensor is enabled in exporter (28713)
- torch.logdet is enabled in exporter (29767)
- torch.batch_norm 2D with affine=False is enabled in exporter (29458)
- torch.bitshift is enabled in exporter (28210)
Enhancing Export/Test Infra

- Use deepcopy inputs in ONNX ORT test cases (27186)
- Return NotImplemented from all binary math ops (27423).
- Disable ONNX IR v4 semantics for opset 8 or lower (28990)
- Add ONNX tests for torchvision models (30121)
- Keep output type information while exporting ONNX graph (25906)
Quantization
Quantization updates include a mix of bug fixes and feature improvements, with the feature work adding improved operator coverage and better performance. We have also made significant progress towards enabling graph mode quantization support.
- Feature improvements:
- Enabling intra-op parallelism (26692).
- Enabling inplace relu (28710).
- Quantized Tensor support copy (28612).
- Add quantized torch mean implementation (27675).
- Add quantized avg_pool2d for pytorch mobile (27631).
- Add nn.quantized.Conv3d (29813).
- Adding inplace quantized relu6 (29245).
- Fast histogram observer (29790).
- PackedSequence support for quantized LSTM (29585).
- Improve legacy QuantizedLinear functions to reduce overhead (29773).
- Add support for quantized operator conversion from PT to C2 via ONNX (29694).
- Enable per-channel dynamic quantization (30122).
- Scripting support
Visualization
- Fixed graph visualization: display proper names after recent JIT changes (30244)
- Support logging embeddings for TensorBoard visualization to a generic filesystem (27716)
Other Improvements
- torch.argmax/argmin: allow half type (28787).
- torch.cuda.memory_stats / memory_summary: instrumentation for the CUDA memory allocator (27361).
- torch.set_num_threads: allow calling multiple times with TBB (27190).
- torch.set_num_threads: allow calling multiple times in parallel native (27947).
- torch.logical_xor: allow non-bool tensors (27248).
- torch.promote_types: nicer error message (27941).
- torch.batch_norm_elemt: add an out-variant (27621).
- torch.lerp: implement derivative with respect to weight (28219).
- torch.complex32: add type promotion support (27929).
- torch.unique: support bool tensors (28374).
- torch.reshape: improve backward for viewable geometries (28901).
- torch.lu: generalized factorization (28608).
- torch.equal: add intra-op parallelism (28810).
- torch.randint: accept generator=None (29748).
- torch.bfloat16: enabled for CUDA (27259).
- torch.multinomial: enable for torch.half (29266).
- nn.RNN: respect the current stream in cudnn (27026).
- nn.RNN: preserve the nonlinearity attribute (28058).
- nn.Linear: support 0-batch size (27211).
- nn.functional.binary_cross_entropy: implement double backwards (26983).
- nn.AdaptiveAvgPool2d: add support for NHWC memory format (24396).
- nn.GELU: add GELU activation (28944).
- nn.LayerNorm: handle batch size of zero (28614).
- nn.BatchNorm: add NHWC support on cudnn (23861).
- nn.BatchNorm2d: support torch.channels_last (28982).
- nn.BatchNorm2d: handle empty inputs (30035).
- nn.LayerNorm: enable intra-op parallelism (28464).
- nn.utils.prune: add pruning functionality (24076).
- nn.Sequential: make iterable (28987).
- dtype.is_signed: ability to differentiate signed dtypes (29511).
- optim.lr_scheduler.MultiplicativeLR: add new multiplicative learning rate scheduler (27254).
- cuda.comm.scatter, gather: add channels-last support (28077).
- at::parallel_for: choose number of OMP threads based on GRAIN_SIZE (26963).
- Return NotImplemented from unsupported tensor arithmetic operators (26507).
- Automatically select proper tqdm submodule (27108).
- Pickle support for sparse tensors (27062).
- Vectorized complex unary and binary op support (26500).
- Complex support for reduce and linpack ops on CPU (27653).
- Complex support for compare and pointwise ops on CPU (28735).
- Make PyTorch Python 3.8 compatible (29302).
- Buffer Python warnings to avoid deadlocks (26613).
- Use NNPACK for strided convolutions (29084).
Bug Fixes
Distributed
- Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (27124).
- Fix segmentation fault in FileStore with concurrent accesses (28812).
- Fix DDP incompatibility issue with nn.MultiheadAttention (26826).
RPC
- Add ProcessGroupAgent termination detection algorithm (26984).
- Fix pybind11 warnings in Python RPC handler implementation (27284).
- Defer creating ProcessGroupAgent listener thread until contexts are initialized (28013).
- Fix Python RPC handler exit crash (27251).
- Fix distributed autograd initialization (29069).
- Always include autograd context id in rpc_*/remote requests (29781).
- Make RRefContext singleton leaky, deal with module destruct order race (30172).
C++ API Bug Fixes
- at::Tensor::requires_grad_ now supported (26332).
- torch::isfinite now supported (30083).
- torch::nn::modules_ordered_dict is deprecated (28774).
- Add reset_parameters to torch::nn modules (29832).
- Allow passing an undefined Tensor to Module::register_parameter (27948).
- Exclude undefined tensors in the result of Module::parameters() / named_parameters() / buffers() / named_buffers() (30626).
- Include hierarchy information in C++ API loading error messages (28499).
- Fix a bug where the C++ L-BFGS optimizer did not work properly if there were one or more registered tensors with no grad in the model (27606).
- Use c10::variant-based enums for Nonlinearity and FanMode (27933). Support for torch::nn::init::Nonlinearity and torch::nn::init::FanMode will be removed in 1.5.
JIT
- Make dropout properly condition on training (29436)
- Fix aten::grad to return optional list (29577)
- Fix torch.arange dtype
- Fix type sharing on loaded ScriptModules (29826)
- Fix type sharing between traced modules (29583)
- Check for mutable default parameters (29833)
- Fix tracing of autograd functions (29791)
- Check for unrolled loop in break & continue (29474)
- Fix negative string indexing (22700)
- Make jit.trace_module reentrant (29411)
- Fix jit outplace tracing and reapply changes to _like operators (28839)
- Properly guard against inheritance on TorchScript classes (28407)
- Fix when giving jit format warning about unsupported options (28616)
- Fix handling of function attributes (28569)
- Fix pushLong() issue in pickler (28057)
- Fix broken name mangling (27511)
- Fix segfault while printing value type for an error msg in emitListComprehension (27261)
- Fix toIValue dict iteration (26856)
- Fix race condition in Function::optimized_graph() (27012)
- Sanitize module names on legacy import (27764)
- Python None should have its type inferred as NoneType (26665)
- Properly set existing attributes under recursive script (27514)
Quantization
- Skip copy_same_type_transpose_ for quantized tensor (29609).
- Add a note that CUDA quantization is not supported (27829).
- Rename _intrinsic to intrinsic (27194).
- Better error message for quantized dispatch (28635).
- Update the misleading comments for zero_points and scale in the dynamic quant linear module [1/2] (28767).
- Avoid the misleading zero_point and scale [2/2] (28827).
- Add a warning message for the API with linear modules (28766).
- Do not insert observers for empty sequential modules (28384).
- Fix the padding issue of the quantized average pool operator (28260).
Mobile
- Fix deadlock issues in ThreadPool (29885).
- Disable ProfilingGraphExecutorImpl for mobile (30067).
Other Bug Fixes
- torch.kthvalue: fix CUDA shared memory out-of-bound access in findPattern (28989).
- torch.save: fix source files not being saved (28965).
- torch.load: fix OSError when loading files larger than 2GB (27125).
- torch.linspace: clearer error message for negative step sizes (28274).
- torch.histc: add range checks to avoid segfaults (27712).
- torch.lu: fix thread-local issue on CPU (28546).
- torch.max_pool2d: limit tensor size to max CUDA grid size (28931).
- torch.renorm: fix a memory leak in CUDA renorm (29873).
- torch.index_add: fix bug in atomicAdd on CUDA for some dtypes (29231).
- torch.addmm: fix handling of empty tensors (28613).
- nn.CTCLoss: fix incorrect gradient for large target sizes (27460).
- nn.functional.ctc_loss: fix incorrect gradient on cudnn (27039).
- nn.Embedding: fix incorrect gradient at padding_idx in the CUDA kernel (27731).
- nn.LayerNorm: fix an illegal memory access error (28196).
- nn.Conv2d: handle zero stride (28784).
- nn.PoissonNLLLoss: fix incorrect result with full=True (28637).
- nn.AvgPool2d: fix an overflow for 2^31 - 1 sized inputs (30793).
- nn.RNNBase: fix an issue with use of children of RNN third party device types (28562).
- nn.Upsample: fix "invalid configuration argument" error (28927).
- nn.Upsample: fix a CUDA launch config failure (29016).
- optim.lr_scheduler.OneCycleLR: correctly handle the div_factor parameter (28217).
- PackedSequence.to: ensure all tensors are moved (27245).
- EventList.total_average: fix a regression caused by missing __iadd__ (27498).
- Tensor.record_stream: ensure the stream is recorded for shifted view tensors (27371).
- torch.hub: handle branch names containing a slash (27960).
- Fix error handling in Magma kernels (29003).
- Fix AVX for C++14 (28207).
- Fix illegal memory access thread safety issue in sparse CUDA (29426).
Deprecations
Python 2 support is deprecated and will not be supported in the 1.5 release.

torch.optim: Scheduler.step(epoch) is now deprecated; use Scheduler.step() instead. (26432) For example:

```python
>>> for epoch in range(10):
>>>     optimizer.step()
>>>     scheduler.step(epoch)
DeprecationWarning: The epoch parameter in `scheduler.step()` was not necessary and is being
deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the
deprecation, if epoch is different from None, the closed form is used instead of the new
chainable form, where available. Please open an issue if you are unable to replicate your
use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, DeprecationWarning)
```

[C++] C++11 is deprecated and will not be supported in the 1.5 release.
[C++] Tensor::is_variable() has been deprecated. As noted in the Backwards Incompatible Changes section, the distinction between variable and non-variable has been eliminated, so this check is no longer meaningful. Generally, is_variable() will now return true except in some special circumstances (see 29653 for more details). (29653)

[C++] torch::nn::modules_ordered_dict has been deprecated. It is generally no longer necessary and can just be removed. (28774)

The torch.jit.quantized API has been deprecated in favor of torch.quantization.quantize_dynamic (28766)

Performance
A benchmark suite is available to easily measure the performance of operators with a range of input shapes. The generated benchmark data fully characterize the performance of operators in terms of execution time. For more details, see README.md in the benchmarks/operator_benchmark directory.
- torch.nn.functional.threshold, torch.nn.functional.layer_norm, torch.cdist: performance of the threshold (CPU), layer norm (CUDA), and cdist operations was improved (27155, 27634, 25799)
- torch.Tensor.fill_: performance for half and bfloat16 types on CPU was improved (28397).
- torch.nn.MaxPool2d: implementation for the channels_last format was added (24872)
- There is a fast pass reducing the overheads of pointwise operations relying on TensorIterator under certain conditions (contiguous inputs, no broadcast) (29180).
- Overheads of operations with scalars/number literals were reduced (29915).
- In case of type promotion on the GPU, the values are converted on the fly, without explicit casting of the full tensor (30018).
- reorder_dimensions in TensorIterator favors output write locality, improving overall performance when operating on discontiguous tensors (28615).
- Float pickling speed was improved (28553).
- GRAIN_SIZE for intra-op parallelization was unified between TH and ATen operations (28770)
- tensor.numel devirtualized, improving performance (27294)
- v1.4.0.a0 (October 08, 2019)
v1.3.1 Changes
November 07, 2019
Significant Fixes
Type Promotion: fixed a bug where type promotion, combined with non-contiguous tensors, could compute incorrect results. (28253)

Version 1.3.0:

```python
>>> a = torch.tensor([[True, True], [False, True]])
>>> a_transpose = a.t()  # get a non-contiguous tensor
>>> a_transpose == 0     # type promote by comparing across dtypes (bool -> long)
# POTENTIALLY INCORRECT VALUES
```

Version 1.3.1:

```python
>>> a = torch.tensor([[True, True], [False, True]])
>>> a_transpose = a.t()  # get a non-contiguous tensor
>>> a_transpose == 0     # type promote by comparing across dtypes (bool -> long)
tensor([[False,  True],
        [False, False]])
```

Type Promotion / Indexing: fixed a bug where mixed-dtype indexing and assignment could lead to incorrect results. Mixed dtype operations of this form are currently disabled, as they were in 1.2. (28231)

Version 1.3.0:

```python
>>> a = torch.ones(5, 2, dtype=torch.float)
>>> b = torch.zeros(5, dtype=torch.long)
>>> a[:, [1]] = b.unsqueeze(-1)
>>> a  # POTENTIALLY INCORRECT VALUES
```

Version 1.3.1:

```python
>>> a = torch.ones(5, 2, dtype=torch.float)
>>> b = torch.zeros(5, dtype=torch.long)
>>> a[:, [1]] = b.unsqueeze(-1)
RuntimeError: expected dtype Float but got dtype Long
```

torch.where(condition, x, y): fixed a bug on CPU where incorrect results could be returned if x and y were of different dtypes. Mixed dtype operations of this form are currently disabled, as they were in version 1.2. (29078)

Version 1.3.0:

```python
>>> x = torch.randn(2, 3)
>>> y = torch.randint(0, 10, (2, 3))
>>> torch.where(x < 0, x, y)
tensor(...)  # POTENTIALLY INCORRECT VALUES
```

Version 1.3.1:

```python
>>> x = torch.randn(2, 3)
>>> y = torch.randint(0, 10, (2, 3))
>>> torch.where(x < 0, x, y)
RuntimeError: expected scalar type Float but found Long
```
Other Fixes
- torch.argmax: fix a regression on CUDA that disabled support for torch.float16 inputs (28915).
- NamedTensor: fix a Python refcounting bug with Tensor.names (28922).
- Quantization: support deepcopy for quantized tensors (28612).
- Quantization: support nn.quantized.ReLU with inplace=True (28710).
- Documentation: torch.lgamma and torch.polygamma are now documented (28964).
v1.3.0 Changes
October 10, 2019
Table of Contents
- Breaking Changes
- Highlights
- [Experimental]: Mobile Support
- [Experimental]: Named Tensor Support
- [Experimental]: Quantization support
- Type Promotion
- Deprecations
- New Features
- TensorBoard: 3D Mesh and Hyperparameter Support
- Distributed
- Libtorch Binaries with C++11 ABI
- New TorchScript features
- Improvements
- C++ Frontend Improvements
- Autograd
- New torch::nn modules
- New torch::nn::functional functions
- tensor Construction API
- Other C++ Improvements
- Distributed Improvements
- Performance Improvements
- JIT Improvements
- ONNX Exporter Improvements
- Adding Support for ONNX IR v4
- Adding Support for ONNX Opset 11
- Exporting More Torch Operators/Models to ONNX
- Enhancing ONNX Export Infra
- Other Improvements
- Bug Fixes
- TensorBoard Bug Fixes
- C++ API Bug fixes
- JIT
- Other Bug Fixes
- Documentation Updates
- Distributed
- JIT
- Other documentation improvements
Breaking Changes
Type Promotion: Mixed dtype operations may return a different dtype and value than in previous versions. (22273, 26981)
Previous versions of PyTorch supported a limited number of mixed dtype operations. These operations could result in loss of precision by, for example, truncating floating-point zero-dimensional tensors or Python numbers.

In version 1.3, PyTorch supports NumPy-style type promotion (with slightly modified rules, see the full documentation). These rules generally will retain precision and be less surprising to users.

Version 1.2:

```python
>>> torch.tensor(1) + 2.5
tensor(3)
>>> torch.tensor([1]) + torch.tensor(2.5)
tensor([3])
>>> torch.tensor(True) + 5
tensor(True)
```

Version 1.3:

```python
>>> torch.tensor(1) + 2.5
tensor(3.5000)
>>> torch.tensor([1]) + torch.tensor(2.5)
tensor([3.5000])
>>> torch.tensor(True) + 5
tensor(6)
```
Type Promotion: in-place operations whose result_type is a lower dtype category (bool < integer < floating-point) than the in-place operand now throw an Error. (22273, 26981)
Version 1.2:

```python
>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
tensor(2)
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
tensor(True)
```

Version 1.3:

```python
>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
RuntimeError: result type Float cannot be cast to the desired output type Long
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
RuntimeError: result type Long cannot be cast to the desired output type Bool
```

These rules can be checked at runtime via torch.can_cast.

torch.flatten: 0-dimensional inputs now return a 1-dim tensor. (25406)

Version 1.2:

```python
>>> torch.flatten(torch.tensor(0))
tensor(0)
```

Version 1.3:

```python
>>> torch.flatten(torch.tensor(0))
tensor([0])
```
nn.functional.affine_grid: when align_corners = True, changed the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size).

Previously, all grid points along a unit dimension were considered arbitrarily to be at -1; now they are considered to be at 0 (the center of the input image).
torch.gels: removed deprecated operator, use torch.lstsq instead. (26480)

utils.data.DataLoader: made a number of Iterator attributes private (e.g. num_workers, pin_memory). (22273)

[C++] Variable::backward will no longer implicitly create a gradient for non-1-element Variables. Previously, a gradient tensor of all 1s would be implicitly created. This behavior matches the Python API. (26150)

```cpp
auto x = torch::randn({5, 5}, torch::requires_grad());
auto y = x * x;
y.backward();  // ERROR: "grad can be implicitly created only for scalar outputs"
```

[C++] All option specifiers (e.g. GRUOptions::bidirectional_) are now private; use the function variants (GRUOptions::bidirectional(...)) instead. (26419)
[Experimental]: Mobile Support

In PyTorch 1.3, we are launching experimental support for mobile. Now you can run any TorchScript model directly without any conversion. Here is the full list of features in this release:

- Support for full TorchScript inference on mobile;
- Prebuilt LibTorch libraries for Android/iOS on JCenter/CocoaPods;
- Java wrapper for Android with functionality to cover common inference cases (loading and invoking the model);
- Support for all forward ops on mobile CPU (backward ops are not supported yet);
- Some optimized fp32 operator implementations for ARM CPUs (based on Caffe2Go);
- Some optimized int8 operator implementations for ARM CPUs (based on QNNPACK).

We decided not to create a new framework for mobile so that you can use the same APIs you are already familiar with to run the same TorchScript models on Android/iOS devices without any format conversion. This way you can have the shortest path from research ideas to production-ready mobile apps.

The tutorials, demo apps and download links for prebuilt libraries can be found at: https://pytorch.org/mobile/

This is an experimental release. We are working on other features like customized builds to make PyTorch smaller, faster and better for your specific use cases. Stay tuned and give us your feedback!
[Experimental]: Named Tensor Support

Named Tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".

Create a named tensor by passing a names argument into most tensor factory functions.

```python
>>> tensor = torch.zeros(2, 3, names=('C', 'N'))
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
```

Named tensors propagate names across operations.

```python
>>> tensor.abs()
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
```

Rearrange to a desired ordering by using align_to.

```python
>>> tensor = tensor.align_to('N', 'C', 'H', 'W')
>>> tensor.names, tensor.shape
(('N', 'C', 'H', 'W'), torch.Size([3, 2, 1, 1]))
```

And more! Please see our documentation on named tensors.

[Experimental]: Quantization support

PyTorch now supports quantization from the ground up, starting with support for quantized tensors. Convert a float tensor to a quantized tensor and back by:

```python
x = torch.rand(10, 1, dtype=torch.float32)
xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
# xq is a quantized tensor with data represented as quint8
xdq = xq.dequantize()  # convert back to floating point
```

We also support 8 bit quantized implementations of most common operators in CNNs, including:
- Tensor operations:
- view, clone, resize, slice
- add, multiply, cat, mean, max, sort, topk
- Modules/Functionals (in torch.nn.quantized)
- Conv2d
- Linear
- Avgpool2d, AdaptiveAvgpool2d, MaxPool2d, AdaptiveMaxPool2d
- Interpolate
- Upsample
- Fused operations for preserving better accuracy (in torch.nn.intrinsic)
- ConvReLU2d, ConvBnReLU2d, ConvBn2d
- LinearReLU
- add_relu
We also support dynamic quantized operators, which take floating point activations as inputs but use quantized weights (in torch.nn.quantized.dynamic); a minimal usage sketch follows the list below.
- LSTM
- Linear
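A minimal sketch of the dynamic quantization workflow; the toy model below is illustrative.

```python
# Sketch: quantize the weights of Linear layers to int8; activations stay float at runtime.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)
```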
Quantization also requires support for methods to collect statistics from tensors and calculate quantization parameters (implementing the torch.quantization.Observer interface). We support several methods to do so:
- MinMaxObserver
- MovingAverageMinMaxObserver
- PerChannelMinMaxObserver
- MovingAveragePerChannelMinMaxObserver
- HistogramObserver
For quantization aware training, we support fake-quantization operators and modules to mimic quantization during training:
torch.fake_quantize_per_tensor_affine,torch.fake_quantize_per_channel_affinetorch.quantization.FakeQuantize
In addition, we also support workflows in torch.quantization for:
- post-training dynamic quantization
- static post training quantization
- quantization aware training
All quantized operators are compatible with TorchScript.
For more details, see the documentation at: https://pytorch.org/docs/master/quantization.html
Type Promotion
Arithmetic and comparison operations may now perform mixed-type operations that promote to a common dtype.
The example below was not allowed in version 1.2. In version 1.3, the same code returns a tensor with dtype=torch.float32.

```python
>>> torch.tensor([1], dtype=torch.int) + torch.tensor([1], dtype=torch.float32)
```

See the full documentation for more details.
- torch.result_type: provide a function to determine the result of mixed-type operations (26012).
- torch.can_cast: expose casting rules for type promotion (26805).
- torch.promote_types: expose promotion logic (26655).
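The helpers above can be exercised directly; the expected results in the comments below follow the promotion rules described in this section.

```python
import torch

print(torch.result_type(torch.tensor([1], dtype=torch.int32), 2.5))  # torch.float32
print(torch.can_cast(torch.float32, torch.int32))                    # False
print(torch.promote_types(torch.uint8, torch.int16))                 # torch.int16
```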
Deprecations
nn.functional.affine_grid / nn.functional.grid_sample: using the default value of align_corners is now deprecated, because it will be changed in the 1.4 release.

The align_corners parameter was added in this release; the behavior in the previous release was equivalent to setting the parameter to True. This is also the current default value, but it will be changed to False from the 1.4 release. Note that using the default will trigger a warning as demonstrated below; set the value explicitly to remove the warning.

```python
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2))
UserWarning: Default grid_sample and affine_grid behavior will be changed to align_corners=False
from 1.4.0. See the documentation of grid_sample for details.
...
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2), align_corners=True)
# NO WARNING!
...
```
torch::Tensor::data<T>()in favor oftorch::Tensor::data_ptr<T>()(24847, 24886).π New Features
TensorBoard: 3D Mesh and Hyperparameter Support
torch.utils.tensorboard supports 3D mesh and points plus hyperparameter logging. More details can be found in the documentation for SummaryWriter with add_mesh and add_hparams.

A simple example exercising both methods:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

vertices_tensor = torch.as_tensor([
    [1, 1, 1],
    [-1, -1, 1],
    [1, -1, -1],
    [-1, 1, -1],
], dtype=torch.float).unsqueeze(0)
colors_tensor = torch.as_tensor([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
    [255, 0, 255],
], dtype=torch.int).unsqueeze(0)
faces_tensor = torch.as_tensor([
    [0, 2, 3],
    [0, 3, 1],
    [0, 1, 2],
    [1, 3, 2],
], dtype=torch.int).unsqueeze(0)

with SummaryWriter() as w:
    w.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor)
    for i in range(5):
        w.add_hparams({'lr': 0.1 * i, 'bsize': i},
                      {'hparam/accuracy': 10 * i, 'hparam/loss': 10 * i})
```

Distributed
This release adds macOS support for torch.distributed with the Gloo backend. You can more easily switch from development (e.g. on macOS) to deployment (e.g. on Linux) without having to change a single line of code. The prebuilt binaries for macOS (stable and nightly) include support out of the box; a minimal initialization sketch follows the list below.

- torch.distributed.all_reduce_coalesced: support allreduce of a list of same-device tensors (24949, 25470, 24876)
- torch.distributed.all_reduce: add bitwise reduction ops (BAND, BOR, BXOR) (26824)
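As sketched below, initialization looks the same on macOS as on Linux; the address, port, and single-process world size are placeholders, not values from the release notes.

```python
# Sketch: single-process Gloo initialization (works on macOS as of this release).
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)
print(dist.get_backend())  # gloo
dist.destroy_process_group()
```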
Libtorch Binaries with C++11 ABI
We now provide LibTorch binaries for building applications compatible with the C++11 ABI. The download links for LibTorch binaries with the C++11 ABI can be found on https://pytorch.org/ under "QUICK START LOCALLY".
New TorchScript features

- Add not in support for TorchScript (23637).
- You can now raise exceptions in one side of an if branch (23565).
- Add torch.jit.is_scripting() API (25955).
- Make assertions like x is not None unwrap the optional type of x (23949).
- Add dictionary augmented assignment (+=) support to TorchScript (23639).
- Support grad and data attributes for tensors in TorchScript (23842).
- Add @ignore for TorchScript classes (23614).
- Support nn.GRU in script (23266).
- Support tensor as a key type in TorchScript (23638).
- Add support for ModuleDict (25715).
- Bind set_grad_enabled() into TorchScript (25350).
- Add in membership checks for lists (25796).
- Add tuple keyword (25474).
- Add __getitem__ to class types (25664).
- Add __setitem__ to class types (25750).
- Make JIT dicts ordered, matching Python 3.6+ semantics (26465).
- Added invert bitwise operation to TorchScript (22324).
- Add min() and max() for lists to TorchScript (26351).
- Support iterables and ranges in list comprehensions (26768).
Improvements
C++ Frontend Improvements
We are on our way to better API parity between our Python and C++ frontends. Specifically, we made the following improvements:
Autograd
- Tensor autograd APIs
- Add support for custom autograd functions in C++ API
- torch::autograd::backward and torch::autograd::grad (24342)
- torch::autograd::Variable::register_hook (24393).
New torch::nn modules
- Containers
- torch::nn::ModuleList (24317).
- Linear layers
- torch::nn::Identity (26713).
- Convolution layers
- torch::nn::Fold (24160).
- Pooling layers
- Loss functions
- torch::nn::L1Loss (25902).
- Distance functions
New torch::nn::functional functions
- Pooling functions
- torch::nn::functional::max_pool1d / max_pool2d / max_pool3d (26262).
- torch::nn::functional::max_pool1d_with_indices / max_pool2d_with_indices / max_pool3d_with_indices (26521).
- torch::nn::functional::avg_pool1d / avg_pool2d / avg_pool3d (26262).
- torch::nn::functional::adaptive_max_pool1d / adaptive_max_pool2d / adaptive_max_pool3d (26755, 26772, 26775).
- torch::nn::functional::adaptive_max_pool1d_with_indices / adaptive_max_pool2d_with_indices / adaptive_max_pool3d_with_indices (26755, 26772, 26775).
- Distance functions
tensor Construction API
- Add support for multidimensional inputs to torch::tensor (26210, 26890, 26756).
  - From now on, we can use torch::tensor({{1, 2}, {3, 4}}) in C++ to construct the same tensor as torch.tensor([[1, 2], [3, 4]]) in Python. Some caveats are noted in this comment.
- Add support for bool and BFloat16 dtypes to torch::tensor (23337).
Other C++ Improvements
- Add the torch::nn::Module::unregister_module function, for unregistering a submodule from a torch::nn::Module (26088).
Distributed Improvements
- torch.distributed: detect and handle NCCL errors appropriately instead of blocking peers until timeout in ProcessGroupNCCL (25012, 25905)
- torch.distributed: make scatter/gather arguments optional (25575)
- torch.distributed.launch: add a -m flag to allow users to launch python modules (24910).
- torch.distributed: add a function to get the NCCL version for logging (26583).
- torch.distributed: add a timeout parameter to the connect function in TCPStore (26554).
- torch.distributed: use a timeout in the connect function to prevent an infinite loop (26364).
- torch.nn.modules.batchnorm: allow SyncBatchNorm to run without DDP in inference mode (24815)
Performance Improvements
- torch.argmax/argmin: rewrite as TensorIterator reductions (26181).
- torch.erfinv: vectorize unary operator (26629).
- torch.sin/cos/tan: use intrinsics for trigonometric functions on CPU (26431).
- Fix possible deadlock in SharedCache inside a forked child proc (25158).
- torch.qr: fix a regression (23591).
- nn.Conv: use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
- nn.Conv: use parallel_for in DepthwiseConvKernel (26879).
- nn.Conv: change shape for conv and unary ops (25477).
- Fix pin_memory_thread not exiting quickly (23646).
- Increase predefined_minimum_secs to reduce variation (23734).
- Enhance Tensor indexSelect performance (23055).
- Separate input shapes to reduce default execution time (24136).
- constraints.lower_cholesky: vectorize LowerCholeskyTransform (24131).
- Speed up raising an integer to the power of a positive integer on CPU (26020).
- [ROCm] Enable jit fusion (22872).
- [ROCm] Use MIOpen for transpose convolutions (26172).
JIT Improvements
- Enable CPU fused kernel on Windows (25578).
- Expose an API to iterate all the registered operators (23207).
- Include recursive class compilations in error call stack (23454).
- Substantial improvements to saved model format speed and size.
  - Compress debug symbols when serializing TorchScript models (23659).
  - Compress all non-Tensor components of a serialized TorchScript model (23723).
  - Perform string uniquing by value in pickle serialization (23741).
  - Implement a bunch of pickle serialization features that optimize for size (23759).
  - Implement more size-oriented opcodes in the depickler (26454).
- Cache node operators to speed up optimization (24827).
- Allow forward hooks in tracing (23613).
- Add Pickler C++ API (23241).
- Open up AliasAnalysisKind for any ops (23810).
- Add the ability to compile exports on traced modules (24298).
- Make NoneType a subtype of Optional[T] (25361).
ONNX Exporter Improvements
In PyTorch 1.3, we have added support for exporting graphs with ONNX IR v4 semantics and set it as the default. We have achieved good initial coverage for ONNX Opset 11, which was released recently with ONNX 1.6. Further enhancement to Opset 11 coverage will follow in the next release. We have enabled export for about 20 new PyTorch operators. Also, we have focused on enabling the export of all models in torchvision. We have introduced some necessary groundwork for that in this release, e.g., accepting PyTorch models with inputs/outputs of Dict or String. We continue to work on torchvision models, such as FasterRCNN and MaskRCNN, to enable their export.
Adding Support for ONNX IR v4

- Provide an option to exclude the weights from model inputs (#23284)
- Make graph inputs without weights the default (#26146)

Adding Support for ONNX Opset 11

- Introduce ONNX Opset 11 support (#23739)
- Add export for torch.Interpolate in Opset 11 (#24805, #27179)
- Add export for tensor.gather, tensor.scatter and tensor.scatter_add in Opset 11 (#24790)
- Add export for tensor.clamp in Opset 11 (#25797)
- Add export for torch.topk and torch.sort in Opset 11 (#25739)
Exporting More Torch Operators/Models to ONNX
- Export torch.pixel_shuffle (#23739)
- Export torch.multinomial (#23581)
- Export torch.norm's frobenius_norm (#23536)
- Export torch.std (#22310)
- Export torch.empty and torch.empty_like (#24166)
- Export torch.rsqrt (#24153)
- Export torch.log1p (#25808)
- Export torch.unique (#25050)
- Export torch.gelu (#24475)
- Export tensor.index_fill and tensor.index_copy (#23052)
- Export torch.round (#26126)
- Export torch.baddbmm (#25738)
- Export torch.remainder (#24410)
- Export torch.cumsum (#24476)
- Export tensor.size with negative axis (#26436)
- Export RNN/LSTM with h0/c0 initial state (#22813)
Enhancing ONNX Export Infra
- Enable exporting PyTorch models which have Dict and String as inputs and outputs (#25889)
- Systematically solve mismatched types caused by implicit type conversions in binary arithmetic operators by adding an ONNX type-conversion pass (#24378)
- Correctly validate dynamic axes names (#23974); a usage sketch follows this list
- Enable ONNX Runtime tests for Opset 10 and partially for Opset 11 (#22993)
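A small usage sketch of dynamic axes (the axis names and model below are illustrative); the exporter now validates that the declared names refer to real inputs and outputs:

```python
import torch

model = torch.nn.Linear(16, 4).eval()
dummy = torch.randn(1, 16)

# "batch" is an arbitrary label chosen here; dynamic_axes marks dimension 0 of
# both the input and the output as variable-sized in the exported graph.
torch.onnx.export(model, dummy, "linear.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
```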
Other Improvements
- Error checking: many operators now check the strides of the output tensor and raise an error if it contains inner overlaps that would lead to incorrect results (23063).
- torch.det/logdet/slogdet: Allow batching (22909).
- torch.logical_not: Add new operator (23839).
- torch.logical_xor: Add new operator (23847).
- torch.symeig: Improve the stability of gradient updates (23018).
- torch.eye: Enable for bool and half (24148).
- torch.tril / triu: Enable for bool and half (24163).
- torch.logical_not/xor: Support non-bool tensors (23916, 23978).
- torch.index_select: Implement indexing methods for sparse tensors (24937).
- torch.lu_solve: Enable broadcasting of batch dimensions (24333).
- torch.cholesky: Enable batches greater than 262140 (24438).
- torch.det: Simplify generation of singular matrices to avoid a numerical issue on PowerPC (25773).
- torch.erfinv: In the CUDA implementation, use erfinv() for double to preserve accuracy (25337).
- torch.erfinv: Add a float version of erfinv on CPU (26070).
- torch.cuda.stream: Update the autograd engine to respect streams set in forward (8354).
- torch.backends.mkldnn.enabled: Allow disabling MKLDNN at runtime (25459).
- torch.cholesky_solve: Add derivative (26185).
- torch.cholesky_inverse: Add derivative (26451).
- torch.polygamma: Ensure that n is non-negative (26294).
- torch.pinverse: Enable batching (26095).
- torch.digamma/trigamma: Fix type mismatches on CUDA (25791).
- torch.where: Enable for bool tensors on CUDA (26430).
- torch.load: Change default encoding to 'utf-8' (26421).
- torch.repeat_interleave: Respect the current stream (26946).
- torch.bernoulli_: Implement for bool tensors (25076).
- torch.norm: Fix nuclear norm with requires_grad=True (26303).
- torch.hub.download_url_to_file: Make function public (26723).
- nn.modules.conv: Add padding_mode to repr (23996).
- nn.Transformer: Extend to support BERT (gelu) (24181).
- nn.BatchNorm2d: Add support for non-affine batch norm with float stats and half inputs (22750).
- nn.Parameter: Fix type hints (25586).
- nn.CTCLoss: Improve error message (26325).
- nn.Conv: Allow batch size of 0 (26214).
- nn.LSTM/GRU: Enable double backward for non-cudnn (26660).
- optim.Adagrad: Add epsilon argument (24980).
- optim.LBFGS: Change default tolerance_grad to 1e-7 (25240).
- optim.lr_scheduler.OneCycleLR: Add new 1cycle learning rate scheduler (25324); see the sketch after this list.
- optimizer.step: Fix type annotation (26930).
- bfloat16: Add support for sub, mul, and div on CPU (22851).
- bfloat16: Enable comparison ops on CPU (24182).
- bfloat16: Enable masked methods (24183).
- bfloat16: Enable torch.mm and torch.mv (24224).
- bfloat16: Enable log_softmax and CrossEntropyLoss (24457).
- bfloat16: Enable conv methods (26167).
- bfloat16: Enable dtype on CUDA (26407).
- quasirandom.SobolEngine: Use random seed if not specified (24884).
- utils.data.dataloader: Add possible out-of-shared-memory error message (25730).
- cuda.set_rng_state: Add type hint (26200).
- Zero-sized tensor support for repeat_interleave (23717).
- Recommend ~ and bitwise_not() when a user tries to apply neg (-) on a bool tensor (23621).
- Fix double backward of inplace op on view (23502).
- autograd.grad: Validate shapes of outputs (25349).
- Enable libflame as a LAPACK choice (25795).
- Fix race condition in CUDA initialization (25788).
- Include iteration_ in SGD optimizer serialization (26906).
- [C++] torch::tensor: Fix an ambiguous overload issue in the constructor (26890).
- [XLA] Check device before accessing data_ptr in PackLayer (26056).
- [XLA] Allow overwriting catch-all kernels (25947).
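As referenced above, a brief usage sketch for the new OneCycleLR scheduler; the model, learning rates, and step count here are illustrative only:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# max_lr and total_steps are illustrative values, not recommendations.
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=100)

for step in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the 1cycle schedule once per optimizer step
```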
Bug Fixes
TensorBoard Bug Fixes
- SummaryWriter.add_graph: Fix empty graph output in some cases (25599); see the sketch after this list.
- Update Caffe2 contrib TensorBoard logging to not require TensorFlow (25259).
- SummaryWriter.make_video: Fix the write_gif call to moviepy for newer versions of the library (21218).
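For context, a minimal add_graph call looks like the sketch below (the module, input, and log directory are placeholders); this is the code path affected by the empty-graph fix:

```python
import torch
from torch.utils.tensorboard import SummaryWriter  # requires the tensorboard package

model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
writer = SummaryWriter(log_dir="runs/example")

# add_graph traces the module with the sample input and logs the resulting graph.
writer.add_graph(model, torch.randn(1, 8))
writer.close()
```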
C++ API Bug Fixes
- Fix a mismatch of device and data type when computing step_size in the LBFGS optimizer (25909).
JIT
- Fix list comprehensions that change the type of the original iterable (24271).
- Fix double copying of constants during recursive scripting (24412).
- Fix frontend error message (23576).
- Clear recursive error stack on each compilation (23458).
- Fix bugs in assignment to optionals (25059).
- Make torch.jit.Attribute work when PYTORCH_ENABLED=0 (23851).
- Fix unicode in comments causing compilation errors (24218).
- Correctly raise an error if an nn.Module has not been initialized but you try to script it (24852).
- Fix annotated assignment to variables (25094).
- dictPop: Dereference the dict.find() iterator before calling dict.erase() (25056).
- Fix closures which always throw (25278).
- Add source location to class instantiation errors (24990).
- Fix AliasAnalysisKind::PURE on MSVC (25375).
- Emit script function calls during tracing (25089).
- Resolve NamedTuple types properly in Python (26443).
- Fix schema matching of tuples to vartype lists (25944).
- Correctly preserve ignored function return value type (25262).
- Fix missing newline in "compiled from" source range highlight (25802).
- Fix use-after-free bug in optional (25965).
- Fix torch.arange traced as constant (25363).
- Preserve module names in recursive script (24505).
- Properly resolve ignored module method type annotations (26683).
- Make the is_optional check more robust (26312).
- Fix builtin lookup for Python functions (26688).
- Typevar matching fix + implicit conversions from Scalar to int/float (26453).
- Fix range for non-int inputs and the pow implementation (26926).
Other Bug Fixes
- torch.is_pinned: pin_memory should not copy already pinned tensors (23484).
- torch.cdist: Fix incorrect gradients on CUDA for non-batch tensors (22915).
- torch.from_numpy: Fix failure on Windows for int32 (25139).
- torch.tensor: Fix a memory leak when creating a tensor from numpy (24267).
- torch.index: Don't save self in index backward (25594).
- torch.bincount: Fix int32 overflow on CUDA (25748).
- torch.bernoulli: Fix the distribution sampler (26864).
- torch.pow: Fix precision (25476).
- torch.cdist: Fix gradient computation when the first argument is 1xn (26254).
- torch.scatter_add_: Fix the scatter CPU kernel when (input size, src size) > index size (25839).
- nn.ConvTranspose2d: Fix an error with float16 inputs and weights on CUDA (23552).
- nn.CTCLoss: Fix zero-length targets on CUDA (23298).
- nn.Conv2d: Correct an overflow in an error message (25146).
- optim.Adam: Apply a small mathematical fix (23737).
- dataloader: Fix IndexError on shutdown if not all workers are started (23761).
- Tensor.repeat: Fix crash for 0 repeats (23766).
- torch.pin_memory: Only use one thread (25111).
- distributions.Uniform, HalfCauchy, Gamma: Fix log_prob when value is a float (23017).
- Fix typing error for Padding with asymmetric signatures (24895).
- Avoid race condition in intrusive_ptr.reset_() (24464).
- torch.hub: Fix SSL cert issue for hub in Python 2 (25042).
- Fix int overflow issue in CUDA kernels (24818).
- Module.cuda: Fix type hints (25018).
- Fix bug in assertNotEqual for int tensors (25412).
- Fix 'in' returning true incorrectly (24156).
- Fix bugs in the bulk loader when batch_size=None or with namedtuple (26065).
- Fix serialization issue on big endian architectures (26383).
- Fix Vec256::abs() for floating point when applied on -0.0 (26422).
- Fix cyclic reference in _LRScheduler (25776).
- Fix a build failure on s390x (26233).
- [XLA] Fix tensor construction from array (24283).
Documentation Updates
Distributed
- torch.distributed: Improve error phrasing in torch.distributed helper functions (25574).
- torch.distributions.negative_binomial: Clarify an ambiguous docstring in NegativeBinomial (25923).
JIT
- Add technical documentation for the serialization format (23456).
- Fix trace docs (24191).
- Add trace_module to docs (24258).
- Clean up distinction around script and trace (24208).
- Fix item() call in docs (25404).
- Misc doc updates / fixes (24371, 24445).
Other documentation improvements
- torch.record_stream: Add documentation (24078).
- torch.fold: Describe the relation between fold and unfold operations (24840).
- torch.argmax: Fix incorrect doc (23775).
- torch.random: Add docs (23553).
- torch.empty_strided: Add docs (23735).
- torch.bitwise_not: Document behavior for bool tensors (23800).
- torch.cdist: Add documentation (25221).
- torch.where: Update parameter names in doc (25554).
- torch.atan2: Clarify and correct the doc (26180).
- nn.functional.bilinear: Add documentation (24951).
- nn.functional.upsample: Fix align_corners doc (23707).
- nn.Transformer: Fix an error in the example (24837).
- optim.lr_scheduler.CosineAnnealingWarmRestarts: Add documentation (25421).
- optim.SGD: Update the doc with subscripts (23985).
- optim.RMSprop: Highlight in the doc that the square root comes before adding epsilon (26735).
- autograd.detect_anomaly: Add a warning (26615).
- Improve dataloader docs on when auto-batching is disabled (23671).
- Update docs and add deprecation warnings to acknowledge a bool tensor (22261).
- Document benchmarking practice for CUDA (23910).
- Add ASAN instructions to CONTRIBUTING.md (24848).