Official TensorBoard Support, Attributes, Dicts, Lists and User-defined types in JIT / TorchScript, Improved Distributed
Note: CUDA 8.0 is no longer supported
Highlights
TensorBoard (currently experimental)
First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple `from torch.utils.tensorboard import SummaryWriter` command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.
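A minimal logging sketch (the tag name and scalar values below are purely illustrative):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()                    # writes event files under ./runs/ by default
for step in range(100):
    loss = 1.0 / (step + 1)                 # placeholder value for illustration
    writer.add_scalar('train/loss', loss, step)
writer.close()
```

Running `tensorboard --logdir=runs` then shows the logged scalars.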
[JIT] Attributes in ScriptModules
Attributes can be assigned on a ScriptModule by wrapping them with torch.jit.Attribute and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any parameters/buffers when you call torch.jit.save(), so they are a great way to store arbitrary state in your model. See the docs for more info.
Example:
```python
class Foo(torch.jit.ScriptModule):
    def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

    @torch.jit.script_method
    def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
```
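For instance, assuming the class above, the attributes are serialized along with the module (the file name is illustrative):

```python
m = Foo({'hello': 1, 'world': 2})
torch.jit.save(m, 'foo.pt')          # words and some_dict are saved with the module
loaded = torch.jit.load('foo.pt')
```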
[JIT] Dictionary and List Support in TorchScript
TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and for…in constructs.
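A small sketch of what this enables (the function names are made up for illustration):

```python
import torch
from typing import Dict, List

@torch.jit.script
def doubled(xs: List[float]) -> List[float]:
    # simple list comprehension
    return [2.0 * x for x in xs]

@torch.jit.script
def lookup(table: Dict[str, int], keys: List[str]) -> List[int]:
    out = torch.jit.annotate(List[int], [])
    for k in keys:            # for...in over a list
        out.append(table[k])  # dict indexing and list.append
    return out
```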
[JIT] User-defined classes in TorchScript (experimental)
For more complex stateful operations, TorchScript now supports annotating a class with @torch.jit.script. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.
```python
@torch.jit.script
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
```
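A brief, illustrative use of such a class from a scripted function (the function name is made up):

```python
@torch.jit.script
def sum_pair(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    p = Pair(a, b)
    return p.sum()
```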
DistributedDataParallel new functionality and tutorials
`nn.parallel.DistributedDataParallel`: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers. (19271).
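A rough sketch of wrapping a multi-GPU module (the module and shapes are illustrative, and it assumes the process group has already been initialized):

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super(TwoGPUModel, self).__init__()
        self.part1 = nn.Linear(16, 16).to('cuda:0')
        self.part2 = nn.Linear(16, 4).to('cuda:1')

    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))

# assumes torch.distributed.init_process_group(...) has already been called;
# device_ids is not passed when the wrapped module already spans devices
model = nn.parallel.DistributedDataParallel(TwoGPUModel())
```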
Breaking Changes
- `Tensor.set_`: the device of a Tensor can no longer be changed via `Tensor.set_`. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a `Storage` on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).
- Pay attention to the order change of `lr_scheduler.step()`. (7889).
- `torch.unique`: changed the default value of `sorted` to `True`. (15379).
- [JIT] Rename `isTensor` api -> `isCompleteTensor`. #18437
- [JIT] Remove GraphExecutor's python bindings. #19141
- [C++]: many methods on `Type` no longer exist; use the functional or Tensor method equivalent. (17991).
- [C++]: the `Backend` constructor of `TensorOptions` no longer exists. (18137).
- [C++, Distributed]: c10d `ProcessGroup::getGroupRank` has been removed. (19147).
New Features
Operators
- `torch.tril_indices`, `torch.triu_indices`: added operators with the same behavior as NumPy. (14904, 15203).
- `torch.combinations`, `torch.cartesian_prod`: added new itertools-like operators. (9393).
- `torch.repeat_interleave`: new operator similar to `numpy.repeat`. (18395).
- `torch.from_file`: new operator similar to `Storage.from_file`, but returning a tensor. (18688).
- `torch.unique_consecutive`: new operator with semantics similar to `std::unique` in C++. (19060).
- `torch.tril`, `torch.triu`, `torch.trtrs`: now support batching. (15257, 18025).
- `torch.gather`: add support for the `sparse_grad` option. (17182).
- `torch.std`, `torch.max_values`, `torch.min_values`, `torch.logsumexp` can now operate over multiple dimensions at once. (14535, 15892, 16475).
- `torch.cdist`: added operator equivalent to `scipy.spatial.distance.cdist`. (16168, 17173).
- `torch.__config__.show()`: reports detailed version of all libraries. (18579).
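A few of the new operators in action (a minimal sketch; the values are arbitrary):

```python
import torch

# indices of the lower triangle of a 3x3 matrix, as in numpy.tril_indices
idx = torch.tril_indices(3, 3)

# repeat each element, as in numpy.repeat
torch.repeat_interleave(torch.tensor([1, 2, 3]), 2)   # tensor([1, 1, 2, 2, 3, 3])

# itertools-like cartesian product of two 1-D tensors
torch.cartesian_prod(torch.tensor([1, 2]), torch.tensor([3, 4]))
```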
NN
- `nn.MultiheadAttention`: new module implementing multi-head attention from Attention Is All You Need. (18334).
- `nn.functional.interpolate`: added support for `bicubic`. (9849).
- `nn.SyncBatchNorm`: support synchronous Batch Normalization. (14267).
- `nn.Conv`: added support for circular padding via `mode='circular'`. (17240).
- `nn.EmbeddingBag`: now supports trainable `per_sample_weights`. (18799).
- `nn.EmbeddingBag`: add support for the `from_pretrained` method, as in `nn.Embedding`. (15273).
- RNNs: automatically handle unsorted variable-length sequences via `enforce_sorted`. (15225).
- `nn.Identity`: new module for easier model surgery. (19249).
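Two small sketches of the conveniences above (the modules and shapes are illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# nn.Identity makes model surgery easy: swap a head out for a no-op
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model[2] = nn.Identity()                  # model now returns the 4-d features

# variable-length sequences no longer need to be pre-sorted by length
padded = torch.randn(3, 5, 8)             # (batch, max_len, features)
lengths = torch.tensor([5, 2, 4])         # deliberately unsorted
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
```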
Tensors / dtypes
`torch.bool`: added support for the `torch.bool` dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).
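For instance:

```python
import torch

mask = torch.tensor([True, False, True], dtype=torch.bool)
print(mask.dtype)       # torch.bool, stored as 1 byte per element
arr = mask.numpy()      # converts to a NumPy array of dtype bool
back = torch.from_numpy(arr)
```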
Optim
- `optim.lr_scheduler.CyclicLR`: support for Cyclical Learning Rate and Momentum. (18001).
- `optim.lr_scheduler.CosineAnnealingWarmRestarts`: new scheduler implementing Stochastic Gradient Descent with Warm Restarts. (17226).
- Support multiple simultaneous LR schedulers. (14010).
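A minimal CyclicLR sketch (the optimizer settings and loop are illustrative):

```python
import torch
from torch import optim

model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1)

for step in range(100):
    optimizer.step()      # would normally follow loss.backward()
    scheduler.step()      # updates the learning rate (and momentum) each batch
```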
Distributions
torch.distributions: now support multiple inheritance. (16772).
Samplers
quasirandom.SobolEngine: new sampler. (10505).
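For example:

```python
from torch.quasirandom import SobolEngine

engine = SobolEngine(dimension=2, scramble=True, seed=42)
points = engine.draw(8)   # 8 quasi-random points in [0, 1)^2, shape (8, 2)
```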
DistributedDataParallel
nn.parallel.DistributedDataParallel: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). (18251, 18953).
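In current releases this is exposed through a `find_unused_parameters` flag; treat the exact flag and the variable names below as assumptions:

```python
import torch.nn as nn

# assumes torch.distributed.init_process_group(...) has been called and that
# `model` contains branches that do not run on every forward pass
ddp_model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],          # local_rank is illustrative
    find_unused_parameters=True,      # tolerate parameters unused in a given iteration
)
```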
TorchScript and Tracer
- Allow early returns from if-statements. (#154463)
- Add an `@ignore` annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)
- Simple `for...in` loops on lists. (#16726)
- Ellipses (`...`) in Tensor indexing. (#17763)
- `None` in Tensor indexing. (#18615)
- Support for basic list comprehensions. (#17267)
- Add implicit unwrapping of optionals on `if foo is not None`. (#15587)
- Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755)
- Implement `to()`, `cpu()`, and `cuda()` on ScriptModules. (#15340, #15904)
- Add support for various methods on lists: `clear()`, `pop()`, `reverse()`, `copy()`, `extend()`, `index()`, `count()`, `insert()`, `remove()`.
- Add support for `sort()` on lists of specialized type (Tensors, `int`, `float`, `bool`). (#19572)
- Add support for various methods on strings: `index()`, `slice()`, `len()`.
- Support `Tensor.to()` in TorchScript. (#15976)
- Support for `torch.tensor()` in TorchScript. (#14913, #19445)
- Support for `torch.manual_seed()` in TorchScript. (#19510)
- Support for `nn.LSTM` in TorchScript. (#15744)
- Support for `nn.init` in TorchScript. (#19640)
- Add `hash()` builtin. (#18258)
- Add `min()` and `max()` builtins for numerical types. (#15680)
- Add `isinstance()` builtin, which performs a static type check. (#15076)
- Add `train()` / `eval()` / `is_training()` to the C++ ScriptModule API. (#16044)
- Allow List arguments to Python functions called from TorchScript. (#15721)
- Allow using `std::vector` and `std::unordered_map` as arguments to custom operators. (#17587)
- Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
- Allow generic containers as ScriptModule inputs. (#16482)
- Allow `nn.Sequential` in ModuleList. (#16882)
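A small sketch combining two of the items above, early returns and implicit Optional unwrapping (the function name is made up):

```python
import torch
from typing import Optional

@torch.jit.script
def value_or_zeros(x: Optional[torch.Tensor]) -> torch.Tensor:
    if x is not None:
        return x          # early return; x is implicitly unwrapped to Tensor here
    return torch.zeros(1)
```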
Experimental Features
- [Quantization] (API unstable): added limited support for quantized datatypes via the `torch.qint8` dtype and the `torch.quantize_linear` conversion function. (18230).
- [MKLDNN tensor] (API unstable): added limited (opaque) support for `MKLDNN` tensors via `Tensor.to_mkldnn()`; operators are currently limited to ResNext101 operators. (17748).
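A hedged sketch of the MKLDNN conversion (assumes a PyTorch build with MKL-DNN enabled):

```python
import torch

x = torch.randn(1, 3, 224, 224)
y = x.to_mkldnn()       # opaque MKLDNN-backed tensor
z = y.to_dense()        # convert back to a regular dense CPU tensor
```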
Improvements
- `torch.min`, `torch.max`, `torch.median`, `torch.mode`, `torch.kthvalue`, `torch.symeig`, `torch.eig`, `torch.pstrf`, `torch.qr`, `torch.geqrf`, `torch.solve`, `torch.slogdet`, `torch.sort`, `torch.topk`, `torch.gels`, `torch.triangular_solve`, `torch.svd` now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).
- `torch.empty` (and other factory functions): now take a `pin_memory` kwarg; can now pin without going through the `torch.Storage` interface. (18455).
- `torch.histc`: now supported on CUDA. (15842)
- `torch.unique`: add `return_counts`. (18391, 18651).
- `torch.logspace`: add the ability to specify a `base`. (19542).
- `torch.set_printoptions`: added scientific notation support. (16876).
- `torch.btrifact` now handles tensors with greater than 3 dimensions. (14964).
- `torch.kthvalue`: now supported on CUDA. (17544).
- `torch.abs`: now supported on `uint8` and `int8` dtypes. (16893).
- `torch.stack`, `torch.cat`: now supported for CPU half tensors. (16389).
- `torch.cross`: added support for negative dimensions. (17582).
- `torch.lerp`: add support for `weight` as a Tensor. (17348).
- `torch.transpose`: made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).
- `torch.linspace`, `torch.logspace` can now be used with `steps=1` and `start != end`. (14748).
- `torch.cholesky`: changed the derivative from a triangular matrix to a symmetric matrix. (19116).
- `torch.lerp`: improved numerical stability. (18871).
- `torch.logdet`, `torch.slogdet`: improve numerical precision. (18449).
- `Tensor.__contains__` is now supported. (17733).
- `Tensor.fill_` and `torch.zeros` now support half on CPU. (17536).
- `Tensor.resize_as_`, `Tensor.view`: now supported on half CPU tensors. (18821).
- Tensor indexing: allow indexing via NumPy booleans. (14932).
- `nn.EmbeddingBag`: enable half precision dense backward. (19293).
- `nn.Embedding`: fix dense Embedding to work with double backwards. (9078).
- `nn.MaxPool1d`: allow lists and tuples to be passed as `output_size`. (16489).
- `nn.CTCLoss`: support zeroing infinite losses via the `zero_infinity` argument. (16199).
- `nn.Dropout`: add support for enabling during eval. (17549).
- `nn.MSELoss`: add warning about unexpected broadcasting. (18349).
- `nn.Module.load_state_dict`: also return `missing_keys` and `unexpected_keys`. (18668).
- `nn.parallel.data_parallel`: enforce devices match `device_ids`. (17129).
- `torch.device`: handle in more places that used to accept only device ordinals. (14929)
- `dtype.int8` tensors can now be converted to NumPy arrays. (14710).
- `nn.functional.gumbel_softmax`: allow multidimensional input with the `dim` argument. (13339).
- `nn.functional.cosine_similarity`: improved precision. (18250).
- `torch.autograd`: don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).
- `torch.autograd.profiler`: add Self (non-nested) CPU Time Total, CPU time total. (19378).
- `DataLoader`: support accepting a custom memory pinning function. (16743).
- `DataLoader`: retry libshm on EINTR. (15964).
- `DataLoader`: fixed an issue with `pin_memory` and `PackedSequence`. (18079)
- `data.utils.collate`, `data.utils.pin_memory`: now preserve namedtuples. (16440)
- Use `IndexError` instead of `RuntimeError` on many indexing error cases. (17049, 17114).
- Support indexing a `torch.float16` tensor on CPU. (17645).
- Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
- `utils.checkpoint.checkpoint`: support `None` as an argument to the checkpoint function. (17969).
- `torch.autograd`: added more information to the `one of the variables needed for gradient computation has been modified by an inplace operation` exception. (18523).
- `cuda.synchronize`: add a device argument. (19573).
- `cuda.reset_max_memory_*`: now supported. (15985).
- `distributions.Independent`: can now calculate KL Divergence. (17681).
- `torch.distributed.new_group`: now supports overriding the default backend. (18595).
- `torch.distributed.init_process_group`: will now propagate timeout to the underlying Store. (16571).
- [JIT] Preserve module hierarchy on traced modules. (#15101)
- [JIT] Add metadata for TracedModules. (#17311)
- [JIT] Improve portability of int and float checks. (#19532)
- [JIT] Preserve method parameter names during serialization. (#16750)
- [JIT] Add a correctness check for C++ types to custom operators. (#15247)
- [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
- [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
- [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
- [JIT Error Messages] Print suggestion to add `nn.Module` attributes to `__constants__` when they are used in TorchScript. (#18164)
- [JIT Error Messages] `torch.save()`: improve error message when you try to save a ScriptModule. (#15321)
- [JIT Error Messages] `torch.jit.save()`: improve error message when trying to save a model with Python code. (#16850)
- [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
- [JIT Error Messages] Better error when trying to add a Tensor to `__constants__`. (#16724)
- [JIT Error Messages] Better error when a module list isn't added to `__constants__`. (#17167)
- [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
- [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
- [C++] `nn::Module`: added Python interop. (13481).
- [C++] `autograd::profiler` is now supported. (16580)
- [C++] Allow detection of the C++ ABI flag for cpp extensions from available runtime information. (18994).
- [C++] `torch.argsort` is now supported in C++. (17099).
- [C++] `Tensor.isnan`: now supported in C++. (15722).
- [C++]: Added named submodule support to `nn::Sequential`. (17552).
- [C++]: Kaiming Initialization. (14718).
- [C++] `torch::data::transforms::Normalize`: now supported in C++. (15891).
- [C++]: Support call operator on module holder calling forward. (15831).
- [C++]: Random and Sequential distributed samplers. (16910).
- [C++]: Pretty printing of C++ Modules. (15326).
- [C++] Support serializing `std::vector<torch::Tensor>`. (19677).
Bug Fixes
Serious
- `torch.prod`: correct erroneous calculation on large tensors. (15653).
- `torch.mean` (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).
- `nn.Conv`: correctly handle non-contiguous inputs on the MKLDNN convolution codepath. (16300).
- `Tensor.eq_`: fix erroneous calculation. (15475).
- `torch.mean`: fix fp16 output calculation. (14878).
- `nn.PoissonNLLLoss`: properly handle `reduction=None`. (17358).
- [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
- [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).
Other
- `Tensor.round` is now consistently half to even. (17443).
- `Tensor.resize_`: fix some 0-element cases. (14874).
- `Tensor.numpy`: fix conversion of the `torch.int8` dtype. (15194).
- `Tensor.grad`: correctly handle `del`. (16525).
- `Tensor.clamp`: correctly handle NaN on CUDA. (15479).
- `Tensor.topk`: properly set launch bounds on CUDA. (17296).
- `Tensor.kthvalue`: treat NaN as bigger than any number. (17824).
- `Tensor.copy_`: properly synchronize on src and dst streams. (16966).
- Tensor indexing: fix incorrect dimension error message. (16495).
- `Tensor.coalesce`, `Tensor.clone`, `Tensor.to_dense`: fixed for sparse 0-dimensional tensors. (17379).
- `torch.isinf`: don't error out on integral tensors. (15489).
- `torch.argsort`, `torch.sort`: match NumPy by considering NaNs to be larger than any number. (15886).
- `torch.geqrf`, `torch.ormqr`: when an `out` parameter is specified, dispatch to the correct function. (16964).
- `torch.cuda.get_device_name` / `torch.cuda.get_device_capability`: fix handling of optional. (17222).
- `Tensor.tril_` / `Tensor.triu_`: properly reuse input memory. (17031).
- `torch.arange`: fix shape inconsistency between CPU and CUDA. (18462).
- `torch.empty` (and other size-based factory functions): properly enforce non-negative sizes. (17077).
- `torch.load`: support serializing / deserializing `pathlib.Path` objects. (18562).
- `nn.BatchNorm`: correctly handle very large batches. (17047).
- `nn.Softmax` / `nn.LogSoftmax`: fix double backward for `torch.half`. (17330).
- `nn.Softmax`: handle empty inputs in backward. (17259).
- `nn.NLLLoss`: fix crash when `ignore_index` is out-of-bounds on CPU. (17328).
- `nn.Softmax`, `nn.LogSoftmax`: handle 0-element inputs. (17651).
- `nn.CTCLoss`: correct error checking. (16269).
- `nn.Conv`: better report convolution size mismatch. (17436).
- `torch.nn.functional.cosine_similarity`: fix output sometimes returning a result > 1.0. (18168).
- `nn.parallel.data_parallel`: fix handling of buffers that require_grad. (13352).
- `nn.parallel.data_parallel`: would previously sometimes free tensors before all pending operations finished. (18465).
- `torch.distributed.broadcast`: fixed repeated calls leading to OOM. (19219).
- `torch.multiprocessing`: fix serialization of integer `nn.Parameter`s. (18639).
- `torch.multiprocessing`: fix handling of `distributions` on CUDA. (16854).
- `torch.nonzero`: fix for 0-dimensional tensors on CUDA. (17406).
- `torch.slogdet`: fix `sign` requiring grad when `input` required grad. (16337).
- `torch.cuda.Stream`: properly restore stream on destination device when switching devices. (17439).
- `torch.cuda.Stream`: fixed synchronization issue when used with a non-current device. (15689).
- `torch.cuda.Stream`: properly change device in the stream context manager. (16128).
- `DataLoader`: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).
- `DataLoader`: `_utils.collate.default_collate` now converts bool lists to byte Tensors, not integer tensors. (14669).
- `DataLoader`: ensure the dataset is indexed by integers. (17649).
- `torch.sparse.mm`: handle transposed dense tensors in backwards. (18737).
- `torch.sparse.sum`: fix parsing of `dim`. (16517).
- `torch.sparse.mm` / `torch.sparse.addmm`: fix broadcasting and using uninitialized data. (16572).
- `Tensor.to_sparse`: fix for 0-dimensional tensors. (17406).
- `SparseTensor`: fix add with non-contiguous `values` tensors. (18179).
- Fix `compare_exchange_weak` in `weak_intrusive_ptr`. (16302).
- `utils.model_zoo.load_url`: fix race condition. (16578).
- `utils.data.RandomSampler`: have `len` properly take into account `num_samples`. (15991).
- `torch.distributions`: fix precision issue with expansion that prefers `probs` over `logits`. (18614).
- `distributions.dirichlet.Dirichlet`: fixed an underflow issue. (17488).
- `distributions.binomial.Binomial.log_prob`: fixed a numerical stability issue. (15962).
- Caching Allocator: free all blocks with outstanding events on OOM-retry. (19222).
- `torch.dtype`: fix pickling issue with Python 2. (18045).
- `utils.data.DataLoader`: fix SIGCHLD checking. (19421).
- `optim.Optimizer`: properly copy defaults. (19308).
- `optim.lr_scheduler.CosineAnnealingLR`: fix division-by-zero error. (19180).
- `optim.lr_scheduler.ReduceLROnPlateau`: fix bug when the argument to `step` is reused outside the function. (16697).
- cuDNN: fix race condition with multiple threads calling into the same device. (15080).
- cuDNN: properly specify accumulation types. (16825).
- cuDNN: fix incorrectly selecting slower algorithms in certain cases. (15881).
- cuFFT: properly handle CUDA contexts. (19300)
- Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
- Fix tensor printing bug with Python 2. (12732).
- MKLDNN: fix thread safety. (17022).
- [JIT] `floordiv`: fix integer division and divide-by-zero semantics. (#15813).
- [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
- [JIT] `ord()`: fix handling of utf8 chars. (#19423).
- [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
- [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
- [JIT] Fix infinite loop in the `requires_grad` analysis pass. (#18361).
- [JIT] Fix ordering of parameters in `rnn.py`. (#18198).
- [JIT] Fix contiguous autodiff and AutoGradZero inconsistency. (#18633).
- [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
- [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
- [JIT] Fix bug where `_unique_state_dict` could contain duplicate Tensors. (#18139).
- [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
- [C++]: Add `Stream` and `Event` APIs. (15937).
- [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
- [C++]: Properly pass `extra_cuda_cflags` to C++ extensions on Windows. (18638).
- [C++] Make SGD semantics match Python. (15840).
- [C++] `torch::nn::init::orthogonal_`: match the Python API. (18915).
Deprecations
- `torch.btrifact`: the deprecated `info` argument has been removed. (14935).
- `torch.potrs` has been deprecated; use `torch.cholesky_solve` instead. Note that `upper` defaults to `False` for `torch.cholesky_solve`, and `True` for `torch.potrs`. (15334).
- `torch.pstrf` is deprecated; use `torch.cholesky` instead. Note that `upper` defaults to `False` for `torch.cholesky`, and `True` for `torch.pstrf`. (17866).
- `torch.potri` is deprecated; use `torch.cholesky_inverse` instead. Note that `upper` defaults to `False` for `torch.cholesky_inverse`, and `True` for `torch.potri`. (19498).
- `torch.btrifact_with_info` has been deprecated; use `torch.lu` with `get_infos=True` instead. (18435).
- `torch.btrifact` has been deprecated; use the new name `torch.lu` instead. (18435).
- `torch.gesv` is deprecated; use the new name `torch.solve` instead. (18060).
- `torch.trtrs` has been deprecated; use the new name `torch.triangular_solve` instead. (18213).
- `torch.btriunpack` has been deprecated; use the new name `torch.lu_unpack` instead. (18529).
- `torch.btrisolve` has been deprecated; use the new name `torch.lu_solve` instead. (18726).
- [C++] `IntList` has been deprecated; use `IntArrayRef` instead, as it better describes the type and ownership semantics in C++. (16751).
- [C++] Dispatch macros with `Type` parameters, e.g. `AT_DISPATCH_ALL_TYPES(tensor.type(), ...`, are now deprecated; use `ScalarType` instead, e.g. `AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), ...`. (17527, 17996).
- [C++] The deprecated `variable_tensor_functions` have been removed. (15003).
Performance
Highlights
- `nn.BatchNorm` CPU inference speed increased up to ~19x. (19152).
- `nn.AdaptiveAvgPool`: speed up the common case of size=1 output by ~30x. (17011).
- `nn.EmbeddingBag` CPU performance increased by ~4x. (19329).
- `Tensor.copy_`: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).
- `torch.nonzero`: is now ~2x faster than NumPy on CPU. (15190)
- Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
- Reduction functions: speed up some large Tensor cases by 50-80%. (17428).
- [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
- [JIT] Graph fuser: `batch_norm` fusion for inference. (#15146)
- [JIT] Graph fuser: `layer_norm` fusion for inference. (#18266)
Other
- `torch.abs`, `torch.frac`, `torch.reciprocal`, `torch.neg` have been vectorized and parallelized. (19041).
- `torch.bmm`: CPU performance increased by 2x. (19338).
- `torch.sort`: CUDA performance increased by ~2x. (19379).
- `torch.cat` on CPU is now ~4x faster in the case where inputs are contiguous and `dim != 0`. (17032).
- `torch.multinomial`: fixed a 2x performance regression. (17121).
- `torch.empty` (and other factory functions): reduce overhead by 20-40%. (17565).
- `torch.linspace` has been parallelized on CPU. (15320).
- `torch.logspace` has been parallelized on CPU. (15438).
- `torch.range` has been parallelized on CPU. (15484).
- `torch.arange` has been parallelized on CPU. (15667).
- `torch.load`: avoid unnecessary CPU-to-CUDA copy. (17297).
- Reduction functions: improve efficiency on CUDA. (16224, 17040).
- Speed up some GEMM cases on CPU by up to 7x. (17730)
- Tensor iterator loop unrolling. (17667).
- Sparse/dense matrix multiply: improve speed by ~5x. (16905).
- `distributions.MultivariateNormal`: sped up. (17294).
- [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion. (#19324)
- [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
- [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
- [JIT] Shape analysis: various correctness improvements. (#18271)
- [JIT] Shape analysis: `aten::_convolution` now participates in shape analysis. (#16837)
- [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
- [JIT] Autodiff: support for scalar comparison ops and `rand_like`. (#14740)
- [JIT] Autodiff: support for `adaptive_avg_pool2d`. (#15459)
- [JIT] Autodiff: support for `erf` and `erfc`. (#15139)
- [JIT] Autodiff: support for `layernorm`. (#17702)
- [JIT] Autodiff: support for `tanh`. (#17816)
- [JIT] Autodiff: support for `matmul` / `dropout`. (#17523)
- [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
- [JIT] Constant folding: improved inlining of control flow. (#16244)
Documentation
- `Tensor.scatter_`: add documentation about the `value` parameter. (17467).
- `Tensor.unfold`: correctly document the `dimension` parameter, not `dim`. (19020).
- `Tensor.is_floating_point()` is now documented. (15704).
- `torch.cholesky`: fix broken `upper` example in documentation. (15215).
- `torch.gesv`: document the `out` parameter. (15649).
- `torch.mul`: better explain elementwise multiplication. (15664).
- `torch.eig`, `torch.symeig`: better explain backwards limitations. (15929).
- `torch.ormqr`: fixed output specification. (15694).
- `torch.from_numpy`: replaced usage with `torch.as_tensor` in documentation. (16587).
- `torch.mvlgamma`: fix the constant in the docs. (17045).
- `torch.mode`: more precisely describe what is returned. (17069).
- `torch.upsample`: documentation now matches `torch.interpolate`. (17134)
- `torch.arange`: correct `dtype` documentation. (18604)
- `torch.cumprod`: document the `out` parameter. (19340).
- `torch.nonzero`: document indices being returned lexicographically. (19539).
- `torch.nn.functional.interpolate`: better explain the `align_corners` parameter. (14806).
- `torch.nn.functional.pad`: documentation has been made consistent with other functional ops. (15984).
- `nn.functional.grid_sample`: clarify behavior of padding. (19754).
- `nn.TripletMarginLoss`: correct type of the `swap` parameter. (18115).
- `nn.CrossEntropyLoss`: clarify `ignore_index` documentation. (18117).
- `nn.CrossEntropyLoss`: the input format is more clearly explained. (15990).
- `nn.CTCLoss`: clarify a number of ambiguities. (18415).
- `nn.BCEWithLogitsLoss`: add better explanation. (19212).
- `nn.BCEWithLogitsLoss`: better explain positive samples. (17258).
- `nn.ModuleList` / `nn.ParameterList`: update documentation. (17731).
- `nn.Module.load_state_dict`: correct semantics of `strict`. (17618)
- `nn.parallel.DataParallel`: more accurately specify how different argument types are handled. (15993).
- `nn.parallel.DistributedDataParallel`: clarified batch size requirements. (16010).
- `torch.distributed`: document mixed-precision training. (15440).
- `torch.multiprocessing`: include example multiprocessing code. (16345).
- `torch.autograd`: better explain computing the Jacobian-vector product. (15197).
- `torch.cuda.get_rng_state`, `torch.cuda.set_rng_state`: document taking a `device` object. (14324).
- `torch.device`: fix example of passing a `device` to a tensor factory. (16839).
- `DataLoader`: update documentation to describe how workers are managed. (18091).
- Unified shape formats throughout the documentation. (15741).
- Update documentation for `reduction` arguments to use non-deprecated format. (17300).
- `mark_non_differentiable`: document correct semantics. (17891).
- Warn about memory overlaps on inplace operations. (17576).
- Fix a number of small issues with conv and pooling docstrings. (17052).
- Fix a number of small issues with padding and activation docstrings. (17197).
- [C++]: mention packed accessors in Tensor basics. (19464).
ONNX
Exporting More Torch Operators to ONNX
- Export torch.isnan to ONNX (17698).
- Export torch.flatten to ONNX (16240).
- Export torch.where, torch.ceil, torch.floor to ONNX (18571).
- Export torch.narrow to ONNX (17550).
- Export torch.argmax and torch.argmin to ONNX (17382, 18264, 18261).
- Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
- Export torch.nonzero to ONNX (17036, 18047).
- Export torch.erf to ONNX (16106).
- Export torch.split (15092).
- Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX (15677).
- Export torch.expand and torch.ne to ONNX (15050).
- Export torch.nn.LogSigmoid to ONNX (14830).
- Export torch.nn.RReLU to ONNX (14781).
- Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
- Replace use of ConstantLike with ConstantOfShape (16095, 16214).
Extending Existing Exporting Logic
- Enable dim support in torch.nn.Softmax's export (18482).
- Support exporting squeeze & unsqueeze with negative dim attribute (19297).
- Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
- Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
- Support ceil_mode in max_pool_1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).
Optimizing Exported ONNX Graph
- Add constant folding in ONNX exporter (18698).
- Retain the parameter names in ONNX exporter (17551).
- Omit slice op if it is a non-op (19155).
- Add a flag to strip doc_string from exported ONNX models (18882).
- Omit torch.dropout if the model is in eval mode (16547).
Adding Utility Functions and Refactoring
- Remove unused arg f from _model_to_graph(). (19647).
- Add support for stable ONNX opsets in the exporter (16068, 17419).
- Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
- Add a utility function to check whether it's in the middle of ONNX export or not (19050).
- Refactor serialization of ONNX initializers to be name-based (17830).
- Expose dim() on type and use it in ONNX symbolics (15933).
- Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
- Add an assertion to check the number of the parameters passed to ONNX exporter (18145).
Bugfixes
- Fix a bug caused by different types in rsub (15707).
- Fix support for list structures in the ONNX exporter (19102).
- Fix case for the `activations` attribute in nn.RNN ONNX export. (19368).
- Minor fix for ONNX ConstantOfShape export (18199).
- Fix the export of torch.(reduce)min and torch.(reduce)max (15241).
- Fix ONNX export of logical ops to have correct output datatype (15185).
- Fix typo in docstring (18216).
