Thanks to the combination of state-of-the-art accelerators and highly optimized open software frameworks, there has been tremendous progress in the performance of deep neural networks. While these developments have been responsible for many breakthroughs, progress towards solving large-scale problems, such as video encoding and semantic segmentation in 3D, is hampered because access to on-premise memory is often limited. Instead of relying on (optimal) checkpointing or invertibility of the network layers -- to recover the activations during backpropagation -- we propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique. Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint. Even though the randomized trace estimation introduces stochasticity during training, we argue that this is of little consequence as long as the induced errors are of the same order as errors in the gradient due to the use of stochastic gradient descent. We discuss the performance of networks trained with stochastic backpropagation and how the error can be controlled while maximizing memory usage and minimizing computational overhead.
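As context for the randomized trace estimation mentioned above, the following is a minimal, generic sketch of a Hutchinson-style estimator in Python. It is not the paper's multi-channel variant for convolutional-layer gradients; it only illustrates the core idea the abstract relies on: the trace of a linear operator can be estimated from a handful of matrix-vector products with random probe vectors, so the operator never has to be formed or stored explicitly. The function name and parameters are illustrative assumptions, not the authors' API.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes=16, rng=None):
    """Estimate tr(A) for an n-by-n linear operator given only A @ v products.

    matvec     -- callable computing A @ v for a vector v of length n
    num_probes -- number of random Rademacher probes (more probes, lower variance)
    """
    rng = np.random.default_rng() if rng is None else rng
    estimate = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe, E[z z^T] = I
        estimate += z @ matvec(z)             # z^T A z is an unbiased sample of tr(A)
    return estimate / num_probes

# Usage: estimate the trace of a matrix accessed only through mat-vec products.
A = np.random.default_rng(0).standard_normal((512, 512))
A = A @ A.T                                   # symmetric test matrix
approx = hutchinson_trace(lambda v: A @ v, n=512, num_probes=64)
print(approx, np.trace(A))                    # close up to sampling error
```

Because each probe is cheap and the probes are independent, the number of probes directly trades accuracy of the gradient approximation against memory and compute, which is the knob the abstract refers to when discussing how the induced error can be kept on the order of the stochastic-gradient noise.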