U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets remain understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results proving that average pooling corresponds to projection within the space of square-integrable functions, and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes flowing from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. Exploiting the properties of our continuous-time formulation, we use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models.
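As a minimal numerical sketch of the average-pooling claim (not taken from the paper's codebase), the snippet below checks that 2x2 average pooling of a toy image reproduces the single-level 2D Haar approximation coefficients up to a constant factor, using the PyWavelets (`pywt`) package; the array `x` and all names are illustrative assumptions.

```python
# Sketch: average pooling agrees with the Haar low-pass (approximation)
# coefficients up to normalisation, consistent with pooling being a
# projection onto a coarser Haar scaling space in L^2.
import numpy as np
import pywt  # PyWavelets, assumed available; not part of the paper's code

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))  # a toy "image" on a dyadic grid

# 2x2 average pooling: mean over non-overlapping 2x2 blocks.
pooled = x.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# Single-level 2D Haar transform; cA holds the approximation coefficients.
cA, _ = pywt.dwt2(x, "haar")

# The orthonormal Haar low-pass filter contributes a factor 1/sqrt(2) per
# axis, so in 2D cA equals the 2x2 block sums divided by 2, i.e. pooling
# matches cA / 2 exactly.
assert np.allclose(pooled, cA / 2)
print("average pooling == Haar approximation coefficients / 2")
```

The constant factor is pure normalisation: pooling divides each 2x2 block sum by 4, while the orthonormal Haar transform divides it by 2, so the two operations span the same coarse subspace.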