This paper focuses on the Winograd transformation in 3D convolutional neural networks (CNNs), which are more heavily over-parameterized than their 2D counterparts. The ever-growing number of Winograd parameters not only exacerbates training complexity but also barricades practical speedups, simply because of the volume of element-wise products in the Winograd domain. We attempt to reduce the trainable parameters by introducing a low-rank Winograd transformation, a novel training paradigm that decouples the original large tensor into two trainable tensors with much smaller storage requirements, leading to a significant reduction in complexity. Built upon our low-rank Winograd transformation, we go one step further by proposing a low-rank oriented sparse granularity that measures column-wise parameter importance. By involving only the non-zero columns in the element-wise product, our sparse granularity produces a highly regular sparse pattern that yields effective Winograd speedups. To demonstrate the efficacy of our method, we perform extensive experiments on 3D CNNs. The results show that our low-rank Winograd transformation clearly outperforms the vanilla Winograd transformation, and that our proposed low-rank oriented sparse granularity permits practical Winograd acceleration compared with the vanilla counterpart.
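To make the parameter arithmetic concrete, the following is a minimal PyTorch-style sketch of the idea described above. The tile size, channel counts, rank, and the L1-norm column-importance scoring are all illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# For 3D Winograd F(2^3, 3^3) the transformed kernel lives in a 4x4x4 tile,
# so the vanilla trainable Winograd-domain weight is a tensor of shape
# (t^3, C_out, C_in) with t = 4, i.e. 64 * C_out * C_in parameters.
t3, c_out, c_in, rank = 4 ** 3, 64, 64, 8

# Vanilla Winograd training: one large trainable tensor.
w_full = nn.Parameter(torch.randn(t3, c_out, c_in) * 0.02)

# Low-rank training: two smaller trainable tensors whose product rebuilds the
# Winograd-domain weights on the fly. The parameter count drops from
# t3 * c_out * c_in to t3 * rank * (c_out + c_in) when rank << min(c_out, c_in).
u = nn.Parameter(torch.randn(t3, c_out, rank) * 0.02)
v = nn.Parameter(torch.randn(t3, rank, c_in) * 0.02)

def winograd_weights() -> torch.Tensor:
    # Batched matrix product over the t^3 tile positions: (t3, c_out, c_in).
    return torch.bmm(u, v)

# Column-wise importance (one illustrative reading of the sparse granularity):
# score each rank column by its L1 norm and keep only the strongest half,
# giving a regular sparse pattern in the element-wise product.
scores = u.abs().sum(dim=(0, 1))         # one score per rank column
keep = scores.topk(k=rank // 2).indices  # indices of retained columns
u_sparse, v_sparse = u[:, :, keep], v[:, keep, :]

print(winograd_weights().shape)  # torch.Size([64, 64, 64])
```

Because whole columns are either kept or dropped, the retained factors stay dense and contiguous, which is what makes the sparse pattern regular enough to translate into real speedups rather than irregular element-level skipping.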