We propose WaveMix -- a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. WaveMix networks achieve accuracy comparable to or better than that of state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks, establishing new benchmarks for segmentation on Cityscapes and for classification on Places-365, five EMNIST datasets, and iNAT-mini. Remarkably, WaveMix architectures require fewer parameters to achieve these benchmarks than the previous state-of-the-art. Moreover, when controlled for the number of parameters, WaveMix requires less GPU RAM, which translates to savings in time, cost, and energy. To achieve these gains, we use a multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) it reorganizes spatial information based on three strong image priors -- scale-invariance, shift-invariance, and sparseness of edges; (2) it does so in a lossless manner without adding parameters; (3) it reduces the spatial sizes of feature maps, which reduces the memory and time required for forward and backward passes; and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability. Our code and trained models are publicly available.
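To make the lossless, parameter-free, and size-reducing properties of the 2D-DWT concrete, the following is a minimal sketch of a single-level 2D Haar transform and its inverse in plain Python. The choice of the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the toy 4x4 input are assumptions made for illustration; this is not the WaveMix implementation, which operates on multi-channel feature maps inside a network.

```python
# Illustrative single-level 2D Haar DWT on a nested list.
# Assumption: Haar wavelet, even height and width; not the authors' code.

def haar_dwt2(x):
    """Split an H x W image into four (H/2) x (W/2) sub-bands."""
    h, w = len(x), len(x[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]  # local averages
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]  # top row minus bottom row
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]  # left column minus right column
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]  # diagonal difference
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2
            lh[i // 2][j // 2] = (a + b - c - d) / 2
            hl[i // 2][j // 2] = (a - b + c - d) / 2
            hh[i // 2][j // 2] = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the full-resolution image."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, t, u, v = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + t + u + v) / 2
            x[2 * i][2 * j + 1] = (s + t - u - v) / 2
            x[2 * i + 1][2 * j] = (s - t + u - v) / 2
            x[2 * i + 1][2 * j + 1] = (s - t - u + v) / 2
    return x


# Toy 4x4 "image" with pixel values 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
bands = haar_dwt2(image)          # four 2x2 sub-bands, no learned weights
restored = haar_idwt2(*bands)     # reconstruction from the sub-bands
```

Each output coefficient summarizes a 2x2 input patch, so the four sub-bands together hold exactly as many values as the input at half the spatial resolution. Applying `haar_dwt2` again to the `ll` band would give a second level, in which each coefficient covers a 4x4 patch of the original image.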