This paper compares the performance of a neural network (NN) that takes the discrete cosine transform (DCT) of an image patch as input with that of LeNet on classifying MNIST handwritten digits. The basis functions underlying the DCT bear a passing resemblance to some of the learned basis functions of the Visual Transformer, but are an order of magnitude faster to apply.
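To make the input representation concrete, here is a minimal sketch of how a 2-D DCT could turn an image patch into a feature vector for a classifier. It is an illustration only: the summary does not state the paper's patch size, DCT variant, normalization, or which coefficients are kept, so the orthonormal DCT-II and the helper names `dct_matrix`, `dct2`, and `dct_features` are assumptions made here.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows.

    Row k holds the k-th cosine basis function sampled at n points.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(patch):
    """2-D DCT-II of a square patch, computed as C @ patch @ C^T."""
    c = dct_matrix(len(patch))
    return matmul(matmul(c, patch), transpose(c))


def dct_features(patch):
    """Flatten the DCT coefficients row by row into one feature vector.

    Assumption: all coefficients are kept; the paper may keep only a subset.
    """
    return [v for row in dct2(patch) for v in row]


if __name__ == "__main__":
    # A constant 4x4 patch has all of its energy in the first (DC) coefficient.
    flat_patch = [[1.0] * 4 for _ in range(4)]
    print(dct_features(flat_patch)[:4])
```

Because the basis is fixed rather than learned, the transform reduces to two small matrix products per patch, which is consistent with the claim that it is cheap to apply.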