The manuscript describes fast and scalable architectures and associated algorithms for computing 2D convolutions and cross-correlations. The basic idea is to map a 2D convolution or cross-correlation to a collection of 1D convolutions or cross-correlations in the transform domain. This is accomplished through the Discrete Periodic Radon Transform (DPRT) for general kernels and through SVD-LU decompositions for low-rank kernels. The resulting architectures are scalable and fit into modern FPGA and Zynq-SOC devices. Depending on the amount and types of available resources, 2D convolutions and cross-correlations of $P\times P$ blocks can be computed in anywhere from $O(P)$ to $O(P^2)$ clock cycles. This gives a trade-off between performance and the number and types of required resources. We provide implementations of the proposed architectures on modern programmable devices (Virtex-7 and Zynq-SOC). Accounting for the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.
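The DPRT mapping can be checked in software. The following is a minimal Python sketch, not the authors' hardware architecture: it assumes a prime block size $N$ and periodic (circular) convolution, and the helper names (`dprt`, `idprt`, `circ_conv_1d`, `circ_conv_2d`, `conv_via_dprt`) are hypothetical. Each of the $N+1$ DPRT projections of the output equals the 1D circular convolution of the corresponding projections of the image and the kernel, so the 2D convolution reduces to $N+1$ independent 1D convolutions followed by an inverse DPRT.

```python
# Minimal sketch: 2D circular convolution via the Discrete Periodic Radon
# Transform (DPRT), assuming a prime N. Function names are illustrative only.

def dprt(f):
    """Forward DPRT of an N x N image (N prime): N + 1 projections of length N."""
    n = len(f)
    proj = []
    for m in range(n):
        # Projection m: R(m, d) = sum_i f(i, <d + m*i>_N)
        proj.append([sum(f[i][(d + m * i) % n] for i in range(n)) for d in range(n)])
    # Projection N: R(N, d) = sum_j f(d, j), i.e. the row sums
    proj.append([sum(f[d]) for d in range(n)])
    return proj


def idprt(r):
    """Inverse DPRT: f(i,j) = (sum_{m<N} R(m, <j - m*i>_N) - S + R(N, i)) / N."""
    n = len(r) - 1
    total = sum(r[n])  # S: sum of all pixels
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = sum(r[m][(j - m * i) % n] for m in range(n))
            # The numerator equals N * f(i, j), so integer division is exact.
            row.append((acc - total + r[n][i]) // n)
        out.append(row)
    return out


def circ_conv_1d(a, b):
    """1D circular convolution: c(d) = sum_t a(t) * b(<d - t>_N)."""
    n = len(a)
    return [sum(a[t] * b[(d - t) % n] for t in range(n)) for d in range(n)]


def circ_conv_2d(f, h):
    """Reference: direct 2D circular convolution g(i,j) = sum f(k,l) h(i-k, j-l)."""
    n = len(f)
    return [[sum(f[k][l] * h[(i - k) % n][(j - l) % n]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


def conv_via_dprt(f, h):
    """Transform both inputs, convolve matching projections in 1D, then invert."""
    rf, rh = dprt(f), dprt(h)
    return idprt([circ_conv_1d(a, b) for a, b in zip(rf, rh)])


if __name__ == "__main__":
    f = [[(3 * i + 2 * j) % 7 for j in range(5)] for i in range(5)]
    h = [[1 if (i + j) % 4 == 0 else 0 for j in range(5)] for i in range(5)]
    print(conv_via_dprt(f, h) == circ_conv_2d(f, h))  # True
```

The $N+1$ projection-wise 1D convolutions are independent, which is what makes the transform-domain formulation attractive for parallel hardware; here they simply run in a loop. For linear (non-periodic) convolution of $P\times P$ blocks, both inputs would need to be zero-padded to a prime $N \ge 2P-1$ so that the circular result contains the full linear result.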