Incorporating geometric transformations that reflect the relative position changes between an observer and an object into computer vision and deep learning models has attracted much attention in recent years. However, existing approaches focus mainly on the affine transformation, which is insufficient to capture such geometric position changes. Furthermore, current solutions typically apply a neural network module to learn a single transformation matrix, which not only ignores the importance of multi-view analysis but also introduces extra trainable parameters beyond the transformation matrix itself, increasing model complexity. In this paper, a perspective transformation layer is proposed in the context of deep learning. The proposed layer can learn homography, and therefore reflects the geometric relationship between observers and objects. In addition, by training its transformation matrices directly, a single proposed layer can learn an adjustable number of viewpoints without introducing any module parameters. Experiments and evaluations confirm the superiority of the proposed layer.
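The core idea above, a layer whose only trainable parameters are a set of 3×3 homography matrices, each producing one warped view of the input, can be sketched as follows. This is a hypothetical NumPy mock-up, not the authors' implementation: the class name `PerspectiveTransformLayer`, the near-identity initialization, and the nearest-neighbour sampling are all illustrative assumptions, and a real version would live inside an autodiff framework so the matrices receive gradients.

```python
import numpy as np

class PerspectiveTransformLayer:
    """Illustrative sketch: K learnable 3x3 homography matrices.

    Each matrix has its bottom-right entry fixed to 1, leaving 8 free
    parameters per view, so the matrices themselves are the layer's
    only parameters (no auxiliary network module).
    """

    def __init__(self, num_views, rng=None):
        rng = rng or np.random.default_rng(0)
        # Start near the identity so training begins from the unwarped view
        # (an assumed initialization, not taken from the paper).
        self.H = np.tile(np.eye(3), (num_views, 1, 1))
        self.H += 0.01 * rng.standard_normal((num_views, 3, 3))
        self.H[:, 2, 2] = 1.0  # fixed scale: 8 free parameters per view

    def __call__(self, img):
        """img: (H, W) array -> (num_views, H, W) warped views."""
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        ones = np.ones_like(xs)
        coords = np.stack([xs.ravel(), ys.ravel(), ones.ravel()])  # (3, H*W)
        out = np.empty((len(self.H), h, w))
        for k, Hk in enumerate(self.H):
            # Inverse warp: for each output pixel, find its source location
            # in the input image under the k-th homography.
            src = np.linalg.inv(Hk) @ coords
            sx = src[0] / src[2]  # perspective divide
            sy = src[1] / src[2]
            # Nearest-neighbour sampling with zero padding outside the image
            # (bilinear sampling would be used for differentiability).
            ix = np.rint(sx).astype(int)
            iy = np.rint(sy).astype(int)
            valid = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
            view = np.zeros(h * w)
            view[valid] = img[iy[valid], ix[valid]]
            out[k] = view.reshape(h, w)
        return out

layer = PerspectiveTransformLayer(num_views=4)
views = layer(np.arange(36, dtype=float).reshape(6, 6))
print(views.shape)  # one warped view per homography
```

Because each view corresponds to one independently trained matrix, the number of viewpoints is just a constructor argument, which is how a single layer can cover an adjustable number of perspectives.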