The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
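The core idea — conditioning attention on pairwise relative poses so that no global reference frame is needed — can be illustrated with a minimal single-head sketch in NumPy. Here the relative pose between tokens i and j is embedded as a learned scalar bias on the attention logits; the function names, the flattened-pose weight `w_pose`, and the exact injection point are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_pose_attention(tokens, poses, w_q, w_k, w_v, w_pose):
    """Single-head attention with a pairwise relative-pose bias.

    tokens: (n, d) token features; poses: (n, 4, 4) camera poses.
    The bias depends only on the relative poses inv(T_i) @ T_j, so the
    output is unchanged when all poses are premultiplied by one global
    rigid transform, i.e. when the reference frame is changed.
    """
    n, d = tokens.shape
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    inv = np.linalg.inv(poses)                    # (n, 4, 4)
    # Pairwise relative poses: rel[i, j] = inv(T_i) @ T_j
    rel = np.einsum('iab,jbc->ijac', inv, poses)  # (n, n, 4, 4)
    bias = rel.reshape(n, n, 16) @ w_pose         # scalar bias per pair
    logits = (q @ k.T) / np.sqrt(d) + bias
    return softmax(logits, axis=-1) @ v
```

Because `inv(G @ T_i) @ (G @ T_j) = inv(T_i) @ T_j` for any rigid transform `G`, re-expressing all camera poses in a different global frame leaves the attention bias, and hence the output, bit-identical — the invariance the abstract claims.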