We propose a differentiable rendering algorithm for efficient novel view synthesis. By departing from volume-based representations in favor of a learned point representation, we improve on existing methods by more than an order of magnitude in memory and runtime, both in training and inference. The method begins with a uniformly sampled random point cloud and learns per-point position and view-dependent appearance, using a differentiable splat-based renderer to evolve the model to match a set of input images. Our method is up to 300x faster than NeRF in both training and inference, with only a marginal sacrifice in quality, while using less than 10~MB of memory for a static scene. For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at a near-interactive rate, while maintaining high image quality and temporal coherence, even without imposing any temporal-coherence regularizers.
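To make the described optimization concrete, the following is a minimal sketch (not the authors' implementation) of fitting a randomly initialized point cloud to images through a differentiable splat-based renderer. The renderer, hyperparameters, and the `splat_render` helper are illustrative assumptions; per-point view-dependent appearance is simplified to a plain RGB color, and a random image stands in for a posed input view.

```python
# Hypothetical sketch of point-based differentiable rendering:
# learnable per-point positions and colors are splatted to an image
# and optimized to match a target view. Not the paper's actual code.
import torch

H, W = 64, 64          # render resolution (illustrative)
N = 512                # number of points (illustrative)

# Learnable per-point state: 3D position and an RGB color.
points = torch.nn.Parameter(torch.rand(N, 3) * 2 - 1)   # uniform in [-1, 1]^3
colors = torch.nn.Parameter(torch.rand(N, 3))

def splat_render(points, colors, focal=64.0, sigma=1.0):
    """Differentiable splatting: project points with a pinhole camera and
    accumulate each point's color into nearby pixels with Gaussian weights."""
    z = points[:, 2] + 3.0                        # push points in front of the camera
    u = focal * points[:, 0] / z + W / 2          # pixel coordinates
    v = focal * points[:, 1] / z + H / 2
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    # (N, H, W) Gaussian footprint of every point over the image plane.
    w = torch.exp(-((xs - u[:, None, None]) ** 2 +
                    (ys - v[:, None, None]) ** 2) / (2 * sigma ** 2))
    num = (w[..., None] * colors[:, None, None, :]).sum(0)   # weighted color sum
    den = w.sum(0)[..., None] + 1e-8
    return num / den                                          # (H, W, 3) image

optimizer = torch.optim.Adam([points, colors], lr=1e-2)
target = torch.rand(H, W, 3)    # stand-in for one posed input image

for step in range(200):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(splat_render(points, colors), target)
    loss.backward()             # gradients flow to both positions and colors
    optimizer.step()
```

In the full method, this loop would run over many posed views with view-dependent appearance instead of a fixed RGB color; the sketch only shows why a splat-based renderer lets image-space losses drive per-point geometry and appearance directly, without a volumetric network.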