Feed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without the need for per-scene 3DGS optimization. However, their scalability is fundamentally constrained by the limited capacity of their models, leading to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight, architecture-agnostic module that efficiently compresses multi-view inputs into a compact latent state $Z$, which retains essential scene information while discarding redundancy. Concretely, ZPressor enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80GB GPU. It does so by partitioning the views into anchor and support sets and using cross attention to compress the information from the support views into the anchor views, forming the compressed latent state $Z$. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance with a moderate number of input views and enhances robustness in dense-view settings on two large-scale benchmarks, DL3DV-10K and RealEstate10K. The video results, code, and trained models are available on our project page: https://lhmd.top/zpressor.
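The anchor/support compression step described above can be illustrated with a minimal NumPy sketch. It is not the ZPressor implementation. The names `split_views` and `compress_views` are hypothetical. The evenly spaced anchor selection rule is an assumption, and the randomly initialized projection matrices stand in for learned weights. The sketch shows the core idea: anchor-view tokens act as queries, support-view tokens act as keys and values, and the attended result is added back to the anchors. The output $Z$ therefore has the size of the anchor set rather than the full input set.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_views(num_views, num_anchors):
    # Hypothetical rule (assumption): evenly spaced view indices become anchors,
    # and every remaining view becomes a support view.
    anchor_idx = np.unique(np.linspace(0, num_views - 1, num_anchors).round().astype(int))
    support_idx = np.setdiff1d(np.arange(num_views), anchor_idx)
    return anchor_idx, support_idx

def cross_attention(q_tokens, kv_tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product cross attention: queries from anchors,
    # keys/values from support views.
    q, k, v = q_tokens @ w_q, kv_tokens @ w_k, kv_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def compress_views(view_tokens, num_anchors, rng):
    # view_tokens: (V, N, D) per-view token features from some encoder.
    V, N, D = view_tokens.shape
    anchor_idx, support_idx = split_views(V, num_anchors)
    anchors = view_tokens[anchor_idx].reshape(-1, D)   # (A*N, D)
    if len(support_idx) == 0:
        # Nothing to compress: Z is just the anchor tokens.
        return anchors.reshape(len(anchor_idx), N, D).copy()
    support = view_tokens[support_idx].reshape(-1, D)  # (S*N, D)
    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    # Residual update: support information is folded into the anchor tokens.
    z = anchors + cross_attention(anchors, support, w_q, w_k, w_v)
    return z.reshape(len(anchor_idx), N, D)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((36, 64, 32))  # 36 views, 64 tokens each, dim 32
z = compress_views(tokens, num_anchors=6, rng=rng)
print(z.shape)  # (6, 64, 32)
```

In this sketch, a downstream Gaussian decoder would only process the 6 anchor views instead of all 36. This reflects the abstract's point that compressing into $Z$ decouples downstream cost from the raw number of input views.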