Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose the Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LERF demonstrate that our framework outperforms state-of-the-art methods, while enabling real-time rendering with an approximately 43.7× speedup on 512-D feature maps. Code will be made publicly available.
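The contrast drawn above, dense alpha-blending over every Gaussian a ray intersects versus sparse sampling of only the dominant contributors, can be sketched as follows. This is an illustrative stand-in, not the paper's actual Q-Render algorithm: the quantile-based weight cutoff and the `composite_weights` helper are assumptions for exposition, and the real method operates inside a CUDA rasterizer rather than per-ray NumPy.

```python
import numpy as np

def composite_weights(alphas):
    """Standard front-to-back blending weights for one ray:
    w_i = alpha_i * prod_{j < i} (1 - alpha_j)."""
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    return alphas * transmittance

def dense_render(alphas, feats):
    """Conventional volume rendering: blend the high-dimensional
    feature of every Gaussian intersecting the ray."""
    w = composite_weights(alphas)
    return w @ feats

def sparse_render(alphas, feats, quantile=0.9):
    """Illustrative sparse variant: keep only Gaussians whose blending
    weight lies above the given quantile (the dominant contributors),
    renormalize, and blend only those. Hypothetical stand-in for Q-Render."""
    w = composite_weights(alphas)
    keep = w >= np.quantile(w, quantile)
    w_sparse = np.where(keep, w, 0.0)
    w_sparse /= w_sparse.sum() + 1e-8
    return w_sparse @ feats

# One ray intersecting 64 Gaussians, each carrying a 512-D feature.
rng = np.random.default_rng(0)
alphas = rng.uniform(0.01, 0.9, size=64)
feats = rng.normal(size=(64, 512))

dense = dense_render(alphas, feats)      # touches all 64 Gaussians
sparse = sparse_render(alphas, feats)    # blends only the dominant ~10%
```

The speedup claimed in the abstract comes from this kind of sparsity: with 512-D features, every skipped Gaussian avoids a 512-wide multiply-accumulate per ray.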

