Prevailing 3D texture generation methods often rely on multi-view fusion and are consequently hindered by inter-view inconsistencies and incomplete coverage of complex surfaces, which limit the fidelity and completeness of the generated content. To overcome these challenges, we introduce TEXTRIX, a native 3D attribute generation framework for high-fidelity texture synthesis and downstream applications such as precise 3D part segmentation. Our approach constructs a latent 3D attribute grid and leverages a Diffusion Transformer equipped with sparse attention, enabling direct coloring of 3D models in volumetric space and fundamentally avoiding the limitations of multi-view fusion. Built on this native representation, the framework extends naturally to high-precision 3D segmentation by training the same architecture to predict semantic attributes on the grid. Extensive experiments demonstrate state-of-the-art performance on both tasks, producing seamless, high-fidelity textures and accurate 3D part segmentation with precise boundaries.
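The abstract does not specify the sparse attention pattern or how the latent grid is tokenized. The following minimal PyTorch sketch illustrates one plausible reading: a DiT-style block whose attention is restricted to a local Chebyshev-distance window over the occupied voxels of a latent 3D attribute grid. All names (`SparseVoxelDiTBlock`, `window`, the timestep conditioning) are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch, assuming a local-window sparse-attention pattern over
# occupied voxels and simplified timestep conditioning. The actual TEXTRIX
# architecture is not detailed in the abstract.
import torch
import torch.nn as nn

class SparseVoxelDiTBlock(nn.Module):
    """One Transformer block over tokens of occupied voxels in a latent
    3D attribute grid. Attention is limited to voxels within a Chebyshev
    window, a stand-in for the sparse attention named in the abstract."""
    def __init__(self, dim=256, heads=8, window=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        # Timestep conditioning: a learned shift added to every token
        # (simplified; DiT-style blocks typically use adaLN modulation).
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(),
                                     nn.Linear(dim, dim))

    def forward(self, feats, coords, t):
        # feats:  (N, dim) latent features of the N occupied voxels
        # coords: (N, 3)   integer voxel coordinates
        # t:      scalar   diffusion timestep
        feats = feats + self.t_embed(t.view(1, 1).float())
        # Sparse attention mask: True = blocked. Only voxel pairs within
        # the Chebyshev window attend to each other; every voxel always
        # sees itself (distance 0).
        dist = (coords[:, None, :] - coords[None, :, :]).abs().amax(-1)
        mask = dist > self.window
        x = feats.unsqueeze(0)                      # (1, N, dim)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + h
        x = x + self.mlp(self.norm2(x))
        return x.squeeze(0)

# Usage: 500 occupied voxels in a 64^3 grid at denoising step t = 10.
coords = torch.randint(0, 64, (500, 3))
feats = torch.randn(500, 256)
block = SparseVoxelDiTBlock()
out = block(feats, coords, torch.tensor(10))
print(out.shape)  # torch.Size([500, 256])
```

Because only occupied voxels are tokenized and each token attends within a bounded neighborhood, cost scales with surface occupancy rather than the full grid volume, which is one way a volumetric approach can stay tractable.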