We present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face two main challenges: single-view approaches suffer from geometric inconsistencies and hallucinations that degrade identity preservation, while models trained on synthetic data fail to capture high-frequency details such as skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian-splatting avatars derived from dome captures of real people. This "Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.
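To make the two-stage design concrete, the following is a minimal PyTorch sketch of how such a "Capture, Canonicalize, Splat" pipeline could be wired together. All names (`GenerativeCanonicalizer`, `SplatTransformer`, `GaussianAvatar`), layer choices, and hyperparameters are illustrative assumptions, not the paper's actual architecture: stage 1 is stood in for by a shared image encoder with learned canonical-view queries, and stage 2 by a transformer that regresses a fixed budget of Gaussian parameters.

```python
# Hypothetical sketch of a "Capture, Canonicalize, Splat" pipeline.
# Class names, shapes, and hyperparameters are illustrative assumptions.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class GaussianAvatar:
    """A static avatar represented as N anisotropic 3D Gaussians."""
    means: torch.Tensor      # (N, 3) Gaussian centers
    scales: torch.Tensor     # (N, 3) per-axis log-scales
    rotations: torch.Tensor  # (N, 4) unit quaternions
    opacities: torch.Tensor  # (N, 1) in [0, 1]
    colors: torch.Tensor     # (N, 3) RGB (degree-0 spherical harmonics)


class GenerativeCanonicalizer(nn.Module):
    """Stage 1 (assumed interface): map a variable number of unstructured
    input views to a fixed set of standardized, mutually consistent tokens.
    A real system would use a generative model; here, learned canonical-view
    queries cross-attending to pooled image tokens stand in for it."""

    def __init__(self, dim: int = 256, num_canonical_views: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(                       # toy per-image encoder
            nn.Conv2d(3, dim, kernel_size=16, stride=16),   # 16x16 patch embedding
            nn.GELU(),
        )
        # 64 tokens per canonical view, shared across identities.
        self.queries = nn.Parameter(torch.randn(num_canonical_views * 64, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (V, 3, H, W) unstructured phone captures.
        feats = self.encoder(images)                        # (V, dim, h, w)
        tokens = feats.flatten(2).transpose(1, 2)           # (V, h*w, dim)
        tokens = tokens.reshape(1, -1, tokens.size(-1))     # pool tokens of all views
        q = self.queries.unsqueeze(0)                       # (1, Q, dim)
        canonical, _ = self.cross_attn(q, tokens, tokens)
        return canonical                                    # (1, Q, dim)


class SplatTransformer(nn.Module):
    """Stage 2 (assumed interface): transformer that regresses Gaussian
    parameters for a fixed splat budget from the canonicalized tokens.
    A real avatar would use far more Gaussians than this toy budget."""

    def __init__(self, dim: int = 256, num_gaussians: int = 1024, depth: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        self.splat_tokens = nn.Parameter(torch.randn(num_gaussians, dim))
        self.head = nn.Linear(dim, 3 + 3 + 4 + 1 + 3)       # 14 params per Gaussian

    def forward(self, canonical: torch.Tensor) -> GaussianAvatar:
        # Joint self-attention over splat tokens and canonical image tokens.
        x = torch.cat([self.splat_tokens.unsqueeze(0), canonical], dim=1)
        x = self.backbone(x)[:, : self.splat_tokens.size(0)]  # keep splat tokens
        p = self.head(x).squeeze(0)                            # (N, 14)
        return GaussianAvatar(
            means=p[:, 0:3],
            scales=p[:, 3:6],
            rotations=nn.functional.normalize(p[:, 6:10], dim=-1),
            opacities=torch.sigmoid(p[:, 10:11]),
            colors=torch.sigmoid(p[:, 11:14]),
        )


if __name__ == "__main__":
    phone_images = torch.rand(3, 3, 256, 256)   # three unstructured captures
    canonical = GenerativeCanonicalizer()(phone_images)
    avatar = SplatTransformer()(canonical)
    print(avatar.means.shape)                   # torch.Size([1024, 3])
```

The sketch only illustrates the data flow, zero-shot at inference: a single untrained forward pass from unstructured images to Gaussian parameters. The paper's actual contribution lies in the generative canonicalization and in training on high-fidelity dome-capture splat data, neither of which is reproduced here.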