Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

From arXiv, SIGGRAPH Asia 2022. Project page: https://frozenburning.github.io/projects/text2light/ Code is available at https://github.com/FrozenBurning/Text2Light

High-quality HDRIs (High Dynamic Range Images), typically HDR panoramas, are one of the most popular ways to create photorealistic lighting and 360-degree reflections of 3D scenes in graphics. Given the difficulty of capturing HDRIs, a versatile and controllable generative model is highly desired, one in which lay users can intuitively control the generation process. However, existing state-of-the-art methods still struggle to synthesize high-quality panoramas for complex scenes. In this work, we propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data. Given free-form text as the description of the scene, we synthesize the corresponding HDRI in two dedicated steps: 1) text-driven panorama generation in low dynamic range (LDR) and low resolution, and 2) super-resolution inverse tone mapping to scale up the LDR panorama in both resolution and dynamic range. Specifically, to achieve zero-shot text-driven panorama generation, we first build dual codebooks as the discrete representation for diverse environmental textures. Then, driven by the pre-trained CLIP model, a text-conditioned global sampler learns to sample holistic semantics from the global codebook according to the input text. Furthermore, a structure-aware local sampler learns to synthesize LDR panoramas patch by patch, guided by the holistic semantics. To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere. This continuous representation enables a versatile module to upscale the resolution and dynamic range simultaneously. Extensive experiments demonstrate the superior capability of Text2Light in generating high-quality HDR panoramas. In addition, we show the feasibility of our work in realistic rendering and immersive VR.
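The abstract mentions latent codes "anchored to the sphere" without giving the parameterization. As a rough illustration only, the sketch below shows the standard equirectangular mapping between panorama pixels and unit directions on the sphere, which is one common way to associate a panorama location with a spherical coordinate. The function names and the axis convention (y up, longitude increasing left to right) are assumptions made for this example and are not taken from the Text2Light code.

```python
import math


def equirect_pixel_to_direction(row, col, height, width):
    """Map the centre of pixel (row, col) of an equirectangular panorama
    to a unit direction (x, y, z) on the sphere.

    Assumed convention (hypothetical, for illustration): y is up,
    longitude runs from -pi (left edge) to pi (right edge),
    latitude runs from pi/2 (top edge) to -pi/2 (bottom edge).
    """
    lon = ((col + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((row + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_equirect_uv(x, y, z):
    """Inverse mapping: unit direction -> continuous (u, v) in [0, 1].

    u is the horizontal fraction of the panorama, v the vertical fraction.
    """
    lon = math.atan2(x, z)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon + math.pi) / (2.0 * math.pi)
    v = (math.pi / 2.0 - lat) / math.pi
    return (u, v)
```

Because `direction_to_equirect_uv` returns continuous coordinates rather than integer pixel indices, a representation queried this way can be sampled at any target resolution, which is the general idea behind resolution-independent upscaling; how Text2Light actually queries its latent codes is not specified in the abstract.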
