360$^\circ$ images and videos have become an economical and popular way to provide VR experiences using real-world content. However, the manipulation of stereo panoramic content remains underexplored. In this paper, we focus on the stereo 360$^\circ$ image composition problem and develop a solution that takes an object from a stereo image pair and inserts it at a given 3D position in a target stereo panorama while preserving its geometry. Our method uses recovered 3D point clouds to guide the composited image generation. More specifically, we observe that inserting an object into an equirectangular image with a single one-off operation cannot produce satisfactory depth perception and introduces ghosting artifacts when users view the result from different directions. We therefore propose a novel view-dependent projection method that, for each view direction, segments the object in 3D spherical space with the stereo camera pair facing that direction. We further propose a deep depth densification network that generates depth guidance for the stereo synthesis of each view segment, conditioned on the desired position and pose of the inserted object. Finally, we merge the synthesized view segments and blend the object into the target stereo 360$^\circ$ scene. A user study demonstrates that our method provides convincing depth perception and eliminates ghosting artifacts. Our view-dependent solution offers a potential paradigm for other content manipulation methods for 360$^\circ$ images and videos.
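To make the view-dependent projection concrete, the minimal Python sketch below shows the standard mapping between unit 3D view directions and equirectangular pixel coordinates, which any per-view segmentation of a 360$^\circ$ panorama relies on. The function name `dir_to_equirect` and the axis convention ($y$ up, $+z$ forward) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dir_to_equirect(d, width, height):
    """Map unit 3D direction vectors to equirectangular pixel coordinates.

    d: (N, 3) array of unit vectors (x, y, z), with y pointing up and +z forward.
    Returns an (N, 2) array of (u, v) pixel coordinates.
    NOTE: generic spherical mapping for illustration, not the paper's code.
    """
    lon = np.arctan2(d[:, 0], d[:, 2])          # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[:, 1], -1, 1))    # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width       # longitude wraps across image width
    v = (0.5 - lat / np.pi) * height            # top row corresponds to +pi/2 latitude
    return np.stack([u, v], axis=1)

# Example: the forward direction (+z) maps to the image center.
print(dir_to_equirect(np.array([[0.0, 0.0, 1.0]]), 2048, 1024))  # -> [[1024. 512.]]
```

Because this mapping distorts content away from the view center, an object composited once in equirectangular space cannot remain geometrically consistent for all view directions, which motivates re-projecting and synthesizing the object per view segment as described above.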