Bokeh rendering is a popular and effective photographic technique for creating an aesthetically pleasing effect. It is widely used to blur the background and emphasize the subject in the foreground, drawing the viewer's attention to the main focus of the image. In traditional digital single-lens reflex cameras (DSLRs), this effect is achieved with a large-aperture lens, which lets the camera capture images with a shallow depth of field: only a small region of the image is in sharp focus, while the rest is blurred. However, the camera hardware embedded in mobile phones is typically much smaller and more limited than that of DSLRs. Consequently, mobile phones cannot capture photos with a natural shallow depth of field, which is a significant limitation for mobile photography. To address this challenge, we propose a novel bokeh rendering method based on the Vision Transformer, a recent and powerful deep learning architecture. Our approach employs an adaptive depth calibration network that serves as a confidence level to compensate for errors in monocular depth estimation. Together with the depth information, this network supervises the rendering process, enabling the generation of high-quality bokeh images at high resolutions. Our experiments demonstrate that the proposed method outperforms state-of-the-art methods, improving LPIPS by about 24.7% and achieving higher PSNR scores.
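To make the interplay of depth, focal plane and confidence concrete, here is a minimal, non-learned sketch in Python/NumPy of classical depth-guided layered blur. It is not the Vision Transformer method proposed in this paper. The functions `render_bokeh` and `box_blur`, the linear mapping from defocus to blur radius, and the rule that low confidence reduces blur are all illustrative assumptions. The `confidence` map is a hypothetical stand-in for the output of the adaptive depth calibration network.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2r+1)x(2r+1) window with edge padding (HxWxC input)."""
    if radius == 0:
        return img.copy()
    k = 2 * radius + 1
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Integral image: integral[i, j] = sum of padded[:i, :j], giving O(1) window sums.
    integral = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0), (0, 0)))
    h, w = img.shape[:2]
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def render_bokeh(image, depth, focus_depth, confidence, max_radius=4, n_layers=5):
    """Hypothetical depth-guided bokeh sketch (not the paper's learned renderer).

    image:       HxWx3 float array
    depth:       HxW relative depth map (e.g. from a monocular estimator)
    focus_depth: depth value of the in-focus plane
    confidence:  HxW in [0, 1]; assumed rule: 1 = trust depth, 0 = keep pixel sharp
    max_radius:  largest blur radius in pixels (must be > 0)
    """
    # Blur strength grows with distance from the focal plane (a circle-of-confusion proxy).
    defocus = np.abs(depth - focus_depth)
    defocus = defocus / max(defocus.max(), 1e-8)
    # Down-weight blur where the depth estimate is deemed unreliable.
    radius_map = confidence * defocus * max_radius

    # Pre-blur the image at a few discrete radii: shape (n_layers, H, W, 3).
    radii = np.linspace(0, max_radius, n_layers)
    stack = np.stack([box_blur(image, int(round(r))) for r in radii])

    # Per pixel, interpolate linearly between the two nearest blur levels.
    pos = np.clip(radius_map / max_radius * (n_layers - 1), 0, n_layers - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_layers - 1)
    t = (pos - lo)[..., None]
    rows, cols = np.indices(depth.shape)
    return (1 - t) * stack[lo, rows, cols] + t * stack[hi, rows, cols]
```

Interpolating within a small stack of pre-blurred images keeps the cost at a handful of full-image filters regardless of resolution. A box filter is used only for brevity; a disk-shaped kernel would better approximate the bokeh of a real lens aperture.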