Heterogeneity exists in most camera images. It manifests across the image space as varying Moiré ringing, motion blur, color bleaching, or lens-based projection distortions. Moreover, combinations of these artifacts can be present in small or large pixel neighborhoods within an acquired image. Current camera image processing pipelines, including deep-trained versions, tend to address the issue by applying a single filter homogeneously to the entire image. This is particularly true when an encoder-decoder type deep architecture is trained for the task. In this paper, we present a structured deep learning model that addresses the heterogeneous image artifact filtering problem. We call our deep-trained model the Patch Subspace Variational Autoencoder (PS-VAE) for Camera ISP. PS-VAE does not necessarily assume uniform distortion levels or similar artifact types within an image. Rather, our model attempts to learn to cluster patches extracted from images by artifact type and distortion level, within multiple latent subspaces (e.g., Moiré ringing is often a higher-dimensional latent distortion than Gaussian motion blur). Each image's patches are encoded into soft clusters in their appropriate latent subspace, using a mixture-model prior. The decoders of the PS-VAE are also trained in an unsupervised manner for the image patches in each soft cluster. Our experimental results demonstrate the flexibility and performance that can be achieved through improved heterogeneous filtering. We compare our results to a conventional one-encoder-one-decoder architecture.
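The patch-wise soft-clustering idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The patch size, the three latent dimensions, the names `extract_patches`, `soft_assign`, and `gate`, and the random linear maps standing in for trained encoders and decoders are all assumptions made for illustration. The sketch omits the variational sampling, the mixture prior, and training entirely. It shows only the data flow: patches are extracted, softly assigned to subspaces of different dimensionality, and reconstructed as a responsibility-weighted blend of per-subspace decoders.

```python
import numpy as np

def extract_patches(image, patch=8):
    """Split an (H, W) image into non-overlapping patch x patch tiles, one flattened tile per row."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch          # drop any ragged border
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def soft_assign(logits):
    """Row-wise softmax: each row holds one patch's soft-cluster responsibilities."""
    z = logits - logits.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                     # stand-in for a camera image
patches = extract_patches(image, patch=8)        # shape (16, 64)

# Hypothetical setup: three latent subspaces of different dimensionality.
latent_dims = [2, 4, 8]
d = patches.shape[1]

# Random linear maps stand in for trained encoder/decoder networks (assumption).
encoders = [rng.normal(scale=0.1, size=(d, k)) for k in latent_dims]
decoders = [rng.normal(scale=0.1, size=(k, d)) for k in latent_dims]
gate = rng.normal(scale=0.1, size=(d, len(latent_dims)))

# Soft-cluster responsibilities per patch, shape (16, 3).
resp = soft_assign(patches @ gate)

# Each subspace reconstructs every patch; outputs are blended by responsibility.
recon = sum(
    resp[:, [j]] * (patches @ encoders[j] @ decoders[j])
    for j in range(len(latent_dims))
)
```

In a trained model the gating and the per-subspace maps would be learned jointly, so that patches with similar artifact types and distortion levels concentrate their responsibility on the same subspace. Here the weights are random, so the output only demonstrates the shapes and the routing.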