Dark environments pose a challenge for computer vision algorithms owing to insufficient photons and undesirable noise. To enhance object detection in dark environments, we propose a novel multitask auto-encoding transformation (MAET) model that explores the intrinsic pattern behind illumination translation. In a self-supervised manner, MAET learns the intrinsic visual structure by encoding and decoding a realistic illumination-degrading transformation that accounts for the physical noise model and image signal processing (ISP). Based on this representation, we perform object detection by decoding the bounding-box coordinates and classes. To avoid over-entanglement of the two tasks, MAET disentangles the object and degradation features by imposing an orthogonal tangent regularity. This forms a parametric manifold along which the multitask predictions can be geometrically formulated by maximizing the orthogonality between the tangents along the outputs of the respective tasks. Our framework can be built on mainstream object-detection architectures and trained end-to-end directly on standard detection datasets such as VOC and COCO. We achieve state-of-the-art performance on both synthetic and real-world datasets. Code is available at https://github.com/cuiziteng/MAET.
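To make the illumination-degrading transformation concrete, here is a minimal sketch of a low-light synthesis pipeline in the spirit the abstract describes: un-gamma the image to approximately linear RGB (a stand-in for inverting the ISP), scale exposure down, add signal-dependent shot noise plus read noise, and re-apply the gamma. The function name, parameter values, and the reduction of the ISP to a single gamma curve are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def degrade_illumination(img, exposure=0.2, shot_noise=0.01,
                         read_noise=0.0005, gamma=2.2, rng=None):
    """Simplified low-light degradation (illustrative, not the paper's exact ISP).

    img: float array in [0, 1], shape (H, W, 3).
    Steps: inverse gamma -> under-exposure -> heteroscedastic Gaussian
    noise (shot + read) -> gamma re-encoding.
    """
    rng = np.random.default_rng() if rng is None else rng
    linear = np.clip(img, 0.0, 1.0) ** gamma          # approx. inverse of gamma-only "ISP"
    dark = linear * exposure                          # reduce photon count
    var = dark * shot_noise + read_noise              # shot noise is signal-dependent
    noisy = dark + rng.normal(0.0, np.sqrt(var))      # Gaussian approximation of sensor noise
    return np.clip(noisy, 0.0, 1.0) ** (1.0 / gamma)  # back to display space
```

In the self-supervised setup, such a transformation provides free supervision: the model encodes a degraded image and is trained to decode the degradation parameters (and the clean structure) alongside the detection outputs.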
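One plausible reading of the orthogonal tangent regularity is a penalty that drives the object features and the degradation features toward orthogonality, e.g. the mean squared cosine similarity between the two per-sample feature vectors. The sketch below implements that penalty; the function name and the choice of squared cosine similarity are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def orthogonal_regularizer(obj_feat, deg_feat, eps=1e-8):
    """Mean squared cosine similarity between object and degradation
    features (both of shape (batch, dim)). Zero when the two feature
    sets are orthogonal sample-by-sample; one when they are parallel.
    Illustrative stand-in for the paper's orthogonal tangent regularity."""
    obj = obj_feat / (np.linalg.norm(obj_feat, axis=1, keepdims=True) + eps)
    deg = deg_feat / (np.linalg.norm(deg_feat, axis=1, keepdims=True) + eps)
    cos = np.sum(obj * deg, axis=1)   # per-sample cosine similarity
    return float(np.mean(cos ** 2))
```

Added to the detection and degradation-decoding losses, such a term discourages the two task heads from re-using the same feature directions, which is the over-entanglement the abstract warns against.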