The fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for pixel-based person segmentation and body keypoint detection are needed to better understand clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how joint person pose estimation and instance segmentation can be performed on low-resolution images with downsampling factors from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on different augmentations of the unlabeled target-domain images to generate accurate pseudo labels, and to use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source- and target-domain data. Extensive experimental results, with detailed ablation studies on the two OR datasets MVOR+ and TUM-OR-test, show the effectiveness of our approach against strong baselines, especially on low-resolution privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where, with as little as 1% labeled supervision, we achieve results comparable to a model trained with 100% labeled supervision.
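To make the idea of geometric constraints across augmentations more concrete, the following is a minimal sketch for keypoints only, not AdaptOR's implementation. It assumes two augmentations (a horizontal flip and a downsampling by an integer factor), keypoints stored as `(x, y, score)` rows in continuous pixel coordinates, plain averaging as the fusion rule, and a fixed confidence threshold. The function names, `flip_pairs`, and `score_thresh` are hypothetical. Each prediction made on an augmented view is mapped back to the original image frame by inverting the known transform, and only keypoints whose fused confidence is high enough are kept as pseudo labels.

```python
import numpy as np


def unflip_keypoints(kps, width, flip_pairs):
    """Map keypoints predicted on a horizontally flipped image back to the original frame."""
    out = kps.copy()
    out[:, 0] = width - kps[:, 0]  # undo the mirror along x (continuous coordinates)
    for a, b in flip_pairs:
        # A flip turns the person's left joints into right joints, so swap them back.
        out[[a, b]] = out[[b, a]]
    return out


def unscale_keypoints(kps, factor):
    """Map keypoints predicted on an image downsampled by `factor` back to full resolution."""
    out = kps.copy()
    out[:, :2] = kps[:, :2] * factor  # rescale x and y; scores are left unchanged
    return out


def fuse_pseudo_labels(views, score_thresh=0.5):
    """Average per-view keypoints (already in the original frame) and keep confident ones.

    Returns the fused (K, 3) array and a boolean mask of keypoints to use as pseudo labels.
    """
    stacked = np.stack(views)  # shape (V, K, 3)
    fused = stacked.mean(axis=0)
    keep = fused[:, 2] >= score_thresh
    return fused, keep


# Toy example with three keypoints: 0 = nose, 1 = left eye, 2 = right eye.
gt = np.array([[50.0, 10.0, 0.9], [40.0, 8.0, 0.8], [60.0, 8.0, 0.7]])
flipped_pred = np.array([[50.0, 10.0, 0.9], [40.0, 8.0, 0.7], [60.0, 8.0, 0.8]])
downscaled_pred = gt.copy()
downscaled_pred[:, :2] /= 4.0

views = [
    gt,
    unflip_keypoints(flipped_pred, width=100.0, flip_pairs=[(1, 2)]),
    unscale_keypoints(downscaled_pred, factor=4.0),
]
fused, keep = fuse_pseudo_labels(views, score_thresh=0.75)
print(fused)
print(keep)
```

In this toy example every view agrees after inverting its transform, so the fused keypoints match the original predictions and only the right eye (mean score 0.7) falls below the threshold. Averaging is used here only for simplicity; the actual aggregation and filtering rules in the method may differ, and masks or boxes would need their own inverse transforms.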