Shoe tread impressions are one of the most common types of evidence left at crime scenes. However, the utility of such evidence is limited by the lack of footwear-print databases that cover the large and growing number of distinct shoe models. Moreover, such a database should ideally contain the 3D shape, or depth, of shoe treads so that shoeprints can be extracted and matched against a query (crime-scene) print. We propose to address this gap by leveraging shoe-tread photos collected by online retailers. The core challenge is to predict depth maps for these photos. Since they lack the ground-truth 3D shape needed to train depth predictors, we exploit synthetic data that provides it. We develop a method, termed ShoeRinsics, that learns to predict depth from a mix of fully supervised synthetic data and unsupervised retail image data. In particular, we find that domain adaptation and intrinsic image decomposition techniques effectively mitigate the synthetic-to-real domain gap and yield significantly better depth predictions. To validate our method, we introduce two validation sets consisting of shoe-tread image and print pairs, and define a benchmarking protocol to quantify the quality of predicted depth. On this benchmark, ShoeRinsics outperforms existing methods for depth prediction and synthetic-to-real domain adaptation.
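To make the training setup concrete, the following is a minimal sketch (not the authors' implementation) of the mixed objective the abstract describes: a supervised depth loss on synthetic images paired with an unsupervised re-rendering loss on real retail photos, where the re-rendering uses a simple intrinsic decomposition (albedo times Lambertian shading). The network architecture, the shading model, the light-direction input, and the loss weight are illustrative assumptions.

```python
# Illustrative sketch only: hypothetical names, simplified shading model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAlbedoNet(nn.Module):
    """Toy encoder-decoder predicting per-pixel depth and albedo (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(32, 1, 3, padding=1)
        self.albedo_head = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, img):
        feat = self.backbone(img)
        return self.depth_head(feat), torch.sigmoid(self.albedo_head(feat))

def depth_to_normals(depth):
    """Approximate surface normals from the depth map via finite differences."""
    dzdx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1, 0, 0))
    dzdy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    normals = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(normals, dim=1)

def rerender(albedo, depth, light_dir):
    """Lambertian re-rendering: image ~ albedo * max(0, n . l)."""
    normals = depth_to_normals(depth)
    l = F.normalize(light_dir, dim=1).view(-1, 3, 1, 1)
    shading = (normals * l).sum(dim=1, keepdim=True).clamp(min=0.0)
    return albedo * shading

def training_step(model, synth_img, synth_depth, real_img, light_dir, w_real=0.1):
    # Supervised branch: synthetic shoe-tread images come with ground-truth depth.
    pred_depth_s, _ = model(synth_img)
    loss_synth = F.l1_loss(pred_depth_s, synth_depth)
    # Unsupervised branch: real retail photos only supply a photometric
    # self-consistency signal through intrinsic decomposition + re-rendering.
    pred_depth_r, pred_albedo_r = model(real_img)
    recon = rerender(pred_albedo_r, pred_depth_r, light_dir)
    loss_real = F.l1_loss(recon, real_img)
    return loss_synth + w_real * loss_real
```

In this sketch the real-image branch never sees ground-truth depth; it is constrained only to explain the observed photo, which is the role the unsupervised retail data plays, while domain-adaptation components (omitted here) would further align synthetic and real image statistics.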