Distinguishing between same-source and different-source hypotheses on the basis of various types of traces is a generic problem in forensic science. It is often tackled with Bayesian approaches, which provide a likelihood ratio quantifying the relative strength of the evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, which differ widely in robustness and, in particular, in their capacity to deal with high-dimensional evidence, and which therefore need to be evaluated and optimized. We present a unified framework covering direct methods, which estimate the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods, which use logistic regression to discriminate between same-source and different-source distance distributions. Whilst direct methods are more flexible, indirect methods are more robust and fit naturally into machine learning. Moreover, indirect methods can use a vectorial distance, thus avoiding the severe information loss suffered by scalar-distance approaches. Direct and indirect methods are compared in terms of sensitivity, specificity and robustness, with and without dimensionality reduction and with and without feature selection, on the example of hand odor profiles, a novel and challenging type of forensic evidence. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show that indirect methods perform significantly better, especially without dimensionality reduction, whether or not feature selection is applied.
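To make the contrast between the two families concrete, the following minimal sketch computes a likelihood ratio both ways on synthetic data. Everything in it is an illustrative assumption rather than the procedure evaluated here: the toy half-normal distances, the four-feature dimensionality, the Gaussian kernel density estimator with a fixed bandwidth, the plain gradient-descent logistic regression, and all function names (`toy_pairs`, `direct_lr`, `indirect_lr`) are hypothetical.

```python
import math
import random

rng = random.Random(0)
N_FEATURES = 4  # toy dimensionality (assumption); real profiles have many more

def toy_pairs(n_pairs, spread):
    """Synthetic vectorial distances |a - b|, one component per feature."""
    return [[abs(rng.gauss(0.0, spread)) for _ in range(N_FEATURES)]
            for _ in range(n_pairs)]

same = toy_pairs(100, 1.0)  # same-source pairs: small differences
diff = toy_pairs(100, 2.0)  # different-source pairs: larger differences

def scalar_distance(v):
    """Collapse a vectorial distance into a single Euclidean distance."""
    return math.sqrt(sum(c * c for c in v))

def kde(sample, bandwidth=0.3):
    """Gaussian kernel density estimate over scalar distances."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                   for s in sample) / norm
    return density

# Direct method: estimate the density of the scalar distance under each
# hypothesis, then take the ratio of the two densities.
f_same = kde([scalar_distance(v) for v in same])
f_diff = kde([scalar_distance(v) for v in diff])

def direct_lr(v):
    d = scalar_distance(v)
    return f_same(d) / max(f_diff(d), 1e-300)  # floor avoids division by zero

def fit_logistic(X, y, step=0.3, epochs=600):
    """Logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - step * g / n for wj, g in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

# Indirect method: classify the full vectorial distance, then convert the
# posterior odds into a likelihood ratio by dividing out the training prior odds.
w, b = fit_logistic(same + diff, [1] * len(same) + [0] * len(diff))
prior_odds = len(same) / len(diff)

def indirect_lr(v):
    z = b + sum(wj * vj for wj, vj in zip(w, v))
    return math.exp(z) / prior_odds  # exp(z) equals the posterior odds p / (1 - p)

near = [0.5] * N_FEATURES  # small distance: should favor same source
far = [3.0] * N_FEATURES   # large distance: should favor different sources
print(direct_lr(near), indirect_lr(near))
print(direct_lr(far), indirect_lr(far))
```

In this sketch the direct route sees only one number per pair of traces, whereas the logistic model keeps one coefficient per feature, which is the sense in which a vectorial distance retains information that a scalar distance discards.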