With the rise of machine learning and deep learning applications in practice, monitoring, i.e. verifying that these systems operate within specification, has become an important practical problem. A key aspect of monitoring is to check whether the inputs (or intermediate representations) have strayed from the distribution they were validated on, as such a shift can void the performance assurances obtained during testing. There are two common approaches to this. The first, perhaps more classical, one is outlier detection or novelty detection: for a single input we ask whether it is an outlier, i.e. exceedingly unlikely to have originated from a reference distribution. The second, perhaps more recent, approach considers a larger number of inputs and compares their distribution to a reference distribution (e.g. one sampled during testing); this is known as drift detection. In this work, we bridge the gap between outlier detection and drift detection by comparing a given number of inputs to an automatically chosen part of the reference distribution.
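To make the distinction between the two approaches concrete, the following is a minimal sketch, not the method proposed in this work: it contrasts per-sample outlier detection (an empirical quantile threshold on scores) with batch-level drift detection (a two-sample Kolmogorov-Smirnov test). The Gaussian scores, thresholds, and sample sizes are assumptions made purely for illustration.

```python
# Illustrative sketch only: per-sample outlier detection vs. batch-level
# drift detection on 1-d scores. All parameters here are assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)  # scores collected during testing

def is_outlier(x, reference, q=0.999):
    """Flag a single input whose score is exceedingly unlikely under the
    reference distribution (beyond its empirical tail quantiles)."""
    lo, hi = np.quantile(reference, [1.0 - q, q])
    return x < lo or x > hi

def has_drifted(batch, reference, alpha=0.01):
    """Compare the distribution of a whole batch to the reference with a
    two-sample Kolmogorov-Smirnov test."""
    statistic, p_value = ks_2samp(batch, reference)
    return p_value < alpha

# A mildly shifted production batch: individual inputs rarely look anomalous,
# but the batch as a whole is detectably off-distribution.
batch = rng.normal(0.3, 1.0, size=500)
flagged = np.mean([is_outlier(x, reference) for x in batch])
print(f"fraction flagged as outliers: {flagged:.3f}")      # near the ~0.2% base rate
print(f"drift detected: {has_drifted(batch, reference)}")  # likely True for this shift
```

The example illustrates the gap the abstract refers to: a mild shift leaves almost every individual input within the typical range of the reference, yet becomes visible once a batch of inputs is compared to the reference distribution as a whole.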