We propose a robust method for averaging numbers contaminated by a large proportion of outliers. Our method, dubbed RODIAN, is inspired by the key idea of MINPRAN [1]: we assume that the outliers are uniformly distributed within the range of the data, and we search for the region that is least likely to contain only outliers. The median of the data within this region is then taken as the RODIAN estimate. Our approach can accurately estimate the true mean of data with more than 50% outliers and runs in $O(n\log n)$ time. Unlike other robust techniques, it is completely deterministic and does not rely on a known inlier error bound. Our extensive evaluation shows that RODIAN is much more robust than the median and the least-median-of-squares. This result also holds for non-uniform outlier distributions.
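The search described above can be illustrated with a small brute-force sketch. It is not the authors' $O(n\log n)$ algorithm. It is a cubic-time illustration of the stated idea under several assumptions of ours: candidate regions are intervals between two sorted data points, the "outliers only" likelihood is scored with a binomial tail probability (in the spirit of MINPRAN), and the function name `rodian_sketch`, the helper `binomial_tail`, and the parameter `min_points` are hypothetical.

```python
import math
import statistics


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, m) * p**m * (1.0 - p) ** (n - m) for m in range(k, n + 1)
    )


def rodian_sketch(values, min_points=3):
    """Brute-force illustration; NOT the paper's O(n log n) method.

    Assumption: if all n points were uniform outliers over the data range,
    the chance that an interval of relative width p holds at least k points
    is a binomial tail. The interval minimising that chance is treated as
    the inlier region, and its median is returned.
    """
    xs = sorted(values)
    n = len(xs)
    data_range = xs[-1] - xs[0]
    if n < min_points or data_range == 0:
        return statistics.median(xs)

    best_prob = float("inf")
    best_span = (0, n - 1)
    for i in range(n):
        for j in range(i + min_points - 1, n):
            p = (xs[j] - xs[i]) / data_range
            if p == 0:
                # Assumption: skip zero-width intervals (exact duplicates),
                # whose tail probability would be trivially zero.
                continue
            prob = binomial_tail(n, j - i + 1, p)
            if prob < best_prob:
                best_prob, best_span = prob, (i, j)

    lo, hi = best_span
    return statistics.median(xs[lo : hi + 1])


# Synthetic example: 7 inliers near 10 and 14 outliers spread over [0, 100].
data = [
    9.8, 9.9, 9.95, 10.0, 10.05, 10.1, 10.2,
    0.0, 3.1, 17.4, 25.0, 31.7, 42.2, 48.9,
    55.3, 63.8, 71.0, 80.5, 88.6, 94.1, 100.0,
]
print("plain median:", statistics.median(data))
print("sketch estimate:", rodian_sketch(data))
```

On this synthetic input, two thirds of the points are outliers, so the plain median lands on an outlier, while the interval search isolates the cluster around 10. The `min_points` floor and the direct floating-point evaluation of the tail are conveniences of the sketch; for large `n` the probabilities would be better compared in log space.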