Experimental datasets are growing rapidly in size, scope, and detail, but the value of these datasets is limited by unwanted measurement noise. It is therefore tempting to apply analysis techniques that attempt to reduce noise and enhance signals of interest. In this paper, we draw attention to the possibility that denoising methods may introduce bias and lead to incorrect scientific inferences. To present our case, we first review the basic statistical concepts of bias and variance. Denoising techniques typically reduce the variance observed across repeated measurements, but this can come at the expense of introducing bias into the expected outcome. We then conduct three simple simulations that provide concrete examples of how bias may manifest in everyday situations. These simulations reveal several findings that may be surprising and counterintuitive: (i) different methods can be equally effective at reducing variance, yet some incur bias while others do not; (ii) identifying methods that better recover ground truth does not guarantee the absence of bias; and (iii) bias can arise even if one has specific knowledge of properties of the signal of interest. We suggest that researchers should consider and possibly quantify bias before deploying denoising methods on important research data.
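To make the bias-variance distinction concrete, here is a minimal illustrative sketch. It is not one of the paper's three simulations; the signal shape, noise level, and both "denoisers" are assumptions chosen for illustration. A single-sample peak is measured with Gaussian noise. One method averages three independent repeated trials; the other applies a 3-point moving average to a single trial. Both should cut the variance at the peak to about one third, but only the moving average pulls the expected value away from the truth.

```python
import random


def simulate(n_repeats=20000, sigma=1.0, seed=0):
    """Estimate bias and variance at the peak for three estimators.

    All settings here (signal, sigma, number of trials) are hypothetical
    choices for illustration, not values taken from the paper.
    """
    rng = random.Random(seed)
    truth = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # assumed ground-truth signal
    peak = 3  # index of the sample we evaluate

    raw, trial_avg, smoothed = [], [], []
    for _ in range(n_repeats):
        # Three independent noisy measurements of the same signal.
        trials = [[x + rng.gauss(0.0, sigma) for x in truth] for _ in range(3)]
        first = trials[0]

        # No denoising: the peak sample from a single trial.
        raw.append(first[peak])
        # Method A: average across the three repeated trials.
        trial_avg.append(sum(t[peak] for t in trials) / 3)
        # Method B: 3-point moving average across neighbors in one trial.
        smoothed.append((first[peak - 1] + first[peak] + first[peak + 1]) / 3)

    def summarize(estimates):
        # Bias: mean estimate minus truth. Variance: spread across repeats.
        mean = sum(estimates) / len(estimates)
        var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
        return {"bias": mean - truth[peak], "variance": var}

    return {
        "raw": summarize(raw),
        "trial_avg": summarize(trial_avg),
        "smoothed": summarize(smoothed),
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(f"{name:10s} bias={stats['bias']:+.3f} variance={stats['variance']:.3f}")
```

In this toy setup the two methods are indistinguishable if one looks only at variance; the difference appears only when the mean estimate is compared against a known ground truth. Note that trial averaging uses three times as much data as the moving average, so the comparison illustrates the concept rather than a like-for-like benchmark.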