We propose and apply a novel paradigm for characterizing genome data quality, based on quantifying the effects of intentionally degrading that quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes. We focus on identifying outliers, which may reflect data quality problems but may also be true anomalies or even attempts to subvert the database.
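As a rough illustration of the degrade-then-measure idea, the following minimal sketch perturbs a sequence with random substitutions, measures how far a summary statistic moves, and flags sequences whose response is unusual. Everything specific here is an assumption made for illustration and is not taken from this work: the substitution-only degradation model, the use of a k-mer frequency profile with total variation distance as the effect measure, the median/MAD outlier rule, and all function names and default parameters.

```python
import random
from collections import Counter
from statistics import median

BASES = "ACGT"


def degrade(seq, rate, rng):
    """Replace each base by a different base with probability `rate`.

    Assumption: substitution-only noise; real degradation may also
    involve insertions and deletions.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_profile(seq, k=3):
    """Relative frequency of each k-mer occurring in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Total variation distance between two k-mer profiles."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def degradation_effect(seq, rate=0.05, k=3, trials=20, seed=0):
    """Mean profile shift over repeated random degradations of `seq`."""
    rng = random.Random(seed)
    original = kmer_profile(seq, k)
    shifts = [
        profile_distance(original, kmer_profile(degrade(seq, rate, rng), k))
        for _ in range(trials)
    ]
    return sum(shifts) / len(shifts)


def flag_outliers(effects, threshold=3.5):
    """Return the ids whose effect has a modified z-score above `threshold`.

    `effects` maps a (hypothetical) sequence id to its degradation effect.
    """
    values = list(effects.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return [
        name for name, v in effects.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]
```

In use, one would compute `degradation_effect` for every genome in a collection and pass the resulting mapping to `flag_outliers`. A flagged entry is only a candidate for inspection: as noted above, it may be a quality problem, a true anomaly, or something else.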