Machine learning models are typically evaluated by computing their similarity with reference annotations and trained by maximizing that similarity. Especially in the biomedical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating into better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
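As an illustration of the kind of quantitative approximation referred to above, the sketch below estimates an agreement ceiling for a binary segmentation task from pairwise inter-rater Dice overlap. This is a minimal example under stated assumptions, not the paper's exact formulation: the function names `dice` and `inter_rater_ceiling`, and the synthetic rater masks, are illustrative only.

```python
# Minimal sketch (assumption: binary segmentation masks, one per rater, same case).
# The mean pairwise Dice across raters serves as a rough ceiling beyond which
# higher model-vs-reference similarity may no longer reflect better RWMP.
import itertools
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

def inter_rater_ceiling(annotations: list[np.ndarray]) -> float:
    """Mean pairwise Dice over all rater pairs for one case."""
    pairs = itertools.combinations(annotations, 2)
    return float(np.mean([dice(a, b) for a, b in pairs]))

# Hypothetical example: three simulated raters annotating the same 2D image.
rng = np.random.default_rng(0)
base = rng.random((64, 64)) > 0.7
raters = [np.logical_xor(base, rng.random((64, 64)) > 0.95) for _ in range(3)]
print(f"Approximate agreement ceiling: {inter_rater_ceiling(raters):.3f}")
```

In this reading, a model whose Dice against a single reference annotation exceeds the inter-rater ceiling is not necessarily performing better in the real world; it may simply be fitting one rater's interpretation.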