The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and scalar summary statistics without extensively calibrating formal significance tests. The methods and summary statistics can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score by conditioning on it. Those papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. It presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and it illustrates and validates the methods with open-source software and numerical examples.