The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score by conditioning on it. Those papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") of the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. It provides computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and it illustrates and validates the methods with open-source software and numerical examples.