Goodness-of-fit tests based on the empirical Wasserstein distance are proposed for simple and composite null hypotheses involving general multivariate distributions. For group families, the procedure is to be implemented after preliminary reduction of the data via invariance. This property allows for calculation of exact critical values and p-values at finite sample sizes. Applications include testing for location-scale families and testing for families arising from affine transformations, such as elliptical distributions with given standard radial density and unspecified location vector and scatter matrix. A novel test for multivariate normality with unspecified mean vector and covariance matrix arises as a special case. For more general parametric families, we propose a parametric bootstrap procedure to calculate critical values. The lack of asymptotic distribution theory for the empirical Wasserstein distance means that the validity of the parametric bootstrap under the null hypothesis remains a conjecture. Nevertheless, we show that the test is consistent against fixed alternatives. To this end, we prove a uniform law of large numbers for the empirical distribution in Wasserstein distance, where the uniformity is over any class of underlying distributions satisfying a uniform integrability condition but no additional moment assumptions. The calculation of test statistics boils down to solving the well-studied semi-discrete optimal transport problem. Extensive numerical experiments demonstrate the practical feasibility and the excellent performance of the proposed tests for the Wasserstein distance of order p = 1 and p = 2 and for dimensions at least up to d = 5. The simulations also lend support to the conjecture of the asymptotic validity of the parametric bootstrap.
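To make the invariance idea in the abstract concrete, the following is a minimal univariate sketch, not the authors' implementation. It assumes d = 1, p = 2, and a normal null with unknown mean and variance; the standardization by sample mean and standard deviation, the function names, and the Monte Carlo approximation of the null distribution are all illustrative assumptions. In one dimension the semi-discrete transport problem has a closed form through quantile functions, so no optimal transport solver is needed here; the multivariate case described in the abstract does require one.

```python
import math
import random
from functools import lru_cache
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def standardize(xs):
    """Invariance reduction (assumed form): remove location and scale."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


@lru_cache(maxsize=None)
def _phi_at_quantiles(n):
    """Normal density at the quantiles Phi^{-1}(i/n), i = 0..n.

    The density vanishes at -inf and +inf, hence the two zero endpoints.
    """
    inner = [_STD.pdf(_STD.inv_cdf(i / n)) for i in range(1, n)]
    return tuple([0.0] + inner + [0.0])


def w2_squared_to_std_normal(zs):
    """Squared 2-Wasserstein distance between the empirical distribution
    of zs and N(0, 1), using the one-dimensional quantile formula

        W_2^2 = sum_i  integral_{(i-1)/n}^{i/n} (z_(i) - Phi^{-1}(u))^2 du.

    The integral of Phi^{-1} over [a, b] equals phi(Phi^{-1}(a)) minus
    phi(Phi^{-1}(b)), and the second moment of N(0, 1) is 1.
    """
    n = len(zs)
    ordered = sorted(zs)
    phi = _phi_at_quantiles(n)
    cross = sum(z * (phi[i] - phi[i + 1]) for i, z in enumerate(ordered))
    second_moment = sum(z * z for z in ordered) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return max(second_moment - 2.0 * cross + 1.0, 0.0)


def p_value(xs, n_sim=2000, seed=0):
    """Monte Carlo p-value for the null 'xs is normal, parameters unknown'.

    Because the statistic is computed on standardized data, its null
    distribution does not depend on the unknown mean and variance, so it
    can be simulated from N(0, 1) samples of the same size.
    """
    rng = random.Random(seed)
    n = len(xs)
    observed = w2_squared_to_std_normal(standardize(xs))
    exceed = 0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if w2_squared_to_std_normal(standardize(sample)) >= observed:
            exceed += 1
    # The +1 correction keeps the Monte Carlo p-value strictly positive.
    return (1 + exceed) / (n_sim + 1)
```

Calling `p_value(data)` on a list of floats returns the simulated p-value; a larger `n_sim` reduces Monte Carlo error at the cost of run time. For families without such a group structure, the same loop would instead draw samples from the fitted model, which is the parametric bootstrap whose validity the abstract describes as conjectural.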