Rapid advances in data science call for fundamentally new frameworks to tackle prevalent but highly non-trivial "irregular" inference problems, to which the large-sample central limit theorem does not apply. Typical examples include problems involving discrete or non-numerical parameters and problems involving non-numerical data. In this article, we present an innovative, wide-reaching, and effective approach, called the "repro samples method," for conducting statistical inference in these irregular problems and beyond. The development relates to, but improves upon, several existing simulation-inspired inference approaches, and we provide both exact and approximate theories to support it. Moreover, the proposed approach is broadly applicable and subsumes the classical Neyman-Pearson framework as a special case. For the frequently encountered irregular inference problems that involve both discrete/non-numerical and continuous parameters, we propose an effective three-step procedure to make inferences for all parameters. We also develop a unique matching scheme that turns the discreteness of discrete/non-numerical parameters from an obstacle to forming inferential theories into a beneficial attribute for improving computational efficiency. We demonstrate the effectiveness of the proposed general methodology using various examples, including a case study on a Gaussian mixture model with an unknown number of components. This case study provides a solution to a long-standing open inference question in statistics: how to quantify the estimation uncertainty for the unknown number of components and other associated parameters. Real-data and simulation studies, with comparisons to existing approaches, demonstrate the far superior performance of the proposed method.