This article introduces a novel nonparametric methodology for Generalized Linear Models that combines the strengths of the binary regression and latent variable formulations for categorical data while overcoming their disadvantages. Requiring minimal assumptions, it extends and generalizes recently published parametric versions of the methodology. If the underlying data generating process is asymmetric, it gives uniformly better prediction and inference performance than the parametric formulation. Furthermore, the article introduces a new classification statistic, which I use to show that, overall, the nonparametric methodology has better model fit, inference, and classification performance than the parametric version, and that the difference in performance is statistically significant, especially if the data generating process is asymmetric. In addition, the methodology can be used to perform model diagnostics for any model specification. This is a highly useful result because it extends existing work on categorical model diagnostics broadly across the sciences. The mathematical results also highlight important new findings regarding the interplay of statistical significance and scientific significance. Finally, the methodology is applied to various real-world datasets to show that it may outperform widely used existing models, including Random Forests and Deep Neural Networks with very few iterations.