"For how many days during the past 30 days was your mental health not good?" The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: the data are overdispersed, zero-inflated, bounded by 30, and heaped in five- and seven-day increments. To meet these challenges, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an EM algorithm that is compatible with any continuous data model estimable by least squares. STAR regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. STAR is deployed to study the factors associated with self-reported mental health and demonstrates substantial improvements in goodness-of-fit compared to existing count data regression models.
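The data-generating process described in the abstract can be illustrated with a minimal sketch. Several pieces here are assumptions made for illustration rather than the authors' specification: the transformation is fixed to a logarithm (the paper estimates it nonparametrically), the rounding operator is taken to be a floor that maps all latent values below 1 to zero and all values at or above 30 to 30, and `mu` and `sigma` are hypothetical names for the latent Gaussian mean (for example, a linear predictor) and standard deviation. Under these assumptions, the probability of each count is the Gaussian mass of an interval on the latent scale.

```python
import math
import random

Y_MAX = 30  # upper bound on the count (days in the 30-day reference period)


def norm_cdf(x):
    """Standard normal CDF; math.erf handles +/- infinity correctly."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(t):
    """Assumed transformation (log). The paper estimates this nonparametrically."""
    return math.log(t)


def lower_cut(j):
    """Lower latent cutpoint for count j; -inf for j = 0 (assumed convention)."""
    return -math.inf if j == 0 else g(j)


def upper_cut(j):
    """Upper latent cutpoint for count j; +inf at the bound Y_MAX."""
    return math.inf if j == Y_MAX else g(j + 1)


def star_pmf(j, mu, sigma):
    """P(y = j): Gaussian mass of the latent interval that rounds to j."""
    hi = norm_cdf((upper_cut(j) - mu) / sigma)
    lo = norm_cdf((lower_cut(j) - mu) / sigma)
    return hi - lo


def star_draw(mu, sigma, rng):
    """Simulate one count: latent Gaussian -> inverse transform -> round."""
    z = rng.gauss(mu, sigma)      # latent Gaussian value
    if z >= g(Y_MAX):             # everything at or above the bound maps to Y_MAX
        return Y_MAX
    y_star = math.exp(z)          # inverse of the assumed log transformation
    return min(math.floor(y_star), Y_MAX)  # floor; values below 1 become 0


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [star_draw(1.0, 1.5, rng) for _ in range(10)]
    print(sample)
    print(sum(star_pmf(j, 1.0, 1.5) for j in range(Y_MAX + 1)))
```

In this sketch, zero inflation and the bound at 30 arise from the rounding step alone: the zero count and the maximum count each absorb an unbounded latent interval, so both can carry substantial probability without a separate mixture component.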