The sample selection bias problem arises when a variable of interest is correlated with a latent variable, and it involves situations in which part of the observations of the response variable are censored. Heckman (1976) proposed a sample selection model based on the bivariate normal distribution that jointly models the variable of interest and the latent variable. Recently, this normality assumption has been relaxed through more flexible models, such as those based on the Student-t distribution (Marchenko and Genton, 2012; Lachos et al., 2021). The aim of this work is to propose generalized Heckman sample selection models based on symmetric distributions (Fang et al., 1990). This is a new class of sample selection models in which covariates are added to the dispersion and correlation parameters. A Monte Carlo simulation study is performed to assess the behavior of the parameter estimation method. Two real data sets are analyzed to illustrate the proposed approach.
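To make the selection bias problem concrete, the following is a minimal simulation sketch of the classical bivariate normal setting attributed above to Heckman (1976); it is not the generalized model proposed in this work. The function names (`ols`, `selection_bias_demo`) and every parameter value (`beta`, `gamma`, `sigma`, `rho`, sample size, seed) are illustrative assumptions rather than quantities taken from the paper. The outcome is observed only when a correlated latent variable is positive, and ordinary least squares is fitted on both the full and the selected sample.

```python
import numpy as np


def ols(x, y):
    """Least-squares intercept and slope of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def selection_bias_demo(n=20000, beta=(1.0, 0.5), gamma=(0.0, 1.0),
                        sigma=1.0, rho=0.7, seed=0):
    """Simulate a classical sample selection setup (all values are assumptions).

    Outcome:   y      = beta0 + beta1 * x + sigma * e1
    Selection: latent = gamma0 + gamma1 * x + e2, y observed only if latent > 0
    Errors (e1, e2) are bivariate normal with unit variances and correlation rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    errors = rng.multivariate_normal(np.zeros(2), cov, size=n)

    y = beta[0] + beta[1] * x + sigma * errors[:, 0]
    latent = gamma[0] + gamma[1] * x + errors[:, 1]
    observed = latent > 0  # selection mechanism: the rest of y is censored

    return {
        "full": ols(x, y),                          # uses outcomes that are unobservable in practice
        "selected": ols(x[observed], y[observed]),  # naive fit on the selected sample
        "share_observed": observed.mean(),
    }


if __name__ == "__main__":
    result = selection_bias_demo()
    print("full sample     (intercept, slope):", result["full"])
    print("selected sample (intercept, slope):", result["selected"])
    print("share observed:", result["share_observed"])
```

Under these assumed values, the fit on the selected sample should overstate the intercept and understate the slope relative to the full-sample fit, because the conditional mean of the outcome error given selection is nonzero whenever `rho` differs from zero. Setting `rho=0` removes the correlation and, with it, the bias.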