We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on the outcome distribution and richer patterns of heterogeneity in the selection process, and it allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete, and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain a wage decomposition for the UK. Specifically, we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap, ranging from 21\% to 40\% throughout the (latent) offered wage distribution, that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that account for a substantial fraction of the gender wage gap at the top of the distribution.