In this work, we consider the problem of imbalanced data in a regression framework when the imbalance concerns continuous or discrete covariates. Such a situation can bias the estimates. To address this, we propose a data augmentation algorithm that combines a weighted resampling (WR) procedure with a data augmentation (DA) procedure. In a first step, the DA procedure explores a wider support than the initial one. In a second step, the WR method drives the distribution of the exogenous variables (the covariates) toward a target distribution. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, we study an actuarial application.
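To make the two-step idea concrete, here is a minimal sketch for a single continuous covariate. It is not the authors' algorithm. The DA step is a hypothetical choice (a smoothed bootstrap that jitters the covariate with N(0, h^2) noise and keeps the response). The WR step is standard sampling-importance-resampling toward a user-supplied target density. The function names, the bandwidth `h`, and the target density are all illustrative assumptions. Because the jittered covariates follow exactly the Gaussian kernel density estimate of the original covariates, that estimate serves as the proposal density in the resampling weights.

```python
import math
import random


def gaussian_kde(points, h):
    """Return x -> Gaussian kernel density estimate built on `points`."""
    n = len(points)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)

    return density


def augment(data, n_new, h, rng):
    """DA step (hypothetical choice): smoothed bootstrap that resamples an
    (x, y) pair and jitters x with N(0, h^2) noise, keeping y unchanged.
    This widens the covariate support beyond the observed points."""
    out = []
    for _ in range(n_new):
        x, y = rng.choice(data)
        out.append((x + rng.gauss(0.0, h), y))
    return out


def weighted_resample(sample, proposal_pdf, target_pdf, size, rng):
    """WR step: draw pairs with probability proportional to
    target_pdf(x) / proposal_pdf(x), so the resampled covariates
    approximately follow the target distribution."""
    weights = [target_pdf(x) / proposal_pdf(x) for x, _ in sample]
    return rng.choices(sample, weights=weights, k=size)


def augment_then_resample(data, target_pdf, n_aug, size, h, seed=0):
    """Hypothetical two-step pipeline: DA first, then WR toward target_pdf."""
    rng = random.Random(seed)
    augmented = augment(data, n_aug, h, rng)
    # The jittered covariates are distributed exactly as this KDE.
    proposal_pdf = gaussian_kde([x for x, _ in data], h)
    return weighted_resample(augmented, proposal_pdf, target_pdf, size, rng)
```

For example, with covariates concentrated near 0 (say, drawn from a Beta(1, 5) distribution) and a uniform target on [0, 1], the resampled covariates are spread much more evenly over [0, 1]. How far into the sparse region the result reaches still depends on how much support the DA step manages to cover, which is why the choice of DA procedure matters.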