We propose a simple method for choosing sample weights in problems with highly imbalanced or skewed traits. Rather than naively discretizing regression labels to find binned weights, we take a more principled approach: we derive sample weights from the transfer function between an estimated source distribution and a specified target distribution. Our method outperforms both unweighted and discretely weighted models on both regression and classification tasks. We also open-source our implementation of this method (https://github.com/Daniel-Wu/Continuous-Weight-Balancing) to the scientific community.
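To make the idea concrete, the following is a minimal sketch of one way such weights could be computed. It is not the implementation from the linked repository. It assumes the source distribution is estimated with a Gaussian kernel density estimate and that each weight is the ratio of target density to estimated source density at the sample's label. The names `gaussian_kde_pdf`, `continuous_weights`, and `uniform_target`, the bandwidth rule, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev


def gaussian_kde_pdf(samples, bandwidth=None):
    """Return a function that estimates the source density with a Gaussian KDE."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption of this sketch.
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    kernel = NormalDist(0.0, bandwidth)

    def pdf(y):
        # Average of kernels centered on each observed label.
        return fmean(kernel.pdf(y - s) for s in samples)

    return pdf


def continuous_weights(labels, target_pdf, eps=1e-12):
    """Weight each sample by target density / estimated source density."""
    source_pdf = gaussian_kde_pdf(labels)
    raw = [target_pdf(y) / max(source_pdf(y), eps) for y in labels]
    # Rescale so the weights average to 1 and the loss scale is unchanged.
    mean_raw = fmean(raw)
    return [w / mean_raw for w in raw]


# Synthetic, heavily skewed regression labels (illustrative only).
random.seed(0)
labels = [random.expovariate(1.0) for _ in range(300)]

# Hypothetical target: uniform over the observed label range.
lo, hi = min(labels), max(labels)


def uniform_target(y):
    return 1.0 / (hi - lo) if lo <= y <= hi else 0.0


weights = continuous_weights(labels, uniform_target)
```

Under these assumptions, labels in sparsely populated regions of the source distribution receive larger weights than labels in dense regions, with no binning step. The resulting list could be passed to any training routine that accepts per-sample weights.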