Uplift modeling is a particular case of conditional treatment effect modeling. Such models deal with cause-and-effect inference for a specific factor, such as a marketing intervention or a medical treatment. In practice, these models are built on individual data from randomized clinical trials, where the goal is to partition the participants into heterogeneous groups depending on the uplift. Most existing approaches are adaptations of random forests to the uplift case. Several split criteria have been proposed in the literature, all relying on maximizing heterogeneity. However, these approaches are prone to overfitting in practice. In this work, we bring a new vision to uplift modeling. We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk. Our solution is developed for a specific twin neural network architecture that allows joint optimization of the marginal probabilities of success for treated and control individuals. We show that this model is a generalization of the uplift logistic interaction model. We modify the stochastic gradient descent algorithm to allow for structured sparse solutions, which substantially helps the training of our uplift models. We show that our proposed method is competitive with the state of the art in simulation settings and on real data from large-scale randomized experiments.
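To make the notion of uplift concrete, the following is a minimal sketch of prediction under a logistic interaction model, in which the treated and control success probabilities share one set of parameters and the uplift is their difference. The function name `predict_uplift` and the parameter names `intercept`, `beta`, `gamma`, and `delta` are assumptions made for illustration. This is not the loss function, the twin network architecture, or the sparse training procedure proposed in this work.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_uplift(X, intercept, beta, gamma, delta):
    """Predicted uplift under a logistic interaction model (illustrative only).

    Assumed form: logit P(Y=1 | x, t) = intercept + x.beta + t * (gamma + x.delta),
    where t = 1 for treated and t = 0 for control individuals.
    All parameter names here are hypothetical.
    """
    X = np.asarray(X, dtype=float)
    control_logit = intercept + X @ beta           # t = 0
    treated_logit = control_logit + gamma + X @ delta  # t = 1
    p_control = sigmoid(control_logit)
    p_treated = sigmoid(treated_logit)
    # Uplift is the difference between the two success probabilities.
    return p_treated - p_control, p_treated, p_control
```

Evaluating the same parameters once with the treatment indicator set to 1 and once with it set to 0 mirrors the idea of a twin architecture with shared weights, here in its simplest linear form.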