Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation -- they are not aware that some of the decisions may still be taken by humans. In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels. More specifically, we first introduce the problem of ridge regression under human assistance and show that it is NP-hard. Then, we derive an alternative representation of the corresponding objective function as a difference of nondecreasing submodular functions. Building on this representation, we further show that the objective is nondecreasing and satisfies $\alpha$-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm to enjoy approximation guarantees when solving the problem. Experiments on synthetic and real-world data from two important applications -- medical diagnosis and content moderation -- demonstrate that our algorithm outsources to humans those samples on which the ridge regression model's prediction error would have been highest had it made the prediction, that it outperforms several competitive baselines, and that its performance is robust with respect to several design choices and hyperparameters used in the experiments.
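The greedy strategy described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the regularization value, and the use of training sum-of-squared-errors as the score are illustrative assumptions. The idea is simply: at each step, hand to humans the sample whose removal most reduces the ridge model's error on the samples the machine keeps.

```python
import numpy as np

def ridge_sse(X, y, lam):
    """Fit ridge regression on (X, y) in closed form and return the
    sum of squared training errors."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    r = y - X @ w
    return float(r @ r)

def greedy_outsource(X, y, n_human, lam=1.0):
    """Illustrative greedy heuristic: pick n_human samples to outsource
    to humans so that the ridge model's error on the remaining
    (machine-handled) samples is as small as possible."""
    machine = set(range(len(y)))
    outsourced = []
    for _ in range(n_human):
        best, best_err = None, np.inf
        for i in machine:
            keep = sorted(machine - {i})
            err = ridge_sse(X[keep], y[keep], lam)
            if err < best_err:
                best, best_err = i, err
        machine.remove(best)
        outsourced.append(best)
    return outsourced
```

On data where one sample is grossly mislabeled, the heuristic outsources that sample first, matching the abstract's observation that the hardest-to-predict samples are the ones handed to humans.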