Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment

Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these and other data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic classification scores and recommendations to help judges make release decisions. How can we use data generated under existing deterministic policies to learn new and better policies? Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic rather than deterministic. We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy by minimizing the worst-case regret. The resulting policy is conservative but has a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. We extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations. Lastly, we apply the proposed methodology to a unique field experiment on pre-trial risk assessment instruments. We derive new classification and recommendation rules that retain the transparency and interpretability of the existing instrument while potentially leading to better overall outcomes at a lower cost.
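To make the idea of minimizing worst-case regret against a deterministic baseline concrete, the following is a minimal toy sketch, not the paper's estimator. It assumes a finite set of covariate groups, a binary action, an observed mean outcome under the baseline action in each group, and a lower bound on the unidentified mean outcome under the other action (for instance, one obtained by extrapolation under a smoothness assumption). The function name `safe_policy` and all of its arguments are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple


def safe_policy(
    baseline: Sequence[int],         # baseline action (0 or 1) per group
    observed_mean: Sequence[float],  # mean outcome under the baseline action
    cf_lower: Sequence[float],       # assumed lower bound on the mean outcome
                                     # under the action the baseline never takes
    weights: Sequence[float],        # population share of each group
) -> Tuple[List[int], float]:
    """Toy maximin rule: deviate from the baseline only where the
    worst-case counterfactual outcome still beats the observed one."""
    policy: List[int] = []
    worst_case_gain = 0.0
    for a0, m_obs, m_lo, w in zip(baseline, observed_mean, cf_lower, weights):
        if m_lo > m_obs:
            # Even in the worst case, switching improves this group.
            policy.append(1 - a0)
            worst_case_gain += w * (m_lo - m_obs)
        else:
            # Switching could be worse than the baseline, so keep it.
            policy.append(a0)
    return policy, worst_case_gain


if __name__ == "__main__":
    new_policy, gain = safe_policy(
        baseline=[0, 0, 1],
        observed_mean=[0.5, 0.6, 0.4],
        cf_lower=[0.3, 0.7, 0.1],
        weights=[0.2, 0.5, 0.3],
    )
    print(new_policy, gain)
```

Because the rule departs from the baseline only where the lower bound exceeds the observed mean, its worst-case gain over the baseline is never negative under the assumed bounds, which mirrors the conservative, safety-oriented behavior described above. The statistical guarantee in the paper additionally accounts for estimation uncertainty, which this sketch ignores.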