Algorithmic recommendations and decisions have become ubiquitous in today's society. Many such data-driven policies are based on known, deterministic rules to ensure their transparency and interpretability, especially when they are used for public policy decision-making. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic classification scores and recommendations to help judges make release decisions. Unfortunately, existing methods for policy learning are not applicable because they require the existing policy to be stochastic rather than deterministic. We develop a robust optimization approach that partially identifies the expected utility of a policy and then finds an optimal policy by minimizing the worst-case regret. The resulting policy is conservative but has a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. We extend this approach to the common and important setting in which humans make decisions with the aid of algorithmic recommendations. Lastly, we apply the proposed methodology to a unique field experiment on pre-trial risk assessments. We derive new classification and recommendation rules that retain the transparency and interpretability of the existing risk assessment instrument while potentially leading to better overall outcomes at a lower cost.
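As a rough illustration of the worst-case regret criterion described above (the notation here is introduced only for exposition and is not taken from the paper), the optimization can be sketched as choosing, within a class of candidate policies $\Pi$, the policy whose regret relative to the existing deterministic policy $\pi_0$ is smallest under the least favorable expected utility $V_v$ in the identified set $\mathcal{V}$:
\[
  \hat{\pi} \;\in\; \operatorname*{arg\,min}_{\pi \in \Pi} \; \max_{v \in \mathcal{V}} \; \bigl\{ V_v(\pi_0) - V_v(\pi) \bigr\}.
\]
Because the expected utility is only partially identified when $\pi_0$ is deterministic, the inner maximization ranges over all utilities consistent with the observed data; this is what makes the resulting policy conservative and underlies the statistical safety guarantee relative to the existing policy.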