We explore the promises and challenges of employing sequential decision-making algorithms, such as bandits, reinforcement learning, and active learning, in law and public policy. Such algorithms have well-characterized performance in the private sector (e.g., online advertising), but the tendency to naively apply algorithms motivated by one domain, often online advertising, to other domains can be called the "advertisement fallacy." Our main thesis is that law and public policy pose distinct methodological challenges that the machine learning community has not yet addressed, and that machine learning will need to address these problems to move "beyond ads." Public law, for instance, can pose multiple objectives, necessitate batched and delayed feedback, and require systems to learn rational, causal decision-making policies, each of which presents novel questions at the research frontier. We discuss a wide range of potential applications of sequential decision-making algorithms in regulation and governance, including public health, environmental protection, tax administration, occupational safety, and benefits adjudication. We use these examples to highlight the research needed to render sequential decision making policy-compliant, adaptable, and effective in the public sector. We also note the potential risks of such deployments and describe how sequential decision systems can facilitate the discovery of harms. We hope our work inspires more investigation of sequential decision making in law and public policy, which pose unique challenges for machine learning researchers and offer potential for significant social benefit.