Scenic: A Language for Scenario Specification and Data Generation

We propose a new probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning. Specifically, we consider the problems of training a system to be robust to rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs, then sampling these to generate specialized training and test data. More generally, such languages can be used to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment at any point in time is a 'scene', a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes and the behaviors of their agents over time. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.
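To make "assigning distributions to features of the scene" and "imposing hard and soft constraints" concrete, here is a minimal Python sketch of the underlying idea. It is not Scenic syntax, and it is not Scenic's sampler. The abstract says Scenic uses specialized techniques that exploit its domain-specific syntax. This sketch swaps in plain rejection sampling as a generic stand-in. The `Car` fields, feature ranges, field of view, and constraint names (`no_overlap`, `all_visible`, `one_car_near`) are hypothetical. The soft-constraint scheme enforces each soft constraint with probability p, which guarantees it holds in at least a fraction p of samples. It is one simple option, assumed here for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Car:
    x: float        # lateral offset from the camera axis, metres (hypothetical frame)
    y: float        # distance ahead of the camera, metres
    heading: float  # radians relative to the road direction


Scene = List[Car]
Constraint = Callable[[Scene], bool]


def sample_scene(rng: random.Random, n_cars: int = 2) -> Scene:
    """Draw each scene feature from an independent prior (illustrative ranges)."""
    return [Car(x=rng.uniform(-5.0, 5.0),
                y=rng.uniform(5.0, 50.0),
                heading=rng.gauss(0.0, math.radians(10.0)))
            for _ in range(n_cars)]


def no_overlap(scene: Scene, min_gap: float = 4.0) -> bool:
    """Hard constraint: every pair of cars is at least min_gap metres apart."""
    return all(math.hypot(a.x - b.x, a.y - b.y) >= min_gap
               for i, a in enumerate(scene) for b in scene[i + 1:])


def all_visible(scene: Scene, half_fov: float = math.radians(35.0)) -> bool:
    """Hard constraint: every car lies inside the camera's horizontal field of view."""
    return all(abs(math.atan2(c.x, c.y)) <= half_fov for c in scene)


def one_car_near(scene: Scene, max_dist: float = 15.0) -> bool:
    """Used below as a soft constraint: at least one car within max_dist metres."""
    return any(c.y < max_dist for c in scene)


def sample_constrained(rng: random.Random,
                       hard: Sequence[Constraint],
                       soft: Sequence[Tuple[Constraint, float]],
                       max_tries: int = 10_000) -> Scene:
    # Choose the active soft constraints once per output sample (not per attempt):
    # each is enforced with probability p, so it holds in at least a fraction p
    # of returned scenes. Hard constraints are always enforced.
    active = list(hard) + [c for c, p in soft if rng.random() < p]
    for _ in range(max_tries):
        scene = sample_scene(rng)
        if all(c(scene) for c in active):
            return scene
    raise RuntimeError("constraints too restrictive for rejection sampling")


# Usage: generate a small synthetic batch of scene configurations.
rng = random.Random(0)
scenes = [sample_constrained(rng, [no_overlap, all_visible], [(one_car_near, 0.9)])
          for _ in range(500)]
near = sum(one_car_near(s) for s in scenes) / len(scenes)
print(f"fraction with a nearby car: {near:.2f}")  # should be roughly 0.9 or higher
```

The active soft constraints are drawn before the retry loop on purpose. Redrawing them on every attempt would bias acceptance toward attempts where the soft constraint happened to be inactive, which would break the "holds with probability at least p" guarantee. Rejection sampling is simple but can become very slow when constraints are tight, which is the kind of inefficiency the specialized sampling techniques mentioned in the abstract aim to reduce.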

翻译：我们为设计和分析网络物理系统,特别是基于机器学习的系统,提出了一种新的概率化编程语言。具体地说,我们考虑了培训一个系统以对罕见事件具有活力、测试其在不同条件下的性能和调试失败等问题。我们展示了一种概率化编程语言如何通过指定分布方式编码有趣的投入类型来帮助解决这些问题,然后对这些投入进行抽样,以生成专门的培训和测试数据。更一般地说,这些语言可用于编写环境模型,这是任何正式分析的一个基本先决条件。在本文中,我们侧重于自主汽车和机器人等系统,这些系统在任何时刻的环境都是“秘密的”,是物理物体和代理人的配置。我们设计了一种针对特定领域的语言,即“精度化语言”,用来描述场景的分布和其代理的行为。作为一种概率化的编程语言,Scenic允许向现场的特征分配分布,并在场景上公开地施加硬软约束。我们开发了从所产生分布中取样的专门技术,利用了Scenica网络所设计的合成图像结构,从而改进了在合成汽车的成型图像学中实现的状态。