PAC-Based Formal Verification for Out-of-Distribution Data Detection

Cyber-physical systems (CPS) that use learning components, such as autonomous vehicles, are often sensitive to noise and out-of-distribution (OOD) instances encountered at runtime. Safety-critical tasks therefore depend on OOD detection subsystems to restore the CPS to a known state or to interrupt execution before safety is compromised. However, the performance of OOD detectors is hard to guarantee because it is hard to characterize what makes an instance OOD, especially in high-dimensional unstructured data. To distinguish OOD data from data the learning component has seen during training, an emerging technique is to incorporate variational autoencoders (VAEs) into the system and apply classification or anomaly detection techniques to their latent spaces. The rationale is that encoding reduces the size of the data domain, which lowers processing requirements for real-time systems, facilitates feature analysis for unstructured data, and allows more explainable techniques to be implemented. This study places probably approximately correct (PAC) guarantees on OOD detection by using the VAE encoding process to quantify image features and applying conformal constraints over them. The result is a bound on the detection error for unfamiliar instances that holds with user-defined confidence. The bounds are established empirically by sampling the latent probability distribution and evaluating the error with respect to the constraint violations encountered. The guarantee is then verified using data generated from CARLA, an open-source driving simulator.
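
The sampling step described above can be illustrated with a standard one-sided Hoeffding bound. This is a minimal sketch under stated assumptions, not the bound or constraint form used in the study: the diagonal-Gaussian latent distribution, the box-shaped constraint, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def hoeffding_error_bound(violations, n_samples, delta):
    """Upper bound on the true violation rate, valid with probability >= 1 - delta.

    One-sided Hoeffding inequality for i.i.d. samples; an assumed stand-in
    for whatever PAC bound the study actually derives.
    """
    empirical = violations / n_samples
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))
    return min(1.0, empirical + margin)


def samples_needed(epsilon, delta):
    """Smallest sample count for which the Hoeffding margin is at most epsilon."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * epsilon ** 2))


def count_violations(mu, sigma, lower, upper, n_samples, rng):
    """Draw latent vectors from a diagonal Gaussian and count the draws
    that fall outside the box constraint [lower, upper] (hypothetical form)."""
    violations = 0
    for _ in range(n_samples):
        z = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        if any(not (lo <= zi <= hi) for zi, lo, hi in zip(z, lower, upper)):
            violations += 1
    return violations


if __name__ == "__main__":
    rng = random.Random(0)
    delta = 0.01                      # user-defined confidence is 1 - delta
    n = samples_needed(epsilon=0.05, delta=delta)
    v = count_violations([0.0, 0.0], [1.0, 1.0],
                         [-3.0, -3.0], [3.0, 3.0], n, rng)
    print(n, v, hoeffding_error_bound(v, n, delta))
```

Here `delta` plays the role of the user-defined confidence parameter, and the returned value is an upper bound on the constraint-violation rate that holds with probability at least `1 - delta` over the drawn samples. Tighter bounds (for example, exact binomial bounds) could be substituted without changing the overall structure.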
