As more and more conversational and translation systems are deployed in production, it is essential to develop and implement effective control mechanisms that guarantee their proper functioning and security. An essential component for ensuring safe system behavior is out-of-distribution (OOD) detection, which aims to detect whether an input sample is statistically far from the training distribution. Although OOD detection is a widely covered topic in classification tasks, it has received much less attention in text generation. This paper addresses the problem of OOD detection for machine translation and dialog generation from an operational perspective. Our contributions include: (i) RAINPROOF, a Relative informAItioN Projection OOD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures: an OOD detector may filter out samples that are well processed by the model while keeping samples that are not, leading to weaker performance. Our results show that RAINPROOF breaks this curse, achieving good OOD detection results while also increasing performance.