Users' physical safety is a growing concern as the market for intelligent systems expands: unconstrained systems may recommend dangerous actions to users that can lead to serious injury. Covertly unsafe text, language that contains actionable physical harm but requires further reasoning to identify that harm, is of particular interest because such text can arise in everyday scenarios and is challenging to detect as harmful. Qualifying the knowledge required to reason about the safety of a text and providing human-interpretable rationales can shed light on the risk a system poses to specific user groups, helping stakeholders manage the risks of their systems and helping policymakers provide concrete safeguards for consumer safety. We propose FARM, a novel framework that leverages external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge in specific scenarios, retrieves this knowledge with attribution to trustworthy sources, and uses it both to classify the safety of the original text and to generate human-interpretable rationales, combining qualities that are critically important for sensitive domains such as user safety. Furthermore, FARM obtains state-of-the-art results on the SafeText dataset, improving safety classification accuracy by 5.29 points.
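The three steps named in the abstract (foveate on missing knowledge, retrieve it with attribution, then classify and rationalize) can be illustrated with a minimal Python sketch. The abstract does not specify how any step is implemented, so everything here is a hypothetical stand-in: the names `Evidence`, `TOY_KNOWLEDGE`, `foveate`, `retrieve`, `classify_with_rationale`, and `farm_sketch` are invented for illustration, the knowledge source is a two-entry toy dictionary, and keyword matching replaces whatever reasoning the actual framework performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    snippet: str  # retrieved piece of external knowledge
    source: str   # attribution: where the snippet came from
    unsafe: bool  # toy flag: does the snippet describe physical harm?


# Hypothetical stand-in for an external, trustworthy knowledge source.
TOY_KNOWLEDGE = {
    "bleach ammonia": Evidence(
        "Mixing bleach and ammonia releases toxic chloramine gas.",
        "toy-source/household-chemicals",
        True,
    ),
    "water": Evidence(
        "Drinking water helps maintain hydration.",
        "toy-source/hydration",
        False,
    ),
}


def foveate(text):
    """Step 1 (assumed): pick out what extra knowledge the text calls for.

    Here this is plain keyword matching against the toy knowledge keys.
    """
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return [key for key in TOY_KNOWLEDGE if set(key.split()) <= words]


def retrieve(queries):
    """Step 2 (assumed): fetch knowledge together with its source."""
    return [TOY_KNOWLEDGE[q] for q in queries]


def classify_with_rationale(evidence):
    """Step 3 (assumed): label the text and explain the label with citations."""
    if not evidence:
        # No knowledge retrieved: this sketch declines to guess a label.
        return "unknown", ""
    harmful = [e for e in evidence if e.unsafe]
    label = "unsafe" if harmful else "safe"
    cited = harmful or evidence
    rationale = " ".join(f"{e.snippet} [{e.source}]" for e in cited)
    return label, rationale


def farm_sketch(text):
    """Chain the three steps for a single input text."""
    return classify_with_rationale(retrieve(foveate(text)))


if __name__ == "__main__":
    print(farm_sketch("To clean faster, mix bleach and ammonia."))
    print(farm_sketch("If you are thirsty, drink water."))
```

Keeping the source alongside each snippet is what lets the rationale cite where its supporting knowledge came from, which is the attribution property the abstract emphasizes; returning `"unknown"` when nothing is retrieved is a design choice of this sketch, not something the abstract states.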