Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI - 专知论文

会员服务 ·

0

INFORMS · 知识 (knowledge) · Continuity · state-of-the-art · 模型评估 ·

2023 年 5 月 19 日

Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI

翻译：暂无翻译

Alex Mei,Sharon Levy,William Yang Wang

from arxiv, In Findings of the 2023 Conference of the Association for Computational Linguistics

Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, where unconstrained systems may recommend users dangerous actions that can lead to serious injury. Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and are challenging to detect as harmful. We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge to qualify the information required to reason in specific scenarios and retrieves this information with attribution to trustworthy sources. This knowledge is used to both classify the safety of the original text and generate human-interpretable rationales, shedding light on the risk of systems to specific user groups and helping both stakeholders manage the risks of their systems and policymakers to provide concrete safeguards for consumer safety. Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, showing absolute improvement in safety classification accuracy by 5.9%.

翻译：暂无翻译

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【Freddy Lecue 博士|公开演讲】可解释XAI人工智能进展（Explainable AI-The Story So Far），Sungkyunkwan University 2019

【Freddy Lecue 博士|公开演讲】可解释XAI人工智能进展（Explainable AI-The Story So Far），Sungkyunkwan University 2019

专知会员服务

32+阅读 · 2019年11月11日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

多碘阴离子掺杂光学散射型固态电解质在染料敏化太阳能电池的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

纳米Ni/(La,Sr)TiO3+δ-YSZ复合阴极体系的制备与电解CO2研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

间断Galerkin方法及在电磁计算中的应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

中层大气—电离层（MAI）探测系统研究

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

基于有机-无机原位电化学杂化的聚苯胺一维生长控制及传感性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

某些分布参数系统能控、能观、能稳的统一处理

国家自然科学基金

0+阅读 · 2009年12月31日

VerifAI: Verified Generative AI

Arxiv

0+阅读 · 2023年7月6日

Generative Artificial Intelligence Consensus in a Trustless Network

Arxiv

0+阅读 · 2023年7月4日

Secure Software Development Methodologies: A Multivocal Literature Review

Arxiv

0+阅读 · 2023年7月4日

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning

Arxiv

0+阅读 · 2023年7月1日

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Arxiv

34+阅读 · 2023年3月7日

A Survey on Causal Reinforcement Learning

Arxiv

29+阅读 · 2023年2月10日

Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods

Arxiv

22+阅读 · 2022年4月30日

Trustworthy AI: From Principles to Practices

Arxiv

46+阅读 · 2021年10月4日

Trustworthy AI: A Computational Perspective

Arxiv

12+阅读 · 2021年8月19日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

VIP会员

文章信息

相关主题

知识 (knowledge)

state-of-the-art

相关VIP内容

【Freddy Lecue 博士|公开演讲】可解释XAI人工智能进展（Explainable AI-The Story So Far），Sungkyunkwan University 2019

【Freddy Lecue 博士|公开演讲】可解释XAI人工智能进展（Explainable AI-The Story So Far），Sungkyunkwan University 2019

专知会员服务

32+阅读 · 2019年11月11日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

模型提取攻击与防御的系统综述：最新进展与展望

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

【CMU博士论文】用于物理模拟的高效深度学习模型

大模型解决方案白皮书：社交陪伴场景全流程落地指南

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

VerifAI: Verified Generative AI

Arxiv

0+阅读 · 2023年7月6日

Generative Artificial Intelligence Consensus in a Trustless Network

Arxiv

0+阅读 · 2023年7月4日

Secure Software Development Methodologies: A Multivocal Literature Review

Arxiv

0+阅读 · 2023年7月4日

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning

Arxiv

0+阅读 · 2023年7月1日

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Arxiv

34+阅读 · 2023年3月7日

A Survey on Causal Reinforcement Learning

Arxiv

29+阅读 · 2023年2月10日

Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods

Arxiv

22+阅读 · 2022年4月30日

Trustworthy AI: From Principles to Practices

Arxiv

46+阅读 · 2021年10月4日

Trustworthy AI: A Computational Perspective

Arxiv

12+阅读 · 2021年8月19日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

相关基金

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

多碘阴离子掺杂光学散射型固态电解质在染料敏化太阳能电池的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

纳米Ni/(La,Sr)TiO3+δ-YSZ复合阴极体系的制备与电解CO2研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

间断Galerkin方法及在电磁计算中的应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

中层大气—电离层（MAI）探测系统研究

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

基于有机-无机原位电化学杂化的聚苯胺一维生长控制及传感性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

某些分布参数系统能控、能观、能稳的统一处理

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员