Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results in enabling consistent improvements in conversational AI systems. However, directly optimizing such metrics through off-policy bandit learning objectives increases the risk of abrupt policy changes that disrupt the current user experience. In this study, we introduce a scalable framework that supports fine-grained, per-domain exploration targets via user-defined constraints. For example, we may want to permit fewer policy deviations in business-critical domains such as shopping, while allocating a larger exploration budget to domains such as music. We further present a novel meta-gradient learning approach that is both scalable and practical for this problem: it adaptively adjusts constraint-violation penalty terms through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments on data from a real-world conversational AI system over a set of realistic constraint benchmarks. The experimental results demonstrate that the proposed approach achieves the best balance between policy value and constraint satisfaction rate.
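To make the constrained-exploration idea concrete, the following is a minimal sketch under simplifying assumptions: a scalar policy parameter, toy stand-ins for the off-policy value estimate and per-domain deviation rate, and a plain dual-ascent update of the per-domain penalty weights in place of the meta-gradient procedure described above. All names here (`budgets`, `deviation_rate`, `policy_value`, `lam`) are hypothetical and not from the paper.

```python
import numpy as np

# Hypothetical per-domain exploration budgets: the maximum allowed fraction of
# turns on which the new policy may deviate from the deployed baseline policy.
budgets = {"shopping": 0.02, "music": 0.10}

# One penalty weight per domain, adapted during training.
lam = {d: 1.0 for d in budgets}

def deviation_rate(theta, domain):
    """Toy stand-in: fraction of actions that differ from the baseline in
    `domain`, as a function of a scalar policy parameter (sigmoid shape)."""
    bias = {"shopping": -1.0, "music": 0.5}[domain]
    return 1.0 / (1.0 + np.exp(-(theta + bias)))

def policy_value(theta):
    """Toy off-policy value estimate; more deviation yields more reward here,
    which creates the tension with the per-domain constraints."""
    return 0.3 * theta - 0.05 * theta ** 2

def penalized_objective(theta):
    """Value minus per-domain penalties on constraint violations."""
    obj = policy_value(theta)
    for d, budget in budgets.items():
        obj -= lam[d] * max(0.0, deviation_rate(theta, d) - budget)
    return obj

theta, lr_theta, lr_lam, eps = 0.0, 0.05, 0.5, 1e-4
for step in range(300):
    # Gradient ascent on the penalized objective (finite-difference gradient).
    g = (penalized_objective(theta + eps) - penalized_objective(theta - eps)) / (2 * eps)
    theta += lr_theta * g
    # Dual-ascent-style penalty update: raise lam[d] while domain d is over
    # budget, relax it otherwise, keeping pressure balanced across domains.
    for d, budget in budgets.items():
        lam[d] = max(0.0, lam[d] + lr_lam * (deviation_rate(theta, d) - budget))

print({d: round(deviation_rate(theta, d), 3) for d in budgets}, round(theta, 3))
```

In this simplified setting, the shopping domain is held near its tight 2% deviation budget while the music domain is free to use its larger budget; the paper's meta-gradient method replaces the hand-set dual-ascent step with a learned, balanced adjustment of the penalty terms.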