Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

March 23, 2023

Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Ellen Novoseller, Vinicius G. Goecks, Nicholas Waytowich, David Watkins, Josh Miller, Rohin Shah

To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as channels to learn the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.
