Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA have relied on both simulated environments and large-scale datasets. However, progress in this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic-bot winner of the NetHack Challenge 2021; and accompanying code for users to record, load, and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms, including online and offline RL as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision making tasks.
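As a minimal sketch of how the accompanying code can be used to load and stream trajectories, the snippet below follows the nle.dataset interface distributed with the NetHack Learning Environment; the dataset names and local paths are hypothetical placeholders and may differ from a given installation.

```python
# Sketch: registering NLD data and streaming minibatches of ttyrec frames.
# Assumes the nle.dataset module from the NetHack Learning Environment;
# paths below are placeholders for locally downloaded NLD archives.
import nle.dataset as nld

if not nld.db.exists():
    nld.db.create()
    # NLE-generated data (NLD-AA) and NAO server data (NLD-NAO) are
    # registered with different helpers.
    nld.add_nledata_directory("/path/to/nld-aa", "nld-aa-v0")
    nld.add_altorg_directory("/path/to/nld-nao", "nld-nao-v0")

# Stream compressed trajectories as minibatches of shape
# (batch_size, seq_length) per key (terminal characters, colors, etc.).
dataset = nld.TtyrecDataset("nld-aa-v0", batch_size=128, seq_length=32)
for minibatch in dataset:
    pass  # feed minibatch to an offline RL or imitation learning pipeline
```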