Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets, giving rise to a new dilemma for practitioners: since the performance an algorithm can deliver depends greatly on the dataset presented to it, practitioners need to pick the right dataset among those available. This problem has so far not been discussed in the corresponding literature. We discuss ideas on how to select promising datasets and propose three very simple indicators: estimated relative return improvement (ERI), estimated action stochasticity (EAS), and a combination of the two (COI), and we empirically show that, despite their simplicity, they can be used very effectively for dataset selection.
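To make the dataset-selection setting concrete, the following minimal Python sketch ranks candidate datasets by two simple proxies. The formulas used here (best-versus-mean return headroom and mean action standard deviation) and the multiplicative combination are illustrative assumptions only, not the paper's actual ERI, EAS, or COI definitions.

```python
# Hypothetical sketch: ranking candidate offline-RL datasets by two proxies.
# The proxy formulas below are assumptions for illustration, NOT the paper's
# actual ERI / EAS / COI indicators.
import numpy as np

def return_improvement_proxy(episode_returns):
    """Headroom estimate: how far the best observed episode return lies
    above the average one (assumed proxy, not the paper's ERI)."""
    episode_returns = np.asarray(episode_returns, dtype=float)
    mean_r = episode_returns.mean()
    return (episode_returns.max() - mean_r) / (abs(mean_r) + 1e-8)

def action_stochasticity_proxy(actions):
    """Average per-dimension standard deviation of the logged actions
    (assumed proxy for how exploratory the behavior policy was)."""
    actions = np.asarray(actions, dtype=float)
    return actions.std(axis=0).mean()

def rank_datasets(datasets):
    """datasets: dict name -> (episode_returns, actions).
    Returns dataset names sorted by a naive product of the two proxies
    (higher score = preferred), as a stand-in for a combined indicator."""
    scores = {
        name: return_improvement_proxy(rets) * action_stochasticity_proxy(acts)
        for name, (rets, acts) in datasets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Usage with synthetic data for two hypothetical candidate datasets.
rng = np.random.default_rng(0)
candidates = {
    "near_expert": (rng.normal(95, 2, 50), rng.normal(0, 0.1, (5000, 3))),
    "mixed":       (rng.normal(60, 25, 50), rng.normal(0, 0.8, (5000, 3))),
}
print(rank_datasets(candidates))
```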