The fact that some purely recurrent models are hard to optimize and inefficient on today's hardware does not necessarily make them bad models of language. We demonstrate this by showing the extent to which they can still be improved through a combination of a slightly better recurrent cell, architecture, objective, and optimization. In the process, we establish a new state of the art for language modelling on small datasets and, with dynamic evaluation, on Enwik8.
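To make the "dynamic evaluation" part concrete: the idea (Krause et al., 2018) is to keep adapting the model's parameters on the test text it has already scored, so it tracks the local statistics of the data. Below is a minimal PyTorch sketch under stated assumptions, not the exact procedure from the paper: it assumes a language model with the interface `model(inputs, hidden) -> (logits, hidden)` that accepts `hidden=None` at the start, a test corpus given as a 1-D `LongTensor` of token ids, and illustrative choices for segment length, optimizer, and learning rate.

```python
import torch
import torch.nn.functional as F

def dynamic_eval(model, tokens, bptt=50, lr=1e-4):
    """Score `tokens` segment by segment, updating `model` on each segment
    after it has been scored, so the model adapts to the test distribution."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    hidden = None
    total_nll, total_count = 0.0, 0

    for start in range(0, tokens.numel() - 1, bptt):
        seg = min(bptt, tokens.numel() - 1 - start)
        inputs = tokens[start:start + seg].unsqueeze(0)        # shape (1, seg)
        targets = tokens[start + 1:start + 1 + seg].unsqueeze(0)

        logits, hidden = model(inputs, hidden)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.reshape(-1))

        # Accumulate the *pre-update* loss: each segment is scored before the
        # model is allowed to adapt to it, so the metric stays honest.
        total_nll += loss.item() * seg
        total_count += seg

        # Gradient step on the segment just scored, then detach the hidden
        # state so the next segment does not backpropagate through it.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        hidden = (tuple(h.detach() for h in hidden)
                  if isinstance(hidden, tuple) else hidden.detach())

    return total_nll / total_count  # average NLL in nats per token
```

The published method uses a more elaborate update rule (e.g. a decay back toward the original parameters and per-parameter learning rates); this sketch only illustrates the core loop of interleaving evaluation with test-time gradient updates.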