In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications, including healthcare, finance, recommendation systems, robotics, and, last but not least, speech and natural language processing. While most speech and language applications of reinforcement learning algorithms are centered on improving the training of deep neural networks through the flexible optimization properties of reinforcement learning, much remains to be explored in exploiting its other benefits, such as reward-driven adaptability, state representations, temporal structures, and generalizability. In this survey, we present an overview of recent advances in reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive, and scalable.