A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL provides an efficient method for learning how to interact with the environment; this combination is called Deep Reinforcement Learning (deep RL). Deep RL has achieved tremendous success in gaming, for example with AlphaGo, but its potential has rarely been explored for challenging tasks such as Speech Emotion Recognition (SER). Deep RL applied to SER could improve the performance of an automated call centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, there is currently no RL policy tailored for SER. In addition, the extended learning period is a general challenge for deep RL and can slow learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and we apply pre-training in deep RL to achieve a faster rate of learning. We also study cross-dataset pre-training to assess the feasibility of pre-training the RL agent with a similar dataset in a scenario where no real environmental data is available. The IEMOCAP and SAVEE datasets were used for evaluation, with the task being to recognize four emotions (happy, sad, angry, and neutral) in the utterances provided. Experimental results show that the proposed "Zeta policy" performs better than existing policies. The results also show that pre-training can reduce training time by shortening the warm-up period and that it is robust to the cross-corpus scenario.
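To make the setting concrete, the following is a minimal sketch of how four-class emotion recognition might be framed as an RL problem, and of how pre-training could shorten the warm-up period. It rests on several assumptions that are not taken from the paper: the utterances are synthetic noisy one-hot vectors rather than IEMOCAP or SAVEE features, the agent is a linear Q-function rather than a deep network, the reward is +1 for a correct emotion and -1 otherwise, and the action-selection rule is standard epsilon-greedy, used only as a stand-in because the Zeta policy is not defined in this abstract. All names (`make_utterance`, `LinearQAgent`, `pretrain`, `interact`) are hypothetical.

```python
import random

EMOTIONS = ["happy", "sad", "angry", "neutral"]


def make_utterance(rng):
    """Toy stand-in for an utterance: a noisy one-hot feature vector and its label."""
    label = rng.randrange(len(EMOTIONS))
    features = [rng.gauss(1.0 if i == label else 0.0, 0.3)
                for i in range(len(EMOTIONS))]
    return features, label


class LinearQAgent:
    """Assumed agent: Q(s, a) = w[a] . s, with one weight vector per emotion."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def act(self, state, epsilon, rng):
        # Epsilon-greedy placeholder policy (NOT the Zeta policy).
        if rng.random() < epsilon:
            return rng.randrange(len(self.w))
        q = self.q_values(state)
        return q.index(max(q))

    def update(self, state, action, target):
        # One gradient step moving Q(state, action) toward the target value.
        error = target - self.q_values(state)[action]
        self.w[action] = [wi + self.lr * error * si
                          for wi, si in zip(self.w[action], state)]


def pretrain(agent, n_samples, rng):
    """Supervised initialisation on labelled data before any interaction."""
    for _ in range(n_samples):
        state, label = make_utterance(rng)
        for action in range(len(EMOTIONS)):
            agent.update(state, action, 1.0 if action == label else -1.0)


def interact(agent, n_steps, warmup_steps, rng):
    """Run n_steps of interaction; return the mean reward after warm-up."""
    rewards = []
    for step in range(n_steps):
        state, label = make_utterance(rng)
        # During warm-up the agent acts purely at random to gather experience.
        epsilon = 1.0 if step < warmup_steps else 0.1
        action = agent.act(state, epsilon, rng)
        reward = 1.0 if action == label else -1.0
        agent.update(state, action, reward)  # one-step task: target is the reward
        if step >= warmup_steps:
            rewards.append(reward)
    return sum(rewards) / max(len(rewards), 1)


if __name__ == "__main__":
    rng = random.Random(0)
    scratch = LinearQAgent(len(EMOTIONS), len(EMOTIONS))
    pretrained = LinearQAgent(len(EMOTIONS), len(EMOTIONS))
    pretrain(pretrained, 500, rng)
    print("from scratch, 100 warm-up steps:", interact(scratch, 300, 100, rng))
    print("pre-trained, no warm-up:", interact(pretrained, 300, 0, rng))
```

In this toy setting the pre-trained agent can skip the random warm-up phase because its Q-values are already informative when interaction begins. Whether the same holds for real speech features, and for a cross-corpus pre-training set, is the empirical question the paper addresses.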
