While multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, existing work has focused almost exclusively on communication with discrete symbols. Human communication often takes place (and emerged) over a continuous acoustic channel; human infants acquire language in large part through continuous signalling with their caregivers. We therefore ask: Can emergent language be observed between agents that communicate over a continuous channel and are trained through reinforcement learning? And if so, what is the impact of channel characteristics on the emerging language? We propose an environment and training methodology for an initial exploration of these questions. We use a simple messaging environment in which a Speaker agent needs to convey a concept to a Listener agent. The Speaker is equipped with a vocoder that maps symbols to a continuous waveform; this waveform is passed over a lossy continuous channel, and the Listener needs to map the received signal back to the concept. Using deep Q-learning, we show that basic compositionality emerges in the learned language representations. We find that noise in the communication channel is essential when conveying unseen concept combinations. We also show that the emergent communication can be grounded by introducing a caregiver predisposed to "hearing" or "speaking" English. Finally, we describe how our platform serves as a starting point for future work that combines deep reinforcement learning and multi-agent systems to study continuous signalling in language learning and emergence.
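The signal path described above (symbols, vocoder, lossy channel, Listener) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method of the paper: the vocoder is a toy that emits one pure tone per symbol, the channel is modelled as additive Gaussian noise, and the Listener is a fixed energy detector rather than a deep Q-learning agent. The names `vocode`, `channel`, `listen`, and `play_round`, and all constants, are hypothetical.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 8000                        # samples per second
SEGMENT_LEN = 400                         # samples emitted per symbol
FREQS = [400.0, 800.0, 1200.0, 1600.0]    # one tone per symbol (toy vocoder)


def vocode(symbols):
    """Map a sequence of discrete symbols to a continuous waveform."""
    wave = []
    for s in symbols:
        f = FREQS[s]
        wave.extend(
            math.sin(2 * math.pi * f * n / SAMPLE_RATE) for n in range(SEGMENT_LEN)
        )
    return wave


def channel(wave, noise_std, rng):
    """Lossy channel, modelled here as additive Gaussian noise per sample."""
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def tone_energy(segment, freq):
    """Energy of the segment projected onto a sine/cosine pair at `freq`."""
    re = sum(
        x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
        for n, x in enumerate(segment)
    )
    im = sum(
        x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
        for n, x in enumerate(segment)
    )
    return re * re + im * im


def listen(wave):
    """Stand-in Listener: pick, per segment, the symbol whose tone is strongest.

    A learned agent would replace this fixed rule.
    """
    symbols = []
    for start in range(0, len(wave), SEGMENT_LEN):
        segment = wave[start:start + SEGMENT_LEN]
        energies = [tone_energy(segment, f) for f in FREQS]
        symbols.append(energies.index(max(energies)))
    return symbols


def play_round(concept, noise_std, rng):
    """One round of the game: reward 1.0 if the concept is recovered, else 0.0."""
    received = channel(vocode(concept), noise_std, rng)
    guess = listen(received)
    return guess, (1.0 if guess == list(concept) else 0.0)
```

Calling `play_round([2, 0, 3], noise_std=0.5, rng=random.Random(0))` runs one Speaker-to-Listener exchange. In a learning setup, the returned reward would be the signal used to train both agents, and `noise_std` is the kind of channel characteristic whose effect on the emerging language the abstract asks about.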