Prediction of human actions in social interactions has important applications in the design of social robots and artificial avatars. In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion. In particular, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator, which jointly evaluates the realism of interactions and of individual action sequences and operates at different time scales. Crucially, contextual information on the interacting participants is shared among agents and reinjected into both the generation and the discriminator evaluation processes. Experiments show that, although it deals with low-dimensional data, SocialInteractionGAN succeeds in producing highly realistic action sequences of interacting people, comparing favorably with a variety of recurrent and convolutional discriminator baselines. Evaluations are conducted using classical GAN metrics, which we specifically adapt to discrete sequential data. Our model is shown to properly learn the dynamics of interaction sequences while exploiting the full range of available actions, and we argue that this work constitutes a first step towards higher-dimensional and multimodal interaction generation.
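The abstract does not state how the classical GAN metrics are adapted to discrete sequences. Purely as an illustration, and not as the paper's metric, the sketch below shows one simple way a Fréchet-distance-style score could be transferred to discrete action data. It assumes each sequence is a list of integer action indices in `range(num_actions)`, embeds every sequence as its normalized action histogram, and compares the Gaussian statistics of the real and generated sets. The function names (`action_histogram`, `diagonal_frechet_distance`) and the histogram feature are hypothetical choices; the diagonal covariance is a simplification of the full FID that avoids a matrix square root.

```python
import math
from typing import List, Sequence, Tuple


def action_histogram(seq: Sequence[int], num_actions: int) -> List[float]:
    """Hypothetical feature: fraction of time steps spent on each discrete action."""
    counts = [0.0] * num_actions
    for action in seq:
        counts[action] += 1.0
    return [c / len(seq) for c in counts]


def mean_and_var(features: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) variance of a set of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mu = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mu[d]) ** 2 for f in features) / n for d in range(dim)]
    return mu, var


def diagonal_frechet_distance(
    real: List[Sequence[int]], fake: List[Sequence[int]], num_actions: int
) -> float:
    """Fréchet distance between two Gaussians with diagonal covariances,
    fitted to the action-histogram features of real and generated sequences.

    With diagonal covariances the trace term of FID reduces to
    sum_i (sqrt(var_real_i) - sqrt(var_fake_i))**2.
    """
    mu_r, var_r = mean_and_var([action_histogram(s, num_actions) for s in real])
    mu_f, var_f = mean_and_var([action_histogram(s, num_actions) for s in fake])
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_f))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_f))
    return mean_term + cov_term


if __name__ == "__main__":
    real = [[0, 1, 2, 2], [1, 1, 2, 0]]
    collapsed = [[0, 0, 0, 0], [0, 0, 0, 0]]  # generator stuck on a single action
    print(diagonal_frechet_distance(real, real, 3))       # 0.0
    print(diagonal_frechet_distance(real, collapsed, 3))  # 0.875
```

A generator that collapses onto a single action, and so fails to exploit the full range of available actions, is penalized by this score. Because a histogram discards temporal order, however, such a feature says nothing about interaction dynamics; a learned sequence encoder would be needed to capture those.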