Multiple Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?

According to the Stimulus-Organism-Response (SOR) theory, all human behavioral reactions are stimulated by context: people process the received stimulus and produce an appropriate reaction. This implies that, in a specific context and for a given input stimulus, a person can react differently depending on their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, and a broad spectrum of listener nonverbal reactions might be appropriate responses to a specific speaker behaviour. A body of work has already investigated the problem of automatically generating an appropriate reaction for a given input. However, none has attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and to evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective metrics for evaluating the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
