According to the Stimulus-Organism-Response (SOR) theory, all human behavioral reactions are stimulated by context: people process the stimulus they receive and produce an appropriate reaction. This implies that, in a specific context and for a given input stimulus, a person can react differently depending on their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, and a broad spectrum of listener nonverbal reactions may be appropriate in response to a specific speaker behavior. A body of work has already investigated the problem of automatically generating an appropriate reaction for a given input. However, none has attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and to evaluate the appropriateness of those reactions using objective measures. This paper first defines the facial Multiple Appropriate Reaction Generation (fMARG) task, for the first time in the literature, and proposes a new set of objective metrics for evaluating the appropriateness of the generated reactions. It then introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.