Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

From arXiv. Accepted by The Web Conference 2020 (WWW 2020). The first two authors contributed equally. Dataset and code are released at https://github.com/gsh199449/stickerchat

Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting a sticker response by matching the text labels of stickers with previous utterances. However, because stickers exist in such large quantities, it is impractical to require text labels for all of them. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history, without any external labels. This task poses two main challenges. One is to learn the semantic meaning of stickers without corresponding text labels. The other is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representations of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependencies between all interaction results through a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments conducted on this dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in the sticker selection field, we release this dataset of 340K multi-turn dialog and sticker pairs.
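To make the pipeline in the abstract concrete (encode, interact with each utterance, fuse, score), here is a toy sketch in plain Python. It is not the authors' SRS model. It assumes the utterances and stickers have already been encoded into fixed-length vectors. Cosine similarity stands in for the deep interaction network, and a recency-weighted average stands in for the fusion network. All function names and the `decay` parameter are hypothetical, chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def matching_score(utterance_vecs, sticker_vec, decay=0.5):
    """Score one candidate sticker against a multi-turn dialog context.

    ASSUMPTION: inputs are pre-encoded vectors, ordered oldest to newest.
    This is a simplified stand-in, not the interaction or fusion network
    described in the paper.
    """
    if not utterance_vecs:
        raise ValueError("dialog context must contain at least one utterance")
    n = len(utterance_vecs)
    # "Interaction" step: one similarity value per utterance in the history.
    interactions = [cosine(u, sticker_vec) for u in utterance_vecs]
    # "Fusion" step: the newest utterance gets weight 1, and each earlier
    # turn is down-weighted by a further factor of `decay`.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, interactions)) / sum(weights)


def select_sticker(utterance_vecs, candidates):
    """Return the name of the candidate with the highest matching score.

    `candidates` maps a (hypothetical) sticker name to its vector.
    """
    return max(
        candidates,
        key=lambda name: matching_score(utterance_vecs, candidates[name]),
    )


if __name__ == "__main__":
    context = [[1.0, 0.0], [0.0, 1.0]]  # oldest utterance first
    candidates = {"happy": [0.0, 1.0], "sad": [1.0, 0.0]}
    print(select_sticker(context, candidates))
```

In this toy setup the sticker aligned with the most recent utterance wins, because later turns carry more weight. The actual model learns both the per-utterance interaction and the fusion from data rather than using fixed weights.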
