Metaphor generation is a challenging task that can benefit many downstream applications, such as improving user satisfaction with dialogue systems and story generation. This paper tackles Chinese nominal metaphor generation by introducing a multitask metaphor generation framework with self-training and metaphor identification mechanisms. Self-training addresses the data scarcity of metaphor datasets: instead of relying solely on labelled metaphor datasets, which are usually small, self-training identifies potential metaphors in a large-scale unlabelled corpus for use in metaphor generation. The metaphor weighting mechanism enables our model to focus on the metaphor-related parts of the input (e.g., the comparison between the metaphor and its comparator) during learning, thereby improving the metaphoricity of the generated metaphors. Our model is trained on an annotated corpus of 6.3k sentences containing diverse metaphorical expressions. Experimental results show that our model generates metaphors with better readability and creativity than the baseline models, even when training data is insufficient.