The Chinese character riddle is a challenging riddle game whose solution is a single character. A riddle describes the pronunciation, shape, and meaning of the solution character using rhetorical techniques. In this paper, we propose a Chinese character riddle dataset that covers the majority of common simplified Chinese characters, built by crawling riddles from the Web and generating brand-new ones. In the generation stage, we provide the generation model with the Chinese phonetic alphabet (pinyin), decomposition, and explanation of the solution character, and obtain multiple riddle descriptions for each tested character. The generated riddles are then manually filtered, and the final dataset, CC-Riddle, is composed of both human-written riddles and filtered generated riddles. Furthermore, we build a character riddle QA system based on our dataset and find that existing models struggle to solve such tricky questions. CC-Riddle is now publicly available.
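As a rough illustration of the generation stage described in the abstract, the sketch below shows one way the three features (pronunciation, decomposition, explanation) could be packed into a single text input for a generation model. The `CharacterInfo` class, the `build_generation_input` function, and the template string are all hypothetical names chosen for this example; the abstract does not specify the actual input format or model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharacterInfo:
    """Hypothetical container for the three features named in the abstract."""
    character: str
    pinyin: str            # pronunciation (Chinese phonetic alphabet)
    components: List[str]  # shape: decomposition into parts
    explanation: str       # meaning


def build_generation_input(info: CharacterInfo) -> str:
    """Flatten the features into one text input (illustrative template only)."""
    return (
        f"character: {info.character} | "
        f"pinyin: {info.pinyin} | "
        f"decomposition: {' + '.join(info.components)} | "
        f"explanation: {info.explanation}"
    )


# Example: 好 (hǎo, "good") decomposes into 女 and 子.
hao = CharacterInfo("好", "hǎo", ["女", "子"], "good")
prompt = build_generation_input(hao)
print(prompt)
```

A string like this would be passed to the generation model several times (or sampled with several outputs) to obtain multiple candidate riddles per character, which are then filtered by hand.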