We conducted a human-subject study of named entity recognition (NER) on a noisy corpus of conversational music recommendation queries containing many irregular and novel named entities. We evaluated human NER linguistic behaviour under these challenging conditions and compared it with the most common NER systems today, fine-tuned transformers. Our goal was to learn about the task in order to guide the design of better evaluation methods and NER algorithms. The results showed that, under a strict evaluation schema, NER in this context was hard for both humans and algorithms; that humans achieved higher precision, while the model achieved higher recall owing to its exposure to entities, especially during pre-training; and that entity types had different error patterns (e.g., frequent entity-typing errors for artists). The released corpus goes beyond predefined frames of interaction and can support future work in conversational music recommendation.
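The following Python sketch makes the "strict evaluation schema" concrete by computing span-level precision, recall, and F1 under exact matching. It assumes that "strict" means a predicted entity counts as correct only when both its boundaries and its type match a gold entity. The function name `strict_prf`, the token-offset representation, the example query, and the entity types `Artist` and `Track` are hypothetical illustrations and are not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, entity_type)
Entity = Tuple[int, int, str]


def strict_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Strict matching (assumed): a prediction is a true positive only if
    its span boundaries AND its entity type exactly match a gold entity."""
    tp = len(gold & pred)  # exact (start, end, type) matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical query, tokenised:
# play(0) something(1) like(2) bon(3) iver(4) and(5) the(6) national(7)
gold = {(3, 5, "Artist"), (6, 8, "Artist")}
pred = {
    (3, 5, "Artist"),  # exact match
    (7, 8, "Artist"),  # boundary error: misses "the"
    (6, 8, "Track"),   # correct span, wrong type
}

if __name__ == "__main__":
    print(strict_prf(gold, pred))
```

In this hypothetical query, one prediction is exactly right, one has a boundary error, and one has the right span but the wrong type, so precision is 1/3, recall is 1/2, and F1 is 0.4. A lenient scheme might give partial credit for the boundary and type errors; strict scoring gives them none.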