Coreference resolution -- a crucial task for understanding discourse and language at large -- has yet to see widespread benefits from large language models (LLMs). Moreover, coreference resolution systems rely largely on supervised labels, which are expensive and difficult to annotate, making the task ripe for prompt engineering. In this paper, we introduce a QA-based prompt-engineering method and examine the abilities and limitations of \textit{generative}, pre-trained LLMs on the task of coreference resolution. Our experiments show that GPT-2 and GPT-Neo can return valid answers, but that their ability to identify coreferent mentions is limited and prompt-sensitive, leading to inconsistent results.