Recently, ChatGPT has attracted considerable attention for its ability to generate fluent, high-quality responses to human queries. Several prior studies have shown that ChatGPT exhibits remarkable generation ability compared with existing models. However, quantitative analysis of ChatGPT's understanding ability has received little attention. In this report, we probe the understanding ability of ChatGPT by evaluating it on the popular GLUE benchmark and comparing it with four representative fine-tuned BERT-style models. We find that: 1) ChatGPT falls short on paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT-style models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question-answering tasks. Additionally, we show that combining advanced prompting strategies can further improve ChatGPT's understanding ability.
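To make the evaluation setup concrete, a GLUE classification example can be cast as a zero-shot prompt for a chat model. The sketch below is illustrative only: the task names follow GLUE, but the prompt templates and the `build_prompt` helper are assumptions, not the paper's exact wording.

```python
# Hypothetical sketch: casting GLUE-style classification tasks as
# zero-shot prompts for a chat model. The templates are illustrative,
# not the report's actual prompt wording.

def build_prompt(task: str, **fields: str) -> str:
    """Build a zero-shot prompt string for a few GLUE-style tasks."""
    templates = {
        # SST-2: single-sentence sentiment analysis
        "sst2": (
            "Is the sentiment of the following sentence positive or "
            "negative?\nSentence: {sentence}\nAnswer:"
        ),
        # MRPC: paraphrase detection over a sentence pair
        "mrpc": (
            "Do the following two sentences have the same meaning? "
            "Answer yes or no.\nSentence 1: {sentence1}\n"
            "Sentence 2: {sentence2}\nAnswer:"
        ),
        # RTE: natural language inference (entailment)
        "rte": (
            "Does the premise entail the hypothesis? Answer yes or no.\n"
            "Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:"
        ),
    }
    return templates[task].format(**fields)

# Example: an SST-2 sentiment prompt
prompt = build_prompt("sst2", sentence="A delightful, warm-hearted film.")
print(prompt)
```

The model's free-form answer (e.g. "positive") would then be mapped back to the task's label set before computing GLUE metrics.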