Extractive summarization is a crucial task in natural language processing that aims to condense long documents into shorter versions by directly extracting sentences. The recent introduction of ChatGPT has attracted significant interest in the NLP community due to its remarkable performance on a wide range of downstream tasks. However, concerns regarding factuality and faithfulness have hindered its practical application in summarization systems. This paper first presents a thorough evaluation of ChatGPT's performance on extractive summarization and compares it with traditional fine-tuning methods on various benchmark datasets. Our experimental analysis reveals that ChatGPT's extractive summarization performance is still inferior to that of existing supervised systems in terms of ROUGE scores. In addition, we explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance. Furthermore, we find that applying an extract-then-generate pipeline with ChatGPT significantly improves summary faithfulness over abstractive baselines. These observations highlight two-stage approaches as a potential direction for making ChatGPT's text summarization more faithful.
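The extract-then-generate pipeline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the naive sentence splitter, the function names, and the `complete` callable (a stand-in for whatever function sends a prompt to a chat model and returns its reply) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List


def split_sentences(document: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace (an assumption for brevity)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def extract_then_generate(
    document: str,
    complete: Callable[[str], str],  # hypothetical: prompt in, model reply out
    k: int = 3,
) -> Dict[str, object]:
    """Two-stage summarization: select sentences first, then write from them only."""
    sentences = split_sentences(document)
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))

    # Stage 1 (extract): ask the model for the indices of the key sentences.
    extract_prompt = (
        f"Select up to {k} sentences that best summarize the document. "
        "Reply with their indices only.\n" + numbered
    )
    reply = complete(extract_prompt)

    # Keep valid, unique indices in the order the model returned them.
    indices: List[int] = []
    for token in re.findall(r"\d+", reply):
        i = int(token)
        if i < len(sentences) and i not in indices:
            indices.append(i)
    indices = sorted(indices[:k])  # restore document order
    extracted = [sentences[i] for i in indices]

    # Stage 2 (generate): the model sees only the extracted sentences,
    # which is what is meant to keep the final summary grounded.
    generate_prompt = (
        "Write a short summary using only the facts in these sentences:\n"
        + "\n".join(extracted)
    )
    summary = complete(generate_prompt).strip()
    return {"indices": indices, "extracted": extracted, "summary": summary}
```

In use, `complete` would wrap a real chat-model client; passing it in as a parameter keeps the sketch independent of any particular API and lets the pipeline be exercised with a stub function.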