Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel at summarizing short newswire articles or documents with strong layout biases, such as scientific articles or government reports. Efficient techniques for summarizing financial documents, which are dense with facts and figures, have remained largely unexplored, mainly due to the unavailability of suitable datasets. In this work, we present ECTSum, a new dataset that pairs transcripts of earnings calls (ECTs) hosted by publicly traded companies, as documents, with short, expert-written, telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long, unstructured documents without any prescribed length limit or format. We benchmark our dataset with state-of-the-art summarizers across various metrics that evaluate the content quality and factual consistency of the generated summaries. Finally, we present a simple yet effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
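To make the task format concrete, here is a minimal Python sketch under stated assumptions. The `ECTExample` record and the `rouge1_f1` helper are hypothetical names for illustration only, not part of the ECTSum release or of ECT-BPS. The score is a simplified ROUGE-1 style unigram-overlap F1 (custom tokenizer, no stemming) standing in for the content-quality metrics used in benchmarking; it says nothing about factual consistency. The example strings are made up.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ECTExample:
    """Hypothetical record: one earnings-call transcript paired with its summary."""
    transcript: str                                   # long, unstructured call transcript
    bullets: list[str] = field(default_factory=list)  # short telegram-style bullet points


def tokenize(text: str) -> list[str]:
    # Lowercase; keep alphanumeric runs and decimals such as "1.5" as single tokens.
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming or stopwords."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy example, not taken from the dataset.
    example = ECTExample(
        transcript="... third-quarter revenue rose 12 percent ...",
        bullets=["q3 revenue rose 12 pct"],
    )
    generated = "q3 revenue up 12 pct"
    print(round(rouge1_f1(generated, example.bullets[0]), 3))  # prints 0.8
```

Keeping figures such as "1.5" as single tokens matters for this domain, since bullet points are dense with numbers; a real evaluation would use an established ROUGE implementation rather than this sketch.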