The quest for seeking health information has swamped the web with consumers health-related questions. Generally, con- sumers use overly descriptive and peripheral information to express their medical condition or other healthcare needs, contributing to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. Recently, large-scale datasets have significantly propelled the development of several summarization tasks, such as multi-document summarization and dialogue summarization. However, a lack of a domain-expert annotated dataset for the consumer healthcare questions summarization task inhibits the development of an efficient summarization system. To address this issue, we introduce a new dataset, CHQ-Sum,m that contains 1507 domain-expert annotated consumer health questions and corresponding summaries. The dataset is derived from the community question answering forum and therefore provides a valuable resource for understanding consumer health-related posts on social media. We benchmark the dataset on multiple state-of-the-art summarization models to show the effectiveness of the dataset
翻译:寻求健康信息的需求使得网络上充斥着消费者提出的健康相关问题。通常,消费者会使用过度描述性和边缘性信息来表达其医疗状况或其他医疗需求,这给自然语言理解带来了挑战。应对这一挑战的一种方法是总结问题并提炼原始问题的关键信息。近年来,大规模数据集显著推动了多项摘要任务的发展,例如多文档摘要和对话摘要。然而,消费者医疗问题摘要任务缺乏领域专家标注的数据集,阻碍了高效摘要系统的开发。为解决这一问题,我们引入了一个新的数据集CHQ-Sum,其中包含1507个由领域专家标注的消费者健康问题及相应摘要。该数据集源自社区问答论坛,因此为理解社交媒体上消费者发布的健康相关帖子提供了宝贵资源。我们在多种最先进的摘要模型上对该数据集进行基准测试,以证明其有效性。