The quest for seeking health information has swamped the web with consumers' health-related questions. Generally, consumers use overly descriptive and peripheral information to express their medical condition or other healthcare needs, contributing to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. To address this issue, we introduce a new dataset, CHQ-Summ that contains 1507 domain-expert annotated consumer health questions and corresponding summaries. The dataset is derived from the community question-answering forum and therefore provides a valuable resource for understanding consumer health-related posts on social media. We benchmark the dataset on multiple state-of-the-art summarization models to show the effectiveness of the dataset.
翻译:寻求健康信息的努力在网上充斥着与消费者健康有关的问题,一般而言,消费者使用描述性过强的外围信息来表达其医疗状况或其他保健需求,从而帮助应对自然语言理解的挑战; 应对这一挑战的一种方法是总结问题和提炼原始问题的关键信息; 为解决这一问题,我们引入一个新的数据集,CHQ-Summ, 包含1 507个域-专家附加说明的消费者健康问题和相应的摘要; 数据集来自社区问答论坛,因此为了解社交媒体上与消费者健康有关的职位提供了宝贵的资源。 我们把数据集以多种最先进的汇总模型作为基准,以显示数据集的有效性。