In this paper, we present AMUSED, a semi-automated framework for gathering multi-modal annotated data from multiple social media platforms. The framework is designed to mitigate the difficulties of collecting and annotating social media data by combining machine and human effort in the data collection process. Given a list of articles from professional news media or blogs, AMUSED detects links to social media posts in those articles and then downloads the content of each linked post from the respective platform to gather details about that post. The framework can fetch annotated data from multiple platforms, such as Twitter, YouTube, and Reddit, and aims to reduce the workload and problems involved in annotating social media data. AMUSED can be applied in multiple application domains; as a use case, we have implemented the framework to collect COVID-19 misinformation data from different social media platforms.
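The link-detection step described above can be illustrated with a minimal sketch. It is not the AMUSED implementation: the host-to-platform table and the names `PLATFORM_HOSTS`, `LinkCollector`, `platform_of`, and `find_social_links` are assumptions introduced here for illustration. The sketch uses only the Python standard library, and it only finds the links in an article's HTML. Downloading the linked posts from each platform is out of scope.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical host-to-platform table (an assumption, not taken from the paper).
PLATFORM_HOSTS = {
    "twitter.com": "twitter",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "reddit.com": "reddit",
}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def platform_of(url):
    """Return the platform label for a URL, or None if the host is not in the table."""
    host = (urlparse(url).hostname or "").lower()
    # Strip one common subdomain prefix so "www.youtube.com" matches "youtube.com".
    for prefix in ("www.", "m.", "mobile."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    return PLATFORM_HOSTS.get(host)


def find_social_links(article_html):
    """Return (platform, url) pairs for the social media links found in an article."""
    parser = LinkCollector()
    parser.feed(article_html)
    found = []
    for url in parser.links:
        platform = platform_of(url)
        if platform is not None:
            found.append((platform, url))
    return found
```

Each `(platform, url)` pair could then be handed to a platform-specific downloader. Matching on the exact host name is a deliberately conservative choice: it misses embedded posts and shortened URLs, which a fuller pipeline would need to handle.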