In recent years, Vietnam has seen rapid growth in the number of social network users on platforms such as Facebook, YouTube, Instagram, and TikTok. On social media, hate speech has become a critical problem for these users. To address this problem, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, each assigned one of three labels: CLEAN, OFFENSIVE, or HATE. We also describe the data creation process used to annotate the dataset and to evaluate its quality. Finally, we evaluate the dataset with deep learning models and transformer models.