Classic information extraction techniques consist of building questions and answers about facts. Identifying opinions and feelings in context, however, remains a challenge for subjective information extraction systems. In sentiment-based NLP tasks, few resources exist for information extraction, especially for offensive or hateful opinions in context. To fill this important gap, this short paper provides a new cross-lingual, contextual offensive lexicon consisting of explicit and implicit offensive and swearing expressions of opinion, each annotated with one of two classes: context-dependent offensive or context-independent offensive. In addition, we provide markers to identify hate speech. The annotation approach was evaluated at the expression level and achieved high inter-annotator agreement. The lexicon is available in Portuguese and English.