This work evaluates how well probabilistic and state-of-the-art vector space modeling (VSM) methods enable well-known machine learning algorithms to classify social network documents as aggressive, gender biased, or communally charged. To this end, we first performed an exploratory stage to find relevant settings to test: using training and development samples, we trained multiple algorithms with multiple vector space modeling and probabilistic methods and discarded the less informative configurations. The resulting systems were submitted to the competition of the ComMA@ICON'21 Workshop on Multilingual Gender Biased and Communal Language Identification.
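The exploratory stage can be pictured as a small search over (representation, classifier) pairs, scored on the development sample. The sketch below is only an illustration under stated assumptions: the abstract does not name the concrete representations, algorithms, or selection criterion, so the two vectorizers, the two classifiers, the accuracy metric, the `keep_ratio` parameter, and the toy documents are all hypothetical stand-ins.

```python
import math
from collections import Counter
from itertools import product

def tf_vectors(train, docs):
    """Raw term-frequency vectors (training statistics are not used)."""
    return [Counter(d.split()) for d in docs]

def tfidf_vectors(train, docs):
    """TF-IDF vectors; smoothed IDF is estimated on the training documents only."""
    df = Counter(t for d in train for t in set(d.split()))
    n = len(train)
    idf = lambda t: math.log((1 + n) / (1 + df[t])) + 1
    return [{t: c * idf(t) for t, c in Counter(d.split()).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def nearest_centroid(X, y, X_dev):
    """Predict the label whose summed training vector is most cosine-similar."""
    centroids = {}
    for x, label in zip(X, y):
        c = centroids.setdefault(label, Counter())
        for t, w in x.items():
            c[t] += w
    return [max(sorted(centroids), key=lambda l: cosine(x, centroids[l])) for x in X_dev]

def one_nearest_neighbor(X, y, X_dev):
    """Predict the label of the single most cosine-similar training document."""
    return [y[max(range(len(X)), key=lambda i: cosine(x, X[i]))] for x in X_dev]

def explore(train, y_train, dev, y_dev, keep_ratio=0.5):
    """Score every (vectorizer, classifier) pair on dev and keep the top share.

    Accuracy and keep_ratio are assumptions made for this sketch.
    """
    vectorizers = {"tf": tf_vectors, "tfidf": tfidf_vectors}
    classifiers = {"centroid": nearest_centroid, "1nn": one_nearest_neighbor}
    scores = {}
    for (vn, vec), (cn, clf) in product(vectorizers.items(), classifiers.items()):
        pred = clf(vec(train, train), y_train, vec(train, dev))
        scores[(vn, cn)] = sum(p == g for p, g in zip(pred, y_dev)) / len(y_dev)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

# Hypothetical toy data; AGG = aggressive, NAG = non-aggressive.
train_docs = ["you are awful and stupid", "i hate you so much",
              "have a nice day", "thanks for the help"]
train_labels = ["AGG", "AGG", "NAG", "NAG"]
dev_docs = ["you are stupid", "nice help thanks"]
dev_labels = ["AGG", "NAG"]

kept = explore(train_docs, train_labels, dev_docs, dev_labels)
for (vectorizer, classifier), accuracy in kept:
    print(vectorizer, classifier, accuracy)
```

In a real setting the discarded configurations would be those that score poorly on the development sample under the task's own metric; accuracy on four toy documents is used here only to keep the example self-contained.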