In this report, we describe our Transformers for euphemism detection baseline (TEDB) submissions to the 2022 shared task on euphemism detection. We cast euphemism prediction as text classification and considered Transformer-based models, the current state of the art for this task. We explored different training schemes, pretrained models, and model architectures. Our best result, an F1-score of 0.816 (0.818 precision, 0.814 recall), was obtained with a TweetEval/TimeLMs-pretrained RoBERTa model as a feature-extractor frontend and a KimCNN classifier backend, fine-tuned end-to-end for euphemism detection with a cosine annealing scheduler. We observed that models pretrained on sentiment analysis and offensiveness detection correlate with higher F1-scores, while pretraining on other tasks, such as sarcasm detection, yields lower F1-scores. Adding more word-vector channels did not improve performance in our experiments.
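For concreteness, the following is a minimal PyTorch sketch of this frontend/backend pairing, not the authors' exact implementation: the checkpoint name (cardiffnlp/twitter-roberta-base), kernel sizes, filter count, and optimizer settings are illustrative assumptions.

```python
# Sketch: RoBERTa feature-extractor frontend + KimCNN classifier backend,
# trained end-to-end with a cosine annealing learning-rate schedule.
# Checkpoint and hyperparameters are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class RobertaKimCNN(nn.Module):
    def __init__(self, checkpoint="cardiffnlp/twitter-roberta-base",
                 num_classes=2, kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        # KimCNN: parallel 1-D convolutions over the contextual token
        # embeddings, each followed by max-over-time pooling.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h = h.transpose(1, 2)  # (batch, hidden, seq_len) for Conv1d
        pooled = [F.relu(conv(h)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))

model = RobertaKimCNN()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
```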