Toxic misinformation campaigns have caused significant societal harm, e.g., by affecting elections and public awareness of COVID-19 information. Unfortunately, although (gold-standard) retrospective studies of misinformation have succeeded in confirming its harmful effects after the fact, they arrive too late for timely intervention and harm reduction. By design, misinformation evades retrospective classifiers by exploiting two properties we call new-normal: (1) never-seen-before novelty that causes inescapable generalization challenges for previous classifiers, and (2) massive but short campaigns that end before they can be manually annotated for new classifier training. To tackle these challenges, we propose UFIT, which combines two techniques: semantic masking of strong-signal keywords to reduce overfitting, and intra-proxy smoothness regularization of high-density regions in the latent space to improve reliability and maintain accuracy. Evaluation of UFIT on public new-normal misinformation data shows over 30% improvement over existing approaches on future (and unseen) campaigns. To the best of our knowledge, UFIT is the first successful effort to achieve such a high level of generalization on new-normal misinformation data with minimal concession (1 to 5%) of accuracy compared to oracles trained with full knowledge of all campaigns.