Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content produced daily. Since moderation policies vary across countries and types of products, it is common to train and deploy a model per policy. However, this approach is highly inefficient, especially when a policy changes, since it requires re-labeling the dataset and re-training the model on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that, instead of directly providing final moderation decisions, provide prediction scores for multiple subtasks, such as detecting the presence of underage personnel, rude gestures, or weapons. However, making a reliable automated moderation decision for a specific target policy from the prediction scores of these multiple subtasks has not yet been widely explored. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches for the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach outperforms existing threshold optimization methods and heuristics in content moderation.
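To make the decision setup concrete, below is a minimal Python sketch of threshold-based moderation over third-party subtask scores. All names here are illustrative assumptions, and the random search maximizing validation F1 stands in for the optimization step only because the abstract does not specify the paper's actual search procedure; this is a plausible baseline, not the proposed method.

```python
import numpy as np

# Hypothetical setup: `scores` holds third-party prediction scores for K
# subtasks (e.g., underage personnel, rude gestures, weapons). An item is
# flagged when any subtask score crosses its per-subtask threshold.

def moderation_decision(scores, thresholds):
    """Flag an item if any subtask score exceeds its threshold."""
    return (scores >= thresholds).any(axis=1)

def f1(pred, label):
    """F1 score for boolean predictions against boolean labels."""
    tp = np.sum(pred & label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    return 2 * tp / max(2 * tp + fp + fn, 1)

def search_thresholds(scores, labels, n_trials=5000, seed=0):
    """Random search (an assumed stand-in for the paper's optimizer)
    for the threshold vector maximizing F1 on labeled validation data."""
    rng = np.random.default_rng(seed)
    best_t, best_f1 = None, -1.0
    for _ in range(n_trials):
        t = rng.uniform(0.0, 1.0, size=scores.shape[1])
        score = f1(moderation_decision(scores, t), labels)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy usage with synthetic validation data: 3 subtasks, 1000 items.
rng = np.random.default_rng(1)
val_scores = rng.uniform(size=(1000, 3))
val_labels = val_scores.max(axis=1) > 0.8  # synthetic ground truth
thresholds, best = search_thresholds(val_scores, val_labels)
print(f"best thresholds: {thresholds.round(3)}, F1 = {best:.3f}")
```

When a policy shifts, only the labeled validation set and the threshold vector need updating, which is the cost advantage over re-training a per-policy model that the abstract describes.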