Machine learning (ML)-enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of `truth' used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcement of false beliefs. This article reports on a responsible innovation (RI)-inflected collaboration at the intersection of social studies of science and data science. We identify a series of algorithmic contingencies: key moments during model development which could lead to different future outcomes, uncertainty and harmful effects. We conclude by offering an agenda of reflexivity and responsible development of ML tools for combating misinformation.