Sentiment analysis (SA) systems, though widely applied in many domains, have been shown to produce biased results. Prior work has automatically generated test cases to reveal unfairness in SA systems, but the community still lacks tools that can monitor and uncover biased predictions at runtime. This paper fills the gap by proposing BiasRV, the first tool to raise an alarm when a deployed SA system makes a biased prediction on a given input text. To do so, BiasRV dynamically extracts a template from the input text and generates gender-discriminatory mutants from that template (semantically equivalent texts that differ only in gender information). Based on popular metrics for evaluating the overall fairness of an SA system, we define a distributional fairness property for an individual prediction of an SA system. This property requires that, for a given piece of text, mutants from different gender classes be treated similarly as a whole. Verifying the distributional fairness property imposes considerable overhead on the running system. To run more efficiently, BiasRV adopts a two-step heuristic: (1) it samples several mutants from each gender and checks whether the system predicts the same sentiment for all of them, and (2) it checks distributional fairness only when the sampled mutants yield conflicting results. Experiments show that, compared with directly checking the distributional fairness property for every input text, the two-step heuristic reduces the overhead of analyzing mutants by 73.81% while missing only 6.7% of biased predictions. In addition, BiasRV can be used without any knowledge of the SA system's implementation. Future researchers can easily extend BiasRV to detect other types of bias, such as racial and occupational bias.
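The two-step heuristic described above can be illustrated with a minimal Python sketch. It is not BiasRV's implementation: the vocabulary `GENDER_TERMS`, the `<person>` placeholder, the function names, the use of a gap in positive-prediction rates as the distributional check, and the `threshold` value are all assumptions made for illustration. Template extraction is not shown; the sketch starts from an already extracted template and treats the SA system as a black-box `predict` function that returns 1 for positive and 0 for negative sentiment.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical, tiny vocabularies used only for illustration.
GENDER_TERMS: Dict[str, List[str]] = {
    "male": ["he", "my father", "John"],
    "female": ["she", "my mother", "Mary"],
}
PLACEHOLDER = "<person>"  # assumed marker for the gender slot in a template

Predictor = Callable[[str], int]  # black-box SA system: 1 = positive, 0 = negative


def make_mutants(template: str) -> Dict[str, List[str]]:
    """Fill the gender slot of the template with every term of each gender."""
    return {
        gender: [template.replace(PLACEHOLDER, term) for term in terms]
        for gender, terms in GENDER_TERMS.items()
    }


def positive_rate(predict: Predictor, texts: List[str]) -> float:
    """Fraction of texts that the system labels as positive."""
    return sum(predict(t) for t in texts) / len(texts)


def violates_distributional_fairness(
    predict: Predictor, mutants: Dict[str, List[str]], threshold: float
) -> bool:
    """Assumed check: the gap in positive rates between genders exceeds threshold."""
    rates = [positive_rate(predict, texts) for texts in mutants.values()]
    return max(rates) - min(rates) > threshold


def check_prediction(
    predict: Predictor,
    template: str,
    sample_size: int = 2,
    threshold: float = 0.1,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the prediction should be flagged as biased."""
    rng = rng or random.Random(0)
    mutants = make_mutants(template)

    # Step 1: sample a few mutants per gender and compare their predictions.
    sampled = [
        text
        for texts in mutants.values()
        for text in rng.sample(texts, min(sample_size, len(texts)))
    ]
    if len({predict(text) for text in sampled}) <= 1:
        return False  # sampled mutants agree, so the expensive check is skipped

    # Step 2: sampled mutants conflict, so run the full distributional check.
    return violates_distributional_fairness(predict, mutants, threshold)
```

Calling `check_prediction(predict, "<person> is a nurse.")` wraps any callable SA model. Because step 1 queries only `sample_size` mutants per gender, the full mutant set is evaluated only for inputs where the samples already disagree, which is where the overhead saving comes from. The trade-off is that a biased prediction is missed when the sampled mutants happen to agree.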