COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation

Personalized text generation has broad industrial applications, such as explanation generation for recommendations and conversational systems. Personalized text generators are usually trained on user-written text, e.g., reviews collected on e-commerce platforms. However, due to historical, social, or behavioral reasons, bias may exist that associates certain linguistic qualities of user-written text with users' protected attributes such as gender and race. Generators can identify and inherit these correlations and generate text whose linguistic quality differs depending on the users' protected attributes. Without proper intervention, such bias can adversely affect users' trust in and reliance on the system. From a broader perspective, through repeated interactions with users, bias in auto-generated content can reinforce social stereotypes about how online users write. In this work, we investigate the fairness of personalized text generation in the setting of explainable recommendation. We develop a general framework for achieving measure-specific counterfactual fairness on the linguistic quality of personalized explanations. We propose learning disentangled representations for counterfactual inference and develop a novel policy learning algorithm with carefully designed rewards for fairness optimization. The framework can be applied to achieve fairness under any given specification of linguistic quality measures, and can be adapted to most existing models and real-world settings. Extensive experiments demonstrate the superior ability of our method to achieve fairness while maintaining high generation performance.
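To make measure-specific counterfactual fairness concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a user representation that has already been disentangled into an attribute component and an attribute-independent component (the `UserRepr` type is hypothetical), uses type-token ratio as a stand-in linguistic quality measure, and scores a generator by the negative gap in that measure between the factual explanation and the explanation generated after intervening only on the attribute component. All names (`make_counterfactual`, `fairness_reward`, `total_reward`, the toy generators, the default weight 0.5) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UserRepr:
    """Hypothetical disentangled user representation (assumption)."""
    attr_part: List[float]     # assumed to encode only the protected attribute
    content_part: List[float]  # assumed to be independent of the protected attribute


def make_counterfactual(user: UserRepr,
                        attr_prototypes: Dict[str, List[float]],
                        cf_value: str) -> UserRepr:
    """Intervene on the protected attribute only: replace the attribute
    component with a prototype for the counterfactual value and keep the
    attribute-independent component fixed."""
    return UserRepr(attr_part=list(attr_prototypes[cf_value]),
                    content_part=list(user.content_part))


def type_token_ratio(text: str) -> float:
    """Stand-in linguistic quality measure: lexical diversity of the text."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def fairness_reward(generate: Callable[[UserRepr], str],
                    measure: Callable[[str], float],
                    user: UserRepr,
                    attr_prototypes: Dict[str, List[float]],
                    cf_value: str) -> float:
    """Negative gap in the chosen measure between the factual explanation and
    the one generated for the counterfactual user; 0.0 means no measured gap."""
    factual = generate(user)
    counterfactual = generate(make_counterfactual(user, attr_prototypes, cf_value))
    return 0.0 - abs(measure(factual) - measure(counterfactual))


def total_reward(quality_reward: float, fair_reward: float, weight: float = 0.5) -> float:
    """Hypothetical weighted sum of a generation-quality reward and the
    fairness reward, e.g. for a policy-gradient update; `weight` is assumed."""
    return quality_reward + weight * fair_reward


# Toy generators for illustration only (hypothetical).
def biased_generate(user: UserRepr) -> str:
    # Output diversity depends on the attribute component, so the measure differs.
    if user.attr_part[0] > 0:
        return "great coffee with rich aroma and smooth finish"
    return "good good coffee"


def attribute_blind_generate(user: UserRepr) -> str:
    # Ignores the attribute component entirely.
    return "great coffee with rich aroma"


if __name__ == "__main__":
    prototypes = {"a": [1.0], "b": [-1.0]}
    user = UserRepr(attr_part=[1.0], content_part=[0.3, 0.7])
    print(fairness_reward(biased_generate, type_token_ratio, user, prototypes, "b"))
    print(fairness_reward(attribute_blind_generate, type_token_ratio, user, prototypes, "b"))
```

The biased toy generator receives a negative reward because the lexical diversity of its output changes with the attribute component, while the attribute-blind generator receives zero. In a policy-learning setup, a reward of this shape could be combined with a generation-quality reward, as `total_reward` does, and fed to a policy-gradient update; the exact reward design used in the paper is not reproduced here.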
