Many applications of representation learning, such as privacy preservation, algorithmic fairness, and domain adaptation, desire explicit control over the semantic information that is discarded. This goal is often formulated as satisfying two potentially competing objectives: maximizing utility for predicting a target attribute while simultaneously being independent of, or invariant with respect to, a known semantic attribute. In this paper, we \emph{identify and determine} two fundamental trade-offs between utility and semantic dependence induced by the statistical dependencies between the data and its corresponding target and semantic attributes. We derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yield closed formulae for the exact trade-offs. We also derive empirical estimates of the trade-offs and show their convergence to the corresponding population counterparts. Finally, we numerically quantify the trade-offs on representative problems and compare them to the solutions achieved by baseline representation learning algorithms.
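The two competing objectives described above admit a constrained-optimization reading. The following is a minimal sketch of one such formulation, assuming data $X$, target attribute $T$, semantic attribute $S$, an encoder $f$, and mutual information $I(\cdot\,;\cdot)$ as the utility/dependence measure; this specific notation and measure are illustrative assumptions, not the paper's stated formulation:
% Hedged sketch: utility maximization under an invariance budget.
% X = data, T = target attribute, S = semantic attribute, f = encoder.
% Mutual information is one common choice of measure (our assumption).
\begin{equation*}
\max_{f}\; I\bigl(f(X);\, T\bigr)
\quad \text{subject to} \quad
I\bigl(f(X);\, S\bigr) \le \epsilon .
\end{equation*}
% Sweeping epsilon from 0 (exact invariance) upward traces a
% utility--invariance trade-off curve of the kind the paper characterizes.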