Topic modeling, or more broadly, dimensionality reduction, techniques provide powerful tools for uncovering patterns in large datasets and are widely applied across various domains. We investigate how Non-negative Matrix Factorization (NMF) can introduce bias in the representation of data groups, such as those defined by demographics or protected attributes. We present an approach, called Fairer-NMF, that seeks to minimize the maximum reconstruction loss for different groups relative to their size and intrinsic complexity. Further, we present two algorithms for solving this problem. The first is an alternating minimization (AM) scheme and the second is a multiplicative updates (MU) scheme which demonstrates a reduced computational time compared to AM while still achieving similar performance. Lastly, we present numerical experiments on synthetic and real datasets to evaluate the overall performance and trade-offs of Fairer-NMF
翻译:暂无翻译