The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between inputs and outputs is unknown. This problem arises in a wide range of applications, including survey data, where one must decide whether the anonymity of the responses can be preserved while still uncovering significant statistical connections. In this work, we propose a novel optimization algorithm for shuffled linear regression based on a posterior-maximizing objective function under a Gaussian noise prior. We compare our approach with existing methods on synthetic and real data. We show that our approach performs competitively while achieving empirical running-time improvements. Furthermore, we demonstrate that our algorithm can exploit side information in the form of seeds, which have recently come to prominence in related problems.
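To make the problem setup concrete, the following is a minimal sketch of shuffled linear regression with scalar outputs. It is not the algorithm proposed in this work; it is a generic alternating-minimization baseline, chosen because under Gaussian noise with a flat prior on the weights, maximizing the posterior reduces to minimizing the squared error jointly over the weights and the permutation. The function names (`make_shuffled_data`, `fit_on_seeds`, `objective`, `alternating_minimization`) are hypothetical, and seeds are assumed here to be a handful of known input-output correspondences used only for initialization.

```python
import numpy as np


def make_shuffled_data(n=200, d=2, noise=0.01, seed=0):
    """Synthetic data: y[k] is the noisy response of row X[perm[k]]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    perm = rng.permutation(n)
    y = (X @ w_true)[perm] + noise * rng.normal(size=n)
    return X, y, w_true, perm


def fit_on_seeds(X, y, perm, m=10):
    """Least-squares fit on m seed pairs (assumed known correspondences)."""
    X_seed, y_seed = X[perm[:m]], y[:m]
    w0, *_ = np.linalg.lstsq(X_seed, y_seed, rcond=None)
    return w0


def objective(X, y, w):
    """Squared error minimized over all permutations.

    For scalar outputs the best matching pairs the k-th smallest
    prediction with the k-th smallest response (rearrangement inequality).
    """
    return float(np.sum((np.sort(y) - np.sort(X @ w)) ** 2))


def alternating_minimization(X, y, w0, n_iter=50):
    """Alternate between the best permutation for w and the best w for it."""
    w = np.asarray(w0, dtype=float)
    order_y = np.argsort(y)
    for _ in range(n_iter):
        # Permutation step: align responses to predictions by rank.
        order_pred = np.argsort(X @ w)
        y_aligned = np.empty_like(y)
        y_aligned[order_pred] = y[order_y]
        # Regression step: ordinary least squares on the aligned pairs.
        w, *_ = np.linalg.lstsq(X, y_aligned, rcond=None)
    return w


if __name__ == "__main__":
    X, y, w_true, perm = make_shuffled_data()
    w0 = fit_on_seeds(X, y, perm)
    w_hat = alternating_minimization(X, y, w0)
    print("true weights:     ", w_true)
    print("estimated weights:", w_hat)
```

Each iteration cannot increase the objective, but the procedure can stall in a local minimum when started far from the true weights, which is why the sketch initializes from the seed pairs rather than from a random guess.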