Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating such unsupervised entropy-based losses for discriminative neural networks. Recent self-labeling methods based on such losses represent the state of the art in deep clustering. However, some important properties of entropy clustering are not well-known, or even misunderstood. For example, we provide a counterexample to prior claims about equivalence to variance clustering (K-means) and point out technical mistakes in such theories. We discuss the fundamental differences between these discriminative and generative clustering approaches. Moreover, we show the susceptibility of standard entropy clustering to narrow margins and motivate an explicit margin maximization term. We also propose an improved self-labeling loss; it is robust to pseudo-labeling errors and enforces stronger fairness. We develop an EM algorithm for our loss that is significantly faster than the standard alternatives. Our results improve the state-of-the-art on standard benchmarks.
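The decisiveness/fairness reading of the mutual-information objective follows from the standard decomposition I(X;Y) = H(E[p]) − E[H(p)]: the entropy of the average softmax prediction (high when clusters are used evenly, i.e. "fair") minus the average per-example entropy (low when individual predictions are confident, i.e. "decisive"). A minimal NumPy sketch of this loss, with hypothetical function names, might look like:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy in nats; eps guards log(0).
    return -(p * np.log(p + eps)).sum(axis=axis)

def mutual_info_loss(logits):
    """Negative mutual information between inputs and predicted clusters.

    I(X;Y) = H(mean prediction) - mean(per-example entropy)
           =   "fairness" term  -    "decisiveness" term
    Minimizing the negative encourages confident yet balanced predictions.
    """
    p = softmax(logits)            # per-example softmax predictions
    h_cond = entropy(p).mean()     # average conditional entropy: low => decisive
    h_marg = entropy(p.mean(0))    # entropy of the marginal: high => fair
    return -(h_marg - h_cond)      # loss = -I(X;Y)
```

For perfectly decisive and balanced predictions over K clusters, the loss approaches −log K, its minimum.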