We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, minimizing the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in random order, and it is embarrassingly parallel. We consider several loss functions, including Binder loss and the variation of information. We note that criticisms of Binder loss stem from using equal misclassification penalties, and we show an efficient means to compute Binder loss with potentially unequal penalties. Furthermore, we extend the original variation of information to allow unequal penalties and show that this incurs no additional computational cost. We provide a reference implementation of our algorithm. Using a variety of examples, we show that our method produces clustering estimates that attain lower expected loss and are obtained faster than those of existing methods.
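To make the setup concrete, the following is a minimal sketch, not the reference implementation described above; the helper names coclustering_probs, expected_binder_loss, and greedy_sweep are hypothetical. It estimates pairwise co-clustering probabilities from posterior partition samples, evaluates the posterior expected Binder loss of a candidate clustering with possibly unequal penalties a and b, and performs one greedy pass over the items in a random order.

```python
import numpy as np

def coclustering_probs(samples):
    """Pairwise posterior co-clustering probabilities pi[i, j] = P(items i and j
    share a cluster), estimated from an (S, n) array of posterior partition
    samples, where each row is a vector of cluster labels."""
    S, n = samples.shape
    pi = np.zeros((n, n))
    for labels in samples:
        pi += (labels[:, None] == labels[None, :])
    return pi / S

def expected_binder_loss(candidate, pi, a=1.0, b=1.0):
    """Posterior expected Binder loss of a candidate clustering.

    a penalizes pairs placed together in the candidate but apart under the
    posterior; b penalizes pairs placed apart in the candidate but together
    under the posterior. Equal penalties (a = b) recover the usual Binder loss
    up to a constant scale."""
    together = (candidate[:, None] == candidate[None, :])
    iu = np.triu_indices(len(candidate), k=1)  # count each pair once
    return np.sum(a * (1.0 - pi[iu]) * together[iu]
                  + b * pi[iu] * ~together[iu])

def greedy_sweep(candidate, pi, a=1.0, b=1.0, rng=None):
    """One greedy pass: visit items in a random order and move each item to the
    existing cluster (or a new singleton) that most reduces the expected loss."""
    rng = np.random.default_rng(rng)
    candidate = candidate.copy()
    for i in rng.permutation(len(candidate)):
        options = list(np.unique(candidate)) + [candidate.max() + 1]  # new cluster
        losses = []
        for k in options:
            trial = candidate.copy()
            trial[i] = k
            losses.append(expected_binder_loss(trial, pi, a, b))
        candidate[i] = options[int(np.argmin(losses))]
    return candidate
```

In practice one would iterate such sweeps from a random initial partition until no move improves the loss, repeat from many random starts (these runs are independent and hence embarrassingly parallel), and report the candidate with the smallest expected loss; the same search applies to other loss functions, such as the variation of information, by swapping in a different expected-loss routine.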