We study the problem of learning to cluster data points using an oracle that answers same-cluster queries, i.e., whether two given points belong to the same cluster. Unlike previous approaches, we do not assume that the total number of clusters is known in advance, and we do not require the true clusters to be consistent with a predefined objective function such as the K-means objective. These relaxations are important in practice, but they also make the problem more challenging. We propose two algorithms with provable theoretical guarantees and verify their effectiveness through extensive experiments on both synthetic and real-world data.
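To make the problem setting concrete, the sketch below shows a naive baseline for clustering with same-cluster queries when the number of clusters is not known in advance. It is not one of the two proposed algorithms; it is only an illustration of the query model. The names `cluster_with_oracle` and `same_cluster`, as well as the toy labels, are assumptions introduced here. Each point is compared against one representative of every cluster found so far, and a new cluster is opened when all answers are negative, so the baseline uses at most n·k queries for n points and k true clusters.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_with_oracle(
    points: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Naive baseline (hypothetical helper, not the paper's method).

    Compares each point with the first member (representative) of every
    cluster discovered so far. The number of clusters is never supplied;
    it emerges from the oracle's answers.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for p in points:
        for members in clusters:
            queries += 1
            if same_cluster(members[0], p):
                members.append(p)
                break
        else:
            # No existing representative matched: open a new cluster.
            clusters.append([p])
    return clusters, queries


# Toy ground truth, used only to simulate the oracle (assumed data).
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}


def oracle(x: str, y: str) -> bool:
    return labels[x] == labels[y]


clusters, n_queries = cluster_with_oracle(list(labels), oracle)
print(clusters, n_queries)
```

The baseline assumes a noiseless oracle and makes no attempt to reduce the number of queries; it serves only as a reference point for what the query interface looks like.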