This work considers clustering nodes of a largely incomplete graph. Under the problem setting, only a small amount of queries about the edges can be made, but the entire graph is not observable. This problem finds applications in large-scale data clustering using limited annotations, community detection under restricted survey resources, and graph topology inference under hidden/removed node interactions. Prior works treated this problem as a convex optimization-based matrix completion task. However, this line of work is designed for learning single cluster membership of nodes belonging to disjoint clusters, yet mixed (multiple) cluster membership nodes and overlapping clusters often arise in practice. Existing works also rely on the uniformly random edge query pattern and nuclear norm-based optimization, which give rise to a number of implementation and scalability challenges. This work aims at learning mixed membership of the nodes of overlapping clusters using edge queries. Our method offers membership learning guarantees under systematic query patterns (as opposed to random ones). The query patterns can be controlled and adjusted by the system designers to accommodate implementation challenges---e.g., to avoid querying edges that are physically hard to acquire. Our framework also features a lightweight and scalable algorithm. Real-data experiments on crowdclustering and community detection are used to showcase the effectiveness of our method.
翻译:这项工作考虑的是基本上不完全的图形的组合节点。 在问题设置中, 只能对边缘进行少量询问, 但整个图表无法观测。 这个问题在隐藏/ 移动节点的相互作用下, 在隐藏/ 移动节点中, 通过有限的注释、 受限制的调查资源下社区探测和图形表层地形推断, 在大型数据组群中发现应用。 先前的工作将这一问题作为基于优化的组合矩阵完成任务处理 。 但是, 这项工作是为了学习属于脱节组群的节点的单个分组成员, 然而, 在实践中经常出现混合( 多个) 分组成员点和重叠组群群。 现有的工作还依赖于统一的随机边缘查询模式和基于核规范的优化, 从而导致一些执行和可缩放的挑战。 这项工作的目的是通过边缘查询, 学习重叠组群点的混合成员。 我们的方法在系统查询模式下提供成员学习保证( 与随机查询) 。 系统设计师可以控制和调整查询模式, 以适应执行挑战- 例如, 避免在实际上难以获得的边缘点和基于核规范的核标准优化 。 我们的框架模型的检测和测量方法的光量和比例分析。 我们的框架模型是用来显示的光量和光度 。