Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup over general-purpose computers for solving these problems, but these devices often require the problem to be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model. In this work, we focus on the binary matrix factorization (BMF) problem, which has many applications in data mining. We propose two QUBO formulations for BMF and show how clustering constraints can easily be incorporated into them. The special-purpose hardware we consider is limited in the number of variables it can handle, which presents a challenge when factorizing large matrices. We propose a sampling-based approach to overcome this challenge, allowing us to factorize large rectangular matrices. In addition to these methods, we propose a simple baseline algorithm that outperforms our more sophisticated methods in a few situations. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.
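To illustrate what casting BMF as a QUBO can look like, the following minimal sketch treats only the rank-1 case and is not one of the two formulations proposed in this work. For binary vectors u and v, the squared error ||A - u v^T||_F^2 equals sum(A) + sum_ij (1 - 2 A_ij) u_i v_j, which is already quadratic in the binary variables. The function names (`rank1_bmf_qubo`, `qubo_energy`, `brute_force`) and the example matrix are assumptions made for illustration, and exhaustive search stands in for an annealer.

```python
from itertools import product

def rank1_bmf_qubo(A):
    """Build QUBO coefficients for rank-1 BMF of a binary matrix A.

    Variables 0..m-1 encode u, variables m..m+n-1 encode v.
    The reconstruction error is qubo_energy(Q, x) + offset.
    """
    m, n = len(A), len(A[0])
    Q = {}
    for i in range(m):
        for j in range(n):
            # Selecting a 1-entry lowers the error by 1, a 0-entry raises it by 1.
            Q[(i, m + j)] = 1 - 2 * A[i][j]
    offset = sum(map(sum, A))  # error of the all-zero factorization
    return Q, offset

def qubo_energy(Q, x):
    """Evaluate the quadratic form sum_{a,b} Q[a,b] * x[a] * x[b]."""
    return sum(w * x[a] * x[b] for (a, b), w in Q.items())

def brute_force(Q, size):
    """Exhaustively minimize the QUBO; feasible only for tiny instances."""
    return min(product((0, 1), repeat=size), key=lambda x: qubo_energy(Q, x))

# Hypothetical example: a 3x3 matrix containing a single 2x2 block of ones.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 0]]
m, n = len(A), len(A[0])
Q, offset = rank1_bmf_qubo(A)
x = brute_force(Q, m + n)
u, v = x[:m], x[m:]
error = qubo_energy(Q, x) + offset
```

For rank greater than one, the squared error contains products of four binary variables, so it is no longer directly a QUBO; handling that case, as well as matrices too large for the hardware, is what the formulations and the sampling-based approach in this work address.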