We develop a mixed-integer optimization (MIO) approach to cluster-aware regression, i.e., linear regression that accounts for the inherent clustered structure of the data. We compare it with linear mixed-effects regression (LMEM), currently the most widely used method, and design simulation experiments in which our approach outperforms LMEM on both predictive and inferential metrics. Furthermore, our method is formulated in a highly interpretable way. LMEM cannot make cluster-informed predictions when the cluster of a new data point is unknown; we address this by training an interpretable classification tree that assigns cluster effects to new data points, and we demonstrate this generalizability on a real protein expression dataset.
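As a rough illustration of what such a formulation could look like, the sketch below writes cluster-aware regression as an MIO problem with binary assignment variables. It is an assumption for exposition only, not necessarily the formulation used in this work: the symbols $x_i$, $y_i$, $\beta$, $\alpha_k$, $z_{ik}$, $K$, and the tree map $T(\cdot)$ are all hypothetical names introduced here.

```latex
% Hypothetical sketch: n observations (x_i, y_i), K cluster effects alpha_k,
% shared coefficients beta, and binary variables z_{ik} = 1 if observation i
% is assigned to cluster effect k.
\begin{align*}
\min_{\beta,\,\alpha,\,z} \quad
  & \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik}\,\bigl(y_i - x_i^{\top}\beta - \alpha_k\bigr)^{2} \\
\text{s.t.} \quad
  & \sum_{k=1}^{K} z_{ik} = 1, \qquad i = 1,\dots,n, \\
  & z_{ik} \in \{0,1\}, \qquad i = 1,\dots,n,\; k = 1,\dots,K.
\end{align*}

% Prediction for a new point x whose cluster is unknown, where T(x) in {1,...,K}
% is the cluster predicted by a classification tree trained on the fitted
% assignments:
\[
  \hat{y}(x) = x^{\top}\hat{\beta} + \hat{\alpha}_{T(x)} .
\]
```

Under these assumptions, the classification tree supplies the cluster index that LMEM would otherwise need to be given, which is what allows a cluster-informed prediction for a previously unseen data point.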