We consider a variant of the regression problem in which the correspondence between input and output data is not available. Such shuffled data arise in many real-world problems. In flow cytometry, for example, the measuring instruments may not be able to maintain the correspondence between the samples and the measurements. Because the problem is combinatorial in nature, most existing methods are applicable only when the sample size is small, and they are limited to linear regression models. To overcome these bottlenecks, we propose a new computational framework, ROBOT, for the shuffled regression problem, which is applicable to large data and complex nonlinear models. Specifically, we reformulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. This hypergradient approach essentially views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameter by differentiating through the data correspondence. ROBOT can be further extended to the inexact correspondence setting, where there may not be an exact alignment between the input and output data. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.
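To make the continuous reformulation concrete, the following is a minimal sketch and not the ROBOT implementation. It assumes, for illustration only, that the hard permutation is relaxed to a soft correspondence matrix computed by Sinkhorn iterations with entropic regularization; the abstract does not name a specific relaxation. The names `sinkhorn`, `relaxed_loss`, `eps`, and `n_iters` are hypothetical, as is the synthetic linear data.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn(cost, eps=0.5, n_iters=200):
    # Assumed relaxation: entropic soft correspondence between n inputs and n outputs.
    # Columns sum to 1 exactly after the last update; rows sum to 1 approximately.
    log_k = -cost / eps
    f = np.zeros(cost.shape[0])
    g = np.zeros(cost.shape[1])
    for _ in range(n_iters):
        f = -logsumexp(log_k + g[None, :], axis=1)  # normalize rows
        g = -logsumexp(log_k + f[:, None], axis=0)  # normalize columns
    return np.exp(log_k + f[:, None] + g[None, :])

def relaxed_loss(w, x, y_shuffled, eps=0.5):
    # Squared error between every prediction and every (shuffled) observation,
    # weighted by the soft correspondence that depends on w through the cost.
    pred = x @ w
    cost = (pred[:, None] - y_shuffled[None, :]) ** 2
    P = sinkhorn(cost, eps)
    return (P * cost).sum() / len(y_shuffled), P

# Synthetic shuffled linear regression data (illustrative only).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))
w_true = np.array([1.5, -2.0])
y = x @ w_true + 0.01 * rng.normal(size=n)
y_shuffled = y[rng.permutation(n)]  # correspondence is lost

loss_true, P = relaxed_loss(w_true, x, y_shuffled)
loss_zero, _ = relaxed_loss(np.zeros(2), x, y_shuffled)
print("relaxed loss at w_true:", loss_true)
print("relaxed loss at w = 0: ", loss_zero)
```

Because the soft correspondence `P` is itself a function of `w`, an automatic differentiation framework could in principle differentiate through the Sinkhorn loop to obtain a hypergradient, rather than treating `P` as fixed. That step is omitted here to keep the sketch dependent on NumPy alone.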