We consider a regression problem in which the correspondence between input and output data is not available. Such shuffled data arise in many real-world problems. In flow cytometry, for example, the measuring instruments cannot preserve the correspondence between the samples and the measurements. Because the problem is combinatorial, most existing methods are applicable only when the sample size is small, and they are limited to linear regression models. To overcome these bottlenecks, we propose a new computational framework, ROBOT, for the shuffled regression problem, which is applicable to large data and complex models. Specifically, we formulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. This approach essentially views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameters by differentiating through the data correspondence. ROBOT is quite general and can be extended to the inexact correspondence setting, where the input and output data are not necessarily exactly aligned. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.
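To make the idea of differentiating through the data correspondence concrete, here is a minimal JAX sketch. It is not the authors' implementation: the abstract does not say how the correspondence is relaxed, so the sketch assumes an entropy-regularized soft assignment computed by Sinkhorn iterations, and obtains the hypergradient by letting automatic differentiation unroll those iterations. The names `sinkhorn`, `loss`, `fit`, and `make_shuffled_data`, the linear model, and the values of `eps`, `n_iters`, `lr`, and `n_steps` are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def sinkhorn(cost, eps=0.05, n_iters=50):
    """Soft correspondence (assumed relaxation): entropic transport plan
    whose rows and columns each sum to roughly 1/n."""
    n = cost.shape[0]
    log_mu = -jnp.log(n) * jnp.ones(n)  # log of the uniform marginal 1/n
    f = jnp.zeros(n)
    g = jnp.zeros(n)
    for _ in range(n_iters):  # log-domain updates for numerical stability
        f = eps * (log_mu - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_mu - logsumexp((f[:, None] - cost) / eps, axis=0))
    return jnp.exp((f[:, None] + g[None, :] - cost) / eps)


def loss(w, x, y_shuffled):
    """Matching cost under the soft correspondence induced by w."""
    pred = x @ w  # linear model, chosen only for illustration
    cost = (pred[:, None] - y_shuffled[None, :]) ** 2
    plan = sinkhorn(cost)  # the plan depends on w through the cost matrix
    return jnp.sum(plan * cost)


def fit(x, y_shuffled, w_init, lr=0.1, n_steps=200):
    """Plain gradient descent. jax.grad differentiates through the Sinkhorn
    loop, so the descent direction accounts for how the plan changes with w."""
    grad_fn = jax.jit(jax.grad(loss))
    w = w_init
    for _ in range(n_steps):
        w = w - lr * grad_fn(w, x, y_shuffled)
    return w


def make_shuffled_data(key, n=64, noise=0.01):
    """Synthetic data: outputs are randomly permuted after generation."""
    k_x, k_noise, k_perm = jax.random.split(key, 3)
    w_true = jnp.array([1.5, -0.5])  # hypothetical ground-truth parameter
    x = jax.random.normal(k_x, (n, 2))
    y = x @ w_true + noise * jax.random.normal(k_noise, (n,))
    return x, jax.random.permutation(k_perm, y), w_true


if __name__ == "__main__":
    x, y_shuffled, w_true = make_shuffled_data(jax.random.PRNGKey(0))
    w_hat = fit(x, y_shuffled, w_init=jnp.array([0.5, 0.5]))
    print("estimate:", w_hat, "ground truth:", w_true)
```

Because the objective is nonconvex, plain gradient descent from an arbitrary starting point is not guaranteed to recover `w_true`; the sketch only illustrates how a gradient can flow through the correspondence. Unrolling the Sinkhorn loop is the simplest way to obtain that gradient, though it stores every iterate and therefore costs memory proportional to `n_iters`.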