We solve a weakly supervised regression problem. By "weakly supervised" we mean that labels are known for some training points, unknown for others, and uncertain for the rest due to random noise or other causes such as a lack of resources. The solution requires optimizing an objective (loss) function that combines manifold regularization with low-rank matrix decomposition techniques. The low-rank approximations speed up all matrix computations and reduce storage requirements, which is especially crucial for large datasets. Ensemble clustering is used to obtain the co-association matrix, which we treat as the similarity matrix. Together, these techniques increase the quality and stability of the solution. In the numerical section, we apply the suggested method to artificial and real datasets using Monte Carlo simulation.
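The two computational ingredients mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the choice of k-means as the base clusterer, the number of ensemble runs, and the truncated eigendecomposition as the low-rank factorization are all illustrative assumptions. It shows how repeated clusterings yield a co-association (similarity) matrix S, and how a rank-r factorization of S makes a matrix-vector product cost O(nr) instead of O(n^2).

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_labels(X, k, iters=20):
    # Minimal k-means (illustrative base clusterer, not the paper's choice):
    # returns a cluster label for each row of X.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data: two Gaussian blobs (stand-in for a training set).
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
n = len(X)

# Ensemble clustering: S[i, j] = fraction of runs in which points
# i and j land in the same cluster (the co-association matrix).
runs = 20
S = np.zeros((n, n))
for _ in range(runs):
    labels = kmeans_labels(X, k=int(rng.integers(2, 6)))
    S += (labels[:, None] == labels[None, :]).astype(float)
S /= runs

# Low-rank approximation: keep the r largest eigenpairs of the
# symmetric matrix S, so S ~ V diag(w) V^T. A product S @ x then
# costs O(n r) instead of O(n^2) and only V, w need storing.
r = 5
w, V = np.linalg.eigh(S)
w, V = w[-r:], V[:, -r:]

x = rng.normal(size=n)
approx = V @ (w * (V.T @ x))  # low-rank matrix-vector product
```

The same rank-r factors can be reused for every matrix product inside the loss optimization, which is where the storage and speed savings come from on large datasets.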