Multi-task learning (MTL) efficiently leverages useful information contained in multiple related tasks to improve the generalization performance of all tasks. This article conducts a large-dimensional analysis of a Least Squares Support Vector Machine (LSSVM) version of MTL, which is simple but, as we shall see, extremely powerful when carefully tuned, in the regime where the data dimension $p$ and the number of samples $n$ grow large at the same rate. Under mild assumptions on the input data, the theoretical analysis of the MTL-LSSVM algorithm first reveals the "sufficient statistics" exploited by the algorithm and how they interact. As a striking consequence, these results demonstrate that the standard approach to MTL-LSSVM is largely suboptimal and can lead to severe negative transfer, but that these impairments are easily corrected. The corrections are turned into an improved MTL-LSSVM algorithm that can only benefit from additional data and whose theoretical performance is also analyzed. As evidenced and theoretically supported in numerous recent works, these large-dimensional results are robust to a broad range of data distributions, which our present experiments corroborate. Specifically, the article reports systematically close agreement between theoretical and empirical performance on popular datasets, which strongly suggests that the proposed carefully tuned MTL-LSSVM method applies to real data. The tuning relies entirely on the theoretical analysis and, in particular, requires no cross-validation procedure. Moreover, on real datasets the method almost systematically outperforms much more elaborate and less intuitive state-of-the-art multi-task and transfer learning methods.