Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes while simultaneously selecting the most beneficial candidates. Sequential approaches to these problems can be unified under a new framework, Sequential Support Network Learning (SSNL), whose goal is to select the most beneficial candidate set of partners for all participants through trials; that is, to learn a directed graph representing the highest-performing contributions. We show that a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits because of structural overlap among their arms, can be used to learn a support network efficiently from sparse candidate lists. We develop a generalized GapE algorithm for SOMMABs and derive new exponential error bounds that improve the best known constant in the exponent for multi-bandit best-arm identification. The bounds scale linearly with the degree of overlap, revealing significant sample-complexity gains from shared evaluations. From an application standpoint, this work provides a theoretical foundation and improved performance guarantees for sequential learning tools that identify support networks from sparse candidates across a range of learning problems, such as multi-task learning (MTL), auxiliary task learning (ATL), federated learning (FL), and multi-agent systems (MAS).
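To make the overlapping-bandit setting concrete, the following is a minimal illustrative sketch of a GapE-style fixed-budget allocation over semi-overlapping bandits. It is not the paper's algorithm: the index form is the standard multi-bandit GapE index, negative empirical gap plus an exploration bonus of the form sqrt(a / T), and the function name, parameters (`a`, `budget`), and data layout are assumptions made for the example. The key structural point it demonstrates is that one pull of a shared arm updates the statistics used by every bandit that contains it.

```python
import math

def gape_sommab(bandits, pull, budget, a=1.0):
    """Illustrative GapE-style allocation for semi-overlapping multi-bandits.

    bandits: list of arm-id lists; the same arm id may appear in several
             bandits (the "semi-overlapping" structure).
    pull:    pull(arm) -> noisy reward; a single evaluation of a shared arm
             feeds every bandit containing that arm.
    budget:  total number of evaluations (must be >= number of distinct arms).
    a:       exploration parameter of the GapE index.
    """
    arms = sorted({arm for b in bandits for arm in b})
    counts = {arm: 0 for arm in arms}
    sums = {arm: 0.0 for arm in arms}

    # Initialise: pull each distinct arm once.
    for arm in arms:
        sums[arm] += pull(arm)
        counts[arm] += 1

    for _ in range(budget - len(arms)):
        means = {arm: sums[arm] / counts[arm] for arm in arms}
        # GapE index: -gap + sqrt(a / T).  An arm's gap is taken w.r.t. the
        # best competing arm within each bandit containing it; overlap means
        # one pull tightens the gap estimates of several bandits at once.
        best_index, best_arm = -math.inf, None
        for b in bandits:
            top = max(means[arm] for arm in b)
            for arm in b:
                rest = [means[x] for x in b if x != arm]
                if not rest:  # single-arm bandit: nothing to identify
                    continue
                gap = (means[arm] - max(rest)) if means[arm] == top \
                      else (top - means[arm])
                idx = -gap + math.sqrt(a / counts[arm])
                if idx > best_index:
                    best_index, best_arm = idx, arm
        sums[best_arm] += pull(best_arm)
        counts[best_arm] += 1

    means = {arm: sums[arm] / counts[arm] for arm in arms}
    # Recommend the empirical best arm in each bandit (one edge per bandit
    # of the learned support network).
    return [max(b, key=lambda arm: means[arm]) for b in bandits]
```

With deterministic rewards, `gape_sommab([[0, 1, 2], [2, 3]], ...)` recommends the true best arm of each bandit; arm 2 is shared, so every evaluation of it informs both identification problems simultaneously.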