We tackle an emerging problem: finding an optimal monopartite matching in a weighted graph. The semi-bandit version, in which a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, who propose an algorithm with an expected regret in $O(\frac{L\log(L)}{\Delta}\log(T))$ for $2L$ players, $T$ iterations, and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank}, we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(L\frac{1}{\Delta}\log(T))$. Second, we show that by shifting the focus to the main question `\emph{Is user $i$ better than user $j$?}', this regret becomes $O(L\frac{\Delta}{\tilde{\Delta}^2}\log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Finally, experimental results corroborate these theoretical bounds in practice.
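
As a quick check that the second step can only tighten the first, the two rates can be compared directly. This sketch uses only the stated relation $\tilde{\Delta} > \Delta$ and additionally assumes $\Delta > 0$, as is usual for a reward gap:
\[
  \frac{\Delta}{\tilde{\Delta}^2}
  = \frac{1}{\Delta}\left(\frac{\Delta}{\tilde{\Delta}}\right)^{2}
  < \frac{1}{\Delta},
  \qquad\text{so}\qquad
  L\,\frac{\Delta}{\tilde{\Delta}^2}\log(T) < L\,\frac{1}{\Delta}\log(T).
\]
Under these assumptions, the improvement factor is $(\Delta/\tilde{\Delta})^2$, in addition to the $\log(L)$ factor already removed in the first step.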