Understanding the complex dynamics of two-sided online matching markets, where demand-side agents compete to match with the supply side (arms), has recently received substantial interest. To that end, in this paper we introduce a framework for decentralized two-sided matching markets under non-stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and heterogeneous preferences over the arms, but the arms have fixed and known preferences over the agents. We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), in which the agents play (restrictive) successive-elimination-type learning algorithms to learn their preferences over the arms. The complexity of understanding such a system stems from the fact that the competing bandits choose their actions asynchronously, and the lower-ranked agents can only learn from the set of arms not \emph{dominated} by the higher-ranked agents, which leads to \emph{forced exploration}. With carefully defined complexity parameters, we characterize this \emph{forced exploration} and obtain a sub-linear (logarithmic) regret bound for \texttt{DNCB}. Furthermore, we validate our theoretical findings via experiments.
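To make the serial-dictatorship dynamic concrete, the following is a minimal, self-contained sketch (not the paper's \texttt{DNCB} algorithm) of successive elimination under a serial dictatorship: agents act in a fixed rank order, each lower-ranked agent is restricted to arms not taken by higher-ranked agents, and every agent eliminates arms whose upper confidence bound falls below the best lower confidence bound. The round-robin exploration rule, Gaussian reward noise, and the specific confidence radius are all illustrative assumptions, not taken from the paper.

```python
import math
import random

def successive_elimination_matching(true_means, horizon, seed=0):
    """Illustrative sketch of successive elimination under serial
    dictatorship (assumed setup; NOT the paper's DNCB algorithm).

    true_means[i][k]: agent i's unknown mean reward for arm k.
    Agents are ranked by index; agent 0 is the top-ranked dictator.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(true_means), len(true_means[0])
    counts = [[0] * n_arms for _ in range(n_agents)]
    sums = [[0.0] * n_arms for _ in range(n_agents)]
    active = [set(range(n_arms)) for _ in range(n_agents)]

    for _ in range(horizon):
        taken = set()
        for i in range(n_agents):  # serial dictatorship: act in rank order
            avail = active[i] - taken
            if not avail:          # all non-eliminated arms are dominated
                avail = set(range(n_arms)) - taken
            # round-robin exploration: pull the least-sampled available arm
            arm = min(avail, key=lambda k: counts[i][k])
            taken.add(arm)
            # assumed Gaussian reward noise around the true mean
            reward = true_means[i][arm] + rng.gauss(0, 0.1)
            counts[i][arm] += 1
            sums[i][arm] += reward
        # elimination: drop arms whose UCB falls below the best LCB
        for i in range(n_agents):
            if len(active[i]) <= 1:
                continue
            rad = lambda k: math.sqrt(
                2 * math.log(horizon) / max(counts[i][k], 1))
            means = {k: sums[i][k] / max(counts[i][k], 1)
                     for k in active[i]}
            best_lcb = max(means[k] - rad(k) for k in active[i])
            active[i] = {k for k in active[i]
                         if means[k] + rad(k) >= best_lcb}
    return active

# Two agents, three arms; agent 0 prefers arm 0, agent 1 prefers arm 1.
result = successive_elimination_matching(
    [[0.9, 0.5, 0.1], [0.8, 0.9, 0.2]], horizon=2000)
```

Note how arm 0 is eventually monopolized by agent 0, so agent 1's estimate of it stops improving: this is exactly the forced exploration over the non-dominated set that the abstract highlights for lower-ranked agents.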