When a swarm of agents is deployed on a mission, a sudden failure of some of the agents is often observed from the command base. It is generally difficult to distinguish whether the failure is caused by the actuators (hypothesis $h_a$) or by the sensors (hypothesis $h_s$) solely through communication between the command base and the agent in question. By having another agent collide with the suspect one, we can tell which hypothesis is more likely: under $h_a$ we expect to detect the corresponding displacement, whereas under $h_s$ we do not. Such situation-awareness strategies for the swarm should preferably be generated autonomously by artificial intelligence (AI). Actions suited to the distinction (e.g., the collision) are those that maximize, as a value function, the difference between the behaviors expected under each hypothesis. Such actions, however, occupy only a very sparse region of the whole action space, so a conventional search based on gradient methods is ineffective. Instead, we successfully applied reinforcement learning, achieving the maximization of this sparse value function. The machine learning indeed autonomously arrived at the colliding action that distinguishes the hypotheses. Once an agent with an actuator error has been recognized through this action, the other agents behave as if they want to assist the malfunctioning one in accomplishing the given mission.
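To make the selection criterion concrete, one possible formalization (our illustrative notation; the abstract above does not fix symbols) writes the value of a probing action $a$ as the discrepancy between the observations expected under the two hypotheses:

$$ V(a) \,=\, \bigl\| \, \mathbb{E}\!\left[\, o \mid h_a,\, a \,\right] - \mathbb{E}\!\left[\, o \mid h_s,\, a \,\right] \, \bigr\|, \qquad a^{*} \,=\, \operatorname*{arg\,max}_{a}\, V(a), $$

where $o$ denotes the displacement reported by the suspect agent after action $a$. Since $V(a)$ is nonzero only for the few actions, such as collisions, that physically perturb the suspect agent, the landscape of $V$ is flat almost everywhere; this is the sparsity that defeats gradient-based search and motivates treating $V(a)$ as the reward signal of a reinforcement-learning problem.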