Reputation is a central element of social communication, whether between humans or with artificial intelligence (AI), and as such can be the primary target of malicious communication strategies. A vast literature on trust networks already addresses this issue, proposing ways to simulate the dynamics of these networks using Bayesian principles and Theory of Mind models. The main difficulty in such simulations is usually the amount of information that must be stored, which is typically handled by discretizing variables and applying hard thresholds. Here we propose a novel approach to the way information is updated that accounts for knowledge uncertainty and is closer to reality. In our game, agents use information compression techniques to capture their complex environment and store it in their finite memories. The resulting loss of information gives rise to emergent phenomena such as echo chambers, self-deception, deception symbiosis, and the freezing of group opinions. We study various malicious agent strategies, such as sycophancy, egocentricity, pathological lying, and aggressiveness, for their impact on group sociology. Although our model could be made more complex, our set-up already provides insights into social interactions and can be used to investigate the effects of various communication strategies and to find ways of counteracting malicious ones. Ultimately, this work should help safeguard the design of non-abusive AI systems.
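To make the idea of compressed, finite-memory reputation updates concrete, the following is a minimal illustrative sketch (not the paper's actual model): an agent summarizes its interaction history with another agent as a Beta distribution over trustworthiness, so the belief fits in constant memory but discards the order and context of observations; all class and method names here are hypothetical.

```python
# Illustrative sketch: a compressed reputation belief that tracks uncertainty
# rather than a hard trust threshold. This is an assumption for exposition,
# not the model described in the abstract.
from dataclasses import dataclass


@dataclass
class ReputationBelief:
    alpha: float = 1.0  # pseudo-count of observations judged honest
    beta: float = 1.0   # pseudo-count of observations judged deceptive

    def update(self, honest: bool, weight: float = 1.0) -> None:
        # Bayesian-style conjugate update of the compressed summary:
        # the full history is never stored, only two numbers.
        if honest:
            self.alpha += weight
        else:
            self.beta += weight

    def mean(self) -> float:
        # Expected trustworthiness under the compressed belief.
        return self.alpha / (self.alpha + self.beta)

    def uncertainty(self) -> float:
        # Variance of the Beta distribution: shrinks as evidence accumulates,
        # capturing knowledge uncertainty instead of a binary trust decision.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n ** 2 * (n + 1.0))


if __name__ == "__main__":
    belief = ReputationBelief()
    for report_was_honest in [True, True, False, True]:
        belief.update(report_was_honest)
    print(f"trust={belief.mean():.2f}, uncertainty={belief.uncertainty():.3f}")
```

In such a scheme, the information lost by compression (e.g., whether deceptive reports were recent or clustered) is exactly the kind of loss that can drive emergent effects like echo chambers or frozen group opinions.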