In this paper, we address the problem of designing incentive mechanisms by which a virtual service provider (VSP) hires sensing IoT devices to sell their sensing data, helping to create and render the digital copy of the physical world in the Metaverse. Due to limited bandwidth, we propose using semantic extraction algorithms to reduce the volume of data delivered by the sensing IoT devices. Nevertheless, mechanisms that hire sensing IoT devices to share their data with the VSP, which then delivers the constructed digital twin to Metaverse users, are vulnerable to the adverse selection problem. This problem, caused by information asymmetry between the system entities, becomes harder to solve when the private information of the different entities is multi-dimensional. We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modeled multi-dimensional contract problem. To demonstrate the effectiveness of our algorithm, we conduct extensive simulations and measure several key performance metrics of the contract for the Metaverse. Our results show that the designed iterative contract incentivizes the participants to interact truthfully, maximizing the profit of the VSP with minimal individual rationality (IR) and incentive compatibility (IC) violation rates. Furthermore, the proposed learning-based iterative contract framework requires only limited access to the participants' private information, which is, to the best of our knowledge, the first of its kind in addressing the problem of adverse selection in incentive mechanisms.
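To make the IR and IC violation rates concrete, the following is a minimal sketch of how such violations can be counted for a one-dimensional screening menu. The utility model (payment minus a type-dependent sensing cost), the function names, and the example menu are all illustrative assumptions, not the paper's actual multi-dimensional formulation.

```python
# Hypothetical illustration: counting IR and IC violations for a simple
# contract menu. The linear utility model below is an assumption for
# exposition; the paper's contract is multi-dimensional.

def utility(theta, q, p):
    # Seller utility: payment received minus type-dependent sensing cost.
    return p - theta * q

def ir_violations(thetas, menu):
    # IR: each type must get non-negative utility from its own item.
    return [i for i, (t, (q, p)) in enumerate(zip(thetas, menu))
            if utility(t, q, p) < 0]

def ic_violations(thetas, menu):
    # IC: no type should strictly prefer another type's item to its own.
    bad = []
    for i, t in enumerate(thetas):
        own = utility(t, *menu[i])
        if any(utility(t, *menu[j]) > own + 1e-9
               for j in range(len(menu)) if j != i):
            bad.append(i)
    return bad

# Example: two device types (low cost 0.5, high cost 1.0) and a menu of
# (data quantity, payment) items, one item intended for each type.
thetas = [0.5, 1.0]
menu = [(10, 8.0), (4, 4.5)]
print(ir_violations(thetas, menu), ic_violations(thetas, menu))  # → [] []
```

A learning-based mechanism can use the sizes of these violation lists, normalized by the number of participants, as the IR and IC violation rates reported in the simulations.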