It is well established that datasets with high-quality samples play an important role in artificial intelligence (AI), machine learning (ML), and related fields. However, although AI/ML was introduced into wireless research long ago, few datasets are commonly used in the research community. Without a common dataset, AI-based methods proposed for wireless systems are hard to compare with traditional baselines or even with each other. Existing wireless AI research usually relies on datasets generated from statistical models or from ray-tracing simulations with a limited number of environments. Statistical data prevent the trained AI models from being fine-tuned for a specific scenario, and ray-tracing data with limited environments reduce the generalization capability of the trained AI models. In this paper, we present the Wireless AI Research Dataset (WAIR-D), which consists of two scenarios. Scenario 1 contains 10,000 environments with sparsely dropped user equipments (UEs), and Scenario 2 contains 100 environments with densely dropped UEs. The environments are randomly selected from real-world maps of more than 40 cities. The large volume of data supports good generalization capability of the trained AI models, while fine-tuning can easily be carried out on a specific chosen environment. Moreover, WAIR-D provides both the wireless channels and the corresponding environmental information, so that communication mechanisms aided by extra information can be designed and evaluated. WAIR-D gives researchers benchmarks for comparing their designs and reproducing the results of others. In this paper, we describe the construction of the dataset in detail and give examples of its use.