While the availability of large datasets has been instrumental to advance fields like computer vision and natural language processing, this has not been the case in mobile networking. Indeed, mobile traffic data is often unavailable due to privacy or regulatory concerns. This problem becomes especially relevant in Open Radio Access Network (RAN), where artificial intelligence can potentially drive optimization and control of the RAN, but still lags behind due to the lack of training datasets. While substantial work has focused on developing testbeds that can accurately reflect production environments, the same level of effort has not been put into twinning the traffic that traverse such networks. To fill this gap, in this paper, we design a methodology to twin real-world cellular traffic traces in experimental Open RAN testbeds. We demonstrate our approach on the Colosseum Open RAN digital twin, and publicly release a large dataset (more than 500 hours and 450 GB) with PHY-, MAC-, and App-layer Key Performance Measurements (KPMs), and protocol stack logs. Our analysis shows that our dataset can be used to develop and evaluate a number of Open RAN use cases, including those with strict latency requirements.
翻译:暂无翻译