Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.