Human-centered data collection is typically costly and raises privacy concerns. Various solutions have been proposed in the literature to reduce this cost, such as crowdsourced data collection or the use of semi-supervised algorithms. However, semi-supervised algorithms require a source of unlabeled data, and crowdsourcing methods require large numbers of active participants. An alternative, passive data collection modality is fingerprint-based localization. Such methods use received signal strength (RSS) or channel state information (CSI) in wireless sensor networks to localize users in indoor or outdoor environments. In this paper, we introduce a novel approach that uses synthetic data to reduce the cost of collecting training data for fingerprint-based localization. Generative adversarial networks (GANs) learn the distribution of a limited sample of collected data and then produce synthetic data that augments the real collected data in order to increase overall positioning accuracy. Experimental results on a benchmark dataset show that, by applying the proposed method with a combination of 10% collected data and 90% synthetic data, we obtain positioning accuracy essentially similar to that obtained with the full set of collected data. This means that by employing GAN-generated synthetic data, we can use 90% less real data, thereby reducing data-collection costs while achieving acceptable accuracy.
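The 10%/90% mix described in the abstract can be illustrated with a minimal sketch. It covers only the step that assembles the training set, not the GAN itself: `jitter_generator` is a hypothetical stand-in for a trained generator (it simply adds Gaussian noise to real RSS vectors), and the names `Fingerprint`, `build_training_set`, and `sigma_db` are assumptions made for illustration rather than anything taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A fingerprint pairs an RSS vector (dBm, one value per access point)
# with the 2-D position at which it was measured.
Fingerprint = Tuple[List[float], Tuple[float, float]]
Generator = Callable[[Sequence[Fingerprint], random.Random], Fingerprint]


def jitter_generator(real: Sequence[Fingerprint], rng: random.Random,
                     sigma_db: float = 2.0) -> Fingerprint:
    """Hypothetical placeholder for a trained GAN generator (NOT a GAN).

    Picks one real fingerprint and perturbs its RSS values with Gaussian
    noise, keeping the position label unchanged.
    """
    rss, pos = rng.choice(real)
    return ([v + rng.gauss(0.0, sigma_db) for v in rss], pos)


def build_training_set(real: Sequence[Fingerprint], real_fraction: float,
                       total_size: int, generator: Generator,
                       seed: int = 0) -> Tuple[List[Fingerprint], List[Fingerprint]]:
    """Return (kept_real, synthetic) with the requested real/synthetic split."""
    rng = random.Random(seed)
    n_real = max(1, round(real_fraction * total_size))
    if n_real > len(real):
        raise ValueError("not enough real fingerprints for the requested fraction")
    # Keep only a small random subset of the collected data...
    kept = rng.sample(list(real), n_real)
    # ...and fill the remainder with samples drawn from the generator,
    # which only ever sees the kept subset.
    synthetic = [generator(kept, rng) for _ in range(total_size - n_real)]
    return kept, synthetic


if __name__ == "__main__":
    # Toy data: 100 fingerprints from 3 access points along a corridor.
    collected = [([-50.0 - i, -60.0, -70.0 + i], (float(i), 0.0))
                 for i in range(100)]
    kept, synthetic = build_training_set(collected, 0.10, 100, jitter_generator)
    print(len(kept), "real +", len(synthetic), "synthetic fingerprints")
```

In an actual pipeline, the placeholder would be replaced by sampling from a GAN trained on the kept subset, and the combined set would then be used to train the localization model.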