Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets and measurement-constrained experiments. However, traditional subsampling methods often suffer from the lack of information available at the design stage. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on yet unseen data. The method is illustrated on virtual simulation-based safety assessment of advanced driver assistance systems. Substantial performance improvements were observed compared to traditional sampling methods.
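The loop described above, alternating between estimation and data collection with sampling guided by predictions on unseen instances, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method proposed in the paper: the function names (`simulate`, `fit_line`, `active_sampling_mean`), the toy outcome function, the simple linear predictor, and the choice of sampling probabilities (proportional to the predicted magnitude, mixed with a uniform component) are all hypothetical. The estimator shown is a standard Hansen-Hurwitz importance-weighted estimator, averaged across iterations.

```python
import random


def simulate(x):
    """Hypothetical stand-in for one expensive simulation run (assumption)."""
    return max(0.0, 2.0 * x - 1.0) ** 2


def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def active_sampling_mean(population, n_iter=10, batch=20, seed=0):
    """Estimate the population mean of simulate(x) by iterative, model-guided sampling."""
    rng = random.Random(seed)
    n_pop = len(population)
    probs = [1.0 / n_pop] * n_pop  # no information yet: start with uniform sampling
    seen_x, seen_y = [], []
    batch_estimates = []
    for _ in range(n_iter):
        # Data collection: draw a batch with replacement using the current probabilities.
        chosen = rng.choices(range(n_pop), weights=probs, k=batch)
        weighted = []
        for i in chosen:
            y = simulate(population[i])
            seen_x.append(population[i])
            seen_y.append(y)
            weighted.append(y / probs[i])
        # Estimation: Hansen-Hurwitz estimate of the total, divided by N for the mean.
        batch_estimates.append(sum(weighted) / batch / n_pop)
        # Learning: predict outcomes for the whole population from the data seen so far.
        intercept, slope = fit_line(seen_x, seen_y)
        predicted = [abs(intercept + slope * x) for x in population]
        total = sum(predicted)
        if total > 0.0:
            # Mix with uniform so every instance keeps a positive sampling probability.
            probs = [0.5 / n_pop + 0.5 * p / total for p in predicted]
        else:
            probs = [1.0 / n_pop] * n_pop
    # Each batch estimate is unbiased given the past, so their average is as well.
    return sum(batch_estimates) / len(batch_estimates)


if __name__ == "__main__":
    scenarios = [i / 999 for i in range(1000)]
    print("estimate:", active_sampling_mean(scenarios, n_iter=20, batch=50, seed=1))
    print("full-data mean:", sum(simulate(x) for x in scenarios) / len(scenarios))
```

Mixing the model-driven probabilities with a uniform component is one simple way to keep the estimator unbiased even when the predictions are poor early on. A real application would replace `simulate` with the actual simulation and `fit_line` with a more capable learner.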