Current AI/ML methods for data-driven engineering use models that are mostly trained offline. Such models can be expensive to build in terms of communication and computing cost, and they rely on data that is collected over extended periods of time. Further, they become out-of-date when changes in the system occur. To address these challenges, we investigate online learning techniques that automatically reduce the number of data sources used for model training. We present an online algorithm called the Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources after receiving a small number of measurements. The algorithm is initialized with a feature ranking algorithm, a feature set stability metric, and a search policy. We perform an extensive experimental evaluation of this algorithm using traces from an in-house testbed and from a data center in operation. We find that OSFS reduces the size of the feature set by 1-3 orders of magnitude on all investigated datasets. Most importantly, we find that the accuracy of a predictor trained on an OSFS-produced feature set is somewhat better than when the predictor is trained on a feature set obtained through offline feature selection. OSFS is thus shown to be effective as an online feature selection algorithm and robust regarding the sample interval used for feature selection. We also find that, when concept drift in the data underlying the model occurs, its effect can be mitigated by recomputing the feature set and retraining the prediction model.
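The abstract names three ingredients that initialize the algorithm: a feature ranking algorithm, a feature set stability metric, and a search policy. The sketch below shows one way such ingredients could fit together; it is not the OSFS algorithm itself, whose details the abstract does not give. Everything in it is an assumption made for illustration: ranking by absolute Pearson correlation, Jaccard similarity as the stability metric, a sample-size doubling search policy, the stopping threshold, the synthetic trace, and all function names.

```python
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]


def abs_correlation_ranking(X: Matrix, y: Sequence[float]) -> List[int]:
    """Assumed ranking: feature indices sorted by |Pearson correlation| with y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        var_x = sum((v - mean_x) ** 2 for v in col)
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        denom = (var_x * var_y) ** 0.5
        # Constant columns get score 0 instead of dividing by zero.
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Assumed stability metric: overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def doubling_policy(start: int, limit: int) -> Iterator[int]:
    """Assumed search policy: evaluate after start, 2*start, 4*start, ... samples."""
    n = start
    while n <= limit:
        yield n
        n *= 2


def select_stable_features(
    X: Matrix,
    y: Sequence[float],
    k: int,
    sample_sizes: Iterable[int],
    rank: Callable[[Matrix, Sequence[float]], List[int]] = abs_correlation_ranking,
    stability: Callable[[Set[int], Set[int]], float] = jaccard,
    threshold: float = 1.0,
) -> Tuple[Set[int], int]:
    """Return (top-k feature set, samples used).

    Stops at the first sample size whose top-k set is stable with respect to
    the previous one. If stability is never reached, the last evaluated set
    is returned, so callers that care should check this separately.
    """
    previous = None
    current: Set[int] = set()
    used = 0
    for n in sample_sizes:
        n = min(n, len(y))
        current = set(rank(X[:n], y[:n])[:k])
        used = n
        if previous is not None and stability(previous, current) >= threshold:
            break
        previous = current
    return current, used


def make_synthetic_trace(n_samples: int, n_features: int, seed: int = 0):
    """Hypothetical data: only features 0 and 1 influence the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [row[0] + row[1] + 0.1 * rng.gauss(0.0, 1.0) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_trace(n_samples=800, n_features=8)
    features, used = select_stable_features(
        X, y, k=2, sample_sizes=doubling_policy(100, 800)
    )
    print(f"selected features {sorted(features)} after {used} samples")
```

Passing the ranking, stability metric, and search policy as arguments keeps the loop independent of any particular choice, which matches the abstract's description of an algorithm that is initialized with these three components. Under concept drift, the same function could simply be called again on recent samples before retraining the predictor.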