Motivated by the desire to generate labels for real-time data, we develop a method to incrementally estimate the dependency structure and accuracy of weak supervision sources. Our method first estimates the dependency structure associated with the supervision sources and then uses this structure to iteratively update the estimated source accuracies as new data is received. Using both heuristic functions and off-the-shelf classification models trained on publicly available datasets as supervision sources, we show that our method generates probabilistic labels with an accuracy matching that of existing offline methods.
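To make the idea of updating source accuracies as data arrives more concrete, the following is a minimal sketch and not the estimator developed in this work. It assumes binary labels in {+1, -1} with 0 meaning "abstain", ignores the dependency structure entirely, and uses agreement with the per-example majority vote as a crude proxy for accuracy. The class name `IncrementalAccuracyEstimator` and its methods are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


class IncrementalAccuracyEstimator:
    """Running estimate of how often each source agrees with the majority vote.

    Illustrative stand-in only: this is NOT the paper's estimator, and it
    treats the sources as independent (no dependency structure is used).
    """

    def __init__(self, n_sources: int) -> None:
        self.n_sources = n_sources
        self.agree = [0] * n_sources  # times source j matched the majority vote
        self.voted = [0] * n_sources  # times source j did not abstain

    def update(self, batch: Sequence[Sequence[int]]) -> None:
        """Fold a batch of source outputs (+1, -1, or 0 for abstain) into the counts."""
        for votes in batch:
            total = sum(votes)
            if total == 0:
                continue  # tie or all abstained: no majority to compare against
            majority = 1 if total > 0 else -1
            for j, v in enumerate(votes):
                if v != 0:
                    self.voted[j] += 1
                    self.agree[j] += int(v == majority)

    def accuracies(self) -> List[float]:
        """Current estimates; 0.5 (uninformative) for sources that never voted."""
        return [a / n if n else 0.5 for a, n in zip(self.agree, self.voted)]


# Two small batches arriving one after the other (made-up votes from 3 sources).
est = IncrementalAccuracyEstimator(n_sources=3)
est.update([[1, 1, -1], [1, 1, 1]])
est.update([[-1, -1, 1], [-1, 0, -1]])
print(est.accuracies())  # [1.0, 1.0, 0.5]
```

Only per-source counts are stored, so each update costs time proportional to the batch size and no past data needs to be retained. Because every source contributes to the majority vote it is compared against, these estimates are biased upward; the sketch shows the incremental bookkeeping, not a statistically sound accuracy model.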