SPEAR: Semi-supervised Data Programming in Python

We present SPEAR, an open-source Python library for data programming with semi-supervision. The package implements several recent data programming approaches, including facilities to programmatically label and build training data. SPEAR facilitates weak supervision in the form of heuristics (or rules) and the association of noisy labels with the training dataset. These noisy labels are aggregated to assign labels to the unlabeled data for downstream tasks. We have implemented several label aggregation approaches that first aggregate the noisy labels and then train on the noisily labeled set in a cascaded manner. Our implementation also includes other approaches that jointly aggregate the labels and train the model for text classification tasks. Thus, our Python package integrates several cascade and joint data-programming approaches while also letting the user define labeling functions or rules. The code and tutorial notebooks are available at https://github.com/decile-team/spear. Extensive documentation can be found at https://spear-decile.readthedocs.io/. Video tutorials demonstrating the usage of our package are also available. We also present some real-world use cases of SPEAR.
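To make the idea of labeling functions and noisy-label aggregation concrete, here is a minimal sketch in plain Python. It does not use SPEAR's API: the names `ABSTAIN`, `lf_contains_free`, `lf_contains_url`, `lf_short_greeting`, `apply_lfs`, `majority_vote`, and `aggregate` are all hypothetical, and majority voting is only a simple stand-in for the aggregation approaches the package implements.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label constants for a toy spam/ham task (not SPEAR's API).
SPAM, HAM = 1, 0
ABSTAIN = None  # a labeling function returns this when its rule does not fire

Label = Optional[int]
LabelingFunction = Callable[[str], Label]


def lf_contains_free(text: str) -> Label:
    """Heuristic: messages mentioning 'free' are likely spam."""
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_contains_url(text: str) -> Label:
    """Heuristic: messages containing a link are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> Label:
    """Heuristic: very short messages containing 'hi' are likely ham."""
    words = text.lower().split()
    return HAM if len(words) <= 4 and "hi" in words else ABSTAIN


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_contains_free,
    lf_contains_url,
    lf_short_greeting,
]


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[Label]]:
    """Build the noisy label matrix: one row per text, one column per rule."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes: List[Label]) -> Label:
    """Aggregate one row of noisy labels; abstain on ties or when no rule fired."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def aggregate(texts: List[str], lfs: List[LabelingFunction]) -> List[Label]:
    """Assign one aggregated label (or ABSTAIN) to each unlabeled text."""
    return [majority_vote(row) for row in apply_lfs(texts, lfs)]


if __name__ == "__main__":
    unlabeled = [
        "Win a FREE prize at http://x.example",
        "hi there",
        "see you at lunch tomorrow",
    ]
    for text, label in zip(unlabeled, aggregate(unlabeled, LABELING_FUNCTIONS)):
        print(repr(text), "->", label)
```

In a cascaded setup, the texts that receive a non-abstain label would then be used to train a downstream classifier as a separate step. A joint approach would instead learn the aggregation and the classifier together, which this sketch does not attempt.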
