This work addresses the following problem: for a public sentiment classification API, how can we build a classifier that works well on different types of data when we have limited ability to annotate data across domains? We show that, given a large amount of unannotated data from different domains and pseudolabels on that data generated by a classifier trained on a small annotated dataset from a single domain, we can train a sentiment classifier that generalizes better across datasets.
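The pseudolabeling setup described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words naive Bayes model as a stand-in for whatever classifier is actually used; the class `NaiveBayes`, the function `pseudo_label_and_retrain`, and the example sentences are all hypothetical and chosen only for illustration. A "teacher" is trained on a small annotated set from one domain, it labels unannotated text from other domains, and a "student" is then trained on the union of both.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace tokens (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, text):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            n_c = sum(self.counts[c].values())
            score = math.log(self.prior[c] / total)
            for tok in text.lower().split():
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                # Laplace (add-one) smoothing
                score += math.log((self.counts[c][tok] + 1) / (n_c + len(self.vocab)))
            log_scores[c] = score
        # Normalize log scores into probabilities
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def pseudo_label_and_retrain(labeled, unlabeled):
    """Train a teacher on `labeled`, pseudolabel `unlabeled`, train a student on both."""
    texts, labels = zip(*labeled)
    teacher = NaiveBayes().fit(texts, labels)
    pseudo = [(text, teacher.predict(text)) for text in unlabeled]
    all_texts, all_labels = zip(*(list(labeled) + pseudo))
    student = NaiveBayes().fit(all_texts, all_labels)
    return teacher, student, pseudo


# Small annotated set from a single domain (movie reviews, made up).
labeled = [("great movie loved it", "pos"), ("terrible movie hated it", "neg")]
# Unannotated text from other domains (product and delivery feedback, made up).
unlabeled = ["loved this phone battery", "hated the slow delivery", "great battery"]

teacher, student, pseudo = pseudo_label_and_retrain(labeled, unlabeled)
print(pseudo)
print(student.predict("battery"))
```

In this toy example the teacher has never seen the word "battery" and so has no evidence about it, while the student picks up its association with positive text from the pseudolabeled examples. Whether such gains hold in practice depends on the quality of the pseudolabels, which this sketch does not attempt to measure.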