We present a novel multi-source unsupervised model for text classification under domain shift. Our model exploits the update rates in document representations to dynamically integrate domain encoders. It also employs a probabilistic heuristic to infer the error rate in the target domain in order to pair source classifiers. This heuristic exploits the data transformation cost and the classifier accuracy in the target feature space. We evaluate the efficacy of our algorithm on real-world domain adaptation scenarios. We also use pretrained multi-layer transformers as the document encoder in the experiments to examine whether the improvement achieved by domain adaptation models can be delivered by out-of-the-box language model pretraining. The experiments show that our model is the top-performing approach in this setting.