We define disentanglement as the distance between data points of different classes relative to the distances among data points of the same class. Maximizing disentanglement during representation learning yields a transformed feature representation in which the class memberships of the data points are preserved. With class memberships preserved, the resulting feature space is one in which a nearest-neighbour classifier or a clustering algorithm performs well. We take advantage of this property to learn better natural language representations, and apply them to text classification and text clustering tasks. Through disentanglement, we obtain text representations with better-defined clusters and improved text classification performance. Our approach achieves a test classification accuracy of up to 90.11% and a test clustering accuracy of 88% on the AG News dataset, outperforming our baseline models -- without any other training tricks or regularization.
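To make the definition concrete, the following is a minimal sketch (not the paper's exact objective) of a disentanglement loss in PyTorch: the negative ratio of the mean inter-class pairwise distance to the mean intra-class pairwise distance, so that minimizing the loss pushes class-different points apart relative to class-similar ones. The function name `disentanglement_loss` and the batch setup are illustrative assumptions.

```python
import torch


def disentanglement_loss(z: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative ratio of mean inter-class to mean intra-class pairwise distance.

    Minimizing this maximizes disentanglement as defined above; assumes each
    class in the batch has at least two members so intra-class pairs exist.
    """
    dists = torch.cdist(z, z)                    # pairwise Euclidean distances
    same = y.unsqueeze(0) == y.unsqueeze(1)      # True where labels match
    off_diag = ~torch.eye(len(y), dtype=torch.bool, device=y.device)
    intra = dists[same & off_diag].mean()        # class-similar pairs
    inter = dists[~same].mean()                  # class-different pairs
    return -inter / (intra + eps)


# Toy usage: random embeddings for a batch of 8 points from 2 classes.
z = torch.randn(8, 16, requires_grad=True)
y = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
loss = disentanglement_loss(z, y)
loss.backward()  # in a training loop, this gradient separates the classes
```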