The goal of image-text clustering (ITC) is to find correct clusters by integrating the complementary and consistent information that multiple modalities provide about heterogeneous samples. Most current studies, however, analyse ITC under the ideal premise that the samples in every modality are complete, a presumption that does not always hold in real-world situations. Missing data degrades image-text feature learning and ultimately harms generalization in ITC tasks. Although a series of methods have been proposed for this incomplete image-text clustering (IITC) problem, the following issues remain: 1) most existing methods hardly consider the distinct gap between heterogeneous feature domains; 2) for missing data, the representations generated by existing methods are rarely guaranteed to suit clustering tasks; 3) existing methods do not exploit the latent connections both within and across modalities. In this paper, we propose a Clustering-Induced Generative Incomplete Image-Text Clustering (CIGIT-C) network to address these challenges. More specifically, we first use modality-specific encoders to map the original features to more distinctive subspaces. We then explore the latent intra- and inter-modality connections by using an adversarial generating network to produce one modality conditioned on the other. Finally, we update the corresponding modality-specific encoders using two KL divergence losses. Experimental results on public image-text datasets demonstrate that the proposed method is more effective than existing methods on the IITC task.
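To make the final step more concrete, the following is a minimal sketch of how two KL divergence clustering losses, one per modality, could be computed. The abstract does not give the exact form of the losses, so this sketch assumes a DEC-style objective (Student's t soft assignments and a sharpened target distribution). The function names, the shared cluster centers, and the toy embeddings are all hypothetical and stand in for the outputs of the modality-specific encoders and the generator.

```python
import math

def soft_assign(z, centers):
    """Student's t soft assignment of each embedding to each center (rows sum to 1)."""
    q = []
    for zi in z:
        w = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zi, c))) for c in centers]
        s = sum(w)
        q.append([x / s for x in w])
    return q

def target_distribution(q):
    """Sharpened target: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([x / s for x in w])
    return p

def kl_loss(p, q, eps=1e-12):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log((pij + eps) / (qij + eps))
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
    )

# Toy 2-D embeddings (assumed): in a real pipeline these would come from the
# image and text encoders, with missing samples filled in by the generator.
centers = [[0.0, 0.0], [4.0, 4.0]]
z_img = [[0.1, -0.2], [3.8, 4.1], [0.3, 0.2]]
z_txt = [[-0.1, 0.1], [4.2, 3.9], [3.0, 3.5]]

q_img = soft_assign(z_img, centers)
q_txt = soft_assign(z_txt, centers)

# One KL term per modality; their sum would drive the encoder updates.
loss_img = kl_loss(target_distribution(q_img), q_img)
loss_txt = kl_loss(target_distribution(q_txt), q_txt)
total_loss = loss_img + loss_txt
```

In this sketch the target distribution puts more mass on already confident assignments, so minimizing each KL term would pull the embeddings of that modality toward tighter clusters. Whether CIGIT-C uses shared or separate centers per modality is not stated in the abstract.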