Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of users' posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques have been proposed and have achieved promising results, such as methods based on canonical correlation analysis. At the same time, it is critical for decision-makers to be able to understand the predictions produced by these methods. For example, given a diagnosis that a model derives from a patient's X-ray images taken at different poses, the doctor needs to know why the model made that prediction. However, state-of-the-art techniques are often unable to exploit the complementary information across views or to explain their predictions in an interpretable manner. To address these issues, in this paper, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common and the complementary information in an adversarial setting and to provide robust interpretations of the predictions to end-users via a co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into our model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which we evaluate extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.
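To make the abstract's two key ingredients concrete, the following is a minimal sketch, not the paper's exact architecture: it assumes two views, treats the co-attention as a softmax weighting over per-view latent codes, and omits the adversarial discriminator. All layer sizes, the module name `TwoViewCoAttentionNet`, and the loss weighting `lam` are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the authors' exact model): two per-view
# encoders, a cross-reconstruction loss, a simple co-attention weighting, and
# a classifier that supervises the latent representation with label information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoViewCoAttentionNet(nn.Module):
    def __init__(self, d1, d2, d_latent, n_classes):
        super().__init__()
        # One encoder and one decoder per view (hidden size 128 is arbitrary).
        self.enc1 = nn.Sequential(nn.Linear(d1, 128), nn.ReLU(), nn.Linear(128, d_latent))
        self.enc2 = nn.Sequential(nn.Linear(d2, 128), nn.ReLU(), nn.Linear(128, d_latent))
        self.dec1 = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(), nn.Linear(128, d1))
        self.dec2 = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(), nn.Linear(128, d2))
        # Co-attention stand-in: a scalar score per view, softmax-normalized.
        self.attn = nn.Linear(d_latent, 1)
        self.clf = nn.Linear(d_latent, n_classes)

    def forward(self, x1, x2):
        z1, z2 = self.enc1(x1), self.enc2(x2)
        # Cross reconstruction: each view is decoded from the OTHER view's code,
        # encouraging the latent spaces to capture shared (common) information.
        x1_from_z2 = self.dec1(z2)
        x2_from_z1 = self.dec2(z1)
        # Attention weights indicate which view drives the prediction,
        # which is what the end-user sees as an interpretation.
        w = F.softmax(torch.cat([self.attn(z1), self.attn(z2)], dim=1), dim=1)
        z = w[:, :1] * z1 + w[:, 1:] * z2
        logits = self.clf(z)
        return logits, x1_from_z2, x2_from_z1, w

def training_loss(model, x1, x2, y, lam=1.0):
    logits, x1_hat, x2_hat, _ = model(x1, x2)
    rec = F.mse_loss(x1_hat, x1) + F.mse_loss(x2_hat, x2)  # cross-reconstruction term
    ce = F.cross_entropy(logits, y)                        # label supervision via the classifier
    return ce + lam * rec
```

In this sketch the classifier loss guides the latent codes toward label-relevant structure while the cross-reconstruction term ties the two views together; the adversarial component described in the abstract would add a discriminator over the latent codes, which is omitted here for brevity.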