A Discriminative Vectorial Framework for Multi-modal Feature Representation

Driven by rapid advances in sensing and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding ways to extract useful information from these multi-modal data sources has quickly become a necessity. In this paper, a discriminative vectorial framework is proposed for multi-modal feature representation in knowledge discovery by employing multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis. Specifically, the proposed framework jointly minimizes the semantic similarity among different modalities by MH and extracts intrinsic discriminative representations across multiple data sources by DCM analysis, enabling a novel vectorial framework of multi-modal feature representation. Moreover, the proposed feature representation strategy is analyzed and further optimized for the canonical and non-canonical cases, respectively. Consequently, the generated feature representation leads to effective utilization of the input data sources of high quality, producing improved, sometimes quite impressive, results in various applications. The effectiveness and generality of the proposed framework are demonstrated using classical features and deep neural network (DNN) based features, with applications to image and multimedia analysis and recognition tasks, including data visualization, face recognition, object recognition, cross-modal (text-image) recognition, and audio emotion recognition. Experimental results show that the proposed solutions are superior to state-of-the-art statistical machine learning (SML) and DNN algorithms.
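
The abstract does not spell out the MH or DCM formulations, so the following is only a rough illustration of the general idea of correlation maximization between two modalities. It uses plain canonical correlation analysis (CCA) as a stand-in, not the proposed DCM analysis, and it omits the hashing step entirely. The function name `cca`, the regularization parameter `reg`, the synthetic data, and the concatenation-based fusion are all assumptions introduced for this sketch.

```python
import numpy as np


def cca(X, Y, n_components=2, reg=1e-6):
    """Plain CCA (a stand-in, NOT the paper's DCM).

    X: (n, p) features from modality 1; Y: (n, q) features from modality 2.
    Returns projection matrices Wx, Wy and the canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-modality and cross-modality covariance estimates.
    # `reg` is a small ridge term (an assumption) that keeps Cxx, Cyy invertible.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]


# Synthetic two-modality data sharing a 2-D latent factor (illustrative only).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))

Wx, Wy, corrs = cca(X, Y, n_components=2)

# One simple (hypothetical) way to form a fused vector: concatenate both projections.
fused = np.hstack([(X - X.mean(axis=0)) @ Wx, (Y - Y.mean(axis=0)) @ Wy])
print(corrs, fused.shape)
```

Plain CCA is unsupervised; a discriminative variant such as the one the abstract describes would additionally use class labels, which this sketch does not attempt to model.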
