Case-based reasoning (CBR) on high-dimensional, heterogeneous data is increasingly common in real-world applications, yet it remains challenging and computationally expensive. A promising approach is to represent cases as low-dimensional hash codes and retrieve similar cases in Hamming space. However, previous methods use data-independent hashing, which relies on random projections or manual construction; because such hashing is insensitive to data characteristics, it is ill-suited to data-specific issues such as high dimensionality and heterogeneity. To address these issues, this work introduces a novel deep hashing network that learns similarity-preserving compact hash codes for efficient case retrieval, and proposes a deep-hashing-enabled CBR model, HeCBR. Specifically, we introduce position embedding to represent heterogeneous features and use a multilinear interaction layer to obtain case embeddings; this layer filters out zero-valued features, which tackles high dimensionality and sparsity, and captures inter-feature couplings. We then feed the case embeddings into fully connected layers, after which a hash layer generates hash codes, with a quantization regularizer controlling the quantization loss incurred during relaxation. To support the incremental learning required by CBR, we further propose an adaptive learning strategy to update the hash function. Extensive experiments on public datasets show that HeCBR greatly reduces storage and significantly accelerates case retrieval. HeCBR achieves competitive performance compared with state-of-the-art CBR methods and performs significantly better than hashing-based CBR methods in classification.
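To make the retrieval step concrete, the following is a minimal sketch of how relaxed hash-layer outputs could be binarized and searched in Hamming space. It is not the HeCBR implementation: the function names, the toy values, and the penalty form (mean squared gap between each output's magnitude and 1, a common choice in deep hashing) are all assumptions, since the abstract does not specify the regularizer.

```python
import numpy as np

def quantization_penalty(h):
    """Mean squared gap between |h| and 1 (assumed form, not from the paper)."""
    return float(np.mean((np.abs(h) - 1.0) ** 2))

def binarize(h):
    """Map relaxed real-valued outputs to bits: 1 where h >= 0, else 0."""
    return (h >= 0).astype(np.uint8)

def hamming_distances(query_bits, case_bits):
    """Number of differing bits between one query code and each stored code."""
    return np.count_nonzero(case_bits != query_bits, axis=1)

def retrieve(query_bits, case_bits, k=1):
    """Indices of the k stored cases closest to the query in Hamming space."""
    d = hamming_distances(query_bits, case_bits)
    return np.argsort(d, kind="stable")[:k]

# Hypothetical relaxed outputs for three stored cases (4-bit codes).
case_outputs = np.array([
    [0.9, -0.8, 0.7, -0.95],
    [-0.6, 0.9, -0.7, 0.8],
    [0.8, -0.9, -0.6, -0.7],
])
# Hypothetical relaxed output for a new query case.
query_output = np.array([0.7, -0.6, 0.9, -0.8])

case_bits = binarize(case_outputs)
query_bits = binarize(query_output)
nearest = retrieve(query_bits, case_bits, k=2)
```

Storing one bit per code dimension instead of a floating-point vector is where the storage saving comes from, and comparing bits is much cheaper than computing distances between real-valued embeddings. A production system would typically pack the bits into machine words and use XOR with a population count rather than the element-wise comparison shown here.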