Data encoding is a common and central operation in most data analysis tasks. The performance of downstream models in the computational pipeline depends heavily on the quality of the data encoding. One of the most powerful ways to encode data is the neural network AutoEncoder (AE) architecture. However, developers of AE models cannot easily influence the produced embedding space, as it is usually treated as a \textit{black box}; the resulting representation is therefore hard to control and does not necessarily have the properties desired for downstream tasks. In this paper, we introduce a novel approach for developing AE models that can integrate external knowledge sources into the learning process, possibly leading to more accurate results. The proposed \methodNamefull{} (\methodName{}) model leverages domain-specific information to ensure that the desired distance and neighborhood properties between samples are preserved in the embedding space. The proposed model is evaluated on three large-scale datasets from three different scientific fields and is compared to nine existing encoding models. The results demonstrate that the \methodName{} model effectively captures the underlying structures and relationships between the input data and the external knowledge, generating a more useful representation and outperforming the other models in terms of reconstruction accuracy.
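To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of the kind of objective the abstract describes: an autoencoder whose loss combines reconstruction error with an auxiliary term that encourages pairwise distances in the embedding space to match distances supplied by an external knowledge source. All names here (\texttt{KnowledgeGuidedAE}, \texttt{knowledge\_dist}, \texttt{alpha}) are hypothetical, and the specific loss form is an assumption.

\begin{verbatim}
import torch
import torch.nn as nn

class KnowledgeGuidedAE(nn.Module):
    """Hypothetical AE sketch; not the paper's architecture."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def loss_fn(x, x_hat, z, knowledge_dist, alpha=0.1):
    # Reconstruction term: standard AE objective.
    recon = nn.functional.mse_loss(x_hat, x)
    # Distance-preservation term: pairwise embedding distances are
    # pushed toward distances derived from the external knowledge
    # source (knowledge_dist is a (batch, batch) matrix; assumed input).
    emb_dist = torch.cdist(z, z)
    preserve = nn.functional.mse_loss(emb_dist, knowledge_dist)
    return recon + alpha * preserve
\end{verbatim}

Under these assumptions, the weight \texttt{alpha} trades off faithfulness of reconstruction against how strictly the embedding space mirrors the knowledge-derived geometry.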