Factorizers for Distributed Sparse Block Codes

Distributed sparse block codes (SBCs) provide compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge, however, is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when the query is a noisy SBC, in which symbol representations are relaxed because of perceptual uncertainty and the approximations made when modern neural networks generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible, and hence generalized, form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, a conditional random sampling, and an $\ell_\infty$-based similarity metric. Its random sampling mechanism, in combination with the search in superposition, allows the expected number of decoding iterations to be determined analytically, and this estimate matches the empirical observations up to the GSBC's bundling capacity. Second, the proposed factorizer maintains its high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby $C$ trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having $F$-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer into the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures over the CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the numbers of parameters and operations are significantly reduced compared to the FCL.
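
To make the codebook arithmetic and the cost of exhaustive search concrete, the following minimal Python sketch may help. It is not the proposed iterative factorizer. It only computes the per-factor codebook size $\sqrt[F]{C}$ and runs the brute-force baseline that a factorizer is meant to avoid. Several things here are assumptions rather than details from the abstract: each SBC is stored as a vector of one-hot indices (one per block), binding is taken to be blockwise circular convolution (which reduces to index addition modulo the block length), and the names `codebook_size`, `bind`, `brute_force_factorize`, as well as the block count and block length, are hypothetical.

```python
import itertools
import numpy as np


def codebook_size(num_classes: int, num_factors: int) -> int:
    """Smallest integer M with M**num_factors >= num_classes (F-th root, rounded up)."""
    m = 1
    while m ** num_factors < num_classes:
        m += 1
    return m


def bind(factors: np.ndarray, block_len: int) -> np.ndarray:
    """Bind SBCs stored as index vectors of shape (num_factors, num_blocks).

    Assumption: binding is blockwise circular convolution of one-hot blocks,
    which is equivalent to adding the nonzero indices modulo block_len.
    """
    return factors.sum(axis=0) % block_len


def brute_force_factorize(product, codebooks, block_len):
    """Exhaustive baseline: try every combination of one codevector per factor."""
    index_ranges = [range(len(cb)) for cb in codebooks]
    for combo in itertools.product(*index_ranges):
        candidate = bind(np.stack([cb[i] for cb, i in zip(codebooks, combo)]), block_len)
        if np.array_equal(candidate, product):
            return combo
    return None


rng = np.random.default_rng(0)
C, F = 100, 2      # e.g. 100 classes represented with 2 factors
B, L = 4, 64       # hypothetical SBC shape: 4 blocks of length 64
M = codebook_size(C, F)

# F codebooks, each holding M fixed random codevectors (index form).
codebooks = [rng.integers(0, L, size=(M, B)) for _ in range(F)]

# Build a noiseless product vector from one codevector per factor.
truth = (3, 7)
product = bind(np.stack([codebooks[f][truth[f]] for f in range(F)]), L)

recovered = brute_force_factorize(product, codebooks, L)
stored_codevectors = F * M     # what the factorizer has to store
search_space = M ** F          # what exhaustive search has to visit

print(f"codevectors per factor: {M}")
print(f"stored codevectors: {stored_codevectors}, combinations: {search_space}")
print(f"recovered factor indices: {recovered}")
```

For $C = 100$ and $F = 2$, each codebook holds 10 codevectors, so 20 fixed codevectors stand in for 100 class vectors, while exhaustive search would still have to visit all 100 combinations. The gap between the two counts widens as $F$ and $C$ grow, which is the reason to factorize rather than enumerate.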
