The mushroom body of the fruit fly brain is one of the best-studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse, high-dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes the semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. We show that the fruit fly network motif not only achieves performance comparable to existing NLP methods, but also uses only a fraction of the computational resources (shorter training time and a smaller memory footprint).
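The core mechanism described above (a random expansion onto many Kenyon cells followed by winner-take-all inhibition from the APL neuron) can be sketched as a hashing step. The sketch below is a minimal illustration of this idea, not the paper's trained model: the projection matrix, dimensions, and sparsity level are all illustrative assumptions, and in the actual work the synaptic weights are learned from the text corpus rather than fixed at random.

```python
import numpy as np

def fly_hash(x, proj, k):
    """Map a dense input vector to a sparse binary hash code.

    proj plays the role of the synapses onto the Kenyon cells;
    keeping only the k most active cells mimics the global
    inhibition provided by the APL neuron.
    """
    activations = proj @ x                      # Kenyon-cell activations
    code = np.zeros(proj.shape[0], dtype=np.uint8)
    code[np.argsort(activations)[-k:]] = 1      # winner-take-all: top-k fire
    return code

rng = np.random.default_rng(0)
d, m, k = 50, 400, 20            # input dim, Kenyon cells, active cells (illustrative)
proj = (rng.random((m, d)) < 0.1).astype(float)  # sparse random projection
x = rng.random(d)                # stand-in for a word/context input vector
h = fly_hash(x, proj, k)         # sparse binary code: exactly k ones out of m
```

Similarity between two words can then be measured cheaply, e.g. by the overlap of their binary codes, which is what makes such codes attractive for the memory footprint mentioned above.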