Echo State Networks (ESN) and Long Short-Term Memory networks (LSTM) are two popular architectures of Recurrent Neural Networks (RNN) used to solve machine learning tasks involving sequential data. However, little has been done to compare their performance and internal mechanisms on a common task. In this work, we trained ESNs and LSTMs on a Cross-Situational Learning (CSL) task. This task aims at modelling how infants learn language: they create associations between words and visual stimuli in order to extract meaning from words and sentences. The results are of three kinds: performance comparison, internal dynamics analyses and visualization of the latent space. (1) We found that both models were able to successfully learn the task: the LSTM reached the lowest error on the basic corpus, but the ESN was quicker to train. Furthermore, the ESN outperformed LSTMs on more challenging datasets without requiring any further tuning. (2) We also conducted an analysis of the internal unit activations of LSTMs and ESNs. Despite the deep differences between the two models (trained versus fixed internal weights), we were able to uncover similar inner mechanisms: both put emphasis on units encoding aspects of the sentence structure. (3) Moreover, we present \textit{Recurrent States Space Visualisations} (RSSviz), a method to visualize the structure of the latent state space of RNNs based on dimension reduction (using UMAP). This technique enables us to observe a fractal embedding of sequences in the LSTM. RSSviz is also useful for the analysis of ESNs (i) to spot difficult examples and (ii) to generate animated plots showing the evolution of activations across learning stages. Finally, we explore qualitatively how RSSviz could provide an intuitive visualisation to understand the influence of hyperparameters on the reservoir dynamics prior to ESN training.
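To make the RSSviz idea concrete, the following is a minimal sketch (not the authors' implementation) of the general recipe described above: collect the recurrent activations of an RNN over all timesteps of a set of sentences and project them to 2D with UMAP. The state array here is a random placeholder; in practice it would hold the reservoir states of an ESN or the hidden states of an LSTM.

\begin{verbatim}
# Illustrative sketch of an RSSviz-style projection (assumed workflow, not
# the paper's code): recurrent states -> UMAP -> 2D scatter plot.
import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

# Placeholder for real recurrent states: one row per timestep of every
# processed sentence, one column per internal unit.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 300))

# Reduce the high-dimensional state trajectory to two dimensions.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(states)

# Each point is one recurrent state; trajectories of a sentence appear as
# paths through this space.
plt.scatter(embedding[:, 0], embedding[:, 1], s=3)
plt.title("RSSviz-style 2D projection of recurrent states")
plt.show()
\end{verbatim}

Coloring the points by position in the sentence or by learning stage (and animating across stages, as done for ESNs) is what turns this raw projection into the analyses discussed above.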