Most recent state-of-the-art architectures rely on combinations and variations of three approaches: convolutional, recurrent, and self-attentive methods. Our work attempts to lay the foundation for a new research direction in sequence modeling based on the idea of modifying the sequence length. To that end, we propose a new method called the "Expansion Mechanism," which transforms the input sequence, either dynamically or statically, into a new one with a different length. Furthermore, we introduce a novel architecture that exploits this method and achieves competitive performance on the MS-COCO 2014 dataset, yielding 134.6 and 131.4 CIDEr-D on the Karpathy test split in the ensemble and single-model configurations respectively, and 130 CIDEr-D on the official online evaluation server, despite being neither recurrent nor fully attentive. At the same time, we address efficiency in our design and introduce a convenient training strategy that, in contrast to the standard one, is suitable for most computational resources. Source code is available at https://github.com/jchenghu/exploring
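To make the core idea concrete, the sketch below illustrates a static sequence-length transformation: mapping a sequence of N feature vectors to M vectors via a learned position-mixing matrix. This is an illustrative assumption for exposition only, not the paper's actual Expansion Mechanism; the function name `change_sequence_length` and the random mixing weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def change_sequence_length(x, target_len, rng=None):
    """Illustrative (not the paper's method): map a sequence of shape
    (src_len, d) to (target_len, d) by mixing positions with a matrix W
    of shape (target_len, src_len). In a trained model W would be a
    learned parameter; here it is random for demonstration."""
    rng = rng or np.random.default_rng(0)
    src_len, d = x.shape
    # Scale by 1/sqrt(src_len) to keep output magnitudes comparable.
    w = rng.standard_normal((target_len, src_len)) / np.sqrt(src_len)
    return w @ x  # (target_len, src_len) @ (src_len, d) -> (target_len, d)

# Example: expand a 5-step sequence of 8-dim features to 7 steps.
seq = np.ones((5, 8))
expanded = change_sequence_length(seq, 7)
```

Because the operation is a single matrix product over positions, it can shrink as well as expand the sequence, and composes naturally with standard feed-forward layers.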