Recent years have seen the emergence of many new neural network structures (architectures and layers). To solve a given task, a network requires a certain set of abilities, which are reflected in its structure and which differ from task to task. So far, there has been no systematic study of the actual capabilities of the proposed neural structures. The question of what each structure can and cannot achieve is only partially answered by its performance on common benchmarks: natural data contain complex, unknown statistical cues, so it is impossible to know which cues a given neural structure is exploiting in such data. In this work, we sketch a methodology to measure the effect of each structure on a network's abilities by designing ad hoc synthetic datasets. Each dataset is tailored to assess a given ability and is reduced to its simplest form: each input contains exactly the amount of information needed to solve the task. We illustrate our methodology by building three datasets, one for each of the following network properties: a) the ability to link local cues to distant inferences, b) translation covariance, and c) the ability to group pixels with the same characteristics and share information among them. Using a first, simplified depth estimation dataset, we pinpoint a serious nonlocal deficit of the U-Net. We then evaluate how to resolve this limitation by augmenting its structure with nonlocal layers, which allow computing complex features with long-range dependencies. Using a second dataset, we compare different positional encoding methods and use the results to further improve the U-Net on the depth estimation task. The third dataset serves to demonstrate the need for self-attention-like mechanisms for solving more realistic depth estimation tasks.
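As an illustration of property b), translation covariance can be checked numerically by comparing "shift, then apply" with "apply, then shift". The following is a minimal one-dimensional sketch and not one of the datasets or models of this work: the function names (`circular_conv`, `with_position`, `covariance_error`), the filter, and the position ramp are all hypothetical choices made for the example.

```python
import numpy as np

def circular_conv(x, kernel):
    """1-D circular cross-correlation; covariant under circular shifts."""
    n, k = len(x), len(kernel)
    out = np.zeros(n)
    for i in range(n):
        for j in range(k):
            out[i] += kernel[j] * x[(i + j) % n]
    return out

def with_position(x, kernel):
    """Same filter, applied after adding an absolute position ramp.

    The ramp is a hypothetical stand-in for an additive positional
    encoding; it is an assumption of this sketch.
    """
    pos = np.linspace(0.0, 1.0, len(x))
    return circular_conv(x + pos, kernel)

def covariance_error(f, x, shift):
    """Max absolute gap between f(shift(x)) and shift(f(x))."""
    shifted_then_applied = f(np.roll(x, shift))
    applied_then_shifted = np.roll(f(x), shift)
    return float(np.max(np.abs(shifted_then_applied - applied_then_shifted)))

rng = np.random.default_rng(0)
x = rng.normal(size=32)
kernel = np.array([0.25, 0.5, 0.25])

# Close to zero: the plain circular filter commutes with shifts.
err_conv = covariance_error(lambda v: circular_conv(v, kernel), x, shift=5)
# Clearly nonzero: absolute position information breaks the symmetry.
err_pos = covariance_error(lambda v: with_position(v, kernel), x, shift=5)
```

In this toy setting the plain filter commutes with circular shifts, while injecting absolute position information breaks that symmetry. This is the kind of trade-off a comparison of positional encodings would need to quantify.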