Data has now become a bottleneck of deep learning. Researchers across fields share the intuition that "deep neural networks might not always perform better when fed more data," yet this intuition still lacks experimental validation and a convincing guiding theory. To fill this gap, we design experiments under both the Independent and Identically Distributed (IID) and Out-of-Distribution (OOD) settings, which yield compelling answers. For the purpose of guidance, based on a discussion of the results, we propose two theories: under the IID condition, the amount of information determines the effectiveness of each sample, while the contribution of individual samples and the differences between classes determine the amount of sample information and class information; under the OOD condition, the cross-domain degree of samples determines their contributions, and bias-fitting caused by irrelevant elements is a significant factor in cross-domain degradation. The above theories provide guidance from the perspective of data, which can promote a wide range of practical applications of artificial intelligence.
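The distinction between the IID and OOD experimental conditions can be sketched as follows. This is a minimal illustration, not the paper's actual experimental protocol; the toy dataset, the `domain` attribute, and the function names are hypothetical.

```python
import random

# Hypothetical toy dataset: each sample carries a class label and a
# "domain" attribute (e.g. photo vs. sketch). All names are illustrative.
data = [{"x": i, "cls": i % 2, "domain": "photo" if i < 50 else "sketch"}
        for i in range(100)]

def iid_split(samples, train_frac=0.8, seed=0):
    """IID split: shuffle first, so train and test samples are drawn
    from the same underlying distribution."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def ood_split(samples, test_domain="sketch"):
    """OOD split: hold out an entire domain, so the test set comes
    from a distribution never seen during training."""
    train = [s for s in samples if s["domain"] != test_domain]
    test = [s for s in samples if s["domain"] == test_domain]
    return train, test

iid_train, iid_test = iid_split(data)
ood_train, ood_test = ood_split(data)

# Under IID, train and test mix both domains; under OOD, the test set
# is entirely "sketch" while training never saw that domain.
print({s["domain"] for s in ood_test})   # {'sketch'}
print(len(ood_train), len(ood_test))     # 50 50
```

The OOD construction is what exposes bias-fitting: a model trained only on "photo" samples may rely on domain-specific irrelevant elements that do not transfer to "sketch".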