The emergence of data-driven machine learning (ML) has enabled significant progress on many complex tasks, such as highly automated driving. While much effort goes into improving the ML models and learning algorithms for such applications, little attention is paid to how the training data and the validation setting should be designed. In this paper, we investigate the influence of several data design choices on the training and validation of deep driving models that are trained end to end. Specifically, (i) we investigate how the amount of training data influences the final driving performance, and which performance limitations are induced by the mechanisms currently used to generate training data. (ii) Further, we show by correlation analysis which validation design allows the driving performance measured during validation to generalize well to unknown test environments. (iii) Finally, we investigate the effect of random seeding and non-determinism, giving insight into which reported improvements can be deemed significant. Our evaluations using the popular CARLA simulator yield recommendations on data generation and driving route selection for the efficient future development of end-to-end driving models.
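As a rough illustration of points (ii) and (iii), the following minimal sketch shows one way such checks could be set up. It is not the paper's procedure: the function names `pearson` and `exceeds_seed_noise`, the threshold `k`, and the choice of Pearson correlation are all assumptions made for illustration, and no numbers from the paper are used.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Illustrative use: xs = validation scores of several models,
    ys = the same models' scores in unseen test environments.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def exceeds_seed_noise(baseline_runs, candidate_runs, k=2.0):
    """Heuristic (an assumption, not a formal test): treat a gain as
    notable only if the mean improvement over the baseline exceeds
    k times the larger per-seed standard deviation.

    Each argument holds one score per random seed (at least two).
    """
    gain = statistics.fmean(candidate_runs) - statistics.fmean(baseline_runs)
    noise = max(statistics.stdev(baseline_runs),
                statistics.stdev(candidate_runs))
    return gain > k * noise
```

A validation design whose scores correlate strongly with test scores across models is a better proxy for generalization, and an improvement smaller than the spread across seeds should be read with caution. A proper significance test would be preferable to the simple threshold used here.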