Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, requires large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Widespread difficulties in specifying data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations. This paper investigates why practitioners in the Swedish automotive industry find it difficult to arrive at clear specifications for data and annotations. The results of an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems cause the difficulty in deriving the specifications. We provide a list of recommendations that can mitigate challenges when deriving specifications, and we propose future research opportunities to overcome these challenges. Our work contributes to the ongoing research on accountability of machine learning as applied to complex software systems, especially for high-stakes applications such as automated driving.