End-to-end learning is machine learning that starts from raw data and predicts a desired concept, with all steps performed automatically. In the software engineering context, we see it as starting from the source code and predicting process metrics. This framework can be used for predicting defects, code quality, productivity, and more. End-to-end learning improves over feature-based machine learning by not requiring domain experts and by being able to extract new knowledge. We describe a dataset of 5M files from 15k projects constructed for this goal. The dataset is constructed in a way that enables not only predicting concepts but also investigating their causes.