Data sharing remains a major obstacle to adopting emerging AI technologies in general, and in the agri-food sector in particular. Protectiveness of data is natural in this setting: data are a precious commodity for their owners and, if used properly, can provide useful insights into operations and processes, leading to a competitive advantage. Unfortunately, novel AI technologies often require large amounts of training data to perform well, which is unrealistic in many scenarios. However, recent machine learning advances, e.g. federated learning and privacy-preserving technologies, can address this issue by providing the infrastructure and underpinning technologies needed to train models on data from various sources without ever sharing the raw data themselves. In this paper, we propose a technical solution based on federated learning that uses decentralized data (i.e. data that are not exchanged or shared but remain with their owners) to develop a cross-silo machine learning model that facilitates data sharing across supply chains. We focus our data sharing proposition on improving production optimization through soybean yield prediction, and describe potential use cases in which such methods can assist in other problem settings. Our results demonstrate not only that our approach performs better than each of the models trained on an individual data source, but also that data sharing in the agri-food sector can be enabled via alternatives to data exchange, whilst also helping the sector adopt emerging machine learning technologies to boost productivity.
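To make the idea of training on decentralized data concrete, the following is a minimal sketch of federated averaging, one common aggregation scheme for federated learning. It is an assumption for illustration only: the abstract does not state which aggregation method, model, or data the paper uses. The function names (`local_update`, `federated_average`, `train_federated`), the one-feature linear model, the hyperparameters, and the synthetic `SILOS` data are all hypothetical.

```python
# Illustrative sketch only: federated averaging for a one-feature linear model
# y = w * x + b. Names, hyperparameters, and data are hypothetical and are not
# taken from the paper.

def local_update(w, b, data, lr=0.5, epochs=5):
    """Run a few gradient-descent steps on one silo's private data."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of half the mean squared error with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(params, sizes):
    """Average (w, b) pairs, weighting each silo by its number of samples."""
    total = sum(sizes)
    w = sum(n * p[0] for p, n in zip(params, sizes)) / total
    b = sum(n * p[1] for p, n in zip(params, sizes)) / total
    return w, b


def train_federated(silos, rounds=200):
    """Coordinate training; only model parameters are exchanged, never rows."""
    w, b = 0.0, 0.0
    sizes = [len(data) for data in silos]
    for _ in range(rounds):
        # Each silo starts from the current global model and trains locally.
        updates = [local_update(w, b, data) for data in silos]
        # The coordinator sees only the updated parameters.
        w, b = federated_average(updates, sizes)
    return w, b


# Synthetic, noise-free toy data following y = 2x + 1, split across three
# hypothetical data owners that each cover a different range of x.
SILOS = [
    [(0.0, 1.0), (0.1, 1.2), (0.2, 1.4)],
    [(0.4, 1.8), (0.5, 2.0), (0.6, 2.2), (0.7, 2.4)],
    [(0.9, 2.8), (1.0, 3.0)],
]

if __name__ == "__main__":
    w, b = train_federated(SILOS)
    print(f"global model: w={w:.3f}, b={b:.3f}")
```

In this sketch the coordinator only ever receives `(w, b)` pairs, so the raw `(x, y)` records stay with their owners. A real cross-silo deployment would typically add secure communication and possibly further privacy-preserving measures, which are omitted here.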