Crop diseases significantly affect the quantity and quality of agricultural production. In a context where precision agriculture aims to minimize or even avoid the use of pesticides, weather and remote sensing data combined with deep learning can play a pivotal role in detecting crop diseases, enabling localized treatment of crops. However, combining heterogeneous data such as weather and images remains a challenging and actively studied task. Recent developments in transformer architectures have shown that data from different domains, for instance text and image, can be fused. The current trend is to customize a single transformer to create a multimodal fusion model. Conversely, we propose a new approach that realizes data fusion using three transformers. In this paper, we first solve the missing satellite image problem by interpolating the missing images with a ConvLSTM model. We then propose a multimodal fusion architecture that jointly learns to process visual and weather information. The architecture is built from three main components, a Vision Transformer and two transformer encoders, which fuse the image and weather modalities. The results of the proposed method are promising, achieving 97\% overall accuracy.
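To make the three-transformer design concrete, the sketch below outlines one plausible PyTorch realization: a Vision Transformer encodes the satellite image, one transformer encoder processes the weather time series, and a second transformer encoder fuses the two token streams. This is a minimal illustration under assumed dimensions and a timm ViT backbone, not the paper's actual implementation.

\begin{verbatim}
# Minimal sketch (not the authors' code) of a three-transformer fusion model.
# Module names, dimensions, and the timm dependency are illustrative assumptions.
import torch
import torch.nn as nn
import timm


class WeatherImageFusion(nn.Module):
    def __init__(self, weather_features=10, d_model=256, num_classes=2):
        super().__init__()
        # Transformer 1: Vision Transformer backbone for the satellite image.
        self.vit = timm.create_model("vit_base_patch16_224",
                                     pretrained=True, num_classes=0)
        self.img_proj = nn.Linear(self.vit.num_features, d_model)

        # Transformer 2: encoder over the weather time series (one token per step).
        self.weather_proj = nn.Linear(weather_features, d_model)
        weather_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                                   batch_first=True)
        self.weather_encoder = nn.TransformerEncoder(weather_layer, num_layers=2)

        # Transformer 3: fusion encoder over concatenated image and weather tokens.
        fusion_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                                  batch_first=True)
        self.fusion_encoder = nn.TransformerEncoder(fusion_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, image, weather):
        # image: (B, 3, 224, 224); weather: (B, T, weather_features)
        img_token = self.img_proj(self.vit(image)).unsqueeze(1)        # (B, 1, d)
        weather_tokens = self.weather_encoder(self.weather_proj(weather))  # (B, T, d)
        fused = self.fusion_encoder(torch.cat([img_token, weather_tokens], dim=1))
        return self.head(fused.mean(dim=1))                            # (B, classes)


# Example: one 224x224 image and a 30-step weather series per sample.
model = WeatherImageFusion()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 30, 10))
\end{verbatim}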