The past decade has witnessed rapid development in measurement and monitoring technologies for food science. Among these technologies, spectroscopy has been widely used to analyze food quality, safety, and nutritional properties. Because food systems are complex and comprehensive predictive models are lacking, rapid and simple measurements that predict complex properties of food systems are largely missing. Machine Learning (ML) has shown great potential to improve the classification and prediction of these properties. However, the barriers to collecting large datasets for ML applications persist. In this paper, we explore different approaches to data annotation and model training to improve data efficiency for ML applications. Specifically, we leverage Active Learning (AL) and Semi-Supervised Learning (SSL) and investigate four approaches: baseline passive learning, AL, SSL, and a hybrid of AL and SSL. To evaluate these approaches, we collect two spectroscopy datasets, one for predicting plasma dosage and one for detecting foodborne pathogens. Our experimental results show that, compared to the de facto passive learning approach, AL and SSL methods reduce the number of labeled samples by 50% and 25% for each ML application, respectively.
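The four approaches named in the abstract (passive learning, AL, SSL, and the AL+SSL hybrid) can be illustrated with a minimal sketch. The abstract does not specify the classifier, the AL query strategy, or the SSL method, so everything here is an assumption for illustration: a nearest-centroid classifier stands in for the model, smallest-margin uncertainty sampling stands in for AL, and confidence-thresholded self-training stands in for SSL. The names `make_blobs`, `fit_centroids`, `predict_with_margin`, `accuracy`, `train`, and `ssl_margin` are hypothetical, and the two-dimensional synthetic points are only a stand-in for real spectra.

```python
import math
import random

def make_blobs(n_per_class, rng):
    """Synthetic 2-D stand-in for spectra: two well-separated Gaussian classes."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 8.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_margin(model, x):
    """Return (predicted label, margin); a small margin means an uncertain sample."""
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def accuracy(model, X, y):
    hits = sum(predict_with_margin(model, x)[0] == t for x, t in zip(X, y))
    return hits / len(y)

def train(X_pool, oracle, seed_idx, budget, use_al=False, use_ssl=False,
          ssl_margin=2.0, rng=None):
    """Return (model, number of oracle labels used).

    use_al=False, use_ssl=False -> passive baseline (random annotation)
    use_al=True                 -> annotate the most uncertain sample each round
    use_ssl=True                -> add confident pseudo-labels before the final fit
    """
    rng = rng or random.Random(0)
    # Seed set; assumed to contain at least one sample per class.
    labeled = {i: oracle[i] for i in seed_idx}
    while len(labeled) < budget:
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
        if not unlabeled:
            break
        if use_al:
            model = fit_centroids([X_pool[i] for i in labeled], list(labeled.values()))
            pick = min(unlabeled,
                       key=lambda i: predict_with_margin(model, X_pool[i])[1])
        else:
            pick = rng.choice(unlabeled)
        labeled[pick] = oracle[pick]  # simulated human annotation
    train_set = dict(labeled)
    if use_ssl:
        model = fit_centroids([X_pool[i] for i in labeled], list(labeled.values()))
        for i in range(len(X_pool)):
            if i not in labeled:
                label, margin = predict_with_margin(model, X_pool[i])
                if margin >= ssl_margin:  # keep only confident pseudo-labels
                    train_set[i] = label
    idx = list(train_set)
    final = fit_centroids([X_pool[i] for i in idx], [train_set[i] for i in idx])
    return final, len(labeled)
```

Calling `train` with the four combinations of `use_al` and `use_ssl` under the same `budget` gives a like-for-like comparison of annotation cost. On real spectra one would evaluate on a held-out test set and sweep `budget` to find how many labels each approach needs to reach a target accuracy.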