Tribuo: Machine Learning with Provenance in Java

Machine learning (ML) models are deployed across many industries and perform a wide range of tasks. As the number of deployed models grows, tracking them and ensuring they behave appropriately becomes increasingly difficult. ML systems that affect human lives also face new regulatory burdens, which in high-risk situations require a link between a model and its training data. Current ML monitoring systems often provide provenance and experiment tracking as a layer on top of an ML library, which leaves room for imperfect tracking and for skew between the tracked object and its metadata. In this paper we introduce Tribuo, a Java ML library that integrates model training, inference, strong type-safety, runtime checking, and automatic provenance recording into a single framework. All of Tribuo's models and evaluations automatically record the full processing pipeline for input data, along with the training algorithms, hyperparameters and data transformation steps. The provenance lives inside the model object and can be persisted separately using common markup formats. Tribuo implements many popular ML algorithms for classification, regression, clustering, multi-label classification and anomaly detection, along with interfaces to XGBoost, TensorFlow and ONNX Runtime. Tribuo's source code is available at https://github.com/oracle/tribuo under an Apache 2.0 license, with documentation and tutorials available at https://tribuo.org.
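To make the idea of provenance living inside the model object more concrete, here is a minimal, self-contained Java sketch. It does not use Tribuo and is not Tribuo's API: the names ProvenanceSketch, Provenance, MeanModel, MeanTrainer, the "shrinkage" hyperparameter and the file name example.csv are all hypothetical, chosen only for illustration. The sketch assumes a trivial model that predicts the scaled mean of its training targets, and shows one way a trainer might attach an immutable provenance record at training time so that the metadata cannot drift from the model it describes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration only: every type here is hypothetical and is NOT Tribuo's API. */
public class ProvenanceSketch {

    /** Immutable description of how a model was produced. */
    static final class Provenance {
        final String dataSource;
        final String trainerName;
        final Map<String, String> hyperparameters;

        Provenance(String dataSource, String trainerName, Map<String, String> hyperparameters) {
            this.dataSource = dataSource;
            this.trainerName = trainerName;
            // Defensive copy so later changes to the caller's map cannot alter the record.
            this.hyperparameters = Collections.unmodifiableMap(new LinkedHashMap<>(hyperparameters));
        }

        /** Writes the record as a JSON-style string (no escaping; fine for this toy example). */
        String toJson() {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"dataSource\":\"").append(dataSource)
              .append("\",\"trainer\":\"").append(trainerName)
              .append("\",\"hyperparameters\":{");
            boolean first = true;
            for (Map.Entry<String, String> e : hyperparameters.entrySet()) {
                if (!first) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
                first = false;
            }
            return sb.append("}}").toString();
        }
    }

    /** A trivial model that always predicts one value and carries its own provenance. */
    static final class MeanModel {
        final double mean;
        final Provenance provenance;

        // Private: only the trainer below can build a model, so provenance is never skipped.
        private MeanModel(double mean, Provenance provenance) {
            this.mean = mean;
            this.provenance = provenance;
        }

        double predict() {
            return mean;
        }
    }

    /** Trainer that records its data source and hyperparameters while it trains. */
    static final class MeanTrainer {
        final double shrinkage;

        MeanTrainer(double shrinkage) {
            this.shrinkage = shrinkage;
        }

        MeanModel train(String dataSource, double[] targets) {
            if (targets.length == 0) {
                throw new IllegalArgumentException("targets must not be empty");
            }
            double sum = 0.0;
            for (double t : targets) {
                sum += t;
            }
            double mean = shrinkage * sum / targets.length;
            Map<String, String> hp = new LinkedHashMap<>();
            hp.put("shrinkage", Double.toString(shrinkage));
            return new MeanModel(mean, new Provenance(dataSource, "MeanTrainer", hp));
        }
    }

    public static void main(String[] args) {
        MeanModel model = new MeanTrainer(1.0).train("example.csv", new double[] {1.0, 2.0, 3.0});
        System.out.println(model.predict());
        // The provenance can be read from the model itself or written out separately.
        System.out.println(model.provenance.toJson());
    }
}
```

Because the model's constructor is private and only the trainer calls it, the provenance is created in the same step as the model rather than being added afterwards by a separate tracking layer. In Tribuo itself the recorded information is much richer, covering the data pipeline and transformations described above; the project documentation at https://tribuo.org describes the actual classes.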

