The simplest and often most effective way of parallelizing the training of complex machine learning models is to execute several training instances on multiple machines, possibly scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure. Such a meta-learning procedure is often limited by the ability to securely access a common database organizing the knowledge of previous and ongoing trials. Exploiting opportunistic GPUs provided in different environments represents a further challenge when designing such optimization campaigns. In this contribution we discuss how a set of REST APIs can be used to access a dedicated service based on INFN Cloud to monitor and possibly coordinate multiple training instances, with gradient-free optimization techniques, via simple HTTP requests. The service, named Hopaas (Hyperparameter OPtimization As A Service), consists of a web interface and a set of APIs implemented with a FastAPI back-end running through Uvicorn and NGINX in a virtual instance of INFN Cloud. The optimization algorithms are currently based on Bayesian techniques as provided by Optuna. A Python front-end is also made available for quick prototyping. We present applications to hyperparameter optimization campaigns performed by combining private, INFN Cloud, and CINECA resources.
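As a minimal sketch of how workers on heterogeneous resources might coordinate through such a service via plain HTTP requests, the snippet below implements an ask/tell loop in the style Optuna popularized. The server URL, authentication header, endpoint paths (/api/ask, /api/tell), and payload fields are hypothetical placeholders introduced for illustration, not the documented Hopaas API.

```python
import requests

# Hypothetical service endpoint and token: placeholders, not the real
# Hopaas deployment (which runs behind NGINX/Uvicorn on INFN Cloud).
SERVER = "https://hopaas.example.infn.it"
HEADERS = {"X-Api-Token": "user-token"}  # assumed auth scheme

# Study declaration: the search space and sampler are assumed to be
# handled server-side (Bayesian optimization via Optuna, per the abstract).
study = {
    "name": "example-study",
    "properties": {
        "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-2, "log": True},
        "batch_size": {"type": "int", "low": 32, "high": 512},
    },
    "direction": "minimize",
}

def train_and_validate(params):
    """Stand-in for a real training run returning a validation loss."""
    return (params["learning_rate"] - 1e-3) ** 2 + 1.0 / params["batch_size"]

# Ask/tell loop: each opportunistic worker (private GPU, INFN Cloud, CINECA)
# runs this independently; the central service owns the shared study state,
# so no worker needs direct access to the trials database.
for _ in range(20):
    trial = requests.post(f"{SERVER}/api/ask", json=study, headers=HEADERS).json()
    loss = train_and_validate(trial["params"])
    requests.post(
        f"{SERVER}/api/tell",
        json={"trial_id": trial["id"], "value": loss},
        headers=HEADERS,
    )
```

Because all coordination happens through stateless HTTP calls, workers can join or leave the campaign at any time, which is what makes opportunistic resources usable in this setting.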