Most AI projects start with a Python notebook running on a single laptop; however, scaling them up to handle larger datasets (for both experimentation and production deployment) usually involves a mountain of pain. Data scientists must go through many manual and error-prone steps to fully take advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, distributed computing, etc.). To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects). Using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted by many real-world users (such as Mastercard, Burger King, Inspur, etc.) in production.
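To make the "laptop notebook to cluster" claim concrete, the following is a minimal sketch of the intended workflow, assuming the bigdl-orca package and its PyTorch Estimator API; the toy model, data loader, and resource settings are hypothetical placeholders, and exact argument names may differ across BigDL releases.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from bigdl.orca import init_orca_context, stop_orca_context
from bigdl.orca.learn.pytorch import Estimator

# Per the abstract's claim, moving from a laptop to a cluster is mainly a matter
# of changing cluster_mode (e.g., "local" -> "yarn-client"); the notebook code
# below stays the same. Resource values here are illustrative only.
init_orca_context(cluster_mode="local", cores=4, memory="8g")

# An ordinary single-node PyTorch model and data pipeline (toy example).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
train_loader = DataLoader(data, batch_size=32, shuffle=True)

# Wrap the model in an Orca Estimator; training is then distributed
# transparently over whatever resources the Orca context provides.
est = Estimator.from_torch(model=model, optimizer=optimizer, loss=nn.MSELoss())
est.fit(data=train_loader, epochs=2)

stop_orca_context()
```

The design point this sketch illustrates is that the data scientist writes standard PyTorch code, while hardware- and cluster-specific concerns (process placement, data partitioning, etc.) are handled by the runtime behind the Estimator abstraction.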