Transparent Serverless Execution of Python Multiprocessing Applications

Access transparency means that local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems behind transparency would bring substantial benefits, such as scaling out local-parallel scientific applications over flexible disaggregated resources in the Cloud. This paper presents a performance evaluation that assesses the feasibility of access transparency over state-of-the-art disaggregated Cloud resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared state. To evaluate transparency, we run four unmodified applications in the Cloud: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, Pandaral.lel's dataframe, and ScikitLearn's Hyperparameter tuning. We compare the execution time and scalability of each application running over disaggregated resources with our library against the same application running with the single-machine Python multiprocessing library in a large VM. For equal resources, applications that use message-passing abstractions efficiently achieve comparable results despite the significant overheads of remote communication. Applications that make intensive use of shared memory do not perform well because of high remote memory latency. The results show that the design of Python's multiprocessing library is an enabler of transparency: legacy applications that use efficient disaggregated abstractions can transparently scale beyond the limited resources of a single VM for increased parallelism, without changing the underlying code or architecture.
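
To make the idea of access transparency concrete, the following is a minimal sketch rather than the library evaluated in the paper. It assumes a hypothetical helper `parallel_squares` whose only backend-specific part is the pool class it receives. The standard library's `ThreadPool` stands in here for a remote backend, purely because it exposes the same `Pool` interface. A serverless implementation would be expected to plug in the same way, but that is an assumption and not something this sketch demonstrates.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool  # stand-in for an alternative backend


def square(x):
    """Task body: pure function, communicates only via arguments and return value."""
    return x * x


def parallel_squares(values, pool_factory=mp.Pool, workers=2):
    """Map `square` over `values` using whichever pool class is supplied.

    `pool_factory` is a hypothetical parameter introduced for illustration:
    any class offering the multiprocessing Pool interface (constructor taking
    `processes`, context-manager support, and `map`) can be passed in
    without touching the application logic.
    """
    with pool_factory(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Same application code, two backends with identical operations.
    print(parallel_squares(range(8)))                           # local processes
    print(parallel_squares(range(8), pool_factory=ThreadPool))  # alternative backend
```

The task in this sketch exchanges data only through arguments and return values, which corresponds to the message-passing style that the abstract reports as performing comparably over disaggregated resources. A variant in which workers repeatedly read and write a shared object would pay the remote memory latency on each access.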
