Serverless architectures organized around loosely coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this paper, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing an actual Spark cluster. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.