Serverless applications can be particularly difficult to troubleshoot, as they are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple compositions. Each additional function or service in a serverless composition introduces a new possible fault source and a new layer that can obscure faults. Currently, serverless platforms offer only limited support for identifying runtime faults. Developers looking to observe their serverless compositions often have to rely on scattered logs and ambiguous error messages to pinpoint root causes. In this paper, we investigate the use of distributed tracing for improving the observability of faults in serverless applications. To this end, we first introduce a model for characterizing fault observability, then provide prototypical implementations of two tracing approaches: one developer-driven and one platform-supported. We evaluate both approaches against our model, measure the associated trade-offs (execution latency, resource utilization), and contribute new insights for troubleshooting serverless compositions.