Application telemetry refers to measurements taken from software systems to assess their performance, availability, correctness, efficiency, and other aspects useful to operators, and to troubleshoot them when they behave abnormally. Many modern observability platforms support dimensional models of telemetry signals, in which measurements are accompanied by additional dimensions that identify either the resources described by the telemetry or business-specific attributes of the activities (e.g., a customer identifier). However, most of these platforms lack any semantic understanding of the data because they capture no metadata about telemetry, from simple aspects such as units of measure or data types (all dimensions are treated as strings) to more complex concepts such as purpose policies. This limits the platforms' ability to provide a rich user experience, especially when working across different telemetry assets: for example, linking an anomaly in a time series with the corresponding subset of logs or traces requires semantic understanding of the dimensions in the respective data sets. In this paper, we describe a schema-first approach to application telemetry that is being implemented at Meta. It allows observability platforms to capture metadata about telemetry from the start and enables a wide range of functionality, including compile-time input validation, multi-signal correlation and cross-filtering, and even enforcement of privacy rules. We present a collection of design goals and demonstrate how the schema-first approach provides better trade-offs than many existing solutions in the industry.
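To make the idea concrete, the following is a minimal illustrative sketch of what "schema-first" could look like. It is not the system implemented at Meta. All names (`Field`, `Schema`, `cross_filter`, the example schemas) are hypothetical, and validation here happens at runtime in Python as a stand-in for the compile-time validation mentioned above. The sketch shows typed fields with units, rejection of mistyped input, and cross-filtering between two signals through a dimension they both declare.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Field:
    """Metadata for one schema field (hypothetical, for illustration)."""
    name: str
    type: type
    unit: str = ""              # e.g. "ms"; empty when no unit applies
    is_dimension: bool = False  # True for identifying dimensions


@dataclass(frozen=True)
class Schema:
    """A named set of fields that every record of a signal must follow."""
    name: str
    fields: Tuple[Field, ...]

    def validate(self, record: Dict[str, Any]) -> Dict[str, Any]:
        """Reject records with unknown, missing, or mistyped fields."""
        known = {f.name for f in self.fields}
        unknown = set(record) - known
        if unknown:
            raise ValueError(f"{self.name}: unknown fields {sorted(unknown)}")
        for f in self.fields:
            if f.name not in record:
                raise ValueError(f"{self.name}: missing field {f.name!r}")
            if not isinstance(record[f.name], f.type):
                raise TypeError(
                    f"{self.name}.{f.name}: expected {f.type.__name__}, "
                    f"got {type(record[f.name]).__name__}"
                )
        return record

    def dimensions(self) -> Set[str]:
        return {f.name for f in self.fields if f.is_dimension}


# A dimension defined once and shared by several signals, so the platform
# knows that "customer_id" means the same thing in both of them.
CUSTOMER_ID = Field("customer_id", int, is_dimension=True)

LATENCY = Schema("request_latency", (
    CUSTOMER_ID,
    Field("region", str, is_dimension=True),
    Field("latency", float, unit="ms"),
))
ERROR_LOG = Schema("error_log", (
    CUSTOMER_ID,
    Field("message", str),
))


def cross_filter(anomaly: Dict[str, Any], source: Schema, target: Schema,
                 rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Select rows of `target` that match `anomaly` on shared dimensions."""
    keys = source.dimensions() & target.dimensions()
    return [r for r in rows if all(r[k] == anomaly[k] for k in keys)]


anomaly = LATENCY.validate(
    {"customer_id": 42, "region": "eu", "latency": 950.0})
logs = [
    ERROR_LOG.validate({"customer_id": 42, "message": "timeout"}),
    ERROR_LOG.validate({"customer_id": 7, "message": "not found"}),
]
related = cross_filter(anomaly, LATENCY, ERROR_LOG, logs)
```

In this sketch, `related` contains only the log entry for customer 42, because `customer_id` is the single dimension both schemas declare. Passing `"42"` as a string instead of an integer would be rejected by `validate`, which is the kind of error a schema-unaware platform that treats all dimensions as strings cannot detect.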