In this paper, we present DiRecGNN, an attention-enhanced entity recommendation framework for monitoring cloud services at Microsoft. Specifically, we introduce the problem of recommending the optimal subset of attributes (dimensions) that an automated watchdog (monitor) should track for a cloud service. We also report how useful cloud service owners perceive this feature to be, along with lessons learned from deployment. To begin, we construct the heterogeneous monitor graph at production scale. The interactions among the entities in this graph are often characterized by limited structural and engagement information, which results in inferior performance of state-of-the-art approaches. Moreover, traditional methods fail to capture long-range dependencies between entities because of their homophilic nature. Therefore, we propose an attention-enhanced entity ranking model inspired by transformer architectures. Our model uses a multi-head attention mechanism to focus on heterogeneous neighbors and their attributes, and further attends to paths sampled using random walks to capture long-range dependencies. We also employ multi-faceted loss functions to optimize for relevant recommendations while respecting the inherent sparsity of the data. Empirical evaluations demonstrate significant improvements over existing methods, with our model achieving a 43.1% increase in MRR. Furthermore, product teams that used the feature perceived it as useful and rated it 4.5 out of 5.
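For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean reciprocal rank (MRR) is conventionally computed for a ranking task such as dimension recommendation. It illustrates the standard definition only; the abstract does not describe the evaluation protocol, so the function names, the monitor and dimension names, and the convention of scoring a query as 0 when no relevant item is ranked are all assumptions made for illustration.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0  # assumed convention: a miss contributes zero


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over all queries (here, one query per monitor)."""
    if len(rankings) != len(relevant_sets):
        raise ValueError("rankings and relevant_sets must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets))
    return total / len(rankings)


# Hypothetical example: recommended dimensions for three monitors.
rankings = [
    ["region", "api_name", "status_code"],  # first relevant item at position 1
    ["cluster", "region", "tenant"],        # first relevant item at position 2
    ["sku", "tenant", "cluster"],           # no relevant item ranked
]
relevant_sets = [{"region"}, {"region", "tenant"}, {"api_name"}]

mrr = mean_reciprocal_rank(rankings, relevant_sets)
print(f"MRR = {mrr:.3f}")
```

In this toy input the three monitors contribute reciprocal ranks of 1, 1/2, and 0, so a model that moves the correct dimension closer to the top of each list raises the average.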