Identifying high-quality webpages is fundamental for real-world search engines, since such pages can fulfil users' information needs with less cognitive burden. Early studies of \emph{webpage quality assessment} usually design hand-crafted features that may only work on particular categories of webpages (e.g., shopping websites, medical websites). They can hardly be applied to real-world search engines that serve trillions of webpages of various types and purposes. In this paper, we propose a novel layout-aware webpage quality assessment model that is currently deployed in our search engine. Intuitively, layout is a universal and critical dimension for assessing the quality of webpages across categories. Based on this, we directly employ the meta-data that describes a webpage, i.e., its Document Object Model (DOM) tree, as the input of our model. The DOM tree unifies the representation of webpages with different categories and purposes and indicates their layout. To assess webpage quality from complex DOM tree data, we propose a graph neural network (GNN) based method that extracts, in an end-to-end manner, rich layout-aware information indicative of webpage quality. Moreover, we improve the GNN method with an attentive readout function, external web categories, and a category-aware sampling method. We conduct rigorous offline and online experiments to show that our proposed solution is effective in real search engines, improving overall usability and user experience.
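To make the pipeline described above more concrete, the following is a minimal sketch of scoring a page from its DOM tree with one round of message passing followed by an attentive readout. It is not the deployed model: the names `DomGraphBuilder`, `TAG_VOCAB`, `message_pass`, `attentive_readout`, and `quality_score` are hypothetical, the node features are a toy one-hot encoding of tag names, mean aggregation stands in for whatever GNN layer the authors use, and the `query` and `out_weights` vectors are untrained placeholders. The external web categories and the category-aware sampling are omitted.

```python
import math
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # elements with no end tag
TAG_VOCAB = ["html", "body", "div", "p", "a", "img"]      # hypothetical toy vocabulary


class DomGraphBuilder(HTMLParser):
    """Collects DOM element nodes and parent-child edges from raw HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []    # tags[i] is the tag name of node i
        self.edges = []   # (parent, child) index pairs
        self._stack = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        node = len(self.tags)
        self.tags.append(tag)
        if self._stack:
            self.edges.append((self._stack[-1], node))
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating sloppy HTML.
        for i in range(len(self._stack) - 1, -1, -1):
            if self.tags[self._stack[i]] == tag:
                del self._stack[i:]
                break


def node_features(tag):
    """Toy one-hot tag encoding with a trailing 'other' bucket (an assumption)."""
    vec = [1.0 if tag == t else 0.0 for t in TAG_VOCAB]
    vec.append(0.0 if tag in TAG_VOCAB else 1.0)
    return vec


def message_pass(features, edges):
    """One round of mean aggregation over each node and its tree neighbours."""
    neighbours = [[i] for i in range(len(features))]  # include a self-loop
    for parent, child in edges:
        neighbours[parent].append(child)
        neighbours[child].append(parent)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]


def attentive_readout(hidden, query):
    """Softmax attention over nodes, then a weighted sum into one graph vector."""
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def quality_score(html, query, out_weights):
    """Returns (score in (0, 1), per-node attention weights) for one page."""
    builder = DomGraphBuilder()
    builder.feed(html)
    if not builder.tags:
        raise ValueError("no DOM elements found")
    hidden = message_pass([node_features(t) for t in builder.tags], builder.edges)
    pooled, weights = attentive_readout(hidden, query)
    logit = sum(p * w for p, w in zip(pooled, out_weights))
    return 1.0 / (1.0 + math.exp(-logit)), weights


if __name__ == "__main__":
    page = "<html><body><div><p>Hi</p><img src='x.png'></div></body></html>"
    dim = len(TAG_VOCAB) + 1
    score, attention = quality_score(page, [0.1] * dim, [0.5] * dim)  # untrained weights
    print(score, attention)
```

In a trained model the query and output vectors would be learned jointly with the GNN layers. The attention weights returned here are what would let a readout emphasise layout-relevant nodes instead of averaging all nodes equally.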