A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity, but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this paper, we show that behavioral cloning of randomly sampled trajectories is sufficient to learn an effective link selection policy. We demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges. The model is able to efficiently navigate between nodes 5 and 20 steps apart 96% and 92% of the time, respectively. We then use the resulting embeddings and policy in downstream fact verification and question answering tasks where, in combination with basic TF-IDF search and ranking methods, they yield results competitive with state-of-the-art methods.
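To make the training recipe concrete, the sketch below illustrates behavioral cloning from randomly sampled trajectories on a toy hyperlink graph: random walks are sampled, their endpoints are relabeled as goals, and a link scorer is fit to imitate the link each walk actually took. The graph, the random node embeddings, the feature function, and the logistic scorer are all illustrative assumptions for exposition, not the paper's actual data or model architecture.

```python
import random
import numpy as np

# Toy directed graph: node -> outgoing links (a stand-in for Wikipedia hyperlinks).
GRAPH = {0: [1, 2], 1: [0, 3], 2: [3, 4], 3: [4, 5], 4: [5, 0], 5: [1]}
DIM = 8
rng = np.random.default_rng(0)
# Hypothetical node embeddings; in practice these would be derived from page content.
EMB = {n: rng.normal(size=DIM) for n in GRAPH}

def sample_trajectory(length=5):
    """Random walk over hyperlinks; its endpoint is later treated as the goal."""
    path = [random.choice(list(GRAPH))]
    for _ in range(length):
        path.append(random.choice(GRAPH[path[-1]]))
    return path

def features(cur, cand, goal):
    # Relate a candidate link's embedding to the current node and the goal node.
    return np.concatenate([EMB[cur] * EMB[cand], EMB[cand] * EMB[goal]])

# Behavioral cloning: a logistic scorer is pushed to rank the link the random walk
# actually took above the other outgoing links of the same node.
w = np.zeros(2 * DIM)
for _ in range(2000):
    traj = sample_trajectory()
    goal = traj[-1]
    for cur, taken in zip(traj, traj[1:]):
        for cand in GRAPH[cur]:
            x = features(cur, cand, goal)
            p = 1.0 / (1.0 + np.exp(-w @ x))
            y = 1.0 if cand == taken else 0.0
            w += 0.05 * (y - p) * x  # gradient step on the log-loss

def policy(cur, goal):
    """Greedy link selection with the learned scorer."""
    return max(GRAPH[cur], key=lambda c: w @ features(cur, c, goal))
```

At navigation time the policy is applied greedily, repeatedly following the highest-scoring outgoing link until the goal is reached or a step budget is exhausted.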