Consider an agent exploring an unknown graph in search of some goal state. As it walks around the graph, it learns the nodes and their neighbors. The agent only learns where the goal state is when it reaches it. How can the agent reach the goal while moving only a small distance? This problem seems hopeless, even on trees of bounded degree, unless we give the agent some help. This setting with "help" often arises in exploring large search spaces (e.g., huge game trees), where we assume access to some score/quality function for each node, which we use to guide us towards the goal. In our case, we assume the help comes in the form of distance predictions: each node $v$ provides a prediction $f(v)$ of its distance to the goal vertex. Naturally, if these predictions are correct, we can reach the goal along a shortest path. What if the predictions are unreliable and some of them are erroneous? Can we get an algorithm whose performance relates to the error of the predictions? In this work, we consider the problem on trees and give deterministic algorithms whose total movement cost is only $O(OPT + \Delta \cdot ERR)$, where $OPT$ is the distance from the start to the goal vertex, $\Delta$ is the maximum degree, and $ERR$ is the total number of vertices whose predictions are erroneous. We show this guarantee is optimal. We then consider a "planning" version of the problem where the graph and predictions are known at the beginning, so the agent can use this global information to devise a search strategy of low cost. For this planning version, we go beyond trees and give an algorithm that achieves good performance on (weighted) graphs with bounded doubling dimension.
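To make the prediction model concrete, the following is a minimal Python sketch of the easy case only, where every prediction is correct. It is not the algorithm of this work. The adjacency-dictionary representation and the names `true_distances`, `count_errors`, and `greedy_walk` are assumptions introduced for illustration. The sketch counts $ERR$ as the number of vertices $v$ with $f(v)$ different from the true distance, and walks greedily towards the neighbor with the smallest prediction.

```python
from collections import deque


def true_distances(adj, goal):
    """BFS distance from every vertex to the goal (unweighted graph)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def count_errors(adj, goal, f):
    """ERR: number of vertices whose prediction differs from the true distance."""
    dist = true_distances(adj, goal)
    return sum(1 for v in adj if f[v] != dist[v])


def greedy_walk(adj, start, goal, f):
    """Repeatedly move to the neighbor with the smallest prediction.

    Only guaranteed to reach the goal when all predictions are exact;
    returns None if the goal is not reached within len(adj) moves.
    """
    path = [start]
    v = start
    for _ in range(len(adj)):
        if v == goal:  # the agent recognizes the goal only on arrival
            return path
        v = min(adj[v], key=lambda w: f[w])
        path.append(v)
    return path if v == goal else None


# Hypothetical example tree: edges 0-1, 0-2, 1-3, 1-4; start 2, goal 4.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
f = true_distances(adj, goal=4)          # exact predictions, so ERR = 0
path = greedy_walk(adj, start=2, goal=4, f=f)
print(path, len(path) - 1)               # movement cost equals OPT here
```

With exact predictions the walk uses exactly $OPT$ moves. Once some predictions are wrong, this greedy rule can cycle or fail, which is the regime the $O(OPT + \Delta \cdot ERR)$ guarantee addresses.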