We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way. We expect interactions between allocation targets to be important and therefore propose to learn the reward structure for near-optimal allocation policies with a GNN. By relaxing the resource constraint, we can employ gradient-based optimization in contrast to more standard evolutionary algorithms. Our algorithm is motivated by a problem in modern astronomy, where one needs to select-based on limited initial information-among $10^9$ galaxies those whose detailed measurement will lead to optimal inference of the composition of the universe. Our technique presents a way of flexibly learning an allocation strategy by only requiring forward simulators for the physics of interest and the measurement process. We anticipate that our technique will also find applications in a range of resource allocation problems.
翻译:通过学习如何在不受监督的情况下分配资源,我们提出了一个最大限度地实现全球效用功能的方法。我们期望分配目标之间的相互作用是重要的,因此,我们提议学习接近最佳分配政策的奖励结构。通过放松资源限制,我们可以采用基于梯度的优化方法,而不是更标准的演化算法。我们的算法是由现代天文学中的一个问题驱动的,在现代天文学中,人们需要根据有限的初步信息选择10美9美分的星系,这些星系的详细测量将导致对宇宙构成的最佳推论。我们的技术是一种灵活学习分配战略的方法,只要求对利益物理学和测量过程进行前瞻模拟。我们预计,我们的技术也将在一系列资源分配问题上找到应用。