Neural architecture search (NAS) automates the design of deep neural networks. A key challenge in searching over complex, non-continuous architectures is comparing the similarity of networks, which the conventional Euclidean metric may fail to capture. Optimal transport (OT) is resilient to such complex structures because it considers the minimal cost of transporting one network into another. However, OT is generally not negative definite, which may limit its ability to build the positive-definite kernels required in many kernel-dependent frameworks. Building upon tree-Wasserstein (TW), a negative definite variant of OT, we develop a novel discrepancy for neural architectures and demonstrate it within a Gaussian process surrogate model for the sequential NAS setting. Furthermore, we derive a novel parallel NAS method that uses a quality k-determinantal point process on the GP posterior to select diverse and high-performing architectures from a discrete set of candidates. Empirically, we demonstrate that our TW-based approaches outperform other baselines in both sequential and parallel NAS.
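As a minimal illustration of the ingredient underlying the proposed discrepancy (not the paper's actual implementation, which operates on architecture representations), the tree-Wasserstein distance between two distributions supported on the nodes of a weighted tree admits a closed form: the sum over edges of the edge weight times the absolute difference of the two distributions' masses in the subtree below that edge. A hedged sketch, assuming a tree given as a `parent` map:

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between distributions mu and nu on tree nodes.

    parent: dict mapping each non-root node to its parent node
    weight: dict mapping each non-root node to the weight of its parent edge
    mu, nu: dicts mapping node -> probability mass (missing nodes have mass 0)
    """
    # Build child lists so we can traverse each subtree.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def subtree_mass_diff(v):
        # Total (mu - nu) mass in the subtree rooted at v.
        d = mu.get(v, 0.0) - nu.get(v, 0.0)
        for c in children.get(v, []):
            d += subtree_mass_diff(c)
        return d

    # Each edge (v, parent[v]) cuts off the subtree rooted at v;
    # TW = sum over edges of w_e * |mass difference across the cut|.
    return sum(weight[v] * abs(subtree_mass_diff(v)) for v in parent)
```

For example, on a star tree with root `r` and leaves `a`, `b` at unit edge weight, moving all mass from `a` to `b` costs 2.0 (up one edge, down another). Because this quantity is negative definite, `exp(-TW / l)` yields a positive-definite kernel suitable for a GP surrogate, which is the property the abstract exploits.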