Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, the optimization and design of biological sequences with specific functional or structural properties have a profound impact on medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. To improve the performance and sample efficiency of these algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations: a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn these representations, we consider two different settings for updating our surrogate model. First, we use an adversarial online regression setting in which the Fourier characters of each representation are treated as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting in which queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments on synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts while substantially improving computational cost and/or sample efficiency.
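To make the first (online regression) setting concrete, the following is a minimal sketch and not the authors' implementation. It assumes inputs already encoded as vectors in {-1, +1}^d, uses parity monomials up to a chosen order as the Boolean Fourier characters, and applies an exponentiated-gradient update with paired positive and negative weights under squared loss. The function names, the learning rate `eta`, the order cutoff `max_order`, and the normalization of the weights to sum to one are all illustrative assumptions.

```python
import itertools
import math


def characters(x, max_order=2):
    """Evaluate every parity character of order <= max_order on x in {-1,+1}^d."""
    feats = []
    for order in range(max_order + 1):
        for subset in itertools.combinations(range(len(x)), order):
            # The empty subset yields the constant character 1.
            feats.append(math.prod(x[i] for i in subset))
    return feats


def predict(w_pos, w_neg, feats):
    """Surrogate value: each character's coefficient is w_pos - w_neg."""
    return sum((p - n) * f for p, n, f in zip(w_pos, w_neg, feats))


def eg_update(w_pos, w_neg, feats, y, eta=0.1):
    """One exponential weight step on squared loss after observing value y."""
    grad = 2.0 * (predict(w_pos, w_neg, feats) - y)
    new_pos = [p * math.exp(-eta * grad * f) for p, f in zip(w_pos, feats)]
    new_neg = [n * math.exp(eta * grad * f) for n, f in zip(w_neg, feats)]
    z = sum(new_pos) + sum(new_neg)  # renormalize so all weights sum to 1
    return [p / z for p in new_pos], [n / z for n in new_neg]


def fit_surrogate(points, values, max_order=2, epochs=200, eta=0.1):
    """Replay (point, value) pairs as if they were successive black-box queries."""
    n_feats = len(characters(points[0], max_order))
    w_pos = [1.0 / (2 * n_feats)] * n_feats
    w_neg = [1.0 / (2 * n_feats)] * n_feats
    for _ in range(epochs):
        for x, y in zip(points, values):
            w_pos, w_neg = eg_update(w_pos, w_neg, characters(x, max_order), y, eta)
    return w_pos, w_neg
```

Because the weights are normalized, this sketch can only represent targets whose coefficient magnitudes sum to at most one; a scaling factor on the prediction would be needed otherwise. In a full optimization loop, the fitted surrogate would be handed to a search routine such as SA or MCTS in place of direct black-box calls.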