Towards on-sky adaptive optics control using reinforcement learning

J. Nousiainen, C. Rajani, M. Kasper, T. Helin, S. Y. Haffert, C. Vérinaud, J. R. Males, K. Van Gorkom, L. M. Close, J. D. Long, A. D. Hedglen, O. Guyon, L. Schatz, M. Kautz, J. Lumbres, A. Rodack, J. M. Knight, K. Miller

The direct imaging of potentially habitable exoplanets is one prime science case for the next generation of high-contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems, which will control thousands of actuators at frame rates from one to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the control laws of current XAO systems leave strong residuals. Current AO control strategies, such as static matrix-based wavefront reconstruction and integrator control, suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function. We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance in simulation and in the laboratory, improving the coronagraphic contrast in numerical simulations by factors of 3-5 within the control region of the deformable mirror (DM) and Pyramid wavefront sensor (WFS). The presented method is also quick to train, typically on timescales of 5-10 seconds, and its inference time is short enough (below a millisecond) for real-time XAO control with currently available hardware, even for extremely large telescopes.
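
The temporal delay error of the baseline controller named in the abstract (static matrix-based reconstruction followed by an integrator) can be illustrated with a toy closed loop. The sketch below is not PO4AO and not the authors' simulation: the random interaction matrix, the AR(1) turbulence model, the correlation value `alpha`, and the function name `residual_rms` are all assumptions chosen for illustration only.

```python
from collections import deque

import numpy as np


def residual_rms(gain, delay, n_steps=2000, n_modes=8, seed=0):
    """Run a toy integrator loop and return the RMS of the residual wavefront."""
    rng = np.random.default_rng(seed)
    # Hypothetical linear sensor: slopes = interaction @ residual.
    interaction = rng.normal(size=(2 * n_modes, n_modes))
    # Static matrix-based reconstruction: pseudo-inverse of the interaction matrix.
    reconstructor = np.linalg.pinv(interaction)

    alpha = 0.99  # assumed frame-to-frame correlation of the toy turbulence
    turbulence = np.zeros(n_modes)
    command = np.zeros(n_modes)
    # Commands that have been computed but have not yet reached the mirror.
    in_flight = deque(np.zeros(n_modes) for _ in range(delay))
    squared = []
    for step in range(n_steps):
        # AR(1) turbulence with unit stationary variance per mode.
        noise = rng.normal(size=n_modes)
        turbulence = alpha * turbulence + np.sqrt(1 - alpha**2) * noise
        # The newest command enters the pipeline; the oldest one is applied.
        in_flight.append(command)
        applied = in_flight.popleft()
        residual = turbulence - applied
        slopes = interaction @ residual
        # Integrator control law: accumulate the reconstructed residual.
        command = command + gain * (reconstructor @ slopes)
        if step >= 200:  # skip the initial transient
            squared.append(np.mean(residual**2))
    return float(np.sqrt(np.mean(squared)))


if __name__ == "__main__":
    for extra_delay in (0, 1, 2):
        print(extra_delay, residual_rms(gain=0.4, delay=extra_delay))
```

In this toy setup the residual grows as extra frames of delay are added, because the integrator always corrects a wavefront that has already evolved. A predictive controller such as the learned policy described in the abstract is meant to reduce this kind of error; the sketch only shows the baseline behavior.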
