We establish a dataset of over $1.6\times10^4$ experimental images of Bose-Einstein condensates containing solitonic excitations to enable machine learning (ML) for many-body physics research. About 33 % of this dataset has manually assigned and carefully curated labels. The remainder is automatically labeled using SolDet, an implementation of a physics-informed ML data analysis framework that consists of a convolutional-neural-network-based classifier and object detector, a statistically motivated physics-informed classifier, and a quality metric. This technical note constitutes the definitive reference for the dataset, giving the data science community an opportunity to develop more sophisticated analysis tools, to further understand nonlinear many-body physics, and even to advance cold-atom experiments.