We introduce an approach to the targeted completion of lacunae in molecular data sets, driven by topological data analysis methods such as the Mapper algorithm. Lacunae are filled using scaffold-constrained generative models trained with different scoring functions. The approach enables the addition of links and vertices to skeletonized representations of the data, such as the Mapper graph, and falls within the broad category of network completion methods. We illustrate the topology-driven data completion strategy by creating a lacuna in a data set of onium cations extracted from USPTO patents and then repairing it.
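
To make the idea of a lacuna in a Mapper graph concrete, the following is a minimal toy sketch, not the pipeline described above. It assumes a one-dimensional descriptor used directly as the filter function, a cover of overlapping intervals, and a simple gap-based clustering; the names `mapper_1d` and `n_components` and all parameter values are hypothetical. The generative step is not modeled: "repair" here just means adding the held-out values back.

```python
from itertools import combinations


def mapper_1d(values, n_intervals=5, overlap=0.3, eps=0.6):
    """Toy Mapper graph for 1D data (assumed setup, identity filter).

    Nodes are clusters of point indices found inside each overlapping
    interval of the cover; two nodes are linked when they share a point.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = overlap * width  # how far each interval extends into its neighbors
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - pad
        b = lo + (k + 1) * width + pad
        members = sorted(
            (i for i, v in enumerate(values) if a <= v <= b),
            key=lambda i: values[i],
        )
        # Gap-based clustering: start a new cluster when the gap exceeds eps.
        cluster = []
        for i in members:
            if cluster and values[i] - values[cluster[-1]] > eps:
                nodes.append(frozenset(cluster))
                cluster = []
            cluster.append(i)
        if cluster:
            nodes.append(frozenset(cluster))
    edges = {
        (p, q)
        for p, q in combinations(range(len(nodes)), 2)
        if nodes[p] & nodes[q]
    }
    return nodes, edges


def n_components(nodes, edges):
    """Count connected components of the graph with a small union-find."""
    parent = list(range(len(nodes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in edges:
        parent[find(p)] = find(q)
    return len({find(x) for x in range(len(nodes))})


# Hypothetical descriptor values on a regular grid from 0.0 to 10.0.
full = [0.5 * i for i in range(21)]
# Create a lacuna by removing everything strictly between 3.0 and 7.0.
lacuna = [v for v in full if not (3.0 < v < 7.0)]
# "Repair" by adding the held-out values back (stand-in for generated molecules).
repaired = lacuna + [v for v in full if 3.0 < v < 7.0]

graph_full = mapper_1d(full)
graph_lacuna = mapper_1d(lacuna)
graph_repaired = mapper_1d(repaired)
```

In this toy setting the lacuna removes one vertex and its two links, splitting the graph into two components, and adding the missing points restores a single connected graph. Real molecular data would need a chemically meaningful filter and distance, which this sketch does not attempt.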