Dimensionality reduction methods are powerful tools for describing the main patterns in high-dimensional big data. One of these methods is topological data analysis (TDA), which models the shape of the data in terms of its topological properties. Specifically, TDA translates the original data into a two-dimensional system, which is represented graphically by the 'persistence diagram'. The outlier points on this diagram represent the data pattern, whereas the other points behave as random noise. To determine which points are significant outliers, replications of the original data set are needed. When only one original data set is available, replications can be created by fitting a model to the points on the persistence diagram and then sampling from it with MCMC methods. One such model is RST (Replicating Statistical Topology). In this paper we suggest a modification of the RST model. Using a simulation study, we show that the modified RST improves on the RST in terms of goodness of fit. We use the MCMC Metropolis-Hastings algorithm to sample according to the fitted model.
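The sampling step can be illustrated with a minimal sketch, assuming a random-walk Metropolis-Hastings sampler over single persistence-diagram points (birth, death). The target density `log_target` is a hypothetical stand-in, not the RST model or its modification: it assumes birth ~ Exp(1) and persistence (death minus birth) ~ Exp(2), restricted to the region above the diagonal. The function names, starting point, and step size are illustrative choices.

```python
import math
import random


def log_target(birth, death):
    """Hypothetical unnormalized log-density for one persistence-diagram point.

    Illustrative stand-in for a fitted model (NOT the RST model):
    birth ~ Exp(1) and persistence = death - birth ~ Exp(2),
    restricted to 0 <= birth < death.
    """
    if birth < 0.0 or death <= birth:
        return -math.inf  # no mass on or below the diagonal
    return -birth - 2.0 * (death - birth)


def metropolis_hastings(log_density, start, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the chain of (birth, death) points and the acceptance rate.
    """
    rng = random.Random(seed)
    current = start
    current_logp = log_density(*current)  # start must have finite density
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Propose a move by perturbing both coordinates independently.
        proposal = (current[0] + rng.gauss(0.0, step),
                    current[1] + rng.gauss(0.0, step))
        proposal_logp = log_density(*proposal)
        # Symmetric proposal: the acceptance ratio reduces to the target ratio.
        log_alpha = proposal_logp - current_logp
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current, current_logp = proposal, proposal_logp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_samples


if __name__ == "__main__":
    chain, rate = metropolis_hastings(log_target, start=(0.5, 1.0), n_samples=20000)
    kept = chain[2000:]  # discard burn-in
    mean_persistence = sum(d - b for b, d in kept) / len(kept)
    print(f"acceptance rate: {rate:.2f}")
    print(f"mean persistence: {mean_persistence:.3f}")
```

Because the proposal is symmetric, the Hastings correction cancels and only the ratio of target densities is needed; proposals that fall below the diagonal get density zero and are always rejected. Under the assumed model, the reported mean persistence should be near 0.5, the mean of the Exp(2) component; a real application would replace `log_target` with the fitted diagram model.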