The importance of aggregated count data, which is calculated from the data of multiple individuals, continues to increase. Collective Graphical Model (CGM) is a probabilistic approach to the analysis of aggregated data. One of the most important operations in CGM is maximum a posteriori (MAP) inference of unobserved variables under given observations. Because the MAP inference problem for general CGMs has been shown to be NP-hard, an approach that solves an approximate problem has been proposed. However, this approach has two major drawbacks. First, the quality of the solution deteriorates when the values in the count tables are small, because the approximation becomes inaccurate. Second, since continuous relaxation is applied, the integrality constraints of the output are violated. To resolve these problems, this paper proposes a new method for MAP inference for CGMs on path graphs. First we show that the MAP inference problem can be formulated as a (non-linear) minimum cost flow problem. Then, we apply Difference of Convex Algorithm (DCA), which is a general methodology to minimize a function represented as the sum of a convex function and a concave function. In our algorithm, important subroutines in DCA can be efficiently calculated by minimum convex cost flow algorithms. Experiments show that the proposed method outputs higher quality solutions than the conventional approach.
翻译:根据多个个人的数据计算出的综合计数数据的重要性继续增加。 集体图形模型(CGM)是分析汇总数据的一种概率方法。 在 CGM 中,最重要的操作之一是根据给定的观察,对未观测的变量进行事后(MAP)的最大推断。 因为对通用的CGM的计算问题被证明是硬的, 一种解决近似问题的方法已经提出。 但是, 这种方法有两个主要的缺点。 首先, 当计数表中的数值很小时, 解决方案的质量会恶化, 因为近似会变得不准确。 其次, 由于持续放松, 产出的综合性限制会受到侵犯。 要解决这些问题, 本文建议了一种新方法, MAP在路径图上对CGGM的推论问题进行推论。 首先, 我们表明, MAP的推论问题可以被描述为一种( 非线性) 最低成本流。 然后, 我们应用Convex Algoithm (DCA) 的差异, 这是一种普通方法, 以最大限度地减少常规算算算算法中的重要的排序函数。