Finding the origin of short phrases propagating through the web has been formalized by Leskovec et al. [ACM SIGKDD 2009] as DAG Partitioning: given an arc-weighted directed acyclic graph on $n$ vertices and $m$ arcs, delete arcs with total weight at most $k$ such that each resulting weakly-connected component contains exactly one sink---a vertex without outgoing arcs. DAG Partitioning is NP-hard. We present an algorithm that solves DAG Partitioning in $O(2^k \cdot (n+m))$ time, that is, in linear time for fixed $k$. We complement it with data reduction rules that are executable in linear time. Our experiments show that, in combination, they can optimally solve DAG Partitioning on simulated citation networks within five minutes for $k\leq 190$ and $m$ of order $10^7$ and larger. We use the obtained optimal solutions to evaluate the solution quality of Leskovec et al.'s heuristic. We show that Leskovec et al.'s heuristic works optimally on trees and generalize this result by showing that DAG Partitioning is solvable in $2^{O(w^2)}\cdot n$ time if a width-$w$ tree decomposition of the input graph is given. Thus, we improve an algorithm and answer an open question of Alamdari and Mehrabian [WAW 2012]. We complement our algorithms with lower bounds on the running time of exact algorithms and on the effectiveness of data reduction.
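To make the problem definition concrete, the following is a minimal sketch (not the paper's $O(2^k \cdot (n+m))$ algorithm): a feasibility check that verifies, after a set of arcs is deleted, that every weakly-connected component contains exactly one sink, plus a naive brute-force solver over all arc subsets. Function names and the toy instance are illustrative assumptions, not from the paper.

```python
from itertools import combinations

def is_valid_partition(n, arcs, removed):
    """Check the DAG Partitioning condition: after deleting the arcs whose
    indices are in `removed`, every weakly-connected component of the
    remaining graph must contain exactly one sink (vertex with out-degree 0).
    arcs is a list of (u, v, weight) triples over vertices 0..n-1."""
    kept = [a for i, a in enumerate(arcs) if i not in removed]

    # Union-find over weakly-connected components (arc direction ignored).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v, _ in kept:
        parent[find(u)] = find(v)

    out_deg = [0] * n
    for u, _, _ in kept:
        out_deg[u] += 1

    sinks_per_comp = {}
    for v in range(n):
        root = find(v)
        sinks_per_comp.setdefault(root, 0)
        if out_deg[v] == 0:
            sinks_per_comp[root] += 1
    return all(count == 1 for count in sinks_per_comp.values())

def brute_force_dag_partitioning(n, arcs):
    """Exhaustively try all arc subsets; return the minimum total weight of
    a deleted arc set yielding a valid partition (exponential in m -- for
    illustration only, unlike the fixed-parameter algorithm in the paper)."""
    best = None
    for r in range(len(arcs) + 1):
        for removed in combinations(range(len(arcs)), r):
            if is_valid_partition(n, arcs, set(removed)):
                weight = sum(arcs[i][2] for i in removed)
                if best is None or weight < best:
                    best = weight
    return best

# Toy instance: vertices 0..3, sinks 2 and 3 share one weak component,
# so at least one arc must be deleted to separate them.
arcs = [(0, 2, 1), (0, 3, 1), (1, 3, 1)]
print(brute_force_dag_partitioning(4, arcs))  # minimum deletion weight: 1
```

Deleting the single arc $(0,2)$ splits the graph into one component with sink 3 and an isolated vertex 2 (itself a sink), which is why the optimum here is 1.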