Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the computational problem of determining the TV distance between two product distributions over the domain $\{0,1\}^n$. We establish the following results. 1. Exact computation of TV distance between two product distributions is $\#\mathsf{P}$-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals. 2. Given two product distributions $P$ and $Q$ with marginals of $P$ being at least $1/2$ and marginals of $Q$ being at most the respective marginals of $P$, there exists a fully polynomial-time randomized approximation scheme (FPRAS) for computing the TV distance between $P$ and $Q$. In particular, this leads to an efficient approximation scheme for the interesting case when $P$ is an arbitrary product distribution and $Q$ is the uniform distribution. We pose the question of characterizing the complexity of approximating the TV distance between two arbitrary product distributions as a basic open problem in computational statistics.
翻译:总变差距离(TV距离)是概率分布之间的距离的基本概念。 在这项工作中,我们提出并研究计算确定两个产品在域内分配之间的电视距离的计算问题,我们确定以下结果。 1. 两种产品分配之间的电视距离精确计算为$mathsf{P}$已完成,这与其他距离措施截然相反,如KL、Chi-square和Hellinger等,它们拉长于边际。 2. 鉴于两个产品分布为$和Q$,其边际至少1/2美元,而$的边际为$的边际为$,而$的边际最多为$的边际为$的边际为$的边际,我们提出了一个计算P$和$之间的电视距离的全多盘点随机调整方案(FPRAS),特别是当美元是任意产品分配和$Q美元是统一分布时,这就形成了一个有趣的近似方案。我们提出了一个问题,即如何说明在两次任意计算基本产品分布的公开距离时,对电视分配的距离的复杂程度进行定性。