We consider a novel challenge: approximating a distribution without the ability to randomly sample from that distribution. We study how such an approximation can be obtained using *weight queries*. Given some data set of examples, a weight query presents one of the examples to an oracle, which returns the probability, according to the target distribution, of observing examples similar to the presented example. This oracle can represent, for instance, counting queries to a database of the target population, or an interface to a search engine which returns the number of results that match a given search. We propose an interactive algorithm that iteratively selects data set examples and performs corresponding weight queries. The algorithm finds a reweighting of the data set that approximates the weights according to the target distribution, using a limited number of weight queries. We derive an approximation bound on the total variation distance between the reweighting found by the algorithm and the best achievable reweighting. Our algorithm takes inspiration from the UCB approach common in multi-armed bandits problems, and combines it with a new discrepancy estimator and a greedy iterative procedure. In addition to our theoretical guarantees, we demonstrate in experiments the advantages of the proposed algorithm over several baselines. A python implementation of the proposed algorithm and of all the experiments can be found at https://github.com/Nadav-Barak/AWP.
翻译:我们考虑了一个新的挑战: 近似分布, 无法从分布中随机抽样。 我们研究如何使用* 重量查询* 来获得这种近似。 根据一些数据集, 重量查询将一个示例带给一个神器, 根据目标分布, 返回类似所展示的例子的概率。 这个神器可以代表对目标人口数据库的查询进行计数, 或连接一个检索引擎的界面, 返回匹配搜索结果的数量。 我们提议一个交互式算法, 迭接地选择数据集示例, 并进行相应的重量查询。 算法会找到一组按照目标分布来接近重量的数据集的重新加权值, 使用数量有限的重量查询。 我们根据算法发现的总变异距离和最佳可实现的重标值。 我们的算法可以从多臂土匪问题中常见的 UCB 方法中得到启发, 并且将它与新的差异估计器和贪婪的迭代程序结合起来。 除了我们的理论保证外, 我们还可以通过实验A/ ALA 的实验和 ALA/ pqual 所找到的一些基准。