We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an $\varepsilon$-approximately optimal (Bayesian) aggregator. We study the sample complexity of this problem. We show that, for arbitrary discrete distributions, the number of samples must be at least $\tilde \Omega(m^{n-2} / \varepsilon)$, where $m$ is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts $n$. But if experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to $\tilde O(1 / \varepsilon^2)$, which does not depend on $n$.
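To make the conditionally independent case concrete, here is a minimal sketch (not from the paper) of the classical Bayesian aggregator when experts' signals are independent conditioned on the event and the prior is known: the aggregate posterior odds are the product of the experts' posterior odds, divided by the prior odds raised to the power $n-1$. The function name and the assumption of a known prior are illustrative, not part of the model above, where the principal must instead learn an approximately optimal aggregator from samples.

```python
def bayes_aggregate(reports, prior):
    """Aggregate experts' posterior probabilities for a binary event,
    assuming signals are conditionally independent given the event
    and the common prior is known.

    reports -- list of each expert's posterior probability of the event
    prior   -- common prior probability of the event
    """
    n = len(reports)
    prior_odds = prior / (1 - prior)
    # Posterior odds = prior_odds^(1 - n) * product of experts' odds.
    odds = prior_odds ** (1 - n)
    for q in reports:
        odds *= q / (1 - q)
    # Convert odds back to a probability.
    return odds / (1 + odds)
```

For example, with a uniform prior and two experts who each report 0.8, the aggregate is $16/17 \approx 0.94$: each report is independent evidence, so the combined belief is more extreme than either individual report.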