\textit{Binscatter}, or a binned scatter plot, is a very popular tool in applied microeconomics. It provides a flexible yet parsimonious way of visualizing and summarizing mean, quantile, and other nonparametric regression functions in large data sets. It is also often used for informal evaluation of substantive hypotheses such as linearity or monotonicity of the unknown function. This paper presents a foundational econometric analysis of binscatter, offering an array of theoretical and practical results that aid both in understanding current practices (i.e., their validity or lack thereof) and in guiding future applications. In particular, we highlight important methodological problems related to the covariate adjustment methods used in current practice, and provide a simple, valid approach. Our results include a principled choice for the number of bins, confidence intervals and bands, and hypothesis tests for parametric and shape restrictions for mean, quantile, and other functions of interest, among other new methods, all applicable to canonical binscatter as well as to its nonlinear, higher-order polynomial, smoothness-restricted, and covariate-adjusted extensions. Companion general-purpose software packages for \texttt{Python}, \texttt{R}, and \texttt{Stata} are provided. From a technical perspective, we present novel theoretical results for possibly nonlinear semi-parametric partitioning-based series estimation with random partitions that are of independent interest.