The optimal sample allocation in stratified sampling is one of the basic issues of modern survey sampling methodology. It is the procedure of dividing the total sample among pairwise disjoint subsets of a finite population, called strata, such that, for the survey sampling designs chosen in the strata, it yields the smallest variance of the estimator of a population total (or mean) of a given study variable. In this paper we are concerned with the optimal allocation of a sample under lower and upper bounds imposed jointly on the sample strata-sizes. We consider a family of sampling designs that give rise to variances of estimators of a natural generic form. In particular, this family includes simple random sampling without replacement (abbreviated as SI) in strata, which is perhaps the most important example of a stratified sampling design. First, we identify the allocation problem as a convex optimization problem. This methodology allows us to establish a generic form of the optimal solution, the so-called optimality conditions. Second, based on these optimality conditions, we propose a new and efficient recursive algorithm, named RNABOX, which solves the allocation problem considered. This new algorithm can be viewed as a generalization of the classical recursive Neyman allocation algorithm, a popular tool for optimal sample allocation in stratified sampling with SI design in all strata when only upper bounds are imposed on the sample strata-sizes. We implement RNABOX in R as part of our package stratallo, which is available from the Comprehensive R Archive Network (CRAN). Finally, in the context of the established optimality conditions, we briefly discuss two existing methodologies dedicated to the allocation problem being studied: the noptcond algorithm introduced in Gabler, Ganninger and Münnich (2012), and the fixed iteration procedures from Münnich, Sachs and Wagner (2012).
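To make the problem concrete, here is a minimal Python sketch under stated assumptions. It assumes the variance has the form sum_h a_h^2 / x_h minus a constant, which is what SI in strata gives with a_h = N_h S_h (N_h the stratum size, S_h the stratum standard deviation of the study variable). It then minimises this over allocations x with sum_h x_h = n and m_h <= x_h <= M_h. The function `recursive_neyman` follows the classical recursive Neyman scheme for upper bounds only. The function `box_allocation_bisect` is a hypothetical helper that handles both bounds by bisecting on the proportionality constant s in the box-clipped Neyman form x_h = min(M_h, max(m_h, a_h s)). That form is the standard KKT characterisation for this convex problem. Bisection is a swapped-in root-finding technique: it is not RNABOX and does not reproduce the stratallo API.

```python
def recursive_neyman(n, a, M):
    """Classical recursive Neyman allocation with upper bounds only.

    Minimises sum(a_h**2 / x_h) s.t. sum(x_h) == n and x_h <= M_h.
    Assumes all a_h > 0 and n <= sum(M).
    """
    H = len(a)
    take_max = set()  # strata fixed at their upper bound M_h
    while True:
        rest = [h for h in range(H) if h not in take_max]
        n_rest = n - sum(M[h] for h in take_max)
        s = n_rest / sum(a[h] for h in rest)  # Neyman constant on the rest
        over = {h for h in rest if a[h] * s > M[h]}
        if not over:
            # No remaining stratum violates its bound: allocation is final.
            return [M[h] if h in take_max else a[h] * s for h in range(H)]
        take_max |= over  # fix violators at M_h and recompute


def box_allocation_bisect(n, a, m, M, tol=1e-12, max_iter=200):
    """Hypothetical helper: allocation with lower AND upper bounds.

    Searches for s > 0 such that x_h(s) = min(M_h, max(m_h, a_h * s))
    sums to n. Bisection on s, NOT the RNABOX algorithm.
    Assumes a_h > 0, m_h <= M_h and sum(m) <= n <= sum(M).
    """
    assert sum(m) <= n <= sum(M), "infeasible total sample size"

    def alloc(s):
        return [min(Mh, max(mh, ah * s)) for ah, mh, Mh in zip(a, m, M)]

    lo, hi = 0.0, 1.0
    while sum(alloc(hi)) < n:  # total is nondecreasing in s, so grow hi
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) < n:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return alloc(hi)


def si_variance(N, S, x):
    """Variance of the stratified SI estimator of the population total."""
    return (sum(Nh**2 * Sh**2 / xh for Nh, Sh, xh in zip(N, S, x))
            - sum(Nh * Sh**2 for Nh, Sh in zip(N, S)))


if __name__ == "__main__":
    N, S = [100, 200, 300], [10.0, 5.0, 2.0]  # illustrative numbers only
    a = [Nh * Sh for Nh, Sh in zip(N, S)]     # a_h = N_h * S_h under SI
    x_upper = recursive_neyman(60, a, [15, 40, 40])
    x_box = box_allocation_bisect(60, a, [5, 5, 20], [15, 40, 40])
    print(x_upper)  # approximately [15, 28.125, 16.875]
    print(x_box)    # approximately [15, 25, 20]
    print(si_variance(N, S, x_box))
```

In the box case of this toy example, stratum 1 is capped at its upper bound and stratum 3 is raised to its lower bound. The allocation therefore mixes "take-max" and "take-min" strata, which the upper-bound-only recursion cannot express. Setting all m_h = 0 makes the bisection reproduce the recursive Neyman result.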