Recent work on mini-batch consistency (MBC) for set functions has drawn attention to the need to sequentially process and aggregate chunks of a partitioned set while guaranteeing the same output for every possible partition. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to handle large sets during training, when the full-set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full-set gradient and has a constant memory overhead for any set size at both training and test time. We conduct extensive experiments, including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images, to verify the efficiency and efficacy of our scalable set encoding framework.
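As a rough illustration of the MBC property described above (and not of the UMBC architecture itself), the following minimal sketch uses plain sum pooling, a simple aggregation that gives the same encoding however the set is partitioned. The feature map `phi` and the helper names are hypothetical and chosen only for this example.

```python
import math
import random


def phi(x):
    """Hypothetical per-element feature map (an assumption for illustration)."""
    return (x, x * x, math.tanh(x))


def encode_full(xs):
    """Encode a non-empty set in one pass by sum-pooling element features."""
    feats = [phi(x) for x in xs]
    return tuple(sum(col) for col in zip(*feats))


def encode_chunked(xs, chunk_size):
    """Encode the set chunk by chunk, storing only a running sum."""
    state = (0.0, 0.0, 0.0)
    for start in range(0, len(xs), chunk_size):
        partial = encode_full(xs[start:start + chunk_size])
        state = tuple(s + p for s, p in zip(state, partial))
    return state


def max_abs_diff(a, b):
    """Largest coordinate-wise gap between two encodings."""
    return max(abs(u - v) for u, v in zip(a, b))


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
full = encode_full(xs)
```

Comparing `full` with `encode_chunked(xs, k)` for different chunk sizes `k` should give agreement up to floating-point rounding, while the chunked version only ever holds one chunk and a fixed-size state in memory. Sum pooling is used here because it is trivially MBC; the abstract's point is that such restricted aggregations limit expressive power, which UMBC is proposed to address.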