In most data-scientific approaches, the principle of Maximum Entropy (MaxEnt) is used to justify a posteriori some parametric model that has already been chosen based on experience, prior knowledge, or computational simplicity. In a formulation orthogonal to conventional model building, we start from the linear system of phenomenological constraints and asymptotically derive the distribution over all viable distributions that satisfy the provided set of constraints. The MaxEnt distribution plays a special role, as it is the most typical among all phenomenologically viable distributions, and thus a good expansion point for large-N techniques. This enables us to formulate hypothesis testing consistently, in a fully data-driven manner. The appropriate parametric model supported by the data can always be deduced at the end of model selection. Within the MaxEnt framework, we recover major scores and selection procedures used in multiple applications and assess their ability to capture associations in the data-generating process and to identify the most generalizable model. This data-driven counterpart of standard model selection demonstrates the unifying perspective of the deductive logic advocated by the MaxEnt principle, while potentially shedding new light on the inverse problem.
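As a minimal sketch of the MaxEnt construction the abstract alludes to (an illustration, not the paper's actual procedure): given a linear expectation constraint on a discrete support, the maximum-entropy distribution takes exponential-family form p_i ∝ exp(λ f(x_i)), with the multiplier λ fixed by matching the constraint. The support, feature, and target value below are hypothetical choices for demonstration; the dual problem is convex, so a Newton iteration on λ suffices.

```python
import numpy as np

def maxent_dist(support, feature, target, iters=100):
    """MaxEnt distribution p_i ∝ exp(lam * f(x_i)) subject to E_p[f] = target.

    Solves the (convex) dual problem for the Lagrange multiplier `lam`
    by Newton's method: the dual gradient is E_p[f] - target and the
    dual Hessian is Var_p[f].
    """
    f = feature(support)
    lam = 0.0
    for _ in range(iters):
        w = np.exp(lam * f)
        p = w / w.sum()          # exponential-family distribution at current lam
        mean = p @ f             # current constraint value E_p[f]
        var = p @ (f - mean) ** 2
        lam -= (mean - target) / var  # Newton step on the dual
    return p

# Hypothetical example: six states, constrain the mean to 2.0.
support = np.arange(6.0)
p = maxent_dist(support, lambda x: x, target=2.0)
```

The resulting `p` is normalized and satisfies the linear constraint to numerical precision; with more features, `lam` becomes a vector and the same dual Newton scheme applies with the feature covariance as Hessian.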