Motivated by recent progress on online linear programming (OLP), we study the online decision-making problem (ODMP) as a natural generalization of OLP. In ODMP, a single decision maker makes a sequence of decisions over $T$ time stages. At each time stage, the decision maker makes a decision based only on the information obtained up to that point, without knowledge of the future. The task of the decision maker is to maximize the accumulated reward while satisfying, in aggregate, predetermined $m$-dimensional long-term goal (linking) constraints. ODMP significantly broadens the modeling framework of OLP by allowing more general feasible regions (for both local and goal constraints), potentially involving discreteness and nonlinearity in each local decision-making problem. We propose a Fenchel dual-based online algorithm for ODMP. At each time stage, the proposed algorithm requires solving a potentially nonconvex optimization problem over the local feasible set and a convex optimization problem over the goal set. Under the uniform random permutation model, we show that our algorithm achieves $O(\sqrt{mT})$ constraint violation deterministically in meeting the long-term goals, and $O(\sqrt{m\log m}\sqrt{T})$ competitive difference in expected reward with respect to the optimal offline decisions. We also extend our results to the grouped random permutation model.
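To make the per-stage structure concrete, the following is a minimal sketch of a generic dual price-update scheme of the kind common in the OLP literature. It is not the Fenchel dual-based algorithm proposed here, and every name in it is hypothetical. It assumes each local feasible set is a small finite list of (reward, usage) options, so the possibly nonconvex local problem is solved by enumeration, and it assumes the goal set has the simple form "average usage at most `budget`", so the convex step over the goal set reduces to a projection onto nonnegative prices.

```python
import random


def online_dual_sketch(stages, budget, step):
    """Illustrative dual price-update loop (hypothetical helper, not the paper's method).

    stages: list of stages; each stage is a list of (reward, usage) options,
            where usage is a list of m resource amounts.
    budget: list of m per-stage average usage targets.
    step:   positive step size for the price update.
    Returns (total_reward, total_usage, final_prices).
    """
    m = len(budget)
    prices = [0.0] * m
    total_reward = 0.0
    total_usage = [0.0] * m
    for options in stages:
        # Local problem: enumerate the finite option set and pick the
        # option with the largest price-adjusted reward.
        def adjusted(option):
            reward, usage = option
            return reward - sum(p * u for p, u in zip(prices, usage))

        reward, usage = max(options, key=adjusted)
        total_reward += reward
        total_usage = [s + u for s, u in zip(total_usage, usage)]
        # Dual step: move prices along (usage - budget), then project
        # back onto the nonnegative orthant.
        prices = [max(0.0, p + step * (u - b))
                  for p, u, b in zip(prices, usage, budget)]
    return total_reward, total_usage, prices


# Toy instance (assumed data): each stage offers "do nothing" or one random item.
rng = random.Random(0)
T, m = 1000, 2
stages = [[(0.0, [0.0] * m),
           (rng.random(), [rng.random() for _ in range(m)])]
          for _ in range(T)]
rng.shuffle(stages)  # mimic a random arrival order
budget = [0.25] * m
step = 1.0 / T ** 0.5
reward, usage, prices = online_dual_sketch(stages, budget, step)
violation = [max(0.0, u - T * b) for u, b in zip(usage, budget)]
print(reward, violation)
```

In this sketch the projected update implies that the total overuse of each resource is at most its final price divided by `step`. If the prices stay bounded, a step size of order $1/\sqrt{T}$ therefore gives constraint violation of order $\sqrt{T}$, which matches the flavor of the bounds stated above without reproducing their analysis.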