We consider a class of optimization problems over stochastic variables where the algorithm can learn information about the value of any variable through a series of costly steps; we model this information acquisition process as a Markov Decision Process (MDP). The algorithm's goal is to minimize the cost of its solution plus the cost of information acquisition, or alternately, maximize the value of its solution minus the cost of information acquisition. Such bandit superprocesses have been studied previously but solutions are known only for fairly restrictive special cases. We develop a framework for approximate optimization of bandit superprocesses that applies to arbitrary processes with a matroid (and in some cases, more general) feasibility constraint. Our framework establishes a bound on the optimal cost through a novel cost amortization; it then couples this bound with a notion of local approximation that allows approximate solutions for each component MDP in the superprocess to be composed without loss into a global approximation. We use this framework to obtain approximately optimal solutions for several variants of bandit superprocesses for both maximization and minimization. We obtain new approximations for combinatorial versions of the previously studied Pandora's Box with Optional Inspection and Pandora's Box with Partial Inspection; as well as approximation algorithms for a new problem that we call the Weighing Scale problem.
翻译:暂无翻译