We consider a collection of statistically identical two-state continuous-time Markov chains (channels). A controller continuously selects a channel with a view to maximizing the infinite-horizon average reward. A switching cost is paid upon each channel change. We consider two cases: full observation (all channels are observed simultaneously) and partial observation (only the current channel is observed). We analyze the difference in performance between these cases for various policies. For the partial observation case with either two channels or an infinite number of channels, we explicitly characterize an optimal threshold for two sensible policies, which we name "call-gapping" and "cool-off". Our results present a qualitative view of the interaction between the number of channels, the available information, and the switching costs.