Constrained MDPs and the reward hypothesis

It's been a looong time since I last posted on this blog. But that should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov Decision Processes (CMDPs). The post is prompted by a recent visit of Eugene Feinberg, a pioneer of CMDPs, to our department, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers: a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic activity.
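
In symbols, a standard discounted formulation looks like the following (the notation, including the constraint costs c_i and budgets d_i, is mine, not from the post):

```latex
\max_{\pi}\ \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(S_t, A_t)\right]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(S_t, A_t)\right] \le d_i,
\qquad i = 1, \dots, m.
```

That is, the policy maximizes one expected return while keeping the expected totals of the other reward (cost) functions within budget.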

The Fastest Mixing Markov Chain on a Graph

The paper can be found here. The authors are Stephen Boyd, Persi Diaconis and Lin Xiao. I found the paper while looking at papers by Persi Diaconis, a notable mathematician and magician.

The paper talks about exactly what the title suggests: you are given a finite graph, and you can set up a random walk on this graph by choosing the transition probabilities between vertices that are connected by an edge. The transition matrix must be symmetric, which guarantees that the uniform distribution is a stationary distribution of the walk. Assuming that the associated Markov chain is also irreducible and aperiodic, the state distribution converges to the uniform distribution. The task is to pick the transition probabilities so as to maximize the rate of this convergence; the optimizer is the fastest mixing Markov chain (FMMC) on the graph.
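
To make "rate of convergence" concrete (this is the paper's objective, written out here for reference): order the eigenvalues of the symmetric transition matrix P as 1 = lambda_1 >= lambda_2 >= ... >= lambda_n >= -1. The asymptotic convergence rate is governed by the second largest eigenvalue modulus (SLEM),

```latex
\mu(P) \;=\; \max\{\lambda_2(P),\, -\lambda_n(P)\}
\;=\; \left\| P - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top} \right\|_{2},
```

and the FMMC problem is to minimize mu(P) over all symmetric stochastic matrices supported on the graph's edges: the smaller the SLEM, the faster the distribution approaches uniform.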

The authors show that this is a convex optimization problem and give a polynomial algorithm (based on semidefinite programming) to find the solution. A subgradient method is given that can be more effective for larger graphs. The solution is generalized to the non-symmetric case by considering reversible Markov chains.
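
For a feel of what the optimization problem looks like, here is a minimal sketch in CVXPY (my reconstruction for illustration, not the authors' code; the function name fastest_mixing_chain and the edge-list input format are made up):

```python
import cvxpy as cp
import numpy as np

def fastest_mixing_chain(n, edges):
    """Sketch of the FMMC program: minimize the SLEM
    mu(P) = ||P - (1/n) 11^T||_2 over symmetric stochastic P
    whose off-diagonal support is restricted to the graph's edges."""
    P = cp.Variable((n, n), symmetric=True)
    constraints = [P >= 0,                  # entrywise nonnegative
                   cp.sum(P, axis=1) == 1]  # rows sum to one
    allowed = {frozenset(e) for e in edges}
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) not in allowed:
                constraints.append(P[i, j] == 0)  # no edge, no transition
    # Spectral norm of P minus the projector onto the all-ones vector;
    # for symmetric stochastic P this equals the SLEM mu(P).
    J = np.ones((n, n)) / n
    prob = cp.Problem(cp.Minimize(cp.norm(P - J, 2)), constraints)
    prob.solve()
    return P.value, prob.value

# Example: the path graph on 4 vertices.
P_opt, slem = fastest_mixing_chain(4, [(0, 1), (1, 2), (2, 3)])
print(np.round(P_opt, 3), slem)
```

Minimizing a spectral norm is a standard SDP-representable objective, which is why off-the-shelf convex solvers handle it; for large graphs the authors' subgradient method scales better.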

Why do I find this paper interesting? Well, it is always nice to find out that a problem is convex and hence can be solved in an "efficient" manner. But the main reason is that if an agent wanted to find out the most about the states of its environment, it would need to find exactly the FMMC (assuming a finite state environment). Questions: What if the transition probabilities cannot be changed in an arbitrary manner? The case I have in mind is when you have a Markov Decision Process and you want to find the fastest mixing policy, where the goal is to converge to, say, a distribution that is close to uniform (the uniform distribution itself might not be achievable). Is there a well-defined solution to this problem? Do the ideas generalize to this problem? To what extent? Can such a solution be found while interacting with the environment?
