Posts

Showing posts from July, 2007

Constrained MDPs and the reward hypothesis

It's been a looong ago that I posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back and today I want to write about constrained Markovian Decision Process (CMDPs). The post is prompted by a recent visit of Eugene Feinberg, a pioneer of CMDPs, of our department, and also by a growing interest in CMPDs in the RL community (see this, this, or this paper).
For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic prod…

Learning Symbolic Models

It is quite a common sense that successful generalization is the key to efficient learning in difficult environment. It appears to me that this must be especially true for reinforcement learning.
One potentially very powerful idea to achieve successful generalization is to learn symbolic models. Why? It is because a symbolic model (almost by definition) allows for very powerful generalizations (e.g. actions with parameters, state representation of environments with a variable number of objects with different object types, etc.).
JAIR just published the paper on this topic by H. M. Pasula, L. S. Zettlemoyer and L. P. Kaelbling, with the title "Learning Symbolic Models of Stochastic Domains". A brief glance reveals that the authors propose a greedy learning method, assuming a particular representation. The learning problem itself was shown earlier to be NP-hard, hence this sounds like a valid approach.
However, one thing is badly missing from this approach: learning the represen…