Constrained MDPs and the reward hypothesis

It's been a looong time since I last posted on this blog. But that does not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov Decision Processes (CMDPs). The post was prompted by a recent visit to our department by Eugene Feinberg, a pioneer of CMDPs, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers: a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic activity.
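Spelled out in symbols (a standard discounted formulation with the usual notation, not tied to any particular paper): given reward functions $r_0, r_1, \dots, r_m$ and thresholds $c_1, \dots, c_m$, a CMDP asks for a policy maximizing the expected return under $r_0$ while keeping the returns under the other rewards above their thresholds:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t \, r_0(S_t, A_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t \, r_i(S_t, A_t)\right] \ge c_i,
\qquad i = 1, \dots, m.
```

In the virus example, $r_0$ might encode economic activity while the constraints bound infection counts, or vice versa.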

Learning Symbolic Models

It is common sense that successful generalization is the key to efficient learning in difficult environments. It appears to me that this must be especially true for reinforcement learning.
One potentially very powerful way to achieve such generalization is to learn symbolic models. Why? Because a symbolic model (almost by definition) allows for very powerful generalizations (e.g., actions with parameters, state representations of environments with a variable number of objects of different types, etc.).
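To make "actions with parameters" concrete, here is a minimal sketch in the spirit of probabilistic STRIPS-like rules; the class and predicate names are my own illustration, not the representation from any particular paper. The point is that one rule, written over parameters like `?x`, applies to every object that can be bound to them — exactly the kind of generalization a symbolic model buys us.

```python
import random
from dataclasses import dataclass

@dataclass
class StochasticRule:
    """A parameterized action rule with a distribution over symbolic effects."""
    name: str
    parameters: tuple       # variables the rule generalizes over, e.g. ("?x",)
    preconditions: frozenset  # literals over the parameters
    outcomes: list          # list of (probability, set-of-effect-literals) pairs

    def sample_effects(self, rng=random):
        """Sample one outcome set according to the rule's distribution."""
        r, acc = rng.random(), 0.0
        for prob, effects in self.outcomes:
            acc += prob
            if r < acc:
                return effects
        return self.outcomes[-1][1]  # guard against floating-point round-off

# A hypothetical blocks-world rule: "pickup(?x)" succeeds 80% of the
# time; otherwise the block slips back onto the table.
pickup = StochasticRule(
    name="pickup",
    parameters=("?x",),
    preconditions=frozenset({"clear(?x)", "handempty()"}),
    outcomes=[(0.8, {"holding(?x)"}), (0.2, {"on-table(?x)"})],
)
```

The same `pickup` rule covers any block in any state where its preconditions hold, with no per-object learning required.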
JAIR has just published a paper on this topic by H. M. Pasula, L. S. Zettlemoyer and L. P. Kaelbling, titled "Learning Symbolic Models of Stochastic Domains". A brief glance reveals that the authors propose a greedy learning method, assuming a particular representation. The learning problem itself was shown earlier to be NP-hard, so a greedy approach sounds reasonable.
However, one thing is badly missing from this approach: learning the representation itself. The authors appear to assume that the state representation of the environment is given in an appropriate symbolic form. This is, in my opinion, a very strong assumption -- at least when it comes to dealing with certain real-world problems where only noisy sensory information about the environment is available (think of robotics).
However, I must admit that I am disappointed only because I wrongly assumed (given the title) that the paper would be about the problem of learning representations. Will we ever be able to learn (symbolic) representations in an efficient manner? What are the conditions that allow efficient learning? Do we need the flexibility of symbolic representations to scale up reinforcement learning at all? Here at the UofA several people are interested in these questions (well, who is not??). More work needs to be done, but hopefully, some day you will hear more about our efforts.

Comments

  1. It is an interesting blog, ML.
    I too would like to hear more from you about advancements in RL in the near future.

  2. I had planned to add material about RL, but it never happened. Stay tuned :)

