Constrained MDPs and the reward hypothesis
It's been a looong time since I last posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov Decision Processes (CMDPs). The post is prompted by a recent visit of Eugene Feinberg, a pioneer of CMDPs, to our department, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers: a CMDP is like an MDP except that there are multiple reward functions, one of which sets the optimization objective, while the others restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic ...
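In symbols, the standard discounted formulation of this idea reads as follows (a sketch; the notation $r_i$, $c_i$, $m$ is mine, not from the post): with objective reward $r_0$ and constraint rewards $r_1,\dots,r_m$ with thresholds $c_1,\dots,c_m$, a CMDP asks for

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^t \, r_0(x_t, a_t) \Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^t \, r_i(x_t, a_t) \Big] \ge c_i,
\qquad i = 1, \dots, m.
```

Dropping the constraints ($m=0$) recovers the usual MDP objective; the constraints are what carve out the set of admissible policies.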
The only problem is that MathML doesn't work with Safari!
True. Firefox might be an alternative for Mac people.
And Opera.
How would you do LaTeX equations in high resolution so that they could be printed on t-shirts though?
I would use PostScript or PDF, assuming you have LaTeX installed on your system. But MathML fonts are also scalable; use HTML tags for this.
Update to myself: I have switched to MathJax and now I am using the script found here:
http://irrep.blogspot.ca/2011/07/mathjax-in-blogger-ii.html
The quality is better, though the pages will load slower. But in these days of ever-increasing computer speed, this should not be a problem.
Oh, and I hope LaTeX works in the comments now, too. Let's check: $Q^*(x,a) = \int dP(y|x,a) \left\{ r(x,a,y) + \gamma \max_{a'\in A} Q^*(y,a') \right\}$.
Oh, and Safari now seems to support MathML.