Constrained MDPs and the reward hypothesis

It's been a looong time since I last posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov Decision Processes (CMDPs). The post is prompted by a recent visit by Eugene Feinberg, a pioneer of CMDPs, to our department, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic activity.
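For concreteness, here is a minimal sketch of the constrained formulation in my own notation (the discount factor $\gamma$, the thresholds $c_i$, and the number of constraints $m$ are placeholders, not anything specific to the papers linked above): one reward, $r_0$, defines the objective, and the remaining rewards $r_1,\dots,r_m$ only enter through the constraints,

\[
\max_{\pi}\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r_0(S_t, A_t)\right]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r_i(S_t, A_t)\right] \ge c_i,
\qquad i = 1, \dots, m.
\]

In the virus example, $r_0$ could measure economic activity while the constraints bound the expected spread, or the other way around; the point is that the specification stays close to how the problem is naturally stated.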

Pitfalls of optimality in statistics

I was reading a little bit about robust statistics, as we are in the process of putting together a paper about entropy estimation where robustness comes up as an issue. While searching the net for the best material to understand this topic (I am thinking about posting another article about what I have found), I bumped into a nice paper (downloadable from here) by Peter J. Huber, one of the main figures in robust statistics, where he talks about a bunch of pitfalls around pursuing optimality in statistics. Huber writes eloquently: he gives plenty of examples and motivates the definitions. He is just great. I can only recommend this paper and his book. Now, what are the pitfalls he writes about? He distinguishes four types, with the following syndromes:
  • The fuzzy concepts syndrome: sloppy translation of concepts into mathematics. Think about uniform vs. non-uniform convergence (sloppy asymptotics). In statistics, a concrete example is efficiency, which is defined in a non-uniform manner over the parameter space; this allows for (weird) "super-efficient" estimators that pay special attention to some distinguished element of the parameter space.
  • The straitjacket syndrome: the use of overly restrictive side conditions, such as requiring that an estimator be unbiased or equivariant (equivariant estimators in high dimensions are inadmissible even in very simple situations). In Bayesian statistics, another example might be convenient but potentially inappropriate conjugate priors.
  • The scapegoat syndrome: confusing the model with reality (offering the model to the gods of statistics instead of the real thing, hoping that they will accept it). The classic example is the Eddington-Fisher argument. Eddington advocated the mean-absolute-deviation (MAD) instead of the root-mean-square (RMS) deviation as a measure of scale. Fisher argued that MAD estimates are highly inefficient (converge slowly) relative to RMS estimates if the sample comes from a normal distribution. Tukey showed that the situation is reversed even under small deviations from the normal model (see the simulation sketch after this list). The argument that under narrow conditions one estimator is better than another should not even be made. Another example is perhaps classical optimal design and the fundamentalist approach in Bayesian statistics. Of course, there is nothing wrong with assumptions, but the results should be robust.
  • The souped-up car syndrome: by optimizing for speed we can end up with an elaborate gas-guzzler. Optimizing for one quantity (efficiency) may degrade another one (robustness). Practical solutions must find a balance between such conflicting requirements.
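To see Tukey's point numerically, here is a small simulation sketch of my own (not Huber's or Tukey's calculation; the contamination fractions and the 3x scale inflation are illustrative choices): it compares the MAD-type and RMS-type scale estimators under a clean normal model and under slight contamination.

import numpy as np

rng = np.random.default_rng(0)

def mad_scale(x):
    # mean absolute deviation about the mean, rescaled so that it
    # estimates sigma under normality (E|X - mu| = sigma * sqrt(2/pi))
    return np.mean(np.abs(x - x.mean())) * np.sqrt(np.pi / 2)

def rms_scale(x):
    # root-mean-square deviation about the mean (the "RMS" estimator)
    return np.sqrt(np.mean((x - x.mean()) ** 2))

def sample(n, eps):
    # contaminated normal: with probability eps the std is inflated by 3
    scale = np.where(rng.random(n) < eps, 3.0, 1.0)
    return rng.normal(0.0, scale)

def std_var(values):
    # variance divided by squared mean, so the two estimators'
    # slightly different targets under contamination do not distort things
    values = np.asarray(values)
    return np.var(values) / np.mean(values) ** 2

for eps in (0.0, 0.01, 0.05):      # contamination fractions (illustrative)
    mads, rmss = [], []
    for _ in range(2000):          # Monte Carlo replications
        x = sample(1000, eps)
        mads.append(mad_scale(x))
        rmss.append(rms_scale(x))
    # ratio > 1 means the MAD-type estimator is the more efficient one
    print(f"eps={eps:4.2f}  var(RMS)/var(MAD) = "
          f"{std_var(rmss) / std_var(mads):.2f}")

At eps = 0 the ratio comes out below one (Fisher's point: the RMS estimator is more efficient under exact normality, the classical figure being about 0.88), while already at a percent or so of contamination it flips above one, which is exactly the reversal Tukey pointed out.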
These syndromes are also everywhere in machine learning research. Wear protective gear as needed!
