Constrained MDPs and the reward hypothesis

It's been a looong time since I last posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov Decision Processes (CMDPs). The post is prompted by a recent visit of Eugene Feinberg, a pioneer of CMDPs, to our department, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic ...

Approximating inverse covariance matrices

Phew, the last time I posted an entry to my blog was a loong time ago... Not that there was nothing interesting to blog about; I just kept delaying things. (Btw, Google changed the template, which eliminated the rendering of the LaTeX formulae; not happy... Luckily, I could change the template back.) Now, for the actual contents:

I have just read the PAMI paper "Accuracy of Pseudo-Inverse Covariance Learning--A Random Matrix Theory Analysis" by D. Hoyle (IEEE T. PAMI, 2011, vol. 33(7), pp. 1470--1481).

The paper is about pseudo-inverse covariance matrices and their analysis via random matrix theory, and I can say I enjoyed it quite a lot.

In short, the author's point is this:
Let $d,n>0$ be integers. Let $\hat{C}$ be the sample covariance matrix of some i.i.d. data $X_1,\dots,X_n \in \mathbb{R}^d$ based on $n$ datapoints and let $C$ be the population covariance matrix (i.e., $C = E[X_1 X_1^\top]$). Assume that $d,n\to\infty$ while $\alpha = n/d$ is kept fixed. Assume some form of "consistency" so that the eigenvalues of $C$ tend to some distribution over $[0,\infty)$. Denote by $L_{n,d} = d^{-1} E[\|C^{-1} - \hat{C}^\dagger\|_F^2]$ the reconstruction error of the inverse covariance matrix when one uses the pseudo-inverse $\hat{C}^\dagger$.
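The quantity $L_{n,d}$ is easy to estimate by simulation. Here is a minimal sketch (my code, not the paper's) that Monte-Carlo estimates it for the simplest case $C=I$ with standard Gaussian data, using NumPy's pseudo-inverse:

```python
import numpy as np

def reconstruction_error(n, d, trials=20, seed=0):
    """Monte Carlo estimate of L_{n,d} = d^{-1} E||C^{-1} - pinv(C_hat)||_F^2
    for standard Gaussian data, so that the population covariance is C = I."""
    rng = np.random.default_rng(seed)
    C_inv = np.eye(d)  # C = I, so C^{-1} = I as well
    errs = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))      # n i.i.d. samples in R^d
        C_hat = X.T @ X / n                  # sample covariance (known zero mean)
        err = np.linalg.norm(C_inv - np.linalg.pinv(C_hat), "fro") ** 2 / d
        errs.append(err)
    return float(np.mean(errs))
```

For instance, `reconstruction_error(200, 50)` (so $\alpha=4$) gives a modest error, while `reconstruction_error(52, 50)` (so $\alpha$ just above 1) gives a huge one, in line with the result described next.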

Then, $f(\alpha) := \lim_{d,n\to\infty} L_{n,d}$ will be well-defined (often). The main thing is that $f$ becomes unbounded as $\alpha\to 1$ (also as $\alpha\to 0$, but this is not that interesting). In fact, for $C=I$, there is an exact formula for $\alpha<1$:

$$f(\alpha) = \Theta\!\left(\frac{\alpha^3}{(1-\alpha)^3}\right).$$

Here is a figure that shows $f$ (on the figure, $p$ denotes the dimension $d$).

Nice.

The author calls this the "peaking phenomenon": the worst case, from the point of view of estimating $C^{-1}$, is when we have as many data points as dimensions (assuming a full-rank $C$, or this won't make sense, because otherwise you could just add dummy dimensions to improve $L_{n,d}$). The explanation is that $L_{n,d}$ is sensitive to how small the smallest positive eigenvalue of $\hat{C}$ is (this can be shown), and this smallest positive eigenvalue becomes extremely small as $\alpha\to 1$ (and it does not matter whether $\alpha\to 1^-$ or $\alpha\to 1^+$).
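This explanation is easy to check numerically. A quick sketch (my illustration, again with $C=I$; the function name and the sweep over $n$ are my choices) that tracks the smallest positive eigenvalue of $\hat{C}$ as $\alpha$ crosses 1:

```python
import numpy as np

def smallest_positive_eigenvalue(n, d, seed=0):
    """Smallest positive eigenvalue of the sample covariance of n standard
    Gaussian points in R^d (population covariance C = I)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    eig = np.linalg.eigvalsh(X.T @ X / n)
    return float(eig[eig > 1e-10].min())  # drop the numerically-zero eigenvalues

d = 100
for n in (50, 90, 100, 110, 200):  # alpha = n/d sweeps across 1
    print(f"alpha = {n/d:.1f}: smallest positive eigenvalue ~ "
          f"{smallest_positive_eigenvalue(n, d):.3g}")
```

The printed values collapse toward zero near $\alpha = 1$ and recover on both sides of it, which is exactly why the pseudo-inverse (which inverts that tiny eigenvalue) blows up there.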

Now notice that having $\alpha=0.5$ is much better than having $\alpha=1$ (for large $n,d$). Thus, there are obvious ways of improving the sample covariance estimate!

In fact, the author then suggests that in practice one should use bagging when $n > d$, and random projections when $n < d$. He then shows experimentally that this improves things, though unfortunately the improvement is demonstrated for Fisher discriminants that use such inverses, while a direct demonstration for $L_{n,d}$ is lacking. He also notes that the "peaking phenomenon" can be dealt with by other means, such as regularization, for which we are referred to Bickel and Levina (AoS, 2008).
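To make the two suggestions concrete, here is a rough sketch of both ideas as I understand them (the function names, bag sizes, and the Gaussian projection matrix are my choices for illustration, not the paper's exact recipes):

```python
import numpy as np

def bagged_pinv(X, m, bags=25, seed=0):
    """Average the pseudo-inverse of the sample covariance over random
    subsamples of size m < n. Each bag works at a ratio m/d away from the
    peak at 1, and averaging smooths out any single bag's bad draw."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    acc = np.zeros((d, d))
    for _ in range(bags):
        idx = rng.choice(n, size=m, replace=False)
        Xs = X[idx]
        acc += np.linalg.pinv(Xs.T @ Xs / m)
    return acc / bags

def projected_inverse(X, k, seed=0):
    """Sketch of the random-projection idea for n < d: project the data to
    k dimensions (k < n), invert the covariance there, and lift back."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    R = rng.standard_normal((k, d)) / np.sqrt(d)  # random projection, k x d
    Y = X @ R.T                                   # projected data, n x k
    C_proj = Y.T @ Y / n                          # k x k, full rank a.s.
    return R.T @ np.linalg.inv(C_proj) @ R        # d x d approximation
```

As a sanity check of the bagging idea at the worst point $n=d$: with $d=n=60$ standard Gaussian points, the plain pseudo-inverse incurs an enormous error against $C^{-1}=I$, while `bagged_pinv(X, m=30)` (each bag at ratio $0.5$) does substantially better.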

Anyhow, one clear conclusion of this paper is that the pseudo-inverse must be abandoned when it comes to approximating the inverse covariance matrix. Why is this interesting? The issue of how well $C^{-1}$ can be estimated comes up naturally in regression, or even in reinforcement learning when using the projected Bellman error. And asymptotics is just a cool tool to study how things behave.

Comments

  1. The "peaking phenomenon" has long been known in the pattern recognition community (at least for Fisher discriminants). However, their conclusion was that n should always be >> d. They didn't discover regularization until later.

  2. Interesting! Do you have a reference? What was really interesting for me is the exact asymptotic analysis that shows that the phenomenon is real.


