Sunday, 25 October 2009

Pitfalls of optimality in statistics

I was reading a little bit about robust statistics, as we are in the process of putting together a paper about entropy estimation where robustness comes up as an issue. While searching on the net for the best material to understand this topic (I am thinking about posting another article about what I have found), I have bumped into a nice paper (downloadable from here) by Peter J. Huber, one of the main figures in robust statistics, where he talks about a bunch of pitfalls around pursuing optimality in statistics. Huber writes eloquently -- he gives plenty of examples, motivates definitions. He is just great. I can only recommend this paper or his book. Now, what are the pitfalls he writes about? He distinguishes 4 types with the following syndromes:
  • The fuzzy concepts syndrome: sloppy translation of concepts into mathematics. Think about uniform vs. non-uniform convergence (sloppy asymptotics). In statistics a concrete example is the concept of efficiency which is defined in a non-uniform manner with respect to the estimable parameters, which allows for (weird) "super-efficient" estimators that pay special attention to some distinguished element of the parameter-space.
  • The straitjacket syndrome: the use of overly restrictive side conditions, such as requiring that an estimator is unbiased or equivariant (equivariant estimates in high dimensions are inadmissible in very simple situations). In Bayesian statistics another example might be the convenient but potentially inappropriate conjugate priors.
  • The scapegoat syndrome: confusing the model with reality (offering the model for the gods of statistics instead of the real thing, hoping that they will accept it). The classic example is the Eddington-Fisher argument. Eddington advocated the mean-absolute-deviation (MAD) instead of the root-mean-square (RMS) deviation as a measure of scale. Fisher argued that MAD estimates are highly inefficient (converge slowly) relative to the RMS deviation estimates if the sample comes from a normal distribution. Tukey has shown that the situation gets reversed even under small deviations from a normal model. The argument that under narrow conditions one estimator is better than some other should not be even made. Another example is perhaps classical optimal design and the fundamentalist approach in Bayesian statistics. Of course, there is nothing wrong with assumptions, but the results should be robust.
  • The souped-up car syndrome: by optimizing for speed we can end up with an elaborate gas-guzzler. Optimizing for one quantity (efficiency) may degrade another one (robustness). Practical solutions must find a balance between such contradicting requirements.
These syndromes are not to hard identify in machine learning research. Wear protective gear as needed!

No comments: