I found the paper posted by Marcus Hutter on arXiv quite interesting. The paper is about model (or rather predictor) selection. The idea is a familiar one, but the details appear to be novel: you want to find a model that yields a small loss on the available dataset while yielding a larger loss on most other datasets.

Classification: The simplest case is supervised learning with a finite target set. Then you can count the number of alternative labelings of the targets for which the predictor's loss is smaller than its loss on the true targets. This idea sounds very similar to the way Rademacher complexity works; see, e.g., the paper of Lugosi and Wegkamp, where a localized version of Rademacher complexity is investigated.
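To make the counting concrete, here is a minimal sketch of how I read it, not code from the paper. I assume that the predictor is refit on every alternative labeling, that the loss is the 0-1 training loss, and that ties count (`<=` rather than strictly smaller). The names `loss_rank`, `fit_constant` and `fit_memorizer` are made up for the illustration.

```python
from itertools import product


def empirical_loss(fit, x, y):
    """Fit the predictor to (x, y) and return its 0-1 loss on that same data."""
    predict = fit(x, y)
    return sum(predict(xi) != yi for xi, yi in zip(x, y))


def loss_rank(fit, x, y, labels=(0, 1)):
    """Count the labelings y' whose refit loss is <= the loss on the true labels.

    Assumption of this sketch: ties are counted, and all len(labels)**len(x)
    labelings are enumerated, so this only works for tiny datasets.
    """
    true_loss = empirical_loss(fit, x, y)
    return sum(
        empirical_loss(fit, x, y_alt) <= true_loss
        for y_alt in product(labels, repeat=len(x))
    )


def fit_constant(x, y):
    """Hypothetical rigid predictor: always outputs the majority label."""
    majority = max(set(y), key=list(y).count)
    return lambda xi: majority


def fit_memorizer(x, y):
    """Hypothetical flexible predictor: looks up the stored label (inputs distinct)."""
    table = dict(zip(x, y))
    return lambda xi: table[xi]


x = (0, 1, 2, 3, 4, 5)
y = (0, 0, 0, 0, 0, 1)
rank_constant = loss_rank(fit_constant, x, y)    # 14 of the 64 labelings
rank_memorizer = loss_rank(fit_memorizer, x, y)  # all 64 labelings
```

The memorizer fits every labeling perfectly, so its count is the maximum possible, while the constant predictor makes one mistake on the true labels but does that well on only 14 of the 64 labelings. Under the reading above, the smaller count is the better one.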

Regression: For continuous targets you can use a grid of increasing resolution (assuming that the range of the targets is bounded) and count the number of grid points for which the predictor's loss is less than its loss on the true dataset.

With an appropriate normalization this count converges to the volume of the set of such target values (hopefully this set is measurable :)).
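A rough sketch of the grid version, again under my own assumptions rather than the paper's exact construction: the targets live in a bounded interval `[lo, hi]`, the predictor is refit on each grid point, ties count, and the count is normalized by the total number of grid points, so that the result approximates the fraction of the target cube's volume. The helper names are hypothetical.

```python
from itertools import product


def mean_fit_loss(y):
    """Squared training loss of a constant predictor fitted to y (its mean)."""
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y)


def grid_volume_fraction(loss_after_refit, y_true, lo, hi, resolution):
    """Fraction of grid points in [lo, hi]^n whose refit loss is <= the true loss.

    Multiplying the result by (hi - lo)**n gives the approximate volume.
    The grid has resolution**n points, so keep n and resolution small.
    """
    n = len(y_true)
    step = (hi - lo) / (resolution - 1)
    axis = [lo + i * step for i in range(resolution)]
    true_loss = loss_after_refit(y_true)
    count = sum(
        loss_after_refit(y_alt) <= true_loss
        for y_alt in product(axis, repeat=n)
    )
    return count / resolution ** n


y_true = (0.4, 0.5, 0.6)
coarse = grid_volume_fraction(mean_fit_loss, y_true, 0.0, 1.0, 11)
fine = grid_volume_fraction(mean_fit_loss, y_true, 0.0, 1.0, 41)

# A predictor that interpolates any targets has zero loss everywhere,
# so every grid point qualifies and the fraction is 1.
interpolating = grid_volume_fraction(lambda y_alt: 0.0, y_true, 0.0, 1.0, 11)
```

Comparing `coarse` and `fine` gives a feel for how the normalized count settles as the grid is refined. Brute-force enumeration is only feasible for a handful of targets, which is presumably why a closed form for the volume is what one would want in practice.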

The paper does not go very far: some examples are given which demonstrate that the criterion yields a computable procedure and that this procedure is reasonable, and a quick comparison to alternatives is given. It will be interesting to see further developments!
