Sunday, 20 May 2007

A notion of function compression

The following compressibility concept is introduced by Harnik and Naor in their recent paper: given a function $f$ over some domain, a compression algorithm for $f$ should efficiently compress an input $x$ in a way that preserves the information needed to compute $f(x)$. Note that if $f$ is available (and efficiently computable), then compression is trivial, since $y=f(x)$ itself serves as a compact representation of $x$. The concept was originally studied in the framework of NP decision problems, where, unless $P=NP$, $f$ is not efficiently computable and hence the trivial solution is not available.
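
To make the trivial solution concrete, here is a minimal Python sketch (not from the Harnik and Naor paper; the choice of $f$ as a parity function and all names are purely illustrative): when $f$ itself is efficiently computable, storing $y=f(x)$ already preserves everything needed to compute $f(x)$.

```python
# A minimal sketch of the "trivial" compression: if f is efficiently
# computable, y = f(x) is itself a compact representation of x as far as
# computing f(x) is concerned. The parity function below is a toy choice.

def f(x: str) -> int:
    """Target function: parity of the bits of x."""
    return sum(int(b) for b in x) % 2

def trivial_compress(x: str) -> int:
    """Compress x down to the single bit y = f(x)."""
    return f(x)

x = "1011001"
y = trivial_compress(x)   # one bit instead of len(x) bits
assert y == f(x)          # the information needed to compute f(x) is preserved
```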

I am wondering whether this compressibility notion could be used in learning theory or function approximation. Consider, e.g., classification problems, where the output of $f$ is in $\{0,1\}$. In order to rule out the trivial solution we may require that $x$ be compressed to some $y$ such that, for some fixed (efficiently computable) function $g$, $f(g(y))=f(x)$. We do not require that $g(y)=x$, though if $g$ satisfies this then we certainly get a solution: if a compact representation of $\{ x \mid f(x)=1\}$ is available, then $f$ can be well-compressed. We might of course allow for some error and study approximation rates. The notion applies to real-valued or more general functions, too. What is the class of functions that can be efficiently compressed? How is this class related to classical smoothness spaces (in the case of real-valued functions)? One obvious thought is that the smoother the target function is, the better the obvious discretization approach should work. It would be interesting to find out whether this new approach could lead to new regularity descriptions by varying the restrictions on the compression.
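
As a toy illustration of the requirement $f(g(y))=f(x)$, and of how allowing some error trades off against the amount of compression, here is a hedged Python sketch; the threshold classifier, the quantization scheme, and all names are illustrative assumptions, not anything from the paper.

```python
# A hedged sketch of the requirement f(g(y)) = f(x): compress a real input x
# by coarse quantization, decompress with a fixed g, and measure how often
# the label is preserved. The threshold classifier f, the step size, and all
# names are illustrative choices.
import random

def f(x: float) -> int:
    """Toy classifier: 1 iff x lies above a fixed threshold."""
    return 1 if x >= 0.37 else 0

STEP = 0.01  # quantization resolution: coarser step = more compression, more errors

def compress(x: float) -> int:
    """y is the index of the quantization cell containing x (a few bits)."""
    return round(x / STEP)

def g(y: int) -> float:
    """Fixed, efficiently computable decompressor: the cell's representative point."""
    return y * STEP

random.seed(0)
xs = [random.random() for _ in range(10_000)]
errors = sum(f(g(compress(x))) != f(x) for x in xs)
print(f"label preserved on {1 - errors / len(xs):.2%} of inputs")
```

In this sketch errors can only occur for inputs falling in the cell that straddles the threshold, which is one simple way to see how the error rate could be traded against the number of bits retained in $y$.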

Another related thought is to compress the training samples in a learning problem such that the solution is kept (roughly) the same.
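
One concrete toy instance, sketched below under made-up assumptions (the learner, the data, and all names are hypothetical): a 1-D threshold learner that takes the midpoint between the rightmost negative and the leftmost positive example depends only on those two examples, so the training set can be compressed to them without changing the learned solution.

```python
# A hypothetical illustration of compressing a training set while keeping the
# learned solution unchanged: a 1-D threshold learner depends only on the two
# boundary examples, so the rest of the sample can be discarded.

def learn_threshold(samples):
    """Midpoint between the largest negative and the smallest positive example."""
    neg = max(x for x, y in samples if y == 0)
    pos = min(x for x, y in samples if y == 1)
    return (neg + pos) / 2

def compress_samples(samples):
    """Keep only the two boundary examples; the learned threshold is unchanged."""
    neg = max((x, y) for x, y in samples if y == 0)
    pos = min((x, y) for x, y in samples if y == 1)
    return [neg, pos]

data = [(0.1, 0), (0.3, 0), (0.45, 0), (0.6, 1), (0.8, 1), (0.95, 1)]
assert learn_threshold(compress_samples(data)) == learn_threshold(data)
```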

1 comment:

Christos said...

I think this question has been answered in some ways in the algorithmic complexity/probability literature and, at a more concrete level, in the minimum description length (MDL) literature.

There is an additional element here: in MDL you typically just want to compress $x \in X$ (or a sequence of $x_i$). In our case, however, there might be additional gains, since we only care about preserving those aspects of $x$ that are required to approximate $f$.

Then I suppose the question becomes in what sense you want to approximate $f$: if you want something that works uniformly over $X$, then why should the original input matter at all?