I just came across a paper by Philip M. Long, Rocco Servedio and Hans Ulrich Simon. (Here is a link to the paper titled "Discriminative Learning can Succeed where Generative Learning Fails".) The question investigated in the paper is the following:

We are in a classification setting and the learning problem is defined by a pair of jointly distributed random variables, (X,Y), where Y can take on the values +1 and -1. Question: How many iid copies of this pair does an algorithm need to (i) find a classifier that yields close to optimal performance with high probability (ii) find two score functions, one trained with the positive examples only, the other with the negative examples only such that the sign of the difference of the two score functions gives a classifier that is almost optimal with high probability?

The result in the paper is that there exists a class of distributions, parameterized by d (determining the dimension of samples) such that there is a discriminative algor…

We are in a classification setting and the learning problem is defined by a pair of jointly distributed random variables, (X,Y), where Y can take on the values +1 and -1. Question: How many iid copies of this pair does an algorithm need to (i) find a classifier that yields close to optimal performance with high probability (ii) find two score functions, one trained with the positive examples only, the other with the negative examples only such that the sign of the difference of the two score functions gives a classifier that is almost optimal with high probability?

The result in the paper is that there exists a class of distributions, parameterized by d (determining the dimension of samples) such that there is a discriminative algor…