Showing posts from August, 2007

Constrained MDPs and the reward hypothesis

It's been a long time since I last posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov decision processes (CMDPs). The post is prompted by a recent visit to our department by Eugene Feinberg, a pioneer of CMDPs, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper).
For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic prod…

Discriminative vs. generative learning: which one is more efficient?

I just came across a paper by Philip M. Long, Rocco Servedio and Hans Ulrich Simon. (Here is a link to the paper titled "Discriminative Learning can Succeed where Generative Learning Fails".) The question investigated in the paper is the following:
We are in a classification setting, and the learning problem is defined by a pair of jointly distributed random variables, (X,Y), where Y takes the values +1 and -1. Question: How many i.i.d. copies of this pair does an algorithm need in order to (i) find a classifier that, with high probability, performs close to optimally, or (ii) find two score functions, one trained on the positive examples only and the other on the negative examples only, such that the sign of their difference gives a classifier that is almost optimal with high probability?
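To make the two tasks in the question concrete, here is one way to write them down. This is only a sketch in my own notation, not the paper's: the sample size m, the accuracy and confidence parameters ε and δ, the error err(h), the best achievable error err*, and the score functions g_+ and g_- are all names assumed for illustration, and I read "close to optimal" as "within ε of err*".

```latex
% Notation assumed for illustration; it is not taken from the paper.
% Sample: S = ((X_1,Y_1),\dots,(X_m,Y_m)), i.i.d. copies of (X,Y), Y \in \{-1,+1\}.
\[
  \mathrm{err}(h) = \Pr\bigl(h(X) \neq Y\bigr)
\]

% (i) Discriminative task: the learner sees all of S and outputs h_S such that
\[
  \Pr_S\bigl(\mathrm{err}(h_S) \le \mathrm{err}^* + \varepsilon\bigr) \ge 1 - \delta .
\]

% (ii) Generative task: g_+ is computed from S_+ = \{X_i : Y_i = +1\} only,
% g_- is computed from S_- = \{X_i : Y_i = -1\} only, and the classifier is
\[
  h_S(x) = \operatorname{sign}\bigl(g_+(x) - g_-(x)\bigr),
  \qquad
  \Pr_S\bigl(\mathrm{err}(h_S) \le \mathrm{err}^* + \varepsilon\bigr) \ge 1 - \delta .
\]

% In both cases the question is how large m must be for the guarantee to hold.
```

The only difference between the two tasks is the information each component gets to see: in (ii) neither score function ever looks at examples of the other class.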
The result in the paper is that there exists a class of distributions, parameterized by d (which determines the dimension of the samples), such that there is a discriminative algor…