Constrained MDPs and the reward hypothesis

It's been a long time since I last posted on this blog. But this should not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back and today I want to write about constrained Markov decision processes (CMDPs). The post is prompted by a recent visit to our department by Eugene Feinberg, a pioneer of CMDPs, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic

Numerical Errors, Perturbation Analysis and Machine Learning

Everyone hates numerical errors. We love to think that computers are machines with infinite precision. When I was a student, I really hated error analysis. It sounded like a subject that sets out to study an annoying side-effect of our imperfect computers, a boring detail that is miles away from anything that anyone would ever consider a nice part of mathematics. I will not try to convince you today that the opposite is true. However, even in error analysis there are some nice ideas and lessons to be learned. This post asks whether, if you are doing machine learning, you should care about numerical errors. This issue should be well understood. However, I don't think that it is as well appreciated as it should be, or that it has received the attention it deserves. In fact, I doubt that the issue is discussed in any of the recent machine learning textbooks beyond the usual caveat "beware the numerical errors" (scary!). In this blog, I will illustrate the questi

Student Response Systems -- A Year Later

I promised to come back with my experience after the Fall semester of 2013. Though the end of that semester passed long ago, here are some thoughts: Overall, my experience with Socrative was very positive. During the first week I polled the class to see what percentage of the students had some kind of wifi-enabled device that they could use. 90 out of the 95 students had some kind of device that they were willing to bring to the class, so I decided to give Socrative a try. Socrative helped me tremendously to stay on top of what everyone in the class knows. The way I used Socrative was as follows: After every block of new material (usually 5-10 slides), I inserted a few questions to verify whether the students "got" the new material. This was all the "extra work" that I had to put into designing the class material because of Socrative. And I would have done something similar without Socrative anyway, so in fact I did not feel much of a difference here. Once t

Student Response Systems

There are plenty of Student Response Systems (SRSs) out there. Which one to choose? This brief document summarizes what I found on the web before the Fall of 2012.

Radio communication-based systems

The systems differ in a few characteristics. The first is what type of devices the students can use. Classical, iClicker-like systems require the student to buy a device that communicates with a receiver that the teacher must have. Since our school uses iClickers, let me focus on them. The overall cost of first-generation iClickers is $10/device, assuming the student sells the device back to the bookstore (which they can). This is not a major cost but who likes to pay when they don’t have to? The first limitation of iClicker-like systems is that they are tied to the smart classrooms and the computers there. Thus, if you are like me and use your own computer for projection, you will need to switch between screens to show the results of a poll. This makes the use of iCli