Constrained MDPs and the reward hypothesis

It has been a long time since I last posted on this blog, but that does not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov decision processes (CMDPs). The post is prompted by a recent visit of Eugene Feinberg, a pioneer of CMDPs, to our department, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic

Bayesian Statistics in Medical Clinical Trials

I came across a very interesting document.
The document is titled "Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials". It is a draft guidance posted by the Center for Devices and Radiological Health of the FDA, dated May 23, 2006.
Why is this interesting? The job of the FDA (the US Food and Drug Administration) is to make sure that the decisions in any clinical trial are made in a scientifically sound manner. Clearly, when following the Bayesian approach, the choice of the prior and the model can influence the decisions. What does the FDA do in this situation?
They establish a process that requires pre-specification of (and agreement on) both the prior and the model, including an analysis of the operating characteristics of the design. The latter includes estimating the probability of erroneously approving an ineffective or unsafe device (the Type I error). This will typically be done by conducting Monte Carlo simulations, where the Type I error is measured for the borderline cases when the device should not be approved. If the estimated Type I error is large, the trial will be rejected.
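To make the simulation step concrete, here is a minimal sketch of how one might estimate the Type I error of a Bayesian design by Monte Carlo. Everything specific in it is my own assumption for illustration and not taken from the guidance: a single-arm trial with a binary outcome, a Beta(a, b) prior with integer parameters, approval when the posterior probability that the success rate exceeds a benchmark p0 is above 0.95, and the borderline case taken to be a true success rate of exactly p0.

```python
import math
import random


def posterior_prob_exceeds(successes, n, p0, a=1, b=1):
    """P(p > p0 | data) under a Beta(a, b) prior with integer a, b >= 1.

    The posterior is Beta(a + successes, b + n - successes). For integer
    parameters its upper tail equals a binomial CDF, so the stdlib suffices.
    """
    alpha = a + successes
    beta = b + n - successes
    m = alpha + beta - 1
    # P(Beta(alpha, beta) > p0) = P(Binomial(m, p0) <= alpha - 1)
    return sum(
        math.comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(alpha)
    )


def estimate_type_one_error(n=100, p0=0.5, threshold=0.95, a=1, b=1,
                            n_sims=5000, seed=0):
    """Monte Carlo estimate of P(approve) when the true rate is exactly p0."""
    rng = random.Random(seed)
    # Pre-compute the (hypothetical) approval decision for every outcome.
    approve = [
        posterior_prob_exceeds(s, n, p0, a, b) > threshold
        for s in range(n + 1)
    ]
    false_approvals = 0
    for _ in range(n_sims):
        # Simulate one trial at the borderline: n patients, success rate p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        false_approvals += approve[successes]
    return false_approvals / n_sims


if __name__ == "__main__":
    print("uniform prior:   ", estimate_type_one_error(a=1, b=1))
    print("optimistic prior:", estimate_type_one_error(a=8, b=2))
```

Running the same simulation with an optimistic prior (here Beta(8, 2), again just an illustrative choice) shows how the choice of prior feeds directly into the estimated Type I error, which is presumably why the prior has to be agreed upon in advance.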
Is this a good procedure? If the simulations use a biased model, then the estimated Type I error might be biased. Their response is that both the prior and the model should be backed up with scientific arguments and existing statistics. Yet another problem is that the calculations often use MCMC. How do you determine whether your samples have converged to the posterior? The samples from the posterior are not i.i.d. How do you know that you took enough samples from the posterior? (Think of a mixture of two Gaussians as the target, with a narrow Gaussian proposal. If you start the chain at a point drawn from the mixture and then sample just a few points with Metropolis-Hastings, you will likely miss the second mode if the two modes are sufficiently far apart.)
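The mixture example is easy to try out. Below is a small sketch of random-walk Metropolis-Hastings on an equal-weight mixture of two unit-variance Gaussians. The mode locations (at -10 and +10), the proposal width (0.5), and the chain length are assumptions I picked for illustration; the point is only that a short chain started near one mode never visits the other.

```python
import math
import random


def log_target(x, mu=10.0):
    """Log density (up to a constant) of 0.5*N(-mu, 1) + 0.5*N(mu, 1)."""
    a = -0.5 * (x + mu) ** 2
    b = -0.5 * (x - mu) ** 2
    m = max(a, b)
    # log-sum-exp to avoid taking the log of an underflowed zero
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(n_steps, x0, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


if __name__ == "__main__":
    samples = metropolis_hastings(n_steps=2000, x0=-10.0)
    frac_right = sum(s > 0 for s in samples) / len(samples)
    # The true mixture puts half of its mass on x > 0.
    print("fraction of samples near the right mode:", frac_right)
```

Any posterior summary computed from such a chain would look perfectly stable while being badly wrong, since half of the probability mass sits in a region the chain never reaches.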
On the other hand, there are a number of potential advantages to a Bayesian design. If we accept that the model and the prior are good, then the Bayesian analysis will often require smaller sample sizes to reach a decision (if they are not good, the conclusion might be wrong). It can also provide flexible methods for handling interim analyses (stopping when enough evidence is available for either approval or rejection), and sometimes good priors are available, for example from earlier studies on previous generations of a device or from overseas studies. Such approaches can be used with a frequentist approach, too, but the frequentist analysis needed to derive a procedure is often non-trivial, while the Bayesian "only" needs to be concerned about computational issues.
The document cites two trials that used Bayesian analysis. It appears that in both studies Bayesian analysis was used only as supplementary information, i.e., the critical decisions (whether a device is safe and minimally effective) were made using traditional, frequentist methods.
Common to both the frequentist and the Bayesian approaches is the use of a number of unverified assumptions. In the frequentist case, if the design is simple, then the typical assumption is only that there is a common underlying distribution of the outcome-patient pairs and that patients are selected uniformly at random from the population. This looks fairly minimal, but can be questioned nevertheless (drifts, environmental effects, sample biases, etc.). In a more complicated scenario there will be more assumptions. If the sets of assumptions behind the methods satisfy some containment relation, then one naturally trusts the method that relies on fewer assumptions. In the absence of containment, the decision of which method to prefer is not so simple.
In any case, it is very interesting to see how a regulatory body (like the FDA) wrestles with these fundamental issues. They seem to act in a pretty reasonable manner. The existence of this document suggests that we should expect to see more decisions based on Bayesian analysis in the future. Is this good or bad? One could be concerned about the use of more unverified assumptions in the Bayesian analysis, and about the probability of making an error also increasing because the calculations are non-trivial. Life is dangerous, isn't it? But how dangerous will it be if Bayesian analysis is used routinely in assessing success in clinical trials? Time will tell for sure. Well, assuming some form of stationarity.