Musings about machine learning and other things

Showing posts from April, 2008

Constrained MDPs and the reward hypothesis

- March 20, 2020

It has been a long time since I last posted on this blog, but that does not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markovian Decision Processes (CMDPs). The post is prompted by a recent visit to our department by Eugene Feinberg, a pioneer of CMDPs, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic ...

Ninja Carburglars

- April 05, 2008

see more crazy cat pics

