University of California San Diego

****************************

Math 296 - Graduate Student Colloquium

Prof. Yuhau Zhu

UC San Diego

A PDE-based Bellman Equation for Continuous-time Reinforcement Learning

Abstract:

In this talk, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and we have access only to discrete-time information, how can we effectively conduct policy evaluation? We first demonstrate that the solution to the commonly used Bellman equation is a first-order approximation to the true value function. We then introduce a higher-order PDE-based Bellman equation called PhiBE. We show that the solution to the i-th order PhiBE is an i-th order approximation to the true value function. Additionally, even the first-order PhiBE outperforms the Bellman equation in approximating the true value function when the system dynamics change slowly. We develop a numerical algorithm based on the Galerkin method to solve PhiBE when we possess only discrete-time trajectory data. Numerical experiments are provided to validate the theoretical guarantees we propose.
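
As a rough illustration of the first-order claim (not taken from the talk), the sketch below compares the true value function with the solution of a discrete-time Bellman equation on a toy problem. All specifics are assumptions made for the example: deterministic linear dynamics ds = lam * s dt (an SDE with zero noise), reward r(s) = s, discount rate beta > lam, and a Bellman equation of the form V(s) = r(s) dt + exp(-beta dt) V(s_dt). Under these assumptions both value functions are linear in s, so only their coefficients need to be compared.

```python
import math

def true_value_coeff(beta, lam):
    # True value: V(s) = integral_0^inf exp(-beta t) * s * exp(lam t) dt = s / (beta - lam).
    # Requires beta > lam for the integral to converge.
    return 1.0 / (beta - lam)

def bellman_value_coeff(beta, lam, dt):
    # Assumed discrete-time Bellman equation: V(s) = r(s) * dt + exp(-beta dt) * V(s_dt),
    # with s_dt = s * exp(lam dt). The ansatz V(s) = c * s gives
    # c = dt / (1 - exp(-(beta - lam) dt)).
    return dt / (1.0 - math.exp(-(beta - lam) * dt))

beta, lam = 1.0, 0.2  # hypothetical parameter values
errors = {}
for dt in (0.1, 0.05, 0.025):
    errors[dt] = abs(bellman_value_coeff(beta, lam, dt) - true_value_coeff(beta, lam))
    print(f"dt = {dt:<6} error in coefficient = {errors[dt]:.6f}")
```

In this toy setting, halving dt roughly halves the error, which is what a first-order approximation in the time step looks like.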

Host: Jon Novak

February 14, 2024

3:00 PM

Remote access via Zoom
https://ucsd.zoom.us/j/6203698666

****************************