Academic Lectures

[40th Anniversary Academic Event] Liyuan Distinguished Scholar Lecture Series, No. 15: q-Learning in Continuous Time

Posted: 2023-06-15 20:39


Shenzhen University 40th Anniversary and the 40th Anniversary of the Mathematics Discipline

Liyuan Distinguished Scholar Lecture No. 15


Lecture title: q-Learning in Continuous Time

Speaker: Professor Xun Yu Zhou (Columbia University)

Lecture time: June 17, 2023, 14:30-15:30

Venue: Main Hall, International Conference Hall, Yuehai Campus, Shenzhen University

Abstract: We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term “(little) q-function”. This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a “q-learning” theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor–critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022) and time-discretized conventional Q-learning algorithms. This is joint work with Yanwei Jia.

About the speaker: Xun Yu Zhou is the Liu Family Professor of Financial Engineering and the Director of the Nie Center for Intelligent Asset Management at Columbia University. He was the Nomura Professor of Mathematical Finance at the University of Oxford before joining Columbia in 2016. His research covers stochastic control, dynamic portfolio selection, asset pricing, behavioral finance, and time inconsistency. Currently his research focuses on continuous-time reinforcement learning and applications to optimization broadly and to wealth management specifically. He is a recipient of the Wolfson Research Award from The Royal Society, the Outstanding Paper Prize from SIAM, the Alexander von Humboldt Research Fellowship, and the Croucher Senior Research Fellowship. He was an invited speaker at the 2010 International Congress of Mathematicians, a Humboldt Distinguished Lecturer at Humboldt University, and an Archimedes Lecturer at Columbia. He is both an IEEE Fellow and a SIAM Fellow. Xun Yu Zhou received his PhD in Operations Research and Control Theory from Fudan University in 1989.

All faculty and students are welcome!


June 15, 2023