The Dynamics of Reinforcement Learning
[CFAR Outstanding PhD Student Seminar Series]
The Dynamics of Reinforcement Learning by Ms Clare Lyle
06 Apr 2022 | 4:30pm (Singapore Time)
While supervised learning requires the network to fit a fixed target function, value-based deep reinforcement learning (RL) agents must learn to predict a sequence of distinct value functions whose structure may change dramatically over the course of training.
The study centres on an analysis of an idealised model of the dynamics induced by temporal-difference methods, from which we corroborate two principal conclusions about deep RL agents. When rewards are sparse, we find that these learning dynamics can lead to representation collapse, resulting in agents which fail to perform policy improvement steps later in training.
When rewards are discontinuous, we find that the structure of early prediction targets discourages generalisation of value function updates between states, resulting in networks which ‘memorise’ the value function on the training environment but struggle to generalise to new observations. We show that straightforward regularisation and distillation methods can mitigate both of these failure modes, demonstrating that the pitfalls induced by the learning dynamics of deep RL agents can be avoided with appropriate modifications to the training procedure.
In this talk, Ms Clare Lyle will present a line of work which studies the implications of this observation for both the stability and the generalisation properties of value-based deep RL algorithms.
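As a rough illustration of the moving-target problem described in the abstract above (a minimal sketch, not material from the talk), the Python snippet below shows a semi-gradient TD(0) update with a linear value function; all names and constants are illustrative assumptions. The regression target bootstraps from the agent's own current predictions, so it shifts whenever the weights change, unlike the fixed targets of supervised learning.

import numpy as np

# Minimal, illustrative sketch of a semi-gradient TD(0) update with a
# linear value function. Sizes and constants below are assumptions.
rng = np.random.default_rng(0)
n_features, alpha, gamma = 8, 0.1, 0.99
w = np.zeros(n_features)                    # value-function weights

def value(phi, w):
    return phi @ w                          # linear value estimate V(s) = phi(s) . w

def td_update(w, phi, reward, phi_next):
    # The target bootstraps from the agent's *current* estimate of the next
    # state's value, so it moves as w moves -- the "sequence of distinct
    # value functions" referred to above.
    target = reward + gamma * value(phi_next, w)
    td_error = target - value(phi, w)
    return w + alpha * td_error * phi       # semi-gradient step

# One transition with random features and a sparse (zero) reward.
phi, phi_next = rng.random(n_features), rng.random(n_features)
w = td_update(w, phi, reward=0.0, phi_next=phi_next)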
SPEAKER
Ms Clare Lyle
PhD student, University of Oxford
Open Philanthropy AI Fellow and a Rhodes Scholar
Clare Lyle is a final-year DPhil student at the University of Oxford, advised by Marta Kwiatkowska and Yarin Gal. Her research interests focus on developing principled approaches to training models which generalise well to unseen data, with applications in both supervised and reinforcement learning. Prior to starting at Oxford, she obtained a BSc in Mathematics and Computer Science at McGill University. She is an Open Philanthropy AI Fellow and a Rhodes Scholar.