Learn about a new method for finding approximate solutions to large Markov decision problems in this Distinguished Lecture Series event hosted by computer science and engineering Professor Stephanie Gil.
Distinguished Lecture Series: Feature-Based Aggregation and Deep Reinforcement Learning
Presented by Dimitri P. Bertsekas, Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology in Cambridge, MA.
Thursday, April 26, 2018
2:30 p.m.
College Avenue Commons (CAVC) 101, Tempe campus
Abstract
MIT Professor Dimitri P. Bertsekas will provide an overview of policy iteration/self-learning methods for the approximate solution of large Markov decision problems. The lecture will focus on schemes that combine ideas from two major but heretofore unrelated approaches: feature-based aggregation, which has a long history in large-scale dynamic programming, and reinforcement learning based on deep neural networks, which has recently achieved spectacular success in games such as chess and Go.
Bertsekas will introduce features of the states of the original problem and formulate a smaller “aggregate” Markov decision problem whose states are representative features. The solution of this aggregate problem is then used as the basis for a new type of policy improvement, with the features provided by a neural network.
He argues that the cost function of a policy is approximated more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by deep reinforcement learning, potentially leading to more effective policy iteration algorithms.
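To make the aggregation idea concrete, the sketch below shows one simple instance, hard aggregation, for a small discounted Markov decision problem in NumPy. It is only an illustration under stated assumptions: the names P, g, phi, alpha, and K are hypothetical and not notation from the lecture, the feature map is taken as given (the lecture's neural-network feature construction is omitted), and each feature class is assumed to be nonempty.

```python
import numpy as np

def solve_aggregate_mdp(P, g, alpha, phi, K, iters=500):
    """Hard-aggregation sketch (illustrative, not the lecturer's code).

    P     : transition probabilities, shape (m, n, n), P[a, i, j]
    g     : stage costs, shape (m, n), g[a, i]
    alpha : discount factor in (0, 1)
    phi   : feature map assigning each of the n states to one of K classes
    K     : number of representative feature states (each class nonempty)
    """
    m, n, _ = P.shape

    # Disaggregation matrix D: spread each feature state uniformly over its members.
    D = np.zeros((K, n))
    for k in range(K):
        members = np.flatnonzero(phi == k)
        D[k, members] = 1.0 / len(members)

    # Aggregation matrix Phi: each original state maps to its feature class.
    Phi = np.zeros((n, K))
    Phi[np.arange(n), phi] = 1.0

    # Aggregate MDP over the K feature states: transition probabilities and costs.
    P_agg = np.einsum('ki,aij,jl->akl', D, P, Phi)   # shape (m, K, K)
    g_agg = np.einsum('ki,ai->ak', D, g)             # shape (m, K)

    # Solve the (much smaller) aggregate problem by value iteration.
    r = np.zeros(K)
    for _ in range(iters):
        r = np.min(g_agg + alpha * P_agg @ r, axis=0)

    # Lift the aggregate costs back: the approximation is piecewise constant
    # over feature classes, i.e., a nonlinear function of the features.
    J_tilde = Phi @ r

    # One policy improvement step using the aggregation-based cost approximation.
    Q = g + alpha * np.einsum('aij,j->ai', P, J_tilde)
    policy = np.argmin(Q, axis=0)
    return J_tilde, policy
```

The lift step (Phi @ r) is what makes the approximate cost piecewise constant over the feature classes, which is one way to read the abstract's contrast between a nonlinear, aggregation-based approximation and a linear function of the same features.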
About the speaker
Dimitri Bertsekas is McAfee Professor of electrical engineering and computer science at the Massachusetts Institute of Technology, where he has taught since 1979. His research has been in optimization, control, and their applications.
He has written several textbooks and monographs in these areas, including the widely used textbooks “Introduction to Probability,” “Nonlinear Programming,” “Convex Optimization Algorithms,” and “Dynamic Programming and Optimal Control.”
His awards include the INFORMS 1997 Prize for Research Excellence in the interface between operations research and computer science for his book “Neuro-Dynamic Programming” (co-authored with John Tsitsiklis), the 2014 ACC Richard E. Bellman Control Heritage Award for “contributions to the foundations of deterministic and stochastic optimization-based methods in systems and control,” the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, and the SIAM/MOS 2015 George B. Dantzig Prize.
In 2001, he was elected to the United States National Academy of Engineering for “pioneering contributions to fundamental research, practice and education of optimization/control theory, and especially its application to data communication networks.”
His most recent book, “Abstract Dynamic Programming” (Athena Scientific, 2018), explores theoretical issues in dynamic programming with implications for deep reinforcement learning.