CSC6129/DDA6105 Reinforcement Learning: Home

Course Description

This course is a basic introduction to reinforcement learning algorithms and their applications. Topics include: multi-armed bandits; finite Markov decision processes; dynamic programming; Monte-Carlo methods; temporal-difference learning; actor-critic methods; off-policy learning; introduction to deep variants of the aforementioned algorithms, including deep Q-learning, policy gradient methods, and actor-critic methods.
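For a concrete taste of the tabular methods listed above, here is a minimal sketch of Q-learning, a temporal-difference algorithm, on a toy five-state chain; the environment, rewards, and hyperparameters are illustrative assumptions, not course material.

```python
# Tabular Q-learning sketch on a 5-state chain MDP (illustrative only).
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode with reward +1
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Optimistic initialization encourages early exploration toward the goal.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic chain dynamics: reward +1 only on reaching state 4."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # TD(0) update toward the bootstrapped target
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The learned greedy policy should move right in every non-terminal state.
print({s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)})
```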

Recommended Books

Reinforcement Learning, Second Edition

Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has been significantly expanded and updated, adding new topics and refreshing the coverage of existing ones.

Bandit Algorithms

Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics.
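As a small illustration of the stochastic setting the book analyzes, the following sketch runs the classic UCB1 index policy on three hypothetical Bernoulli arms; the arm means and horizon are made-up assumptions.

```python
# UCB1 on a 3-armed stochastic Bernoulli bandit (illustrative only).
import math
import random

means = [0.3, 0.5, 0.7]           # hypothetical arm means, unknown to the learner
counts = [0] * len(means)         # number of pulls per arm
sums = [0.0] * len(means)         # total reward per arm

def ucb_index(i, t):
    # empirical mean plus an optimism bonus that shrinks with more samples
    return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

T = 10_000
for t in range(1, T + 1):
    if t <= len(means):
        arm = t - 1               # pull each arm once to initialize
    else:
        arm = max(range(len(means)), key=lambda i: ucb_index(i, t))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

# Regret measures reward lost relative to always pulling the best arm.
regret = max(means) * T - sum(sums)
print(f"pulls per arm: {counts}, empirical regret: {regret:.1f}")
```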

Dynamic Programming and Optimal Control

The first of the two volumes of the leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an introduction to the far-reaching methodology of Neuro-Dynamic Programming. The first volume is oriented towards modeling, conceptualization, and finite-horizon problems, but also includes a substantive introduction to infinite-horizon problems that is suitable for classroom use. The second volume is oriented towards mathematical analysis and computation, and treats infinite-horizon problems extensively. The text contains many illustrations, worked-out examples, and exercises.
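To make the basic recursion concrete, here is a minimal sketch of value iteration, the core dynamic-programming backup, on a made-up two-state, two-action MDP; all transition and reward numbers are purely illustrative.

```python
# Value iteration on a toy 2-state, 2-action MDP (illustrative only).
import numpy as np

# P[a][s, s'] = transition probability; R[a][s] = expected one-step reward.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.1, 0.9], [0.7, 0.3]])}
R = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}
GAMMA = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    V_new = np.max([R[a] + GAMMA * P[a] @ V for a in P], axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:   # converged to the fixed point
        break
    V = V_new

policy = np.argmax([R[a] + GAMMA * P[a] @ V for a in P], axis=0)
print("V* =", V, "greedy policy =", policy)
```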

Neuro-Dynamic Programming

This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

Neuro-dynamic programming uses neural network approximations to overcome the "curse of dimensionality" and the "curse of modeling" that have been the bottlenecks to the practical application of dynamic programming and stochastic control to complex problems. The methodology allows systems to learn about their behavior through simulation, and to improve their performance through iterative reinforcement.
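The following sketch illustrates that idea in miniature: a linear approximator (standing in for a neural network) is trained by semi-gradient TD(0) on simulated trajectories of a random walk, so values are learned from simulation rather than from a full tabular model; the environment, features, and step sizes are all assumptions for illustration.

```python
# Semi-gradient TD(0) with a linear value approximator (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, GAMMA, ALPHA = 19, 1.0, 0.01   # 19-state random walk, undiscounted

def features(s):
    # coarse state aggregation: 5 features instead of one table entry per state,
    # a crude stand-in for the compact parametric approximators the book studies
    x = np.zeros(5)
    x[min(s // 4, 4)] = 1.0
    return x

w = np.zeros(5)                   # weights of the approximate value function
for episode in range(2000):
    s = N // 2                    # start each simulated episode in the middle
    while True:
        s2 = s + rng.choice([-1, 1])
        done = s2 < 0 or s2 >= N
        r = 1.0 if s2 >= N else 0.0          # +1 for exiting on the right
        v_next = 0.0 if done else w @ features(s2)
        # semi-gradient TD(0): nudge w toward the bootstrapped target
        td_error = r + GAMMA * v_next - w @ features(s)
        w += ALPHA * td_error * features(s)
        if done:
            break
        s = s2

# True values rise roughly linearly from left to right; the approximation
# recovers them up to the resolution of the aggregated features.
print("approximate values:", [round(w @ features(s), 2) for s in range(0, N, 4)])
```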
