On Boundedness of Q‐Learning Iterates for Stochastic Shortest Path Problems

Article ID:	iaor20132376
Volume:	38
Issue:	2
Start Page Number:	209
End Page Number:	227
Publication Date:	May 2013
Journal:	Mathematics of Operations Research
Authors:	Yu Huizhen, Bertsekas Dimitri P
Keywords:	networks: path, programming: critical path

Abstract:

We consider a totally asynchronous stochastic approximation algorithm, Q‐learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost‐free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q‐learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis (1994) and establishing completely the convergence of Q‐learning for these SSP models.
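
As an illustration of the setting the abstract describes, the following is a minimal sketch of undiscounted Q-learning on a toy SSP. It is not the authors' construction or proof: the three-state model, its costs and transition probabilities, the names `MODEL`, `sample`, and `q_learning`, the uniformly random choice of which state-action pair to update, and the 1/n step size are all assumptions made for the example. State 2 plays the role of the absorbing, cost-free state, and the function also records the largest iterate magnitude seen, which is the quantity the boundedness result concerns.

```python
import random

# Hypothetical toy SSP (not from the paper): states 0 and 1 are ordinary
# states, state 2 is the absorbing, cost-free termination state.
TERMINAL = 2

# MODEL[state][action] = list of (probability, next_state, one_stage_cost)
MODEL = {
    0: {0: [(1.0, 1, 1.0)],
        1: [(0.5, TERMINAL, 4.0), (0.5, 0, 4.0)]},
    1: {0: [(0.8, TERMINAL, 1.0), (0.2, 0, 1.0)],
        1: [(1.0, TERMINAL, 3.0)]},
}


def sample(rng, state, action):
    """Draw (next_state, cost) from the assumed transition model."""
    u = rng.random()
    cumulative = 0.0
    for prob, nxt, cost in MODEL[state][action]:
        cumulative += prob
        if u < cumulative:
            return nxt, cost
    return nxt, cost  # guard against floating-point rounding


def q_learning(num_updates=20000, seed=0):
    """Run undiscounted Q-learning, updating one random pair per step.

    Returns the final Q-factors and the largest |Q| observed at any time.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in MODEL for a in MODEL[s]}
    counts = {pair: 0 for pair in q}
    max_abs = 0.0
    for _ in range(num_updates):
        s = rng.choice(list(MODEL))
        a = rng.choice(list(MODEL[s]))
        nxt, cost = sample(rng, s, a)
        # The termination state has zero cost-to-go; no discount factor.
        if nxt == TERMINAL:
            target = cost
        else:
            target = cost + min(q[(nxt, b)] for b in MODEL[nxt])
        counts[(s, a)] += 1
        step = 1.0 / counts[(s, a)]  # assumed diminishing step size
        q[(s, a)] += step * (target - q[(s, a)])
        max_abs = max(max_abs, max(abs(v) for v in q.values()))
    return q, max_abs


if __name__ == "__main__":
    q, max_abs = q_learning()
    for pair in sorted(q):
        print(pair, round(q[pair], 3))
    print("largest |Q| observed:", round(max_abs, 3))
```

For this toy model, solving the Bellman equation by hand gives optimal Q-factors of 2.5 and 5.25 in state 0 and 1.5 and 3.0 in state 1, which the printed estimates can be compared against. Boundedness here is easy to see directly because all costs are nonnegative and every policy reaches termination; the paper addresses the general SSP models in which this is not immediate.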