The authors consider a class of Markov decision processes with finite state and action spaces determined, essentially, by the following condition: the state space is irreducible under every stationary policy. Apart from this restriction, the transition law is completely unknown to the controller. In this context, the authors identify a set of policies under which the frequency estimators of the transition law are strongly consistent, and they then apply this result to construct adaptive asymptotically discount-optimal policies.
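The frequency estimator referred to above is the empirical relative frequency of observed transitions: the estimate of p(s' | s, a) is the number of observed transitions from state s under action a to state s', divided by the number of times the pair (s, a) was used. The sketch below illustrates this idea only; the function name and the uniform fallback for unvisited state-action pairs are assumptions of this illustration, not part of the paper.

```python
from collections import defaultdict

def frequency_estimator(transitions, states):
    """Estimate p(s' | s, a) by relative frequencies of observed transitions.

    `transitions` is a list of observed (s, a, s') triples; `states` is the
    finite state space. Unvisited (s, a) pairs fall back to a uniform
    estimate (an arbitrary convention chosen for this sketch).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1

    def p_hat(s, a, s_next):
        total = sum(counts[(s, a)].values())
        if total == 0:
            return 1.0 / len(states)  # no data for (s, a): uniform guess
        return counts[(s, a)][s_next] / total

    return p_hat
```

Strong consistency then means that, under the policies the authors identify, every state-action pair is visited infinitely often, so these relative frequencies converge almost surely to the true transition probabilities.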