This paper considers manufacturing systems with a failure-prone machine and stochastic demand, in which production is controlled by a kanban scheme. Although kanban control is widely applied, the problem of determining the optimal number of circulating kanbans in a stochastic setting remains unsolved except in some special cases. Seeks an optimal kanban-controlled policy that minimizes the long-run average inventory and backlog costs. Uses perturbation analysis to estimate the gradient of the cost functional with respect to the number of circulating kanbans, and then employs an iterative algorithm, a constant step-size stochastic approximation procedure, to find the optimal number of circulating kanbans. Proves that the perturbed path is a shifted version of the nominal path and that the gradient estimate is consistent. Also conducts numerical experiments to investigate the performance of the proposed algorithms.
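As a rough illustration of the constant step-size stochastic approximation idea, the following minimal Python sketch iterates theta <- theta - step * g_hat on a noisy gradient estimate. It is not the paper's algorithm: the quadratic toy cost, the Gaussian noise standing in for a perturbation-analysis gradient estimate, the continuous relaxation of the kanban count, and all names (`constant_step_sa`, `make_toy_gradient`, `optimum`, `noise_sd`) are assumptions made purely for illustration.

```python
import random


def constant_step_sa(grad_estimate, theta0, step, n_iter, lower=0.0):
    """Constant step-size stochastic approximation.

    Repeats theta <- theta - step * g_hat, where g_hat is a noisy
    gradient estimate, and projects theta back onto [lower, inf).
    """
    theta = theta0
    for _ in range(n_iter):
        g_hat = grad_estimate(theta)
        theta = max(lower, theta - step * g_hat)
    return theta


def make_toy_gradient(optimum, noise_sd, rng):
    """Hypothetical stand-in for a perturbation-analysis estimate.

    Returns the derivative of the toy cost (theta - optimum)**2
    plus zero-mean Gaussian noise. A real estimate would instead be
    computed along a simulated sample path of the system.
    """
    def grad_estimate(theta):
        return 2.0 * (theta - optimum) + rng.gauss(0.0, noise_sd)
    return grad_estimate


rng = random.Random(0)  # fixed seed so the run is reproducible
theta_hat = constant_step_sa(
    make_toy_gradient(optimum=7.0, noise_sd=1.0, rng=rng),
    theta0=1.0,
    step=0.05,
    n_iter=2000,
)

# Assumption: treat the kanban count as a continuous parameter during
# the iteration and round to a positive integer at the end.
kanbans = max(1, round(theta_hat))
print(theta_hat, kanbans)
```

With a constant step size the iterate does not settle at a single point but keeps fluctuating in a neighborhood of the minimizer; a smaller step narrows that neighborhood at the cost of slower progress.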