In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over a set of all policies with a given expected reward. For continuous‐time MDPs with the discounted criterion and finite‐state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example.