1. The authors develop a stochastic dynamic programming model of grazing behaviour for a generalist mammalian herbivore. The model assumes that behaviour depends on three state variables: stored energy, digestible gut fill and indigestible gut fill. When the plant community comprises two alternative species, the animal must choose among five behaviours: grazing species i, grazing species j, grazing whichever species it encounters, resting or ruminating.
2. The authors use the model to distinguish between diet preference and diet selection. Diet preference is the diet selected by the animal when it is operating under a minimum of environmental constraints, whereas diet selection refers to the way in which environmental constraints modify the animal’s diet preference.
3. Although the model can be used for any mammal grazing in any plant community, the authors demonstrate solutions derived from parameter values relevant to sheep grazing a grass-clover plant community.
4. The model demonstrates that diet preference may depend on the relative intake rates of the two alternative plant species. Furthermore, preference may depend on the absolute intake rates at which the relative comparison is made. The model demonstrates that the optimal diet should have a temporal pattern across the day and that it may be sensitive to predation hazard. The model also predicts total daily intake.
5. The authors use the model to demonstrate that the complex patterns of diet preference are further modified when the total abundance of species in the community (e.g. cover) is taken into account.
6. They explain how the model is heuristic in pointing out reasons why the literature on diet selection in this system, and in herbivores more generally, is equivocal about the basis of selection and preference.
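
The structure described above (three state variables, five behaviours, a within-day time horizon and a predation hazard) can be illustrated with a minimal backward-induction sketch. This is not the authors' implementation: every parameter value, the integer discretisation, the terminal fitness function (end-of-day reserves as a fraction of the maximum), the bite-rejection rule for a full gut, and all names (`solve`, `best`, `BITE`, `COST`, `HAZARD`) are assumptions chosen only to keep the example small. Clover cover is treated, again as an assumption, as the probability that the next plant encountered is clover.

```python
from functools import lru_cache

# All values are hypothetical, chosen only to keep the example small.
T = 24             # decision steps per day
E_MAX = 12         # cap on stored energy (arbitrary units)
GUT_CAP = 6        # gut capacity (digestible + indigestible units)
ABSORB_GAIN = 3    # energy gained per digestible unit absorbed
BITE = {"clover": (2, 0), "grass": (1, 1)}   # (digestible, indigestible)
COST = {"graze": 2, "rest": 1, "ruminate": 1}            # energy per step
HAZARD = {"graze": 0.01, "rest": 0.002, "ruminate": 0.004}  # death risk per step
BEHAVIOURS = ("graze_clover", "graze_grass", "graze_either", "rest", "ruminate")


def solve(clover_cover):
    """Return best(t, e, d, n) -> (expected fitness, optimal behaviour)."""
    p = clover_cover  # assumed chance that the next plant met is clover
    # Each behaviour maps to (activity, [(probability, bite eaten)]).
    outcomes = {
        "graze_clover": ("graze", [(p, BITE["clover"]), (1 - p, (0, 0))]),
        "graze_grass": ("graze", [(1 - p, BITE["grass"]), (p, (0, 0))]),
        "graze_either": ("graze", [(p, BITE["clover"]), (1 - p, BITE["grass"])]),
        "rest": ("rest", [(1.0, (0, 0))]),
        "ruminate": ("ruminate", [(1.0, (0, 0))]),
    }

    def step(e, d, n, activity, bite):
        """State update for one time step, given the bite actually obtained."""
        bd, bn = bite
        if d + n + bd + bn > GUT_CAP:    # bite does not fit: nothing is eaten
            bd, bn = 0, 0
        d, n = d + bd, n + bn
        gain = 0
        if d > 0:                        # absorb one digestible unit per step
            d, gain = d - 1, ABSORB_GAIN
        if activity == "ruminate" and n > 0:
            n -= 1                       # rumination clears indigestible fill
        e = max(0, min(E_MAX, e - COST[activity] + gain))
        return e, d, n

    @lru_cache(maxsize=None)
    def best(t, e, d, n):
        if e <= 0:
            return 0.0, None             # starved
        if t == T:
            return e / E_MAX, None       # assumed terminal fitness
        top_value, top_behaviour = -1.0, None
        for name in BEHAVIOURS:
            activity, branches = outcomes[name]
            # Expectation over which plant is encountered this step.
            expected = sum(prob * best(t + 1, *step(e, d, n, activity, bite))[0]
                           for prob, bite in branches)
            value = (1 - HAZARD[activity]) * expected  # survive predation first
            if value > top_value:        # ties keep the earlier behaviour
                top_value, top_behaviour = value, name
        return top_value, top_behaviour

    return best
```

Calling `solve(0.3)` and then evaluating `best(t, e, d, n)` for `t` from 0 to `T - 1` at a fixed state gives a time-of-day policy that can be compared across cover values or hazard settings. Because the parameters are invented, the resulting policies only illustrate the method and say nothing about sheep.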