Suppose $Z_t$ is the square of a time series $Y_t$ whose conditional mean is zero. We do not specify a model for $Y_t$, but assume that there exists a $p \times 1$ parameter vector $\Phi$ such that the conditional distribution of $Z_t \mid \mathbf{Z}_{t-1}$ is the same as that of $Z_t \mid \Phi^T \mathbf{Z}_{t-1}$, where $\mathbf{Z}_{t-1} = (Z_{t-1}, \ldots, Z_{t-p})^T$ for some lag $p \ge 1$. Consequently, the conditional variance of $Y_t$ is some function of $\Phi^T \mathbf{Z}_{t-1}$. To estimate $\Phi$, we propose a robust estimation methodology based on density power divergences (DPD) indexed by a tuning parameter $\alpha \in [0,1]$, which yields a continuum of estimators, $\{\hat{\Phi}_\alpha;\ \alpha \in [0,1]\}$, where $\alpha$ controls the trade-off between robustness and efficiency of the DPD estimators. For each $\alpha$, $\hat{\Phi}_\alpha$ is shown to be strongly consistent. We develop data-dependent criteria for the selection of optimal $\alpha$ and lag $p$ in practice. We illustrate the usefulness of our DPD methodology via simulation studies for ARCH-type models, where the errors are drawn from a gross-error contamination model and the conditional variance is a linear and/or nonlinear function of $\Phi^T \mathbf{Z}_{t-1}$. Furthermore, we analyze the Chicago Board Options Exchange Dow Jones volatility index data and show that our DPD approach yields viable models for the conditional variance, which are as good as, or superior to, ARCH/GARCH models and two other divergence-based models in terms of in-sample and out-of-sample forecasts.
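To make the notation concrete, the following is a minimal Python sketch of the kind of data the simulation setting describes: an ARCH-type series with gross-error contaminated errors, its squares $Z_t$, the lag vectors $\mathbf{Z}_{t-1}$, and the index $\Phi^T \mathbf{Z}_{t-1}$. It is not the DPD estimator itself. The function names, the intercept `omega`, the contamination fraction `eps`, the outlier scale `tau`, and the values of `phi` are all illustrative assumptions, not settings taken from the study.

```python
import numpy as np


def simulate_contaminated_arch(n, phi, omega=0.1, eps=0.05, tau=5.0,
                               burn=200, seed=0):
    """Simulate Y_t = sqrt(h_t) * e_t with h_t = omega + phi^T Z_{t-1}.

    Z_t = Y_t^2 and Z_{t-1} = (Z_{t-1}, ..., Z_{t-p}). The errors follow a
    gross-error model: N(0, 1) with probability 1 - eps, N(0, tau^2) with
    probability eps. All default values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    total = n + burn

    # Gross-error contamination: inflate the scale of a fraction eps of errors.
    is_outlier = rng.random(total) < eps
    e = rng.standard_normal(total) * np.where(is_outlier, tau, 1.0)

    y = np.zeros(total + p)  # the first p entries are zero start-up values
    for t in range(p, total + p):
        z_lags = y[t - p:t][::-1] ** 2   # (Z_{t-1}, ..., Z_{t-p})
        h = omega + phi @ z_lags         # linear conditional variance
        y[t] = np.sqrt(h) * e[t - p]
    return y[-n:]                        # drop start-up and burn-in values


def lag_matrix(z, p):
    """Return (targets, lags): targets are Z_t, and each row of lags is
    the matching lag vector (Z_{t-1}, ..., Z_{t-p})."""
    n = z.size
    lags = np.column_stack([z[p - k:n - k] for k in range(1, p + 1)])
    return z[p:], lags


phi = np.array([0.2, 0.1])               # hypothetical parameter vector, p = 2
y = simulate_contaminated_arch(1000, phi)
z = y ** 2                               # Z_t, the squared series
z_t, z_lags = lag_matrix(z, p=2)
index = z_lags @ phi                     # the scalar index Phi^T Z_{t-1}
```

Here the conditional variance is the linear function `omega + index`. A nonlinear variant would replace that line with another positive function of the same index, which is the sense in which the conditional distribution of $Z_t$ depends on the lags only through $\Phi^T \mathbf{Z}_{t-1}$.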