The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints. Computational tests of these three approaches have been carried out on publicly available real-world databases, and the results are compared with an adaptation of the optimal brain damage method for reducing neural network complexity. One of the approaches, feature selection via concave minimization, reduced cross-validation error on a cancer prognosis database by 35.4% while reducing the number of problem features from 32 to 4.
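For concreteness, the following is a hedged sketch of the kind of parametric formulation described above; the symbols A, B, m, k, w, gamma, lambda, alpha, and epsilon are introduced here for illustration, following standard linear-separation notation rather than quoting the paper's exact statement. With A (m x n) and B (k x n) holding the two point sets and a candidate separating plane x^{\top}w = \gamma, one such formulation is
\[
\min_{w,\gamma,y,z}\;(1-\lambda)\left(\frac{e^{\top}y}{m}+\frac{e^{\top}z}{k}\right)+\lambda\, e^{\top}|w|_{*}
\quad\text{s.t.}\quad -Aw+e\gamma+e\le y,\;\; Bw-e\gamma+e\le z,\;\; y\ge 0,\;\; z\ge 0,
\]
where \(e\) is a vector of ones, \(y\) and \(z\) measure the classification violations of the two sets, \(\lambda\in[0,1)\) trades off separation error against feature suppression, and \(|w|_{*}\) applies the step function \(t_{*}\) componentwise to \(|w|\) (\(t_{*}=1\) for \(t>0\), \(t_{*}=0\) for \(t\le 0\)), so that \(e^{\top}|w|_{*}\) counts the features used by the plane. The step function may then be smoothed on \(t\ge 0\), for instance by the concave exponential
\[
t_{*}\;\approx\;1-\varepsilon^{-\alpha t},\qquad \alpha>0,
\]
or by a sigmoid, giving a tractable approximation of the feature-counting term.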