This paper presents a new mathematical programming-based learning methodology for the separation of two types of data. Specifically, we develop a new l1-norm error distance metric and use it to formulate a Mixed 0–1 Integer and Linear Programming (MILP) model that optimizes the interplay of user-provided discriminant functions, including kernel functions for support vector machines, to implement a nonlinear, nonconvex, and/or disjoint decision boundary for the best separation of the data at hand. By concurrently optimizing the discriminant functions, the MILP-based learning methodology can find the optimal and least complex classification rule for noise-free data and implement a robust classification rule for real-life data with noise. Through extensive experiments on the separation of clean and noisy two-dimensional artificial data sets, we graphically illustrate these advantages of the new MILP-based learning methodology. Through experiments on real-life benchmark data sets from the UC Irvine Repository of machine learning databases, in comparison with the multisurface method and support vector machines, we demonstrate the advantage of using and concurrently optimizing more than a single discriminant function for a robust separation of real-life data, and hence the utility of the proposed methodology in supervised learning.
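To make the flavor of such a formulation concrete, the following is a minimal sketch only, not the authors' model: a toy MILP in which 0–1 variables assign each training point to one of several candidate linear discriminant functions, and nonnegative slack variables whose sum acts as an l1-type misclassification error are minimized. The data, the number of candidates K, the big-M constant, the unit margin, and the PuLP modeling library are all assumptions made for this illustration.

```python
# Toy illustration only, not the paper's formulation: binary variables pick one of
# K candidate linear discriminants per point; nonnegative slacks measure the
# violation of a unit margin, and their sum (an l1-type error) is minimized.
import numpy as np
import pulp

rng = np.random.default_rng(0)

# Two-dimensional two-class toy data (15 points per class); assumed for the sketch
X = np.vstack([rng.normal([ 1.5,  1.5], 0.5, (15, 2)),
               rng.normal([-1.5, -1.5], 0.5, (15, 2))]).tolist()
y = [1] * 15 + [-1] * 15

n, d = len(X), 2
K = 2          # number of candidate discriminant functions (assumption)
BIG_M = 100.0  # big-M constant used to deactivate constraints (assumption)

prob = pulp.LpProblem("toy_milp_separation", pulp.LpMinimize)

# Coefficients and intercepts of the K candidate linear discriminants
w = [[pulp.LpVariable(f"w_{k}_{j}", -10, 10) for j in range(d)] for k in range(K)]
b = [pulp.LpVariable(f"b_{k}", -10, 10) for k in range(K)]

# e[i][k]: slack of point i against discriminant k; z[i][k] = 1 if point i
# must be separated correctly by discriminant k
e = [[pulp.LpVariable(f"e_{i}_{k}", lowBound=0) for k in range(K)] for i in range(n)]
z = [[pulp.LpVariable(f"z_{i}_{k}", cat="Binary") for k in range(K)] for i in range(n)]

for i in range(n):
    # Each point is handled by exactly one discriminant
    prob += pulp.lpSum(z[i][k] for k in range(K)) == 1
    for k in range(K):
        margin = y[i] * (pulp.lpSum(w[k][j] * X[i][j] for j in range(d)) + b[k])
        # Enforce margin >= 1 - e[i][k] only when z[i][k] = 1 (big-M relaxation)
        prob += margin >= 1 - e[i][k] - BIG_M * (1 - z[i][k])

# Objective: total l1-type violation across the active assignments
prob += pulp.lpSum(e[i][k] for i in range(n) for k in range(K))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("status   :", pulp.LpStatus[prob.status])
print("objective:", pulp.value(prob.objective))
```

Because each point may be served by a different discriminant, the induced decision boundary can be piecewise and disjoint, which loosely mirrors the nonconvex/disjoint boundaries discussed above; the paper's actual model, error metric, and complexity control are given in the body of the text.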