We consider linear classification, in which two sets of points in d-dimensional space are separated by a hyperplane. We seek the hyperplane that minimises the sum of distances from all misclassified points to the hyperplane. To this end, two local descent methods are developed, one grid-based and one based on optimisation theory, and both are embedded into a variable neighbourhood search (VNS) metaheuristic scheme. Computational results show these approaches to be complementary, which leads to a single hybrid VNS strategy that combines both to exploit the strengths of each. Extensive computational tests show that the resulting method can always be expected to come close enough to the global optimum that any remaining deviation is irrelevant with respect to classification power.
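
To make the objective concrete, the following minimal sketch evaluates the sum of distances from misclassified points to a candidate hyperplane. It rests on assumptions not stated above: distances are taken to be Euclidean, the hyperplane is written as w·x = b, and points of the first set are treated as correctly classified when w·x ≤ b and points of the second set when w·x ≥ b. The function name `misclassification_distance` and its parameters are hypothetical, chosen for illustration only; this is not the descent or VNS procedure itself, merely the quantity such a procedure would minimise.

```python
import math


def misclassification_distance(points_a, points_b, w, b):
    """Sum of Euclidean distances from misclassified points to the hyperplane w.x = b.

    Assumed convention (not from the source): points in set A belong on the
    side w.x <= b, and points in set B on the side w.x >= b.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be a non-zero normal vector")

    total = 0.0
    for x in points_a:
        signed = sum(wi * xi for wi, xi in zip(w, x)) - b
        # A-points on the positive side are misclassified.
        total += max(0.0, signed) / norm
    for x in points_b:
        signed = sum(wi * xi for wi, xi in zip(w, x)) - b
        # B-points on the negative side are misclassified.
        total += max(0.0, -signed) / norm
    return total


# Small one-dimensional-like example embedded in the plane.
set_a = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
set_b = [(4.0, 0.0), (5.0, 0.0), (2.0, 0.0)]
print(misclassification_distance(set_a, set_b, w=(1.0, 0.0), b=2.5))
```

Dividing by the norm of w makes the value depend only on the hyperplane and not on how its normal vector is scaled, so (w, b) and (2w, 2b) yield the same objective.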