In this work, we present a review and comprehensive evaluation of heuristics and metaheuristics for the m-machine flow shop scheduling problem with the objective of minimising total tardiness. Published reviews on this objective usually deal with single-machine or parallel-machine settings and do not compare recent methods. Moreover, existing reviews do not use a common benchmark of instances, which makes their results difficult to reproduce and generalise. We have implemented 40 different heuristics and metaheuristics and analysed their performance on the same benchmark of instances to provide a global and fair comparison. The comparison ranges from classical priority rules to the most recent tabu search, simulated annealing and genetic algorithms. The evaluation relies on a design of experiments approach and careful statistical analyses to validate the effectiveness of the methods tested. The results allow us to clearly identify the state-of-the-art methods.
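To make the objective concrete, the following is a minimal sketch of how total tardiness can be evaluated for a given job sequence. It assumes a permutation flow shop (every machine processes the jobs in the same order), all jobs available at time zero, and integer processing times and due dates. The function names, the data layout `p[j][i]` and the tiny instance are hypothetical and used only for illustration. The Earliest Due Date (EDD) rule is shown as one example of a classical priority rule; it is not presented as any specific method from the comparison.

```python
from typing import Dict, List, Sequence


def completion_times(perm: Sequence[int], p: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Completion time of each job on the last machine.

    Hypothetical layout: p[j][i] is the processing time of job j on machine i.
    Assumes a permutation flow shop with all jobs released at time zero.
    """
    m = len(p[0])
    # c[i] holds the completion time of the most recently scheduled job on machine i.
    c = [0] * m
    finish: Dict[int, int] = {}
    for j in perm:
        for i in range(m):
            # Job j starts on machine i once machine i is free and
            # job j has finished on machine i - 1.
            ready = c[i - 1] if i > 0 else 0
            c[i] = max(c[i], ready) + p[j][i]
        finish[j] = c[m - 1]
    return finish


def total_tardiness(perm: Sequence[int], p: Sequence[Sequence[int]], d: Sequence[int]) -> int:
    """Sum over jobs of max(0, C_j - d_j), where d[j] is the due date of job j."""
    finish = completion_times(perm, p)
    return sum(max(0, finish[j] - d[j]) for j in perm)


def edd_order(d: Sequence[int]) -> List[int]:
    """Earliest Due Date rule: sequence jobs by non-decreasing due date."""
    return sorted(range(len(d)), key=lambda j: d[j])


# Tiny made-up instance: 3 jobs, 2 machines.
p = [[3, 2], [1, 4], [2, 1]]
d = [6, 5, 8]
seq = edd_order(d)
print(seq, total_tardiness(seq, p, d))  # [1, 0, 2] 1
```

On this instance the EDD sequence yields a total tardiness of 1, whereas the identity sequence `[0, 1, 2]` yields 6. Heuristics and metaheuristics of the kind compared here search the space of permutations for sequences with lower values of this objective.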