This paper highlights the need to reduce the dimension of the feature space in high-dimensional classification problems without considerably sacrificing classification power. We propose a methodology for classification tasks that comprises three phases: (i) feature selection, (ii) automatic generation of fuzzy if–then rules and (iii) reduction of the rule base while retaining its high classification power. The first phase is executed using FeatureSelector, a software tool developed solely for feature extraction in pattern recognition and classification problems. This is the first time that FeatureSelector has been used as a preprocessor for rule-based classification systems. In the second phase, a standard fuzzy rule-based classification system is modified and invoked with the most important features extracted by FeatureSelector as the new set of features. In the third phase, a modified threshold accepting algorithm (MTA), proposed elsewhere by the authors, is used to minimize the number of rules in the classification system while guaranteeing high classification power. The number of rules used and the classification power are taken as the objectives of this multi-objective combinatorial global optimization problem. The proposed methodology has been successfully demonstrated on two well-known problems: (i) the wine classification problem, which includes 13 feature variables in its original form, and (ii) the Wisconsin breast cancer determination problem, which has 9 feature variables. The results are encouraging: there is no remarkable reduction in classification power in either problem, even though some features were removed from the study through feature selection. Also, the MTA outperformed the original threshold accepting algorithm on the test problems considered here.
The authors suggest that classification problems with higher feature dimensions can be solved successfully within the framework of the methodology presented here. The significant achievement of this study is the high classification power obtained for both problems when working with fewer feature variables than the original number.
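
To make the third phase more concrete, the following is a minimal sketch of a generic threshold accepting search applied to rule-base reduction. It is not the MTA used in this work, whose details are given elsewhere. The rule coverage data, the weighted-sum `cost` function, the weight `10.0` and the threshold schedule are all hypothetical choices made for illustration only; in particular, collapsing the two objectives into a single weighted sum is an assumption of this sketch.

```python
import random

# Hypothetical toy data: coverage[i] is the set of training patterns
# that rule i classifies correctly. Not taken from the paper.
coverage = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 1}, {4, 5}, {1, 2, 3}]
N_PATTERNS = 6


def cost(selected):
    """Assumed scalarized objective: penalize misclassification and rule count."""
    covered = set()
    for keep, patterns in zip(selected, coverage):
        if keep:
            covered |= patterns
    error_rate = 1.0 - len(covered) / N_PATTERNS
    rule_fraction = sum(selected) / len(selected)
    return 10.0 * error_rate + rule_fraction  # weight 10.0 is an assumption


def threshold_accepting(cost_fn, n_rules, thresholds, steps_per_threshold, seed=0):
    """Generic threshold accepting over subsets of rules (minimization)."""
    rng = random.Random(seed)
    current = [True] * n_rules          # start from the full rule base
    current_cost = cost_fn(current)
    best, best_cost = current[:], current_cost
    for threshold in thresholds:        # non-increasing threshold schedule
        for _ in range(steps_per_threshold):
            candidate = current[:]
            i = rng.randrange(n_rules)  # neighbor: flip one rule in or out
            candidate[i] = not candidate[i]
            candidate_cost = cost_fn(candidate)
            # Accept any move that is not worse by the threshold or more.
            if candidate_cost - current_cost < threshold:
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
    return best, best_cost


best, best_cost = threshold_accepting(
    cost, len(coverage), thresholds=[0.5, 0.2, 0.05, 0.0], steps_per_threshold=50
)
print("selected rules:", [i for i, keep in enumerate(best) if keep])
print("cost:", best_cost)
```

Unlike simulated annealing, the acceptance test here is deterministic: a worse candidate is accepted whenever its cost increase stays below the current threshold, and the threshold is lowered over time so that the search gradually becomes a pure descent.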