Article ID: | iaor20051425 |
Country: | United Kingdom |
Volume: | 55 |
Issue: | 4 |
Start Page Number: | 361 |
End Page Number: | 374 |
Publication Date: | Apr 2004 |
Journal: | Journal of the Operational Research Society |
Authors: | Perry W.L., Sullivan T.J. |
Keywords: | statistics: general |
Data collected on known terrorist organizations allow intelligence agencies to build a statistical database of features for each group and an observed level of development of chemical, biological, radiological, or nuclear (CBRN) weapons. For the intelligence analyst, a statistical exploration of the structure of the multivariate data helps determine which subset of features, and what relative contribution of each feature in that subset, best discriminates between levels of CBRN weapons development. The resulting function used to discriminate between CBRN development levels is called the ‘classifier’. Once the appropriate subset of indicators has been identified and a classifier developed, intelligence agencies will be better able to focus their information gathering and to assess the effect that changes in a terrorist group's features will have on its CBRN weapons development. Additionally, the classifier will enable an intelligence agency to predict the CBRN weapons development level of a terrorist group whose feature set is known but whose level is unknown. In this analysis, we compare three approaches for building a classifier that best predicts CBRN weapons development levels using a training set of 45 observations: (1) a heuristic pattern recognition approach that couples a weighted Minkowski distance metric with a nonparametric kernel-based classification method, (2) classification trees, and (3) discriminant analysis. Where possible, cross-validation is conducted on the data to check that the resulting classifier is not overly dependent on the training set. This initial analysis provides some interesting results and suggests a reasonable starting point for finding structure in the data as more observations are added.
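As a rough illustration of approach (1), the sketch below pairs a weighted Minkowski distance with a Gaussian-kernel vote and checks it by leave-one-out cross-validation. It is not the authors' implementation: the abstract does not give the distance formula, the kernel, the feature weights, or the data, so the weighting convention (weights applied to each term before the root), the Gaussian kernel, the `bandwidth` parameter, all function names, and the toy observations are assumptions made purely for illustration.

```python
import math


def weighted_minkowski(x, y, weights, p=2.0):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|**p) ** (1/p).

    Assumed convention; the source abstract does not state the exact form.
    """
    return sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y)) ** (1.0 / p)


def kernel_classify(query, train_x, train_y, weights, p=2.0, bandwidth=1.0):
    """Return the label with the largest Gaussian-kernel-weighted vote."""
    votes = {}
    for x, label in zip(train_x, train_y):
        d = weighted_minkowski(query, x, weights, p)
        # Nearby training points contribute more to their label's vote.
        votes[label] = votes.get(label, 0.0) + math.exp(-0.5 * (d / bandwidth) ** 2)
    return max(votes, key=votes.get)


def leave_one_out_accuracy(xs, ys, weights, p=2.0, bandwidth=1.0):
    """Classify each observation using all the others; return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if kernel_classify(xs[i], rest_x, rest_y, weights, p, bandwidth) == ys[i]:
            hits += 1
    return hits / len(xs)


# Made-up toy observations (two features, two levels); not data from the study.
TOY_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
TOY_Y = [0, 0, 0, 1, 1, 1]
TOY_W = [1.0, 1.0]  # hypothetical feature weights

if __name__ == "__main__":
    print(kernel_classify([0.1, 0.1], TOY_X, TOY_Y, TOY_W))
    print(leave_one_out_accuracy(TOY_X, TOY_Y, TOY_W))
```

In a sketch like this, the feature weights play the role of the "relative contribution of each feature", and one plausible way to choose them would be to search for the weights that maximize the leave-one-out accuracy; whether the authors proceed that way is not stated in the abstract.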