Article ID: iaor20114285
Volume: 25
Issue: 2
Start Page Number: 380
End Page Number: 389
Publication Date: Apr 2011
Journal: Advanced Engineering Informatics
Authors: Soibelman Lucio, Garrett James H, de Oliveira Daniel P
Keywords: networks
The physical condition of American infrastructure systems has raised concerns, which have been addressed in part by condition assessment studies. Condition assessment aims to describe the current condition of infrastructure network components and to estimate their remaining service lives. This is a predominantly time-based analysis, which can be complemented by spatial analysis of the components' physical condition data. More specifically, exploratory spatial data analysis might identify areas with high failure rates and generate local indicators of condition for subsets of pipe segments within the physical network. Such local indicators can be used in cost/benefit analysis for planning capital investments, with the advantage of allowing critical customers within critical regions to be identified and social costs therefore to be better accounted for. This paper presents an approach to spatial clustering of networked infrastructure failure data and demonstrates it on a drinking water pipe breakage dataset. In this paper, a cluster is a set of break points that occurred in a region with a high breakage density per mile of pipe compared to the pipes in its vicinity. The proposed approach can be framed as a density-based data clustering approach, and its output is a hierarchical clustering of breaks. The root node of the cluster hierarchy, which contains the set of all break points, is subsequently partitioned into smaller clusters. In this hierarchy, clusters are subdivided to reflect different breakage densities along the network space, so clusters at the lower levels of the hierarchy present more homogeneous breakage rates. The results of the proposed approach are assessed according to their sensitivity to the choice of parameters and according to a clustering quality measure. The chosen parameters are shown to provide results that are superior to those from a range of alternative parameter choices, both in terms of clustering quality and in terms of the less important, though still relevant, criteria of the number and size of clusters generated.
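The cluster definition in the abstract (break points in regions whose breakage density per mile of pipe is high relative to nearby pipes) can be illustrated with a minimal Python sketch. This is not the paper's algorithm, whose details the abstract does not give. The segment names, lengths, break counts, the `adjacency` map, the `ratio` threshold, and all function names are hypothetical. The sketch flags a segment when its breaks per mile exceed those of its adjacent segments, then groups adjacent flagged segments by connected components. It yields one flat level of clusters, not the hierarchy the paper describes.

```python
from collections import deque


def breaks_per_mile(seg_ids, lengths, breaks):
    """Pooled breakage density (breaks per mile) over a set of segments."""
    total_len = sum(lengths[s] for s in seg_ids)
    total_brk = sum(breaks[s] for s in seg_ids)
    return total_brk / total_len if total_len > 0 else 0.0


def hot_segments(lengths, breaks, adjacency, ratio=1.0):
    """Segments whose own density exceeds `ratio` times that of their neighbors.

    `ratio` is an assumed tuning parameter, not one taken from the paper.
    """
    hot = set()
    for seg in lengths:
        own = breaks_per_mile([seg], lengths, breaks)
        nearby = breaks_per_mile(adjacency.get(seg, []), lengths, breaks)
        if own > 0 and own > ratio * nearby:
            hot.add(seg)
    return hot


def connected_clusters(hot, adjacency):
    """Group flagged segments that touch each other (breadth-first search)."""
    clusters, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            seg = queue.popleft()
            members.append(seg)
            for nb in adjacency.get(seg, []):
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(members))
    return clusters


# Hypothetical toy network: six pipe segments in a chain A-B-C-D-E-F.
lengths = {"A": 1.0, "B": 0.5, "C": 0.5, "D": 2.0, "E": 1.0, "F": 0.5}  # miles
breaks = {"A": 1, "B": 3, "C": 4, "D": 0, "E": 1, "F": 3}  # recorded breaks
adjacency = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}

hot = hot_segments(lengths, breaks, adjacency)
clusters = connected_clusters(hot, adjacency)
print(clusters)
```

Normalizing by pipe length rather than counting raw breaks keeps short, failure-prone segments from being masked by long segments with a similar number of breaks. A hierarchical variant along the lines the abstract describes would re-apply a density comparison within each cluster, but that step is not sketched here.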