A Cluster‐Based Context‐Tree Model for Multivariate Data Streams with Applications to Anomaly Detection


Article ID:	iaor20117842
Volume:	23
Issue:	3
Start Page Number:	364
End Page Number:	376
Publication Date:	Jun 2011
Journal:	INFORMS Journal on Computing
Authors:	Wan Guohua, Jiang Wei, Brice Pierre
Keywords:	datamining

Abstract:

Many applications, such as telecommunication and commercial video broadcasting systems, computer networks, and Web mining, require modeling data streams that exhibit context dependency. Context dependency refers to the fact that the statistical distribution of a new sample is heavily conditioned by the set of most recent samples that precede it. However, statistical models that capture context dependency, such as context trees (CTs), tend to scale poorly. This paper proposes a solution to the scalability problem of these models by transforming a data stream into high‐level aggregates of clusters instead of modeling the original data stream. Using an information‐theoretical approach, we leverage existing clustering techniques for static categorical data sets to capture dynamic data streams based on the CT models. Because the proposed approach can be applied repeatedly on different levels of a clustering hierarchy, it is suitable for predicting trends and detecting anomalies at any aggregate (or detail) level required. Experimental results, including video stream modeling, network intrusion detection, and Monte Carlo simulations, show that the proposed method is efficient in capturing high‐level aggregates of large‐scale dynamic systems and very effective for trend prediction and anomaly detection.
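The general pipeline described in the abstract (replace each multivariate sample by a cluster label, then model context dependency in the resulting label stream and flag low-probability transitions) can be sketched as below. This is a simplified stand-in, not the authors' method: it assumes fixed, numeric centroids with nearest-centroid assignment in place of the paper's information-theoretic clustering of categorical data, and a full fixed-depth context model with add-alpha smoothing in place of a pruned context tree. The names `nearest_centroid`, `ContextModel`, and `anomaly_scores`, and the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def nearest_centroid(x, centroids):
    """Map a multivariate sample to the index of its closest centroid (squared Euclidean)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_d:
            best, best_d = k, d
    return best


class ContextModel:
    """Fixed-depth context model over cluster labels (hypothetical simplification).

    Every context of length `depth` seen in training keeps its own next-label
    counts; unlike a real context tree, no contexts are pruned or merged.
    """

    def __init__(self, n_symbols, depth, alpha=0.5):
        self.n = n_symbols
        self.depth = depth
        self.alpha = alpha  # add-alpha smoothing constant (assumed)
        self.counts = defaultdict(lambda: [0] * n_symbols)

    def fit(self, labels):
        # Count which label follows each length-`depth` context.
        for t in range(self.depth, len(labels)):
            ctx = tuple(labels[t - self.depth:t])
            self.counts[ctx][labels[t]] += 1
        return self

    def prob(self, ctx, symbol):
        # Smoothed conditional probability; unseen contexts fall back to uniform.
        row = self.counts.get(tuple(ctx))
        if row is None:
            return 1.0 / self.n
        return (row[symbol] + self.alpha) / (sum(row) + self.alpha * self.n)


def anomaly_scores(model, labels):
    """Negative log-likelihood of each label given its preceding `depth` labels."""
    return [-math.log(model.prob(labels[t - model.depth:t], labels[t]))
            for t in range(model.depth, len(labels))]


# Toy usage: training data alternate between two clusters (labels 0,1,0,1,...).
centroids = [(0.0, 0.0), (10.0, 10.0)]
train = [(0.1, -0.2), (9.8, 10.1)] * 50
model = ContextModel(n_symbols=2, depth=2).fit(
    [nearest_centroid(x, centroids) for x in train])

# Test labels are 0,1,0,0,1: the fourth sample breaks the alternation.
test = [(0.0, 0.1), (10.2, 9.9), (0.2, 0.0), (0.1, 0.1), (9.9, 10.0)]
scores = anomaly_scores(model, [nearest_centroid(x, centroids) for x in test])
print([round(s, 3) for s in scores])
```

The score peaks at the sample that breaks the learned alternation, because its context had only ever been followed by the other cluster. The unseen context that follows it scores at the uniform baseline. Because the model works on cluster labels rather than raw vectors, its size depends on the number of clusters and the context depth, not on the dimensionality of the original stream; this reflects the scalability motivation stated in the abstract.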