Testing and Spot‐Checking of Data Streams

0.00 Avg rating—0 Votes

Article ID:	iaor20121118
Volume:	34
Issue:	1
Start Page Number:	67
End Page Number:	80
Publication Date:	Sep 2002
Journal:	Algorithmica
Authors:	Feigenbaum , Kannan , Strauss , Viswanathan
Keywords:	computers: data-structure

Abstract:

We consider the tasks of testing and spot‐checking for data streams . These testers and spot‐checkers are potentially useful in real‐time or near real‐time applications that process huge data sets. Crucial aspects of the computational model include the space complexity of the testers and spot‐checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot‐checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass).A sampling‐tester [GGR] for a property P samples some (but usually not all) of its input and, with high probability, outputs PASS if the input has property P and FAIL if the input is far {from} having P , for an appropriate sense of “far.” A streaming‐tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far {from} having P . A sampling‐tester can make its samples in any order; a streaming‐tester sees the input from left to right.We consider the groupedness property (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [EKK+] in the context of sampling spot‐checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED‐SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming‐tester but not by a sampling‐tester and other (promise) problems for which the reverse is true.

Reviews

Required fields are marked *. Your email address will not be published.