Identifying High-Cardinality Hosts from Network-wide Traffic Measurements

Identifying High-Cardinality Hosts from Network-wide Traffic Measurements Host cardinality is defined as the number of distinct peers that a host communicates with in the network. There have been several algorithms proposed to monitor network traffic and identify high-cardinality hosts at a centralized network operation center (NOC). Due to massive amounts ofdistributed data and limitations on transforming and processing them at the NOC, it is desirable to design mergable and reversible data structures summarizing traffic measurements in a distributednetwork monitoring system. A mergable data structure summarizes traffic measurements at each local monitor, and these summaries from different monitors can be merged at the NOC, while preserving the error guarantee without increasing space. A reversible data structure can report interested (high-cardinality) hosts efficiently using compressed information without querying every single host in the network. In this paper, we propose a new data streaming algorithm to identify high-cardinality hosts over the networkwide traffic measurements. Our algorithm introduces a new mergable and reversible data structure for the distributed network monitoring system, which is designed by Noisy Group Testing. We have theoretically analyzed our algorithm and evaluated it against real-world data sets.