SSketch: An Automated Framework for Streaming Sketch-Based Analysis of Big Data on FPGA

SSketch: An Automated Framework for Streaming Sketch-Based Analysis of Big Data on FPGA This paper proposes SSketch, a novel automated computing framework for FPGA-based online analysis of big data with dense (non-sparse) correlation matrices. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. The stream of input data is used by SSketch for adaptive learning and updating a corresponding ensemble of lower dimensional data structures, a.k.a., A sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming the big data with dense correlations to an ensemble of lower dimensional subspaces such that it is suitable for hardware-based acceleration performed by reconfigurable hardware. The new method is scalable, while it significantly reduces costly memory interactions and enhances matrix computation performance by leveraging coarse-grained parallelism existing in the dataset. To facilitate automation, SSketch takes advantage of a HW/SW co-design approach: It provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrates up to 200 folds speedup on our hardware-accelerated realization of SSketch compared to a software-based deployment on a general purpose processor.