Systems and methods for sparse matrix-vector multiplication (SpMV) are disclosed. The systems and methods include a novel streaming reduction architecture for floating-point accumulation and a novel on-chip cache design optimized for streaming compressed sparse row (CSR) matrices. The present disclosure is also directed to implementing the reduction circuit and/or processing elements for SpMV processing as a personality for the Convey HC-1 computing device.
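
For orientation, the computation that SpMV performs on a CSR matrix can be sketched in software as follows. This is only an illustrative reference loop, not the disclosed hardware architecture; the function name `csr_spmv` and the array names `values`, `col_idx`, and `row_ptr` are assumptions chosen for the example. The per-row sum in the inner loop is the kind of floating-point accumulation that a reduction circuit would be expected to carry out.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i within `values`
    x       : dense input vector
    (All names are hypothetical, chosen for this sketch.)
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Accumulate the products belonging to this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Example matrix (the third row is empty):
#   [[10,  0,  0],
#    [ 0, 20, 30],
#    [ 0,  0,  0]]
values = [10.0, 20.0, 30.0]
col_idx = [0, 1, 2]
row_ptr = [0, 1, 3, 3]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)
```

Because rows hold differing numbers of nonzeros, the length of each accumulation varies from row to row, as the empty third row in the example shows.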