A method for training a system for reconstructing a magnetic resonance image includes: under-sampling image data from each of a plurality of fully-sampled images; and inputting the under-sampled image data to a multi-scale neural network comprising sequentially connected layers. Each layer has an input for receiving input image data and an output for outputting reconstructed image data. Each layer performs a process comprising: decomposing the array of input image data; applying a thresholding function to the decomposed image data, to form a shrunk data, the thresholding function outputting a value asymptotically approaching one when the thresholding function receives an input having a magnitude greater than a first value, reconstructing the shrunk data for combining with a reconstructed image data output by another one of the layers to form updated reconstructed image data, and machine-learning at least one parameter of the decomposing, the thresholding function, or the reconstructing.