A system for identifying arrhythmias based on cardiac waveforms includes a storage system storing a trained deep neural network system, wherein the trained deep neural network system includes a trained representation neural network and a trained classifier neural network. A processing system is communicatively connected to the storage system and configured to receive cardiac waveform data for a patient, identify a time segment in the cardiac waveform data, and transform the time segment into a spectrum image. The processing system is further configured to generate, with the representation neural network, a latent representation from the spectrum image, and then to generate, with the classifier neural network, an arrhythmia classifier from the latent representation.
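
The processing flow described above (time segment, spectrum image, latent representation, arrhythmia classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the source: the sampling rate, segment length, short-time Fourier transform parameters, latent size, and number of classes are arbitrary, and the two "networks" are untrained random linear maps standing in for the trained representation and classifier neural networks. All function names are hypothetical.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the source.
FS = 250             # assumed sampling rate in Hz
SEGMENT_SECONDS = 4  # assumed length of the time segment
LATENT_DIM = 8       # assumed size of the latent representation
N_CLASSES = 4        # assumed number of rhythm classes


def extract_segment(waveform, start, fs=FS, seconds=SEGMENT_SECONDS):
    """Identify a fixed-length time segment in the cardiac waveform data."""
    n = int(fs * seconds)
    return waveform[start:start + n]


def spectrum_image(segment, frame=64, hop=32):
    """Transform the segment into a 2-D log-magnitude spectrum image.

    A short-time Fourier transform is one possible choice; the source
    does not specify the transform.
    """
    window = np.hanning(frame)
    starts = range(0, len(segment) - frame + 1, hop)
    frames = np.stack([segment[s:s + frame] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # rows: frequency bins, columns: time frames


def represent(image, weights):
    """Stand-in for the representation network: image -> latent vector."""
    return np.tanh(weights @ image.ravel())


def classify(latent, weights):
    """Stand-in for the classifier network: latent -> class probabilities."""
    logits = weights @ latent
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


def run_pipeline(waveform, seed=0):
    """Chain the four steps; the weights are random, NOT trained."""
    rng = np.random.default_rng(seed)
    segment = extract_segment(waveform, start=0)
    image = spectrum_image(segment)
    w_rep = rng.normal(scale=0.01, size=(LATENT_DIM, image.size))
    w_cls = rng.normal(size=(N_CLASSES, LATENT_DIM))
    latent = represent(image, w_rep)
    probs = classify(latent, w_cls)
    return image, latent, probs


# Synthetic 10-second sinusoid used only as placeholder input, not real ECG data.
t = np.arange(FS * 10) / FS
waveform = np.sin(2 * np.pi * 1.2 * t)
image, latent, probs = run_pipeline(waveform)
```

In this sketch the classifier output is a probability vector over the assumed classes. In a real system both sets of weights would be loaded from the storage system after training, and the output would feed whatever arrhythmia classification the trained classifier network produces.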