Techniques for the operation and use of a model that learns the general representation of multimodal images is disclosed. In various examples, methods from representation learning are used to find a common basis for representation of medical images. These include aspects of encoding, fusion, and downstream tasks, with use of the general representation and model. In an example, a method for generating a modality-agnostic model includes receiving imaging data, encoding the imaging data by mapping data to a latent representation, fusing the encoded data to conserve latent variables corresponding to the latent representation, and training a model using the latent representation. In an example, a method for processing imaging data using a trained modality-agnostic model includes receiving imaging data, encoding the data to the defined encoding, processing the encoded data with a trained model, and performing imaging processing operations based on output of the trained model.