In an ultrasound diagnosis apparatus according to an embodiment, processing circuitry obtains volume video data of a patient acquired by a transesophageal echocardiography probe. Based on a positional relationship between the transesophageal echocardiography probe and the patient, the processing circuitry sets, for the volume video data, a three-dimensional coordinate system that matches the display orientation of image data of the patient acquired by a body-surface ultrasound probe. The processing circuitry causes a display screen to display image data generated from the volume video data by using the set three-dimensional coordinate system. The processing circuitry receives, from an operator, a designation related to calculating movement information in a region of interest of the patient, the designation being made on an image displayed on the display screen. The processing circuitry calculates the movement information by performing processing that includes a tracking process on the volume video data.
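The two computational steps named above (re-expressing data in a display-matched coordinate system, then deriving movement information from tracked positions) can be illustrated with a minimal sketch. This is not the apparatus's method: the abstract does not state how the coordinate system is set or how tracking is performed. The sketch assumes the coordinate change is a pure rotation given as a 3x3 matrix and that the tracking process has already produced one position per frame for a point in the region of interest. The names `to_display_coords`, `displacements`, and `FLIP_YZ` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def to_display_coords(p: Vec3, rotation: Mat3) -> Vec3:
    """Map a point from probe coordinates to display coordinates.

    Assumption: the mapping is a pure rotation (row-major 3x3 matrix).
    """
    return tuple(sum(row[i] * p[i] for i in range(3)) for row in rotation)


def displacements(track: Sequence[Vec3]) -> List[float]:
    """Euclidean distance of each tracked position from the first frame."""
    x0, y0, z0 = track[0]
    return [
        ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2) ** 0.5
        for x, y, z in track
    ]


# Hypothetical rotation: 180 degrees about the x axis (flips y and z).
FLIP_YZ: Mat3 = (
    (1.0, 0.0, 0.0),
    (0.0, -1.0, 0.0),
    (0.0, 0.0, -1.0),
)

# Hypothetical tracked positions of one point over two frames,
# expressed in probe coordinates.
probe_track = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]

display_track = [to_display_coords(p, FLIP_YZ) for p in probe_track]
print(display_track)
print(displacements(display_track))
```

Because a rotation preserves distances, the displacement values are the same in either coordinate system; only the directions shown to the operator change.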