Previously captured three-dimensional image data of a subject, from which a reference image is extracted, are provided in the form of time-series volume data 14, 15. The time-series volume data are composed of two-dimensional image data groups recorded over at least one period of a periodically moving organ of the subject, with time-phase information of a biological signal of the subject added thereto. To provide the reference image, two-dimensional image data are extracted from the three-dimensional image data such that their time-phase information is identical with the time-phase information of the biological signal of the subject at the time an ultrasonic image is captured.
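
The time-phase matching described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the time phase is encoded as a fraction of one period in the range 0.0 to 1.0, that "identical" is approximated by the nearest recorded phase, and that the slice to extract is given by an index. The names PhasedVolume, circular_distance, select_reference, and slice_index are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class PhasedVolume:
    """One entry of the time-series volume data (hypothetical structure)."""
    phase: float            # assumed encoding: fraction of one period, 0.0 <= phase < 1.0
    slices: Sequence[Any]   # group of two-dimensional images forming the volume


def circular_distance(a: float, b: float) -> float:
    """Distance between two phases on a periodic axis of length 1.0."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def select_reference(volumes: List[PhasedVolume],
                     current_phase: float,
                     slice_index: int) -> Any:
    """Return the 2-D image whose volume phase best matches current_phase.

    Assumption: exact equality of phases is replaced by a nearest-phase
    search, because recorded phases are discrete samples of the period.
    """
    if not volumes:
        raise ValueError("no time-series volume data available")
    best = min(volumes, key=lambda v: circular_distance(v.phase, current_phase))
    return best.slices[slice_index]


# Usage: four volumes recorded over one period, two slices each (labels only).
volumes = [
    PhasedVolume(phase=p, slices=[f"phase{p}-slice{i}" for i in range(2)])
    for p in (0.0, 0.25, 0.5, 0.75)
]
reference = select_reference(volumes, current_phase=0.95, slice_index=1)
```

The distance is computed on a periodic axis so that a phase near the end of one period matches a volume recorded at the start of the next, which is consistent with the organ moving periodically.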