Ultrasound data is aligned spatially and temporally for volume rendering of a fetal heart or other cyclically moving object. A sequence of ultrasound data is obtained for each of a plurality of planes, such as by acquiring data representing each plane over one or more cycles. The different planes are scanned sequentially in a step-mode acquisition, so each plane's sequence is acquired over a different time interval. The data is aligned temporally and spatially to create data representing volumes at different times throughout the cycle. The alignment uses similarity of the ultrasound data in time and space.
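As a minimal sketch of how such a similarity-based alignment might be organized, the following Python example estimates the cycle length of one plane from the repetition of its own frames, finds for each subsequent plane the starting frame that best matches its already-aligned neighbor, and stacks the matched frames into one volume per phase. Everything specific here is an assumption made for illustration and is not taken from the description above: the sum of absolute differences as the similarity measure, the neighbor-to-neighbor chaining, the function names (`frame_distance`, `estimate_period`, `best_offset`, `assemble_volumes`), and the synthetic test data. Each plane is assumed to hold at least 2 x period - 1 frames.

```python
import math

def frame_distance(a, b):
    """Sum of absolute differences between two frames (flat lists of samples)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def estimate_period(frames, min_lag, max_lag):
    """Temporal similarity: return the lag (in frames) at which the sequence best repeats."""
    def cost(lag):
        n = len(frames) - lag
        return sum(frame_distance(frames[t], frames[t + lag]) for t in range(n)) / n
    return min(range(min_lag, max_lag + 1), key=cost)

def best_offset(ref_cycle, frames, period):
    """Spatial similarity: return the start frame whose cycle best matches ref_cycle."""
    def cost(offset):
        return sum(frame_distance(ref_cycle[t], frames[offset + t]) for t in range(period))
    return min(range(period), key=cost)

def assemble_volumes(planes, period):
    """Align each plane to its already-aligned neighbor, then stack frames by phase."""
    offsets = [0]  # the first plane defines phase zero (an arbitrary choice)
    for k in range(1, len(planes)):
        prev_cycle = planes[k - 1][offsets[-1]:offsets[-1] + period]
        offsets.append(best_offset(prev_cycle, planes[k], period))
    volumes = [[plane[o + p] for plane, o in zip(planes, offsets)]
               for p in range(period)]
    return offsets, volumes

def synthetic_plane(k, start_phase, n_frames=30, true_period=10, n_samples=8):
    """Hypothetical test data: a periodic signal whose acquisition starts mid-cycle."""
    amp = 1.0 + 0.05 * k  # small plane-to-plane difference in amplitude
    return [[amp * math.cos(2 * math.pi * (t - start_phase) / true_period + 0.3 * j)
             for j in range(n_samples)]
            for t in range(n_frames)]

# Three planes scanned one after another, each starting at a different cycle phase.
planes = [synthetic_plane(k, s) for k, s in enumerate([0, 3, 7])]
period = estimate_period(planes[0], min_lag=5, max_lag=15)
offsets, volumes = assemble_volumes(planes, period)
```

Here `volumes[p]` holds one frame per plane, all taken from the same estimated phase `p` of the cycle. Chaining each plane to its neighbor, rather than to a single reference plane, is a design choice of this sketch based on the expectation that adjacent planes resemble each other most closely; real data would likely also need sub-frame interpolation and some handling of cycle-length variation, which are omitted here.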