There is provided a method for generating a three-dimensional (3D) ultrasound image of a tissue volume using at least one processor, including: generating a series of two-dimensional (2D) ultrasound images of the tissue volume associated with a plurality of positions, respectively, along a scanning direction of the tissue volume; estimating, for each pair of consecutive 2D ultrasound images of the series of 2D ultrasound images, a distance between the positions associated with the pair of consecutive 2D ultrasound images based on a classification, using a deep neural network, of a difference image generated from the pair of consecutive 2D ultrasound images, to produce a plurality of estimated distances associated with the plurality of pairs of consecutive 2D ultrasound images, respectively; modifying the number of 2D ultrasound images in the series of 2D ultrasound images based on the plurality of estimated distances to produce a modified series of 2D ultrasound images; and rendering the 3D ultrasound image of the tissue volume based on the modified series of 2D ultrasound images. There is also provided a corresponding system for generating a 3D ultrasound image of a tissue volume.
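The steps of the method can be illustrated with a minimal sketch in Python, under several assumptions that are not stated in the text above. The deep neural network is replaced by a hypothetical stand-in classifier (`toy_classifier`) that bins the mean absolute difference; the mapping from class label to distance (`class_distances`), the use of an absolute difference as the difference image, and the choice to modify the number of images by linear resampling to a uniform spacing are all illustrative assumptions. The final step only stacks the modified series into a volume array and does not perform actual rendering.

```python
import numpy as np

def difference_image(a, b):
    # Assumption: the difference image is the absolute pixel-wise difference.
    return np.abs(a.astype(np.float32) - b.astype(np.float32))

def toy_classifier(diff):
    # HYPOTHETICAL stand-in for the trained deep neural network:
    # bins the mean absolute difference into one of three classes.
    return int(np.digitize(diff.mean(), [0.5, 1.5]))

def estimate_distances(frames, classify, class_distances):
    # One estimated distance per pair of consecutive 2D images.
    return np.array([
        class_distances[classify(difference_image(a, b))]
        for a, b in zip(frames[:-1], frames[1:])
    ])

def resample_series(frames, distances, spacing):
    # Assumed way of "modifying the number of images": linearly
    # interpolate frames onto positions that are uniformly spaced.
    positions = np.concatenate(([0.0], np.cumsum(distances)))
    targets = np.arange(0.0, positions[-1] + 1e-9, spacing)
    stack = np.stack(frames).astype(np.float32)
    # Index of the original frame at or before each target position.
    idx = np.searchsorted(positions, targets, side="right") - 1
    idx = np.clip(idx, 0, len(frames) - 2)
    span = positions[idx + 1] - positions[idx]
    safe_span = np.where(span > 0, span, 1.0)
    w = np.where(span > 0, (targets - positions[idx]) / safe_span, 0.0)
    w = w[:, None, None]
    return (1.0 - w) * stack[idx] + w * stack[idx + 1]

def build_volume(frames, classify, class_distances, spacing):
    # Full pipeline: estimate distances, modify the series, stack to 3D.
    distances = estimate_distances(frames, classify, class_distances)
    return resample_series(frames, distances, spacing)

# Usage with synthetic 2x2 frames; the distances are in arbitrary units.
frames = [np.full((2, 2), v, dtype=np.float32) for v in (0, 1, 3, 4)]
class_distances = np.array([0.5, 1.0, 2.0])  # hypothetical class-to-distance map
volume = build_volume(frames, toy_classifier, class_distances, spacing=1.0)
print(volume.shape)
```

In this sketch the second pair of frames is classified as farther apart than the others, so resampling inserts an interpolated frame between them and the resulting volume has uniformly spaced slices along the scanning direction.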