Disclosed are various embodiments for predicting and avoiding collisions during radiotherapy. A depth map produced by at least one three-dimensional camera is obtained by a computing device. The computing device identifies a plurality of objects in the depth map, wherein the plurality of objects comprises a radiation therapy machine and a patient. The computing device generates a corresponding three-dimensional model for each one of the plurality of objects. The computing device then determines whether the corresponding three-dimensional model for each one of the plurality of objects overlaps with another corresponding three-dimensional model for another one of the plurality of objects.
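By way of non-limiting illustration, the pairwise overlap determination may be sketched as follows. The sketch assumes that each three-dimensional model is simplified to an axis-aligned bounding box computed from the points attributed to that object; the disclosure does not specify this representation, and the names `Box`, `box_from_points`, `boxes_overlap`, and `find_overlaps` are hypothetical. An actual embodiment may use meshes, voxel grids, or another model type.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box standing in for an object's 3D model (assumption)."""
    name: str
    lo: tuple  # (min_x, min_y, min_z)
    hi: tuple  # (max_x, max_y, max_z)


def box_from_points(name, points):
    """Build a bounding box from the 3D points attributed to one object."""
    xs, ys, zs = zip(*points)
    return Box(name, (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))


def boxes_overlap(a, b):
    """Two boxes overlap only if their extents intersect on all three axes."""
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def find_overlaps(boxes):
    """Check every pair of models once and return the names of overlapping pairs."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if boxes_overlap(a, b)]


# Hypothetical coordinates, for illustration only.
gantry = box_from_points("gantry", [(0, 0, 0), (1, 1, 1)])
patient = box_from_points("patient", [(0.5, 0.5, 0.5), (2, 2, 2)])
couch = box_from_points("couch", [(5, 5, 5), (6, 6, 6)])
overlaps = find_overlaps([gantry, patient, couch])
```

A bounding-box test of this kind is conservative: it may report an overlap where the underlying shapes do not actually touch, which may be acceptable where the aim is to avoid collisions rather than to measure clearance exactly.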