A method and apparatus for capturing images of a scene using a capsule device that includes a camera are disclosed. An image sequence is captured using the camera as the capsule device travels through a human gastrointestinal tract. Structured-light images are also captured using the camera by projecting structured light onto one or more objects in the field of view of the camera as the capsule device travels through the human gastrointestinal tract. The structured-light images are interleaved with regular images in the image sequence. Distance information, with respect to the capsule camera, is derived for objects in a selected image. Both the image sequence and the distance information are output. A method of determining the size of an object of interest using the distance information is also disclosed. In another method, the distance information is used to scale an object or to adjust intensities.
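
As an illustration only, the following sketch shows one way distance information could be turned into a physical size estimate and an intensity correction. It is not the disclosed method: it assumes an ideal pinhole camera and an inverse-square falloff of illumination, and the function names and parameters (`object_size_mm`, `intensity_gain`, `focal_length_mm`, `pixel_pitch_mm`, `reference_distance_mm`) are hypothetical.

```python
def object_size_mm(extent_px, distance_mm, focal_length_mm, pixel_pitch_mm):
    """Estimate the physical extent of an object from its extent in pixels.

    Assumes a pinhole camera model (an assumption, not taken from the source):
    size = extent on sensor * distance / focal length.
    """
    if focal_length_mm <= 0 or pixel_pitch_mm <= 0:
        raise ValueError("focal length and pixel pitch must be positive")
    if distance_mm < 0 or extent_px < 0:
        raise ValueError("distance and pixel extent must be non-negative")
    # Convert the extent from pixels to millimetres on the sensor plane.
    extent_on_sensor_mm = extent_px * pixel_pitch_mm
    # Similar triangles: scale the sensor extent by distance / focal length.
    return extent_on_sensor_mm * distance_mm / focal_length_mm


def intensity_gain(distance_mm, reference_distance_mm):
    """Gain that normalizes brightness to a reference distance.

    Assumes illumination falls off with the square of distance (an assumption),
    so an object twice as far away needs four times the gain.
    """
    if distance_mm <= 0 or reference_distance_mm <= 0:
        raise ValueError("distances must be positive")
    return (distance_mm / reference_distance_mm) ** 2


if __name__ == "__main__":
    # Hypothetical values: a 100-pixel-wide object at 20 mm from the camera,
    # with a 1 mm focal length and 3 micrometre pixels.
    print(object_size_mm(100, 20.0, 1.0, 0.003))
    print(intensity_gain(20.0, 10.0))
```

Under these assumptions the size estimate scales linearly with distance, so an error in the derived distance carries over proportionally to the estimated size.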