A novel framework for fast and continuous registration between two imaging modalities is disclosed. The approach makes it possible to completely determine the rigid transformation between multiple sources at real-time or near real-time frame-rates in order to localize the cameras and register the two sources. A disclosed example includes computing or capturing a set of reference images within a known environment, complete with corresponding depth maps and image gradients. The collection of these images and depth maps constitutes the reference source. The second source is a real-time or near-real time source which may include a live video feed. Given one frame from this video feed, and starting from an initial guess of viewpoint, the real-time video frame is warped to the nearest viewing site of the reference source. An image difference is computed between the warped video frame and the reference image. The viewpoint is updated via a Gauss-Newton parameter update and certain of the steps are repeated for each frame until the viewpoint converges or the next video frame becomes available. The final viewpoint gives an estimate of the relative rotation and translation between the camera at that particular video frame and the reference source. The invention has far-reaching applications, particularly in the field of assisted endoscopy, including bronchoscopy and colonoscopy. Other applications include aerial and ground-based navigation.