A projection, image, and depth capture system projects content into a scene and captures images of the scene as a user interacts with the content. The system uses depth analysis to determine the location and distance of available surfaces in the scene onto which the content can be projected. Because of the complexity of this analysis and the imperfections of the electronic and optical components, the depth measurements contain noise that may adversely affect how accurately the content is projected onto the surface. The system is configured with noise compensation technology that averages depth information over multiple image frames captured from the scene. The averaged information yields a more consistent measurement of the distance to the surface, which in turn allows for more accurate focus of the projected content.
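
The frame-averaging idea can be illustrated with a short sketch. This is not the system's actual implementation. The function name `average_depth_frames`, the representation of a depth frame as a nested list of distances, and the sample values are all assumptions made for illustration. The sketch takes a plain per-pixel mean over a fixed set of frames, which is only one possible way to average depth information.

```python
from statistics import fmean


def average_depth_frames(frames):
    """Return the per-pixel mean depth over several equally sized frames.

    `frames` is a non-empty list of depth maps, each a list of rows of
    distances. This representation is hypothetical, chosen for illustration.
    """
    if not frames:
        raise ValueError("at least one depth frame is required")

    rows = len(frames[0])
    cols = len(frames[0][0]) if rows else 0
    for frame in frames:
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all depth frames must have the same dimensions")

    # For each pixel, average that pixel's depth reading across all frames.
    return [
        [fmean(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Three noisy readings of the same two surface points (made-up values, meters).
frames = [
    [[2.0, 3.5]],
    [[2.5, 3.0]],
    [[1.5, 4.0]],
]
averaged = average_depth_frames(frames)
```

Each individual frame above deviates from the others by up to half a meter at a given pixel, while the averaged map gives a single steadier distance per pixel that a focus routine could then use. Averaging more frames generally reduces random noise further, at the cost of responding more slowly when the surface actually moves.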