A method and system for monitoring buildings (including houses and office buildings) by performing video content analysis based on two-dimensional image data and depth data are disclosed. Occupation and use of such buildings may be monitored with higher accuracy to improve energy efficiency, to control operation of components therein, and/or to provide better security. Height data may be obtained from the depth data to provide greater reliability in object detection, object classification, and/or event detection.
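As an illustration of how height data might be derived from depth data, the following is a minimal sketch and not the disclosed implementation. It assumes a depth sensor mounted at a known height and looking straight down at the floor, so that the height of a point is the mounting height minus its measured depth. The function names, the binary object mask, and the 1.0 m threshold are hypothetical and chosen only for the example.

```python
from typing import List, Sequence

Grid = List[List[float]]


def height_from_depth(depth_m: Sequence[Sequence[float]],
                      sensor_height_m: float) -> Grid:
    """Convert per-pixel depth (metres from the sensor) to height above the floor.

    Assumption: the sensor looks straight down, so depth is measured along
    the vertical axis. Negative results (noise below the floor) clamp to 0.
    """
    return [[max(0.0, sensor_height_m - d) for d in row] for row in depth_m]


def object_height(heights: Grid, mask: Sequence[Sequence[bool]]) -> float:
    """Return the tallest point among the pixels that belong to one object."""
    values = [
        h
        for h_row, m_row in zip(heights, mask)
        for h, m in zip(h_row, m_row)
        if m
    ]
    return max(values) if values else 0.0


def classify_by_height(height_m: float, person_min_m: float = 1.0) -> str:
    """Label an object using a hypothetical height threshold."""
    return "person-sized" if height_m >= person_min_m else "other"


if __name__ == "__main__":
    # Toy 2x2 depth frame from a sensor mounted 3.0 m above the floor.
    depth = [[3.0, 1.25],
             [2.5, 3.0]]
    heights = height_from_depth(depth, sensor_height_m=3.0)
    tall_mask = [[False, True], [False, False]]
    print(classify_by_height(object_height(heights, tall_mask)))
```

In practice a sensor is rarely mounted exactly vertically, so a real system would first transform depth pixels into floor-referenced coordinates using the sensor's calibrated pose; the sketch omits that step.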