A method and system for monitoring a retail environment by performing video content analysis based on two-dimensional image data and depth data are disclosed. Accuracy in detecting customer actions, for example to provide assistance, change marketing behavior, or address safety and theft, is increased by analyzing video containing two-dimensional image data and associated depth data. Height data may be obtained from the depth data to assist in object detection, object classification (e.g., detecting a customer or inventory), and/or event detection.
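
As one illustrative, non-limiting sketch of how height data might be obtained from depth data and used for a coarse classification, the following assumes a downward-facing depth camera mounted at a known height above a flat floor, so that height is the mounting height minus the measured depth. The function names (`height_map`, `classify_peak`), the mounting geometry, and the height thresholds are hypothetical and are not taken from the disclosure.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def height_map(depth_m: Grid, camera_height_m: float) -> Grid:
    """Convert per-pixel depth (metres) to height above the floor (metres).

    Assumes an overhead camera looking straight down at a flat floor.
    Missing or non-positive depth readings are kept as None.
    """
    result: Grid = []
    for row in depth_m:
        out_row: List[Optional[float]] = []
        for d in row:
            if d is None or d <= 0:
                out_row.append(None)  # invalid depth sample
            else:
                # Clamp at zero so noise below floor level is not negative.
                out_row.append(max(0.0, camera_height_m - d))
        result.append(out_row)
    return result


def peak_height(heights: Grid) -> Optional[float]:
    """Return the tallest valid height in a region, or None if no sample is valid."""
    valid = [h for row in heights for h in row if h is not None]
    return max(valid) if valid else None


def classify_peak(peak_m: Optional[float],
                  person_min_m: float = 1.2,
                  person_max_m: float = 2.2) -> str:
    """Label a region by its peak height. Thresholds are hypothetical."""
    if peak_m is None:
        return "unknown"
    if person_min_m <= peak_m <= person_max_m:
        return "customer"
    return "other"


# Example: a 2x2 depth patch from a camera assumed to be mounted 3.0 m high.
patch = [[3.0, 1.25], [0.0, 2.5]]
heights = height_map(patch, camera_height_m=3.0)
label = classify_peak(peak_height(heights))
```

In practice such a height cue would be combined with the two-dimensional image data rather than used alone, and an angled camera would require a geometric correction that this sketch omits.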