An information processing apparatus includes: an image acquiring unit to acquire a captured image of a watching target person and of a target object serving as a reference for the behavior of the watching target person, the captured image containing depth information indicating a depth for each pixel; a foreground area extracting unit to extract a foreground area of the captured image on the basis of a difference between a background image and the captured image, the background image being set to contain the depth of the background; and a behavior presuming unit to presume the behavior of the watching target person with respect to the target object by determining, by referring to the depths of the pixels in the foreground area based on the depth information, whether a positional relationship between the foreground area and the area of the target object satisfies a predetermined condition, the condition being set on the assumption that the extracted foreground area is related to the behavior of the watching target person.
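The processing flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names `extract_foreground` and `satisfies_condition`, the threshold values, the rectangular target region, and the specific condition (the foreground lies over the target object and is nearer to the camera than the object's surface by a set margin) are all assumptions chosen for illustration. Depth images are modeled as plain nested lists of per-pixel depths in metres.

```python
from typing import List, Tuple

Depth = List[List[float]]   # per-pixel depth in metres; 0.0 means "no reading"
Pixel = Tuple[int, int]     # (row, col)


def extract_foreground(captured: Depth, reference: Depth,
                       threshold: float = 0.1) -> List[Pixel]:
    """Return pixels whose depth differs from the reference depth image.

    `reference` holds the depth of the empty scene; `threshold` is an
    assumed noise tolerance, not a value taken from the source.
    """
    pixels: List[Pixel] = []
    for r, (cap_row, ref_row) in enumerate(zip(captured, reference)):
        for c, (d, ref) in enumerate(zip(cap_row, ref_row)):
            # Ignore invalid readings; keep pixels that deviate from the reference.
            if d > 0.0 and ref > 0.0 and abs(ref - d) > threshold:
                pixels.append((r, c))
    return pixels


def satisfies_condition(captured: Depth, foreground: List[Pixel],
                        target_box: Tuple[int, int, int, int],
                        target_depth: float,
                        min_rise: float = 0.3,
                        min_pixels: int = 2) -> bool:
    """Hypothetical positional condition between foreground and target object.

    True when enough foreground pixels fall inside `target_box`
    (top, left, bottom, right; inclusive) and their mean depth is at least
    `min_rise` metres nearer to the camera than the target object's surface.
    """
    top, left, bottom, right = target_box
    depths = [captured[r][c] for r, c in foreground
              if top <= r <= bottom and left <= c <= right]
    if len(depths) < min_pixels:
        return False
    mean_depth = sum(depths) / len(depths)
    return target_depth - mean_depth >= min_rise


# Usage: a 4x4 scene 2.0 m from the camera with a 2x2 blob at 1.5 m.
reference = [[2.0] * 4 for _ in range(4)]
captured = [row[:] for row in reference]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    captured[r][c] = 1.5

foreground = extract_foreground(captured, reference)
presumed = satisfies_condition(captured, foreground, (1, 1, 2, 2), 2.0)
```

In this sketch the behavior is presumed only from the depths of the foreground pixels, so no separate person-detection step is needed; a practical system would likely convert pixel coordinates and depths into real-space coordinates before evaluating the condition.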