A device and a method for learning the behavior of a pet in response to instructions provided to the pet are disclosed. The device comprises at least one image capturing unit, at least one speaker, and a controller. The at least one speaker announces pre-determined instructions to the pet. The pre-determined instructions are associated with a plurality of pre-defined postures of the pet. The at least one image capturing unit captures postures of the pet in response to the announcement of the pre-determined instructions. The controller compares the captured postures of the pet with each of the plurality of pre-defined postures to learn the behavior of the pet in response to the announced pre-determined instructions.
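The announce, capture, and compare steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `learn_behavior`, `announce`, `capture_posture`, and the instruction-to-posture mapping are hypothetical, and the sketch assumes the image capturing unit already yields a posture label (how a posture is recognized from an image is not specified here).

```python
from collections import Counter
from typing import Callable, Dict


def learn_behavior(
    announce: Callable[[str], None],
    capture_posture: Callable[[], str],
    instruction_postures: Dict[str, str],
    rounds: int = 3,
) -> Dict[str, Dict[str, int]]:
    """Tally which pre-defined posture the pet adopts after each instruction.

    announce:             hypothetical stand-in for the speaker.
    capture_posture:      hypothetical stand-in for the image capturing unit;
                          assumed to return a posture label.
    instruction_postures: hypothetical mapping of each pre-determined
                          instruction to its associated pre-defined posture.
    """
    predefined = set(instruction_postures.values())
    observed = {instruction: Counter() for instruction in instruction_postures}

    for _ in range(rounds):
        for instruction in instruction_postures:
            announce(instruction)          # speaker announces the instruction
            captured = capture_posture()   # camera captures the response
            # Controller compares the captured posture with each
            # pre-defined posture and records any match.
            for posture in predefined:
                if captured == posture:
                    observed[instruction][posture] += 1

    return {instruction: dict(counts) for instruction, counts in observed.items()}


if __name__ == "__main__":
    # Simulated pet (assumption for illustration): obeys "sit", ignores "down".
    mapping = {"sit": "sitting", "down": "lying"}
    responses = {"sit": "sitting", "down": "standing"}
    last = {"instruction": ""}

    result = learn_behavior(
        announce=lambda instruction: last.update(instruction=instruction),
        capture_posture=lambda: responses[last["instruction"]],
        instruction_postures=mapping,
    )
    print(result)
```

In this sketch, the learned behavior is simply a count of matched postures per instruction; a captured posture that matches none of the pre-defined postures is left unrecorded.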