The present invention concerns a system for estimating a human emotional/behavioural/psychological state, comprising a group of sensors and devices and a processing unit.

The group of sensors and devices includes: a video-capture device; a skeletal and gesture recognition and tracking device; a microphone; a proximity sensor; a floor pressure sensor; user interface means; and one or more environmental sensors.

The processing unit is configured to: acquire or receive a video stream captured by the video-capture device, data items provided by the skeletal and gesture recognition and tracking device, the microphone, the proximity sensor, the floor pressure sensor and the environmental sensor(s), and data items indicative of interactions of a person under analysis with the user interface means; detect one or more facial expressions, a position of the eye pupils, a body shape, and features of the voice and breath of the person under analysis; and estimate an emotional/behavioural/psychological state of the person on the basis of the acquired/received data items, of the detected facial expression(s), eye pupil position, body shape and voice and breath features, and of one or more predefined reference mathematical models modelling human emotional/behavioural/psychological states.
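By way of illustration only, the following Python sketch shows one possible reading of the last step: per-modality features are fused into a single observation vector and scored against predefined reference models, here represented as simple mean/variance profiles per state. All state labels, feature names and numerical parameters are hypothetical placeholders, not taken from the invention, and a real implementation would fit the reference models from annotated recordings rather than hard-coding them.

```python
import numpy as np

# Hypothetical per-state reference models: each state is represented by a mean
# feature vector and a diagonal variance. In practice these would be fitted
# offline from annotated sensor recordings.
REFERENCE_MODELS = {
    "calm":     {"mean": np.array([0.2, 0.1, 0.3, 0.2]), "var": np.array([0.05, 0.04, 0.06, 0.05])},
    "stressed": {"mean": np.array([0.7, 0.8, 0.6, 0.7]), "var": np.array([0.06, 0.05, 0.07, 0.06])},
    "fatigued": {"mean": np.array([0.4, 0.3, 0.8, 0.5]), "var": np.array([0.05, 0.06, 0.04, 0.05])},
}


def fuse_features(facial, voice, breath, posture):
    """Concatenate per-modality scores into one observation vector.

    Each argument is assumed to be a scalar already normalised to [0, 1]
    by the corresponding detector (face analysis, microphone, breath
    estimation, skeletal tracking).
    """
    return np.array([facial, voice, breath, posture])


def estimate_state(observation):
    """Score the observation against every reference model.

    Returns the best-matching state together with a normalised confidence
    obtained by a softmax over negative variance-weighted squared distances.
    """
    scores = {}
    for state, model in REFERENCE_MODELS.items():
        diff = observation - model["mean"]
        scores[state] = -np.sum(diff ** 2 / model["var"])  # higher is better

    values = np.array(list(scores.values()))
    probs = np.exp(values - values.max())
    probs /= probs.sum()
    ranked = sorted(zip(scores.keys(), probs), key=lambda kv: kv[1], reverse=True)
    return ranked[0]


if __name__ == "__main__":
    # Example observation: elevated facial tension, raised voice pitch,
    # faster breathing and a rigid posture (all placeholder values).
    obs = fuse_features(facial=0.65, voice=0.75, breath=0.6, posture=0.7)
    state, confidence = estimate_state(obs)
    print(f"Estimated state: {state} (confidence {confidence:.2f})")
```

This sketch deliberately reduces each modality to a single scalar; the invention's reference mathematical models could equally be probabilistic classifiers, rule sets or learned regressors operating on richer feature vectors.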