An information-processing apparatus includes: a display that displays an image; a memory that records voice data containing a voice pronounced at each of a plurality of observation points of the image; a gaze detector that generates gaze data by detecting a gaze of a user; a voice input device that receives a voice of the user and generates voice data associated with the same time axis as the gaze data; and a processor. The processor analyzes, based on the gaze data, an attention period in which an attention degree of the gaze to each of the plurality of observation points is a predetermined value or greater; sets, based on the voice data, a period in which the voice is pronounced as an important voice period; and generates calibration data based on a time lag between the attention period and the important voice period.
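
The processing attributed to the processor can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions that do not appear in the abstract: gaze and voice are sampled at shared timestamps, the attention degree is a per-sample number compared against a hypothetical `threshold`, voice activity is already available as a per-sample boolean, periods are paired in order of occurrence, and the "calibration data" is reduced to a mean lag between period start times. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Period = Tuple[float, float]  # (start_time, end_time) in seconds


def periods_where(times: Sequence[float], flags: Sequence[bool]) -> List[Period]:
    """Collapse runs of consecutive True samples into (start, end) periods."""
    periods: List[Period] = []
    start: Optional[float] = None
    last: Optional[float] = None
    for t, on in zip(times, flags):
        if on:
            if start is None:
                start = t  # a new run begins at this sample
            last = t
        elif start is not None:
            periods.append((start, last))  # the run ended at the previous sample
            start = None
    if start is not None:
        periods.append((start, last))  # close a run that reaches the final sample
    return periods


def attention_periods(
    times: Sequence[float], degrees: Sequence[float], threshold: float
) -> List[Period]:
    """Periods where the attention degree is the threshold value or greater."""
    return periods_where(times, [d >= threshold for d in degrees])


def mean_time_lag(attention: Sequence[Period], voice: Sequence[Period]) -> float:
    """Mean of (voice start - attention start) over periods paired in order.

    Assumption: the i-th attention period corresponds to the i-th voice period.
    """
    pairs = list(zip(attention, voice))
    if not pairs:
        raise ValueError("need at least one attention/voice period pair")
    return sum(v[0] - a[0] for a, v in pairs) / len(pairs)


# Illustrative, made-up samples on a shared time axis (seconds).
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
degrees = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.9, 0.3]
voiced = [False, False, True, True, False, False, True, True]

attention = attention_periods(times, degrees, threshold=0.6)
important_voice = periods_where(times, voiced)
lag = mean_time_lag(attention, important_voice)
```

With these made-up samples, the user speaks half a second after each attention period begins, so `lag` is 0.5 seconds. A positive value under this sign convention means the voice trails the gaze.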