A viewing state detection device (6) includes: an image input unit (11) that receives temporally consecutive captured images of an audience together with information on the capture time of each image; an area detector (12) that detects a skin area of the audience in the captured images; a vital information extractor (13) that extracts vital information of the audience from time-series data of the skin area; a viewing state determination unit (17) that determines the viewing state of the audience based on the vital information; a content information input unit (14) that receives content information including at least temporal information of the content; and a content viewing state storage unit (19) that stores the viewing state in association with the temporal information of the content.
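
The data flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation. It assumes that the skin area has already been reduced to one mean intensity value per frame, that the vital information is a pulse rate estimated by counting upward crossings of the signal mean, and that the viewing state is chosen by a single threshold. The names `SkinSample`, `estimate_pulse_bpm`, `determine_viewing_state`, `ContentViewingStateStore`, and `process_window`, the state labels, and the 80 bpm threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SkinSample:
    capture_time: float  # seconds, supplied with each captured image
    intensity: float     # assumed: mean pixel value of the detected skin area


def estimate_pulse_bpm(samples):
    """Estimate a pulse rate by counting upward mean crossings (assumed method)."""
    if len(samples) < 2:
        return None
    duration = samples[-1].capture_time - samples[0].capture_time
    if duration <= 0:
        return None
    mean = sum(s.intensity for s in samples) / len(samples)
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if prev.intensity < mean <= cur.intensity
    )
    return 60.0 * crossings / duration


def determine_viewing_state(pulse_bpm, threshold_bpm=80.0):
    """Hypothetical rule: label the window by comparing pulse to a threshold."""
    if pulse_bpm is None:
        return "unknown"
    return "elevated" if pulse_bpm >= threshold_bpm else "baseline"


class ContentViewingStateStore:
    """Keeps viewing states keyed by content time (seconds from content start)."""

    def __init__(self):
        self._records = {}

    def store(self, content_time, state):
        self._records[content_time] = state

    def lookup(self, content_time):
        return self._records.get(content_time)


def process_window(samples, content_start_capture_time, store):
    """Run one window of skin samples through the sketch pipeline."""
    state = determine_viewing_state(estimate_pulse_bpm(samples))
    # Convert the capture clock to content time so the state can be looked up
    # by position in the content rather than by wall-clock time.
    content_time = samples[0].capture_time - content_start_capture_time
    store.store(content_time, state)
    return content_time, state


if __name__ == "__main__":
    # Synthetic 1.2 Hz (about 72 beats per minute) signal sampled at 30 frames/s.
    window = [
        SkinSample(100.0 + i / 30.0, math.sin(2 * math.pi * 1.2 * i / 30.0 + 1.0))
        for i in range(300)
    ]
    store = ContentViewingStateStore()
    print(process_window(window, content_start_capture_time=100.0, store=store))
```

Keying the stored state by content time rather than capture time is what lets states from different audiences or sessions be compared at the same point in the content.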