Various embodiments of a video-based fall risk assessment system are disclosed. During operation, the fall risk assessment system receives a sequence of video frames that includes a person being monitored for fall risk assessment. The system next generates a sequence of action labels for the sequence of video frames by, for each video frame in the sequence of video frames: estimating a pose of the person within the video frame; and classifying the estimated pose as a given action among a set of predetermined actions. Next, the system identifies a subset of action labels within the sequence of action labels. The system then extracts a set of gait features for the person from a subset of video frames within the sequence of video frames corresponding to the subset of action labels. Subsequently, the system analyzes the set of extracted gait features to generate a fall risk assessment for the person. In some embodiments, the sequence of video frames is captured during a predetermined time period, such as an hour, a day, or a week.
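
The processing flow described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the function names, the pose representation (a dictionary of named 2D keypoints), the choice of "walking" as the action label of interest, the single gait feature (mean horizontal ankle separation), and the threshold rule are all hypothetical placeholders. The pose estimator and the action classifier are passed in as callables, since the abstract does not specify any particular model.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float]   # hypothetical (x, y) coordinate
Pose = Dict[str, Keypoint]       # hypothetical pose: keypoint name -> coordinate


def label_frames(
    frames: Sequence[object],
    estimate_pose: Callable[[object], Pose],
    classify_pose: Callable[[Pose], str],
) -> Tuple[List[Pose], List[str]]:
    """Estimate a pose per frame, then classify each pose as an action label."""
    poses = [estimate_pose(frame) for frame in frames]
    labels = [classify_pose(pose) for pose in poses]
    return poses, labels


def select_label_indices(labels: Sequence[str], wanted: str = "walking") -> List[int]:
    """Return indices of frames whose label matches the action of interest.

    Using "walking" here is an assumption; the abstract only says that a
    subset of action labels is identified.
    """
    return [i for i, label in enumerate(labels) if label == wanted]


def extract_gait_features(poses: Sequence[Pose], indices: Sequence[int]) -> Dict[str, float]:
    """Compute placeholder gait features from the selected frames only."""
    if not indices:
        return {}
    separations = [
        abs(poses[i]["left_ankle"][0] - poses[i]["right_ankle"][0]) for i in indices
    ]
    return {
        "mean_ankle_separation": mean(separations),
        "num_frames": float(len(indices)),
    }


def assess_fall_risk(features: Dict[str, float], min_separation: float = 0.2) -> str:
    """Map features to a risk label with a placeholder rule.

    The threshold is illustrative only and is not a clinically validated criterion.
    """
    if not features:
        return "insufficient data"
    if features["mean_ankle_separation"] < min_separation:
        return "elevated"
    return "low"
```

In this sketch, a caller would run `label_frames` over the frames captured during the monitoring period, pass the resulting labels to `select_label_indices`, and feed the selected frame indices to `extract_gait_features` before calling `assess_fall_risk`. Keeping the four stages as separate functions mirrors the order of the steps in the description and lets any stage be replaced independently.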