A walking assistance device may be controlled using a personalized gait policy. A preset event is detected during a current gait cycle based on state information associated with a motion of the walking assistance device. A reward value for the state information is evaluated at the point in time at which the event is detected. When the reward value indicates that the personalized gait policy needs to be updated, the policy is updated based on the state information associated with the motion.
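
The sequence described above (detect an event, evaluate a reward at that moment, update the policy only if the reward calls for it) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: the state information is reduced to a list of hip joint angles, the preset event is an upward crossing of a threshold angle, the reward is the negative mean squared deviation from a target angle, and the policy is a single gain adjusted by a simple hypothetical rule. The names `GaitPolicy`, `detect_event`, `evaluate_reward`, and `process_gait_cycle` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class GaitPolicy:
    """Hypothetical personalized policy, reduced to one assistance gain."""
    gain: float = 1.0

    def update(self, states: Sequence[float], target: float, lr: float = 0.1) -> None:
        # Illustrative update rule (an assumption): shift the gain by the
        # mean signed error between the target and the observed states.
        mean_error = sum(target - s for s in states) / len(states)
        self.gain += lr * mean_error


def detect_event(prev_angle: float, angle: float, threshold: float = 0.0) -> bool:
    """Assumed preset event: the angle crosses the threshold upward."""
    return prev_angle < threshold <= angle


def evaluate_reward(states: Sequence[float], target: float) -> float:
    """Assumed reward: negative mean squared deviation from the target."""
    return -sum((s - target) ** 2 for s in states) / len(states)


def process_gait_cycle(
    policy: GaitPolicy,
    angles: Sequence[float],
    target: float,
    reward_threshold: float,
) -> Tuple[bool, Optional[float]]:
    """Run one gait cycle; return (policy_updated, reward_at_event).

    The reward is None when no event is detected during the cycle.
    """
    observed: List[float] = [angles[0]]
    prev = angles[0]
    for angle in angles[1:]:
        observed.append(angle)
        if detect_event(prev, angle):
            # Evaluate the reward at the point in time of the event.
            reward = evaluate_reward(observed, target)
            # Update only when the reward falls below the threshold.
            if reward < reward_threshold:
                policy.update(observed, target)
                return True, reward
            return False, reward
        prev = angle
    return False, None
```

As a usage example, calling `process_gait_cycle(GaitPolicy(), [-0.2, -0.1, 0.1], target=0.0, reward_threshold=-0.01)` detects the crossing on the last sample, finds the reward below the threshold, and updates the gain; with a looser threshold such as `-0.05` the policy is left unchanged. A real device would use richer state information and a learned policy, so this sketch shows only the control flow.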