A machine learning device includes a training section configured to perform training based on a plurality of training data, each configured from oral cavity information detected from a specimen and at least one determination value selected from the group consisting of a probability of periodontal disease contraction of a provider of the specimen and a state of periodontal disease, so as to train a function that decides the determination value. The training section includes a reward calculation section, a function updating section, and a convergence determination section. The reward calculation section calculates, based on the training data, a reward for a determination result of the probability of periodontal disease contraction or the state of periodontal disease. The function updating section updates the function so as to increase the reward calculated by the reward calculation section. The convergence determination section causes the reward calculation and the function update to be repeated until a predetermined convergence condition is satisfied.
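
As one non-limiting illustration, the interplay of the reward calculation section, the function updating section, and the convergence determination section might be sketched as follows. Everything specific in this sketch is an assumption made for illustration and is not taken from the disclosure: the function is assumed to be a logistic model, the reward is assumed to be the mean log-likelihood of the known determination values, the update is assumed to be a gradient-ascent step, the convergence condition is assumed to be a reward change below a tolerance, and the two oral cavity features and all data values are hypothetical.

```python
import math


def predict(weights, features):
    """Assumed form of the trained function: logistic probability of disease."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def calculate_reward(weights, data):
    """Reward calculation section (assumed reward: mean log-likelihood)."""
    total = 0.0
    for features, label in data:
        # Clamp to avoid log(0) for extreme predictions.
        p = min(max(predict(weights, features), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)


def update_function(weights, data, lr=0.5):
    """Function updating section: one gradient-ascent step on the reward."""
    grad = [0.0] * len(weights)
    for features, label in data:
        err = label - predict(weights, features)
        grad[0] += err
        for i, x in enumerate(features):
            grad[i + 1] += err * x
    return [w + lr * g / len(data) for w, g in zip(weights, grad)]


def train(data, tol=1e-6, max_iters=10000):
    """Convergence determination section: repeat until the reward stops changing."""
    weights = [0.0] * (len(data[0][0]) + 1)
    reward = calculate_reward(weights, data)
    for _ in range(max_iters):
        weights = update_function(weights, data)
        new_reward = calculate_reward(weights, data)
        converged = abs(new_reward - reward) < tol
        reward = new_reward
        if converged:
            break
    return weights, reward


# Hypothetical training data: (oral cavity features, determination value).
# Features are two made-up normalized measurements; 1 means disease present.
data = [
    ((0.9, 0.8), 1), ((0.8, 0.6), 1), ((0.7, 0.9), 1), ((0.4, 0.6), 1),
    ((0.5, 0.5), 1), ((0.5, 0.5), 0),
    ((0.2, 0.1), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0), ((0.6, 0.4), 0),
]

weights, reward = train(data)
```

After training, `predict(weights, features)` returns the estimated probability for a new specimen under these assumptions. A different reward definition, function form, or convergence condition could be substituted without changing the overall loop.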