Systems and methods for predicting the incidence of a disease or disorder are disclosed. A system for predicting the incidence of a disease or disorder includes a web-based symptom checker that produces a structured dataset, a data analysis component that produces a multivariate dataset from the structured dataset, and a feature construction component that produces a linear combination of orthogonal symptoms representative of the disease or disorder. A method for predicting the incidence of a disease or disorder includes producing a multivariate dataset representing patient symptom counts, performing feature construction analysis on the multivariate dataset, creating a time series model from weekly illness incidence data, and applying the time series model to new illness incidence data to predict the future incidence of the disease or disorder.
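The pipeline described above can be illustrated with a minimal sketch. It assumes two choices the abstract does not specify. First, feature construction is approximated by the first principal component of the weekly symptom-count matrix, found by power iteration, so the feature is a linear combination of symptom counts. Second, the time series model is an AR(1) model fit by ordinary least squares to that feature, which is treated as a weekly incidence signal. All function names and the toy counts are hypothetical and are not the disclosed method.

```python
"""Illustrative sketch: weekly symptom counts -> constructed feature -> forecast.

Assumptions (not from the source): feature construction is approximated by the
first principal component (power iteration), and the time series model is an
AR(1) fit by ordinary least squares. All names and data are hypothetical.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(data: Matrix) -> Matrix:
    """Subtract each column (symptom) mean from every row (week)."""
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered columns."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration: approximates the unit eigenvector of the largest eigenvalue."""
    p = len(cov)
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec


def project(centered: Matrix, weights: List[float]) -> List[float]:
    """Score each week as a weighted linear combination of its symptom counts."""
    return [sum(w * x for w, x in zip(weights, row)) for row in centered]


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """OLS fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    xs, ys = series[:-1], series[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def forecast_ar1(a: float, b: float, last: float, steps: int) -> List[float]:
    """Iterate the fitted AR(1) recursion forward from the latest observation."""
    out = []
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out


if __name__ == "__main__":
    # Toy weekly counts (rows = weeks; columns = fever, cough, sore throat).
    weekly_counts = [[12, 30, 8], [15, 34, 9], [20, 41, 12], [26, 50, 15], [31, 58, 18]]
    centered = center_columns(weekly_counts)
    weights = first_component(covariance(centered))
    feature = project(centered, weights)
    a, b = fit_ar1(feature)
    print("next-week feature forecast:", forecast_ar1(a, b, feature[-1], 1))
```

Principal components are mutually orthogonal linear combinations of the inputs, which is why PCA is a natural stand-in for the feature construction step. A production system would more likely use a library PCA and a richer seasonal time series model.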