Described is a disease prediction system using open-source data. The system includes a preprocessing module, a learning module, and a prediction module. The preprocessing module receives a dataset of N trend results related to a disease event and generates an enhanced filter signal (EFS) curve related to the disease event. The learning module receives the EFS curve and generates a predicted number of cases of the disease event and, using a plurality of machine learning methods, generates a plurality of predictions that the disease event will happen within a future time period. The prediction module determines precision and recall for each of the plurality of predictions and, based on the precision and recall, provides a likelihood that the disease event will occur.
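
The final step, in which the prediction module turns per-method precision and recall into a single likelihood, can be illustrated with a minimal sketch. The abstract does not state how the two measures are combined, so the F1-weighted vote below is an assumption chosen only for illustration, and the function names (`precision_recall`, `f1_score`, `event_likelihood`) and the method labels in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall of one method's past yes/no predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))        # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))    # false positives
    fn = sum((not p) and a for p, a in zip(predicted, actual))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def event_likelihood(
    history: Dict[str, List[bool]],
    actual: List[bool],
    current: Dict[str, bool],
) -> float:
    """Assumed combination rule: each method votes with its current
    prediction, weighted by the F1 score it earned on past periods."""
    weights = {
        name: f1_score(*precision_recall(preds, actual))
        for name, preds in history.items()
    }
    total = sum(weights.values())
    if total == 0:
        return 0.0
    # Share of the total weight held by methods that predict the event.
    return sum(w for name, w in weights.items() if current[name]) / total
```

As a usage example, with `history = {"method_a": [True, True, False, False], "method_b": [True, False, True, False]}`, `actual = [True, True, False, False]`, and `current = {"method_a": True, "method_b": False}`, the first method has precision and recall of 1.0 and the second has 0.5 for both, so the call `event_likelihood(history, actual, current)` weights the positive vote at 1.0 out of a total of 1.5. Other combination rules (for example, weighting by precision alone) would fit the description equally well.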