Synthetic representative genomes for HCV subtypes, including a 1a genome and a 1b genome, dubbed Bole1a and Bole1b respectively, are provided using an inventive method that combines Bayesian phylogenetic tree analysis, ancestral sequence reconstruction, and covariance analysis. Bole1a branches centrally among the 390 full-genome sequences used in its design, within a carefully curated 143-sequence full-genome dataset, and within separate genomic regions, including an independent set of 214 E1E2 sequences from a Baltimore cohort. Bole1a is phylogenetically representative of widely circulating strains. Full-genome non-synonymous diversity comparison and 9-mer peptide coverage analysis showed that Bole1a provides more coverage (94% and 78%, respectively) than any other sequence in the dataset, including H77, a traditional reference sequence. Bole1a also provides unsurpassed epitope coverage when compared against all known T cell epitopes.
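As an illustration of the 9-mer peptide coverage analysis mentioned above, the following is a minimal sketch of one way such a score could be computed. The abstract does not define the exact metric, so the definition used here (the fraction of all 9-mers in a set of protein sequences that also occur in the candidate sequence) is an assumption, as are the function names `kmers` and `kmer_coverage` and the toy sequences, which are not HCV data.

```python
from typing import Iterable, List


def kmers(seq: str, k: int = 9) -> List[str]:
    """Return every overlapping k-mer of seq, in order (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_coverage(candidate: str, population: Iterable[str], k: int = 9) -> float:
    """Fraction of k-mers in the population that also occur in the candidate.

    Assumed definition: each k-mer occurrence in each population sequence
    counts once, and it is "covered" if the identical k-mer appears anywhere
    in the candidate sequence.
    """
    candidate_kmers = set(kmers(candidate, k))
    total = 0
    matched = 0
    for seq in population:
        for peptide in kmers(seq, k):
            total += 1
            if peptide in candidate_kmers:
                matched += 1
    return matched / total if total else 0.0


# Toy amino-acid sequences (hypothetical, for illustration only).
population = [
    "ACDEFGHIKLMN",
    "ACDEFGHIKLMQ",
    "ACDEFGHIRLMN",
]
candidate = "ACDEFGHIKLMN"

score = kmer_coverage(candidate, population)
print(f"9-mer coverage of candidate: {score:.3f}")
```

Under this assumed definition, the same function could be applied to each sequence in a dataset in turn, so that a candidate such as Bole1a or H77 can be ranked against the others by its coverage score.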