Kenneth Beckman, John Garbe, Nick Jones, Mark Murphy, Noah Strom
University of Minnesota Genomics Center (UMGC), Minneapolis, MN, USA
Genome-wide analysis of 5-methylcytosine (5mC) at CpG sites can be achieved by several methods: array-based approaches from Illumina have dominated the analytical space for two decades, and sequencing-based approaches (WGBS, RRBS) have become common as NGS costs have dropped. The information provided by DNA methylation arrays differs strikingly from that provided by NGS. Arrays interrogate tens of thousands of molecules and provide CpG site beta values (5mC/C) with high precision, whereas NGS typically interrogates tens of reads at a given locus, so single-site 5mC/C precision is limited by read depth. Conversely, NGS provides allelic information on neighboring CpG sites found within a single read and reports on all mappable CpG sites in the genome, whereas arrays mask allelic information and target a small fraction of CpG sites. Until recently, moreover, neither arrays nor NGS allowed for the facile genome-wide measurement of 5-hydroxymethylcytosine (5hmC) along with 5mC.
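The dependence of single-site precision on read depth can be illustrated with a minimal sketch. It assumes each read is an independent draw from the underlying methylation fraction (a simple binomial model) and ignores conversion errors, mapping bias, and allelic structure. The function name `beta_standard_error` and the example depths are illustrative only and are not taken from the study.

```python
import math


def beta_standard_error(beta: float, depth: int) -> float:
    """Binomial standard error of a methylation fraction estimated from `depth` reads.

    Assumes independent reads (a simplifying assumption, not the authors' model).
    """
    if depth <= 0:
        raise ValueError("depth must be a positive number of reads")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    return math.sqrt(beta * (1.0 - beta) / depth)


if __name__ == "__main__":
    # Hypothetical depths, chosen only to show how the error shrinks with coverage.
    for depth in (10, 30, 100, 1000):
        se = beta_standard_error(0.5, depth)
        print(f"depth={depth:5d}  standard error of beta ~ {se:.3f}")
```

Under this model the standard error falls with the square root of depth, so halving the uncertainty at a single site requires roughly four times as many reads.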
With the introduction of the duet multiomics solution evoC from biomodal, genome-wide 6-base sequencing (A, C, T, G, 5mC, 5hmC) is possible in a single workflow. We have therefore benchmarked biomodal’s evoC against Illumina’s Infinium HumanMethylationEPIC v2 and MethylationScreeningArray-48 to provide bi-directional insights into the platforms. In terms of CpG site beta values, we find that the platforms are well correlated (with quantitative agreement limited, as expected, by sequencing depth), bolstering confidence in both. Our comparison reveals that arrays are less able than evoC to distinguish extremes of methylation state (100% methylated or 0% methylated). It also suggests that leveraging allelic information from neighboring CpG sites may improve the precision of lower-depth sequencing, allowing it to deliver beta values comparable to those from arrays. We have modeled the use of evoC to complement arrays in terms of performance, cost, and utility in epigenetic clock analysis, and have attempted to generate a reliability list for array CpG sites using evoC data as a truth model.