Jean Teyssandier, Nicholas Harding, Sabri Jamal, Michael J. Wilson, Gary Frewin, Nicola Wong, William Stark, Mark S. Hill, Páidí Creed¹
¹biomodal Ltd, Cambridge, United Kingdom
We present software optimised for analysing 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) data at scale and describe its performance on a novel liquid biopsy dataset.
Methylation data has diverse applications in cancer, including early-stage diagnosis through liquid biopsy, classification to guide treatment pathways, and prognosis. However, analysing methylation data remains challenging because existing tools are limited in scalability and usability.
Our recently introduced technology, the duet multiomics solution evoC, enables the reading of 6-base information (A, T, G, C, 5mC and 5hmC) from DNA, further increasing the complexity and scale of the datasets generated in a single sequencing experiment. To address this, we present a fast, scalable software package for the analysis of 6-base genomes that uses multi-core, out-of-core processing to compute efficiently even on datasets that are too large to fit into memory.
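The out-of-core idea can be illustrated with a minimal map-reduce sketch. Records are streamed in fixed-size chunks, each chunk is reduced to per-sample partial sums, and the partial sums are merged, so peak memory depends on the chunk size rather than on the dataset size. The record layout (sample, 5mC reads, 5hmC reads, total reads) and all function names are hypothetical and do not describe the package's actual file format or API. Chunks are processed serially here; because they are independent, the same per-chunk step is what could be spread across cores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical per-CpG record: (sample_id, n_5mC_reads, n_5hmC_reads, n_total_reads).
Record = Tuple[str, int, int, int]


def chunked(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Yield fixed-size chunks so that only one chunk is held in memory at a time."""
    chunk: List[Record] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def summarise_chunk(chunk: List[Record]) -> Dict[str, List[int]]:
    """Map step: per-sample partial sums [5mC reads, 5hmC reads, total reads]."""
    partial: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for sample, n_mc, n_hmc, n_total in chunk:
        acc = partial[sample]
        acc[0] += n_mc
        acc[1] += n_hmc
        acc[2] += n_total
    return partial


def global_levels(records: Iterable[Record], chunk_size: int = 100_000) -> Dict[str, Tuple[float, float]]:
    """Reduce step: merge partial sums and return per-sample (5mC, 5hmC) fractions."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for chunk in chunked(records, chunk_size):
        # Chunks are independent, so this call could be dispatched to a worker process;
        # in this sketch it runs serially in the current process.
        for sample, (n_mc, n_hmc, n_total) in summarise_chunk(chunk).items():
            acc = totals[sample]
            acc[0] += n_mc
            acc[1] += n_hmc
            acc[2] += n_total
    return {s: (mc / tot, hmc / tot) for s, (mc, hmc, tot) in totals.items() if tot > 0}
```

Because `records` can be any iterator (for example, a generator reading a file line by line), the full dataset never needs to be materialised in memory.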
Our approach scales to thousands of samples. Whereas existing tools exceed typical laptop memory at around 10 samples, our software processes a colorectal cancer liquid biopsy dataset of over 100 samples in minutes on a standard laptop. It also fits a complex differentially methylated region (DMR) model (logistic regression with covariates) genome-wide in under an hour, a task that is infeasible with current tools.
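To make the DMR model concrete, the following is a minimal single-region sketch of one common formulation: methylated reads m_j out of n_j total reads per sample are modelled as Binomial(n_j, p_j) with logit(p_j) = x_j · β, where the design row x_j holds an intercept, a case/control flag and covariates. It is fitted by Newton-Raphson (equivalently, IRLS), and the group coefficient is tested with a Wald test. The exact model, fitting method and names used by the package are not stated here, so the functions and the toy counts below are hypothetical; a genome-wide run would repeat this fit for every region.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; adequate for a handful of covariates."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def score_and_information(x, meth, total, beta):
    """Gradient of the binomial log-likelihood and the Fisher information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, m, n in zip(x, meth, total):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        w = n * p * (1.0 - p)  # binomial variance weight
        for a in range(k):
            grad[a] += row[a] * (m - n * p)
            for b in range(k):
                info[a][b] += w * row[a] * row[b]
    return grad, info


def fit_region(x, meth, total, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(p_j) = x_j . beta; returns (beta, standard errors)."""
    k = len(x[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad, info = score_and_information(x, meth, total, beta)
        inv = invert(info)
        step = [sum(inv[a][b] * grad[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the inverse Fisher information at the final estimate.
    _, info = score_and_information(x, meth, total, beta)
    cov = invert(info)
    return beta, [math.sqrt(cov[i][i]) for i in range(k)]


def wald_p_value(coef: float, se: float) -> float:
    """Two-sided Wald p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))


# Toy region (illustrative numbers only): 3 controls then 3 cancers.
# Columns: intercept, cancer flag, age / 10 as a covariate.
design = [[1, 0, 5.0], [1, 0, 6.2], [1, 0, 5.8], [1, 1, 5.5], [1, 1, 7.0], [1, 1, 6.1]]
meth = [20, 25, 22, 60, 55, 65]
total = [100] * 6
beta, se = fit_region(design, meth, total)
print("group effect (log-odds):", beta[1], "p =", wald_p_value(beta[1], se[1]))
```

Fitting on read counts rather than per-sample fractions lets samples with deeper coverage carry proportionally more weight in each region.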
The software combines efficient computation with tools for exploratory analyses (e.g., plotting, correlation) and downstream analyses (e.g., DMR identification, principal component analysis (PCA)). Designed for efficiency and ease of use, it lets users move rapidly from raw data to actionable insights and publication-ready results. As multiomic data become the standard in cancer research, our data structure supports the integration of additional data types, allowing it to handle combined genomic and epigenomic data from solutions such as duet evoC. This will enable streamlined, efficient multiomic analysis to uncover deeper biological insights.
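As an illustration of the PCA step on a samples × CpGs matrix of methylation fractions, here is a minimal sketch that computes first-principal-component scores by power iteration on the column-centred data. Power iteration is a swapped-in textbook technique chosen to keep the example dependency-free; it is not necessarily how the package computes PCA, and the function name is hypothetical.

```python
import math
import random
from typing import List, Sequence


def first_pc_scores(data: Sequence[Sequence[float]], n_iter: int = 200, seed: int = 0) -> List[float]:
    """PC1 scores for a samples x CpGs matrix, via power iteration.

    The sign of a principal component is arbitrary, so only relative signs are meaningful."""
    n_samples, n_feat = len(data), len(data[0])
    # Centre each CpG (column) so PCA describes variance around the mean profile.
    means = [sum(row[f] for row in data) / n_samples for f in range(n_feat)]
    centred = [[row[f] - means[f] for f in range(n_feat)] for row in data]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
    for _ in range(n_iter):
        # One multiplication by X^T X, computed as X^T (X v) so the
        # n_feat x n_feat covariance matrix is never formed.
        xv = [sum(r[f] * v[f] for f in range(n_feat)) for r in centred]
        w = [sum(centred[i][f] * xv[i] for i in range(n_samples)) for f in range(n_feat)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n_samples  # no variance to explain
        v = [c / norm for c in w]
    # Project the centred samples onto the leading loading vector.
    return [sum(r[f] * v[f] for f in range(n_feat)) for r in centred]
```

Avoiding the explicit covariance matrix matters for methylation data, where the number of CpGs is typically far larger than the number of samples.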