The 6-base genome

What is the 6-base genome?

Genetics does not tell the whole story

Genomics has traditionally centred on the four canonical nucleobases: adenine (A), cytosine (C), guanine (G), and thymine (T). But information is not encoded in the genetic sequence alone – epigenetic modifications, such as DNA methylation, also control how genes are expressed. These modifications are influenced by factors such as age, lifestyle, and disease, and so bridge the gap between genetics and environmental influences.

The emergence of the ‘6-base genome’, which includes the modified bases 5-methylcytosine (5mC) and its oxidised form, 5-hydroxymethylcytosine (5hmC), marks a significant evolution in our understanding of genetic and epigenetic regulation. This blog introduces 5mC and 5hmC and shows how unlocking the 6-base genome lets us take meaningful steps towards reading all the information encoded in our DNA and improves our understanding of the dynamic interactions that govern cellular processes.

Beyond the traditional 4-base genome: the addition of 5mC and 5hmC

The 4-base genome, consisting of A, C, G, and T, is the foundation of genetic information, but DNA methylation adds another layer of complexity and functionality. The modification of cytosine to form 5mC and 5hmC is a key epigenetic mechanism, influencing chromatin structure and accessibility, recruiting or repelling transcription factors, and ultimately modulating gene expression. By dynamically altering DNA methylation, a cell can quickly adapt to changes in its environment, regulate its development, and maintain genomic stability.

5-methylcytosine (5mC)

5mC is formed through the addition of a methyl group to the 5th carbon of the cytosine pyrimidine ring. This process, known as DNA methylation, is catalysed by DNA methyltransferases (DNMTs). Cytosine methylation is a key epigenetic mark: it does not change the genetic sequence but affects how genes are turned on or off. Methylation of gene promoters generally leads to gene silencing, playing a vital role in cellular differentiation, X-chromosome inactivation, and the suppression of transposable elements.

5-hydroxymethylcytosine (5hmC)

5hmC is generated by the oxidation of 5mC through the action of the ten-eleven translocation (TET) family of enzymes. This modification is thought to be an intermediate step in the active demethylation of DNA, and so plays a crucial part in gene regulation. 5hmC is particularly abundant in neuronal cells, suggesting a significant role in brain development and function.
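The two enzymatic steps described above can be sketched as a tiny state model. This is only an illustration: the state labels, the `TRANSITIONS` table, and the `apply_enzyme` helper are hypothetical names, and the model deliberately leaves out the further oxidation and repair steps that complete active demethylation.

```python
# Simplified model of the cytosine states discussed in the text.
# Keys are (current state, enzyme family); values are the resulting state.
# Assumption: only the two steps named in the text are modelled.
TRANSITIONS = {
    ("C", "DNMT"): "5mC",    # methylation by DNA methyltransferases
    ("5mC", "TET"): "5hmC",  # oxidation by ten-eleven translocation enzymes
}


def apply_enzyme(state: str, enzyme: str) -> str:
    """Return the new cytosine state, or the same state if the enzyme does not act on it."""
    return TRANSITIONS.get((state, enzyme), state)


state = "C"
for enzyme in ("DNMT", "TET"):
    state = apply_enzyme(state, enzyme)
    print(enzyme, "->", state)
```

In this sketch TET has no effect on unmodified cytosine, which reflects the order given in the text: 5hmC arises from 5mC rather than directly from C.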

The 6-base genome: a new layer of complexity

The 6-base genome emphasises that genetic information is not only encoded by the sequence of the four canonical bases but also by the chemical modifications of these bases. These modifications can influence gene expression, cellular identity, and an organism’s ability to respond to environmental changes.

The aberrant distribution of 5mC and 5hmC has been implicated in a range of diseases, most notably cancer, where DNA methylation patterns are often dysregulated. Changes to patterns of DNA methylation (hypermethylation or hypomethylation) are among the earliest changes observed in cancer, indicating that these epigenetic modifications have direct impacts on cell function and disease progression. Identifying and better understanding these modifications has the potential to lead to novel diagnostic tools, targeted therapies, and personalised medicine approaches.

Sequencing the 6-base genome – one workflow, one solution

Current methods for identifying cytosine modifications, including bisulfite sequencing in conjunction with whole-genome sequencing, are limited to producing a 5-base genome: they can identify modified cytosine but cannot distinguish 5mC from 5hmC. These approaches also often take more time and effort and yield lower-quality data, because the DNA sample must be split and experiments run in parallel, which requires multiple data analyses and results in significant information loss. The duet multiomics solution evoC is a single-workflow solution providing the most efficient way to sequence the 6-base genome: it reads all four canonical bases and distinguishes 5mC and 5hmC simultaneously, from the same DNA fragment, using existing next-generation sequencing infrastructure.
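To see why a bisulfite readout stops at five bases, here is a minimal sketch of the conversion logic. It assumes hypothetical single-letter codes `M` for 5mC and `H` for 5hmC, models a single strand only, and ignores incomplete conversion and sequencing error. It is not a description of any particular pipeline.

```python
# What a sequencer reports after bisulfite treatment, per input base.
# Unmodified C is converted and read as T; 5mC (M) and 5hmC (H) both
# resist conversion and are read as C. M and H are illustrative codes.
BISULFITE_READOUT = {"A": "A", "C": "T", "G": "G", "T": "T", "M": "C", "H": "C"}


def bisulfite_read(six_base: str) -> str:
    """Simulate the read obtained from a 6-letter sequence after bisulfite conversion."""
    return "".join(BISULFITE_READOUT[base] for base in six_base)


def five_base_calls(reference: str, read: str) -> list[str]:
    """Compare a bisulfite read with the unconverted 4-base reference.

    A reference C still read as C is called 'modC': modified, but of unknown type.
    """
    calls = []
    for ref_base, read_base in zip(reference, read):
        if ref_base == "C" and read_base == "C":
            calls.append("modC")
        else:
            calls.append(ref_base)
    return calls


# Two fragments with 5mC and 5hmC at swapped positions give identical reads.
read_a = bisulfite_read("ACGMTHC")
read_b = bisulfite_read("ACGHTMC")
print(read_a, read_b, read_a == read_b)
print(five_base_calls("ACGCTCC", read_a))
```

Because both modified forms map to the same letter, the calls can only say "modified cytosine" at those positions, which is the 5-base limit described above.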

Finally, see it all

While genetics focuses on the study of inherited traits determined by DNA sequences, epigenetics explores how the expression of those traits can be modified by external or environmental factors without altering the DNA sequence itself. It is becoming increasingly clear that the regulation of gene expression is a highly complex and dynamic process, influenced by a wide array of genetic and epigenetic factors. The 6-base genome marks a profound development in our understanding of genetic complexity, unlocking new dimensions of gene regulation and expression. Adding 5mC and 5hmC to our genetic toolkit is not just an expansion; it is a revolution that challenges our previous notions of gene regulation, pushing the boundaries of our knowledge and inviting us to rethink the complexity of life at the molecular level.

Cambridge Epigenetix is now biomodal