DNA sequencing has come a long way since the completion of the Human Genome Project. Sir Shankar Balasubramanian, a chemist and a professor at Cambridge University, has made significant contributions to the field, including the development of a method for reading the primary sequence of DNA quickly and the exploration of a quadruple helix structure. His work has revolutionised DNA sequencing technology, enabling the construction of genomes and leading to significant advances in the field.
Sequencing technology is being used in various fields, including cancer research, rare disease diagnosis, tracking the emergence of new pathogens, and studying modifications to DNA that go beyond the four bases. In cancer research, sequencing the genomes of cancer cells enables researchers to gain a better understanding of the disease and develop more targeted treatments. In rare disease diagnosis, sequencing the genomes of families can help identify mutations unique to the affected individuals, aiding diagnosis and treatment. Sequencing technology is also being used to track the evolution of pathogens, like the SARS-CoV-2 virus that causes COVID-19, to develop more effective vaccines and treatments.
There are three main areas of focus in DNA-based research: reading, writing, and editing. Sequencing technology falls under the reading category. The next area of focus is writing, with a lot of effort going into improving ways of making DNA, including making extremely long DNA. Writing is still lagging behind reading, but it will catch up soon. DNA is also being used for the storage of information and is the most volume-efficient way of storing data. The third area of focus is genome editing, and a number of methods have improved the capacity to edit genomes, such as the CRISPR-Cas9 technology.
Balasubramanian’s contributions have not only advanced the field of genetics but also have significant implications for personalised medicine. By sequencing an individual’s entire genome, doctors can identify genetic variations that may predispose that person to certain diseases and develop personalised treatment plans tailored to their unique genetic makeup. Furthermore, Balasubramanian’s work has also contributed to the field of synthetic biology. By understanding how DNA works and developing new sequencing technologies, scientists can now create entirely new organisms that can be used for a variety of purposes, such as producing new medicines or cleaning up the environment.
In conclusion, sequencing technology has undergone transformative changes in recent years, and it has the potential to change how we practice medicine in the future. Balasubramanian’s work has had a significant impact on the field of genetics, and his contributions to the development of new sequencing technologies and the study of epigenetics have greatly advanced our understanding of how DNA works and how genetic variations contribute to disease. As we continue to explore the mysteries of the human genome, we can expect that Balasubramanian’s work and sequencing technology will continue to play an essential role in advancing the field of genetics.
[00:01:31.180] – Gerhard Fasol
Hello. Fantastic to see you. Thank you so very much for sacrificing and donating your time to us here today.
[00:01:44.850] – Sir Shankar Balasubramanian
It’s a pleasure.
[00:01:47.490] – Gerhard Fasol
You know, preparing for this event, I worked through a little bit of your work, and I’m so impressed by what you have achieved. I was almost blown away by all your achievements. So it’s a very great honour that you can find the time for us here.
[00:02:08.670] – Sir Shankar Balasubramanian
It’s a pleasure to be here. And a shame I can’t be in Japan.
[00:02:17.090] – Gerhard Fasol
I started Trinity in Japan seven years ago. And before the virus, we used to have in-person dinner meetings, and in November we’ll start again with dinner meetings. So when you come to Japan, if you let me know, then I’d love to organise a dinner meeting with you. Either Trinity in Japan or Pica.
[00:02:39.850] – Sir Shankar Balasubramanian
Would love to. Actually, it was, I think, three years ago that I was last in Japan. Yeah, I was in Kobe for a week. And then five years ago, I spent quite a long time in Kyoto.
[00:02:57.950] – Gerhard Fasol
It’s beautiful. And there’s a lot of iPS work and medical work there.
[00:03:03.140] – Sir Shankar Balasubramanian
It was also culturally a very enriching experience to be there. Actually, it was wonderful.
[00:03:12.950] – Gerhard Fasol
I go to Kyoto almost every month. It’s so wonderful. And at the moment, there are zero tourists. I was at Trinity for about 13 years. I did my PhD, and then I was a fellow for about ten years, and I was a tenured lecturer at the university. In principle, I could have stayed forever at the university, but then I moved to Japan, and now there is Trinity in Japan. It is really an alumni organisation, but I want to make much more out of it. One of the things I’m doing for these Zoom discussions is inviting the best people I can find in Japan as discussion partners, in addition to Trinity members. So today I have invited Professor Yamamoto, who you might know. Do you know him?
[00:04:07.190] – Sir Shankar Balasubramanian
We haven’t met, but I certainly know of him. So. Professor Yamamoto. Pleasure to meet you.
[00:04:14.630] – Martin Morris
Through Zoom, yes. Nice to meet you. We are looking forward to your talk. And behind me here is Kkotaguchi, associate professor, and Mickey Katoka, professor, and also the Akihito, the assistant professor. We are all working on the fortune of sequencing the needs, so by using your technology, and we are looking forward to your talk.
[00:04:50.070] – Sir Shankar Balasubramanian
Well, it’s a pleasure to meet you. Through Zoom.
[00:04:56.450] – Gerhard Fasol
We thought we’d start at 7:15, so would it be okay to start now?
[00:05:05.750] – Gerhard Fasol
Or, if you like, since there are not so many people, we can have a short introduction of the people here. I will just go along in the order I see them on my screen. So there’s Ray Schenfeld, who is in Thailand at the moment, and I think he has a business development company. Then there’s Martin Morris, who is one of the world’s greatest experts on Japanese housing and building works and constructions. He’s a professor at Chiba, and he’s a Trinity alumnus. And maybe you can say a word, Martin?
[00:05:55.380] – Martin Morris
Basically, I’m a historian of architecture, and most of my research has been concerned with the development of the Japanese house, the relationships between the house types of different status groups, and how the house evolves over time, really from the ancient period, the period of the big tumuli tombs, the Kofun period, through to the early modern period and the end of the Edo period.
[00:06:27.470] – Gerhard Fasol
Then we have Professor Komoto here from Kyushu University. He’s a professor of mathematics, and he’s also director of an institute for industrial mathematics. And then we have David Otenga, who is business strategy manager for the German company Zeiss, the Zeiss Group, the optics group. So, shall we start? Instead of introducing you myself, as I already said, I was blown away. I’m deeply, deeply impressed by all the fantastic work you have done, which will impact our world for a long time to come. And you have Professor Yamamoto here, who is using the equipment and the methods you invented. So I had better ask you: maybe you can tell us in a few words how you came to where you are today, and then about your work and your companies.
[00:07:37.250] – Sir Shankar Balasubramanian
Yeah, absolutely, happy to tell you about my pathway to the career that I’ve been in. And also, I guess there are quite a lot of business experts in the meeting. So I’m also happy to talk about some of the business aspects of science as well, if that would be of interest.
[00:08:02.110] – Gerhard Fasol
Oh, that’s very interesting.
[00:08:04.030] – Sir Shankar Balasubramanian
Yes, that’s a component. Gerhard, as it’s a small group, I’m very happy to do the whole thing as a dialogue and a conversation. So if anybody wants to chip in with questions or discussions in the middle, I’m very happy to make it more of a discussion as well.
[00:08:28.360] – Gerhard Fasol
Thank you. One thing also, I should ask again. I’ve asked you already, but I’m recording this, and afterwards I would like to publish it on the YouTube site so that people can watch it. So if you want to delete something afterwards, you can tell me and I can delete it, if that’s something you regret having said or something like that.
[00:08:49.440] – Sir Shankar Balasubramanian
That should be fine. I’ll try to avoid saying it; that’s the best way. Okay, well, a little bit about myself then. So actually I’ll start with my origins. I was born in southern India. Well, it was called Madras then; it’s called Chennai today. And I still have quite a lot of family there, in fact. My parents came to the UK when I was a baby, so I didn’t really spend any time growing up in India. I grew up in England, actually, in a little village outside Liverpool in the north of England. That’s where home was for me as a child. And I suppose growing up there, one gets influenced by a number of things: music and sport and football and so forth. And I guess up until the age of 17, I saw myself as a professional footballer, not a scientist. But I wasn’t good enough as a footballer. So I left school and went to university in 1985, and I came to Cambridge to study natural sciences. Actually, I was not a student at Trinity; I was at Fitzwilliam College as an undergraduate and studied natural sciences.
[00:10:28.980] – Sir Shankar Balasubramanian
And then afterwards I stayed in Cambridge to do a PhD. And my specialisation was chemistry. I’m a chemist, and I spent my PhD studying the mechanisms of natural catalysts, enzymes, and how they catalyse chemical reactions. So that was the focus of my PhD. After that, I spent two years in America doing postdoctoral work in a lab in central Pennsylvania, supervised by a chap called Stephen Benkovic. And I didn’t say, my PhD supervisor was a chap called Chris Abell, who sadly passed away last year, but was a senior colleague up until then as well. So it’s good to talk about turning points in a career. Actually, one turning point for me was that I wanted to be a footballer and that didn’t happen. That was a turning point at the age of 17. Another turning point for me, I suppose, was in 1988. I recall doing my final exams, and I thought I’d messed them up and therefore the PhD plan needed to be changed. So I almost went with a colleague to the United States to execute a business plan to set up a chain of wine bars in the US.
[00:12:17.050] – Sir Shankar Balasubramanian
And I told my parents this in 1988. I said, there’s a change of plan and I’m going to do this. And they knew my friend, and actually they were very open and supportive and said, if that’s what you want to do, that’s what you should do. Anyway, it turned out I was wrong about my exams. I’d actually done quite well and was eligible to do a PhD. So I stayed and did a PhD. Incidentally, my friend with whom I was going to set up a chain of wine bars is now the provost of a university in Boston. So he also ended up becoming a professor and an academic. So I was in the States and nearly stayed there, actually. And I remember there was a lot of pressure from my peers, who were all American, to stay there. The sort of view was, this is actually where you need to be if you want to do science. And one of my senior colleagues in Cambridge, Alan Fersht, a very famous and brilliant protein engineer, contacted me and persuaded me to come back to Cambridge, and persuaded me very much that the environment here was an environment in which you could be successful.
[00:13:51.270] – Sir Shankar Balasubramanian
So that was another turning point for me. I came back to Cambridge in 1994. I also became a Fellow of Trinity in 1994. So I’ve been there for 27 years now as a Fellow, and Gerhard, we probably overlapped at some point.
[00:14:13.700] – Gerhard Fasol
I moved to Japan in 91.
[00:14:17.010] – Sir Shankar Balasubramanian
Just missed each other.
[00:14:18.610] – Gerhard Fasol
Yeah, I was a Fellow. I did my PhD from, when was that, 78 to 81. And I got the junior fellowship, you know, the research fellowship, in 81. And then in 86 I became Teaching Fellow and Director of Studies and, at the same time, lecturer in the Cavendish. And my supervisor for the PhD was Abe Yoffe. I don’t know if you know him. He’s still alive. He’s more than 100 years old now. And I hear from my colleagues that they visit him sometimes, but I haven’t seen him for quite some time.
[00:14:57.980] – Sir Shankar Balasubramanian
Tremendous. Well, we must have overlapped a little bit, I guess, in the late 80s, early nineties.
[00:15:04.840] – Gerhard Fasol
Oh, yes, I’m sure. I’m sure we have many friends. In one of the articles I’ve seen, you write that Cambridge is a very small world, and we will have many, many common friends. We were at Trinity at the same time. We overlapped.
[00:15:21.920] – Sir Shankar Balasubramanian
Yes, we must have done. So anyway, that’s a little bit of history of how I got to Cambridge. And it’s interesting, actually: the office I’m in now used to be the office of Sir Alan Fersht, who persuaded me to come back. He was also my PhD examiner. So I have memories of being in this room, in the hot seat, being interrogated by Sir Alan and another distinguished chemist in 1991. Anyway, it’s somewhat ironic that I ended up having the same room as my office.
[00:16:07.140] – Gerhard Fasol
Is this room in Trinity, or is that in the chemistry department in Lensfield Road?
[00:16:13.910] – Sir Shankar Balasubramanian
No, Lensfield Road, exactly. It’s the same building, built in the late nineties.
[00:16:19.450] – Gerhard Fasol
You will know Steve Elliott. I think I’ve known him forever.
[00:16:25.530] – Sir Shankar Balasubramanian
He is an emeritus professor now, but he still comes into the department. Absolutely. Steve is very much still here.
[00:16:32.740] – Gerhard Fasol
When I did my PhD, he was like one desk down or so. I’ve known him since then. I think he was a couple of years ahead of me, maybe one, two, or three years ahead of me.
[00:16:44.520] – Sir Shankar Balasubramanian
Oh, yes, Steve is very much around, actually. And you might remember that, apart from being a brilliant chemist, Steve is also a brilliant expert on wine.
[00:16:57.740] – Gerhard Fasol
He’s always the chief of the wine committee and he organises his trips to the south of France. I got invitations, but I never got a chance to go.
[00:17:08.840] – Sir Shankar Balasubramanian
I very much enjoyed, as a fellow, going to some of his wine tutorials on wines from different regions. Extremely, absolutely. So I started here in 1994. I think I was 27 years old, and I think it’s fair to say I had lots of ideas, but perhaps I didn’t have one single clear direction at the beginning.
[00:17:39.230] – Gerhard Fasol
We have many now. That doesn’t seem to be a problem.
[00:17:44.390] – Sir Shankar Balasubramanian
Well, it was interesting. I think, early on, the work that led to the sequencing actually came from some of the very early work. And you might read an article I’ve written recently for the Annual Record of Trinity, which will appear in December. One of the things I wrote is that, looking back to the very beginning of my career, I could view it as being very adventurous and pioneering, or as a time when I didn’t really have a clue what I was trying to do, because I was trying many things, and it’s a matter of perspective as to which of those it was. But I do think that on many levels one has a greater sense of adventure early in one’s career in research, and over time you do become entrenched in fields and ideas and so forth. I do feel it’s hugely important to support and empower young academic researchers at the beginning of their careers, because in many disciplines that’s arguably when they do their best work. And I look at my own work, and it was something I started early in my career, when I wasn’t very clear about which way I was going.
[00:19:16.610] – Sir Shankar Balasubramanian
That’s turned out to be the most impactful contribution I’ve made.
[00:19:23.490] – Gerhard Fasol
The next-generation sequencing, right?
[00:19:27.900] – Sir Shankar Balasubramanian
Absolutely. So let me say a little bit about my area. I converged on DNA. So DNA is a unifying theme in what I study, and I think it was influenced by some of the history in Cambridge. And of course, there was a chemist who built this building, Lord Todd, who was a previous Master of Christ’s. Lord Todd was one of the key people who discovered the chemical structure of, particularly, RNA, a sister molecule to DNA, and some of the constituents of DNA. And this work came before Watson and Crick’s work on the three-dimensional folded structure of DNA, which led to the understanding of the genetic code. And there was much more afterwards. Sydney Brenner is one of the pioneers of molecular genetics, molecular biology, and one of the key people responsible for understanding that DNA information is converted to RNA information, which is then converted to the architecture of proteins, the machinery of life. And then Fred Sanger, who invented the first widely used method for reading the sequence of DNA. In fact, Fred Sanger’s house is around the corner from where I live, and there’s a blue plaque on it for everyone to see.
[00:21:20.220] – Sir Shankar Balasubramanian
So Cambridge certainly is one of the special environments for the history of building understanding of DNA. So as a young researcher, it was quite a natural thing to think about DNA, and there are a number of things that I’ve been working on over the years. I’ll say a bit about each, but one is what turned out to be a method for reading the primary sequence of DNA very quickly. Another, actually, is, in a sense, something you could see as a bit of a challenge to the Watson-Crick structure, in that Watson and Crick were responsible, along with Rosalind Franklin, who generated the X-ray data, for this two-stranded double helix model of DNA, which is the structure that DNA adopts most of the time in living systems. But it’s not the only structure. And so we’ve spent over 20 years working on a quadruple helix structure that we call a quadruplex, which I’ll say a little bit about. And then the third area: DNA, as you will know, is a linear polymer, a natural polymer, composed in general of four different types of building blocks. And we abbreviate these to the letters G, C, A and T.
[00:23:01.210] – Sir Shankar Balasubramanian
So you can certainly think of it as being a linear code with letters organised in a particular way. And a human genome has about 3.2 billion of these letters organised in 23 chromosomes in one copy. Now, some of these letters actually have other natural modified forms that constitute a different type of code, and I’ll say a little bit about that as well. So how did the efforts we started here go into next-generation sequencing? Let me spend 10 minutes or so on this and a little bit on some of the applications, but Professor Yamamoto is a greater expert than I am on what you can learn from sequencing genomes, so I would invite him to also join in at that point. So in nature, when a cell divides, it also has to make an extra copy of its DNA for the daughter cell. And this occurs through a process called DNA replication. It’s a remarkable natural process, and it uses machines called DNA polymerases. Now, I actually spent part of my time in the US working on understanding how these molecular machines work. So one of the things I was doing when I started in the UK was studying how these machines work.
[00:24:52.020] – Sir Shankar Balasubramanian
And what they do is they take one strand of DNA and they make a second strand, and it’s almost like zipping up a second strand by building the second polymer. And what Watson and Crick showed is that the bases on one strand connect with the bases on the other strand in a very precise way. The G base pairs up with a C base, and the A base pairs up with a T base. So they click together. And this is the part that you see in the middle of a double helix. And this very precise recognition is the molecular basis for the genetic code, for the formation of RNA, and also for the translation of RNA into proteins. So it’s hugely important. So a polymerase will synthesise a second strand. Now, I was doing some experiments, and they involved fluorescent labelling of DNA and using an imaging system that could image fluorescence. And during these experiments, I had need for a laser of a particular wavelength, which I didn’t have, because I’m an organic chemist and biochemist. So I was walking around the department trying to see if anyone had a laser with particular characteristics. And someone told me about a colleague, a physical chemist called David Klenerman, who had lots of instrumentation and lasers, and he had started at the same time as me.
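The pairing rule described here (G with C, A with T) can be illustrated with a minimal sketch, assuming a strand is represented as a plain string of letters. The names `PAIRING`, `complement_strand` and `reverse_complement` are hypothetical, chosen only for this illustration, and the code models the pairing rule alone, not the chemistry of a polymerase.

```python
# Watson-Crick pairing as described in the talk: G pairs with C, A pairs with T.
PAIRING = {"G": "C", "C": "G", "A": "T", "T": "A"}


def complement_strand(template: str) -> str:
    """Return the bases that pair with `template`, position by position."""
    return "".join(PAIRING[base] for base in template.upper())


def reverse_complement(template: str) -> str:
    """Return the second strand written in its own reading direction.

    The two strands of a double helix run in opposite directions, so the
    paired strand is conventionally written reversed.
    """
    return complement_strand(template)[::-1]


if __name__ == "__main__":
    print(complement_strand("GATTACA"))
    print(reverse_complement("GATTACA"))
```

A letter outside G, C, A and T raises a `KeyError`, which is a deliberate choice in this sketch so that unexpected input is not silently passed through.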
[00:26:42.960] – Sir Shankar Balasubramanian
So I spoke to him and he helped me finish those experiments. Then we continued to chat, and during that conversation we generated some ideas based around watching one of these machines synthesise DNA, but doing it at the single-molecule level. Around about that time, this was 1995, it became possible to image one molecule at room temperature by having one fluorescent beacon attached to it. This was relatively new, so we actually wrote a grant in 1995 to do this. It was very much blue-skies, open-ended research. And it was during this work that we saw the potential to actually read one strand of DNA by incorporating building blocks with this machine, where we colour-coded each building block. So each letter would have its own colour: for example, A would be blue, G would be green, C would be yellow, T would be red. If you could control that process and image it, you could actually read a sequence by seeing a series of colour changes. Now, the way we were doing these experiments was to take a small glass chip as a surface and to break DNA up into lots of fragments on the surface, and you could image all the separate fragments simultaneously.
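The idea of reading a sequence as a series of colour changes, on many fragments at once, can be sketched as follows. This is only an illustration under stated assumptions: the colour assignment is the example given in the talk (A blue, G green, C yellow, T red), each imaging cycle is assumed to be a list holding one colour per fragment on the chip, and the names `COLOUR_TO_BASE` and `decode_fragments` are hypothetical.

```python
# Example colour coding from the talk: one colour per letter.
COLOUR_TO_BASE = {"blue": "A", "green": "G", "yellow": "C", "red": "T"}


def decode_fragments(cycles):
    """Turn per-cycle colour images into one read per fragment.

    `cycles` is a list of imaging cycles. Each cycle is a list of colours,
    one per fragment, in a fixed fragment order. The read for a fragment is
    the sequence of letters matching the colours seen at its position.
    """
    if not cycles:
        return []
    fragment_count = len(cycles[0])
    return [
        "".join(COLOUR_TO_BASE[cycle[i]] for cycle in cycles)
        for i in range(fragment_count)
    ]


if __name__ == "__main__":
    # Three cycles imaged over two fragments in parallel.
    images = [
        ["blue", "red"],
        ["green", "red"],
        ["yellow", "blue"],
    ]
    print(decode_fragments(images))
```

Because every fragment is decoded from the same set of images, adding more fragments to the chip adds reads without adding cycles, which is the parallelism the passage describes.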
[00:28:29.490] – Sir Shankar Balasubramanian
So we could see a way of actually sequencing lots and lots and lots of fragments in parallel. It is rather like the advances in microchips and processors: they’ve got smaller and smaller over time, the features get smaller and smaller, the density gets higher, and the processing power gets higher through miniaturisation and parallelisation. So by analogy with that, we sort of visualised a way that could lead to sequencing at very, very large scale and speed. And we calculated it would be possible to sequence billions of letters of DNA in an experiment. So these were calculations and thoughts back in 1996, 1997. Now, the Human Genome Project was relatively early at that point in time, and they were using machines that had automated and optimised the method that Fred Sanger had created in the late 1970s. And they were sequencing at a rate of about hundreds of thousands of letters per experiment, which is very good, but we could see a way of increasing that to billions. So this was an extra sort of 10,000- to 100,000-fold enhancement. So we did proof-of-concept work, and then we decided to form a company.
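The fold enhancement quoted here follows from simple division. The sketch below assumes round figures taken from the passage (100,000 letters per experiment for the earlier machines, and one to ten billion for the proposed approach); the function name `fold_enhancement` is hypothetical.

```python
def fold_enhancement(new_letters: int, old_letters: int) -> int:
    """How many times more letters per experiment the new method reads."""
    return new_letters // old_letters


if __name__ == "__main__":
    sanger_era = 100_000  # "hundreds of thousands of letters per experiment"
    print(fold_enhancement(1_000_000_000, sanger_era))   # one billion letters
    print(fold_enhancement(10_000_000_000, sanger_era))  # ten billion letters
```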
[00:30:16.630] – Sir Shankar Balasubramanian
The reason we decided to form a company was not that we were business-minded people, not at all. We needed to find the funds to build a team that included chemists, biochemists, physical chemists, instrumentation engineers, computer scientists who could write code and design software, and also some geneticists to help us apply the technology. So we realised this effort would require a team, a very interdisciplinary team, working together. And at that time we were junior faculty members, and the grant funding system didn’t really have a way to build a team like that. So we decided to raise money from venture capital, and we were introduced to one venture capital fund. We went to them in 1997 with this very ambitious proposal that needed a lot of work and had a lot of technical risk. I don’t know if they would fund that project today, but they decided to fund it then, even though it was very risky. And so we started the company shortly afterwards and we built a team, and the team was actually here in Lensfield Road for two years. So it wasn’t really a company with its own premises; it was still in our labs and we were doing proof-of-concept work.
[00:32:11.500] – Sir Shankar Balasubramanian
And after two years, everybody felt this was going well, so we needed to raise even more money and increase the size of the team to then try and turn this into a technology that would be a product, a commercial technology. So we gradually moved everyone out onto a science park not far from Cambridge, and it was located near the Sanger Institute, our genome centre, which Professor Yamamoto has probably visited, so that they could test the technology as and when it was ready. So what happened next, over the next three or four years or so, was a lot of hard work by a very talented team of people to crack the chemistry. It involved optimising the chemistry. To control this process, the polymerase machine had to be engineered using protein engineering methods to optimise it, because we were doing something somewhat unnatural with it. We had to engineer the surface and build a chip and microfluidics systems, optimise the imaging systems, and also start thinking about how to assemble a genome. Now, what this technology does is read lots and lots of fragments: millions, in fact billions on today’s systems. And so you read short fragments and you have to construct a genome from them.
[00:34:03.150] – Sir Shankar Balasubramanian
And the way this is done is by a process called alignment, and it requires there to be a reference genome. The first reference was from the International Human Genome Project. There are now other references, and Professor Yamamoto has generated a Japanese human genome. It’s actually hugely important to have reference genomes that embrace ethnicity; otherwise you have an inappropriate reference. What happens is that, computationally, you realign all these sequence fragments to the reference, you rebuild your genome, and you can start to look at differences. Another important step in this evolution is that we had been aiming to sequence single molecules and we moved towards sequencing amplified molecules. Here we acquired a technology that came from a different lab, in fact from a company called Manteia. This was invented by Pascal Mayer and his team. Essentially, it’s a method where, if you attach a single DNA molecule to a surface, you can amplify it into hundreds of copies of identical sequence. So now when you actually do the sequencing, you get a stronger signal from many molecules, and the imaging system can be simpler and cheaper, which is important when building a production instrument.
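The alignment step described above (place each short fragment on a reference, rebuild the sequence, then look for differences) can be sketched in a deliberately naive form. The assumptions are: reads and reference are plain strings, a read is placed at the offset with the fewest mismatching letters, and a difference is reported where the most common letter across the placed reads disagrees with the reference. The names `best_position` and `call_differences` and the `max_mismatches` parameter are hypothetical, and production aligners use indexed, far more efficient methods than this exhaustive scan.

```python
from collections import Counter, defaultdict


def best_position(read, reference):
    """Slide the read along the reference; return (offset, mismatches)
    for the placement with the fewest mismatches, or None if it cannot fit."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def call_differences(reads, reference, max_mismatches=1):
    """Align reads, pile up the letters seen at each reference position,
    and report positions where the consensus letter differs from the reference."""
    pileup = defaultdict(Counter)
    for read in reads:
        hit = best_position(read, reference)
        if hit is None or hit[1] > max_mismatches:
            continue  # read does not fit or cannot be placed confidently
        offset = hit[0]
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    differences = {}
    for pos in sorted(pileup):
        consensus = pileup[pos].most_common(1)[0][0]
        if consensus != reference[pos]:
            differences[pos] = (reference[pos], consensus)
    return differences


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    # Overlapping reads from a sample that carries G instead of C at position 5.
    reads = ["ACGTAG", "TAGGTT", "GTTAGC"]
    print(call_differences(reads, reference))
```

The sketch also shows why the choice of reference matters: every difference is reported relative to it, so a reference that is a poor match for the sample produces many spurious differences and discarded reads.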
[00:35:55.410] – Sir Shankar Balasubramanian
So the first genome was sequenced in 2005, and it was a very small bacteriophage. And in 2006, by this stage, the company was probably about 100 people. It had a commercial unit, and we were thinking about manufacturing and scaling up. So the very first system was shipped in 2006, and it was called the Genome Analyzer. It was called the 1G Genome Analyzer because it could sequence a gigabase of DNA, a billion bases of DNA, in one experiment. So it had realised the aspirations we set in 1997 as a billion-base sequencer, and it was shipped to genome centres around the world. And shortly after that came Illumina, an American company. They had done some very nice work with what were called DNA microarrays, which were used in the past for looking at differences between genomes, and they realised that sequencing was probably going to supersede that approach of analysing genomes. So they acquired the company and invested more into improving and developing the technology. And today their machines will sequence literally trillions of bases per experiment. So this is more than a million times higher than it was when we started the project. And I think, going at full pelt, these instruments will generate effectively one human genome an hour per instrument.
[00:38:03.210] – Sir Shankar Balasubramanian
So our aspiration early on was to find a way to enable human genome sequencing at scale, at population scale, and this is happening today. So in terms of how does this help? Well, I should say sequencing has become relatively routine in basic research on living systems, not just humans, but other organisms as well. And that’s because every living entity and also pathogens, have a genome that’s made up of either DNA or RNA. And so you can get this fundamental information by sequencing. So it is core to all of the workings of biology. But I suppose the real driving force for the Human Genome Project and the field of genomics has been to make a difference to human health, to contribute to how we think about medicine. And I thought I’d just briefly touch on three medical areas where rapid sequencing at lower cost is making a difference. So the first area I’ll talk about is cancer. Now, cancer is caused by changes to your DNA, so that is the cause of cancers. Therefore, it’s rational to try and build understanding of cancers by looking at the sequence and the sequence changes. And the reason to do this is actually today there are a number of cancer therapies that are prescribed on the basis of genetic changes that are particular to that cancer.
[00:40:25.370] – Sir Shankar Balasubramanian
And often there are changes that affect genes that code for proteins, changes that can either make that protein overactive or make that protein inactive. And either of those is a functional change that can alter the biochemistry of the cell. And so there are examples where a particular gene acquires a mutation that makes it overactive to the point where it’s driving uncontrolled proliferation of the cell. So a drug or inhibitor that blocks that overactive protein can precisely address the cause that’s driving the cancer. So this sort of information can be very useful. The thing about cancers is, of course, we all have genomes that are unique. This means no two cancers are the same. Every cancer-causing pattern of mutations occurs against a background genome that’s unique; therefore, the cancer is unique. So to address that, many brilliant scientists have been sequencing cancer genomes to build understanding, particularly about common patterns of mutations that may provide clues for classifying the mechanisms particular to that cancer type in order to address it therapeutically. And I would say there’s great progress being made in building understanding. And there are many anecdotal examples where sequencing a tumour in real time has enabled the clinician to make a decision about treatment.
[00:42:41.730] – Sir Shankar Balasubramanian
But what I would say is we’re still in the relatively early phases of this. And I think over the next 10 to 20 years, evaluating what all of this information is telling us will help us see just how well we can address cancer therapies. I should say the information is also providing guidance as to what future therapies may be developed and what the targets for future therapies are. So there’s immense value in knowledge building. Now, the second area I’ll point out is rare diseases. Firstly, rare diseases are so called because they’re not common like other diseases. Very few cases can be detected globally. Many of them don’t have names, it’s not known what the cause is, and they’re therefore very difficult to diagnose and near impossible to treat. Now, the fact is one in 17 people have a rare disease, because there are lots of different rare diseases. If you collect them together, you could actually classify them collectively as being relatively common. And most rare diseases are genetic in origin, and they manifest very early in life as a paediatric developmental disorder. So this is what some pioneering researchers started doing when whole genome sequencing became possible, and I’ll name one prominent pioneer, Stephen Kingsmore.
[00:44:54.350] – Sir Shankar Balasubramanian
He’s Irish, but based in the US, and runs a paediatric hospital, and it’s a concept called sequencing a trio. And what happens is mum and dad walk into a paediatric clinic with a very young child having a disorder that’s very difficult to diagnose. They run many tests and often the tests rule out possibilities but don’t inform what is wrong. And so they sequence mum, dad and child. And the child’s genome is inherited from a combination of mum and dad, of course. And what they can do is very quickly evaluate genetic changes that are unique to the child and then mine that information to see if it provides clues as to what the disorder may be. Now, Kingsmore is in the Guinness Book of Records for doing this very speedily, and I think the very latest is something like 16 hours from taking a blood sample to sequencing to interpreting the sequence information and making a clinical diagnosis. This was published, I think, in the New England Journal. Now, in this case, it was a mutation in a protein involved in shuttling vitamins in and out. It was a nutritional disorder which could be corrected by diet. So there are now many anecdotal cases of rare diseases where a diagnosis is made.
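The trio idea described above can be sketched in a few lines of Python. This is only a toy illustration, not the clinical pipeline used by Kingsmore or anyone else: the variant representation (chromosome, position, reference base, alternate base), the function name `child_only_variants` and the example data are all assumptions made for the sketch. Real analyses also have to handle sequencing errors, coverage, and variants inherited from both parents.

```python
from typing import List, Set, Tuple

# Hypothetical variant representation: (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def child_only_variants(child: Set[Variant],
                        mother: Set[Variant],
                        father: Set[Variant]) -> List[Variant]:
    """Return variants seen in the child but in neither parent, sorted for stable output."""
    return sorted(child - mother - father)


# Made-up example data, purely for illustration.
mother = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
father = {("chr1", 2000, "G", "A"), ("chr2", 500, "C", "T")}
child = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr7", 4242, "T", "C")}

# Candidate changes unique to the child, to be mined for clues about the disorder.
candidates = child_only_variants(child, mother, father)
print(candidates)
```

The set difference is the whole point of sequencing the parents as well: it shrinks the millions of variants in any one genome to a short list that is unique to the child.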
[00:46:51.250] – Sir Shankar Balasubramanian
This is the cause of the problem from genome sequencing, and in some of those cases it’s treatable, in other cases it’s not yet treatable, but knowing the cause is the first step towards developing a treatment. So I think this is a hugely important area and also, I would say, an area that historically has been neglected by pharma companies because it’s just very difficult and there aren’t many people. So now it’s being tackled using this approach. Now, the third area is infectious diseases and pathogens. Very topical, of course. Now, in January 2020, this is when, in China, they sequenced the pathogen that was the causative agent for what we now call COVID-19, and they actually sequenced it from a sample. This is one approach people use to sequence pathogens: you can take a sample that has lots and lots of microorganisms, human cells, et cetera, and you isolate all the DNA and you sequence all of it. And then using a process termed metagenomics, you can computationally sift through that and assemble the genome of the pathogen. And that’s how they sequenced SARS-CoV-2 and actually recognised it as the causative agent.
[00:48:47.610] – Sir Shankar Balasubramanian
And of course, today, and there’s a lot of this going on in many countries, certainly in Japan, the UK, but many countries, a proportion of people who test positive for COVID-19 are having the pathogen sequenced. It’s being done at scale. There’s actually a website that logs all the genomes that have been sequenced and it’s over 4 million now from hundreds of countries around the world. Now, this is providing a way to track the emergence of variants. And it’s the variants, of course, that are prolonging this pandemic, and at the moment, it’s the Delta variant, which has increased transmissibility. So sequencing pathogens at scale across the world and then pooling the data and sharing the information is providing a very graphic and detailed readout of what’s going on. What are the emerging variants? What are the variants that are actually dominating? Where are they geographically? So I think this is helpful as a readout as to what’s going on. It is, of course, also providing information that’s helpful for the design of vaccines, and I take this opportunity to really pay tribute to everyone who’s been involved in the foundational science and development and evolution of vaccines. And, of course, the vaccines are evolving, and they need to evolve as the virus evolves.
[00:50:42.040] – Sir Shankar Balasubramanian
And so genome sequencing is helping provide the information to guide the next generation of vaccines and boosters and so forth. Okay, I’m going to just briefly say a little bit about the other two areas, and then I will stop after that. I’m going to talk about modified bases, actually, because the double helix has this Watson-Crick code in the middle. That is the genetic code. It’s the primary source of information. But there are two grooves in the double helix, and the wide one is called the major groove. And when DNA is a double helix, machines in living systems called proteins actually read a code that rests largely in the major groove. It’s another dimension of information, if you like, and there are chemical changes that occur naturally. Some people call these epigenetic changes. They call it epigenetic because the primary genetic code is constant. It’s fixed. This is a dynamic code, and it can alter the chemistry that projects into the major groove. And so it can influence the way DNA is being interpreted or read by machinery. You can think of it as the genome tells you about the genes that encode for proteins.
[00:52:28.650] – Sir Shankar Balasubramanian
These are the components, and this is a sort of hardware of living systems. But this dynamic, chemically changing information layer that we call epigenetics is like software. And you can have the same hardware, but if you run different software, you end up with something completely different. And this is why, when a single cell with one genome differentiates into all the different cell types that make up an organism, the genomes are identical. The hardware is the same, but depending on whether it’s skin or brain or heart or lung, the software is different. So this is the epigenome. And in human DNA, besides G, C, T and A, there are at least four chemically dynamic versions of the letter C. So we’re studying these additional dynamic letters to understand what they’re doing. And part of this has involved developing chemistry to sequence these additional letters. In fact, I started a company called Cambridge Epigenetix a few years ago, and they have a product coming out soon that will sequence not four letters, but six letters of DNA. Because I think it’s going to be useful when we sequence genomes and sequence other organisms to get the genetic information, but also access what the software is doing at the same time.
[00:54:33.990] – Sir Shankar Balasubramanian
Now, the most common one of these variants is called methyl-C. It just has a chemical group called a methyl group sticking out in the major groove. And there are already applications of this in cancer testing. And there’s a remarkable phenomenon whereby, well, firstly, cells in a developing foetus, in a pregnant woman, a lot of the placental DNA has DNA from the foetus and it leaks out into the mother’s blood. So Dennis Lo from Hong Kong discovered this many years ago and it’s the basis for non-invasive prenatal testing. You draw blood from the mother, you sequence the DNA, you get information about the foetus without having to do an invasive test that has risks for pregnancy. Well, it turns out that cancer cells, they die, they necrose and they burst and they spill their contents, and part of their contents is their DNA. And now you have DNA that doesn’t have a cell around it and it fragments and it floats around in the blood. Eventually it gets destroyed. But it turns out that you can draw blood. And if you have the capacity to sequence all the DNA in the blood very deeply in someone who has cancer, you can pick up patterns of mutations and epigenetic patterns that indicate that there is a tumour just from a blood test.
[00:56:38.350] – Sir Shankar Balasubramanian
And the epigenetic patterns can also tell you which tissue the DNA originated from. So now there are a number of mainly companies developing such tests. And in fact, there is a trial in the UK, in the NHS, being conducted in association with a company called Grail to test 140,000 asymptomatic people with a blood test to see if they can detect cancer. The aspiration is to detect much earlier than today’s methods detect, because actually, with all diseases, earlier detection is much more likely to lead to prevention and disease management. So this whole area, I think, is fundamentally important and interesting and has much more mileage in terms of future discoveries. The third area I work on is four-stranded structures called G-quadruplexes. Very briefly, what I’ll say is it was discovered many years ago that there are stretches of DNA or RNA that have lots of the G base arranged in a particular pattern. If you put them into water in a test tube with a little bit of salt, they spontaneously fold up into a structure that has four strands, not two, rather like a knot. Because this letter G has an unusual capacity. Watson-Crick base pairing has G recognising C and T recognising A.
[00:58:42.550] – Sir Shankar Balasubramanian
But if you have G’s in a certain arrangement, you can have four G’s that recognise each other to form a sort of a square arrangement called a G-tetrad. And if you have lots of G’s, these tetrads can stack to form a well-defined quadruple helix structure. So for many years, they were thought to be a very curious structural feature in a test tube. But what we and others in the field as well have shown is these structures can actually form in human DNA, in cells and in tissues. And we found that they form in regions of the genome that control genes. So it’s emerging that they may represent a previously unknown way of controlling genes through these structures. We don’t yet know exactly how or why, but we’ve made observations that suggest that. Now, one of the potentially useful directions that we may take this is we found that many of the early anticancer drugs, which are also toxic, some of which are still used today, they basically work by attacking the DNA. And because cancer cells divide more rapidly than non-cancer cells, it has a particularly strong effect on cancer cells. But there are toxic side effects.
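The "particular pattern" of G’s mentioned above is not spelled out in the talk. A minimal Python sketch, assuming one commonly used heuristic (four runs of at least three G’s separated by loops of one to seven bases), shows how candidate sequences can be found by text search. The pattern, the names `G4_PATTERN` and `find_g4_candidates`, and the loop lengths are assumptions for illustration, and a match only suggests that a sequence might fold into a quadruplex.

```python
import re

# Assumed heuristic motif: G{3+} loop(1-7) G{3+} loop(1-7) G{3+} loop(1-7) G{3+}.
# Four G-runs are needed because each tetrad takes one G from each of four runs.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_g4_candidates(sequence: str):
    """Return (start index, matched text) for each non-overlapping motif match."""
    sequence = sequence.upper()
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(sequence)]


# The human telomere repeat TTAGGG, repeated four times, satisfies the motif.
telomere = "TTAGGG" * 4
hits = find_g4_candidates(telomere)
print(hits)
```

A search like this only scans one strand; a fuller version would also scan the reverse complement, where the same structure appears as runs of C.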
[01:00:31.530] – Sir Shankar Balasubramanian
Some of us in the field have shown that if you target these four-stranded structures with small drug-like molecules, they have been shown to be quite interesting when you test them on cancer cells and cancer cell models. And so there’s some potential here for really quite a radically new way of thinking about drugging tumours. There has been one clinical trial in Canada, actually, with a molecule that acts in this way, and this was against breast cancers with a particular genetic mutation. We also found that if you target these structures in cancer cells that have particular mutations that are common to some cancers, they work extremely well. So still more work to be done on these four-stranded structures. But that’s where we are. I thought I would just end with some long-range thoughts on the future of where DNA-based research is going, because we are going through a transformation in terms of understanding DNA and manipulating DNA. We really are. I think of this in sort of three layers. So, one layer is DNA is an information molecule. And there are three things you need to do with information. You need to be able to write it, you need to be able to read it, and if possible, you need to be able to manipulate it.
[01:02:32.930] – Sir Shankar Balasubramanian
So, I’ve been talking a lot about reading DNA, and you read DNA by sequencing, and we can now do it 10 million times faster and cheaper than we used to be able to. So that’s been useful, I should say. This technology and indeed other competing technologies are going to make it faster and cheaper. This is going to continue. But at the moment, reading is not the bottleneck. Writing is to do with how we synthesise DNA. There’s a huge amount of effort going into improving ways of making DNA, making extremely long DNA indeed. Can we make genes and genomes synthetically? So it’s lagging somewhat behind reading. Reading is cheaper and faster than writing still. But the writing is going to catch up. So there’ll be a time when we can write the code quite rapidly and at scale. Now, one of the corollaries of reading and writing that’s non-biological is DNA is actually, in terms of volume, the most volume-efficient way of storing data. People have demonstrated that you can store books, films, any information that can be digitised can be stored in a DNA code. There is one view that for long-term storage of information, why don’t we store it in DNA?
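To make the point about storing digitised information in a DNA code concrete, here is a toy Python sketch. It assumes the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T); the mapping and the names `encode` and `decode` are assumptions for illustration, not a scheme described in the talk. Practical schemes are more elaborate, for example adding error correction and avoiding long runs of the same base.

```python
BASES = "ACGT"  # assumed mapping: index 0..3 corresponds to bit pairs 00, 01, 10, 11


def encode(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(dna: str) -> bytes:
    """Read the bases back into bytes, the inverse of encode."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of four bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"DNA"
strand = encode(message)     # "writing" corresponds to synthesising this strand
recovered = decode(strand)   # "reading" corresponds to sequencing it back
print(strand, recovered)
```

In this picture, `encode` stands in for synthesis and `decode` for sequencing, which is why the practicality of DNA storage depends on both writing and reading being fast and cheap.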
[01:04:42.520] – Sir Shankar Balasubramanian
There’s even an estimate that in a box, a cardboard box type volume, you could store the information of everything that’s been recorded in history, including the immense amount that’s been recorded in the last few years. And you don’t require any energy to store it, you just retrieve it by sequencing when you need it. So that’s one view. For that to be practical, writing and reading need to be matched in terms of speed and cost. Now, the third area, many people would call it genome editing, and a number of methods have improved the capacity to edit genomes, and the system using CRISPR and Cas9, which is now quite famous in science, is certainly a breakthrough in that field. So it is now possible to go into genomes and make very controlled changes. And this is, of course, being piloted for useful applications. So these three areas are undergoing huge transformative changes technologically. And so I think in the future we will see transformation in terms of the application of combinations of these areas to how we practise medicine. I think it will also affect biotechnology in agriculture, for example. It may even help provide some input towards the challenges we face with our ecosystem as the climate evolves quicker than we know how to manage adaptation.
[01:07:01.540] – Sir Shankar Balasubramanian
And also, there may be some scope for DNA to play a role in more generally how we record, manipulate and interpret information. So that’s the future. I’ve gone over the time I said I would, Gerhard, so I will stop there.
[01:07:23.100] – Gerhard Fasol
Thank you so very much. This was a fantastic overview of such important work you have done. And I forgot to mention at the beginning that I can’t count the prizes which you have won recently, and I’m sure there are many more prizes to come which recognise your work. Now, I had two questions. One you have already answered, which is the storage and rewriting of information. The second question I have, which is a puzzle I’ve never understood, is this: it’s often said that the purpose of 98% of the genetic material, of the DNA, is not understood. Is that correct, or what is this not-understood genetic material doing?
[01:08:23.900] – Sir Shankar Balasubramanian
Very good question. First, let’s break down what makes up a genome. So a gene is a unit of the genome that comprises coding information that directly codes for the amino acid building blocks that make up a protein. So these are protein-coding genes. And this is a sort of classical view of genetics that DNA encodes for genes and the genes encode for proteins. And the proteins are the building blocks of biological systems and also the catalysts and the machinery that make things happen. That explains about one and a half percent of the genome. Which raises the question, what about the other 98%? And there’s an evolution of ideas, and I would also say not everyone is in agreement yet about the other 98%. So an early view is that this part of the genome that does not code for proteins, which some people call non-coding DNA, is just junk that’s accumulated over the course of evolution. Actually it would be more efficient if we didn’t have that junk. And an organism that Sydney Brenner used to work on, a nematode worm called C. elegans, has a similar number of genes to a human genome. But the genome is tiny, much more efficient.
[01:10:31.800] – Sir Shankar Balasubramanian
And so one view was actually you can operate with a small genome but actually human beings are not worms. We’re more sophisticated than worms. So there is actually evidence. Now DNA gets converted to protein information by an intermediate stage where the DNA is transcribed into RNA which also has four letters. And ultimately it’s that RNA that uses a machine called a ribosome to make proteins.
[01:11:12.500] – Gerhard Fasol
We had Venki previously at one of our discussions here.
[01:11:18.030] – Sir Shankar Balasubramanian
So you will have heard all about the ribosome in great detail from Venki. Now, it turns out that of this remaining 98% most of it is active in the sense that it gets converted to RNA. So it is not dormant. There are things going on. And some of these pieces of RNA, people now call them long non-coding RNA. They have been shown to have function. They recruit proteins, machines, and they locate themselves to parts of the genome and they control genes. So in the human genome as it’s evolved today, this 98% is not expendable. You delete it at your peril even though you don’t know what it’s doing. Because there is evidence that some of it is involved in doing some finer regulation, some control. We don’t fully understand that, but it is doing something. I think one should regard this non-coding part of the genome, or as some people call it, the dark matter of the genome, in this way: just because we don’t understand yet what it’s doing doesn’t mean it’s unimportant. And certainly if you delete parts of that genome you can change the characteristics of the cell and the organism and you can have severe defects in development.
[01:13:12.620] – Sir Shankar Balasubramanian
So I would argue that we should hold fire on judging whether it’s a waste or not. The other aspect, I suppose more philosophical is evolution is not intelligent design. Things happen and unless they bring a disadvantage to the survival and reproduction of the organism, they stick around. And if they stick around, nature often finds some use for it and remodels around it. So in that sense, our genomes are the product of many, many years of changes that have been inherited through evolution. And eventually a lot of these bits find a use and the whole system sort of remodels around these new entities and it starts using them. If you then remove it, there can be a consequence. I’m not a biologist and I personally struggle with the complexity of biology and cannot make logical sense of most of it. But what I accept is that it’s fuzzy and it’s complex and just because you don’t understand things, that doesn’t mean they’re not doing anything, particularly if it’s in DNA and it’s been there a while, it’s probably important, even though we don’t understand why yet.
[01:15:11.840] – Gerhard Fasol
Thank you for the very detailed answer. Now, what I was thinking is next, Professor Yamamoto, if you want to have questions or discussion or comments, I think that would be wonderful. What do you think, Shankar?
[01:15:29.560] – Professor Yamamoto
First of all, thank you very much, Professor Balasubramanian. Professor, it’s a great talk and a great story, and the four of us here listening all admire it. Really a great talk. Thank you very much. I have two questions, or two points of discussion, so let me ask. All right, the first one is, as you say, for the clinical purpose of sequencing, cancer, rare and intractable diseases, and infection, those three are important. But we think one more target of general sequencing is equally important: personal health care and risk assessment. So we really need to work on risk assessment for the general population. And for this purpose we created a general population cohort and biobank in Japan. We have recruited 150,000 people from the general population, and we are planning to sequence 100,000 of these people. And those people were not recruited in the hospital but from the general population, so they are healthy people. But sequencing these healthy people will give us a great deal of information. One is drug development targets, which we usually call the loss-of-function mutation, the suppressor mutation.
[01:17:46.010] – Professor Yamamoto
And this is important. But we can also create a very strong basis for risk assessment, disease risk assessment, especially for polygenic and lifestyle-associated diseases, or the common diseases. We really need whole genome sequence information and microarray technology for the risk assessment. So we’ve been working on this. Let me propose a fourth important point, in addition to your three critically important sequencing targets: sequencing of the general population. This is the first point, and we are determining 100,000 Japanese sequences right now using Illumina technology. So Illumina, the NovaSeq sequencer, and we are almost halfway through the project. Compared to the UK genomes or the All of Us genomes in the US, our size is relatively tiny, but we have a strong cohort background. So our samples may be good ones. Let me tell you one more thing. The second point is your six-letter sequencing. That’s great and interesting. And before your six-letter sequencing, let me introduce my colleagues here. So here is our Professor Fumiki Katsuoka, and he’s designing and helping, I mean organising, the 100,000 whole genome sequencing project of Japan.
[01:19:52.440] – Professor Yamamoto
And this is Professor K Kotaguchi, who has been working on RNA sequencing analysis of peripheral blood for phenome analysis in the hospital. So for the phenome centre this is very important, and this is a pioneer. And here is Akihito Otsuki. He’s been working on Oxford Nanopore sequencing. And we have a special design of long-read nanopore sequencing, which I’m going to tell you about. Okay, so your six-letter sequencing is very interesting and important, especially for the use of cord blood. And also I’ve been thinking cord blood is very interesting for your six-letter sequencing. And for that purpose we’ve been very interested in epigenetics. So we created the Birth and Three-Generation Cohort, for which we recruited pregnant mothers. We’ve recruited 23,000 pregnant mothers, and then the babies came, of course, and almost half of the husbands or partners, and the grandmas and grandpas. So this Birth and Three-Generation Cohort is very powerful and influential. Indeed, in the UK, the MRC, the Medical Research Council, attempted this, but they couldn’t do it, so they failed. And in the USA, the NIH also tried a birth and three-generation cohort, which didn’t work.
[01:21:49.950] – Professor Yamamoto
So we are the only large-scale birth and three-generation cohort in the world, and we’ve collected the cord blood. So we have the cord blood, and we’ve been following up the babies after birth, at five years old, and now they are around seven to eight years old. So the epigenetic modification will occur after birth, but we have the cord blood. So that is before the modification, and then afterward, and genome sequencing will tell how powerful your six-letter sequencing will be for the three-generation cohort, really. My second question is, what do you think about the use of our cohort data for your six-letter sequencing? That’s my question. But let me finish up by introducing Akihito’s project. So we have the Birth and Three-Generation Cohort, which has grandma, grandpa and father, or grandma, grandpa and mother, as trios, and we can use them for Mendelian inheritance verification of alleles. So using the trios, we sequenced 111 trios by Oxford Nanopore long-read sequencing, which gives us information on structural variants, and we are now creating a structural variant reference panel, and it gives us ample, fruitful information. And we just finished writing a paper, and we will publish it at a certain timing.
[01:24:00.060] – Professor Yamamoto
Structural variants are very interesting, the ethnic differences in structural variants and so on. So that is long-read sequencing, and you’ve developed this type of next-generation sequencing, and we’ve been faithful students of your invention, but those are very interesting. So my first question is, what do you think about risk assessment, especially the polygenic risk score? That’s my first question. The second question is the use of six-letter sequencing for our Birth and Three-Generation Cohort, especially from cord blood through the child’s development. And that’s my second question.
[01:24:52.860] – Sir Shankar Balasubramanian
Thank you. Very interesting and important points and projects. So I think on the first one, there are a number of biobank projects around the world and I should say the technology is there to do anything you want. So now, actually, the value has shifted to the samples and the characterisation of the samples. I think this is now the key. You can sequence anything. So having collections that are well characterised with longitudinal data, follow-up data, all curated accurately, I think is hugely important for these. And these are long-term projects, because you have to literally follow the history of individuals from birth all the way through to the end of their life, tracking their medical history and seeing what we can learn about the information that was there in the DNA at the beginning. Also the history of the environment that people are exposed to, including medicines, but not only medicines. So I think to understand the genetic basis of who we are, and this includes all types of predispositions, is arguably the ultimate fundamental question. The only way to address this is first, it has to be at scale. Of course, you can’t do this on small cohorts, it needs to be powered with scale.
[01:27:00.400] – Sir Shankar Balasubramanian
And these projects also need to be set up in a way that’s guaranteed to have continuity on a time scale of 50 to 100 years, I would say. So there has to be a commitment to the project to see it through. And I think that there aren’t too many projects in the world that probably have all of these parameters where they need to be. So my sort of short answer is that this is a hugely important long-term project. And I think part of the challenges are the collection of people that you choose at the beginning, collecting continued information about all of these people through their entire lives, and then bringing together all of that information with other molecular measurable information, such as genome sequence. And of course, there are many other omics that one needs to bring together, and then one will have a very complex data set, and I think, for the AI and machine learning people, this problem is made for those developments. So I’m very excited about this project, even though I may not be alive when all the interesting outcomes are there for us to learn from, because it’s long term.
[01:28:42.500] – Professor Yamamoto
Professor, it’s a very important three-generation cohort. The baby will become thirty, then the father and mother will become 60, then grandma and grandpa will become 90. So we can cover one lifespan in 30 years.
[01:29:09.420] – Sir Shankar Balasubramanian
That’s a key approach, that’s an excellent approach. And then you can of course, over the subsequent 30 years, test your predictions, yeah, on the other generations. It’s a very clever design and I look forward to learning about what comes out of this study. We should follow up, and next time I’m in Japan I would love to come visit your institute with the six-letter thing and the cord blood samples. Absolutely. I think you should measure epigenetics in these cases.
[01:30:04.400] – Professor Yamamoto
Thank you very much, professor.
[01:30:07.440] – Sir Shankar Balasubramanian
[01:30:08.220] – Gerhard Fasol
Fantastic. I think at the beginning, Professor Yamamoto, you asked one more question, which you didn’t ask now, which is, for this project, how to make it cheaper, which I think, Shankar, you already addressed also by explaining how you really follow Moore’s curve, the semiconductor curve.
[01:30:31.720] – Professor Yamamoto
Professor Fasol, my first question was already answered by Professor Balasubramanian. Can I call you Shankar? Please call me Massey.
[01:30:55.680] – Sir Shankar Balasubramanian
[01:30:56.820] – Gerhard Fasol
Okay. Shankar, we have already taken much longer than you had initially planned for us. So this was absolutely fantastic. And now I understand very well why many committees decide to award you these very important prizes. And I hope you will get many more. And especially also we have seen now with Yamamoto-sensei how important the societal impact of your work is on our human life.
[01:31:34.940] – Sir Shankar Balasubramanian
Well, thank you very much and thank you for your questions and discussion. It’s been a pleasure. Look forward to visiting Japan sometime soon.
[01:31:47.840] – Gerhard Fasol
Yes. I hope you join us also for one of our Trinity events and we can make a bigger event for you.
[01:31:56.240] – Sir Shankar Balasubramanian
That would be wonderful. I will keep in touch with you as my travel plans evolve.
[01:32:03.120] – Gerhard Fasol
Of course. Thank you so very much. Thank you.
[01:32:06.340] – Sir Shankar Balasubramanian
Thank you. Take care.
[01:32:07.820] – Gerhard Fasol
[01:32:08.470] – Sir Shankar Balasubramanian
Bye bye bye.
[01:32:12.600] – Professor Yamamoto
[01:32:13.480] – Martin Morris