Discovery of a major genetic risk factor for stroke in South Asians
Exome sequencing of 75k Pakistanis identifies a NOTCH3 genetic variant, accounting for 1-2% of strokes in South Asians
For today's post I want to revisit an older story: a study by my colleagues at Regeneron. I wrote about it on Twitter last year when the work was preprinted. I also mentioned it briefly on Substack in the 2023 roundup and discussed it on The Genetics Podcast. But now that it is officially published, I thought I should write a short Substack post about it.
I like this work for many reasons. It hits many of my favorite themes:
Non-European populations-based discoveries
Genetic convergence between Mendelian and common diseases
Therapeutic implications of non-European genetic discoveries
The discovery
My colleagues (Rodriguez-Flores et al.) did an exome-wide association study of stroke in around 75,000 individuals of South Asian ancestry from the Pakistan Genomics Resource (PGR) and uncovered an important genetic risk factor for stroke among South Asians. A missense variant in NOTCH3, p.Arg231Cys, which is about 30-fold enriched among South Asians (MAF=0.58%) compared to Europeans (MAF=0.019%), was found to increase the risk of stroke more than 3-fold.
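To make the allele-frequency numbers concrete, here is a minimal sketch of the arithmetic behind the "30 times enriched" figure. The two MAF values come from the paragraph above; the function names are my own, and the carrier-frequency step assumes Hardy-Weinberg equilibrium, which is my assumption and not something taken from the paper.

```python
# Minor allele frequencies quoted in the post
MAF_SOUTH_ASIAN = 0.0058    # 0.58%
MAF_EUROPEAN = 0.00019      # 0.019%


def fold_enrichment(maf_a: float, maf_b: float) -> float:
    """Ratio of allele frequencies between two populations."""
    return maf_a / maf_b


def carrier_frequency(maf: float) -> float:
    """Fraction of people carrying at least one copy of the allele.

    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch).
    """
    return 1 - (1 - maf) ** 2


enrichment = fold_enrichment(MAF_SOUTH_ASIAN, MAF_EUROPEAN)
carriers = carrier_frequency(MAF_SOUTH_ASIAN)

print(f"Fold enrichment: {enrichment:.1f}x")
print(f"Carrier frequency: {carriers:.2%} (about 1 in {1 / carriers:.0f})")
```

Under these assumptions the enrichment comes out at about 30.5-fold, and roughly 1 in 86 South Asians would carry at least one copy of the variant.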
CADASIL
NOTCH3 is a known Mendelian gene for stroke. Pathogenic missense mutations in NOTCH3 cause an autosomal dominant stroke syndrome called CADASIL (Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy), discovered in the 1970s. The condition is characterized by early-onset recurrent strokes that gradually destroy the blood vessels in the brain, resulting in motor and sensory deficits, dementia and death, typically before the age of 60 years. It was through the genetic mapping of CADASIL that NOTCH3 was first cloned in humans.
In addition to its history, the genetics of CADASIL is also fascinating. NOTCH3 is a huge gene with 33 exons that code for 2321 amino acids. It encodes a transmembrane receptor expressed in the vascular smooth muscle cells. The receptor needs to be cleaved to release its intracellular domain from the plasma membrane, which then swims through the cytoplasm into the nucleus to activate the transcription of its target genes. The most important part of the protein, however, is its extracellular domain, which interacts with the ligand and initiates the NOTCH3 signaling cascade. That cascade involves cleavage by multiple enzymes (including gamma secretase, the same enzyme that cleaves amyloid precursor protein to produce amyloid beta).
The extracellular portion of NOTCH3 is unusual: it consists of 34 epidermal growth factor-like (EGF-like) repeats, each containing six cysteine residues. So far, almost all CADASIL mutations (including p.Arg231Cys, now found in South Asians) lie within the extracellular domain and either add or remove a cysteine, both of which disrupt NOTCH3 function. Any change in the cysteine residues in the extracellular domain triggers protein misfolding, resulting in aggregation, which ultimately damages the vessel walls through infarction and inflammation.
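The "add or remove a cysteine" rule is simple enough to express in a few lines. The sketch below parses a missense variant written in three-letter HGVS protein notation and reports whether it gains or loses a cysteine. The function name and regular expression are my own, and only p.Arg231Cys comes from this post; the other two variant strings are made-up inputs for illustration. The check also ignores whether the position actually falls inside the extracellular domain, which a real classifier would need to verify.

```python
import re

# Matches simple missense variants in three-letter HGVS protein notation,
# e.g. "p.Arg231Cys". Anything more complex is rejected by this sketch.
HGVS_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def cysteine_effect(hgvs_p: str) -> str:
    """Return 'gain', 'loss' or 'neutral' for a missense variant."""
    match = HGVS_MISSENSE.match(hgvs_p)
    if match is None:
        raise ValueError(f"Not a simple missense variant: {hgvs_p!r}")
    ref, _position, alt = match.groups()
    if alt == "Cys" and ref != "Cys":
        return "gain"   # a new cysteine is introduced
    if ref == "Cys" and alt != "Cys":
        return "loss"   # an existing cysteine is removed
    return "neutral"    # cysteine count is unchanged


# p.Arg231Cys is the variant from the post; the others are hypothetical.
for variant in ["p.Arg231Cys", "p.Cys100Tyr", "p.Arg231His"]:
    print(variant, "->", cysteine_effect(variant))
```

By this rule p.Arg231Cys is a cysteine-gaining change, which fits the pattern described above for CADASIL mutations.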
Convergence between Mendelian and common disease
The most interesting aspect of the current finding is that it signals a convergence between a rare Mendelian disease affecting a few thousand people and a common disease affecting more than 100 million individuals around the world.
One of the first things we do when reviewing GWAS results is search for genes that are already known to be involved in the disease, particularly the ones that cause Mendelian diseases. For example, when you run a GWAS of LDL cholesterol, you'll see tall towers rising near APOB, LDLR, PCSK9 and many other genes, all of which are known to be mutated in familial hypercholesterolemia. But such convergences rarely occur for brain-related conditions.
The most recent GWAS of stroke involved more than a million participants, and yet there was no signal near NOTCH3. The current work involved only 75k individuals and found only one signal, which was near NOTCH3. That is one of the benefits of studying diverse populations: what cannot be found in more than a million individuals in one population can be found easily in a few thousand in another.
Why is it important to find a convergence between rare, low-frequency and common variants? Risk variants with different allele frequencies will often differ in their penetrance, which gives us an opportunity to study the phenotypic consequences of perturbing the gene at severe, moderate and mild levels. And this knowledge of how the degree of gene disruption relates to phenotypic change can be crucial for drug development.
Therapeutic relevance
The convergence between Mendelian and common disease also has therapeutic implications. CADASIL is a rare disease, affecting around 2 to 4 per 100,000 individuals. One of the challenges of developing drugs for rare diseases is the small target population. The new finding potentially expands the target population for a drug that targets NOTCH3. Stroke is a common disease, particularly in South Asian populations. The current work projects that the p.Arg231Cys variant alone can explain 1-2% of stroke cases in Pakistan and possibly in other South Asian countries, which would translate into hundreds of thousands of cases. So there is now a good reason for companies to work on drugs targeting NOTCH3.
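For a rough sense of where a figure like 1-2% can come from, here is a back-of-the-envelope sketch using Levin's population attributable fraction formula, PAF = p(RR - 1) / (1 + p(RR - 1)). This is my own illustration, not the method used in the paper. It assumes Hardy-Weinberg equilibrium to turn the allele frequency into a carrier frequency, and it treats "more than 3-fold" as a relative risk of exactly 3.

```python
def carrier_frequency(maf: float) -> float:
    """Fraction of people with at least one copy (assumes Hardy-Weinberg)."""
    return 1 - (1 - maf) ** 2


def levin_paf(exposure_prevalence: float, relative_risk: float) -> float:
    """Levin's population attributable fraction."""
    excess = exposure_prevalence * (relative_risk - 1)
    return excess / (1 + excess)


MAF_SOUTH_ASIAN = 0.0058     # 0.58%, from the post
ASSUMED_RELATIVE_RISK = 3.0  # "more than 3-fold" in the post; 3.0 is my placeholder

carriers = carrier_frequency(MAF_SOUTH_ASIAN)
paf = levin_paf(carriers, ASSUMED_RELATIVE_RISK)

print(f"Carrier frequency: {carriers:.2%}")
print(f"Approximate attributable fraction: {paf:.1%}")
```

With these inputs the estimate lands at a little over 2%, the same ballpark as the 1-2% projected in the paper, whose own estimate presumably rests on more careful effect sizes and modelling.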
What I described above is one of the underappreciated uses of studying non-European populations. Diversity can not only help discover new risk genes, but also expand target populations for known risk genes, which is a major factor in investment decisions in drug development. APOL1 is a good example. The reason nearly 20 companies are currently working on drugs targeting APOL1 is that there is a huge target population: hundreds of thousands of Africans and African Americans. Likewise, we may see companies going after NOTCH3 in the future.