2022 roundup of human genetics
A brief look at the most exciting stories of 2022
As the year 2022 comes to an end, I look back at all the things that I tweeted over the past 12 months. So much has happened. So many exciting papers and preprints have come out in the field of human genetics (or genomics, if you like). Scientists have got better at reading, editing and even writing genomes. We now know the sequences of As, Ts, Gs and Cs in regions of the human genome that were once invisible to sequencing machines. Biobanks have started growing all across the world. As a result, sample sizes in genome-wide association studies (GWAS) have continued to grow. The sample size of the largest GWAS to date has crossed 5 million humans. Representation of non-European ancestries in human genetic studies has been increasing at a faster pace. Industry-academic collaboration is now the rule rather than the exception. The phrases ‘human genetics’ and ‘drug discovery’ are co-occurring in the scientific literature more than ever before. Overall, it has been an amazing year for the human genetics field.
I have selected a list of papers and preprints published in 2022 to highlight in this post. It goes without saying that the list does not cover all the human genetic advancements of 2022, and it merely represents my personal liking. I might have left out some important papers either because I haven’t read or heard about them, or because I do not have the domain expertise to appreciate their importance. What I consider mind-blowing might be seen as merely interesting by someone else. Despite these biases, I hope you find this post informative.
I have sorted the papers under various themes. I have written Twitter threads for some of these papers and have provided links to the same, in case you’d like to read a more detailed summary of the paper.
Natural selection at a speed never seen before
We have heard many stories of natural selection in the past: how certain genetic variants that protect humans against certain diseases rose to high frequency in certain human populations. But none is quite like the one told by Klunk et al. in their paper, published just a few weeks ago in Nature, which blew the minds of the scientific community. You can appreciate that from the overflowing excitement in the tweet posted by my colleague, who loves both genetics and history (he’s not involved in the study).
Studying more than 200 ancient genomes of humans who lived in Europe before, during or after the bubonic plague epidemic (popularly known as the Black Death), one of the deadliest disease outbreaks that humankind has ever faced, scientists have uncovered regions of the human genome that underwent positive selection at unimaginable speed. The strongest signal was found on chromosome 5 near a splicing variant in ERAP2, which encodes an aminopeptidase involved in antigen presentation. The allele frequency of this variant changed from ~40% to ~70% in a matter of just a few years (1346-1352), during which the pandemic slaughtered nearly half of the European population.
Hypermutated human genomes
As humans procreate, new mutations enter the gene pool generation after generation, a fundamental process that drives human evolution. On average, 60 to 70 de novo mutations occur per genome per generation, and any variation in this mutation rate is mainly due to variation in parental age, especially the father’s. However, once in a blue moon some humans are born with an abnormally high number of de novo mutations. With rapidly increasing large-scale sequencing efforts, scientists are finally able to get a glimpse of what causes such hypermutation events. In a study published in Nature early this year, Kaplanis et al. report an analysis of ~20k trios in which they identified 12 individuals with a hypermutated genome. Digging into the genome sequences and hospital records of the parents of the 12 outliers, the authors were able to identify the causes of the germline mutation spillovers. In two families, the father had a rare mutation that inactivated DNA repair genes, leading to error-prone germline replication. In five families, the father underwent chemotherapy just before the offspring were conceived.
A rare cause of monogenic obesity in mice and humans
The agouti gene (Asip) was the first monogenic cause of obesity discovered in mice, in 1992. The agouti mouse is a fat yellow mouse with a spontaneous structural mutation that reprograms a skin-specific gene, Asip, to be expressed throughout the body, including in the hypothalamic neurons, where the protein binds and antagonizes Mc4r, leading to uncontrollable appetite and obesity. Thirty years after its discovery, scientists have encountered the human counterpart of agouti mice: a girl with red hair, hyperphagia and extreme obesity with a structural mutation leading to the ubiquitous expression of ASIP. The findings were published just a few days ago in Nature Metabolism by Kempf et al. Fortunately, there is an FDA-approved medicine, setmelanotide, an MC4R agonist, which might be able to treat this girl and three others with the same mutation whom the authors identified by screening a childhood obesity cohort. This is a great example of a structural mutation disrupting gene regulation and leading to a monogenic disease, and it adds ASIP to the list of genes in which mutations lead to impressively similar consequences in mice and humans.
Non-coding mutations that awaken a repressed gene in pancreatic beta cells causing congenital hyperinsulinism
Hexokinase and glucokinase are isoenzymes that catalyze the first step of glycolysis and act as glucose sensors controlling the influx of glucose into the cells. These two enzymes have evolved to serve different purposes. Hexokinase, expressed ubiquitously throughout the body except in liver and pancreatic beta cells, has a high glucose affinity and thereby ensures a constant glucose supply to the cells. Glucokinase, expressed only in liver and pancreatic beta cells, has a low glucose affinity and thereby ensures that glucose enters pancreatic beta cells and liver cells only when blood glucose levels are high, leading to insulin secretion and glycogen synthesis respectively. Although it has been known for a long time that hexokinase is repressed in pancreatic beta cells, scientists had no idea how evolution has programmed this repression. Now they have found the first clues in the genomes of patients with a rare disease called congenital hyperinsulinism. In a paper published this year[1] in Nature Genetics, Wakeling et al. report the discovery of a series of non-coding de novo mutations within a 42bp conserved intronic region of HK1 that awaken hexokinase from its eternal sleep in the pancreatic beta cells, leading to uncontrolled insulin secretion. This is a rare example of non-coding rare variants causing monogenic disease. We will see more such examples in the upcoming years as whole genome sequencing gains momentum. This is also an exceptional example of allelic heterogeneity, where regulatory and coding mutations in the same gene (HK1) lead to wildly different consequences.
The tale of twins who shared more than an identical genome
Myelofibrosis is a type of blood cancer in which proliferating cancer cells gradually replace the entire bone marrow with nothing but scar tissue, thereby bringing blood cell production to a complete halt. It typically starts with a somatic mutation in a blood stem cell that slowly expands and replaces all the normal blood cells with clones of its own. In a paper published in Nature Medicine, Sousos et al. describe an extraordinary case of a pair of monozygotic twins who shared not just an identical genome but also an identical set of somatic mutations in their blood cells. One of those mutations is a cancer driver mutation that went through a remarkably similar life course, resulting in myelofibrosis in both twins at almost the same age. Lineage tracing placed the timing of the mutation all the way back to when the twins were still in the uterus, making it highly likely that the mutation occurred in one of the twins and spread to the other via transplacental transmission. This study in a way represents Nature’s clinical trial (n=2), evaluating the fate of a somatic blood cell mutation when the timing of the mutation and the genetic background were held constant.
A saturated GWAS of height in 5.4 million individuals—the beginning of the end
Sir Peter Donnelly, the current CEO of Genomics Plc, who served as the director of Wellcome Trust Centre for Human Genetics at Oxford from 2007 to 2018, posted a tweet on June 7th of this year.
The tweet took many pioneers in the GWAS field down memory lane, back to June 7th, 2007, which marked the beginning of the GWAS era with the publication of the first large-scale GWAS of seven common diseases[2]. Since then the GWAS train has been running non-stop with the end nowhere to be seen. The sample sizes grew year by year. So did the number of genome-wide hits for many traits and diseases. While everyone enjoyed the ride, perhaps very few had any idea where the train was headed. A paper published this year in Nature offers a glimpse of what might be the end of this great GWAS journey.
Using a sample size of 5.4 million individuals, scientists from the GIANT consortium have finally reached one of the endpoints of the GWAS race: a fully saturated GWAS. That is, at a sample size of 5.4 million, the scientists have found every common-variant association with height that they can find in people of European ancestry.
So, is that the final stop? Definitely not. It’s just the first of many to come. The next stop would be the point where a polygenic risk score for height performs equally well in all human populations.
(Link to Twitter thread)
From exomes to genomes—whole genome sequencing of the UK Biobank
Both 2021 and 2022 have seen remarkable progress in human genetics, thanks to the UK Biobank, which has shown the world again and again what it takes to advance science to unimaginable limits: a singular genetic resource that is easily accessible to every scientist across the globe.
One of the biggest stories of 2021 was the completion of exome sequencing of nearly half a million individuals from the UK Biobank. While the ink on the Nature paper from Regeneron scientists describing the UK Biobank exomes was still wet, deCODE scientists posted a preprint in Nov 2021 announcing the completion of whole genome sequencing of the first 150,000 UK Biobank participants. Eight months later the final version was published in Nature becoming one of the biggest stories of 2022.
In the paper, the deCODE scientists take the readers on a journey through the dark side of the human genome (the noncoding regions that were once considered junk), shedding light on the regulatory variants, structural variants, repetitive regions, footprints of natural selection and evolutionary traces of the ancestors of the UK Biobank participants, who span almost the entire globe.
From transcriptomes to proteomes—the UK Biobank-Pharma Proteomics Project
On Oct 7th, 2010, the National Institutes of Health (NIH) launched the Genotype-Tissue Expression (GTEx) project with the goal of understanding how genetic variants affect gene expression levels in the major tissues of the human body. The project was a huge success and today, the GTEx website holds the most comprehensive catalogue of genetic effects on gene expression in more than 50 human tissues.
The central dogma of biology states that genetic information flows from DNA → RNA → protein. But until recently, there have been no large-scale initiatives like GTEx to catalogue the genetic effects on protein expression in humans. Thanks to recent advancements in proteomics technologies such as SomaScan and Olink, the proteomics era has finally begun. On Dec 7th, 2020 (a decade after NIH’s GTEx announcement), UK Biobank announced the launch of the UK Biobank-Pharma Proteomics Project (UKB-PPP) to create the world’s largest proteomics dataset, comprising ~1500 blood proteins measured in ~50,000 UK Biobank participants. Two major papers describing an initial analysis of this dataset were preprinted this year (one describing the effects of common variants on protein expression and the other describing rare variants), marking a major milestone in the progress towards understanding the genetic effects on protein expression.
On Jan 25th, 2018, deCODE scientists published a paper in Science introducing human geneticists to a new concept called “genetic nurture” that made many of the readers’ heads spin. Carl Zimmer published a story about the paper in The New York Times with the title “You Are Shaped by the Genes You Inherit. And Maybe by Those You Don’t.”
In the paper, Kong et al. demonstrated for the first time in humans a genetic phenomenon where even the genetic variations in the parents’ genomes that are not passed down to the children influence the children’s behavioral traits such as educational attainment. Such indirect genetic effects flow from parents to children not through the genes the parents pass to their children but through the environment the parents create for their children (i.e. the nurture), which is itself influenced by the parental genetic makeup. In a nutshell, the parents’ genomes, the children’s genomes and the family environment are all entangled in a complex web of correlations.
In the subsequent years, Kong et al.’s work catalyzed new lines of investigation in the field of behavioral genetics: statistical ways to disentangle direct from indirect genetic effects, and how the mixing of direct genetic effects with indirect genetic effects distorts our understanding of GWAS results.
In hindsight, it was clear that the only way forward to measure direct genetic effects in GWAS is to employ a within-family design. That is, instead of correlating genetic differences with phenotypic differences in unrelated individuals from the general population (as done in a typical GWAS), correlate genetic differences between siblings within a family with their phenotypic differences, thereby essentially removing any confounding by family environment or indirect genetic effects. The publication of the first within-family GWAS based on 180,000 siblings this year in Nature Genetics by Howe et al. is therefore a major milestone in the field of behavioral genetics.
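A toy simulation can make the logic of the sibling design concrete. The sketch below is not the method used by Howe et al.; it assumes a single variant, two siblings per family, and made-up parameters (`direct`, `indirect`, `freq` and the function names are all hypothetical). The family environment is modelled as depending on all four parental alleles, which is one simple way to mimic genetic nurture.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_families=20000, direct=0.5, indirect=0.3, freq=0.3, seed=1):
    """Return (population estimate, within-family estimate) of the variant effect.

    All parameter values are illustrative assumptions, not real estimates.
    """
    rng = random.Random(seed)
    pop_g, pop_y = [], []    # one sibling per family, as in a standard GWAS
    diff_g, diff_y = [], []  # sibling differences, as in a within-family design
    for _ in range(n_families):
        # Each parent carries two alleles (0 or 1) at a single variant.
        mother = [int(rng.random() < freq) for _ in range(2)]
        father = [int(rng.random() < freq) for _ in range(2)]
        # Shared family environment shaped by the parents' genotypes.
        family_env = indirect * (sum(mother) + sum(father))
        sibs = []
        for _ in range(2):
            # Each child inherits one random allele from each parent.
            g = rng.choice(mother) + rng.choice(father)
            y = direct * g + family_env + rng.gauss(0, 1)
            sibs.append((g, y))
        (g1, y1), (g2, y2) = sibs
        pop_g.append(g1)
        pop_y.append(y1)
        # The shared family environment cancels out in the difference.
        diff_g.append(g1 - g2)
        diff_y.append(y1 - y2)
    return ols_slope(pop_g, pop_y), ols_slope(diff_g, diff_y)


if __name__ == "__main__":
    population_estimate, within_family_estimate = simulate()
    print(f"population estimate:    {population_estimate:.2f}")
    print(f"within-family estimate: {within_family_estimate:.2f}")
```

Under these assumptions, the population estimate should land near `direct + indirect`, because the child's genotype is correlated with the parental genotypes that shape the family environment, while the sibling-difference estimate should land near `direct` alone.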
A step in the right direction
The Mexico City Prospective Study
There is a famous saying: ‘From humble beginnings come great things’. In 1998, a team of nurses started visiting every household in the Coyoacán and Iztapalapa districts of Mexico City as part of a health project, funded by the Mexican Ministry of Health, to study the causes of the high mortality observed among Mexicans. The staff obtained informed consent from every household, collected blood samples, conducted a structured interview collecting information about socioeconomic and health status, measured blood pressure, pulse rate, height, weight etc., and recorded the details in a hand-held device. The nurses did this every day for the next 6 years, successfully collecting health data for 150,000 individuals. And that’s how the Mexico City Prospective Study (MCPS) was born.
Thanks go to the visionary scientists from Mexico and Oxford who sowed the seeds in 1998, even before the Human Genome Project was completed. Now, 25 years later, the Mexican communities have started reaping the benefits by being part of the human genomics research that is transforming health care. Today, MCPS is one of the first and largest non-European cohorts with whole exome (n=140,000) and genome (n=10,000) sequencing data. A preprint by Ziyatdinov et al. describing the genetic data of the MCPS cohort was posted in June this year, and it will undoubtedly become one of the important human genetics papers of 2023.
Polygenic risk scores—looking beyond the continental ancestries
The polygenic risk score (PRS) has been one of the hottest topics this year. At the speed the field is moving, we might well see banners on the road in 2023 screaming “PRS is coming soon to a clinic near you”. But we still haven’t fully addressed the elephant in the room: poor PRS performance in non-European ancestries. Studies that compare PRS performance across ancestries typically focus only on continent-level or major subcontinent-level ancestries: Europeans, Africans, South Asians, East Asians and Latin Americans.
In a paper published this year in Nature Medicine, Kamiza et al. highlight the pitfall of treating all African ancestries as a single group in PRS studies. The authors show that a PRS for LDL cholesterol derived from an African American training sample appeared to perform well overall in individuals from Sub-Saharan Africa (SSA). But when they looked at PRS performance separately in two groups, South African Zulu and Ugandans, the PRS performed wildly differently: it explained 8.14% of the variance in Zulu but a mere 0.026% in Ugandans.
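For readers who have not computed one, a PRS is essentially a weighted sum of allele counts, and "variance explained" is conventionally the squared correlation (R²) between the score and the measured trait. The sketch below illustrates that arithmetic only; the weights, genotypes and phenotype values are made up, the function names are hypothetical, and real analyses (including, presumably, the one in Kamiza et al.) adjust for covariates such as age, sex and ancestry principal components.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2) across variants."""
    return sum(w * g for g, w in zip(allele_counts, weights))


def variance_explained(scores, phenotypes):
    """Squared Pearson correlation (R^2) between scores and phenotypes."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenotypes) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenotypes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenotypes)
    return cov * cov / (var_s * var_p)


# Hypothetical per-variant weights, as if taken from a training GWAS.
weights = [0.20, -0.10, 0.30]

# Hypothetical target samples: allele counts at the same three variants.
genotypes = [[2, 1, 0], [0, 0, 1], [1, 2, 2], [1, 0, 0], [2, 2, 1]]
phenotypes = [3.1, 2.9, 3.6, 2.7, 3.4]  # made-up LDL cholesterol values

scores = [polygenic_score(g, weights) for g in genotypes]
r2 = variance_explained(scores, phenotypes)
print(f"PRS explains {100 * r2:.1f}% of the phenotypic variance")
```

Running the same weights against two different target groups and comparing the two R² values is the kind of comparison behind the Zulu versus Ugandan contrast described above.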
Genetics of age of onset of T2D—South Asians vs Europeans
I still remember the day in 2019 when I received a call from my sister-in-law. She was concerned about a wound that my brother had been carrying on his leg for almost a month. My brother was 43 years old and overweight by then. On my advice, my sister-in-law took him to a local physician, who diagnosed him with type 2 diabetes (T2D) and started him on medications. The news didn’t come as a shock to me given our strong family history.
I graduated from medical school in 2009 in India and worked as a resident doctor at various hospitals before starting my postgraduate training in 2011. The department where I worked the longest was general surgery. I still remember those days. There were always at least two patients in the ward with severe diabetic foot ulcers, and as the resident doctor, I had to perform wound debridement and dressing, a task that I dreaded the most.
T2D is very common in India. More importantly, the prevalence of the disease has been steadily and steeply rising. A survey found that the prevalence in my home state, Tamil Nadu, was 72% higher in 2000 compared to 1989. We are now in 2023. Despite these facts, the Indian population is heavily under-represented in GWAS. In a large multi-ancestry GWAS of T2D published this year, South Asians represented merely ~8% of the total sample.
A preprint published this year by Srinivasan et al. represents a big step in the right direction to address the under-representation of South Asians in global genetic studies. The authors report that the heritability of age of onset of T2D is more than 3 times higher in South Asians than in Europeans. A PRS for age of onset of T2D derived from South Asians performed worse in Europeans, suggesting that the genetic architecture of T2D onset and progression might be different in South Asians compared to Europeans.
Assortative mating—Japan vs Europe
When I was a boy my mother often said to me
Get married boy and see how happy you will be
I have looked all over, but no girlie can I find,
Who seems to be just like the little girl I have in mind,
I will have to look around until the right one I have found.
I want a girl, just like the girl that married dear old Dad,
She was a pearl and the only girl that Daddy ever had,
A good old fashioned girl with heart so true,
One who loves nobody else but you,
I want a girl, just like the girl that married dear old Dad.
-A popular song from 1911 by Harry Von Tilzer
A Twitter thread that I posted during this year’s American Society of Human Genetics (ASHG) meeting became popular, in both good and bad ways. I wrote about a human genetic resource that Danish Saleheen, a physician-scientist, is building in Pakistan, leveraging the high level of consanguinity that has been practised in certain communities for hundreds of years.
Consanguinity (marrying close relatives) and endogamy (marrying within a small community) are extreme forms of assortative mating where individuals choose partners whose genomes are very similar to their own.
On the other end, there are milder forms of assortative mating that are omnipresent, where individuals choose partners with phenotypes (education, socioeconomic status, language, diet, religion etc.) very similar to their own. Since most such phenotypes are heritable, these individuals are in reality choosing partners whose genomes are similar to their own to some extent. And this leads to undesirable consequences in GWAS.
This year, the first large-scale genetic study of assortative mating in a non-European population (Japanese) was published in Nature Human Behaviour. The study finds interesting differences in assortative mating between Europeans and Japanese, many of which likely reflect cultural differences in marriage practices.
Drug target discoveries
In 1999, Catherine Boileau, a geneticist from the French National Institute of Health and Medical Research (Inserm), and her colleagues mapped a region on chromosome 1 that segregated in a large French family with autosomal dominant hypercholesterolemia (ADH). Little did they know that they had just scratched the surface of one of the scientific breakthroughs of the 21st century. After confirming the linkage in 12 additional families, Boileau and colleagues published their discovery, a newly found genetic locus for ADH, in the American Journal of Human Genetics. The team went on to study these families further and zeroed in on the exact mutations that caused ADH: two gain-of-function missense variants in PCSK9, a gene that the authors predicted to play a critical role in cholesterol homeostasis. The findings were published in Nature Genetics in 2003.
Over the next couple of years, Helen Hobbs, Jonathan Cohen and colleagues from the University of Texas followed the breadcrumbs from Boileau’s discoveries (and the mouse studies that followed immediately) and landed on the breakthrough finding: individuals with loss of function mutations in PCSK9 have lower LDL cholesterol and a lower risk of heart disease. This led to the development of the PCSK9 inhibitors for the treatment of hypercholesterolemia that are on the market today.
The PCSK9 discovery birthed a new field of genetic research: protective genetics. Pharmaceutical companies started investing millions of dollars to sequence humans from all over the world to identify genes in which loss of function mutations confer protection against diseases. The bulk of such investments went into exome sequencing the entire UK Biobank, which was completed last year, kickstarting the era of protective genetics. The first major discovery (GPR75 loss of function mutations and protection against obesity) was made in 2021 by Regeneron scientists.
Adding to this list of protective associations, multiple genes that are potential drug targets made their debut this year.
Loss of function mutations in CHRNB2 protect against smoking addiction (Rajagopal et al. medRxiv)
Loss of function mutations in MAP3K15 protect against type 2 diabetes (Nag et al. Sci Adv)
I was fortunate to have made one of the discoveries in the above list.
Progress in psychiatric genetics
On Oct 6th this year, Karuna Therapeutics made a big announcement: their investigational therapy KarXT (xanomeline-trospium) for the treatment of schizophrenia showed positive results in a phase 3 trial. The investigational drug preferentially targets M1 and M4 muscarinic receptors in the central nervous system.
Although I was excited about the announcement like many others in the psychiatry field, I couldn’t help thinking about the disconnect between genetics and the mechanisms of the drugs being used or tested for the treatment of psychiatric disorders.
We may not have hit the PCSK9 of schizophrenia yet. But the schizophrenia genetics field has made tremendous progress this year. Two Nature papers published side by side this year paint a full picture of the genetic architecture of schizophrenia spanning the entire allele frequency spectrum.
At the beginning of the GWAS era, the scientists who studied psychiatric diseases had no idea of the length of the journey they were about to embark on. But very soon, after the first attempts at mapping the schizophrenia risk loci, they realized that there was a really long road ahead. Though early GWAS of schizophrenia did not reveal any significant loci, they did whisper (as Eric Lander puts it[3]) the secret to discovering the schizophrenia genes: sample size.
Diseases like schizophrenia that negatively influence reproductive fitness are under strong selective pressure. Any genetic variant that increases the disease risk even mildly will be purged from the gene pool. So, what will be left are the ones that have risk effects so small that they become invisible to the eyes of natural selection (common variants) and the ones that are so new that natural selection hasn’t had the time to purge them yet (rare variants). Both the extreme smallness of the effects of common variants and the extreme rareness of the rare variants make them impossible to identify without extremely large sample sizes.
Given the (genetic) constraints under which the psychiatric geneticists were operating, what they have achieved today in terms of understanding the common and rare variant architecture of schizophrenia is extraordinarily impressive.
It has been a big year for autism genetics. Scientists in the autism field have made great progress this year by publishing many exciting papers despite a major backlash they received last year from the autism community.
If there is one word that can best describe autism, it is “heterogeneity”. No human condition is as heterogeneous as autism and that’s why the word “spectrum” has become a synonym for autism.
A major goal of studying the genetics of autism is to uncover the full genetic spectrum that lies underneath the phenotypic spectrum of autism. In the past, scientists succeeded only in uncovering the genetic factors (highly penetrant de novo mutations) that cluster at one end of the spectrum, where the most severely affected autistic individuals lie, who often present with intellectual disabilities and co-morbid developmental disorders. Thanks to SPARK (Simons Foundation Powering Autism Research), an online research initiative that has successfully recruited more than 35,000 individuals and families with autism through community engagement, scientists are now finally able to see more than just the extremes of the genetic spectrum of autism.
Some of the major autism genetics publications from this year:
Zhou et al. (Nature Genetics) report an analysis of the most recent SPARK data that identified not just de novo variants but also inherited variants influencing autism risk.
Fu et al. (Nature Genetics) and Wang et al. (PNAS) report a joint analysis of autism and developmental disorders uncovering more than 600 genes in which rare deleterious mutations substantially increase the risk of autism or developmental disorder or both.
Antaki et al. (Nature Genetics) report an analysis of the full genotypic spectrum (rare de novo and inherited variants and common variants) that contributes to autism risk in ~11,000 families.
Warrier et al. (Nature Genetics) report on the common and rare variant contributions of autistic symptoms in ~13,000 autistic individuals.
Wigdor et al. (Cell Genomics) report a compelling genetic investigation of the “female-protective effect” against autism.
This post has grown longer than I pictured when I started writing it. And still, there are many stories that need to be told: progress in statistical methods, genetic investigations that read like a detective story, studies that pull out big insights from a small N like Newton Scamander's magical suitcase, new and fascinating phenotypes that are being GWASed and many more. But I’ll stop here and save the rest for another day. If you’re in the mood for more genetics, tune in to the episode of The Genetics Podcast where I discuss with Patrick Short (Co-Founder and CEO of Sano Genetics) some of the biggest human genetics stories of 2022 and things I am looking forward to in 2023.
Thank you for reading GWAS Stories. If you enjoyed the post, recommend it to a friend. I wish you a very happy new year 2023!
[1] This work was preprinted in 2021 and published in 2022. I highlighted it in one of my 2021 year-end threads. Since I love this study so much, I chose to highlight it again in 2022.