It's been a few weeks since I posted on Substack. I was attending the World Congress of Psychiatric Genetics (WCPG) in Singapore. The first time I attended WCPG was in 2014 in Copenhagen, Denmark. It was a turning point in my career. I met a group of wonderful researchers from Aarhus University through a friend, and long story short: I quit my job in India and moved to Aarhus, Denmark, in September 2015 to embark on a full-time research career. I spent the next 5.5 years in Denmark and then moved to New York to work for Regeneron as an industry scientist.
I've been attending WCPG every year for the past 10 years. Time flies! With my current work profile, I am starting to feel distant from WCPG, scientifically speaking. To be honest, the reason I continue to attend this conference is to meet my former colleagues at Aarhus University, who are now my close friends. One thing I envy most about the Danes is their love of travel. They never miss an opportunity to travel, explore the world and enjoy life. This year, they planned a week-long vacation in Bali, Indonesia. I decided to give myself a break and tag along with my friends. We spent a wonderful 7 days in different parts of Bali exploring its people, food, culture and, importantly, its immense natural beauty.
As I've spent a little too much time away from work, and will soon be traveling to India, I've decided to skip the American Society of Human Genetics (ASHG) conference (the one that is most relevant to my line of work) this year. As expected, the FOMO is hitting me hard. So, I decided to browse through the abstracts to learn about the most interesting talks scheduled for this year. During last year's ASHG, I wrote a long Twitter post highlighting some of the talks that I found interesting. Many found my post useful. So, I decided to do the same this year for the benefit of readers, many of whom I am sure are now in Denver, Colorado, preparing themselves for five days of exciting science.
The range of topics typically presented at ASHG is extremely wide, and it's nearly impossible to cover them all. So, I am restricting myself to the ones that are most relevant to my interests. I'll highlight the talks under various themes.
Exomes
As you may know, at Regeneron Genetics Center (RGC), we are mainly focussed on studying rare coding variants in the human genome and their phenotypic associations. RGC, along with six other pharma companies, funded the whole exome sequencing of all half a million participants of the UK Biobank. This data was made available to all researchers across the world, and it has been a few years since the data was released. We published an initial analysis of this dataset two years ago, and so did other industry teams. But I believe that there are many more discoveries and insights waiting to be found in the UK Biobank exomes.
So, one of the things that I specifically look for every year is new genetic discoveries made using UK Biobank exomes. One such highlight last year was the discovery of a new obesity risk gene, BSN, encoding the synaptic protein Bassoon, with an effect size comparable to Mendelian obesity genes like MC4R. This work, done by researchers at the MRC Epidemiology Unit at the University of Cambridge, was published in Nature Genetics early this year. Unfortunately, as far as I can see, no similar major discoveries driven by UK Biobank exomes are being presented at this year's ASHG. If you find anything, please post in the comments.
Nevertheless, a few exome sequencing studies caught my interest.
On Friday, Frederick Satterstrom from the Broad Institute in Cambridge, USA, is presenting a large exome-wide association study (ExWAS) of autism based on more than 60,000 cases and 170,000 controls, the largest to date. This work was also presented at WCPG by Jack Fu from the Broad Institute. It represents major progress in autism genetics. The increased sample size has massively boosted the statistical power for gene discovery, tripling the number of genes linked to autism. Still, discovery is being driven primarily by de novo mutations, though we are starting to see more inherited autism risk genes as well.
There are two presentations based on an exome-wide association study of inflammatory bowel disease (IBD) in nearly 40,000 cases and 65,000 controls, by Mingrui Yu (Wednesday, oral presentation) and Ruifei Zhu (Wednesday, poster presentation) from the Broad Institute. I am particularly interested in Yu's talk, where the authors report that individuals who carry cystic fibrosis mutations are protected from IBD. The abstract reads: "we found that delF508, the predominant CF-causing variant that accounts for 70% of all CFTR mutations observed in CF patients, has a significant protective effect against IBD (p=1.7E-10, beta=-0.30, se=0.048). This association was successfully replicated in the follow-up dataset (p=3.1E-05, beta=-0.16, se=0.038)." It's a fascinating finding, particularly when you learn about the possible biological mechanism the authors hint at in the abstract: "It is shown that CFTR serves as epithelial receptor for S. Typhi transluminal migration and that heterozygous deltaF508 mice translocated significantly fewer S. typhi into the gastrointestinal submucosa than wild-type CFTR mice. Therefore, it is plausible that the protective effect of CFTR in IBD may stem from similar interactions with yet unidentified bacteria."
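For readers who want a feel for the size of the effect quoted above, here is a minimal sketch that turns the reported beta and standard error into an odds ratio with a 95% confidence interval. It assumes (this is my assumption, not something stated in the abstract) that the betas are log odds ratios and that a simple Wald test is a fair approximation; the function name `summarize_association` is made up for illustration.

```python
import math

def summarize_association(beta, se):
    """Summarize an effect estimate, ASSUMING beta is a log odds ratio.

    Returns the odds ratio, an approximate 95% confidence interval,
    the Wald z-score and a two-sided normal-approximation p-value.
    """
    z = beta / se
    odds_ratio = math.exp(beta)
    ci_low = math.exp(beta - 1.96 * se)
    ci_high = math.exp(beta + 1.96 * se)
    # Two-sided p-value under a standard normal: erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"or": odds_ratio, "ci": (ci_low, ci_high), "z": z, "p": p_value}

# Numbers copied from the abstract quoted above
discovery = summarize_association(beta=-0.30, se=0.048)
replication = summarize_association(beta=-0.16, se=0.038)

for label, res in [("discovery", discovery), ("replication", replication)]:
    low, high = res["ci"]
    print(f"{label}: OR={res['or']:.2f} (95% CI {low:.2f}-{high:.2f}), "
          f"z={res['z']:.2f}, Wald p={res['p']:.1e}")
```

Under these assumptions, the odds ratios come out at roughly 0.74 in the discovery set and 0.85 in the replication set. The Wald p-values computed this way need not match the abstract's p-values exactly, since the authors' actual test is not described in the quoted text.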
In Tuesday's plenary session, Duncan Palmer from the Big Data Institute in Oxford, UK, is presenting a large-scale rare variant association analysis from the Biobank Rare Variant analysis (BRaVa) consortium. It is a collaboration among 16 biobanks worldwide that has harmonized rare variants identified in more than a million individuals. For more details, check out their website. There is also a poster on Thursday by Frederik Lassen from Oxford University on a rare recessive-effect association study based on an analysis of compound heterozygous variants in the BRaVa dataset.
The Genes and Health Study is a biobank dedicated to British individuals of South Asian origin. This is one of my favorite biobanks, and in the past I've highlighted examples of rare human knockouts identified in the Genes and Health cohort:
a British-Pakistani woman who was found to be a knockout for HAO1, encoding a liver enzyme, which is the RNAi target of Alnylam's lumasiran, now an FDA-approved medicine for the treatment of a rare disease called primary hyperoxaluria type 1.
a British-Bangladeshi man who was found to be a knockout for MC3R, encoding melanocortin 3 receptor, whose biological role in childhood growth and puberty timing came to light only recently through an exome-wide association study of age at menarche in the UK Biobank. Thanks to this Genes and Health participant, scientists now better understand the phenotypic consequence of complete loss of MC3R.
Much exciting work based on the Genes and Health Study biobank is being presented this year at ASHG. On Thursday, Hye In Kim from Pfizer is presenting an interesting analysis of human knockouts identified in the Genes and Health Study based on exome sequencing of 44,028 British South Asians. A part of this analysis links the genetic insights from rare homozygous loss-of-function variants with clinical trial success. This relates to one of my recent posts on the value of human genetics in predicting drug development success. Here the authors find that drugs that work by inhibiting genes are more likely to succeed when individuals completely lacking the gene are found to exist in the general population (as evidenced by homozygous loss-of-function variant carriers in the Genes and Health Study). If I were attending ASHG, I definitely wouldn't miss this talk.
Genomes
Recently, UK Biobank released whole genome sequencing data on its half a million individuals. My last Substack post was on the cost-effectiveness of WGS versus WES in terms of gene discovery. As I discussed in that post, there are many challenges to address before the field can switch from WES to WGS for gene discovery at population level. So, I am obviously super interested in all the work presented at ASHG based on large-scale WGS datasets. In addition to UK Biobank, academic researchers now also have access to WGS data of more than 150,000 individuals from the All of Us biobank. Not surprisingly, this year there are many presentations based on WGS data from UK Biobank and All of Us.
On Wednesday, Ryan Dhindsa from Baylor College of Medicine in Houston, USA, is presenting an analysis of inherited chromosomally integrated human herpesvirus 6 (HHV-6) in more than 730,000 human genomes. This is one of my favorites this year. I've known many use cases of WGS data. But I've never thought about this particular one: identifying individuals who carry an HHV-6 viral genome integrated into their germline. Apparently, HHV-6 is the "only virus known to transmit through the human germline". The authors report that 1.1% of the 730k individuals carried HHV-6 in their genomes and that these individuals had an increased risk of skin cancer, particularly basal cell carcinoma. It was previously known that HHV-6 viral DNA is often detected in basal cell carcinoma tumors. The new finding suggests that "germline, rather than somatic viral exposure, predisposes individuals to basal cell carcinoma". Fascinating!
On Thursday, Konrad Karczewski from Massachusetts General Hospital, USA (gnomAD team), presents an all-by-all common and rare variant association analysis in 245,000 whole genomes from All of Us. This team from the Broad Institute is well known for building useful genomic resources like the gnomAD browser, Genebass, the Pan-UKB GWAS downloads, etc. The authors plan to meta-analyze results from UK Biobank and All of Us and release data iteratively, with a final target sample size of 1 million whole genomes.
One of the problems of analyzing rare variants from the noncoding genome, as I discussed in my previous post, is the lack of well-defined genomic boundaries like we have for gene sequences and the lack of well-characterized variant effects like we have for coding variants. However, resources like ENCODE do provide a map of promoter and enhancer regions in the human genome across a range of cell types, which serves as a starting point for aggregate rare variant association tests. On Friday, Jack Flanagan from Seoul National University in the Republic of Korea presents a region-based rare variant analysis of UK Biobank whole genomes. The abstract reads: "By leveraging data on enhancer/promoter-gene interactions and key epigenetic markers across over 1,500 cell types, our analysis provides a better understanding of the role of rare variants in complex traits."
On Friday, Harry Wright from the University of Exeter in Exeter, UK, presents a WGS-based rare variant association analysis of anthropometric traits in 750,000 individuals. The authors have discovered some interesting new rare variant associations, for example, a 5' UTR variant near FGF18 with a large effect on height. The authors highlight that FGF18 is loss-of-function intolerant, which would explain why earlier exome sequencing-based analyses failed to find an association with height. The same team has recently published a WGS-based rare variant association analysis of height in 333,000 individuals.
Apart from All of Us, another biobank that contains WGS data is TOPMed (though not at the scale seen in UK Biobank and All of Us). On Wednesday, Margaret Sunitha Selvaraj from Massachusetts General Hospital in Boston, USA, presents a WGS-based noncoding rare variant association analysis of LDL cholesterol in 246,000 individuals from the UK Biobank and TOPMed cohorts.
One other obvious use case of WGS data is to estimate the heritability of complex traits based on both common and rare variants genome-wide. The Twitter- and Substack-famous Sasha Gusev (I highly recommend Sasha's Substack) has asked multiple times on Twitter why no one is doing heritability analysis using UK Biobank WGS data. Hyein Jung from Kyung Hee University in Seoul, Republic of Korea, has now answered Sasha's request. Jung is presenting on Saturday a rare variant heritability analysis of complex traits using WGS data from the UK Biobank. The authors report that the total heritability of height based on common and rare variants is a whopping 82.25%, which is almost all of the twin heritability, which ranges from 80% to 90%. The abstract reads: "Our results showed that for height, we accounted for 82.25% of the 90% twin heritability, while for BMI, we accounted for 39.37% of the 90% twin heritability." It seems the missing heritability of height has been found, but not that of BMI. I wonder why the same isn't true for BMI. What's happening with the BMI heritability?
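The sentence quoted from the abstract can be read in two ways, so here is a small back-of-the-envelope sketch of both. Reading A (the one I use above) treats 82.25% and 39.37% as absolute heritability estimates and asks what share of the quoted 90% twin heritability they cover. Reading B treats them as shares of the twin heritability and asks what absolute heritability that would imply. Which reading the authors intend is not clear from the quoted text alone, and the variable names are mine.

```python
# Twin heritability as quoted in the abstract for both traits
TWIN_H2 = 0.90

# Percentages quoted in the abstract, expressed as proportions
reported = {"height": 0.8225, "BMI": 0.3937}

results = {}
for trait, value in reported.items():
    # Reading A: value is the absolute WGS heritability,
    # so the share of twin heritability recovered is value / TWIN_H2.
    share_if_absolute = value / TWIN_H2
    # Reading B: value is already the share of twin heritability,
    # so the implied absolute heritability is value * TWIN_H2.
    absolute_if_share = value * TWIN_H2
    results[trait] = (share_if_absolute, absolute_if_share)
    print(f"{trait}: reading A recovers {share_if_absolute:.1%} of twin h2; "
          f"reading B implies absolute h2 of {absolute_if_share:.1%}")
```

Either way the qualitative picture is the same: height is close to its twin estimate, while BMI sits at well under half of it.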
Structural variants
One important use case of WGS is to study structural variants (SVs), particularly the rare noncoding ones. This year, there are many talks on this topic.
On Friday, Santosh Atanur from AstraZeneca in Cambridge, UK, is presenting a phenome-wide association study of structural variants identified in 460k UK Biobank genomes. The authors write in the abstract that 98% of the SVs are noncoding, of which 11% span known enhancer regions. The authors also highlight a few interesting examples, such as a 5 kb deletion overlapping a cardiac pericyte and vascular smooth muscle enhancer that increases the risk of atherosclerotic heart disease.
On Wednesday, Simone Rubinacci from Brigham and Women's Hospital in Boston, USA, also presents an SV analysis based on 500k UK Biobank genomes. The authors have discovered many interesting associations including an Alu insertion in the promoter of an endothelin gene (EDN3) associated with blood pressure, which reminds me of the famous PHACTR1 saga.
On Wednesday, Emma Pierce-Hoffman from Broad Institute in Cambridge, USA, is presenting on structural variant discovery in ~100k All of Us whole genomes.
One important benefit of having SV data for hundreds of thousands of individuals is the ability to build an SV constraint map of the human genome, that is, to map the critical regions of the human genome that are intolerant to large structural changes. Of course, the gnomAD team is on it. Xuefang Zhao from MGH in Boston, USA, is presenting a poster on Thursday on the "Functional impact of 2.7M structural variants across global populations".
On Wednesday, Shubham Saini from 23andMe, USA, presents an analysis of copy number variants (CNVs) identified using genotyping data of 5 million 23andMe participants. Given the sheer scale of the data, the authors were able to identify not only common but also rare CNVs and their phenotypic associations. The research that comes out of 23andMe has never failed to amaze me. Such a great resource! Yet, the company has not been getting a break from bad press lately. I hope the company turns things around soon and gets back on its feet.
Polygenic risk scores
As always, you can find hundreds of presentations on polygenic risk scores (PRS) at any human genetics conference. Consequently, the bar to get excited about PRS-related work keeps rising year after year. Currently, the field is more interested in the clinical utility of PRS. In particular, I am interested in the clinical value of PRS as a screening tool rather than a diagnostic tool. We have seen good examples in the past, for example, screening for individuals at risk of fracture using a PRS of bone mineral density. On that front, one abstract caught my eye. On Wednesday, Rosalind Eeles from The Institute of Cancer Research in the UK is presenting the results of the BARCODE 1 study, which evaluated the value of a prostate cancer PRS over traditional screening tools (such as PSA) in identifying middle-aged to elderly men at risk of prostate cancer. The abstract reads: "It detects a high proportion of clinically significant disease compared with PSA or MRI based screening programs and MRI missed a significant proportion (17-67%) of cancers found on biopsy. This is the first study to assess if this approach will be useful in population screening programs."
Speaking of PRS, another abstract caught my interest. Many of you might be aware of the PGS Catalog, a resource that hosts PRS weights for hundreds of phenotypes that can be used to generate PRS in your cohort without the need to train models yourself. The FinnGen team has used all the models available from the PGS Catalog to generate PRS for more than 3000 phenotypes in 400,000 FinnGen participants and tested associations with nearly 5000 clinical end points. The results are made available through a "PGS browser". Nikita Kolosov from The Ohio University College of Medicine in USA is presenting this work on Wednesday.
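For readers new to PRS, the idea of applying pre-trained weights to your own cohort boils down to a weighted sum: for each person, multiply the number of effect alleles they carry at each variant by that variant's published weight and add everything up. Below is a minimal sketch of that calculation. The variant IDs, weights and genotypes are entirely made up for illustration, and the dictionary layout is my own simplification, not the format of any real scoring file.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: dict of variant ID -> effect-allele count (0, 1 or 2)
    weights: dict of variant ID -> per-allele effect weight
    """
    shared_variants = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared_variants)

# Hypothetical per-allele weights, as one might load from a published score
weights = {
    "rs_hypothetical_1": 0.12,
    "rs_hypothetical_2": -0.05,
    "rs_hypothetical_3": 0.30,
}

# Hypothetical genotypes for two cohort members
cohort = {
    "person_A": {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1, "rs_hypothetical_3": 0},
    "person_B": {"rs_hypothetical_1": 0, "rs_hypothetical_2": 2, "rs_hypothetical_3": 1},
}

scores = {person: polygenic_score(d, weights) for person, d in cohort.items()}
for person, score in scores.items():
    print(f"{person}: PRS = {score:.2f}")
```

Real pipelines have to do much more than this, such as matching genome builds, aligning effect alleles and strands, and handling missing variants, which is exactly why a ready-made browser of precomputed results is convenient.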
Proteomics
UK Biobank released Olink proteomics data measured in 50,000 participants a year ago. Again, this is an industry-led effort called the Pharma Proteomics Project (PPP), and the flagship papers came out in 2023. I've been excited about the various applications of the PPP resource since the beginning (as you can see in my 2022 round-up and Genetics Podcast interview). One application that I am watching closely, and that I think we will hear more and more about in the near future, is the value of proteomics in predicting disease risk. As far as I understand, the consensus is that a proteomic risk score is orthogonal to a polygenic risk score, as it captures a lot of environmental risk. This is going to prove valuable in drug development, particularly for developing biomarkers to assess clinical trial results. On Wednesday, Manik Garg from AstraZeneca is presenting a disease prediction model called "MILTON", trained on plasma and urine biomarkers including Olink proteomics data in the UK Biobank. The authors note that the addition of proteomics data remarkably boosted prediction performance. The authors further performed genetic association analysis of MILTON-predicted phenotypes to identify new associations, which they replicate in FinnGen data. The authors have also provided the results via a browser.
Metabolomics
Speaking of disease prediction using proteomics data, another layer of omics data that has recently been generated for all half a million UK Biobank participants is plasma metabolomics, produced by the company Nightingale Health. On Wednesday, Jeffrey Barrett, the CSO of the company, is presenting the first-pass analysis of this huge dataset. The authors generated metabolomic and polygenic risk scores for 30 chronic diseases and compared the prediction performance of the two. The abstract reads: "The metabolomic scores are more strongly associated than polygenic scores for all diseases tested except common cancers, and the metabolomics tracked observed changes in risk profile across time in longitudinal samples". This is reminiscent of what we are seeing with the proteomic risk score. Like the proteomic risk score, I expect that the metabolomic risk score too is orthogonal to the polygenic risk score. The authors have further performed genetic association analysis of metabolomic phenotypes and found many interesting associations. The authors also note in the abstract that the UK Biobank metabolomics dataset and related GWAS summary statistics will be made available to researchers in autumn 2024.
Drug targets
If you have even the slightest interest in using human genetics to advance drug development, you shouldn't miss the plenary session titled "The Promise and Payoff of Human Genetics and Genomics: Paths from Bench to Bedside" scheduled for Saturday. I am particularly excited about two of the talks, one by David Goldstein (CEO of Actio Biosciences) and the other by David Altshuler (CSO of Vertex Pharmaceuticals), both pioneers and respected leaders in the human genetics field.
Therapeutics
This is one of my favorite themes at ASHG. Every year we get to hear some fascinating story of innovative drugs being designed to treat challenging rare diseases. Last year, I highlighted a creative therapeutic design to treat Angelman syndrome. This year we have another interesting story. On Wednesday, Alban Ziegler from Columbia University in New York, USA, is presenting an intrathecal, allele-specific antisense oligonucleotide therapy designed to treat an individual with a rare disease called KIF1A-associated neurological disorder (KAND). Here the allele-specific targeting is achieved by targeting a noncoding variant in phase with the pathogenic mutation. This is one of the ways human genetics is helping with drug development: by identifying genetic markers in cis to allele-specifically target the RNA to treat conditions caused by genetic defects in haploinsufficient genes.
I’ve bookmarked more presentations. But I’ll stop here. I hope you find this curation helpful. Before you go, a quick update. As you may know, I’ve been doing quarterly podcast episodes with Patrick Short on The Genetics Podcast. The latest episode was released recently, in which I discuss five interesting human genetics studies from the third quarter of 2024 (all of which were covered in my past Substack posts). Do check it out!
Finally, I’d like to give a big shout out to Patrick Short for his incredible work in hosting The Genetics Podcast. He has now completed more than 150 episodes! At this year’s ASHG, Patrick is hosting a get-together for all past, present and future guests and listeners of The Genetics Podcast. Sign up via the link or reach out to Patrick directly via Twitter DM, if you’d like to attend the event.