The world's largest database of human knockouts
A physician-scientist is transforming Pakistan's cultural challenges into research opportunities
Happy Friday! This week’s post is closely related to last week’s. If complete loss of a human gene is compatible with life, it is likely that humans without that gene are living somewhere in Pakistan. Danish Saleheen, a Pakistani physician-scientist at Columbia University, is on a mission to identify all such humans and build the world’s largest database of human knockouts.
The Pakistani Genomic Resource (PGR) was conceived nearly two decades ago. What began as a research project to study the genetics of myocardial infarction, initially called the Pakistan Risk of Myocardial Infarction Study (PROMIS), has gradually grown into the PGR. Some of you might be familiar with the PGR’s flagship paper, published in Nature in 2017, which introduced the field to the phrase “human knockouts”. Danish Saleheen is a familiar name among industry genetics researchers, as the PGR has grown over the years with funding from many biotech and pharma companies, including the Regeneron Genetics Center.
I’ve always been amazed by the value of this resource for drug development. I have shared many stories of how human knockouts from South Asian populations have helped drug developers assess the safety of inhibiting a gene or its product (e.g. HAO1 RNAi for the treatment of primary hyperoxaluria type 1, APOL1 inhibitors for the treatment of APOL1-mediated kidney disease in African Americans). The PGR will become a major resource for drug developers in the coming years. Below is a Twitter thread I wrote during the American Society of Human Genetics (ASHG) conference in 2022 on a talk by Danish on the PGR.
From the Twitter archives
Danish Saleheen stunned the audience with his story of building the world's largest cohort of human knockouts in Pakistan, the world's 5th most populous country, with the highest level of consanguinity.
Starting with around 10,000 individuals sequenced in 2017, the cohort now comprises around 200,000 recruited individuals, 80,000 of whom have been exome sequenced. The goal is to sequence 1 million.
Based on these data, they have so far identified >14,000 human knockouts for >5,000 genes. To achieve the same in European populations, you would have to sequence >11 million individuals.
Such a high prevalence of knockouts is due to the extremely high rate of consanguineous unions in these communities. In the current sample, around 40% were born to first-cousin unions. Marrying outside the family circle is considered taboo in certain communities.
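To see why consanguinity raises knockout prevalence so sharply, a rough back-of-the-envelope sketch helps. It uses the standard population-genetics result that a person with inbreeding coefficient F is homozygous for an allele of frequency q with probability q² + F·q·(1 − q), and that F = 1/16 for children of first cousins. The allele frequency below is an illustrative assumption, not a figure from the talk.

```python
def homozygote_probability(q: float, f: float = 0.0) -> float:
    """Probability of being homozygous for an allele of frequency q,
    given inbreeding coefficient f (standard formula: q^2 + f*q*(1-q))."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q + f * q * (1.0 - q)


F_FIRST_COUSINS = 1 / 16  # inbreeding coefficient for offspring of first cousins

# Hypothetical loss-of-function allele frequency, chosen only for illustration.
q = 1e-4

outbred = homozygote_probability(q)
first_cousin = homozygote_probability(q, F_FIRST_COUSINS)
fold_enrichment = first_cousin / outbred

print(f"outbred:         {outbred:.2e}")
print(f"first cousins:   {first_cousin:.2e}")
print(f"fold enrichment: {fold_enrichment:.0f}x")
```

Under these assumed numbers the enrichment is several hundred-fold, and it grows as the allele gets rarer. That is in line with the point above that far more individuals from an outbred population would need to be sequenced to find the same knockouts.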
Extensive phenotyping of these individuals is being done by linking to clinical records, administering questionnaires, and through physical and clinical examinations at the site of recruitment.
The most powerful of all these approaches is the callback study: the ability to recontact the individuals (footnote), do cascade screening of the family members, and perform in-depth phenotyping of the whole family.
Danish shared a mind-blowing experience where he recontacted an individual who was a knockout for APOC3. It turned out his wife was also a knockout and, as a result, so were all 9 of their children. It was a jaw-dropping moment to see the pedigree of this family.
Cascade screening of this family and their relatives identified 33 knockouts and hundreds of heterozygous carriers of the APOC3 mutation. For context, there are zero APOC3 knockouts in the gnomAD database.
This motivated them to test for a cardioprotective effect of homozygous APOC3 mutations, given the previous reports in heterozygous individuals. And this led to a shocking revelation.
Unlike the heterozygotes, the homozygotes for APOC3 are not protected from heart attack. Instead, they seem to be at increased risk [1]. This is exemplified by the fact that the fisherman through whom all these knockouts were traced died of a heart attack a few months later.
An important lesson to learn here: our predictions of the full picture based on the half we see don't always turn out right. It also highlights the extreme value of having such a genetic database.
How can we use this resource to gain insights into GWAS findings? Danish and colleagues took all the GWAS loci identified for NAFLD and, using different methods, prioritized a list of 82 genes that are likely to be causal. For 22 of these genes, human knockouts were found in the Pakistani Genomic Resource database. They are now planning to study these knockouts extensively.
I am stunned to learn how extremely valuable this genetic resource is to the genomics community. As Danish concluded his talk, I was lost in thought, wondering if it will ever be possible to establish such a resource in India.
[1] The phenotype profile in human knockouts from consanguineous families should be interpreted cautiously, as the homozygous mutation is often part of a long homozygous segment, called a region of homozygosity (ROH). Hence, if the person also happens to carry another loss-of-function or missense variant in a neighboring gene, it is difficult to tell the phenotypic consequences of that gene apart from those of the gene of interest. Thanks to Patrick Sulem, Head of Clinical Sequencing at deCODE genetics, for kindly reminding me about this point during ASHG 2022, when my Twitter thread was making the rounds on the internet.
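One way to act on this caution is to flag knockouts whose ROH also contains other homozygous, potentially damaging variants. Below is a minimal sketch, assuming ROH segments and homozygous variant calls are already available. The `Variant` and `Roh` types, the field names, the consequence labels, and the example coordinates are all hypothetical and do not reflect the PGR's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    gene: str
    consequence: str  # hypothetical labels, e.g. "lof" or "missense"


@dataclass(frozen=True)
class Roh:
    chrom: str
    start: int
    end: int  # inclusive

    def contains(self, v: Variant) -> bool:
        return v.chrom == self.chrom and self.start <= v.pos <= self.end


def confounders_in_same_roh(knockout, homozygous_variants, roh_segments):
    """Return homozygous LoF/missense variants in other genes that share
    an ROH segment with the knockout variant."""
    segments = [r for r in roh_segments if r.contains(knockout)]
    return [
        v for v in homozygous_variants
        if v.gene != knockout.gene
        and v.consequence in {"lof", "missense"}
        and any(r.contains(v) for r in segments)
    ]


# Made-up example: gene names and coordinates are illustrative only.
ko = Variant("11", 1_000_000, "GENE_A", "lof")
calls = [
    ko,
    Variant("11", 1_250_000, "GENE_B", "missense"),  # inside the same ROH
    Variant("11", 9_000_000, "GENE_C", "lof"),       # outside the ROH
]
rohs = [Roh("11", 500_000, 2_000_000)]

flagged = confounders_in_same_roh(ko, calls, rohs)
print([v.gene for v in flagged])
```

A knockout with an empty flagged list is not automatically clean, since non-coding or uncalled variants in the same segment could still contribute, but a non-empty list is a clear signal to interpret the phenotype with extra care.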