Unless you have recently been living on a remote island with no internet connection and working on a research grant or manuscript, you must have heard about the chatbot from OpenAI that is making headlines in every news outlet in the world. The chatbot, called GPT-3 (Generative Pre-trained Transformer 3), has blown the minds of millions, including mine, with its extraordinary human-like conversational skills and unbelievable knowledge spanning every scientific domain imaginable, including human genetics.
According to Wikipedia, “GPT-3 was trained on hundreds of billions of words and is also capable of coding in CSS, JSX, and Python, among others.”
For fun, I asked GPT-3 to write a bedtime GWAS story. Here is the output.
Once upon a time in a faraway land, there was a beautiful princess named Lily. Lily was kind, gentle, and loved by all who knew her.
One day, a wicked sorcerer cast a spell on Lily that made her fall into a deep sleep. The sorcerer was jealous of Lily’s beauty and wanted to keep her asleep forever.
The kingdom was in despair, but they were determined to find a way to wake the princess. They searched high and low for a solution, but nothing seemed to work.
Finally, they turned to the field of genetics for answers. A group of scientists began to study the genetics of the princess and the sorcerer, in hopes of finding a way to reverse the spell.
After many months of hard work, the scientists discovered a genetic variant that was present in the sorcerer, but not in the princess. They believed that this variant was responsible for the sorcerer’s magical powers, and that it may hold the key to waking the princess.
The scientists worked quickly to develop a medication that could target the genetic variant and reverse the spell. They gave the medication to the princess, and after a few days, she began to stir.
The princess slowly opened her eyes, and the kingdom rejoiced. The spell was broken, and Lily was finally awake.
The scientists were hailed as heroes, and their work in genetics was celebrated throughout the land. And from that day on, the princess lived happily ever after.
Wow! The bot knows what a GWAS is. It knows that one of the uses of GWAS is to identify drug targets. And it knows how to creatively blend the key ingredients of a GWAS and a bedtime story to deliver a magical GWAS tale. That is truly mind-blowing.
Impressed by the many extraordinary abilities of the chatbot, I wondered how well the bot would score if it took an IQ test.
Many of the commonly used IQ tests take a long time to finish, and I have neither the time nor the patience to use them. Luckily, there is a simple but effective test that has been used to quantify the fluid intelligence of the UK Biobank participants. The test comprises 13 questions that tap the verbal and numerical reasoning abilities of the participants. The total score (which is simply the number of correct answers) reasonably approximates the cognitive function of the participants. This is reflected in the high genetic correlation between the GWAS of fluid intelligence in the UK Biobank and the GWAS of the g factor (i.e., general cognitive ability) in an independent cohort measured using varied cognitive tests, as shown by both Davies et al. (2018) and Savage et al. (2018).
Without further ado, let me jump straight into the 13 questions and the chatbot’s answers. I’ve displayed the answer choices for each question as a poll, in case you’d like to test yourself before looking at the correct answer.
Q1: Numerical addition test
As is always the case, the initial questions are easier than the later ones. The first question evaluates one’s numerical addition skill.
Add the following numbers together: 1 2 3 4 5 - is the answer?
And the bot’s answer was
Q2: Identify largest number
Which number is the largest?
And the bot’s answer was
Q3: Word interpolation
Bud is to Flower as Child is to?
And the bot’s answer was
The bot interpreted the question correctly but then, interestingly, gave a wrong answer: “grow”. The correct answer is ‘adult’, which is what most of the 200,000+ UK Biobank participants chose, as you can see from the distribution plot below, available from the UK Biobank website.
When I told the bot that its answer was wrong, the bot disagreed and machinesplained to me why the correct answer is ‘grow’ and not ‘adult’. I was a bit awestruck by the bot’s demonstration of reasoning ability, the very thing that I was trying to assess using a formal test.
I guess both ‘grow’ and ‘adult’ could be right answers, depending on how we interpret the question. If we are looking for a noun that relates to ‘child’ in the same way that ‘flower’ relates to ‘bud’, the correct answer is ‘adult’. But if we are looking for a verb that best describes the action of ‘child’ in the same way that the verb ‘flower’ describes the action of ‘bud’, then the correct answer is ‘grow’. However, to stick with the expected answer and to get a total score that can be fairly compared with the average score of the population, I did not give the bot a mark for this question.
Q4: Positional arithmetic
11 12 13 14 15 16 17 18
Divide the sixth number to the right of twelve by three. Is the answer?
And the bot’s answer was
The bot failed to answer this question correctly. And when I told the bot it was wrong, it apologized and accepted that the correct answer is 6.
So, it appears the bot lacks spatial skills. It struggles to identify a number based on its position relative to another number. Interesting!
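For readers who want to check the expected answer, here is a minimal Python sketch of the positional step the question asks for. The helper name `nth_right_of` is made up for illustration and is not part of the UK Biobank test.

```python
# The row of numbers shown in the question.
numbers = [11, 12, 13, 14, 15, 16, 17, 18]

def nth_right_of(seq, anchor, n):
    """Return the value n positions to the right of `anchor` in `seq`."""
    return seq[seq.index(anchor) + n]

target = nth_right_of(numbers, 12, 6)  # sixth number to the right of twelve
answer = target // 3                   # divide it by three

print(target, answer)  # prints: 18 6
```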
Q5: Family relationship calculation
If Truda's mother's brother is Tim's sister's father, what relation is Truda to Tim?
This is a somewhat difficult question, as it requires you to mentally map the relationships across the individuals to work out how Truda and Tim are related. And the bot nailed it in a fraction of a second.
Q6: Conditional arithmetic
If sixty is more than half of seventy-five, multiply twenty-three by three. If not subtract 15 from eighty-five. Is the answer?
Amusingly, the bot got confused by this question. It was unable to follow the conditional logic behind it and said that “it is impossible to determine the correct answer”.
The correct answer is, of course, 69, which most of the UK Biobank participants answered.
When I explained it to the bot, it agreed that it’s possible to determine the answer, and that the right answer is indeed 69. So, it seems the bot lacks conditional logic skills too.
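The question is really just an if/else statement written in words. A minimal Python sketch of it (the variable names are mine):

```python
# "If sixty is more than half of seventy-five ..."
condition = 60 > 75 / 2  # half of 75 is 37.5, so this is True

if condition:
    answer = 23 * 3   # "... multiply twenty-three by three"
else:
    answer = 85 - 15  # "If not subtract 15 from eighty-five"

print(answer)  # prints: 69
```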
Q7: Synonym
Stop means the same as?
This was easy-peasy for the bot.
Q8: Chained arithmetic
If David is twenty-one and Owen is nineteen and Daniel is nine years younger than David, what is half their combined age?
The bot aced this one too.
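For completeness, the chain of steps can be written out as a small Python sketch (the variable names are mine):

```python
david = 21
owen = 19
daniel = david - 9               # nine years younger than David

combined = david + owen + daniel  # 21 + 19 + 12
half = combined / 2

print(daniel, combined, half)  # prints: 12 52 26.0
```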
Q9: Concept interpolation
Age is to Years as Height is to?
The bot got the right answer. But it also kind of got it wrong when it explained how it arrived at the answer. Anyway, I let this one slide and gave it full marks, as its first choice, ‘Metres’, was the correct answer.
It’s interesting that the bot felt that ‘Tall’ can also be a right answer. So did a small proportion of the UK Biobank participants.
The answer ‘Tall’ would be right only if the question read ‘Age is to Old as Height is to?’ So, even though the bot answered this one correctly, it appears that it still needs to improve its concept interpolation skills.
Q10: Arithmetic sequence recognition
150 ... 137 ... 125 ... 114 ... 104 ... What comes next?
The bot had no clue here. It completely failed to recognize the sequence logic beneath the numbers displayed. The bot’s answer was
When I explained the logic behind the number sequence, it humbly accepted its mistake.
Honestly, this is a slightly tough question. I remember struggling when I tried it the first time.
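The logic becomes visible once you look at the gaps between neighbouring numbers: they shrink by one each time (13, 12, 11, 10). Here is a minimal Python sketch of that reasoning. It is just one way to spell out the pattern, not how the test is scored.

```python
seq = [150, 137, 125, 114, 104]

# How much each number drops relative to the previous one.
drops = [a - b for a, b in zip(seq, seq[1:])]

# The pattern relies on every drop being one smaller than the last.
assert all(x - y == 1 for x, y in zip(drops, drops[1:]))

next_drop = drops[-1] - 1
next_value = seq[-1] - next_drop

print(drops, next_value)  # prints: [13, 12, 11, 10] 95
```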
Q11: Antonym
Relaxed means the opposite of?
Easy peasy. The bot got it right, of course.
Q12: Square sequence recognition
100 ... 99 ... 95 ... 86 ... 70 ... What comes next?
The bot didn’t recognize the sequence logic here either. It gave a wrong answer.
And when I explained it, it accepted that its answer was incorrect.
Actually, this question is the toughest of all. I didn’t figure it out when I tried. Very few UK Biobank participants even attempted it, and among those who did, most said they didn’t know the answer.
But it’s interesting to see the bot’s inability to identify the numeric logic underlying a sequence of numbers. We can’t blame the bot, I guess, as it was never meant to perform such tasks.
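As the heading of this question hints, the gaps between neighbouring numbers are the square numbers 1, 4, 9 and 16, so the next gap is 25. A minimal Python sketch of that check (the variable names are mine):

```python
seq = [100, 99, 95, 86, 70]

# How much each number drops relative to the previous one.
drops = [a - b for a, b in zip(seq, seq[1:])]

# The drops are the squares of 1, 2, 3, 4.
assert drops == [k * k for k in range(1, len(drops) + 1)]

next_drop = (len(drops) + 1) ** 2  # 5 squared
next_value = seq[-1] - next_drop

print(drops, next_value)  # prints: [1, 4, 9, 16] 45
```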
Q13: Subset inclusion logic
If some flinks are plinks and some plinks are stinks then some flinks are definitely stinks?
Although the bot gave the correct answer, I was not satisfied with its explanation. It almost looks like the bot got it right by a fluke. I don’t think it understands the subset logic either.
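One way to see why the conclusion does not necessarily follow is to build a counterexample with sets. In the Python sketch below, the set members are arbitrary labels I made up, and I am reading "some" as "at least one".

```python
# Made-up members, chosen so that both premises hold.
flinks = {"a", "b"}
plinks = {"b", "c"}
stinks = {"c", "d"}

some_flinks_are_plinks = bool(flinks & plinks)  # they share "b"
some_plinks_are_stinks = bool(plinks & stinks)  # they share "c"
some_flinks_are_stinks = bool(flinks & stinks)  # they share nothing

print(some_flinks_are_plinks, some_plinks_are_stinks, some_flinks_are_stinks)
# prints: True True False
```

Both premises are true here while the conclusion is false, so "definitely" is too strong.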
Okay, it’s time to calculate the total score and see where the bot stands compared to the UK Biobank participants. Out of the 13 questions, the bot answered 8 correctly (1+1+0+0+1+0+1+1+1+0+1+0+1). This is slightly higher than the population average, which is 6.16 as per the UK Biobank’s website.
It’s conventional to express IQ scores on a scale with a mean of 100 and a standard deviation (SD) of 15. A majority of the population lies within ±1 SD of the mean (i.e., between 85 and 115). An IQ score more than 2 SD below the population mean (i.e., below 70) is often considered suggestive of a cognitive disability. So, if we rescale the GPT-3 bot’s score, taking it to be roughly 1 SD above the population average, its IQ will be around 115 (100+15). Not an impressive score for a human. But an amazing score for an AI, if you ask me.
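The rescaling is the usual z-score conversion: subtract the population mean, divide by the population SD, then multiply by 15 and add 100. Here is a minimal Python sketch. Note that this post does not give the SD of the 13-question score, so `ASSUMED_SD` below is a hypothetical placeholder for illustration only; plug in the real value from the UK Biobank if you want the exact number.

```python
# Marks for Q1..Q13, as tallied in the post.
marks = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1]
total = sum(marks)

POP_MEAN = 6.16   # population average quoted in the post
ASSUMED_SD = 2.0  # hypothetical placeholder, NOT a value from the post

def to_iq(score, mean, sd):
    """Rescale a raw score to the IQ scale (mean 100, SD 15)."""
    return 100 + 15 * (score - mean) / sd

iq = to_iq(total, POP_MEAN, ASSUMED_SD)
print(total, round(iq, 1))  # prints: 8 113.8
```

With this placeholder SD the bot lands a little under one SD above the mean, which is in the same ballpark as the rough figure of 115.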
Finally, I asked the bot to summarize my experience evaluating its cognitive skills using the fluid intelligence test, and it did a great job.
It feels both amazing and scary to see what this bot has accomplished today and imagine what it will accomplish in just a few years from now.
—Veera
Would be interesting to see how the model performs in the "repeat" experiment!
How do you convert the bot's score out of 13 to IQ?