Monday, August 11, 2014

Promethease from 23andMe

Take a look at this image (click on it to make it large enough – it’s from Population Genetics, 2nd Edition by John H. Gillespie, p. 3).

A gene - showing coding nucleotides and SNPs

It’s the reference allele (gene) which codes for the alcohol dehydrogenase enzyme in a particular species of fruit fly. Fruit flies need this enzyme to handle the alcohol in rotting fruit. There are 768 coding bases in this gene (a coding base, or nucleotide, is one of the four constituents of DNA commonly abbreviated to A, G, C and T).

Grouped in threes, the nucleotides code for amino acids: so at position 578 the sequence AAG codes for lysine. Change it to ACG and you get threonine instead. This change of a single base is called a SNP – a ‘Single Nucleotide Polymorphism’, pronounced ‘snip’. Two coding sequences in a population which differ in one or more SNPs are called alleles. Note that the string of amino acids listed in the image, when assembled together, constitutes the alcohol dehydrogenase enzyme, in variants depending upon which SNPs are present.
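To make the codon idea concrete, here is a minimal Python sketch. It assumes only a small excerpt of the standard genetic code; the names `CODON_TABLE`, `translate` and `substitute` are illustrative and not taken from the book. It shows how swapping a single base in a lysine codon yields a threonine codon.

```python
# Small excerpt of the standard genetic code (DNA codons -> amino acids).
# Only the codons needed for this illustration are included.
CODON_TABLE = {
    "AAA": "Lys", "AAG": "Lys",
    "AAC": "Asn", "AAT": "Asn",
    "ACA": "Thr", "ACC": "Thr", "ACG": "Thr", "ACT": "Thr",
}


def translate(dna):
    """Translate a DNA string codon by codon; codons not in the excerpt become '???'."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "???") for codon in codons]


def substitute(dna, index, base):
    """Return a copy of dna with the base at `index` (0-based) replaced."""
    return dna[:index] + base + dna[index + 1:]


reference = "AAG"                        # lysine codon
variant = substitute(reference, 1, "C")  # single-base change in the middle position

print(reference, translate(reference))
print(variant, translate(variant))
```

The reference codon translates to lysine and the variant to threonine, so one changed base is enough to change the amino acid at that point in the enzyme.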

SNPs arise all the time in DNA through mutations, e.g. copy errors of various kinds. Mostly they so damage the gene that it can’t function properly; the organism dies without reproducing, and thus natural selection ‘purifies’ the genome. In some places, though, a SNP merely alters the function slightly and creates variation between individuals. Most traits such as height, intelligence and personality are under the control of many different alleles ‘of small effect’, so looking at just one SNP won’t tell you much.

The human genome contains roughly 20,000 protein-coding genes spread across three billion base pairs on 23 chromosome pairs. One of these pairs is the sex-determining chromosome pair: XX (you’re a girl) or XY (you’re a boy). In each chromosome pair you get one of the chromosomes from your father and the other from your mother. However, each of these chromosomes has itself been randomised from the two corresponding chromosomes in each of your parents through a process called recombination (except for the Y chromosome, which has no matching partner to recombine with in males and is thus handed on from father to son essentially unchanged).

Humans have at least ten million SNPs – the number increases with research. Many studies have looked at people with medical conditions and tried to work out if they have some specific SNPs which non-sufferers lack. When you have a genome analysis – as with 23andMe – they put your sample through a chip (made by a company called Illumina) which tests for about a million SNPs, reflecting those currently considered important and significant by the research community. You can download your personal raw data from the 23andMe website and here’s mine (text version; Excel spreadsheet version - download, don't try to preview, they're too large). Be warned, the raw data lists around a million SNPs and occupies 20 to 30 MB. It takes a while to load, and it’s completely meaningless by itself.

Here’s some background on human genome SNPs. There’s a standard database called SNPedia which centralises what’s known about each unique SNP, indexed by its reference number (such as rs1234 - a tutorial SNP); the rs numbers themselves are assigned by NCBI’s dbSNP database. Here is what the 23andMe raw data looks like (just the first few entries!):
# This data file generated by 23andMe at: Tue Apr 23 09:13:29 2013
# Below is a text version of your data.  Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# More information on reference human assembly build 37 (aka Annotation Release 104):
# rsid chromosome position genotype
rs4477212 1 82154 AA
rs3094315 1 752566 AA
rs3131972 1 752721 GG
rs12124819 1 776546 AA
rs11240777 1 798959 GG
rs6681049 1 800007 CC
rs4970383 1 838555 AC
rs4475691 1 846808 CT
rs7537756 1 854250 AG
rs13302982 1 861808 GG
rs1110052 1 873558 GT
rs2272756 1 882033 AG
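
The file format described in the header above is simple enough to read with a few lines of Python. This is only a sketch under assumptions: it takes the header at its word that fields are TAB-separated and that comment lines start with '#', and the names `read_genotypes`, `SAMPLE`, `by_rsid` and `heterozygous` are illustrative. It uses a four-line sample rather than the real file.

```python
import csv
import io

# A few lines in the 23andMe raw-data layout, with TAB-separated fields
# as the file header states. In practice you would open the downloaded file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAA\n"
    "rs4970383\t1\t838555\tAC\n"
    "rs4475691\t1\t846808\tCT\n"
)


def read_genotypes(handle):
    """Yield (rsid, chromosome, position, genotype), skipping '#' comment lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        rsid, chromosome, position, genotype = row
        yield rsid, chromosome, int(position), genotype


records = list(read_genotypes(io.StringIO(SAMPLE)))

# Look up a genotype by its rs number.
by_rsid = {rsid: genotype for rsid, _, _, genotype in records}

# Heterozygous calls: the two letters of the genotype differ.
heterozygous = [rsid for rsid, _, _, g in records if len(g) == 2 and g[0] != g[1]]

print(by_rsid["rs4970383"])
print(heterozygous)
```

Chromosome is kept as a string because the real file also contains non-numeric chromosome labels such as X and Y.
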
So now we come to Promethease. This is a self-service program which links your raw data to the current scientific literature. For $5 you get a report which tells you what is known about the unique set of SNPs which define you (at least as far as 23andMe presently go - some way short of a full genome analysis which is still too expensive).

Promethease has a reputation as being difficult to use; it is not. Here’s the YouTube video which I watched and then knew exactly what to do. It was no problem at all.

And here’s the report I got back (zipped, 40 MB). It’s basically not too hard to interpret and the help links are good. I found Medical Conditions particularly informative once I looked at the help link to understand the graphics.

As I expected, there are few surprises. I was pleased to be in the 12% where exercise actually loses you weight. And of the SNPs currently known to be associated with autism, I have only a few. Testicular cancer – not so good.

Do you want to know more?