2.1 Genotyping arrays


The majority of existing datasets held by commercial direct-to-consumer companies such as 23andMe, as well as datasets for large genome-wide association studies, are derived from genotyping arrays. (However, this is changing. Within the next year there will be millions of NGS datasets available from multiple projects including the UK Biobank, All of Us, the Million Veteran Program, and others.) Note that you might also hear this technology referred to as SNP chips or SNP microarrays.

While each copy of our genome is around 3 billion base pairs long, the majority of that sequence is identical between individuals. In fact, two copies of the genome tend to differ at only about 1 in 1,000 bases. We will refer to these single base pair differences as SNPs, or Single Nucleotide Polymorphisms. Some SNPs are very common, i.e. the same variant may be found in a large percentage of people. Other SNPs are rare, meaning they are found in only a few people. We will talk much more about the nomenclature and properties of SNPs in the next module.
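As a rough back-of-the-envelope check, the two figures above imply that any two copies of the genome differ at a few million positions. The short sketch below just multiplies them out; both constants are the approximate values quoted in the text, not precise measurements.

```python
# Approximate figures quoted in the text (not precise measurements).
GENOME_LENGTH_BP = 3_000_000_000   # about 3 billion base pairs per genome copy
BASES_PER_DIFFERENCE = 1_000       # about one differing base per 1,000 bases

# Expected number of single-base differences between two genome copies.
expected_differences = GENOME_LENGTH_BP // BASES_PER_DIFFERENCE
print(f"{expected_differences:,}")  # prints 3,000,000
```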

The main idea of genotyping arrays is to save costs by analyzing only a small subset of the genome. Work from the early days following the Human Genome Project (see the HapMap Project) found that the majority of common genetic variation in humans could be captured by around 500,000 SNPs. Note that many more common SNPs exist, but many of them are highly correlated with each other and thus provide redundant information. So, rather than sequencing the entire genome of many people, we can do targeted analysis of these known “tag” SNPs.
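To make the idea of redundancy concrete, here is a minimal sketch of picking tag SNPs from toy data. It is an illustration only, not the procedure used by HapMap or by any array vendor. The function names, the toy genotypes, and the r² cutoff of 0.8 are all assumptions made for this example. Each SNP is stored as the number of copies of one allele (0, 1, or 2) carried by each person, and a SNP is kept only if no SNP kept so far already predicts it well.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a SNP with no variation cannot be correlated with anything
    return cov * cov / (var_x * var_y)


def pick_tag_snps(dosages, threshold=0.8):
    """Keep a SNP only if no already-kept tag SNP reaches the r^2 threshold with it.

    The threshold of 0.8 is an assumed cutoff for this sketch.
    """
    tags = []
    for name, values in dosages.items():
        if all(r_squared(values, dosages[t]) < threshold for t in tags):
            tags.append(name)
    return tags


# Hypothetical toy data: allele dosages for six people at four SNPs.
dosages = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1, so redundant
    "snp3": [2, 1, 0, 2, 1, 0],  # mirror image of snp1, still perfectly correlated
    "snp4": [0, 0, 1, 2, 2, 1],  # uncorrelated with snp1
}

tag_snps = pick_tag_snps(dosages)
print(tag_snps)  # prints ['snp1', 'snp4']
```

In this toy example, genotyping snp1 and snp4 is enough to recover all four SNPs, which is the same logic that lets an array with a few hundred thousand probes stand in for a much larger set of common variants.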

To perform SNP genotyping using microarray technology, DNA is fragmented and then hybridized to a large array containing hundreds of thousands of different probes (short sequences). The majority of SNPs have only two possible bases (alleles) that are common in the population. For each SNP, the array has two probes, one designed to capture each of these two possible bases at the site. By comparing the intensities of the two probes (let’s call them “A” and “B”), we can determine whether a person is homozygous for A (AA), homozygous for B (BB), or heterozygous (AB).
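The step from probe intensities to a genotype can be sketched as follows. This is a deliberately simplified illustration: production genotype callers typically cluster intensities across many samples rather than applying fixed cutoffs. The function name `call_genotype`, the cutoff values, and the example intensities are all hypothetical.

```python
def call_genotype(intensity_a, intensity_b, low=0.25, high=0.75, min_signal=0.2):
    """Call AA, AB, or BB from the two probe intensities at one SNP.

    All thresholds are assumed values chosen for this sketch.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "NC"  # "no call": too little signal from either probe
    b_fraction = intensity_b / total  # share of the signal coming from probe B
    if b_fraction < low:
        return "AA"  # almost all signal from probe A
    if b_fraction > high:
        return "BB"  # almost all signal from probe B
    return "AB"      # comparable signal from both probes


# Hypothetical (A, B) intensity pairs for four SNPs in one person.
intensities = [(0.95, 0.05), (0.50, 0.45), (0.04, 0.90), (0.05, 0.03)]
calls = [call_genotype(a, b) for a, b in intensities]
print(calls)  # prints ['AA', 'AB', 'BB', 'NC']
```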

Most genotyping arrays target between 500,000 and 1.5 million SNPs. Commonly used arrays from Affymetrix and Illumina, as well as 23andMe’s custom chip, have around 600,000. Larger arrays exist. For example, the Illumina Human Omni2.5S-8 chip captures around 2.5 million SNPs. There are also custom arrays designed to give higher resolution in particular regions of interest in the genome. For example, the “Human Origins” chip available from ThermoFisher is specifically designed for the study of human history, and the “immunochip” from Illumina has dense coverage of SNPs near loci associated with major autoimmune and inflammatory diseases.