Introduction

IGB can be used to visualize data from Affymetrix showing the locations of GeneChip Expression Array probe sets aligned to a genome.

Chip design procedures have varied somewhat from chip to chip, but, in general, probes on Affymetrix's standard commercial expression microarrays (such as the Human U133 chip) are designed to match the 3' regions of known or computed mRNA sequences.  These probes are typically 18 to 25 bases long depending on the chip.

In standard arrays, probes are grouped conceptually into probe sets.  Each probe set is a group of probes expected to measure expression of one individual known or computationally deduced mRNA molecule.

Probe sequences are typically selected from the 3' regions of the mRNA sequences used to design each probe set (the "design sequences") in order to maximize the amount of sample mRNA to be measured.  (This so-called "3-prime bias" will most likely change as sample preparation protocols improve over time.)

These design sequences may be identical to known mRNA sequences in GenBank, or they may have been produced computationally by merging ESTs or mRNA sequences into a single sequence, sometimes called a "consensus" sequence. 

IGB can be used to visualize the location of design sequences and probes within the genomic sequence.  Seeing where these sequences are located within the genome can be extremely useful when several overlapping probe sets recognize diverse mRNAs originating from the same gene.  Around 60% of human genes produce multiple variants, and these variants often exhibit very complex configurations of exons and introns.  As a result, it often helps to view a diagram of these structures together with the probes in order to determine which individual mRNAs a given probe set detects.

To make this easier, IGB shows where the design sequences and their probes align to the genome together with alignments between the genomic sequence and known or predicted mRNAs.  By showing probes, known mRNAs, and design sequences in the same view, IGB makes it possible to determine quickly which known mRNA a probe set could detect.
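
The comparison that the view makes possible by eye can also be sketched in code.  The Python example below is only an illustration under stated assumptions, not a description of how IGB works internally: the function names, the variant names, and all coordinates are hypothetical, and each alignment is reduced to simple half-open genomic intervals.  For each mRNA variant, it counts how many probes of a probe set fall entirely within one of that variant's exons.

```python
from typing import Dict, List, Tuple

# Half-open genomic interval [start, end); coordinates are hypothetical.
Interval = Tuple[int, int]


def probe_in_exon(probe: Interval, exons: List[Interval]) -> bool:
    """Return True if the probe lies entirely within a single exon."""
    p_start, p_end = probe
    return any(e_start <= p_start and p_end <= e_end for e_start, e_end in exons)


def probes_matched_per_transcript(
    probes: List[Interval], transcripts: Dict[str, List[Interval]]
) -> Dict[str, int]:
    """Count, for each transcript, the probes contained in one of its exons."""
    return {
        name: sum(probe_in_exon(probe, exons) for probe in probes)
        for name, exons in transcripts.items()
    }


# Two made-up mRNA variants of one gene, each given as a list of exons.
transcripts = {
    "variant_A": [(100, 200), (400, 500), (800, 1000)],
    "variant_B": [(100, 200), (800, 900)],
}

# Four made-up 25-base probes from one probe set, near the 3' end.
probes = [(820, 845), (860, 885), (930, 955), (960, 985)]

counts = probes_matched_per_transcript(probes, transcripts)
print(counts)  # {'variant_A': 4, 'variant_B': 2}
```

In this made-up case, all four probes fall within exons of variant_A but only two fall within exons of variant_B, which suggests the probe set would detect variant_A more completely.  The sketch deliberately ignores probes that span an exon-exon junction; such probes align to the genome in two blocks and would need each block checked separately.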

However, it is important to note that these genome alignments can only be as good as the genomic sequence on which they are based.  Genomes vary in quality.  For example, the human and fruit fly genomes are very high quality, having gone through several releases involving many refinements.  More recently sequenced genomes (such as mouse or rat) are not yet as reliable.  For the less-refined genomes, there may be many design or mRNA sequences that do not align anywhere to the genomic sequence or that align only imperfectly.  Readers who want to learn more about these issues can follow this link:   http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=12149135/.