Introduction

This page presents images from the IGB software showing a variety of data types and data sets. Most of these images are from data sets created in the Loraine Lab. Some examples (e.g., the tiling array examples) come from data harvested from the Gene Expression Omnibus.

Edge matching ESTs to alternative splicing

Edge matching is a specialized visualization technique for genomics data that makes it easy to compare boundaries. It is especially useful for genes that undergo alternative splicing: overlapping EST alignments or RNA-Seq data show how much support each individual splicing choice has. The example below shows a visualization in IGB 6.6, which introduces a new iconography for gene models that uses arrows to indicate strand. (Arrows pointing right mean the annotation is on the plus (forward) strand, and arrows pointing left indicate minus (reverse) strand annotations.) Note that in this case one of the gene models suggests an exon skipping event, but only one overlapping EST supports it. Also note that the track label font is enlarged for legibility. (This comes from feature request 3191400, submitted by a user on the IGB forum.)
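The comparison that edge matching makes visible can be sketched in a few lines of Python. This is not IGB's implementation. It is a minimal illustration that assumes each annotation is a list of (start, end) exon intervals; the function names and all coordinates are hypothetical.

```python
def edges(exons):
    """Return the set of all boundary coordinates in a list of (start, end) exons."""
    return {pos for exon in exons for pos in exon}


def shared_edges(selected, other):
    """Return the boundaries of `other` that coincide with boundaries of `selected`."""
    return sorted(edges(selected) & edges(other))


def introns(exons):
    """Return the set of (donor, acceptor) gaps between consecutive exons."""
    exons = sorted(exons)
    return {(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])}


def shared_introns(selected, other):
    """Return the introns of `other` that exactly match an intron of `selected`."""
    return introns(selected) & introns(other)


# Hypothetical coordinates: a gene model that skips a middle exon,
# one EST that makes the same splicing choice, and one that includes the exon.
gene_model = [(100, 200), (500, 600)]
est_skip = [(150, 200), (500, 580)]
est_include = [(150, 200), (300, 400), (500, 560)]

print(shared_edges(gene_model, est_include))    # [200, 500]
print(shared_introns(gene_model, est_skip))     # {(200, 500)}
print(shared_introns(gene_model, est_include))  # set()
```

Both ESTs share individual edges with the gene model, but only the first shares the full skipping junction, which is the kind of distinction a viewer reads off an edge-matched display.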

Viewing RNA-Seq and expression tiling array data in the same view in IGB 6.3

In this image, you see three different Arabidopsis expression tiling array data sets corresponding to cold (GSM243694), high salt (GSM243703), and drought (GSM243707) treatments assayed using the Affymetrix AtTile1R tiling array platform. The data were loaded in simple graph format, where probe intensities are shown as vertical bars. The graphs were then configured via the Graph Adjuster tab's Graph Thresholding option to display a bar underneath groups of consecutive probes with intensity values above a certain threshold. Note how the bars seem to correspond to known exons in the gene models displayed in the TAIR9 track. At the top of the display are short read Illumina RNA-Seq data (75 bases per read) from plants undergoing severe drought stress. Graph thresholding seems to suggest that this gene contains a previously undiscovered five prime exon. However, the RNA-Seq data contain no reads that support this idea. This example illustrates some of the ambiguities that can arise when using data from high-throughput methods such as tiling arrays and Illumina sequencing. The data sets are enormous and noisy, so purely through chance we will observe at least some genes adjacent to what tiling arrays seem to suggest are unannotated exons. If the RNA-Seq data supported the idea that this region of high probe intensity is indeed an exon, then we would be much more likely to believe the conclusion, because the odds of two entirely different expression measurement technologies giving the same spurious result are very small.
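The thresholding step described above, drawing a bar under groups of consecutive probes whose intensity exceeds a threshold, can be sketched as follows. This is an illustrative approximation and not IGB's code. The `min_run` parameter, the function name, and the probe positions and intensities are assumptions made for the example.

```python
def thresholded_runs(positions, intensities, threshold, min_run=2):
    """Return (first_pos, last_pos) spans for runs of consecutive probes above threshold.

    A run is reported only if it contains at least `min_run` probes.
    """
    runs = []
    run_start = None  # position of the first probe in the current run
    run_len = 0       # number of probes in the current run
    last_pos = None   # position of the most recent probe in the current run

    for pos, value in zip(positions, intensities):
        if value > threshold:
            if run_start is None:
                run_start = pos
                run_len = 0
            run_len += 1
            last_pos = pos
        else:
            # A probe at or below the threshold closes any open run.
            if run_start is not None and run_len >= min_run:
                runs.append((run_start, last_pos))
            run_start = None

    # Close a run that extends to the last probe.
    if run_start is not None and run_len >= min_run:
        runs.append((run_start, last_pos))
    return runs


# Hypothetical probe positions and intensities.
positions = [10, 45, 80, 115, 150, 185, 220]
intensities = [2.0, 9.1, 8.7, 9.5, 1.2, 8.8, 1.0]

print(thresholded_runs(positions, intensities, threshold=8.0))  # [(45, 115)]
```

The single high probe at position 185 is not reported because it does not form a run, which is one simple way to suppress isolated noisy probes.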


Visualizing RNA-Seq reads aligned onto a genome

This image from IGB shows short read sequences aligned onto the Arabidopsis A_thaliana_Jun_2009 (TAIR9) genome. The track above the Coordinates track presents parts of two Arabidopsis gene models, with sequence data loaded. Note how the reads seem to support two alternative splicing variations in the overlapping gene. Also note the nucleotide differences between the aligned reads (upper track) and the reference sequence.
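The nucleotide differences highlighted in the read track amount to a base-by-base comparison between each read and the reference. A minimal sketch, assuming an ungapped alignment and 0-based coordinates, is shown here; the function name and the sequences are hypothetical.

```python
def mismatches(reference, read, offset):
    """Return (position, ref_base, read_base) for each base where the read differs.

    `offset` is the 0-based reference position where the ungapped read starts.
    """
    window = reference[offset:offset + len(read)]
    return [
        (offset + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base.upper() != read_base.upper()
    ]


# Hypothetical reference and read; the read differs at one base.
reference = "ACGTACGTACGT"
read = "TACCTA"

print(mismatches(reference, read, offset=3))  # [(6, 'G', 'C')]
```

Spliced reads and reads with insertions or deletions would need the alignment's gap information as well, which this sketch leaves out.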

Visualizing output from TopHat and BowTie

The image below shows output from TopHat and BowTie, programs that align short reads from Illumina sequencing experiments onto a genome. The top two tracks are from a BED format file that TopHat creates, in which each line represents a splicing choice and the score field indicates the number of reads that supported that choice. The two floating graphs (red and green) are from WIG files the programs produce that summarize the number of reads covering individual regions. In this particular experiment, the control sample (green) produced many more reads overall than the treatment sample (red). The fact that more reads nonetheless cover this gene in the treatment (red) sample suggests that the gene is up-regulated under the treatment.
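To make the junction file concrete, the sketch below reads BED lines of the kind TopHat writes for splice junctions and reports each intron with its supporting read count. It assumes the standard 12-column BED layout with two blocks per junction, where the blocks are the flanking read overhangs and the intron lies between them. The function name and the example line are hypothetical, not taken from the experiment shown.

```python
def parse_junctions(bed_text):
    """Yield (chrom, intron_start, intron_end, strand, read_count) per junction line."""
    for line in bed_text.splitlines():
        # Skip blank lines and header or comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        score, strand = int(fields[4]), fields[5]
        block_sizes = [int(size) for size in fields[10].rstrip(",").split(",")]
        # The intron is the gap between the first and last block.
        yield chrom, start + block_sizes[0], end - block_sizes[-1], strand, score


# Hypothetical junction: overhangs of 50 and 40 bases, supported by 12 reads.
example = "\n".join([
    'track name=junctions description="TopHat junctions"',
    "\t".join(["chr1", "1000", "1500", "JUNC00000001", "12", "+",
               "1000", "1500", "255,0,0", "2", "50,40", "0,460"]),
])

for junction in parse_junctions(example):
    print(junction)  # ('chr1', 1050, 1460, '+', 12)
```

Because the score is a raw read count, comparing scores between samples is only meaningful after accounting for the total number of reads each sample produced.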

