Introduction

This page presents images from the IGB software showing a variety of data types and data sets. Most of these images are from data sets created in the Loraine Lab. Some examples (e.g., the tiling array examples) come from data harvested from the Gene Expression Omnibus.

Viewing RNA-Seq and tiling array data in the same view

In this image, you see three tiling array data sets corresponding to cold (GSM..), high salt (GSM..), and drought (GSM..) treatments, assayed using the Affymetrix AtTile1R tiling array platform. The data were loaded and the graphs configured so that groups of consecutive probes with intensity values above a threshold are marked with a solid bar, suggesting that these probes correspond to exons. At the top of the display are short-read Illumina RNA-Seq data (75 bases per read) from plants undergoing severe drought stress. Note that graph thresholding seems to suggest that this gene contains a previously undiscovered five prime exon. However, the RNA-Seq data contain no reads that support this idea. This example illustrates some of the ambiguities that can arise when using data from high-throughput methods such as tiling arrays and Illumina sequencing. The data sets are enormous, so purely by chance we will observe at least some genes adjacent to what tiling arrays seem to suggest are unannotated exons. If the RNA-Seq data had supported the idea that this region of high probe intensity is an exon, we would be much more likely to believe it, because two entirely different expression measurement technologies are unlikely to give the same spurious result. If the errors of the two methods are independent, the probability that both produce the same false signal is the product of the two individual probabilities, which is far smaller than either one alone.
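The thresholding idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not IGB's own implementation: the function name `runs_above_threshold`, the `min_run` parameter, and the probe positions and intensities are all hypothetical.

```python
from typing import List, Tuple


def runs_above_threshold(
    positions: List[int],
    intensities: List[float],
    threshold: float,
    min_run: int = 2,
) -> List[Tuple[int, int]]:
    """Return (first_position, last_position) for each run of at least
    `min_run` consecutive probes whose intensity exceeds `threshold`.

    Hypothetical helper for illustration; IGB's thresholding options differ.
    """
    runs = []
    run_start = None  # index of the first probe in the current run, if any
    for i, value in enumerate(intensities):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at the previous probe.
            if run_start is not None and i - run_start >= min_run:
                runs.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the last probe.
    n = len(intensities)
    if run_start is not None and n - run_start >= min_run:
        runs.append((positions[run_start], positions[n - 1]))
    return runs


# Made-up probe start positions and intensities, for illustration only.
probe_positions = [100, 135, 170, 205, 240, 275, 310]
probe_intensities = [2.0, 9.5, 10.1, 8.7, 1.5, 9.0, 2.2]

print(runs_above_threshold(probe_positions, probe_intensities, threshold=8.0))
# → [(135, 205)]
```

With `min_run=2`, the single high probe at position 275 is not reported. Requiring several consecutive probes reduces, but does not eliminate, spurious "exons" of the kind discussed above.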


Visualizing RNA-Seq reads aligned onto a genome

This image from IGB shows short read sequences aligned onto the Arabidopsis A_thaliana_Jun_2009 (TAIR9) genome. The track above the Coordinates track presents parts of two Arabidopsis gene models, with sequence data loaded. Note how the reads seem to support two alternative splicing variants in the overlapping gene. Also note the nucleotide differences between the aligned reads (upper track) and the reference sequence.
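The nucleotide differences highlighted in a view like this amount to a base-by-base comparison of each read with the reference. A minimal sketch, assuming an ungapped alignment with a 0-based start offset; the function name `find_mismatches` and the sequences are hypothetical and are not taken from IGB or from the data shown:

```python
from typing import List, Tuple


def find_mismatches(
    reference: str, read: str, read_start: int
) -> List[Tuple[int, str, str]]:
    """Compare an ungapped read against the reference.

    Returns (reference_position, reference_base, read_base) for each
    mismatch. Positions are 0-based. Gapped or spliced alignments are
    not handled in this sketch.
    """
    if read_start < 0 or read_start + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference")
    mismatches = []
    for offset, read_base in enumerate(read):
        ref_base = reference[read_start + offset]
        if read_base.upper() != ref_base.upper():
            mismatches.append((read_start + offset, ref_base, read_base))
    return mismatches


# Made-up sequences, for illustration only.
reference_seq = "ACGTACGTAC"
read_seq = "GTTCG"  # aligned at reference position 2

print(find_mismatches(reference_seq, read_seq, 2))
# → [(4, 'A', 'T')]
```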

Visualizing output from TopHat and Bowtie

The image below shows output from TopHat and Bowtie, programs that align short reads from Illumina sequencing experiments onto a genome. The top two tracks are from a BED format file that TopHat creates, in which each line represents a splicing choice and the score field indicates the number of reads that support that choice. The two floating graphs (red and green) are from WIG files the programs produce that summarize the number of reads covering individual regions. In this particular experiment, the control sample (green) produced many more reads overall than the treatment sample (red). The fact that more reads nevertheless cover this gene in the treatment (red) sample therefore suggests that the gene is up-regulated under the treatment.
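The two ideas in this paragraph, reading the support count from a junction line and comparing samples with different sequencing depth, can be sketched as follows. This assumes a tab-delimited BED line with the score in the fifth column, as in the standard BED layout. The function names, the example line, and all read counts are hypothetical and are not taken from the experiment shown.

```python
from typing import Tuple


def junction_support(bed_line: str) -> Tuple[str, int, int, str, int]:
    """Parse the first five BED columns of a junction line.

    Returns (chrom, start, end, name, score), where score is read here
    as the number of reads supporting the junction.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, name, score = fields[:5]
    return chrom, int(start), int(end), name, int(score)


def reads_per_million(reads_over_gene: int, total_reads: int) -> float:
    """Scale a gene's read count by the sample's total read count."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_over_gene * 1_000_000 / total_reads


# A made-up junction line, for illustration only.
example_line = "chr1\t1000\t2000\tJUNC00000001\t12\t+"
print(junction_support(example_line))
# → ('chr1', 1000, 2000, 'JUNC00000001', 12)

# Made-up counts: the control library is four times larger than the
# treatment library, yet fewer of its reads cover the gene.
control_rpm = reads_per_million(reads_over_gene=200, total_reads=20_000_000)
treatment_rpm = reads_per_million(reads_over_gene=300, total_reads=5_000_000)
print(control_rpm, treatment_rpm)
# → 10.0 60.0
```

Scaling by library size makes the difference between samples explicit. When the treatment sample has both fewer total reads and more reads over the gene, normalization only widens the gap, which is why the raw graphs already point toward up-regulation.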
