Introduction

This page presents images from the IGB software showing a variety of data types and data sets. Most of these images are from data sets created in the Loraine Lab. Some examples (e.g., the tiling array examples) come from data harvested from the Gene Expression Omnibus.

RNA-Seq data reveal a high level of alternative splicing in Arabidopsis thaliana GRP7 (AT2G21660)

This image from IGB 6.6 shows how IGB can capture a rich data scene and convey details about alternative splicing. The image depicts Arabidopsis thaliana GRP7 (AT2G21660), which encodes "a small glycine-rich RNA binding protein" that "regulates..circadian oscillations of its own transcript." (Functional annotation comes from The Arabidopsis Information Resource locus page for AT2G21660.) In several RNA-Seq data sets of seedlings harvested at diverse ages, GRP7 RNA appeared among the ten most highly expressed transcripts. Moreover, the splicing pattern of GRP7 is highly diverse, as depicted in the image below. Here, junctions predicted by the TopHat spliced alignment program are shown together with the number of reads supporting each junction; this number comes from the "score" field of the junctions.bed file TopHat produces.

To generate this image, we selected genome version A_thaliana_Jun_2009 and displayed the TAIR10 All data set from the IGB QuickLoad data sources. We opened a file containing TopHat junctions and loaded features overlapping the GRP7 region. We configured IGB to merge minus and plus strand junction features into a single track and used the Strand option under Track Preferences to color the TopHat junction features by strand. We chose a dark red shade to indicate plus strand junction features and a dark blue shade for minus strand features.

Gene models are shown in green with arrows on the intronic regions to indicate the direction of transcription. Note how the 3-prime intron undergoes a rich pattern of alternative splicing, whereas the upstream intron appears to undergo a more restricted pattern of splicing, for which TopHat predicted only two alternative introns.
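For readers who want to inspect the read support outside of IGB, the following is a minimal sketch of reading the "score" field from a TopHat junctions file. It assumes the file follows the 12-column BED layout, where each junction is drawn as two flanking blocks and the intron lies between them. The `Junction` type, the `parse_junctions` function, and the example lines are invented for illustration and are not part of IGB or TopHat.

```python
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # 0-based, first intron base
    intron_end: int    # exclusive
    name: str
    reads: int         # BED "score" field: reads supporting the junction
    strand: str


def parse_junctions(lines: Iterable[str]) -> List[Junction]:
    """Parse BED12 junction lines, skipping track and comment lines."""
    junctions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        # Assumption: the intron spans the gap between the first and last block.
        junctions.append(
            Junction(
                chrom=fields[0],
                intron_start=start + block_sizes[0],
                intron_end=end - block_sizes[-1],
                name=fields[3],
                reads=int(fields[4]),
                strand=fields[5],
            )
        )
    return junctions


# Invented example lines; real files come from a TopHat run.
example = [
    'track name=junctions description="TopHat junctions"',
    "chr2\t100\t400\tJUNC00000001\t25\t-\t100\t400\t255,0,0\t2\t50,40\t0,260",
    "chr2\t120\t450\tJUNC00000002\t3\t-\t120\t450\t255,0,0\t2\t30,40\t0,290",
]

for j in sorted(parse_junctions(example), key=lambda j: j.reads, reverse=True):
    print(j.name, j.strand, j.intron_start, j.intron_end, j.reads)
```

Sorting by the read count makes it easy to see which splicing choices dominate and which are supported by only a handful of reads.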

Counting reads in a region with no annotated gene

This image illustrates using the selection tool to select a set of reads over a region of the Arabidopsis genome with no annotated protein-coding gene. Note that the number of selected items appears in the lower left corner of the IGB window. The version of IGB is 6.5, released in April 2011.
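The same count can be approximated outside of IGB. The sketch below counts aligned reads that overlap a region, assuming reads are given as half-open (start, end) intervals on a single chromosome. The function name and the coordinates are hypothetical, and IGB's selection tool may apply different rules for what counts as selected.

```python
from typing import Iterable, Tuple


def count_overlapping(reads: Iterable[Tuple[int, int]],
                      region_start: int, region_end: int) -> int:
    """Count reads whose half-open interval overlaps [region_start, region_end)."""
    return sum(
        1 for start, end in reads
        if start < region_end and end > region_start
    )


# Invented read coordinates, 75 bases each.
reads = [(10, 85), (90, 165), (200, 275), (300, 375)]
print(count_overlapping(reads, 80, 210))
```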

Edge matching ESTs to alternative splicing

Edge matching is a specialized visualization technique for genomics data that makes it easy to compare boundaries. Edge matching is especially useful for genes that undergo alternative splicing. We can assess the degree to which individual splicing choices are made by looking at overlapping data from EST alignments or RNA-Seq data. The example below shows a visualization in IGB 6.6, which introduces a new iconography for gene models that uses arrows to indicate strand. (Arrows pointing right mean the annotation is on the plus (forward) strand and arrows pointing left indicate minus strand annotations.) Note that in this case one of the gene models suggests an exon skipping event, but only one overlapping EST supports it. Also note that the track label font is enlarged for legibility. (This comes from feature request 3191400 submitted by a user on the IGB forum.)
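The idea behind edge matching can be sketched in a few lines: derive the intron boundaries implied by each gene model, then list the alignments that share exactly those boundaries. This is an illustration of the concept only, not IGB's implementation. The function names, the two gene models, and the EST coordinates are all invented.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # half-open (start, end) of an exon or aligned block


def introns(blocks: List[Block]) -> List[Block]:
    """Return the introns implied by a list of exon or alignment blocks."""
    ordered = sorted(blocks)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


def supporting_alignments(model_exons: List[Block],
                          alignments: Dict[str, List[Block]]) -> Dict[Block, List[str]]:
    """Map each intron of a gene model to the alignments with matching edges."""
    support: Dict[Block, List[str]] = {i: [] for i in introns(model_exons)}
    for name, blocks in alignments.items():
        for intron in introns(blocks):
            if intron in support:
                support[intron].append(name)
    return support


# Invented data: one model includes the middle exon, the other skips it.
full_model = [(100, 200), (300, 400), (500, 600)]
skip_model = [(100, 200), (500, 600)]
ests = {
    "est1": [(150, 200), (300, 400), (500, 550)],
    "est2": [(120, 200), (500, 580)],
    "est3": [(180, 200), (300, 350)],
}

print(supporting_alignments(full_model, ests))
print(supporting_alignments(skip_model, ests))
```

In this made-up case only est2 shares both edges of the exon-skipping intron, which mirrors the kind of situation the image illustrates.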

Viewing RNA-Seq and expression tiling array data in the same view in IGB 6.3

In this image, you see three different Arabidopsis expression tiling array data sets corresponding to cold (GSM243694), high salt (GSM243703), and drought (GSM243707) treatments assayed using the Affymetrix AtTile1R tiling array platform. The data were loaded in simple graph format, where probe intensities are shown as vertical bars. The graphs were then configured via the Graph Adjuster tab's Graph Thresholding option to display a bar underneath groups of consecutive probes with intensity values above a certain threshold. Note how the bars seem to correspond to known exons in the gene models displayed in the TAIR9 track. At the top of the display are short read Illumina RNA-Seq data (75 bases per read) from plants undergoing severe drought stress. Note that graph thresholding seems to suggest that this gene contains a previously undiscovered 5-prime exon. However, the RNA-Seq data contain no reads that support this idea. This example illustrates some of the ambiguities that can arise from using data from high-throughput methods such as tiling arrays and Illumina sequencing. Consider that the data sets are enormous and noisy! Thus, purely through chance we will observe at least some genes adjacent to what tiling arrays seem to suggest are unannotated exons. If the RNA-Seq data supported the idea that this region of high probe intensity is indeed an exon, then we would be much more likely to believe the conclusion, because the odds of two entirely different expression measurement technologies giving the same spurious result are very small.
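The thresholding step can be illustrated with a simplified sketch that reports runs of consecutive probes whose intensity exceeds a threshold. It is not IGB's Graph Thresholding code, which may offer more options than shown here. The function name, the `min_probes` parameter, and the probe values are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def threshold_runs(positions: Sequence[int], values: Sequence[float],
                   threshold: float, min_probes: int = 2) -> List[Tuple[int, int]]:
    """Return (first_position, last_position) for each run of at least
    min_probes consecutive probes with intensity above threshold."""
    runs: List[Tuple[int, int]] = []
    current: List[int] = []
    for pos, val in zip(positions, values):
        if val > threshold:
            current.append(pos)
            continue
        if len(current) >= min_probes:
            runs.append((current[0], current[-1]))
        current = []
    # Close a run that extends to the last probe.
    if len(current) >= min_probes:
        runs.append((current[0], current[-1]))
    return runs


# Invented probe start positions and intensities.
positions = [0, 35, 70, 105, 140, 175, 210, 245]
values = [1.0, 9.0, 8.0, 7.0, 1.0, 9.0, 1.0, 1.0]
print(threshold_runs(positions, values, threshold=5.0))
```

Requiring more than one probe per run is one simple way to suppress isolated noisy probes, although, as the text notes, chance runs will still occur in data sets this large.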


Visualizing RNA-Seq reads aligned onto a genome

This image from IGB shows short read sequences aligned onto the Arabidopsis A_thaliana_Jun_2009 (TAIR9) genome. The track above the Coordinates track presents part of two Arabidopsis gene models, with sequence data loaded. Note how the reads seem to support two alternative splicing variations in the overlapping gene. Also note the nucleotide differences between the aligned reads (upper track) and the reference sequence.

Visualizing output from TopHat and BowTie

The image below shows output from TopHat and BowTie, programs that align short reads from Illumina sequencing experiments onto a genome. The top two tracks are from a BED format file that TopHat creates, in which each line represents a splicing choice and the score field indicates the number of reads that supported that choice. The two floating graphs (red and green) are from WIG files the programs produce that summarize the number of reads covering individual regions. In this particular experiment, the control sample (green) produced many more reads overall than the treatment sample (red), so the fact that more reads cover this gene in the treatment (red) sample tells us this gene is up-regulated under the treatment.
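When two samples differ in the total number of reads they produced, raw coverage is easier to compare after scaling by library size. The sketch below uses a simple reads-per-million scaling. The function name and all counts are invented for illustration and do not come from the experiment shown.

```python
def reads_per_million(gene_reads: int, total_reads: int) -> float:
    """Scale a gene's read count by the sample's total read count."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return gene_reads * 1_000_000 / total_reads


# Invented counts: the control library is four times larger than the treatment.
control = reads_per_million(gene_reads=200, total_reads=20_000_000)
treatment = reads_per_million(gene_reads=300, total_reads=5_000_000)

print(f"control: {control:.1f} reads per million")
print(f"treatment: {treatment:.1f} reads per million")
print(f"fold change (treatment / control): {treatment / control:.1f}")
```

With these made-up numbers, a modest difference in raw counts becomes a much larger difference once library size is taken into account, which is the reasoning the paragraph applies to the red and green graphs.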

