Introduction

A paper published in 2011 titled Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes describes a study that used RNA-Seq to characterize genes expressed in reproductive structures from maize.

About the data

The study sequenced fourteen libraries prepared from fourteen sample types, including pollen, ovules, tassels, anthers, etc. One replicate per sample type was included.

To find out more about the study,

In this tutorial, you'll learn how to load these data into IGB, view genome coverage graphs for a subset of samples, and identify genes that are expressed primarily in ovules, pollen, or other sample types.

To make these data available for visualization in IGB, we downloaded the sequence data from the Short Read Archive and then aligned the sequences onto a maize reference genome assembly using the spliced alignment tool TopHat. We used a program called wiggles (from the TopHat distribute) to generate coverage graph files for each sample. We then set up the files on IGB's QuickLoad site at http://www.igbquickload.org/quickload.

For a description of how we set up the data in the IGB QuickLoad system, please see this page from the IGB Developer's Guide:

What's a coverage graph?

A coverage graph in an RNA-Seq experiment is a type of genome graph that reports the number of reads overlapping regions in the genome. In a coverage graph, the x-coordinates are the genomic base pairs and the y coordinates represent the number of reads overlapping each genomic base.

In this experiment, graphs represent read counts; no normalization has been done. Thus simply comparing graph heights "by eye" can be misleading; you will need to take into account the fact that different samples produced different numbers of mapped reads. However, the number of mapped reads per sample type was similar across the different samples and so the relative heights of coverage graphs can give a reasonably good approximation of gene expression levels per sample.

Here is an example of the first few lines from a bedgraph file:

1    0    6    1
1    6    9    2
1    9    10    4
1    10    35    5
1    35    41    4

The first column reports the name of the sequence (chromosome 1), the next two columns give the start and end coordinates of the first region, and the third number reports the number of reads that overlap that region. Note that the coordinates are in interbase.

When we processed these data to make them available via IGB QuickLoad, we also computed read alignment distributions for each library. You can view the read alignment distribution files in the same Web directories where the coverage graphs and read alignment files reside. Click the IGB Quickload Data Source to view the data repository and navigate to this folder:

http://igbquickload.org/quickload/Z_mays_B73_Mar_2010/SRP006463/

Setting up

Follow these instructions to configure IGB to get started looking at gene expression in maize reproductive tissues.

Note that the gene models for maize should automatically load.

Now start opening coverage graphs from different samples from the RNA-Seq experiment.

The four data sets will automatically be added to the Data Management Table. Don't worry if you accidentally add a different data set. You can remove it by unchecking the box or clicking the red x.

Take a moment to customize the colors. Click the color squares in the Data Management Table (FG is foreground, BG is background) or choose File->Preferences->Tracks to open a new Window that let's you customize different aspects of the display at  several at once. Click each row in the table and give each row a white background. Choose colors for individual rows by clicking on them and then selecting options.

Tip: Simplify the track name to make the Track Labels easier to read.

Tip: Color the gene track black (foreground) and white (background) to make attractive images for slides and printing.

Setting IGB Preferences - The preferences window from IGB 6.7.0


Once you've set up the tracks, zoom in on a region.

IGB should then scroll and zoom to the region you entered. Next, click Load Data and and proceed from the first example below, after adjusting Y-axis scales as described in the next section.

TIP: You can also use IGB links from a Web site to navigate a genome in IGB. An IGB link looks like this: http://localhost:7085/UnibrowControl?seqid=1&end=61254530&start=51631058&version=Z_mays_B73_Mar_2010. When you click the link, IGB should scroll and zoom to the location indicated in the link, provided your computer allows access to non-standard ports. (This may not work on every computer.)

Once the data load, you'll see a view that looks a bit like the image below. To match the image below, you'll need need to rescale the graphs, add a y-axis, and so.

Note to advanced users: If you are running the high-memory version of IGB and have a fast internet connection, you can probably load all the data for chromosome 1 for all four graphs. However, if you try this and out of memory, you can always delete a graph (using the red X) and click the trashcan icon at the bottom right of IGB to force what is called "garbage collection," a programming term that means recovering memory.

Rescale the graphs, add y-axis

Use the Graph Adjuster tab to put all the graphs into the same scale, thus make it possible to compare them. Click the Graph Adjuster tab and then click the Select All button to select all the graphs. While all are selected to the following:

After setting the colors and configuring the Graphs, you should see something like the following image. (The Ovule graph is selected here to show its Graph Adjuster settings.)

View coverage graphs from four samples

Explore the data.

Now that the graphs use the same scale, you can zoom and pan around the chromosome and observe how different genes are expressed in different samples.

Use IGB's zoom controls to zoom around in the data

To zoom in, click the place you want to zoom - the vertical stripe (called a zoom stripe) will hop to the location of your most recent click and also will focus zooming so that when you zoom in or out, the zoom stripe will remain stationary while the scene around it expands or contracts. To zoom in, drag the horizontal slider (top right) to the right. To zoom out, drag the horizontal slider to the left.

Stretch the display vertically to show more detail, including gene model labels

To stretch the display in the vertical dimension, drag the the vertical slider (left side of the IGB window) up. To compress the display (and make it fit inside the IGB main data display), drag the vertical slider down.

To zoom in a region, click-drag your mouse along the number line. During the drag, you'll see a blue highlight.  When you release the mouse, you'll jump zoom to the region you selected. 

Jump-zoom to a gene model by double-clicking on it.

To zoom in on a gene model, just double-click it. This shortcut will help you quickly inspect interesting genes and their structure.

Scroll and zoom around the chromosome, making note of places where one or more samples appear to have more reads covering a particular region and double-click on gene models to zoom in for a closer look.

Also, keep your eyes open for regions with reads but no annotated gene. When you find one, click the corresponding read alignments file for the sample whose graph contained expression data. Load the reads to get an idea of the intron/exon structure for the unannotated gene.

Visit the following regions (see below) to see examples of new genes and differential expression across sample types.

Tip: To navigate quickly to the regions shown below, enter the regions (as shown) in the upper left text box and press the Return key on your keyboard. IGB will then jump to the region you entered. You can also search for gene models by name using the Search tab.

Examples of new genes and/or differential expression across sample types

A multi-kilobase view of chromosome 1

Note how different graphs have peaks at different regions, but all graphs use the same scale.


Gene expression high in ovule and lower in leaves and embryo samples.


A single-exon gene expressed in pollen but not the other sample types.


A new gene?

To replicate this image, modify the Max Stack Depth setting for this track, which controls the amount of vertical space above the two gene models.


Get more info from Phytozome, maizegenome.org, or google.


A complex locus with multiple predicted splicing variants.


Data loaded for all of chromosome one