In this assignment, you'll practice using the subprocess module to count reads overlapping genes in an RNA-Seq data set.
Use the subprocess object to repeatedly run samtools to count the number of reads overlapping a list of genes given in a bed file.
To remind you, you can count the number of reads overlapping a region using a pipe or using samtools view with the -c option.
Write a program countReads.py that uses samtools view to count the number of reads overlapping gene regions listed in a gene locations BED file.
The interface to the program should work as follows:
Output should be tab-delimited, as follows:
where gene is the name of the gene, start is its start position, end is its end position, and count is the number of reads that overlap the gene.
If the option u (-unique) is provided, only count the uniquely mapped reads.
How to turn it in
Check it into your part of the class repository. Dr. Loraine will test yours and your fellow students code using an automated grading program, so be sure that when you first check in your file
- the name of the file is countReadsWithSamtools.py
- the file resides in your home directory in the class repository
- the file is executable