
Introduction

The goal of this assignment will be to give you experience validating your own code.

Being able to identify errors in your own code is a critical skill in bioinformatics programming because most of the time, we don't have dedicated testers to tell us when our code doesn't do what it should.

How to validate your code

The best way to check that a program does what it should is to run it on inputs for which you know the correct answer.

For example, a program like checkBed.py should print FAIL when given a non-BED file and should print PASS when given a BED file.

However, will it fail in the right way on every type of non-BED file?

For example, if you give it a comma-separated file, will it print FAIL or die with an error message, as follows?


If you have checked in code without testing with known inputs, it is extremely likely that your code has a bug.

In this exercise, your goal will be to find the bugs in your code and fix them and, in so doing, strengthen your testing and debugging skills.

Note that in a workplace setting, getting a reputation for writing buggy code can negatively affect your career. I've known of many developers and scientists who lost their jobs because their co-workers got sick of dealing with their buggy, poorly-tested code and untrustworthy results. Don't let that happen to you. Use this opportunity to practice good coding habits and test that your code does what it is supposed to do.

Getting started

To get started, create examples of sample input and sample output for which you know the answer. Create at least one such file for each script. For scripts that print information about a file (like checkBed.py), you should make more than one file, including a file that passes the test and a file that fails. Note that since the six scripts do very similar tasks, you can probably re-use many of the same files.

Last but not least, you should make at least one file that is pathological, i.e., something that a user might accidentally create and pass as input to your script.

A file with no content (i.e., zero size) is a good example of a pathological file.

For each script, do the following tests:

Check that the script is executable.

If Dr. Loraine checks out a fresh copy of the repo and tries to run your script as described in the spec, will it run or will she get an error like the following?

Fixing this error is easily done using the svn propset command, as discussed in class.

If you missed that discussion, see: http://blog.chilly.ca/?p=167. Or search.
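Before checking in, you can also confirm locally that a script file has its execute permission set. The following is a minimal sketch, not part of the assignment spec; the function name is_executable and the file name example.py are made up for illustration. Note that it only inspects the file in your working copy, so a fresh checkout of the repo remains the more reliable test.

```python
import os
import stat
import tempfile


def is_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Demonstrate on a throwaway file (hypothetical name, not one of the six scripts).
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "example.py")
        with open(script, "w") as handle:
            handle.write("#!/usr/bin/env python\nprint('PASS')\n")
        os.chmod(script, 0o644)  # readable and writable, but not executable
        print(is_executable(script))
        # Add the execute bit for the owner, as chmod u+x would.
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
        print(is_executable(script))
```

To check one of your own scripts, call is_executable with its path in your working copy.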

Test for success.

Check that the program gives the correct answer with correct input.

For each script, make an example BED file. For example, to test checkBed.py, run it on a BED file you know conforms to the spec.

For findDuplicates.py, run it on files that have duplicates and files that don't. Test files that repeat lines in multiple locations in the same file.
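One way to build such a test file, and to know the right answer in advance, is sketched below in Python. This is only an illustration under stated assumptions: the helper names write_duplicate_test_file and expected_duplicates are hypothetical, and what findDuplicates.py should actually print is defined by the spec, not by this sketch.

```python
from collections import Counter


def write_duplicate_test_file(source_lines, out_path, n_repeat=2):
    """Write source_lines, then repeat the first n_repeat lines at the end."""
    lines = list(source_lines)
    with open(out_path, "w") as out:
        out.writelines(lines)             # original content
        out.writelines(lines[:n_repeat])  # same lines again, at the end of the file
    return out_path


def expected_duplicates(path):
    """Return the lines that occur more than once, in order of first appearance."""
    with open(path) as handle:
        counts = Counter(handle.readlines())
    return [line for line, count in counts.items() if count > 1]
```

Each entry in source_lines should end with a newline. Comparing the result of expected_duplicates with what your script reports gives you a test where the correct answer is known before you run it.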

Creating files for testing is easy using UNIX utilities like head and file re-direction operators. For example, the following UNIX commands create a file with the two lines appearing at the start and end of the file.

Test for failure.

If the program is checking a file format, like checkBed.py, then make a file that should fail the test. For example, run it on a tab-delimited file that doesn't have twelve fields. Or run it on a file that isn't tab-separated at all. In both cases, it ought to print FAIL.
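The two failure cases above can be written down as a small self-check. The sketch below is not the reference implementation of checkBed.py; it assumes that a valid line has exactly twelve tab-separated fields with integer start and end coordinates in the second and third fields, and the name check_bed_lines is hypothetical. How empty input should be treated is also an assumption, exposed here as a parameter.

```python
EXPECTED_FIELDS = 12  # assumption: the twelve-field layout mentioned above


def check_bed_lines(lines, empty_result="FAIL"):
    """Return "PASS" if every line looks like a twelve-field BED line, else "FAIL".

    empty_result is what to return when there are no lines at all; the
    default is an assumption, not something the spec is known to require.
    """
    seen_line = False
    for line in lines:
        seen_line = True
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != EXPECTED_FIELDS:
            return "FAIL"  # wrong field count, or not tab-separated at all
        try:
            int(fields[1])  # start coordinate
            int(fields[2])  # end coordinate
        except ValueError:
            return "FAIL"  # report failure instead of dying with a traceback
    return "PASS" if seen_line else empty_result


if __name__ == "__main__":
    good = "\t".join(["chr1", "100", "200", "gene1", "0", "+",
                      "100", "200", "0", "1", "100,", "0,"]) + "\n"
    print(check_bed_lines([good]))
    print(check_bed_lines(["chr1,100,200\n"]))    # comma-separated
    print(check_bed_lines(["chr1\t100\t200\n"]))  # tab-separated, too few fields
```

The point of the try/except is that a malformed file produces FAIL rather than an uncaught exception.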

Also test your pathological cases.

How does your script handle files with zero content? How does it deal with files that are not even plain text files?
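Both kinds of pathological file are easy to create. The sketch below makes a zero-size file and a file that is not valid text, then shows one way to read a file without crashing on undecodable bytes. The helper names and file names are hypothetical, and treating the input as UTF-8 is an assumption; what your script should print in these cases is for the spec to decide.

```python
import os
import tempfile


def make_pathological_files(directory):
    """Create an empty file and a non-text file; return their paths."""
    empty_path = os.path.join(directory, "empty.bed")
    binary_path = os.path.join(directory, "binary.bed")
    open(empty_path, "w").close()  # zero-size file
    with open(binary_path, "wb") as handle:
        handle.write(bytes([0xFF, 0xFE, 0x00, 0x9C]))  # not valid UTF-8 text
    return empty_path, binary_path


def read_lines_or_none(path):
    """Return the file's lines, or None if it cannot be decoded as UTF-8."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.readlines()
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty_path, binary_path = make_pathological_files(tmp)
        print(os.path.getsize(empty_path))
        print(read_lines_or_none(empty_path))
        print(read_lines_or_none(binary_path))
```

Run each of your scripts on both files and note whether it prints a sensible result or dies with a traceback.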

If the specification doesn't say how the script should handle pathological cases, ask the person who wrote the spec (Dr. Loraine, in this case) what the script should do.

Check that you can pipe the output of the script into another program.

As we discussed in class, scripts are often used as part of pipelines that pass input into other programs.

For this to work, your script needs to print to the stdout file stream.

Check that when you pipe output of your script into another program, the program prints what you expect.
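The difference between printing to stdout and printing elsewhere can be seen with a short experiment. The sketch below is an illustration only: count_stdout_lines is a hypothetical helper that behaves roughly like piping a command into wc -l, and the two one-line commands stand in for a script that prints a single PASS line.

```python
import subprocess
import sys


def count_stdout_lines(command):
    """Run command and count the lines it writes to stdout.

    Roughly what `command | wc -l` reports: text written to stderr
    never enters the pipe, so it is not counted.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    return len(result.stdout.splitlines())


if __name__ == "__main__":
    # Stand-ins for a script that prints one PASS line (not the real checkBed.py).
    to_stdout = [sys.executable, "-c", "print('PASS')"]
    to_stderr = [sys.executable, "-c", "import sys; sys.stderr.write('PASS\\n')"]
    print(count_stdout_lines(to_stdout))
    print(count_stdout_lines(to_stderr))
```

To try it on your own script, pass a command list containing the script path and a test file. If the count is zero when you expected one, the script is probably writing to stderr or to a file instead of stdout.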

For example, a correct implementation of checkBed.py when combined with wc -l ought to do the following:

Test each of your six simple scripts and check in any changes to the class repo.

The due date for your bug fixes is Monday morning at 9 am.

Good luck and post any questions on the Yahoo group.
