Now that you have a solid foundation in working with command-line UNIX tools, it's time for you to start learning how to write your own programs for working with data.

For this, you'll learn python, a versatile, higher-level language that supports interactive programming. One of the most powerful aspects of python is that you can write complex programs function by function, testing and debugging each function as you go. This is especially useful in bioinformatics, because you can quickly write new functions and then use them to test assumptions about your data.
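As a minimal sketch of this function-by-function workflow, the hypothetical function `gc_content` below (not part of any course material) computes the fraction of G and C bases in a DNA sequence. A few `assert` statements check it right after it is written, before anything else is built on top of it. In an interactive session you could type the same checks directly at the `>>>` prompt.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)


# Test the function immediately, using inputs whose answers are easy to work out by hand.
assert gc_content("GGCC") == 1.0
assert gc_content("ATAT") == 0.0
assert gc_content("atgc") == 0.5
assert gc_content("") == 0.0
```

Once these checks pass, you can rely on `gc_content` while writing the next function, and if a later result looks wrong you know which pieces have already been tested.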

Many popular bioinformatics tools are written in python; the Galaxy workflow system is one example. TopHat, which you have already run, is also partly implemented in python.

Interested students should take a look at the Galaxy system as an excellent example of a long-lived and popular on-line bioinformatics tool. To see examples of production-quality python code, browse the Galaxy source code repository at BitBucket:

Goals for this week

If you are new to programming, you may find this overwhelming at first. Take heart. You'll have the next several weeks to practice and get comfortable with the basic aspects of the python language.

Read or Watch

Short demonstration by an enthusiastic python user, describing python syntax and commonly used commands:

Python tutorial hosted on; intended for novice programmers but assumes some knowledge of computers, esp. UNIX

Very basic on-line textbook for novice programmers. Read this if you are new to programming. (You can skip Chapter 5.8 and Chapter 6.)

Tutorial by Andrew Dalke that walks through the process of writing a parser in python and demonstrates how to create and use a new object class.
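To give a feel for the two ideas that tutorial covers, here is a minimal sketch of a parser that returns objects of a new class. It is not the tutorial's code: the FASTA format, the class name `FastaRecord`, and the function name `parse_fasta` are assumptions chosen for illustration. The parser reads a file line by line, starts a new record at each `>` header line, and collects the sequence lines that follow it.

```python
import io


class FastaRecord:
    """One sequence from a FASTA file: its header line (without '>') and its sequence."""

    def __init__(self, title, sequence):
        self.title = title
        self.sequence = sequence

    def __len__(self):
        # Lets you call len(record) to get the sequence length.
        return len(self.sequence)

    def __repr__(self):
        return f"FastaRecord({self.title!r}, length={len(self)})"


def parse_fasta(handle):
    """Yield FastaRecord objects from an open text file or file-like object."""
    title = None
    lines = []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if there was one.
            if title is not None:
                yield FastaRecord(title, "".join(lines))
            title = line[1:]
            lines = []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' header")
            lines.append(line)
    # Emit the last record in the file.
    if title is not None:
        yield FastaRecord(title, "".join(lines))


# io.StringIO stands in for a real file opened with open(path).
example = io.StringIO(">seq1 first\nACGT\nACGT\n>seq2\nGGCC\n")
for record in parse_fasta(example):
    print(record.title, len(record))
```

Making `parse_fasta` a generator means it hands back one record at a time, so it can process files too large to hold in memory at once.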

Exercises to build your knowledge