Now that you have a solid foundation in working with command-line UNIX tools, it's time for you to start learning how to write your own programs for working with data.
For this, you'll learn python, a versatile, higher-level language that supports interactive programming. One of the most powerful aspects of python is that you can write complex programs function by function, testing and debugging each function as you go. In bioinformatics, this feature is especially useful because you can quickly write new functions and then use them to test assumptions about your data.
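As a small illustration of this style of working, here is a sketch of a single function you might write and then try out immediately in the interpreter. The function name `gc_content` and the test sequences are invented for this example. The idea is to check each new function against inputs whose answers you already know before trusting it on real data.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence string (hypothetical helper)."""
    seq = seq.upper()                 # treat lower-case bases the same as upper-case
    if len(seq) == 0:
        return 0.0                    # avoid dividing by zero on an empty sequence
    gc = seq.count("G") + seq.count("C")
    # float() avoids integer division in older versions of python
    return float(gc) / len(seq)

# Try the function on tiny inputs whose answers you can work out by hand.
print(gc_content("GGCC"))   # 1.0
print(gc_content("ATGC"))   # 0.5
print(gc_content("atat"))   # 0.0
```

If one of these calls gives a surprising answer, you can fix the function and re-test it right away, before building anything else on top of it.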
Many popular tools in bioinformatics are written in python, the Galaxy workflow system being one example. TopHat, which you have already run, is also implemented (partly) in python.
Interested students should take a look at the Galaxy system, an excellent example of a long-lived and popular on-line bioinformatics tool. For examples of production-quality python code, browse the Galaxy source code repository at Bitbucket:
- Galaxy source code repository - Go to https://bitbucket.org/galaxy/galaxy-central
- Galaxy main site - Go to http://usegalaxy.org
Goals for this week
- Run python interactive interpreter (python shell)
- Write functions and test them incrementally in the interpreter
- Open a file and read the data line by line
- Understand how to use basic control structures in python, including
  - looping constructs (for, while)
  - conditionals (if, else)
- Understand how to use basic data structures in python, including
If you are new to programming, you may find this overwhelming at first. Take heart. You'll have the next several weeks to practice and get comfortable with the basic aspects of the python language.
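The following sketch ties several of these goals together: it opens a file, reads it line by line in a `for` loop, uses `if`/`elif`/`else` to decide what to do with each line, and uses a `while` loop to step through a sequence three bases at a time. The function names, the tiny FASTA-style example file, and the use of stop codons as a test case are all assumptions made for illustration.

```python
import os
import tempfile


def summarize_fasta(path):
    """Count header lines and total sequence length in a FASTA-style file (hypothetical helper)."""
    headers = 0
    bases = 0
    with open(path) as handle:
        for line in handle:              # read the file one line at a time
            line = line.strip()          # remove the trailing newline and spaces
            if not line:
                continue                 # skip blank lines
            elif line.startswith(">"):
                headers += 1             # a '>' line starts a new record
            else:
                bases += len(line)       # any other line is sequence data
    return headers, bases


def first_stop_codon(seq):
    """Return the index of the first in-frame stop codon, or -1 if there is none."""
    stops = ("TAA", "TAG", "TGA")        # a tuple: a simple built-in data structure
    i = 0
    while i + 3 <= len(seq):             # keep going while a full codon remains
        if seq[i:i + 3] in stops:
            return i
        i += 3                           # move to the next codon in the same frame
    return -1


# Write a tiny example file so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
example = os.path.join(tmpdir, "example.fa")
with open(example, "w") as out:
    out.write(">seq1\nACGT\nAC\n\n>seq2\nGGG\n")

print(summarize_fasta(example))          # (2, 9)
print(first_stop_codon("ATGAAATAG"))     # 6
```

Each function can be typed into the interpreter and tested on its own, in the same incremental way described at the start of this section.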
Read or Watch
- Science And Python: retrospective of a (mostly) successful decade - http://www.youtube.com/watch?feature=player_embedded&v=F4rFuIb1Ie4#
- Beginning Python for Bioinformatics - http://onlamp.com/pub/a/python/2002/10/17/biopython.html?page=1
Short demonstration by an enthusiastic python user. Describes python syntax and commonly used commands:
- Python programming in 15 minutes - http://www.youtube.com/watch?v=yk022Kz8zqg
Python tutorial hosted on python.org; intended for novice programmers but assumes some knowledge of computers, especially UNIX.
- The Python Tutorial - http://docs.python.org/tutorial/index.html
A very basic on-line textbook for novice programmers. Read this if you are new to programming. (You can skip Section 5.8 and Chapter 6.)
- How to think like a computer scientist - http://www.openbookproject.net/thinkcs/python/english2e/index.html
Tutorial by Andrew Dalke describing the process of writing a parser in python; it also demonstrates how to create and use a new object class.
- Parsing FASTA files - http://www.dalkescientific.com/writings/NBN/parsing.html
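To give a feel for what such a parser looks like, here is a minimal sketch of a FASTA reader built around a small class. It is not Dalke's code, and it handles only the simplest error case; the class name `FastaRecord` and the function `read_fasta` are hypothetical names chosen for this example.

```python
class FastaRecord(object):
    """One FASTA entry: the title line (without '>') and its sequence."""

    def __init__(self, title, sequence):
        self.title = title
        self.sequence = sequence

    def __repr__(self):
        return "FastaRecord(%r, %r)" % (self.title, self.sequence)


def read_fasta(lines):
    """Yield FastaRecord objects from any iterable of FASTA-formatted lines."""
    title = None
    chunks = []                          # sequence lines for the current record
    for line in lines:
        line = line.rstrip()
        if not line:
            continue                     # ignore blank lines
        if line.startswith(">"):
            if title is not None:        # finish the previous record first
                yield FastaRecord(title, "".join(chunks))
            title = line[1:]
            chunks = []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if title is not None:                # emit the last record in the input
        yield FastaRecord(title, "".join(chunks))


example = [">seq1 first", "ACGT", "AC", ">seq2", "GGG"]
for record in read_fasta(example):
    print(record.title + " " + str(len(record.sequence)))
# prints "seq1 first 6", then "seq2 3"
```

Because `read_fasta` accepts any iterable of lines, you could also pass it an open file handle, for example `read_fasta(open("reads.fa"))`, where `reads.fa` stands in for your own file.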