Reproducible research in bioinformatics means performing often complex analyses with sufficient documentation and code verification to ensure that (a) someone else can reproduce the analysis and obtain identical results and (b) the results are correct and consistent with the experimenter's interpretation.
Reproducible research, and how to conduct it, is an active topic of research in bioinformatics and in science in general. Many scientists have recognized its importance and are working to create tools that make bioinformatics analyses more reproducible.
You'll learn about these tools, how to use them, and what their limitations are.
Reproducible research defined
Reproducible research in bioinformatics means:
- Documenting analysis steps well enough that someone else can reproduce your analysis and achieve the same result. Use tools like Sweave (R) to document analyses as you develop them.
- Testing code elements to ensure correctness. For example, if I write a method that calculates a value given an input, I should write testing code that checks that the method fails when given incorrect input and succeeds when given correct input.
- Programming defensively when using human-entered or external data sets. Whenever you use a file provided by someone else or created by a computer program (which may contain bugs), state and then test your assumptions about the file.
- Keeping all code under version control. Use a version control system to develop and document code as you write it. For example, when producing a report that goes to another researcher, record the revision of the code used to generate it.
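The testing and defensive-programming points above can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration: the function names `gc_content` and `parse_expression_lines`, the two-column tab-delimited file layout, and the specific assumptions being checked are not taken from any particular tool. The idea is simply that each assumption about the input is stated in the code and tested, and that the tests cover both correct input (which should succeed) and incorrect input (which should fail).

```python
import math


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    # State the assumption explicitly: a non-empty string of A, C, G, T.
    if not isinstance(sequence, str) or not sequence:
        raise ValueError("sequence must be a non-empty string")
    seq = sequence.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return (seq.count("G") + seq.count("C")) / len(seq)


def parse_expression_lines(lines):
    """Parse tab-delimited 'gene_id<TAB>value' lines into a dict.

    The two-column layout is a hypothetical format chosen for illustration.
    """
    values = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Assumption 1: exactly two columns per line.
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 fields, got {len(fields)}"
            )
        gene_id, raw_value = fields
        # Assumption 2: gene identifiers are non-empty and unique.
        if not gene_id:
            raise ValueError(f"line {line_number}: empty gene identifier")
        if gene_id in values:
            raise ValueError(f"line {line_number}: duplicate gene {gene_id!r}")
        # Assumption 3: the value is a finite number.
        try:
            value = float(raw_value)
        except ValueError:
            raise ValueError(
                f"line {line_number}: non-numeric value {raw_value!r}"
            ) from None
        if not math.isfinite(value):
            raise ValueError(f"line {line_number}: non-finite value {raw_value!r}")
        values[gene_id] = value
    return values


def expect_value_error(func, *args):
    """Return True if func(*args) raises ValueError, False otherwise."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


def run_tests():
    # Correct input succeeds and gives the expected result.
    assert gc_content("GGCC") == 1.0
    assert gc_content("atgc") == 0.5
    parsed = parse_expression_lines(["BRCA1\t2.5\n", "TP53\t0\n"])
    assert parsed == {"BRCA1": 2.5, "TP53": 0.0}
    # Incorrect input fails loudly instead of producing a wrong answer.
    assert expect_value_error(gc_content, "")
    assert expect_value_error(gc_content, "ATGX")
    assert expect_value_error(parse_expression_lines, ["BRCA1\t2.5\textra\n"])
    assert expect_value_error(parse_expression_lines, ["BRCA1\tNA\n"])
    assert expect_value_error(parse_expression_lines, ["BRCA1\t1\n", "BRCA1\t2\n"])


if __name__ == "__main__":
    run_tests()
    print("all checks passed")
```

Raising an error on the first violated assumption is one possible design choice. For large, messy files it may be more practical to collect all problems and report them together, but in either case a bad input should never silently turn into a plausible-looking result.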
Reading and videos
Keith Baggerly's lecture Forensic Bioinformatics in High-Throughput Biology
- Watch this: http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/
- The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics (Oct 2010)