“Progress in science depends on new techniques, new discoveries and new ideas, probably in that order.” - Sydney Brenner

Developing a new computational method for genomics is exciting but can also be daunting at first. It seems to be as much an art as it is an exact science. Here I summarize some lessons learned having gone through the whole process once:

  1. Avoid reinventing the wheel. Reuse existing solutions that people have tested before and innovate on top of them.
  2. Simplicity is key. Fancy math does not always lead to better results. Before applying any complex techniques, try a simple one first.
  3. Iterate fast and optimize later. Experimental code does not have to be tidy or efficient (it just has to be correct). Optimize once you know something works.
  4. Establish a benchmark early on. A benchmark is not only essential for demonstrating performance improvements in the end, but also for testing out various approaches while developing a tool.
  5. If you can’t solve a big problem, solve a subpart first. Similarly, if you can’t solve a general problem, first solve a specific case. Recognize inter-dependencies of sub-problems and modularize. Eventually you will get there!
  6. Be systematic. I spent a lot of time adjusting model parameters on an ad-hoc basis in order to make my method work better for specific cases. What really helped me understand how things work was systematically studying the effect of different parameter values across all possible scenarios.
  7. Abstraction is the best way to make a method generalize. I remember a time when I kept finding new cases where my method breaks down. Instead of addressing each problem individually, it was much better to summarize them into classes of errors, and solve them all at once by extending the model.