circa 2011
This is a guest lecture that I’ve given to computer science researchers about how to think about software engineering principles when writing code to support computer science research.
There are some tips about creating reproducible research. These tips predate the rise of the IPython Notebook, as well as GitHub and all that. If I were updating this talk, I would mention those.
A key theme is that there is a tradeoff in how much time to spend on traditional software engineering tasks like testing, documentation (in which I include choosing names), and so on. Too little time and you cannot trust your results, and others will not build on your code. Too much time and you do not try out enough new ideas. There is a difficult judgement call here.
It’s important to note that this talk was aimed at an audience of computer science students, who (hopefully!) have been repeatedly told that the quality, readability, and maintainability of their code matter. Therefore I emphasize other aspects that they may not have considered, namely flexibility, reproducibility, and development time.
Some of the advice that I give here I would not give to students in mathematics and the sciences, who have not always been exposed to the basics of code quality. For that audience, I would place a lot more emphasis on maintainability, choosing good names, and so on.
I wrote a blog post that summarizes much of the material from the slides.
Ali Eslami, currently of Google DeepMind, has an excellent page on software patterns for machine learning research.
I’ve written a bit about the philosophy of these pages on my talks page.