HS2 - herding spikes at scale

HS2 - or Herding Spikes 2 in full - is the new version of our spike sorting software for large, high-density CMOS-based arrays. Compared to the first version, this implementation now supports a whole range of recording systems. There are also substantial speed improvements, and some of the algorithms have been refined. All code and documentation are available here: https://github.com/mhhennig/HS2

HS2 uses a combination of event source localisation, rather aggressive (one may argue excessive) dimensionality reduction, and density-based clustering. This is very fast and memory efficient, and makes it possible to work with very large data sets containing many tens of millions of spikes. Data can be analysed even on pretty ordinary hardware, such as a desktop PC or a well-specced laptop. The method was initially developed for in vitro recordings from the retina and from cultured neurons with a 4k-channel array, but we found it also works well for data from other CMOS-based devices.
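To make the pipeline concrete, here is a minimal sketch of the three steps named above: localise each spike from the per-channel amplitudes, keep only the resulting low-dimensional location, and cluster locations with a density-based method. Everything here is invented for illustration (the electrode grid, the simulated spikes, the parameter values, and the choice of scikit-learn's `MeanShift`); HS2's actual implementation differs in detail.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)

# Hypothetical electrode layout: an 8x8 grid of channel positions (x, y).
n_side = 8
xs, ys = np.meshgrid(np.arange(n_side), np.arange(n_side))
positions = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

def localise(amplitudes, positions):
    """Estimate a spike's source location as the centre of mass of the
    (non-negative) spike amplitudes across channels."""
    w = np.clip(amplitudes, 0, None)
    return (w[:, None] * positions).sum(axis=0) / w.sum()

# Simulate spikes from two sources: each spike's per-channel amplitude
# falls off with distance from the true source, plus a little noise.
sources = np.array([[2.0, 2.0], [5.5, 5.5]])
spikes = []
for _ in range(200):
    src = sources[rng.integers(len(sources))]
    d = np.linalg.norm(positions - src, axis=1)
    amps = np.exp(-d) + 0.02 * rng.standard_normal(len(positions))
    spikes.append(localise(amps, positions))
spikes = np.asarray(spikes)

# Density-based clustering of the two-dimensional spike locations:
# mean shift finds the modes of the location density.
labels = MeanShift(bandwidth=1.0).fit_predict(spikes)
print(len(set(labels)))  # number of putative units found
```

Because each spike is reduced to a point in two (or a few) dimensions, the clustering step scales to tens of millions of events far more easily than clustering raw waveforms would.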

Now, if you have just recorded an experiment in your own lab, you may want to know how well this works, how it compares with other solutions, and whether it is any better. In fact, users of CMOS arrays, in vitro or in vivo, are increasingly spoiled for choice - quite a few end-to-end spike sorting solutions now exist:

Interestingly, all these packages use either a variant of template matching or dimensionality reduction through event localisation - two approaches that appear particularly suitable where many, densely spaced channels would otherwise create a combinatorial nightmare.

While some comparisons between different methods have been made - see e.g. the YASS paper and the JRCLUST website - the results are not entirely consistent. This likely reflects that each method has its pros and cons. Performance may not be consistent across different data sets (electrode drift is a good example), and the tools clearly differ in their ability to scale (where we certainly do well). The jury - along with new, useful ground truth data sets - is still out.

The large new data sets from CMOS probes cannot be curated manually; instead, we have to trust the algorithm. So an important next step, in addition to further improving our methods, would be to create reproducible benchmarks that take all these factors into account. Moreover, it would be interesting to reproducibly analyse raw recordings with different pipelines. More work for all of us…
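One small ingredient such a benchmark would need is a way to score a sorter's output against ground truth. As a hedged sketch (the matching rule, tolerance, and example spike times below are all made up, and real benchmarks are considerably more involved), one can match detected spike times to ground-truth times within a tolerance and report precision and recall:

```python
import numpy as np

def match_spike_trains(gt, detected, tol=0.5):
    """Greedily match detected spike times (ms) to ground-truth times
    within +/- tol ms; each ground-truth spike matches at most once."""
    gt = np.sort(np.asarray(gt, float))
    detected = np.sort(np.asarray(detected, float))
    used = np.zeros(len(gt), dtype=bool)
    hits = 0
    for t in detected:
        idx = np.searchsorted(gt, t)
        # The nearest ground-truth spike is at index idx-1 or idx.
        for j in (idx - 1, idx):
            if 0 <= j < len(gt) and not used[j] and abs(gt[j] - t) <= tol:
                used[j] = True
                hits += 1
                break
    precision = hits / len(detected) if len(detected) else 0.0
    recall = hits / len(gt) if len(gt) else 0.0
    return precision, recall

gt = [10.0, 20.0, 30.0, 40.0]          # hypothetical ground-truth times
detected = [10.2, 19.8, 35.0, 40.1]    # one false positive, one miss
p, r = match_spike_trains(gt, detected)
print(p, r)  # 0.75 0.75
```

Running the same scoring code over the outputs of several pipelines on shared raw recordings is exactly the kind of reproducible comparison that is still missing.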

Thanks go to the EU (Erasmus Mundus) and Google (European Doctoral Fellowship) for funding Martino, and the Thouron award for funding Cole.