Anyone who has worked with extracellular recordings will know that extracting spike trains of single neurons from raw data, a task known as spike sorting, is a difficult endeavor (to put it mildly). Traditionally, manual curation of sorted data, or even manual sorting, was feasible as data sets were reasonably small. While keeping a human in the loop allows for resolving common failures of sorting algorithms, it is worth noting that two people analysing the same data set will likely come up with different results. Along with the unreliability of manual intervention, the influx of large-scale data, generated by novel, dense arrays with hundreds or even thousands of recording channels, makes having a human operator impractical. To exploit these amazing new data sets, automated, reproducible approaches are essential.
Relying on automated (or semi-automated) spike sorting algorithms, however, brings about a slew of issues that must be addressed. First, many approaches are now available for general use. While most of these algorithms have been assessed on synthetic or recorded ground truth data, it is difficult to predict how they will generalize to new data sets given the limitations of current validation datasets and metrics. Second, each algorithm implements its own pipeline for storing, preparing, processing, sorting, and curating extracellular recordings. This lack of standardization leads to the creation of entire workflows built around a single piece of spike sorting software and creates a barrier to collaboration among different experimental labs. Third, even two labs using the same sorting algorithm can generate wildly different results based on how they choose to run the underlying sorter. Tuning the adjustable parameters for a sorter, given the recording setup and experimental design, is more a matter of intuition than standard procedure. Fourth, despite the availability of modern sorting solutions, many labs rely on home-grown solutions that are implemented by a capable lab member and tailored to work on a specific experimental set-up. These personalized sorting algorithms are difficult to adapt to novel experiments and, again, make collaboration among labs more difficult. Given the wide range of issues, how can spike sorting be approached in a way that is standardized yet flexible, sophisticated yet reproducible?
Here we present a new approach to solving these problems based on a common spike sorting workflow that can be adapted to the variety of preexisting sorting approaches. This common workflow consists of three steps: First, the raw data is preprocessed and prepared for sorting. Second, the prepared data is passed through the spike sorting software and the preliminary results of the analysis are generated. Third, the preliminary results are curated, either automatically or manually, and the final results are stored. A unified framework should thus provide a standardized way of accessing both the raw data and the results of a sorting algorithm, be capable of running any spike sorter, and should allow for the creation of standardized functions that can operate on both raw and sorted data. Unlike other approaches to standardized analysis, we avoid creating a new, unified data format or simple converters between formats. Instead, we provide a standardized API for accessing, sorting, and curating extracellular data regardless of the underlying file format.
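To make the three-step workflow concrete, here is a minimal, self-contained sketch in plain Python. It is not the SpikeInterface API: every class and function name below (Recording, InMemoryRecording, CenteredRecording, threshold_sorter, curate, run_pipeline) is hypothetical, and the threshold detector is only a toy stand-in for a real spike sorter. The point it illustrates is that once raw data sits behind a common interface, preprocessing, sorting, and curation can be written once and reused regardless of where the data came from.

```python
from abc import ABC, abstractmethod


class Recording(ABC):
    """Hypothetical format-agnostic view of raw data (illustrative only)."""

    @abstractmethod
    def get_channel_ids(self):
        """Return the list of channel ids."""

    @abstractmethod
    def get_traces(self, channel_id):
        """Return the samples of one channel as a list of floats."""


class InMemoryRecording(Recording):
    """Stand-in for a reader of one specific file format."""

    def __init__(self, traces_by_channel):
        self._traces = dict(traces_by_channel)

    def get_channel_ids(self):
        return sorted(self._traces)

    def get_traces(self, channel_id):
        return list(self._traces[channel_id])


class CenteredRecording(Recording):
    """Step 1 (preprocess): wrap any Recording and subtract each channel's mean."""

    def __init__(self, parent):
        self._parent = parent

    def get_channel_ids(self):
        return self._parent.get_channel_ids()

    def get_traces(self, channel_id):
        trace = self._parent.get_traces(channel_id)
        mean = sum(trace) / len(trace)
        return [sample - mean for sample in trace]


def threshold_sorter(recording, threshold):
    """Step 2 (sort): toy sorter, one unit per channel.

    A spike is the sample index at which the trace first drops to or
    below -threshold after having been above it.
    """
    sorting = {}
    for channel_id in recording.get_channel_ids():
        trace = recording.get_traces(channel_id)
        sorting[channel_id] = [
            i
            for i in range(1, len(trace))
            if trace[i] <= -threshold < trace[i - 1]
        ]
    return sorting


def curate(sorting, min_spikes):
    """Step 3 (curate): keep only units with at least min_spikes spikes."""
    return {unit: spikes for unit, spikes in sorting.items() if len(spikes) >= min_spikes}


def run_pipeline(recording, threshold=3.0, min_spikes=2):
    """Chain the three steps; works for any Recording implementation."""
    prepared = CenteredRecording(recording)
    preliminary = threshold_sorter(prepared, threshold)
    return curate(preliminary, min_spikes)
```

Because run_pipeline depends only on the abstract Recording interface, supporting another file format in this sketch would mean writing one more reader class, with no change to the preprocessing, sorting, or curation code.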
This approach has significant advantages over previous solutions and opens up exciting new possibilities, for instance:
The Python-based software framework we created is called SpikeInterface. Its core components are SpikeExtractors, which provides standardized access to raw and sorted data in any supported file format, and SpikeToolkit, which provides functions to preprocess data, run spike sorters, compare algorithms, and curate results. Additional repositories that cover a variety of applications of our framework are in development, including (but not limited to) visualization tools, ground truth comparisons, and a GUI for running sorting pipelines.
This project has now reached a stage where we can release the main components of our API. SpikeExtractors version 0.5.0 and SpikeToolkit version 0.3.0 are available for general use. Both can be installed using pip and can be referenced by their version numbers when used for data analysis. All changes that add or alter the functionality of our API will be released under new version numbers and on a predefined release schedule. Analysis done with our API, therefore, is fully reproducible.
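One way to act on the version numbers is to check, at the start of an analysis script, that the installed packages match the releases the analysis was written against. The sketch below uses only the Python standard library (importlib.metadata, available from Python 3.8). The helper names installed_versions and pin_mismatches are hypothetical, and the pip distribution names "spikeextractors" and "spiketoolkit" are an assumption; adjust them if the packages are published under different names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(pins):
    """Return {name: (wanted, installed)} for every pin that is not satisfied."""
    installed = installed_versions(pins)
    return {
        name: (wanted, installed[name])
        for name, wanted in pins.items()
        if installed[name] != wanted
    }


if __name__ == "__main__":
    # Assumed distribution names; the versions are the ones announced above.
    PINS = {"spikeextractors": "0.5.0", "spiketoolkit": "0.3.0"}
    for name, (wanted, got) in pin_mismatches(PINS).items():
        print(f"{name}: expected {wanted}, found {got}")
```

Storing the output of installed_versions alongside the sorting results gives a simple record of exactly which releases produced them.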
Do get in touch with us if you are interested, either as a user or a contributor.
This is a joint project with major contributions from:
Jeremy Magland - Center for Computational Biology (CCB), Flatiron Institute, New York, United States
We are grateful to the Wellcome Trust for funding this work, and to the University of Pennsylvania for a Thouron award to Cole.