Unsupervised Learning in Neuroscience
Advances in recording technology driven by large-scale neuroscience projects (e.g., the BRAIN Initiative and the Human Brain Project) promise to deliver unprecedented high-resolution recordings of brain activity. This flood of data is already straining both traditional analysis methods and the existing neuroinformatics infrastructure. Analysis techniques that require human intervention, such as manual editing of clusters during spike sorting, do not scale to high-dimensional data. More promising are data-mining techniques that reveal structure without human intervention. For example, unsupervised learning algorithms (e.g., sparse coding and independent component analysis) can extract features from both neural and behavioral signals, and correlations between the resulting features can reveal the computational function of the region under investigation.
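The workflow described above can be sketched in a few lines. The following is a minimal illustration, not the project's actual pipeline: independent component analysis (via scikit-learn's FastICA) recovers latent features from synthetic multi-channel "recordings", and each feature is then screened for correlation with a behavioral signal. All data here are synthetic stand-ins.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Synthetic stand-ins: a behavioral variable (e.g. animal position) and
# 20 recorded channels that linearly mix two latent sources, one of
# which tracks the behavior.
behavior = np.sin(np.linspace(0, 20, 1000))
sources = np.column_stack([behavior, rng.standard_normal(1000)])
mixing = rng.standard_normal((2, 20))
neural = sources @ mixing + 0.1 * rng.standard_normal((1000, 20))

# Unsupervised feature extraction: recover latent components without labels.
ica = FastICA(n_components=2, random_state=0)
features = ica.fit_transform(neural)  # shape (1000, 2)

# Screen each recovered component for correlation with the behavior.
for i in range(features.shape[1]):
    r = np.corrcoef(features[:, i], behavior)[0, 1]
    print(f"component {i}: |r| = {abs(r):.2f}")
```

One recovered component should correlate strongly with the behavioral signal, flagging that feature as behaviorally relevant; real recordings would of course demand far more care with preprocessing and statistics.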
Because unsupervised learning methods are typically computationally intensive, they often require sophisticated implementations. And because there is no widely accepted common data format in neuroscience, these implementations must be adapted to ad hoc, idiosyncratic formats, which limits the reuse of existing code and disincentivizes new software development. In the past year, a group of large data producers has collaborated in the Neurodata Without Borders (NWB) project to advance an HDF5-based common data format. Here we are building a free web service on top of NWB that (1) exposes HDF5-based NWB datasets over an HTTP JSON API; (2) allows users to execute computationally intensive data analyses at scale; and (3) provides job management and visualization capabilities. The service will leverage the co-location of data and high-performance computing resources at the National Energy Research Scientific Computing Center (NERSC). A library of analysis algorithms will be made available for processing NWB-formatted datasets at CRCNS, visualizing the results, and sharing them with other users. At the 2015 Society for Neuroscience annual meeting, we presented a poster describing this idea; it garnered enough interest to generate a mailing list of roughly 30 potential users. We will present an update at the 2016 annual meeting in November.
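To make point (1) concrete, here is a rough sketch of serializing an array-valued dataset into a JSON payload of the kind an HTTP API could return. The dataset path and field names are illustrative only, not the actual NWB web-service schema; a NumPy array stands in for data that would in practice be read from an HDF5 file (e.g. via h5py).

```python
import json
import numpy as np

# Stand-in for an HDF5 dataset read from an NWB file
# (in practice this would come from h5py, e.g. f["/acquisition/..."]).
data = np.linspace(0.0, 1.0, 5)

def dataset_to_json(name, array, attrs=None):
    """Serialize an array-valued dataset into a JSON payload.
    The field names here are illustrative, not a real NWB schema."""
    payload = {
        "name": name,
        "shape": list(array.shape),
        "dtype": str(array.dtype),
        "attributes": attrs or {},
        "data": array.tolist(),
    }
    return json.dumps(payload)

print(dataset_to_json("/acquisition/lfp/data", data, {"unit": "volts"}))
```

A real service would add pagination or slicing for large datasets, since shipping an entire electrophysiology recording as one JSON document is impractical.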
As a pilot case, we are applying convolutional sparse coding to electrophysiological recordings of the rodent hippocampus. The hippocampus is a subcortical brain region known for its roles in navigation and memory. It contains neurons that track the spatiotemporal trajectory of an animal through the environment. Its primary inputs come from the neighboring entorhinal cortex, which in turn receives input from multiple sensory cortical regions. There is evidence that connections between the hippocampus and entorhinal cortex encode associations between points along a spatiotemporal trajectory and specific objects, experiences, and sensations. Such associations would form the backbone of an episodic memory system.
Because the hippocampus is a hub for signals encoding many sensory modalities, it is an ideal region in which to perform large-scale sweeps for correlations between neural and behavioral or stimulus variables. We are currently focusing on animal location, which has a known relationship to hippocampal activity. Later this year we aim to apply the same methods to features extracted from wireframe models of foraging rats. While the head direction of a rat is known to inform a representation of velocity in the hippocampus, it remains unknown whether other features of body configuration are similarly reflected.
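A correlation sweep of the kind described above amounts to computing a Pearson correlation between every extracted neural feature and every behavioral variable, then inspecting the strongest pairs. The following is a minimal sketch on synthetic data with one planted relationship; the feature and variable counts are arbitrary, and real use would require multiple-comparison correction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: 100 neural features and 3 behavioral variables
# (e.g. x-position, y-position, head direction) over 500 time points.
T, n_features, n_behaviors = 500, 100, 3
neural_features = rng.standard_normal((T, n_features))
behavior = rng.standard_normal((T, n_behaviors))
# Plant one known relationship: feature 7 tracks behavioral variable 0.
neural_features[:, 7] = behavior[:, 0] + 0.2 * rng.standard_normal(T)

def correlation_sweep(X, Y):
    """Pearson correlation between every column of X and every column
    of Y, returned as an (X-columns x Y-columns) matrix."""
    Xz = (X - X.mean(0)) / X.std(0)
    Yz = (Y - Y.mean(0)) / Y.std(0)
    return Xz.T @ Yz / X.shape[0]

R = correlation_sweep(neural_features, behavior)
i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
print(f"strongest pair: feature {i} vs behavior {j}, r = {R[i, j]:.2f}")
```

Vectorizing the sweep as a single matrix product is what makes it feasible to screen very large feature sets, which is where the HPC resources at NERSC come in.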