Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Zhu, Hanlin; Li, Xue; Sun, Liuyang; He, Fei; Zhao, Zhengtuo; Luan, Lan; Tran, Ngoc Mai; Xie, Chong

arXiv:2003.08533 (stat)

[Submitted on 19 Mar 2020]

Abstract: Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck in the rapid development of scalable and specialized clustering methods. To solve this problem, we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large datasets in any domain that uses pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Our runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.
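
The workflow outlined in the abstract (candidate clusterings in, pairwise queries to an expert, one selected clustering out) can be illustrated with a toy sketch. This is not the C-FAR algorithm: the abstract does not specify the query strategy or the optimality criterion, so the sketch assumes a simple stand-in in which pairs are sampled at random from those the candidates disagree on, and the candidate agreeing most often with the expert is selected. The candidates are given as flat label lists (as if each were a cut of some hierarchical tree), and all names (`disagreeing_pairs`, `select_clustering`, `oracle`, `budget`) are hypothetical.

```python
from itertools import combinations
import random


def disagreeing_pairs(candidates):
    """Return pairs (i, j) whose co-membership differs between candidates.

    Pairs on which every candidate agrees cannot help distinguish the
    candidates, so they are never worth a human query.
    """
    n = len(candidates[0])
    pairs = []
    for i, j in combinations(range(n), 2):
        votes = {labels[i] == labels[j] for labels in candidates}
        if len(votes) == 2:  # some candidates say "same", others "different"
            pairs.append((i, j))
    return pairs


def select_clustering(candidates, oracle, budget, seed=0):
    """Pick the candidate that agrees most with the oracle's answers.

    ASSUMPTION: random sampling of informative pairs and a plain
    agreement count stand in for the paper's unspecified strategy.
    `oracle(i, j)` returns True if items i and j belong together.
    """
    rng = random.Random(seed)
    pairs = disagreeing_pairs(candidates)
    rng.shuffle(pairs)
    scores = [0] * len(candidates)
    for i, j in pairs[:budget]:
        same = oracle(i, j)  # one simulated "human" query
        for k, labels in enumerate(candidates):
            scores[k] += (labels[i] == labels[j]) == same
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Toy data: six items, two true groups, three candidate clusterings.
truth = [0, 0, 0, 1, 1, 1]
candidates = [
    [0, 0, 0, 0, 0, 0],  # under-split: one big cluster
    [0, 0, 0, 1, 1, 1],  # matches the ground truth
    [0, 0, 1, 2, 2, 3],  # over-split
]


def oracle(i, j):
    """Simulated expert that answers from the ground truth."""
    return truth[i] == truth[j]


best, scores = select_clustering(candidates, oracle, budget=100)
```

In this toy setting the work per query is one comparison per candidate, so the cost grows linearly with the number of candidates; that property belongs to the sketch and is not a derivation of the runtime claim in the abstract.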

Comments:	11 pages, 5 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2003.08533 [stat.ML]
	(or arXiv:2003.08533v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2003.08533

Submission history

From: Ngoc Mai Tran
[v1] Thu, 19 Mar 2020 01:33:00 UTC (662 KB)
