[Submitted on 31 Jul 2021 (v1), last revised 16 Feb 2023 (this version, v4)]
Title: Less is more: balancing noise reduction and data retention in fMRI with data-driven scrubbing
Abstract: Artifacts in functional MRI (fMRI) data cause deviations from common distributional assumptions, introduce spatial and temporal outliers, and reduce the signal-to-noise ratio of the data -- all of which can have negative consequences for downstream statistical analysis. Scrubbing is a technique for excluding fMRI volumes thought to be contaminated by artifacts and generally comes in two flavors. Motion scrubbing based on subject head motion-derived measures is popular but suffers from a number of drawbacks, especially high rates of censoring of individual volumes and entire subjects. Alternatively, data-driven scrubbing methods like DVARS are based on observed noise in the processed fMRI timeseries and may avoid some of these issues. Here we propose "projection scrubbing", a novel data-driven scrubbing method based on a statistical outlier detection framework and strategic dimension reduction, including independent component analysis (ICA), to isolate artifactual variation. We undertake a comprehensive comparison of motion scrubbing with data-driven projection scrubbing and DVARS.
We argue that an appropriate metric for the success of scrubbing is maximal data retention subject to reasonable performance on typical benchmarks of functional connectivity. We find that stringent motion scrubbing worsens validity and reliability and produces small improvements to fingerprinting. Meanwhile, data-driven scrubbing methods tend to yield greater improvements to fingerprinting while not generally worsening validity or reliability. Importantly, data-driven scrubbing also excludes only a fraction of the volumes or entire sessions that motion scrubbing excludes. The ability of data-driven fMRI scrubbing to improve data retention without negatively impacting the quality of downstream analysis has major implications for sample sizes in population neuroscience research.
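As a rough illustration of what data-driven scrubbing means in practice, the sketch below computes a basic DVARS-style measure (the root mean square, across voxels, of the change between consecutive volumes) and flags volumes whose value is unusually high. It is a minimal sketch under stated assumptions, not the projection scrubbing method proposed in the paper and not the thresholding rule the authors evaluate: the function names `dvars` and `flag_volumes`, the median-plus-`k`-times-MAD cutoff, and the toy data are all hypothetical choices made for illustration.

```python
import math
import statistics


def dvars(timeseries):
    """Return a basic DVARS-style series for a list of volumes.

    `timeseries` is a list of T volumes, each a list of V voxel intensities.
    Entry i of the result is the root mean square, across voxels, of the
    change from volume i to volume i + 1, so the result has length T - 1.
    """
    values = []
    for prev, curr in zip(timeseries, timeseries[1:]):
        squared_diffs = [(c - p) ** 2 for p, c in zip(prev, curr)]
        values.append(math.sqrt(sum(squared_diffs) / len(squared_diffs)))
    return values


def flag_volumes(dvars_values, k=3.0):
    """Return indices of volumes whose DVARS value exceeds a robust cutoff.

    The cutoff (median + k * median absolute deviation) is an assumption
    made for this sketch, not a threshold taken from the paper.
    dvars_values[i] describes the change into volume i + 1, so the flagged
    index is i + 1.
    """
    med = statistics.median(dvars_values)
    mad = statistics.median([abs(v - med) for v in dvars_values])
    cutoff = med + k * mad
    return [i + 1 for i, v in enumerate(dvars_values) if v > cutoff]


# Toy data: 6 volumes of 3 voxels, with an artificial spike in volume 3.
toy = [
    [100, 100, 100],
    [101, 99, 100],
    [100, 101, 100],
    [120, 80, 110],
    [100, 100, 101],
    [101, 100, 100],
]
flagged = flag_volumes(dvars(toy))
retained = [vol for i, vol in enumerate(toy) if i not in flagged]
```

Because this measure is based on volume-to-volume change, a single spike raises two consecutive values: the change into the spike and the change out of it. In the toy data this flags both the contaminated volume and the one after it, which illustrates the trade-off between noise reduction and data retention that the abstract describes.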
Submission history
From: Damon Pham
[v1] Sat, 31 Jul 2021 20:39:43 UTC (33,676 KB)
[v2] Tue, 3 May 2022 17:27:57 UTC (25,040 KB)
[v3] Sat, 24 Sep 2022 06:51:26 UTC (34,295 KB)
[v4] Thu, 16 Feb 2023 18:33:07 UTC (41,802 KB)