Terrier: A Deep Learning Repeat Classifier

Turnbull, Robert; Young, Neil D.; Tescari, Edoardo; Skerratt, Lee F.; Kosch, Tiffany A.

arXiv:2503.09312 (q-bio)

[Submitted on 12 Mar 2025]

Abstract: Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Existing tools often struggle to classify repeats from divergent taxa because of biases in reference libraries, which limits our understanding of repeat evolution and function. Terrier is a deep learning model designed to overcome these challenges: it is trained on a publicly available, curated repeat sequence library and classifies repetitive DNA sequences under the RepeatMasker schema. Trained on RepBase, which includes over 100,000 repeat families (four times more than Dfam), Terrier maps 97.1% of RepBase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice and fruit flies), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in amphibian and flatworm genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.

Comments:	11 pages, 9 figures
Subjects:	Genomics (q-bio.GN); Machine Learning (cs.LG)
ACM classes:	I.2
Cite as:	arXiv:2503.09312 [q-bio.GN]
	(or arXiv:2503.09312v1 [q-bio.GN] for this version)
	https://doi.org/10.48550/arXiv.2503.09312

Submission history

From: Robert Turnbull
[v1] Wed, 12 Mar 2025 12:03:26 UTC (449 KB)
