Maps Search Misspelling Detection Leveraging Domain-Augmented Contextual Representations

Li, Yutong

Abstract: Building an independent misspelling detector and serving it before correction can bring multiple benefits to the speller and other search components, which is particularly true for the most commonly deployed noisy-channel-based speller systems. With the rapid development of deep learning and substantial advancement in contextual representation learning such as BERTology, building a decent misspelling detector without having to rely on hand-crafted features associated with the noisy-channel architecture has become more accessible than ever. However, BERTology models are trained on natural language corpora, whereas Maps Search is highly domain-specific; would BERTology continue its success? In this paper we design 4 stages of models for misspelling detection, ranging from the most basic LSTM to a single-domain augmented fine-tuned BERT. We found that for Maps Search in our case, other advanced BERTology family models such as RoBERTa do not necessarily outperform BERT, and a classic cross-domain fine-tuned full BERT even underperforms a smaller single-domain fine-tuned BERT. We share more findings through comprehensive modeling experiments and analysis, and we also briefly cover the data generation algorithm breakthrough.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2108.06842 [cs.CL]
	(or arXiv:2108.06842v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2108.06842
