Towards Lipreading Sentences with Active Appearance Models

Sterpu, George; Harte, Naomi

arXiv:1805.11688 (eess)

[Submitted on 29 May 2018]

Abstract: Automatic lipreading has major potential impact for speech recognition, supplementing and complementing the acoustic modality. Most attempts at lipreading have been performed on small vocabulary tasks, due to a shortfall of appropriate audio-visual datasets. In this work we use the publicly available TCD-TIMIT database, designed for large vocabulary continuous audio-visual speech recognition. We compare the viseme recognition performance of the most widely used features for lipreading, Discrete Cosine Transform (DCT) and Active Appearance Models (AAM), in a traditional Hidden Markov Model (HMM) framework. We also exploit recent advances in AAM fitting. We found the DCT to outperform AAM by more than 6% for a viseme recognition task with 56 speakers. The overall accuracy of the DCT is quite low (32-34%). We conclude that a fundamental rethink of the modelling of visual features may be needed for this task.
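To make the DCT feature type mentioned in the abstract concrete, the following is a minimal sketch of how low-frequency 2-D DCT coefficients could be taken from a grayscale mouth region. It is not the authors' pipeline: the abstract does not state the region size, the number of coefficients kept, or the selection scheme, so the function names (`dct2_1d`, `dct2_2d`, `low_frequency_features`), the `num_coeffs` parameter, and the ordering by `u + v` are all assumptions made for illustration.

```python
import math


def dct2_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(
            signal[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        # Orthonormal scaling so that signal energy is preserved.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct2_2d(image):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(row) for row in image]
    cols = [dct2_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [horizontal][vertical]; transpose back to [vertical][horizontal].
    return [list(r) for r in zip(*cols)]


def low_frequency_features(image, num_coeffs):
    """Return num_coeffs coefficients with the smallest u + v (assumed selection rule)."""
    coeffs = dct2_2d(image)
    height, width = len(coeffs), len(coeffs[0])
    order = sorted(
        ((u, v) for u in range(height) for v in range(width)),
        key=lambda p: (p[0] + p[1], p[0]),
    )
    return [coeffs[u][v] for u, v in order[:num_coeffs]]


if __name__ == "__main__":
    # Hypothetical 4x4 "mouth region" with a horizontal intensity gradient.
    roi = [[float(x) for x in range(4)] for _ in range(4)]
    print(low_frequency_features(roi, 6))
```

In a real system a vector like this would be computed per video frame and passed to the sequence model. For production use, a library routine such as SciPy's `scipy.fft.dctn` would normally replace the hand-written transform.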

Comments:	Presented at The 14th International Conference on Auditory-Visual Speech Processing (AVSP 2017)
Subjects:	Image and Video Processing (eess.IV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1805.11688 [eess.IV]
	(or arXiv:1805.11688v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.1805.11688

Submission history

From: George Sterpu
[v1] Tue, 29 May 2018 19:57:12 UTC (3,881 KB)
