Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

Athar, ShahRukh; Saito, Shunsuke; Yang, Zhengyu; Pidhorsky, Stanislav; Cao, Chen

arXiv:2407.19593 (cs)

[Submitted on 28 Jul 2024 (v1), last revised 30 Jul 2024 (this version, v2)]

Abstract: Creating photorealistic avatars for individuals traditionally involves extensive capture sessions with complex and expensive studio devices like the LightStage system. While recent strides in neural representations have enabled the generation of photorealistic and animatable 3D avatars from quick phone scans, these avatars have the capture-time lighting baked in, lack facial details, and have missing regions in areas such as the back of the ears. Thus, they lag in quality compared to studio-captured avatars. In this paper, we propose a method that bridges this gap by generating studio-like illuminated texture maps from short, monocular phone captures. We do this by parameterizing the phone texture maps using the $W^+$ space of a StyleGAN2, enabling near-perfect reconstruction. Then, we finetune a StyleGAN2 by sampling in the $W^+$ parameterized space, using a very small set of studio-captured textures as an adversarial training signal. To further enhance the realism and accuracy of facial details, we super-resolve the output of the StyleGAN2 using a carefully designed diffusion model that is guided by image gradients of the phone-captured texture map. Once trained, our method excels at producing studio-like facial texture maps from casual monocular smartphone videos. To demonstrate its capabilities, we showcase the generation of photorealistic, uniformly lit, complete avatars from monocular phone captures. The project page can be found at this http URL
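
The abstract says the diffusion model is guided by image gradients of the phone-captured texture map, but does not spell out how. As a rough illustration only, the sketch below computes forward-difference gradients of a single-channel texture and a mean squared mismatch between the gradients of two textures. The function names `image_gradients` and `gradient_mismatch`, the forward-difference scheme, and the squared-error form are all assumptions made for this example, not the authors' formulation.

```python
from typing import List, Tuple

Image = List[List[float]]  # single-channel texture as rows of pixel values


def image_gradients(img: Image) -> Tuple[Image, Image]:
    """Forward-difference gradients along x and y (hypothetical helper).

    The last column of gx and the last row of gy are set to zero,
    since there is no neighbor to difference against.
    """
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)]
          for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)]
          for y in range(h)]
    return gx, gy


def gradient_mismatch(pred: Image, ref: Image) -> float:
    """Mean squared difference between the gradients of two textures.

    Assumed stand-in for a gradient-based guidance signal; both
    textures must have the same height and width.
    """
    pgx, pgy = image_gradients(pred)
    rgx, rgy = image_gradients(ref)
    h, w = len(pred), len(pred[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += (pgx[y][x] - rgx[y][x]) ** 2 + (pgy[y][x] - rgy[y][x]) ** 2
    return total / (h * w)


# Toy 2x3 "texture" and a copy with a uniform brightness offset.
texture = [[0.0, 1.0, 3.0], [0.0, 1.0, 3.0]]
brighter = [[v + 5.0 for v in row] for row in texture]
```

In this toy setup, `gradient_mismatch(texture, brighter)` is zero because a constant brightness offset cancels in the differences. This suggests one reason gradients could be a useful guidance signal here: they carry local detail while ignoring uniform shifts in illumination. Whether that is the authors' motivation is not stated in the abstract.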

Comments:	ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.19593 [cs.CV]
	(or arXiv:2407.19593v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.19593

Submission history

From: ShahRukh Athar
[v1] Sun, 28 Jul 2024 21:26:33 UTC (42,099 KB)
[v2] Tue, 30 Jul 2024 02:20:28 UTC (42,099 KB)
